
Gradient size mismatch error when running knowledge_eap.ipynb with llama-3.1-8b #9

Open
cnlnpjhsy opened this issue Nov 28, 2024 · 1 comment

Comments

@cnlnpjhsy

Thank you for the excellent work on interpretability. I am currently doing some research with llama-3.1-8b. After adding support code for meta-llama/Llama-3.1-8B-Instruct to transformer_lens, I ran knowledge_eap.ipynb and hit an error in cell 6 when computing attribute(model, g, data, partial(logit_diff, loss=True, mean=True), method='EAP-IG-case', ig_steps=100):

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Cell In[6], line 4
      2 start_time = time.time()
      3 # Attribute using the model, graph, clean / corrupted data and labels, as well as a metric
----> 4 attribute(model, g, data, partial(logit_diff, loss=True, mean=True), method='EAP-IG-case', ig_steps=100)
      5 # attribute(model, g, data, partial(direct_logit, loss=True, mean=True), method='EAP-IG-case', ig_steps=30)
      6 # attribute(model, g, dataloader, partial(logit_diff, loss=True, mean=True), method='EAP-IG', ig_steps=30)
      7 g.apply_topn(5000, absolute=True)

File ~/workspace/KnowledgeCircuits/eap/attribute.py:391, in attribute(model, graph, dataloader, metric, aggregation, method, ig_steps, quiet)
    389     scores = get_scores_clean_corrupted(model, graph, dataloader, metric, quiet=quiet)
    390 elif method == 'EAP-IG-case':
--> 391     scores = get_scores_eap_ig_case(model, graph, dataloader, metric, steps=ig_steps, quiet=quiet)
    392 else:
    393     raise ValueError(f"integrated_gradients must be in ['EAP', 'EAP-IG', 'EAP-IG-partial-activations', 'EAP-IG-activations', 'clean-corrupted'], but got {method}")

File ~/workspace/KnowledgeCircuits/eap/attribute.py:366, in get_scores_eap_ig_case(model, graph, data, metric, steps, quiet)
    364         logits = model(clean_tokens, attention_mask=attention_mask)
    365         metric_value = metric(logits, clean_logits, input_lengths, label)
--> 366         metric_value.backward()
    368 scores /= total_steps
    370 return scores

File /usr/local/lib/python3.10/dist-packages/torch/_tensor.py:522, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    512 if has_torch_function_unary(self):
    513     return handle_torch_function(
    514         Tensor.backward,
    515         (self,),
   (...)
    520         inputs=inputs,
    521     )
--> 522 torch.autograd.backward(
    523     self, gradient, retain_graph, create_graph, inputs=inputs
    524 )

File /usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py:266, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    261     retain_graph = create_graph
    263 # The reason we repeat the same comment below is that
    264 # some Python versions print out the first line of a multi-line function
    265 # calls in the traceback and some print out the last line
--> 266 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    267     tensors,
    268     grad_tensors_,
    269     retain_graph,
    270     create_graph,
    271     inputs,
    272     allow_unreachable=True,
    273     accumulate_grad=True,
    274 )

File /usr/local/lib/python3.10/dist-packages/torch/utils/hooks.py:138, in BackwardHook._set_user_hook.<locals>.hook(grad_input, _)
    135 res = self._pack_with_none(self.input_tensors_index, grad_input, self.n_inputs)
    137 for hook in self.user_hooks:
--> 138     out = hook(self.module, res, self.grad_outputs)
    140     if out is None:
    141         continue

File ~/workspace/KnowledgeCircuits/transformer_lens/hook_points.py:77, in HookPoint.add_hook.<locals>.full_hook(module, module_input, module_output)
     73 if (
     74     dir == "bwd"
     75 ):  # For a backwards hook, module_output is a tuple of (grad,) - I don't know why.
     76     module_output = module_output[0]
---> 77 return hook(module_output, hook=self)

File ~/workspace/KnowledgeCircuits/eap/attribute.py:71, in make_hooks_and_matrices.<locals>.gradient_hook(prev_index, bwd_index, gradients, hook)
     69 except RuntimeError as e:
     70     print("Gradient Hook Error", hook.name, activation_difference.size(), grads.size(), prev_index, bwd_index)
---> 71     raise e

File ~/workspace/KnowledgeCircuits/eap/attribute.py:68, in make_hooks_and_matrices.<locals>.gradient_hook(prev_index, bwd_index, gradients, hook)
     66     s = einsum(activation_difference[:, :, :prev_index], grads,'batch pos forward hidden, batch pos backward hidden -> forward backward')
     67     s = s.squeeze(1)#.to(scores.device)
---> 68     scores[:prev_index, bwd_index] += s
     69 except RuntimeError as e:
     70     print("Gradient Hook Error", hook.name, activation_difference.size(), grads.size(), prev_index, bwd_index)

RuntimeError: The size of tensor a (32) must match the size of tensor b (8) at non-singleton dimension 1

This does not appear to be the same error as #7 (I am using the latest code and made sure the clean and corrupted strings have the same token length under the llama tokenizer). Does the author team have a solution? Thanks!
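For reference, a minimal sketch of that length check, assuming a TransformerLens HookedTransformer named model and placeholder prompt strings rather than the notebook's actual data:

# Hypothetical check, not the notebook's data: confirm the clean and corrupted
# prompts tokenize to the same length under the Llama tokenizer, ruling out
# the length-mismatch cause discussed in #7.
clean = "The capital of France is"       # placeholder prompt
corrupted = "The capital of Spain is"    # placeholder prompt
clean_len = model.to_tokens(clean).shape[1]
corrupted_len = model.to_tokens(corrupted).shape[1]
assert clean_len == corrupted_len, (clean_len, corrupted_len)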

@littlefive5
Contributor

littlefive5 commented Nov 28, 2024

Hi, this problem is caused by grouped (group-query) attention. I'm sorry, I don't have a very good solution for it at the moment. As a temporary workaround, you can duplicate the grouped module yourself.
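For context (not stated in this repo's code): Llama-3.1-8B uses grouped-query attention with 32 query heads but only 8 key/value heads, which matches the 32 vs 8 mismatch in the traceback. Below is a hedged sketch of the "duplicate the grouped module" workaround, assuming the Hugging Face checkpoint layout for meta-llama/Llama-3.1-8B-Instruct; the helper name ungroup is hypothetical and this is not code from this repository.

# Sketch: "ungroup" grouped-query attention by repeating each K/V head across
# its query group, so num_key_value_heads equals num_attention_heads (32) and
# the backward hooks see matching head counts everywhere.
import torch
from transformers import AutoConfig, AutoModelForCausalLM

name = "meta-llama/Llama-3.1-8B-Instruct"
src = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float32)
cfg = AutoConfig.from_pretrained(name)

n_heads = cfg.num_attention_heads      # 32 for Llama-3.1-8B
n_kv = cfg.num_key_value_heads         # 8 for Llama-3.1-8B
n_rep = n_heads // n_kv                # 4 query heads share each K/V head
head_dim = cfg.hidden_size // n_heads

def ungroup(w: torch.Tensor) -> torch.Tensor:
    # [n_kv * head_dim, d_model] -> [n_heads * head_dim, d_model],
    # repeating each K/V head n_rep times (same ordering as HF's repeat_kv).
    w = w.view(n_kv, head_dim, cfg.hidden_size)
    w = w.repeat_interleave(n_rep, dim=0)
    return w.reshape(n_heads * head_dim, cfg.hidden_size)

state = src.state_dict()
for key in list(state.keys()):
    if key.endswith("k_proj.weight") or key.endswith("v_proj.weight"):
        state[key] = ungroup(state[key])

cfg.num_key_value_heads = n_heads      # the expanded model is plain multi-head attention
dst = AutoModelForCausalLM.from_config(cfg)
dst.load_state_dict(state)

The expanded checkpoint should compute the same forward pass as the original (each replicated K/V head is exactly what repeat_kv would have produced), at the cost of a larger K/V cache; it can then be loaded into transformer_lens like a standard multi-head attention model.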
