Thank you to the authors for the excellent work on interpretability. I am currently doing some research with llama-3.1-8b. After adding support code for `meta-llama/Llama-3.1-8B-Instruct` to transformer_lens and running `knowledge_eap.ipynb`, I hit the following error in cell 6 when computing `attribute(model, g, data, partial(logit_diff, loss=True, mean=True), method='EAP-IG-case', ig_steps=100)`:
{ "name": "RuntimeError", "message": "The size of tensor a (32) must match the size of tensor b (8) at non-singleton dimension 1", "stack": "--------------------------------------------------------------------------- RuntimeError Traceback (most recent call last) Cell In[6], line 4 2 start_time = time.time() 3 # Attribute using the model, graph, clean / corrupted data and labels, as well as a metric ----> 4 attribute(model, g, data, partial(logit_diff, loss=True, mean=True), method='EAP-IG-case', ig_steps=100) 5 # attribute(model, g, data, partial(direct_logit, loss=True, mean=True), method='EAP-IG-case', ig_steps=30) 6 # attribute(model, g, dataloader, partial(logit_diff, loss=True, mean=True), method='EAP-IG', ig_steps=30) 7 g.apply_topn(5000, absolute=True) File ~/workspace/KnowledgeCircuits/eap/attribute.py:391, in attribute(model, graph, dataloader, metric, aggregation, method, ig_steps, quiet) 389 scores = get_scores_clean_corrupted(model, graph, dataloader, metric, quiet=quiet) 390 elif method == 'EAP-IG-case': --> 391 scores = get_scores_eap_ig_case(model, graph, dataloader, metric, steps=ig_steps, quiet=quiet) 392 else: 393 raise ValueError(f\"integrated_gradients must be in ['EAP', 'EAP-IG', 'EAP-IG-partial-activations', 'EAP-IG-activations', 'clean-corrupted'], but got {method}\") File ~/workspace/KnowledgeCircuits/eap/attribute.py:366, in get_scores_eap_ig_case(model, graph, data, metric, steps, quiet) 364 logits = model(clean_tokens, attention_mask=attention_mask) 365 metric_value = metric(logits, clean_logits, input_lengths, label) --> 366 metric_value.backward() 368 scores /= total_steps 370 return scores File /usr/local/lib/python3.10/dist-packages/torch/_tensor.py:522, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs) 512 if has_torch_function_unary(self): 513 return handle_torch_function( 514 Tensor.backward, 515 (self,), (...) 520 inputs=inputs, 521 ) --> 522 torch.autograd.backward( 523 self, gradient, retain_graph, create_graph, inputs=inputs 524 ) File /usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py:266, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs) 261 retain_graph = create_graph 263 # The reason we repeat the same comment below is that 264 # some Python versions print out the first line of a multi-line function 265 # calls in the traceback and some print out the last line --> 266 Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 267 tensors, 268 grad_tensors_, 269 retain_graph, 270 create_graph, 271 inputs, 272 allow_unreachable=True, 273 accumulate_grad=True, 274 ) File /usr/local/lib/python3.10/dist-packages/torch/utils/hooks.py:138, in BackwardHook._set_user_hook.<locals>.hook(grad_input, _) 135 res = self._pack_with_none(self.input_tensors_index, grad_input, self.n_inputs) 137 for hook in self.user_hooks: --> 138 out = hook(self.module, res, self.grad_outputs) 140 if out is None: 141 continue File ~/workspace/KnowledgeCircuits/transformer_lens/hook_points.py:77, in HookPoint.add_hook.<locals>.full_hook(module, module_input, module_output) 73 if ( 74 dir == \"bwd\" 75 ): # For a backwards hook, module_output is a tuple of (grad,) - I don't know why. 
76 module_output = module_output[0] ---> 77 return hook(module_output, hook=self) File ~/workspace/KnowledgeCircuits/eap/attribute.py:71, in make_hooks_and_matrices.<locals>.gradient_hook(prev_index, bwd_index, gradients, hook) 69 except RuntimeError as e: 70 print(\"Gradient Hook Error\", hook.name, activation_difference.size(), grads.size(), prev_index, bwd_index) ---> 71 raise e File ~/workspace/KnowledgeCircuits/eap/attribute.py:68, in make_hooks_and_matrices.<locals>.gradient_hook(prev_index, bwd_index, gradients, hook) 66 s = einsum(activation_difference[:, :, :prev_index], grads,'batch pos forward hidden, batch pos backward hidden -> forward backward') 67 s = s.squeeze(1)#.to(scores.device) ---> 68 scores[:prev_index, bwd_index] += s 69 except RuntimeError as e: 70 print(\"Gradient Hook Error\", hook.name, activation_difference.size(), grads.size(), prev_index, bwd_index) RuntimeError: The size of tensor a (32) must match the size of tensor b (8) at non-singleton dimension 1" }
This does not appear to be the same error as #7 (I am using the latest code and made sure the clean and corrupted strings tokenize to the same length under the Llama tokenizer). Does the author team have a fix for this? Thank you!
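A minimal sketch of such a length check, assuming TransformerLens-style tokenization (`clean_prompt` / `corrupted_prompt` are placeholder names, not from the notebook):

```python
# Sketch: confirm the clean / corrupted prompts tokenize to the same length
# under the Llama tokenizer, ruling out the length-mismatch error from #7.
clean_tokens = model.to_tokens(clean_prompt)          # shape [1, seq_len]
corrupted_tokens = model.to_tokens(corrupted_prompt)  # shape [1, seq_len]
assert clean_tokens.shape[1] == corrupted_tokens.shape[1], (
    f"length mismatch: {clean_tokens.shape[1]} vs {corrupted_tokens.shape[1]}"
)
```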
Hello, this problem is caused by grouped(-query) attention, and I'm sorry that I don't have a good solution for it at the moment. As a temporary workaround, you can duplicate the grouped module yourself.
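One way to read that workaround, purely as a sketch and not a tested fix from the maintainers: Llama-3.1-8B has 32 query heads but only 8 K/V heads, so the gradients flowing through the grouped K/V hooks are 8 heads wide while the EAP score matrix expects 32. Tiling each grouped head across the query heads in its group before the score accumulation makes the shapes line up. Helper name and assumed shapes below are illustrative:

```python
import torch

def expand_grouped_kv(grads: torch.Tensor, n_heads: int) -> torch.Tensor:
    """Tile grouped-query-attention K/V gradients so the head dimension matches
    the number of query heads (e.g. 8 KV heads -> 32 heads for Llama-3.1-8B).
    Assumes grads has shape [batch, pos, n_kv_heads, d_head]."""
    n_kv_heads = grads.shape[2]
    if n_kv_heads == n_heads:
        return grads  # standard multi-head attention, nothing to do
    assert n_heads % n_kv_heads == 0, "query heads must be a multiple of KV heads"
    # Each KV head serves n_heads // n_kv_heads query heads; copy it once per
    # query head, matching the group layout used by HF's repeat_kv.
    return grads.repeat_interleave(n_heads // n_kv_heads, dim=2)
```

Whether copying (rather than, say, summing) the shared gradients is the right attribution semantics for grouped heads is a separate question; this only resolves the shape mismatch in the backward hook.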