Thank you to the authors for the excellent work on interpretability. I am currently doing some research with llama-3.1-8b. After adding support code for `meta-llama/Llama-3.1-8B-Instruct` to transformer_lens and running `knowledge_eap.ipynb`, I hit the following error in cell 6 when computing `attribute(model, g, data, partial(logit_diff, loss=True, mean=True), method='EAP-IG-case', ig_steps=100)`:
{ "name": "RuntimeError", "message": "The size of tensor a (32) must match the size of tensor b (8) at non-singleton dimension 1", "stack": "--------------------------------------------------------------------------- RuntimeError Traceback (most recent call last) Cell In[6], line 4 2 start_time = time.time() 3 # Attribute using the model, graph, clean / corrupted data and labels, as well as a metric ----> 4 attribute(model, g, data, partial(logit_diff, loss=True, mean=True), method='EAP-IG-case', ig_steps=100) 5 # attribute(model, g, data, partial(direct_logit, loss=True, mean=True), method='EAP-IG-case', ig_steps=30) 6 # attribute(model, g, dataloader, partial(logit_diff, loss=True, mean=True), method='EAP-IG', ig_steps=30) 7 g.apply_topn(5000, absolute=True) File ~/workspace/KnowledgeCircuits/eap/attribute.py:391, in attribute(model, graph, dataloader, metric, aggregation, method, ig_steps, quiet) 389 scores = get_scores_clean_corrupted(model, graph, dataloader, metric, quiet=quiet) 390 elif method == 'EAP-IG-case': --> 391 scores = get_scores_eap_ig_case(model, graph, dataloader, metric, steps=ig_steps, quiet=quiet) 392 else: 393 raise ValueError(f\"integrated_gradients must be in ['EAP', 'EAP-IG', 'EAP-IG-partial-activations', 'EAP-IG-activations', 'clean-corrupted'], but got {method}\") File ~/workspace/KnowledgeCircuits/eap/attribute.py:366, in get_scores_eap_ig_case(model, graph, data, metric, steps, quiet) 364 logits = model(clean_tokens, attention_mask=attention_mask) 365 metric_value = metric(logits, clean_logits, input_lengths, label) --> 366 metric_value.backward() 368 scores /= total_steps 370 return scores File /usr/local/lib/python3.10/dist-packages/torch/_tensor.py:522, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs) 512 if has_torch_function_unary(self): 513 return handle_torch_function( 514 Tensor.backward, 515 (self,), (...) 520 inputs=inputs, 521 ) --> 522 torch.autograd.backward( 523 self, gradient, retain_graph, create_graph, inputs=inputs 524 ) File /usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py:266, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs) 261 retain_graph = create_graph 263 # The reason we repeat the same comment below is that 264 # some Python versions print out the first line of a multi-line function 265 # calls in the traceback and some print out the last line --> 266 Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass 267 tensors, 268 grad_tensors_, 269 retain_graph, 270 create_graph, 271 inputs, 272 allow_unreachable=True, 273 accumulate_grad=True, 274 ) File /usr/local/lib/python3.10/dist-packages/torch/utils/hooks.py:138, in BackwardHook._set_user_hook.<locals>.hook(grad_input, _) 135 res = self._pack_with_none(self.input_tensors_index, grad_input, self.n_inputs) 137 for hook in self.user_hooks: --> 138 out = hook(self.module, res, self.grad_outputs) 140 if out is None: 141 continue File ~/workspace/KnowledgeCircuits/transformer_lens/hook_points.py:77, in HookPoint.add_hook.<locals>.full_hook(module, module_input, module_output) 73 if ( 74 dir == \"bwd\" 75 ): # For a backwards hook, module_output is a tuple of (grad,) - I don't know why. 
76 module_output = module_output[0] ---> 77 return hook(module_output, hook=self) File ~/workspace/KnowledgeCircuits/eap/attribute.py:71, in make_hooks_and_matrices.<locals>.gradient_hook(prev_index, bwd_index, gradients, hook) 69 except RuntimeError as e: 70 print(\"Gradient Hook Error\", hook.name, activation_difference.size(), grads.size(), prev_index, bwd_index) ---> 71 raise e File ~/workspace/KnowledgeCircuits/eap/attribute.py:68, in make_hooks_and_matrices.<locals>.gradient_hook(prev_index, bwd_index, gradients, hook) 66 s = einsum(activation_difference[:, :, :prev_index], grads,'batch pos forward hidden, batch pos backward hidden -> forward backward') 67 s = s.squeeze(1)#.to(scores.device) ---> 68 scores[:prev_index, bwd_index] += s 69 except RuntimeError as e: 70 print(\"Gradient Hook Error\", hook.name, activation_difference.size(), grads.size(), prev_index, bwd_index) RuntimeError: The size of tensor a (32) must match the size of tensor b (8) at non-singleton dimension 1" }
This does not appear to be the same error as #7 (I am using the latest code and made sure the clean and corrupted strings tokenize to the same length under the Llama tokenizer). Does the author team have a fix for this? Thank you!
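A minimal sketch of such a length check, assuming TransformerLens-style tokenization (`clean_prompt` / `corrupted_prompt` are placeholder names, not from the notebook):

```python
# Sketch: confirm the clean / corrupted prompts tokenize to the same length
# under the Llama tokenizer, ruling out the length-mismatch error from #7.
clean_tokens = model.to_tokens(clean_prompt)          # shape [1, seq_len]
corrupted_tokens = model.to_tokens(corrupted_prompt)  # shape [1, seq_len]
assert clean_tokens.shape[1] == corrupted_tokens.shape[1], (
    f"length mismatch: {clean_tokens.shape[1]} vs {corrupted_tokens.shape[1]}"
)
```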
Hello, this problem is caused by grouped(-query) attention, and I'm sorry that I don't have a good solution for it at the moment. As a temporary workaround, you can duplicate the grouped module yourself.
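One way to read that workaround, purely as a sketch and not a tested fix from the maintainers: Llama-3.1-8B has 32 query heads but only 8 K/V heads, so the gradients flowing through the grouped K/V hooks are 8 heads wide while the EAP score matrix expects 32. Tiling each grouped head across the query heads in its group before the score accumulation makes the shapes line up. Helper name and assumed shapes below are illustrative:

```python
import torch

def expand_grouped_kv(grads: torch.Tensor, n_heads: int) -> torch.Tensor:
    """Tile grouped-query-attention K/V gradients so the head dimension matches
    the number of query heads (e.g. 8 KV heads -> 32 heads for Llama-3.1-8B).
    Assumes grads has shape [batch, pos, n_kv_heads, d_head]."""
    n_kv_heads = grads.shape[2]
    if n_kv_heads == n_heads:
        return grads  # standard multi-head attention, nothing to do
    assert n_heads % n_kv_heads == 0, "query heads must be a multiple of KV heads"
    # Each KV head serves n_heads // n_kv_heads query heads; copy it once per
    # query head, matching the group layout used by HF's repeat_kv.
    return grads.repeat_interleave(n_heads // n_kv_heads, dim=2)
```

Whether copying (rather than, say, summing) the shared gradients is the right attribution semantics for grouped heads is a separate question; this only resolves the shape mismatch in the backward hook.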