GenericSelfAttention, biases are inconsistent to SelfAttentionLayer #234

Closed
albertz opened this issue Nov 2, 2022 · 9 comments

albertz commented Nov 2, 2022

I noticed that nn.SelfAttention differs slightly from SelfAttentionLayer: SelfAttentionLayer does not have biases for the qkv and proj linear projections, while nn.SelfAttention currently does.

This is relevant for Conformer (e.g. #233) and Transformer.


albertz commented Nov 2, 2022


albertz commented Nov 2, 2022

Biases are also enabled by default in PyTorch nn.MultiheadAttention (https://pytorch.org/docs/stable/generated/torch.nn.MultiheadAttention.html#torch.nn.MultiheadAttention).


albertz commented Nov 2, 2022

Note that SelfAttentionLayer was designed to be 1:1 equivalent to the Tensor2Tensor code, and looking at the current T2T code (I think here), it seems there is no bias there either. So that was the original Transformer. But it has evolved since then, and I think Fairseq is probably much more widely used now.


albertz commented Nov 2, 2022

albertz changed the title from "GenericSelfAttention, biases are non-standard" to "GenericSelfAttention, biases are inconsistent to SelfAttentionLayer" on Nov 2, 2022

albertz commented Nov 2, 2022

So, as biases seem to be standard nowadays, I think having them enabled is OK.

albertz closed this as completed on Nov 2, 2022

albertz commented Nov 2, 2022

@patrick-wilken Are you aware of this?


albertz commented Nov 2, 2022

I added the option with_bias, so you can specify it explicitly. The default is still True now.
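
As a rough illustration of what such a flag toggles, here is a minimal pure-Python sketch of a single linear projection with an optional bias. The class `Projection` and its fields are hypothetical and are not the returnn-common implementation; only the option name `with_bias` and its default of True are taken from this thread.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Projection:
    """Hypothetical linear projection y = x W (+ b); not the returnn-common code."""

    weight: List[List[float]]  # shape [in_dim][out_dim]
    with_bias: bool = True  # mirrors the default mentioned in this thread
    bias: Optional[List[float]] = None  # shape [out_dim], only used if with_bias

    def __post_init__(self) -> None:
        out_dim = len(self.weight[0])
        if not self.with_bias:
            self.bias = None  # no bias parameters at all
        elif self.bias is None:
            self.bias = [0.0] * out_dim  # zero-initialised bias

    def __call__(self, x: List[float]) -> List[float]:
        out_dim = len(self.weight[0])
        # Plain matrix-vector product x W.
        y = [sum(x[i] * self.weight[i][j] for i in range(len(x))) for j in range(out_dim)]
        if self.bias is not None:
            y = [y_j + b_j for y_j, b_j in zip(y, self.bias)]
        return y
```

With `with_bias=False` the projection reduces to the plain matrix product, which is the behaviour described above for SelfAttentionLayer.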

patrick-wilken commented

No, I wasn't. You won't find papers discussing what difference it makes, right? Maybe I should try it out; bias seems like well-spent parameters. 😄
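
To put a number on the parameter cost, here is a back-of-the-envelope sketch. The helper `self_attention_param_count` is hypothetical, and it assumes the total key and value dimensions both equal the model dimension, so the fused qkv projection maps d to 3d and proj maps d to d. Under that assumption the biases add 4d parameters on top of 4d² weights. It says nothing about the effect on model quality.

```python
def self_attention_param_count(model_dim: int, with_bias: bool = True) -> int:
    """Count parameters of the qkv and proj linear layers (hypothetical helper).

    Assumes total key/value dim == model_dim, i.e. qkv maps
    model_dim -> 3 * model_dim and proj maps model_dim -> model_dim.
    """
    qkv_weights = model_dim * 3 * model_dim
    proj_weights = model_dim * model_dim
    count = qkv_weights + proj_weights
    if with_bias:
        count += 3 * model_dim + model_dim  # qkv bias + proj bias
    return count


dim = 512  # assumed model dimension, chosen only for illustration
without_bias = self_attention_param_count(dim, with_bias=False)
extra = self_attention_param_count(dim, with_bias=True) - without_bias
print(without_bias, extra, f"{100 * extra / without_bias:.2f}%")
# prints: 1048576 2048 0.20%
```

So for d = 512 the biases account for roughly 0.2% additional parameters per self-attention block.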


albertz commented Nov 4, 2022

Note that this with_bias was added to returnn-common. It's not available in SelfAttentionLayer.
