GenericSelfAttention, biases are inconsistent to SelfAttentionLayer #234
Comments
Well, maybe having those biases is actually standard? E.g. in Fairseq: […]
Also used by default in PyTorch […]
Note that […]
So, as they seem to be standard nowadays, I think having them enabled is ok.
@patrick-wilken Are you aware of this?
I added the option […]
No, I wasn't. You won't find papers discussing what difference it makes, right? Maybe I should try it out, bias seems like well spent parameters. 😄
Note that this […]
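For reference, PyTorch's torch.nn.MultiheadAttention indeed defaults to bias=True for both the packed q/k/v input projection and the output projection. A minimal check (attribute names like in_proj_bias assume a recent PyTorch version):

```python
import torch

# bias=True is the default: both the packed q/k/v input projection
# and the output projection get bias terms.
mha = torch.nn.MultiheadAttention(embed_dim=512, num_heads=8)
print(mha.in_proj_bias is not None)   # True
print(mha.out_proj.bias is not None)  # True

# Passing bias=False drops both bias vectors.
mha_no_bias = torch.nn.MultiheadAttention(embed_dim=512, num_heads=8, bias=False)
print(mha_no_bias.in_proj_bias is None)   # True
print(mha_no_bias.out_proj.bias is None)  # True
```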
I noticed that nn.SelfAttention is a bit different from SelfAttentionLayer: SelfAttentionLayer does not have biases for the qkv and proj linear projections, while nn.SelfAttention currently has them. This is relevant for Conformer (e.g. #233) and Transformer.
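To make concrete which parameters the inconsistency affects, here is a minimal sketch (the TinySelfAttention class and the with_bias flag are hypothetical illustrations, not the actual RETURNN or returnn_common code): with_bias=False corresponds to the current SelfAttentionLayer behaviour, with_bias=True to the current nn.SelfAttention behaviour.

```python
import torch


class TinySelfAttention(torch.nn.Module):
    """Minimal multi-head self-attention, only to illustrate the bias question."""

    def __init__(self, dim: int, num_heads: int, with_bias: bool):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        # The whole difference under discussion: bias terms on these two projections.
        self.qkv = torch.nn.Linear(dim, 3 * dim, bias=with_bias)
        self.proj = torch.nn.Linear(dim, dim, bias=with_bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim)
        b, t, d = x.shape
        h = self.num_heads
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        # split heads -> (batch, heads, time, head_dim)
        q, k, v = (y.view(b, t, h, d // h).transpose(1, 2) for y in (q, k, v))
        att = torch.softmax(q @ k.transpose(-2, -1) / (d // h) ** 0.5, dim=-1)
        out = (att @ v).transpose(1, 2).reshape(b, t, d)
        return self.proj(out)


# The parameter difference is just the two bias vectors: 3*dim + dim values.
with_b = sum(p.numel() for p in TinySelfAttention(512, 8, with_bias=True).parameters())
without_b = sum(p.numel() for p in TinySelfAttention(512, 8, with_bias=False).parameters())
print(with_b - without_b)  # 2048 = 3*512 + 512
```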