PyTorch nn.LayerNorm now takes bias arg - removed custom class #454

Open · wants to merge 1 commit into master
Conversation

calmitchell617

Hi, I noticed that the PyTorch nn.LayerNorm class now takes a bias arg. This PR removes the custom LayerNorm class and replaces it with the built-in.
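For reference, a minimal sketch of the built-in usage (this assumes PyTorch 2.1 or newer, where the bias argument was added; the dimensions are just illustrative):

import torch
import torch.nn as nn

# nn.LayerNorm can now drop the additive bias directly, which is what the
# removed custom class existed to work around.
ln = nn.LayerNorm(768, bias=False)

x = torch.randn(2, 16, 768)
y = ln(x)

print(y.shape)          # torch.Size([2, 16, 768])
print(ln.bias is None)  # True -- no bias parameter is registered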

I tested the qualitative effect of this change by checking out a fresh version of master branch, then running:

python data/shakespeare_char/prepare.py

python train.py config/train_shakespeare_char.py

I ran the same commands after making the code changes, and compared the results after 1000 iters on an RTX 6000 Ada. The eval results were:

step 1000: train loss 1.2743, val loss 1.5198
step 1000: train loss 1.2760, val loss 1.5265

Not identical, but it seems to be working well enough. A sample taken after 2000 iters:

$ python sample.py --out_dir=out-shakespeare-char
Overriding: out_dir = out-shakespeare-char
number of parameters: 10.65M
Loading meta from data/shakespeare_char/meta.pkl...


KING RICHARD III:
The last through thy beauteous graves
Beating her brother uncontrary'd.

DUKE OF YORK:
Prove, my lord, I'll not speak to thy course;
And that might send in this maid overth and the king
And and selfsame to my life, once by a crown,
And why he's not known to-day sweet to woe more.

KING RICHARD III:
Then at this is a gentleman poor great to
The woman's part son; and therefore you are the prince
Of your hands, being, you are advance.

EDWARD:
It is true.


sopotc commented Mar 17, 2024

Just adding that I also noticed the bias arg is there and did a bunch of measurements, including on GPT-2 with OpenWebText (owt); the loss delta is negligible.

Ran experiments on a 2x 4090 setup.

@@ -95,9 +84,9 @@ class Block(nn.Module):

     def __init__(self, config):
         super().__init__()
-        self.ln_1 = LayerNorm(config.n_embd, bias=config.bias)
+        self.ln_1 = nn.LayerNorm(config.n_embd, bias=config.bias)

Nice. To add another confirmation, I wrote my own model and used nn.LayerNorm directly with no issues: https://github.com/vhmth/gpt-2/blob/main/model.py#L74
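To make the equivalence concrete, here is a rough check, not from the thread: CustomLayerNorm below is a hypothetical stand-in for the kind of class this PR removes (a weight plus optional bias wrapped around F.layer_norm), compared against nn.LayerNorm on the same input:

import torch
import torch.nn as nn
import torch.nn.functional as F

class CustomLayerNorm(nn.Module):
    """Hypothetical stand-in for the removed class: LayerNorm with an optional bias."""
    def __init__(self, ndim, bias):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(ndim))
        self.bias = nn.Parameter(torch.zeros(ndim)) if bias else None

    def forward(self, x):
        return F.layer_norm(x, self.weight.shape, self.weight, self.bias, 1e-5)

torch.manual_seed(0)
x = torch.randn(4, 64)

for bias in (True, False):
    custom = CustomLayerNorm(64, bias=bias)
    builtin = nn.LayerNorm(64, bias=bias)
    # Freshly initialized, both use weight=1 and bias=0 (or no bias at all),
    # so their outputs should match to float precision.
    print(bias, torch.allclose(custom(x), builtin(x)))  # True for both bias settings

Any remaining gap in training losses would then come from run-to-run nondeterminism rather than the module swap itself.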
