
merge upstream #64

Merged
merged 117 commits into CERC-AAI:main on Apr 4, 2024

Conversation

kshitijkg
Member

No description provided.

kshitijkg and others added 30 commits August 12, 2023 16:10
* Fixed final value of cosine decay lr

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
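
For context on the "Fixed final value of cosine decay lr" commit above: the fix concerns where a cosine schedule ends up. The following is a minimal sketch of a cosine decay that ends exactly at a non-zero floor `min_lr`; the function name and arguments are illustrative only and are not the GPT-NeoX implementation.

```python
import math

def cosine_decay_lr(step, max_lr, min_lr, warmup_iters, decay_iters):
    """Cosine decay from max_lr to min_lr after a linear warmup (illustrative sketch)."""
    if step < warmup_iters:
        return max_lr * step / max(1, warmup_iters)
    # Clamp progress so the schedule ends exactly at min_lr instead of decaying past it.
    progress = min(1.0, (step - warmup_iters) / max(1, decay_iters - warmup_iters))
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + coeff * (max_lr - min_lr)
```
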
* Update Dockerfile

* Update Dockerfile
* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
* Update transformers version

Signed-off-by: Dashiell Stander <[email protected]>

* Update the enwik8 URL to the one HF uses, the old one is down.

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: github-actions <[email protected]>
* Update README.md

Fix broken link

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
* Fix bugs so we can use bf16 with zero > 0

Signed-off-by: Dashiell Stander <[email protected]>

* Typo

Signed-off-by: Dashiell Stander <[email protected]>

* Typo

Signed-off-by: Dashiell Stander <[email protected]>

* With the DeepSpeed updates there may be no need to do grad_accum in fp32

Signed-off-by: Dashiell Stander <[email protected]>

* Add warning about necessity of fp32 grad_accum with bf16, pp>0, and zero1

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: github-actions <[email protected]>
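
The bf16/ZeRO commits above revolve around keeping gradient accumulation in fp32 when training in bf16 with pipeline parallelism and ZeRO stage 1. As a rough illustration only, a DeepSpeed-style config fragment for that setup might look like the sketch below; the exact key names are assumptions on my part and are not taken from this PR.

```python
# Illustrative DeepSpeed-style config fragment (key names are assumptions):
# keep gradient accumulation in fp32 when training in bf16 with pp > 0 and
# ZeRO stage 1, as the warning added in the commits above recommends.
ds_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {"stage": 1},
    "data_types": {"grad_accum_dtype": "fp32"},
}
```
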
* Remove lazy dataset implementation option

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
* Fix SequentialGeneration

* Fix SequentialGeneration
* Fix register_buffer parameter

* Fix register_buffer parameter
* Add flash 2.x message to README.md

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
* add s3 checkpoint syncing

* Update NeoXArgs docs automatically

* remove CPCargo requirement

* Update NeoXArgs docs automatically

* Make s3 imports try-except and separate requirements to s3 file

* Update NeoXArgs docs automatically

* Announce feature

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
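
As a rough illustration of the s3 checkpoint syncing added above (this is not the code from the PR; the bucket, prefix, and function name are placeholders), uploading a checkpoint directory with boto3 could look like:

```python
import os
import boto3

def sync_checkpoint_to_s3(local_dir, bucket, prefix):
    """Upload every file under local_dir to s3://bucket/prefix/ (illustrative sketch)."""
    s3 = boto3.client("s3")
    for root, _, files in os.walk(local_dir):
        for name in files:
            path = os.path.join(root, name)
            key = os.path.join(prefix, os.path.relpath(path, local_dir))
            s3.upload_file(path, bucket, key)
```
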
* Try out just using the HF implementation

Signed-off-by: Dashiell Stander <[email protected]>

* Rely solely on HF tokenizer.

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: github-actions <[email protected]>
* Pre-commit

Signed-off-by: Dashiell Stander <[email protected]>

* Sequence dimension is 0

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
* Ensure that LR annealing is correct even after loading from checkpoint. Patch from Eric Nguyen

Co-authored-by: Eric Nguyen <[email protected]>
Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Test whether we need the whole patch

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Turns out we do not need the entire patch, just one line

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: Eric Nguyen <[email protected]>
Co-authored-by: github-actions <[email protected]>
* Use Megatron-DeepSpeed flops calculation

Signed-off-by: Dashiell Stander <[email protected]>

* Use Megatron-DeepSpeed flops calculation

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

* Direct comparison of FLOPS calculations

Signed-off-by: Dashiell Stander <[email protected]>

* Remove test logging

Signed-off-by: Dashiell Stander <[email protected]>

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
* adding boilerplate coverity scan to submit to public analysis

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>


* Add documentation about kicking off distributed jobs

Signed-off-by: Dashiell Stander <[email protected]>

* Add documentation about kicking off distributed jobs

Signed-off-by: Dashiell Stander <[email protected]>

* Add documentation about kicking off distributed jobs

Signed-off-by: Dashiell Stander <[email protected]>

* Update NeoXArgs docs automatically

* Added more info on run command modification and cleaned up a bit

* slight cleanup

* Update NeoXArgs docs automatically

---------

Signed-off-by: Dashiell Stander <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
* Fix readme typo

* Update NeoXArgs docs automatically

* More typos

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
* Update CITATION.cff

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
segyges and others added 29 commits February 22, 2024 16:52
* Switch default command for docker image

* Rename pythia paths docker file for clarity

* Update docker build to use python 3.10

* Update github workflows to use ubuntu 22.04 and python 3.10

* Bump pytorch library patch versions

* Add pytest-html for reasonably formatted test reports

* Fix build after torch and cuda version bump

* Fix apex install for newer version

1) This, empirically, works, as tested by running the build and kicking off training.
2) Apex documentation says it is incorrect syntax and deprecated.
3) It takes so long to compile that it is probably, all by itself, something that needs fixing.
4) I will probably pull the fused adamw out of apex.
5) It has been building for twenty minutes so I am going to go do something else.

* Fix pip version to ensure apex compilation remains good

* Fix unit test for evaluate

* Fix pip requirement

Prevents possible build issues with apex, especially across divergent pip versions

* Update dockerfile to point to stripped-down apex repo

* Revert "Update dockerfile to point to stripped-down apex repo"

This reverts commit 40c7656.

* Update apex version in dockerfile

* Switch to downloading prebuilt apex wheel

* Clean up docker copy commands

* Have docker build conditionally get binaries or build apex

* Apply precommit
* Switch default command for docker image

* Rename pythia paths docker file for clarity

* Fix unit test for evaluate

* Update readme for testing to omit --forked argument

* Add pytest-html to requirements-dev.txt

* Revert "Update readme for testing to omit --forked argument"

This reverts commit 19021fc.

* Add data/ directory and .bin and .idx files in /tests/data to .gitignore

This keeps git from prompting you to commit (or forcing you to stash) data files

* Make .gitignore for data files slightly more elegant

* Add utility script for doing token counts on processed datasets

* Run precommit hook

* Fix token count script, run precommit
* add support for flash attention 2

* change cosine decay to chinchilla style

* set default warmup to none so that warmup_iters can be set

* fixed bug

* fixed chinchilla lr

* add s3 checkpoint syncing

* rotary embedding in fp32

* fix for seq_len < max_seq_len

* some fixes, still not working

* ?' :

* fix bugs; evaluate on step 0

* first attempt at gqa

* gqa works in kv_heads==query_heads case

* gqa working

* workaround for FSX quota

* update with llemma

* update with recent PR

* README and requirements updated

* Added Mistral config

* Added sliding window through flash attention 2

* Added sliding window

* Mistral should likely use mp=2 like llama2

* Update gitignore

* Removed unused CPCargo import

* Conversion script (WIP)

* Fixed missing slurm environ vars

* updated mistral config

* updated job script

* initial commit conversion mistral hf to sequential

* Added stacking q, k, v appropriately for mp ranks

* pp=0 support from end of 2023

* Cleaning up config and removing Autoconfig in conversion script

* Cleaned up conversion example script

* cleanup: add back configs folder, discard Llemma readme

* cleanup: remove llemma lr sched changes, re-add requirements/ folder

* docs: add explanation of intermediate_size behavior

* args: add argument checking for num_kv_heads, clean up usage syntax

* args: prevent num KV heads < TP worldsize

* readd triton flash attn func

* cleanup: use tools/ dir from main

* docs: re-add mistral , GQA as supported

* cleanup: delete duplicate tools/ files

* cleanup: use fp32 rope (non-fused) from main

* cleanup: no longer block out GQA codepaths in conversion scripts

* cleanup: gqa code a bit

* add llama2, llemma configs

* add non-flash GQA ; refactor modeling code

* clean up mistral config for commit

* further cleanup configs dir

* remove slurm script from llemma

* update seqlen params for codellama, llemma configs

* add more comments to GQA code, and make reshapes more readable

* make inv_freq non-persistent

* actually, just ensure mistral has inv_freqs as a persistent buffer

* non-flash GQA works, so ensure arguments.py permits it

* no longer use our own copies of flash attention interface functions

* remove unused mpu util fn

* delete unused config file

* fix diff on mpu/utils.py

* remove slurm scripts that won't be in this PR

* run pre-commit

* update tests for conversion scripts

* add flash version check for sliding window

* pre-commit

---------

Co-authored-by: zhangir-azerbayev <[email protected]>
Co-authored-by: haileyschoelkopf <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
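
The GQA commits in the group above ("add non-flash GQA ; refactor modeling code", "add more comments to GQA code") implement grouped-query attention, where several query heads share one key/value head. Below is a minimal non-flash sketch of the core idea; the tensor names are made up and this is not the NeoX modeling code (causal masking is omitted).

```python
import torch

def grouped_query_attention(q, k, v):
    """q: [batch, n_q_heads, seq, head_dim]; k, v: [batch, n_kv_heads, seq, head_dim].

    Each group of n_q_heads // n_kv_heads query heads attends to one shared
    key/value head (illustrative sketch only).
    """
    n_q_heads, n_kv_heads = q.shape[1], k.shape[1]
    assert n_q_heads % n_kv_heads == 0
    group = n_q_heads // n_kv_heads
    # Broadcast each KV head across its group of query heads.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    probs = torch.softmax(scores, dim=-1)
    return probs @ v
```
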
* possibly fix profiling flag names

* actually, profile_backward already exists

* Update NeoXArgs docs automatically

* neox_args.profile was also used some places, update that too

* Update NeoXArgs docs automatically

* profiling --> profile

* Update NeoXArgs docs automatically

* Revert neox_arguments.md changes

* Update NeoXArgs docs automatically

* Update gen_docs since __name__ only returns the Literal for string args with Python 3.10

* Update NeoXArgs docs automatically

* Another update to preserve non-literals

* Update NeoXArgs docs automatically

* add union

* Update NeoXArgs docs automatically

* pre-commit

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
* Update cpu_ci.yml

Update the workflow to point the CPU workflow at a self-hosted runner instead of GitHub-provided runners

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
* Improve argument validation for Flash-attn + SWA

* Update NeoXArgs docs automatically

* don't pass window_size if not necessary

* Update NeoXArgs docs automatically

* Update 7B.yml

* Update NeoXArgs docs automatically

* apply precommit

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
* Pythia 14M training on ngc pytorch 24.02 container

* pre-commit

---------

Co-authored-by: Quentin Anthony <[email protected]>
* feat: remove unnecessary bf16 conversions since no collective op is performed

* pre-commit

---------

Co-authored-by: Quentin Anthony <[email protected]>
* ignore markdown for pre-commit

* only ignore end of file and trailing whitespace

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
* make inv_freq non-persistent by default

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
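
For reference, making `inv_freq` non-persistent (as in the "make inv_freq non-persistent by default" commit above) means registering the buffer so it is excluded from the state dict and simply recomputed at init. A generic PyTorch sketch, not the NeoX rotary-embedding module itself:

```python
import torch
from torch import nn

class RotaryEmbeddingStub(nn.Module):
    """Illustrative stub: inv_freq is recomputed at init, so it need not be checkpointed."""

    def __init__(self, dim, base=10000):
        super().__init__()
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
        # persistent=False keeps inv_freq out of state_dict(), so checkpoints
        # saved without the buffer still load cleanly.
        self.register_buffer("inv_freq", inv_freq, persistent=False)
```
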
* feat: deepspeed zero lion support

* feat: bump DeeperSpeed version to one that includes DeepSpeed FusedLion

* feat: bump DeeperSpeed version to include pipeline logging fix

* pre-commit

---------

Co-authored-by: Quentin Anthony <[email protected]>
* Add DeepSpeed MoE

Thanks to dayofthepenguin for extensive testing

Closes #479

* Update NeoXArgs docs automatically

* pre-commit

* Update NeoXArgs docs automatically

---------

Co-authored-by: Yang Zhang <[email protected]>
Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
* Update requirements.txt

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
- Eliminate already installed apt packages
- Sparse attn requirement led to a triton downgrade
- Flash attn is already part of the NGC container (in another version
  that is compatible with TE)
…to set the causal parameter of flash_varlen_qkv_fn to False. Failing to do so will lead to inaccurate results. (#1178)
* initial mamba support (no kernels, no parallelism)

* Mamba runs! Also, add flags for sel. scan and conv1d fused kernels

* Update NeoXArgs docs automatically

* add mamba_inner_fn ; try really hard to make A_log and D no-WD and stored in fp32

* cleanup print statements

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

* add draft conversion script (tested working TP=1)

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

* update parallelism checks for mamba--partition activations works

* add mamba requirements

* clean up and better comment mamba code

* clean up and better comment mamba code

* update arg validation in mamba

* more cleanup

* add flag for fp32 Alog/D, add init_methods support for mamba

* Update NeoXArgs docs automatically

* update conversion script name, add docstring

* name conversion script

* Update NeoXArgs docs automatically

* add demo configs

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

* add arguments to control conv and (in,out)_proj biases in mamba separately

* Update NeoXArgs docs automatically

* make x_proj bias also controlled by flag

* Update NeoXArgs docs automatically

* pre-commit, add comments

* Update NeoXArgs docs automatically

* Add mamba import print

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
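
The Mamba commits in the group above build on the upstream `mamba_ssm` package (selective scan and conv1d fused kernels). A rough usage sketch of that package's stand-alone `Mamba` block follows; the dimension values are placeholders and this is not the NeoX integration code.

```python
import torch
from mamba_ssm import Mamba  # assumes the mamba_ssm package and its CUDA kernels are installed

# Stand-alone Mamba mixer block; d_state/d_conv/expand values are illustrative.
block = Mamba(d_model=512, d_state=16, d_conv=4, expand=2).cuda()
x = torch.randn(2, 1024, 512, device="cuda")  # [batch, seq_len, d_model]
y = block(x)  # output has the same shape as x
```
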
* add cuda support for flash attn w/ alibi, warn of deprecation of triton

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
* TP works!

* merge TP mamba changes with most current MambaLayer

* cleanup TP, confirmed working still

* make shapes with TP>1 work with conversion

* tested and PP works, so no need for assert blocking it in arguments

* update comment

* Update NeoXArgs docs automatically

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
* added ds zero.Init() to get_model

* Clean up conditional with block

* pre-commit

---------

Co-authored-by: Quentin Anthony <[email protected]>
* making PR triggered CPU test for changes to megatron

* Update NeoXArgs docs automatically

* pre-commit

* Update NeoXArgs docs automatically

---------

Co-authored-by: github-actions <[email protected]>
Co-authored-by: Quentin Anthony <[email protected]>
* initial JIT load functions

* passing neox_arge to load() as optional for easy testing

* modified headers for correct copyright statements
… init (#1191)

* added ds zero.Init() to get_model

* Clean up conditional with block

* pre-commit

* ensured deepspeed configs are passed to init

---------

Co-authored-by: Quentin Anthony <[email protected]>
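
A minimal sketch of what the commits above describe, wrapping model construction in `deepspeed.zero.Init()` with a config passed through; the config dict and the `nn.Linear` stand-in are placeholders, not the actual `get_model` code.

```python
import deepspeed
import torch.nn as nn

ds_config = {"zero_optimization": {"stage": 3}}  # placeholder config

# Constructing the model inside zero.Init() lets ZeRO partition parameters
# as they are created instead of materializing the full model on every rank.
with deepspeed.zero.Init(config_dict_or_path=ds_config):
    model = nn.Linear(1024, 1024)  # stand-in for the real model construction
```
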
kshitijkg merged commit 5790435 into CERC-AAI:main on Apr 4, 2024