Upgrade to v0.10.0 #1427

Open: wants to merge 399 commits into base: habana_alpha

@hlahkar hlahkar commented Aug 5, 2024

This PR upgrades the Habana support to llm-foundry v0.10.0.

cli99 and others added 30 commits February 15, 2024 02:53
…al when the checkpoint is ready (mosaicml#813)

* working without sharded checkpointing..

* add more debugs

* try this

* more debugging

* yikes dumb bug

* add notes

* fixes

* remove prints

* small updates

* fix typo

* refactor

* fix docstring formatting

* fighting with docstrings

* try this

* add unit tests

* point to composer update

* values -> items

* serialize time

* fix merge

* nits

* warning, small comment update

* add error

---------

Co-authored-by: Daniel King

* if condition in tie weights added

* unit test for tie weights

* add oom observer callback

* fix format

…g; Make ComposerHFT5 experimental (mosaicml#1007)

* Deprecate features and mark experimental

* fix typo

---------

Co-authored-by: Daniel King

* add magic filename for sharded state dicts

* Update scripts/train/train.py

Co-authored-by: Daniel King

* oops forgot to push this

* no shard if no fsdp

* default to full on foundry

---------

Co-authored-by: Daniel King

* fix bug on metrics

* lint

* lint

* add unit test

* lint

Red button because CI is running jobs it doesn't need. Tests passed on main.

dakinggg and others added 29 commits June 21, 2024 19:30
* add retry

* pyright

* slight refactor

---------

Co-authored-by: v-chen_data

* bumping mlflow version to include buffering

* capping at mlflow 2.15

…#1301)

* ignore logger if excepthook is active

* remove logger in data scripts and callback

* undo format of imports

* moved env var check into helper

* formatted

* removed import format

* added docstring

* ran pre-commit

---------

Co-authored-by: Daniel King

…iding window, reuse prev layer kv cache etc. (mosaicml#1299)

* [WIP] Allows interweaving of arbitrary kinds of 'attention' layers, like RNN, sliding window etc.

* lint

* applying overrides to blocks rather than just attentions

* add docstring

* minor

* changing yaml specification style

* ..

* fixes

* fix

* fix

* fix

* refactoring

* add warning

* compute only query vector when reusing kv

* refactor

* fixing

* adding test for reusing previous layer kv cache

* adding error messages

* ..

* adding test

* add logging

* adding logging

* minor

* bug fix, adding test

* minor

* addressing some comments

* addressing some comments

* setting absolute value for reuse_kv_layer_idx

* lint

* adding tests for override_block_args

* adding error if reusing kv cache from a mismatch layer

* fixing test

* fixing code, test

* fix

* ..

* refactoring

* fix

* ..

* ..

* ..

* refactoring

* ..

* ..

* ..

* adding test for _get_modules_order_expanded

* fixing test

* fixing test

* lint

* lint

* adding test

* addressing comment

* ..

* fixing test

* changing yaml format

* fix configuration

* fixing test

* allowing repeat at top level

* allowing overriding error

* addressing comments

* lint

* addressing comments

* fix

* ..

* ..

* ..

* ..

* ..

* addressing comment

* fixing test