Upgrade to v0.10.0 #1427

hlahkar · 2024-08-05T07:21:33Z

This PR upgrades the Habana support to llm-foundry v0.10.0.

…al when the checkpoint is ready (mosaicml#813) * working without sharded checkpointing.. * add more debugs * try this * more debugging * yikes dumb bug * add notes * fixes * remove prints * small updates * fix typo * refactor * fix docstring formatting * fighting with docstrings * try this * add unit tests * point to composer update * values -> items * serialize time * fix merge * nits * warning, small comment update * add error --------- Co-authored-by: Daniel King <[email protected]>

* add magic filename for sharded state dicts * Update scripts/train/train.py Co-authored-by: Daniel King <[email protected]> * oops forgot to push this * no shard if no fsdp * default to full on foundry --------- Co-authored-by: Daniel King <[email protected]>

…#1301) * ignore logger if excephook is active * remove logger in data scripts and callback * undo format of imports * moved env var check into helper * formatted * removed import format * added docstring * ran pre-commit --------- Co-authored-by: Daniel King <[email protected]>

…iding window, reuse prev layer kv cache etc. (mosaicml#1299) * [WIP] Allows interweaving of arbitrary kinds of 'attention' layers, like RNN, sliding window etc. * lint * applying overrides to blocks rather than just attentions * add docstring * minor * changing yaml specification style * .. * fixes * fix * fix * fix * refactoring * add warning * compute only query vector when reusing kv * refactor * fixing * adding test for reusing previous layer kv cache * adding error messages * .. * adding test * add logging * adding logging * minor * bug fix, adding test * minor * addressing some comments * addressing some comments * setting absolute absolute value for reuse_kv_layer_idx * lint * adding tests for override_block_args * adding error if reusing kv cache from a mismatch layer * fixing test * fixing code, test * fix * .. * refactoring * fix * .. * .. * .. * refactoring * .. * .. * .. * adding test for _get_modules_order_expanded * fixing test * fixing test * lint * lint * adding test * addressing comment * .. * fixing test * changing yaml format * fix configuation * fixing test * allowing repeat at top level * allowing overriding error * addressing comments * lint * addressing comments * fix * .. * .. * .. * .. * .. * addressing comment * fixing test

cli99 and others added 30 commits February 15, 2024 02:53

improve error msg when checking target_blocks in activation_checkpointing_target (mosaicml#977)

2e8982e

Torch 2.2 upgrade - Part 1 (mosaicml#976)

1ef7409

Torch 2.2 - Part 2 (mosaicml#979)

e0756e1

PyTorch 2.2 - Part 3 (mosaicml#981)

da2c863

Remove torch 2.1 from docker build (mosaicml#982)

3a99270

Token accuracy metrics (mosaicml#983)

2431730

do not mention 1.13 in readme (mosaicml#988)

63c88d0

Co-authored-by: Daniel King <[email protected]>

Patch test, lock mcli version (mosaicml#990)

dff2cf4

Bump gha timeouts (mosaicml#991)

386ae36

Fix readme typo (mosaicml#993)

2478f0a

if condition in tie weights added (mosaicml#989)

e5fffac

* if condition in tie weights added * unit test for tie weights

bump composer version (mosaicml#995)

44fd365

Trim examples ahead of time for auto packing (mosaicml#994)

d527c9b

add oom observer callback (mosaicml#932)

b082511

* add oom observer callback * fix format

Change ci/cd to use ci-testing repo

e3f214e

Revert "Change ci/cd to use ci-testing repo"

5abbca0

This reverts commit e3f214e.

Use ci-testing repo (mosaicml#1000)

2436c00

Co-authored-by: Irene Dea <[email protected]>

Make CodeEval respect device_eval_batch_size (mosaicml#956)

d104d16

Remove try except around imports (mosaicml#1004)

2dea737

Deprecate triton, prefix lm, llama attention patch, and text denoising; Make ComposerHFT5 experimental (mosaicml#1007)

3880d04

* Deprecate features and mark experimental * fix typo --------- Co-authored-by: Daniel King <[email protected]>

bump (mosaicml#1009)

cbdddf0

Fix evaluators actually pulling eval metrics (mosaicml#1006)

09ff550

* fix bug on metrics * lint * lint * add unit test * lint

Build torch 2.2.1 images (mosaicml#1010)

fd8cbaf

add 2.2.1 tests (mosaicml#1011)

5728969

Bump min torch pin (mosaicml#1013)

f4f6414

Red button because CI running jobs it doesn't need. Tests passed on main.

Fix extra BOS token in front of response for some tokenizers (mosaicml#1003)

cf0f5e5

Bump min composer pin (mosaicml#1015)

86c8746

add default for eval interval (mosaicml#987)

5261a55

Co-authored-by: Daniel King <[email protected]>

dakinggg and others added 29 commits June 21, 2024 19:30

Allow passing in lbl_process_group directly (mosaicml#1298)

2196d07

Add all transforms to train script (mosaicml#1300)

8b5a1bb

Add Retries to run_query (mosaicml#1302)

fd7b187

* add retry * pyright * slight refactor --------- Co-authored-by: v-chen_data <[email protected]>

Bumping mlflow version to include buffering (mosaicml#1303)

2267bc7

* bumping mlflow version to include buffering * capping at mlflow 2.15

Add curriculum learning callback (mosaicml#1256)

ef14849

Avoid circular import in hf checkpointer (mosaicml#1304)

2412b59

Remove CodeQL workflow (mosaicml#1305)

bbfebda

Update CI test to v0.0.8 (mosaicml#1306)

901eee3

update (mosaicml#1307)

3edce07

bump ci-testing to 0.0.9 (mosaicml#1310)

14348fa

Fix 4 gpu tests (mosaicml#1311)

472d009

2.3.1 (mosaicml#1312)

f141ee1

Provide default seed value in TrainConfig, matching EvalConfig (mosaicml#1315)

0ebd7c9

* Update config_utils.py * lint

Refactor hf checkpointer (mosaicml#1318)

88511f7

Add optional logging of text output to EvalOutputLogging (mosaicml#1283)

68c2625

--------- Co-authored-by: Mihir Patel <[email protected]> Co-authored-by: Daniel King <[email protected]>

Update version to release version

742f340

Add support to run MPT-1b training on Habana device (HPU) using DeepSpeed. (mosaicml#527)

e99ec07

wip

48b239e

add act ckpt

a76d824

fix

d238edc

cleanup

b723c25

update README

820221a

update reqs to 1.13

a69273d

Add Model sharding support with deepspeed (mosaicml#836)

8ec8716

Make config compliant to 0.10.0

de1240d

Update ds_gaudi.sh

51beeb0

Update README

1939ae4

dakinggg assigned abhi-mosaic Aug 5, 2024
