Releases: huggingface/optimum-graphcore
v0.7.1: Whisper fine-tuning & group-quantized inference, T5 generation optimizations
What's Changed
- Support for Whisper fine-tuning, following a fix to a slice-assignment bug.
- Whisper inference can now take advantage of group quantization, where model parameters are stored in INT4 and decoded into FP16 on the fly as needed. The memory saving is estimated at 3.5x with minimal degradation in WER, and it can be enabled via the `use_group_quantized_linears` parallelize kwarg (see the sketch after this list).
- KV caching and on-device generation are now also available for T5.
- Fixed interleaved training and validation for `IPUSeq2SeqTrainer`.
- Added notebooks for Whisper fine-tuning, Whisper group-quantized inference, embeddings models, and BART-L summarization.
- UX improvement that ensures a dataset of sufficient size is provided to the `IPUTrainer`.
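A minimal sketch of enabling group-quantized Whisper inference, assuming the generic `to_pipelined`/`parallelize` flow; apart from `use_group_quantized_linears`, the checkpoint, config values, and `for_generation` flag are illustrative assumptions rather than a tuned recipe:

```python
# A minimal sketch, not a tuned recipe: checkpoint and IPUConfig values are
# illustrative, and the for_generation flag is an assumption about the
# pipelined model's parallelize signature.
from transformers import WhisperForConditionalGeneration
from optimum.graphcore import IPUConfig
from optimum.graphcore.modeling_utils import to_pipelined

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
ipu_config = IPUConfig(ipus_per_replica=2)

pipelined_model = to_pipelined(model, ipu_config).parallelize(
    for_generation=True,
    # Store linear-layer weights in INT4 groups, decode them to FP16 on the fly.
    use_group_quantized_linears=True,
)
```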
Commits
- Support C600 card by @katalinic-gc in #446
- Remove deprecated pod_type argument by @jimypbr in #447
- Fix inference replication factor pod type removal by @katalinic-gc in #448
- T5 enable self-attention kv caching by @kundaMwiza in #449
- Workflows: use explicit venv names and use --clear in creation by @jimypbr in #452
- Workflow: add venv with clear for code quality and doc-builder workflows by @jimypbr in #453
- Support overriding *ExampleTester class attribute values in test_examples.py by @kundaMwiza in #439
- Adding missing license headers and copyrights by @jimypbr in #454
- Fix shift tokens right usage which contains slice assignment by @katalinic-gc in #451
- Base models and notebooks for general IPU embeddings model by @arsalanu in #436
- Fix mt5 translation training ipu config by @kundaMwiza in #456
- Add back source optimum graphcore install in embeddings notebook by @arsalanu in #457
- Add parallelize kwargs as an IPU config entry by @katalinic-gc in #427
- Change tests to point to MPNet ipu config by @arsalanu in #458
- T5 enable generation optimisation by @kundaMwiza in #459
- Fix ipus per replica check in whisper cond encoder by @katalinic-gc in #461
- Check that the dataset has enough examples to fill a batch when creat… by @katalinic-gc in #462
- Add notebook for whisper finetuning by @katalinic-gc in #460
- Use index select in BART positional embedding for better tile placement by @katalinic-gc in #463
- Add group quantization for whisper by @jimypbr in #429
- Change max length adaption messages to debug by @katalinic-gc in #465
- Fix finetuning whisper notebook text by @katalinic-gc in #466
- Fix finetuning whisper notebook text v2 by @katalinic-gc in #467
- Add BART-L text summarization notebook by @jayniep-gc in #464
- Fix evaluate then train by @katalinic-gc in #469
- Use token=False in whisper nb by @katalinic-gc in #470
- Add Whisper inference with quantization notebook by @jimypbr in #468
Full Changelog: v0.7.0...v0.7.1
v0.7.0: SDK3.3, Whisper on 1 IPU, MT5, transformers 4.29
What's Changed
- Optimum has been updated to support Poplar SDK 3.3.
- A new feature in that SDK is the `poptorch.cond` operation, which enables conditional compute. This enabled us to implement some new optimisations.
- Using the new `cond` operation we are able to fit the Whisper-tiny encoder and decoder on a single IPU. To enable, pass the option `use_cond_encoder` to Whisper's `parallelize` method (see the sketch after this list).
- Added the option for cross-attention KV caching in Whisper, also using the `cond` op. To enable, pass the option `use_cross_cache` to Whisper's `parallelize` method.
- We added support for the MT5 model for summarisation and translation tasks.
- The version of `transformers` has been updated to 4.29. One of the things this enables in Optimum is Whisper timestamp decoding.
- Added `optimum.graphcore.models.whisper.WhisperProcessorTorch` - a faster, drop-in replacement for `transformers.WhisperProcessor`.
- The `pod_type` argument, which was deprecated in 0.6.1, has been removed.
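A minimal sketch combining the new options, assuming the same pipelined-model flow as in the v0.7.1 sketch above; apart from `use_cond_encoder`, `use_cross_cache`, and `WhisperProcessorTorch`, the names and values are illustrative:

```python
# A minimal sketch; the for_generation flag and config values are assumptions,
# not the exact recipe used in the Whisper notebooks.
from transformers import WhisperForConditionalGeneration
from optimum.graphcore import IPUConfig
from optimum.graphcore.modeling_utils import to_pipelined
from optimum.graphcore.models.whisper import WhisperProcessorTorch

# Faster, drop-in replacement for transformers.WhisperProcessor.
processor = WhisperProcessorTorch.from_pretrained("openai/whisper-tiny")

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
ipu_config = IPUConfig(ipus_per_replica=1)  # whisper-tiny can now fit on one IPU

pipelined_model = to_pipelined(model, ipu_config).parallelize(
    for_generation=True,
    use_cond_encoder=True,  # fit encoder and decoder on a single IPU via poptorch.cond
    use_cross_cache=True,   # cache cross-attention KV, also built on the cond op
)
```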
Commits
- Fixing links to API references by @jayniep-gc in #391
- Do not override replicated_tensor_sharding in the IPUConfig by @kundaMwiza in #393
- Preserve the set padding idx in SerializedEmbedding by @kundaMwiza in #395
- Add MT5 by @kundaMwiza in #392
- deberta/translation/summarization notebook fixes by @kundaMwiza in #396
- MT5 notebooks: prefix exec cache with mt5 by @kundaMwiza in #397
- Flan-T5 Notebook Formatting Tweaks by @hmellor in #398
- Add cross KV caching by @katalinic-gc in #329
- Beam search adjustment by @katalinic-gc in #394
- Updating Whisper notebook so it uses new SDK and new features by @lukem-gc in #399
- Add `padding_idx` to appropriate embedding split by @hmellor in #403
- Bump transformers to 4.29.2 by @katalinic-gc in #389
- Fix Whisper processor torch with transformers 4.29.2 bump by @katalinic-gc in #405
- Fix Stable Diffusion notebooks by @hmellor in #408
- Add IPU support for HF pipelines to Whisper by @paolot-gc in #368
- Throw error is kwargs isn't empty by end of init by @hmellor in #406
- Add Whisper pipeline tests by @katalinic-gc in #409
- Enable fine-tuning of `whisper-tiny` by @hmellor in #400
- Fix issue where exe cache dir was set too late by @hmellor in #411
- Enable generation tests by @kundaMwiza in #407
- Add Seq2Seq trainer test by @kundaMwiza in #404
- Use the generation config to control generation by @katalinic-gc in #410
- Add support for Whisper timestamp decoding with on-device generation by @katalinic-gc in #413
- Fix IPUWhisperTimeStampLogitsProcessor for beam search by @katalinic-gc in #414
- Remove usage of deprecated config: `pod_type` by @hmellor in #416
- Fix `matmul_proportion` `ManagedAttribute` usage by @hmellor in #415
- Enable Whisper encoder and decoder to run on 1 IPU by @katalinic-gc in #418
- Enable replication with on device text generation by @katalinic-gc in #420
- Update doc workflows by @regisss in #417
- Update whisper pipeline example for latest features by @katalinic-gc in #421
- Fix text encoder for SD with 4.29 bump by @katalinic-gc in #424
- Use the faster whisper feature extractor in whisper pipelines by @katalinic-gc in #423
- Remove engine references from SD pipelines by @katalinic-gc in #422
- Add support for `whisper-small` fine-tuning by @hmellor in #426
- Use index select for whisper position embedding for better tile utili… by @katalinic-gc in #435
- Print execution time of each example test by @kundaMwiza in #440
- SplitProjection layer: Add output channels serialization mode by @kundaMwiza in #438
- 3.3 Examples CI Fixes by @jimypbr in #443
- Support T5EncoderModel for t5-based embedding models by @alex-coniasse in #437
- Integrate whisper large into the existing notebook by @alex-coniasse in #441
- Bump SDK version to 3.3 in the github workflows by @jimypbr in #444
- Update examples requirements for sdk3.3 by @jimypbr in #434
Full Changelog: v0.6.1...v0.7.0
v0.6.1: Faster Whisper/BART inference; Flan-T5; MT5; UX improvements
Faster Text Generation
0.6.1 provides significant speed-ups of up to 9x for Whisper and BART text generation! We have put the entire text generation loop onto IPU and enabled KV caching for self-attention layers.
- Use buffers to cache the encoder hidden states in decoder wrapper by @jimypbr in #285
- Move whisper decoder projection to IPU 0 since there is weight tying by @katalinic-gc in #309
- move the IndexedInputLinear out of the decoder wrapper by @katalinic-gc in #319
- Add generic KV caching support, use it with Whisper by @katalinic-gc in #307
- On device text generation POC for greedy search by @katalinic-gc in #357
- Add on device beam search by @katalinic-gc in #370
- Add attention serialization to the attention mixin and enable it with Whisper by @katalinic-gc in #372
- BART KV-caching + on-device by @jimypbr in #363
- Fix cached_beam_idx check for non on device generation by @katalinic-gc in #378
- Attn mixin improvements by @katalinic-gc in #381
- Add a faster torch based version of the whisper feature extractor by @katalinic-gc in #376
- Fix BART Positional embeddings for generation without caching by @jimypbr in #386
New Models
Fine-tuning of text generation models
Text generation with `IPUSeq2SeqTrainer` is now enabled.
- Fix IPUSeq2SeqTrainer for models that have persistent buffers by @kundaMwiza in #337
- Enable generation in notebooks that use IPUSeq2SeqTrainer by @kundaMwiza in #341
- Fix: reparallelize for training after generation by @kundaMwiza in #387
Wav2vec2 Large
- Adding Wav2vec2 Large pretraining and fine-tuning by @atsyplikhin in #323
Flan-T5
Added support for Flan-T5 inference. This comes with numerical fixes to T5 for running in `float16`; a usage sketch follows the list below.
- Enable Flan-T5 inference in `float16` by @hmellor in #296
- Add Flan-T5 notebook by @hmellor in #318
- T5 revert fp16 clamping removal by @kundaMwiza in #332
- Skip equal check for denormals in known T5 layer by @hmellor in #383
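A minimal sketch of Flan-T5 inference in `float16`, assuming the generic pipelined-model flow; the checkpoint, config values, and `for_generation` flag are illustrative assumptions:

```python
# A minimal sketch, not a tuned recipe.
from transformers import AutoTokenizer, T5ForConditionalGeneration
from optimum.graphcore import IPUConfig
from optimum.graphcore.modeling_utils import to_pipelined

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-small").half()

ipu_config = IPUConfig(ipus_per_replica=2)
pipelined = to_pipelined(model, ipu_config).parallelize(for_generation=True)

inputs = tokenizer("Translate to German: How old are you?", return_tensors="pt")
outputs = pipelined.generate(**inputs, max_length=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```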
MT5
Added the MT5 model, `MT5ForConditionalGeneration`. To support this, two new options were added to `IPUConfig` (a configuration sketch follows their descriptions):
- `serialized_projection_splits_per_ipu` (`List[int]`, optional, defaults to `None`):
  Specifies the number of splits of the embedding layer that will be put on each IPU for pipelined execution.
  The format has to be the same as that for `layers_per_ipu`, however wildcards are not supported.
  For instance, `[3, 1, 0, 0]` specifies how to place an embedding layer serialized into 4 sub-embedding layers across a 4-IPU pipeline: IPU-1 has 3 splits and IPU-2 has 1 split.
- `projection_serialization_factor` (`int`, optional, defaults to 1 if `serialized_projection_splits_per_ipu` is `None`):
  The factor to use to either serialize the matmuls that are performed in the linear projection layer, or serialize the projection layer into a set of individual linear layers that can be optionally placed on different IPUs.
  Nothing happens if `projection_serialization_factor = 1`.
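A minimal configuration sketch of the two options; the layer split and serialization factor are illustrative values, not tuned MT5 settings:

```python
# A minimal sketch: values are illustrative, not tuned MT5 settings.
from optimum.graphcore import IPUConfig

# Serialize the projection/embedding layer into 4 sub-layers and place 3 of
# them on the first IPU and 1 on the second IPU of a 4-IPU pipeline.
ipu_config = IPUConfig(
    layers_per_ipu=[3, 7, 7, 7],
    serialized_projection_splits_per_ipu=[3, 1, 0, 0],
)

# Alternatively, keep the projection in place but serialize its matmuls.
# projection_serialization_factor=1 would leave the layer untouched.
ipu_config_alt = IPUConfig(
    layers_per_ipu=[3, 7, 7, 7],
    projection_serialization_factor=4,
)
```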
PRs:
- Support sharding serialized layers across ipus by @kundaMwiza in #355
- Add MT5 model and fine-tuning notebook by @kundaMwiza in #392
HubertForCTC
- Add support for HubertForCTC by @jimypbr in #347
- Change hyper-parameters to fix Hubert for CTC CI by @jimypbr in #390
User Experience
The `pod_type` argument to `IPUTrainingArguments` has now been deprecated and replaced by `n_ipu`. Consequently, `pod_type` dictionary values of `IPUConfig` are no longer supported.
- Pod type sets replication factor by @rahult-graphcore in #271
`IPUConfig` now supports `inference_` versions of the following parameters (see the sketch after this list):
- `layers_per_ipu`
- `ipus_per_replica`
- `matmul_proportion`
- `serialized_embedding_splits_per_ipu`
- `projection_serialization_factor`
- `serialized_projection_splits_per_ipu`
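A minimal sketch of the new arguments; it assumes the `inference_` attributes are passed straight to the `IPUConfig` constructor, and all values are illustrative:

```python
# A minimal sketch: values are illustrative only.
from optimum.graphcore import IPUConfig, IPUTrainingArguments

# pod_type is deprecated; request IPUs with n_ipu instead.
training_args = IPUTrainingArguments(output_dir="out", n_ipu=4)

# One IPUConfig can now carry separate training and inference settings.
ipu_config = IPUConfig(
    layers_per_ipu=[2, 4, 4, 2],      # training pipeline over 4 IPUs
    matmul_proportion=0.2,
    inference_layers_per_ipu=[6, 6],  # inference uses a smaller 2-IPU pipeline
    inference_ipus_per_replica=2,
    inference_matmul_proportion=0.6,
)
```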
PRs:
- Enable training and inference specific configurations using a single `IPUConfig` by @hmellor in #308
- Matmul proportion support float or len(List[float]) == ipus_per_replica by @kundaMwiza in #375
- Refactor: prefix IPUConfig `ManagedAttribute`s instead of overloading user provided attributes by @kundaMwiza in #366
- Add attribute validation by @kundaMwiza in #371
- Refactor SerializedEmbedding to use to/from_model by @kundaMwiza in #382
Notebooks
- Add narrative to the whisper notebook by @payoto in #312
- Add Flan-T5 notebook by @hmellor in #318
- Deberta notebook to accompany blog post by @lukem-gc in #369
- Add MT5 model and fine-tuning notebook by @kundaMwiza in #392
New Contributors
- @atsyplikhin made their first contribution in #323
- @lukem-gc made their first contribution in #369
Full Changelog: v0.6.0...v0.6.1
v0.6.0: SDK3.2, Text generation, Whisper, Stable Diffusion
Text Generation
This release comes with full support for text generation for GPT2, BART, T5, and Whisper!
- Add text generation support by @jimypbr in #253
- Run encoder on IPU for encoder-decoder text-gen models by @jimypbr in #283
- Efficient decoder text generation wrapper by @jimypbr in #273
- Text Gen slice decoder projection optimisation by @jimypbr in #295
- Add text generation prediction support for IPUSeq2SeqTrainer by @kundaMwiza in #284
IPU pipelined models can call `.generate()`. Text generation can also be done with `pipelines` (see the sketch below).
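A minimal sketch of pipeline-based generation; the task string and the `ipu_config` argument are assumptions about the IPU pipeline wrapper rather than confirmed API details:

```python
# A minimal sketch; the task name and ipu_config argument are assumptions.
from optimum.graphcore import IPUConfig, pipeline

generator = pipeline(
    "text-generation",
    model="gpt2",
    ipu_config=IPUConfig(layers_per_ipu=[12]),
)
print(generator("The IPU is", max_new_tokens=20)[0]["generated_text"])
```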
Stable Diffusion
- We now support Stable Diffusion inference pipelines from Diffusers in `optimum/graphcore/diffusers` by @katalinic-gc in #300 (see the sketch after this list)
- Much improved performance on IPU by running all SD modules on IPU by @katalinic-gc in #274
- Add file for SD ipu configs by @katalinic-gc in #301
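A minimal sketch of running one of these pipelines; the `IPUStableDiffusionPipeline` class name and defaults are assumptions based on the `optimum/graphcore/diffusers` module mentioned above:

```python
# A minimal sketch; the class name and defaults are assumptions.
from optimum.graphcore.diffusers import IPUStableDiffusionPipeline

pipe = IPUStableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```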
UX Improvements
We've improved the usability of `IPUConfig` (see the sketch after this list).
- You no longer need to specify both `ipus_per_replica` and `layers_per_ipu`. You can specify just one and the other will be inferred from it: @hmellor in #282
- `layers_per_ipu` can support a combination of integers and wildcards (`-1`), e.g. `[1, 1, -1, -1]` will put 1 layer each on IPU0 and IPU1, and split the remaining layers evenly between IPU2 and IPU3. If there is an odd number of layers, the extra layer is placed on the last wildcard IPU. @rahult-graphcore in #275
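A minimal sketch of the wildcard syntax; the split shown is illustrative:

```python
# A minimal sketch of the wildcard syntax.
from optimum.graphcore import IPUConfig

# 1 layer each on IPU0 and IPU1; remaining layers split evenly across IPU2/IPU3.
# ipus_per_replica is inferred from the length of layers_per_ipu.
ipu_config = IPUConfig(layers_per_ipu=[1, 1, -1, -1])
```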
New Models
- Add GroupBERT model (https://arxiv.org/abs/2106.05822) by @ivansche in #139
- Add Whisper model for inference by @paolot-gc in #262
Notebooks
- Packed bert notebook by @alex-coniasse in #222
- Name Entity Extraction notebook by @anjleeg-gcai in #237
- Whisper notebook for inference by @paolot-gc in #262
Bugfixes
- SerializedEmbedding: override default freeze=True on deserialization by @kundaMwiza in #304
- Fix training mode outputs for roberta and distilbert mlm models by @jimypbr in #254
- Remove the work-around for rebuilding BaseModelOutput in BART and T5 by @jimypbr in #297
- Populate `_hooks` for T5 and BART by @hmellor in #291
- Instantiate optimizer in compile only mode by @kundaMwiza in #292
- Add back removed variables from the wav2vec2 pretraining forward sig by @katalinic-gc in #259
- Pipeline bug fixes by @jimypbr in #260
- Fix PR doc build when the PR comes from a clone with a different name by @regisss in #281
Misc
- Bump transformers to 4.25.1 by @katalinic-gc in #247
- Bump diffusers to 0.12.1 by @katalinic-gc in #302
- Remove deprecated IPU config arguments by @katalinic-gc in #250
- Remove the custom layernorm for convnext by @jimypbr in #255
- Updates for SDK 3.2 by @jimypbr in #256
- Add PopART option that enables the use of models with weights exceeding ~2GB by @hmellor in #277
- Pin the optimum version requirement by @jimypbr in #293
- More concise dev install instructions by @hmellor in #294
New Contributors
- @alex-coniasse made their first contribution in #222
- @ivansche made their first contribution in #139
- @evawGraphcore made their first contribution in #264
- @arsalanu made their first contribution in #266
- @hmellor made their first contribution in #279
- @kundaMwiza made their first contribution in #292
- @paolot-gc made their first contribution in #262
Full Changelog: v0.5.0...v0.6.0
v0.5.0: SDK3.1 + PyTorch 1.13
Changes
- This release makes `optimum-graphcore` compatible with the latest Poplar SDK 3.1 (#239). Please see the Poplar SDK 3.1 release notes.
- PopTorch is now compatible with PyTorch 1.13 (upgraded from 1.10); all requirements in `optimum-graphcore` have been updated for this.
- Small behaviour change: the `IPUTrainingArgs` `report_to` default is now `"none"` instead of `None`, which would default to `"all"`. This means that reporting is now opt-in instead of opt-out (#239). A brief sketch follows below.
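A minimal sketch of opting back into reporting under the new default; the integration name is illustrative and other required arguments are omitted:

```python
# A minimal sketch: reporting is opt-in now, so pass report_to explicitly.
# The "tensorboard" value is illustrative; other arguments are omitted.
from optimum.graphcore import IPUTrainingArguments

args = IPUTrainingArguments(output_dir="out", report_to=["tensorboard"])
```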
v0.4.3: Patch release
v0.4.2: Patch release
Small bug fixes and improvements.
v0.4.1: Patch release
Fixes a bug in the `IPUTrainer` breaking the `save_model` method (#191).
v0.4.0: PopTorch 3.0, DistilBERT and new notebooks
SDK 3.0
The main feature this release provides is support for PopTorch version 3.0 (#183), which comes with a new way of tracing PyTorch models: the PyTorch dispatcher is used instead of `torch.jit.trace`. Not only is tracing faster, it is also much more powerful (more about it here).
General
- LAMB does not update the bias parameters anymore (#178)
- The DistilBERT architecture is now supported for the tasks: (#181)
- Masked language modeling
- Multiple choice
- Question answering
- Sequence classification
- Token classification
- The masked language modeling task is now supported by Deberta
- The `IPUTrainer` and `IPUTrainingArguments` were synchronized with their transformers counterparts (#179)
- Some parameters in the `IPUConfig` were removed: `use_popdist`, `decompose_grad_sum`, `profile_dir`
Bug fixes
- Documentation building fixes
- Wav2vec2 with `dataloader_mode=async_rebatched` fixed (#168)
Notebooks
- Audio classification for HuBERT notebook (#157)
- Language modeling finetuning notebook (#161)
- Question answering notebook (#163)
- Multiple choice notebook (#166)
- A notebook showing how to train a model supported in the library (#171)
Documentation
The documentation was updated, and contains more content, for instance:
- The `IPUTrainer` API is described
- The `IPUConfig` attributes are explained
- A new page explaining how to contribute by adding a new model architecture to the library
v0.3.2: Documentation, Notebooks and Python 3.7+
Documentation
Thanks to @lewtun, the building blocks to write and integrate documentation for `optimum-graphcore` into the main `optimum` documentation are now available.
Notebooks
New notebooks are available:
- Wav2Vec2 notebooks (#142)
- Summarization notebook (#153)
- Image classification and language modeling notebooks (#152)
Python 3.7+ support
From this release, only Python 3.7 and above are supported.
To use `optimum-graphcore` on Python 3.6 and below, please use `optimum-graphcore==0.3.1`.