Releases: huggingface/optimum-graphcore
v0.7.1: Whisper fine-tuning & group-quantized inference, T5 generation optimizations
What's Changed
- Support for Whisper fine-tuning, following a fix to a slice-assignment bug.
- Whisper inference can now take advantage of group quantization, where model parameters are stored in INT4 and decoded into FP16 on the fly as needed. The memory saving is estimated at 3.5x with minimal degradation in WER, and it can be enabled via the `use_group_quantized_linears` parallelize kwarg (see the sketch after this list).
- KV caching and on-device generation are now also available for T5.
- Fixed interleaved training and validation for `IPUSeq2SeqTrainer`.
- Added notebooks for Whisper fine-tuning, Whisper group-quantized inference, embeddings models, and BART-L summarization.
- UX improvement that ensures a dataset of sufficient size is provided to the `IPUTrainer`.
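A minimal sketch of enabling group-quantized Whisper inference, assuming the generic `to_pipelined`/`parallelize` flow; apart from `use_group_quantized_linears`, the checkpoint, config values, and `for_generation` flag are illustrative assumptions rather than a tuned recipe:

```python
# A minimal sketch, not a tuned recipe: checkpoint and IPUConfig values are
# illustrative, and the for_generation flag is an assumption about the
# pipelined model's parallelize signature.
from transformers import WhisperForConditionalGeneration
from optimum.graphcore import IPUConfig
from optimum.graphcore.modeling_utils import to_pipelined

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
ipu_config = IPUConfig(ipus_per_replica=2)

pipelined_model = to_pipelined(model, ipu_config).parallelize(
    for_generation=True,
    # Store linear-layer weights in INT4 groups, decode them to FP16 on the fly.
    use_group_quantized_linears=True,
)
```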
Commits
- Support C600 card by @katalinic-gc in #446
- Remove deprecated pod_type argument by @jimypbr in #447
- Fix inference replication factor pod type removal by @katalinic-gc in #448
- T5 enable self-attention kv caching by @kundaMwiza in #449
- Workflows: use explicit venv names and use --clear in creation by @jimypbr in #452
- Workflow: add venv with clear for code quality and doc-builder workflows by @jimypbr in #453
- Support overriding *ExampleTester class attribute values in test_examples.py by @kundaMwiza in #439
- Adding missing license headers and copyrights by @jimypbr in #454
- Fix shift tokens right usage which contains slice assignment by @katalinic-gc in #451
- Base models and notebooks for general IPU embeddings model by @arsalanu in #436
- Fix mt5 translation training ipu config by @kundaMwiza in #456
- Add back source optimum graphcore install in embeddings notebook by @arsalanu in #457
- Add parallelize kwargs as an IPU config entry by @katalinic-gc in #427
- Change tests to point to MPNet ipu config by @arsalanu in #458
- T5 enable generation optimisation by @kundaMwiza in #459
- Fix ipus per replica check in whisper cond encoder by @katalinic-gc in #461
- Check that the dataset has enough examples to fill a batch when creat… by @katalinic-gc in #462
- Add notebook for whisper finetuning by @katalinic-gc in #460
- Use index select in BART positional embedding for better tile placement by @katalinic-gc in #463
- Add group quantization for whisper by @jimypbr in #429
- Change max length adaption messages to debug by @katalinic-gc in #465
- Fix finetuning whisper notebook text by @katalinic-gc in #466
- Fix finetuning whisper notebook text v2 by @katalinic-gc in #467
- Add BART-L text summarization notebook by @jayniep-gc in #464
- Fix evaluate then train by @katalinic-gc in #469
- Use token=False in whisper nb by @katalinic-gc in #470
- Add Whisper inference with quantization notebook by @jimypbr in #468
Full Changelog: v0.7.0...v0.7.1
v0.7.0: SDK3.3, Whisper on 1 IPU, MT5, transformers 4.29
What's Changed
- Optimum has been updated to support Poplar SDK 3.3.
- A new feature in that SDK is the `poptorch.cond` operation, which enables conditional compute. This enabled us to implement some new optimisations.
- Using the new `cond` operation we are able to fit the Whisper-tiny encoder and decoder on a single IPU. To enable, pass the option `use_cond_encoder` to Whisper's `parallelize` method (see the sketch after this list).
- Added the option for cross-attention KV caching in Whisper, also using the `cond` op. To enable, pass the option `use_cross_cache` to Whisper's `parallelize` method.
- We added support for the MT5 model for summarisation and translation tasks.
- The version of `transformers` has been updated to 4.29. One of the things this enables in Optimum is Whisper timestamp decoding.
- Added `optimum.graphcore.models.whisper.WhisperProcessorTorch` - a faster, drop-in replacement for `transformers.WhisperProcessor`.
- The `pod_type` argument, which was deprecated in 0.6.1, has been removed.
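A minimal sketch combining the new options, assuming the same pipelined-model flow as in the v0.7.1 sketch above; apart from `use_cond_encoder`, `use_cross_cache`, and `WhisperProcessorTorch`, the names and values are illustrative:

```python
# A minimal sketch; the for_generation flag and config values are assumptions,
# not the exact recipe used in the Whisper notebooks.
from transformers import WhisperForConditionalGeneration
from optimum.graphcore import IPUConfig
from optimum.graphcore.modeling_utils import to_pipelined
from optimum.graphcore.models.whisper import WhisperProcessorTorch

# Faster, drop-in replacement for transformers.WhisperProcessor.
processor = WhisperProcessorTorch.from_pretrained("openai/whisper-tiny")

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
ipu_config = IPUConfig(ipus_per_replica=1)  # whisper-tiny can now fit on one IPU

pipelined_model = to_pipelined(model, ipu_config).parallelize(
    for_generation=True,
    use_cond_encoder=True,  # fit encoder and decoder on a single IPU via poptorch.cond
    use_cross_cache=True,   # cache cross-attention KV, also built on the cond op
)
```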
Commits
- Fixing links to API references by @jayniep-gc in #391
- Do not override replicated_tensor_sharding in the IPUConfig by @kundaMwiza in #393
- Preserve the set padding idx in SerializedEmbedding by @kundaMwiza in #395
- Add MT5 by @kundaMwiza in #392
- deberta/translation/summarization notebook fixes by @kundaMwiza in #396
- MT5 notebooks: prefix exec cache with mt5 by @kundaMwiza in #397
- Flan-T5 Notebook Formatting Tweaks by @hmellor in #398
- Add cross KV caching by @katalinic-gc in #329
- Beam search adjustment by @katalinic-gc in #394
- Updating Whisper notebook so it uses new SDK and new features by @lukem-gc in #399
- Add `padding_idx` to appropriate embedding split by @hmellor in #403
- Bump transformers to 4.29.2 by @katalinic-gc in #389
- Fix Whisper processor torch with transformers 4.29.2 bump by @katalinic-gc in #405
- Fix Stable Diffusion notebooks by @hmellor in #408
- Add IPU support for HF pipelines to Whisper by @paolot-gc in #368
- Throw error is kwargs isn't empty by end of init by @hmellor in #406
- Add Whisper pipeline tests by @katalinic-gc in #409
- Enable fine-tuning of `whisper-tiny` by @hmellor in #400
- Fix issue where exe cache dir was set too late by @hmellor in #411
- Enable generation tests by @kundaMwiza in #407
- Add Seq2Seq trainer test by @kundaMwiza in #404
- Use the generation config to control generation by @katalinic-gc in #410
- Add support for Whisper timestamp decoding with on-device generation by @katalinic-gc in #413
- Fix IPUWhisperTimeStampLogitsProcessor for beam search by @katalinic-gc in #414
- Remove usage of deprecated config: `pod_type` by @hmellor in #416
- Fix `matmul_proportion` `ManagedAttribute` usage by @hmellor in #415
- Enable Whisper encoder and decoder to run on 1 IPU by @katalinic-gc in #418
- Enable replication with on device text generation by @katalinic-gc in #420
- Update doc workflows by @regisss in #417
- Update whisper pipeline example for latest features by @katalinic-gc in #421
- Fix text encoder for SD with 4.29 bump by @katalinic-gc in #424
- Use the faster whisper feature extractor in whisper pipelines by @katalinic-gc in #423
- Remove engine references from SD pipelines by @katalinic-gc in #422
- Add support for `whisper-small` fine-tuning by @hmellor in #426
- Use index select for whisper position embedding for better tile utili… by @katalinic-gc in #435
- Print execution time of each example test by @kundaMwiza in #440
- SplitProjection layer: Add output channels serialization mode by @kundaMwiza in #438
- 3.3 Examples CI Fixes by @jimypbr in #443
- Support T5EncoderModel for t5-based embedding models by @alex-coniasse in #437
- Integrate whisper large into the existing notebook by @alex-coniasse in #441
- Bump SDK version to 3.3 in the github workflows by @jimypbr in #444
- Update examples requirements for sdk3.3 by @jimypbr in #434
Full Changelog: v0.6.1...v0.7.0
v0.6.1: Faster Whisper/BART inference; Flan-T5; MT5; UX improvements
Faster Text Generation
0.6.1 provides significant speed-ups of up to 9x for Whisper and BART text generation! We have put the entire text generation loop onto IPU and enabled KV caching for self-attention layers.
- Use buffers to cache the encoder hidden states in decoder wrapper by @jimypbr in #285
- Move whisper decoder projection to IPU 0 since there is weight tying by @katalinic-gc in #309
- move the IndexedInputLinear out of the decoder wrapper by @katalinic-gc in #319
- Add generic KV caching support, use it with Whisper by @katalinic-gc in #307
- On device text generation POC for greedy search by @katalinic-gc in #357
- Add on device beam search by @katalinic-gc in #370
- Add attention serialization to the attention mixin and enable it with Whisper by @katalinic-gc in #372
- BART KV-caching + on-device by @jimypbr in #363
- Fix cached_beam_idx check for non on device generation by @katalinic-gc in #378
- Attn mixin improvements by @katalinic-gc in #381
- Add a faster torch based version of the whisper feature extractor by @katalinic-gc in #376
- Fix BART Positional embeddings for generation without caching by @jimypbr in #386
New Models
Fine-tuning of text generation models
Text generation with `IPUSeq2SeqTrainer` is now enabled.
- Fix IPUSeq2SeqTrainer for models that have persistent buffers by @kundaMwiza in #337
- Enable generation in notebooks that use IPUSeq2SeqTrainer by @kundaMwiza in #341
- Fix: reparallelize for training after generation by @kundaMwiza in #387
Wav2vec2 Large
- Adding Wav2vec2 Large pretraining and fine-tuning by @atsyplikhin in #323
Flan-T5
Added support for Flan-T5 inference. This comes with numerical fixes to T5 for running in `float16`; a usage sketch follows the list below.
- Enable Flan-T5 inference in `float16` by @hmellor in #296
- Add Flan-T5 notebook by @hmellor in #318
- T5 revert fp16 clamping removal by @kundaMwiza in #332
- Skip equal check for denormals in known T5 layer by @hmellor in #383
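A minimal sketch of Flan-T5 inference in `float16`, assuming the generic pipelined-model flow; the checkpoint, config values, and `for_generation` flag are illustrative assumptions:

```python
# A minimal sketch, not a tuned recipe.
from transformers import AutoTokenizer, T5ForConditionalGeneration
from optimum.graphcore import IPUConfig
from optimum.graphcore.modeling_utils import to_pipelined

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = T5ForConditionalGeneration.from_pretrained("google/flan-t5-small").half()

ipu_config = IPUConfig(ipus_per_replica=2)
pipelined = to_pipelined(model, ipu_config).parallelize(for_generation=True)

inputs = tokenizer("Translate to German: How old are you?", return_tensors="pt")
outputs = pipelined.generate(**inputs, max_length=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```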
MT5
Added the MT5 model, `MT5ForConditionalGeneration`. To support this, two new options were added to `IPUConfig` (a configuration sketch follows their descriptions):
- `serialized_projection_splits_per_ipu` (`List[int]`, optional, defaults to `None`):
  Specifies the number of splits of the embedding layer that will be put on each IPU for pipelined execution.
  The format has to be the same as that for `layers_per_ipu`, however wildcards are not supported.
  For instance, `[3, 1, 0, 0]` specifies how to place an embedding layer serialized into 4 sub-embedding layers across a 4-IPU pipeline: IPU-1 has 3 splits and IPU-2 has 1 split.
- `projection_serialization_factor` (`int`, optional, defaults to 1 if `serialized_projection_splits_per_ipu` is `None`):
  The factor to use to either serialize the matmuls that are performed in the linear projection layer, or serialize the projection layer into a set of individual linear layers that can be optionally placed on different IPUs.
  Nothing happens if `projection_serialization_factor = 1`.
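A minimal configuration sketch of the two options; the layer split and serialization factor are illustrative values, not tuned MT5 settings:

```python
# A minimal sketch: values are illustrative, not tuned MT5 settings.
from optimum.graphcore import IPUConfig

# Serialize the projection/embedding layer into 4 sub-layers and place 3 of
# them on the first IPU and 1 on the second IPU of a 4-IPU pipeline.
ipu_config = IPUConfig(
    layers_per_ipu=[3, 7, 7, 7],
    serialized_projection_splits_per_ipu=[3, 1, 0, 0],
)

# Alternatively, keep the projection in place but serialize its matmuls.
# projection_serialization_factor=1 would leave the layer untouched.
ipu_config_alt = IPUConfig(
    layers_per_ipu=[3, 7, 7, 7],
    projection_serialization_factor=4,
)
```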
PRs:
- Support sharding serialized layers across ipus by @kundaMwiza in #355
- Add MT5 model and fine-tuning notebook by @kundaMwiza in #392
HubertForCTC
- Add support for HubertForCTC by @jimypbr in #347
- Change hyper-parameters to fix Hubert for CTC CI by @jimypbr in #390
User Experience
The `pod_type` argument to `IPUTrainingArguments` has now been deprecated and replaced by `n_ipu`. Consequently, `pod_type` dictionary values of `IPUConfig` are no longer supported.
- Pod type sets replication factor by @rahult-graphcore in #271
`IPUConfig` now supports `inference_` versions of the following parameters (see the sketch after this list):
- `layers_per_ipu`
- `ipus_per_replica`
- `matmul_proportion`
- `serialized_embedding_splits_per_ipu`
- `projection_serialization_factor`
- `serialized_projection_splits_per_ipu`
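A minimal sketch of the new arguments; it assumes the `inference_` attributes are passed straight to the `IPUConfig` constructor, and all values are illustrative:

```python
# A minimal sketch: values are illustrative only.
from optimum.graphcore import IPUConfig, IPUTrainingArguments

# pod_type is deprecated; request IPUs with n_ipu instead.
training_args = IPUTrainingArguments(output_dir="out", n_ipu=4)

# One IPUConfig can now carry separate training and inference settings.
ipu_config = IPUConfig(
    layers_per_ipu=[2, 4, 4, 2],      # training pipeline over 4 IPUs
    matmul_proportion=0.2,
    inference_layers_per_ipu=[6, 6],  # inference uses a smaller 2-IPU pipeline
    inference_ipus_per_replica=2,
    inference_matmul_proportion=0.6,
)
```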
PRs:
- Enable training and inference specific configurations using a single `IPUConfig` by @hmellor in #308
- Matmul proportion support float or len(List[float]) == ipus_per_replica by @kundaMwiza in #375
- Refactor: prefix IPUConfig `ManagedAttribute`s instead of overloading user provided attributes by @kundaMwiza in #366
- Add attribute validation by @kundaMwiza in #371
- Refactor SerializedEmbedding to use to/from_model by @kundaMwiza in #382
Notebooks
- Add narrative to the whisper notebook by @payoto in #312
- Add Flan-T5 notebook by @hmellor in #318
- Deberta notebook to accompany blog post by @lukem-gc in #369
- Add MT5 model and fine-tuning notebook by @kundaMwiza in #392
New Contributors
- @atsyplikhin made their first contribution in #323
- @lukem-gc made their first contribution in #369
Full Changelog: v0.6.0...v0.6.1
v0.6.0: SDK3.2, Text generation, Whisper, Stable Diffusion
Text Generation
This release comes with full support for text generation for GPT2, BART, T5, and Whisper!
- Add text generation support by @jimypbr in #253
- Run encoder on IPU for encoder-decoder text-gen models by @jimypbr in #283
- Efficient decoder text generation wrapper by @jimypbr in #273
- Text Gen slice decoder projection optimisation by @jimypbr in #295
- Add text generation prediction support for IPUSeq2SeqTrainer by @kundaMwiza in #284
IPU pipelined models can call `.generate()`. Text generation can also be done with `pipelines` (see the sketch below).
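A minimal sketch of pipeline-based generation; the task string and the `ipu_config` argument are assumptions about the IPU pipeline wrapper rather than confirmed API details:

```python
# A minimal sketch; the task name and ipu_config argument are assumptions.
from optimum.graphcore import IPUConfig, pipeline

generator = pipeline(
    "text-generation",
    model="gpt2",
    ipu_config=IPUConfig(layers_per_ipu=[12]),
)
print(generator("The IPU is", max_new_tokens=20)[0]["generated_text"])
```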
Stable Diffusion
- We now support Stable Diffusion inference pipelines from Diffusers in `optimum/graphcore/diffusers` by @katalinic-gc in #300 (see the sketch after this list)
- Much improved performance on IPU by running all SD modules on IPU by @katalinic-gc in #274
- Add file for SD ipu configs by @katalinic-gc in #301
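A minimal sketch of running one of these pipelines; the `IPUStableDiffusionPipeline` class name and defaults are assumptions based on the `optimum/graphcore/diffusers` module mentioned above:

```python
# A minimal sketch; the class name and defaults are assumptions.
from optimum.graphcore.diffusers import IPUStableDiffusionPipeline

pipe = IPUStableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
image = pipe("a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```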
UX Improvements
We've improved the usability of `IPUConfig` (see the sketch after this list).
- You no longer need to specify both `ipus_per_replica` and `layers_per_ipu`. You can specify just one and the other will be inferred from it: @hmellor in #282
- `layers_per_ipu` can support a combination of integers and wildcards (`-1`), e.g. `[1, 1, -1, -1]` will put 1 layer each on IPU0 and IPU1, and split the remaining layers evenly between IPU2 and IPU3. If there is an odd number of layers, the extra layer is placed on the last wildcard IPU. @rahult-graphcore in #275
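A minimal sketch of the wildcard syntax; the split shown is illustrative:

```python
# A minimal sketch of the wildcard syntax.
from optimum.graphcore import IPUConfig

# 1 layer each on IPU0 and IPU1; remaining layers split evenly across IPU2/IPU3.
# ipus_per_replica is inferred from the length of layers_per_ipu.
ipu_config = IPUConfig(layers_per_ipu=[1, 1, -1, -1])
```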
New Models
- Add GroupBERT model (https://arxiv.org/abs/2106.05822) by @ivansche in #139
- Add Whisper model for inference by @paolot-gc in #262
Notebooks
- Packed bert notebook by @alex-coniasse in #222
- Name Entity Extraction notebook by @anjleeg-gcai in #237
- Whisper notebook for inference by @paolot-gc in #262
Bugfixes
- SerializedEmbedding: override default freeze=True on deserialization by @kundaMwiza in #304
- Fix training mode outputs for roberta and distilbert mlm models by @jimypbr in #254
- Remove the work-around for rebuilding BaseModelOutput in BART and T5 by @jimypbr in #297
- Populate `_hooks` for T5 and BART by @hmellor in #291
- Instantiate optimizer in compile only mode by @kundaMwiza in #292
- Add back removed variables from the wav2vec2 pretraining forward sig by @katalinic-gc in #259
- Pipeline bug fixes by @jimypbr in #260
- Fix PR doc build when the PR comes from a clone with a different name by @regisss in #281
Misc
- Bump transformers to 4.25.1 by @katalinic-gc in #247
- Bump diffusers to 0.12.1 by @katalinic-gc in #302
- Remove deprecated IPU config arguments by @katalinic-gc in #250
- Remove the custom layernorm for convnext by @jimypbr in #255
- Updates for SDK 3.2 by @jimypbr in #256
- Add PopART option that enables the use of models with weights exceeding ~2GB by @hmellor in #277
- Pin the optimum version requirement by @jimypbr in #293
- More concise dev install instructions by @hmellor in #294
New Contributors
- @alex-coniasse made their first contribution in #222
- @ivansche made their first contribution in #139
- @evawGraphcore made their first contribution in #264
- @arsalanu made their first contribution in #266
- @hmellor made their first contribution in #279
- @kundaMwiza made their first contribution in #292
- @paolot-gc made their first contribution in #262
Full Changelog: v0.5.0...v0.6.0
v0.5.0: SDK3.1 + PyTorch 1.13
Changes
- This release makes `optimum-graphcore` compatible with the latest Poplar SDK 3.1 (#239). Please see the Poplar SDK 3.1 release notes.
- PopTorch is now compatible with PyTorch 1.13 (upgraded from 1.10); all requirements in `optimum-graphcore` have been updated for this.
- Small behaviour change: the `IPUTrainingArgs` `report_to` default is now `"none"` instead of `None`, which would default to `"all"`. This means that reporting is now opt-in instead of opt-out (#239). A brief sketch follows below.
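A minimal sketch of opting back into reporting under the new default; the integration name is illustrative and other required arguments are omitted:

```python
# A minimal sketch: reporting is opt-in now, so pass report_to explicitly.
# The "tensorboard" value is illustrative; other arguments are omitted.
from optimum.graphcore import IPUTrainingArguments

args = IPUTrainingArguments(output_dir="out", report_to=["tensorboard"])
```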
v0.4.3: Patch release
v0.4.2: Patch release
Small bug fixes and improvements.
v0.4.1: Patch release
Fixes a bug in the `IPUTrainer` breaking the `save_model` method (#191).
v0.4.0: PopTorch 3.0, DistilBERT and new notebooks
SDK 3.0
The main feature this release provides is support for PopTorch version 3.0 (#183), which comes with a new way of tracing PyTorch models: the PyTorch dispatcher is used instead of `torch.jit.trace`. Not only is tracing faster, it is also much more powerful (more about it here).
General
- LAMB does not update the bias parameters anymore (#178)
- The DistilBERT architecture is now supported for the tasks: (#181)
- Masked language modeling
- Multiple choice
- Question answering
- Sequence classification
- Token classification
- The masked language modeling task is now supported by Deberta
- The `IPUTrainer` and `IPUTrainingArguments` were synchronized with their transformers counterparts (#179)
- Some parameters in the `IPUConfig` were removed: `use_popdist`, `decompose_grad_sum`, `profile_dir`
Bug fixes
- Documentation building fixes
- Wav2vec2 with `dataloader_mode=async_rebatched` fixed (#168)
Notebooks
- Audio classification for HuBERT notebook (#157)
- Language modeling finetuning notebook (#161)
- Question answering notebook (#163)
- Multiple choice notebook (#166)
- A notebook showing how to train a model supported in the library (#171)
Documentation
The documentation was updated, and contains more content, for instance:
- The `IPUTrainer` API is described
- The `IPUConfig` attributes are explained
- A new page explaining how to contribute by adding a new model architecture to the library
v0.3.2: Documentation, Notebooks and Python 3.7+
Documentation
Thanks to @lewtun, the building blocks to write and integrate documentation for `optimum-graphcore` into the main `optimum` documentation are now available.
Notebooks
New notebooks are available:
- Wav2Vec2 notebooks (#142)
- Summarization notebook (#153)
- Image classification and language modeling notebooks (#152)
Python 3.7+ support
From this release, only Python 3.7 and above are supported.
To use `optimum-graphcore` on Python 3.6 and below, please use `optimum-graphcore==0.3.1`.