Releases: huggingface/optimum-graphcore

v0.7.1: Whisper fine-tuning & group-quantized inference, T5 generation optimizations

28 Jul 11:36

What's Changed

  • Whisper fine-tuning is now supported, following a fix for a slice assignment bug.
  • Whisper inference can now take advantage of group quantization, where model parameters are stored in INT4 and decoded into FP16 on the fly as needed. The memory saving is estimated at 3.5x with minimal degradation in WER. It can be enabled via the use_group_quantized_linears parallelize kwarg (see the sketch after this list).
  • KV caching and on-device generation are now also available for T5.
  • Fixed interleaved training and validation for IPUSeq2SeqTrainer.
  • Added notebooks for Whisper fine-tuning, Whisper group-quantized inference, embeddings models, and BART-L summarization.
  • UX improvement: IPUTrainer now checks that the provided dataset is of sufficient size.
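
A minimal sketch of enabling group-quantized inference at parallelization time. Only the use_group_quantized_linears kwarg is taken from the notes above; the checkpoint, the IPUConfig values, and the for_generation kwarg are illustrative assumptions.

```python
# Sketch only: config values and surrounding kwargs are assumptions.
from transformers import WhisperForConditionalGeneration
from optimum.graphcore import IPUConfig
from optimum.graphcore.modeling_utils import to_pipelined

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-tiny")
ipu_config = IPUConfig(ipus_per_replica=2)

pipelined = to_pipelined(model, ipu_config)
pipelined = pipelined.parallelize(
    for_generation=True,
    # Store linear weights as INT4 groups and decode to FP16 on the fly
    # (~3.5x memory saving per the notes above).
    use_group_quantized_linears=True,
)
```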

Full Changelog: v0.7.0...v0.7.1

v0.7.0: SDK3.3, Whisper on 1 IPU, MT5, transformers 4.29

13 Jul 11:29

What's Changed

  • Optimum has been updated to support Poplar SDK 3.3.
  • A new feature in that SDK is the poptorch.cond operation, which enables conditional compute and allowed us to implement some new optimisations.
  • Using the new cond operation, we can fit the Whisper-tiny encoder and decoder on a single IPU. To enable this, pass the option use_cond_encoder to Whisper's parallelize method.
  • Added the option for cross-attention KV caching in Whisper, also using the cond op. To enable this, pass the option use_cross_cache to Whisper's parallelize method (see the sketch after this list).
  • We added support for the MT5 model for summarisation and translation tasks.
  • The version of transformers has been updated to 4.29. One of the things this enables in Optimum is Whisper timestamp decoding.
  • Added optimum.graphcore.models.whisper.WhisperProcessorTorch - a faster, drop-in replacement for transformers.WhisperProcessor.
  • The pod_type argument, which was deprecated in 0.6.1, has been removed.
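
As a rough sketch, both options are passed to Whisper's parallelize, reusing the model, ipu_config, and to_pipelined pattern from the v0.7.1 example above; only use_cond_encoder and use_cross_cache are taken from these notes, the rest is assumed.

```python
# Sketch: only use_cond_encoder and use_cross_cache come from the notes above.
pipelined = to_pipelined(model, ipu_config).parallelize(
    for_generation=True,
    use_cond_encoder=True,  # encoder and decoder share a single IPU via poptorch.cond
    use_cross_cache=True,   # cross-attention KV caching, also built on the cond op
)
```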

Full Changelog: v0.6.1...v0.7.0

v0.6.1: Faster Whisper/BART inference; Flan-T5; MT5; UX improvements

25 May 15:10

Faster Text Generation

0.6.1 provides significant speed-ups of up to 9x for Whisper and BART text generation! We have put the entire text generation loop onto the IPU and enabled KV caching for the self-attention layers.

New Models

Fine-tuning of text generation models

Text generation with IPUSeq2SeqTrainer is now enabled (see the sketch after the list of PRs below).

  • Fix IPUSeq2SeqTrainer for models that have persistent buffers by @kundaMwiza in #337
  • Enable generation in notebooks that use IPUSeq2SeqTrainer by @kundaMwiza in #341
  • Fix: reparallelize for training after generation by @kundaMwiza in #387
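
A hedged sketch of generation-enabled evaluation with IPUSeq2SeqTrainer; the model, datasets, and config checkpoint are placeholders, and predict_with_generate is assumed to behave as in transformers' Seq2SeqTrainingArguments.

```python
# Sketch: model, datasets, and the config checkpoint are placeholders.
from optimum.graphcore import IPUConfig, IPUSeq2SeqTrainer, IPUSeq2SeqTrainingArguments

args = IPUSeq2SeqTrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    predict_with_generate=True,  # run generation during evaluation/prediction
)
trainer = IPUSeq2SeqTrainer(
    model=model,  # a seq2seq model supported by optimum-graphcore
    ipu_config=IPUConfig.from_pretrained("Graphcore/t5-small-ipu"),
    args=args,
    train_dataset=train_ds,
    eval_dataset=eval_ds,
)
trainer.train()
metrics = trainer.evaluate()  # uses .generate() under the hood
```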

Wav2vec2 Large

Flan-T5

Added support for Flan-T5 inference. This comes with numerical fixes to T5 for running in float16.

MT5

Added the MT5 model, MT5ForConditionalGeneration. To support this, two new options were added to IPUConfig (see the sketch after this list):

  • serialized_projection_splits_per_ipu: (List[int], optional, defaults to None):
    Specifies the number of splits of the embedding layer that will be put on each IPU for pipelined execution.
    The format has to be the same as that of layers_per_ipu, although wildcards are not supported.
    For instance, [3, 1, 0, 0] places an embedding layer serialized into 4 sub-embedding layers
    across a 4-IPU pipeline: the first IPU holds 3 splits and the second IPU holds 1 split.
  • projection_serialization_factor: (int, optional, defaults to 1 if serialized_projection_splits_per_ipu is None):
    The factor to use to either serialize the matmuls that are performed in the linear projection layer, or
    serialize the projection layer into a set of individual linear layers that can optionally be placed on different IPUs.
    Nothing happens if projection_serialization_factor = 1.
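
For instance, the placement described above could be expressed as follows; the options come from this list, while the layers_per_ipu values are illustrative.

```python
from optimum.graphcore import IPUConfig

ipu_config = IPUConfig(
    layers_per_ipu=[2, 2, 2, 2],  # illustrative 4-IPU pipeline
    # Serialize the projection into 4 sub-layers: 3 on the first IPU, 1 on the second.
    serialized_projection_splits_per_ipu=[3, 1, 0, 0],
)
```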

HubertForCTC

User Experience

The pod_type argument to IPUTrainingArguments has now been deprecated and replaced by n_ipu. Consequently, pod_type dictionary values of IPUConfig are no longer supported.

IPUConfig now supports inference_-prefixed versions of the following parameters (see the sketch after this list):

  • layers_per_ipu
  • ipus_per_replica
  • matmul_proportion
  • serialized_embedding_splits_per_ipu
  • projection_serialization_factor
  • serialized_projection_splits_per_ipu
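
A sketch of mixing training and inference variants in one config; the parameter names come from the list above, and all values are illustrative.

```python
from optimum.graphcore import IPUConfig

ipu_config = IPUConfig(
    layers_per_ipu=[2, 2, 2, 2],      # placement used for training
    inference_layers_per_ipu=[4, 4],  # placement used for inference
    matmul_proportion=0.2,
    inference_matmul_proportion=0.6,
)
```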

PRs:

  • Enable training and inference specific configurations using a single IPUConfig by @hmellor in #308
  • Matmul proportion support float or len(List[float]) == ipus_per_replica by @kundaMwiza in #375
  • Refactor: prefix IPUConfig ManagedAttributes instead of overloading user provided attributes by @kundaMwiza in #366
  • Add attribute validation by @kundaMwiza in #371
  • Refactor SerializedEmbedding to use to/from_model by @kundaMwiza in #382

Notebooks

New Contributors

Full Changelog: v0.6.0...v0.6.1

v0.6.0: SDK3.2, Text generation, Whisper, Stable Diffusion

04 Apr 15:58

Text Generation

This release comes with full support for text generation for GPT2, BART, T5, and Whisper!

  • Add text generation support by @jimypbr in #253
  • Run encoder on IPU for encoder-decoder text-gen models by @jimypbr in #283
  • Efficient decoder text generation wrapper by @jimypbr in #273
  • Text Gen slice decoder projection optimisation by @jimypbr in #295
  • Add text generation prediction support for IPUSeq2SeqTrainer by @kundaMwiza in #284

IPU pipelined models can call .generate(), and text generation can also be done through pipelines, as sketched below.
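
A hedged sketch of the pipeline route; the task, checkpoint, and generation kwargs are illustrative assumptions.

```python
# Sketch: task, checkpoint, and kwargs are illustrative.
from optimum.graphcore import pipeline

generator = pipeline("text-generation", model="gpt2")
print(generator("IPUs are", max_new_tokens=20))
```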

Stable Diffusion

  • We now support Stable Diffusion inference pipelines from Diffusers in optimum/graphcore/diffusers, by @katalinic-gc in #300 (see the sketch after this list)
  • Much improved performance on IPU by running all SD modules on IPU by @katalinic-gc in #274
  • Add file for SD ipu configs by @katalinic-gc in #301
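
A minimal sketch, assuming an IPUStableDiffusionPipeline wrapper is exported from optimum/graphcore/diffusers; the checkpoint and prompt are illustrative.

```python
# Sketch: class name is assumed from the module above; checkpoint/prompt are illustrative.
from optimum.graphcore.diffusers import IPUStableDiffusionPipeline

pipe = IPUStableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```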

UX Improvements

We've improved the usability of IPUConfig.

  • You no longer need to specify both ipus_per_replica and layers_per_ipu. You can specify just one and the other will be inferred from it: @hmellor in #282
  • layers_per_ipu can support a combination of integers and wildcards (-1), e.g. `[1, 1, -1, -1]` will put 1 layer each on IPU0 and IPU1, and split the remaining layers evenly between IPU2 and IPU3. If there is an odd number of remaining layers, the extra layer is placed on the last wildcard IPU (see the sketch after this list). @rahult-graphcore in #275
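
A sketch of both behaviours; the values are illustrative.

```python
from optimum.graphcore import IPUConfig

# ipus_per_replica is inferred as 4 from the length of layers_per_ipu;
# the two -1 wildcards split the remaining layers between IPU2 and IPU3.
ipu_config = IPUConfig(layers_per_ipu=[1, 1, -1, -1])
```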

New Models

Notebooks

Bugfixes

  • SerializedEmbedding: override default freeze=True on deserialization by @kundaMwiza in #304
  • Fix training mode outputs for roberta and distilbert mlm models by @jimypbr in #254
  • Remove the work-around for rebuilding BaseModelOutput in BART and T5 by @jimypbr in #297
  • Populate _hooks for T5 and BART by @hmellor in #291
  • Instantiate optimizer in compile only mode by @kundaMwiza in #292
  • Add back removed variables from the wav2vec2 pretraining forward sig by @katalinic-gc in #259
  • Pipeline bug fixes by @jimypbr in #260
  • Fix PR doc build when the PR comes from a clone with a different name by @regisss in #281

Misc

New Contributors

Full Changelog: v0.5.0...v0.6.0

v0.5.0: SDK3.1 + PyTorch 1.13

21 Dec 17:28

Changes

  • This release makes optimum-graphcore compatible with the latest Poplar SDK 3.1 (#239).
  • Please see the Poplar SDK 3.1 release notes for details.
  • PopTorch is now compatible with PyTorch 1.13 (upgraded from 1.10); all requirements in optimum-graphcore have been updated accordingly.
  • Small behaviour change: the IPUTrainingArguments report_to default is now "none" instead of None, which would default to "all". This means that reporting is now opt-in instead of opt-out (see the sketch below). (#239)
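
A sketch of opting back in to reporting; the output_dir and integration are illustrative, and other required arguments are omitted.

```python
from optimum.graphcore import IPUTrainingArguments

# Reporting is now opt-in: list the integrations you want explicitly.
args = IPUTrainingArguments(output_dir="out", report_to=["tensorboard"])
```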

v0.4.3: Patch release

07 Dec 16:49

Minor improvements and bug fixes:

  • Disable automatic loss scaling option for inference (#213)
  • Improved error messages when IPU config is not compatible with model (#210)
  • Set enable-half-partials by default to True (#209)

v0.4.2: Patch release

23 Nov 17:26

Small bug fixes and improvements.

v0.4.1: Patch release

14 Oct 12:04

Fixes a bug in the IPUTrainer breaking the save_model method (#191).

v0.4.0: PopTorch 3.0, DistilBERT and new notebooks

10 Oct 15:02

SDK 3.0

The main feature this release provides is support for PopTorch version 3.0 (#183), which comes with a new way of tracing PyTorch models: the PyTorch dispatcher is used instead of torch.jit.trace. Not only is tracing now faster, it is also much more powerful.

General

  • LAMB does not update the bias parameters anymore (#178)
  • The DistilBERT architecture is now supported for the following tasks (#181):
    • Masked language modeling
    • Multiple choice
    • Question answering
    • Sequence classification
    • Token classification
  • The masked language modeling task is now supported by DeBERTa
  • The IPUTrainer and IPUTrainingArguments were synchronized with their transformers counterparts (#179)
  • Some parameters in the IPUConfig were removed:
    • use_popdist
    • decompose_grad_sum
    • profile_dir

Bug fixes

  • Documentation building fixes
  • Wav2vec2 with dataloader_mode=async_rebatched fixed (#168)

Notebooks

  • Audio classification for HuBERT notebook (#157)
  • Language modeling finetuning notebook (#161)
  • Question answering notebook (#163)
  • Multiple choice notebook (#166)
  • A notebook showing how to train a model supported in the library (#171)

Documentation

The documentation was updated, and contains more content, for instance:

  • The IPUTrainer API is described
  • The IPUConfig attributes are explained
  • A new page explaining how to contribute by adding a new model architecture to the library

v0.3.2: Documentation, Notebooks and Python 3.7+

10 Aug 13:33

Documentation

Thanks to @lewtun, the building blocks for writing documentation for optimum-graphcore and integrating it into the main optimum documentation are now available.

Notebooks

New notebooks are available:

  • Wav2vec2 notebooks (#142)
  • Summarization notebook (#153)
  • Image classification and language modeling notebooks (#152)

Python 3.7+ support

From this release, only Python 3.7 and above are supported.
To use optimum-graphcore on Python 3.6 and below, please use optimum-graphcore==0.3.1.

Misc

  • Layerdrop is now supported for HuBERT (#149)
  • It is possible to provide an eval_data_collator (#120)
  • The pad_on_batch_axis collator now makes it possible to train / evaluate models on datasets whose size does not divide the combined batch size, by repeating samples until each batch reaches the proper size (#154) (see the sketch below)
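
To illustrate the idea behind pad_on_batch_axis, here is a hypothetical re-implementation of the behaviour described above, not the library's code.

```python
# Hypothetical illustration: repeat samples so an incomplete final batch
# reaches the combined batch size, as pad_on_batch_axis does.
from itertools import cycle, islice

def pad_by_repeating(samples, combined_batch_size):
    if len(samples) >= combined_batch_size:
        return list(samples)
    return list(islice(cycle(samples), combined_batch_size))

print(len(pad_by_repeating(list(range(5)), 8)))  # -> 8
```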