Releases · openvinotoolkit/openvino

20 Nov 13:12

artanokhov

2024.5.0

db64e5c

2024.5.0 Latest

Summary of major features and improvements  

More Gen AI coverage and framework integrations to minimize code changes
- New models supported: Llama 3.2 (1B & 3B), Gemma 2 (2B & 9B), and YOLO11.
- LLM support on NPU: Llama 3 8B, Llama 2 7B, Mistral-v0.2-7B, Qwen2-7B-Instruct and Phi-3
- Noteworthy notebooks added: Sam2, Llama3.2, Llama3.2 - Vision, Wav2Lip, Whisper, and Llava.
- Preview: support for Flax, a high-performance Python neural network library based on JAX. Its modular design allows for easy customization and accelerated inference on GPUs.
Broader Large Language Model (LLM) support and more model compression techniques.
- Optimizations for built-in GPUs on Intel® Core™ Ultra Processors (Series 1) and Intel® Arc™ Graphics include KV Cache compression for memory reduction along with improved usability, and model load time optimizations to improve first token latency for LLMs.
- Dynamic quantization was enabled to improve first token latency for LLMs on built-in Intel® GPUs without impacting accuracy on Intel® Core™ Ultra Processors (Series 1). Second token latency will also improve for large batch inference.
- A new method to generate synthetic text data is implemented in the Neural Network Compression Framework (NNCF). This allows LLMs to be compressed more accurately using data-aware methods without a dataset. Coming soon: this feature will also be accessible via Optimum Intel on Hugging Face.
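Dynamic quantization derives quantization parameters from the activation values seen at inference time instead of fixing them ahead of time. The sketch below illustrates the general idea with symmetric per-group INT8 quantization. The group size of 4 and the function names `quantize_dynamic` and `dequantize` are assumptions for illustration only, not OpenVINO API, and the GPU plugin's actual kernels may use different group sizes and layouts.

```python
from typing import List, Tuple


def quantize_dynamic(values: List[float], group_size: int = 4) -> Tuple[List[int], List[float]]:
    """Symmetric per-group INT8 quantization; scales are computed from the data at runtime."""
    codes: List[int] = []
    scales: List[float] = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        # "Dynamic": the scale comes from the actual activations, not from offline calibration.
        max_abs = max(abs(v) for v in group)
        scale = max_abs / 127.0 if max_abs > 0 else 1.0
        scales.append(scale)
        codes.extend(max(-127, min(127, round(v / scale))) for v in group)
    return codes, scales


def dequantize(codes: List[int], scales: List[float], group_size: int = 4) -> List[float]:
    """Map INT8 codes back to floats using each group's scale."""
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


activations = [0.5, -1.25, 3.0, 0.0, 10.0, -7.5, 2.5, 0.1]
codes, scales = quantize_dynamic(activations)
restored = dequantize(codes, scales)
```

Because each group gets its own scale, the small values in the first group keep their resolution even though the second group contains a much larger value (10.0); round-to-nearest keeps each reconstruction error within half of its group's scale.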
More portability and performance to run AI at the edge, in the cloud, or locally.
- Support for Intel® Xeon® 6 Processors with P-cores (formerly codenamed Granite Rapids) and Intel® Core™ Ultra 200S series processors (formerly codenamed Arrow Lake-S).
- Preview: GenAI API enables multimodal AI deployment with support for multimodal pipelines for improved contextual awareness, transcription pipelines for easy audio-to-text conversions, and image generation pipelines for streamlined text-to-visual conversions.
- Speculative decoding added to the GenAI API for improved performance and efficient text generation: a small draft model proposes tokens, which the full-size model verifies and corrects where needed.
- Preview: LoRA adapters are now supported in the GenAI API for developers to quickly and efficiently customize image and text generation models for specialized tasks.
- The GenAI API now also supports NPU as a target device, specifically for WhisperPipeline (for whisper-base, whisper-medium, and whisper-small) and LLMPipeline (for Llama 3 8B, Llama 2 7B, Mistral-v0.2-7B, Qwen2-7B-Instruct and Phi-3 Mini-instruct). Use driver version 32.0.100.3104 or later for best performance.
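The speculative decoding scheme described above can be illustrated with a minimal, greedy-only sketch. It assumes two hypothetical deterministic next-token functions, `target_model` and `draft_model`, standing in for the full-size and draft LLMs; it is not the GenAI API. The draft proposes `k` tokens, the target keeps the longest matching run and replaces the first mismatch with its own token, so the output is identical to decoding with the target alone.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Reference: generate max_new tokens with a single model, one token at a time."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(model(seq))
    return seq[len(prompt):]


def speculative_decode(target: NextToken, draft: NextToken, prompt: List[int],
                       max_new: int, k: int = 4) -> List[int]:
    seq = list(prompt)
    while len(seq) - len(prompt) < max_new:
        # 1. The small draft model proposes k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(seq + proposal))
        # 2. The full-size model checks each proposed position.
        #    (A real implementation scores all k positions in one forward pass.)
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if expected != tok:
                accepted.append(expected)  # correction from the full-size model
                break
            accepted.append(tok)
        else:
            accepted.append(target(seq + accepted))  # bonus token when every draft matched
        seq.extend(accepted)
    return seq[len(prompt):len(prompt) + max_new]


def target_model(seq: List[int]) -> int:
    # Hypothetical "full-size" model: any deterministic function of the context.
    return (3 * seq[-1] + len(seq)) % 17


def draft_model(seq: List[int]) -> int:
    # Hypothetical draft model: agrees with the target except at every 5th position.
    guess = target_model(seq)
    return (guess + 1) % 17 if len(seq) % 5 == 0 else guess


prompt = [1, 2, 3]
fast = speculative_decode(target_model, draft_model, prompt, max_new=12, k=4)
```

Every token that ends up in the output is one the target model itself would have chosen, which is why speculative decoding speeds up generation without changing greedy results; the gain depends on how often the draft agrees with the target.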

Support Change and Deprecation Notices

Using deprecated features and components is not advised. They are available to enable a smooth transition to new solutions and will be discontinued in the future. To keep using discontinued features, you will have to revert to the last LTS OpenVINO version supporting them. For more details, refer to the OpenVINO Legacy Features and Components page.
Discontinued in 2024.0:
- Runtime components:
  - Intel® Gaussian & Neural Accelerator (Intel® GNA). Consider using the Neural Processing Unit (NPU) for low-powered systems like Intel® Core™ Ultra or 14th generation and beyond.
  - OpenVINO C++/C/Python 1.0 APIs (see 2023.3 API transition guide for reference).
  - All ONNX Frontend legacy APIs (known as ONNX_IMPORTER_API).
  - 'PerformanceMode.UNDEFINED' property as part of the OpenVINO Python API.
- Tools:
  - Deployment Manager. See installation and deployment guides for current distribution options.
  - Accuracy Checker.
  - Post-Training Optimization Tool (POT). Neural Network Compression Framework (NNCF) should be used instead.
  - A Git patch for NNCF integration with huggingface/transformers. The recommended approach is to use huggingface/optimum-intel for applying NNCF optimization on top of models from Hugging Face.
  - Support for Apache MXNet, Caffe, and Kaldi model formats. Conversion to ONNX may be used as a solution.
Deprecated and to be removed in the future:
- The macOS x86_64 debug binaries will no longer be provided with the OpenVINO toolkit, starting with OpenVINO 2024.5.
- Python 3.8 is no longer supported, starting with OpenVINO 2024.5.
  - As MXNet does not support Python versions higher than 3.8, according to the MXNet PyPI project, it is no longer supported by OpenVINO either.
- Discrete Keem Bay (dKMB) is no longer supported, starting with OpenVINO 2024.5.
- Support for discrete devices (formerly codenamed Raptor Lake) is no longer available for NPU.

You can find OpenVINO™ toolkit 2024.5 release here:

Download archives* with OpenVINO™
Install it via Conda: conda install -c conda-forge openvino=2024.5.0
OpenVINO™ for Python: pip install openvino==2024.5.0

Acknowledgements

Thanks for contributions from the OpenVINO developer community:
@aku221b
@halm-zenger
@hibahassan1
@hub-bla
@jagadeeshmadinni
@nashez
@tianyiSKY1
@tiebreaker4869

Release documentation is available here: https://docs.openvino.ai/2024
Release Notes are available here: https://docs.openvino.ai/2024/about-openvino/release-notes-openvino.html

19 Sep 12:25

artanokhov

2024.4.0

c3152d3

2024.4.0

Summary of major features and improvements  

More Gen AI coverage and framework integrations to minimize code changes
- Support for GLM-4-9B Chat, MiniCPM-1B, Llama 3 and 3.1, Phi-3-Mini, Phi-3-Medium and YOLOX-s models.
- Noteworthy notebooks added: Florence-2, NuExtract-tiny Structure Extraction, Flux.1 Image Generation, PixArt-α: Photorealistic Text-to-Image Synthesis, and Phi-3-Vision Visual Language Assistant.
Broader Large Language Model (LLM) support and more model compression techniques.
- OpenVINO™ runtime optimized for Intel® Xe Matrix Extensions (Intel® XMX) systolic arrays on built-in GPUs for efficient matrix multiplication, resulting in a significant LLM performance boost with improved 1st and 2nd token latency, as well as a smaller memory footprint on Intel® Core™ Ultra Processors (Series 2).
- Memory sharing enabled for NPUs on Intel® Core™ Ultra Processors (Series 2) for efficient pipeline integration without memory copy overhead.
- Addition of the PagedAttention feature for discrete GPUs* enables a significant boost in throughput for parallel inferencing when serving LLMs on Intel® Arc™ Graphics or Intel® Data Center GPU Flex Series.
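PagedAttention stores the KV cache in fixed-size blocks that need not be contiguous in memory, so sequences of different lengths can share device memory without each reserving space for its maximum length. The sketch below shows only the block-table bookkeeping behind that idea. The class `PagedKVCache`, the block size, and the pool size are hypothetical, and no attention math or OpenVINO code is involved.

```python
from typing import Dict, List, Tuple


class PagedKVCache:
    """Toy block manager: maps each sequence's token positions to fixed-size physical blocks."""

    def __init__(self, num_blocks: int, block_size: int) -> None:
        self.block_size = block_size
        self.free_blocks: List[int] = list(range(num_blocks))
        self.block_tables: Dict[int, List[int]] = {}  # seq_id -> physical block ids
        self.lengths: Dict[int, int] = {}              # seq_id -> tokens stored so far

    def append_token(self, seq_id: int) -> Tuple[int, int]:
        """Reserve a slot for one new token and return (physical_block, offset_in_block)."""
        table = self.block_tables.setdefault(seq_id, [])
        length = self.lengths.get(seq_id, 0)
        if length % self.block_size == 0:  # last block is full (or the sequence has none yet)
            if not self.free_blocks:
                raise MemoryError("KV cache exhausted")
            table.append(self.free_blocks.pop(0))
        self.lengths[seq_id] = length + 1
        return table[length // self.block_size], length % self.block_size

    def free(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the pool for reuse by other requests."""
        self.free_blocks.extend(self.block_tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)


cache = PagedKVCache(num_blocks=4, block_size=2)
slots_a = [cache.append_token(seq_id=0) for _ in range(3)]  # 3 tokens -> 2 blocks
slots_b = [cache.append_token(seq_id=1) for _ in range(2)]  # 2 tokens -> 1 block
cache.free(0)
slots_c = [cache.append_token(seq_id=2) for _ in range(4)]
```

Once sequence 0 is freed, sequence 2 reuses its blocks, and its two logical blocks map to the non-contiguous physical blocks 3 and 0. Memory is only wasted inside the last, partially filled block of each sequence.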
More portability and performance to run AI at the edge, in the cloud, or locally.
- Support for Intel® Core™ Ultra Processors (Series 2) (formerly codenamed Lunar Lake) on Windows.
- OpenVINO™ Model Server now comes with production-quality support for OpenAI-compatible API which enables significantly higher throughput for parallel inferencing on Intel® Xeon® processors when serving LLMs to many concurrent users.
- Improved performance and memory consumption with prefix caching, KV cache compression, and other optimizations for serving LLMs using OpenVINO™ Model Server.
- Support for Python 3.12.
- Support for Red Hat Enterprise Linux (RHEL) version 9
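Prefix caching lets requests that share a prompt prefix (for example, the same system prompt) reuse the KV values already computed for that prefix. Below is a minimal sketch assuming full blocks of `block_size` tokens, each keyed by the entire prefix it completes. `PrefixCache` is a hypothetical name; a real server such as OpenVINO Model Server would hash block contents and manage memory rather than store token tuples.

```python
from typing import Dict, List, Tuple


class PrefixCache:
    """Toy prefix cache: each full KV block is keyed by the whole token prefix it completes."""

    def __init__(self, block_size: int) -> None:
        self.block_size = block_size
        self.blocks: Dict[Tuple[int, ...], int] = {}  # prefix -> block id
        self.next_block_id = 0

    def process(self, prompt: List[int]) -> Tuple[int, int]:
        """Return (reused_tokens, computed_tokens) for this prompt."""
        reused = 0
        full = len(prompt) - len(prompt) % self.block_size  # only full blocks are cached
        for end in range(self.block_size, full + 1, self.block_size):
            key = tuple(prompt[:end])
            if key in self.blocks:
                # Keys include the whole prefix, so a hit implies all earlier blocks also hit.
                reused = end
            else:
                self.blocks[key] = self.next_block_id
                self.next_block_id += 1
        return reused, len(prompt) - reused


system_prompt = [1, 2, 3, 4, 5, 6, 7, 8]
prefix = PrefixCache(block_size=4)
first = prefix.process(system_prompt + [9, 10])              # nothing cached yet
second = prefix.process(system_prompt + [11, 12, 13, 14, 15])  # reuses the 8 shared tokens
```

The second request only needs KV computation for its 5 new tokens, which is where the reduced latency and memory traffic for repeated system prompts comes from.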

Support Change and Deprecation Notices

Using deprecated features and components is not advised. They are available to enable a smooth transition to new solutions and will be discontinued in the future. To keep using discontinued features, you will have to revert to the last LTS OpenVINO version supporting them. For more details, refer to the OpenVINO Legacy Features and Components page.
Discontinued in 2024.0:
- Runtime components:
  - Intel® Gaussian & Neural Accelerator (Intel® GNA). Consider using the Neural Processing Unit (NPU) for low-powered systems like Intel® Core™ Ultra or 14th generation and beyond.
  - OpenVINO C++/C/Python 1.0 APIs (see 2023.3 API transition guide for reference).
  - All ONNX Frontend legacy APIs (known as ONNX_IMPORTER_API).
  - 'PerformanceMode.UNDEFINED' property as part of the OpenVINO Python API.
- Tools:
  - Deployment Manager. See installation and deployment guides for current distribution options.
  - Accuracy Checker.
  - Post-Training Optimization Tool (POT). Neural Network Compression Framework (NNCF) should be used instead.
  - A Git patch for NNCF integration with huggingface/transformers. The recommended approach is to use huggingface/optimum-intel for applying NNCF optimization on top of models from Hugging Face.
  - Support for Apache MXNet, Caffe, and Kaldi model formats. Conversion to ONNX may be used as a solution.
Deprecated and to be removed in the future:
- The macOS x86_64 debug bins will no longer be provided with the OpenVINO toolkit, starting with OpenVINO 2024.5.
- Python 3.8 is now considered deprecated, and it will not be available beyond the 2024.4 OpenVINO version.
- Discrete Keem Bay (dKMB) support is now considered deprecated and will be fully removed with OpenVINO 2024.5.
- Intel® Streaming SIMD Extensions (Intel® SSE) will be supported in source code form, but not enabled in the binary package by default, starting with OpenVINO 2025.0
- The openvino-nightly PyPI module will soon be discontinued. End users should use the Simple PyPI nightly repository instead. More information is available in the Release Policy.
- The OpenVINO™ Development Tools package (pip install openvino-dev) will be removed from installation options and distribution channels beginning with OpenVINO 2025.0.
- Model Optimizer will be discontinued with OpenVINO 2025.0. Consider using the new conversion methods instead. For more details, see the model conversion transition guide.
- OpenVINO property Affinity API will be discontinued with OpenVINO 2025.0. It will be replaced with CPU binding configurations (ov::hint::enable_cpu_pinning).
- OpenVINO Model Server components:
  - “auto shape” and “auto batch size” (reshaping a model at runtime) will be removed in the future. OpenVINO’s dynamic shape models are recommended instead.
- A number of notebooks have been deprecated. For an up-to-date listing of available notebooks, refer to the OpenVINO™ Notebook index (openvinotoolkit.github.io).

You can find OpenVINO™ toolkit 2024.4 release here:

Download archives* with OpenVINO™
Install it via Conda: conda install -c conda-forge openvino=2024.4.0
OpenVINO™ for Python: pip install openvino==2024.4.0

Acknowledgements

Thanks for contributions from the OpenVINO developer community:
@hub-bla
@awayzjj
@jvr0123
@Pey-crypto
@nashez
@qxprakash

Release documentation is available here: https://docs.openvino.ai/2024
Release Notes are available here: https://docs.openvino.ai/2024/about-openvino/release-notes-openvino.html

31 Jul 14:33

artanokhov

2024.3.0

1e3b88e

2024.3.0

Summary of major features and improvements  

More Gen AI coverage and framework integrations to minimize code changes
- OpenVINO pre-optimized models are now available on Hugging Face, making it easier for developers to get started with these models.
Broader Large Language Model (LLM) support and more model compression techniques.
- Significant improvement in LLM performance on Intel discrete GPUs with the addition of Multi-Head Attention (MHA) and oneDNN enhancements.
More portability and performance to run AI at the edge, in the cloud, or locally.
- Improved CPU performance when serving LLMs with the inclusion of vLLM and continuous batching in the OpenVINO Model Server (OVMS). vLLM is an easy-to-use open-source library that supports efficient LLM inferencing and model serving.
- Ubuntu 24.04 long-term support (LTS), 64-bit (Kernel 6.8+) (preview support)
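Continuous batching, as used by vLLM and OpenVINO Model Server, admits new requests into the running batch at every generation step instead of waiting for the slowest request in a fixed batch to finish. The toy scheduler below illustrates that policy under simplifying assumptions: one token per request per step, no prefill phase, and no KV-cache memory limits. `Request` and `continuous_batching` are hypothetical names, not OVMS code.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List


@dataclass
class Request:
    req_id: str
    tokens_to_generate: int  # assumed to be >= 1
    generated: int = 0


def continuous_batching(requests: List[Request], max_batch: int) -> Dict[str, int]:
    """Toy scheduler: returns the decoding step at which each request finished."""
    waiting: Deque[Request] = deque(requests)
    running: List[Request] = []
    finished_at: Dict[str, int] = {}
    step = 0
    while waiting or running:
        # Fill free slots at every step instead of waiting for the whole batch to finish.
        while waiting and len(running) < max_batch:
            running.append(waiting.popleft())
        step += 1
        # One decoding step: each running request produces exactly one token.
        for req in running:
            req.generated += 1
            if req.generated >= req.tokens_to_generate:
                finished_at[req.req_id] = step
        # Finished requests leave the batch immediately, freeing their slot.
        running = [r for r in running if r.generated < r.tokens_to_generate]
    return finished_at


result = continuous_batching(
    [Request("a", 5), Request("b", 1), Request("c", 2), Request("d", 1)], max_batch=2
)
```

Here request `d` finishes at step 4. With static batching of `[a, b]` followed by `[c, d]`, the second batch could only start after `a` completes at step 5, so `d` would finish at step 6 and `c` at step 7.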

Support Change and Deprecation Notices

Using deprecated features and components is not advised. They are available to enable a smooth transition to new solutions and will be discontinued in the future. To keep using discontinued features, you will have to revert to the last LTS OpenVINO version supporting them. For more details, refer to the OpenVINO Legacy Features and Components page.
Discontinued in 2024.0:
- Runtime components:
  - Intel® Gaussian & Neural Accelerator (Intel® GNA). Consider using the Neural Processing Unit (NPU) for low-powered systems like Intel® Core™ Ultra or 14th generation and beyond.
  - OpenVINO C++/C/Python 1.0 APIs (see 2023.3 API transition guide for reference).
  - All ONNX Frontend legacy APIs (known as ONNX_IMPORTER_API).
  - 'PerformanceMode.UNDEFINED' property as part of the OpenVINO Python API.
- Tools:
  - Deployment Manager. See installation and deployment guides for current distribution options.
  - Accuracy Checker.
  - Post-Training Optimization Tool (POT). Neural Network Compression Framework (NNCF) should be used instead.
  - A Git patch for NNCF integration with huggingface/transformers. The recommended approach is to use huggingface/optimum-intel for applying NNCF optimization on top of models from Hugging Face.
  - Support for Apache MXNet, Caffe, and Kaldi model formats. Conversion to ONNX may be used as a solution.
Deprecated and to be removed in the future:
- The OpenVINO™ Development Tools package (pip install openvino-dev) will be removed from installation options and distribution channels beginning with OpenVINO 2025.0.
- Model Optimizer will be discontinued with OpenVINO 2025.0. Consider using the new conversion methods instead. For more details, see the model conversion transition guide.
- OpenVINO property Affinity API will be discontinued with OpenVINO 2025.0. It will be replaced with CPU binding configurations (ov::hint::enable_cpu_pinning).
- OpenVINO Model Server components:
  - “auto shape” and “auto batch size” (reshaping a model at runtime) will be removed in the future. OpenVINO’s dynamic shape models are recommended instead.
- A number of notebooks have been deprecated. For an up-to-date listing of available notebooks, refer to the OpenVINO™ Notebook index (openvinotoolkit.github.io).

You can find OpenVINO™ toolkit 2024.3 release here:

Download archives* with OpenVINO™
Install it via Conda: conda install -c conda-forge openvino=2024.3.0
OpenVINO™ for Python: pip install openvino==2024.3.0

Acknowledgements

Thanks for contributions from the OpenVINO developer community:
@rghvsh
@PRATHAM-SPS
@duydl
@awayzjj
@jvr0123
@inbasperu
@DannyVlasenko
@amkarn258
@kcin96
@Vladislav-Denisov

Release documentation is available here: https://docs.openvino.ai/2024
Release Notes are available here: https://docs.openvino.ai/2024/about-openvino/release-notes-openvino.html

17 Jun 17:21

artanokhov

2024.2.0

5c0f38f

2024.2.0

Summary of major features and improvements  

More Gen AI coverage and framework integrations to minimize code changes
- Llama 3 optimizations for CPUs, built-in GPUs, and discrete GPUs for improved performance and efficient memory usage.
- Support for Phi-3-mini, a family of AI models that leverages the power of small language models for faster, more accurate and cost-effective text processing.
- Python Custom Operation is now enabled in OpenVINO, making it easier for Python developers to code their custom operations instead of using C++ custom operations (also supported). Python Custom Operation lets users implement their own specialized operations in any model.
- Notebooks expansion to ensure better coverage for new models. Noteworthy notebooks added: DynamiCrafter, YOLOv10, Chatbot notebook with Phi-3, and QWEN2.
Broader Large Language Model (LLM) support and more model compression techniques.
- GPTQ method for 4-bit weight compression added to NNCF for more efficient inference and improved performance of compressed LLMs.
- Significant LLM performance improvements and reduced latency for both built-in GPUs and discrete GPUs.
- Significant improvement in 2nd token latency and memory footprint of FP16 weight LLMs on AVX2 (13th Gen Intel® Core™ processors) and AVX512 (3rd Gen Intel® Xeon® Scalable Processors) based CPU platforms, particularly for small batch sizes.
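GPTQ, mentioned above, refines 4-bit weight quantization by compensating for rounding errors using second-order information from calibration data. That error-compensation step is not shown here; the sketch below illustrates only the plain round-to-nearest, per-group asymmetric INT4 compression that such methods improve on. The function names and the group size of 4 are assumptions for illustration, not the NNCF API.

```python
from typing import List, Tuple


def compress_int4(weights: List[float], group_size: int) -> Tuple[List[int], List[float], List[int]]:
    """Asymmetric per-group round-to-nearest INT4 quantization (codes in 0..15)."""
    codes: List[int] = []
    scales: List[float] = []
    zero_points: List[int] = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Keep 0.0 inside the range so that it is exactly representable.
        w_min, w_max = min(min(group), 0.0), max(max(group), 0.0)
        scale = (w_max - w_min) / 15 if w_max > w_min else 1.0
        zero = round(-w_min / scale)  # always within 0..15 because w_min <= 0 <= w_max
        scales.append(scale)
        zero_points.append(zero)
        codes.extend(max(0, min(15, round(w / scale) + zero)) for w in group)
    return codes, scales, zero_points


def decompress_int4(codes: List[int], scales: List[float],
                    zero_points: List[int], group_size: int) -> List[float]:
    """Reconstruct approximate weights: (code - zero_point) * scale, per group."""
    return [(c - zero_points[i // group_size]) * scales[i // group_size]
            for i, c in enumerate(codes)]


def pack_nibbles(codes: List[int]) -> bytes:
    """Store two 4-bit codes per byte (assumes an even number of codes)."""
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))


weights = [-0.8, -0.2, 0.1, 0.7, 0.05, 0.3, 0.6, 0.9]
codes, scales, zero_points = compress_int4(weights, group_size=4)
restored = decompress_int4(codes, scales, zero_points, group_size=4)
packed = pack_nibbles(codes)
```

Eight FP32 weights (32 bytes) shrink to 4 bytes of codes plus one scale and zero point per group, which is the source of the memory savings; smaller groups improve accuracy at the cost of more per-group metadata.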
More portability and performance to run AI at the edge, in the cloud, or locally.
- Model Serving Enhancements:
  - Preview: OpenVINO Model Server (OVMS) now supports OpenAI-compatible API along with Continuous Batching and PagedAttention, enabling significantly higher throughput for parallel inferencing, especially on Intel® Xeon® processors, when serving LLMs to many concurrent users.
  - OpenVINO backend for Triton Server now supports built-in GPUs and discrete GPUs, in addition to dynamic shapes support.
  - Integration of TorchServe through torch.compile OpenVINO backend for easy model deployment, provisioning to multiple instances, model versioning, and maintenance.
- Preview: addition of the Generate API, a simplified API for text generation using large language models with only a few lines of code. The API is available through the newly launched OpenVINO GenAI package.
- Support for Intel Atom® Processor X Series. For more details, see System Requirements.
- Preview: Support for Intel® Xeon® 6 processor.

Support Change and Deprecation Notices

Using deprecated features and components is not advised. They are available to enable a smooth transition to new solutions and will be discontinued in the future. To keep using discontinued features, you will have to revert to the last LTS OpenVINO version supporting them. For more details, refer to the OpenVINO Legacy Features and Components page.
Discontinued in 2024.0:
- Runtime components:
  - Intel® Gaussian & Neural Accelerator (Intel® GNA). Consider using the Neural Processing Unit (NPU) for low-powered systems like Intel® Core™ Ultra or 14th generation and beyond.
  - OpenVINO C++/C/Python 1.0 APIs (see 2023.3 API transition guide for reference).
  - All ONNX Frontend legacy APIs (known as ONNX_IMPORTER_API).
  - 'PerformanceMode.UNDEFINED' property as part of the OpenVINO Python API.
- Tools:
  - Deployment Manager. See installation and deployment guides for current distribution options.
  - Accuracy Checker.
  - Post-Training Optimization Tool (POT). Neural Network Compression Framework (NNCF) should be used instead.
  - A Git patch for NNCF integration with huggingface/transformers. The recommended approach is to use huggingface/optimum-intel for applying NNCF optimization on top of models from Hugging Face.
  - Support for Apache MXNet, Caffe, and Kaldi model formats. Conversion to ONNX may be used as a solution.
Deprecated and to be removed in the future:
- The OpenVINO™ Development Tools package (pip install openvino-dev) will be removed from installation options and distribution channels beginning with OpenVINO 2025.0.
- Model Optimizer will be discontinued with OpenVINO 2025.0. Consider using the new conversion methods instead. For more details, see the model conversion transition guide.
- OpenVINO property Affinity API will be discontinued with OpenVINO 2025.0. It will be replaced with CPU binding configurations (ov::hint::enable_cpu_pinning).
- OpenVINO Model Server components:
  - “auto shape” and “auto batch size” (reshaping a model at runtime) will be removed in the future. OpenVINO’s dynamic shape models are recommended instead.
- A number of notebooks have been deprecated. For an up-to-date listing of available notebooks, refer to the OpenVINO™ Notebook index (openvinotoolkit.github.io).

You can find OpenVINO™ toolkit 2024.2 release here:

Download archives* with OpenVINO™
Install it via Conda: conda install -c conda-forge openvino=2024.2.0
OpenVINO™ for Python: pip install openvino==2024.2.0

Acknowledgements

Thanks for contributions from the OpenVINO developer community:
@siddhant-0707
@adismort14
@LucaTamSapienza
@hongbo-wei
@awayzjj
@qxprakash
@keyonjie
@Huanli-Gong
@hegdeadithyak
@inbasperu
@Thodoris1999
@himanshugupta11002
@tranchung163
@SANJITH-KUMAR-20
@anzr299
@Vladislav-Denisov

Release documentation is available here: https://docs.openvino.ai/2024
Release Notes are available here: https://www.intel.com/content/www/us/en/developer/articles/release-notes/openvino/2024-2.html

06 May 15:14

artanokhov

2022.3.2

e2c7e4d

2022.3.2

Major Features and Improvements Summary

This is a Long-Term Support (LTS) release. LTS versions are released every year and supported for two years (one year for bug fixes, and two years for security patches). Read Intel® Distribution of OpenVINO™ toolkit Long-Term Support (LTS) Policy v.2 for more details.

This 2022.3.2 LTS release provides functional and security bug fixes for the previous 2022.3.1 Long-Term Support (LTS) release, enabling developers to deploy applications powered by Intel® Distribution of OpenVINO™ toolkit more efficiently.
Intel® Movidius™ VPU-based products are supported in this release.

You can find OpenVINO™ toolkit 2022.3.2 release here:

Download archives* with OpenVINO™ Runtime for C/C++
OpenVINO™ Runtime for Python: pip install openvino==2022.3.2
OpenVINO™ Development tools: pip install openvino-dev==2022.3.2

Release documentation is available here: https://docs.openvino.ai/2022.3/

Release Notes are available here: https://www.intel.com/content/www/us/en/developer/articles/release-notes/openvino-lts/2022-3.html

25 Apr 14:45

artanokhov

2024.1.0

f4afc98

2024.1.0

Summary of major features and improvements  

More Generative AI coverage and framework integrations to minimize code changes.
- Mixtral and URLNet models optimized for performance improvements on Intel® Xeon® processors.
- Stable Diffusion 1.5, ChatGLM3-6B, and Qwen-7B models optimized for improved inference speed on Intel® Core™ Ultra processors with integrated GPU.
- Support for Falcon-7B-Instruct, a GenAI Large Language Model (LLM) ready-to-use chat/instruct model with superior performance metrics.
- New Jupyter Notebooks added: YOLO V9, YOLO V8 Oriented Bounding Boxes Detection (OBB), Stable Diffusion in Keras, MobileCLIP, RMBG-v1.4 Background Removal, Magika, TripoSR, AnimateAnyone, LLaVA-Next, and RAG system with OpenVINO and LangChain.
Broader Large Language Model (LLM) support and more model compression techniques.
- LLM compilation time reduced through additional optimizations with compressed embedding. Improved 1st token performance of LLMs on 4th and 5th generations of Intel® Xeon® processors with Intel® Advanced Matrix Extensions (Intel® AMX).
- Better LLM compression and improved performance with oneDNN, INT4, and INT8 support for Intel® Arc™ GPUs.
- Significant memory reduction for select smaller GenAI models on Intel® Core™ Ultra processors with integrated GPU.
More portability and performance to run AI at the edge, in the cloud, or locally.
- The preview NPU plugin for Intel® Core™ Ultra processors is now available in the OpenVINO open-source GitHub repository, in addition to the main OpenVINO package on PyPI.
- The JavaScript API is now more easily accessible through the npm repository, giving JavaScript developers seamless access to the OpenVINO API.
- FP16 inference on ARM processors is now enabled by default for Convolutional Neural Networks (CNNs).

Support Change and Deprecation Notices

Using deprecated features and components is not advised. They are available to enable a smooth transition to new solutions and will be discontinued in the future. To keep using discontinued features, you will have to revert to the last LTS OpenVINO version supporting them.
For more details, refer to the OpenVINO Legacy Features and Components page.
Discontinued in 2024.0:
- Runtime components:
  - Intel® Gaussian & Neural Accelerator (Intel® GNA). Consider using the Neural Processing Unit (NPU) for low-powered systems like Intel® Core™ Ultra or 14th generation and beyond.
  - OpenVINO C++/C/Python 1.0 APIs (see 2023.3 API transition guide for reference).
  - All ONNX Frontend legacy APIs (known as ONNX_IMPORTER_API).
  - 'PerformanceMode.UNDEFINED' property as part of the OpenVINO Python API.
- Tools:
  - Deployment Manager. See installation and deployment guides for current distribution options.
  - Accuracy Checker.
  - Post-Training Optimization Tool (POT). Neural Network Compression Framework (NNCF) should be used instead.
  - A Git patch for NNCF integration with huggingface/transformers. The recommended approach is to use huggingface/optimum-intel for applying NNCF optimization on top of models from Hugging Face.
  - Support for Apache MXNet, Caffe, and Kaldi model formats. Conversion to ONNX may be used as a solution.
Deprecated and to be removed in the future:
- The OpenVINO™ Development Tools package (pip install openvino-dev) will be removed from installation options and distribution channels beginning with OpenVINO 2025.0.
- Model Optimizer will be discontinued with OpenVINO 2025.0. Consider using the new conversion methods instead. For more details, see the model conversion transition guide.
- OpenVINO property Affinity API will be discontinued with OpenVINO 2025.0. It will be replaced with CPU binding configurations (ov::hint::enable_cpu_pinning).
- OpenVINO Model Server components:
  - “auto shape” and “auto batch size” (reshaping a model at runtime) will be removed in the future. OpenVINO’s dynamic shape models are recommended instead.

You can find OpenVINO™ toolkit 2024.1 release here:

Download archives* with OpenVINO™
Install it via Conda: conda install -c conda-forge openvino=2024.1.0
OpenVINO™ for Python: pip install openvino==2024.1.0

Acknowledgements

Thanks for contributions from the OpenVINO developer community:
@LucaTamSapienza
@AsakusaRinne
@awayzjj
@MonalSD
@siddhant-0707
@qxprakash
@FredBill1
@Pranshu-S
@vshampor
@PRATHAM-SPS
@inbasperu
@linzs148
@chux0519
@ccinv
@Vishwa44
@rghvsh
@Aryan8912
@BHbean
@Vladislav-Denisov
@MeeCreeps
@YaritaiKoto
@Godwin-T
@mory91
@Bepitic
@akiseakusa
@kuanxian1
@himanshugupta11002
@mengbingrock

Release documentation is available here: https://docs.openvino.ai/2024
Release Notes are available here: https://www.intel.com/content/www/us/en/developer/articles/release-notes/openvino/2024-1.html

06 Mar 14:21

artanokhov

2024.0.0

34caeef

2024.0.0

Summary of major features and improvements  

More Generative AI coverage and framework integrations to minimize code changes.
- Improved out-of-the-box experience for TensorFlow* sentence encoding models through the installation of OpenVINO™ toolkit Tokenizers.
- OpenVINO™ toolkit now supports Mixture of Experts (MoE), an architecture that enables more efficient generative models.
- JavaScript developers now have seamless access to the OpenVINO API through a new JavaScript binding.
- New and noteworthy models validated: Mistral, StableLM-tuned-alpha-3b, and StableLM-Epoch-3B.
Broader Large Language Model (LLM) support and more model compression techniques.
- Improved quality on INT4 weight compression for LLMs by adding the popular technique, Activation-aware Weight Quantization, to the Neural Network Compression Framework (NNCF). This addition reduces memory requirements and helps speed up token generation.
- Enhanced LLM performance on Intel® CPUs through internal memory state enhancements and INT8 precision for the KV cache, specifically tailored for multi-query LLMs such as ChatGLM.
- Easier optimization and conversion of Hugging Face models – compress LLM models to INT8 and INT4 with Hugging Face Optimum command line interface and export models to OpenVINO format. Note this is part of Optimum-Intel which needs to be installed separately.
- The OpenVINO™ 2024.0 release makes it easier for developers, by integrating more OpenVINO™ features with the Hugging Face* ecosystem. Store quantization configurations for popular models directly in Hugging Face to compress models into INT4 format while preserving accuracy and performance.
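Activation-aware Weight Quantization (AWQ) protects the weights that multiply large activations by scaling those input channels up before quantization and dividing the activations by the same factors, so the unquantized layer output W·x is mathematically unchanged. Below is a simplified sketch of its per-channel scale search, assuming a tiny hypothetical layer and calibration set, per-row symmetric INT4 rounding, and scales s_j = mean|x_j| ** alpha with alpha chosen by grid search. It is not NNCF's implementation, and the function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def quantize_rows(w: Matrix, bits: int = 4) -> Matrix:
    """Symmetric round-to-nearest, one scale per output row; returns dequantized weights."""
    qmax = 2 ** (bits - 1) - 1
    result: Matrix = []
    for row in w:
        scale = max(abs(v) for v in row) / qmax or 1.0
        result.append([round(v / scale) * scale for v in row])
    return result


def matvec(w: Matrix, x: List[float]) -> List[float]:
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def output_error(w: Matrix, w_q: Matrix, calib: List[List[float]], s: List[float]) -> float:
    """Squared error between W·x and the quantized layer applied to x / s."""
    err = 0.0
    for x in calib:
        ref = matvec(w, x)
        approx = matvec(w_q, [xj / sj for xj, sj in zip(x, s)])
        err += sum((a - b) ** 2 for a, b in zip(ref, approx))
    return err


def awq_search(w: Matrix, calib: List[List[float]], grid: int = 11, bits: int = 4) -> Tuple[float, float]:
    """Grid-search alpha for scales s_j = mean|x_j| ** alpha; return (best_alpha, best_error)."""
    n_in = len(w[0])
    act_mag = [sum(abs(x[j]) for x in calib) / len(calib) for j in range(n_in)]
    best_alpha, best_err = 0.0, float("inf")
    for step in range(grid):
        alpha = step / (grid - 1)
        s = [m ** alpha if m > 0 else 1.0 for m in act_mag]
        # Scale salient input channels (columns) up before quantizing so they lose less precision.
        w_q = quantize_rows([[v * sj for v, sj in zip(row, s)] for row in w], bits)
        err = output_error(w, w_q, calib, s)
        if err < best_err:
            best_alpha, best_err = alpha, err
    return best_alpha, best_err


weights = [[0.02, 0.9, -0.5, 0.3], [-0.03, 0.4, 0.8, -0.7]]
calibration = [[20.0, 0.5, 1.0, -0.2], [-18.0, 0.3, -0.8, 0.4], [25.0, -0.6, 0.9, 0.1]]
alpha, awq_err = awq_search(weights, calibration)
rtn_err = output_error(weights, quantize_rows(weights), calibration, [1.0] * 4)
```

Because alpha = 0 yields all-ones scales (plain round-to-nearest), the search can only match or beat plain rounding on the calibration data. In a real model, the division by s would be folded into the preceding operation rather than computed at runtime.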
More portability and performance to run AI at the edge, in the cloud, or locally.
- A preview plugin architecture for the Neural Processing Unit (NPU) integrated in Intel® Core™ Ultra processors is now included in the main OpenVINO™ package on PyPI.
- Improved performance on ARM* by enabling the ARM threading library. In addition, multi-core ARM platforms are now supported, and FP16 precision is enabled by default on macOS*.
- Improved performance on ARM platforms using throughput hint, which increases efficiency in utilization of CPU cores and memory bandwidth.
- New and improved LLM serving samples from OpenVINO™ Model Server for multi-batch inputs and Retrieval Augmented Generation (RAG).

Support Change and Deprecation Notices

Using deprecated features and components is not advised. They are available to enable a smooth transition to new solutions and will be discontinued in the future. To keep using discontinued features, you will have to revert to the last LTS OpenVINO version supporting them.
For more details, refer to the OpenVINO Legacy Features and Components page.
Discontinued in 2024.0:
- Runtime components:
  - Intel® Gaussian & Neural Accelerator (Intel® GNA). Consider using the Neural Processing Unit (NPU) for low-powered systems like Intel® Core™ Ultra or 14th generation and beyond.
  - OpenVINO C++/C/Python 1.0 APIs (see 2023.3 API transition guide for reference).
  - All ONNX Frontend legacy APIs (known as ONNX_IMPORTER_API).
  - 'PerformanceMode.UNDEFINED' property as part of the OpenVINO Python API.
- Tools:
  - Deployment Manager. See installation and deployment guides for current distribution options.
  - Accuracy Checker.
  - Post-Training Optimization Tool (POT). Neural Network Compression Framework (NNCF) should be used instead.
  - A Git patch for NNCF integration with huggingface/transformers. The recommended approach is to use huggingface/optimum-intel for applying NNCF optimization on top of models from Hugging Face.
  - Support for Apache MXNet, Caffe, and Kaldi model formats. Conversion to ONNX may be used as a solution.
Deprecated and to be removed in the future:
- The OpenVINO™ Development Tools package (pip install openvino-dev) will be removed from installation options and distribution channels beginning with OpenVINO 2025.0.
- Model Optimizer will be discontinued with OpenVINO 2025.0. Consider using OpenVINO Model Converter (API call: OVC) instead. Follow the model conversion transition guide for more details.
- OpenVINO property Affinity API will be discontinued with OpenVINO 2025.0. It will be replaced with CPU binding configurations (ov::hint::enable_cpu_pinning).
- OpenVINO Model Server components:
  - Reshaping a model at runtime based on the incoming requests (auto shape and auto batch size) is deprecated and will be removed in the future. Using OpenVINO’s dynamic shape models is recommended instead.

You can find OpenVINO™ toolkit 2024.0 release here:

Download archives* with OpenVINO™
Install it via Conda: conda install -c conda-forge openvino=2024.0.0
OpenVINO™ for Python: pip install openvino==2024.0.0

Acknowledgements

Thanks for contributions from the OpenVINO developer community:
@rghvsh
@YaritaiKoto
@Abdulrahman-Adel
@jvr0123
@sami0i
@guy-tamir
@rupeshs
@karanjakhar
@abhinav231-valisetti
@rajatkrishna
@lukazlim
@siddhant-0707
@tiger100256-hu

Release documentation is available here: https://docs.openvino.ai/2024
Release Notes are available here: https://www.intel.com/content/www/us/en/developer/articles/release-notes/openvino/2024-0.html

24 Jan 13:10

artanokhov

2023.3.0

ceeafaf

2023.3.0

Summary of major features and improvements  

More Generative AI coverage and framework integrations to minimize code changes.
- Introducing the OpenVINO GenAI repository on GitHub, which demonstrates native C and C++ pipeline samples for Large Language Models (LLMs). String tensors are now supported as inputs, and tokenizers are supported natively, to reduce overhead and ease production.
- New and noteworthy models validated: Mistral, Zephyr, Qwen, ChatGLM3, and Baichuan.
- New Jupyter Notebooks for Latent Consistency Models (LCM) and Distil-Whisper. Updated LLM Chatbot notebook to include LangChain, Neural Chat, TinyLlama, ChatGLM3, Qwen, Notus, and Youri models.
- Torch.compile is now fully integrated with OpenVINO, which now includes a hardware 'options' parameter allowing for seamless inference hardware selection by leveraging the plugin architecture in OpenVINO.
Broader Large Language Model (LLM) support and more model compression techniques.
- As part of the Neural Network Compression Framework (NNCF), INT4 weight compression model formats are now fully supported on Intel® Xeon® CPUs in addition to Intel® Core™ and iGPU, offering better performance, lower memory usage, and accuracy opportunities when using LLMs.
- Improved performance of transformer-based LLMs on CPU and GPU using the stateful model technique, which increases memory efficiency by sharing internal states across multiple inference iterations.
- Easier optimization and conversion of Hugging Face models – compress LLM models to INT8 and INT4 with the Hugging Face Optimum command line interface and export models to OpenVINO format. Note that this is part of Optimum Intel, which must be installed separately.
- Tokenizer and TorchVision transform support is now available in the OpenVINO runtime (via a new API), requiring less preprocessing code and enhancing performance by automatically handling this model setup. More details on tokenizer support are in the Ecosystem section.
More portability and performance to run AI at the edge, in the cloud, or locally.
- Full support for 5th Gen Intel® Xeon® Scalable processors (codename Emerald Rapids)
- Further optimized performance on Intel® Core™ Ultra (codename Meteor Lake) CPUs with the latency hint, by leveraging both P-cores and E-cores.
- Improved performance on ARM platforms using the throughput hint, which increases the efficiency of CPU core and memory bandwidth utilization.
- Preview JavaScript API, enabling Node.js development through JavaScript bindings available via source code. See details below.
- Improved model serving of LLMs through OpenVINO Model Server. This not only enables LLM serving over KServe v2 gRPC and REST APIs for more flexibility but also improves throughput by running processing like tokenization on the server side. More details in the Ecosystem section.
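The latency and throughput hints mentioned above are selected through a single performance-mode property rather than by tuning streams and threads by hand. The sketch below assumes the string key "PERFORMANCE_HINT" (ov::hint::performance_mode) with the values "LATENCY" and "THROUGHPUT"; the helper hint_config is hypothetical and models only these two hints.

```python
from typing import Dict

# Values of the PERFORMANCE_HINT property (ov::hint::performance_mode) that the
# notes above refer to. Assumption: only these two hints are modeled here.
LATENCY = "LATENCY"
THROUGHPUT = "THROUGHPUT"


def hint_config(optimize_for_latency: bool) -> Dict[str, str]:
    """Hypothetical helper: map a workload goal to a performance-hint config.

    LATENCY targets the fastest response for a single request;
    THROUGHPUT targets the most inferences per second across parallel requests.
    """
    return {"PERFORMANCE_HINT": LATENCY if optimize_for_latency else THROUGHPUT}


# Interactive, single-user application (e.g. a chatbot): latency hint.
print(hint_config(optimize_for_latency=True))
# Processing many inputs in parallel: throughput hint.
print(hint_config(optimize_for_latency=False))
```

With OpenVINO installed, the returned dict would be passed as the config argument, for example core.compile_model(model, "CPU", hint_config(True)). Real code would more likely use the typed properties in the openvino.properties.hint module (performance_mode with PerformanceMode.LATENCY or PerformanceMode.THROUGHPUT) instead of raw strings.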

Support Change and Deprecation Notices

The OpenVINO™ Development Tools package (pip install openvino-dev) is deprecated and will be removed from installation options and distribution channels beginning with the 2025.0 release. For more details, refer to the OpenVINO Legacy Features and Components page.
Ubuntu 18.04 support is discontinued in the 2023.3 LTS release. The recommended version of Ubuntu is 22.04.
Starting with 2023.3, OpenVINO no longer supports Python 3.7, as the Python community has discontinued support for it. Update to a newer version (currently 3.8-3.11) to avoid interruptions.
The legacy ONNX Frontend API (known as ONNX_IMPORTER_API) will no longer be available in the 2024.0 release.
The 'PerformanceMode.UNDEFINED' property in the OpenVINO Python API will be discontinued in the 2024.0 release.
Tools:
- Deployment Manager is deprecated and will be supported for two years according to the LTS policy. Visit the selector tool to see package distribution options or the deployment guide documentation.
- Accuracy Checker is deprecated and will be discontinued with 2024.0.  
- Post-Training Optimization Tool (POT) has been deprecated and the 2023.3 LTS is the last release that supports the tool. Developers are encouraged to use the Neural Network Compression Framework (NNCF) for this feature.
- Model Optimizer is deprecated and will be fully supported until the 2025.0 release. We encourage developers to perform model conversion through OpenVINO Model Converter (API call: OVC). Follow the model conversion transition guide for more details.
- Deprecated support for a git patch for NNCF integration with huggingface/transformers. The recommended approach is to use huggingface/optimum-intel for applying NNCF optimization on top of models from Hugging Face.
- Support for Apache MXNet, Caffe, and Kaldi model formats is deprecated and will be discontinued with the 2024.0 release.
Runtime:
- Intel® Gaussian & Neural Accelerator (Intel® GNA) will be deprecated in a future release. We encourage developers to use the Neural Processing Unit (NPU) for low-powered systems like Intel® Core™ Ultra or 14th generation and beyond.
- OpenVINO C++/C/Python 1.0 APIs are deprecated and will be discontinued in the 2024.0 release. Please use API 2.0 in your applications going forward to avoid disruption.
- OpenVINO property Affinity API will be deprecated starting with 2024.0 and will be discontinued in 2025.0. It will be replaced with CPU binding configurations (ov::hint::enable_cpu_pinning).
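Because Python 3.7 is no longer supported starting with 2023.3, an installation script can check the interpreter version before installing this release. The sketch below encodes the 3.8-3.11 range stated above; the constant and function names are hypothetical, and the upper bound reflects only what these notes say was current for 2023.3.

```python
import sys
from typing import Tuple

# Supported range stated for the 2023.3 release: Python 3.8 through 3.11.
MIN_SUPPORTED: Tuple[int, int] = (3, 8)
MAX_SUPPORTED: Tuple[int, int] = (3, 11)


def is_supported_python(version: Tuple[int, int]) -> bool:
    """Return True if a (major, minor) pair falls inside the stated range."""
    return MIN_SUPPORTED <= version <= MAX_SUPPORTED


# sys.version_info[:2] is a (major, minor) tuple, compared element by element.
current = tuple(sys.version_info[:2])
if not is_supported_python(current):
    print(f"Python {current[0]}.{current[1]} is outside the 3.8-3.11 range noted for 2023.3")
```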

You can find OpenVINO™ toolkit 2023.3 release here:

Download archives* with OpenVINO™
Install it via Conda: conda install -c conda-forge openvino=2023.3.0
OpenVINO™ for Python: pip install openvino==2023.3.0

Acknowledgements

Thanks for contributions from the OpenVINO developer community:
@rghvsh,
@YaritaiKoto,
@siddhant-0707,
@sydarb,
@kk271kg,
@ahmadchalhoub,
@ma7555,
@Bhaskar365

Release documentation is available here: https://docs.openvino.ai/2023.3
Release Notes are available here: https://www.intel.com/content/www/us/en/developer/articles/release-notes/openvino-lts/2023-3.html

Contributors

ma7555, siddhant-0707, and 6 other contributors

16 Nov 15:21

artanokhov

2023.2.0

cfd42bd

2023.2.0

Summary of major features and improvements  

More Generative AI coverage and framework integrations to minimize code changes.
- Expanded model support for direct PyTorch model conversion – automatically convert additional models directly from PyTorch or execute via torch.compile with OpenVINO as the backend.
- New and noteworthy models supported – we have enabled models used for chatbots, instruction following, code generation, and many more, including prominent models like LLaVA, chatGLM, Bark (text to audio), and LCM (Latent Consistency Models, an optimized version of Stable Diffusion).
- Easier optimization and conversion of Hugging Face models – compress LLM models to Int8 with the Hugging Face Optimum command line interface and export models to the OpenVINO IR format.
- OpenVINO is now available on Conan, a package manager that enables more seamless package management in large-scale projects for C and C++ developers.
Broader Large Language Model (LLM) support and more model compression techniques.
- Accelerate inference for LLM models on Intel® Core™ CPU and iGPU with the use of Int8 model weight compression.
- Expanded model support for dynamic shapes for improved performance on GPU.
- Preview support for Int4 model format is now included. Int4 optimized model weights are now available to try on Intel® Core™ CPU and iGPU, to accelerate models like Llama 2 and chatGLM2.
- The following Int4 model compression formats are supported for inference in runtime:
  - Generative Pre-trained Transformer Quantization (GPTQ): GPTQ-compressed models can be accessed through Hugging Face repositories.
  - Native Int4 compression through Neural Network Compression Framework (NNCF).
More portability and performance to run AI at the edge, in the cloud, or locally.
- In 2023.1 we announced full support for the ARM architecture; we have now improved performance by enabling FP16 model formats for LLMs and integrating additional acceleration libraries to reduce latency.
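To make the memory argument for the Int8 and Int4 weight compression described above concrete, the back-of-the-envelope sketch below estimates weight storage for a 7-billion-parameter model such as Llama 2 7B. It is generic arithmetic, not the NNCF compression algorithm; the helper name is hypothetical, and it counts weights only.

```python
# Bits per weight for the precisions mentioned in these notes.
BITS_PER_WEIGHT = {"fp32": 32, "fp16": 16, "int8": 8, "int4": 4}


def weight_footprint_gb(num_params: float, fmt: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Ignores quantization scales/zero points, activations, and the KV cache,
    so real memory use is somewhat higher.
    """
    return num_params * BITS_PER_WEIGHT[fmt] / 8 / 1e9


for fmt in ("fp16", "int8", "int4"):
    print(fmt, weight_footprint_gb(7e9, fmt))
# fp16 14.0
# int8 7.0
# int4 3.5
```

Each halving of bits per weight halves the weight footprint, which is why Int4 makes 7B-class models practical on client CPUs and iGPUs with limited memory.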

Support Change and Deprecation Notices

The OpenVINO™ Development Tools package (pip install openvino-dev) is deprecated and will be removed from installation options and distribution channels with 2025.0. To learn more, refer to the OpenVINO Legacy Features and Components page. To ensure optimal performance, install the OpenVINO package (pip install openvino), which includes essential components such as OpenVINO Runtime, OpenVINO Converter, and Benchmark Tool.
Tools: 
- Deployment Manager is deprecated and will be removed in the 2024.0 release.
- Accuracy Checker is deprecated and will be discontinued with 2024.0.   
- Post-Training Optimization Tool (POT) is deprecated and will be discontinued with 2024.0.
- Model Optimizer is deprecated and will be fully supported up until the 2025.0 release. Model conversion to the OpenVINO IR format should be performed through OpenVINO Model Converter, which is part of the PyPI package. Follow the Model Optimizer to OpenVINO Model Converter transition guide for a smoother transition. Known limitations include TensorFlow models with TF1 control flow and object detection models. These limitations relate to the gap in TensorFlow direct conversion capabilities, which will be addressed in upcoming releases.
- PyTorch 1.13 support is deprecated in Neural Network Compression Framework (NNCF).
Runtime: 
- Intel® Gaussian & Neural Accelerator (Intel® GNA) will be deprecated in a future release. We encourage developers to use the Neural Processing Unit (NPU) for low powered systems like Intel® Core™ Ultra or 14th generation and beyond.  
- OpenVINO C++/C/Python 1.0 APIs will be discontinued with 2024.0. 

You can find OpenVINO™ toolkit 2023.2 release here:

Download archives* with OpenVINO™
Install it via Conda: conda install -c conda-forge openvino=2023.2.0
OpenVINO™ for Python: pip install openvino==2023.2.0

Acknowledgements

Thanks for contributions from the OpenVINO developer community:
@siddhant-0707,
@NsdHSO,
@mahimairaja,
@SANTHOSH-MAMIDISETTI,
@rsato10,
@PRATHAM-SPS

Release documentation is available here: https://docs.openvino.ai/2023.2
Release Notes are available here: https://www.intel.com/content/www/us/en/developer/articles/release-notes/openvino/2023-2.html

Contributors

siddhant-0707, NsdHSO, and 4 other contributors

27 Sep 12:40

artanokhov

2023.2.0.dev20230922

e7c1344

2023.2.0.dev20230922 Pre-release

NOTE: This version is pre-release software and has not undergone full release validation or qualification. No support is offered on pre-release software and APIs/behavior are subject to change. It should NOT be incorporated into any production software/solution and instead should be used only for early testing and integration while awaiting a final release version of this software.

OpenVINO™ toolkit pre-release definition:

It is introduced to get early feedback from the community.
The scope and functionality of the pre-release version is subject to change in the future.
Using the pre-release in production is strongly discouraged.

You can find OpenVINO™ toolkit 2023.2.0.dev20230922 pre-release version here:

Download archives* with OpenVINO™
Install it via Conda: conda install -c "conda-forge/label/openvino_dev" openvino=2023.2.0.dev20230922
OpenVINO™ for Python: pip install --pre openvino or pip install openvino==2023.2.0.dev20230922

Release notes are available here: https://docs.openvino.ai/nightly/prerelease_information.html
Release documentation is available here: https://docs.openvino.ai/nightly/

What's Changed

CPU runtime:
- Optimized YOLOv8n and YOLOv8s models on BF16/FP32.
- Optimized Falcon model on 4th Generation Intel® Xeon® Scalable Processors.
GPU runtime:
- int8 weight compression further improves LLM performance. PR #19548
- Optimized GEMM and fully connected (FC) layers on iGPU. PR #19780
TensorFlow FE:
- Added support for Selu operation. PR #19528
- Added support for XlaConvV2 operation. PR #19466
- Added support for TensorListLength and TensorListResize operations. PR #19390
PyTorch FE:
- New operations supported:
  - aten::minimum, aten::maximum. PR #19996
  - aten::broadcast_tensors. PR #19994
  - aten::logical_and, aten::logical_or, aten::logical_not, aten::logical_xor. PR #19981
  - aten::scatter_reduce, and extended aten::scatter. PR #19980
  - prim::TupleIndex operation. PR #19978
  - mixed precision in aten::min/max. PR #19936
  - aten::tile. PR #19645
  - aten::one_hot. PR #19779
  - PReLU. PR #19515
  - aten::swapaxes. PR #19483
  - non-boolean inputs for the 'or' and 'and' operations. PR #19268
- Torchvision NMS can accept negative scores. PR #19826

New openvino_notebooks:

Visual Question Answering and Image Captioning using BLIP

Fixed GitHub issues

Fixed #19784 “[Bug]: Cannot install libprotobuf-dev along with libopenvino-2023.0.2 on Ubuntu 22.04” with PR #19788
Fixed #19617 “Add a clear error message when creating an empty Constant” with PR #19674
Fixed #19616 “Align openvino.compile_model and openvino.Core.compile_model functions” with PR #19778
Fixed #19469 “[Feature Request]: Add SeLu activation in the OpenVino IR (TensorFlow Conversion)” with PR #19528
Fixed #19019 “[Bug]: Low performance of the TF quantized model.” with PR #19735
Fixed #19018 “[Feature Request]: Support aarch64 python wheel for Linux” with PR #19594
Fixed #18831 “Question: openvino support for Nvidia Jetson Xavier ?” with PR #19594
Fixed #18786 “OpenVINO Wheel does not install Debug libraries when CMAKE_BUILD_TYPE is Debug #18786” with PR #19197
Fixed #18731 “[Bug] Wrong output shapes of MaxPool” with PR #18965
Fixed #18091 “[Bug] 2023.0 Version crashes on Jetson Nano - L4T - Ubuntu 18.04” with PR #19717
Fixed #7194 “Conan for simplifying dependency management” with PR #17580

Acknowledgements

Thanks for contributions from the OpenVINO developer community:
@siddhant-0707,
@PRATHAM-SPS,
@okhovan

Full Changelog: 2023.1.0.dev20230811...2023.2.0.dev20230922

Contributors

siddhant-0707, okhovan, and PRATHAM-SPS

