VII. GPU configuration
Graphics Processing Units (GPUs) play a crucial role in speeding up neural network computation. They excel because they can run many calculations in parallel, which makes them especially well suited to the extensive matrix operations at the heart of neural networks. To bring these benefits to Deep Learning, JDLL supports GPU execution. While JDLL runs on a CPU without any configuration, using it with a GPU requires additional setup, and the specific steps depend on the deep learning framework you intend to use. For instance, to run PyTorch 2.0.0 on a GPU, follow the instructions for PyTorch 2.0.0 GPU integration; these instructions differ across frameworks and versions.
This page provides information about how to enable GPU computation with JDLL for the different DL frameworks.
The Java libraries used by DeepImageJ can connect to an installed GPU through CUDA. Therefore, the connection can be made as long as the required versions of the CUDA and cuDNN libraries are installed.
The CUDA version needed depends on the desired DL framework and its version. Note that macOS systems do not support GPU computation; however, newer M1/M2 macOS computers provide high-speed, efficient computation close to consumer GPUs thanks to their new chip architecture (Arm64).
| Deep Learning library | CUDA versions |
|---|---|
| TensorFlow 2.10.1 | 11.2 |
| TensorFlow 2.7.1 | 11.2 |
| TensorFlow 2.7.0 | 11.2 |
| TensorFlow 2.4.1 | 11.0 |
| TensorFlow 2.3.1 | 10.1 |
| TensorFlow 1.15 | 10.0 |
| TensorFlow 1.14 | 10.0 |
| TensorFlow 1.13.1 | 10.0 |
| TensorFlow 1.12 | 9.0 |
| PyTorch 2.0.0 | 11.8 |
| PyTorch 1.13.1 | 11.7 |
| PyTorch 1.13.0 | 11.7 |
| PyTorch 1.12.1 | 11.6 |
| PyTorch 1.11.0 | 11.3 |
| PyTorch 1.10.0 | 10.2 / 11.3 |
| PyTorch 1.9.1 | 10.2 / 11.1 |
| PyTorch 1.9.0 | 10.2 / 11.1 |
| PyTorch 1.8.1 | 10.2 / 11.1 |
| PyTorch 1.7.1 | 10.1 / 10.2 / 11.0 |
| ONNX opset 18 (1.13.1) | 11.6 |
| ONNX opset 17 (1.12.1) | 11.4 |
| ONNX opset 16 (1.11.0) | 11.4 |
| ONNX opset 15 (1.10.0) | 11.4 |
| ONNX opset 14 (1.9.0) | 11.4 |
| ONNX opset 13 (1.8.1) | 11.0.3 |
For more information about the compatibility between CUDA and each DL framework:
- ONNX: https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html
- TensorFlow: https://www.tensorflow.org/install/source?hl=es-419#gpu
- PyTorch: https://pytorch.org/get-started/locally/
After selecting the CUDA version, a cuDNN version compatible with that CUDA release has to be installed too. Further information about cuDNN compatibility with NVIDIA hardware and CUDA versions can be found at: https://docs.nvidia.com/deeplearning/cudnn/support-matrix/index.html#abstract
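Before installing anything, it can help to check what is already present on the machine. A minimal sketch using two standard NVIDIA commands (`nvcc` is only found if a CUDA toolkit's `bin` directory is already on the `PATH`):

```
# Driver version and the highest CUDA version the driver supports
nvidia-smi
# Version of the CUDA toolkit currently on the PATH, if any
nvcc --version
```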
Not all the JDLL engines support GPU. Only the engines that explicitly specify that they support GPU will be able to load a model and run inference on it.
The engines that support GPU indicate `gpu: true` in the engines JSON file. When installing locally, if an engine supports GPU acceleration, the name of the folder housing it must include the GPU tag. In the example below it can be observed how all the installed PyTorch engines support GPU, but none of the TensorFlow ones do:
Installed PyTorch engines support GPU (it says so in their name), whereas the installed TensorFlow engines do not (their name only specifies CPU).
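For illustration, a GPU-capable entry in the engines JSON file might look like the sketch below; the exact field set is defined by JDLL, so treat the names other than `gpu` as indicative only:

```
{
  "engine": "pytorch",
  "version": "2.0.0",
  "os": "windows",
  "cpu": true,
  "gpu": true
}
```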
- For Windows systems, the GPUs require a minimum compute capability of 3.5.
- On Linux systems, the TensorFlow Java library requires a GPU with compute capability 6.0, which is relatively high and a known issue (https://github.com/tensorflow/tensorflow/issues/36228).
You can check your GPU compute capability at https://developer.nvidia.com/cuda-gpus
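On recent NVIDIA drivers, the compute capability can also be queried directly from the terminal (note that the `compute_cap` query field is only available on newer driver releases):

```
# Prints the compute capability of each installed GPU, e.g. 8.6
nvidia-smi --query-gpu=compute_cap --format=csv,noheader
```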
We provide some guidelines to install a specific version of CUDA on your Windows OS. You can also follow the official documentation.
- Your computer has to be CUDA capable. Check that your machine fulfills all the requirements to run CUDA with a GPU.
- (Optional) If you can, delete all previous installations of CUDA. Even though several CUDA versions can coexist on the same machine, it is advisable to keep only one installed to avoid conflicts.
- Go to the CUDA Toolkit Archive (https://developer.nvidia.com/cuda-toolkit-archive) and select the required CUDA distribution, e.g., CUDA 10.0.
- Select the options that fit your machine. We selected the installer shown in the figure below. You can also choose between a network installer and a local installer.
- Download and execute the installer. Follow the instructions and install all the recommended settings.
- Once the installation is finished, check the environment variables. There should be a new environment variable named `CUDA_PATH` that points to the folder where CUDA was installed, usually `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0`. Two new directories should also have been added to the `Path` variable, one pointing to `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin` and the other to `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\libnvvp`.
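A quick sketch of how to verify those variables from a Command Prompt (the paths assume the default CUDA 10.0 install location):

```
:: Print the CUDA_PATH variable
echo %CUDA_PATH%
:: Check that the CUDA compiler is reachable through Path
where nvcc
nvcc --version
```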
To have CUDA-accelerated Deep Neural Networks, another package, cuDNN, has to be installed:
- Go to https://developer.nvidia.com/rdp/cudnn-download or https://developer.nvidia.com/rdp/cudnn-archive (you will have to create an account if you do not already have one).
- Download the cuDNN version for Windows that corresponds to the CUDA version you installed, following the table above; e.g., for CUDA 10.0 you would need cuDNN 7.4.1.
- Unzip the downloaded folder. Inside you will find three folders: `lib`, `bin`, and `include`. Move the contents of `lib` to `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\lib`, the contents of `include` to `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\include`, and the contents of `bin` to `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin`.
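If you prefer the command line, the same copy can be sketched as below, run from an elevated Command Prompt opened inside the unzipped cuDNN folder (adjust the paths to your CUDA version; some cuDNN releases place the libraries under `lib\x64` instead of `lib`):

```
:: Copy the cuDNN files into the matching CUDA 10.0 directories
copy bin\cudnn*.dll "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin"
copy include\cudnn*.h "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\include"
copy lib\cudnn*.lib "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\lib"
```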
If you want to install another CUDA version, just follow the same steps. To switch from one CUDA version to another, you only need to change the environment variables: set `CUDA_PATH` to the folder containing the desired CUDA version and update the CUDA directories in `Path` to the corresponding version.
Note that after installing CUDA version `X.y`, an additional version-specific environment variable (e.g., `CUDA_PATH_V10_0` for CUDA 10.0) will be created. It always points to the directory containing CUDA `X.y`, so you do not need to change it.
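Switching versions from the command line might look like the sketch below (run in an elevated Command Prompt; `setx` writes the variable permanently, but only newly opened programs see the new value):

```
:: Point CUDA_PATH at CUDA 9.0 instead of 10.0
setx CUDA_PATH "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0"
```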
We provide some guidelines to install a specific version of CUDA on your Unix/Linux OS. You can also follow the official documentation.
- Your computer has to be CUDA capable. Check that your machine fulfills all the requirements to run CUDA with a GPU.
- Make sure `gcc` is installed on your machine. Check it in the terminal with:
```
gcc --version
```
- If an error message appears, install `gcc` on your system. On Ubuntu 20.04:
```
sudo apt update
sudo apt install build-essential
```
- (Optional) If you can, delete all previous installations of CUDA. Even though several CUDA versions can coexist on the same machine, it is advisable to keep only one installed to avoid conflicts.
- Go to the CUDA Toolkit Archive and select the required CUDA distribution, e.g., CUDA 10.0.
- Select the options that fit your machine. We selected the installer shown in the figure below. You can also choose between several installers.
- Follow the installation instructions provided by NVIDIA. In step 2 of the example shown below, to know which string to substitute `<version>` with, look at the directories that appear under `/var/`. In our case (shown in the image) we have two CUDA versions installed:
  - If we were installing version 10.0 the command would be:
  ```
  sudo apt-key add /var/cuda-repo-10.0.130-410.45/7fa2af80.pub
  ```
  - And if we were installing CUDA 9.0:
  ```
  sudo apt-key add /var/cuda-repo-9.0-local/7fa2af80.pub
  ```
- Installation with the `runfile` is similar to the one on Windows. Just accept all the recommended settings.
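Before editing any environment variables, it can help to confirm where the toolkit landed. A small sketch, assuming the default install location (each version gets its own directory under `/usr/local`):

```
# List the installed CUDA toolkits
ls -d /usr/local/cuda*
```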
Once the installation is finished, the `cuda-X.y/bin` directory must be added to the `PATH`, where `X.y` is the corresponding CUDA version.
To add the variable:
```
export PATH=/usr/local/cuda-11.2/bin${PATH:+:${PATH}}
```
We also need to add another directory to the `LD_LIBRARY_PATH`:
- For 64-bit operating systems:
```
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```
- For 32-bit operating systems:
```
export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
```
Note that these variables only last for the current session: once the computer is restarted, they vanish. To add them permanently, follow the instructions here; a minimal sketch is also shown below.
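For instance, assuming a bash shell and CUDA 11.2, the exports can be appended to `~/.bashrc` so they are set in every new session:

```
# Make the CUDA variables permanent for the current user
echo 'export PATH=/usr/local/cuda-11.2/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
source ~/.bashrc
```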
To have CUDA-accelerated Deep Neural Networks, another package, cuDNN, has to be installed:
- Go to https://developer.nvidia.com/rdp/cudnn-download or https://developer.nvidia.com/rdp/cudnn-archive (you will have to create an account if you do not already have one) and download the cuDNN version for Linux that corresponds to the CUDA version you installed, following the table above. For example, for CUDA 10.0 you would need cuDNN 7.4.1.
- Unzip the downloaded folder.
- Copy the contents into the corresponding folders inside the CUDA directory. Assuming CUDA was installed in the default directory `/usr/local/cuda-10.0`, the commands to execute are:
```
sudo cp -P Downloads/cuda/include/cudnn.h /usr/local/cuda-10.0/include/
sudo cp -P Downloads/cuda/lib/libcudnn* /usr/local/cuda-10.0/lib64/
sudo chmod a+r /usr/local/cuda-10.0/lib64/libcudnn*
```
- Run the following command, which refreshes the `locate` database, to help DeepImageJ find the newly installed CUDA version:
```
sudo updatedb
```
With these steps, you can install as many CUDA versions as needed.
Having incompatible CUDA versions installed can be a source of conflict. On Windows, if an incompatible version is installed, the plugin will fail to load the model; this is a known bug. For example, if PyTorch 1.13.1 is being used and CUDA 6.0 is installed, JDLL will not be able to load a model. When a CUDA version is installed, JDLL cannot fall back to CPU mode if that CUDA version does not work with the PyTorch version.
If you experience this error:
- Remove `CUDA_PATH` (if it exists) from your system environment variables.
- Make sure that your `Path` does not include any directory or file containing the words `Nvidia` or `CUDA`:
  - Go to `Edit the system environment variables` or `Edit environment variables for your account`.
  - Click on `Environment variables`.
  - Check the `Path` and `CUDA_PATH` variables (note that Windows is not case sensitive, so they might be written as `PATH` or `path`).
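A quick way to spot leftover entries from a Command Prompt (a sketch; `findstr /i` makes the match case-insensitive, consistent with Windows' handling of variable names):

```
:: Show every environment variable mentioning CUDA or NVIDIA
set | findstr /i "cuda nvidia"
```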
Unfortunately, JDLL does not provide any tool to confirm whether a model has been loaded on the GPU or whether inference is being performed on it.
The best way to know whether the GPU is being used is to run the `nvidia-smi` command in the terminal. For more information, click here.
It is important to use this command because it is sometimes tricky to know whether the GPU is being used or not.
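For example, the `-l` flag makes `nvidia-smi` refresh in a loop, which is convenient while a model is running:

```
# Print GPU utilization and memory usage every second; Ctrl+C to stop
nvidia-smi -l 1
```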
Furthermore, loading a model on the GPU does not mean that inference will be executed on the GPU. There are at least two cases where it is not:
- If the cuDNN libraries have not been correctly installed, or an environment variable is missing, the model will be loaded on the GPU but its execution will not be accelerated.
- If the compute capability is lower than required, the process will appear as loaded in the `nvidia-smi` output, but execution will fall back to the CPU. This can be checked by running `nvidia-smi` during model inference and verifying whether the GPU is actually using memory.
Nevertheless, the most accurate information will always be given by the `nvidia-smi` command.
To avoid misunderstandings, it is advisable to run it at least once each time the CUDA or TensorFlow versions are upgraded.