📋 [TASK] Can I use GPU for training? I looked at the code and it doesn't seem to have this interface #2401
Comments
Hi. The Engine is based on the Lightning Trainer, so you can pass any Trainer argument to the Engine. Keep in mind that multi-GPU training is currently not supported, but you can use a single GPU just as you would with the Trainer.
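For instance, a minimal sketch of what that could look like (the `accelerator`/`devices` values below are illustrative, not the only valid ones):

```python
# Minimal sketch: the Engine forwards Trainer arguments to the underlying
# Lightning Trainer, so a single GPU can be requested explicitly.
from anomalib.data import MVTec
from anomalib.engine import Engine
from anomalib.models import Patchcore

datamodule = MVTec()
model = Patchcore()

# Single-GPU training via standard Lightning Trainer arguments
engine = Engine(accelerator="gpu", devices=1)
engine.fit(datamodule=datamodule, model=model)
```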
Then you can't pass *any* Trainer argument to the Engine... `devices=[0,1,2,3,4,5,6,7]`, for example. I've honestly found anomalib very difficult to use.
@vmiller987 can you check that you installed torch with the CUDA option? The GPU should be picked up automatically in anomalib training if you have the correct torch build. That said, please note that multi-GPU is currently not supported, but we are working on enabling it in v2. I would also love to get your feedback on which parts of anomalib you find difficult to work with.
@samet-akcay I have the correct torch. I can get it to run on one GPU, but I have to use an environment variable to assign anomalib to a specific GPU; I can't pass it through the Engine. I am a novice when it comes to unsupervised learning and am trying to learn, as I mainly have experience with supervised learning. Anomalib doesn't have a good place that explains its models and how they should be used. The notebooks mainly revolve around Padim, it seems. I am looking through the core papers/repos for the other models to try to understand them.
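(For reference, that environment-variable workaround usually looks like the sketch below; setting it from the shell as `CUDA_VISIBLE_DEVICES=1 python train.py` is equivalent, with `train.py` being a hypothetical script name.)

```python
# Workaround sketch: expose only one physical GPU to the process, so
# anomalib/torch see it as cuda:0. Must be set before CUDA is initialized.
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # pin to physical GPU 1

from anomalib.data import MVTec
from anomalib.engine import Engine
from anomalib.models import Patchcore

engine = Engine()
engine.fit(datamodule=MVTec(), model=Patchcore())
```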
I'm going to retract this part. I was doing something very silly and have fixed it. I'm able to get quite a few of the models to run, including Ganomaly.
This is a known issue; I've created a PR for it, which has not been merged yet. We are also working on a better solution, where you will be able to choose the device ID or train on multiple GPUs.
Hello! Thank you for your reply!
1. Because I wanted to use the GPU, I changed the code to the following snippet: `datamodule.setup()`
2. Then I reinstalled CUDA and a torch build that supports GPU training.
3. Finally, an error appeared.
4. I would like to ask: what would the code look like if GPU training were used correctly? And which torch version is required?
Hello, thank you for your reply! I found that I don't know how to use the GPU, and the torch version I downloaded seems to have problems as well. The default torch build only supports CPU training, so I downloaded other torch versions that support CUDA, but in the end it still doesn't work. How did you solve this problem?
How did you install torch? Regarding the version, anomalib pins the torch requirement shown here; your torch version could also be one of the issues: Line 52 in 6ed0067
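As a sanity check, this is one common way to install a CUDA-enabled torch build and confirm it sees the GPU (the cu121 wheel index is just an example; pick the one matching your driver):

```python
# Verify the torch build after installing a CUDA wheel, e.g.:
#   pip install torch --index-url https://download.pytorch.org/whl/cu121
import torch

print(torch.__version__)          # CUDA wheels carry a +cuXXX build tag
print(torch.cuda.is_available())  # should print True on a working setup
print(torch.cuda.device_count())  # number of GPUs torch can see
```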
You don't need to specify the GPU as the accelerator, as it is picked up automatically. For example, here is the setup I tried.

Available GPU

```
❯ nvidia-smi
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 3090        Off | 00000000:17:00.0 Off |                  N/A |
| 31%   38C    P8              24W / 350W |   3062MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce RTX 3090        Off | 00000000:65:00.0 Off |                  N/A |
| 30%   41C    P8              18W / 350W |    283MiB / 24576MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
```

Code

```python
# Import the required modules
from anomalib.data import MVTec
from anomalib.engine import Engine
from anomalib.models import Patchcore

# Initialize the datamodule, model and engine
datamodule = MVTec()
model = Patchcore()
engine = Engine()

# Train the model
engine.fit(datamodule=datamodule, model=model)
```

Output

```
FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
@torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)
>>> # Look here to see whether the GPU is detected and in use
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
<<<
You are using a CUDA device ('NVIDIA GeForce RTX 3090') that has Tensor Cores. To properly utilize them, you should set `torch.set_float32_matmul_precision('medium' | 'high')` which will trade-off precision for performance. For more details, read https://pytorch.org/docs/stable/generated/torch.set_float32_matmul_precision.html#torch.set_float32_matmul_precision
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0,1]
/home/sakcay/.pyenv/versions/3.11.8/envs/anomalib/lib/python3.11/site-packages/lightning/pytorch/core/optimizer.py:181: `LightningModule.configure_optimizers` returned `None`, this fit will run with no optimizer
| Name | Type | Params
------------------------------------------------------------
0 | pre_processor | PreProcessor | 0
1 | post_processor | OneClassPostProcessor | 0
2 | model | PatchcoreModel | 24.9 M
3 | image_metrics | AnomalibMetricCollection | 0
4 | pixel_metrics | AnomalibMetricCollection | 0
------------------------------------------------------------
24.9 M Trainable params
0 Non-trainable params
24.9 M Total params
99.450 Total estimated model params size (MB)
Epoch 0: 0%| | 0/7 [00:00<?, ?it/s]/home/sakcay/.pyenv/versions/3.11.8/envs/anomalib/lib/python3.11/site-packages/lightning/pytorch/loops/optimization/automatic.py:132: `training_step` returned `None`. If this was on purpose, ignore this warning...
Epoch 0: 100%|██████████████████████████████████████████████| 7/7 [00:01<00:00, 4.84it/s^Selecting Coreset Indices.: 16%|███      | 2685/16385 [00:03<00:17, 795.60it/s]
```
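As an aside, the Tensor Cores warning in the log above can be addressed exactly as it suggests, with a single call before training:

```python
# Trade a little float32 matmul precision for Tensor Core throughput,
# as the Lightning warning in the log recommends.
import torch

torch.set_float32_matmul_precision("medium")  # or "high"
```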
I'm moving this to the Q&A section, as I don't think this is a bug in Anomalib but rather an installation issue on your end. Feel free to ask your questions there. Thanks!
This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Describe the task
Can I use GPU for training? I looked at the code and it doesn't seem to have this interface
Acceptance Criteria
Priority
High
Related Epic
No response
Estimated Time
No response
Current Status
Not Started
Additional Information
No response