AMASSS: Possible fix for CUDA memory issue (torch.cuda.OutOfMemoryError: CUDA out of memory) #66

Open
tschreiner opened this issue Jan 9, 2024 · 6 comments

Comments

@tschreiner

tschreiner commented Jan 9, 2024

Hi there,

Like other people in this thread, I am dealing with CUDA running out of GPU memory (already reported in issue #17).

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.04 GiB (GPU 0; 8.00 GiB total capacity; 5.17 GiB already allocated; 673.43 MiB free; 5.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

The exception is raised when the function sliding_window_inference is called in AMASSS_CLI.py, line 976.

In a Project MONAI issue (Project-MONAI/MONAI#1189) I found that it is possible to keep the computation on the GPU while stitching the output in CPU memory, by replacing

val_outputs = sliding_window_inference(input_img, cropSize, args["nbr_GPU_worker"], net,overlap=args["precision"])

by

val_outputs = sliding_window_inference(input_img, cropSize, args["nbr_GPU_worker"], net,overlap=args["precision"], sw_device="cuda", device="cpu")

in AMASSS_CLI.py.
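For context, here is a minimal, self-contained sketch of the same pattern; the network, crop size, batch size and overlap below are illustrative placeholders, not the actual AMASSS settings:

```python
# Minimal sketch (illustrative values, not the AMASSS configuration):
# run each sliding window on the GPU, but aggregate/stitch the full output
# tensor in CPU RAM so it does not have to fit into GPU memory.
import torch
from monai.inferers import sliding_window_inference
from monai.networks.nets import UNet

net = UNet(
    spatial_dims=3, in_channels=1, out_channels=6,
    channels=(16, 32, 64, 128), strides=(2, 2, 2),
).to("cuda")

input_img = torch.rand(1, 1, 256, 256, 320)  # placeholder CBCT-sized volume

with torch.no_grad():
    val_outputs = sliding_window_inference(
        input_img,
        roi_size=(128, 128, 128),   # cropSize in AMASSS_CLI.py
        sw_batch_size=4,            # args["nbr_GPU_worker"]
        predictor=net,
        overlap=0.5,                # args["precision"]
        sw_device="cuda",           # compute each window on the GPU
        device="cpu",               # stitch the result in CPU memory
    )
```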

I have just tested it, and I am now able to segment everything on an NVIDIA GeForce GTX 1060 6GB without any crashes due to running out of memory.

On a machine with an Intel Core i7-6700 3.4 GHz CPU, 16 GB of RAM and an NVIDIA GeForce GTX 1060 6GB, segmenting a CBCT scan with 0.4 mm resolution (mandible, maxilla, cranial base and skin) took 780.25 seconds.

Can someone check and confirm the fix, please?

Thanks,
Tedd

@cogitas3d

@tschreiner It works perfectly on Linux Ubuntu 20.04 with an NVIDIA 2070. Thank you Tedd!

@tschreiner
Author

@cogitas3d Thank you for the confirmation!

@tschreiner
Author

It might not be the best idea to make the CPU the default stitching device. An input parameter should probably be added so the user can choose whether the CPU or the GPU is used for stitching. See the documentation of the sliding_window_inference function for an explanation.

I suggest adding a parameter like "stitchingDevice" as a command-line argument in AMASSS_CLI.py and AMASSS_CLI.xml.
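A rough sketch of what that could look like (the parameter name and default here are only a suggestion, not existing code):

```python
# Hypothetical "stitchingDevice" argument for AMASSS_CLI.py; the name and
# default are assumptions for illustration, not part of the current code.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--stitchingDevice",
    choices=["cuda", "cpu"],
    default="cuda",
    help="Device used by sliding_window_inference to stitch the output; "
         "use 'cpu' if the GPU runs out of memory.",
)
args = vars(parser.parse_args())

# ... and at the inference call:
# val_outputs = sliding_window_inference(
#     input_img, cropSize, args["nbr_GPU_worker"], net,
#     overlap=args["precision"], sw_device="cuda",
#     device=args["stitchingDevice"])
```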

Any opinions from @allemangD or @GaelleLeroux to this topic?

@Jeanneclre
Contributor

Hi @tschreiner

Thanks for sharing this solution with us, I'll look into it further. I'm also curious to know what David thinks.

@allemangD
Contributor

Adding a parameter is probably the best option, since the appropriate value depends on the user's hardware and input data. The default should probably still be to use GPU memory, as that will be more performant on hardware that supports it. It might be possible to infer the correct approach based on the available GPU and input data, but that's probably not worth the effort. I'd display a warning if CPU memory is inferred.

In either case, we should catch the out-of-memory error and display a meaningful message to the user, suggesting they try enabling the host-memory option.
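Something along these lines, as a sketch (torch.cuda.OutOfMemoryError is the exception from the traceback above, available in PyTorch >= 1.13; the message wording is only an example):

```python
# Sketch of the suggested fallback message, not the actual AMASSS error handling.
try:
    val_outputs = sliding_window_inference(
        input_img, cropSize, args["nbr_GPU_worker"], net,
        overlap=args["precision"], sw_device="cuda", device="cuda",
    )
except torch.cuda.OutOfMemoryError:
    print(
        "CUDA ran out of memory while stitching the segmentation. "
        "Try re-running with the host-memory (CPU stitching) option enabled."
    )
    raise
```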

@tschreiner
Author

tschreiner commented Jan 12, 2024

I think this sounds good. If I remember correctly, the docs say somewhere that a minimum of 12 GB of GPU memory is required?

What about adding the parameter and:

- if the system's GPU memory is < 12 GB, setting the CPU as the stitching device and displaying a warning;
- otherwise, leaving the GPU as the selected device?

This would save the effort of calculating the exact memory required for the operation and would still be a good compromise between usability, robustness and flexibility.

As far as I remember, nobody with 6-8 GB of GPU memory succeeded, so 8 GB could be a very safe threshold. But 12 GB should be fine imho.
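As a sketch of that check (the 12 GB threshold is the value discussed above; the function name and everything else are illustrative, not existing code):

```python
# Illustrative device selection; the threshold comes from the discussion above.
import torch

def pick_stitching_device(min_gpu_gib: float = 12.0) -> str:
    """Return "cuda" only if a GPU with at least `min_gpu_gib` GiB is available."""
    if not torch.cuda.is_available():
        return "cpu"
    total_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
    if total_gib < min_gpu_gib:
        print(f"Warning: GPU has only {total_gib:.1f} GiB of memory; "
              "stitching the output in CPU memory instead.")
        return "cpu"
    return "cuda"

# e.g. pass device=pick_stitching_device() to sliding_window_inference
```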
