
Too much time and ram while saving the inference results #2014

Closed
iskenderkahramanoglu opened this issue Mar 14, 2024 · 21 comments

@iskenderkahramanoglu

Hello!

I have trained two models with the full-resolution configuration.
The first model has 32 classes and the second has 5.
When I run inference, it sometimes takes far too long.

Cases:
512x512x246 nifti,
32 class model: 4 seconds for inference, 3 minutes to save results (112 steps in tqdm)
5 class model: 1:29 minutes for inference, 20 seconds to save results (245 steps in tqdm)

801x801x458 nifti,
32 class model: 10:17 minutes for inference, 2:30 minutes to save results (1100 steps in tqdm)
5 class model: 11:12 minutes for inference, 53 minutes to save results (2080 steps in tqdm)

Why the difference?
The model with fewer classes takes more time.
How can I calculate the number of tqdm steps for each model and nifti?
Is there any way to save the results in a format other than nifti, for example JSON?
Because saving on the CPU is too slow and uses too much RAM.
Sometimes it uses all the RAM and the system crashes.
I have 220 GB of RAM, a Tesla V100 and a 46-thread processor.
But saving the result uses only one thread.

What can I do to reduce the inference time?

[screenshot: bash_screen]

@iskenderkahramanoglu
Author

Today I tried to run inference on a nifti of size 1000x1000x1000 with (0.2, 0.2, 0.2) pixel spacing.
The 32-class model runs inference in 16 seconds, but saving as nifti uses too much RAM and crashes my system.
When I resample the nifti, the new size is 500x500x500, the new spacing is (0.4, 0.4, 0.4), and the inference time is the same.
Previously, when I resampled a nifti and increased the pixel spacing, the inference time went down.
The results are good, but it is impossible to use them in real life with these inference times.
What should I do? Will a better graphics card, more RAM, etc. help me reduce these times?

@iskenderkahramanoglu
Author

Hi @FabianIsensee, do you have any idea about this?

@ancestor-mithril
Contributor

You can try to split the bigger volume into multiple smaller patches. For example, you can split it into 9, 16 or 25 patches. Then you run inference on each patch separately and finally aggregate the results into a segmentation for the bigger volume.
In this case, you have to make sure there is some overlap between the patches; otherwise, the segmentation at the margins might not be very accurate.
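
To make that concrete, here is a minimal numpy sketch of how overlapping 3D patches could be generated; the patch count per axis and the overlap width are arbitrary example values, not nnUNet settings:

```python
import numpy as np

def overlapping_slices(vol_shape, n_per_axis=3, overlap=32):
    """Yield slice tuples that tile a 3D volume into n_per_axis**3 overlapping patches."""
    per_axis = []
    for dim in vol_shape:
        edges = np.linspace(0, dim, n_per_axis + 1).astype(int)  # tile boundaries along this axis
        per_axis.append([(max(edges[i] - overlap, 0), min(edges[i + 1] + overlap, dim))
                         for i in range(n_per_axis)])
    for z0, z1 in per_axis[0]:
        for y0, y1 in per_axis[1]:
            for x0, x1 in per_axis[2]:
                yield (slice(z0, z1), slice(y0, y1), slice(x0, x1))

volume = np.zeros((801, 801, 458), dtype=np.float32)  # stand-in for the loaded image array
patches = [volume[s] for s in overlapping_slices(volume.shape)]
print(len(patches))  # 27 overlapping patches
```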

@iskenderkahramanoglu
Author

You can try to split the bigger volume into multiple smaller patches. For example, you can split it into 9, 16 or 25 patches. Then you run inference on each patch separately and finally aggregate the results into a segmentation for the bigger volume. In this case, you have to make sure there is some overlap between the patches; otherwise, the segmentation at the margins might not be very accurate.

There is no set standard for the volumes to be tested. There are also multiple models predicting different parts of the volume. Therefore, overlap between patches will be very difficult. Distortion at the margins will be inevitable. Despite this, do you recommend splitting the volume into small patches, or would you have a different suggestion?

@ancestor-mithril
Contributor

Overlap between patches is not difficult, nnUNet already does this. You just need to patchify with overlap once more in order to reduce RAM usage and to speed up the inference.

@iskenderkahramanoglu
Author

Overlap between patches is not difficult, nnUNet already does this. You just need to patchify with overlap once more in order to reduce RAM usage and to speed up the inference.

Thanks for the reply!

Do you mean physically splitting the test nifti file into 9 (or 16 or 25) pieces, or is there a simple way to do this in nnUNet? Can I split it into smaller patches by changing the "patch_size" parameter in the json?

What determines the testing and saving time of a nifti? Why does a nifti of size 801x801x458 with (0.4, 0.4, 0.4) pixel spacing take longer to test and save than a nifti of size 1000x1000x1000 with (0.2, 0.2, 0.2) pixel spacing?

@ancestor-mithril
Contributor

I suggest physically splitting the test nifti file (you can use the patchly library).
You can't change the "patch_size" parameter in the json unless you want to retrain the model. Each model is trained with a specific "patch_size" and "spacing".

What determines the testing and saving time of a nifti?

nnUNet does 3 things for inference:

  • preprocessing (cropping + normalization + resampling)
  • sliding window model inference
  • postprocessing (resampling and exporting)

Why does a nifti of size 801x801x458 with (0.4, 0.4, 0.4) pixel spacing take longer to test and save than a nifti of size 1000x1000x1000 with (0.2, 0.2, 0.2) pixel spacing?

It depends on the target spacing with which the nnUNet model was trained, because each case is resampled to that spacing. Or the 1000x1000x1000 case may have been cropped to a smaller size because it is zero on the margins.
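
For a rough sense of the numbers (a sketch; the 0.3 mm target spacing below is just an assumed example, the real value is in the model's plans file): the shape after resampling is approximately shape * spacing / target_spacing, so the 801-voxel case can actually end up larger than the 1000-voxel case.

```python
import numpy as np

def resampled_shape(shape, spacing, target_spacing):
    """Approximate voxel grid size after resampling to the model's target spacing."""
    return np.round(np.array(shape) * np.array(spacing) / np.array(target_spacing)).astype(int)

target = (0.3, 0.3, 0.3)  # hypothetical target spacing; check your plans file for the real one
print(resampled_shape((801, 801, 458), (0.4, 0.4, 0.4), target))     # [1068 1068  611]
print(resampled_shape((1000, 1000, 1000), (0.2, 0.2, 0.2), target))  # [667 667 667]
```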

@iskenderkahramanoglu
Author

OK, I will try to split a file.
Thank you very much.

@iskenderkahramanoglu
Author

I suggest physically splitting the test nifti file (you can use the patchly library). You can't change the "patch_size" parameter in the json unless you want to retrain the model. Each model is trained with a specific "patch_size" and "spacing".

What determines the testing and saving time of a nifti?

nnUNet does 3 things for inference:

  • preprocessing (cropping + normalization + resampling)
  • sliding window model inference
  • postprocessing (resampling and exporting)

Why does a nifti of size 801x801x458 with (0.4, 0.4, 0.4) pixel spacing take longer to test and save than a nifti of size 1000x1000x1000 with (0.2, 0.2, 0.2) pixel spacing?

It depends on the target spacing with which the nnUNet model was trained, because each case is resampled to that spacing. Or the 1000x1000x1000 case may have been cropped to a smaller size because it is zero on the margins.

I looked at the patchly library but I didn't understand it.
In nnUNet, the prediction is saved as a nifti file.
If I understood correctly, patchly splits the image into virtual patches, runs the prediction on them, and then merges the results.
How can I use the patchly library with nnUNet? Is there an example?

@ancestor-mithril
Contributor

You split the images into patches and then save them as nifti files. After predicting on all the patches, you aggregate the segmentation results into full-sized images.
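
One possible way to do the splitting step is sketched below with nibabel and numpy: each overlapping chunk is written as its own nifti with a shifted affine so it keeps its physical position, and can then be fed to nnUNet as usual. The file names, the chunks-per-axis count and the overlap width are made-up example values; for the aggregation you would crop the overlap margins of each predicted patch and paste it back into a full-size array.

```python
import numpy as np
import nibabel as nib

img = nib.load("case_0000.nii.gz")     # placeholder input file name
data = np.asanyarray(img.dataobj)
affine = img.affine

n, overlap = 3, 32                     # 3 chunks per axis, 32-voxel overlap (example values)
edges = [np.linspace(0, s, n + 1).astype(int) for s in data.shape]

for i in range(n):
    for j in range(n):
        for k in range(n):
            z0, z1 = max(edges[0][i] - overlap, 0), min(edges[0][i + 1] + overlap, data.shape[0])
            y0, y1 = max(edges[1][j] - overlap, 0), min(edges[1][j + 1] + overlap, data.shape[1])
            x0, x1 = max(edges[2][k] - overlap, 0), min(edges[2][k + 1] + overlap, data.shape[2])
            chunk = data[z0:z1, y0:y1, x0:x1]
            # Shift the affine's origin so the chunk stays at the correct physical location.
            chunk_affine = affine.copy()
            chunk_affine[:3, 3] = affine[:3, :3] @ np.array([z0, y0, x0]) + affine[:3, 3]
            nib.save(nib.Nifti1Image(chunk, chunk_affine), f"chunk_{i}{j}{k}_0000.nii.gz")
```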

@iskenderkahramanoglu
Author

iskenderkahramanoglu commented Mar 22, 2024

You split the images into patches and then save them as nifti files. After predicting on all the patches, you aggregate the segmentation results into full-sized images.

I tried splitting the nifti file into 27 patches and running inference on them.
Inference time is under 1 second for each patch, but this also uses all 200 GB of RAM and the system crashes.
I have another question: if I convert the label files (and maybe the image files too) from float64 to uint8, will the inference time and RAM usage decrease?
Will this decrease training success?

@ancestor-mithril
Contributor

To reduce RAM usage you can decrease the number of processes used for preprocessing and segmentation export (see nnUNetv2_predict -h). You should use only 1 process for preprocessing and 1 for segmentation.
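
Roughly what that looks like via the Python API (a sketch; `predictor` is assumed to be an already initialized nnUNetPredictor, see the comment further below, and the paths are placeholders). On the command line, the corresponding nnUNetv2_predict options are -npp and -nps.

```python
# Limit preprocessing and segmentation export to one worker each to keep RAM usage down.
predictor.predict_from_files(
    "/path/to/input_niftis",            # placeholder input folder
    "/path/to/output_segmentations",    # placeholder output folder
    save_probabilities=False,
    overwrite=True,
    num_processes_preprocessing=1,          # fewer preprocessing workers -> less RAM
    num_processes_segmentation_export=1,    # fewer export workers -> less RAM
)
```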

@iskenderkahramanoglu
Author

To reduce RAM usage you can decrease the number of processes used for preprocessing and segmentation export (see nnUNetv2_predict -h). You should use only 1 process for preprocessing and 1 for segmentation.

Setting the number of processes to 1 reduced the RAM usage and the system did not crash.
But there are 27 patch files and the prediction waits for a while after each file.
So the total time is 11 minutes.
I also tried running inference on the full nifti file with the processes set to 1.
Again the RAM usage was reduced, but the time was 12 minutes.
This prediction time is not acceptable for me.

@x1y9

x1y9 commented Apr 1, 2024

I have the same issue: the export time is about 35 s and the GPU inference time is only about 8 s. After disabling TTA, the GPU inference time drops to 1 s, but the export time is still 35 s.

So the performance bottleneck is the exporting.

@mrokuss
Contributor

mrokuss commented Apr 23, 2024

Hey @iskenderkahramanoglu

It does indeed seem like you have issues with the segmentation export. If you increase the number of workers, the export is of course faster; on the other hand, you risk running out of RAM. Large 3D volumes are always tricky to work with. Regarding your issue with the overlapping patches, the nnUNetPredictor takes the following default arguments:

```python
nnUNetPredictor(tile_step_size: float = 0.5,
                use_gaussian: bool = True,
                use_mirroring: bool = True,
                perform_everything_on_device: bool = True,
                device: torch.device = torch.device('cuda'),
                verbose: bool = False,
                verbose_preprocessing: bool = False,
                allow_tqdm: bool = True)
```

Here you can set the tile_step_size to a value higher than 0.5 (but at most 1) to control how much overlap there is between the patches: a higher step size means less overlap and faster inference, while more overlap usually gives slightly better segmentations. If you set use_mirroring=False (disabling test time augmentation), your inference will be much faster, again at some cost in segmentation quality.
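
A minimal usage sketch along these lines (paths and fold selection are placeholders; the import path and method names follow the nnUNet v2 inference readme, so double-check them against your installed version):

```python
import torch
from nnunetv2.inference.predict_from_raw_data import nnUNetPredictor

# A larger tile_step_size means less overlap between sliding-window patches (faster, slightly less accurate);
# use_mirroring=False disables test time augmentation for an additional speedup.
predictor = nnUNetPredictor(
    tile_step_size=0.75,
    use_gaussian=True,
    use_mirroring=False,
    perform_everything_on_device=True,
    device=torch.device("cuda"),
    verbose=False,
    verbose_preprocessing=False,
    allow_tqdm=True,
)
predictor.initialize_from_trained_model_folder(
    "/path/to/nnUNet_results/DatasetXXX_Name/nnUNetTrainer__nnUNetPlans__3d_fullres",  # placeholder
    use_folds=(0,),
    checkpoint_name="checkpoint_final.pth",
)
predictor.predict_from_files(
    "/path/to/input_niftis",
    "/path/to/output_segmentations",
    save_probabilities=False,
    overwrite=True,
    num_processes_preprocessing=1,
    num_processes_segmentation_export=1,
)
```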

@mrokuss
Contributor

mrokuss commented May 28, 2024

Closing. Feel free to reopen if you still have questions!

@mrokuss mrokuss closed this as completed May 28, 2024
@YUjh0729

YUjh0729 commented Aug 4, 2024

I encountered a very strange issue when using the nnUNetv2_predict command. The program can't proceed and is unable to output the prediction results.
These are the results I got on the cloud server:
```
Predicting FLARE22_010:
perform_everything_on_device: True
  0%|          | 0/360 [00:00<?, ?it/s]resizing data, order is 3
data shape (1, 227, 512, 512)
 11%|██        | 38/360 [00:05<00:48, 6.65it/s]resizing segmentation, order is 1 order z is 0
data shape (1, 227, 512, 512)
100%|██████████| 360/360 [00:54<00:00, 6.60it/s]
sending off prediction to background worker for resampling and export
done with FLARE22_010

Predicting FLARE22_011:
perform_everything_on_device: True
 38%|████      | 23/60 [00:03<00:05, 6.61it/s]resizing data, order is 1
data shape (14, 250, 628, 628)
100%|██████████| 60/60 [00:08<00:00, 6.74it/s]
sending off prediction to background worker for resampling and export
done with FLARE22_011
resizing data, order is 1
data shape (14, 109, 430, 430)
```

[screenshot: Screenshot 2024-08-04 123235]


And these are the results I got locally. The output is similar, but there are these two additional lines of output. Both environments are identical: torch 2.0.1, CUDA 11.8, Python 3.10.

```
perform_everything_on_device: True
Prediction on device was unsuccessful, probably due to a lack of memory. Moving results arrays to CPU
```

[screenshot: WechatIMG844]

@mrokuss
Contributor

mrokuss commented Aug 4, 2024

Hey @YUjh0729

This is hard to judge from afar, but my first guess would be that your local GPU does not have sufficient VRAM and fails for that particular FLARE case. nnUNet always tries to perform as many operations as possible on the GPU (storing the whole image there instead of just the patches) in order to increase speed. If this fails, it falls back to using the GPU only for the individual patches and keeps the image on the CPU; this, however, takes longer, and this is also when you get that error message. You can set perform_everything_on_device=False in the Predictor to go with the second option immediately.

Best, Max

@YUjh0729

YUjh0729 commented Aug 4, 2024

Hi @mrokuss
Thank you very much for your response. I tried setting it to False, but essentially it is the same issue. The data stays on the CPU and, after processing, it cannot be exported to the output folder. The program gets stuck and cannot proceed further.
[screenshot: Snipaste_2024-08-04_20-26-46]

@pooya-mohammadi

@YUjh0729 check my implementation https://github.com/pooya-mohammadi/nnUNet
I also created a pull request #2545

@YUjh0729

@pooya-mohammadi
Hi, cool! It works. Thank you.
