Training process ends with message "process 1 terminated with signal SIGKILL" #1530
Unanswered
lukasvanderstricht
asked this question in
Q&A
-
When you run training, try not to use multi-GPU and see whether you can train on a single GPU.
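Besides the "multi-gpu" checkbox in Slicer, one way to try this is to expose only one GPU to the server process through the standard CUDA_VISIBLE_DEVICES environment variable. The following is a minimal sketch, not a confirmed fix: the helper name single_gpu_env and the constant SERVER_CMD are made up for illustration, the command is copied from the original post, and it is an assumption that MONAILabel then trains on the single visible device.

```python
import os
import subprocess


def single_gpu_env(gpu_index=0, base_env=None):
    """Return a copy of the environment that exposes only one GPU to CUDA.

    Hypothetical helper: CUDA_VISIBLE_DEVICES is a standard CUDA variable,
    but whether MONAILabel then trains on a single GPU is an assumption.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


# Command copied from the original post; adjust the paths to your setup.
SERVER_CMD = [
    "monailabel", "start_server",
    "--app", "radiology_psoas_azd",
    "--studies", "psoas-azd-images/train-images/",
    "--conf", "models", "deepedit",
]

if __name__ == "__main__":
    # Start the server with only GPU 0 visible to the process and its children.
    subprocess.run(SERVER_CMD, env=single_gpu_env(0), check=True)
```

If training then gets past the first epoch, the problem is probably tied to the multi-GPU path rather than to the data or the model.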
-
Thank you, @SachidanandAlle. I do indeed get a different error when I uncheck the "multi-gpu" box in the Developer mode of the MONAILabel Slicer module before starting the training. The output is the following:
I hope this helps with the debugging.
Kind regards,
Lukas
-
Hello everyone!
For the past few weeks I have been training a model using MONAILabel and 3D Slicer. I have been doing this on a Google Cloud Virtual Machine. For several reasons, I now want to migrate my work to a different VM. I managed to transfer all my data and my radiology app that contains all the models I have trained.
Now I want to continue training in this new location. When I start the server the way I did before, everything works fine. But once I press the Train button, a problem occurs: the training logs come to a halt before the first epoch even begins. After quite a long time, I get an error message stating "process 1 terminated with signal SIGKILL".
I think there must be something wrong with my installation or configuration, as everything works fine on the original VM and I'm using the same monailabel version with which the model was trained (0.4.1). The full logs can be found below.
monailabel start_server --app radiology_psoas_azd --studies psoas-azd-images/train-images/ --conf models deepedit
Using PYTHONPATH=/home/dellxpsazdelta:
Does anyone have an idea where it went wrong?
Kind regards
Lukas
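For readers wondering what the message itself means: it says that a worker process was stopped from outside with SIGKILL, a signal the process cannot catch, which is why no Python traceback appears in the logs. On Linux, one common sender is the kernel's out-of-memory killer, but nothing in this thread confirms that this is the cause here. The sketch below only illustrates how such an exit looks from the parent process; describe_exit and kill_demo are hypothetical names, and the code assumes a POSIX system.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a readable description.

    On POSIX, a negative return code -N means the child was ended by signal N.
    """
    if returncode < 0:
        return f"terminated with signal {signal.Signals(-returncode).name}"
    return f"exited with code {returncode}"


def kill_demo():
    """Start a sleeping child process, kill it, and return its return code."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    child.kill()  # on POSIX this sends SIGKILL, which cannot be handled
    child.wait()
    return child.returncode


if __name__ == "__main__":
    print(describe_exit(kill_demo()))
```

Since the child gets no chance to log anything, the kernel log on the new VM and its RAM and GPU memory compared with the original VM are reasonable places to look next.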