Hardware specifications for IMN model #23
Hello team @JasonForJoy,
Is there any document that lists the hardware requirements of the model, i.e. the minimum system specs needed to run it on the original UDC (~900k training dialogs) with the default training parameters (1,000,000 epochs and a batch_size of 128, with evaluation every 1,000 steps)?
I have tried running it on a Colab Pro account with a premium A100 GPU and the high-RAM runtime enabled, using only a reduced dataset (around 10,000 training dialogs), and it still took an extremely long time, around 15+ hours for just 4 epochs on this setting!
Am I missing something here that could help me speed up the training time?
Any prompt help would be sincerely appreciated.
Regards.
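A quick sanity check for this kind of slowdown (a minimal sketch, assuming the TensorFlow 1.x stack this repository targets) is to confirm that TensorFlow is actually placing work on the GPU rather than silently falling back to the CPU:

```python
import tensorflow as tf

# If these report no GPU device, training is running on the CPU,
# which would explain multi-hour epochs even on an A100 runtime.
print(tf.test.is_gpu_available())    # TF1-style availability check
print(tf.test.gpu_device_name())     # e.g. "/device:GPU:0", or "" if none

# log_device_placement makes the session print which device runs each op.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    a = tf.constant([1.0, 2.0])
    print(sess.run(a * 2.0))
```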
Comments
@JasonForJoy Thanks for your reply. Yes, I had already checked the .sh files, but the default configuration I was referring to is the one given in the train.py source file. In any case, I will use the one you pointed out. Secondly, could you let me know how long the model took to complete training and evaluation on the specs you mentioned?
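For context, defaults of this kind are commonly declared as TF1-style flags inside the training script and then overridden by the shell scripts; a minimal sketch with hypothetical flag names (the actual names and defaults in this repository may differ):

```python
import tensorflow as tf

# Hypothetical flag definitions in the spirit of a TF1 train.py;
# the shell scripts would override them from the command line.
tf.flags.DEFINE_integer("batch_size", 128, "Training batch size")
tf.flags.DEFINE_integer("num_epochs", 1000000, "Number of training epochs")
tf.flags.DEFINE_integer("evaluate_every", 1000, "Dev evaluation interval (steps)")
FLAGS = tf.flags.FLAGS

# A .sh script would then invoke, e.g.:
#   python train.py --batch_size 96 --num_epochs 10
```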
@KJGithub2021 It took about 90 hours (including evaluation on the dev set every 1,000 steps) under the default setting, i.e., 10 epochs with a batch_size of 96 on a single NVIDIA GeForce 1080 (12 GB) GPU card.
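For scale, a back-of-the-envelope step count (using the UDC v2 sizes quoted in the next comment):

```python
# Rough step count for the default setting; the dialog count is the
# UDC v2 figure quoted in the next comment.
train_dialogs = 957_101
batch_size = 96
epochs = 10
eval_every = 1_000

steps_per_epoch = -(-train_dialogs // batch_size)  # ceiling division -> 9,970
total_steps = steps_per_epoch * epochs             # ~99,700 steps
dev_evals = total_steps // eval_every              # ~99 dev evaluations
print(steps_per_epoch, total_steps, dev_evals)
```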
@JasonForJoy And is this time for the original UDC v2 dataset, which is composed of 957,101 training dialogs and 19,560 validation dialogs?
Okay, thank you for the information! I will let you know if I come across anything.
@JasonForJoy Can you kindly respond to this query and give some direction? I would really appreciate your help.
Okay... but what was the purpose of the code you wrote to save model checkpoints?
@JasonForJoy Understood. But how did you intend your code to resume training from a saved checkpoint otherwise?
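For reference, resuming in a TF1-style codebase usually amounts to restoring the latest checkpoint before entering the training loop; a minimal sketch with placeholder paths (not the repository's actual code):

```python
import tensorflow as tf

# Minimal TF1-style resume sketch; "checkpoints/" and "model" are
# placeholder names, not the repository's actual paths.
global_step = tf.Variable(0, trainable=False, name="global_step")
saver = tf.train.Saver(max_to_keep=5)

with tf.Session() as sess:
    ckpt = tf.train.latest_checkpoint("checkpoints/")
    if ckpt is not None:
        saver.restore(sess, ckpt)  # resume from the saved weights
    else:
        sess.run(tf.global_variables_initializer())  # fresh start

    # Inside the training loop, checkpoints would be written with:
    # saver.save(sess, "checkpoints/model", global_step=global_step)
```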
@JasonForJoy Can you please confirm whether you used a batch_size of 96 for the test dataset as well, or 128?
Hi @JasonForJoy, can you please confirm whether reducing the batch size (due to only having a low-end GPU available) can affect the reported performance of the model?
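A smaller batch can shift results through the batch-size/learning-rate interaction; one common workaround (not discussed in this thread) is gradient accumulation, which preserves the effective batch size within a smaller memory budget. A TF1-style sketch with an illustrative toy loss (accum_steps and the optimizer choice are assumptions, not the repository's code):

```python
import tensorflow as tf

# Gradient-accumulation sketch (TF1 style). The loss below is a toy
# stand-in for the model's real loss.
accum_steps = 4  # e.g. 4 micro-batches of 24 ~ effective batch size 96

x = tf.placeholder(tf.float32, [None, 8])
w = tf.Variable(tf.zeros([8, 1]))
loss = tf.reduce_mean(tf.square(tf.matmul(x, w) - 1.0))

opt = tf.train.AdamOptimizer(1e-3)
tvars = tf.trainable_variables()
grads_and_vars = opt.compute_gradients(loss, tvars)

# One non-trainable buffer per variable accumulates gradients.
accum = [tf.Variable(tf.zeros_like(v.initialized_value()), trainable=False)
         for v in tvars]
zero_ops = [a.assign(tf.zeros_like(a)) for a in accum]
accum_ops = [a.assign_add(g) for a, (g, _) in zip(accum, grads_and_vars)]
apply_op = opt.apply_gradients(
    [(a / accum_steps, v) for a, v in zip(accum, tvars)])

# Per effective batch: run zero_ops once, then accum_ops on each of the
# accum_steps micro-batches, then apply_op once.
```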