Get Cosmoflow running on Rivanna via a prebuilt singularity image #1
Comments
Instructions on how to do it: https://www.rc.virginia.edu/userinfo/howtos/rivanna/docker-images-on-rivanna/
As discussed, it is now mostly a matter of reading and understanding the documentation. The second part is in the section "Running Image Non-Interactively as Slurm jobs". The example script there begins with #!/usr/bin/env bash, module purge, and containerdir=~; the .sif image in it needs to be replaced.
Use the sbatch parameters I distributed previously. Nate uses them successfully.
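A minimal sketch of what such a non-interactive job script could look like, assuming the pattern from the Rivanna how-to linked above. The sbatch values, the allocation name, the file name cosmoflow-singularity.slurm, and the image name cosmoflow.sif are all placeholders, not the parameters distributed earlier; substitute the real ones. The sketch only writes the job script to a file and syntax-checks it; it does not submit anything.

```shell
#!/usr/bin/env bash
# Sketch: write a Slurm job script to disk and syntax-check it (no submission).
set -euo pipefail

job_script="cosmoflow-singularity.slurm"   # hypothetical file name

# Quoted 'EOF' keeps the body literal, so nothing is expanded at write time.
cat > "$job_script" <<'EOF'
#!/usr/bin/env bash
#SBATCH --job-name=cosmoflow        # placeholder
#SBATCH --output=cosmoflow-%j.out   # %j expands to the Slurm job ID
#SBATCH --partition=gpu             # placeholder partition
#SBATCH --gres=gpu:1                # placeholder GPU request
#SBATCH --time=00:30:00             # placeholder wall time
#SBATCH --account=YOUR_ALLOCATION   # placeholder allocation

module purge
module load singularity

containerdir=~
# cosmoflow.sif is a hypothetical image name; replace it with the actual .sif.
# --nv exposes the host NVIDIA driver and GPUs inside the container.
singularity exec --nv "$containerdir/cosmoflow.sif" python train.py
EOF

# Parse the generated script without executing it.
bash -n "$job_script"
echo "wrote $job_script"
```

Once the placeholders are filled in, the generated file would be submitted on Rivanna with sbatch cosmoflow-singularity.slurm.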
Create a completely new README-singularity.md that does this. You do not have to modify main/README.md for now. Focus on Singularity. You can copy the portions from main on how to use ssh and log into Rivanna, as well as the git code management. Remember you have two repos: dsc-spidal/mlcommons-cosmoflow and mlcommons/hpc
Install Docker on the local machine and test out docker pull on Steve's image
Use the image to run the train.py script on a small dataset
Once it runs properly on a small dataset, get it working on the larger dataset
Email Chain:
Thank you for pointing this out. This dockerfile is the most recent:
https://github.com/mlcommons/hpc/blob/main/cosmoflow/builds/Dockerfile
The other dockerfile was for running on Cori CPU, and yeah it's a bit old and should be removed.
I also think we may want to add a requirements.txt file to the code in general, so that it is used within the images or natively.
I like that idea. At this point I should tell you that we may swap this tensorflow implementation out for a pytorch one. It is not finalized yet but I aim to have it ready and validated before we freeze mlperf hpc v3 in June. I can similarly try to prepare that one with requirements.txt + dockerfile.
Please note that we only have Singularity on the machine, not Docker.
You can use dockerfiles and/or docker images, though, right?
We run shifter at NERSC and just convert the docker images into shifter images.
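On the Docker question above: Singularity can pull an image from a Docker registry through a docker:// URI and convert it into a .sif file, much like the Shifter conversion described for NERSC. A minimal sketch follows, assuming a hypothetical image reference (EXAMPLE_USER/cosmoflow:latest) and output name (cosmoflow.sif), since the thread does not name the actual image. It defaults to a dry run that only prints the command; set DRY_RUN=0 on a machine with Singularity to run the pull.

```shell
#!/usr/bin/env bash
# Sketch: convert a Docker image to a Singularity .sif file (dry run by default).
set -euo pipefail

# Hypothetical placeholders; replace with the real image reference and file name.
docker_image="${DOCKER_IMAGE:-docker://EXAMPLE_USER/cosmoflow:latest}"
sif_file="${SIF_FILE:-cosmoflow.sif}"

# singularity pull [output file] <URI> downloads the image and writes a .sif.
pull_cmd=(singularity pull "$sif_file" "$docker_image")

if [ "${DRY_RUN:-1}" = "0" ] && command -v singularity >/dev/null 2>&1; then
    "${pull_cmd[@]}"
else
    # Either dry-run mode or Singularity is not installed: just show the command.
    echo "dry run: ${pull_cmd[*]}"
fi
```

The resulting .sif file is what the containerdir path in the Slurm job script would point to.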