NERSC-specific installation documentation #122
Comments
What is the "ceci path issue" referring to? Is it the issue that there is an old copy of ceci somewhere in the base Python path on NERSC? I usually get around that by creating a custom conda env from scratch, which seems to resolve it.

For NERSC-specific instructions on creating a custom conda environment: getting mpi4py and HDF5 writing set up correctly at NERSC can be a bit of a pain. This may already be included somewhere in the RAIL docs, but in case it's not, there's a NERSC page addressing this: https://docs.nersc.gov/development/languages/python/parallel-python/

I've gotten things working by installing both mpi4py and h5py from source with the following procedure (I'm going to copy/paste from a Slack message I sent to Josue a while back):

Following the directions on that Parallel Python page, I could not get the pre-built conda environments nersc-mpi4py or nersc-h5py to work correctly; either mpi4py or h5py would have problems. The solution that worked for me was to install both mpi4py and h5py myself in a new conda environment, following the instructions for that on the NERSC webpage. Here's the rough procedure for how I put together an environment to run rail_tpz in parallel two weeks ago:
We could probably set up a conda environment with steps 1-8 somewhere that users can clone to make things easier. This should work with the pre-built
I will try this on NERSC, and if it works out we can put this into the documentation and close
Coincidentally, I just went through the above set of instructions again today to set up a fresh environment to re-train a rail_tpz model, and things worked fine: I submitted a job to the debug queue using 5 processors and everything worked as intended. Oh, and I missed the (hopefully obvious) step 3.5 in the above instructions:
I can follow Sam's guide to run the tpz notebook on NERSC; we should include this in the installation documentation (of rail_tpz?).
I think this is more general than rail_tpz; I follow the same procedure if I want to run rail_flexzboost in parallel at NERSC, for example. Not sure where the best place for this would be.
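Since the same environment gets reused for several RAIL packages, it may help to sanity-check it before submitting a job. The following is a minimal sketch, not part of the procedure described in this thread: the function name check_parallel_stack is hypothetical, and the only library call it relies on is h5py.get_config().mpi, which reports whether h5py was built against parallel HDF5. It only checks that the packages import; it does not launch any MPI ranks.

```python
import importlib


def check_parallel_stack():
    """Report whether mpi4py and an MPI-enabled h5py are importable.

    Hypothetical helper for illustration; returns a dict of booleans.
    """
    report = {"mpi4py": False, "h5py": False, "h5py_mpi": False}

    try:
        # Importing the top-level package does not initialize MPI.
        importlib.import_module("mpi4py")
        report["mpi4py"] = True
    except ImportError:
        pass

    try:
        h5py = importlib.import_module("h5py")
        report["h5py"] = True
        # True only when h5py was compiled against a parallel HDF5 build.
        report["h5py_mpi"] = bool(h5py.get_config().mpi)
    except ImportError:
        pass

    return report


if __name__ == "__main__":
    for name, ok in check_parallel_stack().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

If h5py_mpi comes back as missing while h5py itself imports, the environment most likely picked up a serial h5py build, which would be consistent with the problems described above for the pre-built environments.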
Notes from the meeting:
There are two items from #51 that will not necessarily be addressed for v1, but we may still want to include:
ceci path issue