Example run: No such file or directory: 'Refine3D/job001/run_it001_half1_class001_unfil.mrc' #13

Open
wlugmayr opened this issue May 15, 2024 · 6 comments


@wlugmayr

Here is my command line:

srun --mpi=pmi2 `which relion_refine_mpi` --o Refine3D/job001/run --auto_refine --split_random_halves --i job025_tutorial.star --ref HA_reference.mrc --firstiter_cc --ini_high 10 --dont_combine_weights_via_disc --pool 3 --pad 2 --ctf --particle_diameter 170 --flatten_solvent --zero_mask --solvent_mask mask.mrc --oversampling 1 --healpix_order 2 --auto_local_healpix_order 3 --offset_range 5 --offset_step 2 --sym C3 --low_resol_join_halves 40 --norm --scale --j 1 --gpu "" --external_reconstruct --keep_lowres --pipeline_control Refine3D/job001/

Here are parts of run.out:

Expectation iteration 1
7.45/7.43 min ............................................................~~(,_,">
Averaging half-reconstructions up to 40 Angstrom resolution to prevent diverging orientations ...
Note that only for higher resolutions the FSC-values are according to the gold-standard!
Calculating gold-standard FSC ...
Maximization ...

  • Making system call for external reconstruction: /gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/bin/python /gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/lib/python3.10/site-packages/spIsoNet/bin/relion_wrapper.py Refine3D/job001/run_it001_half1_class001_external_reconstruct.star
    iter = 001
    set CUDA_VISIBLE_DEVICES=0
    set CONDA_ENV=spisonet-1.0.0
    set ISONET_WHITENING=True
    set ISONET_WHITENING_LOW=10
    set ISONET_RETRAIN_EACH_ITER=True
    set ISONET_BETA=0.5
    set ISONET_ALPHA=1
    set ISONET_START_HEALPIX=3
    set ISONET_ACC_BATCHES=2
    set ISONET_EPOCHS=5
    set ISONET_KEEP_LOWRES=False
    set ISONET_LOWPASS=True
    set ISONET_ANGULAR_WHITEN=False
    set ISONET_3DFSD=False
    set ISONET_FSC_05=False
    set ISONET_FSC_WEIGHTING=True
    set ISONET_START_RESOLUTION=15.0
    set ISONET_KEEP_LOWRES= True
    healpix = 2
    symmetry = C3
    mask_file = mask.mrc
    pixel size = 1.309998
    resolution at 0.5 and 0.143 are 999.0 and 999.0
    real limit resolution to 10.0

RELION version: 4.0.1
exiting with an error ...

And here is run.err:

The following warnings were encountered upon command-line parsing:
WARNING: Option --keep_lowres is not a valid RELION argument
Traceback (most recent call last):
  File "/gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/lib/python3.10/site-packages/spIsoNet/bin/relion_wrapper.py", line 362, in <module>
    shutil.copy(mrc_unfil, mrc_unfil_backup)
  File "/gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/lib/python3.10/shutil.py", line 417, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/lib/python3.10/shutil.py", line 254, in copyfile
    with open(src, 'rb') as fsrc:
FileNotFoundError: [Errno 2] No such file or directory: 'Refine3D/job001/run_it001_half1_class001_unfil.mrc'
in: /gpfs/cssb/software/tmp/install/relion-4.0.1/src/backprojector.cpp, line 1294
ERROR:
ERROR: there was something wrong with system call: /gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/bin/python /gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/lib/python3.10/site-packages/spIsoNet/bin/relion_wrapper.py Refine3D/job001/run_it001_half1_class001_external_reconstruct.star
=== Backtrace ===
/gpfs/cssb/software/rhel9/x86_64/relion/4.0.1/bin/relion_refine_mpi(_ZN11RelionErrorC2ERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES7_l+0x69) [0x4c7eb9]
/gpfs/cssb/software/rhel9/x86_64/relion/4.0.1/bin/relion_refine_mpi() [0x44f710]
/gpfs/cssb/software/rhel9/x86_64/relion/4.0.1/bin/relion_refine_mpi(_ZN14MlOptimiserMpi12maximizationEv+0x17dc) [0x4ffb4c]
/gpfs/cssb/software/rhel9/x86_64/relion/4.0.1/bin/relion_refine_mpi(_ZN14MlOptimiserMpi7iterateEv+0x482) [0x500b52]
/gpfs/cssb/software/rhel9/x86_64/relion/4.0.1/bin/relion_refine_mpi(main+0x59) [0x4b6a49]
/lib64/libc.so.6(+0x3feb0) [0x14a1bf43feb0]
/lib64/libc.so.6(__libc_start_main+0x80) [0x14a1bf43ff60]
/gpfs/cssb/software/rhel9/x86_64/relion/4.0.1/bin/relion_refine_mpi(_start+0x25) [0x4b9ba5]

ERROR:
ERROR: there was something wrong with system call: /gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/bin/python /gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/lib/python3.10/site-packages/spIsoNet/bin/relion_wrapper.py Refine3D/job001/run_it001_half1_class001_external_reconstruct.star

$ find Refine3D

Refine3D
Refine3D/job001
Refine3D/job001/run_it000_half2_class001_angdist.bild
Refine3D/job001/run.err
Refine3D/job001/default_pipeline.star
Refine3D/job001/run_it001_half2_class001_external_reconstruct.star
Refine3D/job001/run_it000_sampling.star
Refine3D/job001/run_it001_half1_class001_external_reconstruct_data_real.mrc
Refine3D/job001/run_it001_half2_class001_external_reconstruct_data_real.mrc
Refine3D/job001/run.out
Refine3D/job001/run_it001_half2_class001_external_reconstruct_weight.mrc
Refine3D/job001/run_it000_half1_model.star
Refine3D/job001/run_it000_optimiser.star
Refine3D/job001/run_it000_half1_class001.mrc
Refine3D/job001/run_it001_half1_class001_external_reconstruct.star
Refine3D/job001/run_it000_half2_class001.mrc
Refine3D/job001/.run.err.tail
Refine3D/job001/.run.out.tail
Refine3D/job001/run_submit.script
Refine3D/job001/job_pipeline.star
Refine3D/job001/job.star
Refine3D/job001/run_it000_half2_model.star
Refine3D/job001/run_it001_half1_class001_external_reconstruct_data_imag.mrc
Refine3D/job001/run_it000_data.star
Refine3D/job001/run_it001_half2_class001_external_reconstruct_data_imag.mrc
Refine3D/job001/run_it001_half1_class001_external_reconstruct_weight.mrc
Refine3D/job001/note.txt
Refine3D/job001/run_it000_half1_class001_angdist.bild
Refine3D/job001/RELION_JOB_EXIT_FAILURE
Refine3D/job001/run_it001_half1_class001_external_reconstruct.mrc
Refine3D/job001/run_it001_half2_class001_external_reconstruct.mrc
Refine3D/spisonet
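
The listing above contains no `*_unfil.mrc` file, which matches the traceback. Below is a minimal Python sketch of a pre-flight check for that situation. The helper names are hypothetical, and the naming rule (replace `_external_reconstruct.star` with `_unfil.mrc`) is only inferred from the paths in the error message, not taken from the spIsoNet source.

```python
from pathlib import Path

STAR_SUFFIX = "_external_reconstruct.star"


def expected_unfil_path(star_path: str) -> str:
    """Derive the unfiltered half-map name from the external-reconstruct STAR path.

    Assumption: the naming rule is inferred from the traceback only.
    """
    if not star_path.endswith(STAR_SUFFIX):
        raise ValueError(f"unexpected STAR file name: {star_path}")
    return star_path[: -len(STAR_SUFFIX)] + "_unfil.mrc"


def missing_unfil_maps(job_dir: str) -> list[str]:
    """Return the expected *_unfil.mrc paths that do not exist in job_dir."""
    missing = []
    for star in sorted(Path(job_dir).glob("*" + STAR_SUFFIX)):
        unfil = Path(expected_unfil_path(str(star)))
        if not unfil.exists():
            missing.append(str(unfil))
    return missing


if __name__ == "__main__":
    for path in missing_unfil_maps("Refine3D/job001"):
        print("missing:", path)
```

Run from the project directory, this would list the half-map files the wrapper is likely to look for but that RELION has not written.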

@procyontao
Collaborator

Hi

I wonder whether this error still appears when you add --solvent_correct_fsc to the command.

@wlugmayr
Author

Hi,

Yes, now it gets as far as iteration 5.

Command line:

srun --mpi=pmi2 `which relion_refine_mpi` --o Refine3D/job001/run --auto_refine --split_random_halves --i job025_tutorial.star --ref HA_reference.mrc --firstiter_cc --ini_high 10 --dont_combine_weights_via_disc --pool 3 --pad 2 --ctf --particle_diameter 170 --flatten_solvent --zero_mask --solvent_mask mask.mrc --oversampling 1 --healpix_order 2 --auto_local_healpix_order 3 --offset_range 5 --offset_step 2 --sym C3 --low_resol_join_halves 40 --norm --scale --j 1 --gpu "" --external_reconstruct --keep_lowres --solvent_correct_fsc --pipeline_control Refine3D/job001/

Possibly new error messages:

  File "/gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/lib/python3.10/site-packages/torch/autograd/graph.py", line 744, in _engine_run_backward
    return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [64]] is at version 4; expected version 3 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

FileNotFoundError: [Errno 2] No such file or directory: 'Refine3D/job001/corrected_run_it005_half1_class001_unfil.mrc'
in: /gpfs/cssb/software/tmp/install/relion-4.0.1/src/backprojector.cpp, line 1294

I installed torch like this:
pip install torch --index-url https://download.pytorch.org/whl/cu118

logfiles.zip

@procyontao
Collaborator

Hi,

I have also experienced this problem. It happens because the data have to pass through the same network more than once. I do not know an exact solution yet. My experience so far (which may not be correct) is the following:

  1. This can happen when spIsoNet uses a single GPU.
  2. It is also related to the torch version and the graphics card.

@wlugmayr
Author

Yes, with multiple GPUs it is working now.
At the beginning I did not specify CUDA_VISIBLE_DEVICES and got an error, so I set it to:
CUDA_VISIBLE_DEVICES=0
But when I now set it, e.g. on a 4-GPU node, to
CUDA_VISIBLE_DEVICES=0 1 2 3
the program runs to the end without error in both Relion4 and Relion5. For both I used the full path to python to avoid clashes with the Relion5 conda dependencies. In environment-modules style:

setenv RELION_EXTERNAL_RECONSTRUCT_EXECUTABLE {/gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/bin/python /gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/lib/python3.10/site-packages/spIsoNet/bin/relion_wrapper.py}
setenv CONDA_ENV spisonet-1.0.0
setenv CUDA_VISIBLE_DEVICES {0 1 2 3}
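
As a rough pre-flight check, the sketch below counts the devices listed in CUDA_VISIBLE_DEVICES and warns when fewer than two are listed, since the single-GPU case triggered the error in this thread. It accepts both the space-separated form used above and the comma-separated form that the CUDA runtime documents. The function name is hypothetical, and how spIsoNet itself parses the variable is not confirmed here.

```python
import os
import re


def visible_gpu_ids(value: str) -> list[str]:
    """Split a CUDA_VISIBLE_DEVICES-style string on commas and/or whitespace."""
    return [tok for tok in re.split(r"[,\s]+", value.strip()) if tok]


if __name__ == "__main__":
    ids = visible_gpu_ids(os.environ.get("CUDA_VISIBLE_DEVICES", ""))
    if len(ids) < 2:
        # Assumption from this thread only: single-GPU runs hit the in-place error.
        print(f"warning: {len(ids)} GPU(s) listed; this thread reports errors with a single GPU")
    else:
        print(f"{len(ids)} GPUs listed: {' '.join(ids)}")
```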

Why do you write in your documentation that spIsoNet does not work with Relion5? Is the output mrc wrong?

@procyontao
Collaborator

If you can run it through relion5, that is great. The documentation says spIsoNet does not work with relion5 because of clashes with the conda environment or Blush. It would be great if you could share the details of which environment needs to be set for relion5. Does one need to deactivate conda for relion5 and use spisonet's instead?

@wlugmayr
Author

Well, the solution is quite simple:

  • First install Relion5 as described including the (one) conda environment containing blush, modelangelo, ...
  • Then install spIsoNet in a second/different conda environment (e.g. conda create -n spisonet-1.0.0 -y python=3.10, ...)
  • Now set and load the Relion5 software environment incl. its conda

The trick is to provide the full path to the python executable to spIsoNet. Here are some tests:

$ which python
/gpfs/cssb/software/rhel9/anaconda3/envs/relionconda-5.0.1/bin/python
$ /gpfs/cssb/software/rhel9/anaconda3/envs/relionconda-5.0.1/bin/python -m pip list | grep blush
relion-blush 0.0.1
$ /gpfs/cssb/software/rhel9/anaconda3/envs/relionconda-5.0.1/bin/python -m pip list | grep spisonet

$ /gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/bin/python -m pip list | grep blush
$ /gpfs/cssb/software/rhel9/anaconda3/envs/spisonet-1.0.0/bin/python -m pip list | grep spisonet
spIsoNet 1.0

The dedicated python executable knows its packages so there should be no clashes between different conda environments. For the spIsoNet wrapper you do not have to activate the spIsoNet conda.
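
The same check can be scripted. The sketch below asks a specific interpreter whether it can locate a module, without activating any conda environment. The function name is hypothetical; `spIsoNet` as an import name is inferred from the site-packages path in the traceback, and the import name of the Blush package is not given in this thread, so it is left out.

```python
import subprocess
import sys

# One-liner run inside the target interpreter: exit 0 if the module is found.
_PROBE = (
    "import importlib.util, sys; "
    "sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)"
)


def interpreter_has_module(python_exe: str, module: str) -> bool:
    """Return True if the interpreter at python_exe can locate `module`."""
    result = subprocess.run([python_exe, "-c", _PROBE, module], capture_output=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Probes the interpreter running this script; pass the full path of the
    # spIsoNet environment's python to probe that environment instead.
    print(interpreter_has_module(sys.executable, "spIsoNet"))
```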

So instead of setting the following (which ends up using the Relion5 python):
export RELION_EXTERNAL_RECONSTRUCT_EXECUTABLE='python /fullpath_to_spisonet_wrapper/relion_wrapper.py'
you set:
export RELION_EXTERNAL_RECONSTRUCT_EXECUTABLE=' /fullpath_to_spisonet_python/python /fullpath_to_spisonet_wrapper/relion_wrapper.py'
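
A small sketch of composing that value programmatically, rejecting a bare `python` so the interpreter is never resolved through PATH. The function name is hypothetical and the paths are the same placeholders as above.

```python
import os
import shlex


def external_reconstruct_command(python_exe: str, wrapper: str) -> str:
    """Build the value for RELION_EXTERNAL_RECONSTRUCT_EXECUTABLE.

    Both arguments must be absolute paths, so that the interpreter is not
    looked up on PATH (where the Relion5 conda python would be found first).
    """
    for path in (python_exe, wrapper):
        if not os.path.isabs(path):
            raise ValueError(f"expected an absolute path, got: {path!r}")
    return f"{python_exe} {wrapper}"


if __name__ == "__main__":
    value = external_reconstruct_command(
        "/fullpath_to_spisonet_python/python",
        "/fullpath_to_spisonet_wrapper/relion_wrapper.py",
    )
    # shlex.quote wraps the value in single quotes because it contains a space.
    print(f"export RELION_EXTERNAL_RECONSTRUCT_EXECUTABLE={shlex.quote(value)}")
```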

In the Relion GUI I set Reference -> Use Blush regularisation? -> No, and the job runs technically to the end, generating an mrc output file.
