
Problem while trying to run the short example of AbacusHOD #144

Open
MinaEnceladus opened this issue Jul 25, 2024 · 5 comments
Labels
HOD for abacusnbody.hod

Comments

@MinaEnceladus

Hi,

I'm running AbacusHOD through the new BinderHub.

First, I tried to run the first part of the process, prepare_sim, for z = 0.500.

The first time, it took a few hours to reach slab number 33, producing two output files:
halos_xcom_32_seed600_abacushod_oldfenv_new.h5
particles_xcom_32_seed600_abacushod_oldfenv_new.h5

The next time, it reached slab 31 and produced:
halos_xcom_30_seed600_abacushod_oldfenv_new.h5
particles_xcom_30_seed600_abacushod_oldfenv_new.h5

I also repeated this for z = 0.200 and 0.100.

Now, when I run the short example, I receive this error:

FileNotFoundError: [Errno 2] Unable to synchronously open file (unable to open file: name = '.../output/subsamples/AbacusSummit_base_c000_ph000/z0.100/halos_xcom_0_seed600_abacushod_oldfenv_new.h5', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

Also, it creates empty folders in the output directory for galaxies.
.../output/galaxies/AbacusSummit_base_c000_ph000/z0.500

@lgarrison
Member

Does the halos_xcom_0_seed600_abacushod_oldfenv_new.h5 file exist somewhere? @SandyYuan can confirm, but I think that file should be produced by prepare_sim. There should probably be files named halos_xcom_0_... through halos_xcom_33_.... If not, it may mean that prepare_sim didn't run correctly or ran out of memory.
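A quick way to check this is to list which slab indices are missing. Here is a minimal sketch, assuming the file-name pattern from the traceback in this issue and 34 slabs (0 through 33); `missing_slabs` is a hypothetical helper, not part of abacusnbody:

```python
from pathlib import Path


def missing_slabs(zdir, nslabs=34, seed=600, tag="abacushod_oldfenv_new"):
    """Return slab indices whose halos/particles subsample files are absent.

    Assumes the naming seen in this issue, e.g.
    halos_xcom_<i>_seed600_abacushod_oldfenv_new.h5, and slabs 0..nslabs-1
    (34 slabs for the base boxes discussed here).
    """
    zdir = Path(zdir)
    missing = []
    for i in range(nslabs):
        names = (
            f"halos_xcom_{i}_seed{seed}_{tag}.h5",
            f"particles_xcom_{i}_seed{seed}_{tag}.h5",
        )
        # A slab counts as done only if both of its files exist.
        if not all((zdir / name).is_file() for name in names):
            missing.append(i)
    return missing
```

Pointing it at your own `subsamples/<sim_name>/z<redshift>` directory should return an empty list only if prepare_sim wrote every pair of files.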

lgarrison added the HOD for abacusnbody.hod label Jul 25, 2024
@epaillas
Contributor

epaillas commented Jul 25, 2024

If it is of any help, I had similar issues when trying to run prepare_sim for z = 0.5 periodic boxes a few weeks ago. The problem was that the script was configured to load 3 slabs in parallel, which required too much memory, so it would not correctly generate the output files (as Lehman says, it should generate 34 files, halos_xcom_i_... with i running from 0 to 33).

I set

prepare_sim:
    Nparallel_load: 2

in the YAML configuration file, and this brought the memory consumption down to something manageable on NERSC and solved the problem. I'm not sure what number will be adequate for the cluster you're using.

(I'm having similar issues with the lightcone mocks, as we are discussing in the other thread, but in that case even Nparallel_load: 1 won't do the trick. For periodic boxes, however, I found that tweaking this parameter was enough.)
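If you know roughly how much memory one slab needs, you could pick the value up front. This is a rough sketch, assuming (not taken from prepare_sim itself) that peak memory grows about linearly with Nparallel_load; `max_parallel_loads` is hypothetical, and `per_slab_gb` is something you would have to measure yourself, e.g. by monitoring a run with Nparallel_load: 1:

```python
import math


def max_parallel_loads(mem_budget_gb, per_slab_gb, upper=3):
    """Largest Nparallel_load in [0, upper] whose estimated peak fits the budget.

    Hypothetical linear model: peak ~= Nparallel_load * per_slab_gb.
    `upper` defaults to 3, the value this thread says was configured.
    A result of 0 means even one slab is estimated not to fit.
    """
    if mem_budget_gb <= 0 or per_slab_gb <= 0:
        raise ValueError("memory values must be positive")
    n = math.floor(mem_budget_gb / per_slab_gb)
    return min(upper, n)
```

A return value of 0 is the point where a smaller simulation is likely the better option.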

@MinaEnceladus
Author

Thanks, @lgarrison and @epaillas.

You're right; it appears that the system ran out of memory.
I don't have access to NERSC or any other cluster. I used Binder, and I only have 128 GB of memory.

I also tried z = 0.100, once with Nparallel_load: 2 and once with Nparallel_load: 1.
In the second attempt, after more than 4 hours, only a few files were produced (0, 9, 18, and 27).

@lgarrison
Member

I wonder if there could be a CPU problem, too. Binder is a bit strange in that it looks to applications as if they have 96 cores, but really they're sharing 4 (cgroups). You might want to set nthreads = 4 here:

np.floor(multiprocessing.cpu_count() / config['prepare_sim']['Nparallel_load'])
(We should make this a parameter; I'll open an issue.)
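Rather than hard-coding 4, one could derive the thread count from the container's CPU quota. A hedged sketch, assuming a cgroup v2 container that exposes its quota in /sys/fs/cgroup/cpu.max (whether Binder does is an assumption); `effective_cpus` and `threads_per_load` are hypothetical helpers that mirror the floor(cpu_count / Nparallel_load) expression quoted here:

```python
import math
import os
from pathlib import Path


def effective_cpus(cpu_max_text=None):
    """Estimate usable CPUs, honouring a cgroup v2 CPU quota if one is set.

    cpu_max_text is the content of /sys/fs/cgroup/cpu.max, e.g.
    "400000 100000" (quota and period in microseconds, i.e. 4 CPUs) or
    "max 100000" (no quota). If None, the file is read when present.
    cgroup v1 (cpu.cfs_quota_us / cpu.cfs_period_us) is not handled.
    """
    ncpu = os.cpu_count() or 1
    if cpu_max_text is None:
        path = Path("/sys/fs/cgroup/cpu.max")
        if not path.is_file():
            return ncpu
        cpu_max_text = path.read_text()
    quota, period = cpu_max_text.split()[:2]
    if quota == "max":
        return ncpu
    # Fractional quotas round down, but never below one CPU.
    return max(1, min(ncpu, math.floor(int(quota) / int(period))))


def threads_per_load(nparallel_load, ncpus=None):
    """Threads per parallel slab load: floor(ncpus / Nparallel_load), at least 1."""
    ncpus = effective_cpus() if ncpus is None else ncpus
    return max(1, ncpus // nparallel_load)
```

Setting nthreads = 4 by hand is simpler; a helper like this only avoids hard-coding the number.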

If memory is the problem, though, then this might not help. The base simulations are big, unfortunately! You might want to try a smaller simulation if your application allows. hugebase is often a good place to start, because it's the same volume but lower mass resolution.
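For a sense of scale, here is a back-of-the-envelope comparison. The particle counts are quoted from memory of the AbacusSummit specifications (base: 6912^3 particles, hugebase: 2304^3, both in 2000 Mpc/h boxes), so please double-check them; treating memory use as proportional to particle count is also only an approximation:

```python
# Particles per side, as recalled from the AbacusSummit specs (verify before use).
# Both box types are assumed to be 2000 Mpc/h on a side.
BOX_MPC_H = 2000.0
NPART_SIDE = {"base": 6912, "hugebase": 2304}


def total_particles(kind):
    """Total number of N-body particles, N_side**3."""
    return NPART_SIDE[kind] ** 3


def particle_count_ratio(big="base", small="hugebase"):
    """How many times more particles `big` has than `small`.

    At fixed volume and cosmology the particle mass scales by the same
    factor, i.e. hugebase particles are this many times heavier.
    """
    return total_particles(big) / total_particles(small)


for kind in NPART_SIDE:
    print(f"{kind:9s} {total_particles(kind):.3e} particles, {BOX_MPC_H:.0f} Mpc/h box")
print("ratio:", particle_count_ratio())
```

Under these numbers, hugebase has 27 times fewer particles than base in the same volume, which is consistent with it being much lighter to process.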

@MinaEnceladus
Author

Thanks @lgarrison.
I've managed to successfully run "prepare_sim" and the Short Example.

sim_name: 'AbacusSummit_hugebase_c000_ph000'
z_mock: 0.100
Nparallel_load: 1
