Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSMF HOD #149

Open
wants to merge 13 commits into
base: main
Choose a base branch
from
Open

CSMF HOD #149

wants to merge 13 commits into from

Conversation

pburger112
Copy link
Contributor

I updated the GRAND_HOD.py and the abacus_hod.py files such that the user can predict an HOD based on the conditional stellar mass function described in https://arxiv.org/pdf/0807.4932. The code also predicts the stellar mass for each galaxy. For the protection of the stellar mass, I implemented two functions. The first creates only stellar mass from a range of possible values, and the second (linear interpolation) can predict arbitrary stellar masses. However, the latter sometimes creates a segmentation fault. I have not figured out why that is the case. Therefore, the current version uses the simpler version.

This involved implementing the equations in https://arxiv.org/pdf/0807.4932 and updating most of the functions in GRAND_HOD.py. I have kept the same code structure and just added a few lines.

I am using the current implementation for my paper, which I plan to publish soon. If you are interested in seeing it, I am happy to share a preliminary version.
Furthermore, please let me know if you want to merge it to the main branch and if I need to make any adjustments.

pburger112 and others added 4 commits September 20, 2024 08:40
Add line of code that are able to compute CSMF HOD
back to origin
Addition of a conditional stellar mass function HOD
@lgarrison lgarrison added the HOD for abacusnbody.hod label Sep 21, 2024
Copy link
Member

@lgarrison lgarrison left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I think it would be nice to have user-contributed models in AbacusHOD. From a high level, though, it looks like the CSMF is implemented as a galaxy "type" in GRAND_HOD.py, like LRG, ELG, etc, in such a way that it can have multi-tracer exclusion with these other types. That's what this if/else chain is about:

if randoms[i] <= LRG_marker:
    Nout[tid, 0, 0] += 1  # counting
    keep[i] = 1
elif randoms[i] <= ELG_marker:
    Nout[tid, 1, 0] += 1  # counting
    keep[i] = 2
elif randoms[i] <= QSO_marker:
    Nout[tid, 2, 0] += 1  # counting
    keep[i] = 3
elif randoms[i] <= CSMF_marker:
    Nout[tid, 3, 0] += 1  # counting
    keep[i] = 4

Is it really the spirit of the model that a CSMF galaxy is a type, like an LRG, ELG, etc?

To put it another way, should CSMF be implemented as a part of GRAND_HOD, or a separate model beside it? GRAND HOD refers the model from Sandy's paper, so I suspect that CSMF should be considered its own model and go in its own module. abacus_hod.py probably needs to be generalized to handle more than one HOD model, as we've only dealt with GRAND HOD up to now.

In terms of debugging the numba segfault, I would recommend running a small problem with the NUMBA_DISABLE_JIT=1 environment variable set. Most problems that cause a segfault in Numba, like index overrun, will raise a Python exception instead when JIT is disabled. You can also sometimes remove the @njit decorator from individual functions instead.

@pburger112
Copy link
Contributor Author

Hello @lgarrison,
Thank you for your answer. You raised a good point that I haven't considered. In principle, I could rename the CSMF to BGS. The reason is that BGS is a magnitude-limit survey. Therefore, you need something like a CSMF/CLF HOD to convert a magnitude-limit sample into a volume-limited sample, which is needed for a HOD. This is, for instance, done here: https://arxiv.org/pdf/2210.03110 in Figure 1. What do you think about renaming CSMF to BGS? Then, it would be a galaxy type consistent with LRG or ELG.

However, I am also fine with writing a separate file for the CSMF HOD if you prefer that approach.

Lastly, thank you for sharing your idea about the seg fault. I will try what you said. However, the code runs without any issues as long as the simplistic function (which is activated currently) is used to draw a random stellar mass.

@lgarrison
Copy link
Member

I see, thanks for clarifying that this is for BGS galaxies. I do think it would be worth putting most of the functions in a new CSMF_HOD.py file; we want to be clear about which functions belong to which model, and GRAND_HOD.py is already huge. In an ideal world, GRAND_HOD.py wouldn't need to be modified at all, and we could select models from abacus_hod.py. That might be difficult without a big refactor, though. Abacus HOD is just not set up to be extensible right now, unfortunately. This is most obvious in really verbose, repetitious parts of the code like

LRG_dict = Dict.empty(key_type=types.unicode_type, value_type=float_array)
ELG_dict = Dict.empty(key_type=types.unicode_type, value_type=float_array)
QSO_dict = Dict.empty(key_type=types.unicode_type, value_type=float_array)
CSMF_dict = Dict.empty(key_type=types.unicode_type, value_type=float_array)
ID_dict = Dict.empty(key_type=types.unicode_type, value_type=int_array)
LRG_dict['x'] = lrg_x
LRG_dict['y'] = lrg_y
LRG_dict['z'] = lrg_z
LRG_dict['vx'] = lrg_vx
LRG_dict['vy'] = lrg_vy
LRG_dict['vz'] = lrg_vz
LRG_dict['mass'] = lrg_mass
ID_dict['LRG'] = lrg_id
ELG_dict['x'] = elg_x
ELG_dict['y'] = elg_y
ELG_dict['z'] = elg_z
ELG_dict['vx'] = elg_vx
ELG_dict['vy'] = elg_vy
ELG_dict['vz'] = elg_vz
ELG_dict['mass'] = elg_mass
ID_dict['ELG'] = elg_id

etc.

Could you start by moving as much as possible into a new CSMF_HOD.py file? If you want to tackle any refactoring beyond that, that would be great, but I'll leave that up to you.

Besides that, there are a few places in the Numba code where you use lists and then convert them to NumPy arrays, like this:

cdf = []
cdf_current_value = 0
for k in range(len(delta_stellar_masses)):
cdf_current_value = cdf_current_value + CSMF_cen[k] * delta_stellar_masses[k]
cdf.append(cdf_current_value)
cdf = np.array(cdf)

In both Numba and regular Python, it's better to pre-allocate arrays with np.empty() and fill them. In this case, you could even use np.cumsum() to replace the entire loop.

Could you also add a test of your model? You can follow the example here: https://github.com/abacusorg/abacusutils/blob/main/tests/test_hod.py. The idea is to generate a mock catalog on a small sim and compare against a saved result.

Finally, the red X on the PR is from the pre-commit checks. You can click on the "Details" button next to that check to see the issues it found. It looks like there are some undefined variables, and some unused variables. If you want to set up pre-commit to run locally, you can follow these instructions: https://abacusutils.readthedocs.io/en/latest/installation.html#pre-commit

Thanks!

pburger112 and others added 2 commits October 9, 2024 15:56
Separation of GRAND_HOD to GRAND_HOD and CSMF_HOD, also removed the for loops
@pburger112
Copy link
Contributor Author

Hi @lgarrison
Thank you for your suggestions. I have separated the GRAND_HOD.py to GRAND_HOD.py and CSMF_HOD.py.
I only had to modify the abacus_hod.py slightly to call the correct functions.

Furthermore, I have removed the lists as you indicated and used np.cumsum.

I will now work on a test example. But could you please give it a look and let me know if you're okay with the current structure?

All the best
Pierre

pburger112 and others added 7 commits October 9, 2024 16:06
removed the unused variables
Corrected undefined and unused variables
ds
I created a test notebook showing how to create galaxy catalogues.
I added Google Drive links to download the reference Haloes and Galaxy catalogues
@pburger112
Copy link
Contributor Author

@lgarrison
I have created a Jupyter notebook that shows the syntax for running the CSMF HOD code and compares it to a reference Galaxy catalogue.
Instead of uploading large files, I provide Google Drive links, which can be downloaded if one tests the code.

Let me know if you are okay with using a jupyter notebook instead of a Python file.

All the best
Pierre

@lgarrison
Copy link
Member

Sorry for the delay, I've been thinking about the best approach here. Right now, CSMF_HOD.py and GRAND_HOD.py are really similar; there's a lot of repeated code and sections that are the same except for dictionary keys, specific HOD functions, etc. I don't think we want to end up in a situation where we try to modernize Abacus HOD and end up having to repeat all the changes for both GRAND HOD and CSMF HOD. But I also don't think we want all models to be forced to live in the same module, for the same reasons as before.

So here's what I'm thinking. abacusutils can define an entry point for user packages to provide custom HOD implementations. This means that if a package with a custom HOD is pip-installed, abacusutils will know about it and be able to call it as one of the models. Custom HOD packages can also specify abacusutils version constraints, so if we do make breaking changes to Abacus HOD, downstream packages can add a version constraint until compatibility is restored.

How does that sound to you? To be clear, I'd be happy to implement the entry point on the Abacus side (unless you wanted to). And sorry for making this a moving target! This is the first user-contributed HOD we've had, so we haven't had to think about these questions before.

@pburger112
Copy link
Contributor Author

Hi, no worries about the delay.
Your suggestion sounds great to me. This would make it more user-friendly.

However, may I also make a suggestion:
Since I am the first user to contribute HOD code, it will likely not happen often. We could merge the code as it is already so I can refer to it in my paper. Like this, the interested user can use the CSMF already.

And then we work together to modernize the abacusHOD code now, instead of finding an intermediate solution and modernizing it in the future? We could, for instance, meet and discuss how we want to restructure the code.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
HOD for abacusnbody.hod
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants