CSMF HOD #149

pburger112 · 2024-09-20T15:36:05Z

I updated the GRAND_HOD.py and the abacus_hod.py files such that the user can predict an HOD based on the conditional stellar mass function described in https://arxiv.org/pdf/0807.4932. The code also predicts the stellar mass for each galaxy. For the protection of the stellar mass, I implemented two functions. The first creates only stellar mass from a range of possible values, and the second (linear interpolation) can predict arbitrary stellar masses. However, the latter sometimes creates a segmentation fault. I have not figured out why that is the case. Therefore, the current version uses the simpler version.

This involved implementing the equations in https://arxiv.org/pdf/0807.4932 and updating most of the functions in GRAND_HOD.py. I have kept the same code structure and just added a few lines.

I am using the current implementation for my paper, which I plan to publish soon. If you are interested in seeing it, I am happy to share a preliminary version.
Furthermore, please let me know if you want to merge it to the main branch and if I need to make any adjustments.

Add line of code that are able to compute CSMF HOD

back to origin

Addition of a conditional stellar mass function HOD

for more information, see https://pre-commit.ci

lgarrison

Thanks! I think it would be nice to have user-contributed models in AbacusHOD. From a high level, though, it looks like the CSMF is implemented as a galaxy "type" in GRAND_HOD.py, like LRG, ELG, etc, in such a way that it can have multi-tracer exclusion with these other types. That's what this if/else chain is about:

if randoms[i] <= LRG_marker:
    Nout[tid, 0, 0] += 1  # counting
    keep[i] = 1
elif randoms[i] <= ELG_marker:
    Nout[tid, 1, 0] += 1  # counting
    keep[i] = 2
elif randoms[i] <= QSO_marker:
    Nout[tid, 2, 0] += 1  # counting
    keep[i] = 3
elif randoms[i] <= CSMF_marker:
    Nout[tid, 3, 0] += 1  # counting
    keep[i] = 4

Is it really the spirit of the model that a CSMF galaxy is a type, like an LRG, ELG, etc?

To put it another way, should CSMF be implemented as a part of GRAND_HOD, or a separate model beside it? GRAND HOD refers the model from Sandy's paper, so I suspect that CSMF should be considered its own model and go in its own module. abacus_hod.py probably needs to be generalized to handle more than one HOD model, as we've only dealt with GRAND HOD up to now.

In terms of debugging the numba segfault, I would recommend running a small problem with the NUMBA_DISABLE_JIT=1 environment variable set. Most problems that cause a segfault in Numba, like index overrun, will raise a Python exception instead when JIT is disabled. You can also sometimes remove the @njit decorator from individual functions instead.

pburger112 · 2024-09-23T11:31:23Z

Hello @lgarrison,
Thank you for your answer. You raised a good point that I haven't considered. In principle, I could rename the CSMF to BGS. The reason is that BGS is a magnitude-limit survey. Therefore, you need something like a CSMF/CLF HOD to convert a magnitude-limit sample into a volume-limited sample, which is needed for a HOD. This is, for instance, done here: https://arxiv.org/pdf/2210.03110 in Figure 1. What do you think about renaming CSMF to BGS? Then, it would be a galaxy type consistent with LRG or ELG.

However, I am also fine with writing a separate file for the CSMF HOD if you prefer that approach.

Lastly, thank you for sharing your idea about the seg fault. I will try what you said. However, the code runs without any issues as long as the simplistic function (which is activated currently) is used to draw a random stellar mass.

lgarrison · 2024-09-27T19:39:08Z

I see, thanks for clarifying that this is for BGS galaxies. I do think it would be worth putting most of the functions in a new CSMF_HOD.py file; we want to be clear about which functions belong to which model, and GRAND_HOD.py is already huge. In an ideal world, GRAND_HOD.py wouldn't need to be modified at all, and we could select models from abacus_hod.py. That might be difficult without a big refactor, though. Abacus HOD is just not set up to be extensible right now, unfortunately. This is most obvious in really verbose, repetitious parts of the code like

abacusutils/abacusnbody/hod/GRAND_HOD.py

Lines 774 to 795 in 3f62344

    
           LRG_dict = Dict.empty(key_type=types.unicode_type, value_type=float_array) 
        
           ELG_dict = Dict.empty(key_type=types.unicode_type, value_type=float_array) 
        
           QSO_dict = Dict.empty(key_type=types.unicode_type, value_type=float_array) 
        
           CSMF_dict = Dict.empty(key_type=types.unicode_type, value_type=float_array) 
        
           ID_dict = Dict.empty(key_type=types.unicode_type, value_type=int_array) 
        
           LRG_dict['x'] = lrg_x 
        
           LRG_dict['y'] = lrg_y 
        
           LRG_dict['z'] = lrg_z 
        
           LRG_dict['vx'] = lrg_vx 
        
           LRG_dict['vy'] = lrg_vy 
        
           LRG_dict['vz'] = lrg_vz 
        
           LRG_dict['mass'] = lrg_mass 
        
           ID_dict['LRG'] = lrg_id 
        
           ELG_dict['x'] = elg_x 
        
           ELG_dict['y'] = elg_y 
        
           ELG_dict['z'] = elg_z 
        
           ELG_dict['vx'] = elg_vx 
        
           ELG_dict['vy'] = elg_vy 
        
           ELG_dict['vz'] = elg_vz 
        
           ELG_dict['mass'] = elg_mass 
        
           ID_dict['ELG'] = elg_id

etc.

Could you start by moving as much as possible into a new CSMF_HOD.py file? If you want to tackle any refactoring beyond that, that would be great, but I'll leave that up to you.

Besides that, there are a few places in the Numba code where you use lists and then convert them to NumPy arrays, like this:

abacusutils/abacusnbody/hod/GRAND_HOD.py

Lines 70 to 75 in 3f62344

    
           cdf = [] 
        
           cdf_current_value = 0 
        
           for k in range(len(delta_stellar_masses)): 
        
               cdf_current_value = cdf_current_value + CSMF_cen[k] * delta_stellar_masses[k] 
        
               cdf.append(cdf_current_value) 
        
           cdf = np.array(cdf)

In both Numba and regular Python, it's better to pre-allocate arrays with np.empty() and fill them. In this case, you could even use np.cumsum() to replace the entire loop.

Could you also add a test of your model? You can follow the example here: https://github.com/abacusorg/abacusutils/blob/main/tests/test_hod.py. The idea is to generate a mock catalog on a small sim and compare against a saved result.

Finally, the red X on the PR is from the pre-commit checks. You can click on the "Details" button next to that check to see the issues it found. It looks like there are some undefined variables, and some unused variables. If you want to set up pre-commit to run locally, you can follow these instructions: https://abacusutils.readthedocs.io/en/latest/installation.html#pre-commit

Thanks!

Separation of GRAND_HOD to GRAND_HOD and CSMF_HOD, also removed the for loops

for more information, see https://pre-commit.ci

pburger112 · 2024-10-09T20:00:38Z

Hi @lgarrison
Thank you for your suggestions. I have separated the GRAND_HOD.py to GRAND_HOD.py and CSMF_HOD.py.
I only had to modify the abacus_hod.py slightly to call the correct functions.

Furthermore, I have removed the lists as you indicated and used np.cumsum.

I will now work on a test example. But could you please give it a look and let me know if you're okay with the current structure?

All the best
Pierre

removed the unused variables

for more information, see https://pre-commit.ci

Corrected undefined and unused variables

ds

for more information, see https://pre-commit.ci

I created a test notebook showing how to create galaxy catalogues. I added Google Drive links to download the reference Haloes and Galaxy catalogues

for more information, see https://pre-commit.ci

pburger112 · 2024-10-15T14:57:57Z

@lgarrison
I have created a Jupyter notebook that shows the syntax for running the CSMF HOD code and compares it to a reference Galaxy catalogue.
Instead of uploading large files, I provide Google Drive links, which can be downloaded if one tests the code.

Let me know if you are okay with using a jupyter notebook instead of a Python file.

All the best
Pierre

lgarrison · 2024-10-15T19:40:44Z

Sorry for the delay, I've been thinking about the best approach here. Right now, CSMF_HOD.py and GRAND_HOD.py are really similar; there's a lot of repeated code and sections that are the same except for dictionary keys, specific HOD functions, etc. I don't think we want to end up in a situation where we try to modernize Abacus HOD and end up having to repeat all the changes for both GRAND HOD and CSMF HOD. But I also don't think we want all models to be forced to live in the same module, for the same reasons as before.

So here's what I'm thinking. abacusutils can define an entry point for user packages to provide custom HOD implementations. This means that if a package with a custom HOD is pip-installed, abacusutils will know about it and be able to call it as one of the models. Custom HOD packages can also specify abacusutils version constraints, so if we do make breaking changes to Abacus HOD, downstream packages can add a version constraint until compatibility is restored.

How does that sound to you? To be clear, I'd be happy to implement the entry point on the Abacus side (unless you wanted to). And sorry for making this a moving target! This is the first user-contributed HOD we've had, so we haven't had to think about these questions before.

pburger112 · 2024-10-16T11:56:41Z

Hi, no worries about the delay.
Your suggestion sounds great to me. This would make it more user-friendly.

However, may I also make a suggestion:
Since I am the first user to contribute HOD code, it will likely not happen often. We could merge the code as it is already so I can refer to it in my paper. Like this, the interested user can use the CSMF already.

And then we work together to modernize the abacusHOD code now, instead of finding an intermediate solution and modernizing it in the future? We could, for instance, meet and discuss how we want to restructure the code.

pburger112 and others added 4 commits September 20, 2024 08:40

Add CSMF to abacusutilities

c295771

Add line of code that are able to compute CSMF HOD

back to origin

2abddc8

back to origin

CSMF HOD

7b2b433

Addition of a conditional stellar mass function HOD

[pre-commit.ci] auto fixes from pre-commit.com hooks

3f62344

for more information, see https://pre-commit.ci

lgarrison requested review from SandyYuan and lgarrison September 21, 2024 02:48

lgarrison added the HOD for abacusnbody.hod label Sep 21, 2024

lgarrison reviewed Sep 22, 2024

View reviewed changes

pburger112 and others added 2 commits October 9, 2024 15:56

Separate CSMF python file

4ea04d4

Separation of GRAND_HOD to GRAND_HOD and CSMF_HOD, also removed the for loops

[pre-commit.ci] auto fixes from pre-commit.com hooks

3c77aa4

for more information, see https://pre-commit.ci

pburger112 and others added 7 commits October 9, 2024 16:06

Update CSMF_hod.py

3601cad

removed the unused variables

[pre-commit.ci] auto fixes from pre-commit.com hooks

d907717

for more information, see https://pre-commit.ci

Corrected undefined and unused variables

90838e2

Corrected undefined and unused variables

ds

5468ef8

ds

[pre-commit.ci] auto fixes from pre-commit.com hooks

6a64bd3

for more information, see https://pre-commit.ci

test notebook

a20048c

I created a test notebook showing how to create galaxy catalogues. I added Google Drive links to download the reference Haloes and Galaxy catalogues

[pre-commit.ci] auto fixes from pre-commit.com hooks

0334c72

for more information, see https://pre-commit.ci

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

CSMF HOD #149

CSMF HOD #149

pburger112 commented Sep 20, 2024

lgarrison left a comment

pburger112 commented Sep 23, 2024

lgarrison commented Sep 27, 2024

pburger112 commented Oct 9, 2024

pburger112 commented Oct 15, 2024

lgarrison commented Oct 15, 2024

pburger112 commented Oct 16, 2024

CSMF HOD #149

Are you sure you want to change the base?

CSMF HOD #149

Conversation

pburger112 commented Sep 20, 2024

lgarrison left a comment

Choose a reason for hiding this comment

pburger112 commented Sep 23, 2024

lgarrison commented Sep 27, 2024

pburger112 commented Oct 9, 2024

pburger112 commented Oct 15, 2024

lgarrison commented Oct 15, 2024

pburger112 commented Oct 16, 2024