Merge processes at runtime #267

pkausw · 2022-01-18T12:13:29Z

Suppose you have a datacard with a process class with some sub-processes. For example, in our analysis we have a data-driven control region (CR) that is derived as the difference between data_CR and some other processes proc1_CR and proc2_CR. Since the complete process in the CR is estimated during the fit, the processes need to be considered individually in the datacard for a correct parameterization. However, when interpreting the results and plotting everything, we're only interested in the total process in the CR and would like to correctly account for the post-fit correlations between the individual *_CR processes in order to derive sensible uncertainty bands.

This PR introduces an option to PostFitShapesFromWorkspace to merge processes at runtime. The processes can be defined with the syntax NAME='expr', where expr is the regular expression that is parsed to the process filter of the harvester instance.

Additionally, the PR introduces an option for additional printout during the calculation of the post-fit uncertainty bands. If activated, the option will print out the difference between the bin content of the nominal post-fit template and each toy used for the construction of the uncertainty band, for each bin respectively. This functionality can be used to understand abnormally large fluctuations in the uncertainty band and is intended for debugging purposes.

Merge from original branch

pkausw · 2023-03-03T14:20:25Z

Hi, are there any updates regarding this PR? I think it might be related to issue #291

kcormi · 2023-03-06T11:06:06Z

Thanks for this!

I had a few inline comments to the code, another couple things here:

Firstly, perhaps it is possible to simplify the code by packaging the loop over processes (or merged processes) into a function which is reused. Since now we have two loops over prefit (processes, then merged processes) and two loops over postfit (same as before). They are handling slightly different objects and with slightly different treatment for pre- and postfit-, so maybe its not so simple; but much of the code is replicated and it might be nice to compactify this.

Secondly, I think we would also want to add the merged processes into the loop over processes when the postfit/prefit yields are printed, right? https://github.com/cms-analysis/CombineHarvester/pull/267/files#diff-850d8a5178aa5366c05bacce2ec563fa72bfa4ce6ef5509f33b134d0dae89e5eL442

kcormi · 2023-03-06T10:28:22Z

CombineTools/src/CombineHarvester_Evaluate.cc

+        TH1F rand_shape = this->GetShapeInternal(lookup);
+        for (int j = 1; j <= rand_shape.GetNbinsX(); ++j) {
+          tmp.Form("\tBin Content %i: %f\n", j, rand_shape.GetBinContent(j));
+          // std::cout << tmp.Data();
+          full_rand_shape_summary.at(j-1) = rand_shape_summary;
+          full_rand_shape_summary.at(j-1) += tmp.Data();
+        }


This seems like quite a lot of information (checking the updated shape for bin for every updated parameter value). I wonder if it is maybe a bit too much, and could be avoided? But perhaps this was useful in some debugging which I'm simply not aware of. It just seems it would be a lot to even read through for large models, and perhaps targeting more specific information might be better?

Yes definitely! I originally implemented this when we generated post-fit uncertainty bands for our ttHbb analysis and so some very large fluctuations in the bands. The intend there was really to find the reason for these fluctuations, which is why I wanted to output as much information as possible.

I think the comment by Andrew is valid, and it might be better to simply change to toys themselves into a dedicated root file. I can also update this PR accordingly if we want to have something like that from the start, or we can do it at some later point in time

kcormi · 2023-03-06T10:31:06Z

CombineTools/src/CombineHarvester_Evaluate.cc

-          std::fabs(rand_shape.GetBinContent(i) - shape.GetBinContent(i));
-      shape.SetBinError(i, err*err + shape.GetBinError(i));
+          std::fabs(rand_shape.GetBinContent(j) - shape.GetBinContent(j));
+      if (std::fabs(err/shape.GetBinContent(j)) >= 0.5 and verbose){


I may be over optimizing here, but perhaps some of these extra steps could be avoided by default when not using the verbose mode? Perhaps just an if( verbose) before this loop, and in other places, for example?

I'm also wondering if it would be useful to make this 0.5 threshold a configurable parameter?

kcormi · 2023-03-06T10:48:41Z

CombineTools/bin/PostFitShapesFromWorkspace.cpp

+          post_shapes[bin][proc] = cmb_proc.GetShape();
+        } else {
+          post_shapes[bin][proc] =
+              sampling ? cmb_proc.GetShapeWithUncertainty(res, samples, verbose)


The deafult behaviour of PostFitShapesFromWorkspace should be to use sampling, and now use of the actual --sampling flag prints out a warning. So the code should not check the --sampling flag, but rather the --no-sampling flag, and always do sampling unless --no-sampling has been set.

ajgilbert · 2023-03-06T14:11:06Z

Hi, I remembered I made some related changes a while ago, and never merged them. See #295 which I just opened. That adds the option to merge channels (not processes), so is in principle complementary.

I tend to agree with @kcormi that the changes to GetShapeWithUncertainty take some care - especially not to run any excess code if verbose is not true. I see the motivation to have access to the variations though for debugging. To make it more flexible, we could see if instead of building a big string internally with some arbitrary thresholds involved, we could instead add a way to return the full set of varied TH1s, saved to the output, which the user would be free to inspect with their own code.

pkausw · 2023-03-06T17:50:17Z

Hi, thanks for the review and the comments! Indeed, I did most/some of these changes in the meanwhile in my local setup - I'm in the process of getting my local repository in shape to update this PR asap.

Regarding the additional debugging information: Yes, totally agree that this should be completely optional (which it is in my local version of the Harvester at this point). @ajgilbert , your comment about saving the toys themselves instead of a very long string is excellent I think - I'll have a look at this as soon as this PR is updated.

…kspace.cpp

pkausw · 2023-03-07T18:50:05Z

Hi! I am finally done with the update for this PR. In particular, I

merged with the main version of the CombineHarvester
restructured the c++ code
made the generation of the yields + uncertainties optional

Unfortunately, the PR is now a little messy since I changed quite a few things in one go (sorry). Let me know if you'd like me to change something more, sorry for the mess!

…threshold parameter to report in verbose mode (not excessible right now)

ajgilbert · 2023-03-09T14:05:51Z

Thanks, this now has a lot of changes :-)
In general I think the restructuring is a good idea, I'll try to review it (may be delayed because of Moriond season)

Conflicts: CombineTools/bin/PostFitShapesFromWorkspace.cpp

…merge_processes_inline

…merge_processes_inline Conflicts: CombineTools/src/CombineHarvester_Evaluate.cc

pkausw · 2023-06-01T13:38:34Z

Hi @anigamova , as discussed earlier, I've updated this PR. I have cherry-picked the updates from #295 and included the latest changes from main. Therefore, this PR should now enable to merge analysis bins and/or processes at runtime :) Let me know if you have any comments!

anigamova · 2023-11-17T08:51:44Z

Hi @pkausw,

It would be great to do what @ajgilbert suggested in PR #295, i.e. try to call GetShapeWithUncertainty method once, so that we don't generate 500 toys for each histogram but rather save the list of all of the shapes we want to produce and pass it to the GetShapeWithUncertainty function at the end and loop over the shapes for each toy once. I will also leave few comments where I think it would make sense to implement these changes with your PR.

anigamova · 2023-11-17T08:51:30Z

CombineTools/bin/PostFitShapesFromWorkspace.cpp

+        cmb.GetShape();
+  } else {
+    shape_container[proc] =
+        no_sampling ? cmb.GetShapeWithUncertainty()


Here to avoid sampling for each histogram you could call another method to update the list/map of all of shapes.

anigamova · 2023-11-17T09:03:52Z

CombineTools/src/CombineHarvester_Evaluate.cc

+          // std::cout << tmp.Data();
+          rand_shape_summary = tmp.Data();
+          // std::cout << rand_shape_summary.c_str() << std::endl;
+          TH1F rand_shape = this->GetShapeInternal(lookup);


This is the place where the loop through the list/map of shapes would go then

pkausw and others added 7 commits March 19, 2019 11:50

Merge pull request #1 from cms-analysis/master

4a83cf8

Merge from original branch

added option to set label size in plot

cbd6e7f

finalize merge with upstream

3d56d83

Merge remote-tracking branch 'upstream/master'

b2c6755

implemented option to merge processes at runtime

5f57af8

implemented verbose mode in generation of uncertainty band for debugging

5ce7aa8

implemented option for verbose generation of postfit uncertainty bands

31c1e5c

anigamova linked an issue Mar 3, 2023 that may be closed by this pull request

PostFitShapesFromWorkspace development #291

Open

4 tasks

kcormi reviewed Mar 6, 2023

View reviewed changes

Philip Daniel Keicher added 4 commits March 7, 2023 17:14

merge with main

d083a1f

restructure code, implement useful additional options

a88c577

included previously overwritten older changes in PostFitShapesFromWor…

9e29eec

…kspace.cpp

make yield generation optional

31354dd

pkausw changed the base branch from master to main March 7, 2023 18:52

optimized verbose mode for GetShapeWithUncertainties, introduced new …

f79ffa9

…threshold parameter to report in verbose mode (not excessible right now)

ajgilbert mentioned this pull request Mar 27, 2023

Ability to define merged channels in PostFitShapesFromWorkspace (+ a few random fixes) #295

Open

ajgilbert and others added 7 commits April 27, 2023 14:01

Add option for pattern-based merged bins in PostFitShapesFromWorkspace

351229a

Conflicts: CombineTools/bin/PostFitShapesFromWorkspace.cpp

finalized cherry-pick

521207c

removed LimitCompare

a742eb5

removed LimitCompare from Build xml

69ba4f3

Merge branch 'main' of github.com:cms-analysis/CombineHarvester into …

dd29bd3

…merge_processes_inline

Merge branch 'main' of github.com:cms-analysis/CombineHarvester into …

f96b253

…merge_processes_inline Conflicts: CombineTools/src/CombineHarvester_Evaluate.cc

resolve minor merge conflict issues with main

9397ef7

anigamova reviewed Nov 17, 2023

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Merge processes at runtime #267

Merge processes at runtime #267

pkausw commented Jan 18, 2022

pkausw commented Mar 3, 2023 •

edited

Loading

kcormi commented Mar 6, 2023

kcormi Mar 6, 2023

pkausw Mar 7, 2023

kcormi Mar 6, 2023

kcormi Mar 6, 2023

ajgilbert commented Mar 6, 2023

pkausw commented Mar 6, 2023

pkausw commented Mar 7, 2023

ajgilbert commented Mar 9, 2023

pkausw commented Jun 1, 2023

anigamova commented Nov 17, 2023

anigamova Nov 17, 2023

anigamova Nov 17, 2023

Merge processes at runtime #267

Are you sure you want to change the base?

Merge processes at runtime #267

Conversation

pkausw commented Jan 18, 2022

pkausw commented Mar 3, 2023 • edited Loading

kcormi commented Mar 6, 2023

kcormi Mar 6, 2023

Choose a reason for hiding this comment

pkausw Mar 7, 2023

Choose a reason for hiding this comment

kcormi Mar 6, 2023

Choose a reason for hiding this comment

kcormi Mar 6, 2023

Choose a reason for hiding this comment

ajgilbert commented Mar 6, 2023

pkausw commented Mar 6, 2023

pkausw commented Mar 7, 2023

ajgilbert commented Mar 9, 2023

pkausw commented Jun 1, 2023

anigamova commented Nov 17, 2023

anigamova Nov 17, 2023

Choose a reason for hiding this comment

anigamova Nov 17, 2023

Choose a reason for hiding this comment

pkausw commented Mar 3, 2023 •

edited

Loading