"RuntimeWarning: Missing cross-reference index" displayed during execution #142

Open
eguiraud opened this issue May 18, 2023 · 12 comments

Comments

@eguiraud
Contributor

IIUC these warnings are harmless, but it would be nice to remove them altogether, since they pollute the notebook output.

It should be enough to run the ttbar notebook locally (by using AF: local in config.yaml) to reproduce them.

/home/blue/Scratchpad/work/agc/analysis-grand-challenge/analyses/cms-open-data-ttbar/venv2/lib/python3.11/site-packages/coffea/nanoevents/schemas/nanoaod.py:201: RuntimeWarning: Missing cross-reference index for FatJet_subJetIdx1 => SubJet
  warnings.warn(
/home/blue/Scratchpad/work/agc/analysis-grand-challenge/analyses/cms-open-data-ttbar/venv2/lib/python3.11/site-packages/coffea/nanoevents/schemas/nanoaod.py:201: RuntimeWarning: Missing cross-reference index for FatJet_subJetIdx2 => SubJet
  warnings.warn(

@alexander-held
Member

I think NanoAODSchema.warn_missing_crossrefs = False is meant to address those; perhaps we need to move it elsewhere if it is currently not catching this.

@eguiraud
Contributor Author

This does not seem to happen with an IterativeExecutor, so maybe the problem is that NanoAODSchema.warn_missing_crossrefs = False is not propagated to the dask tasks? Just guessing :)

@alexander-held
Member

That sounds very likely to me. Not sure where it would have to be put instead, @lgray @nsmith- do you have an idea?

@lgray
lgray commented May 22, 2023

Which executor(s) are you using?

@eguiraud
Contributor Author

I saw the warnings with the DaskExecutor and a local dask client (dask.distributed.Client()).

@lgray
lgray commented May 22, 2023

Gotcha - looking around you should be able to control the warnings behavior with:
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONWARNINGS

and you can even make it fairly fine grained (i.e. a particular warning to be ignored).

You can set that env var for dask workers or otherwise.
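As a rough sketch of the fine-grained form, the filter below silences only the cross-reference warning and leaves other warnings visible. It runs a small stand-in script in a fresh interpreter rather than coffea itself; `FILTER`, `CHILD_CODE` and `run_child` are hypothetical names used only for this illustration.

```python
import os
import subprocess
import sys

# PYTHONWARNINGS filter spec: action:message:category (later fields optional).
# The message field is matched against the start of the warning text.
FILTER = "ignore:Missing cross-reference index:RuntimeWarning"

# Stand-in for the code that emits the coffea warning (not coffea itself),
# plus an unrelated warning that should remain visible.
CHILD_CODE = """
import warnings
warnings.warn("Missing cross-reference index for FatJet_subJetIdx1 => SubJet", RuntimeWarning)
warnings.warn("some unrelated problem", RuntimeWarning)
"""


def run_child(filter_spec=None):
    """Run CHILD_CODE in a fresh interpreter and return what it wrote to stderr."""
    env = dict(os.environ)
    env.pop("PYTHONWARNINGS", None)  # start from a clean slate
    if filter_spec is not None:
        env["PYTHONWARNINGS"] = filter_spec
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        capture_output=True,
        text=True,
    )
    return result.stderr


stderr_filtered = run_child(FILTER)
stderr_unfiltered = run_child()
```

With the filter set, only the unrelated warning should show up in `stderr_filtered`, while `stderr_unfiltered` contains both.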

For local dask distributed (i.e. with a distributed.Client) you can specify:

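import os
from dask.distributed import Client
# Set this before creating the client so that worker processes it starts can inherit it.
os.environ["PYTHONWARNINGS"] = "ignore"
client = Client()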

For stuff on batch there's a few ways to do it either by dask config file or running functions that alter the environment on the worker.
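One way the "function that alters the environment on the worker" route might look is sketched below. Instead of the environment variable it installs a standard `warnings` filter inside the running process; `silence_crossref_warnings` is a hypothetical helper name, and the `client.run(...)` call in the comment assumes an already connected `distributed.Client` called `client`.

```python
import warnings


def silence_crossref_warnings():
    """Ignore the cross-reference RuntimeWarning in the process that calls this."""
    warnings.filterwarnings(
        "ignore",
        message="Missing cross-reference index",  # regex matched at the start of the text
        category=RuntimeWarning,
    )


# Affects the current process only.
silence_crossref_warnings()

# Assumption: with an existing distributed.Client named `client`, the same
# function could be executed once on every currently connected worker:
#     client.run(silence_crossref_warnings)
```

Note that a filter installed this way only reaches workers that exist when the function is run; workers that join later would presumably need it applied again.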

@eguiraud
Contributor Author

Thank you @lgray, but doesn't this mean NanoAODSchema.warn_missing_crossrefs = False is ~useless in the standard AGC configurations?

@lgray
lgray commented May 23, 2023

@eguiraud For data exploration and for initially getting to know what you're working with, it's still useful. I'd want to know that certain parts of functionality are not available in a given file.

For production grade analyses it is not useful, I agree.

However, you can control it with an environment variable. Therefore, I think the current implementation is fine. I will entertain counterarguments.

@eguiraud
Contributor Author

Yes, I agree the warning is useful. I meant that NanoAODSchema.warn_missing_crossrefs = False is not so useful as a way to turn the warnings off, since 1. it doesn't work in some fairly common scenarios and 2. the "proper" way to turn off those warnings seems to be the standard Python mechanisms you mentioned.

@lgray
lgray commented May 23, 2023

Gotcha. I can think of some ways to make it more useful in coffea 2023 so that the flag is more uniformly set.

No reasonable way to fix it in coffea 0.7 though.

Please open an issue on the coffea GitHub repository. I'll see what I can do!

@lgray
lgray commented May 23, 2023

Actually, hold that thought a second. I'll double check it, but I think it's already effectively "fixed" in the new implementation. The code that generates that warning is only ever run once, and only on the node local to the user.

@lgray
lgray commented May 28, 2023

Indeed, this is not an annoyance in coffea 2023. I suggest using the PYTHONWARNINGS environment variable in coffea 0.7.
