fix: always request total needed memory, as snakemake seems to count as single process #4
Conversation
…as single process
P.S.: With the fix, it reserves the memory as I would expect and runs through. Forgot to clearly state this... 😅 |
I'm going to need to take a look at this. The issue is that on some clusters, […]. The previous LSF profile actually used the Snakemake […]. If you have provided per-thread memory requests, I have a helper function to dynamically correct that. Could you please let me know if […]? |
The relevant documentation is here: https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=o-r#bsub_R_description__title__3 From what I understand, this setting will override any cluster-wide settings and ensure that memory is always reserved for the entire job (not per host or task). I think this is what we want snakemake to always do in this context, as it always submits one job and we know how much memory the entire job should use.
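For illustration, here is a minimal Python sketch of how a memory request using the `/job` reservation method from the docs linked above could be rendered; the helper name and surrounding command are illustrative, not the plugin's actual code:

```python
# Minimal sketch, not the plugin's actual code: render a memory
# request that LSF reserves once for the entire job, using the
# "/job" resource reservation method from the docs linked above.

def build_rusage(mem_mb_total: int) -> str:
    # "/job" makes LSF reserve mem_mb_total for the whole job,
    # instead of per slot/host as cluster-wide defaults may dictate.
    return f"rusage[mem={mem_mb_total}/job]"

# A rule needing 32000 MB in total would then be submitted roughly as:
#   bsub -R "rusage[mem=32000/job]" ...
print(build_rusage(32000))  # rusage[mem=32000/job]
```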
I amended my PR after some further digging. Here's the reasoning behind this, with the relevant documentation being: […] From what I understand, the […]. Does this make sense? Could you try this "at home"? |
I have confirmed that […]. |
Good point, I think I have now done this. But please double-check in your code review. |
This looks good from my reading of it, but we don't have any real unit tests in the repo other than code quality checks. I am going to manually change the pyproject.toml and make a new release later today, and then I will do some more testing. (I don't have Release Please set up right now.) Thanks so much for the help! |
Many thanks for setting up this executor. It's a huge head start for the snakemake 8 transition. And thanks for being so responsive. I'll keep filing PRs as further stuff comes up. And should you want to share the maintenance burden at some point, let me know. |
Also, I'll watch out for the bioconda bot auto-update of the respective recipe, and shepherd that along. Unless you beat me to it... ;) |
It seems like this `/job` syntax actually runs into an LSF bug on the system that I work on. They check for a correct setting of the `LSB_SUB_MEM_USAGE` parameter mentioned here: https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=controls-configuration-enable-job-submission-execution And this does not get parsed correctly by LSF if you add a resource reservation method to the `rusage` string. So currently this breaks our setup, but I haven't found a way to more generally check this. So this is basically just a ping, in case you have another idea of how to achieve a more general behaviour. Otherwise, we'll have to wait for a workaround on our system, or a fix of the bug in LSF... |
What if we have an env variable that can be set as a workaround that modifies the behavior of the executor? One value for total memory without `/job`, and one value for memory/core. If the variable is unset, the executor behaves like it does now.
|
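A minimal sketch of what such a workaround could look like; the variable name `SNAKEMAKE_LSF_MEM_MODE` and the helper are hypothetical, purely to illustrate the proposal above:

```python
# Hypothetical sketch of the env-variable workaround proposed above;
# the variable name and helper are made up for illustration.
import os

def memory_request(mem_mb_total: int, threads: int) -> str:
    mode = os.environ.get("SNAKEMAKE_LSF_MEM_MODE")  # hypothetical name
    if mode == "total":
        # request the total memory, without the "/job" suffix
        return f"rusage[mem={mem_mb_total}]"
    if mode == "per_core":
        # divide the total across the requested cores
        return f"rusage[mem={mem_mb_total // threads}]"
    # unset: keep the executor's current behaviour
    return f"rusage[mem={mem_mb_total}/job]"

print(memory_request(32000, 8))
```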
That's a good idea, we could try that. An alternative thought: do you know which exact setting determines that memory is requested per thread? If so, we could query that setting and only append the `/job` suffix when it is actually needed. |
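One possible (untested) way to probe for such a setting, assuming `LSF_ENVDIR` points at the directory holding `lsf.conf` as in a standard LSF installation; whether `LSB_JOB_MEMLIMIT` alone explains the per-thread behaviour is exactly the open question here:

```python
# Rough sketch for probing lsf.conf; untested against a real cluster.
import os

def lsb_job_memlimit() -> str | None:
    # LSF_ENVDIR normally points at the directory containing lsf.conf;
    # the "/etc" fallback here is an assumption.
    conf = os.path.join(os.environ.get("LSF_ENVDIR", "/etc"), "lsf.conf")
    try:
        with open(conf) as fh:
            for line in fh:
                line = line.strip()
                if line.startswith("LSB_JOB_MEMLIMIT"):
                    return line.split("=", 1)[1].strip()
    except OSError:
        pass
    return None  # not found: fall back to the current behaviour

# The executor could then decide whether to append "/job" based on
# the returned value; the exact mapping would need testing.
```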
I have looked and not been able to find this. I thought I had, but no luck. |
So I got some extra pointers from our IT. From what I understand, this is how they set up things so that memory requests are set and enforced per job (and per host): […]
And instead they enforce it via setting: […]
There is a docs page for the `-hl` command-line option. Do these variables point to any differences that you can see in your local setup that would explain the per-thread behaviour? Otherwise, I think this command-line option sounds like a good alternative to the `/job` suffix: https://www.ibm.com/docs/en/spectrum-lsf/10.1.0?topic=options-hl So I will open a PR with this, and then you can also test this in your setting. |
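A sketch of what the submission command might look like with that option; as I read the linked page, `-hl` enables host-based memory limit enforcement for the job, and the command layout below is illustrative, not the plugin's actual submission code:

```python
# Illustrative only: bsub invocation using -hl instead of the "/job"
# rusage suffix, per the docs page linked above.
def bsub_command(mem_mb_total: int, threads: int, jobscript: str) -> list[str]:
    return [
        "bsub",
        "-n", str(threads),
        "-hl",                                # host-based memory limit enforcement
        "-R", f"rusage[mem={mem_mb_total}]",  # plain request, no "/job" suffix
        jobscript,
    ]

print(" ".join(bsub_command(32000, 8, "jobscript.sh")))
```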
I was previously using and maybe misunderstanding […]. |
Just to link this up, the new PR is #5. |
With the state of the docs, my working theory is that any server setup will end up being a config edge case... 👀 Also, there is lots to misunderstand in all these docs, with stuff seemingly happening implicitly before and during submission that is not transparent to the user. But let's try to at least get a solution that works for both our setups. 🙈 |
We have a cluster with the following setting: […]

For this case, the lsf docs on `LSB_JOB_MEMLIMIT` specify: […]

So your interpretation previously was that each requested cpu is its own process. However, I think that the whole job is always one single process, because it simply runs snakemake again in the main bsub job. So I think we always need to request the full amount of needed memory via `-R rusage[mem={mem_}]`, no matter what the `LSB_JOB_MEMLIMIT` setting is.

As a bit of backup / reasoning, here's what I saw for a rule that has `threads: 8` specified and `resources: mem_mb = lambda wc, threads: threads * 4000`, when run with the executor-plugin-lsf without the change proposed here. TL;DR: it will only reserve the memory for one thread / cpu, but the process requires the total memory. Here's the logging output (shortened and redacted a bit): […]
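To make the arithmetic behind this concrete (numbers taken from the rule above; the before/after interpretation is my reading of the description, not output from the plugin):

```python
# Worked example with the values from the rule above.
threads = 8
mem_mb = threads * 4000   # resources: mem_mb lambda -> 32000 MB total

# Without the fix, only a single thread's share ends up reserved,
# even though the one snakemake process needs the full amount:
reserved_before_fix = mem_mb // threads  # 4000 MB
# With the fix, the full amount is requested for the whole job:
reserved_after_fix = mem_mb              # 32000 MB

print(reserved_before_fix, reserved_after_fix)  # 4000 32000
```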