FileNotFoundError triggered when running a task with a splitter #745
Looks like it might be a hashing issue. With the line … does the …
Seems to be an issue with cross-process hashing, as the serial plugin seems to work.
Something strange seems to be happening, unless I'm getting the wrong end of the stick with my debugging. The "read" task never seems to get run (I put a print statement in there and can't see an output directory created for it in the cache directory), but it is dropped from the list of future tasks at some point in the expansion of the workflow.
I am glad to hear that it's not an issue with how I used the splitter. I like the new interface better actually. I am a bit skeptical about this issue though. It is a very simple setup, i.e. one producer of files followed by a task applied to each file, but quite representative of what any linear workflow would look like. The test suite probably needs expanding to cover more realistic use cases than individual tasks and toy workflows. What do you think?
Yes, definitely needs to be added as a test case. Does it hold if you read a directory of text files and append a value to each file? This has got me stumped. I have set breakpoints in lots of different places and the "read" node never seems to be executed using the CF plugin. I can track the execution down to lines 178 to 189 in e52e32b.
Could it be an …
i can't replicate this with the current released version of pydra (0.23 on python 3.11.8 on macos m1). here is a simplified form of the code.

```python
from pathlib import Path

from pydra import Submitter, Workflow
from pydra.mark import annotate, task


@task
@annotate({"return": {"t1w_images": list[Path]}})
def read_t1w_images(bids_dir: Path) -> list[Path]:
    return list(bids_dir.rglob("*"))


@task
@annotate({"return": {"smoothed_image": Path}})
def smooth_image(input_image: Path, smoothed_image: Path) -> Path:
    smoothed_image = (
        smoothed_image
        or Path.cwd() / (input_image.name.split(".", maxsplit=1)[0] + "_smoothed")
    )
    with open(smoothed_image, "wt") as fp:
        fp.write(str(smoothed_image))
    return smoothed_image


wf = Workflow("test", input_spec=["bids_dir"], bids_dir=Path.cwd())
wf.add(read_t1w_images(name="read", bids_dir=wf.lzin.bids_dir))
wf.add(smooth_image(name="smooth").split("input_image", input_image=wf.read.lzout.t1w_images))
wf.set_output({"smoothed_images": wf.smooth.lzout.smoothed_image})

with Submitter() as sub:
    res = sub(wf)
```
the above code reads any number of files in your current directory, so just make sure it's not a lot!
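A minimal sketch of how the glob in `read_t1w_images` could be narrowed so it doesn't sweep up everything under the directory; the `*_T1w.nii.gz` pattern is a hypothetical BIDS-style naming assumption, not something from the thread:

```python
from pathlib import Path


def read_t1w_images(bids_dir: Path) -> list[Path]:
    # Match only T1-weighted NIfTI files instead of every file under bids_dir;
    # the "*_T1w.nii.gz" suffix is a hypothetical BIDS-style naming assumption.
    return sorted(bids_dir.rglob("*_T1w.nii.gz"))
```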
also i wasn't sure how the second function runs without an input for the `smoothed_image` argument.
That is a good point, the `smoothed_image` arg should be `attrs.NOTHING`. In my setup it isn't reaching that point (but then I'm not getting the file-not-found error that @ghisvail is, so maybe something else is not working for me).
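A minimal sketch of what guarding against an unset input could look like, assuming `attrs.NOTHING` is the sentinel pydra leaves on unset inputs as the comment above suggests; this is an illustration, not code from the thread:

```python
from pathlib import Path

import attrs

from pydra.mark import annotate, task


@task
@annotate({"return": {"smoothed_image": Path}})
def smooth_image(input_image: Path, smoothed_image: Path = attrs.NOTHING) -> Path:
    # When no value was provided, fall back to a path derived from the input
    # image name, mirroring the simplified example earlier in the thread.
    if smoothed_image is attrs.NOTHING:
        smoothed_image = Path.cwd() / (
            input_image.name.split(".", maxsplit=1)[0] + "_smoothed"
        )
    with open(smoothed_image, "wt") as fp:
        fp.write(str(smoothed_image))
    return smoothed_image
```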
are you unable to run my version of the code either? and are you using the current released version or the main branch? |
Ahhh, that is a bit of a trap for beginners. I was running into a common multi-proc issue explained here & here. Maybe we should think about defaulting to the serial worker so that new users don't run into it. When I place the workflow code inside an `if __name__ == "__main__": main()` guard, the workflow runs as expected 🤦♂️

@ghisvail I think your problem might be really simple, as @satra suggested, i.e. that you just forgot to provide a value for your `smoothed_image` output. In general, we really need to be picking up errors like this and letting the user know. The reason we don't do it when the task is initiated is that we allow the inputs to be set later, but we should probably be running a check before the workflow is submitted to make sure that all inputs have been provided unless they are explicitly flagged as being optional. Probably, specifying the type as …
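A minimal sketch of the guard being described, assuming the workflow from the simplified example above (the `main` wrapper is illustrative):

```python
from pathlib import Path

from pydra import Submitter, Workflow


def main() -> None:
    # Build and submit the workflow exactly as in the simplified example above.
    wf = Workflow("test", input_spec=["bids_dir"], bids_dir=Path.cwd())
    ...  # add the "read" and "smooth" tasks and set the outputs here
    with Submitter() as sub:
        sub(wf)


# The concurrent.futures worker processes re-import this module on start-up;
# guarding the entry point stops them from re-running the submission code.
if __name__ == "__main__":
    main()
```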
@ghisvail I think that with pre-0.23 versions you probably got in the habit of typing all inputs and outputs as …
Indeed. I ran the example code above in the REPL initially, which is a sensible thing to do for a beginner wanting to try Pydra out. Using …
I'll check it out. Thanks to you both for investigating this.
Indeed. Surfacing multiprocessing errors when the workflow is in an undesirable state is terrible from a usability POV.
I tried …
Indeed. For the sake of simplicity, I went for …
In the ideal case, one would wish that:

```python
from __future__ import annotations

from pathlib import Path

from pydra.mark import task


@task
def smooth_image(input_image: Path, smoothed_image: Path | None = None) -> Path:
    from nilearn.image import load_img, smooth_img

    smoothed_image = (
        smoothed_image
        or Path.cwd() / (input_image.name.split(".", maxsplit=1)[0] + "_smoothed.nii.gz")
    )
    smooth_img(load_img(input_image), fwhm=3).to_filename(smoothed_image)
    return smoothed_image
```

just works, i.e. both the optionality effect and the default value to …
I defined and ran the following workflow:
and get the following error:
The tasks run fine if I exercise them individually with:
Not sure whether this is an issue with the actual definition of the splitter, or in its actual implementation. Either way, letting multiprocessing surface this error does not give much of a clue of what's going on.
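The workflow, error, and individual-run snippets from this report were not preserved above. As a purely hypothetical illustration, assuming the simplified tasks from earlier in the thread, exercising a single task outside the workflow might look like this (none of the names below are the reporter's original code):

```python
from pathlib import Path

# Hypothetical illustration: pydra task instances are callable, so a single
# task can be exercised outside the workflow and its outputs inspected.
read_task = read_t1w_images(name="read", bids_dir=Path.cwd())
read_result = read_task()
print(read_result.output.t1w_images)
```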