Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal on handling RemoteData #105

Open
yakutovicha opened this issue Nov 21, 2024 · 3 comments
Open

Proposal on handling RemoteData #105

yakutovicha opened this issue Nov 21, 2024 · 3 comments

Comments

@yakutovicha
Copy link

yakutovicha commented Nov 21, 2024

The current behaviour w.r.t. RemoteData object is to copy/symlink all its content in the current folder. But this creates an issue of handling files with the same names (e.g. #58). It becomes especially hard when one uses multiple RemoteData folders as inputs.

I would propose to copy/symlink the folders as is, giving them the name as the input node link.

In practice, it means that a call like the following:

    remote_folder = RemoteData(...)
    results, node = launch_shell_job(
        'cubehandler',
        nodes={'previous_calc': remote_folder},
        ...
        )

would generate a folder previous_calc that is a copy/symlink of the remote_folder.

It is a breaking change, but since the project is still in a pre-release phase (0.x version), I assume it is acceptable.

@yakutovicha
Copy link
Author

yakutovicha commented Nov 21, 2024

In practice, that would require to change the following lines:

remote_nodes = [node for node in inputs.get('nodes', {}).values() if isinstance(node, RemoteData)]
instructions = [(computer_uuid, f'{node.get_remote_path()}/*', '.') for node in remote_nodes]

To something like that:

        remote_nodes = [(name, node) for (name, node) in inputs.get('nodes', {}).items() if isinstance(node, RemoteData)]
        instructions = [(computer_uuid, f'{node.get_remote_path()}', name) for (name, node) in remote_nodes]

Happy to make a PR :)

@sphuber
Copy link
Owner

sphuber commented Nov 21, 2024

I see the problem and it would be great to support it, but I don't think the proposed solution is the way to go. Most simple use cases rely on the contents of a RemoteData to copied directly in the working directory. Changing this would break that and force all these simple use cases to do more work to have the command read from a subdirectory, if this is even possible.

Instead, I think we should keep the default behavior and just make it possible for the user to specify the target directory for RemoteData inputs. This feature already exists:
https://aiida-shell.readthedocs.io/en/latest/howto.html#running-a-shell-command-with-folders-as-arguments

With the filenames argument you can specify the base directory for any file data provided by input nodes. I don't see why this couldn't or shouldn't also apply to RemoteData nodes. I quickly looked at the code and the ShellCalculation.handle_remote_data_nodes does not consider filenames whereas it probably should. There also seems to be a bug:

self.handle_remote_data(node)

The method ShellCalculation.handle_remote_data doesn't exist and so should raise. I guess this line just never gets called currently.

@yakutovicha
Copy link
Author

With the filenames argument you can specify the base directory for any file data provided by input nodes. I don't see why this couldn't or shouldn't also apply to RemoteData nodes.

This solution is also ok for me 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants