Proposal on handling `RemoteData` #105

yakutovicha · 2024-11-21T10:01:16Z

The current behaviour for a RemoteData input is to copy/symlink all of its contents into the current folder. This causes clashes between files with the same name (e.g. #58), and it becomes especially hard to manage when several RemoteData folders are used as inputs.

I would propose to copy/symlink each folder as is, naming it after its input node link.

In practice, it means that a call like the following:

    remote_folder = RemoteData(...)
    results, node = launch_shell_job(
        'cubehandler',
        nodes={'previous_calc': remote_folder},
        ...
    )

would generate a folder previous_calc that is a copy/symlink of the remote_folder.

It is a breaking change, but since the project is still in a pre-release phase (0.x version), I assume it is acceptable.

yakutovicha · 2024-11-21T10:04:45Z

In practice, that would require changing the following lines:

aiida-shell/src/aiida_shell/calculations/shell.py

Lines 340 to 341 in 14866d1

    
    remote_nodes = [node for node in inputs.get('nodes', {}).values() if isinstance(node, RemoteData)]
    instructions = [(computer_uuid, f'{node.get_remote_path()}/*', '.') for node in remote_nodes]

to something like this:

        remote_nodes = [(name, node) for (name, node) in inputs.get('nodes', {}).items() if isinstance(node, RemoteData)]
        instructions = [(computer_uuid, f'{node.get_remote_path()}', name) for (name, node) in remote_nodes]
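To make the difference concrete, here is a minimal, self-contained sketch that builds both instruction lists side by side. `FakeRemoteData`, the scratch paths and the `'uuid'` placeholder are made up for illustration and are not part of aiida-shell; only the two list comprehensions mirror the snippets above.

```python
from dataclasses import dataclass


@dataclass
class FakeRemoteData:
    """Hypothetical stand-in for RemoteData, used only for this illustration."""

    remote_path: str

    def get_remote_path(self) -> str:
        return self.remote_path


def current_instructions(nodes: dict, computer_uuid: str) -> list:
    """Current behaviour: copy the contents of every remote folder into '.'."""
    remote_nodes = [node for node in nodes.values() if isinstance(node, FakeRemoteData)]
    return [(computer_uuid, f'{node.get_remote_path()}/*', '.') for node in remote_nodes]


def proposed_instructions(nodes: dict, computer_uuid: str) -> list:
    """Proposed behaviour: copy every remote folder as is, named after its link."""
    remote_nodes = [(name, node) for name, node in nodes.items() if isinstance(node, FakeRemoteData)]
    return [(computer_uuid, node.get_remote_path(), name) for name, node in remote_nodes]


# Made-up inputs: two remote folders plus one node that is not remote data.
nodes = {
    'previous_calc': FakeRemoteData('/scratch/run_a'),
    'other_calc': FakeRemoteData('/scratch/run_b'),
    'not_remote': 'some other node',
}

for instruction in current_instructions(nodes, 'uuid'):
    print('current: ', instruction)
for instruction in proposed_instructions(nodes, 'uuid'):
    print('proposed:', instruction)
```

With the current instructions both folders target `'.'`, so files with the same name would collide; with the proposed ones each folder gets its own target named after the link.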

Happy to make a PR :)

sphuber · 2024-11-21T10:32:29Z

I see the problem and it would be great to support it, but I don't think the proposed solution is the way to go. Most simple use cases rely on the contents of a RemoteData being copied directly into the working directory. Changing this would break that and force all these simple use cases to do more work to have the command read from a subdirectory, if that is even possible.

Instead, I think we should keep the default behavior and just make it possible for the user to specify the target directory for RemoteData inputs. This feature already exists:
https://aiida-shell.readthedocs.io/en/latest/howto.html#running-a-shell-command-with-folders-as-arguments

With the filenames argument you can specify the base directory for any file data provided by input nodes. I don't see why this couldn't or shouldn't also apply to RemoteData nodes. I quickly looked at the code and the ShellCalculation.handle_remote_data_nodes does not consider filenames whereas it probably should. There also seems to be a bug:

aiida-shell/src/aiida_shell/calculations/shell.py

Line 410 in 14866d1

self.handle_remote_data(node)

The method ShellCalculation.handle_remote_data doesn't exist and so should raise. I guess this line just never gets called currently.
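As a generic Python illustration of why such a bug can go unnoticed (this is not the aiida-shell code; `Demo`, `prepare` and the `is_remote` flag are hypothetical), a call to an undefined method only fails when the branch containing it is actually executed:

```python
class Demo:
    """Hypothetical class with a call to a method that is never defined."""

    def handle_remote_data_nodes(self, nodes):
        return [f'handled {node}' for node in nodes]

    def prepare(self, node, is_remote=False):
        if is_remote:
            # No method named `handle_remote_data` exists on this class, so this
            # line raises AttributeError, but only if this branch is reached.
            return self.handle_remote_data(node)
        return 'skipped'


demo = Demo()
print(demo.prepare('node'))  # The faulty branch is not executed, so nothing fails.

try:
    demo.prepare('node', is_remote=True)
except AttributeError as exc:
    print(f'raised: {exc}')
```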

yakutovicha · 2024-11-21T10:55:08Z

> With the filenames argument you can specify the base directory for any file data provided by input nodes. I don't see why this couldn't or shouldn't also apply to RemoteData nodes.

This solution is also ok for me 👍
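A rough sketch of how the filenames-based approach could look, under the assumption that an entry in `filenames` for a RemoteData link gives the target directory and that links without an entry keep the current default. `build_copy_instructions` and the paths are hypothetical and do not exist in aiida-shell; the actual semantics would have to be decided in the implementation.

```python
def build_copy_instructions(remote_paths: dict, filenames: dict, computer_uuid: str) -> list:
    """Build (computer_uuid, source, target) tuples for remote folders.

    Assumption: `remote_paths` maps input link names to remote folder paths and
    `filenames` optionally maps a link name to a target directory.
    """
    instructions = []
    for name, path in remote_paths.items():
        target = filenames.get(name)
        if target is None:
            # Default, unchanged: copy the folder contents into the working directory.
            instructions.append((computer_uuid, f'{path}/*', '.'))
        else:
            # Opt-in: copy the folder as is under the directory given in `filenames`.
            instructions.append((computer_uuid, path, target))
    return instructions


# Made-up example: only `previous_calc` is redirected to its own directory.
remote_paths = {'previous_calc': '/scratch/run_a', 'other_calc': '/scratch/run_b'}
filenames = {'previous_calc': 'previous_calc'}

for instruction in build_copy_instructions(remote_paths, filenames, 'uuid'):
    print(instruction)
```

This keeps the simple use cases working without changes, while letting users with several RemoteData inputs avoid name clashes by assigning each one its own directory.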
