fix design issues inside ssh.py #809

Open. Wants to merge 2 commits into base branch develop.


mpenkov (Collaborator) commented Mar 5, 2024

The functional design in ssh.py was broken.

All other modules share these design characteristics:

  • module.open: accepts module-specific keyword parameters
  • module.open_uri: accepts a URI and a transport_params dict; the function signature is common across all modules
  • module.open_uri unpacks the transport_params dict and passes it to module.open
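
The three characteristics above can be sketched roughly as follows. This is a minimal illustration and not the smart_open code: `example_open` and `example_open_uri` are hypothetical names, the stub returns a dict instead of a file object, and letting `transport_params` override values parsed from the URI is an assumption made for the sketch.

```python
import inspect
from urllib.parse import urlsplit


def example_open(path, mode, host=None, port=22, username=None,
                 password=None, connect_kwargs=None):
    """Hypothetical module-specific open with explicit keyword parameters."""
    # A real implementation would connect and return a file object.
    # The stub just echoes what it received so the call can be inspected.
    return {
        "path": path, "mode": mode, "host": host, "port": port,
        "username": username, "password": password,
        "connect_kwargs": connect_kwargs or {},
    }


def example_open_uri(uri, mode, transport_params):
    """Hypothetical open_uri: common signature, unpacks transport_params."""
    parsed = urlsplit(uri)
    from_uri = {
        "host": parsed.hostname,
        "port": parsed.port or 22,
        "username": parsed.username,
        "password": parsed.password,
    }
    # Only optional keyword parameters of example_open may be overridden.
    allowed = {
        name
        for name, param in inspect.signature(example_open).parameters.items()
        if param.default is not inspect.Parameter.empty
    }
    unknown = set(transport_params) - allowed
    if unknown:
        raise TypeError(f"unsupported transport_params: {sorted(unknown)}")
    # Assumption for this sketch: transport_params win over URI components.
    kwargs = {**from_uri, **transport_params}
    return example_open(parsed.path, mode, **kwargs)


result = example_open_uri(
    "ssh://alice@example.com:2222/data/file.txt",
    "rb",
    {"connect_kwargs": {"timeout": 5}},
)
```

The point of the unpacking step is that `example_open` never sees a `transport_params` dict; it only sees its own named parameters.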

The SSH submodule, on the other hand, violates these characteristics: ssh.open_uri passes transport_params to ssh.open as-is, without unpacking them. It looks like this snuck into the code in commit 4e67683 and was then developed further more recently in 269c3a2.

This PR brings ssh.py back in line with the common design characteristics shared by other submodules.

mpenkov (Collaborator, Author) commented Mar 5, 2024

@mrk-its and @wbeardall Can you please review?

wbeardall (Contributor) left a comment

Looks good to me, except that the docstring in ssh.py (lines 261-262) should change from

"""
If ``username`` or ``password`` are specified in *both* the uri and
``transport_params``, ``transport_params`` will take precedence
"""

to

"""
If ``username`` or ``password`` are specified *both* as function arguments 
and in ``connect_kwargs``, ``connect_kwargs`` will take precedence.
"""

    if connect_kwargs:
        connect_kwargs = copy.deepcopy(connect_kwargs)
    else:
        connect_kwargs = {}

Contributor

Can there be mutable values inside connect_kwargs? If it's a simple dict[str, str] or similar, a plain connect_kwargs = connect_kwargs.copy() would suffice.

Contributor

There can technically be mutable values in connect_kwargs; key_filename could potentially be a list or other mutable iterable (see SSHClient). Using the regular copy should be fine, as I don't think Paramiko will ever modify the provided iterable, but I think I just left it as a deepcopy as a failsafe.
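
To make the difference concrete, here is a small sketch of how a shallow copy and a deep copy behave when a value such as `key_filename` is a list. The key names and file paths are illustrative only.

```python
import copy

connect_kwargs = {"key_filename": ["~/.ssh/id_rsa"], "timeout": 10}

shallow = connect_kwargs.copy()        # new dict, same nested objects
deep = copy.deepcopy(connect_kwargs)   # new dict, new nested objects

# Rebinding a top-level key on the copy never touches the original.
shallow["timeout"] = 30

# Mutating the nested list through the original is visible in the
# shallow copy (shared list) but not in the deep copy.
connect_kwargs["key_filename"].append("~/.ssh/id_ed25519")

shares_list = shallow["key_filename"] is connect_kwargs["key_filename"]
```

So a shallow copy is enough as long as nobody mutates the nested values in place, which is the assumption discussed above.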

Contributor

I have code that sets connect_kwargs["pkey"] as an instance of a Paramiko PKey subclass, as well as connect_kwargs["sock"] to an instance of Paramiko ProxyCommand, so those are likely mutable. They definitely shouldn't change though.

Contributor

Hypothetically: if you wanted to share a sock across multiple threads (assuming sock is thread-safe), wouldn't deepcopy be unwanted here?

Here is such a scenario, but with s3. If smart_open called deepcopy on my boto client in every call to open, that would defeat the purpose.
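
A minimal sketch of that concern, using a hypothetical `FakeClient` class as a stand-in for a shared client object (it is not a boto or Paramiko type), and two hypothetical open functions that differ only in how they copy their parameters:

```python
import copy


class FakeClient:
    """Hypothetical stand-in for an expensive, shareable client object."""

    def __init__(self):
        self.calls = 0


def open_with_deepcopy(params):
    # Deep copy: the function ends up using a private clone of the client.
    params = copy.deepcopy(params)
    params["client"].calls += 1
    return params["client"]


def open_with_shallow_copy(params):
    # Shallow copy: the dict is new, but the client object is shared.
    params = dict(params)
    params["client"].calls += 1
    return params["client"]


client = FakeClient()
params = {"client": client}

used_by_deep = open_with_deepcopy(params)
used_by_shallow = open_with_shallow_copy(params)
```

With the deep copy the caller's client is never actually used, so any state it was meant to share (connections, caches) is lost on every call.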

Contributor

Might be a moot point: paramiko itself isn't thread-safe. The recommended solution is to open a new connection entirely in each thread (so, in smart_open implementation terms, ignore the cached connection). So the smart_open implementation already isn't thread-safe.

I would assume, though I can't find any documentation, that this implies ProxyCommand is also not thread-safe. Looking at the implementation of ProxyCommand, I'm not even sure what deepcopying it would do. It's basically a wrapper around a spawned subprocess, and I don't know enough about Python's (deep)copy implementation to say whether deepcopying would spawn a new subprocess for the copy, or keep the reference to the original process and hand the same subprocess to the copy.
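
For what it's worth, the general mechanics of `copy.deepcopy` can be sketched without Paramiko. The classes below are hypothetical stand-ins and say nothing definitive about ProxyCommand itself: by default, deepcopy rebuilds an object by recursively copying its attributes, which fails for attributes that cannot be copied (a `threading.Lock` is used here as an example), while a class can define `__deepcopy__` to control the behavior, for example by returning itself.

```python
import copy
import threading


class HandleWrapper:
    """Hypothetical object that owns a resource deepcopy cannot duplicate."""

    def __init__(self):
        self.lock = threading.Lock()


class SharedHandleWrapper(HandleWrapper):
    """Same idea, but opts out of copying via the __deepcopy__ hook."""

    def __deepcopy__(self, memo):
        # Every "deep copy" is the very same object.
        return self


# Default behavior: deepcopy recurses into the attributes and fails on
# the lock with a TypeError instead of silently duplicating it.
try:
    copy.deepcopy({"sock": HandleWrapper()})
    default_copy_succeeded = True
except TypeError:
    default_copy_succeeded = False

# With __deepcopy__ defined, the reference to the original is preserved.
shared = SharedHandleWrapper()
clone = copy.deepcopy({"sock": shared})
```

Which of these cases applies to a given Paramiko object depends on its attributes and on whether it defines any copy hooks, so it would need to be checked against the actual class.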

4 participants