fix design issues inside ssh.py #809
base: develop
@mrk-its and @wbeardall Can you please review?
Looks good to me, except that the docstring in ssh.py (lines 261-262) should change from
"""
If ``username`` or ``password`` are specified in *both* the uri and
``transport_params``, ``transport_params`` will take precedence
"""
to
"""
If ``username`` or ``password`` are specified *both* as function arguments
and in ``connect_kwargs``, ``connect_kwargs`` will take precedence.
"""
if connect_kwargs:
    connect_kwargs = copy.deepcopy(connect_kwargs)
else:
    connect_kwargs = {}
Can there be mutable values inside connect_kwargs? If it's a simple dict[str, str] or so, a mere connect_kwargs = connect_kwargs.copy() would suffice.
There can technically be mutable values in connect_kwargs; key_filename could potentially be a list or other mutable iterable (see SSHClient). Using the regular copy should be fine, as I don't think Paramiko will ever modify the provided iterable, but I think I just left it as a deepcopy as a failsafe.
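To make the difference concrete, here is a small sketch of what a shallow copy and a deep copy do with a nested list. The dictionary contents are made-up example values; only the `key_filename` key comes from the discussion above.

```python
import copy

# Example values only; key_filename holds a mutable list.
original = {"username": "alice", "key_filename": ["~/.ssh/id_rsa"]}

shallow = original.copy()        # new dict, same nested objects
deep = copy.deepcopy(original)   # new dict, new nested objects

# Rebinding a top-level key never affects the original in either case.
shallow["username"] = "bob"
deep["username"] = "carol"

# Mutating the nested list in place is visible through the shallow copy only.
original["key_filename"].append("~/.ssh/id_ed25519")

shallow_shares_list = shallow["key_filename"] is original["key_filename"]
deep_shares_list = deep["key_filename"] is original["key_filename"]
```

So a plain `.copy()` is enough as long as nobody mutates the nested values in place, which is the assumption being made about Paramiko here.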
I have code that sets connect_kwargs["pkey"] to an instance of a Paramiko PKey subclass, as well as connect_kwargs["sock"] to an instance of Paramiko ProxyCommand, so those are likely mutable. They definitely shouldn't change though.
Hypothetically: if you wanted to share a sock across multiple threads (assuming sock is thread-safe), wouldn't deepcopy be unwanted here?
Here is such a scenario, but with S3. If smart_open called deepcopy on my boto client in every call to open, that would defeat the purpose.
Might be a moot point, since Paramiko itself isn't thread-safe. The recommended solution is to open a new connection entirely in each thread (so, in smart_open implementation terms, ignore the cached connection). So the smart_open implementation already isn't thread-safe.
I would assume, though I can't find any documentation, that this implies ProxyCommand is also not thread-safe. Looking at the implementation of ProxyCommand, I'm not even sure what deepcopying it would do - it's basically a wrapper around a spawned subprocess, and I don't know enough about the Python (deep)copy implementation to know whether deepcopying would spawn a new subprocess for the copy, or whether it would preserve the reference to the original process and pass that same subprocess to the copy.
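As a partial answer, the default behaviour of `copy.deepcopy` for a plain Python object can be sketched like this. `FakeProxy` and `LockedProxy` are hypothetical stand-ins, not Paramiko classes, so this says nothing definite about what happens with a real ProxyCommand; it only shows that, absent a custom `__deepcopy__`, the constructor is not re-run and attributes that cannot be copied make the whole call fail.

```python
import copy
import threading


class FakeProxy:
    """Hypothetical stand-in for an object that wraps an external resource."""

    init_calls = 0

    def __init__(self, command):
        # In a real wrapper, this is where a subprocess might be spawned.
        FakeProxy.init_calls += 1
        self.command = command


proxy = FakeProxy(["ssh", "-W", "host:22", "jump"])
clone = copy.deepcopy(proxy)

# deepcopy builds the new instance without calling __init__ again,
# then deep-copies the instance attributes.
init_calls_after_copy = FakeProxy.init_calls
clone_is_distinct = clone is not proxy
command_is_copied = (
    clone.command == proxy.command and clone.command is not proxy.command
)


class LockedProxy(FakeProxy):
    """Same idea, but holding an attribute that cannot be copied."""

    def __init__(self, command):
        super().__init__(command)
        self.lock = threading.Lock()


try:
    copy.deepcopy(LockedProxy(["true"]))
    deepcopy_failed = False
except TypeError:
    # Lock objects cannot be pickled or deep-copied.
    deepcopy_failed = True
```

In other words, a deep copy would not spawn anything by re-running `__init__`; what happens to the wrapped process handle depends on whether that handle itself supports copying.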
The functional design in ssh.py was broken.
All other modules share these design characteristics:
The SSH submodule, on the other hand, violates these characteristics: ssh.open_uri passes transport_params to ssh.open as-is, without unpacking them. It looks like this snuck into the code in commit 4e67683 and was then developed further, more recently, in 269c3a2.
This PR brings ssh.py back in line with the common design characteristics shared by other submodules.
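The unpacking pattern described above can be sketched roughly as follows. This is an illustration under assumptions, not the actual smart_open code: `ssh_open` is a hypothetical stand-in for the submodule's open function, its parameter list is invented for the example, and the filtering via `inspect.signature` is just one way to do it.

```python
import inspect
from urllib.parse import urlsplit


def ssh_open(path, mode="r", host=None, user=None, password=None,
             connect_kwargs=None):
    """Hypothetical stand-in for the submodule's open(); echoes its arguments."""
    return {
        "path": path,
        "mode": mode,
        "host": host,
        "user": user,
        "password": password,
        "connect_kwargs": connect_kwargs,
    }


def open_uri(uri, mode, transport_params):
    """Parse the URI, then unpack transport_params into keyword arguments."""
    parts = urlsplit(uri)
    # Only forward keys that ssh_open accepts and that the URI does not own.
    accepted = set(inspect.signature(ssh_open).parameters) - {"path", "mode", "host"}
    kwargs = {k: v for k, v in transport_params.items() if k in accepted}
    # Fall back to the user name from the URI if none was passed explicitly.
    kwargs.setdefault("user", parts.username)
    return ssh_open(parts.path, mode, host=parts.hostname, **kwargs)


result = open_uri(
    "ssh://alice@example.com/data/file.txt",
    "r",
    {"connect_kwargs": {"timeout": 5}, "unrelated": True},
)
```

The point of the pattern is that the open function receives explicit keyword arguments rather than an opaque transport_params dict, so unknown keys are dropped before the call instead of being passed through.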