-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make wrapped C++ functions pickleable #30099
Conversation
… important); PyPy still generates 2 different kinds of errors: test_print failure on macOS with Python 3.8; Python 3.9, 3.10 have no leading `<`
…tion_or_method object`
…ith full list of values.
… `PYBIND11_PLATFORM_ABI_ID_V4` to version `function_record_PyTypeObject` `tp_name`
…xes flaky behavior of test_gil_scoped.py). IncludeCleaner fixes.
…_record_PyTypeObject)` also with a simple `static bool first_call`
…t works with all other Python versions). Instead call `function_record_PyTypeObject_PyType_Ready()` from `get_internals()`.
…_detail_function_record_import_helper` (proof of concept).
…nals()` so that it is always called when `get_internals()` is called the first time.
…ield-initializers")`
…unction_record_pickle_helper()` call-once initializations triggered from `cpp_function::initialize_generic()`
…per()` call-once initialization triggered from `cpp_function::initialize_generic()`
… to stop clang-format from breaking up a string literal.
Much simpler! (Note that the `function_record_pickle_helper()` code is NOT removed in this commit.) This approach was discovered in an attempt to solve the problem that stubgen picks up `_function_record_pickle_helper_v1`. For example (tensorflow_text/core/pybinds/tflite_registrar.pyi): ```diff +from typing import Any + +def _function_record_pickle_helper_v1(*args, **kwargs) -> Any: ... ```
pybind11_fail("UNEXPECTED CALL OF function_record_PyTypeObject_methods::tp_free_impl"); | ||
} | ||
|
||
inline PyObject *reduce_ex_impl(PyObject *self, PyObject *, PyObject *) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I worry this implementation violated the protocol: the returned function call should have created an object of the same type as the one that was reduce_ex-d. aka:
reduce = obj.__reduce_ex__()
type(reduce[0](*reduce[1])) == type(obj)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quoting from the Python documentation:
- A callable object that will be called to create the initial version of the object.
- A tuple of arguments for the callable object. An empty tuple must be given if the callable does not accept any argument.
There is no mention that the callable has to be of the same type is the object being pickled. Such a restriction would be completely artificial and severely limiting:
There are types that for good reasons are not referenced from any importable module. There is no way that pickle.load()
could find them.
This is the case for Boost.Python and pybind11 functions.
Going deeper into the weeds:
Python has __builtins__
as home for some similar types. I was thinking of adding the pybind11 function type to __builtins__
, but I was told that future Python versions will make this impossible.
Even deeper:
We could create something like __pybind11__builtins__
(I'd actually call that __pybind11_internals__
), which would have the benefit of solving problems like here, BUT, how does that spring into existence? — Python creates __builtins__
on startup. We don't have that luxury.
Also note that there may be multiple pybind11 versions / ABIs in use simultaneously in any given Python interpreter.
Conclusion: I don't think there is any practical way to implement a callable of the "correct" pybind11 function record type that pickle.load()
could find.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main concern is the type of the return value of the callable, which is supposed to be the "initial version of the object". From what I read the current approach is:
- function_recorder's reduce_ex callable returns a module object, even though there is a PyObject for function_record itself.
- Another Python side object's "callable" looks up the attr from the module object to return the callable. (I could not quite find the relevant implementation however).
I argue that step 1's callable should have returned the PyObject for function_record instead, because of the "returns initial value" requirement. As function_record has a name, and a scope, I think we should have enough information for that. The records can be either attached to a pybind11_builtin module, or the actual scope where the CCallable is attached.
Then step 2 can be adjusted accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Initial" here refers to what happens when the object is unpickled.
When the object is unpickled: the callable creates an object, then calls __getstate__
(if there is one) to mutate the "initial" object.
The step 2. is something we have no control over.
Because it is coded into the pickle implementation how it handles built-in method
objects repr(m.simple_callable)
:
<built-in method simple_callable of pybind11_detail_function_record_v1__gcc_libstdcpp_cxxabi1018 object at 0x...>
Unless we want to completely change how pybind11 functions are implemented (nanobind actually did that!), we are forced to design the __reduce_ex__
of the function_record to produce:
A callable that returns an (intermediate) object for with getattr(obj, 'simple_callable
)` returns our function object.
The intermediate object could be anything that has that behavior.
Additional important requirement: it needs to assume that the module with the function was not imported already, otherwise the pickle load step (we have to assume it's in a new process) will fail.
In other words, in or before the getattr(obj, 'simple_callable')
step the module needs to be imported, and we cannot rely on any side-effects of importing the module before we trigger the import:
Yes, the function object has a name and a scope, but the type object does not have a scope.
This will be a problem even for nanobind, which doesn't use PyCFunction
but has it's own nanobind.nb_func
type. To stay with that concrete example: unless nanobind
exists as an importable module, the pickle load stage cannot work. This is what I meant by the "spring into existence" problem mentioned before.
I was thinking of giving giving the function_record type an __init__
that could act as the callable in the pickle mechanism, but that's the exact same problem: how does it spring into existence in the pickle load stage?
Note that this PR solves that problem in a very minimalistic and clean fashion:
(<built-in function eval>, ("__import__('importlib').import_module('pybind11_tests.pickling')",))
It could be even better (avoid the eval
) if we had something like __import_module__
, which could then act as the callable directly:
(<built-in function __import_module__>, ("pybind11_tests.pickling",))
Unfortunately that does not exist. I actually tried under this PR to make such a callable myself, and to inject it into the rec->scope
when a cpp_function
is built (so that it springs into existence just in time), but that fell flat because stubgen picks it up (>500 TGP failures because we have ~130 auto-generated but checked-in stubgen files). See message of the commit that replaced the "pickle helper" callable with eval
:
I had _function_record_pickle_helper_abi_stamp
as the name at that time. I also tried __function_record_pickle_helper_abi_stamp
and __function_record_pickle_helper_abi_stamp__
but stubgen stubbornly picked it up no matter what. The eval
approach was born out of an intense search for a solution that does not make it necessary to change stubgen, or worse, 130+ checked-in auto-generated files all over the codebase.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot @rainwoodman for taking the time to go through this yesterday 1:1, and for correcting my understanding of your concern.
I captured the experimental code we worked on in 02b31b1 (exactly as it was when we left off).
Our conclusion is captured in the long comment added with bcdfc69. Could you please take another look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for adding the comment and the test. This looks good!
Logging result of simple experiment: nanobind functions are not pickleable
--- a/nanobind/tests/test_functions.py
+++ b/nanobind/tests/test_functions.py
@@ -1,9 +1,14 @@
import test_functions_ext as t
+import pickle
import pytest
import sys
import re
+def test_pickle():
+ pickle.dumps(t.test_01, protocol=-1)
+
+
def fail_fn(): # used in test_30
raise RuntimeError("Foo") |
Logging result of simple experiment: SWIG functions are pickleable
|
Cross-reference to simple experiment: Boost.Python functions are not pickleable |
Logging result of simple experiment: Python functions (built-in & native) are pickleable (unless they are nested)
|
…` and a long comment to explain the unusual behavior.
tests/test_pickling.py
Outdated
assert deserialized() == 20220426 | ||
assert deserialized is m.simple_callable | ||
|
||
# UNUSUAL: A pickle roundtrip starting with `m.simple_callable.__self__` yields `m`: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's hit a few keywords to hopefully make it more likely to land here from a google search:
# UNUSUAL: function record pickling roundtrip returns a module, not the function record object:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done: 1c24995
Thanks!
…htly differently as suggested).
…cord_pyobject.h, google#30099).
…changes). These test were added while working on google/pybind11clif#30099. Before google/pybind11clif#30099, the `SimpleCallable` pickle tests were failing with PyCLIF-pybind11. PiperOrigin-RevId: 608694227
Description
For pybind11 users:
This PR resolves pybind/pybind11#1261. For an example of the behavior difference, see the change in tests/test_pickling.py (for easy reference: that test was added with pybind/pybind11#3906).
Notes:
repr()
for pybind11 functions is made much more informative and truthful:Before this PR:
(
PyCapsule
objects do not have methods.)With this PR:
-DCMAKE_BUILD_TYPE=MinSizeRel
):Before this PR:
With this PR:
Factor: 5355256 / 5346584 = 1.0016219702150009
Motivation for this work
PyCLIF-pybind11 has to satisfy many million tests. On the order of 10 tests fail because pybind11 functions are not pickleable. While this number is small, the failures present a major problem:
Apache beam (which uses dill) requires that functions are pickleable: PyCLIF-pybind11 could not be used with beam. Similarly, some of the failing tests involve cloudpickle. It has to be expected that many of the projects depending on cloudpickle will run into issues, too.
The estimated effort to retrofit workarounds in the ~10 failing use cases is "at least one month, possibly two". This is because the use cases tend to be very complex, and because owners need to be convinced that workarounds are meaningful (which they really are not, as this PR proves).
Note also that the effort behind this PR is about one week (slightly less).
Comparison with stock Python and other systems for Python-C++ bindings:
Python functions (built-in & native) are pickleable.
SWIG functions are pickleable.
PyCLIF-C-API functions are pickleable. (Because they are implemented exactly like stock Python build-in functions.)
Boost.Python functions are not pickleable.
nanobind functions are not pickleable.
Everything below are very technical details:
Why is
not pickleable before this PR?
The reason is that the
simple_callable
is implemented as shown byrepr(simple_callable)
:To Python it appears to be a bound function of the type
PyCapsule
. (The reasons for this choice of implementation are deep and omitted here.)pickle.dumps(simple_callable)
is capable of pickling built-in methods. It does that in two steps:produces this tuple (see
__reduce_ex__
documentation):pickle then recurses, attempting to serialize
<built-in function getattr>
(no problem),<capsule object NULL at 0x...>
(problem), and'simple_callable'
(no problem).The attempt to serialize
<capsule object NULL at 0x...>
fails with this exception (copy-pasted from pytest output):While this is not the desired behavior when pickling
simple_callable
, there are very good reasons in general thatPyCapsule
objects are not pickleable. See, for example, here:The solution implemented in this PR is to
replace the
PyCapsule
with a custom built-in type, wrapping thepybind11::detail::function_record
C++ type the good-old way, with manually written bindings (implemented in include/pybind11/detail/function_record_pyobject.h),which then makes it possible to provide a
__reduce_ex__
implementation to achieve the desired behavior.The new
repr(simple_callable)
is:Side note: The Python type name is versioned to ensure ABI compatibility, but to maximize compatibility it is independent of
PYBIND11_INTERNALS_VERSION
.simple_callable.__reduce_ex__(0)
now produces:When pickle recurses to call
__reduce_ex__
of the wrappedfunction_record
object, the result is:Very simple! — Your author has to admit though that it took quite a while to figure this out. See the development history of this PR for the many dead ends explored before this solution was discovered.
To explain how this works in the
pickle.load()
step:eval("__import__('importlib').import_module('pybind11_tests.pickling')")
produces a reference to the importedpybind11_tests.pickling
module.getattr(imported_module, 'simple_callable')
simply accesses what was just imported.Note that
__import__('pybind11_tests.pickling')
only produces the top-level module (pybind11_tests
in this case), as explained in theimportlib.import_module()
documentation.Suggested changelog entry: