Bug: Can't update inside of a metadata context when iterating over a different query #541

kbolashev · 2024-10-28T09:55:26Z

Reproduction test (assuming a datasource with 1.txt..5.txt datapoints):

def test_add_in_a_different_context(datasource: "Datasource"):
    path = "1.txt"
    field = "field"

    with datasource.metadata_context() as ctx:
        ctx.update_metadata(path, {field: "old"})

    with datasource.metadata_context() as ctx:
        for dp in datasource[field].is_null().all():
            dp[field] = "new"

    res = [dp.metadata.get(field, None) for dp in datasource.all()]
    assert res == ["old", "new", "new", "new", "new"]

Right now this fails, because res == ["old", None, None, None, None]

This happens because datasource[field].is_null().all() ends up creating a new datasource object, so closing the original datasource's context never uploads the datapoints that were updated implicitly through the new object.
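A minimal sketch of the failure mode, not the actual dagshub code: `PerObjectDatasource`, `is_null_query`, `set_metadata` and `close_context` are hypothetical stand-ins, and the shared `uploaded` list plays the role of the backend. It only assumes what is described above, namely that a subquery returns a new object with its own pending updates.

```python
class PerObjectDatasource:
    """Hypothetical toy model: pending updates live on each object."""

    def __init__(self, uploaded=None):
        # Shared list standing in for the backend.
        self.uploaded = uploaded if uploaded is not None else []
        # Implicit context held by this object only.
        self.pending = []

    def is_null_query(self):
        # A subquery returns a NEW object with its own empty pending list.
        return PerObjectDatasource(self.uploaded)

    def set_metadata(self, path, metadata):
        self.pending.append((path, metadata))

    def close_context(self):
        # Uploads only what THIS object collected.
        self.uploaded.extend(self.pending)
        self.pending.clear()


ds = PerObjectDatasource()
sub = ds.is_null_query()
sub.set_metadata("2.txt", {"field": "new"})  # lands in sub.pending
ds.close_context()                           # flushes ds.pending, which is empty
print(ds.uploaded)  # []
```

The update stays stranded in `sub.pending`, which is the toy equivalent of the `None` values in the failing assertion.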

To fix this, I extracted all implicit contexts into a global dictionary keyed by the datasource ID. That way, even when you run subqueries, you always get the metadata context of the same datasource, and that context ends up being uploaded at some point.

I made sure that all the backend E2E tests pass after this change.

sdafni · 2024-10-28T13:13:16Z

dagshub/data_engine/model/datasource.py

@@ -158,6 +158,9 @@ def to_dict(self, ds: "Datasource") -> Dict[str, Any]:
        return res_dict


+_metadata_contexts: Dict[Union[int, str], "MetadataContextManager"] = {}


Too bad we need a global thing, but I can't think of any other solution.

sdafni · 2024-10-28T13:14:07Z

Where is this test, then?
Separate PR?

Extract the implicit contexts to a global dict

bbedbcf

kbolashev added the bug Something isn't working label Oct 28, 2024

kbolashev requested a review from sdafni October 28, 2024 09:55

kbolashev self-assigned this Oct 28, 2024

sdafni reviewed Oct 28, 2024

sdafni approved these changes Oct 28, 2024

kbolashev merged commit 99ec378 into master Oct 28, 2024
8 checks passed
