SONATA Python bindings should expose the schema of the file or datatypes of attributes #141

Open
matz-e opened this issue Apr 8, 2021 · 6 comments

Member

matz-e commented Apr 8, 2021

Currently, to get the schema of a SONATA file, I have to iterate over the attribute names and extract a bogus array with an empty selection for each one. It would be nice to have direct access to the column data types or the complete schema.
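
For illustration, the workaround described above might look roughly like the sketch below. It assumes only that the population object exposes `attribute_names` and `get_attribute(name, selection)` and that the returned array has a `dtype`; the function name `schema_from_empty_reads` and the `FakePopulation` stand-in are hypothetical and not part of the bindings. With the real bindings one would presumably pass an opened population and an empty selection object instead.

```python
from types import SimpleNamespace


def schema_from_empty_reads(population, empty_selection):
    """Map attribute name -> dtype by reading zero elements of each attribute.

    `population` is assumed to provide `attribute_names` and
    `get_attribute(name, selection)`; nothing else is required of it.
    """
    return {
        name: population.get_attribute(name, empty_selection).dtype
        for name in sorted(population.attribute_names)
    }


class FakePopulation:
    """Hypothetical stand-in so the sketch runs without a SONATA file."""

    _dtypes = {"mtype": "object", "x": "float32"}

    @property
    def attribute_names(self):
        return set(self._dtypes)

    def get_attribute(self, name, selection):
        # A real population would return an (empty) array here;
        # only its `dtype` matters for the schema.
        return SimpleNamespace(dtype=self._dtypes[name])


if __name__ == "__main__":
    for name, dtype in schema_from_empty_reads(FakePopulation(), None).items():
        print(name, dtype)
```

The cost of this approach is one dummy read per attribute, which is what a direct dtype accessor would avoid.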

Contributor

mgeplf commented Apr 8, 2021

By schema, do you mean dtypes?

Member Author

matz-e commented Apr 8, 2021

Yes

Contributor

mgeplf commented Apr 8, 2021

Seems to make sense to me, although I like your workaround. What would the C++ API look like?

Member Author

matz-e commented Apr 9, 2021

I think there's no one-to-one mapping. IIRC, in C++ you just request the HighFive dataset, and HDF5 will convert between what you request to read and what is on disk. Basically, it's lazy access to attributes with deferred reading, which should let you peek at the datatype.

But one could have a C++ schema that's just unordered_map<string, HighFive::DataType> (or equivalent).

(note that this should also include data types of {source,target}_node_id)
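
As a rough Python-side illustration of such a name-to-datatype map, the sketch below collects on-disk dtypes from an edge population group, including `source_node_id` and `target_node_id`. It assumes the usual SONATA layout where attributes live in a `0` subgroup of the population group, and it works on any mapping whose datasets carry a `dtype` (an opened HDF5 group from h5py would be one such mapping). The function name `edge_schema` and the demo data are hypothetical.

```python
from types import SimpleNamespace


def edge_schema(population_group, attribute_group="0"):
    """Map dataset name -> on-disk dtype for one edge population.

    `population_group` is assumed to behave like a mapping (for example
    an HDF5 group) holding `source_node_id`, `target_node_id` and an
    attribute subgroup named `attribute_group`.
    """
    schema = {}
    for key in ("source_node_id", "target_node_id"):
        if key in population_group:
            schema[key] = population_group[key].dtype
    for name, item in population_group[attribute_group].items():
        # Skip nested groups; only datasets are assumed to have a dtype.
        if hasattr(item, "dtype"):
            schema[name] = item.dtype
    return schema


if __name__ == "__main__":
    # Hypothetical in-memory stand-in for an opened population group.
    demo = {
        "source_node_id": SimpleNamespace(dtype="uint64"),
        "target_node_id": SimpleNamespace(dtype="uint64"),
        "0": {"delay": SimpleNamespace(dtype="float32"), "subgroup": {}},
    }
    print(edge_schema(demo))
```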

Contributor

mgeplf commented Apr 9, 2021

> I think there's no one-to-one mapping.

Not sure what you mean here.

> But one could have a C++ schema that's just unordered_map<string, HighFive::DataType>

Is the HighFive::DataType really what you want, though? That seems to expose an implementation detail through the interface, which I don't agree with. There is also a mismatch between what the HighFive::DataType is and what is actually returned, which I think an API consumer would find confusing.

I guess make a PR, and we can nail the details down in that.

Member Author

matz-e commented Apr 9, 2021

I don't really think we need a C++ counterpart to this: going through the call chain, we convert to whatever the "customer" requests, whereas in Python we reflect the datatype on disk. So I would just implement the Python side of this for now.
