Data-type forwarding to TripleIterator #279
Comments
I don't have a direct answer to the question of "whether there is a way to retrieve the datatypes stored inside the HDT file and have them passed (perhaps separately) through the iterator returned by the HDT::Search method." I would have to do some research to figure that out. Combining HDT storage with a SPARQL query engine is useful work, and integrating Rasqal with HDT sounds like a good approach. For reference, I have a branch of a fork of Oxigraph that uses the Oxigraph SPARQL query engine and the Rust HDT library to read HDT files. Since that implementation is in Rust rather than C, the approach will differ, but the code might show one technique for handling datatypes when going from the HDT contents to SPARQL query processing.
Thanks for your feedback, @donpellegrino. I had a look at your Rust HDT library and, overall, you're using the same or a similar approach to what I am doing at the moment: mainly string manipulation/regex. In my case it's quite fast and so far has not caused problems, but the context in which my implementation will be used (automotive industry) requires us to avoid this kind of magic trick.

Meanwhile, I've been digging further into the matter with the C++ library. I am still far from having a real solution, but I'm starting to understand what the actual problem is. In the screenshot above, the element being extracted would be the Object, required for the Triple being queried by Rasqal. CSD_PFC::extractInBlock is called with [...]. This actually returns the correct value that we're looking for, but here's the catch: the data-type suffix I'm interested in comes right after the suffix where we stopped, and the same holds for all of the common/shared prefixes. In this particular case, we start with the "0" prefix, then move forward by 14 suffixes, extract the length (which is 1), store it in tmpStr, and stop there. The result is the value "1", which is correct.

There is also another case where [...]. My guess is that this is an architectural issue: it depends on how the prefixes and suffixes are pooled, and there's no real workaround for it.
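To make the prefix/suffix discussion easier to follow, here is a minimal sketch of plain front coding as the technique is generally described. It is not the hdt-cpp `CSD_PFC` implementation: the `Block` and `FrontCodedEntry` types and the `encodeBlock` and `decodeEntry` functions are hypothetical names invented for illustration, and the real on-disk layout (how shared lengths and suffixes are serialized) is not reproduced here. In this simplified model, each entry is rebuilt by keeping a shared prefix of the previous string and appending the entry's own suffix, so anything after the closing quote of a literal would be part of that stored suffix.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical types for illustration only; not part of hdt-cpp.
struct FrontCodedEntry {
    std::size_t shared;  // number of leading characters shared with the previous string
    std::string suffix;  // remainder of the string after the shared prefix
};

struct Block {
    std::string first;                  // first string of the block, stored in full
    std::vector<FrontCodedEntry> rest;  // following strings, front-coded
};

// Front-code a list of strings (normally sorted, so neighbours share long prefixes).
Block encodeBlock(const std::vector<std::string>& strings) {
    Block block;
    if (strings.empty()) return block;
    block.first = strings[0];
    for (std::size_t i = 1; i < strings.size(); ++i) {
        const std::string& prev = strings[i - 1];
        const std::string& cur = strings[i];
        std::size_t shared = 0;
        while (shared < prev.size() && shared < cur.size() &&
               prev[shared] == cur[shared]) {
            ++shared;
        }
        block.rest.push_back({shared, cur.substr(shared)});
    }
    return block;
}

// Rebuild the string at position `offset` (0 is the block's first string).
std::string decodeEntry(const Block& block, std::size_t offset) {
    std::string current = block.first;
    for (std::size_t i = 0; i < offset && i < block.rest.size(); ++i) {
        const FrontCodedEntry& e = block.rest[i];
        // Keep only the shared prefix, then append this entry's suffix.
        current.resize(std::min(e.shared, current.size()));
        current += e.suffix;
    }
    return current;
}
```

Under these assumptions, decoding walks the block from its first string and the full term, including any `^^` part, comes back only if the whole suffix of each entry is appended. Whether `CSD_PFC` stores typed literals this way is exactly what would need to be checked in the library source.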
I tried to hack the code a bit, assuming that the datatypes are "always" right after the first common/shared prefix. So I just accessed that location directly with [...] and then appended this suffix to the [...]
It works only partially. Some results get the correct data-type suffix, while others get nothing at all. So it's unreliable too, as it depends on how the data is stored in the blocks.
Hello,
I'm using this library together with Rasqal as the SPARQL query engine. The problem is that Rasqal expects the triples to carry the correct datatype (e.g. URI, double, boolean, etc.) to do its magic, while HDT provides only the raw strings.
The question is whether there is a way to retrieve the datatypes stored inside the HDT file and have them passed (perhaps separately) through the iterator returned by the HDT::Search method.
I have been investigating this on my own, and so far I have only managed to find out that CSD_PFC is where the strings are stored. These strings also contain the datatypes I'm interested in, in textual format (e.g. "0"^^xsd:integer).
But when I retrieve the results from the iterator, even though it accesses the same (?!) methods, the portion after ^^ is gone and only "0" is returned.
Having this additional information returned would simplify the integration with Rasqal in my case. Right now I'm doing some basic string manipulation to guess which datatype might be correct, but that feels error-prone and not 100% reliable. Having the real type provided directly by the library would be safer.
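For comparison, when a term string does still carry its suffix, the datatype can be read off directly instead of guessed. The sketch below assumes terms arrive in an N-Triples-like form such as `"0"^^<http://www.w3.org/2001/XMLSchema#integer>` or `"hello"@en`. The `ParsedTerm` struct and `parseTerm` function are hypothetical helpers, not part of hdt-cpp or Rasqal, and escape sequences inside the lexical form are left untouched.

```cpp
#include <cstddef>
#include <string>

// Hypothetical holder for the parts of an RDF term string; not an hdt-cpp type.
struct ParsedTerm {
    enum class Kind { Iri, BlankNode, Literal } kind;
    std::string value;     // IRI, blank node label, or literal lexical form
    std::string datatype;  // datatype IRI or prefixed name, empty if absent
    std::string language;  // language tag, empty if absent
};

// Split a term string into its parts. Assumes N-Triples-like syntax.
ParsedTerm parseTerm(const std::string& term) {
    ParsedTerm out{ParsedTerm::Kind::Iri, term, "", ""};
    if (term.empty() || term[0] != '"') {
        if (term.rfind("_:", 0) == 0) {  // starts with "_:"
            out.kind = ParsedTerm::Kind::BlankNode;
        }
        return out;
    }
    out.kind = ParsedTerm::Kind::Literal;
    // The closing quote is the last '"' in the string, because neither a
    // datatype IRI nor a language tag may contain a double quote.
    std::size_t close = term.rfind('"');
    if (close == 0) {  // malformed: no closing quote
        out.value = term.substr(1);
        return out;
    }
    out.value = term.substr(1, close - 1);
    std::string rest = term.substr(close + 1);
    if (rest.rfind("^^", 0) == 0) {
        std::string dt = rest.substr(2);
        // Strip angle brackets around a full IRI; leave prefixed names as-is.
        if (dt.size() >= 2 && dt.front() == '<' && dt.back() == '>') {
            dt = dt.substr(1, dt.size() - 2);
        }
        out.datatype = dt;
    } else if (!rest.empty() && rest[0] == '@') {
        out.language = rest.substr(1);
    }
    return out;
}
```

With a helper like this, the caller would only fall back to guessing for plain literals such as `"0"`, where the datatype suffix is missing. That is the case described above, and it cannot be fixed on the consumer side.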