Support IRIs but what about double quotes? #41

Open
idimopoulos opened this issue Oct 13, 2023 · 4 comments
Labels
question Further information is requested

Comments

@idimopoulos

I am opening an issue because I would first like some quick feedback on whether I am thinking about this correctly, so that I do not lose too much time here. This is a kind of follow-up to #29, where we improved the support for literals and clarified a lot about which characters are supported and which are not.

Recently, I had the same issue, but for the subject. From what I can read online, RDF data supports IRIs (Internationalized Resource Identifiers). That would mean that an IRI like http://example.com/person#name="Ηλίας" is perfectly valid as a resource identifier.

2 issues here:

  1. In #29 (Successor of #25: Boost Ntriples performance) we did not update the subjects; we only handled objects. That means that all characters are currently encoded, and the IRI above is converted entirely using the old system (characters are replaced by their \0x#### version or their \u#### version).
  2. The second problem is the double quotes. While they should be perfectly fine, there is a problem when running queries, namely how the quotes can be transported, if there is a way at all. For example, in Virtuoso (Community Edition 7.2), the queries
WITH <http://my-example-graph>
INSERT { <http://example.com/person#"Ηλίας"> <http://example.com/predicates#hasName> "Ilias" }

WITH <http://my-example-graph>
INSERT { <http://example.com/person#\"Ηλίας\"> <http://example.com/predicates#hasName> "Ilias" }

WITH <http://my-example-graph>
INSERT { <http://example.com/person#\u0022Ηλίας\u0022> <http://example.com/predicates#hasName> "Ilias" }

WITH <http://my-example-graph>
INSERT { <http://example.com/person#%22Ηλίας%22> <http://example.com/predicates#hasName> "Ilias" }

either throw an error or, in the case of the last one, work, but the %22 is not decoded; it is stored as is. However, if I do this:

WITH <http://my-example-graph>
INSERT { ?identifier <http://example.com/predicates#hasName> "Ilias" }
WHERE { SELECT ?identifier WHERE {
  BIND ( IRI("http://example.com/person#\"Ηλίας\"") as ?identifier )
}}

I am able to insert the quotes into the database as part of a normal IRI. @k00ni, do you have any experience with this? Does @zozlak have any experience with how this should be handled?
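
The BIND workaround above moves the problematic IRI out of the `<...>` syntax and into a string literal, where a double quote can be escaped as `\"`. A minimal Python sketch of building such an update is shown here for illustration only. The helper names `sparql_string_literal` and `insert_with_bound_iri` are hypothetical, the graph and predicate are assumed to be plain IRIs that need no escaping, and whether the store accepts the resulting IRI is vendor-specific (it was only observed to work on Virtuoso 7.2 above).

```python
def sparql_string_literal(value: str) -> str:
    """Quote a string as a double-quoted SPARQL string literal.

    Hypothetical helper: escapes backslash, double quote and common
    control characters with the SPARQL ECHAR escapes.
    """
    escapes = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\r": "\\r", "\t": "\\t"}
    return '"' + "".join(escapes.get(ch, ch) for ch in value) + '"'


def insert_with_bound_iri(graph: str, subject: str, predicate: str, name: str) -> str:
    """Build the INSERT ... WHERE { BIND(IRI(...)) } form of the update.

    Assumption: graph and predicate contain no characters that are
    forbidden inside <...>; only the subject is passed through IRI().
    """
    return (
        f"WITH <{graph}>\n"
        f"INSERT {{ ?identifier <{predicate}> {sparql_string_literal(name)} }}\n"
        "WHERE { SELECT ?identifier WHERE {\n"
        f"  BIND ( IRI({sparql_string_literal(subject)}) AS ?identifier )\n"
        "}}"
    )


print(
    insert_with_bound_iri(
        "http://my-example-graph",
        'http://example.com/person#"Ηλίας"',
        "http://example.com/predicates#hasName",
        "Ilias",
    )
)
```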

@zozlak

zozlak commented Oct 16, 2023

Just as a note - I'm a little busy this week and will read this issue carefully only on Friday.

@idimopoulos
Author

Sure, take your time, I will deal with the current status in our project first.
Mainly, I was questioning if we should proceed with the following:

diff --git a/lib/Serialiser/Ntriples.php b/lib/Serialiser/Ntriples.php
index 4e7a8ab..4db84d8 100644
--- a/lib/Serialiser/Ntriples.php
+++ b/lib/Serialiser/Ntriples.php
@@ -209,7 +209,7 @@ class Ntriples extends Serialiser
      */
     protected function serialiseResource($res)
     {
-        $escaped = $this->escapeString($res);
+        $escaped = NtriplesUtil::escapeIri($res);
         if ('_:' == substr($res, 0, 2)) {
             return $escaped;
         } else {
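
To make the intent of the change concrete, here is a rough Python sketch of what an IRI-specific escape could do. It is an illustration, not the library's `NtriplesUtil::escapeIri`: the function name `escape_iri` is hypothetical. It follows the N-Triples `IRIREF` grammar, which forbids only U+0000 to U+0020 and the characters `<>"{}|^` plus backtick and backslash in raw form, and lets everything else (including non-ASCII such as Greek letters) pass through unchanged.

```python
import re

# Characters the N-Triples IRIREF production does not allow in raw form:
# U+0000-U+0020 plus < > " { } | ^ ` and backslash.
_FORBIDDEN_IN_IRIREF = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def escape_iri(iri: str) -> str:
    """Hypothetical helper: \\uXXXX-escape only what N-Triples forbids in <...>."""
    return _FORBIDDEN_IN_IRIREF.sub(
        lambda match: "\\u{:04X}".format(ord(match.group())), iri
    )


# Non-ASCII letters stay readable; only the double quotes are escaped.
print(escape_iri('http://example.com/person#name="Ηλίας"'))
```

This only makes the serialised line grammatically valid N-Triples. Whether a given parser or store then accepts the unescaped result as an IRI is a separate question.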

k00ni added the question label on Oct 16, 2023
@zozlak

zozlak commented Oct 19, 2023

I was questioning if we should proceed with the following:

Makes a lot of sense to me. The more I look at how it's done now in EasyRdf, the more I think it's broken :-)

When it comes to your Virtuoso issue, I'm afraid it's complex. On the normative side, the SPARQL specification says:

RDF URI references containing "<", ">", '"' (double quote), space, "{", "}", "|", "\", "^", and "`" are not IRIs. The behavior of a SPARQL query against RDF statements composed of such RDF URI references is not defined.

So basically it can be vendor-specific, and you just need to do it the way your vendor (here Virtuoso) implemented it.
Which brings us to an important remark - the serialization of IRIs and literals in SPARQL has its own rules, which are not exactly the same as those for N-Triples or Turtle (hurray! we all love the RDF ecosystem, don't we? reading RDF specifications is admittedly depressing).
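
As a rough illustration of that difference, the Python sketch below checks whether a string can be written between `<` and `>` in a SPARQL query. It assumes the SPARQL 1.1 rules as I read them: `\uXXXX` and `\UXXXXXXXX` codepoint escapes are expanded before the grammar is applied, and the `IRIREF` production then rejects U+0000 to U+0020 and the characters listed in the quote above. The function name `fits_sparql_iriref` is hypothetical, and a real endpoint may be stricter or more lenient.

```python
import re

# \uXXXX / \UXXXXXXXX codepoint escapes, expanded before grammar parsing.
_CODEPOINT_ESCAPE = re.compile(r"\\u([0-9A-Fa-f]{4})|\\U([0-9A-Fa-f]{8})")
# Characters the SPARQL IRIREF production excludes.
_FORBIDDEN_IN_IRIREF = re.compile(r'[\x00-\x20<>"{}|^`\\]')


def fits_sparql_iriref(raw: str) -> bool:
    """Hypothetical check: can `raw` appear verbatim inside <...> in SPARQL?

    Out-of-range \\U values raise ValueError in this sketch.
    """
    decoded = _CODEPOINT_ESCAPE.sub(
        lambda m: chr(int(m.group(1) or m.group(2), 16)), raw
    )
    return _FORBIDDEN_IN_IRIREF.search(decoded) is None


candidates = [
    'http://example.com/person#"Ηλίας"',
    'http://example.com/person#\\"Ηλίας\\"',
    "http://example.com/person#\\u0022Ηλίας\\u0022",
    "http://example.com/person#%22Ηλίας%22",
]
for candidate in candidates:
    print(candidate, fits_sparql_iriref(candidate))
```

Under these rules only the `%22` form passes, which is consistent with the Virtuoso behaviour reported earlier in the thread: the `\u0022` escape is expanded back into a raw double quote before the IRI is parsed, and `%22` is kept as three literal characters rather than being decoded.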

@k00ni
Member

k00ni commented Feb 13, 2024

Any news on your side?


3 participants