Solid-default link traversal takes exponential time to execute SPARQL #60

Closed
phochste opened this issue Apr 22, 2022 · 8 comments
@phochste

phochste commented Apr 22, 2022

Issue type:

  • 🐛 Bug

Description:

The solid-default configuration of the Comunica Link Traversal client takes exponential time to execute queries on an LDP container with many resources.

In my example Pod I have LDN inboxes that contain hundreds to thousands of JSON-LD resources. Every JSON-LD resource has the same structure: an "object" key that contains "subject", "relationship" (the predicate) and "object" keys. For example:

  ...
  "object": {
    "id": "e179deef-c575-45ba-8fcf-2b2fa6809311",
    "type": "Relationship",
    "relationship": "http://www.scholix.org/References",
    "subject": "https://doi.org/10.3390/en10111697",
    "object": "https://data.mendeley.com/datasets/mcgc3636xr"
  },
 ...

I would like to have a list of all such keys over all resources in an LDP container. The query I use is:

PREFIX as: <https://www.w3.org/ns/activitystreams#>

SELECT DISTINCT ?subject ?pred ?object
WHERE {
  ?id a as:Announce ;
      as:object ?x .
  ?x as:relationship ?pred ;
     as:subject ?subject ;
     as:object ?object .
}
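
As a reference for what the query is meant to return, here is a minimal Python sketch that computes the same rows directly from parsed JSON, without a SPARQL engine. The inner "object" block is copied from the example above; the outer "type": "Announce" wrapper and the helper names `extract_relationship` and `distinct_relationships` are assumptions made for illustration, not part of Comunica or of the actual Pod data.

```python
import json

# The "object" block is copied from the issue. The outer "type": "Announce"
# wrapper is an assumption inferred from the `?id a as:Announce` pattern.
notification = json.loads("""
{
  "type": "Announce",
  "object": {
    "id": "e179deef-c575-45ba-8fcf-2b2fa6809311",
    "type": "Relationship",
    "relationship": "http://www.scholix.org/References",
    "subject": "https://doi.org/10.3390/en10111697",
    "object": "https://data.mendeley.com/datasets/mcgc3636xr"
  }
}
""")


def extract_relationship(doc):
    """Return (subject, predicate, object) for an Announce, else None."""
    if doc.get("type") != "Announce":
        return None
    inner = doc.get("object")
    if not isinstance(inner, dict):
        return None
    keys = ("subject", "relationship", "object")
    if not all(k in inner for k in keys):
        return None
    return tuple(inner[k] for k in keys)


def distinct_relationships(docs):
    """Mimic SELECT DISTINCT over a collection of parsed notifications."""
    seen = set()
    for doc in docs:
        row = extract_relationship(doc)
        if row is not None:
            seen.add(row)
    return seen


# Feeding the same notification twice yields one row, as DISTINCT would.
rows = distinct_relationships([notification, notification])
print(rows)
```

This only mirrors the intended result shape (one row per distinct subject/predicate/object combination); it says nothing about how a traversal engine reaches the documents.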

I have 4 example LDP containers:

Executing the SPARQL query on resource 209 takes 17.2 seconds and returns 209 results.

Executing the SPARQL query on resource 402 is still running after 1500 seconds (170 results so far). The first results appeared after 250 seconds.
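
To put the two timings side by side, the short calculation below compares result throughput for both runs. It is only arithmetic on the figures reported above, and it assumes that the container names 209 and 402 reflect the number of resources in each container, which the issue does not state explicitly.

```python
# Figures reported in the issue.
small = {"results": 209, "seconds": 17.2}    # query finished
large = {"results": 170, "seconds": 1500.0}  # query was still running


def throughput(run):
    """Results delivered per second of wall-clock time."""
    return run["results"] / run["seconds"]


slowdown = throughput(small) / throughput(large)
# Assumption: the container names give the resource counts.
size_ratio = 402 / 209

print(f"small container: {throughput(small):.2f} results/s")
print(f"large container: {throughput(large):.2f} results/s")
print(f"throughput drop: {slowdown:.0f}x for a {size_ratio:.1f}x larger container")
```

Throughput falls from roughly 12 results/s to about 0.11 results/s, a drop of around two orders of magnitude for a container that is less than twice as large. Two data points cannot confirm exponential growth specifically, but they do show that the cost grows much faster than linearly.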


Environment:

Comunica version: 2.1.0

On Chrome 100.0.4896.127

Crash log:

@github-actions

Thanks for reporting!

@phochste
Author

On the 402 resource I see up to 90 s of network activity in the console (805 requests). After 90 s there is no new network activity.
No errors appear in the console. In a previous experiment I saw a setTimeout error pop up.

@phochste changed the title from "Solid-default linkt traversal takes exponential time to execute SPARQL" to "Solid-default link traversal takes exponential time to execute SPARQL" on Apr 22, 2022
@rubensworks
Member

I'm not surprised about this :-)
I expect the execution time to increase as more triple patterns occur in your query.

This is (most likely) due to the zero-knowledge query planner that we're using for link traversal, which produces non-optimal query plans (it's the only one that exists for traversal at the moment, so it's the best we've got).
Ideally, we'd need an adaptive query planner that re-orders join entries based on whatever comes in as intermediary results.
Related to #45 and #48.
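
To illustrate why join order matters so much, here is a toy Python sketch that evaluates the five triple patterns of the query with a naive nested-loop join and counts the intermediate bindings produced. It is not Comunica's planner or join implementation; the synthetic data, the prefixed names and the functions `make_triples`, `match` and `evaluate` are all assumptions made for this example.

```python
def make_triples(n):
    """Synthetic data: n notifications, each described by five triples."""
    triples = []
    for i in range(n):
        note, rel = f"note{i}", f"rel{i}"
        triples += [
            (note, "rdf:type", "as:Announce"),
            (note, "as:object", rel),
            (rel, "as:relationship", "scholix:References"),
            (rel, "as:subject", f"doi{i}"),
            (rel, "as:object", f"dataset{i}"),
        ]
    return triples


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    out = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or reject if it is already bound differently.
            if out.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return out


def evaluate(patterns, triples):
    """Nested-loop join in the given order.

    Returns (solutions, total number of intermediate bindings).
    """
    bindings, intermediate = [{}], 0
    for pattern in patterns:
        bindings = [
            ext
            for b in bindings
            for t in triples
            if (ext := match(pattern, t, b)) is not None
        ]
        intermediate += len(bindings)
    return bindings, intermediate


QUERY_ORDER = [
    ("?id", "rdf:type", "as:Announce"),
    ("?id", "as:object", "?x"),
    ("?x", "as:relationship", "?pred"),
    ("?x", "as:subject", "?subject"),
    ("?x", "as:object", "?object"),
]
# Same patterns, but the first two share no variable, which forces a
# Cartesian product before the connecting pattern is applied.
BAD_ORDER = [QUERY_ORDER[i] for i in (3, 0, 1, 2, 4)]

triples = make_triples(20)
good_solutions, good_cost = evaluate(QUERY_ORDER, triples)
bad_solutions, bad_cost = evaluate(BAD_ORDER, triples)
print(len(good_solutions), good_cost)
print(len(bad_solutions), bad_cost)
```

Both orders return the same 20 solutions, but the second one produces n² + 4n intermediate bindings instead of 5n. A planner with no knowledge of the data cannot tell these orders apart in advance, which is why an adaptive planner that re-orders joins based on incoming results would help.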

@rubensworks
Member

@phochste Could you make those containers public (so I can test), or test again yourself to see if the issue is any better? (I don't expect it to be fully resolved yet, but it is probably better.)

@phochste
Author

@rubensworks the demo repositories above have been made world readable again

@rubensworks
Member

I don't seem to be getting any results anymore. Perhaps the underlying data changed?

@phochste
Author

phochste commented May 27, 2023

@rubensworks No, the underlying data did not change. But I see there is an error in the JSON-LD data. I had

  "actor" : {
      "type" : "OPENAIRE",
      "id" : "https://scholexplorer.openaire.eu/#about",
      "name" : "OPENAIRE Scholexplorer"
   }

This $.actor.type is not valid, and for some reason, somewhere in the pipeline between Solid CSS and Comunica, it generates an illegal triple.

I'm changing the demo documents right now to create a valid $.actor.type.
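
A problem like this could be caught before publishing with a small pre-flight check. The sketch below flags actor types that are neither one of the five actor types defined by the Activity Vocabulary nor an absolute IRI. That acceptance rule and the function names `is_absolute_iri` and `invalid_actor_types` are assumptions made for illustration; the issue does not say which replacement value was chosen or exactly why the original value produced an illegal triple.

```python
from urllib.parse import urlparse

# Actor types defined by the Activity Vocabulary (ActivityStreams 2.0).
AS2_ACTOR_TYPES = {"Application", "Group", "Organization", "Person", "Service"}


def is_absolute_iri(value):
    """Rough check: the value has a scheme and something after it."""
    parsed = urlparse(value)
    return bool(parsed.scheme) and bool(parsed.netloc or parsed.path)


def invalid_actor_types(doc):
    """Return the $.actor.type values that fail the assumed rule."""
    actor = doc.get("actor")
    if not isinstance(actor, dict):
        return []
    types = actor.get("type", [])
    if isinstance(types, str):
        types = [types]
    return [
        t for t in types
        if t not in AS2_ACTOR_TYPES and not is_absolute_iri(t)
    ]


# Actor block copied from the comment above.
broken = {
    "actor": {
        "type": "OPENAIRE",
        "id": "https://scholexplorer.openaire.eu/#about",
        "name": "OPENAIRE Scholexplorer",
    }
}
print(invalid_actor_types(broken))
```

Running the check over every notification in an inbox before querying would surface such documents without involving the traversal engine at all.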

@rubensworks
Member

I'm going to close this issue due to non-reproducibility. Happy to re-open if needed.
