SPARQL doc page geneontology.org/docs/sparql #269

Closed
lpalbou opened this issue Jan 11, 2021 · 9 comments
Labels
wontfix This will not be worked on

Comments

@lpalbou
Contributor

lpalbou commented Jan 11, 2021

From #267 (comment) from @pgaudet :

Comment 1: I think the examples here are wrong?

Example 1: sequence-specific DNA binding should be part_of DNA binding transcription factor activity, shouldn't it?

Example 2: the DNA binding transcription factor should be linked to the downstream activities by 'positively regulates'.

Right, @vanaukenk @thomaspd?
Thanks, Pascale

From #267 (comment) from @lpalbou :

Thanks for the feedback.

sequence-specific DNA binding should be part_of DNA binding transcription factor activity, shouldn't it?

I don't think that's right. First, I am not sure we allow an activity (sequence-specific DNA binding) to be part of another activity (DNA binding transcription factor activity); see go-cam-shapes. The usual pattern is rather an activity part_of a BP. Second, the specific binding to DNA is what triggers the transcription factor activity: without it, there is no transcription. So I do see this as a causal relationship, and part_of is not a causal relation. As a note, my thesis with Dino was on nuclear receptors such as RXR, RAR, VDR, GCR and co.

Are we set on using Hsap, Cele, etc. to describe species? Again, this is a non-standard, non-intuitive representation.

I would prefer the standard UniProt convention too; I think we discussed it once. But it's a larger issue, independent of this page, as this comes from the Noctua graph and our triplestore (and probably affects quite a few other resources). Maybe create a separate project, or at least a ticket on the Minerva repo?

In the table describing the relations, you could simplify by removing the 'Description' link and making the relations themselves clickable?

I don't have a strong opinion on this; if you think that's more readable, I can change it. My intent was to make the description explicitly visible and accessible. Just note that not all IRIs resolve to a web page (these ones do); they are just identifiers.

'part_of' is a BFO term but is also present in RO; maybe 'occurs in' can also be added to RO, in which case we may be able to claim we are using a single ontology?

Currently, every part_of in GO-CAMs refers to BFO, not RO, so this documentation has to reflect that so that users can write valid queries. In the ontology world, I don't know whether it's better to state that we are using a single ontology. part_of should probably never be in two ontologies in the first place, unless we mean something different.
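To make "valid queries" concrete: a query has to match part_of by its full IRI, which for part_of is the BFO-prefixed OBO PURL `http://purl.obolibrary.org/obo/BFO_0000050`. The helper below is a hypothetical sketch (not part of any GO tool) that only builds the SPARQL text; it assumes each GO-CAM is stored as its own named graph (hence `GRAPH ?model`) and sends nothing to an endpoint.

```python
# Full IRI of part_of. Per this thread, GO-CAMs use the BFO-prefixed IRI
# (the relation itself is maintained in RO).
PART_OF = "http://purl.obolibrary.org/obo/BFO_0000050"


def part_of_query(limit: int = 10) -> str:
    """Build a SPARQL query listing part_of edges between individuals.

    Hypothetical helper; assumes one named graph per GO-CAM model.
    """
    # Doubled braces in the f-string produce literal SPARQL braces.
    return f"""SELECT ?model ?child ?parent
WHERE {{
  GRAPH ?model {{
    ?child <{PART_OF}> ?parent .
  }}
}}
LIMIT {limit}"""


if __name__ == "__main__":
    print(part_of_query(5))
```

Matching on an RO-style IRI instead would silently return no part_of edges, which is why the documentation spells out the BFO form.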

@pgaudet
Collaborator

pgaudet commented Jan 12, 2021

Tagging @vanaukenk and @ukemi who have volunteered to provide new examples.

@pgaudet
Collaborator

pgaudet commented Jan 12, 2021

About species names:
Should we have a discussion about this? @lpalbou, aren't you developing a new viewer targeted at users? Maybe this should be done for the new viewer, and we can keep doing what we are doing for the curation tool.

It would be nice to know what we want to do before opening a ticket.

Thanks, Pascale

@tmushayahama

BTW, just for info: another way to generate interactive SPARQL examples is to use the search API on the landing page, at the click of a button. Ben has made the API report which SPARQL query was used for any search, so this might help get more dynamic examples. @vanaukenk @ukemi


For example, selecting production models for species Homo sapiens created on 2021-01-20 results in the following search API query on the production server: http://barista.berkeleybop.org/search/models?offset=0&limit=50&exactdate=2021-01-20&taxon=NCBITaxon:9606&debug, which gives back both the SPARQL query and the search results.
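For scripted use, the example request can be rebuilt from its parts. This is a minimal Python sketch: the parameter names (`offset`, `limit`, `exactdate`, `taxon`, `debug`) are copied from the URL in this comment, while the helper name `build_search_url` is hypothetical, and whether the API accepts other parameters or how its response is structured is not covered here.

```python
from urllib.parse import urlencode

# Production search endpoint quoted in the comment above.
SEARCH_MODELS_URL = "http://barista.berkeleybop.org/search/models"


def build_search_url(taxon: str, exactdate: str, offset: int = 0,
                     limit: int = 50, debug: bool = True) -> str:
    """Rebuild the example search URL (hypothetical helper)."""
    params = {"offset": offset, "limit": limit,
              "exactdate": exactdate, "taxon": taxon}
    # safe=":" keeps CURIEs such as NCBITaxon:9606 unescaped, as in the example.
    query = urlencode(params, safe=":")
    if debug:
        # The example passes a bare "debug" flag; per the comment, this makes
        # the response include the SPARQL query that was generated.
        query += "&debug"
    return f"{SEARCH_MODELS_URL}?{query}"


if __name__ == "__main__":
    print(build_search_url("NCBITaxon:9606", "2021-01-20"))
```

Called with `NCBITaxon:9606` and `2021-01-20`, it reproduces the URL above character for character; fetching it (for instance with `urllib.request.urlopen`) should, according to the comment, return the SPARQL query alongside the results.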

tagging @lpalbou @cmungall @balhoff

@lpalbou
Contributor Author

lpalbou commented Feb 5, 2021

I linked the SPARQL documentation page from the Tools & Guide page, so it is now accessible from the GO site (note that it was already accessible through the SPARQL endpoint URL provided in the GO NAR article): #280

Discussions about species short names have to be handled at the GO project level, as the SPARQL endpoint and the various UIs only display the information they receive from Minerva. Quick fixes on the UI side are possible if needed but not recommended, as they could easily introduce discrepancies and inconsistencies between the various GO pages and solutions. If things are to change for species, please create the appropriate project and tickets.

@vanaukenk, please ping me if you wish to change the examples on the SPARQL doc page. However, this documentation is aimed at developers, to help them understand the underlying data model (RDF, OWL), the associated file system (TTL, triple store), and the SPARQL language and endpoint. In essence, it teaches how to create queries, independently of the current GO-CAM curation best practices, which will certainly continue to evolve over time. Since rewriting this in-depth documentation takes time, I would recommend leaving the examples as they are, as they serve their purpose: teaching how to query GO-CAMs. If you agree, I will close this ticket.

@pgaudet
Collaborator

pgaudet commented Feb 5, 2021

Hi @lpalbou,
Where is it accessible from? I cannot find the link. I would expect it to be in 'tools': http://geneontology.org/docs/tools-overview/

We really need to change the example if it still shows transcription. I updated the template I had from a couple of years ago:
http://noctua.berkeleybop.org/editor/graph/gomodel:59bee34700000179?model_id=gomodel:59bee34700000179

This is consistent with the papers we are publishing with the GREEKC consortium. Please use this model.

Thanks, Pascale

@cmungall
Member

I don't understand what the action should be for this ticket or who needs to be involved. Consider closing it and either (a) making multiple smaller actionable tickets or (b) making a superticket (see the GO GitHub guide).

It seems this is mostly a ticket about content on a page somewhere? Consider making the first comment in an issue a broad description of the problem.

Species names: I agree with @lpalbou; let us not overload this ticket. The 4-letter codes may or may not be a good idea, but if this is to be fixed, it should be fixed globally. In this case, the 4-letter codes are inserted as part of the neo build.

RO vs BFO: all our relations are in RO. Some have a BFO prefix, but they are in RO; see the RO docs. Yes, this is objectively very confusing for many people, not just GO users. But let's not try to solve that problem here.

@lpalbou
Contributor Author

lpalbou commented Feb 25, 2021

The SPARQL endpoint is referenced from the API section, from http://sparql.geneontology.org, and from the GO search.

@cmungall the proposal is to rewrite half of the technical SPARQL documentation with a GO-CAM that would better reflect newer curation practices (e.g. geneontology/go-shapes#256). For the moment, the documentation uses a production GO-CAM to illustrate how the data is linked from TTL to the triple store, to visualization, and to SPARQL.

The model suggested above is not in production and is not valid according to ShEx, so this ticket is pending an appropriate model:
[Screenshot, 2021-02-25 2:57 PM]

In addition, rather than rewriting half of a very detailed technical document that still serves its teaching purpose for the bioinformatics community, I would instead favor creating and maintaining interactive notebooks and making a better GO API.

@lpalbou
Contributor Author

lpalbou commented Feb 26, 2021

I looked a bit more at the suggested model: http://noctua.geneontology.org/editor/graph/gomodel:59bee34700000179

A couple of issues are unfolding:

@pgaudet
Collaborator

pgaudet commented Feb 26, 2021

Thanks for looking into that. I don't think I have ever put one of my models in production!

  • We can remove SO:gene for now, this is not critical for the model and it does require some discussion as to the link between that term and CHEBI:bioinformation macromolecule.
  • Otherwise, I am a bit unsure about your process for approving production models and templates. Happy to change the status, but I want to make sure everyone is happy with it.

Thanks, Pascale
