VLPB
Objective: Generate a prioritized list of candidate genes from a QTL region based on phenotype information.
Output: Semantic integration of plant genomic and phenotypic data to enable ranking of candidate genes associated with fruit ripening in tomatoes (Solanum lycopersicum).
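The first step of this objective, restricting the gene set to a QTL region, is an interval-overlap test on genomic coordinates. The following is a minimal sketch, assuming genes and the QTL interval have already been retrieved as plain records. The Gene/QtlRegion classes, the chromosome names and the coordinates are hypothetical and are not taken from SGN or Ensembl.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int  # 1-based, inclusive (assumed convention)
    end: int    # 1-based, inclusive


@dataclass(frozen=True)
class QtlRegion:
    qtl_id: str
    chrom: str
    start: int
    end: int


def genes_in_region(genes, qtl):
    """Return genes on the QTL's chromosome whose span overlaps the QTL interval."""
    return [
        g for g in genes
        if g.chrom == qtl.chrom and g.start <= qtl.end and g.end >= qtl.start
    ]


# Hypothetical records for illustration only.
genes = [
    Gene("gene_a", "chr11", 1_000, 2_500),
    Gene("gene_b", "chr11", 9_000, 9_800),
    Gene("gene_c", "chr05", 1_200, 1_900),
]
qtl = QtlRegion("qtl_x", "chr11", 2_000, 5_000)
print([g.gene_id for g in genes_in_region(genes, qtl)])  # ['gene_a']
```

Genes that only partly overlap the interval are kept, since a QTL boundary is rarely precise enough to exclude them.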
Fig 1. Biological entities and data flow in tomato SGN-LD.
Platform: Virtuoso Universal Server (OSE). The installation and deployment instructions can be found here.
Data sources:
- Non-RDF sources
  - SGN: (wild) tomato genome annotations and genetic markers (detailed description here)
  - QTLs extracted from Europe PMC articles/tables. Note: This work has been part of the candYgene NLeSC project in collaboration with Plant Breeding, Wageningen University & Research.
  - TGRC for tomato mutants (deprecated)
- RDF sources
  - Ensembl Plants: tomato genome annotations (release 33)
  - UniProt: (reference) tomato proteome (release 2016_11)
- Database cross-references
Features:
- web-based Faceted Browser on Linked Data sets:
  - (Google-like) text search (e.g. fruit quality, Myb 12, SGN-M6466)
  - entity label lookup, including genome, chromosome/location (chromosome 11), QTL (QTL:PMC4321030_4_1_54), trait (fruit ripening), genetic marker (variation gene231_0-i11), gene symbol/ID (gene Solyc11g008770.1), protein accession/ID (K4D5D7), GO term/ID (GO:0009835) and pathway (carotenoid biosynthesis)
  - entity URI lookup (e.g. http://purl.obolibrary.org/obo/TO_0002728)
- programmatic data access via a SPARQL endpoint, including some example queries & output
- Docker-ized Virtuoso server for easy on-premise deployment
- automated data ingest & reconciliation procedures, which can aid future updates of the platform when new releases of the data sources become available
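Programmatic access through the SPARQL endpoint can be sketched with the Python standard library alone. This is an illustration, not one of the platform's own example queries: the endpoint URL assumes a local Virtuoso instance on its default HTTP port (8890), and the query is a simple case-insensitive label search written with standard SPARQL 1.1 functions. The results are parsed according to the W3C SPARQL 1.1 Query Results JSON format.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Virtuoso instance on its default HTTP port; replace with
# the actual endpoint URL of your deployment.
ENDPOINT = "http://localhost:8890/sparql"


def label_search_query(text, limit=10):
    """Build a SPARQL query matching rdfs:label values that contain `text`."""
    # Escape backslashes and double quotes so the term is a valid string literal.
    escaped = text.lower().replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?s ?label WHERE {\n"
        "  ?s rdfs:label ?label .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?label)), "{escaped}"))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def parse_bindings(results_json):
    """Flatten SPARQL 1.1 JSON results into a list of {variable: value} dicts."""
    return [
        {var: cell["value"] for var, cell in row.items()}
        for row in results_json["results"]["bindings"]
    ]


def run_query(query, endpoint=ENDPOINT):
    """Send `query` via HTTP GET and return the flattened bindings."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request) as response:
        return parse_bindings(json.load(response))
```

Against a running endpoint, `run_query(label_search_query("fruit ripening"))` would return a list of subject/label pairs. A plain `CONTAINS` filter scans every label, so on large data sets a full-text index may be preferable.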
Current issues & limitations
- see this list of open (or closed) issues
- making tomato SGN data (in GFF) and QTLs from literature (in CSV) available in RDF requires manual effort aided by OpenRefine and a custom script in Python
- (non-)RDF data quality & curation (e.g. some Ensembl links to other resources)
- data licensing & re-use by private partners
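To illustrate what the GFF-to-RDF step involves, here is a minimal Python sketch that turns one GFF3 feature line (nine tab-separated columns) into N-Triples. It is not the custom script used for the platform. The base IRI and the vocabulary namespace are hypothetical placeholders, and the sketch ignores comment lines, directives and embedded FASTA sections.

```python
import urllib.parse

# Hypothetical namespaces for illustration; the platform's actual IRIs and
# vocabulary (e.g. for genomic locations) may differ.
BASE = "http://example.org/sgn/"
VOCAB = "http://example.org/vocab#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"


def parse_attributes(column):
    """Parse GFF3 column 9 (key=value pairs separated by ';', percent-encoded)."""
    attrs = {}
    for pair in column.strip().split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key] = urllib.parse.unquote(value)
    return attrs


def literal(value):
    """Serialize a plain string literal for N-Triples."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def gff3_line_to_ntriples(line):
    """Convert one GFF3 feature line into a list of N-Triples statements."""
    seqid, _source, ftype, start, end, _score, strand, _phase, attrs = (
        line.rstrip("\n").split("\t")
    )
    attributes = parse_attributes(attrs)
    # Percent-encode the feature ID so it is safe inside an IRI.
    subject = f"<{BASE}{urllib.parse.quote(attributes['ID'], safe='')}>"
    triples = [
        f"{subject} <{RDF_TYPE}> <{VOCAB}{ftype}> .",
        f"{subject} <{VOCAB}chromosome> {literal(seqid)} .",
        f'{subject} <{VOCAB}start> "{int(start)}"^^<{XSD_INTEGER}> .',
        f'{subject} <{VOCAB}end> "{int(end)}"^^<{XSD_INTEGER}> .',
        f"{subject} <{VOCAB}strand> {literal(strand)} .",
    ]
    if "Name" in attributes:
        triples.append(f"{subject} <{RDFS_LABEL}> {literal(attributes['Name'])} .")
    return triples
```

Most of the manual effort lies beyond a sketch like this: choosing stable IRIs, mapping feature types to an ontology and reconciling identifiers with the RDF sources.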
Possible extensions:
- couple the Linked Data platform with one or more algorithms to score/rank (candidate) genes associated with the trait of interest
- web interface including data visualization tailored to domain scientists
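As an illustration of the first extension, one very simple scoring scheme ranks genes by the summed weight of their annotations (e.g. GO terms) that are linked to the trait of interest. This is a hypothetical sketch, not a method of the platform. The gene IDs, term IDs and weights below are invented placeholders, and in practice they would come from SPARQL queries over the integrated data.

```python
def rank_candidates(gene_terms, trait_weights):
    """Rank genes by the summed weight of their annotations that match the trait.

    gene_terms: mapping gene ID -> set of annotation IDs (e.g. GO terms)
    trait_weights: mapping annotation ID -> relevance weight for the trait
    Returns (gene_id, score) pairs, best first; ties are broken by gene ID.
    """
    scored = [
        (gene, sum(trait_weights.get(term, 0.0) for term in terms))
        for gene, terms in gene_terms.items()
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical annotations and weights for illustration only.
gene_terms = {
    "gene_a": {"term_ripening", "term_pigment"},
    "gene_b": {"term_pigment"},
    "gene_c": {"term_other"},
}
trait_weights = {"term_ripening": 2.0, "term_pigment": 1.0}
print(rank_candidates(gene_terms, trait_weights))
# [('gene_a', 3.0), ('gene_b', 1.0), ('gene_c', 0.0)]
```

A sum of weights favours well-annotated genes. A real scoring algorithm would likely need to correct for annotation bias and include other evidence, such as distance to the QTL peak marker.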
Partner experience: (tbd VLPB)
Platform EKP: (tbd Anneke)
ODEX4all