Breed4Food
Arnold Kuzniar edited this page Mar 30, 2017
Objective: Generate a prioritized list of candidate genes from a QTL region based on phenotype information.
Output: Semantic integration of genomic and phenotypic data to enable ranking of candidate genes associated with the number of teats in pig (Sus scrofa).
Fig 1. Biological entities and data flow in pig QTLdb-LD.
Platform: Virtuoso Universal Server (OSE). The installation and deployment instructions can be found here.
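Virtuoso exposes a SPARQL endpoint over HTTP (by default at `/sparql` on port 8890), so the platform can be queried with a plain SPARQL Protocol GET request. The sketch below only builds such a request. The host, port and the keyword query (matching on `rdfs:label`) are assumptions for illustration, not the platform's actual deployment or schema.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed local deployment: Virtuoso's default SPARQL path and HTTP port.
ENDPOINT = "http://localhost:8890/sparql"

# Hypothetical query: entities whose rdfs:label contains a keyword.
QUERY_TEMPLATE = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?entity ?label WHERE {
  ?entity rdfs:label ?label .
  FILTER(CONTAINS(LCASE(STR(?label)), "%s"))
} LIMIT 10
"""


def build_sparql_request(keyword: str, endpoint: str = ENDPOINT) -> Request:
    """Build (but do not send) a SPARQL Protocol GET request."""
    # Lower-case the keyword and escape double quotes for the SPARQL string literal.
    safe = keyword.lower().replace('"', '\\"')
    url = endpoint + "?" + urlencode({"query": QUERY_TEMPLATE % safe})
    # Request results in the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})
```

Sending the request with `urllib.request.urlopen(build_sparql_request("kinase"))` and parsing the body with `json.load` would return the bindings, assuming the endpoint is reachable.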
Data sources:
- Non-RDF sources
- RDF sources
  - Ensembl: genome annotations for pig and human (release 86)
  - UniProt: (reference) pig proteome (release 2016_11)
  - Bio2RDF OMIM: human genes and genetic disorders (release 4)
- Database cross-references
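The sources above are joined through cross-references: pig genes reach OMIM disorders via their human orthologs. The sketch below shows that two-hop join on in-memory dictionaries. It assumes the ortholog and gene-to-OMIM mappings have already been extracted (e.g. from Ensembl and Bio2RDF OMIM); all IDs in the example are hypothetical placeholders.

```python
from collections import defaultdict


def link_pig_genes_to_omim(pig_to_human, human_to_omim):
    """Map pig gene IDs to OMIM entries via human orthologs.

    pig_to_human:  dict of pig gene ID -> iterable of human ortholog gene IDs
    human_to_omim: dict of human gene ID -> iterable of OMIM IDs
    """
    links = defaultdict(set)
    for pig_gene, human_genes in pig_to_human.items():
        for human_gene in human_genes:
            # Human genes without an OMIM entry contribute nothing.
            links[pig_gene].update(human_to_omim.get(human_gene, ()))
    # Keep only pig genes that reached at least one OMIM entry.
    return {gene: omim for gene, omim in links.items() if omim}


# Hypothetical placeholder IDs, not real ortholog pairs.
pig_to_human = {"pig_gene_1": ["human_gene_1"], "pig_gene_2": ["human_gene_2"]}
human_to_omim = {"human_gene_1": ["omim_1", "omim_2"]}
print(link_pig_genes_to_omim(pig_to_human, human_to_omim))
```

In the platform itself this join would presumably be a SPARQL graph pattern rather than Python dictionaries; the sketch only illustrates the logic.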
Features:
- pig genes in QTLs linked via orthologs to human genetic diseases in OMIM
- (protein-coding) gene-QTL associations based on overlapping sequence regions for all traits (as defined by ontologies used)
- web-based Faceted Browser on Linked Data sets
  - (Google-like) Text Search (e.g. vertebrae, kinase)
  - Entity Label Lookup including QTL ID, trait, breed name, chromosome (location), gene ID/symbol, protein accession or PubMed ID
  - Entity URI Lookup (e.g. http://identifiers.org/pigQTLdb/66299)
- programmatic data access via SPARQL endpoint including some example queries & output
- Docker-ized Virtuoso server for easy on-premise deployment
- automated data ingest & reconciliation procedures, which can aid in future updates of the platform when new releases of data sources become available
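Gene-QTL associations are derived from overlapping sequence regions. A minimal sketch of that overlap test follows, assuming 1-based inclusive coordinates (as in GFF) and that genes have already been filtered to protein-coding ones. It uses a naive pairwise scan; the page does not say how the platform actually computes the overlaps, and the IDs and coordinates below are hypothetical.

```python
from typing import NamedTuple


class Region(NamedTuple):
    id: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # inclusive


def overlaps(a: Region, b: Region) -> bool:
    """Two closed intervals overlap if they share a chromosome and neither ends before the other starts."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def gene_qtl_associations(genes, qtls):
    """Return (gene_id, qtl_id) pairs whose regions overlap (naive O(n*m) scan)."""
    return [(g.id, q.id) for q in qtls for g in genes if overlaps(g, q)]


# Hypothetical regions for illustration.
genes = [
    Region("g1", "1", 100, 200),
    Region("g2", "1", 300, 400),
    Region("g3", "2", 100, 200),
]
qtls = [Region("q1", "1", 150, 350)]
print(gene_qtl_associations(genes, qtls))
```

For genome-wide data, an interval tree or a sorted sweep per chromosome would avoid the quadratic scan.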
Current issues & limitations:
- see this list of open (or closed) issues
- non-coding gene-QTL associations are not computed
- making pig QTLdb data (in GFF) available in RDF requires manual effort, aided by OpenRefine and a custom Python script
- (non-)RDF data quality & curation (e.g. some Ensembl links to other resources)
- some notes on preliminary data processing & analysis
- data licensing & re-use by private partners (e.g. OMIM is NOT distributed as open access)
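Converting pig QTLdb GFF records to RDF is one of the manual steps noted above. The sketch below is not the project's OpenRefine/Python pipeline. It only shows the general idea of turning one GFF line into N-Triples, under explicit assumptions: the `QTL_ID` attribute key and the `http://example.org/vocab#` predicates are hypothetical, while the subject URI follows the identifiers.org pattern used on this page.

```python
VOCAB = "http://example.org/vocab#"  # hypothetical vocabulary namespace
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"


def parse_gff_line(line):
    """Split a GFF data line into its nine columns; attributes become a dict."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError("expected 9 tab-separated GFF columns")
    attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
    return {"seqid": cols[0], "start": int(cols[3]), "end": int(cols[4]), "attrs": attrs}


def qtl_to_ntriples(line):
    """Emit N-Triples for one QTL record (hypothetical predicates)."""
    rec = parse_gff_line(line)
    qtl_id = rec["attrs"]["QTL_ID"]  # assumed attribute key
    s = f"<http://identifiers.org/pigQTLdb/{qtl_id}>"
    return [
        f'{s} <{VOCAB}chromosome> "{rec["seqid"]}" .',
        f'{s} <{VOCAB}start> "{rec["start"]}"^^<{XSD_INT}> .',
        f'{s} <{VOCAB}end> "{rec["end"]}"^^<{XSD_INT}> .',
    ]


# Hypothetical GFF line for illustration.
example = "Chr.1\tAnimalQTLdb\tREGION\t100\t200\t.\t.\t.\tQTL_ID=66299"
for triple in qtl_to_ntriples(example):
    print(triple)
```

A production version would also need proper escaping of literals and URL-decoding of GFF attribute values.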
Possible extensions:
- add QTL-related statistics to the RDF graph (e.g. using STATO, MeSH)
- couple the Linked Data platform with one or more algorithms to score/rank candidate genes associated with the trait of interest
- web interface including data visualization tailored to domain scientists
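As a starting point for the ranking extension, the sketch below scores each gene by a weighted count of its associated QTLs and linked OMIM entries. The scoring scheme and weights are purely hypothetical assumptions, not a method proposed by the project; the inputs mirror the gene-QTL and gene-OMIM links the platform already provides.

```python
def rank_candidate_genes(gene_qtls, omim_links, w_qtl=1.0, w_omim=2.0):
    """Rank genes by a weighted score (hypothetical scoring scheme).

    gene_qtls:  dict of gene ID -> set of overlapping QTL IDs for the trait
    omim_links: dict of gene ID -> set of OMIM IDs reached via orthologs
    """
    scores = {
        gene: w_qtl * len(qtls) + w_omim * len(omim_links.get(gene, ()))
        for gene, qtls in gene_qtls.items()
    }
    # Highest score first; ties broken by gene ID for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical inputs for illustration.
gene_qtls = {"g1": {"q1"}, "g2": {"q1", "q2"}, "g3": {"q2"}}
omim_links = {"g1": {"omim_1"}}
print(rank_candidate_genes(gene_qtls, omim_links))
```

A real scoring function would likely also weigh QTL statistics (e.g. significance), which is why adding them to the RDF graph is listed as an extension.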
Partner experience: (tbd B4F)
Platform EKP: (tbd Anneke)
ODEX4all