Breed4Food

Arnold Kuzniar edited this page Mar 30, 2017 · 145 revisions

Objective: Generate a prioritized list of candidate genes from a QTL region based on phenotype information.

Output: Semantic integration of genomic and phenotypic data to enable ranking of candidate genes associated with the number of teats in pig (Sus scrofa).

Fig 1. Biological entities and data flow in pig QTLdb-LD.

Platform: Virtuoso Universal Server (OSE). The installation and deployment instructions can be found here.

Data sources:

Features:

  • pig genes in QTLs linked via orthologs to human genetic diseases in OMIM
  • (protein-coding) gene-QTL associations based on overlapping sequence regions for all traits (as defined by ontologies used)
  • web-based Faceted Browser on Linked Data sets
    • (Google-like) Text Search (e.g. vertebrae, kinase)
    • Entity Label Lookup including QTL ID, trait, breed name, chromosome (location), gene ID/symbol, protein accession or PubMed ID
    • Entity URI Lookup (e.g. http://identifiers.org/pigQTLdb/66299)
  • programmatic data access via a SPARQL endpoint, with example queries and output
  • Dockerized Virtuoso server to ease on-premises deployment
  • automated data ingest & reconciliation procedures, which can aid in future updates of the platform when new releases of data sources become available
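
As an illustration of programmatic access, the following minimal sketch builds a SPARQL query that lists all properties of the example QTL entity and encodes it as an HTTP GET request. The endpoint URL is an assumption (a local Virtuoso instance on its default port and path), and the `format` parameter is the one Virtuoso's endpoint accepts for choosing the result serialization; the helper names are hypothetical and not part of the platform.

```python
from urllib.parse import urlencode

# Assumption: a local Virtuoso instance on its default HTTP port and path.
ENDPOINT = "http://localhost:8890/sparql"


def describe_entity_query(uri: str, limit: int = 20) -> str:
    """Return a SPARQL query listing the properties and values of one entity."""
    return f"SELECT ?p ?o WHERE {{ <{uri}> ?p ?o }} LIMIT {limit}"


def request_url(endpoint: str, query: str) -> str:
    """Encode the query as a GET request that asks for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return f"{endpoint}?{urlencode(params)}"


# The entity URI is the example given in the feature list above.
query = describe_entity_query("http://identifiers.org/pigQTLdb/66299")
url = request_url(ENDPOINT, query)
print(url)
```

The resulting URL can be fetched with `urllib.request.urlopen(url)` and decoded with `json.load`. Because the query uses no vocabulary terms, it should work regardless of which ontologies the graph uses.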

Current issues & limitations:

  • see this list of open (or closed) issues
  • non-coding gene-QTL associations are not computed
  • making pig QTLdb data (in GFF) available in RDF requires manual effort, aided by OpenRefine and a custom Python script
  • (non-)RDF data quality & curation (e.g. some Ensembl links to other resources)
  • some notes on preliminary data processing & analysis
  • data licensing & re-use by private partners (e.g. OMIM is NOT distributed as open access)
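
To give a rough idea of what a GFF-to-RDF conversion involves, here is a minimal sketch that parses one nine-column GFF line and emits N-Triples. It is not the project's OpenRefine workflow or its custom script. The sample line contains made-up values, the `QTL_ID` attribute key is an assumption about the input, and the `http://example.org/vocab/` predicates are hypothetical placeholders for whatever ontology terms the platform actually uses.

```python
# Hypothetical vocabulary namespace; the real graph would use ontology terms.
VOCAB = "http://example.org/vocab/"
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"

# Illustrative values only, not a real pig QTLdb record.
SAMPLE = "7\tAnimal QTLdb\tQTL\t100\t200\t.\t.\t.\tQTL_ID=12345;Name=Teat number"


def parse_gff_line(line: str) -> dict:
    """Split a GFF line into its coordinates and key=value attributes."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    attributes = dict(
        item.split("=", 1) for item in cols[8].split(";") if "=" in item
    )
    return {
        "seqid": cols[0],
        "start": int(cols[3]),
        "end": int(cols[4]),
        "attributes": attributes,
    }


def to_ntriples(record: dict, id_key: str = "QTL_ID") -> list:
    """Emit one N-Triples statement each for chromosome, start and end."""
    qtl_id = record["attributes"][id_key]
    subject = f"<http://identifiers.org/pigQTLdb/{qtl_id}>"
    seqid, start, end = record["seqid"], record["start"], record["end"]
    return [
        f'{subject} <{VOCAB}chromosome> "{seqid}" .',
        f'{subject} <{VOCAB}start> "{start}"^^<{XSD_INT}> .',
        f'{subject} <{VOCAB}end> "{end}"^^<{XSD_INT}> .',
    ]


triples = to_ntriples(parse_gff_line(SAMPLE))
print("\n".join(triples))
```

The sketch does not escape special characters in literals or handle comment lines, both of which a real conversion would need.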

Possible extensions:

  • add QTL-related statistics to the RDF graph (e.g. using STATO, MeSH)
  • couple the Linked Data platform with one or more algorithms to score/rank (candidate) genes associated with the trait of interest
  • web interface including data visualization tailored to domain scientists
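
As a toy illustration of the scoring/ranking extension, the sketch below links genes to QTLs by overlapping sequence regions and ranks genes by the number of QTLs they overlap. The overlap count is only a placeholder score, not a method proposed by the project; all names and coordinates are hypothetical, and regions are assumed to be closed intervals on the same coordinate system.

```python
from collections import Counter
from typing import NamedTuple


class Region(NamedTuple):
    name: str
    chrom: str
    start: int  # inclusive
    end: int    # inclusive


def overlaps(a: Region, b: Region) -> bool:
    """True if both regions lie on the same chromosome and share a position."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def rank_genes(genes: list, qtls: list) -> list:
    """Rank genes by the number of overlapping QTLs (ties broken by name)."""
    counts = Counter()
    for gene in genes:
        for qtl in qtls:
            if overlaps(gene, qtl):
                counts[gene.name] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical regions, for illustration only.
qtls = [
    Region("qtl1", "7", 100, 500),
    Region("qtl2", "7", 400, 900),
    Region("qtl3", "1", 100, 500),
]
genes = [
    Region("geneA", "7", 450, 480),
    Region("geneB", "7", 600, 700),
    Region("geneC", "2", 100, 200),
]

ranking = rank_genes(genes, qtls)
print(ranking)
```

Genes that overlap no QTL are left out of the ranking. A realistic score would likely also weigh trait relevance and the QTL-related statistics mentioned above.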

Partner experience: (tbd B4F)

Platform EKP: (tbd Anneke)