This notebook demonstrates how to combine workflow provenance (information on data processing chains) with a knowledge graph to produce human-oriented and machine-oriented data summaries. We propose to leverage domain-specific annotations (EDAM ontology) from the bioinformatics tools registry Bio.Tools to automatically annotate workflow-processed data in the form of data summaries.
The whole process can be reproduced through the online platform.
- Alban Gaignard: [email protected]
- Hala Skaf-Molli: [email protected]
- Khalid Belhajjame: [email protected]
Alban Gaignard, Hala Skaf-Molli and Khalid Belhajjame. Findable and Reusable Workflow DataProducts: A Genomic Workflow Case Study. Accepted at Semantic Web Journal, 2020. http://www.semantic-web-journal.net/content/findable-and-reusable-workflow-dataproducts-genomic-workflow-case-study
Here are the main steps of this demonstration:
- Knowledge graph loading (we assume that provenance information is already available)
- Machine-oriented provenance mining queries
- Human-oriented provenance mining queries
Here is an example of a generated human-oriented data summary.
...
The file Samples/Sample1/BAM/Sample1.realign.bai results from
tool gatk2_indel_realigner-IP which Locally align two or more molecular
sequences.
It was produced in the context of Rare Coding Variants in ANGPTL6 Are
Associated with Familial Forms of Intracranial Aneurysm
...
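A summary like the one above can be obtained by filling a sentence template with values returned by a provenance query. The sketch below illustrates the idea only. The function `render_summary` and its parameter names are hypothetical, and the values are copied from the example summary rather than queried from a graph.

```python
def render_summary(file_path: str, tool: str, operation: str, study: str) -> str:
    """Fill a fixed sentence template with query bindings (hypothetical helper)."""
    return (
        f"The file {file_path} results from tool {tool} which {operation}.\n"
        f"It was produced in the context of {study}"
    )


# Values taken from the example summary; in practice they would come from
# the provenance graph and from the EDAM annotations of the tool.
bindings = {
    "file_path": "Samples/Sample1/BAM/Sample1.realign.bai",
    "tool": "gatk2_indel_realigner-IP",
    "operation": "Locally align two or more molecular sequences",
    "study": (
        "Rare Coding Variants in ANGPTL6 Are Associated with "
        "Familial Forms of Intracranial Aneurysm"
    ),
}

summary = render_summary(**bindings)
print(summary)
```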