This repository contains code for combining and harmonizing primary and secondary evidence from different sources (PubMed, ClinicalTrials.gov, CIViC, GGPONC). It consists of a database, an application server with a REST API, and a Vue.js frontend.
An online demo is available at: https://we.analyzegenomes.com/nge/
The code for the web frontend is maintained separately: https://gitlab.hpi.de/florian.borchert/nge_app
We use poetry as a build tool.
Therefore, the dependencies can be installed by running
`poetry install`
To also install the dev dependencies, run
`poetry install --with dev`
On an M1 Mac, the installation might fail due to a bug in pygraphviz; a fix can be found here.
Additionally, pre-commit is used to run a few checks and fixes before commits.
To enable these hooks, run
`poetry run pre-commit install`
The system expects the following environment variables to be set. We used a .env file placed in the root directory of the repository for this purpose:
- PUBMED_API_KEY: for accessing eUtils (only needed for populating the DB)
- UMLS_API_KEY: for downloading UMLS (needed for populating the DB and for the API)
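Before populating the database, it can help to confirm that both variables are actually set. The following is a minimal sketch, not part of this repository: the helper names are hypothetical, and the simple `KEY=VALUE` parsing is an assumption about the .env format (the project itself may load the file differently).

```python
import os
from pathlib import Path

# Variable names taken from the list above.
REQUIRED_VARS = ("PUBMED_API_KEY", "UMLS_API_KEY")


def parse_env_file(path):
    """Parse simple KEY=VALUE lines; skips blank lines and # comments.

    Returns an empty dict if the file does not exist.
    """
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(env, required=REQUIRED_VARS):
    """Return the required names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


def check_environment(env_file=".env"):
    """Merge .env values with the process environment and report gaps.

    Values from the process environment take precedence over the file.
    """
    merged = {**parse_env_file(env_file), **os.environ}
    return missing_vars(merged)
```

Calling `check_environment()` from the repository root returns the names that still need to be set; an empty list means both keys were found.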
To download the necessary data and to populate the database, run
`poetry run populate`
You can also populate individual parts of the database separately, e.g.:
`poetry run populate ggponc`
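If several parts need to be populated one after another, the command above can be wrapped in a small script. This is only a sketch under assumptions: the helper names are hypothetical, and `ggponc` is the only part name confirmed here, so any other name passed in would have to match what the `populate` command actually accepts.

```python
import subprocess


def populate_command(part=None):
    """Build the argument list for `poetry run populate [part]`."""
    command = ["poetry", "run", "populate"]
    if part:
        command.append(part)
    return command


def populate_parts(parts, runner=subprocess.run):
    """Run populate once per part, stopping at the first failure.

    `runner` is injectable so the function can be exercised without
    invoking poetry; it defaults to subprocess.run.
    """
    for part in parts:
        # check=True makes subprocess.run raise CalledProcessError
        # on a non-zero exit code, which aborts the loop.
        runner(populate_command(part), check=True)
```

For example, `populate_parts(["ggponc"])` is equivalent to running `poetry run populate ggponc` in the shell.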
Get access to the latest GGPONC release and place its contents in data/ggponc/ (or adapt the corresponding entry in config.ini).
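A quick pre-flight check can confirm that the GGPONC files are in place before running the populate step. The sketch below is hypothetical and assumes the default location data/ggponc/ relative to the repository root; if the location was changed in config.ini, pass that directory instead.

```python
from pathlib import Path


def ggponc_data_ready(data_dir="data/ggponc"):
    """Return True if data_dir exists and contains at least one entry."""
    path = Path(data_dir)
    return path.is_dir() and any(path.iterdir())
```

This only checks that the directory is non-empty; it does not validate the contents of the release.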
The code to download and process PubMed articles is released separately.
`poetry run populate` automatically identifies the latest monthly dump from AACT and downloads it if necessary.
`poetry run populate` automatically identifies the latest nightly dump from CIViC and downloads it if necessary.
To start the application server and REST API, run
`poetry run api`
An overview of the system's features and its evaluation can be found in the notebooks in the repository's root directory.
To show the documentation, run
`poetry shell`
`pdoc integration`
TODO