Home
A micro-service for handling summary statistics submissions as part of the GWAS submission/deposition service. The endpoints are not intended to be exposed publicly. Instead, they are intended to be called by the deposition backend when a sumstats submission is made.
It handles the uploaded summary statistics files, validates them, reports errors to the deposition backend and puts valid files in the queue for sumstats file harmonisation and publishing on the FTP.
It is a Flask app exposing the endpoints listed below for registering sumstats, getting sumstats validation statuses and updating sumstats. Operations with the sumstats service are done at the submission level rather than the study level (a submission can contain many studies). Celery worker(s) perform the validation tasks in the background. They can run on any machine where the app is installed and the RabbitMQ queue is reachable.
Specific documentation for GWAS Catalog: https://www.ebi.ac.uk/seqdb/confluence/display/GOCI/Summary+Statistics+Service
- Requires: RabbitMQ and Python 3.6
- Clone the repository
git clone https://github.com/EBISPOT/gwas-sumstats-service.git
cd gwas-sumstats-service
- Set up environment
virtualenv --python=python3.6 .env
source .env/bin/activate
- Install
pip install .
pip install -r requirements.txt
- Run tox to set up a RabbitMQ server, run the tests, and tear it all down.
tox
- Spin up a RabbitMQ server on the port (BROKER_PORT) specified in the config, e.g. rabbitmq-server
- Start the Flask app with gunicorn (served at http://localhost:8000)
- from gwas-sumstats-service: gunicorn -b 0.0.0.0:8000 sumstats_service.app:app --log-level=debug
- Start a celery worker for the database side
- from gwas-sumstats-service: celery -A sumstats_service.app.celery worker --loglevel=debug --queues=postval
- Start a celery worker for the validation side
- from gwas-sumstats-service: celery -A sumstats_service.app.celery worker --loglevel=debug --queues=preval
- Spin up the Flask and RabbitMQ and Celery docker containers
- clone repo as above
docker-compose build
docker-compose up
- Start up a celery worker on the machine validating and storing the files
- follow the local installation as above
- set BROKER_HOST in config.py to that of the RabbitMQ host, e.g. localhost
celery -A sumstats_service.app.celery worker --queues=preval --loglevel=debug
- First, deploy rabbitmq using helm
helm install --name rabbitmq --namespace rabbitmq --set rabbitmq.username=<user>,service.type=NodePort,service.nodePort=<port> stable/rabbitmq
- create kubernetes secrets for the ssh keys and Globus
kubectl --kubeconfig=<path to config> -n <namespace> create secret generic ssh-keys --from-file=id_rsa=<path/to/id_rsa> --from-file=id_rsa.pub=<path/to/id_rsa.pub> --from-file=known_hosts=<path/to/known_hosts>
kubectl --kubeconfig=<path to config> -n gwas create secret generic globus --from-file=refresh-tokens.json=<path/to/refresh-tokens.json>
- deploy the sumstats service
helm install --name gwas-sumstats k8chart/ --wait
- Start a celery worker from docker
docker run -it -d --name sumstats -v /path/to/data/:$INSTALL_PATH/sumstats_service/data -e CELERY_USER=<user> -e CELERY_PASSWORD=<pwd> -e QUEUE_HOST=<host ip> -e QUEUE_PORT=<port> gwas-sumstats-service:latest /bin/bash
docker exec sumstats celery -A sumstats_service.app.celery worker --loglevel=debug --queues=preval
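The docker run command above passes the queue credentials and location to the container as environment variables. How the service's own config.py consumes them is not shown here, so the following is only a sketch, under that assumption, of how such variables could be combined into an AMQP broker URL of the form Celery accepts. The function name broker_url is hypothetical; the fallback values are RabbitMQ's stock defaults.

```python
import os
from urllib.parse import quote


def broker_url(env=os.environ):
    """Compose an AMQP broker URL from the environment variables used above.

    Assumption: this mirrors the variable names from the docker command,
    not necessarily the logic in the service's config.py.
    """
    # Percent-encode credentials so characters like '@' or ':' stay unambiguous.
    user = quote(env.get("CELERY_USER", "guest"), safe="")
    password = quote(env.get("CELERY_PASSWORD", "guest"), safe="")
    host = env.get("QUEUE_HOST", "localhost")
    port = int(env.get("QUEUE_PORT", "5672"))
    # The trailing "//" selects RabbitMQ's default virtual host "/".
    return f"amqp://{user}:{password}@{host}:{port}//"


if __name__ == "__main__":
    print(broker_url())
```

Printing the URL before starting a worker is a quick way to confirm that the worker will look for the queue on the intended host and port.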
Register a submission of summary stats. This triggers the summary stats validation. A callback ID is returned, which is used to retrieve the sumstats validation status from the /v1/sum-stats/<callbackID> (GET) endpoint.
POST a payload with the sumstats metadata.
Payload object:
{
"skipValidation": <Boolean>, # Skip validation entirely, do not look for files or publish any.
"minrows": <Int>, # Minimum number of rows for the sumstats files to be deemed valid
"forceValid": <Boolean>, # Force the files to be valid
"zeroPvalue": <Boolean>, # Allow p-values of zero
"requestEntries": [{
"id": <study ID>,
"filePath": <sumstats file path>,
"md5": <md5 checksum of sumstats file>,
"assembly": <genome assembly>,
"readme": <author readme>, # optional
"entryUUID": <Globus endpoint UUID>
}]
}
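As an illustration of the payload object above, here is a minimal Python sketch that assembles it and computes the md5 checksum from a local copy of the file. The helper names (md5_of_file, build_entry, build_payload) are hypothetical and not part of the service. Using the bare file name as filePath is also an assumption; the path must be whatever the service sees on the given endpoint.

```python
import hashlib
from pathlib import Path

# Top-level options documented in the payload object.
ALLOWED_OPTIONS = {"skipValidation", "minrows", "forceValid", "zeroPvalue"}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_entry(study_id, local_file, assembly, entry_uuid, readme=None):
    """Build one item of "requestEntries" for a single study."""
    entry = {
        "id": study_id,
        # Assumption: the service sees the file under its bare name.
        "filePath": Path(local_file).name,
        "md5": md5_of_file(local_file),
        "assembly": assembly,
        "entryUUID": entry_uuid,
    }
    if readme is not None:  # "readme" is optional, so omit it when unset
        entry["readme"] = readme
    return entry


def build_payload(entries, **options):
    """Wrap the entries in a payload, rejecting undocumented options."""
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError(f"unknown option(s): {sorted(unknown)}")
    return {**options, "requestEntries": list(entries)}
```

Computing the checksum from the same file that was uploaded avoids the "md5sum did not match" validation error caused by a stale or truncated checksum.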
Example POST method:
# request
curl -i -H "Content-Type: application/json" -X POST -d '{"requestEntries":[{"id":"abc123","filePath":"formatted_test.tsv","md5":"16e89d9993cad683c3857d754595cb28","assembly":"GRCh38", "readme":"optional text", "entryUUID": "curator_sumstats"},{"id":"bcd234","filePath":"formatted_test.tsv","md5":"16e89d9993cad683c3857d","assembly":"GRCh38", "entryUUID": "curator_sumstats"}]}' http://localhost:8000/v1/sum-stats
# response
HTTP/1.0 201 CREATED
Content-Type: application/json
Content-Length: 26
Server: Werkzeug/0.15.4 Python/3.6.5
Date: Wed, 17 Jul 2019 15:15:23 GMT
{"callbackID": "TiQS2yxV"}
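The same call can be made from Python. The sketch below uses only the standard library and assumes the service is reachable at http://localhost:8000 as in the curl example. The function names are hypothetical. Building the request is kept separate from sending it so that the request can be inspected without a running service.

```python
import json
import urllib.request


def make_register_request(payload, base_url="http://localhost:8000"):
    """Build (but do not send) the POST request that registers a submission."""
    return urllib.request.Request(
        url=f"{base_url}/v1/sum-stats",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_callback_id(body):
    """Extract the callback ID from the JSON response body (bytes or str)."""
    return json.loads(body)["callbackID"]


def register_submission(payload, base_url="http://localhost:8000"):
    """Send the request and return the callback ID.

    Requires a running service at base_url; urlopen raises
    urllib.error.HTTPError on a 4xx or 5xx response.
    """
    request = make_register_request(payload, base_url)
    with urllib.request.urlopen(request) as response:
        return parse_callback_id(response.read())
```

The returned callback ID is the value to pass to the GET and PUT endpoints.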
Request the validation status for a submission of summary stats, using the callback ID returned by the POST.
Response object:
{
"callbackID": <callbackID>,
"completed": <submission validation status>, # boolean
"statusList": [ # list of studies/sumstats validation statuses within submission
{
"id": <study ID>,
"status": <validation status>, # options: "VALID"|"INVALID"|"RETRIEVING"
"error": <validation error message>, # error message string OR null
"gcst": <GWAS study accession> # optional
},
{
"id": <study ID>,
"status": <validation status>,
"error": <validation error message>,
"gcst": <GWAS study accession>
}
]
}
Example GET method (using callback id from above):
# request
curl http://localhost:8000/v1/sum-stats/TiQS2yxV
# response
{
"callbackID": "TiQS2yxV",
"completed": false,
"statusList": [
{
"id": "abc123",
"status": "VALID",
"error": null
},
{
"id": "bcd234",
"status": "INVALID",
"error": "md5sum did not match the one provided"
}
]
}
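A caller typically needs to poll this endpoint until "completed" is true and then act on any invalid studies. The following is a minimal sketch of that pattern, assuming the response has already been decoded from JSON into a dict. The function names, polling interval and attempt limit are illustrative choices, not values defined by the service.

```python
import time


def summarise_status(response):
    """Split a status response into invalid studies and still-pending ones."""
    errors = {}
    pending = []
    for study in response["statusList"]:
        if study["status"] == "INVALID":
            errors[study["id"]] = study.get("error")
        elif study["status"] == "RETRIEVING":
            pending.append(study["id"])
    return {
        "completed": response["completed"],
        "errors": errors,
        "pending": pending,
    }


def wait_until_completed(fetch_status, interval=5.0, max_attempts=60,
                         sleep=time.sleep):
    """Call fetch_status() until "completed" is true or attempts run out.

    fetch_status is any callable returning the decoded GET response,
    which keeps this function independent of the HTTP client in use.
    """
    for attempt in range(max_attempts):
        response = fetch_status()
        if response["completed"]:
            return response
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError("validation did not complete in time")
```

Applied to the example response above, summarise_status reports the md5 mismatch for study bcd234 and no pending studies.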
Update the already registered sumstats submission with GCST accessions. This request will trigger two actions:
- Assign GCSTs to the studies in the submission
- Stage the summary stats in the submission for publication on the FTP and queue for harmonisation
Payload object:
{
"pmid": <pubmed ID>, # optional
"authorName": <author name>, # optional: (FullNameStandard)
"studyList": [{
"id": <study ID>,
"gcst": <GCST accession ID>
},
{
"id": <study ID>,
"gcst": <GCST accession ID>
}
]
}
Example PUT request:
# request
curl -i -H "Content-Type: application/json" -X PUT -d ' {"studyList": [{"id": "xyz321","gcst": "GCST123456"},{"id": "abc123","gcst":"GCST234567"}]}' http://localhost:8000/v1/sum-stats/TiQS2yxV
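For callers working in Python, the PUT request can be sketched with the standard library as follows. This assumes the same base URL as the curl example. The function names are hypothetical, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_update_payload(gcst_by_study, pmid=None, author_name=None):
    """Build the PUT payload from a mapping of study ID to GCST accession."""
    payload = {
        "studyList": [
            {"id": study_id, "gcst": gcst}
            for study_id, gcst in gcst_by_study.items()
        ]
    }
    # "pmid" and "authorName" are optional, so include them only when given.
    if pmid is not None:
        payload["pmid"] = pmid
    if author_name is not None:
        payload["authorName"] = author_name
    return payload


def make_update_request(callback_id, payload, base_url="http://localhost:8000"):
    """Build (but do not send) the PUT request for a registered submission."""
    return urllib.request.Request(
        url=f"{base_url}/v1/sum-stats/{callback_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the result of make_update_request to urllib.request.urlopen sends the update; this requires a running service and a callback ID from an earlier POST.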