hattivatti is a job submission service for the INTERVENE platform. For each requested job instance, it configures compute resources to:
- transfer data to a secure working area (i.e. a bucket), decrypting it on the fly
- calculate polygenic scores using the PGS Catalog Calculator
- upload results to a different secure area (i.e. a different bucket)
- monitor and notify the backend about job status
hattivatti should be deployed on a Kubernetes cluster, and a Helm chart is available to simplify deployment.
A Python package called pyvatti runs the service. It requires some environment variables to be set; see pyvatti/src/pyvatti/config.py for a description. Set these variables in the Helm values file chart/values-example.yaml.
Then run:

```
$ helm install hattivatti . -n <namespace> -f <path_to_values.yaml>
```
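For a sense of how pyvatti consumes these settings, here is a minimal sketch assuming a pydantic-settings style config; the field names below are hypothetical, and the authoritative list lives in pyvatti/src/pyvatti/config.py:

```python
# Hypothetical sketch of pyvatti's settings, assuming pydantic-settings;
# see pyvatti/src/pyvatti/config.py for the real field names.
from pydantic_settings import BaseSettings


class Settings(BaseSettings):
    """Each field is read from an environment variable of the same name."""

    KAFKA_BOOTSTRAP_SERVER: str  # broker that job request messages arrive on
    TOWER_TOKEN: str  # Seqera token used to poll workflow status
    HELM_CHART_PATH: str = "chart/"  # where the job chart lives


settings = Settings()  # raises a validation error if a required variable is unset
```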
The sequence diagram below shows the lifecycle of a job:

```mermaid
sequenceDiagram
    Backend->>pyvatti: Consume Kafka job request
    create participant Cloud Storage
    pyvatti->>Cloud Storage: Make buckets
    create participant Nextflow Job
    pyvatti->>Nextflow Job: helm install nextflow Job (GKE)
    Nextflow Job->>Cloud Batch: Launch workers
    Nextflow Job->>Seqera: Monitoring
    loop Every minute
        pyvatti->>Seqera: Poll job status
        Note over Seqera,pyvatti: Until error or success
    end
    Seqera->>pyvatti: Job status: success
    pyvatti->>Backend: Produce Kafka notify msg
    destroy Nextflow Job
    pyvatti->>Nextflow Job: helm uninstall
    destroy Cloud Storage
    pyvatti->>Cloud Storage: Clean up
```
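The polling loop in the diagram might look something like the sketch below; the Seqera endpoint, token handling, and state names are assumptions for illustration, not pyvatti's actual client code:

```python
# Minimal sketch of the status-polling loop shown in the diagram above.
# The Seqera API path and workflow states are assumptions for illustration.
import time

import requests

TERMINAL_STATES = {"SUCCEEDED", "FAILED"}


def poll_workflow(workflow_id: str, token: str) -> str:
    """Poll Seqera once a minute until the workflow reaches a terminal state."""
    url = f"https://api.cloud.seqera.io/workflow/{workflow_id}"
    headers = {"Authorization": f"Bearer {token}"}
    while True:
        response = requests.get(url, headers=headers, timeout=30)
        response.raise_for_status()
        status = response.json()["workflow"]["status"]
        if status in TERMINAL_STATES:
            return status  # the caller produces the Kafka notify message
        time.sleep(60)
```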
The basic idea is that the Python application:

- builds a Helm values file from a Kafka message
- creates compute resources, including:
  - buckets for doing work
  - a job installed on the local Kubernetes cluster (helmvatti) using the helm CLI (see the sketch after this list)
- monitors and manages the resources for the lifetime of the job
- sends Kafka messages when a workflow status changes
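A condensed sketch of that launch step, with hypothetical message fields and bucket naming (the real values schema belongs to the helmvatti chart):

```python
# Hypothetical sketch: turn a Kafka job request into a helmvatti release.
import subprocess
import tempfile

import yaml


def launch_job(job_request: dict, namespace: str) -> None:
    """Render a Helm values file from the job request and install the chart."""
    values = {
        "jobId": job_request["id"],
        "workBucket": f"work-{job_request['id']}",  # bucket for doing work
        "resultsBucket": f"results-{job_request['id']}",  # bucket for results
    }
    with tempfile.NamedTemporaryFile("w", suffix=".yaml", delete=False) as f:
        yaml.safe_dump(values, f)
    # Shell out to the helm CLI, as pyvatti does
    subprocess.run(
        ["helm", "install", job_request["id"], "helmvatti/", "-n", namespace, "-f", f.name],
        check=True,
    )
```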