
hattivatti

hattivatti is a job submission service for the INTERVENE platform. For each requested job instance it configures compute resources to:

  • transfer data to a secure area (i.e. a bucket), decrypting it on the fly
  • calculate polygenic scores using the PGS Catalog Calculator
  • upload results to a different secure area (i.e. a different bucket)
  • monitor and notify the backend about job status
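
The job lifecycle behind that last point is not spelled out here, but a minimal sketch may help. The state names below are illustrative assumptions, not the service's actual schema:

```python
from enum import Enum

class JobState(str, Enum):
    """Hypothetical job states; the real service may name these differently."""
    REQUESTED = "requested"  # job request received from the backend
    CREATED = "created"      # buckets and compute resources provisioned
    DEPLOYED = "deployed"    # workflow running
    SUCCEEDED = "succeeded"  # results uploaded, backend notified
    FAILED = "failed"        # error reported to the backend

# Monitoring stops once a job reaches a terminal state
TERMINAL_STATES = {JobState.SUCCEEDED, JobState.FAILED}
```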

hattivatti should be deployed on a Kubernetes cluster, and a Helm chart is available to simplify deployment.

Configuration and deployment

A Python package called pyvatti runs the service. It requires some environment variables to be set; see pyvatti/src/pyvatti/config.py for a description of each one.

Set these variables in the helm values file chart/values-example.yaml.
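
A typical way to wire up this kind of environment-driven configuration is a pydantic-settings class. The sketch below is illustrative only; the field names are assumptions rather than pyvatti's actual variables (check config.py for those):

```python
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    """Hypothetical sketch; see pyvatti/src/pyvatti/config.py for the real fields."""
    kafka_bootstrap_server: str       # broker to consume job requests from
    gcp_project: str                  # project used for buckets and Cloud Batch
    poll_interval_seconds: int = 60   # how often to poll workflow status

# Each field is read from a matching environment variable,
# e.g. KAFKA_BOOTSTRAP_SERVER, GCP_PROJECT, POLL_INTERVAL_SECONDS
settings = Settings()
```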

Then run:

```sh
$ helm install hattivatti . -n <namespace> -f <path_to_values.yaml>
```

Deploying jobs with Google Cloud Platform

```mermaid
sequenceDiagram
    Backend->>pyvatti: Consume Kafka job request
    create participant Cloud Storage
    pyvatti->>Cloud Storage: Make buckets
    create participant Nextflow Job
    pyvatti->>Nextflow Job: helm install nextflow Job (GKE)
    Nextflow Job->>Cloud Batch: Launch workers
    Nextflow Job->>Seqera: Monitoring
    loop Every minute
        pyvatti->>Seqera: Poll job status
        Note over Seqera,pyvatti: Until error or success
    end
    Seqera->>pyvatti: Job status: success
    pyvatti->>Backend: Produce Kafka notify msg
    destroy Nextflow Job
    pyvatti->>Nextflow Job: helm uninstall
    destroy Cloud Storage
    pyvatti->>Cloud Storage: Clean up
```

The basic idea is that the Python application:

  • builds a Helm values file from a Kafka message
  • creates compute resources, including:
    • buckets for doing work
    • a job installed on the local Kubernetes cluster (helmvatti) via the helm CLI
  • monitors and manages these resources for the lifetime of the job
  • sends Kafka messages when the workflow status changes
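
Putting those steps together, the control flow might look roughly like this. This is a hedged sketch, not pyvatti's actual code: the helper functions are stubs invented for illustration, and only the helm commands mirror the diagram above:

```python
import subprocess
import time

POLL_INTERVAL = 60  # the diagram polls Seqera once a minute

# Hypothetical helpers, stubbed so the sketch stands alone
def render_helm_values(req: dict) -> str: ...   # build a values file from the Kafka message
def make_buckets(req: dict) -> None: ...        # working + results buckets
def delete_buckets(req: dict) -> None: ...
def notify_backend(job_id: str, state: str) -> None: ...  # produce Kafka notify msg

def poll_seqera_status(job_id: str) -> str:
    return "success"  # stub; the real service queries the Seqera Platform API

def run_job(req: dict) -> None:
    """Illustrative lifecycle of one job, mirroring the sequence diagram."""
    values_path = render_helm_values(req)
    make_buckets(req)
    # Install the Nextflow job chart (helmvatti) on the local cluster
    subprocess.run(["helm", "install", req["id"], "helmvatti", "-f", values_path], check=True)
    try:
        # Poll until the workflow reaches a terminal state
        while (state := poll_seqera_status(req["id"])) not in ("success", "error"):
            time.sleep(POLL_INTERVAL)
        notify_backend(req["id"], state)
    finally:
        # Tear down the job and clean up storage
        subprocess.run(["helm", "uninstall", req["id"]], check=True)
        delete_buckets(req)
```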