- BigQuery tables: `transactions`, `errors`, `dedupe_state`, `transaction_types`
- PubSub topic for transactions
- GCS bucket: used for Dataflow templates, staging, and as the temp location
- ETL Pipeline from PubSub to BigQuery:
  - PubSub subscription
  - Service account with the following roles: BigQuery Data Editor, Dataflow Worker, Pub/Sub Subscriber, and Storage Admin
- Deduplication Task:
  - Service account with the following roles: BigQuery Data Editor, BigQuery Job User, and Monitoring Metric Writer
- Mirror Importer:
  - Service account with the following role: Pub/Sub Publisher
- (Optional) ETL Pipeline from PubSub to GCS:
  - GCS bucket for the output of the pipeline
  - Service account with the following roles: Dataflow Worker, Pub/Sub Editor (for creating the subscription), and Storage Admin
Resource creation can be automated using the `setup-gcp-resources.sh` script. The Google Cloud SDK is required to run it.
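For reference, a minimal sketch of the kind of commands such a setup involves is shown below. All names (`my-project`, `hedera_etl`, `etl-bigquery-sa`, the bucket) are illustrative placeholders, not values used by the script:

```bash
# Illustrative sketch only -- names are placeholders; prefer setup-gcp-resources.sh
# for the actual, supported setup.
PROJECT_ID=my-project

# PubSub topic for transactions
gcloud pubsub topics create transactions --project "${PROJECT_ID}"

# GCS bucket for Dataflow templates, staging, and temp files
gsutil mb -p "${PROJECT_ID}" "gs://${PROJECT_ID}-dataflow"

# BigQuery dataset to hold the tables (table schemas omitted here)
bq mk --project_id "${PROJECT_ID}" hedera_etl

# Service account for the PubSub-to-BigQuery pipeline, with one role bound;
# the remaining roles from the list above are bound the same way.
gcloud iam service-accounts create etl-bigquery-sa --project "${PROJECT_ID}"
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --member "serviceAccount:etl-bigquery-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role roles/bigquery.dataEditor
```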
- Deploy ETL pipeline

  Use the `deploy-etl-pipeline.sh` script to deploy the ETL pipeline to GCP Dataflow.
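  If the pipeline is staged as a Dataflow template in the GCS bucket above, a template-based launch looks roughly like the sketch below; the job, project, bucket, and template names are placeholders, and the script's actual mechanism may differ:

  ```bash
  # Rough sketch of launching a templated Dataflow job -- all names are placeholders.
  gcloud dataflow jobs run etl-pipeline \
    --project my-project \
    --region us-central1 \
    --gcs-location gs://my-project-dataflow/templates/etl-pipeline \
    --staging-location gs://my-project-dataflow/staging
  ```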
- Deploy Deduplication task

  TODO
- Deploy Hedera Mirror Node Importer to publish transactions to the PubSub topic. See Mirror Nodes installation and configuration for more details.
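  As a rough illustration only: the Importer is a Spring Boot application, so configuration is typically supplied via properties or environment variables. `SPRING_CLOUD_GCP_PROJECT_ID` and `GOOGLE_APPLICATION_CREDENTIALS` are standard Spring Cloud GCP and Google credential settings, while the topic variable below is a hypothetical placeholder; the real keys are in the Mirror Node documentation:

  ```bash
  # Illustrative only -- PUBSUB_TOPIC_NAME is a hypothetical placeholder; see the
  # Mirror Node docs for the importer's actual configuration keys.
  export SPRING_CLOUD_GCP_PROJECT_ID=my-project           # project hosting the topic
  export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json # SA with Pub/Sub Publisher role
  export PUBSUB_TOPIC_NAME=transactions                   # hypothetical topic setting

  java -jar hedera-mirror-importer.jar                    # jar name is illustrative
  ```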