- BigQuery tables: `transactions`, `errors`, `dedupe_state`, `transaction_types`
- PubSub topic for transactions
- GCS bucket: used for Dataflow templates, staging, and as the temp location
- ETL Pipeline from PubSub to BigQuery:
  - PubSub subscription
  - Service account with the following roles: BigQuery Data Editor, Dataflow Worker, Pub/Sub Subscriber, and Storage Admin
- Deduplication Task:
  - Service account with the following roles: BigQuery Data Editor, BigQuery Job User, and Monitoring Metric Writer
- Mirror Importer:
  - Service account with the following role: Pub/Sub Publisher
- (Optional) ETL Pipeline from PubSub to GCS:
  - GCS bucket for the output of the pipeline
  - Service account with the following roles: Dataflow Worker, Pub/Sub Editor (for creating the subscription), and Storage Admin
Resource creation can be automated using the `setup-gcp-resources.sh` script. The Google Cloud SDK is required to run it.
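For reference, a minimal sketch of the kind of commands such a setup involves is shown below. All names (`my-project`, `hedera_etl`, `etl-bigquery-sa`, the bucket) are illustrative placeholders, not values used by the script:

```bash
# Illustrative sketch only -- names are placeholders; prefer setup-gcp-resources.sh
# for the actual, supported setup.
PROJECT_ID=my-project

# PubSub topic for transactions
gcloud pubsub topics create transactions --project "${PROJECT_ID}"

# GCS bucket for Dataflow templates, staging, and temp files
gsutil mb -p "${PROJECT_ID}" "gs://${PROJECT_ID}-dataflow"

# BigQuery dataset to hold the tables (table schemas omitted here)
bq mk --project_id "${PROJECT_ID}" hedera_etl

# Service account for the PubSub-to-BigQuery pipeline, with one role bound;
# the remaining roles from the list above are bound the same way.
gcloud iam service-accounts create etl-bigquery-sa --project "${PROJECT_ID}"
gcloud projects add-iam-policy-binding "${PROJECT_ID}" \
  --member "serviceAccount:etl-bigquery-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
  --role roles/bigquery.dataEditor
```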
- Deploy ETL pipeline

  Use the `deploy-etl-pipeline.sh` script to deploy the ETL pipeline to GCP Dataflow.
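  If the pipeline is staged as a Dataflow template in the GCS bucket above, a template-based launch looks roughly like the sketch below; the job, project, bucket, and template names are placeholders, and the script's actual mechanism may differ:

  ```bash
  # Rough sketch of launching a templated Dataflow job -- all names are placeholders.
  gcloud dataflow jobs run etl-pipeline \
    --project my-project \
    --region us-central1 \
    --gcs-location gs://my-project-dataflow/templates/etl-pipeline \
    --staging-location gs://my-project-dataflow/staging
  ```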
- Deploy Deduplication task

  TODO
- Deploy Hedera Mirror Node Importer to publish transactions to the PubSub topic. See Mirror Nodes installation and configuration for more details.
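  As a rough illustration only: the Importer is a Spring Boot application, so configuration is typically supplied via properties or environment variables. `SPRING_CLOUD_GCP_PROJECT_ID` and `GOOGLE_APPLICATION_CREDENTIALS` are standard Spring Cloud GCP and Google credential settings, while the topic variable below is a hypothetical placeholder; the real keys are in the Mirror Node documentation:

  ```bash
  # Illustrative only -- PUBSUB_TOPIC_NAME is a hypothetical placeholder; see the
  # Mirror Node docs for the importer's actual configuration keys.
  export SPRING_CLOUD_GCP_PROJECT_ID=my-project           # project hosting the topic
  export GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json # SA with Pub/Sub Publisher role
  export PUBSUB_TOPIC_NAME=transactions                   # hypothetical topic setting

  java -jar hedera-mirror-importer.jar                    # jar name is illustrative
  ```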