CI/CD pipeline

This document describes the CI/CD pipeline that powers the Bank of Anthos production instance (hosted at https://cymbal-bank.fsi.cymbal.dev/) and explains how you can deploy it in your own Google Cloud project (with your own domain name).

Note: This is a more advanced view of Bank of Anthos and is not required to deploy Bank of Anthos or any of its other deployment options. It showcases how to build a full end-to-end CI/CD environment with Cloud Build and Cloud Deploy, with the help of Terraform, Skaffold, and Kustomize. The end result is a multi-stage pipeline with multiple environments (development, staging, and production) that features Google Kubernetes Engine (GKE), Anthos Config Management (ACM), Anthos Service Mesh (ASM), and Cloud SQL.

What does this solution contain?

The CI/CD pipeline set-up includes:

  • Terraform scripts for all Google Cloud resources
  • 3 GKE Autopilot clusters in a fleet
  • 1 Cloud Build trigger for GitHub PRs
  • 6 Cloud Build triggers for staging (1 per service)
  • 2 Cloud SQL databases (1 for staging, 1 for production)
  • 2-stage Cloud Deploy pipelines (staging and production)
  • Anthos Config Management set-up for staging and production
  • Anthos Service Mesh set-up for staging and production
  • Artifact Registry repository for container images
  • Cloud Storage bucket for Terraform state
  • Cloud Storage bucket for ledger monolith artifacts
  • IAM bindings and service accounts

This results in:

  • CI per service, with a Skaffold profile per environment
  • CD per service, with a Skaffold profile per environment
  • Development environment:
    • GKE Autopilot (one namespace per deployment)
    • ACM for base setup
    • In-cluster databases
    • Deployed from Cloud Build
  • Staging environment:
    • GKE Autopilot
    • ACM for base setup
    • Anthos Service Mesh (namespace: bank-of-anthos-staging)
    • Cloud SQL database
    • Deployed from Cloud Deploy
  • Production environment:
    • GKE Autopilot
    • ACM for base setup
    • Anthos Service Mesh (namespace: bank-of-anthos-production)
    • Cloud SQL database
    • Deployed from Cloud Deploy
  • Use of Kustomize components and Skaffold profiles to keep the configuration DRY (see the example after this list)
  • Minimal service account permissions
  • Cloud Foundation Toolkit for GKE
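As an illustration of the Kustomize and Skaffold point above, the same service can be rendered for different environments simply by switching Skaffold profiles, which in turn select the matching Kustomize overlays. The sketch below uses a frontend module as an example; the module name is an assumption based on the repository layout, so adjust it if it differs in your checkout.

    # Render the frontend manifests for two environments from the same configuration,
    # letting the profile pick the corresponding Kustomize overlay
    skaffold render --module=frontend -p staging
    skaffold render --module=frontend -p production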

Set-up instructions

Prerequisites

To deploy the CI/CD pipeline, you need:

  • A Google Cloud project connected to an active billing account
  • A domain name for the production deployment
  • The gcloud, kubectl, skaffold, and terraform command-line tools
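Before continuing, you can confirm that these tools are installed by printing their versions (standard version commands, shown here for convenience):

    gcloud --version
    kubectl version --client
    skaffold version
    terraform version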
  1. Clone the GitHub repository.

    git clone https://github.com/GoogleCloudPlatform/bank-of-anthos

    Note: If you are deploying this pipeline in your own Google Cloud project, it is preferable to fork the repository first so that you can commit the variable changes described below.

  2. Set environment variables.

    export PROJECT_ID="YOUR_PROJECT_ID"
    export REGION="YOUR_REGION"
    export DOMAIN="YOUR_DOMAIN"

Setting up the GitHub repository connection

  1. Set up a repository connection in Cloud Build:
    1. Open Cloud Build in Cloud Console (enable its API if needed).
    2. Navigate to Triggers and set Region to your preferred region.
    3. Click on Manage repositories.
    4. Click on Connect repository and follow the UI. Do not create a trigger.
  2. [Optional] If your Google Cloud organization has the compute.vmExternalIpAccess constraint in place, you can reset it at the project level:
    gcloud org-policies reset constraints/compute.vmExternalIpAccess --project=$PROJECT_ID
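If you are not sure whether the constraint applies to your project, you can inspect the effective policy first; the command below is a standard gcloud org-policies call and is not part of the original steps:

    # Show the effective org policy for this constraint on the project
    gcloud org-policies describe constraints/compute.vmExternalIpAccess --project=$PROJECT_ID --effective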

Updating placeholder variables

These steps are necessary for all Google Cloud projects that are not bank-of-anthos-ci. If you are deploying the CI/CD pipeline for the official Bank of Anthos deployment, skip this section.

  1. Replace all occurrences of bank-of-anthos-ci in the ACM configuration and Terraform scripts with your Google Cloud project ID.
    # run from repository root
    find iac/acm-multienv-cicd-anthos-autopilot/* -type f -exec sed -i 's/bank-of-anthos-ci/'"$PROJECT_ID"'/g' {} +
    find iac/tf-multienv-cicd-anthos-autopilot/* -type f -exec sed -i 's/bank-of-anthos-ci/'"$PROJECT_ID"'/g' {} +
  2. Replace all occurrences of us-central1 in the ACM configuration and Terraform scripts with your preferred region.
    # run from repository root
    find iac/acm-multienv-cicd-anthos-autopilot/* -type f -exec sed -i 's/us-central1/'"$REGION"'/g' {} +
    find iac/tf-multienv-cicd-anthos-autopilot/* -type f -exec sed -i 's/us-central1/'"$REGION"'/g' {} +
  3. Replace all occurrences of bank-of-anthos-tf-state in the Terraform scripts with the name of your Terraform state bucket (${PROJECT_ID}-boa-tf-state, created in the next section).
    # run from repository root
    find iac/tf-multienv-cicd-anthos-autopilot/* -type f -exec sed -i 's/bank-of-anthos-tf-state/'"$PROJECT_ID-boa-tf-state"'/g' {} +
  4. Replace all occurrences of cymbal-bank.fsi.cymbal.dev in the frontend configuration with your domain name.
    # run from repository root
    find src/frontend/* -type f -exec sed -i 's/cymbal-bank.fsi.cymbal.dev/'"$DOMAIN"'/g' {} +
  5. Commit and push your changes to your repository.
    git add .
    git commit -m "Substitute project ID, region, and domain references in ACM config"
    git push
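As an optional sanity check (not part of the original steps), you can grep for any remaining placeholder values before relying on the pipeline:

    # Each command prints either the remaining matches or a confirmation that none are left
    grep -rE "bank-of-anthos-ci|bank-of-anthos-tf-state" iac/ || echo "iac/ is clean"
    grep -r "cymbal-bank.fsi.cymbal.dev" src/frontend/ || echo "src/frontend/ is clean"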

Provisioning the infrastructure

  1. Create a Cloud Storage bucket in your project to hold your Terraform state.
    gsutil mb gs://${PROJECT_ID}-boa-tf-state
    gsutil versioning set on gs://${PROJECT_ID}-boa-tf-state
  2. Verify the Terraform variables in iac/tf-multienv-cicd-anthos-autopilot/terraform.tfvars. In particular, make sure that project_id and region are set to the same values you used earlier.
  3. Provision the infrastructure with Terraform.
    # run from iac/tf-multienv-cicd-anthos-autopilot
    terraform init && \
    terraform apply
  4. Review the plan that terraform apply prints and approve it.
  5. Wait for Anthos Service Mesh to be provisioned on all clusters. You can check the status with gcloud container fleet mesh describe and wait for all entries to be in state: ACTIVE. This may take several minutes to complete.
  6. Wait for Anthos Config Management to have synced the clusters. You can check the sync status in the Config Management section of the Google Cloud console. This may take several minutes.
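Both checks can also be run from the terminal; the config-management status command below assumes the gcloud beta component is installed:

    # Anthos Service Mesh: wait until every membership reports state: ACTIVE
    gcloud container fleet mesh describe

    # Anthos Config Management: check the sync status of each cluster
    gcloud beta container fleet config-management status --project=$PROJECT_ID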

Initializing Cloud SQL databases with sample data

  1. Initialize the staging Cloud SQL database with data.

    gcloud container fleet memberships get-credentials staging-membership
    
    skaffold config set default-repo $REGION-docker.pkg.dev/$PROJECT_ID/bank-of-anthos
    skaffold run --profile=init-db-staging --module=accounts-db
    skaffold run --profile=init-db-staging --module=ledger-db
    
    kubectl wait --for=condition=complete job/populate-accounts-db job/populate-ledger-db -n bank-of-anthos-staging --timeout=300s
  2. Initialize the production Cloud SQL database with data.

    gcloud container fleet memberships get-credentials production-membership
    
    skaffold config set default-repo $REGION-docker.pkg.dev/$PROJECT_ID/bank-of-anthos
    skaffold run --profile=init-db-production --module=accounts-db
    skaffold run --profile=init-db-production --module=ledger-db
    
    kubectl wait --for=condition=complete job/populate-accounts-db job/populate-ledger-db -n bank-of-anthos-production --timeout=300s
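To double-check that the seeding finished on both clusters (an optional verification step), you can list the jobs in each namespace after switching contexts:

    # Staging cluster: both populate jobs should report COMPLETIONS 1/1
    gcloud container fleet memberships get-credentials staging-membership
    kubectl get jobs -n bank-of-anthos-staging

    # Production cluster: same check
    gcloud container fleet memberships get-credentials production-membership
    kubectl get jobs -n bank-of-anthos-production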

Deploying the application (manually)

Before running the CI/CD pipelines, you should manually deploy the application once to the staging and production clusters. This step is not strictly necessary, but it prevents end-to-end test failures when the CI triggers run for the first time.

  1. Deploy Bank of Anthos on the staging environment.

    gcloud container fleet memberships get-credentials staging-membership
    skaffold run -p staging --skip-tests=true
  2. Deploy Bank of Anthos on the production environment.

    gcloud container fleet memberships get-credentials production-membership
    skaffold run -p production --skip-tests=true
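You can confirm each deployment is healthy before moving on. The check below assumes the frontend Deployment is named frontend, as in the upstream manifests, and shows the production cluster; repeat it against staging after switching contexts:

    # While connected to the production cluster, wait for the frontend rollout to finish
    kubectl rollout status deployment/frontend -n bank-of-anthos-production --timeout=300s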

Staging the application (through Cloud Build)

  1. Run the Cloud Build CI triggers once for each service.
    gcloud beta builds triggers run frontend-ci --region $REGION
    gcloud beta builds triggers run accounts-contacts-ci --region $REGION
    gcloud beta builds triggers run accounts-userservice-ci --region $REGION
    gcloud beta builds triggers run ledger-balancereader-ci --region $REGION
    gcloud beta builds triggers run ledger-ledgerwriter-ci --region $REGION
    gcloud beta builds triggers run ledger-transactionhistory-ci --region $REGION
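You can monitor the resulting builds from the command line as well, for example by listing the most recent builds in your region:

    # List recent Cloud Build runs in the configured region
    gcloud builds list --region=$REGION --limit=10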

Setting up the DNS and certificate

  1. In your domain registrar, add or modify an A record pointing to the production ingress IP address.

    You can find the IP address in the Cloud Load Balancing section of the Cloud console: locate the production ingress load balancer and copy the IP address that is listed. Alternatively:

    kubectl get ingress frontend-ingress --namespace bank-of-anthos-production -o jsonpath='{.status.loadBalancer.ingress[0].ip}'
  2. Wait for the certificate to be provisioned. This will show up in the Classic Certificates section of the Certificate Manager. Once provisioned, its state will change to Active.

  3. Once the DNS and certificates are fully propagated, you should be able to access Bank of Anthos through https://$DOMAIN. Note that the propagation may take several minutes.
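To check propagation from the command line (optional; this assumes dig and curl are available locally):

    # The A record should resolve to the production ingress IP
    dig +short $DOMAIN

    # Once the certificate is active, this should print a successful HTTP status line
    curl -sSI https://$DOMAIN | head -n 1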

Troubleshooting

  • If terraform apply fails due to a timeout or a race condition from API enablement, simply run terraform apply again.
  • Sometimes the database seeding jobs' pods get stuck because of a failed sidecar container. This is easily fixed by deleting the pods that are stuck with 2/3 containers ready (see the example after this list).
  • For production deployment, ensure that the DNS for your $DOMAIN has been set up to point to the IP of the production ingress.
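For the stuck seeding pods mentioned above, a minimal example of the fix (staging shown; the pod name is a placeholder):

    # Find seeding pods stuck at 2/3 containers ready
    kubectl get pods -n bank-of-anthos-staging

    # Delete the stuck pod so that the Job controller recreates it
    kubectl delete pod POD_NAME -n bank-of-anthos-staging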