This is a Google Cloud App Engine Python 3 script that automates exporting Braze data to Google BigQuery.
Only standard attributes and data types (String, Number) have been tested.
The following permissions and Google services are required:
- Google BigQuery with master table already created
- Google API & Services enabled
- Google Cloud Storage
- Google App Engine
- Braze API Key with users.export.segment permissions
- Access to Amazon AWS S3 with write and delete permissions (required only if S3 exports are set up)
- Google Cloud Tasks - Optional; use if segment exports are expected to be large or for performance reasons.
Note: Ensure BigQuery and Cloud Storage run in the same geo-location to avoid issues.
Important: Files are uncompressed and processed in memory, so this process is designed for incremental export updates. Check your process to ensure there are no memory issues due to the size of the segment export. For large exports, S3 and enabling Cloud Tasks are recommended.
The following is an outline of the process:
- An API call to the Braze Users by Segment export endpoint using a predefined segment, with a callback to the app.
- Waits for Braze to trigger the callback.
- If S3 exports are enabled, reads the files from S3, then moves them to a processed directory.
- Otherwise, pulls the export from the zipped S3 URL returned by Braze.
- Converts the files to CSV and uploads them to Google Cloud Storage.
- Creates a temporary BigQuery table from the CSV file.
- Merges the temporary table into the master table.
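For illustration, a minimal sketch of what the kickoff request in the first step might look like, built from the environment variables described below. The function name `trigger_export`, the `/callback` path, and Bearer-header authentication are assumptions; the script's actual implementation may differ.

```python
import os

import requests


def trigger_export():
    """Start a Braze segment export. Braze replies immediately and later
    POSTs to callback_endpoint when the export files are ready."""
    url = os.environ["brazerestendpoint"].rstrip("/") + os.environ["brazesegmentendpoint"]
    payload = {
        "segment_id": os.environ["brazesegmentid"],
        # The /callback path is an assumption for illustration.
        "callback_endpoint": "https://" + os.environ["gcsproject"]
        + os.environ.get("gcsplatformprefix", ".appspot.com") + "/callback",
        "fields_to_export": os.environ["brazesegmentfields"].split(","),
    }
    headers = {"Authorization": "Bearer " + os.environ["brazeapikey"]}
    resp = requests.post(url, json=payload, headers=headers, timeout=30)
    resp.raise_for_status()
    return resp.json()
```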
To deploy to your Google Cloud Project, clone this repo locally or via Google Cloud Shell.
git clone [email protected]:Appboy/braze-growth-shares-braze-to-bigquery.git
Create an app.yaml (see app_example.yaml) and deploy to your project using the gcloud CLI:
gcloud app deploy
To set the environment variables the script runs with, create an app.yaml file (see app_example.yaml) in the root directory and update the following:
env_variables:
gcsproject: [Google Project Name]
bigquery_dataset: [BigQuery DataSet]
bigquery_table: [BigQuery Destination Table - this should already exist]
bigquery_temptable_duration: [BigQuery TempTable Expiration (Seconds)]
brazerestendpoint: [Braze REST API endpoint, e.g. https://rest.braze.com/]
brazeapikey: [Braze API Key with User Segment Export Permissions]
brazesegmentid: [Braze Segment ID]
brazesegmentendpoint: [Braze API endpoint, e.g. /users/export/segment]
brazesegmentfields: [Braze export fields, e.g. external_id,random_bucket,first_name]
brazesegmenttype: [Braze export field types, e.g. STRING,INTEGER,STRING]
gcsprimarykey: [BigQuery primary key, e.g. external_id]
gcsplatformprefix: [Google App Engine URL suffix (optional), e.g. .appspot.com]
gcspath: [Google Cloud Storage path, e.g. brazeexport]
gcsmaxlines: [Batch record rows per table; adjust to avoid memory limits, e.g. 100000]
s3enabled: [Boolean if AWS S3 is used]
s3accessid: [AWS Access ID]
s3secretkey: [AWS Secret Key]
s3bucketname: [AWS Bucket Name]
s3path: [AWS Bucket Prefix, optional]
s3processedprefix: [AWS Bucket Prefix for processed files]
gcslocation: [Google App Engine location used for Cloud Tasks]
gcsusetask: [Boolean if Cloud Tasks should be used]
gcstaskqueue: [Cloud Tasks Queue Name]
- Set `gcsmaxlines` to an appropriate limit (a batching sketch follows this list).
- For large exports, Google Cloud Tasks is recommended; enable `gcsusetask` (see below).
- Adding `custom_attributes` to `fields_to_export` will export ALL custom attributes. Be aware of the potential file size and number of records that may be exported.
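As a rough illustration of how `gcsmaxlines` batching might work, a hypothetical helper (not the script's actual code) that splits export rows into bounded batches:

```python
def batch_rows(rows, max_lines):
    """Yield lists of at most max_lines rows, so each temporary-table
    load stays within App Engine memory limits."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= max_lines:
            yield batch
            batch = []
    if batch:  # flush the final partial batch
        yield batch
```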
Example:
env_variables:
gcsproject: BrazeBigQuery
bigquery_dataset: bgdataset
bigquery_table: mastertable
bigquery_temptable_duration: 86400
brazerestendpoint: https://rest.iad-01.braze.com/
brazeapikey: api-key-with-user-export-segment-permission
brazesegmentid: segmentidfromsegmentcreation
brazesegmentendpoint: /users/export/segment
brazesegmentfields: external_id,random_bucket,first_name
brazesegmenttype: STRING,INTEGER,STRING
gcsprimarykey: external_id
gcsplatformprefix: .appspot.com
gcspath: brazeexport
gcsmaxlines: 100000
s3enabled: true
s3accessid: aws-access-id
s3secretkey: aws-secret-key
s3bucketname: bucket-name
s3path: brazeexports
s3processedprefix: processed
gcsusetask: true
gcslocation: northamerica-northeast1
gcstaskqueue: braze
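Given the variables above, the final merge step might look roughly like the following sketch, which upserts the temporary table into the master table keyed on `gcsprimarykey`. Table and column handling is simplified, and the function name is illustrative; the script's actual SQL may differ.

```python
import os

from google.cloud import bigquery


def merge_temp_into_master(temp_table: str):
    """Upsert rows from a temporary export table into the master table."""
    client = bigquery.Client(project=os.environ["gcsproject"])
    dataset = os.environ["bigquery_dataset"]
    master = os.environ["bigquery_table"]
    key = os.environ["gcsprimarykey"]
    cols = os.environ["brazesegmentfields"].split(",")
    updates = ", ".join(f"{c} = s.{c}" for c in cols if c != key)
    col_list = ", ".join(cols)
    sql = f"""
        MERGE `{dataset}.{master}` m
        USING `{dataset}.{temp_table}` s
        ON m.{key} = s.{key}
        WHEN MATCHED THEN UPDATE SET {updates}
        WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({col_list})
    """
    client.query(sql).result()  # block until the merge completes
```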
A cron job can be configured to trigger the Braze API export using the internal /schedule endpoint. See cron.yaml for an example.
cron:
- description: "schedule hourly processing of Braze exports"
url: /schedule
schedule: every 1 hours
retry_parameters:
job_retry_limit: 2
min_backoff_seconds: 2.5
max_backoff_seconds: 10
max_doublings: 3
To deploy, run:
gcloud app deploy cron.yaml
Any updates to cron.yaml will require the file to be redeployed.
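The repo's handler isn't shown here, but a minimal sketch of a /schedule endpoint that only accepts App Engine cron requests might look like this (Flask and the `trigger_export` helper from the earlier sketch are assumptions):

```python
from flask import Flask, abort, request

app = Flask(__name__)


@app.route("/schedule")
def schedule():
    # App Engine sets this header on cron-originated requests;
    # reject anything else so the endpoint isn't publicly triggerable.
    if request.headers.get("X-Appengine-Cron") != "true":
        abort(403)
    trigger_export()  # hypothetical kickoff helper from the earlier sketch
    return "export scheduled", 200
```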
For large segment exports, it's recommended to use Google Cloud Tasks to queue the job processing in the background.
- Enable the Google Cloud Tasks API
- Create a Tasks queue via gcloud:
gcloud tasks queues create [TasksQueueName] --max-concurrent-dispatches 10 --max-attempts 1
- Enable `gcsusetask`
- Set `gcstaskqueue` to `[TasksQueueName]`
- Set `gcslocation` to the Google App Engine location
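A rough sketch of how the script might enqueue background processing with the google-cloud-tasks client; the function name, relative URI, and payload shape are assumptions for illustration.

```python
import os

from google.cloud import tasks_v2


def enqueue_processing(relative_uri: str):
    """Queue a background App Engine request to process one export file."""
    client = tasks_v2.CloudTasksClient()
    parent = client.queue_path(
        os.environ["gcsproject"],
        os.environ["gcslocation"],
        os.environ["gcstaskqueue"],
    )
    task = {
        "app_engine_http_request": {
            "http_method": tasks_v2.HttpMethod.POST,
            "relative_uri": relative_uri,  # e.g. a hypothetical "/process" route
        }
    }
    return client.create_task(parent=parent, task=task)
```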