Query Google Analytics data and send records to a backdrop bucket.
We recommend you use virtualenv and install dependencies with pip.
# Create a virtualenv
$ virtualenv ~/.virtualenvs/backdrop-ga-collector
# Activate the virtualenv
$ source ~/.virtualenvs/backdrop-ga-collector/bin/activate
# Install dependencies
$ pip install -r requirements.txt
Backdrop GA collector expects a file called config/credentials.json
containing information about the google analytics credentials. This can be in one of two forms based on client secrets or a service account. If you don't know which type you want it's probably best to start with the client secrets method.
- Go to the Google API Console and create a new client ID for an installed application.
- Once created click the Download JSON link. This is your client secrets file.
Run python tools/generate-credentials.py /path/to/client_secret.json
.
This will generate the required credentials files in the ./config
directory.
A config is made up of four parts; the data type, the query, mappings and the target.
{
"dataType": "govuk_weekly_reach",
"query": {
"id": "ga:12345678",
"metrics": [
"visits",
"visitors"
],
"dimensions": [
"customVarValue1",
"customVarValue2",
"eventCategory"
],
"filters": [
"eventCategory=~^categoryPrefix:"
]
},
"mappings": {
"customVarValue1": "slug",
"customVarValue2": "department",
"eventCategory_0": "categoryId",
"eventCategory_1": "categoryLabel"
},
"idDimension": "eventCategory",
"target": {
"url": "http://write.backdrop.dev.gov.uk/foo",
"token": "foo-bearer-token"
}
}
Added to all backdrop records to distinguish them.
A Google Analytics query.
Allows fields in the google analytics response to be mapped to other fields in the backdrop record.
If a multi value field mapping is specified, the field will be split by the delimiter :
and mapped accordingly. Example: given the field key
has the value foo:bar
, the mapping key_0
will have the value of foo
and key_1
will have the value of bar
.
Allows you to configure one of the dimensions fields as the ID for the row, rather than the auto generated unique ID.
The backdrop URL and bearer token that the records will be sent to.
Install dependencies and run the collect.py
script passing in the path to the query config file, credentials file, a start date and an end date.
NB. You should probably be doing this in a virtualenv.
python collect.py --credentials=path/to/credentials.json --query=path/to/config.json --start=2012-12-12 --end=2013-01-12