[TOC]
The course-inventory application is designed to gather current-term Canvas LMS data about courses, enrollments, users, and course activity -- as well as data about the usage of other technologies, including Zoom and MiVideo -- in order to inform leadership at the University of Michigan about the usage of tools for teaching and learning. Currently, the application collects data from various APIs and data services managed by the Unizin Consortium. It then stores the data in an external MySQL database. Tableau dashboards and other processes then consume that data to generate reports and visualizations.
The sections below provide instructions for configuring, installing, using, and changing the application. Depending on the environment you plan to run the application in, you may also need to install additional software (such as Docker, MySQL, or Python with `virtualenv`), as described in the relevant sections below.
While performing any of the actions described below, use a terminal, text editor, or file utility as necessary. Some sample command-line instructions are provided for some steps.
## Configuration

To configure the application before installation and usage (see the next section), you must first complete a few steps. This includes creating a configuration file called `env.hjson` using the HJSON format -- a more lenient and customizable variant of JSON. Complete the following items in order.
1. Clone and navigate into the repository.

    ```sh
    git clone https://github.com/tl-its-umich-edu/course-inventory.git  # HTTPS
    git clone [email protected]:tl-its-umich-edu/course-inventory.git      # SSH

    cd course-inventory
    ```

2. Set up a MySQL database.

    If you plan to run the application using `virtualenv`, you will need to have MySQL installed on your machine. You will also need to create a test database and user.

    If you use Docker instead, you will use the database credentials specified in `docker-compose.yaml`. These are in the `environment` block (ignoring `MYSQL_ROOT_PASSWORD`) for the `mysql` service.

    Whether you use `virtualenv` or Docker, provide the database credentials within the `INVENTORY_DB` object. This is described more in step 4.

3. Copy the template configuration file, `env_blank.hjson`, from the `config` directory; rename it `env.hjson`; and place it inside the `secrets` subdirectory.

    ```sh
    mv config/env_blank.hjson config/secrets/env.hjson
    ```
4. Replace the default values inside `env.hjson` (empty strings, `0`s, and provided values) with the desired values, ensuring the new values have the same data types. The table below describes the meaning and expected values of each key-value pair. If the value of the outermost key is an object, the description may refer instead to the nested key column. The application will also validate the configuration file you create using JSON Schema, so look for error messages when first running the application. (An optional sanity-check snippet appears after this table.)

    | Key | Nested Key | Description |
    | --- | ---------- | ----------- |
    | `LOG_LEVEL` | | The minimum level for log messages that will appear in output. `INFO` or `DEBUG` is recommended for most use cases; see Python's `logging` module. |
    | `JOB_NAMES` | | The names of one or more jobs (not case sensitive) that have been implemented and defined in `run_jobs.py` (see the Implementing a New Job section below). |
    | `CREATE_CSVS` | | A Boolean value (`true` or `false`) indicating whether CSVs should be generated by the execution. |
    | `MAX_REQ_ATTEMPTS` | | The number of times a specific request will be attempted. |
    | `NUM_ASYNC_WORKERS` | | The number of workers for asynchronous API calls; the default is 8. |
    | `CANVAS` | `CANVAS_ACCOUNT_ID` | The Canvas instance root account ID number associated with the courses for which data will be collected. |
    | `CANVAS` | `CANVAS_TERM_IDS` | The Canvas instance term ID numbers that will be used to limit queries for Canvas courses. |
    | `CANVAS` | `ADD_COURSE_IDS` | Additional Canvas course IDs to retrieve when using `online_meetings/canvas_zoom_meetings.py`. Duplicate courses also found using `CANVAS_TERM_IDS` will be removed. |
    | `CANVAS` | `API_BASE_URL` | The base URL for making requests using the U-M API Directory; the default value should be correct. |
    | `CANVAS` | `API_SCOPE_PREFIX` | The scope prefix that will be added after the `API_BASE_URL`; this is usually an acronym for the university location and the API Directory subscription name in CamelCase, separated by `/`. |
    | `CANVAS` | `API_SUBSCRIPTION_NAME` | The name of the API Directory subscription, all in lowercase. |
    | `CANVAS` | `API_CLIENT_ID` | The client ID for authenticating to the API Directory. |
    | `CANVAS` | `API_CLIENT_SECRET` | The client secret for authenticating to the API Directory. |
    | `CANVAS` | `CANVAS_URL` | The Canvas instance URL to be used as the base URL for API requests that use the `CANVAS_TOKEN`. |
    | `CANVAS` | `CANVAS_TOKEN` | The Canvas token used for authenticating to the API when not using the U-M API Directory. |
    | `MIVIDEO` | `udp_service_account_json_filename` | The name of the JSON credential file for accessing UDP's Google BigQuery service account. It should be the `umich-its-tl-reports-prod.json` credential file for UMich ITS TL. This file name is appended to the value of `ENV_DIR` (`/config/secrets` by default) to determine the full path to the file. For example, if this key's value is `umich-its-tl-reports-prod.json` and `ENV_DIR` has its default value, the full path will be `/config/secrets/umich-its-tl-reports-prod.json`. |
    | `MIVIDEO` | `default_last_timestamp` | The MiVideo procedures use the last timestamp found in their tables in this application's DB to query for data newer than that time. If that timestamp isn't found (e.g., the first time the application runs), the value of this property will be used. This must be a valid ISO 8601 timestamp in the UTC time zone. The recommended value is `2020-03-01T00:00:00+00:00`. |
    | `MIVIDEO` | `kaltura_partner_id` | An integer representing the Kaltura account number. UMich ITS TL users can find this value in the usual security files folder. |
    | `MIVIDEO` | `kaltura_user_secret` | A string representing an administrator's key for the Kaltura account. UMich ITS TL users can find this value in the usual security files folder. |
    | `MIVIDEO` | `kaltura_categories_full_name_in` | A filter for the Kaltura API to return media that have at least one category beginning with this key's string value. The default value is `Canvas_UMich`. |
    | `UDW` | | An object containing the necessary credential information for connecting to the Unizin Data Warehouse, where data will be pulled from. |
    | `INVENTORY_DB` | | An object containing the necessary credential information for connecting to a MySQL database, where output data will be inserted. |
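After filling in the configuration, you can optionally sanity-check that `env.hjson` parses and that a few values have the expected types before running the application. The snippet below is illustrative only and is not part of the codebase; the keys and types checked are examples based on the table above, and it assumes the file is still located at `config/secrets/env.hjson`.

```python
# Illustrative sanity check for env.hjson; not part of the application codebase.
import hjson

with open("config/secrets/env.hjson") as config_file:
    env = hjson.load(config_file)

# A few example type checks based on the configuration table above.
assert isinstance(env["LOG_LEVEL"], str)
assert isinstance(env["JOB_NAMES"], list)
assert isinstance(env["CREATE_CSVS"], bool)
assert isinstance(env["CANVAS"]["CANVAS_ACCOUNT_ID"], int)

print("env.hjson parsed and basic checks passed")
```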
## Installation & Usage

### With Docker

This project provides a `docker-compose.yaml` file to help simplify the development and testing process. Invoking `docker-compose` will set up MySQL and a database in a container. It will then create a separate container for the job, which will ultimately insert records into the MySQL container's database.

Before beginning, perform the following additional steps to configure the project for Docker.
1. Create two paths in your home directory (i.e., `~` or `${HOME}`): `secrets/course-inventory` and `data/course-inventory`.

    The `docker-compose.yaml` file specifies two volumes that are mapped to these directories. The first, `secrets/course-inventory`, is mapped to `config/secrets`. The application expects to find the `env.hjson` file in this location. The second, `data/course-inventory`, is mapped to the project's `data` directory. This will allow later access to CSV files optionally generated by the application.

2. Move the `env.hjson` file to `secrets/course-inventory` so it will be mapped into the `job` container.

    ```sh
    mv config/secrets/env.hjson ~/secrets/course-inventory
    ```
Once these steps are completed, you can use the standard `docker-compose` commands to build and run the application.

1. Build the images for the `mysql` and `job` services.

    ```sh
    docker-compose build
    ```

2. Start up the services.

    ```sh
    docker-compose up
    ```

    `docker-compose up` will first start the MySQL container and then the job container. When the job finishes, the job container will stop, but the MySQL container will continue running. This allows you to enter the container and execute queries.

    ```sh
    docker exec -it course_inventory_mysql /bin/bash
    mysql --user=ci_user --password=ci_pw
    ```

    Use `^C` to stop the running MySQL container, or -- if you used the detached flag `-d` with `docker-compose up` -- use `docker-compose down`.

Data in the MySQL database will persist after the container is stopped. The MySQL data is stored in a volume mapped to the `.data/` directory in the project. To completely reset the database, delete the `.data` directory.
#### Development workflow with Docker

1. Build images for all services…

    ```sh
    docker-compose build
    ```

2. (Optional) Run the DB service, `mysql`, in the background…

    Note that if this optional step is skipped, `docker-compose` will automatically run the DB service in the background when the main application service is started. That's because the application depends on the DB, so `docker-compose` will conveniently run it based on the dependencies described in `docker-compose.yaml`.

    ```sh
    docker-compose up -d mysql
    ```

    The `-d` option (short for `--detach`) detaches the process from the terminal and will "Run containers in the background, print new container names."

    - If you need to see the console output of the `mysql` service while it runs in the background, use the `logs` command and the service name…

        ```sh
        docker-compose logs mysql
        ```

3. Run the main application service, `job`, in the foreground…

    ```sh
    docker-compose up job
    ```

    That will show the output from `job`, then return you to the shell prompt.

4. Do some development of `job`'s code. (Go ahead, we'll wait.)

5. When ready to run `job` again, use the same command as before…

    ```sh
    docker-compose up job
    ```

    As before, that will show the output from `job`, then return you to the shell prompt. This will work as long as `docker-compose.yaml` is configured to mount the project source code directory as `/app` in the container.

    - If the container is not running with the project source code mounted as `/app`, then most code changes will require you to specify that the service needs to be rebuilt…

        ```sh
        docker-compose up --build job
        ```

6. Repeat the previous two steps (4 and 5) as necessary.

7. To start up the job with the VS Code debugger, use the following command, then attach with VS Code.

    ```sh
    docker-compose -f docker-compose.yaml -f ./.vscode/docker-compose-ptvsd.yaml up job
    ```
### With `virtualenv`

You can also set up the application using `virtualenv` by doing the following:

1. Create a virtual environment using `virtualenv`.

    ```sh
    virtualenv venv
    source venv/bin/activate  # for Mac OS
    ```

2. Install the dependencies specified in `requirements.txt`.

    ```sh
    pip install -r requirements.txt
    ```

3. Initialize the database using `create_db.py`.

    ```sh
    python create_db.py
    ```

4. Run the application.

    ```sh
    python run_jobs.py
    ```
### With OpenShift and Jenkins

Deploying the application as a job using OpenShift and Jenkins involves several steps, which are beyond the scope of this README. However, a few details about how the job is configured are provided below.
- The `env.hjson` file described in the Configuration section above needs to be made available to running course-inventory containers via an OpenShift ConfigMap, a type of Resource. A volume containing the ConfigMap should be mapped to the `config/secrets` subdirectory. These details will be specified in a YAML configuration file defining the pod.

- By default, the application will run with the assumption that the HJSON configuration file will be named `env.hjson`. However, `environ.py` will also check for the environment variables `ENV_DIR` and `ENV_FILE`. These variables can be set using the OpenShift pod configuration file. To use a different name for the HJSON file, set `ENV_FILE` to the desired file name; the default value is `env.hjson`. To use a different directory containing the HJSON file, set `ENV_DIR` to the desired directory path; the default value is `/config/secrets`. (A sketch of how these variables might be combined appears after the YAML example below.)

  - To ensure that the `yoyo-migrations` dependency can run successfully in a containerized environment, the environment variable `USER` should be defined.
  - For the value of `USER`, use the name of the project running the job. The `yoyo-migrations` library will obtain this value by using the `getpass.getuser` function from the Python standard library.

With the above variables set, the `env` block in the YAML file will look something like this:

```yaml
env:
  - name: ENV_DIR
    value: /config/test_secrets
  - name: ENV_FILE
    value: env_test.json
  - name: USER
    value: project_name
```
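As a rough illustration of how `ENV_DIR` and `ENV_FILE` fit together, the sketch below combines the two values into a configuration file path; this is a simplified example based on the defaults stated above, not the actual `environ.py` logic.

```python
# Simplified sketch of combining ENV_DIR and ENV_FILE; the real logic lives in environ.py.
import os

env_dir = os.getenv("ENV_DIR", "/config/secrets")
env_file = os.getenv("ENV_FILE", "env.hjson")

config_path = os.path.join(env_dir, env_file)
print(f"Configuration will be read from {config_path}")
```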
## Implementing a New Job

The application was designed with the goal of being extensible -- in order to aid collaboration, integrate new data sources, and satisfy new requirements. This is primarily made possible by enabling the creation of new jobs, which are managed by the `run_jobs.py` file (the starting point for Docker). When executed, the file will attempt to run all jobs provided in the value for the `JOB_NAMES` variable in `env.hjson`. Only jobs previously defined in the codebase will actually be executed.

Follow the steps below to implement a new job that can be executed from `run_jobs.py`. All the changes described below (minus the configuration changes) should be included in the pull request.
1. Place files used only by the new job within a separate, appropriately named package (e.g. `course_inventory` or `online_meetings`).

2. Make use of variables from the `env.hjson` configuration file by importing the `ENV` variable from `environ.py`.

3. Ensure you have one function or method defined that will kick off all other steps in the job. It should return a list of `DataSourceStatus` objects, each containing the name of a data source used during the job and a timestamp of when that data was updated (or collected). (A sketch of a complete job module appears after this list.)

    These objects are used to create new records in the `data_source_status` table of the application database. Objects are instantiated in the following way:

    ```python
    DataSourceStatus(ValidDataSourceName.VALID_DATA_SOURCE_NAME_MEMBER)
    ```

    In place of `VALID_DATA_SOURCE_NAME_MEMBER`, use a member of the `ValidDataSourceName` enumeration defined in `vocab.py`. The resulting object will include a timestamp for the current time at which the object was instantiated. That is sufficient if the data source doesn't provide a timestamp for the data.

    If the data source does provide a timestamp for the data, use that. It can be passed into the instantiation as the second, optional argument:

    ```python
    DataSourceStatus(ValidDataSourceName.VALID_DATA_SOURCE_NAME_MEMBER, some_timestamp)
    ```

    The value for `some_timestamp` must be a `datetime` object (or equivalent, e.g. `pd.Timestamp`) with the time zone set to UTC.

4. Add a new entry to the `ValidJobName` enumeration within `vocab.py`. The name (on the left) should be in all capitals. The value (on the right) should be a period-delimited path string, where the first element is the package name, the second is the module or file name, and the third is the name of the job's entry method or function. See `vocab.py` for examples.

5. If you are introducing a new data source, you also need to add an entry to the `ValidDataSourceName` enumeration. The name should be all capitals; the value has no meaning for the application, so `auto()` is sufficient.

6. Add the job name to the `JOB_NAMES` environment variable.
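For orientation, the sketch below shows what a new job module might look like end to end. All names here (the package, module, function, enumeration members, and the import path for `DataSourceStatus`) are hypothetical and would need to match the actual codebase; it also assumes `ENV` behaves like a dictionary of the values in `env.hjson`.

```python
# Hypothetical new job module, e.g. my_package/my_job.py. Names are illustrative;
# adjust the DataSourceStatus import to match where it is defined in this codebase.
from datetime import datetime, timezone
from typing import List

from environ import ENV                      # configuration values from env.hjson
from vocab import ValidDataSourceName        # enumeration defined in vocab.py
from data_source_status import DataSourceStatus  # assumed import path


def run_my_job() -> List[DataSourceStatus]:
    # Read configuration needed by this job; assumes ENV is dictionary-like.
    num_workers = ENV.get("NUM_ASYNC_WORKERS", 8)

    # ... use num_workers for asynchronous API calls, collect data,
    # and insert it into the application database here ...

    # Report the data source used and when its data was updated.
    # If the source provides its own timestamp, pass it as a UTC-aware datetime.
    data_updated_at = datetime(2020, 3, 1, tzinfo=timezone.utc)
    return [DataSourceStatus(ValidDataSourceName.MY_NEW_SOURCE, data_updated_at)]


# In vocab.py, corresponding entries would then be added, e.g.:
#   class ValidJobName(Enum):
#       MY_NEW_JOB = 'my_package.my_job.run_my_job'
#   class ValidDataSourceName(Enum):
#       MY_NEW_SOURCE = auto()
```

The new job's name (here, `MY_NEW_JOB`) would then be listed in the `JOB_NAMES` value in `env.hjson` so that `run_jobs.py` picks it up.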
## Database Management and Schema Changes

Currently, the database is version-controlled and managed using the `yoyo-migrations` Python library. The migration files are located in the `db/migrations` directory. To make changes to the database schema, perform the following steps in order.
1. Add a new migration file to the `migrations` directory called `XXXX.add_something.py`, where `XXXX` is the next migration number (preceded by `0`s until the number is four digits) and `add_something` is an action describing the change made.

2. Within the file, import the `step` function from `yoyo`. For each desired schema change, pass a SQL string to `step`. Multiple `step` invocations can be enclosed in a list and assigned to a `steps` variable. Place each `step` in the order it should be applied. Migrations can also specify dependencies on previous migrations using the format `__depends__ = {"000X.migration_name_without_file_ending"}`. (A hypothetical example appears after this list.)
Refer to the existing migrations if examples are needed.
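For additional illustration, a hypothetical migration file might look like the following; the file name, table, and SQL are invented for this sketch.

```python
# Hypothetical migration, e.g. db/migrations/0005.add_example_table.py
from yoyo import step

# Optionally declare a dependency on an earlier migration.
__depends__ = {"0004.previous_migration_name"}

steps = [
    step(
        # SQL to apply the change
        "CREATE TABLE example (id INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY, name VARCHAR(100))",
        # Optional SQL to roll the change back
        "DROP TABLE example"
    )
]
```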
## Relevant Canvas API Documentation

- Courses in account: https://canvas.instructure.com/doc/api/accounts.html#method.accounts.courses_api
- Course object: https://canvas.instructure.com/doc/api/courses.html#Course
- GraphQL: https://canvas.instructure.com/doc/api/file.graphql.html

## Other Technology in Use

- HJSON: https://hjson.github.io/
- `hjson` Python package: https://pypi.org/project/hjson/
- JSON Schema: https://json-schema.org/understanding-json-schema/