Dev: Cron Job
TODO: Document setting up the cron job (or link to another guide)
TODO: Fill in the settings relevant to cron
- CRON_BQ_IN_LIMIT - Limit on the number of courses processed in one BigQuery loop. The default is currently 20, which has been determined to be too low; set it to something much higher, such as 1000. This setting will likely be removed in a future version.
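For illustration, a minimal sketch of how such a setting might be resolved from the environment (the helper name and environment handling here are assumptions, not the actual MyLA code):

```python
import os

def cron_bq_in_limit(env=None):
    """Hypothetical helper: resolve the CRON_BQ_IN_LIMIT setting.

    The shipped default of 20 is too low; deployments should override
    it to something much higher, e.g. 1000.
    """
    env = os.environ if env is None else env
    return int(env.get("CRON_BQ_IN_LIMIT", "20"))
```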
The cron job performs a few operations, depending on the data being updated:
- Currently for terms: (TODO: Document term update logic)
- Currently for assignments, submissions, and users (grades): every run deletes everything from the table and repopulates it entirely from UDW.
- For Resources (BigQuery) there is update logic to save time and costs. The update logic works like this:
Resources update logic (`update_with_bq_access` in `cron.py`)

The resources update runs as an "upsert", inserting only the new data, so it first tries to determine how far back it needs to go.
- If there are any `course.data_last_updated` columns with a `null` value, update based on the earliest `date_start` date across this set of courses. If all courses have this set, go to the next step.
  - There was a bug identified where, if any courses in this set are in the future, nothing is updated; at least one course has to be in the past.
  - There was another bug where a very old course would trigger a full scan instead of just a semester scan.
- Update based on the earliest `course.data_last_updated`. These should currently all be the same, set by the last cron run.
- Check the `course.date_start` field in the database; if set, use that. This may be populated manually in MyLA, or set in the course settings in Canvas. If not set, go to the next step.
- Check the term. If the term is not null and has a `date_start`, use that date. If not, go to the next step.
- Else, use today as the start date for the course.
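The fallback chain above can be sketched roughly as follows. This is a simplified illustration, not the actual `cron.py` code; it assumes course objects exposing `data_last_updated`, `date_start`, and `term` attributes:

```python
def course_start_date(course, today):
    # Per-course fallback: explicit course.date_start, then the term's
    # date_start, then today.
    if course.date_start:
        return course.date_start
    if course.term and course.term.date_start:
        return course.term.date_start
    return today

def scan_start_date(courses, today):
    # If any course has never been updated, go back to the earliest
    # effective start date of the whole set; otherwise resume from the
    # earliest data_last_updated (normally identical across courses,
    # set by the last cron run).
    if any(c.data_last_updated is None for c in courses):
        return min(course_start_date(c, today) for c in courses)
    return min(c.data_last_updated for c in courses)
```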
After the date range is determined, all records after this date are deleted from `resource_access`, and new records are inserted into both `resource` and `resource_access` from BigQuery. The queries run and the data loaded depend on the various cron configuration settings.
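The delete-then-insert step might look something like the following SQLite-based sketch. The column names and query shape are assumptions for illustration, not the production queries:

```python
def refresh_resource_access(conn, start_date, bq_rows):
    """Delete rows on or after start_date, then insert fresh rows
    (as would come from BigQuery) in their place."""
    cur = conn.cursor()
    # Remove everything inside the recomputed date window...
    cur.execute(
        "DELETE FROM resource_access WHERE access_time >= ?", (start_date,))
    # ...then reload that window from the warehouse query results.
    cur.executemany(
        "INSERT INTO resource_access (resource_id, access_time) VALUES (?, ?)",
        bq_rows)
    conn.commit()
```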
After all `resource_access` data is loaded, a second process, `update_canvas_resource`, runs. This updates resource names based on the data in UDW and removes any resources that are no longer available in Canvas; it is possible that a file was accessed in the past but is no longer available.
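Conceptually, that second pass behaves like the sketch below. The function name and dict-based inputs are hypothetical stand-ins for the UDW and Canvas data, not the real interfaces:

```python
def sync_canvas_resources(stored, canvas_files):
    """Refresh names for resources still in Canvas and drop the rest.

    `stored` maps resource id -> name as currently stored in MyLA;
    `canvas_files` maps id -> current name for files still available
    in Canvas.
    """
    synced = {}
    for rid in stored:
        if rid in canvas_files:
            # Still available: take the up-to-date name.
            synced[rid] = canvas_files[rid]
        # else: omitted -- the file is no longer available in Canvas,
        # even though it may have been accessed in the past.
    return synced
```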
After everything has run, the `unizin_metadata` table is updated with information from the UDW, and all of the `course.data_last_updated` dates are updated.
There is currently a bug where, even if an exception is caught during the run, `course.data_last_updated` is still marked as updated.
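The shape of that bug can be illustrated as follows. This is a deliberately simplified hypothetical, not the actual cron code: because the exception is caught and not re-raised, the timestamp update after the `try` block runs regardless of failure, so a failed run still looks up to date:

```python
from datetime import datetime

def run_cron_buggy(courses, load):
    """Illustration of the bug: the timestamp is written even when
    `load` raised and the exception was swallowed."""
    try:
        load()
    except Exception as exc:
        print(f"caught: {exc}")  # logged, but not re-raised
    # Runs unconditionally, masking the failed load above.
    for c in courses:
        c.data_last_updated = datetime.now()
```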