ETL refers to the general process of:
- taking raw source data ("Extract")
- cleaning and reshaping the data, possibly producing intermediate derived files along the way ("Transform")
- producing final output in a more usable form (for "Loading" into something that consumes the data - be it an app, a system, a visualization, etc.)
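As a rough illustration of the three stages, here is a minimal Python sketch that uses only the standard library. The sample data, the column names, and the `extract`/`transform`/`load` function names are hypothetical. A real pipeline would usually read from and write to files rather than in-memory strings.

```python
import csv
import io

# Hypothetical raw source: CSV with stray whitespace and inconsistent casing.
RAW_CSV = """name, state ,population
 Chicago ,IL, 2746388
springfield, IL,114394
"""


def extract(raw_text):
    """Extract: parse raw CSV text into a list of dicts (values still messy)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: strip whitespace, normalize casing, and cast types."""
    cleaned = []
    for row in rows:
        # Strip whitespace from both header names and cell values.
        row = {key.strip(): value.strip() for key, value in row.items()}
        cleaned.append({
            "name": row["name"].title(),
            "state": row["state"].upper(),
            "population": int(row["population"]),
        })
    return cleaned


def load(rows, out):
    """Load: write the cleaned rows as CSV to a writable text stream."""
    writer = csv.DictWriter(out, fieldnames=["name", "state", "population"])
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    load(transform(extract(RAW_CSV)), buffer)
    print(buffer.getvalue(), end="")
```

Keeping each stage as a separate function makes it easy to inspect or test the intermediate output of any one step.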

At DataMade, GNU Make forms the cornerstone of our ETL pipeline. We maintain detailed documentation of our Make patterns and expanded Unix utility toolkit here.
Sometimes we write ETL components, such as processors or data loading commands, in Python. This directory documents our preferred tooling for building those components.
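One common shape for a Python processor in a Make-driven pipeline is a small filter that reads CSV on standard input and writes CSV on standard output, so a recipe can chain it between files. The sketch below assumes that convention; the `process` function and the `name`/`state` columns are hypothetical and not part of any DataMade tool.

```python
import csv
import sys


def process(infile, outfile):
    """Hypothetical processor: drop rows with an empty 'name' and
    upper-case the 'state' column, preserving the input's column order."""
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        if row["name"].strip():
            row["state"] = row["state"].upper()
            writer.writerow(row)


if __name__ == "__main__":
    # Read from stdin and write to stdout so Make can redirect files in and out.
    process(sys.stdin, sys.stdout)
```

A Make recipe could then run it as `python clean.py < raw.csv > clean.csv` (hypothetical file names), letting Make decide when the output needs rebuilding.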