A data pipeline that extracts Spotify data from a playlist created by students.
The output is a Google Looker Studio (formerly Google Data Studio) report that provides insight into track features and preferences.
The project was a good opportunity to build skills and experience across a range of tools. As a result, it is more complex than strictly required: it uses dbt, Airflow, Docker and cloud-based storage, with LocalStack for testing.
- Extract data using Spotify API
- Simulate AWS S3 locally for testing with LocalStack
- Load into AWS S3
- Copy into Snowflake
- Transform using dbt
- Create Google Looker Studio Dashboard
- Orchestrate with Airflow in Docker
- Final output from Google Looker Studio. Link here. Note that the dashboard reads from a static CSV exported from Snowflake.
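
The extract and load steps listed above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names, the CSV column set, and the `playlist/<date>/tracks.csv` key layout are all assumptions made for the example. The input is assumed to be the `items` array returned by the Spotify playlist-tracks endpoint, and the S3 client is any object with a boto3-style `put_object` method.

```python
import csv
import io
from datetime import date

# Hypothetical column set for the example; the real pipeline may keep more fields.
FIELDS = ["track_id", "track_name", "artists", "added_by", "added_at"]


def flatten_playlist_items(items):
    """Turn Spotify playlist items into flat dict rows."""
    rows = []
    for item in items:
        track = item.get("track")
        if not track:
            # The API can return a null track for entries that are no longer available.
            continue
        rows.append({
            "track_id": track["id"],
            "track_name": track["name"],
            "artists": "; ".join(a["name"] for a in track.get("artists", [])),
            "added_by": (item.get("added_by") or {}).get("id", ""),
            "added_at": item.get("added_at", ""),
        })
    return rows


def rows_to_csv(rows):
    """Serialise rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def upload_csv(s3_client, bucket, csv_text, run_date):
    """Upload the CSV under a date-partitioned key (assumed layout) and return the key."""
    key = f"playlist/{run_date.isoformat()}/tracks.csv"
    s3_client.put_object(Bucket=bucket, Key=key, Body=csv_text.encode("utf-8"))
    return key
```

Because the client is passed in, the same code can run against LocalStack or real AWS. With boto3, `boto3.client("s3", endpoint_url="http://localhost:4566")` targets LocalStack, assuming it is listening on its default port, while omitting `endpoint_url` targets AWS S3.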
NOTE: This was developed on Windows 10. If you're on Mac or Linux, you may need to adjust certain components if you run into issues.
git clone https://github.com/salimt/Spotify-API-Pipeline.git
cd Spotify-API-Pipeline