The basic architecture of the project is illustrated below:
Requirements:
- Azure account (preferably with Free credits to reduce costs)
- Basic knowledge of Python
- Familiarity with cloud computing concepts
- Familiarity with distributed computing
- Familiarity with Spark and the Hadoop ecosystem

Architecture overview:
- In most data projects, raw data is captured from multiple sources. Here the source is a single external dataset: the Tokyo Olympic dataset hosted on Kaggle, which can be downloaded manually or fetched programmatically via the Kaggle API (see the download sketch after this list).
- All provisioned services must be housed within an Azure Virtual Network (VNet) for security (see the VNet provisioning sketch after this list).
- Raw data is ingested via an HTTP-source pipeline in Azure Data Factory and landed in Azure Data Lake Storage Gen2 (see the upload sketch after this list).
- Connectivity between the VNet and internal Azure services is established through an app registration in Microsoft Entra ID (formerly Azure Active Directory), whose credentials let Databricks authenticate against the Data Lake (see the mount sketch after this list).
- Most of the data processing is done in Azure Databricks, with results written back to the Data Lake Gen2 in a Raw → Transformed → Production layering (see the transformation sketch after this list).
- Once the production-level data is in place, exploratory analysis is performed in Azure Synapse Analytics (see the example query after this list).
- Visualization is then done in Power BI.
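
A minimal sketch of the programmatic capture step, using the official `kaggle` Python package. The dataset slug, the local path, and the presence of an API token at `~/.kaggle/kaggle.json` are assumptions; verify the exact slug on the dataset's Kaggle page.

```python
# pip install kaggle
# Assumes a Kaggle API token is configured at ~/.kaggle/kaggle.json.
import kaggle

kaggle.api.dataset_download_files(
    "arjunprasadsarkhel/2021-olympics-in-tokyo",  # assumed dataset slug
    path="data/raw",
    unzip=True,  # extract the CSV files (Athletes, Medals, Teams, ...)
)
```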
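For the VNet requirement, a hedged sketch using the `azure-mgmt-network` SDK; the subscription ID, resource group, names, and address ranges are all placeholders, and in practice this step is often done through the Azure portal or an ARM/Bicep template instead.

```python
# pip install azure-identity azure-mgmt-network
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

# Subscription ID, resource group, and names below are placeholders.
client = NetworkManagementClient(DefaultAzureCredential(), "<subscription-id>")

poller = client.virtual_networks.begin_create_or_update(
    "olympics-rg",    # hypothetical resource group
    "olympics-vnet",  # hypothetical VNet name
    {
        "location": "eastus",
        "address_space": {"address_prefixes": ["10.0.0.0/16"]},
        "subnets": [{"name": "data-subnet", "address_prefix": "10.0.1.0/24"}],
    },
)
print(f"Provisioned VNet: {poller.result().name}")
```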
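In the project itself, the Data Factory HTTP pipeline performs the ingestion; purely as an illustration of the landing step, here is a sketch that uploads one raw file with the `azure-storage-file-datalake` SDK. The storage account, container, and paths are assumptions.

```python
# pip install azure-identity azure-storage-file-datalake
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

# Storage account, container, and paths are placeholders.
service = DataLakeServiceClient(
    account_url="https://<storage-account>.dfs.core.windows.net",
    credential=DefaultAzureCredential(),
)
fs = service.get_file_system_client("tokyo-olympic-data")
file_client = fs.get_file_client("raw-data/athletes.csv")

# Land the file in the Raw layer of the lake.
with open("data/raw/Athletes.csv", "rb") as f:
    file_client.upload_data(f, overwrite=True)
```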
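A sketch of how the app-registration connectivity typically looks from the Databricks side: OAuth credentials from the Microsoft Entra app registration are used to mount the Data Lake container. The client ID, tenant ID, secret scope, container, and mount point are all placeholders; `dbutils` is provided by the Databricks runtime.

```python
# Runs inside a Databricks notebook (dbutils comes from the runtime).
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type":
        "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": "<application-client-id>",
    "fs.azure.account.oauth2.client.secret":
        dbutils.secrets.get(scope="olympics", key="sp-secret"),  # assumed secret scope
    "fs.azure.account.oauth2.client.endpoint":
        "https://login.microsoftonline.com/<tenant-id>/oauth2/token",
}

dbutils.fs.mount(
    source="abfss://tokyo-olympic-data@<storage-account>.dfs.core.windows.net/",
    mount_point="/mnt/tokyoolympic",
    extra_configs=configs,
)
```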
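A minimal Raw → Transformed pass, assuming the mount point from the previous sketch and a CSV layout like the Kaggle dataset's: read a raw file, apply a light cleanup, and write Parquet to the Transformed layer.

```python
# Inside Databricks; `spark` is the session provided by the runtime.
athletes = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("/mnt/tokyoolympic/raw-data/athletes.csv")  # assumed raw path
)

# Light cleanup: normalize column names and drop exact duplicates.
for name in athletes.columns:
    athletes = athletes.withColumnRenamed(name, name.strip().lower().replace(" ", "_"))
athletes = athletes.dropDuplicates()

# Parquet preserves the schema for the downstream Synapse queries.
athletes.write.mode("overwrite").parquet("/mnt/tokyoolympic/transformed-data/athletes")
```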
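One example of the kind of exploratory question asked at the Synapse stage, written in PySpark so it runs in a Synapse Spark pool. The path and column names are assumptions based on the dataset's Medals table; adjust them to the actual transformed schema.

```python
from pyspark.sql import functions as F

# Path and column names are assumptions; adjust to your transformed schema.
medals = spark.read.parquet(
    "abfss://tokyo-olympic-data@<storage-account>.dfs.core.windows.net/transformed-data/medals"
)

# Top 10 countries by gold-medal count.
(
    medals
    .withColumn("gold", F.col("gold").cast("int"))
    .orderBy(F.col("gold").desc())
    .select("team_country", "gold", "silver", "bronze")
    .show(10)
)
```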