Project Initiation & Data Selection
Data Selection: Begin by exploring potential datasets from a variety of sources such as CSV files (available online or in platforms like Kaggle), web scraping or APIs. You may also revisit the datasets you explored in week 3. In case you need help finding a dataset, at the end of this brief, we’ve included a list of famous datasets that you could work with. =======
Data Selection: Begin by exploring potential datasets from a variety of sources such as CSV files (available online or in platforms like Kaggle), web scraping or APIs. You may also revisit the datasets you explored in week 3.
Business Framing:
Challenges Overview
We provide comprehensive football analysis on matches, players, leagues, teams and shots for the benefit of football enthusiasts, bookies and football pundits.
- Team Performance:
- There is a correlation between the number of shots on target and goals scored.
- Home teams have a higher win rate compared to away teams.
- Teams with more yellow/red cards tend to have lower winning rates.
- Player Performance:
- Forwards score more goals compared to midfielders and defenders.
- Players with higher passing accuracy contribute to more assists.
- The number of tackles made by a player is inversely proportional to the number of goals conceded by the team.
- Goalkeepers with a higher number of saves per game have lower goals-against averages.
- Match Statistics:
- Matches with higher total possession percentages have fewer goals.
- The number of corners taken by a team correlates with the number of goals scored.
- Teams that concede the first goal have a lower probability of winning the match.
- Matches with more fouls have a higher likelihood of yellow and red cards being issued.
- Seasonal Trends:
- Teams perform better in the second half of the season compared to the first half.
- There is a significant difference in team performance before and after major tournaments (e.g., World Cup, UEFA Champions League).
- Newly promoted teams perform worse compared to established teams in the league.
Solution
- Data Collection:
- Use SQL to query customer data, including purchase history, demographics, browsing behavior, and interaction data.
- Integrate public datasets from platforms like Kaggle to enrich the analysis and test hypotheses.
- Files:
Data Cleaning and Preparation:
- Clean the data using Python (e.g., pandas) to handle missing values, outliers, and inconsistencies.
- Normalize and preprocess the data for analysis.
- Customer Segmentation:
- Use clustering algorithms (e.g., K-means, hierarchical clustering) in Python to identify distinct customer segments based on behavior and demographic features.
- Personalization Strategies:
- Develop personalized marketing strategies for each segment, such as targeted email campaigns, personalized product recommendations, and customized offers.
Dataset Links:
https://www.kaggle.com/datasets/technika148/football-database?select=teamstats.csv
https://www.kaggle.com/datasets/hugomathien/soccer
ER Diagram
Project Planning: Notion AI, Trello.
Extract: retrieve data from various sources. Examine the data and try to understand what the fields mean before building the database.