Welcome to my repository of data analysis projects! It contains a series of notebooks covering data analysis and machine learning tasks. Each project focuses on a distinct dataset and problem statement and showcases a range of analytical and predictive techniques.
- Swiggy Restaurants Data Analysis
- GeeksforGeeks Data Analysis
- CarDekho Used Car Price Analysis
- Sonar Mine Prediction
- Big Mart Sales Prediction
- California House Price Prediction
- CarDekho Car Price EDA
- Credit Card Fraud Detection
- Customer Segmentation Using K-Means
- Fake News Prediction
- Gold Price Prediction
- Heart Disease Prediction
- House Prices: Advanced Regression Techniques
- Loan Eligibility Prediction
- Parkinson's Disease Detection
- Spam Mail Prediction
- Used Medical Insurance Prediction
Description: This project involves analyzing restaurant data from the Swiggy food delivery platform. Key aspects include:
- Data Collection: Access data on restaurant names, cuisines, ratings, reviews, delivery times, and locations.
- Data Cleansing and Preparation: Clean and preprocess the data for analysis.
- Restaurant Performance Analysis: Calculate average ratings, review counts, and identify high-performing restaurants.
- Cuisine and Menu Analysis: Analyze cuisine distribution and popular menu items.
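
The restaurant performance step above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the notebook's actual code: the sample rows, the field names (`name`, `cuisine`, `rating`, `reviews`), and the thresholds in `high_performers` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical sample rows; real rows would come from the cleaned dataset.
restaurants = [
    {"name": "Spice Hub", "cuisine": "Indian", "rating": 4.5, "reviews": 200},
    {"name": "Pasta Point", "cuisine": "Italian", "rating": 4.0, "reviews": 50},
    {"name": "Curry Corner", "cuisine": "Indian", "rating": 3.5, "reviews": 100},
]

def average_rating_by_cuisine(rows):
    """Return {cuisine: mean rating} for the given restaurant rows."""
    totals = defaultdict(lambda: [0.0, 0])  # cuisine -> [rating sum, row count]
    for row in rows:
        totals[row["cuisine"]][0] += row["rating"]
        totals[row["cuisine"]][1] += 1
    return {cuisine: total / count for cuisine, (total, count) in totals.items()}

def high_performers(rows, min_rating=4.0, min_reviews=100):
    """Names of restaurants meeting both thresholds (threshold values are assumptions)."""
    return [
        row["name"]
        for row in rows
        if row["rating"] >= min_rating and row["reviews"] >= min_reviews
    ]

avg_by_cuisine = average_rating_by_cuisine(restaurants)
top = high_performers(restaurants)
```

Requiring a minimum review count alongside the rating keeps restaurants with only a handful of reviews from being ranked as high performers.
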
Description: This project involves collecting and analyzing video data from the GeeksforGeeks YouTube channel.
- Data Gathering: Use the YouTube Data API to fetch video details such as titles, views, upload dates, and lengths.
- Data Processing and Analysis: Calculate total views and total video length, identify popular topics, and analyze correlations between video attributes.
- Visualization: Use libraries like matplotlib to create visualizations of trends and patterns.
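
As a rough illustration of the correlation analysis mentioned above, the sketch below computes a Pearson correlation between video length and view count. The numbers are made up for the example, and the variable names are assumptions rather than fields returned by the API.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)

# Hypothetical values, chosen so that views grow linearly with length.
lengths_min = [5, 10, 15, 20]
views = [1000, 2000, 3000, 4000]

total_views = sum(views)
length_view_correlation = pearson(lengths_min, views)
```

A coefficient near 1 or -1 indicates a strong linear relationship, while a value near 0 suggests length alone says little about views.
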
Description: Analyze the used car dataset from CarDekho to uncover insights about the factors that influence car prices.
- Data Gathering: The dataset includes features like selling price, vehicle age, KM driven, engine size, fuel type, seller type, and transmission type.
- Data Cleaning and Preprocessing: Handle missing values, remove duplicates, standardize text columns, and remove outliers.
- Exploratory Data Analysis (EDA): Perform univariate, bivariate, and categorical analyses to identify key trends and insights.
- Visualization: Use libraries like matplotlib and seaborn to create distribution plots, scatter plots, and correlation heatmaps.
- Insights and Findings: Analyze the impact of various factors on car prices and provide recommendations based on the analysis.
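
The cleaning steps listed above (standardizing text, removing duplicates, removing outliers) could look roughly like this. It is a sketch under stated assumptions: the column names `fuel_type` and `selling_price` and the sample rows are hypothetical, and the 1.5 × IQR rule is one common outlier criterion, not necessarily the one the notebook uses.

```python
import statistics

def clean_rows(rows):
    """Normalize the text column, then drop rows that are duplicates after normalization."""
    seen, cleaned = set(), []
    for row in rows:
        key = (row["fuel_type"].strip().lower(), row["selling_price"])
        if key not in seen:
            seen.add(key)
            cleaned.append({"fuel_type": key[0], "selling_price": key[1]})
    return cleaned

def remove_price_outliers(rows, k=1.5):
    """Keep rows whose price lies within k * IQR of the first and third quartiles."""
    prices = [row["selling_price"] for row in rows]
    q1, _, q3 = statistics.quantiles(prices, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [row for row in rows if low <= row["selling_price"] <= high]

# Hypothetical sample rows.
raw = [
    {"fuel_type": " Petrol ", "selling_price": 300000},
    {"fuel_type": "petrol", "selling_price": 300000},   # duplicate once normalized
    {"fuel_type": "Diesel", "selling_price": 350000},
    {"fuel_type": "diesel", "selling_price": 400000},
    {"fuel_type": "CNG", "selling_price": 450000},
    {"fuel_type": "Petrol", "selling_price": 500000},
    {"fuel_type": "Diesel", "selling_price": 5000000},  # extreme value
]

cleaned = clean_rows(raw)
filtered = remove_price_outliers(cleaned)
```

Normalizing text before deduplicating matters here: `" Petrol "` and `"petrol"` would otherwise count as different values.
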
Description: Build a machine learning model to classify sonar signals as either mines (M) or rocks (R).
- Data Gathering: The dataset includes sonar readings for mines and rocks.
- Data Cleaning and Preprocessing: Check for missing values and outliers, and handle any that are found.
- Exploratory Data Analysis (EDA): Analyze summary statistics and class distribution.
- Model Building: Create feature matrices, split data, and evaluate models such as Logistic Regression, SVC, Decision Tree, and Random Forest.
- Model Comparison: Compare models based on accuracy and performance metrics.
- Insights and Findings: Determine the best model for sonar signal classification based on accuracy.
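
The split, fit, and compare workflow above can be sketched with a small harness. To stay dependency-free, this example uses two simple stand-in classifiers (a majority-class baseline and a nearest-centroid model) instead of the models named above, and `split_data` is a simplified hypothetical helper. The data is synthetic. Because scikit-learn estimators such as `LogisticRegression` or `SVC` expose the same `fit`/`predict` interface, they could be passed to `compare_models` in the same way.

```python
import random

def split_data(X, y, test_fraction=0.25, seed=0):
    """Shuffle row indices and split features and labels into train and test parts."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return pick(X, train_idx), pick(X, test_idx), pick(y, train_idx), pick(y, test_idx)

class MajorityClass:
    """Baseline that always predicts the most frequent training label."""
    def fit(self, X, y):
        self.label_ = max(set(y), key=y.count)
        return self

    def predict(self, X):
        return [self.label_ for _ in X]

class NearestCentroid:
    """Predicts the label whose training-set mean vector is closest."""
    def fit(self, X, y):
        self.centroids_ = {}
        for label in set(y):
            rows = [x for x, target in zip(X, y) if target == label]
            self.centroids_[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X):
        def dist2(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))
        return [min(self.centroids_, key=lambda c: dist2(x, self.centroids_[c])) for x in X]

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def compare_models(models, X_train, X_test, y_train, y_test):
    """Fit each model on the training split and return {name: test accuracy}."""
    return {
        name: accuracy(y_test, model.fit(X_train, y_train).predict(X_test))
        for name, model in models.items()
    }

# Synthetic, well-separated stand-in for sonar readings: "R" (rock) vs "M" (mine).
X = [[0.1 + 0.01 * i, 0.2] for i in range(10)] + [[0.8 + 0.01 * i, 0.9] for i in range(10)]
y = ["R"] * 10 + ["M"] * 10

results = compare_models(
    {"majority_baseline": MajorityClass(), "nearest_centroid": NearestCentroid()},
    *split_data(X, y),
)
```

Including a trivial baseline gives the accuracy of the other models a reference point.
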
Description: Predict sales for Big Mart using historical sales data.
- Data Gathering: Use sales data from Big Mart to create predictive models.
- Data Cleaning and Preprocessing: Handle missing values and preprocess data for modeling.
- Model Building: Build and evaluate regression models to predict sales.
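
One common way to handle missing numeric values before modeling is mean imputation. The sketch below assumes hypothetical column names (`item_weight`, `item_sales`) and made-up rows, and it is only one of several reasonable strategies.

```python
def impute_mean(rows, column):
    """Return new rows where missing (None) values in `column` are replaced by the column mean."""
    observed = [row[column] for row in rows if row[column] is not None]
    mean_value = sum(observed) / len(observed)
    return [
        {**row, column: mean_value if row[column] is None else row[column]}
        for row in rows
    ]

# Hypothetical sample rows with one missing weight.
sales_rows = [
    {"item_weight": 10.0, "item_sales": 200.0},
    {"item_weight": None, "item_sales": 150.0},
    {"item_weight": 14.0, "item_sales": 260.0},
]

filled = impute_mean(sales_rows, "item_weight")
```

The function builds new dictionaries rather than editing rows in place, so the raw data stays available for comparison.
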
Description: Predict house prices in California using historical data.
- Data Gathering: Use historical housing data from California.
- Data Cleaning and Preprocessing: Clean and preprocess data for analysis.
- Model Building: Develop regression models to predict house prices.
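
As a minimal example of a regression model and its evaluation, the sketch below fits a one-feature least-squares line and reports the root mean squared error (RMSE). The feature name `median_income` and all values are assumptions chosen so the relationship is exactly linear. A real model would use more features and held-out data.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return slope, mean_y - slope * mean_x

def rmse(y_true, y_pred):
    """Root mean squared error between targets and predictions."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Hypothetical data following house_value = 40 * median_income + 20.
median_income = [2.0, 4.0, 6.0, 8.0]
house_value = [100.0, 180.0, 260.0, 340.0]

slope, intercept = fit_line(median_income, house_value)
predictions = [slope * x + intercept for x in median_income]
error = rmse(house_value, predictions)
```
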
Description: Perform exploratory data analysis on CarDekho's car price dataset.
- Data Gathering: Work with features such as car price, model, and mileage.
- Exploratory Data Analysis (EDA): Identify key trends and patterns in the dataset.
Description: Build a model to detect fraudulent credit card transactions.
- Data Gathering: Use historical credit card transaction data.
- Model Building: Develop and evaluate classification models to detect fraud.
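
Fraud datasets are typically imbalanced, so accuracy alone can be misleading when evaluating a classifier. The sketch below computes precision and recall by hand on made-up labels (1 = fraud, 0 = legitimate); the labels and predictions are purely illustrative.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical labels: 2 fraudulent transactions out of 10.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall = precision_recall(y_true, y_pred)
```

Here the accuracy is 0.8 even though the model catches only one of the two fraudulent transactions, which is why precision and recall are worth reporting alongside it.
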
Description: Segment customers into different groups using K-Means clustering.
- Data Gathering: Use customer data for clustering.
- Model Building: Apply K-Means clustering to segment customers.
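
To show what K-Means does under the hood, here is a small from-scratch sketch of the standard iterative procedure: assign each point to its nearest centroid, then recompute centroids. The two features (read them as annual income and spending score) and the data points are assumptions for the example. In practice a library implementation would normally be used.

```python
import random

def squared_distance(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b))

def nearest_index(point, centroids):
    """Index of the centroid closest to `point`."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))

def kmeans(points, k, max_iterations=20, seed=0):
    """Return (centroids, labels) after running the assign/update loop."""
    centroids = random.Random(seed).sample(points, k)  # initial centroids drawn from the data
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[nearest_index(point, centroids)].append(point)
        # Recompute each centroid as the mean of its cluster; keep the old one if the cluster is empty.
        updated = [
            [sum(col) / len(cluster) for col in zip(*cluster)] if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:  # assignments have stabilized
            break
        centroids = updated
    labels = [nearest_index(point, centroids) for point in points]
    return centroids, labels

# Hypothetical customers: three low-income/low-spend and three high-income/high-spend.
customers = [[15, 20], [16, 22], [17, 18], [80, 85], [82, 90], [85, 88]]
centroids, labels = kmeans(customers, k=2)
```

The cluster numbering depends on the random initialization, so downstream code should rely on which points share a label rather than on the label values themselves.
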
Description: Predict whether a news article is fake or real.
- Data Gathering: Use a dataset of news articles.
- Model Building: Develop and evaluate classification models for fake news detection.
Description: Predict gold prices using historical data.
- Data Gathering: Use historical gold price data.
- Model Building: Develop regression models to predict future gold prices.
Description: Predict the likelihood of heart disease based on patient data.
- Data Gathering: Use health data related to heart disease.
- Model Building: Develop classification models to predict heart disease risk.
Description: Use advanced regression techniques to predict house prices.
- Data Gathering: Use historical housing data.
- Model Building: Apply advanced regression techniques to improve predictions.
Description: Predict loan eligibility based on applicant data.
- Data Gathering: Use applicant data to determine loan eligibility.
- Model Building: Develop classification models to predict loan approval.
Description: Build a model to detect Parkinson's disease from patient data.
- Data Gathering: Use health data related to Parkinson's disease.
- Model Building: Develop and evaluate classification models for disease detection.
Description: Predict whether an email is spam or not.
- Data Gathering: Use email data to classify spam and non-spam emails.
- Model Building: Develop classification models to detect spam emails.
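
Before a classifier can work with email text, the text has to be turned into numeric features. A bag-of-words representation is one simple option, sketched below. The tokenization rule and the sample emails are assumptions, and the notebook may use a different feature extraction method.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_vocabulary(texts):
    """Sorted list of every distinct token across all texts."""
    return sorted({token for text in texts for token in tokenize(text)})

def vectorize(text, vocabulary):
    """Count how often each vocabulary word occurs in `text`."""
    counts = Counter(tokenize(text))
    return [counts[word] for word in vocabulary]

# Hypothetical sample emails.
emails = ["Win a FREE prize now", "Meeting notes for Monday", "Free free prize"]

vocabulary = build_vocabulary(emails)
vectors = [vectorize(email, vocabulary) for email in emails]
```

Each email becomes a fixed-length vector of word counts, which can then be fed to a classification model.
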
Description: Predict the likelihood of medical insurance usage based on patient data.
- Data Gathering: Use patient data to predict insurance usage.
- Model Building: Develop classification models to predict medical insurance needs.
This project is licensed under the MIT License.