Palemravichandra/customer-segmentation

This is my customer segmentation project

  • Read the data file using pandas
  • Check the column data types (dtypes) and summary statistics (describe function)
  • Check for null values
  • Check for duplicate records and drop the duplicates
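
The loading and checking steps above can be sketched roughly as follows. The CSV content and column names here are hypothetical stand-ins, since this README does not name the project's data file or its columns.

```python
import io

import pandas as pd

# Hypothetical stand-in for the project's CSV file (real file and columns are not given here).
csv_text = """customer_id,age,segment
1,34,A
2,45,B
2,45,B
3,,A
"""

df = pd.read_csv(io.StringIO(csv_text))

print(df.dtypes)      # data type of each column
print(df.describe())  # summary statistics for the numeric columns

null_counts = df.isnull().sum()            # missing values per column
n_duplicates = int(df.duplicated().sum())  # number of fully duplicated rows

# Drop the duplicated rows and renumber the index.
df_clean = df.drop_duplicates().reset_index(drop=True)
```

With a real file, `pd.read_csv` would be given the file path instead of the in-memory text.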

Univariate analysis

  • Perform univariate analysis, i.e. examine how each feature is distributed
  • If any value should be treated as a null value, replace it with the mode of that particular feature
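
A minimal sketch of the two steps above, assuming a placeholder string such as "?" marks the values to be treated as null. The column name and the placeholder are illustrative assumptions, not taken from the project.

```python
import pandas as pd

# Toy data; "?" is a hypothetical placeholder that should count as missing.
df = pd.DataFrame({"profession": ["Artist", "?", "Doctor", "Artist", "Artist", "?"]})

# Univariate view: how often each value occurs in the feature.
print(df["profession"].value_counts())

# Turn the placeholder into a real null, then fill nulls with the mode.
col = df["profession"].mask(df["profession"] == "?")
mode_value = col.mode()[0]  # most frequent non-null value
df["profession"] = col.fillna(mode_value)
```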

Outlier detection

  • Detect the outliers in the numerical columns using box plots
  • Compute the IQR bounds
  • Replace the outliers with the IQR bound values
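
One possible reading of the IQR steps above is to cap values at the box plot whiskers. The sketch below assumes the common 1.5 × IQR rule and a hypothetical `age` column; the multiplier actually used in the project is not stated.

```python
import pandas as pd

# Toy numeric feature with one obvious outlier (hypothetical values).
ages = pd.Series([22, 25, 27, 30, 31, 33, 35, 120], name="age")

q1 = ages.quantile(0.25)
q3 = ages.quantile(0.75)
iqr = q3 - q1

# Assumed whisker rule: 1.5 * IQR beyond the quartiles.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values outside the bounds are flagged as outliers.
outliers = ages[(ages < lower) | (ages > upper)]

# Replace outliers with the nearest bound; other values are unchanged.
capped = ages.clip(lower=lower, upper=upper)
```

Capping keeps every row, which matters when the dataset is small; dropping the outlier rows would be the alternative.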

Bivariate analysis

  • Bivariate analysis examines how each feature is distributed with respect to the target variable
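
As a rough illustration, a categorical feature can be compared to the target with a cross-tabulation and a numerical feature with a grouped mean. The column names and the `segment` target below are assumptions made for the example.

```python
import pandas as pd

# Toy data; "segment" plays the role of the target variable (hypothetical).
df = pd.DataFrame({
    "gender": ["M", "F", "F", "M", "F", "M"],
    "age": [25, 40, 35, 50, 30, 45],
    "segment": ["A", "B", "A", "B", "A", "B"],
})

# Categorical feature vs. target: share of each segment within each gender.
share = pd.crosstab(df["gender"], df["segment"], normalize="index")

# Numerical feature vs. target: mean age per segment.
mean_age = df.groupby("segment")["age"].mean()

print(share)
print(mean_age)
```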

Encoding

  • Transforming categorical features into numerical values
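
The README does not say which encoding is used, so the sketch below shows two common options under assumed column names: an explicit ordinal mapping for an ordered feature and one-hot encoding for an unordered one.

```python
import pandas as pd

# Toy data with hypothetical categorical columns.
df = pd.DataFrame({
    "married": ["Yes", "No", "Yes"],
    "spending": ["Low", "High", "Average"],
})

# Ordered categories: map each level to an integer that preserves the order.
order = {"Low": 0, "Average": 1, "High": 2}
df["spending_encoded"] = df["spending"].map(order)

# Unordered categories: one-hot encode into 0/1 indicator columns.
encoded = pd.get_dummies(df, columns=["married"], dtype=int)
```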

Feature selection

  • Check whether the features are correlated with each other by setting a correlation threshold
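
A minimal sketch of threshold-based selection: for every pair of features whose absolute correlation exceeds the threshold, the later column is marked for removal. The helper name `correlated_features` and the threshold of 0.85 are illustrative assumptions, not values from the project.

```python
import numpy as np
import pandas as pd


def correlated_features(df, threshold=0.85):
    """Return columns that are highly correlated with an earlier column."""
    corr = df.corr().abs()
    # Keep only the strict upper triangle so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [col for col in upper.columns if (upper[col] > threshold).any()]


# Toy data: "b" is almost a multiple of "a", "c" is independent noise.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "b": 2 * a + rng.normal(scale=0.01, size=200),
    "c": rng.normal(size=200),
})

to_drop = correlated_features(df, threshold=0.85)
df_selected = df.drop(columns=to_drop)
```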

Splitting the data

  • Split the data into train data and test data
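
A sketch of the split using scikit-learn's `train_test_split`. The 80/20 ratio, the stratification on the target, and the toy arrays are assumptions; the README does not give the actual settings.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy features and an imbalanced binary target (hypothetical).
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 15 + [1] * 5)

# stratify=y keeps the class proportions the same in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```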

Balancing the data

  • The data is imbalanced, so the train data is balanced using an oversampling technique called SMOTEN
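
To show the idea behind SMOTE-style oversampling, here is a simplified NumPy sketch of plain SMOTE on numeric features. It is not the project's code and not the imbalanced-learn implementation: the helper `smote_like_oversample`, the value of `k`, and the toy data are all assumptions. Each synthetic row is placed on the line between a minority row and one of its nearest minority neighbours.

```python
import numpy as np


def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between minority rows."""
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances between minority rows.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a row is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]  # k nearest per row
    base = rng.integers(0, len(X_min), size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[chosen] - X_min[base])


# Toy train data: 20 majority rows (class 0), 5 minority rows (class 1).
rng = np.random.default_rng(1)
X_maj = rng.normal(loc=0.0, size=(20, 2))
X_min = rng.normal(loc=3.0, size=(5, 2))

synthetic = smote_like_oversample(X_min, n_new=len(X_maj) - len(X_min))

X_balanced = np.vstack([X_maj, X_min, synthetic])
y_balanced = np.array([0] * len(X_maj) + [1] * (len(X_min) + len(synthetic)))
```

Only the train data is oversampled; the test data keeps its original class balance so that the evaluation reflects real conditions.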

Standardization

  • Transforming train data from range of numerical values into between 0 to
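
The heading names standardization while the bullet describes a target range, so the exact scaler is unclear. The sketch below assumes scikit-learn's `StandardScaler` (zero mean, unit variance) with toy values; `MinMaxScaler` would be the choice if a fixed range is wanted instead. In both cases the scaler is fitted on the train data only and then reused on the test data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy train and test features on very different scales (hypothetical).
X_train = np.array([[20.0, 30000.0], [30.0, 50000.0], [40.0, 70000.0]])
X_test = np.array([[30.0, 60000.0]])

scaler = StandardScaler()
# Learn mean and standard deviation from the train data only.
X_train_scaled = scaler.fit_transform(X_train)
# Apply the same train statistics to the test data.
X_test_scaled = scaler.transform(X_test)
```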

Model building

  • Build the model, train it with different parameters, and find the best parameters using GridSearchCV
  • Check the performance metrics, including the model accuracy and the AUROC metric
  • Select the best model, i.e. the one with high accuracy and a high AUROC value
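
The model-building steps above might look like the following sketch. The README does not name the model, the parameter grid, or the number of classes, so logistic regression, the `C` grid, the 5-fold setting, and the synthetic binary data are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary data standing in for the prepared project data.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Try each parameter value with 5-fold cross-validation, scored by AUROC.
param_grid = {"C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(
    LogisticRegression(max_iter=1000), param_grid, scoring="roc_auc", cv=5
)
search.fit(X_train, y_train)

# Evaluate the best model on the held-out test data.
best_model = search.best_estimator_
test_accuracy = accuracy_score(y_test, best_model.predict(X_test))
test_auroc = roc_auc_score(y_test, best_model.predict_proba(X_test)[:, 1])

print(search.best_params_, test_accuracy, test_auroc)
```

For a target with more than two segments, `roc_auc_score` needs the full probability matrix and a `multi_class` setting such as `"ovr"`. To compare several model types, the same search can be repeated per model and the test accuracy and AUROC compared side by side.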