- Reading the file using pandas
- checking the columns' data types and summary statistics using the info and describe functions
- checking for null values
- checking for duplicate records and dropping the duplicates
- perform the univariate analysis, i.e. how each feature is distributed
- if any value is treated as a null value, replace it with the mode of that particular feature
- Detect the outliers in numerical columns using box plots
- compute the IQR-based lower and upper limits
- replace the outliers with the IQR-based limits
- Bivariate analysis, i.e. how each feature is distributed with respect to the target variable
- Transforming categorical features into numerical values
- checking whether the features are correlated with each other by setting a threshold value
- splitting the data into train data and test data
- Here the data is imbalanced, so balance the train data using an oversampling technique called SMOTEN
- Scaling the numerical values of the train data into a range from 0 to
- Building the model, training it with different parameters, and finding the best parameters using GridSearchCV
- check the performance metrics and evaluate the model using the AUROC metric
- find the best model, i.e. the one with high accuracy and a high AUROC value
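
The cleaning and preprocessing steps above could look roughly like the sketch below. It is only an illustration under assumptions: the toy DataFrame, the column names (age, plan, target), the "?" placeholder for missing values, the 1.5 IQR multiplier, and the 0.9 correlation threshold are all made up for the example and do not come from the original project.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; in practice this would come from pd.read_csv(...).
df = pd.DataFrame({
    "age": [25.0, 32.0, 47.0, 51.0, 38.0, 29.0, 120.0, 32.0],
    "plan": ["basic", "pro", "?", "basic", "pro", "basic", "basic", "pro"],
    "target": [0, 1, 0, 0, 1, 0, 1, 1],
})

# Data types, summary statistics, and null counts.
df.info()
print(df.describe())
print(df.isnull().sum())

# Drop duplicated records (the last row repeats the second one).
df = df.drop_duplicates().reset_index(drop=True)

# Treat the assumed placeholder "?" as null, then fill with the column mode.
df["plan"] = df["plan"].mask(df["plan"] == "?")
df["plan"] = df["plan"].fillna(df["plan"].mode()[0])


def cap_outliers_iqr(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to those limits."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# Replace outliers in the numerical column with the IQR-based limits.
df["age"] = cap_outliers_iqr(df["age"])

# Turn the categorical feature into numerical indicator columns.
encoded = pd.get_dummies(df, columns=["plan"], drop_first=True, dtype=int)

# Drop one feature from every pair whose absolute correlation exceeds the threshold.
features = encoded.drop(columns="target")
corr = features.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.9).any()]
features = features.drop(columns=to_drop)
print(features.head())
```

Clipping keeps every row while limiting the influence of extreme values; dropping the outlier rows instead would be another reasonable choice on a larger dataset.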
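
The splitting, balancing, scaling, and tuning steps above might be sketched as follows. Several things here are assumptions rather than the original setup: the data is synthetic, logistic regression and its parameter grid are placeholders for whichever models were actually tried, and the scaling range of 0 to 1 is MinMaxScaler's default. To stay runnable with scikit-learn alone, the sketch swaps in plain random oversampling of the minority class instead of a SMOTE-family sampler; a sampler from imbalanced-learn, called through its fit_resample method, would take the place of that step.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.utils import resample

# Synthetic imbalanced data (roughly 90% / 10%), standing in for the real dataset.
X, y = make_classification(
    n_samples=600, n_features=8, weights=[0.9, 0.1], random_state=0
)

# Split first so the test set is never touched by balancing or scaling.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Balance the train data only: random oversampling of the minority class
# (a simple stand-in for a SMOTE-family technique).
majority = X_train[y_train == 0]
minority = X_train[y_train == 1]
minority_up = resample(
    minority, replace=True, n_samples=len(majority), random_state=0
)
X_bal = np.vstack([majority, minority_up])
y_bal = np.concatenate([
    np.zeros(len(majority), dtype=int),
    np.ones(len(minority_up), dtype=int),
])

# Fit the scaler on the train data, then reuse it for the test data.
scaler = MinMaxScaler()
X_bal_scaled = scaler.fit_transform(X_bal)
X_test_scaled = scaler.transform(X_test)

# Search a small, illustrative parameter grid, scoring by AUROC.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1, 10]},
    scoring="roc_auc",
    cv=5,
)
search.fit(X_bal_scaled, y_bal)

# Evaluate the best model on the untouched test set.
test_scores = search.predict_proba(X_test_scaled)[:, 1]
auc = roc_auc_score(y_test, test_scores)
print("best parameters:", search.best_params_)
print("test AUROC:", round(auc, 3))
```

Because the oversampling happens before cross-validation, copies of the same minority row can land in both the training and validation folds, so the cross-validated scores may be optimistic; the held-out test AUROC is the more trustworthy number in this sketch.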