A presentation by students from the Boston Institute of Analytics on customer churn prediction: the methodologies, techniques, and predictive models used to anticipate and mitigate churn, and how businesses can leverage predictive analytics to retain customers. Visit https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/ for more data science insights.
3. PROJECT CONTENTS
1. INTRODUCTION
2. PROBLEM IDENTIFICATION
3. ATTRIBUTE/FEATURE DESCRIPTION
4. EXPLORATORY DATA ANALYSIS
5. MODEL BUILDING
6. BUILDING CLASSIFICATION MODEL
7. RESULT AND CONCLUSION
4. INTRODUCTION
• In the dynamic landscape of the banking industry, retaining customers is paramount for sustained success.
• Customer churn, or the loss of customers, poses challenges that this project aims to address through data-driven insights and proactive strategies.
• This presentation outlines our approach to identifying, predicting, and mitigating customer churn for the benefit of our bank and its valued customers.
5. PROBLEM IDENTIFICATION
• Inadequate Customer Insights
• Data Quality Issues
• Dynamic Market Conditions
• Resource Allocation
• Limited Personalization
• Customer Communication Gaps
6. ATTRIBUTE/FEATURE DESCRIPTION
CustomerID: Unique ID assigned to the customer
Surname: Customer's last name
Geography: The country the customer belongs to
Gender: Customer's gender
Age: Customer's age
Tenure: Duration of the customer's relationship with the bank
Balance: The amount remaining in the account
7. EXPLORATORY DATA ANALYSIS
• IMPORT DATA:
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/Classroom/BIA/ML/Churn_Modelling.csv')
• FIND MISSING VALUES:
No Missing Values
• FINDING FEATURES WITH ONE VALUE:
No features with one value
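The two checks above can be sketched in plain Python on a tiny stand-in dataset. The column names and values below are hypothetical (and, unlike the real data, include one missing value purely to show the check firing); with the deck's pandas DataFrame the equivalents are `df.isnull().sum()` and `df.nunique()`.

```python
# Tiny stand-in for the churn dataset (hypothetical values).
data = {
    'Age':       [42, 35, None],            # one missing value, for illustration
    'Geography': ['France', 'Spain', 'France'],
    'HasCrCard': [1, 1, 1],                 # a constant (single-value) column
}

# Count missing values per column (pandas: df.isnull().sum())
missing = {col: sum(v is None for v in vals) for col, vals in data.items()}

# Find features with only one distinct value (pandas: df.nunique() == 1)
single_valued = [col for col, vals in data.items() if len(set(vals)) == 1]
```

Single-value features carry no information for the model and are candidates for dropping.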
8. CHECKING IF THE DATA IS BALANCED ON THE TARGET
• The data is highly imbalanced.
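A balance check on the target amounts to counting labels per class. The counts below are illustrative, chosen to mirror a roughly 80/20 imbalance; they are not the deck's actual figures.

```python
from collections import Counter

# Illustrative target labels (Exited: 1 = churned, 0 = retained).
y = [0] * 800 + [1] * 200

counts = Counter(y)
churn_rate = counts[1] / len(y)   # minority class is one in five rows
```

A skew like this is why the deck later applies oversampling before model training.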
12. DROP UNWANTED COLUMNS:
data = data.drop(['CustomerId', 'Surname', 'Exited', 'RowNumber'], axis=1)
We dropped CustomerId, Surname, and RowNumber because they are identifiers with no significant impact on model building, and dropped the Exited column from the features because it is the target variable.
• STANDARDIZATION:
Standardization is a preprocessing method used to transform numerical
data by scaling it to have a mean of zero and a standard deviation of one.
This transformation is applied to all features ensuring that they have the
same scale, thus preventing features with larger magnitudes from
dominating the learning algorithm.
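The transformation described above can be written out directly. This is a minimal sketch of the formula (subtract the mean, divide by the standard deviation) on hypothetical values; in practice sklearn's StandardScaler applies the same per-column transform.

```python
def standardize(values):
    """Scale a numeric column to mean 0 and (population) std dev 1."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / std for v in values]

# Hypothetical account balances, all brought onto the same scale.
scaled = standardize([1000.0, 2000.0, 3000.0, 4000.0, 5000.0])
```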
• LABEL ENCODER:
As analyzed in the EDA, we have three categorical features in total, including the target column. Before model building, we convert these into numerical features with the help of a label encoder.
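Label encoding simply maps each distinct category to an integer. A minimal sketch, using hypothetical Geography values; sklearn's LabelEncoder produces the same alphabetical-order mapping.

```python
def label_encode(column):
    """Map each distinct category to an integer, in sorted order."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(column)))}
    return [mapping[c] for c in column], mapping

# Hypothetical Geography column from the churn data.
codes, mapping = label_encode(['France', 'Spain', 'Germany', 'France'])
# codes -> [0, 2, 1, 0]
```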
13. MODEL BUILDING
• THE DATA IS HIGHLY IMBALANCED, SO WE HAVE USED OVERSAMPLING
• SPLITTING DATASET:
We split our dataset in an 80:20 ratio,
where X = independent variables
and y = dependent (target) variable.
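The deck does not name the oversampling method, so the sketch below assumes simple random oversampling (duplicating minority-class rows until the classes match), followed by a shuffled 80/20 split. In practice imbalanced-learn's RandomOverSampler and sklearn's train_test_split would do the same jobs; the toy rows and labels here are hypothetical.

```python
import random

def random_oversample(rows, labels, seed=42):
    """Duplicate minority-class rows at random until classes are balanced."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(v) for v in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        resampled = members + [rng.choice(members)
                               for _ in range(target - len(members))]
        out_rows.extend(resampled)
        out_labels.extend([y] * target)
    return out_rows, out_labels

def train_test_split_80_20(rows, labels, seed=42):
    """Shuffle indices, then cut at the 80% mark."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    cut = int(0.8 * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [rows[i] for i in test],
            [labels[i] for i in train], [labels[i] for i in test])

rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2                      # imbalanced toy labels
X_bal, y_bal = random_oversample(rows, labels)  # now 8 rows per class
X_tr, X_te, y_tr, y_te = train_test_split_80_20(X_bal, y_bal)
```

Note that oversampling before splitting (as the slide order suggests) can leak duplicated minority rows into the test set; oversampling only the training split is the safer variant.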
14. BUILDING CLASSIFICATION MODEL
• WE USED THREE ALGORITHMS TO FIND THE BEST ACCURACY:
• DECISION TREE
• XGBOOST CLASSIFIER
• RANDOM FOREST CLASSIFIER
15. DECISION TREE:
Decision Tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems.
16. RANDOM FOREST CLASSIFIER:
Random Forest is a popular machine learning algorithm that belongs to the supervised learning technique. It can be used for both classification and regression problems in ML. It is based on the concept of ensemble learning.
17. XGBOOST CLASSIFIER:
XGBoost is an optimized distributed gradient boosting library designed
for efficient and scalable training of machine learning models. It is an
ensemble learning method that combines the predictions of multiple weak
models to produce a stronger prediction.
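The three-model comparison can be sketched as below. The data is a synthetic stand-in generated with make_classification (the real pipeline would use the preprocessed churn features), and sklearn's GradientBoostingClassifier is used here as a stand-in for XGBoost, which lives in the separate xgboost package.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for the prepared churn features (hypothetical data),
# skewed 80/20 to mimic the class imbalance.
X, y = make_classification(n_samples=1000, n_features=8,
                           weights=[0.8, 0.2], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'Random Forest': RandomForestClassifier(random_state=42),
    'Gradient Boosting': GradientBoostingClassifier(random_state=42),
}

# Fit each model and score held-out accuracy.
scores = {name: accuracy_score(y_test, m.fit(X_train, y_train).predict(X_test))
          for name, m in models.items()}
best = max(scores, key=scores.get)
```

On imbalanced data, accuracy alone can be misleading; precision, recall, or F1 on the minority class are worth reporting alongside it.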