Business Report - 6
PG Program in Data Science and
Business Analytics
submitted by
Sangram Keshari Patro
BATCH:PGPDSBA.O.AUG24.B
Contents
1 Objective
2 Data Description
    2.1 Data dictionary
3 Data Overview
    3.1 Importing necessary libraries and the dataset
    3.2 Structure and type of data
    3.3 Statistical summary
4 Exploratory Data Analysis
    4.1 Univariate Analysis
        4.1.1 Numerical columns
    4.2 Bivariate Analysis
        4.2.1 Numerical variables
        4.2.2 Categorical vs numerical variables
5 Data preprocessing
6 Clustering Methods
    6.1 K-means Clustering
        6.1.1 Checking Elbow Plot
        6.1.2 Check Silhouette Scores
        6.1.3 Cluster Profiling
    6.2 Hierarchical Clustering
        6.2.1 Hierarchical clustering with different linkage methods
        6.2.2 Cluster Profiling
    6.3 K-means vs Hierarchical Clustering
    6.4 PCA for Visualization
    6.5 PCA in 3 dimension
        6.5.1 Hierarchical Clustering on lower-dimensional data
7 Actionable Insights and Business Recommendations
List of Figures
1 Table depicting the datatype and Non-Null values in each column.
2 Statistical summary of the data
3 Histogram and boxplot of 'Avg_Credit_Limit' column
4 Histogram and boxplot of 'Total_Credit_Cards' column
5 'Total_visits_bank' column
6 'Total_visits_online' column
7 'Total_calls_made' column
8 Barchart of 'Total_calls_made', 'Total_visits_online', 'Total_visits_bank' and 'Total_Credit_Cards' column
9 Heatmap of all numerical variables
10 Pairplot of all numerical variables
11 'Avg_Credit_Limit' vs all columns
12 'Total_calls_made' vs 'Total_Credit_Cards' vs 'Total_visits_bank' vs 'Total_visits_online'
13 Distortion score Elbow for KMeans Clustering
14 Silhouette scores for different k
15 Silhouette plots for different k
16 Cluster Profiling of KMeans group
17 Box plot of different columns vs KMeans groups
18 Pairplot of different columns vs KMeans groups
19 3D plot of different columns vs KMeans groups
20 Among different distance and linkage methods, the highest cophenetic correlation is obtained using Euclidean distance and average linkage.
21 Dendrograms for the different linkage methods
22 Xgboost Classifier performance
23 Visualizing data in 2 dimensions
24 Pairplot of different columns vs Hierarchical groups
25 Pairplot of PCA columns
26 Dendrograms for the different linkage methods (Hierarchical Clustering on lower-dimensional data)
27 3D plot of PCA columns
1 Objective
AllLife Bank wants to focus on its credit card customer base in the next financial year. Its marketing research
team has advised that market penetration can be improved. Based on this input, the Marketing team proposes
to run personalized campaigns to target new customers as well as upsell to existing customers. Another insight
from the market research was that customers perceive the bank's support services poorly. Based on this, the
Operations team wants to upgrade the service delivery model to ensure that customer queries are resolved faster.
The objective is to identify different segments among the existing customers, based on their spending patterns
and past interactions with the bank, using clustering algorithms, and to provide recommendations to the bank on
how to better market to and service these customers.
2 Data Description
The data provided covers various customers of a bank and their financial attributes, such as credit limit, the total
number of credit cards the customer has, and the different channels through which customers have contacted the bank
for any queries (including visiting the bank, online, and through a call center). The detailed data dictionary is
given below.
2.1 Data dictionary
Attribute               Description
Sl_No                   Primary key of the records.
Customer Key            Unique identification number assigned to each customer.
Average Credit Limit    Average credit limit of each customer across all credit cards.
Total Credit Cards      Total number of credit cards possessed by the customer.
Total Visits Bank       Yearly total number of in-person visits the customer made to the bank.
Total Visits Online     Yearly total number of online logins or interactions by the customer.
Total Calls Made        Yearly total number of calls made by the customer to the bank or customer service.
3 Data Overview
3.1 Importing necessary libraries and the dataset
The dataset is loaded and printed. It has 660 rows and 7 columns.
3.2 Structure and type of data
Data is explored further. The dataset is free from duplicate rows and contains no null values.
Figure 1: Table depicting the datatype and Non-Null values in each column.
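The checks described above can be sketched roughly as follows. This is a minimal illustration rather than the notebook code behind the report: the four rows are made-up stand-ins for the real 660-row dataset, and the helper name `overview` is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in rows: the real dataset (660 rows, 7 columns) is not
# reproduced here, and every value below is made up for illustration.
df = pd.DataFrame({
    "Sl_No": [1, 2, 3, 4],
    "Customer Key": [10001, 10002, 10003, 10004],
    "Avg_Credit_Limit": [100000, 50000, 50000, 30000],
    "Total_Credit_Cards": [2, 3, 7, 5],
    "Total_visits_bank": [1, 0, 1, 1],
    "Total_visits_online": [1, 10, 3, 1],
    "Total_calls_made": [0, 9, 4, 4],
})


def overview(frame):
    """Summarise shape, duplicate rows and missing values of a DataFrame."""
    return {
        "shape": frame.shape,
        "duplicate_rows": int(frame.duplicated().sum()),
        "null_values": int(frame.isnull().sum().sum()),
    }


summary = overview(df)
print(summary)
print(df.dtypes)  # data type of each column
```

On the real data, the same summary would be expected to report a shape of (660, 7) with zero duplicate rows and zero null values, as stated above.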
3.3 Statistical summary
Figure 2: Statistical summary of the data
4 Exploratory Data Analysis
4.1 Univariate Analysis
4.1.1 Numerical columns
• 'Avg_Credit_Limit'
Figure 3: Histogram and boxplot of 'Avg_Credit_Limit' column
Observations
• Histogram: The distribution of average credit limits is right-skewed, with most customers having a lower
credit limit. The density decreases as the credit limit increases, showing that fewer customers have high credit
limits.
• Box Plot: The median credit limit lies within the interquartile range (IQR), with a significant number of
outliers at the higher end. This indicates that while most customers have lower credit limits, a few have
exceptionally high limits.
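As a rough illustration of how a box plot flags such outliers, the sketch below applies the common 1.5 × IQR whisker rule (an assumption here; it is the default in matplotlib and seaborn box plots). The credit limits are made-up values, and `iqr_outliers` is a hypothetical helper.

```python
import numpy as np


def iqr_outliers(values):
    """Return the values lying outside the 1.5 * IQR whiskers."""
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return values[(values < lower) | (values > upper)]


# Made-up credit limits with one exceptionally high value.
limits = [5000, 8000, 10000, 12000, 15000, 18000, 20000, 150000]
outliers = iqr_outliers(limits)
print(outliers)
```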
Business Recommendations
• Customized Credit Offerings: Since a few customers have high credit limits, targeted premium credit
products should be designed for high-value customers.
• Risk Management: The presence of high-value outliers suggests the need for careful credit risk assessment
for customers with exceptionally high limits.
• Market Expansion: Since most customers fall in the lower credit limit range, banks should focus on financial
products that cater to this majority segment.
• 'Total_Credit_Cards'
Figure 4: Histogram and boxplot of 'Total_Credit_Cards' column
Observations
• Histogram: The number of credit cards owned by customers is distributed in distinct clusters, indicating
customer segments with different credit needs. The distribution has multiple peaks, suggesting that specific
numbers of cards are more common.
• Box Plot: The median number of credit cards falls within a typical range, but there are some customers who
own an exceptionally high number of credit cards.
Business Recommendations
• Tailored Credit Card Offerings: The presence of multiple peaks suggests different customer segments.
Banks should create specific marketing strategies targeting each segment.
• Loyalty and Retention Programs: Customers with multiple credit cards may be valuable for retention
efforts through exclusive rewards and benefits.
• Credit Utilization Monitoring: Customers with numerous credit cards may pose a higher risk in terms of
debt accumulation, requiring more refined credit monitoring policies.
• 'Total_visits_bank'
Figure 5: 'Total_visits_bank' column
Observations
• Histogram: The number of bank branch visits follows a slightly right-skewed distribution, with most
customers making fewer visits, while a smaller proportion visits frequently.
• Box Plot: The median number of visits is low, indicating that a majority of customers prefer fewer in-person
interactions. However, some customers visit significantly more, suggesting specific needs.
• Outliers: A few customers visit the bank far more than the average, which could indicate special service
needs or a lack of digital adoption.
Business Recommendations
• Promote Digital Banking: Since most customers make fewer visits, encourage further digital adoption by
offering incentives for online and mobile banking usage.
• Optimize Branch Services: For high-frequency visitors, analyze their needs and provide personalized
branch services or hybrid support models.
• Reduce Operational Costs: With low in-person engagement, consider streamlining branch operations,
optimizing staff allocation, and reallocating resources to digital customer support.
• Targeted Customer Education: Identify frequent branch visitors who might not be comfortable with
digital banking and offer training sessions to enhance their online banking experience.
• 'Total_visits_online'
Figure 6: 'Total_visits_online' column
Observations
• Histogram: Online visits are right-skewed, with most customers making a few visits while some engage
frequently.
• Box Plot: The median is low, showing limited online interactions, but a few customers visit very often.
• Outliers: Some users have exceptionally high visits, indicating strong digital engagement.
Business Recommendations
• Increase Digital Engagement: Encourage low-frequency users to explore online banking with promotions.
• Enhance User Experience: Improve website/app usability for frequent visitors.
• Optimize Digital Support: Provide chatbot assistance for high-traffic users.
• 'Total_calls_made'
Figure 7: 'Total_calls_made' column
Observations
• Histogram: The number of calls made by customers follows a slightly right-skewed distribution, with most
customers making a low number of calls. The frequency of high call volumes decreases gradually.
• Box Plot: The median number of calls is within the IQR, showing a balanced distribution, but a few
customers have made a significantly higher number of calls.
Business Recommendations
• Enhance Self-Service Options: Since most customers make fewer calls, investing in digital and self-service
banking solutions can further reduce dependency on customer support.
• Improve Call Center Efficiency: A segment of customers makes frequent calls, indicating potential
dissatisfaction or complex queries that need to be addressed with better FAQs and AI chat support.
• Segment-Based Service Models: Offer premium support services for high-frequency callers and encourage
digital interaction for low-frequency callers.
Figure 8: Barchart of (a) 'Total_calls_made', (b) 'Total_Credit_Cards', (c) 'Total_visits_bank' and
(d) 'Total_visits_online' column
Observations
• Total Calls Made: The distribution is right-skewed, with most customers making a limited number of calls.
A small segment makes frequent calls, indicating a need for assistance or unresolved issues.
• Total Credit Cards: Most customers hold a low number of credit cards, while a small portion owns multiple
cards, possibly indicating high credit usage or loyalty to the bank.
• Total Bank Visits: A significant number of customers visit the bank rarely, but a subset makes frequent
visits, likely for complex transactions or lack of digital adoption.
• Total Online Visits: Online banking is widely used, though visit frequency varies, suggesting different levels
of digital engagement among customers.
Business Recommendations
• Enhance Self-Service and Digital Support: Since call volumes are low for most but high for some,
improving AI chatbots and FAQs can reduce reliance on call centers.
• Targeted Credit Card Strategies: Offer personalized promotions to high-credit users while encouraging
others to explore additional banking products.
• Reduce In-Branch Dependency: Educate frequent branch visitors on digital banking options, ensuring
smoother transitions for routine transactions.
• Boost Online Engagement: Incentivize low-frequency digital users with promotions, tutorials, or exclusive
online banking benefits.
4.2 Bivariate Analysis
4.2.1 Numerical variables
• Heatmap
Figure 9: Heatmap of all numerical variables
Observations
• Credit Limit and Total Credit Cards: A moderate positive correlation (0.62) suggests that customers
with more credit cards tend to have higher credit limits.
• Total Visits Bank vs. Total Visits Online: A moderate negative correlation (-0.55) implies that customers
visiting the bank frequently are less likely to engage in online banking.
• Total Calls Made vs. Total Credit Cards: A moderate negative correlation (-0.65) indicates that
customers with more credit cards tend to make fewer calls.
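The coefficients shown in such a heatmap are pairwise Pearson correlations. A minimal sketch of how they could be computed with pandas is given below; the six rows are made-up values chosen only so that the signs mirror the observations above, not the real data.

```python
import pandas as pd

# Made-up values in which branch visits fall as online visits rise,
# and calls fall as the number of credit cards rises.
df = pd.DataFrame({
    "Total_visits_bank": [0, 1, 2, 3, 4, 5],
    "Total_visits_online": [10, 8, 5, 3, 2, 0],
    "Total_calls_made": [9, 7, 6, 3, 2, 1],
    "Total_Credit_Cards": [1, 2, 4, 5, 6, 7],
})

corr = df.corr()  # Pearson correlation by default
print(corr.round(2))
```

The resulting matrix is symmetric with ones on the diagonal, so only the lower or upper triangle needs to be read.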
Business Recommendations
• Promote Digital Banking: Since higher bank visits correlate negatively with online visits, targeted
incentives for digital banking adoption can help reduce branch congestion.
• Optimize Call Center Services: Customers with fewer credit cards make more calls, indicating they may
need better onboarding or self-help resources.
• Tailored Credit Strategies: As credit limit correlates positively with the number of credit cards, segmenting
customers for personalized credit limit offers can enhance engagement.
• Pairplot
Figure 10: Pairplot of all numerical variables
Observations
• Avg. Credit Limit Distribution: Right-skewed, indicating most customers have lower credit limits, while
a few have significantly higher limits.
• Total Credit Cards: A bimodal distribution suggests distinct groups: those with fewer cards and those with
multiple.
• Total Visits (Bank vs. Online): Customers who visit banks more tend to have fewer online interactions,
reinforcing an inverse relationship.
• Total Calls Made: Most customers make a limited number of calls, but a small group makes significantly
more, possibly indicating service issues.
Business Recommendations
• Segment-Based Credit Offers: Identify high-credit customers separately to offer exclusive financial products.
• Encourage Digital Adoption: Customers with frequent branch visits should be incentivized to shift to
digital services.
• Call Center Enhancements: Address high-call-frequency customers with better self-service resources and
chatbot support.
• Cluster-Specific Engagement: Use distinct behavioral groups to personalize marketing strategies and
enhance customer experience.
4.2.2 Categorical vs numerical variables
• 'Avg_Credit_Limit' vs all columns
Figure 11: 'Avg_Credit_Limit' vs all columns
Observations
• Average Credit Limit vs. Total Calls Made: Customers with lower credit limits tend to make more
calls, possibly due to inquiries or service-related issues, whereas those with higher credit limits make fewer
calls, suggesting they require less assistance or have access to premium support channels.
• Average Credit Limit vs. Total Credit Cards: The distribution suggests that customers with higher
credit limits tend to have more credit cards, indicating a correlation between higher financial trust and multiple
credit lines.
• Average Credit Limit vs. Total Visits to the Bank: Customers with higher credit limits tend to visit
the bank more frequently, possibly due to their engagement in more complex financial transactions or seeking
premium services.
• Average Credit Limit vs. Total Online Visits: Higher credit limit customers tend to engage more in
online banking, showing a preference for digital financial management over physical branch visits.
Business Recommendations
• Enhanced Support for Low-Credit Customers: Implement better self-service resources and proactive
customer support to reduce the need for excessive call center interactions.
• Exclusive Offers for High-Credit Customers: Provide tailored benefits such as premium cards and
financial products to encourage customer loyalty.
• Encouraging Digital Banking Adoption: Develop targeted campaigns to educate and incentivize
low-credit customers to shift towards online banking, reducing their dependency on branch visits.
• Optimizing Credit Card Strategies: Identify customers who can benefit from additional credit options
and promote appropriate products based on their financial behavior.
• 'Total_calls_made' vs 'Total_Credit_Cards' vs 'Total_visits_bank' vs 'Total_visits_online'
Figure 12: 'Total_calls_made' vs 'Total_Credit_Cards' vs 'Total_visits_bank' vs 'Total_visits_online'
Observations
Total Calls Made vs. Total Visits to Bank
Customers who visit the bank more frequently make fewer calls, suggesting that in-person visits help resolve issues.
Total Calls Made vs. Total Credit Cards
The median number of calls decreases as the number of credit cards increases, indicating that multi-card holders
require less support.
Total Calls Made vs. Total Visits Online
Customers with higher online activity make fewer calls, indicating reliance on digital channels for issue resolution.
Total Credit Cards vs. Total Visits to Bank
Customers with more credit cards visit the bank more frequently, suggesting higher service needs.
Total Credit Cards vs. Total Visits Online
Online visits initially decrease with more credit cards but increase for customers with multiple cards, indicating
varied digital adoption.
Total Visits to Bank vs. Total Visits Online
Customers with higher online engagement visit branches less, showing a shift towards digital banking.
Business Recommendations
Total Calls Made vs. Total Visits to Bank
• Improve in-branch service to reduce follow-up calls.
• Offer remote support for customers who visit less but call frequently.
• Optimize resource allocation based on visit and call patterns.
Total Calls Made vs. Total Credit Cards
• Enhance self-service options for independent issue resolution.
• Provide dedicated support for customers with fewer credit cards.
• Re-engage high-credit-card holders with targeted promotions.
Total Calls Made vs. Total Visits Online
• Promote digital banking through tutorials and incentives.
• Enhance self-service tools such as chatbots and FAQs.
• Provide digital onboarding for customers with low online engagement.
Total Credit Cards vs. Total Visits to Bank
• Introduce priority service for high-card customers.
• Cross-sell financial products during in-branch visits.
• Implement an appointment-based system to manage high-traffic customers.
Total Credit Cards vs. Total Visits Online
• Promote online banking to low-card customers.
• Enhance digital services for multi-card users.
• Personalize outreach to mid-level credit card holders.
Total Visits to Bank vs. Total Visits Online
• Strengthen digital banking capabilities.
• Offer exclusive digital promotions to low-visit customers.
• Reduce branch dependency through digital education initiatives.
5 Data preprocessing
The dataset contains no missing or duplicate values. The outliers carry meaningful information about the customers,
so they are not removed, except in the 'Avg_Credit_Limit' column. Some Customer Key values appear in more than
one record, and the records sharing a key differ significantly from each other. This could be due to an error in
Customer Key assignment or the absence of a current_version_indicator in the dataset. For now, I will treat these
as separate customers. After clustering, I will analyze the groups associated with these sets of records.
The data is standardized with a standard scaler before clustering.
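A minimal sketch of these two steps is shown below. It assumes that the "standard scaler" is scikit-learn's `StandardScaler` and that identifier columns are left out of the scaled features; neither detail is confirmed by the report, and the four rows are made up.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up records; the key 20002 is repeated on purpose.
df = pd.DataFrame({
    "Customer Key": [20001, 20002, 20002, 20003],
    "Avg_Credit_Limit": [10000, 50000, 8000, 20000],
    "Total_Credit_Cards": [2, 7, 1, 4],
    "Total_calls_made": [8, 1, 9, 3],
})

# Rows whose Customer Key occurs more than once (kept as separate customers).
repeated = df[df["Customer Key"].duplicated(keep=False)]

# Standardise the features only: z = (x - mean) / standard deviation.
features = df.drop(columns=["Customer Key"])
scaled = StandardScaler().fit_transform(features)
scaled_df = pd.DataFrame(scaled, columns=features.columns)

print(repeated)
print(scaled_df.round(2))
```

After scaling, every column has mean 0 and unit standard deviation, so no single attribute (such as the credit limit, which is on a much larger scale) dominates the distance calculations used by the clustering algorithms.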
6 Clustering Methods
6.1 K-means Clustering
K-Means is an unsupervised machine learning algorithm used for clustering. It partitions data into K clusters based
on similarity. The algorithm works as follows:
• Choose K cluster centroids randomly.
• Assign each data point to the nearest centroid.
• Update centroids based on the mean of assigned points.
• Repeat until centroids stabilize.
It is efficient for large datasets but sensitive to the choice of K and outliers.
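The four steps above can be sketched in plain NumPy as follows. This is an illustrative implementation under simple assumptions (random initial centroids drawn from the data, Euclidean distance, a fixed iteration cap); the function name `kmeans` and the toy points are hypothetical, and library implementations add refinements such as smarter initialization.

```python
import numpy as np


def kmeans(points, k, n_iter=100, seed=0):
    """Basic K-means: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Step 1: choose K distinct data points as the initial centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 4: stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


# Two well-separated toy groups.
points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```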
6.1.1 Checking Elbow Plot
Figure 13: Distortion score Elbow for KMeans Clustering
• The slope of the distortion curve changes at k = 3, 4 and 5. Of these, k = 3 is the best choice, as it
takes the least time to fit.
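An elbow curve of this kind can be approximated by fitting K-means for a range of k and recording the within-cluster sum of squared distances. The sketch below uses scikit-learn's `KMeans` and its `inertia_` attribute, which is assumed here to correspond to the distortion score in the figure; the data are synthetic blobs, not the bank's customers.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic data: three well-separated blobs of 30 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(30, 2)) for c in centers])

# Within-cluster sum of squared distances for k = 1..6.
inertias = {}
for k in range(1, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias[k] = model.inertia_

for k, value in inertias.items():
    print(k, round(value, 1))
```

The value drops sharply up to the true number of groups and flattens afterwards; the bend in that curve is the "elbow".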
6.1.2 Check Silhouette Scores
Silhouette Score
Silhouette Score measures clustering quality by evaluating how well data points fit within their assigned clusters.
It is calculated as:
$$S = \frac{b - a}{\max(a, b)}$$
where:
• a = average intra-cluster distance.
• b = average nearest-cluster distance.
Values range from -1 to 1. Higher values indicate well-separated clusters, while negative values suggest
misclassification.
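To make the formula concrete, the sketch below computes S for a single point by hand and compares it with scikit-learn's `silhouette_samples`. The four points and their labels are made up, and `silhouette_of_point` is a hypothetical helper that assumes every cluster has at least two members.

```python
import numpy as np
from sklearn.metrics import silhouette_samples, silhouette_score

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
labels = np.array([0, 0, 1, 1])


def silhouette_of_point(i, X, labels):
    """Silhouette S = (b - a) / max(a, b) for the point at index i."""
    dists = np.linalg.norm(X - X[i], axis=1)
    same = labels == labels[i]
    same[i] = False  # exclude the point itself from its own cluster
    a = dists[same].mean()  # average intra-cluster distance
    # Average distance to the nearest other cluster.
    b = min(dists[labels == other].mean()
            for other in np.unique(labels) if other != labels[i])
    return (b - a) / max(a, b)


manual = silhouette_of_point(0, X, labels)
print(manual, silhouette_samples(X, labels)[0], silhouette_score(X, labels))
```

The overall silhouette score reported for a given k is the mean of these per-point values.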
Figure 14: Silhouette scores for different k
• We can observe that the silhouette score for k=3 is the highest, which indicates well-separated clusters. So, we
will choose 3 as the value of k.
Figure 15: Silhouette plots for different k
6.1.3 Cluster Profiling
After dividing the data into clusters, it is explored further to get insights about the formed clusters.
Figure 16: Cluster Profiling of KMeans group
Observations
• Cluster 0: Moderate Credit Limit, Mixed Engagement - Customers have a moderate credit limit (≈ 3378)
and number of credit cards (≈ 5.5). They visit the bank (≈ 3.49) but engage less online (≈ 0.98) and via calls (≈ 2).
• Cluster 1: High Credit Limit, Digital-Savvy - This segment has the highest credit limit (≈ 10266) and
number of credit cards (≈ 8.74). They prefer online banking (≈ 10.9 visits) and rarely visit the bank (≈ 0.6) or
call (≈ 1.08).
• Cluster 2: Low Credit Limit, High Call Volume - Customers have the lowest credit limit (≈ 2174) and
number of credit cards (≈ 2.4). They make the most calls (≈ 6.87) but visit the bank (≈ 0.93) and engage online
(≈ 3.55) moderately.
• Online vs. Bank Visits - Higher online visits reduce physical visits, showing a clear shift in customer
preferences.
• Call Frequency - Cluster 2 requires more support, likely indicating service issues or digital unfamiliarity.
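A profile table like the one summarized above can be produced by averaging each attribute within each cluster. The following is a minimal sketch with made-up rows and hypothetical cluster labels; the column name `cluster` is an assumption.

```python
import pandas as pd

# Made-up customers with hypothetical cluster labels attached.
df = pd.DataFrame({
    "Avg_Credit_Limit": [3000, 3500, 10000, 11000, 2000, 2500],
    "Total_Credit_Cards": [5, 6, 9, 8, 2, 3],
    "Total_calls_made": [2, 2, 1, 1, 7, 6],
    "cluster": [0, 0, 1, 1, 2, 2],
})

# Mean of every attribute per cluster, plus the cluster sizes.
profile = df.groupby("cluster").mean()
profile["count"] = df["cluster"].value_counts().sort_index()
print(profile)
```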
Business Recommendations
• Enhance Online Banking for Digital-Savvy Customers (Cluster 1) - Since this group prefers online
banking, the bank should invest in improving the digital experience, offering premium online services, and
ensuring seamless mobile banking.
• Improve Support Services for High-Call Customers (Cluster 2) - Customers in this segment require
frequent assistance. The bank should provide AI-driven chatbots, better FAQ sections, and proactive customer
education to reduce the call volume.
• Strengthen In-Person Customer Engagement for Cluster 0 - Since these customers prefer visiting the
bank, the bank can introduce appointment-based services, dedicated relationship managers, and personalized
assistance to enhance their experience.
• Targeted Marketing for Upselling Credit Cards - Cluster 1 already has a high number of credit cards,
so upselling additional cards may not be effective. Instead, focus on Cluster 0 customers, who hold around
5.5 cards and could be encouraged to upgrade.
• Channel Optimization Strategy - The bank should implement an omnichannel strategy, ensuring smooth
transitions between digital and physical services to cater to all customer segments efficiently.
• Box plot of different columns vs KMeans groups
Figure 17: Box plot of different columns vs KMeans groups
Observations
• Cluster 0: Mid-Range Users - Moderate credit limit, credit cards, and balanced engagement across
channels.
• Cluster 1: High Credit, Digital Users - Highest credit limit, most cards, prefer online banking,
rarely visit the bank.
• Cluster 2: Low Credit, High Support Users - Lowest credit limit, few cards, frequent calls,
indicating service issues or low digital use.
• Service Channels - Higher online activity reduces bank visits. Cluster 2 relies more on calls.
Business Recommendations
• Improve Digital Services - Enhance self-service tools for Cluster 1.
• Targeted Credit Offers - Upsell Cluster 0, encourage credit use for Cluster 2.
• Optimize Support - AI chatbots and proactive help for Cluster 2.
• Omnichannel Strategy - Integrate digital and traditional services for a seamless experience.
• Pairplot of different columns vs KMeans groups
Figure 18: Pairplot of different columns vs KMeans groups
Observations
• Cluster 0: Mid-Level Users - Moderate credit limit, balanced visits across channels, and few calls.
• Cluster 1: High Credit, Digital Users - Highest credit limit, many credit cards, prefer online
banking, minimal bank visits.
• Cluster 2: Low Credit, High Support Users - Lowest credit limit, fewer cards, high call volume,
suggesting more service needs.
• Channel Preference - Online visits reduce bank visits. Cluster 2 shows higher dependency on calls.
Business Recommendations
• Enhance Digital Services - Cluster 1 needs improved self-service options to strengthen engagement.
• Credit Upsell Strategies - Cluster 0 can be targeted for credit expansion offers.
• Support Optimization - AI-based chatbots and proactive help for Cluster 2 to reduce call dependency.
• Integrated Experience - Align digital and traditional banking to ensure seamless service across all
clusters.
• 3D plot of different columns vs KMeans groups
Figure 19: 3D plot of different columns vs KMeans groups
The plot offers a clear way to visualize the three clusters in 3D space.
6.2 Hierarchical Clustering
Hierarchical clustering builds a hierarchy of clusters, which can be displayed as a dendrogram. It can be:
• Agglomerative - Starts with each point as a cluster and merges them iteratively.
• Divisive - Starts with one cluster and splits it recursively.
The choice of linkage (single, complete, average, Ward's) affects clustering. It is useful for visualizing cluster
relationships but can be computationally expensive.
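As an illustration of the agglomerative variant, the sketch below uses SciPy's `linkage` to build the merge hierarchy and `fcluster` to cut it into a fixed number of clusters. The data are synthetic blobs, and the choice of average linkage with Euclidean distance simply mirrors the combination discussed in this report.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Synthetic data: three well-separated blobs of 20 points each.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Agglomerative merge history: one row per merge, n - 1 rows in total.
Z = linkage(X, method="average", metric="euclidean")

# Cut the hierarchy so that at most 3 clusters remain (labels start at 1).
labels = fcluster(Z, t=3, criterion="maxclust")
print(np.unique(labels, return_counts=True))
```

The matrix `Z` is also what `scipy.cluster.hierarchy.dendrogram` takes as input when drawing a dendrogram.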
6.2.1 Hierarchical clustering with different linkage methods
Figure 20: Among different distance and linkage methods, the highest cophenetic correlation is obtained using
Euclidean distance and average linkage.
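The cophenetic correlation measures how faithfully a dendrogram preserves the original pairwise distances. A comparison across linkage methods could be sketched as below with SciPy's `cophenet`; the data are synthetic, only Euclidean distance is tried, and the list of methods is an assumption rather than the exact set used in the report.

```python
import numpy as np
from scipy.cluster.hierarchy import cophenet, linkage
from scipy.spatial.distance import pdist

# Synthetic data: three blobs of 20 points each.
rng = np.random.default_rng(2)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

pairwise = pdist(X, metric="euclidean")  # condensed distance matrix

scores = {}
for method in ["single", "complete", "average", "ward"]:
    Z = linkage(X, method=method, metric="euclidean")
    c, _ = cophenet(Z, pairwise)  # correlation and cophenetic distances
    scores[method] = c

best = max(scores, key=scores.get)
print(scores)
print("highest cophenetic correlation:", best)
```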
Let's view the dendrograms for the different linkage methods. A dendrogram, in general, is a diagram that shows
the hierarchical relationship between objects. It is most commonly created as an output from hierarchical clustering.
The main use of a dendrogram is to work out the best way to allocate objects to clusters.
Figure 21: Dendrograms for the different linkage methods
Looking at the above dendrograms, average linkage seems to result in the best separation between clusters,
and its cophenetic correlation is higher than that of the other linkages. Three looks to be a good choice for the
number of clusters. We then performed hierarchical clustering with average linkage and Euclidean distance, only to
find that the resulting clusters are identical to the K-means clusters (i.e. we get the same set of clusters from both
methods). Hence the cluster profiling is the same for both methods.
6.2.2 Cluster Profiling
The cluster profiling remains the same as for the K-Means grouping, since both clustering methods yield the same
results. The reason both methods produce the same clusters is that the data has distinct, well-separated clusters,
which can be seen from the 3D plot.
Cluster Analysis and Recommendations
• Cluster 0: Balanced Users - Moderate credit cards, bank visits, and calls, but low online activity. Action:
Encourage online banking to reduce call dependency.
• Cluster 1: High Credit, Digital Users - Most credit cards, high online usage, few bank visits, smallest
group. Action: Offer premium services and credit perks to retain them.
• Cluster 2: Low Credit, High Support Users - Fewest cards, low bank visits, high call volume. Action:
Use AI chatbots and credit-building programs for better service.
Strategy: Integrate digital and traditional channels for a seamless experience.
6.3 K-means vs Hierarchical Clustering
• The cluster profiling remains the same as for the K-Means grouping, since both clustering methods yield the same
results. The reason both methods produce the same clusters is that the data has distinct, well-separated
clusters, which can be seen from the 3D plot.
• Both methods obtained 3 clusters.
• Hierarchical clustering took 0.34 seconds, whereas K-means took 1.14 seconds.
• In both methods, clusters 0, 1 and 2 contained 386, 50 and 224 observations (data points) respectively.
• Silhouette Score for both the methods is same and is equal to-
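One way to check that two clusterings agree, even when the cluster numbers are permuted, is the adjusted Rand index, which equals 1.0 for identical partitions. The sketch below is an assumption about how such a comparison could be made (the report does not state which check was used); it runs on synthetic blobs and also times both fits with `time.perf_counter`.

```python
import time

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Synthetic data: three well-separated blobs of 20 points each.
rng = np.random.default_rng(3)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

start = time.perf_counter()
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
kmeans_seconds = time.perf_counter() - start

start = time.perf_counter()
Z = linkage(X, method="average", metric="euclidean")
hier_labels = fcluster(Z, t=3, criterion="maxclust")
hier_seconds = time.perf_counter() - start

# 1.0 means both methods produced the same partition of the data.
ari = adjusted_rand_score(kmeans_labels, hier_labels)
print("adjusted Rand index:", ari)
print("K-means cluster sizes:", np.bincount(kmeans_labels))
print("Hierarchical cluster sizes:", np.bincount(hier_labels)[1:])
print(f"K-means: {kmeans_seconds:.3f}s, hierarchical: {hier_seconds:.3f}s")
```

Timings depend on the machine and on settings such as the number of K-means restarts, so they are indicative only.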
6.4 PCA for Visualization
PCA reduces dimensionality while preserving variance by transforming correlated features into orthogonal principal
components (PCs).
Although there are only 5 dimensions, it is useful to be able to visualize the clusters in 3-dimensional
space without losing much of the information. Let's use PCA to reduce the dimensions so that 90% of the variance
in the data is explained.
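With scikit-learn, a variance target like this can be expressed by passing a fraction as `n_components`, in which case PCA keeps the smallest number of components whose cumulative explained variance exceeds that fraction. The sketch below is illustrative only: the 5-column data are synthetic, and the use of `svd_solver="full"` is a choice made here to keep the fractional setting explicit.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic 5-feature data driven by 2 underlying factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = latent @ mixing + 0.3 * rng.normal(size=(200, 5))

X_scaled = StandardScaler().fit_transform(X)

# Keep as many components as needed to explain at least 90% of the variance.
pca = PCA(n_components=0.90, svd_solver="full")
components = pca.fit_transform(X_scaled)

print("components kept:", pca.n_components_)
print("explained variance ratio:", pca.explained_variance_ratio_.round(3))
```

The `explained_variance_ratio_` attribute gives the share of total variance captured by each retained component, which is the quantity reported for the principal components in this section.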
Figure 22: Xgboost Classifier performance
We can visualize data in 2 dimensions and also data in 3 dimensions (using 3-D plots). In some cases, we can
also visualize data in 4 dimensions by using different hues for the 4th dimension in a 3-D plot. But it's impossible
for us to visualize and interpret data in 5 dimensions.
So using PCA, we scaled down to 2 dimensions, and now it's easy for us to visualize the data.
Figure 23: Visualizing data in 2 dimensions
In the above result, the explained variance is shown.
• The first principal component explains 56.1% of the total variance in the data.
• The second principal component explains 28.2% of the total variance in the data.
• Pairplot of different columns vs Hierarchical groups
Figure 24: Pairplot of different columns vs Hierarchical groups
As expected, the pairplots for both groupings are identical.
6.5 PCA in 3 dimension
• Pairplot of PCA columns
Figure 25: Pairplot of PCA columns
• Cluster Separation: Principal Components 1 and 2 show distinct clusters, indicating well-defined groups
in the data.
• Principal Component 3: Less variance and more overlap, meaning it contributes less to differentiation.
• Variance Distribution: PC1 captures the highest variance, followed by PC2, while PC3 shows minimal
spread.
• Recommendations:
  - Focus on PC1 and PC2 for customer segmentation and decision-making.
  - Reduce dimensionality by ignoring PC3 to simplify analysis.
  - Tailor marketing strategies based on the well-separated clusters in PC1 and PC2.
6.5.1 Hierarchical Clustering on lower-dimensional data
Figure 26: Dendrograms for the different linkage methods (Hierarchical Clustering on lower-dimensional data)
Observations
The cophenetic correlation is highest for average and centroid linkage.
Average linkage is preferred due to more distinct clusters, with a cophenetic correlation of 0.95.
Figure 27: 3D plot of PCA columns
In the above result, the explained variance is shown.
The first principal component explains 56.1% of the total variance in the data.
The second principal component explains 28.2% of the total variance in the data.
The third principal component explains 4.92% of the total variance in the data.
7 Actionable Insights and Business Recommendations
1. Customer Segmentation Based on Banking Interaction
Insight: Customers exhibit different banking interaction patterns:
Cluster 0: Balanced Users - Moderate credit limits, balanced engagement across online, calls, and
in-person visits.
Cluster 1: High Credit, Digital Users - High credit limits, frequent online banking usage, minimal
bank visits.
Cluster 2: Low Credit, High Support Users - Low credit limits, frequent calls, low online engagement.
Recommendation: Implement an omnichannel approach to ensure seamless transitions between digital and
traditional banking services.
2. Enhancing Digital Banking for High-Credit Users
Insight: Cluster 1 customers have the highest credit limits and prefer digital banking, minimizing in-branch
visits.
Recommendation:
Enhance self-service features like chatbots and digital advisors.
Offer exclusive online promotions and premium services to retain high-value customers.
3. Optimizing Support for High-Call Volume Customers
Insight: Cluster 2 customers make frequent calls, indicating high service dependency.
Recommendation:
Introduce AI-driven chatbots and enhanced FAQ sections to reduce call volumes.
Proactively educate customers on digital banking tools to improve self-service capabilities.
4. Targeted Credit Upsell Strategies
Insight: Cluster 0 customers have a mid-range credit limit and hold an average of 5.5 credit cards, making
them ideal for credit expansion.
Recommendation:
Offer personalized credit limit increases and additional credit card options.
Use behavioral analytics to identify the right moment for credit upsell campaigns.
5. Relationship-Based In-Person Banking for Balanced Users
Insight: Cluster 0 customers still visit the bank but have moderate online engagement.
Recommendation:
Introduce appointment-based services for personalized banking experiences.
Assign dedicated relationship managers to high-value customers preferring in-branch interactions.
6. Reducing Branch Dependency Through Digital Adoption
Insight: A moderate negative correlation (-0.55) exists between online visits and in-person banking, highlighting a
shift towards digital preferences.
Recommendation:
Promote digital literacy campaigns to encourage online banking adoption.
Provide incentives for first-time digital banking users.
7. Improving Customer Retention with Personalized Engagement
Insight: Customers with multiple credit cards tend to require less support and show higher retention rates.
Recommendation:
Develop targeted loyalty programs for multi-card holders.
Provide exclusive financial planning assistance to high-credit customers.
Conclusion
By leveraging clustering analysis, AllLife Bank can implement a data-driven strategy to enhance customer
experience, optimize service delivery, and drive credit card growth. Integrating digital and traditional banking
solutions will create a seamless experience across all customer segments.