Churn Analysis

See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/327337318
Customer Churn Analysis and Prediction in Telecommunication for Decision Making
Conference Paper · August 2018

2018 International Conference On Business Innovation (ICOBI), 25-26 August 2018, NSBM, Colombo, Sri Lanka
Customer Churn Analysis and Prediction in Telecommunication for Decision Making
P.K.D.N.M. Alwis, B.T.G.S. Kumara, H.A.C.S. Hapuarachchi
Department of Computing and Information Systems, Sabaragamuwa University of Sri Lanka, Belihuloya, Sri Lanka
madushani.niroshi@gmail
Abstract— With the rapid development of communication technology, the field of telecommunication faces complex challenges due to the number of vibrant, competitive service providers. Customer churn is the major issue faced by telecommunication companies worldwide. Churn is the activity of customers leaving the company and discarding the services it offers because of dissatisfaction with those services. This research focuses on identifying potential churn customers, clustering customers with similar consumption behavior, and mining the relevant patterns embedded in the collected data. Primary data collected from customers were used to create a predictive churn model that assesses the customer churn rate of five telecommunication companies. For model building, the relevant variables were identified using the Pearson chi-square test, cluster analysis, and association rule mining. Using Weka, the cluster results revealed the involvement of customers, their interest areas, and their reasons for the churn decision, which can be used to enhance marketing and promotional activities. Using RapidMiner, association rule mining with the FP-Growth component produced rules that identify interesting patterns and trends in the collected data that have a huge influence on the revenues and growth of telecommunication companies. Then, the C5.0 decision tree, Bayesian Network, Logistic Regression, and Neural Network algorithms were developed using IBM SPSS Modeler 18. Finally, a comparative evaluation was performed to discover the optimal model and to test that model for accurate, consistent, and reliable results.
Keywords—bayesian network, c5.0 decision tree, logistic regression, neural network

I. INTRODUCTION
Decision making is a key feature of every organization. The quality of the decisions made depends on knowledge generated from existing or researched information. The use of modern analytical tools to generate such knowledge is reasonable for any profit-driven firm. Taking decisions about customers is one of the key concerns of most companies, especially companies in the service sector. The ability of these companies to predict customer churn is critically inadequate. Customer churn is the action of a customer who is likely to leave the company, and it is one of the mounting issues of today's rapidly growing and competitive telecommunication industry. To minimize customer churn, prediction needs to be an important part of the telecommunication industry's vital decision making and strategic planning process.

A. Churn Prediction
Today numerous telecom companies are present all over the world. The telecommunication market is facing a severe loss of revenue due to increasing competition among them and the loss of potential customers [1]. In the telecommunication industry, churn is the activity of customers leaving their current company and moving to another telecom company. Many companies try to find the reasons for losing customers by measuring customer loyalty in order to regain the lost customers. To keep up with the competition and to acquire as many customers as possible, most operators invest a huge amount of revenue to expand their business in the beginning [2]. Each company offers customers huge incentives to attract them to switch to its services, which is one of the reasons that customer churn is a big problem in the industry nowadays. To prevent this, a company should know the reasons why a customer decides to move to another telecom company. Telecom churn can be classified into two main categories: involuntary and voluntary. Involuntary churn is easier to identify: it covers customers whom the telecom company decides to remove as subscribers because of fraud, non-payment, or not using the service. Voluntary churn is difficult to determine because it is the customer's decision to unsubscribe from the service provider. Voluntary churn can be further classified as incidental and deliberate churn [3]. The former occurs without any prior planning by the churner, due to a change in financial condition, location, etc. Most operators are trying to deal mainly with these types of churn.

B. Churn Management
Churn management is very important for reducing churn, as acquiring a new customer is more expensive than retaining an existing one [4]. Churn rate is the measure of the number of customers moving out and in during a specific period of time. If the reason for churning is known, providers can then improve their services to fulfill the needs of their customers. Churn can be reduced by systematically analyzing the past history of potential customers [5]. A large amount of information is
maintained by telecom companies for each of their customers, and it keeps changing rapidly due to the competitive environment. The information includes details about billing, calls, and network data. The huge availability of information opens up scope for using data mining techniques on the telecom database. The available information can be analyzed from different perspectives to give operators various ways to predict and reduce churn. Only the relevant details, those which contribute to the study, are used in the analysis. Data mining techniques are used to discover interesting patterns within data, and they help to learn to predict whether a customer will churn or not based on the customer's data stored in the database.

C. Research Objectives
The main objective of this research is to produce a predictive model with better results that assesses the customer churn rate of telecommunication companies using predictive analytics algorithms for data mining.
The supporting objectives examined are to:
i. Cluster customers into various categories to enhance marketing and promotional activities.
ii. Mine the relevant patterns embedded in the collected data that have a huge influence on the revenues and growth of the telecommunication companies.

II. METHODOLOGY AND EXPERIMENTAL DESIGN
Data mining and statistical algorithms were used in the data analysis, model building, and model deployment in this research. Weka 3.8, Minitab 17, RapidMiner Studio 8.1, and IBM SPSS Modeler 18 were the analytical tools used in the respective analysis and mining processes.

A. Data Collection
A questionnaire was used as the tool to collect the primary data from customers. The Google Drive plug-in was used to design the questionnaire. Training data was collected from 200 respondents, and 50 further responses to the questionnaire were received for the testing data. The data was collected during October – November 2017.

TABLE 1: THE VARIABLES USED IN DATASET FOR THIS RESEARCH
No | Variable Name | Description
1 | Age, Gender, Occupation | Demographic variables considered
2 | The number of networks | Identifies the number of mobile networks a customer is connected to and actively using
3 | Frequently used network | Identifies the most frequently used mobile network by the consumer
4 | Tariffs | The type of customer, whether a prepaid or post-paid customer
5 | Tenure | Length of time a customer has been with a particular subscriber
6 | Credit purchase amount (CpM) | Approximates the amount used to purchase call credits a month in rupees
7 | Data purchase amount (DpM) | Approximates the amount used to purchase data bundles a month in rupees
8 | Internet usage | Identifies whether the customer has used the internet facility or not
9 | Product innovation | Determines whether product innovation is necessary for sustaining customers
10 | Churn | Identifies whether the customer has changed networks or not

B. Data Pre-processing
The training and testing datasets used in this research may include missing, repeated, or inconsistent data. Data pre-processing is done to handle missing data and remove duplicated data values. The RapidMiner tool is used at this stage to pre-process the data for analysis and mining. For the cluster analysis, the Pearson chi-square test, and predictive model building, the data types have to be converted into numerical values.

TABLE 2: CODES FOR ALTERNATIVES
Variable | Alternative | Code
Gender | Female | 0
Gender | Male | 1
Occupation | Student | 1
Occupation | Government Employee | 2
Occupation | Private Employee | 3
Occupation | Own Business | 4
Occupation | Others | 5
Network often used | Dialog | 1
Network often used | Mobitel | 2
Network often used | Airtel | 3
Network often used | Hutch | 4
Network often used | Etisalat | 5
Tenure | Less than 1 year | 1
Tenure | 1-3 | 2
Tenure | 3-5 | 3
Tenure | Above 5 | 4
Churn | No | 0
Churn | Yes | 1
Tariffs | Pre-paid | 1
Tariffs | Post-paid | 2
Tariffs | Both | 3
Usage of Internet | No | 0
Usage of Internet | Yes | 1
Product innovation | No | 0
Product innovation | Yes | 1
Product innovation | Not sure | 2

C. Research Framework
Figure 1 shows the research framework developed to address the problems of this research. The framework details the sectoral areas of concentration and the data mining algorithms adopted in creating the predictive model. It includes model deployment and evaluation strategies that will assess its effectiveness and efficiency.
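The coding scheme in Table 2 can be illustrated with a short sketch. The authors performed this step in RapidMiner; the Python below is only an assumed equivalent, and the names `CODES`, `encode_response`, and the sample respondent are hypothetical. Only the code values themselves are taken from Table 2.

```python
# Code values transcribed from Table 2. The field names used as keys
# are illustrative labels, not names confirmed by the paper.
CODES = {
    "Gender": {"Female": 0, "Male": 1},
    "Occupation": {
        "Student": 1,
        "Government Employee": 2,
        "Private Employee": 3,
        "Own Business": 4,
        "Others": 5,
    },
    "NetworkOftenUsed": {
        "Dialog": 1, "Mobitel": 2, "Airtel": 3, "Hutch": 4, "Etisalat": 5,
    },
    "Tenure": {"Less than 1 year": 1, "1-3": 2, "3-5": 3, "Above 5": 4},
    "Churn": {"No": 0, "Yes": 1},
    "Tariffs": {"Pre-paid": 1, "Post-paid": 2, "Both": 3},
    "InternetUsage": {"No": 0, "Yes": 1},
    "ProductInnovation": {"No": 0, "Yes": 1, "Not sure": 2},
}


def encode_response(response):
    """Replace categorical answers with their Table 2 codes.

    Fields without a code table (for example Age) pass through unchanged.
    An answer that is not listed in Table 2 raises ValueError.
    """
    encoded = {}
    for field, value in response.items():
        if field in CODES:
            if value not in CODES[field]:
                raise ValueError(f"Unknown alternative {value!r} for {field}")
            encoded[field] = CODES[field][value]
        else:
            encoded[field] = value
    return encoded


# Hypothetical respondent, used only to exercise the mapping.
sample = {
    "Gender": "Male",
    "Age": 35,
    "Occupation": "Private Employee",
    "Tenure": "3-5",
    "Tariffs": "Pre-paid",
    "Churn": "Yes",
}
encoded_sample = encode_response(sample)
```

Rejecting unlisted alternatives, rather than silently assigning a default code, is a design choice of this sketch; it surfaces inconsistent questionnaire entries at the pre-processing stage.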
Figure 1: Research framework of Customer Churn Analysis and Prediction in Telecommunication

III. RESULTS

A. Pearson Chi-square Test
The Pearson chi-square test is used to identify the variables that are associated with the churn decision and can therefore be used in building the predictive model. Pearson and likelihood ratio chi-square tests were conducted using Minitab. The tests produced significant results (p-value less than the α level of 0.05), indicating that some of the variables have an association with the decision to churn.

TABLE 3: SUMMARY OF ASSOCIATION OF EACH ATTRIBUTE WITH THE CHURN DECISION
Variable | P-Value | Association
Marital Status | 0.645 | No
Gender | 0.038 | Yes
Age | 0.005 | Yes
Occupation | 0.011 | Yes
Monthly Income | 0.001 | Yes
Purpose of mobile phone usage | 0.107 | No
No of mobile network connected | 0.003 | Yes
Mobile network often used | 0.006 | Yes
Tenure | 0.000 | Yes
CpM | 0.004 | Yes
Tariffs | 0.029 | Yes
Internet usage | 0.020 | Yes
DpM | 0.105 | No
Product Innovation | 0.021 | Yes

A. Cluster Analysis
Cluster analysis is used to discover groups with identical features in the collected data. These groups explain the interest areas and the churn decision, together with its reasons, for targeted marketing and product development. Using Weka 3.8, k-means clustering produced four clusters from the 200 collected records.

TABLE 4: CLUSTER INSTANCES WITH PERCENTAGE
Cluster Number | Clustered Instances | Percentage (%)
Cluster 0 | 69 | 34.5
Cluster 1 | 34 | 17
Cluster 2 | 65 | 32.5
Cluster 3 | 32 | 16

TABLE 5: FINAL CLUSTER CENTROIDS
Variable | Full Data (200.0) | Cluster 0 (69.0) | Cluster 1 (34.0) | Cluster 2 (65.0) | Cluster 3 (32.0)
Gender | 0.515 | 1 | 1 | 0 | 0
Age | 33.24 | 35.49 | 54.79 | 22.93 | 26.40
Occupation | 2.305 | 2.6232 | 4.0882 | 1.18 | 2
Monthly Income | 31827.5 | 38833.33 | 61617.64 | 10984.61 | 27406.25
NoOfMobileNetworkConnected | 2.125 | 2.5217 | 3 | 1.30 | 2
MobileNetworkOftenUsed | 2.195 | 2.2609 | 4.52 | 1 | 2
Tenure | 3.085 | 3.7391 | 4 | 1.95 | 3
Tariffs | 1.325 | 1.1884 | 1.38 | 1.35 | 1.5
CpM | 1003.25 | 1105.79 | 1948.52 | 504.61 | 790.62
Internet Usage | 0.8 | 1 | 1 | 0.3846 | 1
Churn | 0.675 | 1 | 1 | 0 | 1

Telecom providers can leverage this cluster model to select customers for promotional activities. The clusters show that the churners are mostly businessmen and private employees, who are generally male, and government employees, who are generally female. These churners spend a lot on call credit per month and use the prepaid service package. The customers who do not intend to churn are mostly students, who are generally female. Telecom providers, especially those who have endured a churn of customers, need to pay attention to the reasons for churn presented by the customers in these clusters.

B. Association Rule Mining
Association rule mining is used to determine interesting patterns and trends between variables in the dataset. It is intended to identify strong rules in the dataset using some measures of interestingness. The RapidMiner Studio 8.1 tool was used to create the association rules model for the collected data. In creating the model, the Frequent Pattern Growth (FP-Growth) algorithm was used to mine associations between variables that result in a churn decision, with particular interest and focus on confidence. The generated association rules model is presented in Figure 2.

Figure 2: Association rules model

Table 6 shows ten (10) generated association rules, selected by filtering for rules whose conclusion is a churn decision of yes and sorted in descending order of confidence. The sorted rules have a maximum confidence of 95.5 percent and a minimum of 84.6 percent.

TABLE 6: TOP (10) GENERATED RULES
No | Premises | Conclusion | Support | Confidence | Laplace
1 | InternetUsage=Yes, Gender=Male and Tenure=3-5 years | Churn_Yes | 0.105 | 0.955 | 0.995
2 | Tariffs=Prepaid, Gender=Male and Tenure=3-5 years | Churn_Yes | 0.100 | 0.952 | 0.995
3 | Gender=Male and Tenure=3-5 years | Churn_Yes | 0.130 | 0.929 | 0.991
4 | MobileNetworkOftenUsed=Mobitel and Tenure=3-5 years | Churn_Yes | 0.115 | 0.920 | 0.991
5 | InternetUsage=Yes and MobileNetworkOftenUsed=Mobitel | Churn_Yes | 0.105 | 0.913 | 0.991
6 | Tariffs=Prepaid and Tenure=3-5 years | Churn_Yes | 0.230 | 0.902 | 0.980
7 | InternetUsage=Yes, Tariffs=Prepaid and Tenure=3-5 years | Churn_Yes | 0.190 | 0.884 | 0.979
8 | Tenure=3-5 years | Churn_Yes | 0.270 | 0.871 | 0.969
9 | Tariffs=Prepaid, Gender=Female and Tenure=3-5 years | Churn_Yes | 0.130 | 0.867 | 0.983
10 | InternetUsage=Yes, Tariffs=Prepaid and Gender=Female | Churn_Yes | 0.110 | 0.846 | 0.982

B. Predictive Model Building
Using the valid variables identified in the Pearson chi-square test, four predictive models were created with the IBM SPSS Modeler 18.0 data mining software. Four classification modeling techniques, the C5.0 tree, the Bayesian network, the Neural Network, and Logistic regression, were used to create the predictive models. The optimal model is recommended based on the individual models and their performance metrics.
An auto classifier was applied to the C5.0 tree model created in Figure 3 to test whether the selected C5.0 algorithm would be identified as one of the best algorithms for creating the predictive model.

Figure 3: C5.0 algorithm tree model

The C5.0 algorithm was listed among the suggested churn algorithms that were applied to the data. In Figure 3, the matrix was applied to create a table showing the relationship between the fields Churn and $C-Churn. In the model created above, analysis and evaluation are used to create a report and a chart for comparing the accuracy of the predictive models.

Figure 4: Bayesian network model
Figure 5: Neural network model
Figure 6: Logistic regression model
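The three measures reported for each rule in Table 6 can be related to one another with a small sketch. Support and confidence follow their standard definitions. The Laplace expression used here, (support + 1) / (premise support + k) with k = 1 and fractional supports, is an assumption inferred from the table values and is not stated in the paper; the more common count-based textbook form gives different numbers. The counts for rule 1 (22 records matching the premises, 21 of them churners, out of 200) are likewise inferred from its reported support and confidence.

```python
def rule_measures(n_total, n_premise, n_both, k=1.0):
    """Return (support, confidence, laplace) for a rule premises -> conclusion.

    n_total   : number of records in the dataset
    n_premise : records matching the premises
    n_both    : records matching premises and conclusion
    k         : Laplace smoothing constant (assumed to be 1.0)
    """
    support = n_both / n_total
    premise_support = n_premise / n_total
    confidence = n_both / n_premise
    # Assumed form that reproduces the Laplace column of Table 6.
    laplace = (support + 1.0) / (premise_support + k)
    return support, confidence, laplace


# Rule 1 of Table 6, using counts inferred from support 0.105 and
# confidence 0.955 on 200 training records.
support, confidence, laplace = rule_measures(n_total=200, n_premise=22, n_both=21)
print(round(support, 3), round(confidence, 3), round(laplace, 3))
```

Under these assumptions the sketch returns the rule 1 row of Table 6 (0.105, 0.955, 0.995). Applying the same expression to the support and confidence of rules 6, 8, 9, and 10 also matches their Laplace values to three decimals.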
As a result of generating the logistic regression model, a statistical model was built which consists of two mathematical equations to calculate the probability of a person being a churner or non-churner.

Equation 1: Calculating Y'
Y' = 0.0682*Gender + (-0.00182)*Age + 0.04558*Occupation + 0.00001458*MonthlyIncome + (-0.7214)*Tenure + (-0.2053)*Tariffs + 0.00001024*CpM + 2.659

Equation 2: Calculating P(1)
P(1) = exp(Y')/(1 + exp(Y'))

Equation 1 consists of the most relevant variables, those which most affect the churn decision. The variable values should be substituted into this equation to calculate the value of Y'. The calculated Y' value should then be substituted into Equation 2 to calculate the value of P(1). The prediction of being a churn or non-churn customer depends on this P(1) value.
If the P(1) value is greater than or equal to 0.5, the prediction result is positive and the person is predicted to be a churner. If the P(1) value is less than 0.5, the result is closer to 0 (zero), which means the prediction result is negative and the person is predicted to be a non-churner.

C. Model Evaluation
The four models are evaluated by testing the significance of the predictive model generated. The performance metrics of all the models were compared for optimal performance using the Area Under the Receiver Operating Characteristic Curve (AUROC). The predictor variables and target variable used in building the predictive churn model were tested for significance.
The variables were equally tested for validity and reliability. Validity indicates that the model measures what it is intended to measure, while reliability indicates that it produces consistent results. The tests assessed the efficiency and effectiveness of the model in predicting customer churn in telecommunication.

TABLE 7: CONFUSION MATRIX WITH TRAINING DATA
Model | Class | no | yes | % correct
C5.0 | no | 45 | 20 | 69.2
C5.0 | yes | 10 | 125 | 92.5
C5.0 | Overall Percentage | | | 85%
BN | no | 47 | 18 | 72.3
BN | yes | 24 | 111 | 82.2
LR | no | 27 | 38 | 41.5
LR | yes | 18 | 117 | 86.6
NN | no | 21 | 44 | 32.3
NN | yes | 16 | 119 | 88.1

TABLE 8: ACCURACY AND AUC VALUE OF EACH MODEL
Model | Accuracy (%) | AUC Value
C5.0 algorithm model | 85 | 0.888
BN model | 79 | 0.886
LR model | 72 | 0.762
NN model | 70 | 0.759

Contrasting the four models, the C5.0 decision tree algorithm proved to be the optimal model, with 85% accuracy and an AUC value of 0.888, for customer churn analysis and prediction in telecommunication based on the chosen variables and attributes.

D. Model Testing
The optimal model based on the results of the evaluation is tested on the dataset designed to test the model. The C5.0 algorithm model was applied to the test data, as it was identified as the optimal model. It was tested using the test data collected from customers. The test data has 50 observations and 7 variables, coded in the same way as in Table 2. The dataset is distributed across gender, age, monthly income, occupation, and the other demographic and operational variables used to develop the model. Predictions are then made to indicate which customers are likely to churn and which are not.
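The accuracy column of Table 8 can be re-derived from the confusion-matrix counts in Table 7, which is a useful consistency check when comparing the four models. The sketch below assumes that the rows of each matrix are the actual classes and the columns the predicted classes; the paper does not state the orientation, but the per-row "% correct" figures suggest it. The names `CONFUSION` and `summarize` are illustrative.

```python
# Counts transcribed from Table 7, in the form
# ((actual no: predicted no, predicted yes),
#  (actual yes: predicted no, predicted yes)).
# The row/column orientation is an assumption.
CONFUSION = {
    "C5.0": ((45, 20), (10, 125)),
    "BN": ((47, 18), (24, 111)),
    "LR": ((27, 38), (18, 117)),
    "NN": ((21, 44), (16, 119)),
}


def summarize(matrix):
    """Return overall accuracy and per-class percent correct for a 2x2 matrix."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    return {
        "accuracy": 100.0 * (tn + tp) / total,
        "no_correct": 100.0 * tn / (tn + fp),
        "yes_correct": 100.0 * tp / (fn + tp),
    }


results = {name: summarize(matrix) for name, matrix in CONFUSION.items()}
for name, metrics in results.items():
    print(name, round(metrics["accuracy"], 1))
```

The overall accuracies computed this way are 85, 79, 72, and 70 percent, matching Table 8. Some per-class percentages in Table 7 appear to be truncated rather than rounded; for example, 125/135 is about 92.59 percent, which is reported as 92.5.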
Figure 7: Test model for C5.0 algorithm

The test data is applied by mapping the dataset to the model built with the C5.0 algorithm, as indicated in Figure 7. Further model screening and applications are then used to define the output that determines the likelihood of churn. According to the test results, the model predicted that 36 customers will churn, with confidence ranging from 100% down to 55.6%. The results further show that over 62% of the churn customers have a confidence above 80%. According to Figure 8, the results also indicate that the churn customers have stayed with their network for more than 5 years. Because it is more expensive to acquire new customers than to retain existing ones, the prediction of churners and the reasons given earlier need close attention. The top 10 churners and non-churners predicted by the model are presented in Figures 8 and 9 respectively. The source of the test dataset can be connected to the database or server of the company to produce a real-time output of churn results for decision making.

Figure 8: Results of test predictions_Yes
Figure 9: Results of test predictions_No

V. DISCUSSION AND CONCLUSION
Data mining is a significant tool in the telecommunication industry that can exploit the large volume of data generated for pattern analysis. The recent increasing embrace of predictive data mining algorithms has given companies room to assess their future success, challenges, and targets. This research brings to the fore the relevant untapped customer data and knowledge for churn prediction and customer classification for better decision making. Customer clusters were developed in this research to determine the involvement of customers, their interest areas, and their reasons for the churn decision. The results of the cluster analysis can be used for promotional and direct marketing purposes to assess marketing strategies in the industry. In addition, association rule mining provided significant results that present relevant knowledge of the factors that have a huge influence on the revenues and growth of telecommunication companies. Telecommunication companies must grasp these findings and work to retain their clients. The C5.0 decision tree model, the Bayesian Network model, the Logistic Regression model, and the Neural Network model were used and compared to find the optimal model that predicts accurately. The C5.0 decision tree model proved optimal among the models, with 85 percent accuracy and an AUC value of 0.888. The C5.0 decision tree model can therefore be recommended for churn management. The models can be used by industry with IBM SPSS Modeler or any other appropriate tool with the same algorithm. Telecommunication companies can connect the models directly to their servers or databases to produce real-time results.

ACKNOWLEDGMENT
This study was done by the Department of Computing and Information Systems in the Faculty of Applied Sciences of the Sabaragamuwa University of Sri Lanka. The author would like to thank Dr. B.T.G.S. Kumara (Head, Department of Computing & Information Systems, Sabaragamuwa University of Sri Lanka) and Mr. H.A.C.S. Hapuarachchi (Lecturer, Department of Sports Sciences and Physical Education, Sabaragamuwa University of Sri Lanka) for their stimulating suggestions, encouragement, assistance, guidance, and cooperation.

REFERENCES
[1] V. Umayaparvathi and K. Iyakutti, "A Survey on Customer Churn Prediction in Telecom Industry: Datasets, Methods and Metrics," International Research Journal of Engineering and Technology (IRJET), vol. 03, no. 04, April 2016.
[2] Shin-Yuan Hung and Hsiu-Yu Wang, "Applying Data Mining to Telecom Churn Management," Department of Information Management, National Chung-Cheng University, Taiwan, ROC.
[3] M. Balasubramanian and M. Selvarani, "Churn Prediction in Mobile Telecom Systems Using Data Mining Techniques," Department of Computer Science, Annamalai University, Chidambaram, April 2014.
[4] Rahul J. Jadhav and Usharani T. Pawar, "Churn Prediction in Telecommunication Using Data Mining Technology," International Journal of Advanced Computer Science and Applications, vol. 2, no. 2, February 2011.
[5] K. Dahiya and S. Bhatia, "Customer churn analysis in telecom industry," 4th International Conference on Reliability, Infocom Technologies and Optimization (ICRITO), 2015.
[6] Amjad Khan and Zahid Ansari, "Comparative Study Of Data Mining Techniques In Telecommunications-A Survey," Dept of Electronics and Communication, P.A. College of Engineering, Mangalore, India.