Customer Churn in Telecommunications with atoti

Remediating attrition with predictive analytics

Advancement in technology and changes in users’ behaviour resulted in a high attrition rate for the telecommunication sector. Some contributing factors are:

  • with Mobile Number Portability (MNP), customers can easily switch to another provider while preserving their number
  • OTT players such as Netflix, Amazon Prime Video, Disney+ are bypassing the traditional operators network such as cable, broadcast and satellite television
  • OTT applications such as WhatsApp, Google Hangout, Skype are cannibalizing the paid voice and messaging services

Personally, I have churned three times in the last 2 years! I left the telco giant where I was a loyal customer for more than 10 years, and subscribed to a newbie player. Then I moved to my current no-contract plan that keeps me happy and is generous on the mobile data.

Business Motivation

Churn is a critical metric not just for the telcos, but for all subscription and SaaS companies as it tells us how departing customers affect the company’s monthly revenue and growth, which in turn impacts investors’ confidence in the company.

Let’s look at some important metrics that are impacted by churn:

  • Monthly Recurring Revenue (MRR) is the recurring revenue expected on monthly basis for the subscribed goods/services
  • Net Revenue Retention (NRR) measures the change in the MRR over the period (includes benefits from expansion)
  • Gross Revenue Retention (GRR) measures annual revenue earned from the business’ existing customer base (excludes benefits from expansion)

The GRR is a bit like a happiness indicator for the existing customers. 

Having high GRR shows that the company has high retention rates. Customers are happy with the services/products that they are provided with. Investors would be assured by this stability.

If a company has high GRR and even higher NRR, it shows that on top of retaining existing customers, the company has grown its customer base further.

High NRR coupled with low GRR implies that although the company has acquired many new customers, it has low retention rates. So even if there is still revenue left over after the churn, there is high potential the new customers might churn too. The growth of the company becomes less predictable.

While it is important to acquire new customers in order to grow business, there are more perks that come with customer retention. Invesp has provided a good comparison between the two, showing that it can cost 5 times more to acquire new customers. Hence this article will focus on identifying the proportion of retention and acquisition in order to maintain a target NRR. 

Technical Motivation

We will explore the accuracy of 4 churn predictions presented in the Telco Customer Churn Prediction notebook from Kaggle:

  • Logistic Regression Prediction
  • Logistic Regression (SMOTE) Prediction
  • Naive Bayes Prediction
  • SVM Classifier Linear Prediction

With atoti, we will compare the performance of the four models in terms of the expenses spent to achieve a target NRR of 90%.

Leveraging on atoti’s simulation capability, we continue to explore the difference in the expenditure if we were to increase the NRR target, or the budget spent for acquisition.


We used Telco Customer Churn data from Kaggle for machine learning, as well as our analysis. Since we only have the past month’s data, we compare the models’ predictions against the same set of data for the accuracy comparison.

While we have the actual churn value, we added 3 more columns for the predictive output from the different models:

  • ChurnPredicted – Prediction output from model
  • ChurnProbability – Probability of churn happening (we keep only those predicted to churn)
  • PredictionAccuracy – To access if predicted churn matches the actual churn 

Integrating machine prediction into atoti data cube

We created a multidimensional, in-memory data cube with the dataset with only one datastore — customer_store. Read more about cube creation on atoti documentation.

Making use of the hierarchical nature of the data cube, we classify the data structurally under:

  • Account
  • Demographic
  • Services
  • Customer

We won’t delve into the training of the machine learning models in this article. You can download the corresponding Jupyter notebook from atoti notebook gallery for more details.

For now, we see the below classification reports of the four models that we are going to compare in this article.

After model training, we load the full data set into each model to get its corresponding prediction and probability. For instance:

lr_prediction = logit.predict(X)
lr_probability = logit.predict_proba(X)

With this output, we will generate the “ChurnPredicted”, “ChurnProbability” and “PredictionAccuracy” as previously mentioned:

def model_scenario(predictions, probabilities):
    churnProbability = np.amax(probabilities, axis=1)
    churn_forecast = (telcom.copy()).drop(
        ["ChurnPredicted", "ChurnProbability", "PredictionAccuracy"], axis=1

    churn_forecast = pd.concat(
                {"ChurnPredicted": predictions, "ChurnProbability": churnProbability}

    # we are not interested in the probability if it is predicted that the client will not churn
    churn_forecast["ChurnProbability"] = np.where(
        churn_forecast["ChurnPredicted"] == 1, churn_forecast["ChurnProbability"], 0

    churn_forecast["ChurnPredicted"] = np.where(
        churn_forecast["ChurnPredicted"] == 1, "Yes", "No"

    churn_forecast["PredictionAccuracy"] = (
        churn_forecast["ChurnPredicted"] == churn_forecast["Churn"]

    return churn_forecast

We are now ready to load the model’s prediction back into the atoti datacube!

Remember, we wanted to compare the performance of the models. To do so, we load the predictive output as a scenario in the datastore customer_store.

customer_store.scenarios["SVM Classifier Linear Prediction"].load_pandas(svm_df)

Model Accuracy Comparison

Let’s do a quick visualization of the churn prediction from the models against the actual churn, represented by Base below:

The chart shows that the number of churns predicted by SVM Classified Linear Prediction is closest to the actual churn. However, Logistic Regression Prediction is also not too far off. The models either over or under-predicted the churns. So which is more favourable?

Let’s decide after we have more statistical understanding of the data:

Using a pivot table, we derive the percentage of correct prediction by creating a calculated measure that takes PredictionAccuracy.SUM / Count. This gives us more clarity as we see that Logistic Regression Prediction has the highest percentage of correct prediction, 976 out of the 1,450 predicted true. In fact, Logistic Regression Prediction has the highest Accuracy score of 0.80 during the model training and Naive Bayes Model has the lowest accuracy score of 0.74.

Churn and MRR Analysis

More than a quarter of the customers left the telco in the past month.

The telco would have lost all its customers in the few coming months if this attrition rate keeps up.

Let’s use a simple formula that uses the monthly recurring revenue (MRR) to compute the net revenue retention (NRR):

Churn is the revenue lost due to customers leaving. In this case, we do not consider expansion of the business. We compute the Actual Revenue lost and the Predicted Revenue lost by using the actual churn and predicted churn values respectively:

m["Actual Revenue Lost"] =
    tt.filter(m["MonthlyCharges.SUM"], l["Churn"] == "Yes"), on=h["CustomerID"]

#  we use ChurnPredicted here instead of churn because we want to see the difference between the prediction and the actual churn
m["Predicted Revenue Lost"] =
    tt.filter(m["MonthlyCharges.SUM"], l["ChurnPredicted"] == "Yes"), on=h["CustomerID"]

Note that the initial MRR should not be impacted by any filters, hence we use the function to get the total monthly charges across CustomerID: 

m["MRR Initial"] =["MonthlyCharges.SUM"], on=h["CustomerID"])

Finally we have our NRR:
m["NRR"] = (m["MRR Initial"] - m["Predicted Revenue Lost"]) / m["MRR Initial"]

We see in the below table that when the churn prediction is under-estimated, as in the case of Logistic Regression Prediction, it gives a “false hope” of having higher NRR that implies lesser loss.  

Logistic Regression (SMOTE) Prediction and SVM Classifier Linear Prediction would give a more conservative NRR forecast while the NRR value from Naive Bayes would be too depressing for the telco.

Customer Retention Strategy

Naive Bayes predicted 3,012 customers will churn. It is not realistic to attempt to retain all of them, especially when the prediction is only 50% correct. Let’s see some assumptions for our customer retention strategy:

  1. We aim to achieve a target NRR of 90%
  2. We compute the number of customers that we need to retain in order to achieve this target NRR
  3. For each customer identified, we will set aside a budget of $100 for retention purpose
  4. We do not know who has really churned yet

Let’s start by creating a measure for the target NRR:

m["TargetNRR"] = 0.9

We need to compute the maximum possible loss in the revenue that would allow us to achieve the target NRR:

m["Max Loss Possible"] = m["MRR Initial"] - (m["TargetNRR"] * m["MRR Initial"])

Based on the predicted churn, we need to prevent the revenue loss from exceeding the maximum possible loss. We call the excess “Revenue Compensation” since this is the amount that we either have to retain or acquire anew.

m["Revenue Compensation"] = m["Predicted Revenue Lost"] - m["Max Loss Possible"]

We use the average monthly charge of those who are predicted to leave to gauge the number of customers we need to retain to achieve the Revenue Compensation amount. We could also use the minimum monthly charge but that would result in a bigger retention size. Likewise, we could use the maximum monthly charge and it would result in a smaller retention size. Feel free to explore different strategies here to determine the retention size.

churnMean = tt.agg.mean(
    tt.filter(m["MonthlyCharges.SUM"], l["ChurnPredicted"] == "Yes"),

m["Churn MonthlyCharges.MEAN"] = tt.parent_value(
    churnMean, on=h["CustomerID"], total_value=churnMean

m["Retention size"] =
    tt.ceil(m["Revenue Compensation"] / m["Churn MonthlyCharges.MEAN"]),

Did you notice we used atoti.parent_value to find the average monthly charge of those who churned? We want the value at the parent level to be applied for all the levels below. Likewise we use on retention size computation to ensure the value is constant across different levels.

Identifying Customers to retain

Now that we have the retention size, we need to identify who we want to retain. The easiest way is to retain those with the highest probability of leaving. However, we have a target revenue compensation amount that we need to achieve. 

Consider the scenario where we retained a customer whose monthly charge is $100, it would be equivalent to retaining 5 customers of $20 monthly charge.

Therefore, the strategy adopted is as follows:

  1. We increase the churn probability for customers with monthly charge greater than the average churn monthly charges
  2. With atoti.rank, we rank the customers who are churning based on the churn probability
# we only rank those customers who are churning. We give higher weighting to customers with higher charge so as to minimize the lost
m["churnPositive"] = tt.where(
    (m["MonthlyCharges.MEAN"] >= m["Churn MonthlyCharges.MEAN"])
    & (m["ChurnProbability.MEAN"] > 0),
    m["ChurnProbability.MEAN"] + 1,

m["Churn Rank"] = tt.rank(
    m["churnPositive"], h["CustomerID"], ascending=False, apply_filters=True

Retention vs Acquisition

Now that we have the ranking in preference for retention, we make further assumptions that for each customer retention, we will allocate a budget of $100 and in the event we need to replace the customer, that would require a marketing budget of $500.

m["Retention budget"] = 100
m["Marketing budget"] = 500

So based on the ranking preference, we will do a forecast on the retention and marketing budget:

# we spent $100 on each of the customers identified and managed to retain all of them
m["Forecast expense"] = tt.agg.sum(
        (m["Churn Rank"] <= m["Retention size"]) & (m["churnPositive"] > 0),
        m["Retention budget"],

# we retained none of the customers, hence spending $500 to recruit number of new customers equivalent to the retention size
m["Full recruitment expense"] = tt.agg.sum(
        (m["Churn Rank"] <= m["Retention size"]) & (m["churnPositive"] > 0),
        m["Marketing budget"],

Let’s see what kind of information these measures can give us.

Assuming all the identified customers either get retained successfully or have to be replaced, we gather 2 information here:

  • Logistic Regression costs the least to maintain the 90% NRR while the prediction based on Naive Bayes would be the most costly
  • It would be a lot more cost effective to retain customers than acquire new ones.

Reality Check

We are ready to compare our forecast against the actual churn results!

We are going to assume that the retention budget offered is fully utilized, whether the customers had intention to churn or not. This means that customers who had intended to churn got retained successfully. 

If we attempt to retain customers who had no intention of churning, the retention budget is wastefully utilized. Conversely, it is possible that some customers outside of our prediction actually churned and we probably would not achieve our target NRR. We have to spend more money to replace them, on top of the retention budget that we already spent.

Post Retention NRR

On this basis, we compute the new NRR after the retention effort:

# those that were not predicted correctly by the algorithm or those that we did not attempt to retain will make up the revenue lost
m["Post retention revenue lost"] = tt.agg.sum(
        m["Forecast expense"] == 100,
        tt.where(l["Churn"] == "Yes", m["MonthlyCharges.SUM"], 0),

m["Post retention NRR"] = (
    m["MonthlyCharges.SUM"] - m["Post retention revenue lost"]
) / m["MonthlyCharges.SUM"]

There we have it! 

Only Naive Bayes Prediction managed to cross the target NRR of 90%! However, it has the largest retention size of 2,354, the chances of it retaining the correct churning customer is naturally higher. This would come at an expense.

We see below, the money spent on retaining the correct target for the Naive Bayes Prediction is only 50%. Logistic Regression Prediction seems much more cost effective!

Customer Acquisition to reach target NRR

So after a round of retention effort, we need to acquire new customers if we haven’t reached our NRR target.

Let’s compute the gap in NRR and the number of customers we estimate to acquire.

gap_to_target_nrr = m["TargetNRR"] - m["Post retention NRR"]

m["Gap in revenue lost"] = m["Post retention revenue lost"] - m["Max Loss Possible"]

m["Clients to replace"] =
        m["Gap in revenue lost"] > 0,
        tt.ceil(m["Gap in revenue lost"] / m["Churn MonthlyCharges.MEAN"]),

We can compute the marketing budget required for the acquisition of these customers:

m["Actual Marketing budget"] = m["Clients to replace"] * m["Marketing budget"]

Since the retention budget is already spent, we have to add it to the marketing budget to compute the actual expense for us to reach the target NRR.

m["Actual Expense"] = m["Forecast expense"] + m["Actual Marketing budget"]

The figures showed that Logistic Regression (SMOTE) prediction would be the most cost effective model to achieve the target NRR of 90%!

Alternative situations

What-if we want 95% NRR?

Curious to know how the figures would change if the target NRR is changed? With atoti, we can do so simply by creating a measure simulation that allows us to replace the TargetNRR value:

NRR_simulation = cube.setup_simulation(
    "NRR Simulation", base_scenario="90% NRR", replace=[m["TargetNRR"]],

In this case, we replace the NRR from 0.9 to 0.95.

NRR_simulation["95% NRR"] = 0.95

Without further processing, we can compare the difference between 90% and 95% target NRR through atoti visualization:

Logistic regression (SMOTE) Prediction is still the most cost efficient algorithm. 

What-if Marketing budget is twice the expected 

Instead of replacing the value of measures in the simulation, let’s create a simulation that scale a measure instead:

marketing_budget_simulation = cube.setup_simulation(
    "Marketing Budget Simulation",
    base_scenario="5 x Retention",
    multiply=[m["Marketing budget"]],

We will multiply the Marketing budget by 2 to see how the total expenses will be impacted:

marketing_budget_simulation["10 x Retention"] = 2

We see Naive Bayes Prediction is unaffected by the increase in marketing budget, as the retention effort is sufficient to maintain the 90% NRR. However to maintain 95% NRR, all algorithms showed that new customers acquisition is necessary.

What’s next?

Despite the lack of data, we saw how atoti can help in business decision making, as well as analyse the accuracy of machine learning models. With longer periods of data, we could imagine having better model training and improve on the accuracy of the prediction.

Also, we have not considered the features importance in this article, and only taken into account the monthly charge. We can imagine identifying customers for retention based on the services or demographic and applying different retention budgets accordingly.  

Caption: image from Telco Customer Churn Prediction notebook from Kaggle

We welcome you to share with us on how you would implement your customer retention strategy!