Every business wants its relationship with clients to be smooth and long-term rather than a one-time story. Each relationship, however, needs investment, which in this case means money and time. Businesses have to give customers a reason to stay: appealing ads, discounts, gifts and so on. All of these are costly, yet the company still has to stay profitable. It is therefore important to identify and develop relationships with valuable clients, meaning those who pay the business more over the customer lifetime than the business spends on them. Customer Lifetime Value is an effective tool for identifying such customers and taking action to keep them as long as possible.

Customer Lifetime Value, or CLV, is the total profit from a customer’s overall relationship with the brand over their whole lifespan. This profit accounts for the costs of attracting, servicing and retaining the client, all of the customer’s past and future transactions, and network effects, that is, the new customers this customer brings to the business (for example through word of mouth).

When businesses start to think about profit in terms of customers, the main parameters they consider are purchases (which drive revenue), the cost of goods sold, the cost of acquiring the customer and, optionally, additional costs that go into creating the goods or services sold. Another thing to take into account, especially when CLV is calculated over a longer time period, is the discount rate. Discounting is tied to the concept of net present value: all amounts over time are converted to their value today so that they can be analysed together. This adjusts for uncertainty and risk.

So CLV is commonly defined as the revenue over the customer lifetime minus variable costs, discounted at a company-specific discount rate.

The main idea of predicting lifetime value is to build models from past data, predict the near future and combine the two.

**Calculating the CLV**

**Historical approach**

There are several ways to estimate the potential CLV from historical data with simple calculations. One of the equations is based only on the revenue from your customers’ purchases.

*ARPU = total revenue for a chosen period / number of customers buying in the chosen period*

ARPU (average revenue per user) calculated for a certain period is your potential CLV for that period.

Calculating ARPU is the coarsest and quickest way to get a rough idea of the CLV, but it ignores many parameters: how long the customer will actually stay, the cost of customer acquisition and the cost of the product. This method also assumes that all your customers behave the same way, spend similar amounts on their orders and stay with the company for similar periods of time. In other words, it assumes that one size fits all.
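As a minimal sketch of the average-revenue-per-user calculation described above, the snippet below works on a small hypothetical list of transactions. The data, the tuple layout and the function name `arpu` are assumptions made for illustration only.

```python
from datetime import date

# Hypothetical transactions: (customer_id, purchase_date, revenue)
transactions = [
    ("a", date(2023, 1, 5), 40.0),
    ("a", date(2023, 1, 20), 60.0),
    ("b", date(2023, 1, 11), 100.0),
    ("c", date(2023, 2, 2), 30.0),
]

def arpu(transactions, start, end):
    """Total revenue in [start, end] divided by the number of distinct buyers."""
    in_period = [(cid, rev) for cid, day, rev in transactions if start <= day <= end]
    buyers = {cid for cid, _ in in_period}
    if not buyers:
        return 0.0  # nobody bought in the period
    return sum(rev for _, rev in in_period) / len(buyers)

january_arpu = arpu(transactions, date(2023, 1, 1), date(2023, 1, 31))
print(january_arpu)  # 100.0 (200 of revenue from 2 buyers)
```

Customer "a" and customer "b" end up with the same potential CLV here even though they bought very differently, which is exactly the one-size-fits-all limitation of this method.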

Another method takes into consideration more parameters:

$$\text{CLV} = \frac{\text{avg nr of transactions per month} \times \text{avg order value} \times \text{avg gross margin} \times \text{avg customer lifespan in months}}{\text{nr of clients per month}}$$

All the parameters (except lifespan) are calculated over one month.

Gross margin is the share of revenue that remains as profit after costs, expressed as a percentage.

Lifespan is the period of time within which the customer stays with the company and continues to buy. It can be calculated using the churn rate:

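**Lifespan** in months = 1 / churn rate, with the monthly churn rate expressed as a fraction (for example, a churn rate of 5% gives 1 / 0.05 = 20 months)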

The churn rate, in turn, is calculated as follows:

$$\text{Churn rate} = \frac{CB - CE}{CB} \times 100\%$$

where CB is the number of customers at the beginning of the month and CE is the number of customers at the end of the month.
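The churn rate, lifespan and CLV calculations above can be chained together as in the sketch below. It reads the CLV formula as a fraction with the number of clients per month in the denominator, which is an assumption about the original layout, and all numbers and function names are hypothetical.

```python
def churn_rate(customers_start, customers_end):
    """Monthly churn as a fraction (0.05 means 5%)."""
    return (customers_start - customers_end) / customers_start

def lifespan_months(churn):
    """Average customer lifespan in months, given monthly churn as a fraction."""
    return 1.0 / churn

def historical_clv(transactions_per_month, avg_order_value, gross_margin,
                   lifespan, clients_per_month):
    """Per-customer CLV from one month of aggregate figures."""
    return (transactions_per_month * avg_order_value * gross_margin
            * lifespan / clients_per_month)

# Hypothetical month: 1000 customers at the start, 950 at the end.
churn = churn_rate(customers_start=1000, customers_end=950)
lifespan = lifespan_months(churn)

clv = historical_clv(
    transactions_per_month=1500,  # all orders placed in the month
    avg_order_value=50.0,
    gross_margin=0.30,            # 30% of revenue remains as profit
    lifespan=lifespan,
    clients_per_month=1000,       # customers who bought in the month
)
print(f"churn={churn:.1%}, lifespan={lifespan:.0f} months, CLV={clv:.2f}")
```

With these made-up figures each customer places 1.5 orders a month on average and stays about 20 months, so the CLV comes out at roughly 450.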

One more thing that can be taken into account is the discount rate, which lets us convert future cash flows into their present value. The idea is that receiving money now is more valuable than receiving the same amount in the future, because it can be invested in the meantime. The discount rate is expressed as a percentage, and your finance department can usually provide its value.

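$$\text{LTV} = \text{Margin} \times \text{ARPU} \times \sum_{i=1}^{n} \frac{1}{(1 + \text{Discount rate})^{i}}$$

where n is the customer lifespan in months and i runs over the months, so ARPU (average revenue per user) and the discount rate are both monthly values.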
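A small sketch of the discounted calculation follows. It assumes the sum runs over months i = 1..n and that the i-th month is discounted by (1 + discount rate) raised to the power i. The function name and the input values are illustrative only.

```python
def discounted_ltv(margin, monthly_arpu, monthly_discount_rate, lifespan_months):
    """Margin * ARPU * sum over months of 1 / (1 + rate) ** i."""
    discount_sum = sum(
        1.0 / (1.0 + monthly_discount_rate) ** i
        for i in range(1, lifespan_months + 1)
    )
    return margin * monthly_arpu * discount_sum

# Hypothetical inputs: 30% margin, 100 of revenue per month, 12-month lifespan.
undiscounted = discounted_ltv(0.30, 100.0, 0.0, 12)
discounted = discounted_ltv(0.30, 100.0, 0.01, 12)
print(f"no discounting: {undiscounted:.2f}, at 1% per month: {discounted:.2f}")
```

With a zero discount rate the result is simply margin × ARPU × n, and any positive rate lowers it, because revenue from later months is worth less today.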

The historical approach is easy to implement and understand, and the second formula takes into account more parameters that widen our understanding of the CLV. Nevertheless, the approach looks only at the past, relies on an average lifespan and assumes that customers are similar to each other.

This is where machine learning and modelling come into play in order to make CLV more objective and predictable.

**Modelling approach**

The modelling approach assumes that we not only summarise our historical data but also predict the customers’ future behaviour.

**LTV** = *Historical Lifetime Value + Residual Lifetime Value*

*Historical Lifetime Value = value of all past transactions × margin*

*Residual Lifetime Value = expected nr of transactions × predicted revenue per future transaction × margin*

Residual Lifetime Value is what will be calculated using the model approach.

Before modelling the CLV we have to understand the business model we are in. The main division is between contractual and non-contractual settings. In a non-contractual setting we do not know exactly when the customer churns. What happens after the first purchase? Will the customer return? Are they still alive for our business? When will they buy again? Retail, hotel reservations and doctor appointments are good examples of non-contractual businesses. In a contractual setting, by contrast, the customer uses the goods or services for a defined period, so we know exactly when the cooperation stops. In this model we therefore do not have to deal with the uncertainty about whether the customer is alive. Examples are insurance policies and fitness club memberships.

In contractual contexts managers are interested in predicting customer retention while in non-contractual contexts the focus is more on predicting future customer activity, the probability of being alive and the predicted contribution margin because there can be a chance that the customer will come back in the future.

Different approaches are used depending on the business model: survival-based approaches are a better fit for contractual businesses, whereas non-contractual businesses are better described by exponential models.

In this article I will focus on non-contractual businesses and CLV models.

**RFM concept**

RFM stands for Recency, Frequency and Monetary Value. It is the concept used to prepare the datasets for further modelling, but it can also be used on its own for customer segmentation.

If we look at the history of customers’ purchases, we will notice that customers differ on several parameters: the number and value of purchases, the time between purchases, and when the last purchase was made. All of these help us understand the value of each customer, and RFM gives us a way to calculate them.

Frequency, or more precisely repeat frequency, is the number of purchases the customer made beyond the first one.

Recency is the age of the customer at the moment of the last purchase, meaning the period between the customer’s first and last purchase.

Monetary value is the average value of the customer’s order, which equals the total customer revenue divided by the number of all purchases.

For further modelling we will also need the value T, the total age of the customer at the end of the analysed period, meaning the time between the customer’s first purchase and the last day of the analysed period.
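The four definitions above can be computed from a raw transaction log as in the following sketch. The transaction tuples, the day-based time unit and the function name `rfm_summary` are assumptions for illustration. Each row counts as one purchase, so two orders on the same day count as two purchases here.

```python
from datetime import date

def rfm_summary(transactions, period_end):
    """transactions: iterable of (customer_id, purchase_date, value)."""
    by_customer = {}
    for cid, day, value in transactions:
        by_customer.setdefault(cid, []).append((day, value))

    summary = {}
    for cid, rows in by_customer.items():
        days = [day for day, _ in rows]
        first, last = min(days), max(days)
        summary[cid] = {
            "frequency": len(rows) - 1,                # purchases beyond the first
            "recency": (last - first).days,            # first to last purchase
            "T": (period_end - first).days,            # first purchase to period end
            "monetary_value": sum(v for _, v in rows) / len(rows),
        }
    return summary

# Hypothetical data for two customers.
transactions = [
    ("a", date(2023, 1, 1), 20.0),
    ("a", date(2023, 1, 11), 40.0),
    ("a", date(2023, 1, 31), 30.0),
    ("b", date(2023, 1, 15), 50.0),
]
summary = rfm_summary(transactions, period_end=date(2023, 3, 1))
for cid, row in summary.items():
    print(cid, row)
```

Customer "b" bought only once, so both frequency and recency are zero while T keeps growing. That combination is what later lets the models reason about whether such a customer is still alive.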

**RFM customer segmentation**

These are the definitions of the parameters used for further modelling. Before describing the models, however, I would like to cover the segmentation use of RFM, which, as mentioned above, can be applied independently. We will use only Recency, Frequency and Monetary value; the T value is not needed here. For this purpose one definition has to change a bit: recency becomes ‘the period between the customer’s last purchase and the last moment of the analysed period’.

For the segmentation we calculate the RFM values for each customer and cluster the customers with K-means separately on each of the parameters. We can make as many clusters per parameter as we want, but I recommend using the elbow method to identify a suitable number of clusters; usually 3 or 4 clusters are the best choice.

After that we will be able to identify the most and least valuable clients using the following rules:

the lower the recency, the more valuable the client,

the higher the frequency and monetary value, the more valuable the client.

This is how RFM can be used independently to segment customers and identify those who are the most profitable and the most likely to stay alive for our business longer.
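The segmentation procedure can be sketched with a small one-dimensional K-means written in plain Python. This is a simplified illustration rather than a production implementation: the customer data is hypothetical, the number of clusters is fixed at 3 instead of being chosen with the elbow method, and the initial centers are picked deterministically from the sorted values. In practice a library implementation of K-means would normally be used.

```python
def nearest(value, centers):
    """Index of the center closest to value."""
    return min(range(len(centers)), key=lambda j: abs(value - centers[j]))

def kmeans_1d(values, k, iterations=50):
    """Lloyd's algorithm on one feature; labels are ranks of ascending centers."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centers over the sorted values.
    centers = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        labels = [nearest(v, centers) for v in values]
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            updated.append(sum(members) / len(members) if members else centers[j])
        if updated == centers:
            break
        centers = updated
    centers = sorted(centers)
    return [nearest(v, centers) for v in values], centers

def rfm_scores(customers, k=3):
    """customers: id -> (recency_days_since_last_purchase, frequency, monetary)."""
    ids = list(customers)
    rec, _ = kmeans_1d([customers[c][0] for c in ids], k)
    freq, _ = kmeans_1d([customers[c][1] for c in ids], k)
    mon, _ = kmeans_1d([customers[c][2] for c in ids], k)
    # Low recency is good, so its cluster rank is inverted before summing.
    return {c: (k - 1 - r) + f + m for c, r, f, m in zip(ids, rec, freq, mon)}

# Hypothetical customers.
customers = {
    "a": (5, 12, 80.0), "b": (7, 10, 75.0),
    "c": (40, 4, 40.0), "d": (45, 5, 35.0),
    "e": (200, 1, 10.0), "f": (220, 0, 12.0),
}
scores = rfm_scores(customers)
print(scores)
```

Summing the three cluster ranks into one score is only one possible design choice. Keeping the three labels separate gives finer segments, such as customers who buy often but spend little.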

And now let’s move to the exponential models.

**Pareto/NBD Model**

Probabilistic models are very good at capturing heterogeneous behaviour among customers. The goal of the CLV model here is to use historical data to predict what is going to happen in the future for the existing customers.

The Pareto/Negative Binomial Distribution model uses essentially two dimensions to characterise customer behaviour:

a transaction rate λ (lambda): while the customer is alive, purchases occur at random within a given window, and their number is modelled with a Poisson distribution

a dropout rate μ (mu): the customer drops out at a random moment, and the time until dropout is modelled with an exponential distribution

Both rates vary independently across customers, and the heterogeneity of each follows a Gamma distribution. Combining the exponential lifetime with Gamma heterogeneity gives the Pareto part of the model’s name, and combining Poisson purchases with Gamma heterogeneity gives the NBD part.

The idea of the model is to come up with the estimate of these two parameters at the individual level.
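To make these assumptions concrete, the sketch below simulates customers who behave exactly as the Pareto/NBD model supposes. It only generates data under the assumptions; it does not estimate anything. The parameter names (r, alpha for the purchase-rate Gamma; s, beta for the dropout-rate Gamma), their values and the weekly time unit are illustrative choices, not fitted results.

```python
import random

def simulate_customer(r, alpha, s, beta, T, rng):
    """Repeat-purchase times on (0, T] for one customer, plus an alive flag."""
    # random.gammavariate takes (shape, scale), so a rate becomes 1 / rate.
    lam = rng.gammavariate(r, 1.0 / alpha)  # individual purchase rate
    mu = rng.gammavariate(s, 1.0 / beta)    # individual dropout rate
    lifetime = rng.expovariate(mu)          # time until the customer drops out
    horizon = min(lifetime, T)

    # A Poisson process has exponential gaps between consecutive events.
    purchases = []
    t = rng.expovariate(lam)
    while t <= horizon:
        purchases.append(t)
        t += rng.expovariate(lam)
    return purchases, lifetime > T

rng = random.Random(7)  # fixed seed so the run is repeatable
for _ in range(5):
    purchases, alive = simulate_customer(
        r=1.5, alpha=10.0, s=1.0, beta=20.0, T=52.0, rng=rng)
    frequency = len(purchases)
    recency = purchases[-1] if purchases else 0.0
    print(f"frequency={frequency}, recency={recency:.1f}, alive at T: {alive}")
```

In the simulation we know whether each customer is still alive at time T. In real data only frequency, recency and T are observed, and the model has to infer the rest.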

**BG/NBD Model**

The Beta-Geometric / Negative Binomial Distribution model is similar to the previous one. The main difference is how the dropout event is treated. The Pareto/NBD model assumes that the customer can churn at any moment, independently of purchases, whereas the BG/NBD model assumes that the customer can churn only immediately after a purchase, with some probability.

The assumptions about the purchase process are the same for both models. In BG/NBD the dropout probability also varies across customers, following a Beta distribution instead of a Gamma one.

Both models use only two parameters of the RFM concept, recency and frequency, together with the customer age T.

With only these inputs the models are able to predict the expected number of transactions during the customer lifetime, the expected number of transactions during some predefined period of time, and the probability that the customer is still alive at the current moment. Having these three values we can also calculate the expected future revenue for the customer, which equals the average order value multiplied by the expected number of transactions over the lifetime. However, the monetary value is not modelled like the rest of the parameters; it is taken from the historical data only.
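As an illustration of how recency and frequency drive these predictions, the sketch below evaluates the closed-form probability of being alive that Fader, Hardie and Lee give for the BG/NBD model. The parameter values (r, alpha for the purchase process; a, b for the dropout process) are illustrative and not fitted to any data. In a real project they are estimated by maximum likelihood, which this sketch does not do.

```python
def bgnbd_p_alive(x, t_x, T, r, alpha, a, b):
    """P(alive) for a customer with frequency x, recency t_x and age T."""
    if x == 0:
        # In BG/NBD dropout can only happen right after a repeat purchase,
        # so a customer with no repeat purchases is treated as alive.
        return 1.0
    ratio = (alpha + T) / (alpha + t_x)
    return 1.0 / (1.0 + (a / (b + x - 1)) * ratio ** (r + x))

# Illustrative parameters (assumed, in weeks).
params = dict(r=0.25, alpha=4.5, a=0.8, b=2.5)

recent_buyer = bgnbd_p_alive(x=5, t_x=28.0, T=30.0, **params)
lapsed_buyer = bgnbd_p_alive(x=5, t_x=5.0, T=30.0, **params)
print(f"recent buyer: {recent_buyer:.2f}, lapsed buyer: {lapsed_buyer:.2f}")
```

Both customers made five repeat purchases, but the one whose last purchase was long ago is judged much less likely to be alive. A long silence is especially suspicious for a customer who used to buy often.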

The monetary part can be improved by another model, the Gamma-Gamma model.

**Gamma-Gamma model**

The Gamma-Gamma model predicts the monetary value of future transactions.

In this model a customer has several transactions with different values, and the model estimates the customer’s expected monetary value from this observed behaviour.

Main assumptions of the model are:

The values of a customer’s transactions vary randomly around that customer’s average transaction value

Average transaction values can be different across the customers but do not vary over time for any given individual

The distribution of average transaction values across customers is independent of the transaction process
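Under these assumptions the Gamma-Gamma note by Fader and Hardie gives a closed-form expected average transaction value for a customer. The sketch below evaluates it. The parameters p, q and gamma are illustrative values, not estimates from data, and q must be greater than 1 for the population mean to exist.

```python
def gamma_gamma_expected_value(x, m_x, p, q, gamma):
    """Expected mean transaction value given x transactions averaging m_x."""
    return p * (gamma + x * m_x) / (p * x + q - 1.0)

# Illustrative parameters (assumed): population mean = p * gamma / (q - 1) = 30.
p, q, gamma = 6.0, 4.0, 15.0

no_history = gamma_gamma_expected_value(x=0, m_x=0.0, p=p, q=q, gamma=gamma)
few_orders = gamma_gamma_expected_value(x=2, m_x=60.0, p=p, q=q, gamma=gamma)
many_orders = gamma_gamma_expected_value(x=20, m_x=60.0, p=p, q=q, gamma=gamma)
print(no_history, few_orders, round(many_orders, 2))
```

The result is a weighted average of the population mean and the customer's own observed average. With only two orders the estimate is pulled towards the population mean of 30, while with twenty orders it stays close to the observed 60.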

And now having all future values we can finally calculate the Customer Lifetime Value:

**LTV** = *Historical Lifetime Value + Residual Lifetime Value*

*Historical Lifetime Value = value of all past transactions × margin*

*Residual Lifetime Value = expected nr of transactions × predicted revenue per future transaction × margin*

We take the expected number of transactions from the Pareto/NBD or BG/NBD model, the predicted revenue per transaction from the Gamma-Gamma model, and the margin from our business calculations of revenue and costs.
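Putting the pieces together is simple arithmetic, as the sketch below shows. The inputs are hypothetical: in a real analysis the expected number of transactions and the expected value per transaction would come from the fitted models, not from hand-picked numbers.

```python
def lifetime_value(past_transaction_values, margin,
                   expected_transactions, expected_value_per_transaction):
    """Historical plus residual lifetime value for one customer."""
    historical = sum(past_transaction_values) * margin
    residual = expected_transactions * expected_value_per_transaction * margin
    return historical + residual

ltv = lifetime_value(
    past_transaction_values=[40.0, 60.0, 50.0],  # observed orders
    margin=0.30,                                  # assumed gross margin
    expected_transactions=4.2,                    # assumed model output
    expected_value_per_transaction=54.0,          # assumed model output
)
print(f"LTV = {ltv:.2f}")
```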

**Best practices**

Always segment your customers.

As I have already mentioned, not all customers are the same: they come from different sources and marketing campaigns, use test orders, loyalty programs or discounts, and can be from small villages or big cities. We cannot treat all of them in the same way. Segment your customers, check different cohorts, and check all the parameters for them.

Use at least 3 customer interpurchase time periods for the model training set and half of the training set for the validation.

If the dataset allows, it is better to use 5 or even 10 inter-purchase periods for training the model. What we want to avoid is training or validating the model on a period of a few weeks when the customer makes a purchase once every few months.
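One possible way to turn this rule of thumb into numbers is sketched below. Using the median gap between consecutive purchases as the typical inter-purchase time is an assumption of this example, as are the data and the function names.

```python
from datetime import date
from statistics import median

def interpurchase_days(transactions):
    """Gaps in days between consecutive purchases of the same customer."""
    by_customer = {}
    for cid, day in transactions:
        by_customer.setdefault(cid, []).append(day)
    gaps = []
    for days in by_customer.values():
        days.sort()
        gaps.extend((later - earlier).days for earlier, later in zip(days, days[1:]))
    return gaps

def minimum_periods(transactions, multiple=3):
    """Suggested minimum training and validation lengths in days."""
    typical_gap = median(interpurchase_days(transactions))
    training_days = multiple * typical_gap
    return training_days, training_days / 2

# Hypothetical purchase log: (customer_id, purchase_date)
purchases = [
    ("a", date(2023, 1, 1)), ("a", date(2023, 3, 2)), ("a", date(2023, 5, 1)),
    ("b", date(2023, 1, 10)), ("b", date(2023, 4, 10)),
]
training_days, validation_days = minimum_periods(purchases)
print(training_days, validation_days)
```

Here customers typically buy every two months, so the training window should cover at least about six months and the validation window about three.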

Clean your data before any CLV analysis very carefully.

Firstly, in the client’s CRMs there can be many test accounts that do not actually generate the revenue for the company but were created only to test the system – find out about them and get rid of them.

Secondly, there is a huge chance that there are individuals who create several accounts – we do not want to count them as different people, we want to treat them as the same user. Therefore the aggregation of the different accounts having the same emails, phone numbers, names and surnames and addresses (and different combinations of these parameters) is very important.
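A minimal sketch of such an aggregation is shown below. It groups accounts that share an e-mail address or a phone number, directly or through a chain of other accounts, using a union-find structure. The account records and field names are hypothetical, and real data would need more careful normalisation of names and addresses.

```python
def merge_accounts(accounts):
    """accounts: id -> dict with optional 'email' and 'phone'.
    Returns id -> representative id of the merged group."""
    parent = {acc: acc for acc in accounts}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path compression
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # keep the smallest id as root

    first_owner = {}
    for acc, fields in accounts.items():
        for key in ("email", "phone"):
            value = fields.get(key)
            if not value:
                continue
            normalized = (key, value.strip().lower())
            if normalized in first_owner:
                union(acc, first_owner[normalized])
            else:
                first_owner[normalized] = acc
    return {acc: find(acc) for acc in accounts}

# Hypothetical CRM accounts.
accounts = {
    1: {"email": "anna@example.com", "phone": "111"},
    2: {"email": "Anna@Example.com ", "phone": "222"},  # same e-mail as 1
    3: {"email": "a.k@example.com", "phone": "222"},    # same phone as 2
    4: {"email": "bob@example.com", "phone": "444"},
}
groups = merge_accounts(accounts)
print(groups)  # {1: 1, 2: 1, 3: 1, 4: 4}
```

Account 3 shares nothing directly with account 1, but both are linked through account 2, so all three are treated as one customer.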

**Why do we need CLV?**

One of the most important things that CLV gives us is the opportunity to segment customers. Using the modelled parameters, such as the probability of being alive, the expected number of future transactions or the expected revenue, together with other customer characteristics, we can create different groups of clients and design an effective marketing strategy for each. We can identify the most valuable clients and those who are not likely to come back. These groups can then be linked with Google Ads, Google Marketing Platform or Facebook, or used in Marketing Automation.

All of this, preferably automated, leads to a better understanding of budget allocation and therefore to its optimization.

Knowing the cost for each customer and segment we can better understand how much we can spend on acquiring one new customer and who this customer should be.

**Literature**

Peter S. Fader, Bruce G. S. Hardie. __The Gamma-Gamma Model of Monetary Value__, 2013

Peter S. Fader, Bruce G. S. Hardie, Ka Lok Lee. __“Counting Your Customers” the Easy Way: An Alternative to the Pareto/NBD Model__, 2005

Siddarth S. Singh, Dipak C. Jain. __Measuring Customer Lifetime Value: Models and Analysis__, 2013

medium.com: Harminder Puri, Luca de Angelis, Elizaveta Lebedeva, Barış Karaman, Marie Sharapa, Julien Kervizic

PyData Conference, Seattle 2017, Jean-Rene Gauthier, Ben Van Dyke. Implementing and Training Predictive Customer Lifetime Value Models in Python (https://www.youtube.com/watch?v=gx6oHqpRgpY)

PyData Conference, Los Angeles 2018, Brian Bloniarz. Customer Lifetime Value: Models, Metrics and a Multitude of Uses (https://www.youtube.com/watch?v=f3ua_dDjTNU)