lifetimes package¶
lifetimes.generate_data module¶
-
lifetimes.generate_data.
beta_geometric_beta_binom_model
(N, alpha, beta, gamma, delta, size=1)¶ Generate artificial data according to the Beta-Geometric/Beta-Binomial Model.
You may wonder why we can have frequency = n_periods, when frequency excludes the first order. When a customer purchases something, they are born, _and in the next period_ we start asking questions about their alive-ness. So really the customer has bought frequency + 1 times, and been observed for n_periods + 1 periods.
Parameters: - N (array_like) – Number of transaction opportunities for new customers.
- alpha, beta, gamma, delta (float) – Parameters in the model. See [1].
- size (int, optional) – The number of customers to generate
Returns: DataFrame – with index as customer_ids and the following columns: ‘frequency’, ‘recency’, ‘n_periods’, ‘lambda’, ‘p’, ‘alive’, ‘customer_id’
References
[1] Fader, Peter S., Bruce G.S. Hardie, and Jen Shang (2010), “Customer-Base Analysis in a Discrete-Time Noncontractual Setting,” Marketing Science, 29 (6), 1086-1108.
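The generative process described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: it assumes the dropout check happens at the start of each transaction opportunity, and the function name simulate_bgbb_customer, the variable names p and theta, and the parameter values are illustrative choices.

```python
import random


def simulate_bgbb_customer(n_periods, alpha, beta, gamma, delta, rng):
    """Simulate one customer over n_periods discrete transaction opportunities."""
    p = rng.betavariate(alpha, beta)       # purchase probability while alive
    theta = rng.betavariate(gamma, delta)  # per-period dropout probability
    alive = True
    frequency = 0  # repeat purchases (the first purchase is excluded)
    recency = 0    # period index of the last repeat purchase
    for period in range(1, n_periods + 1):
        # Assumption: the customer may drop out before each opportunity.
        if rng.random() < theta:
            alive = False
            break
        if rng.random() < p:
            frequency += 1
            recency = period
    return {
        "frequency": frequency,
        "recency": recency,
        "n_periods": n_periods,
        "p": p,
        "theta": theta,
        "alive": alive,
    }


rng = random.Random(0)
# Illustrative parameter values only.
customers = [simulate_bgbb_customer(6, 1.2, 0.75, 0.66, 2.78, rng) for _ in range(5)]
for customer in customers:
    print(customer)
```

A customer who buys at every opportunity ends with frequency equal to n_periods, which is the case the note above explains.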
-
lifetimes.generate_data.
beta_geometric_nbd_model
(T, r, alpha, a, b, size=1)¶ Generate artificial data according to the BG/NBD model.
See [1] for model details
Parameters: - T (array_like) – The length of time observing new customers.
- r, alpha, a, b (float) – Parameters in the model. See [1].
- size (int, optional) – The number of customers to generate
Returns: DataFrame – With index as customer_ids and the following columns: ‘frequency’, ‘recency’, ‘T’, ‘lambda’, ‘p’, ‘alive’, ‘customer_id’
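As a rough illustration of what such a generator does, the BG/NBD process can be simulated for a single customer with the standard library. This is a sketch under stated assumptions, not the library's code: lambda is drawn from a Gamma distribution with shape r and rate alpha, p from a Beta(a, b) distribution, and the customer may drop out only after a repeat purchase. The function name and the parameter values are illustrative.

```python
import random


def simulate_bgnbd_customer(T, r, alpha, a, b, rng):
    """Simulate one customer's repeat purchases over an observation window of length T."""
    lam = rng.gammavariate(r, 1.0 / alpha)  # transaction rate (shape r, scale 1/alpha)
    p = rng.betavariate(a, b)               # dropout probability after a purchase
    t = 0.0
    purchase_times = []
    alive = True
    while alive:
        t += rng.expovariate(lam)  # exponential waiting time to the next purchase
        if t > T:
            break                  # next purchase falls outside the window
        purchase_times.append(t)
        if rng.random() < p:
            alive = False          # customer drops out right after this purchase
    return {
        "frequency": len(purchase_times),
        "recency": purchase_times[-1] if purchase_times else 0.0,
        "T": T,
        "lambda": lam,
        "p": p,
        "alive": alive,
    }


rng = random.Random(42)
# Illustrative parameter values only.
customers = [simulate_bgnbd_customer(40.0, 0.243, 4.414, 0.793, 2.426, rng) for _ in range(5)]
for customer in customers:
    print(customer)
```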
-
lifetimes.generate_data.
beta_geometric_nbd_model_transactional_data
(T, r, alpha, a, b, observation_period_end='2019-1-1', freq='D', size=1)¶ Generate artificial transactional data according to the BG/NBD model.
See [1] for model details
Parameters: - T (int, float or array_like) – The length of time observing new customers.
- r, alpha, a, b (float) – Parameters in the model. See [1].
- observation_period_end (date_like) – The date observation ends
- freq (string, optional) – Default ‘D’ for days, ‘W’ for weeks, ‘h’ for hours
- size (int, optional) – The number of customers to generate
Returns: DataFrame – The following columns: ‘customer_id’, ‘date’
-
lifetimes.generate_data.
modified_beta_geometric_nbd_model
(T, r, alpha, a, b, size=1)¶ Generate artificial data according to the MBG/NBD model.
See [3], [4] for model details
Parameters: - T (array_like) – The length of time observing new customers.
- r, alpha, a, b (float) – Parameters in the model. See [1].
- size (int, optional) – The number of customers to generate
Returns: DataFrame – with index as customer_ids and the following columns: ‘frequency’, ‘recency’, ‘T’, ‘lambda’, ‘p’, ‘alive’, ‘customer_id’
References
[2] Batislam, E.P., M. Denizel, A. Filiztekin (2007), “Empirical validation and comparison of models for customer base analysis,” International Journal of Research in Marketing, 24 (3), 201-209.
-
lifetimes.generate_data.
pareto_nbd_model
(T, r, alpha, s, beta, size=1)¶ Generate artificial data according to the Pareto/NBD model.
See [2] for model details.
Parameters: - T (array_like) – The length of time observing new customers.
- r, alpha, s, beta (float) – Parameters in the model. See [1].
- size (int, optional) – The number of customers to generate
Returns: DataFrame – with index as customer_ids and the following columns: ‘frequency’, ‘recency’, ‘T’, ‘lambda’, ‘mu’, ‘alive’, ‘customer_id’
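The Pareto/NBD process differs from BG/NBD in that dropout happens in continuous time rather than after a purchase. A minimal standard-library sketch for one customer follows. It is not the library's implementation: it assumes lambda is Gamma with shape r and rate alpha, mu is Gamma with shape s and rate beta, and the unobserved lifetime is exponential with rate mu. All names and values are illustrative.

```python
import random


def simulate_pareto_nbd_customer(T, r, alpha, s, beta, rng):
    """Simulate one customer's repeat purchases under a Pareto/NBD-style process."""
    lam = rng.gammavariate(r, 1.0 / alpha)  # transaction rate
    mu = rng.gammavariate(s, 1.0 / beta)    # dropout rate
    lifetime = rng.expovariate(mu)          # unobserved time until dropout
    horizon = min(T, lifetime)              # purchases stop at dropout or at T
    purchase_times = []
    t = rng.expovariate(lam)
    while t < horizon:
        purchase_times.append(t)
        t += rng.expovariate(lam)
    return {
        "frequency": len(purchase_times),
        "recency": purchase_times[-1] if purchase_times else 0.0,
        "T": T,
        "lambda": lam,
        "mu": mu,
        "alive": lifetime > T,
    }


rng = random.Random(7)
# Illustrative parameter values only.
customers = [simulate_pareto_nbd_customer(40.0, 0.55, 10.6, 0.61, 11.7, rng) for _ in range(5)]
for customer in customers:
    print(customer)
```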
lifetimes.plotting module¶
-
lifetimes.plotting.
plot_period_transactions
(model, max_frequency=7, title='Frequency of Repeat Transactions', xlabel='Number of Calibration Period Transactions', ylabel='Customers', **kwargs)¶ Plot a figure with period actual and predicted transactions.
Parameters: - model (lifetimes model) – A fitted lifetimes model.
- max_frequency (int, optional) – The maximum frequency to plot.
- title (str, optional) – Figure title
- xlabel (str, optional) – Figure xlabel
- ylabel (str, optional) – Figure ylabel
- kwargs – Passed into the matplotlib.pyplot.plot command.
Returns: axes (matplotlib.AxesSubplot)
-
lifetimes.plotting.
plot_calibration_purchases_vs_holdout_purchases
(model, calibration_holdout_matrix, kind='frequency_cal', n=7, **kwargs)¶ Plot calibration purchases vs holdout.
This currently relies too much on the lifetimes.utils calibration_and_holdout_data function.
Parameters: - model (lifetimes model) – A fitted lifetimes model.
- calibration_holdout_matrix (pandas DataFrame) – DataFrame from calibration_and_holdout_data function.
- kind (str, optional) – Quantity on the x-axis: “frequency_cal” (purchases in calibration period), “recency_cal” (age of customer at last purchase), “T_cal” (age of customer at the end of calibration period), or “time_since_last_purchase” (time since user made last purchase).
- n (int, optional) – Number of ticks on the x axis
Returns: axes (matplotlib.AxesSubplot)
-
lifetimes.plotting.
plot_frequency_recency_matrix
(model, T=1, max_frequency=None, max_recency=None, title=None, xlabel="Customer's Historical Frequency", ylabel="Customer's Recency", **kwargs)¶ Plot recency frequency matrix as heatmap.
Plot a figure of expected transactions in T next units of time by a customer’s frequency and recency.
Parameters: - model (lifetimes model) – A fitted lifetimes model.
- T (float, optional) – Next units of time to make predictions for
- max_frequency (int, optional) – The maximum frequency to plot. Default is max observed frequency.
- max_recency (int, optional) – The maximum recency to plot. This also determines the age of the customer. Default to max observed age.
- title (str, optional) – Figure title
- xlabel (str, optional) – Figure xlabel
- ylabel (str, optional) – Figure ylabel
- kwargs – Passed into the matplotlib.imshow command.
Returns: axes (matplotlib.AxesSubplot)
-
lifetimes.plotting.
plot_probability_alive_matrix
(model, max_frequency=None, max_recency=None, title='Probability Customer is Alive, \nby Frequency and Recency of a Customer', xlabel="Customer's Historical Frequency", ylabel="Customer's Recency", **kwargs)¶ Plot probability alive matrix as heatmap.
Plot a figure of the probability a customer is alive based on their frequency and recency.
Parameters: - model (lifetimes model) – A fitted lifetimes model.
- max_frequency (int, optional) – The maximum frequency to plot. Default is max observed frequency.
- max_recency (int, optional) – The maximum recency to plot. This also determines the age of the customer. Default to max observed age.
- title (str, optional) – Figure title
- xlabel (str, optional) – Figure xlabel
- ylabel (str, optional) – Figure ylabel
- kwargs – Passed into the matplotlib.imshow command.
Returns: axes (matplotlib.AxesSubplot)
-
lifetimes.plotting.
plot_expected_repeat_purchases
(model, title='Expected Number of Repeat Purchases per Customer', xlabel='Time Since First Purchase', ax=None, label=None, **kwargs)¶ Plot expected repeat purchases on calibration period.
Parameters: - model (lifetimes model) – A fitted lifetimes model.
- max_frequency (int, optional) – The maximum frequency to plot.
- title (str, optional) – Figure title
- xlabel (str, optional) – Figure xlabel
- ax (matplotlib.AxesSubplot, optional) – Using user axes
- label (str, optional) – Label for plot.
- kwargs – Passed into the matplotlib.pyplot.plot command.
Returns: axes (matplotlib.AxesSubplot)
-
lifetimes.plotting.
plot_history_alive
(model, t, transactions, datetime_col, freq='D', start_date=None, ax=None, **kwargs)¶ Draw a graph showing the probability of being alive for a customer in time.
Parameters: - model (lifetimes model) – A fitted lifetimes model.
- t (int) – the number of time units since the birth we want to draw the p_alive
- transactions (pandas DataFrame) – DataFrame containing the transactions history of the customer_id
- datetime_col (str) – The column in the transactions that denotes the datetime the purchase was made
- freq (str, optional) – Default ‘D’ for days. Other examples: ‘W’ for weeks
- start_date (datetime, optional) – Limit xaxis to start date
- ax (matplotlib.AxesSubplot, optional) – Using user axes
- kwargs – Passed into the matplotlib.pyplot.plot command.
Returns: axes (matplotlib.AxesSubplot)
-
lifetimes.plotting.
plot_cumulative_transactions
(model, transactions, datetime_col, customer_id_col, t, t_cal, datetime_format=None, freq='D', set_index_date=False, title='Tracking Cumulative Transactions', xlabel='day', ylabel='Cumulative Transactions', ax=None, **kwargs)¶ Plot a figure of the predicted and actual cumulative transactions of users.
Parameters: - model (lifetimes model) – A fitted lifetimes model
- transactions (pandas DataFrame) – DataFrame containing the transactions history of the customer_id
- datetime_col (str) – The column in transactions that denotes the datetime the purchase was made.
- customer_id_col (str) – The column in transactions that denotes the customer_id
- t (float) – The number of time units since the beginning of data for which we want to calculate cumulative transactions
- t_cal (float) – A marker used to indicate where the vertical line for plotting should be.
- datetime_format (str, optional) – A string that represents the timestamp format. Useful if Pandas can’t understand the provided format.
- freq (str, optional) – Default ‘D’ for days, ‘W’ for weeks, ‘M’ for months… etc. Full list here: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#dateoffset-objects
- set_index_date (bool, optional) – When True, set the date as the Pandas DataFrame index; when False (default), use the number of time units.
- title (str, optional) – Figure title
- xlabel (str, optional) – Figure xlabel
- ylabel (str, optional) – Figure ylabel
- ax (matplotlib.AxesSubplot, optional) – Using user axes
- kwargs – Passed into the pandas.DataFrame.plot command.
Returns: axes (matplotlib.AxesSubplot)
-
lifetimes.plotting.
plot_incremental_transactions
(model, transactions, datetime_col, customer_id_col, t, t_cal, datetime_format=None, freq='D', set_index_date=False, title='Tracking Daily Transactions', xlabel='day', ylabel='Transactions', ax=None, **kwargs)¶ Plot a figure of the predicted and actual incremental transactions of users.
Parameters: - model (lifetimes model) – A fitted lifetimes model
- transactions (pandas DataFrame) – DataFrame containing the transactions history of the customer_id
- datetime_col (str) – The column in transactions that denotes the datetime the purchase was made.
- customer_id_col (str) – The column in transactions that denotes the customer_id
- t (float) – The number of time units since the beginning of data for which we want to calculate cumulative transactions
- t_cal (float) – A marker used to indicate where the vertical line for plotting should be.
- datetime_format (str, optional) – A string that represents the timestamp format. Useful if Pandas can’t understand the provided format.
- freq (str, optional) – Default ‘D’ for days, ‘W’ for weeks, ‘M’ for months… etc. Full list here: http://pandas.pydata.org/pandas-docs/stable/timeseries.html#dateoffset-objects
- set_index_date (bool, optional) – When True, set the date as the Pandas DataFrame index; when False (default), use the number of time units.
- title (str, optional) – Figure title
- xlabel (str, optional) – Figure xlabel
- ylabel (str, optional) – Figure ylabel
- ax (matplotlib.AxesSubplot, optional) – Using user axes
- kwargs – Passed into the pandas.DataFrame.plot command.
Returns: axes (matplotlib.AxesSubplot)
-
lifetimes.plotting.
plot_transaction_rate_heterogeneity
(model, suptitle='Heterogeneity in Transaction Rate', xlabel='Transaction Rate', ylabel='Density', suptitle_fontsize=14, **kwargs)¶ Plot the estimated gamma distribution of lambda (customers’ propensities to purchase).
Parameters: - model (lifetimes model) – A fitted lifetimes model, for now only for BG/NBD
- suptitle (str, optional) – Figure suptitle
- xlabel (str, optional) – Figure xlabel
- ylabel (str, optional) – Figure ylabel
- kwargs – Passed into the matplotlib.pyplot.plot command.
Returns: axes (matplotlib.AxesSubplot)
-
lifetimes.plotting.
plot_dropout_rate_heterogeneity
(model, suptitle='Heterogeneity in Dropout Probability', xlabel='Dropout Probability p', ylabel='Density', suptitle_fontsize=14, **kwargs)¶ Plot the estimated beta distribution of p.
Here p is each customer’s probability of dropping out immediately after a transaction.
Parameters: - model (lifetimes model) – A fitted lifetimes model, for now only for BG/NBD
- suptitle (str, optional) – Figure suptitle
- xlabel (str, optional) – Figure xlabel
- ylabel (str, optional) – Figure ylabel
- kwargs – Passed into the matplotlib.pyplot.plot command.
Returns: axes (matplotlib.AxesSubplot)
lifetimes.utils module¶
Lifetimes utils and helpers.
-
lifetimes.utils.
calibration_and_holdout_data
(transactions, customer_id_col, datetime_col, calibration_period_end, observation_period_end=None, freq='D', freq_multiplier=1, datetime_format=None, monetary_value_col=None, include_first_transaction=False)¶ Create a summary of each customer over a calibration and holdout period.
This function creates a summary of each customer over a calibration and holdout period (training and testing, respectively). It accepts transaction data, and returns a DataFrame of sufficient statistics.
Parameters: - transactions – a Pandas DataFrame that contains the customer_id col and the datetime col.
- customer_id_col (string) – the column in transactions DataFrame that denotes the customer_id
- datetime_col (string) – the column in transactions that denotes the datetime the purchase was made.
- calibration_period_end – a period to limit the calibration to, inclusive.
- observation_period_end – a string or datetime to denote the final date of the study. Events after this date are truncated. If not given, defaults to the max ‘datetime_col’.
- freq (string, optional) – Default: ‘D’ for days. Possible values listed here: https://numpy.org/devdocs/reference/arrays.datetime.html#datetime-units
- freq_multiplier (int, optional) – Default: 1. Useful for getting exact recency & T. Example: with freq=‘D’ and freq_multiplier=1, we get recency=591 and T=632; with freq=‘h’ and freq_multiplier=24, we get recency=590.125 and T=631.375.
- datetime_format (string, optional) – a string that represents the timestamp format. Useful if Pandas can’t understand the provided format.
- monetary_value_col (string, optional) – the column in transactions that denotes the monetary value of the transaction. Optional, only needed for customer lifetime value estimation models.
- include_first_transaction (bool, optional) – Default: False. By default the first transaction is not included when calculating frequency and monetary_value. Can be set to True to include it. Should be False if you are going to use this data with any fitters in the lifetimes package.
Returns: DataFrame – A DataFrame with columns frequency_cal, recency_cal, T_cal, frequency_holdout, duration_holdout. If monetary_value_col isn’t None, the DataFrame will also have the columns monetary_value_cal and monetary_value_holdout.
-
lifetimes.utils.
summary_data_from_transaction_data
(transactions, customer_id_col, datetime_col, monetary_value_col=None, datetime_format=None, observation_period_end=None, freq='D', freq_multiplier=1, include_first_transaction=False)¶ Return summary data from transactions.
This transforms a DataFrame of transaction data of the form:
    customer_id, datetime [, monetary_value]
to a DataFrame of the form:
    customer_id, frequency, recency, T [, monetary_value]
Parameters: - transactions – a Pandas DataFrame that contains the customer_id col and the datetime col.
- customer_id_col (string) – the column in transactions DataFrame that denotes the customer_id
- datetime_col (string) – the column in transactions that denotes the datetime the purchase was made.
- monetary_value_col (string, optional) – the column in transactions that denotes the monetary value of the transaction. Optional, only needed for customer lifetime value estimation models.
- observation_period_end (datetime, optional) – a string or datetime to denote the final date of the study. Events after this date are truncated. If not given, defaults to the max ‘datetime_col’.
- datetime_format (string, optional) – a string that represents the timestamp format. Useful if Pandas can’t understand the provided format.
- freq (string, optional) – Default: ‘D’ for days. Possible values listed here: https://numpy.org/devdocs/reference/arrays.datetime.html#datetime-units
- freq_multiplier (int, optional) – Default: 1. Useful for getting exact recency & T. Example: with freq=‘D’ and freq_multiplier=1, we get recency=591 and T=632; with freq=‘h’ and freq_multiplier=24, we get recency=590.125 and T=631.375.
- include_first_transaction (bool, optional) – Default: False. By default the first transaction is not included when calculating frequency and monetary_value. Can be set to True to include it. Should be False if you are going to use this data with any fitters in the lifetimes package.
Returns: DataFrame – customer_id, frequency, recency, T [, monetary_value]
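To make the transformation concrete, here is a small standard-library sketch that derives frequency, recency and T (in days) from raw transactions. It is an independent illustration, not the library's code. It assumes that several purchases in the same period count once, that the first purchase is excluded from frequency (the include_first_transaction=False behaviour described above), and that the helper name summarize is hypothetical.

```python
from datetime import date


def summarize(transactions, observation_period_end):
    """Map (customer_id, date) pairs to {customer_id: (frequency, recency, T)} in days."""
    days_by_customer = {}
    for customer_id, day in transactions:
        if day <= observation_period_end:  # events after the end date are truncated
            days_by_customer.setdefault(customer_id, set()).add(day)
    summary = {}
    for customer_id, days in days_by_customer.items():
        first, last = min(days), max(days)
        frequency = len(days) - 1                      # repeat purchase periods only
        recency = (last - first).days                  # age at the last purchase
        T = (observation_period_end - first).days      # age at the end of the study
        summary[customer_id] = (frequency, recency, T)
    return summary


transactions = [
    ("a", date(2019, 1, 1)),
    ("a", date(2019, 1, 1)),   # same-day duplicate, counted once
    ("a", date(2019, 1, 11)),
    ("a", date(2019, 2, 1)),
    ("b", date(2019, 1, 20)),
]
summary = summarize(transactions, date(2019, 3, 1))
print(summary)  # {'a': (2, 31, 59), 'b': (0, 0, 40)}
```

Customer "b" has a single purchase, so frequency and recency are both 0 while T still measures the time since that first purchase.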
-
lifetimes.utils.
calculate_alive_path
(model, transactions, datetime_col, t, freq='D')¶ Calculate alive path for plotting alive history of user.
Uses the conditional_probability_alive() method of the model to compute the path.
Parameters: - model – A fitted lifetimes model
- transactions (DataFrame) – a Pandas DataFrame containing the transactions history of the customer_id
- datetime_col (string) – the column in the transactions that denotes the datetime the purchase was made
- t (array_like) – the number of time units since the birth for which we want to draw the p_alive
- freq (string, optional) – Default: ‘D’ for days. Possible values listed here: https://numpy.org/devdocs/reference/arrays.datetime.html#datetime-units
Returns: Series – A pandas Series containing the p_alive as a function of T (age of the customer)
-
lifetimes.utils.
expected_cumulative_transactions
(model, transactions, datetime_col, customer_id_col, t, datetime_format=None, freq='D', freq_multiplier=1, set_index_date=False)¶ Get expected and actual repeated cumulative transactions.
Uses the expected_number_of_purchases_up_to_time() method from the fitted model to predict the cumulative number of purchases.
This function follows the formulation on page 8 of [1].
In more detail, for each date we take only the customers who made their first transaction before that date and multiply their number by expected_number_of_purchases_up_to_time() evaluated over their whole future. Doing that for all dates and then summing the results gives the complete cumulative purchases.
Parameters: - model – A fitted lifetimes model
- transactions – a Pandas DataFrame containing the transactions history of the customer_id
- datetime_col (string) – the column in transactions that denotes the datetime the purchase was made.
- customer_id_col (string) – the column in transactions that denotes the customer_id
- t (int) – the number of time units since the beginning of data for which we want to calculate cumulative transactions
- datetime_format (string, optional) – a string that represents the timestamp format. Useful if Pandas can’t understand the provided format.
- freq (string, optional) – Default: ‘D’ for days. Possible values listed here: https://numpy.org/devdocs/reference/arrays.datetime.html#datetime-units
- freq_multiplier (int, optional) – Default: 1. Useful for getting exact recency & T. Example: with freq=‘D’ and freq_multiplier=1, we get recency=591 and T=632; with freq=‘h’ and freq_multiplier=24, we get recency=590.125 and T=631.375.
- set_index_date (bool, optional) – when True, set the date as the Pandas DataFrame index; when False (default), use the number of time units.
Returns: DataFrame – A DataFrame with columns actual, predicted
References
[1] Fader, Peter S., Bruce G.S. Hardie, and Ka Lok Lee (2005), A Note on Implementing the Pareto/NBD Model in MATLAB. http://brucehardie.com/notes/008/
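The cohort summation behind the predicted column can be sketched as follows. This is a simplified illustration and not the library's code: expected_purchases is a hypothetical stand-in for a fitted model's expected_number_of_purchases_up_to_time(), and first_purchase_counts is an assumed input giving the number of customers born on each day.

```python
import math


def expected_purchases(t):
    """Hypothetical stand-in for a fitted model's expected repeat purchases by age t."""
    return 0.5 * math.log1p(t)  # concave in t, zero at t = 0


def predicted_cumulative(first_purchase_counts, horizon, expected=expected_purchases):
    """Sum, over birth cohorts, cohort size times expected purchases at the cohort's age."""
    cumulative = []
    for day in range(horizon + 1):
        total = sum(
            n_new * expected(day - birth)
            for birth, n_new in enumerate(first_purchase_counts)
            if birth <= day  # only customers already born contribute
        )
        cumulative.append(total)
    return cumulative


births = [10, 5, 0, 2]  # customers making their first purchase on days 0..3
curve = predicted_cumulative(births, horizon=6)
print([round(value, 3) for value in curve])
```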
All fitters from fitters directory.
-
class
lifetimes.
BetaGeoFitter
(penalizer_coef=0.0)¶ Bases:
lifetimes.fitters.BaseFitter
Also known as the BG/NBD model.
Based on [2]_, this model has the following assumptions:
- Each individual, i, has a hidden lambda_i and p_i parameter
- These come from a population wide Gamma and a Beta distribution respectively.
- Individuals’ purchases follow a Poisson process with rate lambda_i*t.
- After each purchase, an individual has a p_i probability of dying (never buying again).
Parameters: penalizer_coef (float) – The coefficient applied to an l2 norm on the parameters
-
penalizer_coef
¶ The coefficient applied to an l2 norm on the parameters
Type: float
-
params_
¶ The fitted parameters of the model
Type: Series
-
data
¶ A DataFrame with the values given in the call to fit
Type: DataFrame
-
variance_matrix_
¶ A DataFrame with the variance matrix of the parameters.
Type: DataFrame
-
confidence_intervals_
¶ A DataFrame with the 95% confidence intervals of the parameters
Type: DataFrame
-
standard_errors_
¶ A Series with the standard errors of the parameters
Type: Series
-
summary
¶ A DataFrame containing information about the fitted parameters
Type: DataFrame
References
[2] Fader, Peter S., Bruce G.S. Hardie, and Ka Lok Lee (2005a), “Counting Your Customers the Easy Way: An Alternative to the Pareto/NBD Model,” Marketing Science, 24 (2), 275-84.
-
conditional_expected_number_of_purchases_up_to_time
(t, frequency, recency, T)¶ Conditional expected number of purchases up to time.
Calculate the expected number of repeat purchases up to time t for a randomly chosen individual from the population, given they have purchase history (frequency, recency, T).
This function uses equation (10) from [2].
Parameters: - t (array_like) – times to calculate the expectation for.
- frequency (array_like) – historical frequency of customer.
- recency (array_like) – historical recency of customer.
- T (array_like) – age of the customer.
Returns: array_like
References
[2] Fader, Peter S., Bruce G.S. Hardie, and Ka Lok Lee (2005a), “Counting Your Customers the Easy Way: An Alternative to the Pareto/NBD Model,” Marketing Science, 24 (2), 275-84.
-
conditional_probability_alive
(frequency, recency, T)¶ Compute conditional probability alive.
Compute the probability that a customer with history (frequency, recency, T) is currently alive.
From http://www.brucehardie.com/notes/021/palive_for_BGNBD.pdf
Parameters: - frequency (array or scalar) – historical frequency of customer.
- recency (array or scalar) – historical recency of customer.
- T (array or scalar) – age of the customer.
Returns: array – value representing a probability
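For intuition, the closed form given in the note linked above can be written in a few lines of plain Python. This is a scalar sketch, not the library's vectorized implementation: the function name is illustrative, the parameter values are illustrative rather than fitted, and it follows the convention that a customer with no repeat purchases is treated as alive with probability 1.

```python
def bgnbd_probability_alive(frequency, recency, T, r, alpha, a, b):
    """P(alive | frequency, recency, T) for the BG/NBD model, scalar inputs."""
    if frequency == 0:
        # No repeat purchase means no dropout opportunity has occurred yet.
        return 1.0
    ratio = (alpha + T) / (alpha + recency)
    return 1.0 / (1.0 + (a / (b + frequency - 1)) * ratio ** (r + frequency))


# Illustrative parameter values only.
params = {"r": 0.243, "alpha": 4.414, "a": 0.793, "b": 2.426}

p_recent = bgnbd_probability_alive(3, 29.0, 30.0, **params)  # bought recently
p_stale = bgnbd_probability_alive(3, 5.0, 30.0, **params)    # long silence since last purchase
print(p_recent, p_stale)
```

With the same frequency and age, the customer whose last purchase is further in the past gets the lower probability of being alive.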
-
conditional_probability_alive_matrix
(max_frequency=None, max_recency=None)¶ Compute the probability alive matrix.
Uses the conditional_probability_alive() method to calculate the matrix.
Parameters: - max_frequency (float, optional) – the maximum frequency to plot. Default is max observed frequency.
- max_recency (float, optional) – the maximum recency to plot. This also determines the age of the customer. Default to max observed age.
Returns: matrix – A matrix of the form [t_x: historical recency, x: historical frequency]
-
expected_number_of_purchases_up_to_time
(t)¶ Calculate the expected number of repeat purchases up to time t.
Calculate repeat purchases for a randomly chosen individual from the population.
Equivalent to equation (9) of [2].
Parameters: t (array_like) – times to calculate the expectation for
Returns: array_like
References
[2] Fader, Peter S., Bruce G.S. Hardie, and Ka Lok Lee (2005a), “Counting Your Customers the Easy Way: An Alternative to the Pareto/NBD Model,” Marketing Science, 24 (2), 275-84.
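As a worked illustration of equation (9), the expectation can be evaluated in plain Python. This sketch is not the library's implementation: it uses a hand-rolled power series for the Gauss hypergeometric function (an assumption made to keep the example dependency-free, valid because the argument t / (alpha + t) is below 1), requires a != 1, and uses illustrative parameter values.

```python
def hyp2f1_series(a, b, c, z, tol=1e-12, max_terms=10_000):
    """Gauss hypergeometric 2F1(a, b; c; z) via its power series, for |z| < 1."""
    term = 1.0
    total = 1.0
    for n in range(max_terms):
        term *= (a + n) * (b + n) / ((c + n) * (n + 1)) * z
        total += term
        if abs(term) < tol:
            break
    return total


def bgnbd_expected_purchases(t, r, alpha, a, b):
    """E[X(t)] for a randomly chosen customer under BG/NBD; requires a != 1."""
    z = t / (alpha + t)
    hyp_term = hyp2f1_series(r, b, a + b - 1, z)
    return (a + b - 1) / (a - 1) * (1 - hyp_term * (alpha / (alpha + t)) ** r)


# Illustrative parameter values only.
params = {"r": 0.243, "alpha": 4.414, "a": 0.793, "b": 2.426}
for t in (0.0, 5.0, 10.0):
    print(t, bgnbd_expected_purchases(t, **params))
```

For small t the result is close to r * t / alpha, the mean purchase rate times elapsed time, which is a quick sanity check on the formula.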
-
fit
(frequency, recency, T, weights=None, initial_params=None, verbose=False, tol=1e-07, index=None, **kwargs)¶ Fit a dataset to the BG/NBD model.
Parameters: - frequency (array_like) – the frequency vector of customers’ purchases (denoted x in literature).
- recency (array_like) – the recency vector of customers’ purchases (denoted t_x in literature).
- T (array_like) – customers’ age (time units since first purchase)
- weights (None or array_like) – Number of customers with given frequency/recency/T, defaults to 1 if not specified. Fader and Hardie condense the individual RFM matrix into all observed combinations of frequency/recency/T. This parameter represents the count of customers with a given purchase pattern. Instead of calculating individual loglikelihood, the loglikelihood is calculated for each pattern and multiplied by the number of customers with that pattern.
- initial_params (array_like, optional) – set the initial parameters for the fitter.
- verbose (bool, optional) – set to true to print out convergence diagnostics.
- tol (float, optional) – tolerance for termination of the function minimization process.
- index (array_like, optional) – index for the resulting DataFrame, which is accessible via self.data
- kwargs – key word arguments to pass to the scipy.optimize.minimize function as options dict
Returns: BetaGeoFitter – with additional properties like params_ and methods like predict
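The weights idea can be illustrated without the library: condense identical (frequency, recency, T) rows into patterns with counts, then check that the weighted log-likelihood matches the per-customer sum. The function toy_log_likelihood is a hypothetical stand-in, not the BG/NBD likelihood, and the data are made up.

```python
import math
from collections import Counter


def toy_log_likelihood(frequency, recency, T):
    """Hypothetical stand-in for a model's per-customer log-likelihood."""
    return -0.1 * T + 0.05 * recency + math.log1p(frequency)


# One (frequency, recency, T) tuple per customer.
customers = [(2, 30, 40), (0, 0, 40), (2, 30, 40), (0, 0, 40), (0, 0, 40), (5, 38, 40)]

# Per-customer evaluation: one likelihood call per row.
individual_total = sum(toy_log_likelihood(*row) for row in customers)

# Condensed evaluation: one call per distinct pattern, multiplied by its count.
patterns = Counter(customers)
weighted_total = sum(
    weight * toy_log_likelihood(*pattern) for pattern, weight in patterns.items()
)

print(len(customers), "rows condensed to", len(patterns), "patterns")
print(individual_total, weighted_total)
```

The distinct patterns and their counts are what one would pass as frequency, recency, T and weights, respectively.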
-
probability_of_n_purchases_up_to_time
(t, n)¶ Compute the probability of n purchases.
\[P( N(t) = n | \text{model} )\]
where N(t) is the number of repeat purchases a customer makes in t units of time.
Comes from equation (8) of [2].
Parameters: - t (float) – number units of time
- n (int) – number of purchases
Returns: float – Probability to have n purchases up to t units of time
References
[2] Fader, Peter S., Bruce G.S. Hardie, and Ka Lok Lee (2005a), “Counting Your Customers the Easy Way: An Alternative to the Pareto/NBD Model,” Marketing Science, 24 (2), 275-84.
-
class
lifetimes.
ParetoNBDFitter
(penalizer_coef=0.0)¶ Bases:
lifetimes.fitters.BaseFitter
Pareto NBD fitter [7].
Parameters: penalizer_coef (float) – The coefficient applied to an l2 norm on the parameters
-
penalizer_coef
¶ The coefficient applied to an l2 norm on the parameters
Type: float
-
params_
¶ The fitted parameters of the model
Type: OrderedDict
-
data
¶ A DataFrame with the columns given in the call to fit
Type: DataFrame
References
[7] David C. Schmittlein, Donald G. Morrison, and Richard Colombo (1987), “Counting Your Customers: Who Are They and What Will They Do Next,” Management Science, 33 (1), 1-24.
-
conditional_expected_number_of_purchases_up_to_time
(t, frequency, recency, T)¶ Conditional expected number of purchases up to time.
Calculate the expected number of repeat purchases up to time t for a randomly chosen individual from the population, given they have purchase history (frequency, recency, T).
This is equation (41) from: http://brucehardie.com/notes/009/pareto_nbd_derivations_2005-11-05.pdf
Parameters: - t (array_like) – times to calculate the expectation for.
- frequency (array_like) – historical frequency of customer.
- recency (array_like) – historical recency of customer.
- T (array_like) – age of the customer.
Returns: array_like
-
conditional_probability_alive
(frequency, recency, T)¶ Conditional probability alive.
Compute the probability that a customer with history (frequency, recency, T) is currently alive.
Section 5.1 from (equations (36) and (37)): http://brucehardie.com/notes/009/pareto_nbd_derivations_2005-11-05.pdf
Parameters: - frequency (float) – historical frequency of customer.
- recency (float) – historical recency of customer.
- T (float) – age of the customer.
Returns: float – value representing a probability
-
conditional_probability_alive_matrix
(max_frequency=None, max_recency=None)¶ Compute the probability alive matrix.
Builds on the conditional_probability_alive() method.
Parameters: - max_frequency (float, optional) – the maximum frequency to plot. Default is max observed frequency.
- max_recency (float, optional) – the maximum recency to plot. This also determines the age of the customer. Default to max observed age.
Returns: matrix – A matrix of the form [t_x: historical recency, x: historical frequency]
-
conditional_probability_of_n_purchases_up_to_time
(n, t, frequency, recency, T)¶ Return conditional probability of n purchases up to time t.
Calculate the probability of n purchases up to time t for an individual with history frequency, recency and T (age).
The main equation being implemented is (16) from: http://www.brucehardie.com/notes/028/pareto_nbd_conditional_pmf.pdf
Parameters: - n (int) – number of purchases.
- t (a scalar) – time up to which probability should be calculated.
- frequency (float) – historical frequency of customer.
- recency (float) – historical recency of customer.
- T (float) – age of the customer.
Returns: array_like
-
expected_number_of_purchases_up_to_time
(t)¶ Return expected number of repeat purchases up to time t.
Calculate the expected number of repeat purchases up to time t for a randomly chosen individual from the population.
Equation (27) from: http://brucehardie.com/notes/009/pareto_nbd_derivations_2005-11-05.pdf
Parameters: t (array_like) – times to calculate the expectation for.
Returns: array_like
-
fit
(frequency, recency, T, weights=None, iterative_fitting=1, initial_params=None, verbose=False, tol=0.0001, index=None, fit_method='Nelder-Mead', maxiter=2000, **kwargs)¶ Pareto/NBD model fitter.
Parameters: - frequency (array_like) – the frequency vector of customers’ purchases (denoted x in literature).
- recency (array_like) – the recency vector of customers’ purchases (denoted t_x in literature).
- T (array_like) – customers’ age (time units since first purchase)
- weights (None or array_like) – Number of customers with given frequency/recency/T, defaults to 1 if not specified. Fader and Hardie condense the individual RFM matrix into all observed combinations of frequency/recency/T. This parameter represents the count of customers with a given purchase pattern. Instead of calculating individual log-likelihood, the log-likelihood is calculated for each pattern and multiplied by the number of customers with that pattern.
- iterative_fitting (int, optional) – perform iterative_fitting fits over random/warm-started initial params
- initial_params (array_like, optional) – set the initial parameters for the fitter.
- verbose (bool, optional) – set to true to print out convergence diagnostics.
- tol (float, optional) – tolerance for termination of the function minimization process.
- index (array_like, optional) – index for the resulting DataFrame, which is accessible via self.data
- fit_method (string, optional) – the fit method to pass to scipy.optimize.minimize
- maxiter (int, optional) – max iterations for the optimizer in scipy.optimize.minimize; will be overwritten if set in kwargs.
- kwargs – key word arguments to pass to the scipy.optimize.minimize function as options dict
Returns: ParetoNBDFitter – with additional properties like params_ and methods like predict
-
-
class
lifetimes.
GammaGammaFitter
(penalizer_coef=0.0)¶ Bases:
lifetimes.fitters.BaseFitter
Fitter for the gamma-gamma model.
It is used to estimate the average monetary value of customer transactions.
This implementation is based on the Excel spreadsheet found in [3]. More details on the derivation and evaluation can be found in [4].
Parameters: penalizer_coef (float) – The coefficient applied to an l2 norm on the parameters
-
penalizer_coef
¶ The coefficient applied to an l2 norm on the parameters
Type: float
-
params_
¶ The fitted parameters of the model
Type: OrderedDict
-
data
¶ A DataFrame with the columns given in the call to fit
Type: DataFrame
References
[3] http://www.brucehardie.com/notes/025/ The Gamma-Gamma Model of Monetary Value.
[4] Peter S. Fader, Bruce G. S. Hardie, and Ka Lok Lee (2005), “RFM and CLV: Using iso-value curves for customer base analysis”, Journal of Marketing Research, 42 (November), 415-430.
-
penalizer_coef
The coefficient applied to an l2 norm on the parameters
Type: float
-
params_
The fitted parameters of the model
Type: obj: Series
-
data
A DataFrame with the values given in the call to fit
Type: obj: DataFrame
-
variance_matrix_
¶ A DataFrame with the variance matrix of the parameters.
Type: obj: DataFrame
-
confidence_intervals_
¶ A DataFrame with the 95% confidence intervals of the parameters
Type: obj: DataFrame
-
standard_errors_
¶ A Series with the standard errors of the parameters
Type: obj: Series
-
summary
¶ A DataFrame containing information about the fitted parameters
Type: obj: DataFrame
-
conditional_expected_average_profit
(frequency=None, monetary_value=None)¶ Conditional expectation of the average profit.
This method computes the conditional expectation of the average profit per transaction for a group of one or more customers.
Equation (5) from: http://www.brucehardie.com/notes/025/
Parameters: - frequency (array_like, optional) – a vector containing the customers’ frequencies. Defaults to the whole set of frequencies used for fitting the model.
- monetary_value (array_like, optional) – a vector containing the customers’ monetary values. Defaults to the whole set of monetary values used for fitting the model.
Returns: array_like – The conditional expectation of the average profit per transaction
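As a rough illustration of what this conditional expectation does, the sketch below uses the weighted-average form usually given for the Gamma-Gamma model: a blend of the population mean and the customer's own observed mean, with more weight on the observed mean as frequency grows. The parameter names p, q, v and all numeric values are assumptions for illustration, not fitted results, and the function is a standalone sketch rather than the library's code.

```python
def conditional_expected_average_profit(p, q, v, frequency, monetary_value):
    """Blend the population mean with the customer's observed mean spend."""
    population_mean = v * p / (q - 1)
    # Weight on the customer's own history; 0 when frequency is 0.
    individual_weight = p * frequency / (p * frequency + q - 1)
    return (1 - individual_weight) * population_mean + individual_weight * monetary_value


# Hypothetical parameter values (not fitted to any data).
p, q, v = 6.0, 4.0, 15.0

# No repeat purchases: the estimate falls back to the population mean.
new_customer = conditional_expected_average_profit(p, q, v, frequency=0, monetary_value=0.0)

# Many purchases averaging 50: the estimate moves most of the way toward 50.
loyal_customer = conditional_expected_average_profit(p, q, v, frequency=10, monetary_value=50.0)

# With q < 1 the population mean v * p / (q - 1) becomes negative.
negative_population_mean = v * p / (0.5 - 1)
```

The last line shows why a constraint on q can matter: a value of q below 1 makes the population mean negative, which then propagates into the estimates.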
-
customer_lifetime_value
(transaction_prediction_model, frequency, recency, T, monetary_value, time=12, discount_rate=0.01, freq='D')¶ Return customer lifetime value.
This method computes the average lifetime value for a group of one or more customers.
Parameters: - transaction_prediction_model (model) – the model to predict future transactions. The literature uses Pareto/NBD models, but a different model, such as a beta-geometric model, can also be used.
- frequency (array_like) – the frequency vector of customers’ purchases (denoted x in literature).
- recency (array_like) – the recency vector of customers’ purchases (denoted t_x in literature).
- T (array_like) – customers’ age (time units since first purchase)
- monetary_value (array_like) – the monetary value vector of customer’s purchases (denoted m in literature).
- time (float, optional) – the lifetime expected for the user in months. Default: 12
- discount_rate (float, optional) – the monthly adjusted discount rate. Default: 0.01
- freq (string, optional) – {“D”, “H”, “M”, “W”} for day, hour, month, week. This represents the unit of time your T is measured in.
Returns: Series – Series object with customer ids as index and the estimated customer lifetime values as values
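The role of time, discount_rate and freq can be sketched as a discounted cash flow calculation: expected purchases in each future month are valued at the average transaction value and discounted by the monthly rate. This is a minimal sketch under stated assumptions, not the library's implementation. The cumulative_purchases callable, the periods_per_month conversion (30 days per month for daily data) and the numbers are all hypothetical.

```python
def discounted_clv(cumulative_purchases, average_value, time=12,
                   discount_rate=0.01, periods_per_month=30):
    """Sum monthly expected revenue, discounted back to the present.

    cumulative_purchases(t) is a hypothetical callable returning the expected
    number of purchases up to time t, with t in the same unit as T.
    time is assumed to be a whole number of months in this sketch.
    """
    clv = 0.0
    for month in range(1, time + 1):
        end = month * periods_per_month
        start = end - periods_per_month
        # Expected purchases that fall inside this month only.
        expected_purchases = cumulative_purchases(end) - cumulative_purchases(start)
        clv += average_value * expected_purchases / (1 + discount_rate) ** month
    return clv


# Hypothetical customer: 0.1 expected purchases per day, 20 per transaction.
clv = discounted_clv(lambda t: 0.1 * t, average_value=20.0)

# Without discounting, the same customer is worth 12 months * 3 purchases * 20.
undiscounted = discounted_clv(lambda t: 0.1 * t, average_value=20.0, discount_rate=0.0)
```

In practice the expected purchase counts would come from the transaction prediction model and the average value from the Gamma-Gamma model.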
-
fit
(frequency, monetary_value, weights=None, initial_params=None, verbose=False, tol=1e-07, index=None, q_constraint=False, **kwargs)¶ Fit the data to the Gamma/Gamma model.
Parameters: - frequency (array_like) – the frequency vector of customers’ purchases (denoted x in literature).
- monetary_value (array_like) – the monetary value vector of customer’s purchases (denoted m in literature).
- weights (None or array_like) – Number of customers with given frequency/monetary_value, defaults to 1 if not specified. Fader and Hardie condense the individual RFM matrix into all observed combinations of frequency/monetary_value. This parameter represents the count of customers with a given purchase pattern. Instead of calculating individual loglikelihood, the loglikelihood is calculated for each pattern and multiplied by the number of customers with that pattern.
- initial_params (array_like, optional) – set the initial parameters for the fitter.
- verbose (bool, optional) – set to true to print out convergence diagnostics.
- tol (float, optional) – tolerance for termination of the function minimization process.
- index (array_like, optional) – index for the resulting DataFrame, which is accessible via self.data
- q_constraint (bool, optional) – when q < 1, population mean will result in a negative value leading to negative CLV outputs. If True, we penalize negative values of q to avoid this issue.
- kwargs – key word arguments to pass to the scipy.optimize.minimize function as options dict
Returns: GammaGammaFitter – fitted and with parameters estimated
-
-
class
lifetimes.
ModifiedBetaGeoFitter
(penalizer_coef=0.0)¶ Bases:
lifetimes.fitters.beta_geo_fitter.BetaGeoFitter
Also known as the MBG/NBD model.
Based on [5], [6], this model has the following assumptions:
1) Each individual, i, has a hidden lambda_i and p_i parameter.
2) These come from a population-wide Gamma and Beta distribution, respectively.
3) Individuals’ purchases follow a Poisson process with rate \(\lambda_i*t\).
4) At the beginning of their lifetime and after each purchase, an individual has a p_i probability of dying (never buying again).
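The assumptions above can be turned into a small simulation of a single customer. This is a minimal sketch using only the standard library. The parameter names r, alpha, a, b, the choice of Gamma with shape r and rate alpha, and the numeric values are assumptions for illustration.

```python
import random


def simulate_customer(T, r, alpha, a, b, rng):
    """Simulate one customer's repeat purchases over an observation window T."""
    lam = rng.gammavariate(r, 1.0 / alpha)  # purchase rate: shape r, scale 1/alpha
    p = rng.betavariate(a, b)               # dropout probability
    purchase_times = []
    # The customer may drop out at the very beginning of their lifetime.
    alive = rng.random() >= p
    t = 0.0
    while alive:
        t += rng.expovariate(lam)           # exponential gaps give a Poisson process
        if t > T:
            break                           # still alive when observation ends
        purchase_times.append(t)
        alive = rng.random() >= p           # may drop out after each purchase
    frequency = len(purchase_times)
    recency = purchase_times[-1] if purchase_times else 0.0
    return frequency, recency, alive


rng = random.Random(42)
sample = [simulate_customer(30.0, 0.5, 5.0, 1.0, 2.5, rng) for _ in range(1000)]
```

The check at the start of the lifetime is what allows a customer to drop out before making any repeat purchase.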
References
[5] Batislam, E.P., M. Denizel, A. Filiztekin (2007), “Empirical validation and comparison of models for customer base analysis,” International Journal of Research in Marketing, 24 (3), 201-209.
[6] Wagner, U. and Hoppe D. (2008), “Erratum on the MBG/NBD Model,” International Journal of Research in Marketing, 25 (3), 225-226.
-
penalizer_coef
¶ The coefficient applied to an l2 norm on the parameters
Type: float
-
params_
¶ The fitted parameters of the model
Type: obj: Series
-
data
¶ A DataFrame with the values given in the call to fit
Type: obj: DataFrame
-
variance_matrix_
¶ A DataFrame with the variance matrix of the parameters.
Type: obj: DataFrame
-
confidence_intervals_
¶ A DataFrame with the 95% confidence intervals of the parameters
Type: obj: DataFrame
-
standard_errors_
¶ A Series with the standard errors of the parameters
Type: obj: Series
-
summary
¶ A DataFrame containing information about the fitted parameters
Type: obj: DataFrame
-
conditional_expected_number_of_purchases_up_to_time
(t, frequency, recency, T)¶ Conditional expected number of repeat purchases up to time t.
Calculate the expected number of repeat purchases up to time t for a randomly chosen individual from the population, given they have purchase history (frequency, recency, T). See Wagner, U. and Hoppe D. (2008).
Parameters: - t (array_like) – times to calculate the expectation for.
- frequency (array_like) – historical frequency of customer.
- recency (array_like) – historical recency of customer.
- T (array_like) – age of the customer.
Returns: array_like
-
conditional_probability_alive
(frequency, recency, T)¶ Conditional probability alive.
Compute the probability that a customer with history (frequency, recency, T) is currently alive. From https://www.researchgate.net/publication/247219660_Empirical_validation_and_comparison_of_models_for_customer_base_analysis Appendix A, eq. (5)
Parameters: - frequency (array or float) – historical frequency of customer.
- recency (array or float) – historical recency of customer.
- T (array or float) – age of the customer.
Returns: array – value representing probability of being alive
-
expected_number_of_purchases_up_to_time
(t)¶ Return expected number of repeat purchases up to time t.
Calculate the expected number of repeat purchases up to time t for a randomly chosen individual from the population.
Parameters: t (array_like) – times to calculate the expectation for
Returns: array_like
-
fit
(frequency, recency, T, weights=None, initial_params=None, verbose=False, tol=1e-07, index=None, **kwargs)¶ Fit the data to the MBG/NBD model.
Parameters: - frequency (array_like) – the frequency vector of customers’ purchases (denoted x in literature).
- recency (array_like) – the recency vector of customers’ purchases (denoted t_x in literature).
- T (array_like) – customers’ age (time units since first purchase)
- weights (None or array_like) – Number of customers with given frequency/recency/T, defaults to 1 if not specified. Fader and Hardie condense the individual RFM matrix into all observed combinations of frequency/recency/T. This parameter represents the count of customers with a given purchase pattern. Instead of calculating individual log-likelihood, the log-likelihood is calculated for each pattern and multiplied by the number of customers with that pattern.
- verbose (bool, optional) – set to true to print out convergence diagnostics.
- tol (float, optional) – tolerance for termination of the function minimization process.
- index (array_like, optional) – index for the resulting DataFrame, which is accessible via self.data
- kwargs – key word arguments to pass to the scipy.optimize.minimize function as options dict
Returns: ModifiedBetaGeoFitter – With additional properties and methods like params_ and predict
-
probability_of_n_purchases_up_to_time
(t, n)¶ Compute the probability of n purchases up to time t.
\[P( N(t) = n | \text{model} )\]
where N(t) is the number of repeat purchases a customer makes in t units of time.
Parameters: - t (float) – number of units of time
- n (int) – number of purchases
Returns: float – Probability to have n purchases up to t units of time
-
class
lifetimes.
BetaGeoBetaBinomFitter
(penalizer_coef=0.0)¶ Bases:
lifetimes.fitters.BaseFitter
Also known as the Beta-Geometric/Beta-Binomial Model [1].
Future purchase opportunities are treated as discrete points in time. In the literature, the model provides a better fit than the Pareto/NBD model for a nonprofit organization with regular giving patterns.
The model is estimated with a recency-frequency matrix with n transaction opportunities.
Parameters: penalizer_coef (float) – The coefficient applied to an l2 norm on the parameters -
penalizer_coef
¶ The coefficient applied to an l2 norm on the parameters
Type: float
-
params_
¶ The fitted parameters of the model
Type: obj: Series
-
data
¶ A DataFrame with the values given in the call to fit
Type: obj: DataFrame
-
variance_matrix_
¶ A DataFrame with the variance matrix of the parameters.
Type: obj: DataFrame
-
confidence_intervals_
¶ A DataFrame with the 95% confidence intervals of the parameters
Type: obj: DataFrame
-
standard_errors_
¶ A Series with the standard errors of the parameters
Type: obj: Series
-
summary
¶ A DataFrame containing information about the fitted parameters
Type: obj: DataFrame
References
[1] Fader, Peter S., Bruce G.S. Hardie, and Jen Shang (2010), “Customer-Base Analysis in a Discrete-Time Noncontractual Setting,” Marketing Science, 29 (6), 1086-1108. -
conditional_expected_number_of_purchases_up_to_time
(m_periods_in_future, frequency, recency, n_periods)¶ Conditional expected purchases in future time period.
The expected number of future transactions across the next m_periods_in_future transaction opportunities by a customer with purchase history (x, tx, n).
\[E(X(n_{\text{periods}}, n_{\text{periods}} + m_{\text{periods\_in\_future}}) \mid \alpha, \beta, \gamma, \delta, \text{frequency}, \text{recency}, n_{\text{periods}})\]
See (13) in Fader & Hardie 2010.
Parameters: t (array_like) – time n_periods (n+t)
Returns: array_like – predicted transactions
-
conditional_probability_alive
(m_periods_in_future, frequency, recency, n_periods)¶ Conditional probability alive.
Conditional probability customer is alive at transaction opportunity n_periods + m_periods_in_future.
\[P(\text{alive at } n_{\text{periods}} + m_{\text{periods\_in\_future}} \mid \alpha, \beta, \gamma, \delta, \text{frequency}, \text{recency}, n_{\text{periods}})\]
See (A10) in Fader and Hardie 2010.
Parameters: m (array_like) – transaction opportunities
Returns: array_like – alive probabilities
-
expected_number_of_transactions_in_first_n_periods
(n)¶ Return expected number of transactions in the first n periods.
Expected number of transactions occurring across first n transaction opportunities. Used by Fader and Hardie to assess in-sample fit.
\[Pr(X(n) = x| \alpha, \beta, \gamma, \delta)\]
See (7) in Fader & Hardie 2010.
Parameters: n (float) – number of transaction opportunities
Returns: DataFrame – Predicted values, indexed by x
-
fit
(frequency, recency, n_periods, weights=None, initial_params=None, verbose=False, tol=1e-07, index=None, **kwargs)¶ Fit the BG/BB model.
Parameters: - frequency (array_like) – Total periods with observed transactions
- recency (array_like) – Period of most recent transaction
- n_periods (array_like) – Number of transaction opportunities. Previously called n.
- weights (None or array_like) – Number of customers with given frequency/recency/T, defaults to 1 if not specified. Fader and Hardie condense the individual RFM matrix into all observed combinations of frequency/recency/T. This parameter represents the count of customers with a given purchase pattern. Instead of calculating individual log-likelihood, the log-likelihood is calculated for each pattern and multiplied by the number of customers with that pattern. Previously called n_custs.
- verbose (boolean, optional) – Set to true to print out convergence diagnostics.
- tol (float, optional) – Tolerance for termination of the function minimization process.
- index (array_like, optional) – Index for the resulting DataFrame, which is accessible via self.data
- kwargs – Key word arguments to pass to the scipy.optimize.minimize function as options dict
Returns: BetaGeoBetaBinomFitter – fitted and with parameters estimated
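The inputs described above can be derived from per-customer purchase histories recorded as one 0/1 flag per transaction opportunity. The sketch below is a minimal illustration with made-up histories. The summarize_history helper is hypothetical and not part of the library, and counting recency as the 1-based position of the last purchase (0 if none) is an assumption of this sketch.

```python
from collections import Counter


def summarize_history(history):
    """Turn a list of 0/1 flags into (frequency, recency, n_periods)."""
    n_periods = len(history)                 # number of transaction opportunities
    frequency = sum(history)                 # periods with an observed transaction
    # Position (1-based) of the most recent transaction, 0 if there was none.
    recency = max((i for i, bought in enumerate(history, start=1) if bought), default=0)
    return frequency, recency, n_periods


# Hypothetical histories over six transaction opportunities.
histories = [
    [1, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1],
]

# Count how many customers share each pattern; the counts serve as weights.
pattern_counts = Counter(summarize_history(h) for h in histories)
frequency = [pattern[0] for pattern in pattern_counts]
recency = [pattern[1] for pattern in pattern_counts]
n_periods = [pattern[2] for pattern in pattern_counts]
weights = list(pattern_counts.values())
```

The four lists line up row by row and have the shape of the frequency, recency, n_periods and weights arguments of fit.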
-