Uplift Modeling¶
Overview¶
For each step, read the explanation, then run the code cell(s) right below it.
You will practice how to:
- Load and prepare data for modeling
- Conduct a manual hypothesis test
- Utilize built-in statistical libraries to conduct a hypothesis test
- Tune and train a RandomForestClassifier for uplift modeling
- Inspect top and bottom uplift values for the model
- Plot ranked uplift values for all customers
Import libraries¶
import os
import sys
from pathlib import Path
import pandas as pd
import numpy as np
import math
import scipy.stats as stats
from statsmodels.stats.proportion import test_proportions_2indep
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt
SEED = 1
# Import local libraries
root_dir = Path.cwd().resolve().parents[0]
sys.path.append(str(root_dir))
# Visualization functions
from src.visuals.make_plots import *
# Helper functions
from src.utils.helpers import *
# Load the "autoreload" extension so that code can change
%load_ext autoreload
#%reload_ext autoreload
# Always reload modules so that as you change code in src, it gets loaded
%autoreload 2
Hair Care Promotion Example¶
Create a dataframe for the Hair-Care-Product.csv data
In the next cell, we
- load the dataset,
- clean up the column names by converting them to lowercase and replacing periods, underscores, and spaces with _
hair_df = pd.read_csv(os.path.join('..', 'data', 'Hair-Care-Product.csv'))
hair_df.columns = [c.lower() for c in hair_df.columns]
hair_df.columns = hair_df.columns.str.replace(r'[._\s]+', '_', regex=True)
hair_df.head()
| | purchase | age | hair_color | u_s_region | validation | promotion_ord | gender_ord | residence_ord |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 25 | Black | Southwest | 1 | 1 | 0 | 1 |
| 1 | 0 | 30 | Black | Northwest | 1 | 0 | 0 | 1 |
| 2 | 0 | 45 | Red | Northeast | 1 | 0 | 0 | 0 |
| 3 | 0 | 35 | Blond | Southwest | 0 | 0 | 0 | 1 |
| 4 | 0 | 33 | Brown | Southwest | 0 | 1 | 0 | 1 |
Run descriptive statistics
The promotion_ord column has a 1 for members who received a promotion, and the purchase column has a 1 for members who purchased a product after the promotion. Run descriptive statistics to identify any key observations from the full dataset.
hair_df.describe()
| | purchase | age | validation | promotion_ord | gender_ord | residence_ord |
|---|---|---|---|---|---|---|
| count | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 |
| mean | 0.011200 | 40.362800 | 0.324900 | 0.497600 | 0.315200 | 0.705400 |
| std | 0.105241 | 11.555403 | 0.468361 | 0.500019 | 0.464619 | 0.455886 |
| min | 0.000000 | 21.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 0.000000 | 30.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 50% | 0.000000 | 40.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| 75% | 0.000000 | 50.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| max | 1.000000 | 60.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
Calculate purchase rates for both groups
Compute the purchase rates for the control and treatment groups. These values form the basis for measuring the impact of the promotion.
# Split into control and treatment groups
control = hair_df[hair_df['promotion_ord'] == 0]['purchase']
treatment = hair_df[hair_df['promotion_ord'] == 1]['purchase']
n_A = control.shape[0]
n_B = treatment.shape[0]
purchase_A = control.sum()
purchase_B = treatment.sum()
# Calculate purchase rates
p_A = control.mean()
p_B = treatment.mean()
print(f"Control Purchase Rate: {p_A:.4f}")
print(f"Treatment Purchase Rate: {p_B:.4f}")
Control Purchase Rate: 0.0064 Treatment Purchase Rate: 0.0161
Conduct hypothesis test
Calculate the absolute difference and % lift, then conduct the hypothesis test that the population purchase rate for the treatment group is greater than that for the control group. This is a one-tailed upper test: we calculate the pooled estimator and z test statistic, and use the normal distribution to calculate the p-value.
# Absolute difference
diff = p_B - p_A
# Lift
lift = diff / p_A
# Pooled proportion under H0: p_treat = p_ctrl
p_pool = (purchase_B + purchase_A) / (n_B + n_A)
# Standard error using pooled proportion
se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_B + 1 / n_A))
# z statistic
z_stat = diff / se_pool
# Two-sided p-value
#p_value = 2 * (1 - stats.norm.cdf(abs(z_stat)))
# One-sided upper p-value
p_value = 1 - stats.norm.cdf(z_stat)
print(f"Absolute Difference: {diff:.4f}")
print(f"% Lift: {lift:.2%}")
print(f"Pooled estimator: {p_pool:.4f}")
print(f"z test statistic: {z_stat:.2f}")
print(f"p-value: {p_value:.6f}")
Absolute Difference: 0.0097 % Lift: 152.41% Pooled estimator: 0.0112 z test statistic: 4.61 p-value: 0.000002
Alternative hypothesis testing using statsmodels
Next we will use test_proportions_2indep in the statsmodels.stats.proportion library. This is the more appropriate choice for A/B testing with a binary response because it is designed for comparing two proportions.
result = test_proportions_2indep(count1=purchase_B, nobs1=n_B, count2=purchase_A, nobs2=n_A,
method='wald', compare='diff', alternative='larger')
print(f"Test-statistic: {result.statistic:.2f}")
print(f"p-value: {result.pvalue:.6f}")
Test-statistic: 4.61 p-value: 0.000002
Another option using Welch's t-test
ttest_ind can give a similar result when the outcome is coded 0/1, since the group mean equals the group proportion, but it is fundamentally a test of means rather than a proportion-specific test.
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False, alternative='greater')
print(f"t test statistic: {t_stat:.2f}")
print(f"p-value: {p_value:.6f}")
t test statistic: 4.61 p-value: 0.000002
Demonstrate sample size sensitivity in A/B Testing
First, we will simulate the A/B test with a smaller sample size of 500 using the z-test statistic and normal approximation.
n = 500
purchase_A_new = int(round(n * p_A))
purchase_B_new = int(round(n * p_B))
# Pooled proportion under H0: p_treat = p_ctrl
p_pool_new = (purchase_B_new + purchase_A_new) / (n * 2)
# Standard error using pooled proportion
se_pool_new = math.sqrt(p_pool_new * (1 - p_pool_new) * (1 / n + 1 / n))
# z statistic
z_stat_new = diff / se_pool_new
p_value_new = 1 - stats.norm.cdf(z_stat_new)
print(f"z test statistic: {z_stat_new:.2f}")
print(f"p-value: {p_value_new:.6f}")
z test statistic: 1.47 p-value: 0.070563
With a sample size of 500 and the same purchase rates, the conclusion would now be to NOT reject the null hypothesis.
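This sensitivity can be quantified with a standard normal-approximation sample-size calculation. The sketch below is an illustrative aside, not part of the notebook's pipeline; the quantiles 1.6449 and 0.8416 are the standard normal values for a one-sided alpha of 0.05 and 80% power, hard-coded to keep the example dependency-free.

```python
import math

# Observed purchase rates from the full dataset
p_A, p_B = 0.0064, 0.0161

# Standard normal quantiles: one-sided alpha = 0.05, power = 0.80
z_alpha, z_beta = 1.6449, 0.8416

p_bar = (p_A + p_B) / 2
numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
             + z_beta * math.sqrt(p_A * (1 - p_A) + p_B * (1 - p_B))) ** 2
n_per_group = math.ceil(numerator / (p_B - p_A) ** 2)
print(n_per_group)  # well over the 500 used above
```

With rates this small, roughly 1,500 members per group are needed to detect the observed difference with 80% power, which is why n = 500 fails to reach significance.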
Rerun test_proportions_2indep with the smaller sample size
In this step, we repeat the A/B test using a smaller sample size of 500 and apply test_proportions_2indep with method='wald'.
Although both approaches use a z-style test statistic, they are not using the same standard error. Our manual calculation uses the pooled proportion under the null hypothesis, while test_proportions_2indep(..., method='wald') uses an unpooled variance estimate based on the observed purchase rates in each group.
result = test_proportions_2indep(count1=purchase_B_new, nobs1=n, count2=purchase_A_new, nobs2=n,
method='wald', compare='diff', alternative='larger')
print(f"Test-statistic: {result.statistic:.2f}")
print(f"p-value: {result.pvalue:.6f}")
Test-statistic: 1.52 p-value: 0.064550
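The two standard errors behind these statistics can be compared directly. This sketch recomputes both for the n = 500 scenario; the purchase counts of 3 and 8 follow from rounding the full-data rates to whole purchases.

```python
import math

n = 500
count_A, count_B = 3, 8          # round(500 * 0.0064), round(500 * 0.0161)
p_A, p_B = count_A / n, count_B / n

# Pooled SE: uses the combined rate under H0 (our manual z-test)
p_pool = (count_A + count_B) / (2 * n)
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (2 / n))

# Unpooled (Wald) SE: uses each group's own observed rate (method='wald')
se_wald = math.sqrt(p_A * (1 - p_A) / n + p_B * (1 - p_B) / n)

print(f"pooled SE:   {se_pooled:.6f}")
print(f"unpooled SE: {se_wald:.6f}")
```

With so few purchases per group, both normal approximations are rough; the point is only that the two tests plug different variance estimates into the same z-style statistic, which is why the statistics (1.47 vs 1.52) and p-values differ slightly.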
Prepare data for uplift modeling
For simple illustration purposes, we will not use a pipeline for dummy variable preprocessing.
hair_df['hair_color'] = hair_df['hair_color'].astype('category')
hair_df['u_s_region'] = hair_df['u_s_region'].astype('category')
hair_df2 = pd.get_dummies(hair_df)
hair_df2.head()
| | purchase | age | validation | promotion_ord | gender_ord | residence_ord | hair_color_Black | hair_color_Blond | hair_color_Brown | hair_color_Red | u_s_region_Northeast | u_s_region_Northwest | u_s_region_Southeast | u_s_region_Southwest |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 25 | 1 | 1 | 0 | 1 | True | False | False | False | False | False | False | True |
| 1 | 0 | 30 | 1 | 0 | 0 | 1 | True | False | False | False | False | True | False | False |
| 2 | 0 | 45 | 1 | 0 | 0 | 0 | False | False | False | True | True | False | False | False |
| 3 | 0 | 35 | 0 | 0 | 0 | 1 | False | True | False | False | False | False | False | True |
| 4 | 0 | 33 | 0 | 1 | 0 | 1 | False | False | True | False | False | False | False | True |
Split the data into training and test sets
Next, we separate the predictors from the target variable purchase, then create a training set (70%) and a test set (30%).
y = hair_df2['purchase']
X = hair_df2.drop(columns=['purchase'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=SEED)
print('Training set:', X_train.shape)
print('Test set:', X_test.shape)
Training set: (7000, 13) Test set: (3000, 13)
Tune a RandomForestClassifier
Next, we will build and perform some basic tuning of a RandomForestClassifier model. We also evaluate the class_weight parameter due to the imbalanced nature of the purchase target variable.
# Base model
rf = RandomForestClassifier(random_state=SEED)
# Parameter grid
param_grid = {
"n_estimators": [50, 100, 200],
"max_depth": [None, 5, 10, 20],
"min_samples_leaf": [1, 2, 4],
"class_weight": [None, "balanced"]
}
# Grid search
rf_grid = GridSearchCV(
estimator=rf,
param_grid=param_grid,
scoring="recall",
cv=5,
n_jobs=-1,
verbose=1
)
# Fit on training data
rf_grid.fit(X_train, y_train)
# Best results
print("Best Parameters:", rf_grid.best_params_)
print("Best CV Recall:", rf_grid.best_score_)
# Best tuned model
best_rf = rf_grid.best_estimator_
Fitting 5 folds for each of 72 candidates, totalling 360 fits
Best Parameters: {'class_weight': 'balanced', 'max_depth': 5, 'min_samples_leaf': 1, 'n_estimators': 50}
Best CV Recall: 0.3304761904761905
Display all evaluation metrics
Display all of the evaluation metrics for the tuned model.
rf_pred = best_rf.predict(X_test)
rf_metrics = evaluate_model(y_test, rf_pred, beta=2, model_name='Random Forest Model')
rf_metrics
| | Accuracy | Precision | Recall | F1 | F2 |
|---|---|---|---|---|---|
| Random Forest Model | 0.789667 | 0.025682 | 0.4 | 0.048265 | 0.102171 |
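evaluate_model is a local helper in src.utils.helpers; its F2 column is the F-beta score with beta=2, which weights recall four times as heavily as precision. A minimal stand-alone sketch of that calculation from confusion-matrix counts (the counts below are hypothetical, chosen only to produce metrics in the same ballpark as the table above):

```python
def fbeta(tp, fp, fn, beta=2.0):
    """F-beta from confusion-matrix counts; beta > 1 favors recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical counts: 12 true positives, 455 false positives, 18 false negatives
print(fbeta(12, 455, 18, beta=2))   # F2: pulled up by the moderate recall
print(fbeta(12, 455, 18, beta=1))   # plain F1 for comparison: dragged down by tiny precision
```

This is why F2 is the more informative summary here: with a rare positive class, we care more about catching likely purchasers (recall) than about the false positives that come with a broad promotion.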
Implement uplift modeling
Conduct uplift modeling by making predictions on the test data, simulating promotion values of 1 and 0 for every observation. Display the top 5 values by uplift in descending order.
uplift_test = X_test.copy() # Need to create a copy to allow modifying data
uplift_test['promotion_ord'] = 1
treatment_pred = best_rf.predict_proba(uplift_test)
uplift_test['promotion_ord'] = 0
control_pred = best_rf.predict_proba(uplift_test)
upliftResult_rf = pd.DataFrame({
'prob_B': treatment_pred[:,1],
'prob_A': control_pred[:,1],
'uplift': treatment_pred[:,1] - control_pred[:,1],
}, index=uplift_test.index)
upliftResult_rf = upliftResult_rf.sort_values(by=['uplift'], ascending=False).reset_index()
upliftResult_rf.head()
| | index | prob_B | prob_A | uplift |
|---|---|---|---|---|
| 0 | 1929 | 0.673710 | 0.294312 | 0.379398 |
| 1 | 1685 | 0.673710 | 0.294312 | 0.379398 |
| 2 | 4093 | 0.652509 | 0.289873 | 0.362636 |
| 3 | 7714 | 0.652509 | 0.289873 | 0.362636 |
| 4 | 7209 | 0.652509 | 0.289873 | 0.362636 |
Display bottom values
Now display the bottom 5 values by uplift.
upliftResult_rf.tail()
| | index | prob_B | prob_A | uplift |
|---|---|---|---|---|
| 2995 | 4298 | 0.214958 | 0.302001 | -0.087043 |
| 2996 | 9034 | 0.284157 | 0.375622 | -0.091465 |
| 2997 | 4995 | 0.376775 | 0.480979 | -0.104204 |
| 2998 | 5064 | 0.357044 | 0.463360 | -0.106316 |
| 2999 | 5985 | 0.345672 | 0.463360 | -0.117688 |
Plot ranked uplift
Plot the ranked uplift values with a cutoff of 0.10 to identify how many customers should be targeted for the promotion, i.e. those with at least a 0.10 increase in predicted purchase probability.
fig, ax = plt.subplots(figsize=(11,6))
plot_ranked_uplift(upliftResult_rf, 'uplift', 0.10, ax)
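plot_ranked_uplift is another local helper from src.visuals.make_plots. Setting the plot aside, the cutoff logic it visualizes amounts to sorting the uplift scores and counting how many fall at or above the threshold. A minimal sketch with made-up scores:

```python
# Hypothetical uplift scores for a handful of customers
uplift_scores = [0.38, 0.36, 0.14, 0.11, 0.09, 0.02, -0.05, -0.12]

cutoff = 0.10
ranked = sorted(uplift_scores, reverse=True)

# Customers at or above the cutoff are the ones worth targeting
n_target = sum(score >= cutoff for score in ranked)
print(f"Target the top {n_target} of {len(ranked)} customers")  # top 4 of 8
```

Applied to upliftResult_rf['uplift'], the same count tells the business exactly how many test-set customers clear the 0.10 bar.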