Uplift Modeling¶
Overview¶
For each step, read the explanation, then run the code cell(s) right below it.
You will practice how to:
- Load and prepare data for modeling
- Conduct a manual hypothesis test
- Utilize built-in statistical libraries to conduct a hypothesis test
- Tune and train a RandomForestClassifier for uplift modeling
- Inspect top and bottom uplift values for the model
- Plot ranked uplift values for all customers
Import libraries¶
import os
import sys
from pathlib import Path
import pandas as pd
import numpy as np
import math
import scipy.stats as stats
from statsmodels.stats.proportion import test_proportions_2indep
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
import matplotlib.pyplot as plt
SEED = 1
# Import local libraries
root_dir = Path.cwd().resolve().parents[0]
sys.path.append(str(root_dir))
# Visualization functions
from src.visuals.make_plots import *
# Helper functions
from src.utils.helpers import *
# Load the "autoreload" extension so that code can change
%load_ext autoreload
#%reload_ext autoreload
# Always reload modules so that as you change code in src, it gets loaded
%autoreload 2
Hair Care Promotion Example¶
Create a dataframe for the Hair-Care-Product.csv data
In the next cell, we
- load the dataset,
- clean up the column names by converting them to lowercase and replacing periods, underscores, and spaces with _
hair_df = pd.read_csv(os.path.join('..', 'data', 'Hair-Care-Product.csv'))
hair_df.columns = [c.lower() for c in hair_df.columns]
hair_df.columns = hair_df.columns.str.replace(r'[._\s]+', '_', regex=True)
hair_df.head()
| | purchase | age | hair_color | u_s_region | validation | promotion_ord | gender_ord | residence_ord |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 25 | Black | Southwest | 1 | 1 | 0 | 1 |
| 1 | 0 | 30 | Black | Northwest | 1 | 0 | 0 | 1 |
| 2 | 0 | 45 | Red | Northeast | 1 | 0 | 0 | 0 |
| 3 | 0 | 35 | Blond | Southwest | 0 | 0 | 0 | 1 |
| 4 | 0 | 33 | Brown | Southwest | 0 | 1 | 0 | 1 |
Run descriptive statistics
The promotion_ord column has a 1 for members who received a promotion, and the purchase column has a 1 for members who purchased a product after the promotion. Run descriptive statistics to identify any key observations from the full dataset.
hair_df.describe()
| | purchase | age | validation | promotion_ord | gender_ord | residence_ord |
|---|---|---|---|---|---|---|
| count | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 | 10000.000000 |
| mean | 0.011200 | 40.362800 | 0.324900 | 0.497600 | 0.315200 | 0.705400 |
| std | 0.105241 | 11.555403 | 0.468361 | 0.500019 | 0.464619 | 0.455886 |
| min | 0.000000 | 21.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 25% | 0.000000 | 30.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 |
| 50% | 0.000000 | 40.000000 | 0.000000 | 0.000000 | 0.000000 | 1.000000 |
| 75% | 0.000000 | 50.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
| max | 1.000000 | 60.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 |
Calculate purchase rates for both groups
Compute the purchase rates for the control and treatment groups. These values form the basis for measuring the impact of the promotion.
# Split into control and treatment groups
control = hair_df[hair_df['promotion_ord'] == 0]['purchase']
treatment = hair_df[hair_df['promotion_ord'] == 1]['purchase']
n_A = control.shape[0]
n_B = treatment.shape[0]
purchase_A = control.sum()
purchase_B = treatment.sum()
# Calculate purchase rates
p_A = control.mean()
p_B = treatment.mean()
print(f"Control Purchase Rate: {p_A:.4f}")
print(f"Treatment Purchase Rate: {p_B:.4f}")
Control Purchase Rate: 0.0064 Treatment Purchase Rate: 0.0161
Conduct hypothesis test
Calculate the absolute difference and % lift, then conduct the hypothesis test that the population purchase rate for the treatment group is greater than that for the control group. This is a one-tailed upper test: we calculate the pooled estimator and z test statistic, and use the normal distribution to calculate the p-value.
# Absolute difference
diff = p_B - p_A
# Lift
lift = diff / p_A
# Pooled proportion under H0: p_treat = p_ctrl
p_pool = (purchase_B + purchase_A) / (n_B + n_A)
# Standard error using pooled proportion
se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_B + 1 / n_A))
# z statistic
z_stat = diff / se_pool
# Two-sided p-value
#p_value = 2 * (1 - stats.norm.cdf(abs(z_stat)))
# One-sided upper p-value
p_value = 1 - stats.norm.cdf(z_stat)
print(f"Absolute Difference: {diff:.4f}")
print(f"% Lift: {lift:.2%}")
print(f"Pooled estimator: {p_pool:.4f}")
print(f"z test statistic: {z_stat:.2f}")
print(f"p-value: {p_value:.6f}")
Absolute Difference: 0.0097 % Lift: 152.41% Pooled estimator: 0.0112 z test statistic: 4.61 p-value: 0.000002
Alternative hypothesis testing using statsmodels
Next we will use test_proportions_2indep in the statsmodels.stats.proportion library. This is the more appropriate choice for A/B testing with a binary response because it is designed for comparing two proportions.
result = test_proportions_2indep(count1=purchase_B, nobs1=n_B, count2=purchase_A, nobs2=n_A,
method='wald', compare='diff', alternative='larger')
print(f"Test-statistic: {result.statistic:.2f}")
print(f"p-value: {result.pvalue:.6f}")
Test-statistic: 4.61 p-value: 0.000002
Another option using Welch's t-test
ttest_ind can give a similar result when the outcome is coded 0/1, since the group mean equals the group proportion, but it is fundamentally a test of means rather than a proportion-specific test.
t_stat, p_value = stats.ttest_ind(treatment, control, equal_var=False, alternative='greater')
print(f"t test statistic: {t_stat:.2f}")
print(f"p-value: {p_value:.6f}")
t test statistic: 4.61 p-value: 0.000002
Demonstrate sample size sensitivity in A/B Testing
First, we will simulate the A/B test with a smaller sample size of 500 using the z-test statistic and normal approximation.
n = 500
purchase_A_new = int(round(n * p_A))
purchase_B_new = int(round(n * p_B))
# Pooled proportion under H0: p_treat = p_ctrl
p_pool_new = (purchase_B_new + purchase_A_new) / (n * 2)
# Standard error using pooled proportion
se_pool_new = math.sqrt(p_pool_new * (1 - p_pool_new) * (1 / n + 1 / n))
# z statistic
z_stat_new = diff / se_pool_new
p_value_new = 1 - stats.norm.cdf(z_stat_new)
print(f"z test statistic: {z_stat_new:.2f}")
print(f"p-value: {p_value_new:.6f}")
z test statistic: 1.47 p-value: 0.070563
With a sample size of 500 and the same purchase rates, the conclusion would now be to NOT reject the null hypothesis.
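This sensitivity can be quantified with a standard normal-approximation sample-size calculation. The sketch below is an illustrative aside, not part of the notebook's pipeline; the quantiles 1.6449 and 0.8416 are the standard normal values for a one-sided alpha of 0.05 and 80% power, hard-coded to keep the example dependency-free.

```python
import math

# Observed purchase rates from the full dataset
p_A, p_B = 0.0064, 0.0161

# Standard normal quantiles: one-sided alpha = 0.05, power = 0.80
z_alpha, z_beta = 1.6449, 0.8416

p_bar = (p_A + p_B) / 2
numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
             + z_beta * math.sqrt(p_A * (1 - p_A) + p_B * (1 - p_B))) ** 2
n_per_group = math.ceil(numerator / (p_B - p_A) ** 2)
print(n_per_group)  # well over the 500 used above
```

With rates this small, roughly 1,500 members per group are needed to detect the observed difference with 80% power, which is why n = 500 fails to reach significance.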
Rerun test_proportions_2indep with the smaller sample size
In this step, we repeat the A/B test using a smaller sample size of 500 and apply test_proportions_2indep with method='wald'.
Although both approaches use a z-style test statistic, they are not using the same standard error. Our manual calculation uses the pooled proportion under the null hypothesis, while test_proportions_2indep(..., method='wald') uses an unpooled variance estimate based on the observed purchase rates in each group.
result = test_proportions_2indep(count1=purchase_B_new, nobs1=n, count2=purchase_A_new, nobs2=n,
method='wald', compare='diff', alternative='larger')
print(f"Test-statistic: {result.statistic:.2f}")
print(f"p-value: {result.pvalue:.6f}")
Test-statistic: 1.52 p-value: 0.064550
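The two standard errors behind these statistics can be compared directly. This sketch recomputes both for the n = 500 scenario; the purchase counts of 3 and 8 follow from rounding the full-data rates to whole purchases.

```python
import math

n = 500
count_A, count_B = 3, 8          # round(500 * 0.0064), round(500 * 0.0161)
p_A, p_B = count_A / n, count_B / n

# Pooled SE: uses the combined rate under H0 (our manual z-test)
p_pool = (count_A + count_B) / (2 * n)
se_pooled = math.sqrt(p_pool * (1 - p_pool) * (2 / n))

# Unpooled (Wald) SE: uses each group's own observed rate (method='wald')
se_wald = math.sqrt(p_A * (1 - p_A) / n + p_B * (1 - p_B) / n)

print(f"pooled SE:   {se_pooled:.6f}")
print(f"unpooled SE: {se_wald:.6f}")
```

With so few purchases per group, both normal approximations are rough; the point is only that the two tests plug different variance estimates into the same z-style statistic, which is why the statistics (1.47 vs 1.52) and p-values differ slightly.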
Prepare data for uplift modeling
For simple illustration purposes, we will not use a pipeline for dummy variable preprocessing.
hair_df['hair_color'] = hair_df['hair_color'].astype('category')
hair_df['u_s_region'] = hair_df['u_s_region'].astype('category')
hair_df2 = pd.get_dummies(hair_df)
hair_df2.head()
| | purchase | age | validation | promotion_ord | gender_ord | residence_ord | hair_color_Black | hair_color_Blond | hair_color_Brown | hair_color_Red | u_s_region_Northeast | u_s_region_Northwest | u_s_region_Southeast | u_s_region_Southwest |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 25 | 1 | 1 | 0 | 1 | True | False | False | False | False | False | False | True |
| 1 | 0 | 30 | 1 | 0 | 0 | 1 | True | False | False | False | False | True | False | False |
| 2 | 0 | 45 | 1 | 0 | 0 | 0 | False | False | False | True | True | False | False | False |
| 3 | 0 | 35 | 0 | 0 | 0 | 1 | False | True | False | False | False | False | False | True |
| 4 | 0 | 33 | 0 | 1 | 0 | 1 | False | False | True | False | False | False | False | True |
Split the data into training and test sets
Next, we separate the predictors from the target variable purchase, then create a training set (70%) and a test set (30%).
y = hair_df2['purchase']
X = hair_df2.drop(columns=['purchase'])
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=SEED)
print('Training set:', X_train.shape)
print('Test set:', X_test.shape)
Training set: (7000, 13) Test set: (3000, 13)
Tune a RandomForestClassifier
Next, we will build and perform some basic tuning of a RandomForestClassifier model. We also evaluate the class_weight parameter due to the imbalanced nature of the purchase target variable.
# Base model
rf = RandomForestClassifier(random_state=SEED)
# Parameter grid
param_grid = {
"n_estimators": [50, 100, 200],
"max_depth": [None, 5, 10, 20],
"min_samples_leaf": [1, 2, 4],
"class_weight": [None, "balanced"]
}
# Grid search
rf_grid = GridSearchCV(
estimator=rf,
param_grid=param_grid,
scoring="recall",
cv=5,
n_jobs=-1,
verbose=1
)
# Fit on training data
rf_grid.fit(X_train, y_train)
# Best results
print("Best Parameters:", rf_grid.best_params_)
print("Best CV Recall:", rf_grid.best_score_)
# Best tuned model
best_rf = rf_grid.best_estimator_
Fitting 5 folds for each of 72 candidates, totalling 360 fits
Best Parameters: {'class_weight': 'balanced', 'max_depth': 5, 'min_samples_leaf': 1, 'n_estimators': 50}
Best CV Recall: 0.3304761904761905
Display all evaluation metrics
Display all of the evaluation metrics for the tuned model.
rf_pred = best_rf.predict(X_test)
rf_metrics = evaluate_model(y_test, rf_pred, beta=2, model_name='Random Forest Model')
rf_metrics
| | Accuracy | Precision | Recall | F1 | F2 |
|---|---|---|---|---|---|
| Random Forest Model | 0.789667 | 0.025682 | 0.4 | 0.048265 | 0.102171 |
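evaluate_model is a local helper in src.utils.helpers; its F2 column is the F-beta score with beta=2, which weights recall four times as heavily as precision. A minimal stand-alone sketch of that calculation from confusion-matrix counts (the counts below are hypothetical, chosen only to produce metrics in the same ballpark as the table above):

```python
def fbeta(tp, fp, fn, beta=2.0):
    """F-beta from confusion-matrix counts; beta > 1 favors recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical counts: 12 true positives, 455 false positives, 18 false negatives
print(fbeta(12, 455, 18, beta=2))   # F2: pulled up by the moderate recall
print(fbeta(12, 455, 18, beta=1))   # plain F1 for comparison: dragged down by tiny precision
```

This is why F2 is the more informative summary here: with a rare positive class, we care more about catching likely purchasers (recall) than about the false positives that come with a broad promotion.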
Implement uplift modeling
Conduct uplift modeling by making predictions on the test data, simulating promotion values of 1 and 0 for every observation. Display the top 5 values by uplift in descending order.
uplift_test = X_test.copy() # Need to create a copy to allow modifying data
uplift_test['promotion_ord'] = 1
treatment_pred = best_rf.predict_proba(uplift_test)
uplift_test['promotion_ord'] = 0
control_pred = best_rf.predict_proba(uplift_test)
upliftResult_rf = pd.DataFrame({
'prob_B': treatment_pred[:,1],
'prob_A': control_pred[:,1],
'uplift': treatment_pred[:,1] - control_pred[:,1],
}, index=uplift_test.index)
upliftResult_rf = upliftResult_rf.sort_values(by=['uplift'], ascending=False).reset_index()
upliftResult_rf.head()
| | index | prob_B | prob_A | uplift |
|---|---|---|---|---|
| 0 | 1929 | 0.673710 | 0.294312 | 0.379398 |
| 1 | 1685 | 0.673710 | 0.294312 | 0.379398 |
| 2 | 4093 | 0.652509 | 0.289873 | 0.362636 |
| 3 | 7714 | 0.652509 | 0.289873 | 0.362636 |
| 4 | 7209 | 0.652509 | 0.289873 | 0.362636 |
Display bottom values
Now display the bottom 5 values by uplift.
upliftResult_rf.tail()
| | index | prob_B | prob_A | uplift |
|---|---|---|---|---|
| 2995 | 4298 | 0.214958 | 0.302001 | -0.087043 |
| 2996 | 9034 | 0.284157 | 0.375622 | -0.091465 |
| 2997 | 4995 | 0.376775 | 0.480979 | -0.104204 |
| 2998 | 5064 | 0.357044 | 0.463360 | -0.106316 |
| 2999 | 5985 | 0.345672 | 0.463360 | -0.117688 |
Plot ranked uplift
Plot the ranked uplift values with a cutoff of 0.10 to identify how many customers should be targeted for the promotion, i.e. those with at least a 0.10 increase in predicted purchase probability.
fig, ax = plt.subplots(figsize=(11,6))
plot_ranked_uplift(upliftResult_rf, 'uplift', 0.10, ax)
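plot_ranked_uplift is another local helper from src.visuals.make_plots. Setting the plot aside, the cutoff logic it visualizes amounts to sorting the uplift scores and counting how many fall at or above the threshold. A minimal sketch with made-up scores:

```python
# Hypothetical uplift scores for a handful of customers
uplift_scores = [0.38, 0.36, 0.14, 0.11, 0.09, 0.02, -0.05, -0.12]

cutoff = 0.10
ranked = sorted(uplift_scores, reverse=True)

# Customers at or above the cutoff are the ones worth targeting
n_target = sum(score >= cutoff for score in ranked)
print(f"Target the top {n_target} of {len(ranked)} customers")  # top 4 of 8
```

Applied to upliftResult_rf['uplift'], the same count tells the business exactly how many test-set customers clear the 0.10 bar.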