Let’s frame this in a churn prediction context. It is a very common case where IV, PSI, and CSI are all in use, and where people notice that the formulas look similar but get confused about why the three are treated differently.
1️⃣ The setting — churn prediction
- Target: churn_flag (1 = churned, 0 = stayed).
- Feature: avg_monthly_usage (average minutes per month).
- Goal: Build a model that predicts churn, and also monitor whether the feature is stable over time.
We have:
- Train set → Customers from Jan–Mar 2025
- OOT1 (out-of-time sample 1) → Customers from Apr 2025
- OOT2 (out-of-time sample 2) → Customers from May 2025
2️⃣ The same base formula — different contexts
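The mathematical core of IV, PSI, and CSI is a weighted log ratio, summed over the bins of the feature:

$$\sum_{i} (p_i - q_i) \, \ln\frac{p_i}{q_i}$$

Here $p_i$ and $q_i$ are the two fractions that fall into bin $i$. The difference is what those “fractions” mean and which datasets are compared.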
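To make the shared core concrete, here is a minimal Python sketch of the weighted log ratio. The function name `weighted_log_ratio` and the `eps` floor for empty bins are assumptions for illustration, not part of any particular library.

```python
import math

def weighted_log_ratio(p, q, eps=1e-6):
    """Sum of (p_i - q_i) * ln(p_i / q_i) over all bins."""
    if len(p) != len(q):
        raise ValueError("p and q must have the same number of bins")
    total = 0.0
    for p_i, q_i in zip(p, q):
        # Floor empty bins so the logarithm stays defined
        # (an assumption of this sketch; other conventions exist).
        p_i = max(p_i, eps)
        q_i = max(q_i, eps)
        total += (p_i - q_i) * math.log(p_i / q_i)
    return total

# Identical distributions give zero; the further apart, the larger the value.
print(weighted_log_ratio([0.5, 0.5], [0.5, 0.5]))
print(weighted_log_ratio([0.3, 0.7], [0.5, 0.5]))
```

Note that swapping the two inputs gives the same result, so the measure does not care which side is called "expected" and which "actual". IV, PSI, and CSI only differ in which fractions get passed in.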
3️⃣ Information Value (IV)
- Question: Does this feature separate churners from non-churners within a single dataset?
- Fractions:
  - $p_i$ = share of all stayers that fall into bin $i$ (within the train set)
  - $q_i$ = share of all churners that fall into bin $i$ (within the train set)
- Data involved: Only one dataset (e.g., Train).
- Use: Feature selection. Keep features whose IV clears a minimum threshold (a common rule of thumb treats IV below 0.02 as not predictive and IV above roughly 0.3 as strong).
- Example (churn rate within each usage bin):
  Train:
  Low usage: 80% churn, 20% stay
  High usage: 10% churn, 90% stay
This produces a high IV → strong predictive power.
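Here is a rough sketch of the IV calculation for that example. The churn rates come from the example above, but the bin sizes (1,000 customers per bin) are an assumption made purely so the counts can be written down; `information_value` is a hypothetical helper name.

```python
import math

# Hypothetical counts: 1,000 customers per usage bin (assumed bin sizes),
# with the churn rates from the example (80% and 10%).
counts = {
    "low_usage": {"churn": 800, "stay": 200},
    "high_usage": {"churn": 100, "stay": 900},
}

def information_value(counts):
    """IV = sum over bins of (%stayers - %churners) * ln(%stayers / %churners)."""
    total_churn = sum(c["churn"] for c in counts.values())
    total_stay = sum(c["stay"] for c in counts.values())
    iv = 0.0
    for c in counts.values():
        # Share of ALL stayers / ALL churners that land in this bin.
        # A bin with zero churners or stayers would need smoothing first.
        p_stay = c["stay"] / total_stay
        p_churn = c["churn"] / total_churn
        iv += (p_stay - p_churn) * math.log(p_stay / p_churn)
    return iv

iv = information_value(counts)
print(f"IV = {iv:.3f}")
```

With these assumed bin sizes the IV lands around 2.5, far above the usual thresholds. Different bin sizes would change the number, but not the conclusion that the two bins separate churners from stayers very sharply.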
4️⃣ Population Stability Index (PSI)
- Question: Has the overall feature distribution shifted over time? (no target involved)
- Fractions:
  - $p_i$ = proportion of customers in bin $i$ in Train (all customers, churned or not)
  - $q_i$ = proportion of customers in bin $i$ in OOT (all customers, churned or not)
- Data involved: Two datasets (e.g., Train vs OOT1).
- Use: Detect population drift, i.e. customers’ usage patterns shifting even if the churn rate doesn’t change.
- Example:
  Train: Low usage: 30% of all customers, High usage: 70%
  OOT1: Low usage: 50% of all customers, High usage: 50%
PSI comes out at about 0.17, a noticeable shift by the common rule of thumb (below 0.1 is usually read as stable, above 0.25 as a major shift) → customer base composition shifted (maybe more low-usage customers now).
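A small sketch of how this could be computed from raw usage values. The cut point of 200 minutes and the ten usage values per sample are made up so that the shares match the 30/70 vs 50/50 example; `bin_shares` and `psi` are hypothetical helper names.

```python
import math
from bisect import bisect_right

def bin_shares(values, edges):
    """Share of values falling into each bin defined by sorted cut points."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]

def psi(expected, actual, eps=1e-6):
    """PSI over bins; eps floors empty bins (an assumption of this sketch)."""
    return sum(
        (max(a, eps) - max(e, eps)) * math.log(max(a, eps) / max(e, eps))
        for e, a in zip(expected, actual)
    )

# Hypothetical cut point: below 200 min/month counts as "low usage".
# The same edges are applied to both samples; the target is never used.
edges = [200]
train_usage = [50, 120, 180, 250, 300, 320, 400, 410, 500, 650]  # 30% low
oot1_usage = [40, 90, 110, 150, 190, 260, 330, 420, 480, 600]    # 50% low

train_shares = bin_shares(train_usage, edges)
oot1_shares = bin_shares(oot1_usage, edges)
print(train_shares, oot1_shares)
print(f"PSI = {psi(train_shares, oot1_shares):.3f}")
```

The important design point is that the bin edges are fixed once (typically on Train) and reused for every later sample; re-binning each sample separately would hide the very shift PSI is meant to detect.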
5️⃣ Characteristic Stability Index (CSI)
- Question: Has the relationship between the feature and the target changed over time? (concept drift)
- Fractions:
  - $p_i$ = proportion of Train churners that fall into bin $i$
  - $q_i$ = proportion of OOT churners that fall into bin $i$
- Data involved: Two datasets (Train vs OOT1), target-specific.
- Use: Detect changes in the target–feature relationship.
- Example:
  Train churners: Low usage: 70% of churners, High usage: 30% of churners
  OOT1 churners: Low usage: 50% of churners, High usage: 50% of churners
CSI comes out at about 0.17 → churn pattern shifted; low usage no longer dominates churn.
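The same idea as a sketch in Python, following the definition used in this post (churners only, Train vs OOT1). The rows below are hypothetical: they are constructed so that churners split 70/30 in Train and 50/50 in OOT1, while the overall population stays at 30% low usage in both samples. `churner_shares` and `stability_index` are illustrative names.

```python
import math
from collections import Counter

def churner_shares(rows, bins):
    """Distribution of churners (churn_flag == 1) across bins."""
    churn_bins = Counter(b for b, churn_flag in rows if churn_flag == 1)
    n = sum(churn_bins.values())
    return [churn_bins[b] / n for b in bins]

def stability_index(p, q, eps=1e-6):
    """Weighted log ratio over bins; eps floors empty bins (sketch assumption)."""
    return sum(
        (max(a, eps) - max(b, eps)) * math.log(max(a, eps) / max(b, eps))
        for a, b in zip(p, q)
    )

bins = ["low", "high"]
# Hypothetical (usage_bin, churn_flag) rows; stayers are present but ignored here.
train = [("low", 1)] * 70 + [("high", 1)] * 30 + [("low", 0)] * 230 + [("high", 0)] * 670
oot1 = [("low", 1)] * 50 + [("high", 1)] * 50 + [("low", 0)] * 250 + [("high", 0)] * 650

p = churner_shares(train, bins)
q = churner_shares(oot1, bins)
csi = stability_index(p, q)
print(p, q, round(csi, 3))
```

Because the overall bin shares are identical in both samples here, PSI on the same data would be zero, so the movement shows up only in the churner-specific comparison.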
6️⃣ Why they differ even if the formula looks the same
The formula structure is the same because all three compare two binned distributions: the sum is a symmetrized KL divergence (the KL divergence taken in both directions and added together).
But the inputs differ:
- IV → compares good vs bad (stayers vs churners) within one dataset.
- PSI → compares overall feature distribution across datasets.
- CSI → compares event-specific feature distribution across datasets.
That’s why in churn:
- A feature can have high IV, low PSI, low CSI → predictive and stable.
- Or high IV, high PSI → predictive, but the customer profile is shifting (risk of model drift).
- Or high IV, high CSI → predictive in train, but the churn relationship is changing (concept drift). This reading is cleanest when PSI is low, because a shift in the overall population also moves the churner distribution.
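To see the three measures side by side, here is a hedged sketch that computes all of them from the same rows. The data is hypothetical and `report`, `shares`, and `divergence` are illustrative helper names; the point is only that one function fed with different fractions yields IV, PSI, and CSI.

```python
import math
from collections import Counter

def shares(labels, bins):
    """Share of labels falling into each bin."""
    counts = Counter(labels)
    return [counts[b] / len(labels) for b in bins]

def divergence(p, q, eps=1e-6):
    """Shared core: sum of (p_i - q_i) * ln(p_i / q_i); eps floors empty bins."""
    return sum(
        (max(a, eps) - max(b, eps)) * math.log(max(a, eps) / max(b, eps))
        for a, b in zip(p, q)
    )

def report(train, oot, bins):
    """train / oot: lists of (usage_bin, churn_flag) tuples."""
    stay = [b for b, y in train if y == 0]
    churn = [b for b, y in train if y == 1]
    oot_churn = [b for b, y in oot if y == 1]
    return {
        # Stayers vs churners, Train only.
        "IV": divergence(shares(stay, bins), shares(churn, bins)),
        # All customers, Train vs OOT.
        "PSI": divergence(shares([b for b, _ in train], bins),
                          shares([b for b, _ in oot], bins)),
        # Churners only, Train vs OOT.
        "CSI": divergence(shares(churn, bins), shares(oot_churn, bins)),
    }

bins = ["low", "high"]
# Hypothetical rows: same overall usage mix in both samples, different churner mix.
train = [("low", 1)] * 70 + [("high", 1)] * 30 + [("low", 0)] * 230 + [("high", 0)] * 670
oot1 = [("low", 1)] * 50 + [("high", 1)] * 50 + [("low", 0)] * 250 + [("high", 0)] * 650

result = report(train, oot1, bins)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

On this made-up data the feature shows a high IV, a PSI of zero, and a clearly non-zero CSI, which corresponds to the third scenario: predictive in train, stable customer mix, but a changing churn pattern.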