Bigdata and data science by Kartheek Dachepalli: IV, PSI, CSI

Let’s frame this in a churn prediction context. Churn is a very common case where people see IV, PSI, and CSI used side by side, notice that the formulas look similar, and then get confused about why they are treated differently.

1️⃣ The setting — churn prediction

Target: churn_flag (1 = churned, 0 = stayed).
Feature: avg_monthly_usage (average minutes per month).
Goal: Build a model that predicts churn, and also monitor whether the feature stays stable over time.

We have:

Train set → Customers from Jan–Mar 2025
OOT1 (out-of-time sample 1) → Customers from Apr 2025
OOT2 → Customers from May 2025

2️⃣ The same base formula — different contexts

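The mathematical core of IV, PSI, and CSI is a weighted log ratio summed over the bins $i$ of the feature:

$$\text{metric} = \sum_{i} \left( f_{1,i} - f_{2,i} \right) \ln \left( \frac{f_{1,i}}{f_{2,i}} \right)$$

where $f_{1,i}$ and $f_{2,i}$ are the fractions of two distributions that fall into bin $i$.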

The difference is what those “fractions” mean and which datasets are compared.
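A minimal Python sketch of that shared core, assuming both inputs are per-bin proportions over the same bins and using the natural log. The function name `divergence_index` and the `eps` floor for empty bins are illustrative choices, not part of any library.

```python
import math


def divergence_index(frac_1, frac_2, eps=1e-6):
    """Sum over bins of (frac_1 - frac_2) * ln(frac_1 / frac_2).

    frac_1, frac_2: per-bin proportions over the same bins (each sums to 1).
    eps: hypothetical floor so an empty bin does not cause log(0) or division by zero.
    """
    if len(frac_1) != len(frac_2):
        raise ValueError("both distributions must use the same bins")
    total = 0.0
    for a, b in zip(frac_1, frac_2):
        a, b = max(a, eps), max(b, eps)
        total += (a - b) * math.log(a / b)
    return total


# Identical distributions score 0; swapping the arguments gives the same value.
print(divergence_index([0.3, 0.7], [0.3, 0.7]))  # 0.0
print(divergence_index([0.3, 0.7], [0.5, 0.5]))
print(divergence_index([0.5, 0.5], [0.3, 0.7]))
```

Each term is never negative, because $(a - b)$ and $\ln(a/b)$ always have the same sign; so the metric is 0 only when the two distributions match bin for bin.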

3️⃣ Information Value (IV)

Question: Does this feature separate churners from non-churners in a single dataset?
Fractions:
- $p_{\text{stay, bin}}$ = fraction of stayers in that bin (within train set)
- $p_{\text{churn, bin}}$ = fraction of churners in that bin (within train set)
Data involved: Only one dataset (e.g., Train).
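Use: Feature selection. A widely used rule of thumb reads IV below 0.02 as not predictive, 0.02–0.1 as weak, 0.1–0.3 as medium, and above 0.3 as strong, so features under about 0.02 are usually dropped.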

Example:

Train:
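Low usage bin: 80% of its customers churn, 20% stay
High usage bin: 10% of its customers churn, 90% stay

These are within-bin rates. With bins of similar size, most churners land in the low-usage bin and most stayers in the high-usage bin, so the two distributions in the IV formula differ sharply → high IV, strong predictive power.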
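The example gives within-bin churn rates, but IV needs each bin's share of all churners and of all stayers, so the sketch below assumes hypothetical counts of 100 customers per bin. It uses the common weight-of-evidence convention WoE = ln(stay share / churn share); the function and variable names are illustrative.

```python
import math


def information_value(churn_counts, stay_counts):
    """IV from per-bin counts of churners and stayers (no bin may be empty)."""
    total_churn = sum(churn_counts)
    total_stay = sum(stay_counts)
    iv = 0.0
    for churned, stayed in zip(churn_counts, stay_counts):
        p_churn = churned / total_churn  # share of all churners in this bin
        p_stay = stayed / total_stay     # share of all stayers in this bin
        woe = math.log(p_stay / p_churn)  # weight of evidence for the bin
        iv += (p_stay - p_churn) * woe
    return iv


# Hypothetical counts: 100 customers per bin, split 80/20 (low usage) and 10/90 (high usage).
churn_counts = [80, 10]
stay_counts = [20, 90]
print(round(information_value(churn_counts, stay_counts), 2))  # 2.53
```

An IV far above 0.5 is often flagged as suspiciously high in practice (possible leakage or over-fitted binning); here it just reflects how exaggerated the toy numbers are. A bin with zero churners or zero stayers would make the log undefined, which is why sparse bins are usually merged or smoothed first.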

4️⃣ Population Stability Index (PSI)

Question: Has the overall feature distribution shifted over time? (no target involved)
Fractions:
- $p_{\text{bin, train}}$ = proportion of customers in that bin in Train (all customers, churned or not)
- $p_{\text{bin, OOT}}$ = proportion of customers in that bin in OOT (all customers, churned or not)
Data involved: Two datasets (e.g., Train vs OOT1).
Use: Detect population drift, i.e., a shift in customers’ usage patterns, which can happen even when the churn rate does not change.
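A sketch of computing PSI from raw `avg_monthly_usage` values, assuming quartile bins whose cut points are taken from Train only and then reused for the OOT sample (a common practice, though the number of bins and the binning method vary). It uses only the standard library; `bin_shares`, `psi`, and the `eps` floor are hypothetical names.

```python
import bisect
import math
import statistics


def bin_shares(values, edges):
    """Fraction of values in each bin defined by sorted cut points `edges`."""
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi(train_values, oot_values, n_bins=4, eps=1e-6):
    """PSI of OOT vs Train, with bin edges fixed from the Train distribution."""
    edges = statistics.quantiles(train_values, n=n_bins)  # n_bins - 1 cut points
    expected = bin_shares(train_values, edges)  # Train proportions
    actual = bin_shares(oot_values, edges)      # OOT proportions, same bins
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # guard against empty bins
        total += (a - e) * math.log(a / e)
    return total


# Hypothetical usage values (minutes per month); OOT1 has drifted upward.
train_usage = [float(m) for m in range(0, 400)]
oot1_usage = [float(m) for m in range(200, 600)]
print(psi(train_usage, train_usage))        # 0.0
print(psi(train_usage, oot1_usage) > 0.25)  # True
```

Fixing the edges on Train matters: recomputing quantiles on the OOT data would put roughly equal shares in every bin and hide the drift.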

Example:

Train:
Low usage: 30% of all customers
High usage: 70%

OOT1:
Low usage: 50% of all customers
High usage: 50%

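PSI works out to about 0.17 → the customer base composition shifted (the low-usage share rose from 30% to 50%). A common rule of thumb reads PSI below 0.1 as stable, 0.1–0.25 as a moderate shift worth investigating, and above 0.25 as a major shift.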
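Plugging the example shares into the PSI formula (natural log) gives a concrete number. This is a minimal sketch; `psi_from_shares` is an illustrative name and assumes no bin has a zero share.

```python
import math


def psi_from_shares(train_shares, oot_shares):
    """Sum over bins of (OOT share - Train share) * ln(OOT share / Train share)."""
    return sum((a - e) * math.log(a / e) for e, a in zip(train_shares, oot_shares))


train = [0.30, 0.70]  # low usage, high usage
oot1 = [0.50, 0.50]
print(round(psi_from_shares(train, oot1), 3))  # 0.169
```

The low-usage bin contributes about 0.102 and the high-usage bin about 0.067.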

5️⃣ Characteristic Stability Index (CSI)

Question: Has the relationship between the feature and the target changed over time? (concept drift)
Fractions:
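- $\text{event\_frac}_{A, \text{bin}}$ = proportion of Train churners (dataset A) that fall into that bin
- $\text{event\_frac}_{B, \text{bin}}$ = proportion of OOT churners (dataset B) that fall into that bin
Data involved: Two datasets (Train vs OOT1), churners only.
Use: Detect changes in the target–feature relationship.
Note: many scorecard-monitoring references define CSI more simply, as PSI computed for a single input characteristic over all records (churners and non-churners alike); the churner-only version above is the variant used in this post.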
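A sketch of the churner-only comparison described above, assuming raw `avg_monthly_usage` values, a 0/1 `churn_flag`, and a single hypothetical cut at 200 minutes separating low and high usage. Function names are illustrative; the same bin edges must be used for both datasets.

```python
import bisect
import math


def event_shares(usage, churn_flag, edges):
    """Share of churners (churn_flag == 1) falling into each usage bin."""
    counts = [0] * (len(edges) + 1)
    for u, flag in zip(usage, churn_flag):
        if flag == 1:
            counts[bisect.bisect_right(edges, u)] += 1
    n_events = sum(counts)  # must be > 0, otherwise shares are undefined
    return [c / n_events for c in counts]


def csi_event_only(train_usage, train_churn, oot_usage, oot_churn, edges, eps=1e-6):
    """Compare where churners fall in Train (A) vs OOT (B)."""
    frac_a = event_shares(train_usage, train_churn, edges)
    frac_b = event_shares(oot_usage, oot_churn, edges)
    total = 0.0
    for a, b in zip(frac_a, frac_b):
        a, b = max(a, eps), max(b, eps)  # guard against empty bins
        total += (b - a) * math.log(b / a)
    return total


edges = [200.0]  # hypothetical low/high usage cut point
train_usage, train_churn = [100, 150, 300, 350], [1, 1, 1, 0]
oot_usage, oot_churn = [100, 300, 350, 120], [1, 1, 0, 0]
print(round(csi_event_only(train_usage, train_churn, oot_usage, oot_churn, edges), 3))  # 0.116
```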

Example:

Train churners:
Low usage: 70% of churners
High usage: 30% of churners

OOT1 churners:
Low usage: 50% of churners
High usage: 50%

CSI works out to about 0.17 → the churn pattern shifted; low usage no longer dominates churn (its share of churners fell from 70% to 50%).
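One caveat when reading the churner-only comparison: it can move even when the feature–target relationship has not changed, because a shift in the population mix also changes where churners fall. The sketch below assumes hypothetical per-bin churn rates that stay fixed while only the bin sizes move as in the PSI example (30/70 to 50/50); the function names are illustrative.

```python
import math


def churner_shares(bin_shares, churn_rates):
    """Distribution of churners across bins, given bin sizes and per-bin churn rates."""
    churners = [s * r for s, r in zip(bin_shares, churn_rates)]
    total = sum(churners)
    return [c / total for c in churners]


def event_csi(frac_a, frac_b):
    """Sum over bins of (b - a) * ln(b / a)."""
    return sum((b - a) * math.log(b / a) for a, b in zip(frac_a, frac_b))


churn_rates = [0.40, 0.05]  # hypothetical churn rate in the low / high usage bin, unchanged
train_mix = [0.30, 0.70]    # bin sizes from the PSI example
oot1_mix = [0.50, 0.50]

train_events = churner_shares(train_mix, churn_rates)
oot1_events = churner_shares(oot1_mix, churn_rates)
print([round(x, 3) for x in train_events])  # [0.774, 0.226]
print([round(x, 3) for x in oot1_events])   # [0.889, 0.111]
print(round(event_csi(train_events, oot1_events), 3))  # 0.097
```

The churn rate inside each bin never changes here, yet the churner distribution still shifts, so checking this metric alongside PSI (or alongside per-bin churn rates) helps separate population drift from a genuine change in the relationship.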

6️⃣ Why they differ even if the formula looks the same

The formula structure is the same because all three are distribution comparison measures: the sum is a symmetric form of the Kullback–Leibler (KL) divergence between the two sets of fractions.
But the inputs differ:

IV → compares the feature's distribution among stayers (good) vs churners (bad) within one dataset.
PSI → compares overall feature distribution across datasets.
CSI → compares event-specific feature distribution across datasets.
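Writing $p$ and $q$ for the two per-bin distributions being compared, one line of algebra shows the link to KL divergence: the shared formula splits into the KL divergence taken in both directions, which is why it is symmetric in $p$ and $q$ and never negative.

```latex
\sum_i (p_i - q_i)\ln\frac{p_i}{q_i}
  = \sum_i p_i \ln\frac{p_i}{q_i} + \sum_i q_i \ln\frac{q_i}{p_i}
  = D_{\mathrm{KL}}(p \,\|\, q) + D_{\mathrm{KL}}(q \,\|\, p)
```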

That’s why, in churn prediction:

A feature can have high IV, low PSI, low CSI → predictive and stable.
Or high IV, high PSI → predictive, but customer profile is shifting (risk for model drift).
Or high IV, high CSI → predictive in train, but churn relationship is changing (concept drift).
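The three scenarios can be written as a small decision helper. This is a hypothetical sketch: the cut-offs (IV of at least 0.02 to count as predictive, PSI or CSI above 0.1 to flag a shift) are common rules of thumb used here as assumptions, and teams often tune them.

```python
def diagnose(iv, psi, csi, iv_min=0.02, shift_flag=0.1):
    """Map IV, PSI, and CSI for one feature to the scenarios listed above.

    iv_min and shift_flag are rule-of-thumb assumptions, not fixed standards.
    """
    if iv < iv_min:
        return "not predictive"
    issues = []
    if psi > shift_flag:
        issues.append("customer profile is shifting")
    if csi > shift_flag:
        issues.append("churn relationship is changing")
    if not issues:
        return "predictive and stable"
    return "predictive, but " + " and ".join(issues)


print(diagnose(iv=0.35, psi=0.03, csi=0.02))  # predictive and stable
print(diagnose(iv=0.35, psi=0.17, csi=0.02))  # predictive, but customer profile is shifting
print(diagnose(iv=0.35, psi=0.03, csi=0.17))  # predictive, but churn relationship is changing
```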

Sunday, August 10, 2025

IV, PSI, CSI - differences
