Let's walk through a step-by-step example of the KS statistic using 10 observations with:
- Actuals (ground truth): 1 = defaulter, 0 = non-defaulter
- Predicted scores: from a classification model
🧾 Sample Data: 10 Observations
| Obs | Actual (Y) | Predicted Score | 
|---|---|---|
| 1 | 1 | 0.95 | 
| 2 | 0 | 0.90 | 
| 3 | 1 | 0.85 | 
| 4 | 0 | 0.80 | 
| 5 | 0 | 0.70 | 
| 6 | 1 | 0.60 | 
| 7 | 0 | 0.40 | 
| 8 | 0 | 0.30 | 
| 9 | 1 | 0.20 | 
| 10 | 0 | 0.10 | 
Step 1: Sort by predicted score (descending) and compute cumulative rates
| Rank | Actual (Y) | Score | Cumulative Positive Rate | Cumulative Negative Rate | Difference (Pos − Neg) |
|---|---|---|---|---|---|
| 1 | 1 | 0.95 | 1 / 4 = 0.25 | 0 / 6 = 0.00 | 0.25 | 
| 2 | 0 | 0.90 | 1 / 4 = 0.25 | 1 / 6 = 0.167 | 0.083 | 
| 3 | 1 | 0.85 | 2 / 4 = 0.50 | 1 / 6 = 0.167 | 0.333 | 
| 4 | 0 | 0.80 | 2 / 4 = 0.50 | 2 / 6 = 0.333 | 0.167 | 
| 5 | 0 | 0.70 | 2 / 4 = 0.50 | 3 / 6 = 0.500 | 0.00 | 
| 6 | 1 | 0.60 | 3 / 4 = 0.75 | 3 / 6 = 0.500 | 0.25 | 
| 7 | 0 | 0.40 | 3 / 4 = 0.75 | 4 / 6 = 0.667 | 0.083 | 
| 8 | 0 | 0.30 | 3 / 4 = 0.75 | 5 / 6 = 0.833 | -0.083 | 
| 9 | 1 | 0.20 | 4 / 4 = 1.00 | 5 / 6 = 0.833 | 0.167 | 
| 10 | 0 | 0.10 | 4 / 4 = 1.00 | 6 / 6 = 1.000 | 0.00 | 
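The table above can be reproduced with a short script. The following is a minimal sketch in plain Python, not a reference implementation: the function name `ks_table` and the dictionary keys are illustrative choices, and it assumes there are no tied scores and that both classes are present.

```python
# Sample data from the table above: (actual label, predicted score).
data = [
    (1, 0.95), (0, 0.90), (1, 0.85), (0, 0.80), (0, 0.70),
    (1, 0.60), (0, 0.40), (0, 0.30), (1, 0.20), (0, 0.10),
]


def ks_table(rows):
    """Return one dict per observation, ordered by score descending.

    Hypothetical helper for illustration; assumes no tied scores and
    at least one positive and one negative label.
    """
    ordered = sorted(rows, key=lambda r: r[1], reverse=True)
    total_pos = sum(y for y, _ in ordered)
    total_neg = len(ordered) - total_pos

    cum_pos = cum_neg = 0
    table = []
    for rank, (y, score) in enumerate(ordered, start=1):
        cum_pos += y        # defaulters seen so far
        cum_neg += 1 - y    # non-defaulters seen so far
        pos_rate = cum_pos / total_pos
        neg_rate = cum_neg / total_neg
        table.append({
            "rank": rank,
            "actual": y,
            "score": score,
            "cum_pos_rate": pos_rate,
            "cum_neg_rate": neg_rate,
            "diff": pos_rate - neg_rate,
        })
    return table


table = ks_table(data)
for row in table:
    print(f"{row['rank']:>2}  {row['actual']}  {row['score']:.2f}  "
          f"{row['cum_pos_rate']:.3f}  {row['cum_neg_rate']:.3f}  "
          f"{row['diff']:+.3f}")
```

Each printed line corresponds to one row of the table: rank, actual label, score, the two cumulative rates, and their difference.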
✅ Step 2: Identify KS
Look for the maximum difference between:
- Cumulative positive rate: the share of all defaulters seen so far
- Cumulative negative rate: the share of all non-defaulters seen so far
The maximum value in the last column (cumulative positive rate minus cumulative negative rate) is 0.333, at rank 3 (score 0.85).
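Interpretation:
- KS = 0.333 → The maximum separation between defaulters and non-defaulters occurs at a score threshold of 0.85 (flagging every observation scored 0.85 or higher)
- At that point:
  - You've captured 50% of defaulters (2 of 4)
  - You've flagged only 16.7% of non-defaulters (1 of 6)
- This threshold maximizes the gap between the two cumulative rates, so it is where the model best separates the classes; it is not necessarily the best operating cutoff once misclassification costs are considered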
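Finding the KS value and its threshold can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions: `ks_statistic` is a hypothetical helper name, scores are assumed to be untied, and the gap is taken as positive rate minus negative rate, matching the table in this walkthrough. The textbook two-sample KS statistic uses the absolute difference, which gives the same result whenever the model ranks defaulters higher on the whole.

```python
def ks_statistic(labels, scores):
    """Return (ks, threshold) for binary labels and predicted scores.

    Hypothetical helper for illustration. Assumes no tied scores and
    both classes present. Uses the signed gap (cumulative positive rate
    minus cumulative negative rate), as in the worked table.
    """
    ordered = sorted(zip(labels, scores), key=lambda p: p[1], reverse=True)
    total_pos = sum(labels)
    total_neg = len(labels) - total_pos

    cum_pos = cum_neg = 0
    best_gap, best_score = 0.0, None
    for y, score in ordered:
        cum_pos += y
        cum_neg += 1 - y
        gap = cum_pos / total_pos - cum_neg / total_neg
        if gap > best_gap:
            best_gap, best_score = gap, score
    return best_gap, best_score


labels = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]

ks, threshold = ks_statistic(labels, scores)
print(f"KS = {ks:.3f} at score >= {threshold}")  # KS = 0.333 at score >= 0.85
```

On the 10 sample observations this returns the same KS value and threshold as the manual calculation.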
 