Let's walk through a step-by-step example of the KS statistic using 10 observations with:
- Actuals (ground truth): 1 = defaulter, 0 = non-defaulter
- Predicted scores: from a classification model
Sample Data: 10 Observations
Obs | Actual (Y) | Predicted Score |
---|---|---|
1 | 1 | 0.95 |
2 | 0 | 0.90 |
3 | 1 | 0.85 |
4 | 0 | 0.80 |
5 | 0 | 0.70 |
6 | 1 | 0.60 |
7 | 0 | 0.40 |
8 | 0 | 0.30 |
9 | 1 | 0.20 |
10 | 0 | 0.10 |
Step 1: Sort by predicted score descending
Rank | Actual (Y) | Score | Cumulative Positives | Cumulative Negatives | (+ve%) - (-ve%) |
---|---|---|---|---|---|
1 | 1 | 0.95 | 1 / 4 = 0.25 | 0 / 6 = 0.00 | 0.25 |
2 | 0 | 0.90 | 1 / 4 = 0.25 | 1 / 6 = 0.167 | 0.083 |
3 | 1 | 0.85 | 2 / 4 = 0.50 | 1 / 6 = 0.167 | 0.333 |
4 | 0 | 0.80 | 2 / 4 = 0.50 | 2 / 6 = 0.333 | 0.167 |
5 | 0 | 0.70 | 2 / 4 = 0.50 | 3 / 6 = 0.500 | 0.00 |
6 | 1 | 0.60 | 3 / 4 = 0.75 | 3 / 6 = 0.500 | 0.25 |
7 | 0 | 0.40 | 3 / 4 = 0.75 | 4 / 6 = 0.667 | 0.083 |
8 | 0 | 0.30 | 3 / 4 = 0.75 | 5 / 6 = 0.833 | -0.083 |
9 | 1 | 0.20 | 4 / 4 = 1.00 | 5 / 6 = 0.833 | 0.167 |
10 | 0 | 0.10 | 4 / 4 = 1.00 | 6 / 6 = 1.000 | 0.00 |
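The table above can be reproduced with a few lines of Python. This is only a minimal sketch: the names `data` and `cumulative_table` are made up for illustration, and it assumes every score is distinct, as in the sample data (tied scores would need to be grouped before taking the difference).

```python
# Sample data from the table above: (actual, predicted score)
data = [
    (1, 0.95), (0, 0.90), (1, 0.85), (0, 0.80), (0, 0.70),
    (1, 0.60), (0, 0.40), (0, 0.30), (1, 0.20), (0, 0.10),
]

def cumulative_table(data):
    """Return rows of (rank, actual, score, cum_pos_rate, cum_neg_rate, diff)."""
    # Highest score first, so rank 1 is the observation the model flags most strongly
    ordered = sorted(data, key=lambda row: row[1], reverse=True)
    total_pos = sum(actual for actual, _ in ordered)
    total_neg = len(ordered) - total_pos
    rows, pos_seen, neg_seen = [], 0, 0
    for rank, (actual, score) in enumerate(ordered, start=1):
        pos_seen += actual       # defaulters seen so far
        neg_seen += 1 - actual   # non-defaulters seen so far
        pos_rate = pos_seen / total_pos
        neg_rate = neg_seen / total_neg
        rows.append((rank, actual, score, pos_rate, neg_rate, pos_rate - neg_rate))
    return rows

rows = cumulative_table(data)
for rank, actual, score, pos_rate, neg_rate, diff in rows:
    print(f"{rank:>2} | {actual} | {score:.2f} | {pos_rate:.3f} | {neg_rate:.3f} | {diff:+.3f}")
```

Each printed row should match the corresponding row of the table, with the last column being the cumulative positive rate minus the cumulative negative rate.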
✅ Step 2: Identify KS
Look for the maximum difference between:
- (Cumulative positives): % of defaulters seen so far
- (Cumulative negatives): % of non-defaulters seen so far
The maximum value in the last column ((Cumulative positives %) - (Cumulative negatives %)) is 0.333, at rank 3 (score 0.85).
Interpretation:
- KS = 0.333 → The maximum separation between defaulters and non-defaulters occurs at a score threshold of 0.85 (that is, when every observation scoring 0.85 or higher is flagged)
- At that point:
  - You've captured 50% of defaulters
  - Only 16.7% of non-defaulters
- This is the score threshold at which the model best separates the two classes (maximum discrimination)
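To tie the two steps together, here is a small sketch that returns the KS value along with the threshold and capture rates quoted above. The function name `ks_statistic` is hypothetical, and it follows this walkthrough's definition (the largest value of cumulative positives % minus cumulative negatives %) rather than an absolute difference. Unlike a plain row-by-row scan, it only evaluates the difference after all rows sharing a score have been counted, so tied scores should not distort the result.

```python
def ks_statistic(actuals, scores):
    """Return (ks, threshold, pos_rate, neg_rate) at the point of maximum separation."""
    ordered = sorted(zip(scores, actuals), reverse=True)  # highest score first
    total_pos = sum(actuals)
    total_neg = len(actuals) - total_pos
    best = (0.0, None, 0.0, 0.0)
    pos_seen = neg_seen = 0
    for i, (score, actual) in enumerate(ordered):
        pos_seen += actual
        neg_seen += 1 - actual
        # Wait until every row with this score is counted before comparing
        if i + 1 < len(ordered) and ordered[i + 1][0] == score:
            continue
        pos_rate = pos_seen / total_pos
        neg_rate = neg_seen / total_neg
        diff = pos_rate - neg_rate
        if diff > best[0]:
            best = (diff, score, pos_rate, neg_rate)
    return best

actuals = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]

ks, threshold, pos_rate, neg_rate = ks_statistic(actuals, scores)
print(f"KS = {ks:.3f} at score >= {threshold}")   # KS = 0.333 at score >= 0.85
print(f"defaulters captured: {pos_rate:.1%}, non-defaulters captured: {neg_rate:.1%}")
```

On the sample data this gives the same 50% and 16.7% capture rates as the interpretation above.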