Sunday, August 3, 2025

ROC-AUC - Step by step calculation

Let’s go through ROC-AUC just like we did for KS: an intuitive explanation, the formulas, and a step-by-step example using 10 observations.


📘 What is ROC-AUC?

🟦 ROC = Receiver Operating Characteristic Curve

It plots:

  • X-axis: False Positive Rate (FPR) = FP / (FP + TN)

  • Y-axis: True Positive Rate (TPR) = TP / (TP + FN)

Each point on the ROC curve represents a threshold on the predicted probability.
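
To make the two rates concrete, here is a minimal sketch that computes one ROC point for a single threshold. It uses the 10 observations from the example further down; the helper name `rates_at_threshold`, the threshold value 0.5, and the convention that scores at or above the threshold count as predicted positive are assumptions made for illustration.

```python
def rates_at_threshold(y_true, scores, threshold):
    """Return (FPR, TPR), treating scores >= threshold as predicted positive."""
    pairs = list(zip(y_true, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fpr = fp / (fp + tn)  # X-axis of the ROC curve
    tpr = tp / (tp + fn)  # Y-axis of the ROC curve
    return fpr, tpr


# The 10 observations from the worked example in this post
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]

# Threshold 0.5 is an arbitrary choice for illustration
fpr, tpr = rates_at_threshold(y_true, scores, 0.5)
print(fpr, tpr)  # 0.5 0.75
```

Repeating this for every distinct threshold and connecting the resulting points gives the full ROC curve.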


🟧 AUC = Area Under the Curve

  • AUC = Probability that the model ranks a random positive higher than a random negative

  • AUC lies between 0 and 1. Key reference values:

    • 1.0 → perfect model

    • 0.5 → random guessing

    • < 0.5 → worse than random


✅ ROC-AUC Formula (Conceptually)

There are two main interpretations:

1. Integral of the ROC Curve:

$$AUC = \int_0^1 TPR(FPR) \, dFPR$$

2. Rank-Based Interpretation (Used in practice):

$$AUC = \frac{\text{Number of correct positive-negative pairs}}{\text{Total positive-negative pairs}}$$

📊 Example: 10 Observations

We'll reuse the same 10 data points:

Obs | Actual (Y) | Predicted Score
1   | 1          | 0.95
2   | 0          | 0.90
3   | 1          | 0.85
4   | 0          | 0.80
5   | 0          | 0.70
6   | 1          | 0.60
7   | 0          | 0.40
8   | 0          | 0.30
9   | 1          | 0.20
10  | 0          | 0.10

  • Total Positives (P) = 4

  • Total Negatives (N) = 6


📈 Step-by-Step: Rank-Based AUC Calculation

Let’s compare every (positive, negative) score pair and give each pair a credit:

  • Positive score > Negative score → Correct

  • Positive score == Negative score → 0.5 credit

  • Positive score < Negative score → Wrong
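
The three rules above can be folded into a single formula. This is a sketch in notation introduced here for illustration: s_i^+ is the score of the i-th positive, s_j^- is the score of the j-th negative, P and N are the numbers of positives and negatives, and 1(·) equals 1 when the condition holds and 0 otherwise.

$$AUC = \frac{1}{P \cdot N} \sum_{i=1}^{P} \sum_{j=1}^{N} \left[ \mathbf{1}(s_i^{+} > s_j^{-}) + 0.5 \cdot \mathbf{1}(s_i^{+} = s_j^{-}) \right]$$

Wrong pairs contribute 0, so they appear only in the denominator P · N.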

Step 1: List All Positive-Negative Pairs

Positive scores: 0.95, 0.85, 0.60, 0.20
Negative scores: 0.90, 0.80, 0.70, 0.40, 0.30, 0.10

Total Pairs = 4 × 6 = 24

Step 2: Count Favorable Pairs

Pos Score | Compared to Neg Scores | Wins
0.95      | > all (0.90 ... 0.10)  | 6
0.85      | > all except 0.90      | 5
0.60      | > 0.40, 0.30, 0.10     | 3
0.20      | > 0.10 only            | 1
Total     | 6 + 5 + 3 + 1          | 15 wins

No ties, so:

$$AUC = \frac{15}{24} = \boxed{0.625}$$
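
The pair counting can be checked with a few lines of plain Python. This is a minimal sketch, not a production implementation: the function name `pairwise_auc` is made up for illustration, and the double loop is fine for 10 observations but slow on large datasets.

```python
def pairwise_auc(y_true, scores):
    """Rank-based AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    credit = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                credit += 1.0   # correct pair
            elif p == n:
                credit += 0.5   # tie gets half credit
            # p < n is a wrong pair and adds nothing
    return credit / (len(pos) * len(neg))


# The 10 observations from the table in this post
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]

print(pairwise_auc(y_true, scores))  # 0.625
```

The result agrees with the hand count of 15 favorable pairs out of 24.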

🧠 Interpretation:

  • The model has a 62.5% chance of ranking a randomly chosen defaulter (positive) above a randomly chosen non-defaulter (negative).

  • Better than random guessing (0.5), but well short of a perfect ranking (1.0).


📉 ROC Curve (Optional Idea):

If we plot TPR vs FPR at various thresholds:

  • Start at (0,0)

  • End at (1,1)

  • The area under that curve will match AUC = 0.625
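
To check that the area under the curve matches the pair count, here is a minimal sketch that builds the ROC points by sweeping the threshold from the highest score down and then integrates with the trapezoid rule. The function names `roc_points` and `trapezoid_area` are made up for illustration, and the code assumes at least one positive and one negative observation.

```python
def roc_points(y_true, scores):
    """Return the (FPR, TPR) points obtained by lowering the threshold step by step."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        current = ranked[i][0]
        # Observations with the same score cross the threshold together
        while i < len(ranked) and ranked[i][0] == current:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# The 10 observations from the table in this post
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]

points = roc_points(y_true, scores)
print(points[0], points[-1])              # (0.0, 0.0) (1.0, 1.0)
print(round(trapezoid_area(points), 3))   # 0.625
```

Each positive moves the curve up by 1/4 and each negative moves it right by 1/6, so the area adds up to the same 15/24 found by counting pairs.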
