Bigdata and data science by Kartheek Dachepalli: ROC-AUC

Sunday, August 3, 2025

ROC-AUC - Step by step calculation

Let’s go through ROC-AUC just like we did for KS: an intuitive explanation, the formulas, and a step-by-step example using 10 observations.

📘 What is ROC-AUC?

🟦 ROC = Receiver Operating Characteristic Curve

It plots:

X-axis: False Positive Rate (FPR) = FP / (FP + TN)
Y-axis: True Positive Rate (TPR) = TP / (TP + FN)

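Each point on the ROC curve corresponds to one threshold on the predicted probability: observations scoring at or above the threshold are classified as positive, and the resulting (FPR, TPR) pair is plotted.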
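
To make the threshold idea concrete, here is a minimal Python sketch that computes one (FPR, TPR) point. It uses the 10 observations from the example later in this post. The function name `rates_at_threshold` and the "score >= threshold means predicted positive" convention are assumptions made for illustration.

```python
# The 10 observations from the worked example in this post
labels = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]


def rates_at_threshold(labels, scores, threshold):
    """Return (FPR, TPR) when scores >= threshold are predicted positive.

    The >= convention is an assumption; some tools use a strict >.
    """
    tp = fp = fn = tn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    fpr = fp / (fp + tn)  # FPR = FP / (FP + TN)
    tpr = tp / (tp + fn)  # TPR = TP / (TP + FN)
    return fpr, tpr


print(rates_at_threshold(labels, scores, 0.5))  # (0.5, 0.75)
```

At a threshold of 0.5, three of the four positives and three of the six negatives score at or above the cut, which gives the point (0.5, 0.75). Sweeping the threshold from high to low traces out the full curve.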

🟧 AUC = Area Under the Curve

AUC = Probability that the model ranks a random positive higher than a random negative
AUC ranges from:
- 1.0 → perfect model
- 0.5 → random guessing
- < 0.5 → worse than random

✅ ROC-AUC Formula (Conceptually)

There are two main interpretations:

1. Integral of the ROC Curve:

AUC = \int_0^1 TPR(FPR) \, dFPR

2. Rank-Based Interpretation (Used in practice):

AUC = \frac{\text{Number of correct positive-negative pairs}}{\text{Total positive-negative pairs}}
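
Written out in full, with a tie counted as half a correct pair, the same ratio can be expressed as below. Here P and N are the numbers of positives and negatives, s_i^+ and s_j^- are their scores, and the indicator notation \mathbf{1}(\cdot) is added here for illustration:

AUC = \frac{1}{P \cdot N} \sum_{i=1}^{P} \sum_{j=1}^{N} \left[ \mathbf{1}(s_i^+ > s_j^-) + 0.5 \cdot \mathbf{1}(s_i^+ = s_j^-) \right]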

📊 Example: 10 Observations

We'll reuse the same 10 data points:

Obs	Actual (Y)	Predicted Score
1	1	0.95
2	0	0.90
3	1	0.85
4	0	0.80
5	0	0.70
6	1	0.60
7	0	0.40
8	0	0.30
9	1	0.20
10	0	0.10

Total Positives (P) = 4
Total Negatives (N) = 6

📈 Step-by-Step: Rank-Based AUC Calculation

Let’s form every (positive, negative) score pair and give each pair a credit:

Positive score > Negative score → Correct (1 credit)
Positive score == Negative score → Tie (0.5 credit)
Positive score < Negative score → Wrong (0 credit)

Step 1: List All Positive-Negative Pairs

Positive scores: 0.95, 0.85, 0.60, 0.20
Negative scores: 0.90, 0.80, 0.70, 0.40, 0.30, 0.10

Total Pairs = 4 × 6 = 24

Step 2: Count Favorable Pairs

Pos Score	Compared to Neg Scores	Wins
0.95	> all (0.90 ... 0.10)	6
0.85	> all except 0.90	5
0.60	> 0.40, 0.30, 0.10	3
0.20	> 0.10 only	1
Total		6+5+3+1 = 15 wins

No ties, so:

AUC = \frac{15}{24} = \boxed{0.625}
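
The pair counting above can be checked with a short Python sketch. This is a brute-force illustration of the rank-based definition, not how libraries typically compute AUC on large data. The function name `pairwise_auc` is hypothetical.

```python
# The 10 observations from the table above
labels = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]


def pairwise_auc(labels, scores):
    """Rank-based AUC: credit earned over all positive-negative pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    credit = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                credit += 1.0  # positive ranked above negative: correct
            elif p == n:
                credit += 0.5  # tie: half credit
            # p < n: wrong, no credit
    return credit / (len(pos) * len(neg))


print(pairwise_auc(labels, scores))  # 0.625
```

The double loop visits all 4 × 6 = 24 pairs, so the cost grows with P × N. That is fine for a hand-sized example like this one.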

🧠 Interpretation:

The model has a 62.5% chance of scoring a randomly chosen defaulter (positive) higher than a randomly chosen non-defaulter (negative).
That is better than random guessing (0.5), but not great.

📉 ROC Curve (Optional Idea):

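If we plot TPR against FPR while lowering the threshold from above the highest score to below the lowest:

The curve starts at (0,0), where nothing is predicted positive
It ends at (1,1), where everything is predicted positive
The area under that curve equals the rank-based AUC of 0.625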
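
As a rough check that the two interpretations agree, the sketch below builds the ROC points by sweeping the threshold down through the sorted scores and then integrates with the trapezoid rule. The helper names `roc_points` and `trapezoid_area` are hypothetical, and tied scores are handled by emitting a single point per distinct score.

```python
# The 10 observations from the example
labels = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]


def roc_points(labels, scores):
    """Return the ROC curve as (FPR, TPR) points from (0, 0) to (1, 1)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Sort observations by score, highest first
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last observation with this score,
        # so tied scores move the curve in one diagonal step.
        last_of_group = i == len(ranked) - 1 or ranked[i + 1][0] != score
        if last_of_group:
            points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


points = roc_points(labels, scores)
print(round(trapezoid_area(points), 6))  # 0.625
```

With no tied scores the curve is a staircase: each positive moves it up by 1/4 and each negative moves it right by 1/6.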
