Scenario
You are building an XGBoost regression model that predicts how much credit can safely be assigned to a loan applicant, based on their application data, financial history, and behavioral data.
The target (y) is a continuous variable — the approved credit limit amount.
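A minimal sketch of the setup, assuming a pandas DataFrame `df` with illustrative feature columns (`income`, `debt_to_income`, `utilization`, `months_on_book`) and a continuous target `approved_limit`; the column names and hyperparameters below are placeholders, not a production configuration:

```python
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Hypothetical feature set and target; replace with your actual columns.
features = ["income", "debt_to_income", "utilization", "months_on_book"]
target = "approved_limit"

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df[target], test_size=0.2, random_state=42
)

# Squared-error objective is the standard choice for a continuous target.
model = xgb.XGBRegressor(
    objective="reg:squarederror",
    n_estimators=500,
    learning_rate=0.05,
    max_depth=4,
)
model.fit(X_train, y_train)
pred = model.predict(X_test)
```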
1. Accuracy Metrics & Intuition
Since this is regression, classification metrics like AUC or KS don’t apply.
Instead, we use error-based and rank-based metrics:
| Metric | Intuition | Example in Credit Limit Context | 
|---|---|---|
| RMSE (Root Mean Squared Error) | Penalizes large mistakes more heavily. Good for spotting models that make occasional big blunders. | RMSE = 3,000 on limits ranging 0–50k means predictions are typically off by a few thousand dollars, and the occasional large miss (e.g., badly over-assigning someone eligible for 10k) pushes the metric up sharply. | 
| MAE (Mean Absolute Error) | Average size of error regardless of direction. Easier to interpret than RMSE. | MAE = 2,000 means that on average, your credit limit predictions are off by $2,000. | 
| MAPE (Mean Absolute Percentage Error) | Normalizes error relative to actual values. | If MAPE = 15%, predicting 8,500 for someone eligible for 10,000 is a 15% miss. | 
| R² (Coefficient of Determination) | Measures how much variance in actual credit limits your model explains. | R² = 0.65 means the model explains 65% of why different customers get different limits. | 
| Spearman correlation | Focuses on rank ordering — important when exact limit isn’t as critical as ranking applicants from lowest to highest limit eligibility. | Spearman = 0.8 means applicants ranked high by model generally do get higher limits. | 
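A quick way to compute all five metrics from the table, assuming the `y_test` and `pred` variables from the training sketch above:

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.asarray(y_test, dtype=float)
y_pred = np.asarray(pred, dtype=float)

rmse = np.sqrt(mean_squared_error(y_true, y_pred))   # penalizes large errors
mae = mean_absolute_error(y_true, y_pred)            # average absolute error

# MAPE: guard against zero limits to avoid division-by-zero blow-ups.
nonzero = y_true != 0
mape = np.mean(np.abs((y_true[nonzero] - y_pred[nonzero]) / y_true[nonzero])) * 100

r2 = r2_score(y_true, y_pred)                        # variance explained
spearman, _ = spearmanr(y_true, y_pred)              # rank ordering

print(f"RMSE={rmse:,.0f}  MAE={mae:,.0f}  MAPE={mape:.1f}%  "
      f"R2={r2:.2f}  Spearman={spearman:.2f}")
```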
2. Typical Acceptable Thresholds for Model Approval
(These are common industry validation ranges; they vary by portfolio type.)
| Metric | Acceptable Range | Why it Matters Here | 
|---|---|---|
| RMSE | ≤ 15–25% of credit limit range | Avoids big over-limit assignments that increase default risk. | 
| MAE | ≤ 10–20% of limit range | Keeps average prediction error within acceptable business tolerance. | 
| MAPE | ≤ 30–40% | Ensures proportional accuracy across low and high limits. | 
| R² | ≥ 0.5 (≥ 0.4 for noisy small-business data) | Shows the model meaningfully explains applicant differences. | 
| Spearman | ≥ 0.6–0.7 | Keeps rank ordering stable — critical for approval tiers. | 
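A small gating sketch against these ranges, assuming the `rmse`, `mae`, `mape`, `r2`, and `spearman` values from the metrics sketch above and a hypothetical 0–50k limit range; the cutoffs are illustrative, not a policy recommendation:

```python
# Illustrative thresholds taken from the table above (assumed 0-50k limit range).
limit_range = 50_000
checks = {
    "RMSE": rmse <= 0.25 * limit_range,   # <= 25% of limit range
    "MAE": mae <= 0.20 * limit_range,     # <= 20% of limit range
    "MAPE": mape <= 40,                   # <= 40%
    "R2": r2 >= 0.5,
    "Spearman": spearman >= 0.6,
}
for metric, passed in checks.items():
    print(f"{metric}: {'PASS' if passed else 'REVIEW'}")
```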
3. Why Rank Ordering Can Matter More Than Absolute Accuracy
- In credit assignment, the exact dollar figure might be adjusted by policy rules after the model predicts.
- What's critical is ranking applicants correctly: if the model thinks Applicant A should have a higher limit than Applicant B, that ordering should hold true most of the time.
- This is why Spearman correlation is often a regulatory requirement alongside RMSE (a small illustration follows this list).
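To illustrate, assume the `y_true`/`y_pred` arrays from the metrics sketch and a hypothetical policy rule that applies a 20% haircut to every predicted limit: the dollar error changes, but the ranking of applicants does not.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical policy rule: cap exposure with a flat 20% haircut on predictions.
policy_pred = 0.8 * y_pred

rmse_policy = np.sqrt(np.mean((y_true - policy_pred) ** 2))
spearman_policy, _ = spearmanr(y_true, policy_pred)

# RMSE shifts because dollar amounts changed; Spearman stays the same because
# a monotonic transform preserves the ordering of applicants.
print(f"RMSE: {rmse:,.0f} -> {rmse_policy:,.0f}")
print(f"Spearman: {spearman:.3f} -> {spearman_policy:.3f}")
```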
4. Stability & Predictive Power Checks
For regression, we can still use:
| Check | Purpose | Regression Adjustment | 
|---|---|---|
| PSI (Population Stability Index) | Ensures the model output (predicted limit) distribution doesn’t shift drastically between development and monitoring periods. | Bin the predicted limits (e.g., deciles) rather than class scores and compare development vs. recent distributions (a minimal sketch follows this table). | 
| CSI (Characteristic Stability Index) | Checks whether individual feature (characteristic) distributions stay stable over time. | Same binned comparison, applied feature by feature, to flag input drift before it degrades predictions. | 
| Feature Importance | Identify which variables drive credit limits most. | Use gain/cover in XGBoost. | 
| SHAP | Explain predictions to regulators & business. | Works identically for regression. | 
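A minimal PSI sketch, assuming two 1-D arrays of the monitored quantity (for regression, typically the predicted limit) from the development and monitoring periods. A commonly used rule of thumb: PSI < 0.10 is stable, 0.10–0.25 warrants investigation, and > 0.25 indicates a significant shift.

```python
import numpy as np

def psi(expected, actual, n_bins=10):
    """Population Stability Index between a development and a monitoring sample."""
    # Bin edges come from the development (expected) distribution, e.g. deciles.
    edges = np.percentile(expected, np.linspace(0, 100, n_bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # catch out-of-range values

    exp_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    act_pct = np.histogram(actual, bins=edges)[0] / len(actual)

    # Small floor avoids log(0) and division by zero in sparse bins.
    exp_pct = np.clip(exp_pct, 1e-6, None)
    act_pct = np.clip(act_pct, 1e-6, None)

    return np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct))

# e.g. psi(dev_predictions, recent_predictions)
```

The same function can be applied column by column to input features for CSI-style monitoring.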
5. Model Approval Reality
- If your RMSE, MAE, MAPE, and Spearman are within target ranges, and PSI/CSI show stability, you’re in a strong position.
- Even if one metric is slightly weak, a model can still be approved if:
  - It beats the current champion model.
  - It’s more stable over time.
  - It’s more explainable and policy-compliant.
- Failing both accuracy and stability → high risk of rejection.