Scenario
You are building an XGBoost regression model that predicts “How much credit can be safely assigned to a loan requester” based on their application data, financial history, and behavioral data.
The target (y) is a continuous variable: the approved credit limit amount.
1. Accuracy Metrics & Intuition
Since this is regression, classification metrics like AUC or KS don’t apply.
Instead, we use error-based and rank-based metrics:
Metric | Intuition | Example in Credit Limit Context |
---|---|---|
RMSE (Root Mean Squared Error) | Penalizes large mistakes more heavily. Good for spotting models that make occasional big blunders. | With RMSE = 3,000 on limits ranging 0–50k, errors are moderate on average, but because errors are squared before averaging, even a few large misses (say, predicting 20k for someone eligible for 10k) would push RMSE up sharply. |
MAE (Mean Absolute Error) | Average size of error regardless of direction. Easier to interpret than RMSE. | MAE = 2,000 means that on average, your credit limit predictions are off by $2,000. |
MAPE (Mean Absolute Percentage Error) | Normalizes error relative to actual values. | MAPE = 15% means predictions miss by 15% of the actual limit on average; predicting 8,500 for someone eligible for 10,000 is one such 15% miss. |
R² (Coefficient of Determination) | Measures how much variance in actual credit limits your model explains. | R² = 0.65 means the model explains 65% of why different customers get different limits. |
Spearman correlation | Focuses on rank ordering — important when exact limit isn’t as critical as ranking applicants from lowest to highest limit eligibility. | Spearman = 0.8 means applicants ranked high by model generally do get higher limits. |
2. Typical Acceptable Thresholds for Model Approval
(These are common industry validation ranges — vary by portfolio type)
Metric | Acceptable Range | Why it Matters Here |
---|---|---|
RMSE | ≤ 15–25% of credit limit range | Avoids big over-limit assignments that increase default risk. |
MAE | ≤ 10–20% of limit range | Keeps average prediction error within acceptable business tolerance. |
MAPE | ≤ 30–40% | Ensures proportional accuracy across low and high limits. |
R² | ≥ 0.5 (≥ 0.4 for noisy small-business data) | Shows the model meaningfully explains applicant differences. |
Spearman | ≥ 0.6–0.7 | Keeps rank ordering stable — critical for approval tiers. |
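The threshold table above can be turned into a simple approval gate. This is an illustrative sketch only: the metric values, the `limit_range`, and the choice of the looser end of each range are hypothetical, and real validation policies vary by portfolio.

```python
# Hypothetical metric values for a candidate model (not real results)
limit_range = 50_000  # max minus min observed credit limit
metrics = {"rmse": 3_000, "mae": 2_000, "mape": 16.2, "r2": 0.65, "spearman": 0.80}

# Gate each metric against the looser end of the ranges in the table above
checks = {
    "rmse": metrics["rmse"] <= 0.25 * limit_range,   # <= 25% of limit range
    "mae": metrics["mae"] <= 0.20 * limit_range,     # <= 20% of limit range
    "mape": metrics["mape"] <= 40,                   # <= 40%
    "r2": metrics["r2"] >= 0.5,
    "spearman": metrics["spearman"] >= 0.6,
}

passed = all(checks.values())
print("PASS" if passed else "FAIL", checks)
```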
3. Why Rank Ordering Can Matter More Than Absolute Accuracy
- In credit assignment, the exact dollar figure might be adjusted by policy rules after the model predicts.
- What's critical is ranking applicants correctly: if the model thinks Applicant A should have a higher limit than Applicant B, that ordering should hold true most of the time.
- This is why Spearman correlation is often a regulatory requirement alongside RMSE.
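A small made-up example makes the point concrete: a model with a large constant bias can still rank applicants perfectly, while a model with smaller absolute errors can scramble the ordering. The numbers below are invented for illustration.

```python
import numpy as np
from scipy.stats import spearmanr

# Illustrative true limits for three applicants (made-up numbers)
y_true = np.array([10_000, 12_000, 30_000])

biased = y_true + 5_000                        # big constant error, perfect ordering
swapped = np.array([12_000, 10_000, 29_000])   # small errors, first two ranks flipped

for name, pred in [("biased", biased), ("swapped", swapped)]:
    mae = np.mean(np.abs(y_true - pred))
    rho, _ = spearmanr(y_true, pred)
    print(f"{name}: MAE={mae:,.0f}  Spearman={rho:.2f}")
```

The biased model has three times the MAE yet a perfect Spearman of 1.0; a constant bias can be corrected by a downstream policy rule, but a broken ranking cannot.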
4. Stability & Predictive Power Checks
For regression, we can still use:
Check | Purpose | Regression Adjustment |
---|---|---|
PSI (Population Stability Index) | Ensures feature distributions don’t shift drastically between development and monitoring periods. | Use binned feature values, not target values. |
CSI (Characteristic Stability Index) | Checks if relationship between feature & target is stable. | For continuous target, bin both feature & target and use mean target per bin for distribution. |
Feature Importance | Identify which variables drive credit limits most. | Use gain/cover in XGBoost. |
SHAP | Explain predictions to regulators & business. | Works identically for regression. |
5. Model Approval Reality
- If your RMSE, MAE, MAPE, and Spearman are within target ranges, and PSI/CSI show stability, you're in a strong position.
- Even if one metric is slightly weak, a model can still be approved if:
  - It beats the current champion model.
  - It's more stable over time.
  - It's more explainable and policy-compliant.
- Failing both accuracy and stability → high risk of rejection.