# Model Retraining Cycle
Fraud patterns evolve continuously. The ML detection models powering FraudShield AI must be retrained regularly so they remain calibrated against current fraud typologies. This page describes when to retrain, how the champion/challenger framework works, the training data requirements, and the deployment and rollback process.
## When to retrain
Retrain a detection model when any of the following conditions are met:
| Trigger | Indicator | Urgency |
|---|---|---|
| Model drift | Population Stability Index (PSI) > 0.25 for any input feature distribution, sustained over 4 consecutive weeks. | High |
| Performance degradation | AUC-ROC drops > 5 percentage points below the baseline established at last deployment. | High |
| New fraud typology | A confirmed new fraud pattern is not represented in current training data (identified via typology gap analysis). | Medium |
| Scheduled retraining | Standard annual retraining cycle, regardless of performance. Required for SR 11-7 model governance compliance. | Routine |
| Significant portfolio change | Acquisition, new product launch, or channel expansion that materially changes the transaction mix. | Medium |
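The PSI drift trigger can be computed directly from a baseline sample of a feature (taken at the last deployment) and a current sample. The following is a minimal numpy sketch; the 10-quantile-bin bucketing is a common convention, not something specified on this page:

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline (expected) and a
    current (actual) feature distribution. PSI > 0.25 sustained over
    4 consecutive weeks is the retraining trigger above."""
    expected, actual = np.asarray(expected), np.asarray(actual)
    # Bin edges from the baseline distribution's quantiles
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    e_idx = np.clip(np.searchsorted(edges, expected, side="right") - 1, 0, bins - 1)
    a_idx = np.clip(np.searchsorted(edges, actual, side="right") - 1, 0, bins - 1)
    e_pct = np.bincount(e_idx, minlength=bins) / len(expected)
    a_pct = np.bincount(a_idx, minlength=bins) / len(actual)
    # Floor the proportions to avoid log(0) on empty buckets
    e_pct = np.clip(e_pct, 1e-6, None)
    a_pct = np.clip(a_pct, 1e-6, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))
```

An unchanged distribution yields a PSI near zero; a mean shift of around 1.5 standard deviations pushes it well past the 0.25 trigger.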
## Champion / challenger framework

FraudShield AI uses a champion/challenger (C/C) framework for all model updates. The current production model is the champion; a newly trained model is the challenger. Both models score every transaction in parallel. The champion's decision governs, while the challenger runs in shadow mode: its score is logged without influencing the outcome.
```
Incoming transaction
          │
          ├──────────────────────────────────┐
          ▼                                  ▼
   CHAMPION model                     CHALLENGER model
   (production — decision governs)    (shadow mode — logged only)
          │                                  │
          ▼                                  ▼
   Decision: APPROVE /                Score logged to
   STEP-UP / BLOCK                    challenger_scores table
          │
          ▼
   Response returned to core banking
```

After the evaluation period (30–60 days):

```
┌─────────────────────────────────────────────────────┐
│ Compare challenger vs. champion across:             │
│   • AUC-ROC, precision, recall, F1                  │
│   • False positive rate by BTA and segment          │
│   • Confirmed fraud catch rate                      │
│   • Score stability (PSI on challenger scores)      │
└─────────────────────────────────────────────────────┘
                  │
        ┌─────────┴────────────────┐
        │ Challenger wins          │ Champion retained
        ▼                          ▼
  Promote challenger         Challenger retired;
  to champion; deploy        retrain with
  via blue/green             additional data
```
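The dual-scoring path above can be sketched as follows. This is an illustrative sketch only; the function, class, and field names (`score_transaction`, `ScoringResult`, `challenger_log`) are hypothetical, not FraudShield AI's actual API:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ScoringResult:
    decision: str                      # APPROVE / STEP-UP / BLOCK (champion's call)
    champion_score: float
    challenger_score: Optional[float]  # logged only, never acted on

def score_transaction(txn, champion, challenger, challenger_log):
    champion_score = champion.score(txn)
    decision = champion.decide(champion_score)   # champion governs the outcome
    challenger_score = None
    if challenger is not None:
        try:
            challenger_score = challenger.score(txn)
            # Shadow log only -- mirrors the challenger_scores table above
            challenger_log.append({"txn_id": txn["id"], "score": challenger_score})
        except Exception:
            pass  # a challenger failure must never block the decision path
    return ScoringResult(decision, champion_score, challenger_score)
```

The key design point is the `try`/`except`: because the challenger has no say in the outcome, any challenger error is swallowed and the transaction proceeds on the champion's decision alone.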
## Training data requirements

### Minimum data volumes

| Model | Minimum fraud labels | Minimum legitimate labels | Recommended observation window |
|---|---|---|---|
| MDL_WIRE_ATO | 2,000 | 200,000 | 24 months |
| MDL_ACH_FRAUD | 5,000 | 500,000 | 18 months |
| MDL_RTP_MULE | 1,500 | 150,000 | 12 months |
| MDL_CARD_CNP | 10,000 | 1,000,000 | 18 months |
| MDL_1PF_APPFRAUD | 3,000 | 300,000 | 24 months |
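The minimums in the table can be encoded as a simple pre-training gate. The model IDs and counts come from the table above; the helper itself (`meets_minimum_volumes`) is a hypothetical sketch:

```python
# (min fraud labels, min legitimate labels) per model, from the table above
MIN_VOLUMES = {
    "MDL_WIRE_ATO":     (2_000,    200_000),
    "MDL_ACH_FRAUD":    (5_000,    500_000),
    "MDL_RTP_MULE":     (1_500,    150_000),
    "MDL_CARD_CNP":     (10_000, 1_000_000),
    "MDL_1PF_APPFRAUD": (3_000,    300_000),
}

def meets_minimum_volumes(model_id: str, fraud_count: int, legit_count: int) -> bool:
    """True if the extracted training set satisfies both label minimums."""
    min_fraud, min_legit = MIN_VOLUMES[model_id]
    return fraud_count >= min_fraud and legit_count >= min_legit
```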
### Label quality requirements
- Fraud labels must be sourced from confirmed fraud dispositions in Case Manager, not from alert creation. Unreviewed alerts must not be included as fraud labels.
- Legitimate labels are sampled from non-alerted transactions using a stratified sampling strategy to ensure representation across all channels, BTAs, and customer segments.
- Labels must include a performance window of at least 90 days after the transaction date to allow charge-backs and late-reported fraud to be captured.
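The first and third requirements combine into a single eligibility test per transaction. A minimal sketch; the disposition string and function name are assumptions, not values from Case Manager:

```python
from datetime import date, timedelta

PERFORMANCE_WINDOW_DAYS = 90

def eligible_as_fraud_label(disposition: str, txn_date: date, as_of: date) -> bool:
    """A transaction qualifies as a fraud label only if Case Manager recorded
    a confirmed-fraud disposition (not merely an alert) AND at least 90 days
    have elapsed, so chargebacks and late-reported fraud are captured."""
    confirmed = disposition == "CONFIRMED_FRAUD"
    matured = (as_of - txn_date) >= timedelta(days=PERFORMANCE_WINDOW_DAYS)
    return confirmed and matured
```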
## Training pipeline

1. **Data extraction and labeling.** Extract labeled transactions from the data warehouse using the Model Training Extract job. Include all RI values calculated at the time of the transaction (stored in the RI audit log) so the model trains on the same feature values that real-time scoring uses.
2. **Feature engineering and validation.** Run the Feature Validation Report to check for data drift (PSI), missing-value rates, and feature correlation changes against the previous training dataset. Resolve any issues before training begins.
3. **Model training.** Train the challenger model using the ML Model Configuration Platform (MCP). Select the algorithm type (XGBoost, Neural Network, or Gradient Boosting), configure hyperparameters, and run k-fold cross-validation (k = 5) to evaluate out-of-sample performance.
4. **Back-test evaluation.** Evaluate the challenger model on a held-out test set (the most recent 3 months of data, excluded from training). Record AUC-ROC, KS statistic, precision at alert rate, and the Gini coefficient. All metrics must meet or exceed the thresholds in the Model Validation Policy.
5. **Model Risk Management review.** Submit the Model Validation Report to the Model Risk Management team. Approval is required before the challenger model can be deployed to shadow mode.
6. **Shadow-mode deployment and monitoring.** Deploy the challenger model in shadow mode for 30–60 days. Monitor the challenger's score distribution, FPR, FNR, and PSI weekly, and compare against champion performance using the C/C Dashboard in the Operations Console.
7. **Promotion decision.** After the evaluation period, the Model Owner presents the promotion recommendation to the Model Risk Management team. If approved, the challenger is promoted to champion via the blue/green deployment process.
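The back-test metrics named above (AUC-ROC, KS statistic, Gini) can be computed with plain numpy. A minimal sketch, assuming no tied scores; production validation would use a vetted metrics library rather than hand-rolled functions:

```python
import numpy as np

def auc_roc(y_true, scores):
    """AUC via the Mann-Whitney rank formulation (assumes no tied scores)."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    ranks = np.empty(len(scores))
    ranks[np.argsort(scores)] = np.arange(1, len(scores) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def ks_statistic(y_true, scores):
    """Maximum separation between the fraud and legitimate score CDFs."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    pos, neg = np.sort(scores[y_true == 1]), np.sort(scores[y_true == 0])
    thresholds = np.unique(scores)
    cdf_pos = np.searchsorted(pos, thresholds, side="right") / len(pos)
    cdf_neg = np.searchsorted(neg, thresholds, side="right") / len(neg)
    return float(np.max(np.abs(cdf_pos - cdf_neg)))

def gini(y_true, scores):
    """The Gini coefficient is a linear rescaling of AUC."""
    return 2 * auc_roc(y_true, scores) - 1
```

A perfectly separating model scores 1.0 on all three metrics; a random model scores AUC ≈ 0.5, KS ≈ 0, Gini ≈ 0.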
## Deployment and rollback
Model promotion uses a blue/green deployment: the new champion model is deployed alongside the current champion. Traffic is gradually shifted (10% → 50% → 100% over 48 hours) while monitoring alert volumes and score distributions. If an anomaly is detected, traffic is shifted back to the previous champion immediately.
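The staged traffic shift with anomaly-triggered rollback can be sketched as follows. The function names and the shape of the monitoring hook are illustrative assumptions; only the stage fractions (10% → 50% → 100%) come from this page:

```python
import random

RAMP_STAGES = [0.10, 0.50, 1.00]   # fraction of traffic on the new champion

def route(txn, new_model, old_model, new_fraction, rng=random.random):
    """Weighted routing during the ramp: score with the new champion with
    probability new_fraction, otherwise with the previous champion."""
    model = new_model if rng() < new_fraction else old_model
    return model.score(txn)

def ramp(monitor_ok, shift_traffic):
    """Walk through the ramp stages; on any anomaly, shift all traffic back
    to the previous champion and stop."""
    for fraction in RAMP_STAGES:
        shift_traffic(fraction)
        if not monitor_ok():      # e.g. alert-volume / score-distribution checks
            shift_traffic(0.0)    # immediate rollback to the previous champion
            return "ROLLED_BACK"
    return "PROMOTED"
```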
### Rollback procedure

1. In the Operations Console, go to Administration > Model Registry > Active Models.
2. Select the model to roll back and choose Revert to Previous Champion.
3. Confirm the rollback. Traffic shifts back to the previous champion within 60 seconds.
4. Open an incident ticket and notify the Model Risk Management team within 24 hours of the rollback.
5. Investigate the root cause before attempting re-deployment.