Threshold Tuning

Audience: Fraud Operations, Data Science
Last updated: March 2026
Version: 4.2

Threshold tuning is the process of calibrating score decision thresholds, Risk Indicator (RI) score scales, and RI weights so that the fraud detection system performs optimally for your institution's portfolio. Tuning balances the fraud catch rate (sensitivity) against the false positive rate — the two key performance indicators for any fraud detection system.

Tuning cadence

FraudShield AI requires a formal tuning review at least quarterly. Fraud patterns shift over time: a model tuned in Q1 may be over- or under-alerting by Q3 without adjustment. New product launches, channel expansions, or large-scale fraud events should trigger an out-of-cycle review.

Key tuning concepts

False Positive Rate (FPR)
The percentage of legitimate transactions incorrectly scored as HIGH or CRITICAL. High FPR increases customer friction and operational cost.
False Negative Rate (FNR)
The percentage of fraudulent transactions scored below the REVIEW threshold. High FNR means fraud passes through undetected.
Decision threshold
The composite score value at which the risk level changes (e.g., the boundary between MEDIUM and HIGH). Raising a threshold reduces FPR at the cost of higher FNR.
RI score scale
The mapping function that converts a raw RI value (e.g., transaction velocity = 12 transactions/hour) into a sub-score (0–100). Score scales can be numeric, Boolean, or string-based.
RI weight
The relative importance of an RI in the composite score calculation. Higher-weight RIs have more influence over the final score.
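The metrics and concepts above can be sketched in a few lines of Python. Note that FraudShield's actual composite-score formula is not documented here; the weight-normalized sum below is an illustrative assumption, not the product's published algorithm.

```python
# Illustrative only: the composite formula is an assumption (weight-normalized
# sum of 0-100 RI sub-scores), not FraudShield's documented algorithm.

def composite_score(sub_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-RI sub-scores into a single 0-100 composite score."""
    total_weight = sum(weights[ri] for ri in sub_scores)
    return sum(sub_scores[ri] * weights[ri] for ri in sub_scores) / total_weight

def fpr(false_positives: int, legitimate_total: int) -> float:
    """False positive rate: share of legitimate transactions alerted on."""
    return false_positives / legitimate_total

def fnr(false_negatives: int, fraud_total: int) -> float:
    """False negative rate: share of fraud that scored below the alerting threshold."""
    return false_negatives / fraud_total

# 40 of 10,000 legitimate transactions flagged HIGH/CRITICAL:
print(fpr(40, 10_000))   # 0.004 -> 0.4% FPR
```

Raising a decision threshold moves transactions out of the alerting bands, which lowers `fpr` but can only raise `fnr` — the trade-off tuning manages.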

RI score scale types

Every RI uses one of three score scale types to map its raw value to a 0–100 sub-score. Scale definitions live in ri-config.ini.

Numeric scale

Maps a numeric RI value to a score using range brackets. Use this for RIs with continuous numeric outputs (velocity counts, amount values, days elapsed).

ri-config.ini — numeric scale example (RI_VELOCITY_TRANSFER_1H)
[RI_VELOCITY_TRANSFER_1H]
scale_type = numeric
# Format: <upper_bound> → <sub-score>
scale_0_2   = 0     # 0–2 transactions/hour → score 0 (normal)
scale_3_5   = 20    # 3–5 → score 20 (slightly elevated)
scale_6_10  = 55    # 6–10 → score 55 (elevated)
scale_11_20 = 80    # 11–20 → score 80 (high)
scale_21_up = 100   # 21+ → score 100 (critical velocity)
weight = 1.4

Boolean scale

For binary RIs — a condition is either true or false. The scale assigns a fixed score when true and 0 when false.

ri-config.ini — boolean scale example (RI_TOR_EXIT_NODE)
[RI_TOR_EXIT_NODE]
scale_type = boolean
score_if_true = 95   # TOR exit node detected → very high risk
score_if_false = 0
weight = 1.8

String / categorical scale

Maps specific string values to scores. Use this for channel, country, or transaction type RIs where each category has a defined risk level.

ri-config.ini — string scale example (RI_PAYEE_HIGH_RISK_COUNTRY)
[RI_PAYEE_HIGH_RISK_COUNTRY]
scale_type = string
# ISO 3166-1 country codes mapped to risk scores
score_NG = 90        # Nigeria — FATF grey list
score_IR = 100       # Iran — sanctioned jurisdiction
score_RU = 85        # Russia — elevated risk
score_DEFAULT = 10   # All other countries — baseline
weight = 1.6
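Both the boolean and string scales reduce to simple lookups. This sketch echoes the config keys from the two examples above, but the evaluation engine itself is internal to FraudShield, so treat the function shapes as assumptions.

```python
# Boolean scale: fixed score when the condition holds, as in [RI_TOR_EXIT_NODE].
def boolean_sub_score(condition: bool, score_if_true: int, score_if_false: int = 0) -> int:
    return score_if_true if condition else score_if_false

# String scale: category lookup with a score_DEFAULT fallback,
# as in [RI_PAYEE_HIGH_RISK_COUNTRY].
COUNTRY_SCORES = {"NG": 90, "IR": 100, "RU": 85}
SCORE_DEFAULT = 10

def string_sub_score(value: str, score_map: dict[str, int], default: int) -> int:
    return score_map.get(value, default)

print(boolean_sub_score(True, 95))                             # 95
print(string_sub_score("CH", COUNTRY_SCORES, SCORE_DEFAULT))   # 10 (unlisted country)
```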

Tuning decision thresholds

Decision thresholds are set per BTA in threshold-config.yaml. Different payment types warrant different thresholds: a wire transfer to a new international payee carries inherently more risk than an ACH credit, so the HIGH threshold for WEB_WIRE_TRANSFER should be lower (more sensitive) than for ACH_CREDIT_BATCH.

threshold-config.yaml — example
thresholds:
  - bta_id: "WEB_WIRE_TRANSFER"
    low_max: 249
    medium_max: 499
    high_max: 699
    critical_min: 700
    decision_map:
      LOW: "APPROVE"
      MEDIUM: "APPROVE"
      HIGH: "STEP-UP"
      CRITICAL: "BLOCK"
  - bta_id: "ACH_CREDIT_BATCH"
    low_max: 349
    medium_max: 599
    high_max: 799
    critical_min: 800
    decision_map:
      LOW: "APPROVE"
      MEDIUM: "APPROVE"
      HIGH: "REVIEW"
      CRITICAL: "BLOCK"
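Applying a BTA's thresholds is a matter of banding the composite score and mapping the band to a decision. This sketch uses the WEB_WIRE_TRANSFER values from the YAML above; the field names follow the YAML keys, but the production evaluator's internals are not documented here.

```python
# Threshold set for one BTA, copied from threshold-config.yaml above.
WEB_WIRE_TRANSFER = {
    "low_max": 249,
    "medium_max": 499,
    "high_max": 699,
    "critical_min": 700,
    "decision_map": {"LOW": "APPROVE", "MEDIUM": "APPROVE",
                     "HIGH": "STEP-UP", "CRITICAL": "BLOCK"},
}

def risk_level(score: float, t: dict) -> str:
    """Band a composite score into LOW/MEDIUM/HIGH/CRITICAL."""
    if score <= t["low_max"]:
        return "LOW"
    if score <= t["medium_max"]:
        return "MEDIUM"
    if score <= t["high_max"]:
        return "HIGH"
    return "CRITICAL"

def decision(score: float, t: dict) -> str:
    """Map the banded risk level through the BTA's decision_map."""
    return t["decision_map"][risk_level(score, t)]

print(risk_level(620, WEB_WIRE_TRANSFER), decision(620, WEB_WIRE_TRANSFER))
# HIGH STEP-UP
```

Note how raising `high_max` from 699 to, say, 749 would move scores in the 700–749 band from CRITICAL/BLOCK down to HIGH/STEP-UP — the concrete mechanism behind the FPR/FNR trade-off.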

Quarterly tuning workflow

  1. Pull the Performance Report

    In the FraudShield Operations Console, go to Reports > Model Performance. Download the quarterly report for each active BTA. Key metrics: FPR, FNR, precision, recall, and alert volume trend.

  2. Identify out-of-range RIs

    Review the RI Contribution Report for RIs with consistently low sub-scores across fraud cases. These RIs may be over-weighted relative to their predictive value. Flag any RI where the average fraud-case sub-score is below 30.

  3. Run threshold simulation

    Use the Threshold Simulator (Operations Console > Tuning > Simulate) to replay the last 90 days of transactions against proposed new thresholds. Review the simulated FPR/FNR impact before changing production settings.

  4. Update configuration files

    Apply approved changes to ri-config.ini and threshold-config.yaml in the staging environment. Validate with a shadow-mode run (scoring without influencing decisions) for 5–10 business days.

  5. Promote to production

    Submit a change request and, once approved, deploy configuration files to production. Monitor alert volume, FPR, and FNR for the first 48 hours using the Operations Dashboard.

  6. Document the tuning rationale

    Record the business rationale, simulation results, and approvals in the Model Change Log. This forms part of the model governance audit trail required under SR 11-7 and CFPB model risk guidelines.
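The core of step 3's simulation can be illustrated offline: replay labeled historical scores against a candidate threshold and report the resulting FPR and FNR. The `(score, is_fraud)` tuples and the single-threshold view are simplifying assumptions; the console's Threshold Simulator works on the full per-BTA threshold set.

```python
# Minimal sketch of threshold simulation over labeled historical transactions.
# Assumes everything at or above the candidate HIGH threshold alerts.

def simulate(scored: list[tuple[int, bool]], high_threshold: int) -> tuple[float, float]:
    """Return (FPR, FNR) for a candidate alerting threshold."""
    legit = [s for s, is_fraud in scored if not is_fraud]
    fraud = [s for s, is_fraud in scored if is_fraud]
    fpr = sum(s >= high_threshold for s in legit) / len(legit)
    fnr = sum(s < high_threshold for s in fraud) / len(fraud)
    return fpr, fnr

history = [(120, False), (480, False), (560, False), (720, True), (650, True), (300, False)]
print(simulate(history, 500))   # (0.25, 0.0): 1 of 4 legit alerts, no fraud missed
```

Sweeping `high_threshold` across a range and plotting the two rates gives the trade-off curve the quarterly review is meant to evaluate.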

Population-based tuning

For RIs whose optimal thresholds vary significantly by customer segment (for example, high-net-worth clients vs. basic retail accounts), FraudShield AI supports population-scoped RI configurations. Set the population_scope parameter on any RI to apply a separate score scale for a defined customer segment.

ri-config.ini — population-scoped RI example
[RI_AMOUNT_SPIKE_3SD]
scale_type = numeric
weight = 1.5
# Default scale (all populations)
scale_0_1sd = 0
scale_1_2sd = 30
scale_2_3sd = 65
scale_3up   = 90

# Population override: High Net Worth (HNW) — wider baseline, lower scores
[RI_AMOUNT_SPIKE_3SD:segment=HNW]
scale_0_1sd = 0
scale_1_2sd = 10
scale_2_3sd = 30
scale_3up   = 60
Start with the default thresholds

For new deployments, run the default configuration for 30 days before making tuning changes. You need sufficient data — at least 1,000 scored transactions per BTA — for threshold simulation to produce statistically meaningful results.