Measuring the Unmeasurable
Cross-sectional analyses tentatively indicate that better FATF ratings are associated with the volume of SARs filed, recoveries, fines, and convictions. But how useful is that if we don’t know the true scale of illicit activities? Effectiveness scores, as measured by Immediate Outcomes (IOs) in the Basel AML Index, have stagnated or even declined according to the latest report [1]. Do increased recoveries signal improved effectiveness or merely indicate heightened criminal activity?
Why Current Metrics Are Insufficient
AML, CTF, and sanctions effectiveness measures, and by extension fraud measures, are misleading if we lack baseline data. Imagine testing a bridge by merely checking whether it can carry more weight each time, without knowing its maximum capacity. This is essentially how financial crime effectiveness is often evaluated.
More SARs filed? Higher recoveries? While superficially promising, these metrics can mislead. Criminals may intentionally flood the system with false positives, effectively launching a “DDoS attack” on financial institutions that overwhelms compliance teams and dilutes real threats. Alternatively, there may simply be more illicit activity, or system flaws may be driving the metrics.
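The baseline problem can be made concrete with a few lines of arithmetic. The sketch below uses entirely hypothetical figures (the recovery amounts, the illicit volumes, and the `detection_rate` helper are assumptions for illustration, not data from any report) to show that the same rise in recoveries is compatible with both an improving and a deteriorating system.

```python
def detection_rate(recovered: float, true_illicit: float) -> float:
    """Share of the true illicit volume that was actually recovered."""
    if true_illicit <= 0:
        raise ValueError("true_illicit must be positive")
    return recovered / true_illicit


# Hypothetical figures in arbitrary currency units, for illustration only.
recovered_last_year = 100.0
recovered_this_year = 150.0  # recoveries are up 50%

# Two worlds that look identical if we only observe recoveries.
# Each entry is (true illicit volume last year, true illicit volume this year).
scenarios = {
    "illicit volume flat": (1_000.0, 1_000.0),
    "illicit volume doubled": (1_000.0, 2_000.0),
}

for name, (volume_before, volume_now) in scenarios.items():
    before = detection_rate(recovered_last_year, volume_before)
    after = detection_rate(recovered_this_year, volume_now)
    print(f"{name}: {before:.1%} -> {after:.1%}")
```

In the first scenario the recovery rate rises from 10% to 15%; in the second it falls from 10% to 7.5%, even though the headline recovery figure is identical. Without an estimate of the denominator, the two cases cannot be told apart.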
The Need for Quantitative Validation
We need to rethink testing frameworks in AML/CTF compliance, adopting rigorous, quantitative validations similar to engineering standards. Mutual evaluations and FATF Immediate Outcomes (IOs) are valuable, but they need to be complemented by systematic, controlled tests to provide a complete picture of effectiveness.
Encouraging Regulatory Approaches
Some regulators, most recently the UK’s FCA [2], use data-driven thematic reviews and examinations as part of their AML supervision methodology. Their approach typically involves detailed onsite examinations, reviews of banks’ policies and procedures, transaction sampling, and assessments of how risk-based controls are applied in practice. However, despite these structured efforts, they still fall short of simulating comprehensive real-world challenges. Critical factors such as investigator expertise, procedural rigor in reviews, and the systems’ overall ability to extract actionable intelligence from transactions are hard to examine without looking at the interplay of systems and operational processes. Further, real-world criminals by nature use multiple financial institutions and non-financial market participants. Enhancing reviews such as those conducted by the FCA with more comprehensive, controlled real-world tests could significantly strengthen the effectiveness of fraud and AML compliance frameworks.
The Promise and Challenges of Red-Teaming
In cybersecurity, penetration testing and red-teaming have become standard practice. Financial crime prevention would greatly benefit from a similar approach. Real-time, controlled simulations could objectively measure how many simulated illicit transactions and entities are detected by AML frameworks, providing clear metrics of effectiveness. This concept was recognized by academics more than five years ago [3], underscoring its potential to significantly improve financial crime effectiveness testing.
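As a rough illustration of what such a metric could look like, the sketch below computes a detection rate per typology from a set of seeded test cases and attaches a Wilson score interval to each rate. Everything here is an assumption made for illustration: the `SeededCase` structure, the typology labels, and the results are made up, and a real exercise would need a far more careful design.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SeededCase:
    """One simulated illicit transaction or entity planted by the test team."""
    case_id: str
    typology: str   # hypothetical label, e.g. "structuring"
    detected: bool  # True if the controls raised an alert for this case


def wilson_interval(hits: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = hits / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def detection_summary(cases: list[SeededCase]) -> dict[str, tuple[int, int]]:
    """Return {typology: (detected count, seeded count)}."""
    counts: dict[str, tuple[int, int]] = {}
    for case in cases:
        hits, total = counts.get(case.typology, (0, 0))
        counts[case.typology] = (hits + int(case.detected), total + 1)
    return counts


# Made-up results: 8 of 10 structuring cases and 3 of 10 mule cases detected.
cases = (
    [SeededCase(f"S{i}", "structuring", i < 8) for i in range(10)]
    + [SeededCase(f"M{i}", "mule network", i < 3) for i in range(10)]
)

for typology, (hits, total) in detection_summary(cases).items():
    low, high = wilson_interval(hits, total)
    print(f"{typology}: {hits}/{total} detected, "
          f"95% interval {low:.0%} to {high:.0%}")
```

With only ten seeded cases per typology the intervals are wide, which hints at how many simulated cases a test would need before its detection rate becomes a dependable benchmark.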
Unfortunately, authentic red-teaming in financial crime compliance faces significant legal barriers. Activities such as creating false identities, executing illicit transactions, or facilitating unauthorized cross-border movements of funds involving multiple institutions directly violate laws like the Bank Secrecy Act (BSA) in the United States, the Proceeds of Crime Act (POCA) in the UK, and similar AML regulations globally. Additionally, such acts may fall under criminal conspiracy statutes or anti-money laundering provisions, creating severe legal risks for institutions and individuals involved. Given the evolution of legal frameworks, one consideration might be to evaluate whether law enforcement or competent authorities would have a sufficient mandate for such testing.
Bridging the Gap: Measurable Progress Needed
To genuinely assess AML effectiveness, we must understand what should be detected and rigorously measure how well systems detect it. Recent data [4] underscore the complexity. Recoveries, a seemingly solid indicator, remain difficult to benchmark against unknown actual illicit flows.
Cross-sectional data indeed suggests better FATF scores align with improved Immediate Outcomes. Yet, recent global trends reported in the Basel AML Index briefing indicate stagnation or even decline in effectiveness scores worldwide [1].
Towards a Quantitative Testing Framework
The financial crime compliance community can and should embrace rigorous, quantitative testing alongside current evaluative approaches. Controlled simulations and automated, AI-agent-enabled tests can swiftly highlight vulnerabilities and strengths within AML frameworks, paving the way for clearer insights and stronger defenses. Once legal challenges are resolved, red-teaming could form a central pillar of this approach.
Assessing Legal Obstacles and Exemption Possibilities
Given the global nature of the problem, and as this is far from a comprehensive legal review, the following should serve only as an indication of the legal challenges for red-teaming activities and, similarly, highlight possible exemption ideas. The hope is that this prompts further research and a broader discussion of whether some of the cyber-crime good-faith provisions can be applied to financial crime research.
- Criminal offenses for fraud, money laundering, and identity misuse, e.g., under AMLD, eIDAS, POCA, and various provisions in US Code Title 18, would need to be covered by exemptions, such as those that allow Confidential Informants to engage in Otherwise Illegal Activities as per the FBI MIOG. In the future, financial crime research might be treated similarly to “good-faith security research” under the CFAA for computer fraud, as such research would, in most circumstances, uncover software configuration vulnerabilities in financial crime detection systems.
- Aiding and abetting provisions under US Title 18, the UK Accessories and Abettors Act, and various EU member state laws would, without exemption, prevent working with lawyers, accountants, and advisors in a cross-border setting for the purpose of AML and fraud red-teaming. Exemptions such as the above would need to be extended to the individuals involved, ideally without their knowledge, to reduce the risk of tipping off the control owners being tested.
- No-tipping-off provisions apply. However, treating any SARs raised as a result of red-team testing as legitimate SARs, and perhaps informing only the FIU about the ongoing testing, might avoid violating these provisions.
- Many red-teaming activities might constitute money transfer agent activity and would need licensing, or else would violate licensing requirements. It would be impractical for the purpose of AML red-team testing to obtain a public license. There are provisions in Canadian and EU FTA regulation that would allow exemptions, even though it is not clear whether these would be granted for the purpose of a red-team exercise.
- Data protection and employment laws may present a challenge in some jurisdictions, but in many circumstances these challenges may be avoided through clever setup of the red-team activities, using identities that have waived GDPR rights, are under law-enforcement control, or meet the legitimate-interest basis for legally obligated controls testing.
- Commercial agreements often include clauses that prohibit the use of systems and services for illicit activities. Such contracts might need to be amended for the purpose of red-teaming to include waivers for the limited scope of a red-team exercise.
Conclusion
While live red-teaming in most circumstances may still be legally challenging, the involvement of law enforcement or specific exemptions such as those available in the US show a path forward, especially when red-teaming can be performed within national borders.
The ability to assess a country’s AML and fraud prevention readiness through such a quantitative testing approach is likely to uncover several layers of improvement, ranging from necessary policy change to technical cooperation mechanisms between obligated entities.
If you’re exploring similar ideas or developing innovative approaches to AML effectiveness enhancement and testing, we’d love to connect and exchange ideas while we are collecting further feedback on such an approach.