From black box to decision support

Artificial intelligence opens new possibilities for compliance – while raising new questions. How mature is the technology really, and where are its limits?

Compliance teams face mounting pressure to digitize and improve efficiency. Yet uncertainty persists about the technology’s actual maturity. Many find themselves caught in what we might call the “messy middle” – between hype and reality, between strategic ambition and operational feasibility. AI still feels like a black box to most: difficult to understand, technically complex, and opaque in how it works.

Technological innovation in compliance is not new. Rule-based systems entered the field in the 2010s, monitoring transactions and managing policies. But since generative AI models gained ground, the landscape has fundamentally shifted. These systems are not just faster – they are more flexible, and they demand new approaches to governance, control, and expertise.

The EQS AI Benchmark Report is the first to deliver reliable data on how leading AI models perform in real compliance environments. The results indicate that AI can make substantial contributions to compliance work – provided it is deployed strategically, with clearly defined tasks.

What AI can do today: Low error rates, high consistency

The benchmark tested models across 120 realistic tasks spanning ten compliance areas. Performance was strongest in structured, specific activities: classification, prioritization, and data extraction. The hallucination rate – cases where models generate factually incorrect or fabricated content – averaged just 0.71% across all models. While hallucination rates have been dropping for newer models overall, the benchmark demonstrates something more: with well-defined tasks and thoughtful prompts, this risk can be minimized further.

Hallucinations pose significant risks in compliance processes. That makes it even more noteworthy that the tested models performed correctly in nearly all cases. One example: Claude Opus 4.1 was asked to identify whistleblower reports. It correctly flagged all relevant cases of potential retaliation, including a borderline scenario where an employee was transferred to another region after filing a report.
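
To make the idea of a well-defined task concrete, here is a minimal Python sketch of such a classification step. The prompt, the labels, and the call_model function are illustrative assumptions, not the benchmark’s actual setup:

# Narrowly scoped classification in the spirit of the whistleblower example.
# call_model is a hypothetical stand-in for any LLM API call.

ALLOWED_LABELS = {"retaliation", "no_retaliation", "needs_human_review"}

PROMPT_TEMPLATE = """You are assisting a compliance review.
Classify the following whistleblower report strictly as one of:
retaliation, no_retaliation, needs_human_review.
Reply with the label only, nothing else.

Report:
{report}
"""

def classify_report(report: str, call_model) -> str:
    """Classify one report; route anything unexpected to a human."""
    answer = call_model(PROMPT_TEMPLATE.format(report=report)).strip().lower()
    # A constrained output space makes stray or hallucinated free text easy
    # to detect: any answer outside the allowed labels goes to a reviewer.
    return answer if answer in ALLOWED_LABELS else "needs_human_review"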

Consistency was equally high. In repeat tests, models provided identical answers to multiple-choice questions over 95% of the time. This indicates strong stability for clearly defined tasks.
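
The repeat tests can be pictured as a small harness like the following Python sketch. The benchmark’s exact protocol is not detailed here; the number of runs, the call_model callable, and the agreement metric are assumptions for illustration:

from collections import Counter

def consistency_rate(question: str, call_model, runs: int = 20) -> float:
    """Share of runs that agree with the most frequent answer."""
    answers = [call_model(question).strip() for _ in range(runs)]
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / runs

A rate above 0.95 on a clearly defined multiple-choice task would match the stability reported above.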

Third-party due diligence provides a clear example. AI agents can already analyze screening reports and identify risks with high precision and speed. Models also perform convincingly in initial reviews of reports and risk classification.

One benchmark example simulated a complete conflict of interest workflow – from categorization through risk assessment to selecting appropriate countermeasures. The top-performing models handled this process largely autonomously. However, accuracy dropped to around 70% for certain steps – indicating that human checkpoints remain essential.
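
The control flow behind such a checkpoint might look like the sketch below. The step names, the call_step callable, and the confidence threshold are hypothetical; the point is that low-confidence steps are escalated to a reviewer instead of being executed automatically:

def run_coi_workflow(case: dict, call_step, threshold: float = 0.9) -> dict:
    """Run the conflict-of-interest steps, pausing where confidence is low."""
    results = {}
    for step in ("categorize", "assess_risk", "select_countermeasures"):
        # call_step is assumed to return the model's answer plus a
        # confidence estimate for this step.
        answer, confidence = call_step(step, case, results)
        if confidence < threshold:
            # Human checkpoint: keep the draft, but hand the decision over.
            results[step] = {"status": "needs_human_review", "draft": answer}
        else:
            results[step] = {"status": "auto", "answer": answer}
    return results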

Limits of automation: Where AI needs support

The results also reveal clear boundaries. While models consistently deliver high accuracy on structured tasks, performance varies considerably on open-ended, complex questions. These include assessing cultural risks and deriving recommended actions – tasks that require substantial contextual understanding and judgment. The tested systems performed noticeably weaker here than on clearly defined tasks.

Significant differences emerged between individual models. Google Gemini 2.5 Pro and GPT-5 led in nearly all categories, while older models like GPT-4o or Mistral Large 2 lagged behind by more than 60 percentage points in some cases – particularly on analytical and interpretive tasks.

These findings underscore two points. First, human oversight remains indispensable: AI can support compliance teams but cannot replace them. Second, choosing the right model – and ideally a current one – is critical to unlocking the technology’s potential.

The new role of compliance teams

That technological progress changes the role of compliance professionals is not entirely new. In the future, however, this role will increasingly demand skills specific to working with AI systems: defining tasks and prompts, reviewing results, and providing feedback. Day to day, this means operational compliance work will recede while governance grows in importance.

Some organizations are already developing hybrid role profiles where compliance experts with a technological grounding work closely with data scientists and IT. In the long term, we expect this to become the standard in many compliance teams, as cross-functional collaboration is the prerequisite for using AI effectively and responsibly.

Between progress and duty: AI in the regulatory framework

The more AI becomes part of decision-making processes, the more important ethical responsibility for its use will become. Transparency, traceability, and documentation are not just ethical requirements; they are regulatory imperatives.

With the AI Act, the EU takes a more comprehensive regulatory approach than the US, where rules vary by state. If an AI-supported decision to reject a business partner is not documented traceably, it can create substantial problems in legal disputes – especially when discriminatory patterns have not been ruled out.

What does this mean for compliance teams? AI use presents a dual challenge. Compliance professionals are both users and overseers: responsible for deploying the technology sensibly while managing its risks. They must capture efficiency gains without losing sight of regulatory and ethical standards.

Between experiment and strategy: Shaping AI use deliberately

Beyond answering where AI can be deployed effectively in compliance today and where human control remains necessary, the AI Benchmark Report delivers clear recommendations for compliance teams working with AI:

  • Launch pilot projects: Start with clearly bounded, structured tasks. This allows early wins and builds experience.
  • Select models strategically: One size does not fit all. Test different models on your own use cases and leverage their specific strengths (see the sketch after this list).
  • Evaluate vendors critically: Which models do they use? Are they up to date, and how is quality assured? Transparency is mandatory.
  • Communicate with nuance: Emphasize AI’s capabilities, but don’t hide its limits. Human control remains essential.
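
As a sketch of what testing models on your own use cases can look like: the small Python harness below scores each candidate on a labeled sample of your own tasks. The callable interface and the exact-match scoring are simplifying assumptions:

def score_models(models: dict, labeled_cases: list) -> dict:
    """Per-model accuracy on (prompt, expected_answer) pairs."""
    scores = {}
    for name, call_model in models.items():
        hits = sum(
            call_model(prompt).strip() == expected
            for prompt, expected in labeled_cases
        )
        scores[name] = hits / len(labeled_cases)
    return scores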

Step by step, AI is becoming an integral part of compliance work. Its role will continue to evolve in the coming years, from individual tools to integrated, agent-based systems that support entire workflows and prepare decisions. Control mechanisms and human oversight remain prerequisites.

The central challenge persists: connecting technology and expertise meaningfully. The Benchmark Report provides a reliable foundation for doing so. It now falls to compliance professionals to translate these insights into practice.

Author

Moritz Homann

EQS Group, Munich
Director Product Innovation & Artificial Intelligence

moritz.homann@eqs.com
www.eqs.com