The European Union’s AI Act is coming. Its provisions will also have an impact on internal investigations, in particular on eDiscovery measures, so it is important to assess which aspects must already be considered now.
eDiscovery for tracking down critical data and facts
eDiscovery in the context of internal or legal investigations often puts a strain on business processes. To tackle the massive volumes of data generated by digital information and communication technologies in the complex eDiscovery process, organizations often need support from external counsel and specialized service providers, especially for large-scale projects and frequently under significant time pressure.
Since 2005, the Electronic Discovery Reference Model (EDRM – see figure 1) has provided a standard framework for the activities in the eDiscovery process, helping to better coordinate the necessary activities at their interfaces and to create a common understanding. Although the model has a linear structure, the steps are often carried out iteratively in projects. Advanced software solutions are available for the individual steps; they either cover the process end-to-end or can be combined in a meaningful way.
Using artificial intelligence for eDiscovery
The real challenge initially lies in identifying the factually relevant source data within an organization’s enormous volume of distributed data, which is usually held in a wide variety of formats, across mountains of files and business documents. Even with good scoping, the amount of data remains enormous – not least because organizations store more and more data as a result of the digitalization of numerous business processes.
Gaining a quick overview of the facts, making an initial assessment of the case and reducing the data set to only what is relevant are therefore the keys to success. In this context, success also means a cost-effective and timely approach that does not compromise quality. Artificial intelligence methods have been playing a key role in eDiscovery for some time to achieve this goal. These AI approaches all aim to reduce manual human effort and to achieve faster and possibly better results by automating tasks. Some typical approaches are:
- Intelligent Character Recognition and Translation: When text documents are digitized using optical character recognition (OCR) to generate fully searchable text files, methods of context analysis (intelligent character recognition, ICR) are now used to improve the quality of the results and reduce manual work.
- Early case assessment: Use of unsupervised learning approaches. The software clusters and visualizes the content of texts without human intervention, allowing an organization to quickly identify relevant persons, topics and dependencies and to derive search terms for the review (a minimal sketch follows after this list).
- Accelerating the review with Technology Assisted Review (TAR): In this variant of supervised learning, the algorithm is provided with a training set of documents for initial training. The solution then learns via manual relevance feedback, so it becomes increasingly able to automatically distinguish relevant from non-relevant documents. This allows the review to be prioritized and/or documents not relevant to the investigation to be excluded with a defined confidence level (see the second sketch after this list).
- Entity Based Recognition: Automated recognition of defined data objects without prior provision of specific object lists, e.g., to identify e-mail addresses, names, etc. This is particularly relevant for redactions required under data protection law (see the third sketch after this list).
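To illustrate the unsupervised learning approach used for early case assessment, the following minimal sketch clusters document texts and prints the most characteristic terms per cluster as candidate search terms. The library (scikit-learn), sample texts and parameters are illustrative assumptions, not a description of any specific eDiscovery product.

```python
# Minimal sketch: unsupervised clustering for early case assessment.
# scikit-learn is assumed to be installed; texts and cluster count are
# illustrative placeholders, not real case data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = [
    "Escalation regarding the disputed supplier invoice and payment approval",
    "Minutes of the meeting on the supplier contract renewal and discounts",
    "Reminder: annual compliance training is due next month",
    "Cafeteria menu and parking arrangements for the summer party",
]

# Represent each document as a TF-IDF vector
vectorizer = TfidfVectorizer(stop_words="english")
X = vectorizer.fit_transform(documents)

# Group similar documents without human labels
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Print the most characteristic terms per cluster as candidate search terms
terms = vectorizer.get_feature_names_out()
for cluster_id, centroid in enumerate(kmeans.cluster_centers_):
    top_terms = [terms[i] for i in centroid.argsort()[::-1][:5]]
    print(f"Cluster {cluster_id}: {top_terms}")
```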
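The supervised TAR approach can be sketched in a similarly reduced form: a classifier is trained on reviewer relevance decisions for a seed set and then scores unreviewed documents so that likely relevant material is reviewed first. Again, the library, sample data and cut-off logic are assumptions for illustration only; real TAR workflows involve iterative training and statistical validation.

```python
# Minimal sketch: Technology Assisted Review as supervised relevance scoring.
# Labels come from reviewer feedback on a seed set; in practice this loop is
# repeated iteratively as more relevance decisions become available.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

seed_docs = [
    "Email chain about approving the disputed payment to the vendor",
    "Invitation to the quarterly all-hands meeting",
    "Draft amendment to the vendor contract discussed with legal",
    "Office closure notice for the public holiday",
]
seed_labels = [1, 0, 1, 0]  # 1 = relevant, 0 = not relevant (reviewer decisions)

unreviewed_docs = [
    "Follow-up questions on the vendor payment approval chain",
    "Updated cafeteria opening hours",
]

vectorizer = TfidfVectorizer(stop_words="english")
classifier = LogisticRegression().fit(vectorizer.fit_transform(seed_docs), seed_labels)

# Score unreviewed documents; high scores are reviewed first, while documents
# below a validated cut-off may be set aside with a defined confidence level.
scores = classifier.predict_proba(vectorizer.transform(unreviewed_docs))[:, 1]
for doc, score in sorted(zip(unreviewed_docs, scores), key=lambda pair: -pair[1]):
    print(f"{score:.2f}  {doc}")
```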
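Entity-based recognition can likewise be illustrated with a small sketch that combines a statistical named-entity model with a simple pattern for e-mail addresses. spaCy and its small English model are assumptions here; production redaction workflows rely on considerably more robust pipelines.

```python
# Minimal sketch: identifying redaction candidates (names, e-mail addresses).
# Assumes spaCy and its small English model are installed:
#   pip install spacy && python -m spacy download en_core_web_sm
import re
import spacy

nlp = spacy.load("en_core_web_sm")
text = "Please ask Jane Miller to reply to jane.miller@example.com by Friday."

# Person names via the statistical named-entity recognizer
names = [ent.text for ent in nlp(text).ents if ent.label_ == "PERSON"]

# E-mail addresses via a simple, deliberately non-exhaustive pattern
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)

print("Redaction candidates:", names + emails)
```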
The rise of Large Language Models (LLMs) is significantly changing the situation:
- Generative AI solutions allow a different way of interacting with large volumes of documents, for example for document summarization, responsiveness assessment, identification of personal data and initial contract review (see the illustrative sketch after this list).
- The “Google-style” interrogation of large volumes of structured and unstructured data via prompting can lead to faster and more accurate findings than classic human review or conventional AI approaches.
- One major challenge – due to the sensitive nature of the documents – is using LLMs in a secure environment. Deloitte has developed NavigAite for this purpose, which enables a further assisted review via an interface (API) with the review platform and allows the use of different LLMs.
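By way of illustration only – and not as a description of NavigAite or any specific review platform – the following sketch shows how an LLM might be prompted to assess the responsiveness of a single document. The client library, model name and prompt are assumptions; in a real investigation such calls would be routed through a secured environment and orchestrated via the review platform’s API.

```python
# Minimal, illustrative sketch of LLM-assisted responsiveness assessment.
# The OpenAI Python client and the model name are assumptions; in practice,
# calls would run inside a secured, access-controlled environment.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

document_text = "Email chain discussing approval of the disputed vendor payment ..."
issue = "payments to vendor X approved outside the standard process"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[
        {"role": "system",
         "content": "You are assisting a document review. Answer with "
                    "'responsive' or 'not responsive' and one short reason."},
        {"role": "user",
         "content": f"Issue under investigation: {issue}\n\nDocument:\n{document_text}"},
    ],
    temperature=0,
)
print(response.choices[0].message.content)
```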
The EU AI Act – European framework for artificial intelligence
All uses of AI for the purposes of eDiscovery will very soon have to comply with the provisions of the EU Artificial Intelligence Act. The AI Act contains obligations that apply in addition to the requirements of the EU General Data Protection Regulation, which remains unaffected by the AI Act.
The AI Act, which the European Commission considers to be the first-ever comprehensive legal framework on Artificial Intelligence worldwide, will enter into force 20 days after its publication in the Official Journal, most likely in June or July 2024. Some of its provisions will already apply six months after its entry into force, while the remaining provisions will apply 12, 24 and 36 months after entry into force. Since preparations for compliance can take a significant amount of time, preparation for the AI Act should start as early as possible. Fines for non-compliance with the AI Act can reach up to 7% of an undertaking’s total worldwide turnover for the preceding financial year or €35 million, whichever is higher.
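The “whichever is higher” rule can be illustrated with a trivial calculation; the turnover figure below is, of course, just an example.

```python
# Illustrative only: maximum fine under the "whichever is higher" rule
turnover_preceding_year = 2_000_000_000  # example: EUR 2 bn worldwide turnover
max_fine = max(0.07 * turnover_preceding_year, 35_000_000)
print(f"Maximum fine: EUR {max_fine:,.0f}")  # EUR 140,000,000 in this example
```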
The scope of the AI Act is broad: the term “AI system” covers any kind of machine-based system that is designed to operate with a certain degree of autonomy, that may exhibit adaptiveness after deployment, and that, for explicit or implicit objectives, infers, from the input it receives, how to generate outputs such as predictions, content, recommendations or decisions that can influence physical or virtual environments. While this definition does not include systems that are based on rules defined solely by natural persons to automatically execute operations, it does include machine learning approaches as well as logic- and knowledge-based approaches. In addition, there are specific provisions for “general-purpose AI models”, e.g., large language models (LLMs).
The obligations set by the AI Act are based on different categories of risk: certain AI practices are considered to create unacceptable risks and are therefore prohibited. High-risk AI systems are subject to detailed requirements, while other AI systems are only subject to certain transparency obligations, or to no obligations at all if they pose only minimal or no relevant risks.
eDiscovery best practices and the AI Act
In light of the broad scope of the AI Act, classifying typical eDiscovery solutions will become a crucial step to ensure compliance. Implementing acts and technical standards will provide further clarifications, but some general themes and considerations can already be developed:
- First, all actors will need to ensure clarity as to whether they qualify as a “provider” of an AI system or merely as a user, which the AI Act refers to as a “deployer”. “Provider” has a broad definition and includes organizations that develop AI systems, or have them developed, and place them on the market or put them into service under their own name, whether for payment or free of charge. Under the AI Act, providers have significantly more obligations than deployers, which are defined as organizations using an AI system under their authority (except where the AI system is used in the course of a personal, non-professional activity). It is important to remember that deployers can become providers, e.g., if they make substantial modifications to an AI system or if they modify its intended purpose.
- Article 5 (Prohibited AI Practices) must be kept in mind, but it seems unlikely that good practices of eDiscovery will reach the relevant thresholds. One of the situations addressed in Article 5 is “risk assessments of natural persons in order to predict the risk of a natural person committing a criminal offence”. However, these would have to occur “based solely on the profiling of a natural person or on assessing their personality traits and characteristics”, which would not be in line with current eDiscovery best practices.
Similarly, Article 5(1)(f) prohibits the putting into service of certain forms of AI systems to infer the emotions of a natural person in the area of the workplace. Consequently, sentiment analysis with AI systems might require a more detailed analysis based on the specific circumstances.
- There are also a number of situations where the requirements of the AI Act for high-risk AI systems may require a more detailed analysis. Among the items on the list in Annex III is the category “Employment, workers management and access to self-employment”, which includes AI systems “intended to be used to make decisions affecting terms of work-related relationships, the […] termination of work-related contractual relationships, […] or to monitor and evaluate the performance and behaviour of persons in such relationships”. While this appears, at least at first sight, to be potentially relevant to investigations into, e.g., the conduct of individual employees, the AI system in question would, under current best practices, not be “making decisions”: it would primarily assist reviewers, e.g., by prioritizing documents for fact-finding or identifying communication networks, and support inside and outside counsel in their decision-making, e.g., in the field of early case assessment.
Article 6(3) of the AI Act makes clear that AI systems listed in Annex III may still be considered not to be high-risk if they do not pose a significant risk of harm, including by not materially influencing the outcome of decision-making, because the AI system, e.g., performs a narrow procedural task or just performs a preparatory task. At the same time, profiling of natural persons by an AI system would remain high-risk. Providers would have to document this assessment.
The European Commission will provide guidelines specifying the practical implementation of these requirements, including a comprehensive list of practical examples of use cases that are high-risk and not high-risk (see Article 6(5)).
- Providers of general-purpose AI models have additional documentation obligations and must provide detailed information about the models to providers that integrate the general-purpose AI model into an AI system.
Recommended steps to take now – short-term action required
Organizations should immediately start creating inventories of the AI systems which they provide and/or deploy. This kind of proactive, systematic approach will be helpful for all subsequent steps taken to comply with the AI Act.
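As a starting point, such an inventory can be very lightweight. The following sketch shows the kind of information an inventory entry might capture; the field names are assumptions and would need to be adapted to the organization’s governance framework.

```python
# Minimal sketch of an AI-system inventory entry; field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class AISystemInventoryEntry:
    name: str                      # e.g., the review platform module
    vendor_or_internal: str        # who develops / supplies the system
    role_under_ai_act: str         # "provider" or "deployer" (to be assessed)
    intended_purpose: str          # documented purpose of the system
    use_cases: list[str] = field(default_factory=list)
    risk_classification: str = "to be assessed"  # prohibited / high-risk / transparency / minimal

entry = AISystemInventoryEntry(
    name="TAR module of review platform",
    vendor_or_internal="external vendor",
    role_under_ai_act="deployer",
    intended_purpose="prioritization of documents for human review",
    use_cases=["internal investigation X"],
)
print(entry)
```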
The most urgent assessment needs to address prohibited AI practices: the provisions of the AI Act addressing this topic will start to have legal effect in Q4/2024, so organizations need to ensure compliance quickly.
As discussed above, some best practices in eDiscovery may require a more detailed analysis, as they might qualify as high-risk AI systems under Chapter III of the AI Act (see figure 2). The European Commission will publish additional guidance, including specific use cases, but organizations should not wait for these. In the context of eDiscovery, the challenges in this category appear to be manageable with appropriate use of AI systems and documentation.
Transparency obligations, e.g., for chatbots, will apply even to systems that are not high-risk, and organizations should also evaluate which measures they need to initiate in this area.
Author
Dr. Martin Braun
WilmerHale Frankfurt/Main, Brussels
Partner
martin.braun@wilmerhale.com
www.wilmerhale.com
Author
Dr. Marcus Pauli
Deloitte Wirtschaftsprüfungsgesellschaft, Munich
Partner
Author
Helmut Brechtken
Deloitte Wirtschaftsprüfungsgesellschaft, Cologne
Partner