Unleashing the Power of Large Language Models for Legal Applications

Dell Zhang, Thomson Reuters Labs, London, UK, dell.z@ieee.org
Alina Petrova, Thomson Reuters Labs, London, UK, alina.petrova@thomsonreuters.com
Dietrich Trautmann, Thomson Reuters Labs, Zug, Switzerland, dietrich.trautmann@thomsonreuters.com
Frank Schilder, Thomson Reuters Labs, Eagan, Minnesota, USA, frank.schilder@thomsonreuters.com

ABSTRACT
The use of Large Language Models (LLMs) is revolutionizing the legal industry. In this technical talk, we would like to explore the various use cases of LLMs in legal tasks, discuss the best practices, investigate the available resources, examine the ethical concerns, and suggest promising research directions.

CCS CONCEPTS
• Information systems → Information retrieval; • Computing methodologies → Natural language processing; Neural networks; • Applied computing → Law.

KEYWORDS
legal data mining, legal information retrieval, legal natural language processing, legal knowledge management, large language models

ACM Reference Format:
Dell Zhang, Alina Petrova, Dietrich Trautmann, and Frank Schilder. 2023. Unleashing the Power of Large Language Models for Legal Applications. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management (CIKM '23), October 21–25, 2023, Birmingham, United Kingdom. ACM, New York, NY, USA, 2 pages. https://doi.org/10.1145/3583780.3615993

1 USE CASES
The legal industry is currently undergoing a significant transformation due to the use of Large Language Models (LLMs) [1]. GPT-4, OpenAI's latest LLM, has stunned lawyers by passing the Uniform Bar Exam with 90th-percentile overall performance. The potential of such LLMs to understand and generate professional legal text is being harnessed to automate routine legal tasks, thereby increasing efficiency and reducing costs [2].
Thomson Reuters, for instance, is leveraging these models to provide advanced solutions to a wide variety of use cases including legal search and research (Westlaw Precision and Practical Law), legal document review and summary (HighQ and Document Intelligence), legal contract drafting (plugin for Microsoft 365 Copilot), and so on.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). CIKM '23, October 21–25, 2023, Birmingham, United Kingdom. © 2023 Copyright held by the owner/author(s). ACM ISBN 979-8-4007-0124-5/23/10. https://doi.org/10.1145/3583780.3615993

2 BEST PRACTICES
2.1 Legal Prompting
A key factor in the success of LLMs is how they are prompted. A well-designed sequence of prompts can steer an LLM towards a specific legal context and objective, while generating legal text with higher relevance and accuracy. Chain-of-Thought (CoT) and Tree-of-Thought (ToT) prompting are particularly important for legal LLMs, as legal documents are usually long and legal tasks often require multiple steps. Here we will explain our investigations into legal prompt engineering techniques for multilingual legal judgment prediction [3], long legal document classification^1, and complex legal reasoning [4, 5]. For example, one of the key findings is that the best legal reasoning performance on the COLIEE entailment task is achieved with prompts derived from a specific logical structure such as IRAC (Issue, Rule, Application, Conclusion), mimicking a lawyer's thinking process.

2.2 Legal Embedding
The legal domain demands that the information presented by an LLM is verifiable and supported by authoritative sources.
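The IRAC-structured prompting described in Section 2.1 can be sketched as follows. This is a minimal illustration only: the field wording, function name, and the statute/question strings are invented for this sketch, not the exact prompts used in the cited experiments.

```python
# A minimal sketch of an IRAC-structured Chain-of-Thought prompt for a
# COLIEE-style entailment task (does a statute article entail a yes/no
# answer to a legal question?). All wording here is illustrative.

def build_irac_prompt(article: str, question: str) -> str:
    """Assemble a prompt that walks the model through the IRAC
    (Issue, Rule, Application, Conclusion) reasoning structure."""
    return (
        "You are a legal expert. Reason step by step using IRAC.\n\n"
        f"Statute article:\n{article}\n\n"
        f"Question:\n{question}\n\n"
        "Issue: Restate the legal issue raised by the question.\n"
        "Rule: Identify the rule stated in the statute article.\n"
        "Application: Apply the rule to the facts of the question.\n"
        "Conclusion: Answer YES (entailed) or NO (not entailed).\n"
    )

prompt = build_irac_prompt(
    article="A contract concluded by a minor may be rescinded.",
    question="Can a contract made by a 16-year-old be rescinded?",
)
print(prompt)
```

The point of the template is that each IRAC field forces an explicit intermediate reasoning step, mirroring how a lawyer would structure the analysis before committing to a conclusion.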
Although "hallucination" is arguably an inherent problem of GPT-style generative LLMs, it could be greatly alleviated by grounding LLM outputs in trustworthy legal content such as the Westlaw Precision or Practical Law documents that are meticulously curated by our Subject Matter Experts (SMEs). In other words, an LLM specifically designed for the legal field should not just provide answers, but also include hyperlinks to the sources it cites. This can be achieved by adopting the Retrieval-Augmented Generation (RAG) framework, where an LLM is augmented with a vector database as its external long-term memory storing the embeddings of legal text nuggets. Here we will discuss the technical design choices for the computation of embeddings. As the legal domain often relies on analogy to craft compelling legal arguments, the ability to perform embedding-based semantic search (to locate similar cases etc.) is especially crucial to the utilization of LLMs in many legal workflows.

^1 https://www.swisstext.org/programme/#
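The RAG setup described above, retrieving embedded legal text nuggets and citing their sources, can be sketched as follows. Everything here is a toy stand-in: the bag-of-words `embed` function substitutes for a learned embedding model, the in-memory list substitutes for a vector database, and the snippet texts and source URLs are invented for illustration.

```python
import math
from collections import Counter

# Toy stand-in for a neural text-embedding model: a bag-of-words vector.
# A production RAG system would use a learned embedding model and a
# vector database instead of this in-memory index.
def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Hypothetical curated legal snippets, each carrying a citable source link.
corpus = [
    {"text": "A contract concluded by a minor may be rescinded.",
     "source": "https://example.com/practical-law/minors"},
    {"text": "Damages for breach of contract must be foreseeable.",
     "source": "https://example.com/westlaw/damages"},
]
index = [(doc, embed(doc["text"])) for doc in corpus]

def retrieve(query: str, k: int = 1) -> list:
    """Embedding-based semantic search over the snippet store."""
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

def answer_with_citation(query: str) -> str:
    """Ground the answer in the top retrieved snippet and cite its source;
    a real system would pass the snippet to the LLM as context instead."""
    doc = retrieve(query)[0]
    return f"Based on: {doc['text']} (see {doc['source']})"

print(answer_with_citation("Can a minor rescind a contract?"))
```

The key design point this sketch isolates is that every generated answer is tied back to a retrieved, citable document, which is what makes the output verifiable in the sense demanded above.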