Unleashing the Power of Large Language Models for
Legal Applications
Dell Zhang
Thomson Reuters Labs
London, UK
dell.z@ieee.org
Alina Petrova
Thomson Reuters Labs
London, UK
alina.petrova@thomsonreuters.com
Dietrich Trautmann
Thomson Reuters Labs
Zug, Switzerland
dietrich.trautmann@thomsonreuters.com
Frank Schilder
Thomson Reuters Labs
Eagan, Minnesota, USA
frank.schilder@thomsonreuters.com
ABSTRACT
The use of Large Language Models (LLMs) is revolutionizing the
legal industry. In this technical talk, we would like to explore the
various use cases of LLMs in legal tasks, discuss the best practices,
investigate the available resources, examine the ethical concerns,
and suggest promising research directions.
CCS CONCEPTS
• Information systems → Information retrieval; • Computing
methodologies → Natural language processing; Neural networks;
• Applied computing → Law.
KEYWORDS
legal data mining, legal information retrieval, legal natural language
processing, legal knowledge management, large language models
ACM Reference Format:
Dell Zhang, Alina Petrova, Dietrich Trautmann, and Frank Schilder. 2023.
Unleashing the Power of Large Language Models for Legal Applications.
In Proceedings of the 32nd ACM International Conference on Information
and Knowledge Management (CIKM ’23), October 21–25, 2023, Birmingham,
United Kingdom. ACM, New York, NY, USA, 2 pages. https://doi.org/10.1145/
3583780.3615993
1 USE CASES
The legal industry is currently undergoing a significant transforma-
tion due to the use of Large Language Models (LLMs) [1]. GPT-4,
OpenAI's latest LLM, has stunned lawyers by passing the Uniform
Bar Exam with 90th-percentile overall performance. The poten-
tial of such LLMs to understand and generate professional legal text
is being harnessed to automate routine legal tasks, thereby increas-
ing efficiency and reducing costs [2]. Thomson Reuters, for instance,
is leveraging these models to provide advanced solutions to a wide
variety of use cases including legal search and research (Westlaw
Precision and Practical Law), legal document review and summary
(HighQ and Document Intelligence), legal contract drafting (plugin
for Microsoft 365 Copilot), and so on.
Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for third-party components of this work must be honored.
For all other uses, contact the owner/author(s).
CIKM ’23, October 21–25, 2023, Birmingham, United Kingdom
© 2023 Copyright held by the owner/author(s).
ACM ISBN 979-8-4007-0124-5/23/10.
https://doi.org/10.1145/3583780.3615993
2 BEST PRACTICES
2.1 Legal Prompting
A key factor in the success of LLMs is how they are prompted.
A well-designed sequence of prompts can steer an LLM towards a
specific legal context and objective, while generating legal text with
higher relevance and accuracy. Chain-of-Thought (CoT) and Tree-
of-Thought (ToT) prompting are particularly important for legal
LLMs, as legal documents are usually long and legal tasks often
require multiple steps. Here we will explain our investigations into
legal prompt engineering techniques for multilingual legal judgment
prediction [3], long legal document classification,¹ and complex
legal reasoning [4, 5]. For example, one of the key findings is that the
best legal reasoning performance for the COLIEE entailment task
is achieved with prompts derived from a specific logical structure
such as IRAC (Issue, Rule, Application, Conclusion), mimicking a
lawyer’s thinking process.
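An IRAC-structured prompt of the kind described above can be sketched as follows. This is a minimal illustration, not the exact prompt used in the COLIEE experiments; the template wording and the function name `build_irac_prompt` are assumptions for the sake of the example.

```python
# Minimal sketch of an IRAC-structured prompt for a statutory
# entailment task (COLIEE-style): the model is asked to reason
# through Issue, Rule, Application, and Conclusion in order,
# mimicking a lawyer's thinking process.

IRAC_TEMPLATE = """You are a legal expert. Analyse the problem step by step.

Issue: Identify the legal question raised by the query.
Rule: State the relevant rule from the statute article below.
Application: Apply the rule to the facts of the query.
Conclusion: Answer "entailed" or "not entailed".

Statute article:
{article}

Query:
{query}
"""

def build_irac_prompt(article: str, query: str) -> str:
    """Fill the IRAC template with a statute article and a query."""
    return IRAC_TEMPLATE.format(article=article, query=query)
```

The resulting string would then be sent to the LLM as a single prompt; the fixed step order constrains the model to produce its intermediate reasoning before committing to a conclusion.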
2.2 Legal Embedding
The legal domain demands that the information presented by an
LLM is verifiable and supported by authoritative sources. Although
“hallucination” is arguably an inherent problem of GPT-style gener-
ative LLMs, it could be greatly alleviated by grounding LLM outputs
in trustworthy legal content such as the Westlaw Precision or Prac-
tical Law documents that are meticulously curated by our Subject
Matter Experts (SMEs). In other words, an LLM specifically de-
signed for the legal field should not just provide answers, but also
include hyperlinks to the sources it cites. This can be achieved
by adopting the Retrieval-Augmented Generation (RAG) framework
where an LLM is augmented with a vector database as its external
long-term memory storing the embeddings of legal text nuggets.
Here we will discuss the technical design choices for the computa-
tion of embeddings. As the legal domain often relies on analogy to
craft compelling legal arguments, the ability to perform embedding-
based semantic search (to locate similar cases etc.) is especially
crucial to the utilization of LLMs in many legal workflows.
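The RAG pattern described above can be sketched as follows. This is a toy illustration under stated assumptions: the 3-dimensional vectors stand in for real learned embeddings, the in-memory list stands in for a vector database, and the `example.com` citation URLs, `retrieve`, and `grounded_prompt` names are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Vector store of legal text nuggets: (embedding, text, citation hyperlink).
STORE = [
    ([0.9, 0.1, 0.0],
     "A contract requires offer, acceptance, and consideration.",
     "https://example.com/practical-law/contracts#formation"),
    ([0.0, 0.8, 0.2],
     "Negligence requires duty, breach, causation, and damages.",
     "https://example.com/practical-law/torts#negligence"),
]

def retrieve(query_embedding, k=1):
    """Embedding-based semantic search: return the k most similar nuggets."""
    ranked = sorted(STORE,
                    key=lambda row: cosine(query_embedding, row[0]),
                    reverse=True)
    return ranked[:k]

def grounded_prompt(question, query_embedding):
    """Build a prompt that grounds the LLM in retrieved, citable sources."""
    context = "\n".join(f"- {text} (source: {url})"
                        for _, text, url in retrieve(query_embedding))
    return (f"Answer using only the sources below, and cite their links.\n"
            f"{context}\n"
            f"Question: {question}")
```

In a production system the query embedding would come from the same model used to embed the stored nuggets, and the hyperlinks returned alongside each nugget are what allow the LLM's answer to cite its authoritative sources.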
¹ https://www.swisstext.org/programme/#