Solving Bar Exam Questions with Deep Neural Networks
Adebayo Kolawole John
Department of Computer
Science, University of Torino
Corso Svizzera 185
Torino, 10149, Italy
collawolley3@yahoo.com
Luigi Di Caro
Department of Computer
Science, University of Torino
Corso Svizzera 185
Torino, 10149, Italy
dicaro@di.unito.it
Guido Boella
Department of Computer
Science, University of Torino
Corso Svizzera 185
Torino, 10149, Italy
guido@di.unito.it
ABSTRACT
In this paper, we present a system which solves Bar Examination questions written in natural language. The proposed system exploits recent techniques in Deep Neural Networks, which have shown promise in many Natural Language Processing (NLP) applications. We evaluate our system on a real legal bar examination, the United States Multistate Bar Examination (MBE), a 200-question multiple-choice exam for aspiring lawyers. We show that our system achieves good performance without relying on any external knowledge. Our work comes with the added effort of curating a small corpus of questions in the style of the well-known MBE examination. The proposed system beats a TF-IDF-based baseline, while showing strong performance when adapted to a legal Textual Entailment evaluation.
1. INTRODUCTION
Many tasks in Natural Language Processing (NLP) involve generating semantic representations for proper text understanding. For example, tasks like Textual Entailment [5] and Question Answering [11, 31] require a deep semantic understanding of the text, since popular approaches like Bag of Words (BOW) have limitations arising from natural language ambiguity.
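The ambiguity limitation mentioned above can be illustrated with a minimal sketch: two sentences with opposite meanings receive identical BOW representations, so any model built on word counts alone cannot distinguish them. The sentences and the naive whitespace tokenizer are illustrative choices, not part of the proposed system.

```python
from collections import Counter

def bow(text):
    # Lowercase and split on whitespace: a deliberately naive tokenizer.
    return Counter(text.lower().split())

# Two sentences with opposite meanings (hypothetical example):
s1 = "the defendant sued the plaintiff"
s2 = "the plaintiff sued the defendant"

# Their bag-of-words representations are identical, so a BOW model
# cannot tell who sued whom.
print(bow(s1) == bow(s2))  # True
```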
Question Answering (QA) tasks mirror the human learning and testing process. For instance, a student reads course notes in order to acquire facts and background knowledge, and then answers questions based on the facts available to them. This is the essence of learning: 'committing to memory' and 'generalizing' to new events. Even though learning comes naturally to humans, it remains a challenging goal for computers to replicate. Researchers working in the Computer Science field of Machine Learning (ML) often employ methods that analyze existing data in order to predict the likelihood of uncertain outcomes. These methods usually produce results that approximate human capabilities [19].
In: Proceedings of the Second Workshop on Automated Semantic Analysis of Information in Legal Text (ASAIL 2017), June 16, 2017, London, UK.
Copyright © 2017 held by the authors. Copying permitted for private and academic purposes.
Published at http://ceur-ws.org

The term ML is actually a broad term describing supervised or unsupervised approaches for making the computer identify patterns in data. Usually, a human hand-crafts some features from the data, and the extracted features are then shown to the algorithm for it to learn the latent discriminating features. Finally, the algorithm learns to predict the outcome of an unseen event. Neural Networks (NNs) [8] are now used extensively by researchers because they offer higher representational power. NNs attempt to mimic the human cognitive system. They consist of many interconnected nodes: each node receives inputs from the nodes in the layer below, applies a non-linear function to a weighted combination of those inputs, and transmits its output to the nodes in the layer above. A network with many such interconnecting layers stacked is called a Deep Neural Network (DNN) [24].
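The layer-wise computation described above can be sketched in a few lines. This is a toy illustration of the general idea, not the architecture used in this paper: the weights, the 2-3-1 layer sizes, and the choice of tanh as the non-linearity are all arbitrary assumptions.

```python
import math

def layer(inputs, weights, biases):
    # One fully connected layer: each node takes a weighted sum of the
    # inputs from the layer below, adds a bias, applies a non-linear
    # function (tanh here), and passes the result upward.
    return [math.tanh(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

def deep_forward(x, layers):
    # Stacking several such layers gives a (toy) deep network.
    for weights, biases in layers:
        x = layer(x, weights, biases)
    return x

# A tiny 2-3-1 network with fixed, illustrative weights.
net = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),  # hidden
    ([[0.7, -0.5, 0.2]], [0.05]),                                # output
]
out = deep_forward([1.0, 2.0], net)
print(out)
```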
When performed by a human, QA requires cognitive abilities such as reasoning, meta-cognition, the contextual perception of abstract concepts, intelligence, and language comprehension. Although machines are yet to replicate strong human-like cognitive abilities, non-cognitive computational techniques that employ heuristics and statistical approximation can nevertheless model most problems well enough to give an 'intelligent' result close to that of a human [27]. We build on this assumption by setting aside any comparison of cognitive capability with our system; instead, the goal is to achieve a result that a human examiner would find acceptable.
In the QA task, systems are provided with a text passage containing some facts or background knowledge, together with a question related to that passage and its answer. The system is then given a similar but slightly different question and is expected to answer it from the same background knowledge.
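The task setup above can be made concrete with a minimal baseline sketch: pick the candidate answer that occurs in the passage sentence sharing the most words with the question. This crude lexical-overlap heuristic is an illustrative stand-in of our own, in the spirit of the TF-IDF baseline mentioned in the abstract; the toy passage and candidates are hypothetical.

```python
def best_answer(question, passage, candidates):
    # Pick the candidate that occurs in the passage sentence sharing
    # the most words with the question -- a crude lexical-overlap
    # stand-in for a learned QA model.
    q = set(question.lower().split())
    sentences = [s.split() for s in passage.lower().split(".") if s.strip()]
    def score(c):
        return max((len(q & set(s)) for s in sentences if c in s), default=0)
    return max(candidates, key=score)

passage = "Mary moved to the bathroom. John went to the hallway."
print(best_answer("where did John go", passage,
                  ["bathroom", "hallway", "kitchen"]))  # hallway
```

Such a heuristic fails exactly where lexical overlap is misleading, which is why the deeper semantic representations discussed here are needed.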
The remainder of the paper is organized as follows. In the next section, we review related work. This is followed by a description of the MBE exam and the corpus used for the experiment. Next, we describe our approach. Finally, we describe the experiments and evaluation.
2. RELATED WORK
NNs have shown good performance in many NLP tasks, including QA. The authors in [31, 12] achieved excellent results with DNNs for QA. In particular, [31] achieved 100% accuracy on some tasks, e.g., the single supporting fact and two supporting facts tasks of the bAbI dataset; similar results were reported for the CBT and SimpleQuestions datasets (all accessible at https://research.facebook.com/research/babi/). Similarly, the work of [26] and the Answer-Sentence Selection model proposed by Feng [10] are also based on NNs. A considerable portion of the QA systems use