Solving Bar Exam Questions with Deep Neural Networks

Adebayo Kolawole John
Department of Computer Science, University of Torino
Corso Svizzera 185, Torino, 10149, Italy
collawolley3@yahoo.com

Luigi Di Caro
Department of Computer Science, University of Torino
Corso Svizzera 185, Torino, 10149, Italy
dicaro@di.unito.it

Guido Boella
Department of Computer Science, University of Torino
Corso Svizzera 185, Torino, 10149, Italy
guido@di.unito.it

ABSTRACT

In this paper, we present a system that solves a Bar Examination written in natural language. The proposed system exploits recent techniques in Deep Neural Networks, which have shown promise in many Natural Language Processing (NLP) applications. We evaluate our system on a real legal bar examination, the United States Multi-State Bar Examination (MBE), a 200-question multiple-choice exam for aspiring lawyers. We show that our system achieves good performance without relying on any external knowledge. Our work comes with the added effort of curating a small corpus from the well-known MBE examination, following similar question answering datasets. The proposed system beats a TF-IDF-based baseline, while showing strong performance when modified for a legal Textual Entailment evaluation.

1. INTRODUCTION

Many tasks in Natural Language Processing (NLP) involve generating semantic representations for proper text understanding. For example, tasks like Textual Entailment [5] and Question Answering [11, 31] require deep semantic understanding of the text, since a popular approach like the Bag of Words (BOW) has limitations due to natural language ambiguity.

Question Answering (QA) tasks follow the human learning and testing process. For instance, a student reads a course note in order to obtain some facts and background knowledge. The student then answers any question based on the facts available to him.
This is the main essence of learning, which is about 'committing to memory' and 'generalizing' to new events. Even though learning seems to be a natural phenomenon for humans, it is nevertheless still a challenging goal for computers to replicate. Researchers working in the Computer Science field of Machine Learning (ML) often employ methods to analyze existing data in order to predict the likelihood of uncertain outcomes. These methods usually produce results that approximate human capabilities [19].

In: Proceedings of the Second Workshop on Automated Semantic Analysis of Information in Legal Text (ASAIL 2017), June 16, 2017, London, UK. Copyright © 2017 held by the authors. Copying permitted for private and academic purposes. Published at http://ceur-ws.org

The term ML is actually a broad term used to describe supervised or unsupervised approaches for making the computer identify patterns in data. Usually, a human handcrafts some features from the data, and the extracted features are then shown to the algorithm for it to learn the latent discriminating features. Finally, the algorithm learns to predict the outcome of an unseen event. Neural Networks (NN) [8] are now extensively used by researchers because they offer higher representational power. NNs try to mimic the human cognitive system. They consist of many interconnected nodes. Each node receives some inputs from the nodes in the layer below, performs a computation on those inputs using some non-linear function, and transmits its output to the nodes in the layer above it. Such a network with many interconnected layers stacked together is called a Deep Neural Network (DNN) [24].

When performed by a human, QA requires some form of cognitive ability, such as reasoning, meta-cognition, the contextual perception of abstract concepts, intelligence, and language comprehension.
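The layer computation described above can be sketched in a few lines. This is a toy illustration only; the layer sizes, ReLU non-linearity, and random weights are our assumptions, not details of the paper's architecture.

```python
import numpy as np

def relu(x):
    # Non-linear activation applied at each node.
    return np.maximum(0.0, x)

def dense_layer(inputs, weights, bias):
    """One fully connected layer: a non-linearity over a weighted sum
    of the outputs of the layer below."""
    return relu(weights @ inputs + bias)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                           # inputs to the lowest layer
w1, b1 = rng.normal(size=(8, 4)), np.zeros(8)    # hidden layer, 8 nodes
w2, b2 = rng.normal(size=(3, 8)), np.zeros(3)    # output layer, 3 nodes

# Stacking several such layers is what makes the network "deep":
# each layer's output becomes the next layer's input.
hidden = dense_layer(x, w1, b1)
output = dense_layer(hidden, w2, b2)
print(output.shape)  # → (3,)
```

In practice the weights are not random but are learned from data; the sketch only shows the forward pass each node performs.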
Although machines are yet to replicate strong cognitive abilities like those of a human, the non-cognitive computational techniques that employ heuristics and statistical approximation can nevertheless model most problems well, giving an 'intelligent' result that is close to a human's [27]. We leverage this assumption by setting aside any comparison between our system's capability and human cognition. Instead, the goal is to achieve a result that would be deemed acceptable by a human examiner.

In the QA task, systems are provided with a text passage containing some facts or background knowledge, and a question which is related to that text passage. Furthermore, an answer to the question is provided. The system is then given a similar but slightly different question and is expected to answer it from the same background knowledge.

The remaining part of the paper is organized as follows. In the next section, we review the related work. This is followed by a description of the MBE exam and the corpus used for the experiment. Next, we describe our approach. Finally, we describe the experiment and evaluation.

2. RELATED WORK

NNs have shown good performance in many NLP tasks, including QA. The authors in [31, 12] achieved excellent results with DNNs for QA. In particular, [31] achieved 100% accuracy on some tasks.[1] Similarly, the work of [26] and the answer-sentence selection proposed by Feng [10] are also based on NNs. A considerable portion of the QA systems use

[1] E.g., the single supporting facts and two supporting facts tasks on the bAbI dataset. A similar result was reported for the CBT and Simple Question datasets. The datasets are accessible at https://research.facebook.com/research/babi/
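As context for the TF-IDF baseline mentioned in the abstract, the following is a minimal sketch of how such a baseline can answer a multiple-choice question: score each answer choice by the cosine similarity between its TF-IDF vector and that of the passage plus question. The toy passage, question, choices, and smoothed-IDF weighting are our own illustration, not the paper's data or exact baseline.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def tfidf_vectors(docs):
    """Smoothed TF-IDF vectors (term -> weight dicts) for token lists."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))          # document frequency of each term
    return [{t: tf * (math.log((1 + n) / (1 + df[t])) + 1)
             for t, tf in Counter(doc).items()}
            for doc in docs]

def cosine(u, v):
    num = sum(w * v.get(t, 0.0) for t, w in u.items())
    den = (math.sqrt(sum(w * w for w in u.values()))
           * math.sqrt(sum(w * w for w in v.values())))
    return num / den if den else 0.0

passage = "a contract requires offer acceptance and consideration"
question = "what does a valid contract require"
choices = ["offer acceptance and consideration",
           "a written signature only",
           "payment before performance"]

docs = [tokenize(passage + " " + question)] + [tokenize(c) for c in choices]
query, *choice_vecs = tfidf_vectors(docs)
scores = [cosine(query, cv) for cv in choice_vecs]
best = max(range(len(choices)), key=scores.__getitem__)
print(choices[best])  # → offer acceptance and consideration
```

A baseline of this kind rewards surface-level word overlap, which is exactly the BOW limitation noted in the introduction; the neural approach the paper proposes is meant to go beyond it.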