Lexical-Syntactic and Graph-Based Features for Authorship Verification
Notebook for PAN at CLEF 2013

Darnes Vilariño, David Pinto, Helena Gómez, Saúl León, and Esteban Castillo
Benemérita Universidad Autónoma de Puebla
Faculty of Computer Science, Mexico
{darnes,dpinto,saul.leon,helena.adorno}@cs.buap.mx, ecjbuap@gmail.com

Abstract. In this paper we present the results obtained by an approach submitted to the author identification task of PAN 2013, which uses lexical, syntactic and graph-based features to construct a representation model of document authors. In particular, the features extracted from the graph representation were obtained by means of the SubDue mining tool. As a classification model we employed Support Vector Machines (SVM). Overall, our approach ranked fifth among around 17 teams.

Keywords: Authorship verification, graph-based representation, phrase-level lexical-syntactic features, support vector machines

1 Introduction

Authorship verification is the task of determining whether or not a document was written by a given author. This task is particularly important for forensic linguists, who are often called upon to answer this kind of question. Its relevance has increased with the continuous growth of information on the Internet; finding the correct features for characterizing the particular writing style of a given author is therefore fundamental for solving the problem of authorship verification.

The results reported in this paper were obtained in the framework of the International Workshop on Plagiarism Detection, Author Identification, and Author Profiling (PAN’13).
Specifically, we participated in the task named “Author Identification”, which this year focused on the problem of authorship verification, described as follows: “Given a small set (no more than 10, possibly as few as one) of ‘known’ documents by a single person and a ‘questioned’ document, the task is to determine whether the questioned document was written by the same person who wrote the known document set”.

In order to tackle this problem, we propose to extract a set of lexical-syntactic features from each target document, together with up to 100 words that are representative of each author. These representative words are selected with the SubDue tool (described in Section 2.2), which operates on a graph-structured representation of all the documents written by the given author.
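
To make the problem setting concrete, the following minimal Python sketch models one verification problem as described in the task statement (between 1 and 10 known documents plus one questioned document) and selects up to 100 candidate words per author. The names VerificationProblem, tokenize and representative_words are hypothetical, and the plain frequency ranking is only an assumed stand-in for illustration; it is not the SubDue-based selection used in this paper.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class VerificationProblem:
    """One problem instance: known documents by one author plus a questioned one."""
    known: Tuple[str, ...]
    questioned: str

    def __post_init__(self) -> None:
        # The task statement allows no more than 10 and at least one known document.
        if not 1 <= len(self.known) <= 10:
            raise ValueError("expected between 1 and 10 known documents")


def tokenize(text: str) -> List[str]:
    # Hypothetical tokenizer: lowercased runs of word characters.
    return re.findall(r"\w+", text.lower())


def representative_words(known: Tuple[str, ...], limit: int = 100) -> List[str]:
    # ASSUMPTION: frequency ranking replaces the SubDue graph mining step,
    # which is not reproduced here.
    counts = Counter(tok for doc in known for tok in tokenize(doc))
    return [word for word, _ in counts.most_common(limit)]
```

A problem built as VerificationProblem(known=("first text", "second text"), questioned="unknown text") can then be passed to representative_words(problem.known) to obtain at most 100 words for the author of the known documents.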