Latent Semantic Analysis for Bulgarian literature

Preslav Ivanov Nakov

The paper presents the results of experiments applying LSA to the analysis of textual data. The method is explained in brief, and special attention is paid to its potential for comparing Bulgarian literature texts. Two hypotheses are tested:
• Texts by the same author are similar and can be identified automatically;
• Texts belonging to different periods can be distinguished automatically.

Latent Semantic Analysis

Latent Semantic Analysis (LSA) is a powerful statistical technique for the indexing, retrieval and analysis of textual information, used over the last decade in various fields concerned with human cognition. The method is fully automatic and rests on the general idea that there exists a set of latent dependencies between words and their contexts (phrases, paragraphs and texts). Identifying and properly treating these dependencies allows LSA to handle synonymy successfully and polysemy partially.

LSA starts with the construction of a term-by-document occurrence frequency matrix, which is then submitted to singular value decomposition (SVD). As a result, each term or document is associated with a vector of low dimensionality (e.g. 100). The proximity between two documents can be calculated as the dot product of their normalised vectors, that is, the cosine of the angle between them. See [1,2,3] for details.

Application to Bulgarian literature texts

The experiments were performed on Bulgarian literature texts found in the Virtual Library for Bulgarian Literature at http://slovo.orbitel.bg ([4]). We selected all 3032 available texts by the following authors, grouped by period (the text counts are in parentheses):