Classification and Categorization in Computer Assisted Reading and Analysis of Texts

Jean Guy Meunier, Université du Québec à Montréal
Dominic Forest, Université du Québec à Montréal
Ismail Biskri, Université du Québec à Trois-Rivières

1. Introduction

1.1. CARAT: general presentation

In the early 1960s, the computer appeared as a revolutionary tool for computing mathematical symbols, even numerical ones. “Scientists in AI saw computers as machines that manipulated symbols. The great thing was, they said, that every thing could be encoded into symbols, even numbers.” (Newell, 1983, p. 196) Surprisingly, though, the real impact of computer technology lay not in processing numerical symbols but in processing a whole class of other symbols, such as those of natural language. Indeed, the computer has since been applied to text processing so widely that today most of the processing done by computers is applied to text (Internet, e-mail, documents, etc.).

Since then, research in the cognitive sciences and information technologies (IT) has had an important impact on the reading and analysis of texts, and the humanities in particular have gained greatly from it. For nearly fifty years now, technologies for computer assisted reading and analysis of text (CARAT) have penetrated the various humanities and social sciences disciplines (Hockey, 2001). One finds them in philosophy (McKinnon, 1968, 1973, 1979; Meunier, 1997; Lusignan, 1985; Floridi, 2002), in psychology and sociology (Barry, 1998; Alexa and Zuell, 1999a, 1999b; Glaser and Strauss, 1967; Jenny, 1997), in literature (Brunet, 1996; Kastberg Sjöblom and Brunet, 2000; Hockey, 2001; Fortier, 2002; Bradley and Rockwell, 1992; Sinclair, 2002; Rastier et al., 1995; Bernard, 1999), in textual semiotics (Ryan, 1999; Rastier, 2001), in political science (Fielding and Lee, 1991; Lebart and Salem, 1994; Mayaffre, 2000), in history (Greenstein, 2002), etc.

From the encounter between these various disciplines and computer science and technology, an original research field has emerged: Computer Assisted Reading and Analysis of Text (CARAT) (Bernard, 1999; Hockey, 2001; Meunier, 1997; Popping, 2000). This research field differs from the artificial intelligence (AI) approach to discourse analysis (Hobbs, 1990; Marcu, 1999; Marcu et al., 2000) or automatic reading (Ram and Moore, 1999), where the objective is to have the computer simulate some type of “understanding” of a text in a specific application or process (inference, summary, knowledge extraction, question answering, e-mail routing, etc.). It also differs from information retrieval (Salton, 1989; Salton and McGill, 1983) or hypertext technologies (Rada, 1991) where the