Published in the Proceedings of the International Joint Conference on Artificial Intelligence, IJCAI 99, Stockholm, Sweden.

1 This work was partially funded by the European Union in the framework of the Language Engineering Sector, project FACILE (LE 2440). ÖFAI is supported by the Austrian BMWV.

Abstract

Successfully managing information means being able to find relevant new information and to correctly integrate it with pre-existing knowledge. Much information is nowadays stored as multilingual textual data; therefore advanced classification systems are currently considered strategic components for effective knowledge management. We describe our experience in integrating innovative AI technologies, such as hierarchical pattern matching and information extraction, to provide flexible multilingual classification adaptable to user needs. Pattern matching produces fairly accurate and fast categorisation over a large number of classes, while information extraction provides fine-grained classification for a reduced number of classes. The resulting system was adopted by the main Italian financial news agency providing a pay-to-view service.

1 Introduction

Knowledge is nowadays the key source of competitive advantage. The success or failure of a company can depend on the ability to find the right information at the right time. The WWW explosion (and the increasing usage of Internet technologies as a core channel for communication) multiplies the sources of information and increases by orders of magnitude the amount of information available. However, while raising the opportunities for gaining competitive advantages, this also increases the information glut. The main value is not in the information itself, but in the capability of managing it successfully to derive knowledge that is critical to an organisation’s objectives. Successfully managing information means being able to correctly integrate it with existing
structured information, to facilitate communication and knowledge sharing and to support knowledge-based organisations. The role of natural language processing and artificial intelligence is fundamental in this respect as: i) the vast majority of this information is textual and available in different languages; ii) the development of new tools for structuring textual data starting from its content represents one of the fundamental steps in successfully managing information.

This is particularly evident in the business arena, where on-line textual information from news providers has long been available and heavily used. Recent reports from Gartner Group [Bair, 1998] explicitly mention advanced classification systems characterised by semantic technologies as the most strategically relevant element to support effective knowledge management. However, technology available on the market still relies mainly on systems and techniques derived from information retrieval (IR). This technology does not provide adequate accuracy when coping with rich and complex classification structures, because IR systems do not take linguistic features into account. More linguistically oriented text classification can be achieved by the use of pattern matching (PM). PM can produce a fairly accurate and fast categorisation over a large number of classes [Hayes and Weinstein, 1991; Jacobs and Rau, 1990]. Resource development does not require linguistic expertise and can be done by trained users. But PM is still weak on the analysis of linguistic structures and cannot be used for fine-grained categorisation.

Information extraction (IE) techniques can also be used for text classification (e.g., the Text filtering subtask of the ST task in [MUC6]). IE systems perform very well in detecting texts relevant for a single class (e.g., management succession) with results ranging between 80-95 for both precision and recall.
FACILE: Classifying Texts Integrating Pattern Matching and Information Extraction 1

Fabio Ciravegna, Alberto Lavelli, Nadia Mana
ITC-irst, Loc. Pantè di Povo, 38050 Trento, Italy

Johannes Matiasek
ÖFAI, Schottengasse 3, 1010 Vienna, Austria

Luca Gilardoni, Silvia Mazza, Massimo Ferraro
Quinary SpA, Via Fara 35, 20124 Milan, Italy

William J. Black, Fabio Rinaldi, David Mowatt
UMIST, PO Box 88, Manchester M60 1QD, United Kingdom

But IE cannot be applied to a large number of classes: it is an expensive technology, as it requires a large amount of time from linguistically aware personnel [Grishman, 1997]. Attempts at separating linguistic knowledge (e.g., syntactic knowledge) from domain-dependent knowledge (e.g., the domain patterns) [Grishman, 1997; Hobbs et al., 1997] simplified the task,