360 IEEE TRANSACTIONS ON FUZZY SYSTEMS, VOL. 10, NO. 3, JUNE 2002

A Fuzzy MHT Algorithm Applied to Text-Based Information Tracking

Santiago Aja-Fernández, Carlos Alberola-López, Member, IEEE, and George V. Cybenko

Abstract—In this paper, we carry out a detailed analysis of a fuzzy version of Reid’s classical multiple hypothesis tracking (MHT) algorithm. Our fuzzy version is based on well-known fuzzy feedback systems, but the fact that the system we describe is specialized for likelihood discrimination makes this study particularly novel. We discuss several techniques for rule activation. One of them, namely, the sum–product, seems particularly useful for likelihood management, and its linearity makes it tractable for further analysis. Our analysis is performed in two stages. First, we demonstrate that, with appropriately chosen rules, our system can discriminate the correct hypothesis. Second, the steady-state behavior with constant input is characterized analytically. This enables us to establish the optimality of the sum–product method, and it also gives a simple procedure to predict the system’s behavior as a function of the rule base. We believe this fact can be used to devise a simple procedure for fine-tuning the rule base according to the system designer’s needs. The application driving our fuzzy MHT implementation and analysis is the tracking of natural language text-based messages. That application is used as an example throughout the paper.

Index Terms—Fuzzy feedback system, hypotheses discrimination, information tracking, multiple hypothesis tracking (MHT) algorithm, natural language processing.

I. INTRODUCTION

Natural language messages are present in many information processing and analysis applications. However, to date most systems for natural language processing have been used for database querying or machine translation.
New and more powerful text processing techniques need to be developed and analyzed to handle other important applications that require correlation of text-based messages, such as intelligence analysis, computer security incident databases, and customer service reporting. These applications have several common attributes: they involve tracking possibly ambiguous reports generated by different observers over time (in this context, tracking means finding which messages deal with the same pieces of information and should therefore be correlated over time). Each such application also tends to be narrow in scope, so a few important keywords should be carefully searched for and processed. These application areas are all in need of more advanced automatic analysis techniques, given the increasing amount of networked text-based information available to them.

Manuscript received April 6, 2001; revised October 6, 2001 and December 5, 2001. The work of S. Aja-Fernández and C. Alberola-López was supported in part by the Comisión Interministerial de Ciencia y Tecnología under Research Grants TIC97-0772 and 1FD97-0881 and by Junta de Castilla y León under Research Grants VA78/99 and VA91/01. The work of G. V. Cybenko was supported by the National Science Foundation under Grant 9813744 and DARPA Grant F30602-98-2-0107.
S. Aja-Fernández is with the Department of Teoría de la Señal y Comunicaciones e Ingeniería Telemática, University of Valladolid, 47011 Valladolid, Spain.
C. Alberola-López is with the Department of Teoría de la Señal y Comunicaciones e Ingeniería Telemática of the University of Valladolid, Spain, ETSI Telecomunicación, Campus Miguel Delibes, 47011 Valladolid, Spain (e-mail: caralb@tel.uva.es).
G. V. Cybenko is with the Thayer School of Engineering, Dartmouth College, Hanover, NH 03755 USA.
Publisher Item Identifier S 1063-6706(02)04829-4.
TEXTTRACK, described in [1], is a software system whose goal is to apply advanced signal processing tracking concepts to natural language processing. TEXTTRACK addresses the problems of correlating and tracking observations of multiple moving vehicles reported in natural language messages that are generated asynchronously over time by multiple observers. The system has demonstrated that such problems can be tackled using relatively mature concepts from radar signal processing, namely the multiple hypothesis tracking (MHT) algorithm [20]. The prototype accepts simple natural language messages about vehicle types and locations, correlates the messages, and associates groups of messages into the most likely tracks based on a succession of positions. The correlation problem is solved in two steps. First, an appropriately modified, but still classical, Bayesian framework is used to handle the ambiguity in natural language descriptions. A formal theorem shows that, under very mild conditions, the correct solution is eventually achieved. The second step uses a fuzzy inference engine (FIE), specifically, a fuzzy version of Reid’s classical Bayesian multiple hypothesis tracking algorithm. Since the purpose is to model natural language ambiguity, linguistic variables (i.e., computing with words in Zadeh’s terminology [24]) are a natural choice. However, [1] does not include a rigorous analytical study of the TEXTTRACK system. That work presented an intuitive argument for the system’s effectiveness, illustrated with several working examples.

In this paper, we give the fuzzy MHT algorithm originally developed in [1] a solid theoretical foundation by analytically characterizing the FIE on which the algorithm is based. Because its mathematical characterization is application-independent, a natural byproduct of this paper is the broadening of the range of possible applications of the text-based MHT philosophy.
That is, not only is it possible to track mobile man-made objects, but we will see it is possible to handle information about any time-varying phenomenon, as long as the phenomenon can be described by means of a few keywords, and the phenomenon itself is statistically causal in the sense that the distribution of future states is statistically dependent on past observed states.

The principal ingredient of the FIE arising in the MHT algorithm is a variant of well-known fuzzy feedback systems