To be presented at the ICCC/IFIP 5th Conf. on Electronic Publishing ELPUB'01, Canterbury, UK, 5-7 July 2001

A framework for automatic combination of media contents by minimising information redundancy
Case: Integrated publishing in multimedia networks

Anneli Heimbürger (*), Paula Silvonen and Caj Södergård
VTT Information Technology
P.O. Box 1204
FIN-02044 VTT
Finland
(*) Email: anneli.heimburger@vtt.fi

Abstract

Information redundancy becomes a crucial problem on the Web when contents from different resources are automatically combined to produce a new WWW publication. Information retrieval, natural language processing and the latest WWW activities offer a challenging framework for approaching the information redundancy problem of automatically combined news articles. It seems reasonable that minimising information redundancy should be performed by a hybrid technique that combines elements of these approaches. The purpose of this exploratory study is to introduce a theoretical and practical framework for clarifying the information redundancy problem in the case of integrated publishing.

1. Introduction

For millions of Internet users, resource discovery on the World Wide Web would be a more fascinating and efficient experience if there were automatic, or at least semiautomatic, methods for detecting and filtering out redundant information. The information redundancy problem becomes crucial in dynamic, distributed Web database applications in which heterogeneous information is automatically integrated from different data resources. One such application area is integrated publishing, where materials coming from several Web newspapers and TV news broadcasts are automatically combined to form a new integrated Web publication or news service. If a Web-based news service provides a user with a set of related documents that overlap to some extent, the user is probably interested in the union of these news articles with the overlapping content eliminated.
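As a rough illustration of what "the union of these news articles" with overlaps eliminated could mean in practice, the following minimal sketch merges articles sentence by sentence and drops any sentence that closely resembles one already kept. The word-set Jaccard measure, the 0.6 threshold, and the names `tokens`, `jaccard` and `merge_without_redundancy` are assumptions made for this example only; they are not the method proposed in this paper.

```python
import re


def tokens(sentence):
    """Lowercased word set used as a crude content signature (assumption)."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard overlap of two token sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def merge_without_redundancy(articles, threshold=0.6):
    """Return the sentences of all articles, skipping near-duplicates.

    The threshold is an illustrative value, not one taken from the paper.
    """
    kept = []         # sentences accepted into the merged output
    kept_tokens = []  # token sets of the accepted sentences
    for article in articles:
        # Naive sentence split on whitespace that follows ., ! or ?
        for sentence in re.split(r"(?<=[.!?])\s+", article.strip()):
            if not sentence:
                continue
            signature = tokens(sentence)
            if any(jaccard(signature, seen) >= threshold for seen in kept_tokens):
                continue  # too similar to a sentence already kept
            kept.append(sentence)
            kept_tokens.append(signature)
    return kept


if __name__ == "__main__":
    newspaper = "The storm hit the coast on Monday. Thousands lost power."
    tv_news = "The storm hit the coast on Monday evening. Schools will stay closed."
    for line in merge_without_redundancy([newspaper, tv_news]):
        print(line)
```

A purely lexical measure such as this one catches only near-verbatim repetition. Paraphrased or translated overlap would need the linguistic analysis that the hybrid approach outlined in the abstract is meant to supply.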
In this paper, news is defined as information about recent national and international events of general interest currently reported by