Intermediary-based transcoding framework

by S. C. Ihde, P. P. Maglio, J. Meyer, and R. Barrett

With the rapid increase in the amount of content on the World Wide Web, it is now becoming clear that information cannot always be stored in a form that anticipates all of its possible uses. One solution to this problem is to create transcoding intermediaries that convert data, on demand, from one form into another. Up to now, these transcoders have usually been stand-alone components, converting one particular data format to another particular data format. A more flexible approach is to create modular transcoding units that can be composed as needed. In this paper, we describe the benefits of an intermediary-based transcoding approach and present a formal framework for document transcoding that is meant to simplify the problem of composing transcoding operations.

Today, content providers on the World Wide Web (WWW) are under constant pressure to make information available in a variety of formats and for a variety of purposes. For example, the Yahoo!** catalog server provides information formatted in HyperText Markup Language (HTML) for standard Web browsers, and also provides some of this information formatted for handheld devices such as Palm Pilots** and wireless phones. In this case, content is formatted differently for displays that have different capabilities, and is also delivered differently for devices that have different connectivity. Concern for network bandwidth limitations in particular has spurred many projects aimed at minimizing the amount of data transmitted for Web transactions. For instance, Fox and colleagues¹⁻³ developed a proxy-based architecture for distilling or transforming content so that thin devices receive only the data they can handle (e.g., devices with monochrome displays do not receive color images), thus minimizing the network bandwidth needed to transmit information.
Moreover, as more and more companies explore the WWW as a place to do business, large amounts of information of various types and in a variety of formats will be made available on the Web. This leads to the problem of converting data to enable applications to handle data that might come from a variety of sources. In this context, bandwidth limitations or client resources (e.g., CPU power and disk space) are not a major concern. The main questions here are: what form is the information in; what form does it need to be in? The ability to convert content from one form to another lets systems that use different languages and conventions communicate and interoperate. The Extensible Markup Language⁴ (XML) is particularly suited to the needs of businesses to convert data from one form to another, as it provides means for specifying semantic structure.

Copyright 2001 by International Business Machines Corporation. Copying in printed form for private use is permitted without payment of royalty provided that (1) each reproduction is done without alteration and (2) the Journal reference and IBM copyright notice are included on the first page. The title and abstract, but no other portions, of this paper may be copied or distributed royalty free without further permission by computer-based and other information-service systems. Permission to republish any other portion of this paper must be obtained from the Editor. IBM Systems Journal, Vol. 40, No. 1, 2001. 0018-8670/01/$5.00 © 2001 IBM.

The process of converting, distilling, or transforming content is often referred to as transcoding¹,⁵ (see Figure 1). In particular, this term is also used when referring to algorithms for transforming certain data, such as images and movies, from one format into another. For example, video transcoding is the process of converting between different compression formats, or of further reducing the bit rate of a previously