Protection of Privacy on the Web

Thomas M. Chen
Dept. of Electrical Engineering
Southern Methodist University
PO Box 750338, Dallas, Texas 75275
Tel: +1 214-768-8541
Email: tchen@engr.smu.edu

Zhi (Judy) Fu
Network and Infrastructure Research Lab
Motorola Labs
1301 E Algonquin Rd., Schaumburg, IL 60196
Tel: +1 847-576-6656
Email: judy.fu@motorola.com

Abstract

Most people are concerned about online privacy but may not be aware of the various ways that personal information about them is collected during routine Web browsing. We review the types of personal information that may be collected voluntarily or involuntarily through the Web browser or disclosed by a Web server. We present a taxonomy of regulatory and technological approaches to protect privacy. All approaches to date have only been partial solutions. By its nature, the Web was designed to be an open system to facilitate data sharing, and hence Web privacy continues to be a challenging problem.

Introduction

The main appeal of the World Wide Web is convenient and instant access to a wealth of information and services. Many people will start research on a topic with a Google search. The number of Web sites has grown exponentially and reached more than 149 million in November 2007 according to Netcraft (http://news.netcraft.com/archives/web_server_survey.html).

In their search for services, users may not keep in mind that the Web is capable of collecting data as well as displaying data. The most obvious means of data collection are Web forms for registrations, logins, and messaging. These forms are voluntary disclosures of personal information that most people understand to be necessary for shopping, online banking, and other personalized services. However, users may not fully appreciate that Web sites collect information about them routinely without their consent or even notification.
Web sites keep track of clients’ IP (Internet protocol) addresses at a minimum and often additional information such as browser version, operating system, viewed resources, and clicked links. Moreover, this collected information may be shared among organizations in the background without the public’s knowledge.
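
To make the involuntary disclosure concrete, the following minimal sketch shows how a server could derive a log record from a single page request. It is an illustration under stated assumptions, not a description of any particular server's logging: the names `VisitRecord` and `record_visit`, the sample request, the host `shop.example`, and the address `192.0.2.10` are all hypothetical. The `User-Agent` and `Referer` headers are standard HTTP request headers that browsers commonly send without asking the user.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VisitRecord:
    """Hypothetical record of what one request reveals to a server."""
    client_ip: str             # taken from the network connection, not the request text
    resource: str              # the page or file the user viewed
    user_agent: Optional[str]  # typically names the browser version and operating system
    referer: Optional[str]     # the page whose link the user clicked, if sent


def record_visit(client_ip: str, raw_request: str) -> VisitRecord:
    """Extract loggable fields from a raw HTTP/1.1 request (illustrative parser only)."""
    lines = raw_request.split("\r\n")
    # Request line has the form: METHOD SP request-target SP HTTP-version
    _method, resource, _version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        if not line:  # a blank line ends the header section
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return VisitRecord(
        client_ip=client_ip,
        resource=resource,
        user_agent=headers.get("user-agent"),
        referer=headers.get("referer"),
    )


# Assumed sample request; no form was filled in, yet four data items are exposed.
sample_request = (
    "GET /products/42 HTTP/1.1\r\n"
    "Host: shop.example\r\n"
    "User-Agent: ExampleBrowser/1.0 (ExampleOS 10.0)\r\n"
    "Referer: http://shop.example/search?q=shoes\r\n"
    "\r\n"
)
record = record_visit("192.0.2.10", sample_request)
print(record)
```

In this sketch the user typed nothing into a form, yet the record already contains the IP address, the viewed resource, a browser and operating system string, and the previously visited page, which correspond to the items listed above.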