Coolhunting for Trends on the Web

Peter A. Gloor
MIT Center for Collective Intelligence
pgloor@mit.edu

Invited Paper

ABSTRACT

This paper introduces a new way of measuring the popularity of brand names and famous people such as movie stars, politicians, and business executives. It is based upon the premise that in today’s Internet economy the Web acts as a mirror of the real world. Our system uses TeCFlow, a social networking tool developed over the last four years at MIT, to measure the popularity and influence of brands and stars by looking at their relative position on the Web. It is based on a simple insight: “You are who links to you”. It applies the Social Network Analysis (SNA) metric of “betweenness centrality” to the Web, looking at the linking structure of Web sites to find how Web pages discussing brands and stars are connected. It uses the high-betweenness Web sites returned in response to a search engine query for a brand or star name as a proxy for the significance of that brand or star.

KEYWORDS: Coolhunting, degree-of-separation search, TeCFlow, online metrics

1. INTRODUCTION

The well-known saying “on the Internet nobody knows that you are a dog” alludes to the perception that the Internet offers anonymity to its users. In reality, the opposite is true. The Internet has become a major communication channel for breaking news and for disclosing innermost secrets. For example, when CBS published documents about George W. Bush’s behavior during his military service, Republican bloggers quickly identified weak spots in the authenticity of the documents. This questionable evidence regarding George Bush’s potential evasion of military service during the Vietnam War era ultimately led to the early retirement of CBS news anchor Dan Rather. This incident is just one of many illustrating that today’s news is made and disseminated on the Web and in the Blogosphere.
This paper introduces a new Web mining approach, which we call “Web Coolhunting” [27]. It makes use of the fact that the Web has become a mirror of the real world, where the latest news breaks through the active participation of millions of volunteers on Web sites such as Wikipedia and on political blogs such as dailykos and instapundit. Large-scale telephone surveys have long been used to gauge public opinion and to track the popularity of politicians. Our approach offers an automated and much cheaper way to achieve similar goals by analyzing the linking structure of Web sites and blogs. Using the Web as a mirror of the real world makes it possible to automatically measure and track the popularity and attributes of brands and stars, and so offers an efficient way to trace their fame in the real world.

2. RELATED WORK

A rich body of research, popularized by Barabási in his book “Linked” [2], examines how the linking structure of the Web influences the accessibility of Web pages [9,13,15] and their ranking in search engines. Visualization of Web structure and contents has been an active area of research since the creation of the Web. There are numerous systems for the static visualization and analysis of the link structure of the Web [5,6]. Inxight [18], Visual Insight [21], Touchgraph [20], Grokster [17], and Mooter [19] are all systems for visualizing the linking structure of the Web, sometimes also offering a visual front end for search results. In a related stream of work, researchers have tried to predict the hidden linking structure based on known links [1,10]. Additionally, by looking at the contents of Web sites, subspaces of the Web have been clustered by topic [4,11,12]. Combining these two lines of research, community Web sites have been mined to discover trends and trendsetters for viral marketing [14]. Our research focuses on a similar application: tracking the strength of brands over time.
For our analysis we use the TeCFlow system [8], which was originally developed to mine e-mail networks and automatically generate dynamic social network movies.

3. DEGREE-OF-SEPARATION SEARCH

Our Web data mining approach combines two ideas: measuring the betweenness centrality of Web sites as defined in social network theory, and performing degree-of-separation search, which is explained in the subsequent paragraph.
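
To illustrate the first of these two ideas, the following minimal Python sketch computes unnormalized betweenness centrality on a small link graph using Brandes' algorithm. It is not taken from TeCFlow. The site names and links are hypothetical, and the sketch assumes that hyperlinks are treated as directed, unweighted edges.

```python
from collections import deque


def betweenness_centrality(links):
    """Unnormalized betweenness centrality (Brandes' algorithm).

    links: dict mapping a site to the sites it links to.
    Assumption: links are directed and unweighted.
    """
    # Build an adjacency map that also contains sites with no outgoing links.
    adjacency = {}
    for site, targets in links.items():
        adjacency.setdefault(site, set())
        for target in targets:
            adjacency[site].add(target)
            adjacency.setdefault(target, set())

    centrality = {site: 0.0 for site in adjacency}
    for source in adjacency:
        # Breadth-first search that counts shortest paths from the source.
        visit_order = []
        predecessors = {site: [] for site in adjacency}
        path_count = {site: 0 for site in adjacency}
        path_count[source] = 1
        distance = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            visit_order.append(v)
            for w in adjacency[v]:
                if w not in distance:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    path_count[w] += path_count[v]
                    predecessors[w].append(v)

        # Accumulate dependencies, starting from the farthest sites.
        dependency = {site: 0.0 for site in adjacency}
        while visit_order:
            w = visit_order.pop()
            for v in predecessors[w]:
                share = path_count[v] / path_count[w]
                dependency[v] += share * (1.0 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    return centrality


# Hypothetical link graph: two blogs link to a hub, which links onward.
links = {
    "blogA": ["hub"],
    "blogB": ["hub"],
    "hub": ["brand_site", "news"],
}
scores = betweenness_centrality(links)
for site, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(site, score)
```

In this toy graph only "hub" lies on shortest paths between other sites (from each blog to each of the two target sites), so it is the only site with a nonzero score. For real link graphs a library implementation, for example the one in NetworkX, would usually be preferable.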