Recent Progress Towards an Ecosystem of Structured Data on the Web

Nitin Gupta, Alon Y. Halevy, Boulos Harb, Heidi Lam, Hongrae Lee, Jayant Madhavan, Fei Wu, Cong Yu
Google Research, U.S.A.

Abstract: Google Fusion Tables aims to support an ecosystem of structured data on the Web by providing a tool for managing and visualizing data on the one hand, and for searching and exploring data on the other. This paper describes a few recent developments in our efforts to further the ecosystem.

I. INTRODUCTION

The combination of a rich repository of structured data on the Web and new tools for data management and visualization is leading us to exciting times in which structured data has a profound impact on many aspects of our lives. In many countries, citizens take for granted that governments, local authorities, and non-governmental organizations should make a variety of data sets available to the public. These data sets span topics such as economic indicators, crime statistics, educational data, government spending and campaign contributions. The new generation of tools for managing and visualizing data has empowered data activists, led by journalists, who are turning this data into visualizations and stories that are spread by social networks and seen by millions of people [1]. These visualizations, stories and public attention, in turn, lead to new questions and hence a demand for additional data.

The success of this trend still depends on improving our solutions to several long-standing data management problems. First, we need to continue developing tools that enable a broader set of users to manage data and create compelling visualizations. Second, we need methods for identifying high-quality data from the Web and other corpora.
Third, we should be able to recover the semantics of these data sets well enough that they can be displayed for relevant user queries and combined with other data sets to provide additional meaningful insights. Taken together, we need to create an ecosystem of tools and data that enables us to discover good data, create useful artifacts from it, and contribute it back to the Web.

The Google Fusion Tables project has been addressing some of these issues. At its core, we offer a cloud-based tool for querying, sharing, visualizing, integrating and publishing data. We complement the tool with a search engine that enables users to find high-quality tables from a corpus of over 130 million tables on the Web. This paper highlights some of the new functionality we recently added to our service.

II. DATA MANAGEMENT

Google Fusion Tables is a cloud-based service for data management and visualization. With Fusion Tables, it is easy to upload data, share it with collaborators or make it public, and pose simple queries. Fusion Tables also emphasizes the ability to create visualizations of the data that can be easily embedded in other Web sites. Because of its ease of use, it has been adopted by data enthusiasts, namely individuals and organizations that have valuable data they want to share or visualize but do not have deep technical expertise. For example, journalists frequently use Fusion Tables to include data in their articles. In addition, Fusion Tables has been used in disaster response situations, where valuable data has been made available to people in areas of need; to cover election results in several countries around the world; and for novel crowd-sourcing applications. A few examples of Fusion Tables applications are illustrated in [2].

Initially, Fusion Tables invested heavily in creating map visualizations.
The reason for this investment was to respond to specific requests from our users, and because maps are the most common visualization and apply to a wide range of domains. However, geographical information systems are an entire field unto themselves, and rather than investing wholly in maps, we decided to diversify into other visualizations. Hence, in recent months we have launched visualizations such as the zoomable time-line (see Figure 1) and the network graph (see Figure 2). Importantly, the architecture and optimizations we created for map visualizations (e.g., [3]) informed how we support other visualizations.

The main challenge we had to address is providing an efficient and smooth visualization experience in a cloud-based environment, where users expect immediate responses as they zoom into a visualization or pan across it. Specifically, while the underlying data sets may be large, only a small amount of data can be transmitted to the client if we are to meet the performance requirements and avoid overloading the client. Hence, we built a hierarchical index that guarantees that every operation transmits only a bounded number of rows, and that these rows can be determined very efficiently from the index.

In many cases, our users were not familiar with the data sets they were exploring. Instead of looking for a specific fact, they might be looking for patterns in the data, either ones that apply

978-1-4673-4910-9/13/$31.00 2013 IEEE ICDE Conference 2013