A Domain-Specific Language for Do-It-Yourself Analytical Mashups

Julian Eberius, Maik Thiele, and Wolfgang Lehner

Technische Universität Dresden
Faculty of Computer Science, Database Technology Group
01062 Dresden, Germany
{julian.eberius,maik.thiele,wolfgang.lehner}@tu-dresden.de

Abstract. The increasing amount and variety of data available on the web opens up new possibilities for end-user-focused data analysis. While the classic database technologies for data integration and analysis (ETL and BI) are too complex for the needs of end users, newer technologies such as web mashups are not optimal for data analysis. To make productive use of the data available on the web, end users need easy ways to find, join, and visualize it.

We propose a domain-specific language (DSL) for querying a repository of heterogeneous web data. In contrast to query languages such as SQL, this DSL describes the visualization of the queried data in addition to its selection, filtering, and aggregation. The resulting data mashup can be made interactive by leaving parts of the query variable. We also describe an abstraction layer above this DSL that uses a recommendation-driven natural language interface to reduce the difficulty of creating queries in this DSL.

Keywords: data analytics, data mashups, natural language queries.

1 Introduction

The increasing amount and variety of data available on the web opens up new possibilities for end-user-focused data analysis. In the course of the Open Data trend, public agencies have started to make governmental data available through web services. In addition, there is a large amount of "crowdsourced" data from services such as Yelp (venue ratings) or Twitter (trending topics, sentiments). To make productive use of this data, two elements are needed: first, a way to integrate the heterogeneous data into a common representation; second, a way to analyze the integrated data to make it usable.
The well-known solutions to these two problems are data integration through ETL processes into data warehouses, and the use of BI (business intelligence) tools for analytics. These tools could in principle be applied to these new forms of data as well, but for end-user data analysis they have two disadvantages. First, they are designed for skilled users. Second, ETL processes are constructed for static sets of input sources and are not suitable for on-demand joining of web data sources.

A. Harth and N. Koch (Eds.): ICWE 2011 Workshops, LNCS 7059, pp. 337–341, 2011. © Springer-Verlag Berlin Heidelberg 2011