Recommending APIs for mashup completion using association rules mined from real usage data

Boris Tapia, Pablo Ortega, Romina Torres, Hernán Astudillo
Departamento de Informática
Universidad Técnica Federico Santa María
Valparaíso, Chile
Email: {btapia, portega, romina, hernan}@inf.utfsm.cl

Abstract—Mashups are becoming the de facto approach to building customer-oriented Web applications by combining several Web APIs into a single lightweight, rich, customized Web front-end. To help mashup builders choose among the plethora of available APIs, some existing recommendation techniques rank candidate APIs by popularity (a social measure) or by keyword-based measures (whether semantic or based on unverified tags). This article proposes using information on the co-usage of APIs in previous mashups to suggest likely candidate APIs, and introduces a global measure that improves on earlier local co-API measures. The gCAR (global Co-utilization API Ranking) is calculated using association rules inferred from historical API usage data. The MashupRECO tool combines gCAR with a keyword-based measure to avoid the "cold-start" problem for new or unused APIs. An evaluation of MashupRECO against the keyword search of the well-known ProgrammableWeb catalog shows that the tool reduces search time for a comparable degree of completeness.

Keywords-Web mashup; Web API; recommender system; association rules; frequent itemsets

I. INTRODUCTION

Mashups are becoming the de facto approach to building customer-oriented Web applications by combining operations on several Web APIs (Application Programming Interfaces) into a single lightweight, rich, customized Web front-end. They allow developers to build complete applications by searching for, composing, and executing functionality provided by external sources.
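The abstract describes ranking candidate APIs with association rules inferred from historical co-usage. As a rough illustration only, the Python sketch below treats each past mashup as a transaction of API names, counts itemsets by brute force, derives single-consequent rules with the standard support and confidence measures, and ranks APIs that could complete a partially built mashup. This is not the gCAR measure itself: the function names (`mine_rules`, `recommend`), the thresholds, the max-confidence ranking, and the five-mashup toy catalog are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy catalog: each mashup is the set of Web APIs it combines.
# These names and co-usages are invented for illustration, not real catalog data.
MASHUPS = [
    {"GoogleMaps", "Flickr"},
    {"GoogleMaps", "Flickr", "YouTube"},
    {"GoogleMaps", "YouTube"},
    {"GoogleMaps", "Yelp"},
    {"Flickr", "YouTube"},
]


def mine_rules(mashups, min_support=0.0, min_confidence=0.0, max_antecedent=2):
    """Mine rules X -> y (y is a single API) annotated with support and confidence.

    Uses brute-force itemset counting (no Apriori pruning), which is only
    reasonable for small catalogs and small itemsets.
    """
    n = len(mashups)
    counts = Counter()
    for apis in mashups:
        items = sorted(set(apis))
        # Count every itemset of size 1 .. max_antecedent + 1 in this mashup.
        for k in range(1, max_antecedent + 2):
            for combo in combinations(items, k):
                counts[frozenset(combo)] += 1

    rules = []
    for itemset, count in counts.items():
        if len(itemset) < 2:
            continue
        support = count / n  # fraction of mashups that use every API in the itemset
        if support < min_support:
            continue
        for consequent in itemset:
            antecedent = itemset - {consequent}
            # confidence: fraction of mashups using the antecedent that also use the consequent
            confidence = count / counts[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return rules


def recommend(partial, rules, top_k=3):
    """Rank APIs not yet in `partial` by the best rule whose antecedent it contains."""
    chosen = frozenset(partial)
    best = {}
    for antecedent, consequent, support, confidence in rules:
        if consequent in chosen or not antecedent <= chosen:
            continue
        # Keep the highest (confidence, support) pair seen for each candidate API.
        best[consequent] = max(best.get(consequent, (0.0, 0.0)), (confidence, support))
    # Sort by confidence, then support (both descending), then name for stable ties.
    ranked = sorted(best.items(), key=lambda kv: (-kv[1][0], -kv[1][1], kv[0]))
    return [api for api, _ in ranked[:top_k]]


if __name__ == "__main__":
    rules = mine_rules(MASHUPS)
    print(recommend({"GoogleMaps"}, rules))            # ['Flickr', 'YouTube', 'Yelp']
    print(recommend({"GoogleMaps", "Flickr"}, rules))  # ['YouTube', 'Yelp']
```

Each call to `recommend` mirrors one iteration of mashup building: after the developer adds an API, the rules whose antecedents are now satisfied suggest the next candidates. Taking the maximum confidence among matching rules is just one simple aggregation; summing or weighting confidences would be equally plausible, and this sketch makes no claim about how gCAR combines rules.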
Typically, mashups are built with more than one API: according to ProgrammableWeb¹, 45% of the registered mashups use more than one API, and this percentage increases to YY% when only new mashups built in 2011 are considered. Therefore, we expect the API discovery process to keep a memory of the iterative item selection, in order to take advantage of previous compositions made by mashup developers. Besides API documentation, Web API catalogs provide information about the mashups (within the catalog) in which each API has been used.

¹ http://www.programmableweb.com

In our previous work [1], we argued for the need to combine descriptions with social information, so that description-based techniques can be leveraged by social indicators. This combination allows the discovery of candidates that would otherwise pass unnoticed because of their poor-quality descriptions or their low popularity. We discussed in [2] how our combined approach enriches keyword-based search approaches with the social information provided by the mashup community. We also argued that this balanced approach reduces the cold-start problem that new APIs experience when they start to compete in a market exhibiting a preferential-attachment trend (as would happen if only social information were used). We basically proposed two indicators: the Web API Rank (WAR), which measures API utilization over time, and the Co-utilization API Rank (CAR), which measures an API's co-utilization with other APIs.

In this work, we calculate the CAR indicator using symbolic methods for knowledge discovery in databases, specifically by extracting association rules, which tell us which other APIs to use when building a mashup (instead of relying on rough statistics alone).

The remainder of the paper is organized as follows. Section II discusses the motivation of our research. Section III discusses related work. Section IV identifies the specific problem addressed in this paper.
Section V presents the background needed to introduce the approach. Sections VI and VII present the complete approach for obtaining candidate components to build a mashup iteratively. Section VIII presents the implementation details of the MashupRECO tool. Section IX describes the experiments performed to test the effectiveness and efficiency of our approach, and finally, Section X draws conclusions.

II. MOTIVATION

In this section we describe a case in which a workshop organizer is searching for APIs to create a mashup for the venue section of the workshop's website. The first step is to identify the specific functionalities needed (see Figure 1). A venue section typically contains information about the event location, nearby hotels, the city and its entertainment, photos of the city's attractions and, in some cases, videos. The organizer therefore needs APIs capable of displaying a map of the location and locating points of interest, hotels, photos (if any are available), and videos. The aim is to keep attendees