A tool for producing structured interoperable data from product features on the web

Tuğba Özacar *

Department of Computer Engineering, Celal Bayar University, Muradiye, 45140 Manisa, Turkey

Article info

Article history:
Received 11 March 2015
Accepted 7 September 2015
Recommended by: F. Carino Jr.
Available online 25 September 2015

Keywords: Information extraction; GoodRelations; Protégé; Web scraping; Ontology; Rich snippets

Abstract

This paper introduces a tool that produces structured interoperable data from product features, i.e., attribute name–value pairs, on the web. The tool extracts the product features using a web site-specific template created by the user. The value of the extracted data is maximized by using GoodRelations, which is the standard vocabulary for modeling product types and their features. The final output of the tool is GoodRelations snippets, which contain product features encoded in RDFa or Microdata. These snippets can be embedded into existing static and dynamic web pages in a way accessible to major search engines like Google and Yahoo, mobile applications, and browser extensions. This increases the visibility of your products and services in the latest generation of search engines, recommender systems, and other novel applications.

© 2015 Elsevier Ltd. All rights reserved.

1. Introduction

The web contains a huge number of online shops, which provide excellent resources for product information. Moreover, e-commerce data is growing rapidly [1]. Information in e-commerce includes technical specifications and descriptions of products. Presenting this information in a structured way would significantly improve the effectiveness of many applications [2].

The vast majority of web content consists of different kinds of textual documents, which are provided in a number of different formats and vary from plain text to semi-structured documents containing data records.
This makes different methods of bringing structure and semantics to the web (including web information extraction) an active research field [3]. Although the web has a dynamic nature, Etzioni has argued that “information on the web is sufficiently structured to facilitate effective web mining” [4]. Since a large portion of the web content subject to web information extraction is generated from data repositories, a web information extraction system rediscovers the structure that was encoded in a web page.

This paper introduces a tool¹ that produces structured interoperable data from product features, i.e., attribute name–value pairs, on the web. It extends the previous work of the author [5] in two ways. First, it supports tree nodes that define text operations (e.g. concatenate, contains, fragment, lower, upper, replace, substring, and trim) on tree nodes. Second, it presents a user-based evaluation carried out using 15 different “real world” scenarios. Designed as a plug-in for the open source ontology editor Protégé [6], the proposed tool exploits the advantages of the ontology as a formal model for the domain knowledge.

* Tel.: +90 236 2012103; fax: +90 236 2412143. E-mail address: tugba.ozacar@cbu.edu.tr
¹ Download link: https://github.com/tugbaozacar/iris

Information Systems 56 (2016) 36–54, ISSN 0306-4379, http://dx.doi.org/10.1016/j.is.2015.09.002
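The text operations named in the introduction (concatenate, contains, lower, upper, replace, substring, trim) can be illustrated with a short Python sketch that cleans a scraped attribute name–value pair. This is not the tool's implementation, which is a Protégé plug-in. The registry `TEXT_OPS`, the helper `apply_ops`, the argument conventions, and the sample strings are all hypothetical and chosen only for illustration. The `fragment` operation is omitted because its semantics are not described in this section.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical registry: the operation names mirror those listed in the text,
# but the argument conventions below are assumptions made for this sketch.
TEXT_OPS: Dict[str, Callable[..., object]] = {
    "concatenate": lambda value, *others: value + "".join(others),
    "contains": lambda value, needle: needle in value,
    "lower": lambda value: value.lower(),
    "upper": lambda value: value.upper(),
    "replace": lambda value, old, new: value.replace(old, new),
    "substring": lambda value, start, end=None: value[start:end],
    "trim": lambda value: value.strip(),
}


def apply_ops(value: str, steps: Sequence[Tuple]) -> object:
    """Apply a chain of (operation_name, *arguments) steps to a scraped string."""
    result: object = value
    for name, *args in steps:
        result = TEXT_OPS[name](result, *args)
    return result


# Hypothetical raw strings, as they might be scraped from a product page.
raw_name = "  Screen Size: "
raw_value = " 15.6 inches "

# Normalise the attribute name and value into a clean name-value pair.
name = apply_ops(raw_name, [("trim",), ("replace", ":", ""), ("lower",)])
value = apply_ops(raw_value, [("trim",), ("replace", " inches", ""), ("concatenate", " in")])

print(name, "=", value)
```

Chaining operations as data rather than as nested function calls loosely mirrors the idea of attaching operations to nodes of a template, although the tool's actual node structure may differ.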