Using SVM and Concept Analysis to support Web Service Classification and Annotation

Marcello Bruno, Gerardo Canfora
RCOST - Research Centre on Software Technology
University of Sannio, Department of Engineering
Palazzo ex Poste, Via Traiano
82100 Benevento, Italy
marcello.bruno@unisannio.it, canfora@unisannio.it

Massimiliano Di Penta, Rita Scognamiglio
RCOST - Research Centre on Software Technology
University of Sannio, Department of Engineering
Palazzo ex Poste, Via Traiano
82100 Benevento, Italy
dipenta@unisannio.it, ritasco@unisannio.it

ABSTRACT

Supporting the classification and semantic annotation of services is an important challenge for service-centric software engineering. Late-binding and, in general, service matching approaches require services to be semantically annotated. Such semantic annotation may, in turn, need to conform to a specific ontology. Also, a service description needs to relate properly to the descriptions of other, similar services.

This paper proposes an approach to i) automatically classify services into specific domains and ii) identify key concepts inside service textual documentation, and build a lattice of relationships between service annotations. Support Vector Machines and Formal Concept Analysis have been used to perform the two tasks. Results obtained by classifying a set of web services show that the approach can provide useful insights in both the service publication and the service retrieval phases.

1. INTRODUCTION

One of the most relevant advantages of service-centric software engineering is the possibility for a developer to build his/her own system as a composition of one or more abstract services, i.e., semantic descriptions that can be matched at run-time with the description of one or more concrete services. The subsumption relationship between an abstract service and the concrete services is completed by means of matching algorithms integrated in the service broker [21].
The choice of the actual concrete service to bind to an abstract service can also take into account the concrete services' Quality of Service (QoS) attributes [33].

The scenario described above requires each service to have a semantic description according to a specific ontology¹. Service semantic annotation is, however, a difficult task that, given the current state of the art, is often too expensive to be done in practice. Building and maintaining ontologies also requires expertise and budgets that are not always available. Unfortunately, very often the only source of information available is a purely textual description of the service, sometimes extracted from source code comments.

During service publication, it would therefore be useful to exploit this textual information to:

• permit an automatic classification of the services to be published according to the broker's ontological classification of services. Even when this activity can easily be performed manually by the service publisher, the automatic classification can provide feedback on whether or not the service textual description is meaningful with respect to the class the service belongs to;

• support the building and maintenance of domain-specific ontologies. When a set of new services is going to be published, the related domain-specific ontology needs to be built, if it does not yet exist. When such an ontology is already available, the publication of a new service could add new concepts, and therefore trigger the need for updating the ontology;

• aid the semantic annotation of a service with respect to the ontology. By detecting concepts inside the service textual documentation, it would be possible to see how the service concepts can be identified in the ontology, and how the service can be cataloged with respect to other existing services. For example, it should be possible to see whether, according to its textual description, a service appears more specific, more generic, or perhaps alternative to existing ones. Again, if the service publisher realizes that the extracted concepts, or the classification of the service with respect to others, are meaningless, ambiguous or inconsistent, then the service description needs to be corrected in some way.

¹ There is work investigating the possibility of matching between services described with different ontologies. This aspect, however, is out of scope for this paper and will not be further considered.
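To make the idea of cataloging a service as more specific, more generic, or alternative to existing ones more concrete, the following Python sketch builds a tiny formal context (services against the terms found in their descriptions) and enumerates its formal concepts by brute force. It is only an illustration under stated assumptions, not the implementation used in this paper: the service names, the terms, the helper functions, and the choice to label services with incomparable term sets as "alternative" are all hypothetical.

```python
from itertools import combinations

# Hypothetical formal context: each service is mapped to the set of terms
# found in its textual description. Names and terms are invented for
# illustration and are not taken from the paper's data set.
CONTEXT = {
    "HotelSearch": {"hotel", "search"},
    "HotelBooking": {"hotel", "search", "booking"},
    "FlightSearch": {"flight", "search"},
}


def common_terms(services, context):
    """Intent: the terms shared by every service in the given collection."""
    terms = set().union(*context.values())
    for name in services:
        terms &= context[name]
    return frozenset(terms)


def services_having(terms, context):
    """Extent: the services whose descriptions contain all the given terms."""
    return frozenset(n for n, t in context.items() if terms <= t)


def formal_concepts(context):
    """Enumerate all (extent, intent) pairs by closing every subset of services.

    Exponential in the number of services, so only suitable for tiny examples.
    """
    names = list(context)
    concepts = set()
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            intent = common_terms(subset, context)
            extent = services_having(intent, context)
            concepts.add((extent, intent))
    return concepts


def relation(a, b, context):
    """Compare service a with service b by the terms in their descriptions."""
    ta, tb = context[a], context[b]
    if ta == tb:
        return "equivalent"
    if ta > tb:
        return "more specific"   # a has every term of b, plus others
    if ta < tb:
        return "more generic"    # a has only some of the terms of b
    return "alternative"         # assumption: incomparable term sets


if __name__ == "__main__":
    # Print concepts from the most general (largest extent) downwards.
    for extent, intent in sorted(formal_concepts(CONTEXT),
                                 key=lambda c: -len(c[0])):
        print(sorted(extent), sorted(intent))
    print(relation("HotelBooking", "HotelSearch", CONTEXT))
```

In this toy context, the concept whose intent is {hotel, search} groups HotelSearch and HotelBooking, and HotelBooking sits below it with the additional term "booking"; this is the kind of ordering a publisher could inspect to check whether a new description is placed sensibly with respect to existing services.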