Data Mining Learning Bootstrap Through Semantic Thumbnail Analysis
Sebastiano Battiato, Giovanni Maria Farinella, Giovanni Giuffrida, Giuseppe Tribulato
Dipartimento di Matematica ed Informatica – Università di Catania
Viale Andrea Doria, 6 – 95125, Catania
Email: {battiato, gfarinella, ggiuffrida, tribulato}@dmi.unict.it

ABSTRACT
The rapid pace of technological innovation in the mobile phone industry is pushing the research community to develop new and advanced systems that optimize the services offered by mobile phone operators (telcos), maximizing their effectiveness and improving their business. Data mining algorithms can run over data produced by mobile phone usage (e.g., images, videos, text and log files) to discover users' preferences and predict the offer each individual customer is most likely to purchase. One of the main challenges is reducing the learning time and cost of these automatic tasks. In this paper we discuss an experiment where a commercial offer is composed of a small picture augmented with a short text describing the offer itself. Each customer's purchase is logged with all relevant information. Upon arrival of new items we need to learn who the best customers (prospects) for each item are, that is, the ones most likely to be interested in purchasing that specific item. Such learning is time consuming and, in our specific case, cannot be applied to every item, given the large number of new items arriving every day. Basically, given the current customer base, we are not able to learn on all new items. Thus, we need to select among those new items to identify the best candidates. We do so through a joint analysis of visual features and text that estimates how good each new item could be, that is, whether or not it is worth learning on it. Preliminary results show the effectiveness of the proposed approach in improving classical data mining techniques.
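The abstract does not state how the visual and textual cues are combined. Purely as an illustration, the following Python sketch assumes each offer is described by a thumbnail feature vector (for example a normalized colour histogram) and its short text, and estimates the appeal of a new offer as the similarity-weighted average of the purchase rates of past offers. The names Item, estimate_score and select_for_learning, the choice of cosine similarity for images and Jaccard overlap for text, the mixing weight alpha, and the toy values are all hypothetical and are not the authors' method.

```python
import math
from dataclasses import dataclass


@dataclass
class Item:
    """An offer: a thumbnail feature vector plus its short description (hypothetical model)."""
    name: str
    visual: list[float]          # assumed: e.g. a normalized colour histogram of the thumbnail
    text: str
    purchase_rate: float = 0.0   # known only for past, already-learned offers


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two visual feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def jaccard(s: str, t: str) -> float:
    """Jaccard overlap between the lowercase word sets of two descriptions."""
    a, b = set(s.lower().split()), set(t.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def estimate_score(new: Item, past: list[Item], alpha: float = 0.5) -> float:
    """Similarity-weighted average of past purchase rates; alpha weighs image vs. text."""
    weights = [alpha * cosine(new.visual, p.visual) + (1 - alpha) * jaccard(new.text, p.text)
               for p in past]
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * p.purchase_rate for w, p in zip(weights, past)) / total


def select_for_learning(new_items: list[Item], past: list[Item], k: int,
                        alpha: float = 0.5) -> list[Item]:
    """Return the k new items with the highest estimated score (the ones worth learning on)."""
    ranked = sorted(new_items, key=lambda it: estimate_score(it, past, alpha), reverse=True)
    return ranked[:k]


# Toy values for illustration only, not data from the paper.
past = [
    Item("soccer_wallpaper", [0.8, 0.1, 0.1], "soccer team wallpaper", 0.09),
    Item("love_ringtone", [0.1, 0.2, 0.7], "romantic love ringtone", 0.02),
]
new = [
    Item("derby_video", [0.7, 0.2, 0.1], "soccer derby video"),
    Item("ballad_ringtone", [0.2, 0.1, 0.7], "love ballad ringtone"),
]
print([it.name for it in select_for_learning(new, past, k=1)])  # prints ['derby_video']
```

In this sketch only the top-k items returned would be passed on to the expensive per-item learning step; alpha controls how strongly the thumbnail, rather than the text, drives the estimate.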
Keywords: Semantic Image Analysis, Data Mining, Content Based Image Retrieval, Learning Bootstrap

1. INTRODUCTION
As mobile phone usage becomes ubiquitous, the number of business opportunities for telcos and related operators is growing at a very fast pace. In some countries the mobile phone market is saturated, and fierce competition among operators is now based on convincing customers to switch from competitors. This is mostly achieved by offering more attractive rate plans, which are the main factor in a customer's decision to churn. However, as telco revenues from phone calls decline, the need to make money from value added services (VAS) becomes imperative. Indeed, mobile data services and applications are rapidly increasing in number and complexity [13,14]. Tools such as e-mail and Internet browsing are now available on many devices. M-commerce, the mobile phone counterpart of Internet e-commerce, is also becoming widespread and has proven able to produce revenue streams [1]. In some countries, even TV is now available on portable devices. The future of communication is definitely based on wireless connectivity, portability, and networking. Mobile phones, better than portable PCs, are able to deliver all of these. These additional services need to be optimized in order to maximize their effectiveness. Many variables available from mobile phone usage logs are well suited for sophisticated data intelligence analysis such as data and text mining. In particular, for each customer, accurate logs describing his/her interaction with the device are available. Different data mining models can be exploited in this domain and, recently, telcos have become very interested in such techniques as they realize the great benefits they can produce. Academic research in this domain is still in its infancy, and many interesting research avenues will soon open up to researchers.
A fairly well established technique is advertising, through SMS and MMS, of payable content such as ring tones, info services (news, weather forecasts, sports, etc.), wallpapers, music, videos, and others. The message includes one or more such offers, in order to try to stimulate the customer