Applying User Profiles in Transient Peer-to-Peer Environment

Bertalan Forstner, Imre Kelényi and Hassan Charaf
Department of Automation and Applied Informatics
Budapest University of Technology and Economics
Budapest, Hungary
{bertalan.forstner, imre.kelenyi, hassan.charaf}@aut.bme.hu

Abstract—Semantic data is widely used to increase the performance of Peer-to-Peer information retrieval networks. The most efficient approaches construct user profiles that describe the fields of interest of the users and use them to shape a semantic overlay network. However, the characteristics of mobile devices and the behavior of wireless Peer-to-Peer users require the applied algorithms and protocols to be reconsidered before they are deployed in such an environment. In this paper we describe our experiments with a special mobile Gnutella client that collected information on mobile user behavior, together with other distinctive properties of the mobile environment. We also propose an appropriate use of user profiles in transient Peer-to-Peer systems.

Keywords- information retrieval; mobile devices; peer-to-peer

I. INTRODUCTION

The emerging demand for mobile file sharing solutions is reflected in the popularity of the available applications. Mobile market surveys also report a need for software that helps smartphone users share the content they create, such as images, videos, audio recordings or other notes [1]. An ideal approach could be the Peer-to-Peer (P2P) technology, which is widely used in the desktop environment for such purposes. Over the last few years we have collected first-hand experience with a Symbian-based mobile Gnutella client, the Symella application [15], in order to learn the usage patterns and other characteristics of this technology in the mobile environment.
Early P2P protocols suffer from scalability issues: as the number of nodes grows, the network traffic (or other resources) required to reach a reasonable hit rate increases notably. The efforts to address this issue follow two significantly different approaches: structured and unstructured. The structured P2P protocols (for example [2][3][4][5]) specify strict rules for where documents are stored, or define which other peers a node can connect to. Although these networks usually have good scalability properties and their performance can be estimated quite accurately, they become disadvantageous in networks with a strongly transient character: they handle frequent changes in the network population only with difficulty and at great resource expense. The second approach comprises unstructured networks such as the basic Gnutella protocol [6]. Here there is no rule for where documents are stored, and the connections of the nodes are governed by a few simple rules. For that reason, these systems have limited protocol overhead and tolerate nodes frequently joining and leaving the network.

Recently, several systems have been developed to improve the search performance of P2P networks; some of them try to achieve this cooperatively, with semantic overlay networks. These build on the observation that the fields of interest of the users can be determined, so that nodes with a higher expected recall can be found. The first group of these algorithms tries to achieve a better hit rate based on run-time statistics [7][8][9]. The second, more efficient group of content-aware Peer-to-Peer algorithms uses metadata provided for the documents in the system. These metadata can be keywords describing a document or any other means of classifying different kinds of information.
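The content-aware idea can be illustrated with a minimal sketch. Assuming that each shared document carries a list of metadata keywords (for music files these could be genre tags), a peer can summarize its fields of interest as keyword counts and rank candidate neighbors by the cosine similarity of their profiles. The function names (`build_profile`, `cosine_similarity`, `rank_candidates`), the bag-of-keywords profile and the choice of cosine similarity are hypothetical illustrations, not the user modeling technique proposed in this paper.

```python
from collections import Counter
from math import sqrt


def build_profile(shared_docs: list[list[str]]) -> Counter:
    """Hypothetical profile: count each metadata keyword over all shared documents."""
    profile: Counter = Counter()
    for keywords in shared_docs:
        profile.update(k.lower() for k in keywords)
    return profile


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity of two keyword-count vectors; 0.0 if either profile is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(own: Counter, candidates: dict[str, Counter], k: int) -> list[str]:
    """Return the ids of the k candidate peers whose profiles are most similar to ours."""
    ranked = sorted(
        candidates,
        key=lambda peer_id: cosine_similarity(own, candidates[peer_id]),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    # Toy data: each inner list is the keyword metadata of one shared document.
    me = build_profile([["rock", "live"], ["rock", "guitar"], ["jazz"]])
    peers = {
        "peerA": build_profile([["rock"], ["guitar", "rock"]]),
        "peerB": build_profile([["classical"], ["opera"]]),
        "peerC": build_profile([["jazz", "live"]]),
    }
    print(rank_candidates(me, peers, k=2))  # prints ['peerA', 'peerC']
```

Cosine similarity is used here only because it normalizes for collection size, so a peer sharing many files does not appear similar to everyone merely by volume; other similarity measures would fit the same scheme.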
A simple example of metadata for music files is the ID3 tags attached to mp3 files, which describe the author, title, performer, album, genre or other properties of the music. This information can be used to deduce the fields of interest of the user and then to find users who share similar interests. Direct connections to such nodes can increase the hit rate in a P2P network and decrease the number of messages needed to find the requested information. Although some of these approaches give quite good results in the desktop environment, their performance and usability decrease when they are deployed to mobile devices. The reason lies in the special characteristics of mobile Peer-to-Peer systems.

The rest of this paper is organized as follows. After this introduction, Section II summarizes the characteristics of mobile networks. Section III describes our user modeling technique. In the last two sections we evaluate our results, conclude our work and raise open questions.

II. CHARACTERISTICS OF THE MOBILE ENVIRONMENT

We developed a special version of the mobile Gnutella client for users who volunteered to provide us with usage statistics and the semantic information collected by the application. We also examined the properties of the handsets available on the market. In this section, we summarize the results of our experiment.

The modified crawler client used a taxonomy that classified the different music styles according to their ID3 tag. It logged

This full text paper was peer reviewed at the direction of IEEE Communications Society subject matter experts for publication in the ICC 2008 workshop proceedings. 978-1-4244-2052-0/08/$25.00 ©2008 IEEE