A Characterization of Broadband User Behavior and Their E-Business Activities

Humberto T. Marques Neto, Leonardo C. D. Rocha, Pedro H. C. Guerra, Jussara M. Almeida, Wagner Meira Jr., Virgilio A. F. Almeida
{hmarques, lcrocha, pcalais, jussara, meira, virgilio}@dcc.ufmg.br 1

Abstract

This paper presents a characterization of broadband user behavior from an Internet Service Provider standpoint. Users are divided into two major categories: residential and Small-Office/Home-Office (SOHO). For each user category, the characterization is performed along four criteria: (i) session arrival process, (ii) session duration, (iii) number of bytes transferred within a session and (iv) user request patterns. Our results show that both residential and SOHO session inter-arrival times are exponentially distributed. Whereas residential session arrival rates remain relatively high throughout the day, SOHO session arrival rates vary much more significantly during the day. On the other hand, a typical SOHO user session is longer and transfers a larger volume of data. Furthermore, our analysis uncovers two main groups of session request patterns within each user category. The first group consists of user sessions that use traditional Internet services, such as e-mail, instant messaging and, mostly, WWW services. Sessions from the second, smaller group typically use peer-to-peer file sharing applications, remain active for longer periods and transfer a large amount of data. Looking further into the e-business services most commonly accessed, we found that subscription-based and advertising services account for the vast majority of user HTTP requests in both residential and SOHO workloads. Understanding these user behavior patterns is important to the development of more efficient applications for broadband users.
1 Department of Computer Science, Federal University of Minas Gerais, Brazil - 31270-010

1 Introduction

Understanding the nature and characteristics of broadband user behavior is a crucial step toward improving the quality of service offered to users in next-generation broadband environments. Broadband user behavior characterization can lead to a better understanding of the interaction between users and service providers. It can also help in the design of systems with better QoS metrics, such as performance, availability, security and cost.

Broadband penetration keeps growing fast for users and households. However, studies of broadband user behavior are scarce in the literature, mainly because of the difficulty in obtaining actual logs from Internet Service Providers (ISPs). Most service providers on the Internet consider logs to be very sensitive data. Existing studies, such as the one done by Pew Internet & American Life [2], concentrate on qualitative analysis based on surveys. The Pew report shows how online Americans' behavior changes with high-speed connections at home. The study also shows that broadband users distinguish themselves from their dial-up counterparts in the following ways: (i) broadband users engage in multiple Internet activities on a daily basis, (ii) high-speed users become creators and managers of different types of online content and (iii) broadband users perform a large variety of queries for information. Despite the Pew report, quantitative studies of broadband user behavior are still lacking. This paper intends to fill this gap.

To understand broadband user behavior, we present a characterization based on data from a broadband ISP (a cable TV company that provides broadband services to its users), which classifies its users into two major categories: residential and Small-Office/Home-Office (SOHO).
For each category, we identify user sessions, where a session is defined as the period during which a user is connected to the broadband network. The behavior of users is defined as a function of the way users arrive at the ISP, how long they remain online, the number of bytes they transfer and what they do while connected, i.e., the request pattern within a session. Thus, the characterization process is performed along four criteria: (i) session arrival process, (ii) session duration, (iii) number of bytes transferred within a session and (iv) user request pattern. The broadband user behavior characterization is based on logs collected on an authentication server and by Netflow [1] running in a border router. The data collection architecture implemented in the ISP allows us to identify the services used by each user category.

In order to analyze the service request patterns, we use a state transition graph called the Customer Behavior Model Graph (CBMG) [14], which describes the behavior of groups of customers who exhibit similar navigational patterns. We then apply clustering algorithms to user session data (both residential and SOHO) to determine groups of users that exhibit similar behavior graphs. Finally, we look further into the HTTP-based web services most frequently accessed by