An Approach to Comprehending and Detecting Networked Applications through Analogy

Maxim Shevertalov, Edward Stehle, Chris Rorres, Spiros Mancoridis, and Moshe Kam
Department of Computer Science
Drexel University
Philadelphia, PA 19104
{ms333,eds23,crorres,spiros}@cs.drexel.edu, and kam@minerva.ece.drexel.edu

Abstract

Distributed applications rely on packet-switched networks to connect their various elements. This paper describes a technique that can help software engineers and network administrators characterize and detect unfamiliar networked applications by matching them to a single, or a combination of several, analogous and familiar networked application(s). This matching is based on the size distribution of the packets sent and received by the application undergoing scrutiny. It can be used by network administrators to gain a greater understanding of the network they are administering.

1. Introduction

Distributed software systems generally rely on packet-switched networks to connect their various elements. For example, web browsing and e-mail systems transfer text, images, audio, and video content as well as control and protocol establishing data by sending packets across the Internet and edge networks [1] [2] [3] [4]. Similarly, network conferencing software transfers varying combinations of video, voice, and text between clients using packet-switched networks. This work presents a technique to assist in the identification and comprehension of unfamiliar networked applications by drawing analogies to previously studied (i.e., familiar) networked applications.

After observing the packets sent and received by an unfamiliar networked application, our technique generates a composition of familiar applications that have analogous network traffic patterns to the application under scrutiny.
For instance, our technique may characterize an unfamiliar video conferencing application as having a 40% resemblance to a specific familiar streaming video application and a 60% resemblance to a familiar web browser. This knowledge will allow system administrators to develop a better understanding of the types of services their networks provide. It will allow them to further configure the network to support these services and improve the overall user experience.

Our technique analyzes networked software systems by observing the size distribution of the packets sent and received. The aim is to discover which networked applications, or which combination of networked applications, are most analogous to the software undergoing scrutiny. No a priori knowledge of the application is required to use our technique. To date, we have analyzed five networked applications whose features cover a broad spectrum of the capabilities that are commonly associated with networked applications. We will refer to this set of analyzed applications as the canonical set of applications. It currently includes the following:

1. Streaming audio: GnomeMeeting (G711 codec)
2. Streaming video: GnomeMeeting
3. Text messaging: Jabber
4. Web browsing/File transfer: Firefox and gFTP
5. E-mail: Unix Mail

We suspect that a significant fraction of existing networked applications can be represented as some combination of the five applications in the canonical set. For example, a software system such as Internet Explorer may be used to stream video, browse the Internet, and check e-mail simultaneously and so can be considered analogous to a combination of the second, fourth, and fifth applications in the canonical set.
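
To make the idea of representing an application as a combination of canonical applications concrete, the following is a minimal sketch rather than the method used in this paper, which is presented in Section 3. It assumes that each application is summarized by a normalized histogram of packet sizes, and that an unfamiliar application's histogram is approximated as a weighted mixture of the canonical histograms. The weights are estimated with a standard expectation-maximization update for mixture proportions with fixed components. The function names, the 100-byte bin width, the 1500-byte upper bound, and the sample packet sizes are all illustrative assumptions.

```python
BIN_WIDTH = 100   # bytes per histogram bin (assumed for illustration)
MAX_SIZE = 1500   # upper bound on packet size in bytes (assumed)


def size_distribution(packet_sizes, bin_width=BIN_WIDTH, max_size=MAX_SIZE):
    """Return a normalized histogram of the observed packet sizes."""
    n_bins = max_size // bin_width + 1
    counts = [0] * n_bins
    for size in packet_sizes:
        # Sizes above max_size are clamped into the last bin.
        counts[min(size // bin_width, n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no packets observed")
    return [c / total for c in counts]


def mixture_weights(target, references, iterations=500, eps=1e-12):
    """Estimate non-negative weights (summing to 1) such that the weighted
    sum of the reference histograms approximates the target histogram."""
    k = len(references)
    weights = [1.0 / k] * k
    for _ in range(iterations):
        updated = [0.0] * k
        for b, p in enumerate(target):
            if p == 0.0:
                continue
            # Probability of bin b under the current mixture.
            mix = sum(w * ref[b] for w, ref in zip(weights, references)) + eps
            # Share the target mass in bin b among the references.
            for j in range(k):
                updated[j] += p * weights[j] * references[j][b] / mix
        total = sum(updated)
        if total == 0.0:
            return weights  # target shares no bins with any reference
        weights = [u / total for u in updated]
    return weights


# Hypothetical traffic: packet sizes invented for this example only.
video_hist = size_distribution([1300] * 80 + [200] * 20)
web_hist = size_distribution([60] * 50 + [200] * 30 + [1500] * 20)
unknown_hist = [0.4 * v + 0.6 * w for v, w in zip(video_hist, web_hist)]

estimated = mixture_weights(unknown_hist, [video_hist, web_hist])
print([round(w, 3) for w in estimated])
```

Each returned weight can be read as the degree of resemblance between the unfamiliar application and the corresponding canonical application. In practice, the histograms would be built from captured traffic rather than from synthetic lists.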

The rest of this paper is organized as follows: Section 2 discusses previous work in the areas of packet-size distributions and the comprehension of distributed systems; Section 3 presents our software characterization technique and the results of simulated composed network flows; Section 4 describes a case study where applications are analyzed and our discoveries and conclusions about the applicability