Int’l Conf. Artificial Intelligence | ICAI’10

Identifying Suspicious Bidders Utilizing Hierarchical Clustering and Decision Trees *

Benjamin J. Ford, Haiping Xu, and Iren Valova
Computer and Information Science Department
University of Massachusetts Dartmouth, North Dartmouth, MA 02747, USA
E-mail: {u_bford, hxu, ivalova}@umassd.edu

Abstract - Identifying bidders with suspicious bidding activities related to possible online auction fraud is a difficult task because of the large number of users participating in online auctions. To reduce the number of users to be investigated, we examine observable features of a bidder’s behavior and use a hierarchical clustering technique to divide a collection of bidders into normal and deviant groups. Based on the clustering results, we generate a decision tree that can be used to efficiently characterize new bidders as normal, suspicious, or highly suspicious. To illustrate the effectiveness of our proposed approach, we collected real auction datasets from online auctions and used a 3-fold validation approach to show that the error rates of the generated decision trees are reasonably low.

Keywords: Online auctions, suspicious bidder, shilling behavior, hierarchical clustering, decision tree.

1 Introduction

Shill bidding is a type of auction fraud in which sellers place bids on their own auctions, through a fake bidder account or through another bidder, for the purpose of raising the final price [1]. Sellers typically do this through accomplices or by creating fake bidder accounts, an easy task in an anonymous environment such as the Internet. Shill bidding is unique in that it is very difficult to detect. Unlike more blatant forms of auction fraud, such as non-delivery fraud, shill bidding typically goes undetected by its victims, especially those who do not know how to recognize signs of shill bidding that may look like normal bidding activities.
In this paper, we present a series of attributes to describe suspicious bidding activities related to shilling behavior in online auctions. Once a set of bidders from a dataset has been characterized using these attributes, we can use hierarchical clustering to identify suspicious groups. As observed in [2], online auction participants belong to heterogeneous groups based on their bidding behavior. However, current literature focuses primarily on identifying groups of bidding behavior based on legitimate bidding intentions. Thus, there is a pressing need to design an effective method to identify suspicious bidders with illegitimate bidding intentions for efficient detection of shill bidders in online auctions.

Furthermore, the clustering results, which are labeled as normal, suspicious, or highly suspicious, can be used as a training dataset to create a decision tree. With the decision tree, we can efficiently classify new bidders in an online auction immediately following its closure. If a new bidder is classified as suspicious or highly suspicious, we can use existing verification techniques, such as Dempster-Shafer (D-S) theory [3], to verify shill bidders. Note that existing approaches, such as D-S theory, are not efficient for analyzing large datasets. By efficiently narrowing the set of bidders to those that are suspicious, our approach complements these existing shill detection techniques, which are too time-consuming to apply to every bidder.

* This material is based upon work supported by the U.S. National Science Foundation under grant numbers CNS-0715648 and CNS-0715657.

Previous work employed various data mining techniques to categorize groups of bidders based on their bidding behavior. Bapna et al. utilized k-means clustering to generate five distinct groups of bidding behaviors for Yankee Auctions [2]. They observed that users can improve the execution of their bidding strategies over time by adopting agent bidding to lower their bidding costs. Shah et al. analyzed auction data collected from eBay to generate four distinct groups of bidding behavior [4]. The analysis revealed that certain bidding behaviors appear frequently in online auctions. Hou and Rego used hierarchical clustering to generate four distinct groups of bidding behaviors for standard eBay auctions, namely goal-driven bidders, experiential bidders, playful bidders, and opportunistic bidders [5]. They concluded that online bidders are a heterogeneous group rather than a homogeneous one.

Although the above approaches appear closely related to our proposed approach, they create clusters under the assumption that all bidders are honest and have no malicious intentions. Thus, the clusters generated using these approaches treat all bidders as normal, even when there is significant evidence of shilling behavior. Unlike these approaches, we attempt to uncover suspicious bidders in online auctions. Once suspicious bidders are identified, we may use existing approaches such as D-S theory [3] to verify shill bidders.
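
The overall pipeline outlined in this introduction (cluster bidders by behavior, label the resulting groups, train a tree on those labels, then classify new bidders) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from our method: the two bidder features are hypothetical, average linkage with Euclidean distance is one possible clustering choice, the smaller cluster is simply assumed to be the deviant one, only two labels are used rather than three, and a depth-one tree (a decision stump) stands in for a full decision tree.

```python
from itertools import combinations
from math import dist

def average_linkage(a, b, points):
    """Mean pairwise Euclidean distance between two clusters of point indices."""
    return sum(dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

def agglomerate(points, n_clusters):
    """Bottom-up hierarchical clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_linkage(clusters[p[0]], clusters[p[1]], points),
        )
        merged = clusters.pop(y)      # y > x, so index x is still valid
        clusters[x].extend(merged)
    return clusters

def fit_stump(points, labels):
    """Depth-one decision tree: the (feature, threshold) split with fewest training errors."""
    best = None
    for f in range(len(points[0])):
        values = sorted({p[f] for p in points})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2         # candidate threshold between adjacent values
            for above in ("normal", "deviant"):
                below = "deviant" if above == "normal" else "normal"
                errors = sum(
                    (above if p[f] > t else below) != y
                    for p, y in zip(points, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, t, above, below)
    return best[1:]

def classify(stump, point):
    """Apply the learned split to a new bidder's feature vector."""
    f, t, above, below = stump
    return above if point[f] > t else below

# Hypothetical features per bidder: (average bids per auction,
# fraction of bids placed in auctions of a single seller).
bidders = [
    (2.0, 0.10), (3.0, 0.15), (2.5, 0.20), (1.5, 0.05), (3.5, 0.25),
    (9.0, 0.90), (10.0, 0.85),
]

clusters = agglomerate(bidders, n_clusters=2)

# Assumption: the smaller cluster is treated as the deviant group.
labels = ["normal"] * len(bidders)
for i in min(clusters, key=len):
    labels[i] = "deviant"

# The cluster labels serve as training data for the tree.
stump = fit_stump(bidders, labels)
print(classify(stump, (8.0, 0.70)))   # a new bidder with heavy single-seller activity
print(classify(stump, (2.0, 0.30)))   # a new bidder with light activity
```

The point of the two-stage design is cost: the clustering step compares bidders pairwise and is run once on historical data, whereas classifying a new bidder with the tree requires only a few threshold comparisons.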