Web Security Requirements: A Phishing Perspective

Ian Fette
School of Computer Science
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213
Email: icf@cs.cmu.edu

Norman Sadeh
School of Computer Science
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213
Email: sadeh@cs.cmu.edu

Lorrie Cranor
School of Computer Science
Carnegie Mellon University
Pittsburgh, Pennsylvania 15213
Email: lorrie@cs.cmu.edu

Abstract: We are currently focusing on web security problems caused by phishing, and similar semantic attacks against users. Our current investigations are leading towards heuristic, collaborative, and semantic approaches towards thwarting such attacks. Additionally, we are considering new approaches to authentication that minimize the room for user error in the presence of semantic attacks. We feel that there is significant room for progress in both of these areas, and that further testing to validate any potential solution to web security problems must take semantic attacks into account in the context of real user behavior.

I. INTRODUCTION

Phishing is a growing problem[1] that affects an increasing number of users and companies providing online services. At its most fundamental level, phishing is a subset of a larger class of semantic attacks against the user, which are seen as defining the current wave of network attacks[2]. These attacks are increasingly perpetrated by targeting the user’s environment and interfaces. A lack of usable mutual authentication opens consumers up to both classic man-in-the-middle attacks and semantic attacks, showcasing the need for a solution to help users authenticate service providers in a usable and secure manner.

We believe that there is potential for immediate return on technologies designed to detect spoofed webpages and emails. To that end, we are currently investigating heuristic approaches and collaborative approaches designed with the goal of determining the authenticity of an email or webpage.
We hope that this will offer a sufficiently effective approach to thwart the majority of phishing attacks. We are also considering a few long-term approaches, including semantic reasoning and analysis of attacks, as well as examining the fundamental faults in authentication mechanisms that make these attacks possible. Some solutions we are currently considering include leveraging the pre-existing out-of-band communications that take place between customers and institutions.

II. DETECTING SPOOFED CONTENT

A. Heuristic approaches

There are a number of heuristic approaches that we believe may be effective in detecting phishing attacks, starting at the email level. These approaches are based on an understanding of common traits found in phishing emails, as described in [3]. Besides the traditional spam filtering approaches, such as Bayesian filters, we believe that filters acting on certain key characteristics may be able to filter out many phishing attacks at the email level. Such characteristics include emails with links to IP addresses, emails with links to newly registered domains, and emails that appear commercial in nature but originate from either residential or foreign IP addresses.

Heuristics are already in use in anti-phishing toolbars, such as Spoofguard[4] and the Netcraft Toolbar[5], and we are currently conducting studies to evaluate the effectiveness of these various approaches. Our preliminary studies indicate that the accuracy of currently available toolbars varies quite a bit from product to product.

Unfortunately, applying heuristics at the email level must rely on different techniques than applying heuristics at the browser level. At the most basic level, different information is available in the email (such as header information), but on a higher level, the cost of heuristics at an email level must be weighed differently.
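As an illustration of the email-level characteristics listed above, the following is a minimal sketch of two of them: links to IP addresses and links to newly registered domains. It is not the filter under study here. The function names, the 30-day threshold, and the `registration_dates` mapping (a stand-in for a WHOIS-like data source) are all assumptions made for the example, and the third characteristic (commercial-looking mail from residential or foreign IP addresses) is not covered.

```python
import ipaddress
import re
from datetime import date
from urllib.parse import urlsplit

# Hypothetical threshold: domains younger than this count as "newly registered".
NEW_DOMAIN_DAYS = 30

# Simple pattern for http(s) URLs in plain-text or HTML email bodies.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def extract_hosts(body):
    """Return the host part of every http(s) URL found in the email body."""
    hosts = []
    for url in URL_PATTERN.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:
            # Malformed URL (e.g. unbalanced IPv6 brackets): skip it.
            continue
        if host:
            hosts.append(host)
    return hosts


def is_ip_host(host):
    """True if the host is a literal IP address rather than a domain name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def phishing_signals(body, registration_dates, today):
    """Return the set of heuristic signals raised by an email body.

    registration_dates is an assumed mapping from domain name to its
    registration date; a real deployment would need a WHOIS-like source.
    """
    signals = set()
    for host in extract_hosts(body):
        if is_ip_host(host):
            signals.add("link_to_ip_address")
            continue
        registered = registration_dates.get(host)
        if registered is not None and (today - registered).days < NEW_DOMAIN_DAYS:
            signals.add("link_to_new_domain")
    return signals
```

Both checks in this sketch involve only string parsing and one dictionary lookup per link, so the per-message work stays small. Obtaining the registration dates in the first place is the expensive part and is deliberately left outside the sketch.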
Any filtering done by an ISP, or similar email gateway, must have a low marginal processing cost due to the high volume of mail to be screened. Filtering done in the user’s web browser can have a relatively higher marginal cost, since the volume is far lower and humans are less sensitive to small (sub-second) delays. For an email server, however, these small delays can quickly add up: if 100 emails arrive per second and each takes more than 1/100 s to process, a server that handles them sequentially cannot keep up, and the growing backlog can eventually halt it.

B. Collaborative approaches

Given that there are on the order of thousands of phishing attacks per month[6] going to hundreds of millions of users, it seems reasonable to hope that a collaborative approach could also assist in detecting a large number of phishing attacks. A typical phishing attack seems to be sent out multiple times to many different people, and if the first recipient recognizes the email as a phishing email, it should be possible for that person to effectively inform the community, acting in effect as a vaccine against further emails of a similar nature.

For this to be feasible, a number of new developments are required. First, there needs to be a framework for a scale-free network to communicate such “vaccination” information among peers. Second, there must be a way of accurately developing such information such that a maximum number of similar attacks are matched, but legitimate emails are not matched. Third, there needs to be a way to evaluate and