Email Classification and Summarization: A Machine Learning Approach

Taiwo Ayodele, Rinat Khusainov, David Ndzi
Department of Electronics and Computer Engineering
University of Portsmouth, United Kingdom
{taiwo.ayodele, rinat.khusainov, david.ndzi}@port.ac.uk

Keywords
Email, algorithms, email summarization, activities, classification.

Abstract
This paper presents the design and implementation of a system that groups and summarizes email messages. The system uses the subject and content of each message to classify emails according to users’ activities, and it generates a summary of each incoming message using an unsupervised learning approach. Our framework addresses the problems of email overload, mailbox congestion, and the difficulty of prioritizing incoming messages and of finding previously archived messages in the mailbox.

1. Introduction
Email is part of everyday life. Personal computer users rely on it to communicate easily with friends, family, e-businesses and colleagues. For some people email also serves as an archival tool, and many users never discard messages because the information they contain might be useful at a later date, for example as a reminder of upcoming events and outstanding issues. In addition, Schuff et al. [1] state that “Email is widely used to synchronize real-time communication, which is inconsistent with its primary goals”. Email messages are designed to be sent, accumulated in a repository, and periodically collected and read by the recipient. Because of the high volume of email received daily, the mailbox easily becomes congested. Messages range from static organizational knowledge to conversations covering a broad range of topics. Users may find it difficult to prioritize and successfully process the contents of incoming messages, and it may also be difficult to find a previously archived message in the mailbox.
In this paper we propose a new, effective method for managing information in email. It reduces email overload by grouping emails according to users’ activities and by providing a summary of each email. Incoming emails are identified and assigned to the appropriate activity, so that related messages are grouped under the same activity. Messages are grouped by extracting the most frequent words in the content of a message and comparing common words with those most frequent words to decide which activity the message belongs to. We developed techniques that allow our classifier and summarizer to extract information from email messages and to build a model from the most frequent and common words extracted, in order to group messages into activities. Our classifier and summarizer group emails into activities based on their observations and on a set of rules that is passed to both of them.

2. Related Work
One common existing method is to archive messages manually into folders, with a view to reducing the number of information objects a user must process at any given time. However, this is an insufficient solution, as folder names are not necessarily a true reflection of their content and their