978-1-4244-7445-5/10/$26.00 ©2010 IEEE
Text Mining of Personal Communication
Understanding the Technical and Privacy Related Challenges
Håkan Jonsson
Corporate Technology Office
Sony Ericsson
Lund, Sweden
hakan1.jonsson@sonyericsson.com
Pierre Nugues, Christofer Bach, Johan Gunnarsson
Computer Science
Lund University
Lund, Sweden
pierre.nugues@cs.lth.se, buffyin@gmail.com,
johan.gunnarsson@gmail.com
Abstract— This paper reports on the work on a new service using
text mining on SMS data: SMSTrends. The service extracts
trends in the form of keywords from SMS messages sent and
received by ad hoc location-based communities of users. Trends
are then presented to the user using a phone widget, which is
regularly updated to show the latest trends. This allows the user
to see what the user community is texting about, and makes her
aware of what is going on in this community.
Privacy considerations of the service are governed by user
expectations and regulations. Brenner and Wang [1] discussed
mining of personal communication in operator bit pipes. We
expand on this by looking deeper into privacy and regulatory
aspects through the specific example of SMSTrends. In particular,
we introduce adaptive location granularity selection.
Keywords-text mining; messaging; location; context awareness;
collective awareness; privacy
I. INTRODUCTION
Personal communication such as SMS is considered highly
private. This, combined with privacy and data protection
regulations, makes it very hard to develop services and
applications, or to do research, that require a priori access to
large amounts of SMS messages. Examples of such services
are text prediction engines and marketing analytics on SMS.
A. Background
The work on the SMSTrends service was started as a
research project to extract named entities from SMS messages.
When we discovered how hard it was to find or
collect a relevant corpus of SMS messages for the project, the
corpus collection became a topic in itself: under what
conditions are users ready to give others access to their SMS messages?
As SMS messages are private data exchanged between two
parties, the classical approach to corpus collection (automatic
gathering from machine-readable documents or transcription
from printed sources) is not applicable. A first naïve request to
our colleagues to hand us their SMS messages for the sake of science
failed miserably. We started the SMSTrends application in an
attempt to offer them a benefit in return for sharing their SMS data.
A small group of users (about a third of the people asked) tried it,
but few wanted to continue using it unless they could
mark messages as secret, so that those messages were
not used by the service. After this feature was introduced, a
small group continued to use the service. However, the user
group is still too small to draw any conclusions about the
end user value of the service compared to the cost of sharing user
information, and further studies with larger groups are needed.
B. The Service
The service extracts trends in the form of keywords from
SMS messages sent and received by the users of the
application. Trends are then presented to a user using a phone
widget, which is regularly updated to show the latest trends.
This allows her to see what a user community is texting about,
and makes her aware of what is going on in this community.
Figure 1. SMSTrends widget screenshot
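The keyword extraction described above can be illustrated with a minimal sketch. It is not the SMSTrends implementation: it assumes that a trend is simply a keyword occurring in many recent messages, and that messages marked as secret are dropped before any processing. The names Message, STOP_WORDS, tokenize and top_trends, as well as the stop-word list itself, are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Hypothetical stop-word list; a real service would need a fuller,
# language-specific list.
STOP_WORDS = {"the", "and", "you", "for", "are", "see", "out", "is", "at"}


@dataclass
class Message:
    text: str
    secret: bool = False  # assumed flag: user marked the message as secret


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens longer than two
    characters that are not stop words."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if len(w) > 2 and w not in STOP_WORDS]


def top_trends(messages, k=3):
    """Return the k keywords that occur in the most non-secret messages.

    Each keyword is counted once per message; ties are broken
    alphabetically so that the result is deterministic.
    """
    counts = Counter()
    for message in messages:
        if message.secret:
            continue  # secret messages never reach the trend computation
        counts.update(set(tokenize(message.text)))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]
```

Calling top_trends on the messages collected for one community during the latest update period would give the keyword list that a widget could display. Counting each keyword once per message is a design choice of this sketch, made so that a single repetitive message cannot dominate the result.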