Data Quality: Developments and Directions
Bhavani Thuraisingham* and Eric Hughes
The MITRE Corporation, Bedford MA, USA
*On Leave at the National Science Foundation, Arlington VA, USA
Abstract:
This paper first examines various issues in data quality and provides an
overview of current research in the area. It then focuses on research at the
MITRE Corporation on using annotations to manage data quality. Next, it
discusses some emerging directions in data quality, including managing
quality for the semantic web and the relationship between data quality and
data mining. Finally, it outlines some future directions for data quality.
Key words: Data Quality, Annotations, Semantic Web, Data Mining, Security
1. INTRODUCTION
There has been considerable research and development on data quality over
the past two decades. Data quality is about understanding whether data is
good enough for a given use. Quality parameters include the accuracy of the
data, the timeliness of the data and the precision of the data. Data quality
has received increased attention since the rise of the Internet and
E-Commerce. This is because organizations now have to share data that comes
from all kinds of sources and is intended for various uses, and therefore it
is critical that organizations have some idea of the quality of the data.
Furthermore, heterogeneous databases have to be integrated within and
across organizations, which makes data quality even more important. Another
reason for the increased interest in data quality is data warehousing. Many
organizations are developing data warehouses, and data is brought into a
warehouse from multiple sources. The sources are often inconsistent, so the
data in the warehouse could appear to be useless to users.
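
The notion that data quality means being "good enough for a given use" can be
made concrete with a small sketch. The Python below is only an illustration
under stated assumptions: the class names, field names, units and thresholds
are hypothetical and are not taken from any system described in this paper.
It records the three quality parameters named above (accuracy, timeliness,
precision) for one data value and checks them against the needs of two
different uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityAnnotation:
    """Hypothetical quality metadata attached to a data value."""
    accuracy: float        # assumed scale: estimated chance the value is correct, 0.0 to 1.0
    age_hours: float       # timeliness, expressed here as hours since last confirmation
    precision_digits: int  # precision, expressed here as trusted significant digits


@dataclass(frozen=True)
class UseRequirement:
    """Hypothetical thresholds that a particular use places on the data."""
    min_accuracy: float
    max_age_hours: float
    min_precision_digits: int


def good_enough(q: QualityAnnotation, need: UseRequirement) -> bool:
    """Return True only when every quality parameter meets the use's threshold."""
    return (
        q.accuracy >= need.min_accuracy
        and q.age_hours <= need.max_age_hours
        and q.precision_digits >= need.min_precision_digits
    )


# One data value, judged against two different (illustrative) uses.
reading = QualityAnnotation(accuracy=0.9, age_hours=30.0, precision_digits=3)
monthly_report = UseRequirement(min_accuracy=0.8, max_age_hours=720.0, min_precision_digits=2)
real_time_alert = UseRequirement(min_accuracy=0.8, max_age_hours=1.0, min_precision_digits=2)

print(good_enough(reading, monthly_report))   # True
print(good_enough(reading, real_time_alert))  # False
```

The same value passes for the monthly report and fails for the real-time
alert, which is the point of the definition: quality is judged relative to a
use, not as an absolute property of the data.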
M. Gertz et al. (eds.), Integrity, Internal Control and Security in Information Systems
© Springer Science+Business Media New York 2002