Abstract. Classification systems for research publications
are often based on taxonomies. The ACM, a society for
computing professionals, provides a digital library whose
cataloguing system is based on a taxonomy that has been
continuously updated over the years. The CiteSeer digital
library contains a large collection of computer science research
papers, many of which are tagged with categories from the
ACM’s taxonomy. By analyzing the small portion of
CiteSeer’s manually tagged documents and by considering
different time frames, we extracted statistics that show how
the ACM’s taxonomy covers the publications in computer
and information science research sub-fields. We also studied
the size and growth of categories over the last four available
years. These results reveal areas with higher or
lower publication rates. We believe that these techniques
could be exploited to quickly identify trends within
taxonomies. This would greatly help to construct more
efficient browsing and searching systems.
Keywords: Classification systems, ACM taxonomy,
CiteSeer digital library.
I. INTRODUCTION
CLASSIFICATION systems for research
publications must be continuously improved and
adapted to reflect current research activities and trends.
This is especially true for the computer and information
sciences field, which changes rapidly. New areas of
research continue to emerge while research in other areas
falls off. The identification of a society’s interests is
important not only to understand research trends, but also
to analyze the usage of the adopted classification system,
verifying that it properly covers all research areas and that
all categories are used as evenly as possible. Knowing the
distribution of publications is important for developing and
maintaining efficient searching and browsing systems. In
addition, by identifying “hot” and “cold” publication
areas, we can identify areas for improvement in the
classification system. The information is also of general
interest since it provides a summary of current research
activities in the field.
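One way to make "used as evenly as possible" measurable is sketched below. It is only an illustration, not a method taken from this study: it assumes normalized Shannon entropy as the evenness measure, and the function name `usage_evenness` and its inputs are hypothetical.

```python
import math
from collections import Counter


def usage_evenness(labels, all_categories):
    """Normalized Shannon entropy of category usage (an assumed measure).

    labels: one category label per tagged document (hypothetical input).
    all_categories: every category in the taxonomy, including unused ones.
    Returns a value between 0.0 (all documents in one category) and
    1.0 (documents spread evenly over all categories).
    """
    categories = set(all_categories)
    counts = Counter(label for label in labels if label in categories)
    total = sum(counts.values())
    if total == 0 or len(categories) < 2:
        return 0.0
    entropy = 0.0
    for category in categories:
        n = counts[category]
        if n:  # unused categories contribute nothing to the entropy
            p = n / total
            entropy -= p * math.log(p)
    # Divide by the maximum possible entropy so taxonomies of
    # different sizes can be compared.
    return entropy / math.log(len(categories))
```

Under this assumption, a value near 0 would point to a few "hot" categories absorbing most publications, while unused ("cold") categories lower the score because they still count toward the maximum.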
ACM¹ is the world’s oldest and largest educational and
scientific computing society. Its numerous conferences
and journals are the most popular choices for researchers
to publish their work. The ACM’s Computing Classification
System (CCS), first developed in 1964, is a taxonomy of
computer and information science areas that is widely
used. Researchers use the categories in this taxonomy to
classify their work, and the IEEE Computer Society² has
also adopted it as the basis for its own taxonomy.
This work was partially supported by the National Science Foundation
under grant NSF CRI 0454121 (Next Generation CiteSeer).
¹ ACM: Association for Computing Machinery, http://www.acm.org,
last visited: November 2009.
To study the fit between the ACM’s taxonomy
and published research over time, we examined the
documents contained in the CiteSeer [16] collection, an
automated digital library for scholarly computing-related
literature. Our snapshot of this collection contains the
4,348 documents published between 1980 and 2005 that
were explicitly tagged by their authors. We processed
these documents to study changes and analyze trends in
the use of ACM’s taxonomy. By analyzing the use of the
ACM’s CCS over the CiteSeer collection, we can find out
how well the taxonomy represents the current research
trends in the Computer & Information Science &
Engineering (CISE) community. This analysis will show
how the taxonomy is used and will reveal research trends
that can be used to assist future revisions of the ACM’s
CCS.
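The kind of per-category, per-year counting described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing pipeline used in this study: documents are assumed to be `(year, [CCS codes])` pairs, a code's top-level category is assumed to be the part before its first dot (e.g., "H" for "H.3.3"), and `SAMPLE_DOCUMENTS` is made-up data.

```python
from collections import defaultdict


def counts_by_year(documents):
    """Count documents per top-level category and publication year.

    documents: iterable of (year, [ccs_code, ...]) pairs (assumed format).
    Returns {top_level_category: {year: document_count}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for year, codes in documents:
        # A document counts once per top-level category, even when it
        # carries several codes under the same top-level node.
        top_levels = {code.split(".")[0] for code in codes}
        for top in top_levels:
            counts[top][year] += 1
    return {top: dict(per_year) for top, per_year in counts.items()}


def relative_growth(counts, category, start_year, end_year):
    """Relative change in document count between two years.

    Returns None when the category has no documents in start_year,
    because the ratio is undefined in that case.
    """
    per_year = counts.get(category, {})
    start = per_year.get(start_year, 0)
    end = per_year.get(end_year, 0)
    if start == 0:
        return None
    return (end - start) / start


# Made-up sample data, for illustration only.
SAMPLE_DOCUMENTS = [
    (2002, ["H.3.3", "H.3.1"]),
    (2002, ["I.2.6"]),
    (2005, ["H.3.3"]),
    (2005, ["H.2.8"]),
    (2005, ["I.2.6"]),
]
```

Calling `relative_growth(counts_by_year(SAMPLE_DOCUMENTS), "H", 2002, 2005)` compares two time frames for one category; positive values would suggest a growing area and negative values a shrinking one.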
We believe that the structure of classification systems
will become more and more complex over the years.
Making changes to the taxonomy will be increasingly
difficult because any changes will have to take the
historical integrity into consideration while simultaneously
adapting the taxonomy to reflect current research trends.
For instance, given the different types of changes (e.g., the
introduction of cross references) applied to the CCS taxonomy
during the last revision in 1998 and described in Section 3,
its structure will become more similar to an ontology with
multiple, non-hierarchical links than to a simple hierarchical
taxonomy. For this reason, tools should be implemented to
conduct periodic and automatic analyses such as the one
reported in this study. Further investigations could be
conducted if we considered this taxonomy as an ontology and
applied state-of-the-art techniques to maintain and evolve
ontologies.
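The difference between a simple hierarchy and a structure with cross references can be illustrated with a small data-structure sketch. The class and function names are hypothetical, and the cross reference in the usage note is invented for illustration rather than taken from the CCS.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """A taxonomy node with one parent and optional cross references."""
    code: str
    parent: "Category | None" = None
    # Codes of related categories outside the parent/child hierarchy.
    related: set = field(default_factory=set)


def ancestors(category):
    """Return the codes on the path from a node's parent up to the root."""
    chain = []
    node = category.parent
    while node is not None:
        chain.append(node.code)
        node = node.parent
    return chain


def add_cross_reference(a, b):
    """Link two categories symmetrically, outside the hierarchy."""
    a.related.add(b.code)
    b.related.add(a.code)


def is_pure_hierarchy(categories):
    """True when no category has a cross reference, i.e. a plain tree."""
    return all(not c.related for c in categories)
```

For example, after building nodes for "H", "H.3", and "I.2" and calling `add_cross_reference` on the last two, `is_pure_hierarchy` no longer holds: the structure is a graph, so analyses that assume a single path from each node to the root have to account for the extra links.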
In Section 2, we begin by discussing some trend
analysis techniques used in previous studies. Section 3
then describes the ACM CCS taxonomy and the CiteSeer
data collection in more detail. In Section 4 we introduce
² IEEE Computer Society, http://www.computer.org/, last visited:
November 2009.
Using CiteSeer to Analyze Trends in the ACM’s
Computing Classification System
Mirco Speretta¹, Susan Gauch², and Praveen Lakkaraju³
¹,² University of Arkansas, Fayetteville, AR, USA
³ University of Kansas, Lawrence, KS, USA
msperett@uark.edu, sgauch@uark.edu, lakkaraju.praveen@gmail.com
978-1-4244-7562-9/10/$26.00 ©2010 IEEE