Copyright © 2010, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.
Chapter 4
Concept-Based Mining Model
Shady Shehata
University of Waterloo, Canada
Fakhri Karray
University of Waterloo, Canada
Mohamed Kamel
University of Waterloo, Canada
ABSTRACT
Most text mining techniques are based on word and/or phrase analysis of the text. Statistical analysis
of term frequency captures the importance of a term within a document only. However, two terms can
have the same frequency in their documents while one contributes more to the meaning of its
sentences than the other. Thus, the underlying model should indicate the terms that capture the semantics
of the text. In this case, the model can capture the terms that present the concepts of a sentence, which leads
to discovering the topic of the document. A new concept-based mining model is introduced that relies on
the analysis of both the sentence and the document, rather than the traditional analysis of the document
dataset only. The concept-based model can effectively discriminate between terms that are non-important
with respect to sentence semantics and terms that hold the concepts representing the sentence meaning.
The proposed model consists of a concept-based statistical analyzer, a conceptual ontological graph
representation, and a concept extractor. A term that contributes to the sentence semantics is assigned
two different weights by the concept-based statistical analyzer and the conceptual ontological graph
representation. These two weights are combined into a new weight. The concepts that have the maximum
combined weights are selected by the concept extractor. The concept-based model is used to significantly
enhance the quality of text clustering, categorization, and retrieval.
DOI: 10.4018/978-1-60566-908-3.ch004
INTRODUCTION
Due to the rapid daily growth of information, there is a considerable need to extract and discover
valuable knowledge from the vast amount of information found in different data sources today such