© 2016, IJARCSMS All Rights Reserved 156 | Page
ISSN: 2321-7782 (Online)
Impact Factor: 6.047
Volume 4, Issue 11, November 2016
International Journal of Advance Research in
Computer Science and Management Studies
Research Article / Survey Paper / Case Study
Available online at: www.ijarcsms.com
Pattern Based Document Recommendation using Maximum
Matched Equivalence Classes
Smita K. Thakare¹
PG Student, Department of Computer Engineering,
Late G. N. Sapkal College of Engineering, Savitribai Phule
Pune University – India
Prof. J. V. Shinde²
Assistant Professor, Department of Computer Engineering,
Late G. N. Sapkal College of Engineering, Savitribai Phule
Pune University – India
Abstract: Topic modelling has been widely accepted in areas such as machine learning and text mining. It was proposed to
generate statistical models that classify multiple topics in a collection of documents. Existing models, i.e. pattern-based and
term-based models, suffer from polysemy, synonymy, and the noise these models generate. These models also assume that
the user is interested in only one topic, whereas in practice a user is often interested in many topics at the same time. In the
field of information filtering, patterns are always thought to be more discriminative than single terms for describing documents.
Selection of the most representative and discriminative patterns from the huge amount of discovered patterns becomes essential.
To deal with the above-mentioned limitations, a novel information filtering model is proposed. In the proposed model, user
information needs are generated in terms of multiple topics, where each topic is represented by patterns. Patterns are generated
from topic models and are organized in terms of their statistical and taxonomic features, and the most discriminative and
representative patterns are proposed to estimate the document relevance to the user’s information needs in order to filter out
irrelevant documents. To evaluate the effectiveness of the proposed model, the TREC data collection and Reuters Corpus
Volume 1 are used.
Keywords: Topic model, information filtering, pattern-based model, term-based model, maximum matched pattern.
I. INTRODUCTION
Most data mining and text mining techniques assume that the user’s interest is only related to a single topic. In practice, this is
not necessarily the case. When a user asks for information about a product like “CAR”, the user does not typically mean to
find documents which consistently mention the word “CAR”. The user probably wants to find documents that contain
information about different aspects of the product, such as location, price, and servicing. This means that a user’s interest
usually involves multiple aspects relating to multiple topics. The most inspiring contribution of topic modeling is that it
automatically classifies the documents in a collection by a number of topics and represents every document with multiple topics
and their corresponding distribution. Compared with the word-based model, the pattern-based model generates more meaningful
and useful content according to the user’s requirements. However, patterns are sometimes too small or too large, and such
patterns do not carry the meaning related to the particular topic. To avoid this pattern-related problem, we have to find the
patterns that do carry that meaning. The topic-based representation generated by topic modeling can
overcome the problem of semantic confusion found in traditional text mining techniques. However, topic modeling needs
improvement in modeling users’ interests in terms of the topics’ interpretations. Hence we propose a new system, i.e. a Maximum
matched Pattern-based Topic Model, which generates pattern-enhanced topic representations to model the user’s interests across
multiple topics. The model selects maximum matched patterns, instead of using all discovered patterns, for estimating the relevance
of incoming documents. To find the most meaningful patterns we use a ranking method, and the highest-ranked pattern is the most useful