International Journal of Scientific and Research Publications, Volume 3, Issue 3, March 2013
ISSN 2250-3153
www.ijsrp.org

Selecting Attribute to stand out in the Competitive World

Mr. Murlidher Mourya*, Mr. P. Krishna Rao**
* Computer Science, Vardhaman College of Engineering
** Computer Science, Vardhaman College of Engineering

Abstract- Mining frequent itemsets is one of the most fundamental problems in data mining applications. The proposed algorithm guides a seller in selecting the best attributes of a new product to be inserted into a database, so that the product stands out among the existing competing products. Due to budget constraints, there is a limit, say m, on the number of attributes that can be selected for entry into the database. The problems are NP-complete, so the approximation algorithms are based on greedy heuristics. The proposed algorithm performs effectively and generates the frequent itemsets faster.

Index Terms- Association rules, Data mining, Mining frequent itemsets.

I. INTRODUCTION

In recent years there has been development of ranking functions and efficient top-k retrieval algorithms, which help users mine frequent itemsets, a task that plays a major role in many data mining applications. Examples include users wishing to search databases and catalogs of products such as homes, cars, and cameras, or of articles such as news and job ads. Users browsing these databases typically execute search queries via public front-end interfaces. Typical queries may specify sets of keywords in the case of text databases, or the desired values of certain attributes in the case of structured relational databases. The query answering system answers such queries by returning all data objects that satisfy the query conditions, by ranking and returning the top-k data objects, or by returning the results that are on the query’s skyline.
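The three answer modes just described can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from any particular system: the Home record, the five-row CATALOG, the query condition (at least three bedrooms), the choice of Price as the ranking attribute, and a skyline defined over price and distance with lower values preferred on both.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Home:
    """Hypothetical catalog record; all fields are illustrative."""
    id: int
    price: float
    bedrooms: int
    distance_km: float


# Invented sample data, not taken from any real database.
CATALOG: List[Home] = [
    Home(1, 300.0, 3, 8.0),
    Home(2, 250.0, 2, 5.0),
    Home(3, 400.0, 4, 2.0),
    Home(4, 450.0, 3, 9.0),
    Home(5, 350.0, 3, 8.5),
]

Condition = Callable[[Home], bool]


def at_least_three_bedrooms(home: Home) -> bool:
    """Example query condition on a structured attribute."""
    return home.bedrooms >= 3


def matching(catalog: List[Home], condition: Condition) -> List[Home]:
    """Mode 1: return every object that satisfies the query condition."""
    return [h for h in catalog if condition(h)]


def top_k(catalog: List[Home], condition: Condition, k: int) -> List[Home]:
    """Mode 2: rank the matches (here simply by ascending price), keep k."""
    return sorted(matching(catalog, condition), key=lambda h: h.price)[:k]


def dominates(a: Home, b: Home) -> bool:
    """a dominates b if it is no worse on both criteria and better on one."""
    no_worse = a.price <= b.price and a.distance_km <= b.distance_km
    better = a.price < b.price or a.distance_km < b.distance_km
    return no_worse and better


def skyline(catalog: List[Home], condition: Condition) -> List[Home]:
    """Mode 3: keep the matches that no other match dominates."""
    hits = matching(catalog, condition)
    return [h for h in hits if not any(dominates(o, h) for o in hits)]


if __name__ == "__main__":
    for mode in (matching(CATALOG, at_least_three_bedrooms),
                 top_k(CATALOG, at_least_three_bedrooms, 2),
                 skyline(CATALOG, at_least_three_bedrooms)):
        print([h.id for h in mode])
```

On this sample the top-2 answer and the skyline differ: ranking by price alone favors the two cheapest matches, whereas the skyline also keeps a more expensive home that is closer.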
If ranking is employed, the ranking may be either simplistic (e.g., objects are ranked by an attribute such as Price) or more sophisticated (e.g., objects are ranked by their degree of “relevance” to the query).

Attributes selection: There are two types of users of these databases. The first are buyers of products, who search such databases trying to locate objects of interest. The second are sellers of products, who insert new objects into these databases in the hope that they will be easily discovered by the buyers, i.e., that they stand out among the existing competing products.

To understand this better, consider the following scenario. If a real estate seller wants to place an ad in the newspaper about the sale of flats, he has to choose the features of the flats that most customers are interested in. If he places an ad listing some features (or attributes) and no customer is interested in those features, the ad adds little value. If he has a system that can suggest the top-k attributes (or features) of the product, he can place a much better ad, one that more customers will refer to.

This general problem also arises in domains beyond e-commerce applications. For example, in the design of a new product, a manufacturer may be interested in selecting the 10 best features from a large wish-list of possible features; e.g., a homebuilder may find that adding a swimming pool really increases the visibility of a new home in a certain neighborhood. The problem here is to select the proper and best attributes of the flats, so as to give a good advertisement that interests the largest number of customers.

To define our problem more formally, we need to develop a few abstractions. Let D be the database of products already being advertised in the marketplace (i.e., the “competition”).
Let Q be the set of search queries that have been executed against this database in the recent past; thus Q is the “workload” or “query log.” The query log is our primary model of what past potential buyers have been interested in. For a new product that needs to be inserted into this database, we assume that the seller has a complete “ideal” description of the product. But due to budget constraints, there is a limit, say m, on the number of attributes/keywords that can be selected for entry into the database. Our problem can now be defined as follows.

II. PROBLEM FRAMEWORK

Given a database D, a query log Q, a new tuple t, and an integer m, determine the best (i.e., top-m) attributes of t to retain such that if the shortened version of t is inserted into the database, the number of queries of Q that retrieve t is maximized.

PRELIMINARIES

First we provide some useful definitions.

Boolean database

Let D = {t1, ..., tN} be a collection of Boolean tuples over the attribute set A = {a1, ..., aM}, where each tuple t is a bit-vector in which a 0 implies the absence of a feature and a 1 implies the presence of a feature. A tuple t may also be considered as a subset of A, where an attribute belongs to t if its value in the bit-vector is 1. I