Scalable User Intent Mining using a Multimodal Restricted Boltzmann Machine

Yue Shang, Wanying Ding, Mengwen Liu, Xiaoli Song, Tony Hu, Yuan An
College of Computing and Informatics, Drexel University
Philadelphia, PA, USA
ys439,wd78,ml943,xs48,xh29,ya45@drexel.edu

Haohong Wang, Lifan Guo
TCL Research America
San Jose, CA, USA
haohong.wang@tcl.com, lifanguo@gmail.com

Abstract— Search engines have become an indispensable part of modern life, and they generate hundreds of thousands of search logs every second worldwide. With the explosive growth of online information, a key issue for web search services is to understand, from a short search query, what the user needs, and to match the user’s preferences as closely as possible. However, personalized search remains challenging, both because personal information is unavailable in some scenarios and because finding a relevant user group requires heavy computation. In this work, we propose a novel scalable framework based on a multimodal Restricted Boltzmann Machine (RBM) for user intent mining and prediction. The framework works in an unsupervised manner and adapts to varying amounts of individual information: it can handle scenarios with no or limited personal history, and the more individual data it receives, the more accurately it predicts user intent and the better it reflects changes in the individual’s interests. The framework outputs a binary representation for each query log, which partly alleviates the data sparsity problem and reduces the computational cost of finding users with similar interests. According to a qualitative analysis of the top-ranked contexts and websites for each class, the experimental results show that the model learns reasonable user intent categories during training. The model also achieves competitive performance when no individual data is offered.
Moreover, when more individual data (10 history queries) is offered, overall precision improves by up to 10%.

Keywords—Click-through Data Mining, Named Entity Mining

I. INTRODUCTION

Search engines play an important role in helping people find information, and for years they have greatly facilitated people’s daily lives. However, understanding what people are looking for is not an easy problem for machines. Moreover, different people have very different interests, and even one individual’s interests change over time. Online search services therefore need to support personalized search and adapt to changes in user intent over time. As a result, user-specific information, e.g., user profiles, query histories, or previously viewed content, becomes significant for identifying a user’s tastes. Studies have shown that personalization algorithms can improve results when a sufficient amount of user data is available [1]. However, acquiring adequate user information is always challenging because of privacy issues. Many studies therefore seek solutions in group-level personalization, which combines limited individual information with data from related people to perform collaborative filtering [2]. But finding similar users to enrich personalization is also challenging, because of data sparsity and the need to compute the distance between every pair of users. Moreover, user information is highly imbalanced. This imbalance has various causes, but the result is that some users have plenty of online activity records while others leave almost no trace at all. The model must therefore be flexible enough to fit different scenarios: when there is no personal data, it should learn to mine user intent from a public dataset, and when there is adequate individual data, it should scale up training of the personal model.
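The abstract’s claim that binary representations make the search for similar users cheaper can be illustrated with a minimal sketch. It assumes that each query log (or user) has already been mapped to a fixed-length binary code, for instance the thresholded hidden-unit states of a trained RBM; the RBM itself is not shown, and `pack`, `hamming`, and `nearest_users` are hypothetical helpers, not the authors’ implementation. Comparing two codes then reduces to an XOR followed by a bit count, rather than a floating-point distance over sparse, high-dimensional query vectors.

```python
from typing import Dict, List, Tuple


def pack(bits: List[int]) -> int:
    """Pack a list of 0/1 hidden-unit states into one integer, most significant bit first."""
    code = 0
    for bit in bits:
        code = (code << 1) | bit
    return code


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two packed binary codes differ."""
    return bin(a ^ b).count("1")


def nearest_users(query_code: int, user_codes: Dict[str, int], k: int) -> List[Tuple[str, int]]:
    """Return the k users whose codes are closest to query_code in Hamming distance.

    Ties are broken by user id. This is a brute-force scan: each comparison is
    cheap, but every stored code is still visited once.
    """
    scored = sorted((hamming(query_code, code), uid) for uid, code in user_codes.items())
    return [(uid, dist) for dist, uid in scored[:k]]


if __name__ == "__main__":
    # Hypothetical 8-bit codes; in the framework they are assumed to come from the RBM's hidden layer.
    users = {
        "u1": pack([1, 0, 1, 1, 0, 0, 1, 0]),
        "u2": pack([0, 1, 0, 0, 1, 1, 0, 1]),
        "u3": pack([1, 0, 1, 0, 0, 0, 1, 0]),
    }
    query = pack([1, 0, 1, 1, 0, 0, 1, 1])
    print(nearest_users(query, users, k=2))  # [('u1', 1), ('u3', 2)]
```

On Python 3.10 and later, `int.bit_count()` can replace the `bin(...).count("1")` idiom. The linear scan is kept deliberately simple so that the cost per comparison, a single XOR and popcount, stays visible.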
Compared with other resources such as tweets and blogs, search engine query logs reflect users’ interests and needs more directly. When using a search engine, people tend to describe their needs with brief, direct words, most often named entities. In data mining, a named entity is a phrase that clearly distinguishes one item from other items with similar attributes; examples include locations, people’s first and last names, addresses, and product names. Different users may look for different aspects of the same named entity, which makes it difficult for search engines to tell a user’s exact search intent. Query logs provide a huge amount of user search information, and studies have shown that nearly 70% of query logs contain a single named entity (e.g., “Gone girl trailer”) [1]. These named entities span various categories, such as movies, music, books, autos, and electronic products. In this work, we propose a novel scalable framework that learns from both huge amounts of public query logs and an individual’s own query activity to understand the user’s intent. Given more personal search history, the model learns the user’s intent more accurately; without any history records, it can still make a reasonable decision by leveraging the model learned from public data. Furthermore, the
2015 International Conference on Computing, Networking and Communications, Invited Position Papers 978-1-4799-6959-3/15/$31.00 ©2015 IEEE 618