Scalable User Intent Mining using a Multimodal
Restricted Boltzmann Machine
Yue Shang, Wanying Ding,
Mengwen Liu, Xiaoli Song, Tony Hu, Yuan An
College of Computing and Informatics
Drexel University
Philadelphia, PA, USA
ys439,wd78,ml943,xs48,xh29,ya45@drexel.edu
Haohong Wang, Lifan Guo
TCL Research America
San Jose, CA, USA
haohong.wang@tcl.com, lifanguo@gmail.com
Abstract— Search engines have become an indispensable part
of modern life, generating hundreds and thousands of search
logs every second throughout the world. With the explosive
growth of online information, a key issue for web search services
is to better understand a user’s need from a short search query
and to match the user’s preference as closely as possible.
However, because personal information is lacking in some
scenarios and finding a relevant user group is computationally
expensive, personalized search is a challenging problem.
In this work, we propose a novel
scalable framework based on a multimodal Restricted Boltzmann
Machine (RBM) for user intent mining and prediction. The
framework works in an unsupervised manner and adapts to the
amount of individual information available: it handles scenarios
with no personal history or only limited personal history, and
the more individual data it receives, the more accurately it
predicts user intent and the better it reflects changes in the
individual’s interests. The framework outputs a binary
representation for each query log, which to some extent
alleviates the data sparsity problem and reduces the
computational complexity of finding users with similar
interests. The experimental results show that the model learns
reasonable user intent categories during training, according to
a qualitative analysis of the top-ranked contexts and websites
for each class. It also achieves competitive performance when no
individual data is offered. Moreover, with more individual data
(10 history queries), the overall precision improves by up to
10%.
Keywords—Click-through Data Mining, Named Entity Mining
I. INTRODUCTION
Search engines play an important role in how people find
information, and for years they have greatly facilitated people’s
daily lives. However, it is never easy for machines to
understand what people are looking for. Moreover, different
people have very different interests, and even a single
individual’s interests change over time. It is therefore
necessary for online search services to support personalized
search and to adapt to changes in user intent over time. As a
result, user-specific information, e.g. user profile, user query
history, or previously viewed content, becomes significant when
identifying the user’s taste.
Studies have shown that personalization algorithms can
produce promising results when there is a sufficient amount of
user data[1]. However, it is always challenging to acquire
adequate user information because of privacy issues. Many
studies therefore seek solutions in group-level
personalization, which combines limited individual
information with that of other related people to perform
collaborative filtering[2]. But finding similar users to enrich
personalization is also challenging, because the data are sparse
and the distance between every pair of users has to be
computed. Moreover,
the amount of user information is highly imbalanced. The
imbalance arises for various reasons, but the result is that
some users have plenty of online activity records while others
leave almost no trace at all. This requires the model to be
flexible enough to fit different scenarios. When there is no
personal data, the model can learn to mine user intent from a
public dataset, and it should scale up the personal model
training when there is adequate individual data.
Compared to other resources like tweets, blogs, etc., search
engine query logs reflect users’ interests and needs more
directly. When using a search engine, people tend to describe
their needs in brief, direct words, mostly named entities. In
data mining, a named entity is a phrase that clearly
distinguishes an item from other items with similar attributes.
Examples of named entities are locations, persons’ first and
last names, addresses, product names, etc.
Different users may look for different aspects
of a named entity, and it is difficult for search engines to
tell users’ exact search intent. Query logs from search engines
provide a huge amount of user search information. Studies
have shown that nearly 70% of query logs contain single
named entities (e.g. “Gone girl trailer”)[1]. These named
entities cover various categories such as movies, music, books,
autos, electronic products, etc.
In this work, we propose a novel scalable framework that
learns from both huge amounts of public query logs and an
individual’s own query activity to understand the user’s intent.
Given more personal search history, the model learns the
user’s intent more accurately. Without history activity
records, it can still work by leveraging the model learned from
public data to make a reasonable decision. Furthermore, the
2015 International Conference on Computing, Networking and Communications, Invited Position Papers
978-1-4799-6959-3/15/$31.00 ©2015 IEEE 618