Multimodal Language Independent App Classification Using Images and Text

Kushal Singla, Niloy Mukherjee, Joy Bose
Samsung R&D Institute, Bangalore, India
kushal.s@samsung.com, niloy.m@samsung.com, joy.bose@samsung.com

Abstract. There are a number of methods for classifying mobile apps, but most of them rely on a fixed set of app categories and on text descriptions associated with the apps. Often, one may need to classify apps into a different taxonomy while having only limited app usage data for the purpose. In this paper, we present an app classification system that uses object detection and recognition in images associated with apps, along with text based metadata of the apps, to generate a more accurate classification for a given app according to a given taxonomy. Our motivation is to build a better user modelling system that understands user interests through the apps installed on a user's device. Our image based approach can, in principle, complement any existing text based approach for app classification. We train a fast RCNN to learn the coordinates of bounding boxes in an app image for effective object detection, as well as labels for the objects. We then use the objects detected in the app images in an ensemble with a text based system that uses a hierarchical supervised active learning pipeline, based on uncertainty sampling, to generate the training samples for a classifier. Using the ensemble, we obtain better classification accuracy than either the text system or the image system achieves on its own. We describe the implementation details and test results on the accuracy of the models for different classes of apps.

Keywords. User modelling; app classification; object recognition; object detection

1 Introduction

User modelling and user interest computation, based on the user's profile and activity on mobile devices, are important for providing personalized services to users.
One significant indicator of user interest that is not well studied is the usage of mobile applications on the device. Data collected on mobile app usage is very high-dimensional, since a typical user may have multiple devices, each with hundreds of apps that are used multiple times every day. Assigning coarse-grained categories to the apps helps to mitigate the curse of dimensionality and hence can make user modelling feasible. These app categories are typically hierarchical. For example, a flight booking app such as Sky-