Automatic Visual Attributes Extraction from Web Offers Images

Angelo Nodari, Ignazio Gallo & Marco Vanetti
Dipartimento di Scienze Teoriche e Applicate, Università degli Studi dell'Insubria

In this study we propose a method for the automatic extraction of visual attributes from images. In particular, our case study concerns the processing of images related to commercial offers in the fashion domain. The proposed method is based on a pre-processing phase in which an object detection algorithm identifies the object of interest; the visual attributes are then extracted using a descriptor based on the Pyramid of Histograms of Orientation Gradients. To classify these descriptors, we trained a discriminative model on a manually annotated dataset of commercial offers, which is available for comparisons. To improve the performance of the visual attribute extraction, the results of the previous step are refined with an a priori probability that models the occurrence of each visual attribute with a specific product type, estimated on the dataset.

1 INTRODUCTION

In recent years there has been growing interest in visual attributes and their use in support of many different tasks: classification, recognition, content-based image retrieval, etc.

The concept of visual attribute was first formalized and analyzed by (Ferrari and Zisserman 2007), who propose a generative model for learning simple color and texture attributes from loose annotations, and by (Farhadi et al. 2009), who learn a richer set of attributes including parts, shape, materials, etc. In the field of face recognition, (Kumar et al. 2011) used a set of general and local visual attributes to train a discriminative model which measures the presence, absence, or degree to which an attribute is expressed in images, in order to compose a signature of visual attributes. In (Sivic et al. 2006; Anguelov et al.
2007) the main goal consists in finding all the occurrences of a particular person in a sequence of pictures taken over a short period of time. In particular, the use of visual attributes to extract information about the hair and clothes of the people has contributed consistently to handling the cases where people move around, change their pose and scale, and partially occlude each other.

There are also successful applications of visual attributes in the field of security and surveillance systems. For example, in (Vaquero et al. 2009) the authors used visual attributes instead of standard face recognition algorithms, which are known to be subject to problems such as lighting changes, face pose variation and low resolution. They search for people by parsing human parts and their attributes, including facial hair, eyewear, clothing color, etc.

Figure 1: Example of the attribute extraction phase using the proposed method for the three types of attribute analyzed in this study. Panel labels: Search Window, Predictions, Prior Probability, Overall Classification. Attribute labels: Frontal Placket (buttons, zip, open, nothing, other); Sleeves (long, half, short, nothing, other); Neckline (camisole, hood, ..., upturned, round).

In (Wang and Mori 2010) the authors demonstrate that object naming can benefit from inferring attributes of objects and that, in general, the attributes are not independent of each other.

Visual attributes express local or general characteristics of a subject. Color, texture and shape are the global features most commonly used, but they are also the least informative for characterizing the particular object of interest. They are used in the aforementioned works, and we analyzed them in a previous work (Gallo et al. 2011); in this paper we focus only on local attributes.
These visual attributes are very domain-specific and therefore contain much information that can be used in various fields. In order to be exported to other domains, the extracted attributes must be carefully selected. Moreover, to ensure the applicability of the