Sharing Features Between Objects and Their Attributes

Sung Ju Hwang¹, Fei Sha², and Kristen Grauman¹
¹Department of Computer Science, University of Texas at Austin
²Computer Science Department, University of Southern California
{sjhwang,grauman}@cs.utexas.edu, feisha@usc.edu

Abstract

Visual attributes expose human-defined semantics to object recognition models, but existing work largely restricts their influence to mid-level cues during classifier training. Rather than treat attributes as intermediate features, we consider how learning visual properties in concert with object categories can regularize the models for both. Given a low-level visual feature space together with attribute- and object-labeled image data, we learn a shared lower-dimensional representation by optimizing a joint loss function that favors common sparsity patterns across both types of prediction tasks. We adopt a recent kernelized formulation of convex multi-task feature learning, in which one alternates between learning the common features and learning task-specific classifier parameters on top of those features. In this way, our approach discovers any structure among the image descriptors that is relevant to both tasks, and allows the top-down semantics to restrict the hypothesis space of the ultimate object classifiers. We validate the approach on datasets of animals and outdoor scenes, and show significant improvements over traditional multi-class object classifiers and direct attribute prediction models.

1. Introduction

Visual attributes are human-understandable properties shared among object categories (e.g., "glassy", "has legs"), and are a compelling way to introduce high-level semantic knowledge into predictive models.
Recent work shows that attributes are valuable in several interesting scenarios, ranging from description of generic images or unfamiliar objects [11, 9, 24], to zero-shot transfer learning [13], to intermediate features that aid in distinguishing people, objects, and scenes [12, 13, 9, 27].

Existing approaches to attribute-based recognition assume that the attributes' role is primarily to focus learning effort on properties that will be reusable for many categories of interest, and to elegantly integrate human knowledge into discriminative models. As such, attribute classifiers are learned independently from object classifiers, and then their predictions are treated as "mid-level" features that bridge low-level image features and high-level object classes. However, segregating supervision about attributes from supervision about objects may restrict their impact. In particular, in conventional models, even though attributes influence object predictions, the attribute-labeled training data does not directly introduce new information when discriminatively learning the objects.

Figure 1. In our model, object categories and their human-defined visual attributes share a lower-dimensional representation (dashed lines indicate zero-valued connections), thereby allowing the attribute-level supervision to regularize the learned object models.

We explore how learning visual attributes in concert with object categories can strengthen recognition. The assumption is that both types of prediction tasks rely on some shared structure in the original image descriptor space.
In other words, patterns among those generic visual properties that humans elect to name may reveal information about which low-level cues are valuable to object recognition—in the most general case, whether the objects of interest exhibit those attributes or not. Thus, rather than treat attributes as intermediate features, we propose an approach to discover this structure and learn a shared lower-dimensional representation amenable to discriminative models for either one (see Figure 1). In effect, we show how human-defined semantics (as revealed by attributes) can regularize training for object classifiers.
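The alternating scheme the abstract refers to, learning a shared feature metric jointly with per-task classifier weights, can be illustrated with a minimal linear sketch in the style of convex multi-task feature learning (Argyriou et al.). This is an assumption-laden simplification, not the paper's kernelized implementation: it uses squared loss, treats each object and attribute task as a regression, and the function name and synthetic setup below are illustrative only.

```python
import numpy as np

def multitask_feature_learning(Xs, ys, gamma=1.0, n_iters=50, eps=1e-6):
    """Sketch of alternating minimization for multi-task feature learning.

    All tasks (object categories and attributes alike) share a feature
    covariance D that concentrates weight on input dimensions useful
    across tasks, inducing a common sparsity pattern in W.
    """
    d = Xs[0].shape[1]          # input feature dimension
    T = len(Xs)                 # number of tasks
    D = np.eye(d) / d           # shared metric, initialized uniform
    W = np.zeros((d, T))        # per-task weight vectors (columns)
    for _ in range(n_iters):
        # Step 1: with D fixed, each task decouples into a generalized
        # ridge regression; w_t = D X^T (X D X^T + gamma I)^{-1} y.
        for t, (X, y) in enumerate(zip(Xs, ys)):
            A = X @ D @ X.T + gamma * np.eye(len(y))
            W[:, t] = D @ X.T @ np.linalg.solve(A, y)
        # Step 2: with W fixed, the optimal shared metric has the
        # closed form D = (W W^T)^{1/2} / trace((W W^T)^{1/2}).
        sq = W @ W.T + eps * np.eye(d)
        vals, vecs = np.linalg.eigh(sq)
        root = vecs @ np.diag(np.sqrt(np.clip(vals, 0, None))) @ vecs.T
        D = root / np.trace(root)
    return W, D
```

Rows of W with large norm correspond to the shared relevant dimensions; dimensions unused by every task are driven toward zero through D, which is the mechanism by which attribute supervision restricts the hypothesis space of the object classifiers.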