STYLISTIC ANALYSIS OF PAINTINGS USING WAVELETS AND MACHINE LEARNING

Sina Jafarpour, Güngör Polatkan, Eugene Brevdo, Shannon Hughes, Andrei Brasoveanu, Ingrid Daubechies

Princeton University
Departments of Electrical Engineering, Computer Science, and Mathematics and the Program in Applied and Computational Mathematics
Princeton, NJ 08544

ABSTRACT

Wavelet transforms and machine learning tools can be used to assist art experts in the stylistic analysis of paintings. A dual-tree complex wavelet transform, Hidden Markov Tree modeling and Random Forest classifiers are used here for a stylistic analysis of Vincent van Gogh’s paintings, with results on two stylometry challenges concerning, respectively, dating and extracting distinguishing features.

1. INTRODUCTION

Stylometry, i.e., determining a painter’s style, is a challenging problem for art historians. Many factors play a role. Technical analyses of the painting, covering the pigments present, the materials used and the method of their preparation, and the artist’s process as documented in the underlayers of the painting (observed through X-ray and infrared imaging), provide one type of information. Visual inspection of the painting is of course very important as well, to evaluate and help characterize the visual appearance and style of the work. However, even the sum of all these analyses may prove inconclusive for some works.

A new movement in image processing seeks to use computational tools from image analysis and machine learning to provide an additional source of analysis for such challenging paintings. It rests on the assumption that an artist’s brushwork can be characterized (at least in part) by signature features (e.g., those arising from the artist’s habitual physical movements), and that such distinguishing, quantitatively measurable characteristics might be found by machine learning methods and used as an additional piece of evidence in stylometry tasks.
Indeed, early attempts in this area have already found considerable success [1, 2, 3].

Recent attempts to characterize an artist’s style via features discernible by image processing and machine learning algorithms have often focused on characterizing the statistics of the wavelet coefficients of digital scans of paintings by that artist [1, 4, 5].

This paper uses an approach of this type on a dataset provided by the Van Gogh Museum and the Kröller-Müller Museum in the Netherlands, consisting of high-resolution scans of paintings by Vincent van Gogh.

We combine recent image processing and machine learning techniques to tackle two stylometry problems proposed by the two museums: extracting distinguishing features, and a dating challenge. We show how modeling style as a hidden variable that controls the behavior of image observables, such as brushstrokes and color patterns, can significantly improve the accuracy of the style analyzer. We use a dual-tree complex wavelet transform [6], which is (almost) shift invariant, to capture quantitatively the effects observable in the image. Next, using Hidden Markov Trees [7], an extension of Hidden Markov Variables, combined with the expectation maximization algorithm [8], we extract the style parameters from the noisy observables. Finally, using standard machine learning techniques, we feed the extracted features to appropriate classifiers, and use the resulting prediction rule for style analysis.

This paper is a sibling of [10], in which our team used similar techniques for authentication purposes instead of stylistic analysis.

2. APPLICATIONS

2.1 Dating Challenge

In the absence of convincing documentation, the dating of a painting is based on where it fits in the chronology of the artist’s style, considering, for example, subject matter, materials used, color palette, compositional style, and brushwork.
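As a rough illustration of how brushwork-scale features might feed a period classifier, the following minimal sketch follows the feature-then-classify pipeline outlined in the introduction under heavily simplified assumptions. A plain Haar wavelet stands in for the dual-tree complex wavelet transform, log subband energies stand in for Hidden Markov Tree parameters, and a nearest-centroid rule stands in for the Random Forest classifier. The patches are synthetic stripe textures rather than painting scans, and all function names are hypothetical.

```python
import math
import random


def haar2d_level(img):
    """One level of an orthonormal 2D Haar transform on an even-sized grid.

    Returns (approx, row_detail, col_detail, diag_detail), each half the size.
    """
    approx, d_rows, d_cols, d_diag = [], [], [], []
    for r in range(0, len(img), 2):
        a_row, dr_row, dc_row, dd_row = [], [], [], []
        for c in range(0, len(img[0]), 2):
            p, q = img[r][c], img[r][c + 1]
            s, t = img[r + 1][c], img[r + 1][c + 1]
            a_row.append((p + q + s + t) / 2)   # scaled local average
            dr_row.append((p + q - s - t) / 2)  # change between the two rows
            dc_row.append((p - q + s - t) / 2)  # change between the two columns
            dd_row.append((p - q - s + t) / 2)  # diagonal change
        approx.append(a_row)
        d_rows.append(dr_row)
        d_cols.append(dc_row)
        d_diag.append(dd_row)
    return approx, d_rows, d_cols, d_diag


def detail_log_energies(img, levels=2):
    """Log mean-square energy of each detail subband, finest scale first."""
    feats, current = [], img
    for _ in range(levels):
        current, d_rows, d_cols, d_diag = haar2d_level(current)
        for band in (d_rows, d_cols, d_diag):
            vals = [v for row in band for v in row]
            feats.append(math.log(sum(v * v for v in vals) / len(vals) + 1e-12))
    return feats


def centroid(vectors):
    """Coordinate-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def nearest_label(feat, centroids):
    """Label whose centroid is closest to feat in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda lab: sum((x - y) ** 2 for x, y in zip(feat, centroids[lab])),
    )


def toy_patch(stripe_width, rng, size=8, noise=0.1):
    """Synthetic patch of vertical stripes plus Gaussian noise (assumption:
    narrow stripes loosely mimic fine strokes, wide stripes broad strokes)."""
    return [
        [(1.0 if (c // stripe_width) % 2 == 0 else -1.0) + rng.gauss(0, noise)
         for c in range(size)]
        for _ in range(size)
    ]


rng = random.Random(0)
training = {
    "fine": [toy_patch(1, rng) for _ in range(10)],
    "broad": [toy_patch(2, rng) for _ in range(10)],
}
centroids = {
    label: centroid([detail_log_energies(p) for p in patches])
    for label, patches in training.items()
}
query = toy_patch(2, rng)
print(nearest_label(detail_log_energies(query), centroids))
```

In this toy setting the two texture classes separate because their energy concentrates at different wavelet scales. Real paintings would call for the richer transform, model and classifier named in the text.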
Some undocumented paintings have a mixture of features that seemingly correspond to different periods of their creator’s artistic development. Such feature mixes pose difficult dating challenges.

When dating relies on categorizing issues of style and technique, computer-based image processing that magnifies differences in style should prove useful. Furthermore, artificial intelligence and machine learning techniques can provide the right tools for the final decision task.

The dating challenge concerns the dating of paintings by Vincent van Gogh that stem from either his Paris phase (ending early in 1888) or the later Arles period that followed. The question is to ascertain which features distinguish the two test sets (taking as benchmark the paintings that are unquestionably from the Paris or Arles period), and to use them subsequently to attempt to associate each of the dating candidates with one group or the other.

In distinguishing Van Gogh paintings from these two periods, art historians rely on several general observations regarding shifts in his practice. For instance, small strokes are more prominent in Paris, while brush handling is broader in Arles; colors appear more saturated in Arles due to the filling in of larger areas.

At the initial stage of the challenge, the set of training examples included 33 images from each of the Paris and Arles periods. At the final stage, three test paintings were provided. Each test painting exhibits some general features associated with Arles, as well as others associated with Paris. The final goal of this challenge was to come up with a high-confidence

17th European Signal Processing Conference (EUSIPCO 2009), Glasgow, Scotland, August 24-28, 2009. © EURASIP, 2009. 1220