ChIP-on-chip protocol for genome-wide analysis of transcription factor binding in Drosophila melanogaster embryos

Thomas Sandmann¹, Janus S Jakobsen¹ & Eileen E M Furlong¹

¹EMBL Heidelberg, Meyerhofstrasse 1, 69117 Heidelberg, Germany. Correspondence should be addressed to E.E.M.F. (furlong@embl.de).

Published online 25 January 2007; doi:10.1038/nprot.2006.383

This protocol describes a method to detect in vivo associations between proteins and DNA in developing Drosophila embryos. It combines formaldehyde crosslinking and immunoprecipitation of protein-bound sequences with genome-wide analysis using microarrays. After crosslinking, nuclei are enriched using differential centrifugation and the chromatin is sheared by sonication. Antibodies specifically recognizing wild-type protein or, alternatively, a genetically encoded epitope tag are used to enrich for specifically bound DNA sequences. After purification and polymerase chain reaction-based amplification, the samples are fluorescently labeled and hybridized to genomic tiling microarrays. This protocol has been successfully used to study different tissue-specific transcription factors, and is generally applicable to in vivo analysis of any DNA-binding proteins in Drosophila embryos. The full protocol, including the collection of embryos and the collection of raw microarray data, can be completed within 10 days.

INTRODUCTION

A large number of DNA-binding and chromatin-associated proteins are recruited to organize, package, transcribe or repair the genetic information stored in the nucleus of eukaryotic cells. An important step toward understanding any of these (and other) processes at a molecular level is to determine the sites of direct protein–DNA interactions.
ChIP-on-chip allows genome-wide protein–DNA interactions to be assayed in vivo

Approaches such as DNase I footprinting¹ and electrophoretic mobility shift assays² allow monitoring of binding events in vitro, whereas chromatin immunoprecipitation studies are a valuable tool to probe these essential interactions in vivo. By co-precipitating the protein of interest with its associated DNA sequences from cellular or embryonic chromatin extracts, a snapshot of binding site occupancy can be obtained (Fig. 1). This snapshot reflects complex variables such as chromatin accessibility, combinatorial binding and/or competition with other DNA-associated factors. Typically, chemical reagents or ultraviolet irradiation are used to stabilize transient interactions by introducing covalent crosslinks. Formaldehyde, a small molecule that readily penetrates biological samples, induces (partially) reversible crosslinks between the ε-amino groups of lysine residues in proteins and (i) neighboring peptide bonds (protein–protein crosslinks) or (ii) amino groups of partially denatured DNA bases (protein–DNA crosslinks)³,⁴. Ultraviolet irradiation, on the other hand, induces irreversible bonds exclusively between nucleic acids and directly bound proteins, but cannot penetrate more than a few cell layers, so cells must first be dissociated from larger tissues and embryos⁵–⁷. After crosslinking, cells are lysed and the chromatin is sheared to fragments typically between 0.2 and 1 kb in length, and sequences specifically bound by the protein of interest are purified by immunoprecipitation. Traditionally, Southern blot⁸ or quantitative real-time polymerase chain reaction (qPCR)⁹ have been used to
[Figure 1 panel labels: Stage-matched wt embryos; Formaldehyde fixation, preparation of chromatin; Immunoprecipitation; Crosslink reversal, amplification, Klenow labeling; Labeled genomic DNA; Hybridization to tiling arrays; Statistical analysis; Visualization, data integration]

Figure 1 | Schematic overview of ChIP-on-chip experiments. A population of wild-type (or transgenic) D. melanogaster embryos is dechorionated, and covalent bonds between proteins, as well as between proteins and nucleic acids, are introduced by formaldehyde crosslinking. Shearing the DNA allows immunoprecipitation of short sequences associated with the protein of interest. After partial reversal of the crosslinks, these sequences can be amplified, labeled and hybridized, against a genomic reference, to genomic tiling arrays. Raw data processing followed by statistical analysis allows identification of significantly enriched sequences bound by the protein of interest.

NATURE PROTOCOLS | VOL.1 NO.6 | 2006 | 2839
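To make the final step of Figure 1 concrete, the following is a minimal sketch of how enrichment of ChIP signal over the genomic reference might be scored along a tiled region. It is an illustration only and not the statistical procedure used in this protocol: the function names, the median centring, the three-probe window and the log2 threshold of 1.0 are all assumptions chosen for the toy example, and the intensities are invented.

```python
import math
from statistics import median


def log2_ratios(chip, reference):
    """Per-probe log2(ChIP / reference), centred on the median ratio.

    Median centring is an assumed, simple normalisation; it presumes
    that most probes are not bound by the protein of interest.
    """
    raw = [math.log2(c / r) for c, r in zip(chip, reference)]
    centre = median(raw)
    return [x - centre for x in raw]


def enriched_windows(ratios, window=3, threshold=1.0):
    """Start indices of sliding windows whose median log2 ratio
    reaches the (hypothetical) threshold."""
    hits = []
    for start in range(len(ratios) - window + 1):
        if median(ratios[start:start + window]) >= threshold:
            hits.append(start)
    return hits


# Invented intensities for eight neighbouring probes: probes 3-5 carry
# a strong ChIP signal, while the reference channel is flat.
chip = [100, 110, 90, 800, 900, 850, 105, 95]
reference = [100, 100, 100, 100, 100, 100, 100, 100]

ratios = log2_ratios(chip, reference)
hits = enriched_windows(ratios)
print(hits)  # windows starting at probes 2, 3 and 4 contain the peak
```

Scoring windows of neighbouring probes rather than single probes reflects the fact that sheared fragments of 0.2 to 1 kb span several adjacent tiles, so a genuine binding event should raise the signal of a run of probes rather than one isolated probe.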