Conditional Random Fields for Predicting and Analyzing Histone Occupancy, Acetylation and Methylation Areas in DNA Sequences

Dang Hung Tran¹, Tho Hoan Pham², Kenji Satou¹,³, and Tu Bao Ho¹,³

¹ School of Knowledge Science, Japan Advanced Institute of Science and Technology, 1-1 Asahidai, Nomi, Ishikawa 923-1292, Japan
tran@jaist.ac.jp
² Faculty of Information Technology, Hanoi University of Pedagogy, 136 Xuan Thuy, Cau Giay, Hanoi, Vietnam
³ Institute for Bioinformatics Research and Development (BIRD), Japan Science and Technology Agency (JST), Japan

Abstract. Eukaryotic genomes are packaged by the wrapping of DNA around histone octamers to form nucleosomes. Nucleosome occupancy and the associated acetylation and methylation are important modifying factors in all nuclear processes involving DNA. Many recent studies have mapped these modifications in DNA sequences and examined the relationship between them and various genetic activities, such as transcription, DNA repair, and DNA remodeling. However, most of these studies take experimental approaches. In this paper, we introduce a computational approach to both predicting and analyzing nucleosome occupancy, acetylation, and methylation areas in DNA sequences. Our method employs conditional random fields (CRFs) to discriminate between DNA areas with high and low relative occupancy, acetylation, or methylation, and ranks features of DNA sequences by their weights in the CRF model trained on datasets of these DNA modifications. The results of our method on the yeast genome reveal genetic area preferences of nucleosome occupancy, acetylation, and methylation that are consistent with previous studies.

Keywords: Histone proteins, acetylation, methylation, conditional random fields.

1 Introduction

Eukaryotic genomes are packaged into nucleosomes that consist of 145–147 base pairs of DNA wrapped around a histone octamer [9].
The histone components of nucleosomes and their modification states (of which acetylation and methylation are the most important) can profoundly influence many genetic activities, including transcription [2, 4, 5, 16], DNA repair, and DNA remodeling [13]. There have recently been many studies of mapping histone occupancies together with their modifications in DNA sequences and of the relationship between

F. Rothlauf et al. (Eds.): EvoWorkshops 2006, LNCS 3907, pp. 221–230, 2006. © Springer-Verlag Berlin Heidelberg 2006