Potentials and Challenges of Building Computational Models of Proteins Based on Cellular Automata

Alia Madain, Abdel Latif Abu Dalhoum, Azzam Sleit
Department of Computer Science
King Abdulla II School for Information Technology
The University of Jordan, Amman

Abstract: The processes of protein creation touch on different aspects of science, including chemistry, biology, physics, mathematics, and computer science. Moving from DNA to a functional protein in nature is a complex process. Building models of any natural phenomenon requires some form of abstraction. Cellular automata are discrete models that are capable of universal computation; these models have their roots in biology and were introduced as formal models of self-reproducing organisms. The connection between cellular automata concepts and the central dogma of molecular biology has been explored and studied from different angles in the literature. The motivation of this review is to highlight the potential of employing cellular automata in building protein models, and to discuss the challenges of modeling proteins. There are many possible models based on cellular automata and equivalent systems that can be studied and experimented with further, which provides a future direction for researchers in the field.

Index Terms: Computational Models, Protein Folding, Cellular Automata

I. INTRODUCTION

Building computational models of proteins is a rapidly growing research field with a large number of papers published every year. The practical applications of these models are of vital importance in the development of many fields; however, computational modeling of proteins is full of challenges, ranging from the representation of the problem to the measurement of model effectiveness and accuracy. Tools and methods of computer science, particularly those used in optimization tasks, have been heavily used in building protein models.
Some examples are neural networks [1], the optimized evidence-theoretic K-nearest neighbor classifier [2], the complexity measure factor [3], and moments [4], in addition to the fusion of multiple classifiers [5]. These models are needed because the natural processes responsible for moving information from DNA to mRNA, and all the way to a functional protein, are complex and full of details. Building meaningful abstract computational models helps in understanding certain aspects of the natural processes of proteins. As with any form of abstraction, the abstract models allow for further experimentation [6].

A cellular automaton (CA) is a discrete abstract model of computation that relies on local rules. A CA can be considered one of the oldest models of natural computing [7]. Since CA patterns can be found in nature, and since the concept of CA has its roots in biology, where CAs were first introduced as formal models of self-reproducing organisms [8], it became natural to assume that CAs can accurately model the processes of the central dogma of molecular biology. Attempts to deploy CAs in modeling these processes include studies of gene network behavior [9], DNA sequence evolution [10] [11] [12], and DNA duplication [13].

This review focuses on the protein part of the central dogma of molecular biology, presented in terms of the CA employed. The work done in this domain so far can be classified into three main categories. The first category contains attempts to use conventional elementary CAs combined with a descriptive representation of proteins. The second category includes work that combines CAs with an optimization or search algorithm, such as genetic algorithms and neural networks. The third category contains CAs specially designed to represent proteins, in addition to systems proven to be equivalent to CAs in terms of computational power.
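As a concrete illustration of what "a discrete model that relies on local rules" means, the following minimal Python sketch updates a one-dimensional, two-state CA in which each cell looks only at itself and its two immediate neighbors. The function names `step` and `run`, the periodic boundary, and the choice of rule 90 in the usage lines are assumptions made for illustration; they are not taken from any of the protein models reviewed here.

```python
def step(cells, rule):
    """Apply one synchronous update of an elementary CA.

    cells: list of 0/1 states; rule: Wolfram rule number (0-255).
    Periodic boundaries are an assumption made for this sketch.
    """
    n = len(cells)
    nxt = []
    for i in range(n):
        left, centre, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        # The three-cell neighborhood forms a 3-bit index into the rule number.
        index = (left << 2) | (centre << 1) | right
        nxt.append((rule >> index) & 1)
    return nxt


def run(cells, rule, steps):
    """Return the list of configurations from time 0 to time `steps`."""
    history = [list(cells)]
    for _ in range(steps):
        cells = step(cells, rule)
        history.append(cells)
    return history


if __name__ == "__main__":
    start = [0, 0, 0, 1, 0, 0, 0]
    for row in run(start, 90, 3):
        print("".join("#" if c else "." for c in row))
```

The whole rule fits in a single 8-bit number because a three-cell binary neighborhood has only eight possible configurations; this compactness is part of what makes elementary CAs attractive as a starting point for abstract models.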
This review is organized as follows: Section II gives background information about proteins. Section III presents background information on cellular automata. Section IV discusses modeling proteins using elementary cellular automata. Section V discusses the use of CAs combined with search algorithms. Section VI gives examples of specially designed CAs and equivalent systems used in building protein models. Section VII discusses the potentials and challenges of modeling proteins based on CAs, and finally, Section VIII concludes the review.

II. PROTEINS

The central dogma of molecular biology states that the DNA decoding process produces a protein. DNA is transcribed into messenger RNA (mRNA), which is then translated into proteins [14]. Transcription occurs in the nucleus of the cell; no protein is produced at this stage. Translation, or protein synthesis, takes place in the cell cytoplasm and involves the decoding of mRNA by the ribosomes.

International Journal of Computer Science and Information Security (IJCSIS), Vol. 14, No. 9, September 2016 1086 https://sites.google.com/site/ijcsis/ ISSN 1947-5500
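The DNA to mRNA to protein flow described in Section II can be sketched, at a very high level of abstraction, as two string transformations. The Python sketch below assumes the input is the coding (sense) strand read 5' to 3', so transcription reduces to replacing T with U, and it assumes translation starts at the first AUG and stops at the first stop codon. The function names and these simplifications are illustrative assumptions; the sketch ignores splicing, regulation, and folding entirely.

```python
from itertools import product

# Standard genetic code, listed in U, C, A, G order for each codon position.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def transcribe(coding_strand):
    """Map a DNA coding strand to mRNA (simplification: T becomes U)."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA into one-letter amino acid codes.

    Reading starts at the first AUG and ends at the first stop codon
    or when fewer than three bases remain.
    """
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    mrna = transcribe("ATGGCCTTTTAA")
    print(mrna, translate(mrna))
```

For the hypothetical input `ATGGCCTTTTAA`, the sketch yields the mRNA `AUGGCCUUUUAA` and the amino acid sequence `MAF`. The resulting sequence is only the primary structure; how that chain folds into a functional protein is the harder modeling problem.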