Molecular Learning and Pattern Denoising using Markov Random Field Models

Dharani Punithan¹ and Byoung-Tak Zhang²
¹Institute of Computer Technology, Seoul National University, South Korea
²Dept. of Computer Science and Engineering, Seoul National University, South Korea
{punithan.dharani@gmail.com, btzhang@snu.ac.kr}

Abstract
We propose an in silico molecular associative memory model for learning, storing and denoising patterns using a Pairwise Markov Random Field (PMRF) model. Our PMRF-based molecular associative memory model extracts locally distributed features from the training examples it is exposed to, learns and stores the patterns in the molecular associative memory, and denoises noisy input patterns through operations based on DNA computation. Our computational molecular model thus demonstrates the content-addressability of human memory. Our molecular simulation results show that the averaged mean squared error between the learned and denoised patterns remains low (< 0.014) for noise levels of up to 30%.

1. Introduction
Memory is a crucial part of the learning process in both animals and humans. It is the mental process of encoding, storing and retrieving information. Among the different types of memory, associative memory (AM) is one of the most useful. AM stores data in a distributed fashion and is addressed by its contents; hence, it is also known as content-addressable memory (CAM). An AM learns patterns and, given a noisy input, retrieves or reconstructs the previously learned pattern that most closely resembles it. It therefore has applications in pattern matching, pattern recognition, robotics, etc. This type of memory is robust and fault-tolerant because it can correct errors in its input.

DNA works as a “memory” that stores genetic information in cellular organisms. Striking features of DNA, such as self-assembly, huge information storage capacity and massive parallelism, resemble those of the brain.
Hence, associative memory can be realized at the molecular level, potentially with a larger capacity than the human brain. Recent studies [1] show that molecular systems can exhibit brain-like cognition. In our in silico molecular simulation, we demonstrate the potential of molecular associative memory using PMRF, a popular image processing tool. To our knowledge, our work is the first to denoise patterns using molecular algorithms.

This work is based on [2] and extends our previous works [3, 4, 5]. In those works, we used mutation in the learning phase; here we avoid it so as to match conventional DNA-computing-based bio-algorithms. In addition to the recall functionality of our previous works, we propose molecular methods that denoise noisy patterns iteratively using the PMRF model.

To summarize, the tasks of our proposed molecular associative memory model are 1) to learn and store a set of patterns (digits 0 to 9) when exposed to MNIST training examples and 2) to denoise noisy patterns iteratively. We combine DNA-based bio-molecular operations such as hybridization, melting and amplification with the PMRF model to demonstrate these functionalities. Although we use PMRF formulations, the computations involved are based on hybridization reactions; in particular, we rely on hybridization operations to implement the proposed molecular content-addressable memory. The detailed methods and results of this work are available in [6].

2. Methods
2.1 Molecular Memory and Encoding
Molecular memory is modeled as a set of m two-dimensional weighted graphs $M = \{G_m = (V_m, C_m, W_m)\}$, each of size $N \times N$, where m is the number of binary patterns to be learned (digits 0 to 9), $V_m$ is the set of all nodes representing the pixels $\{x^m_{ij}\}$ of the m-th pattern, i and j are the row and column indices of a pixel location, $C_m$ is the set of all unary (first-order) and pairwise (second-order) cliques in the second-order (8-point) neighborhood system of the m-th pattern, and $W_m$ contains the weights of the nodes of the m-th pattern. We set N = 28, as each MNIST example is 28 × 28 pixels.

In our previous works [3, 4], all pixels of all patterns in the memory were initially black. During training, we extracted information from the training examples and mutated the molecular memories according to the foreground pixels of the training images. In this work, we avoid mutation to match DNA-computing-based bio-algorithms; instead, we create all possible unary and pairwise cliques in the initial memory. Because we learn binary patterns, we create both black (background) and white (foreground) DNA molecules for each pixel location (row and column indices). Then, for each pixel location and pixel color, we create all possible unary and pairwise cliques. We construct m such bags of DNA single strands, where each single strand represents either a unary clique (a pixel) or a pairwise clique. The molecules are formed from the four-letter DNA alphabet {A, T, G, C}. For example, the information of a pixel (node), namely its location (row and column indices) and its color (black or white), is encoded into a DNA sequence such as ‘GTGGTTA’: ‘GTG’ (the first three bases) represents the row index i of the pixel, ‘GTT’ (the next three bases) represents its column index j, and ‘A’ (the last base) represents its binary color. Two such sequences are combined to form a pairwise clique.
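To make the construction of the initial memory concrete, the sketch below builds one bag of single strands for N = 28: every unary clique (location × color) and every pairwise clique in the 8-point neighborhood (neighboring location pair × four color combinations). It is an illustrative sketch under stated assumptions, not the authors' implementation. The paper does not say how an index maps to a triplet of bases, so the base-4 scheme with digit order A, T, G, C is hypothetical (it does not reproduce the paper's ‘GTG’ example). Likewise, the color bases (‘A’ for black, ‘T’ for white) and all function names are hypothetical. The paper constructs m such bags, one per pattern.

```python
from itertools import product

# Hypothetical digit order for writing an index as base-4 DNA bases.
BASES = "ATGC"
# Hypothetical color code: 0 = black (background), 1 = white (foreground).
COLOR_BASE = {0: "A", 1: "T"}


def encode_index(k: int, width: int = 3) -> str:
    """Encode a row/column index as a fixed-width base-4 DNA word (hypothetical scheme)."""
    if not 0 <= k < 4 ** width:
        raise ValueError(f"index {k} does not fit in {width} bases")
    digits = []
    for _ in range(width):
        digits.append(BASES[k % 4])
        k //= 4
    return "".join(reversed(digits))


def encode_pixel(i: int, j: int, color: int) -> str:
    """Unary clique: 3 bases for row i, 3 bases for column j, 1 base for the color."""
    return encode_index(i) + encode_index(j) + COLOR_BASE[color]


def encode_pair(p: tuple, q: tuple) -> str:
    """Pairwise clique: concatenation of two pixel sequences (i, j, color)."""
    return encode_pixel(*p) + encode_pixel(*q)


# Right, down, down-right and down-left offsets: together they reach each
# 8-neighbor pair exactly once, so no unordered pair is generated twice.
HALF_NEIGHBORHOOD = [(0, 1), (1, 0), (1, 1), (1, -1)]


def build_initial_memory(n: int = 28):
    """Return all unary and pairwise cliques for every location and both colors."""
    unary = [encode_pixel(i, j, c) for i in range(n) for j in range(n) for c in (0, 1)]
    pairwise = []
    for i, j in product(range(n), repeat=2):
        for di, dj in HALF_NEIGHBORHOOD:
            k, l = i + di, j + dj
            if 0 <= k < n and 0 <= l < n:
                # All four color combinations for this neighboring location pair.
                for c1, c2 in product((0, 1), repeat=2):
                    pairwise.append(encode_pair((i, j, c1), (k, l, c2)))
    return unary, pairwise


unary, pairwise = build_initial_memory(28)
print(len(unary), len(pairwise))  # prints: 1568 11880
print(encode_pixel(0, 1, 0))      # prints: AAAAATA
```

Three bases per index suffice because 4^3 = 64 > 28. With N = 28, the grid has 2,970 neighboring location pairs (756 horizontal, 756 vertical and 2 × 729 diagonal), and each pair appears in four color combinations, which gives 11,880 pairwise strands alongside the 1,568 unary strands.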
812 Proceedings of the Korea Computer Congress 2020

We then re-encode each character-based DNA sequence into a 2 × n matrix, where n is the number of bases in the sequence. Each DNA base is re-encoded as a column vector: A as $[1, 0]^T$, T as $[-1, 0]^T$, G as $[0, 1]^T$ and C as $[0, -1]^T$. This initial molecular memory is then trained on the MNIST examples to memorize the patterns (digits 0 to 9). During learning, the weights of the