Medical Image Segmentation by Using Reinforcement Learning Agent

Mahsa Chitsaz
Faculty of Computer Science and Information Technology, University of Malaya
mchitsaz@perdana.um.edu.my

Woo Chaw Seng
Faculty of Computer Science and Information Technology, University of Malaya
cswoo@um.edu.my

Abstract: Image segmentation still requires improvement despite the research work of the last few decades. This is due to several factors. Firstly, most image segmentation solutions are problem-specific. Secondly, medical image segmentation methods generally face restrictions because the objects of interest in medical images have very similar gray levels and textures. The goal of this work is to design a framework that extracts several objects of interest simultaneously from Computed Tomography (CT) images. Our method does not need a large training set or a priori knowledge. The learning phase is based on reinforcement learning (RL). The input image is divided into several sub-images, and an RL agent works on each of them to find a suitable value for each object in the image. Each state in the environment has a set of defined actions, and a reward function computes the reward for each action of the RL agent. Finally, the learned information is stored in a Q-matrix, and the result can be applied to the segmentation of new, similar images. The experimental results for cranial CT images demonstrated segmentation accuracy above 93%.

Keywords: Biomedical image segmentation; multi-agent system; reinforcement learning system

I. INTRODUCTION

Image segmentation techniques are invaluable in many domains, for example quantification of tissue volumes, medical diagnosis, pathological localization, anatomical structure study, treatment planning, partial volume correction of functional imaging data, and computer-integrated surgery [1]. Image segmentation separates an image into disjoint partitions whose union reconstructs the original image.
Image segmentation is still a debatable problem although much research has been done in the last few decades [2]. First of all, every solution for image segmentation is problem-specific. Secondly, medical image segmentation methods generally face restrictions because the objects of interest in medical images have very similar gray levels and textures, so significant segmentation errors may occur. Another difficulty arises from the lack of sufficient training samples; for instance, some supervised segmentation methods require training samples prepared by field experts. Consequently, a more universal segmentation approach should reduce the level of user interaction and require a minimal training data set.

Bearing in mind the above obstacles of medical image segmentation, we propose a new algorithm based on reinforcement learning (RL). An agent can learn to perform segmentation over time by systematic trial and error [3]. The RL agent is trained by receiving rewards or punishments based on its actions in an environment. Due to the dynamic nature of the RL agent, it is suitable for segmenting images of high complexity. The goal of the RL agent is to find an optimal way to reach the best answer, given the signals obtained after each action. When applying RL to medical image segmentation, the states and actions must be defined; a state can be defined as a region within the image. Firstly, the agent takes an image and applies some values. The input image is divided into several sub-images, and an RL agent works on each of them to find a suitable value for each object in the image. Each state in the environment is associated with a set of actions, and a reward function computes the reward for each action of the RL agent. The agent therefore tries to learn which actions gain the highest reward. Finally, the gained information is used to segment new, similar images.
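The scheme described above, with sub-images as states, candidate values as actions, and a reward that scores the resulting segmentation, can be sketched with a standard one-step Q-learning update. This is a minimal illustration, not the authors' exact formulation: the candidate thresholds, the pixel-accuracy reward, and all hyperparameters below are our own assumptions, and the 1-D "sub-images" are toy data.

```python
import random

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.2  # learning rate, discount, exploration rate
THRESHOLDS = [64, 128, 192]            # candidate gray-level thresholds (the actions)

def reward(sub_image, truth, threshold):
    """Fraction of pixels whose thresholded label matches the ground-truth mask."""
    guesses = [1 if p >= threshold else 0 for p in sub_image]
    return sum(g == t for g, t in zip(guesses, truth)) / len(truth)

def train(sub_images, truths, episodes=500, seed=0):
    rng = random.Random(seed)
    # Q-matrix: one row per sub-image (state), one column per action.
    q = [[0.0] * len(THRESHOLDS) for _ in sub_images]
    for _ in range(episodes):
        for s, (img, truth) in enumerate(zip(sub_images, truths)):
            if rng.random() < EPSILON:                              # explore
                a = rng.randrange(len(THRESHOLDS))
            else:                                                   # exploit
                a = max(range(len(THRESHOLDS)), key=lambda i: q[s][i])
            r = reward(img, truth, THRESHOLDS[a])
            # One-step Q-update; each sub-image is treated as its own state,
            # so the bootstrap term is the max over the current row.
            q[s][a] += ALPHA * (r + GAMMA * max(q[s]) - q[s][a])
    return q

# Two toy 1-D "sub-images" with known ground-truth object masks.
sub_images = [[150, 160, 100, 90], [100, 180, 200, 70]]
truths     = [[1, 1, 0, 0],        [0, 0, 1, 0]]
q = train(sub_images, truths)
best = [THRESHOLDS[max(range(len(THRESHOLDS)), key=lambda i: row[i])] for row in q]
print(best)
```

After training, reading the greedy action off each row of the Q-matrix recovers the threshold that segments each sub-image correctly, which is the sense in which the learned Q-matrix can later be applied to new, similar images.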
The main purpose of this work is to segment medical images with several different regions of interest simultaneously. This is a significant advantage over other approaches because many objects within an image can be segmented concurrently. In addition, our method does not need a large training set or a priori knowledge.

We present a short description of reinforcement learning in Section II. Section III gives the details of the approach and discusses the algorithms used in this research. Section IV analyses the experimental results. Finally, Section V concludes our work.

II. BACKGROUND

A. Reinforcement-Learning Model

Learning to act in ways that are rewarded is a sign of intelligence. For example, it is natural to train a circus elephant by rewarding it when it acts correctly in response to a command, as has been studied in experimental psychology [4]. In the standard reinforcement-learning model, an agent interacts with its environment via perception and action, as depicted in Figure 1. On each step of interaction the agent receives as input, i, the current state, s, of the environment; the agent then chooses an action, a, to generate an output. The action changes the state of the environment, and the value of this state transition is conveyed to the agent through a reinforcement signal (reward or punishment), r. The agent's behavior, B, should choose actions that tend to increase the long-run sum of reinforcement signal values.

International Conference on Digital Image Processing, 978-0-7695-3565-4/09 $25.00 © 2009 IEEE, DOI 10.1109/ICDIP.2009.14
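The perception-action cycle of the standard RL model (state in, action out, reinforcement signal back) can be sketched as a minimal interaction loop. The two-state environment and the random policy below are hypothetical toy constructions of ours, used only to make the i/s/a/r roles concrete; they are not the paper's environment.

```python
import random

class ToyEnvironment:
    """A hypothetical two-state environment: action 1 is rewarded in state 1."""
    def __init__(self):
        self.state = 0
    def step(self, action):
        reward = 1 if (self.state == 1 and action == 1) else 0
        self.state = action          # the action determines the next state
        return self.state, reward    # state transition + reinforcement signal r

class Agent:
    """An agent whose behavior B is a uniformly random policy."""
    def __init__(self, n_actions=2, seed=0):
        self.rng = random.Random(seed)
        self.n_actions = n_actions
    def act(self, state):
        return self.rng.randrange(self.n_actions)

env, agent = ToyEnvironment(), Agent()
state, total_reward = env.state, 0
for _ in range(100):
    action = agent.act(state)        # agent perceives state s, emits action a
    state, r = env.step(action)      # environment returns new state and reward r
    total_reward += r
print(total_reward)
```

A learning agent would replace the random policy with one that adapts to the reward signal, as in the Q-learning formulation used later in the paper; this loop only shows the interface between agent and environment.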