Tumor Segmentation with Multi-Modality Image in Conditional Random Field Framework with Logistic Regression Models

Yu-chi Hu 1, Michael Grossberg 2, and Gig Mageras 3

Abstract—We have developed a semi-automatic method for multi-modality image segmentation aimed at reducing the manual process time via machine learning while preserving human guidance. Rather than relying on heuristics, the method incorporates human oversight and expert training from images into logistic regression models. These models estimate the probability of tissue-class assignment for each voxel, as well as the probability of a tissue boundary occurring between neighboring voxels, given the multi-modal image intensities. The regression models provide parameters for a Conditional Random Field (CRF) framework that defines an energy function with regional and boundary probabilistic terms. Using this CRF, a max-flow/min-cut algorithm automatically segments the other slices in the 3D image set, with optional additional user input. We apply this approach to segment visible tumors in multi-modal medical volumetric images.

I. INTRODUCTION

A key requirement in high-precision radiotherapy is the accurate spatial delineation of the tumor and the critical normal organs abutting it. For this purpose, it is crucial to use images from multiple modalities. Computed tomography (CT) provides high-resolution images of both soft tissues and bony structures. Relative to CT, magnetic resonance imaging (MRI) provides improved soft-tissue contrast in many anatomical sites. Positron emission tomography (PET) reveals functional data, though with much poorer resolution than CT or MRI. Figure 1(a) shows images from these modalities for the same patient. The complementary information provided by these images is heavily relied upon by radiation oncologists in defining the tumor volume and designing an optimized treatment plan.
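The two probabilistic models described in the abstract can be illustrated with off-the-shelf logistic regression. The following is a minimal sketch, not the authors' implementation: it uses scikit-learn on synthetic two-modality intensities, and all variable names and the Gaussian class distributions are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic training data: each voxel has an intensity from two modalities
# (e.g. MRI and PET), labeled tumor (1) or background (0) by the expert user.
n = 200
tumor = rng.normal(loc=[2.0, 3.0], scale=0.5, size=(n, 2))
background = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(n, 2))
X = np.vstack([tumor, background])
y = np.array([1] * n + [0] * n)

# Regional model: P(tissue class | multi-modal intensities) for each voxel.
regional = LogisticRegression().fit(X, y)

# Boundary model: P(boundary | intensity differences) between neighboring
# voxels; here trained on absolute per-modality differences across an edge.
pairs_same = np.abs(rng.normal(0.0, 0.3, size=(n, 2)))  # edge within one tissue
pairs_diff = np.abs(rng.normal(2.0, 0.5, size=(n, 2)))  # edge across a boundary
D = np.vstack([pairs_same, pairs_diff])
b = np.array([0] * n + [1] * n)
boundary = LogisticRegression().fit(D, b)

# Probability of tumor for a new voxel with intensities (1.9, 2.8):
p_tumor = regional.predict_proba([[1.9, 2.8]])[0, 1]
```

In this toy setup the two classes are well separated, so a voxel near the tumor cluster receives a tumor probability close to 1; in practice the features would be the registered CT/MRI/PET intensities at the expert-labeled voxels.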
However, there are challenges in how to combine these images in treatment planning. For example, it is well known that large variability exists in target delineation by different physicians using a single-modality image [4]. In addition, it has been reported [22] that significantly different tumor volumes can be delineated by the same observer on images from different modalities. Given the above, it would be desirable to develop a computer-aided multiple-modality image segmentation tool that incorporates both expert user input and machine-learning capabilities. We hypothesize that such an approach will reduce both inter-modality and inter-observer variability.

We have developed a semi-automatic statistical framework for multi-modality image segmentation. Our work is based on Conditional Random Fields (CRF) [14] and energy minimization with a max-flow/min-cut algorithm. We define purely probabilistic regional and boundary terms in an energy function. The terms are based on statistically estimated logistic regression models. The parameters of the models are learned online from a training multi-modal image with the inputs of the expert user. The other image slices in the same data set are then segmented using the same parameters of the CRF framework, without the need for user interaction on every image slice. Optional user input on further slices allows for correction of the segmentation as well as refinement of the CRF framework parameters. An overview of our segmentation method is shown in Figure 2. The novelty of the present work is exploring the use of CRF in simultaneously segmenting images from multiple modalities, and the use of regression models, trained with human supervision, for both the regional and boundary properties in context of tumor

1 Y.-C. Hu is with the Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA, and the Department of Computer Science, The Graduate Center, City University of New York, 365 Fifth Avenue, New York, NY 10016, USA. huj at mskcc.org
2 M. Grossberg is with the Department of Computer Science, City College of New York, 160 Convent Avenue, New York, NY 10031, USA. grossberg at cs.ccny.cuny.edu
3 G. Mageras is with the Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA. magerasg at mskcc.org

Fig. 1. (a) An image slice from co-registration of three modalities (CT, MRI and PET), showing greatly different image characteristics. CT provides virtually no distinguishable information on the tumor. The red contour was manually drawn by an expert on MRI only; the green contour was manually drawn by an expert on PET only; the blue contour was drawn by an expert, probably referring to both MRI and PET. The variation of the tumor contours is considerable. (b) Our method obtained training on one of the image slices in the same volume with all three modalities (left). The trained models were used to segment the image slice shown in row (a). The resulting segmentation is the yellow contour on MRI and PET, showing how our method utilizes information across the modalities by learning from the user's inputs.
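The energy-minimization step can be illustrated with a standard s-t graph cut on a tiny 1-D "image". This is a sketch of the general technique, not the authors' implementation: regional terms become terminal-link capacities (negative log probabilities) and boundary terms become neighbor-link capacities, and the probabilities below are made-up stand-ins for the logistic-regression model outputs.

```python
import math
import networkx as nx

# Tiny 1-D "image" of 4 voxels. p_fg[i] stands in for the regional model's
# P(tumor | intensities at voxel i); p_bnd[i] stands in for the boundary
# model's P(boundary between voxels i and i+1).
p_fg = [0.9, 0.8, 0.2, 0.1]
p_bnd = [0.1, 0.9, 0.1]  # a strong boundary between voxels 1 and 2
eps = 1e-9

G = nx.DiGraph()
S, T = "source", "sink"
for i, p in enumerate(p_fg):
    # Terminal links encode the regional term: cutting S->i pays the cost of
    # labeling voxel i background; cutting i->T pays the cost of labeling it tumor.
    G.add_edge(S, i, capacity=-math.log(1.0 - p + eps))
    G.add_edge(i, T, capacity=-math.log(p + eps))
for i, p in enumerate(p_bnd):
    # Neighbor links encode the boundary term: a label change across a likely
    # boundary (high p) is cheap to cut; elsewhere it is expensive.
    w = -math.log(p + eps)
    G.add_edge(i, i + 1, capacity=w)
    G.add_edge(i + 1, i, capacity=w)

# The minimum s-t cut yields the minimum-energy binary labeling: voxels left
# on the source side are labeled tumor, the rest background.
cut_value, (fg_side, bg_side) = nx.minimum_cut(G, S, T)
labels = ["tumor" if i in fg_side else "background" for i in range(len(p_fg))]
```

Here the cut separates voxels 0-1 from voxels 2-3, since the n-link between voxels 1 and 2 is cheap where the boundary probability is high; a full implementation would build the same graph over the 3-D voxel neighborhood of the image volume.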