Seeing Through the Window: Pre-fetching Strategies for Out-of-core Image Processing Algorithms

Pinho R., Batenburg K. J. and Sijbers J.
VisionLab, Physics Dept., University of Antwerp, Antwerp, Belgium

ABSTRACT

Scientific data files have been increasing in size during the past decades. In the medical field, for instance, magnetic resonance imaging and computer aided tomography can yield image volumes of several gigabytes. While secondary storage (hard disks) increases in capacity and its cost per megabyte drops year after year, primary memory (RAM) can still be a bottleneck in the processing of huge amounts of data. This is a problem for image processing algorithms, which often need to keep in memory the original image and a copy of it to store the results. Operating systems optimize memory usage with memory paging and enhanced I/O operations. Although image processing algorithms usually work on neighbouring areas of a pixel, they follow pre-determined paths through the image and might not benefit from the memory paging strategies offered by the operating system, which are general purpose and unidimensional. With the principle of locality and these pre-determined traversal paths in mind, we developed an algorithm that uses multi-threaded pre-fetching of data to build a disk cache in memory. Using the concept of a window that slides over the data, we predict the next block of memory to be read according to the path followed by the algorithm and asynchronously pre-fetch that block before it is actually requested. While other out-of-core techniques reorganize the original file in order to optimize reading, we work directly on the original file. We demonstrate our approach in different applications, each with its own traversal strategy and sliding window structure.

Keywords: Out-of-core image processing, Pre-fetching

1. INTRODUCTION

A typical scenario in image processing applications is a pipeline of processing units.
The first unit receives the original image buffer as input, generates its result, possibly in a new buffer, and passes it on to the next unit. The second unit receives the new buffer, processes it, and hands it over to the next. This process is repeated until the last unit produces the final image, as depicted in Figure 1.

Figure 1. Image processing pipeline.

Considering such a pipeline, image processing algorithms often require that the original image and one or several copies of it be kept in memory. A problem may thus arise if the processed image buffer is too big to fit in memory. In practice, with the constant increase in the size of scientific data files, processing of large amounts of data has become commonplace. In the medical field, for instance, magnetic resonance imaging and computer aided tomography can yield image volumes of several gigabytes. Even in an optimised processing pipeline that eliminates intermediate image copies, it is usual to keep the original image in memory together with a copy of it in which to store the final result. Off-the-shelf applications [1] and implementation libraries [2, 3] commonly used in medical image processing and visualisation also try to allocate the entire image volume in main memory. When they do not manage to do so, alternative image representations are offered, e.g. at a lower scale, or they simply refuse to allocate the requested block and return an error to the user.

E-mail: {romulo.pinho, joost.batenburg, jan.sijbers}@ua.ac.be
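The sliding-window pre-fetching outlined in the abstract can be illustrated, for the simplest case of a linear traversal path, as a reader that asynchronously fetches the predicted next block from disk while the caller processes the current one. The sketch below is a minimal illustration under that assumption; the class and member names are hypothetical and do not reflect the authors' actual implementation:

```cpp
#include <fstream>
#include <future>
#include <string>
#include <vector>

// Minimal sketch: while the caller processes block i, a background
// task already reads block i+1, assuming a linear traversal path.
class BlockPrefetcher {
public:
    BlockPrefetcher(const std::string& path, std::size_t blockSize)
        : file_(path, std::ios::binary), blockSize_(blockSize), nextIndex_(0) {
        schedule();  // start fetching block 0 immediately
    }

    // Returns the next block along the traversal path; blocks only
    // if the asynchronous read has not finished yet.
    std::vector<char> next() {
        std::vector<char> block = pending_.get();
        schedule();  // predict and pre-fetch the following block
        return block;
    }

private:
    void schedule() {
        std::size_t index = nextIndex_++;
        pending_ = std::async(std::launch::async, [this, index] {
            std::vector<char> buf(blockSize_);
            file_.seekg(static_cast<std::streamoff>(index * blockSize_));
            file_.read(buf.data(), static_cast<std::streamsize>(blockSize_));
            buf.resize(static_cast<std::size_t>(file_.gcount()));
            file_.clear();  // allow further seeks after a short read at EOF
            return buf;
        });
    }

    std::ifstream file_;
    std::size_t blockSize_;
    std::size_t nextIndex_;
    std::future<std::vector<char>> pending_;
};
```

Because `next()` always waits on the outstanding read before scheduling a new one, the file stream is never accessed by two threads at once; a non-linear traversal strategy would only change the index prediction inside `schedule()`.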