PRZEGLĄD ELEKTROTECHNICZNY, ISSN 0033-2097, R. 89 NR 2b/2013

Paweł KAPUSTA, Michał MAJCHROWICZ, Dominik SANKOWSKI, Lidia JACKOWSKA-STRUMIŁŁO, Robert BANASIAK
Politechnika Łódzka, Instytut Informatyki Stosowanej

Distributed multi-node, multi-GPU, heterogeneous system for 3D image reconstruction in Electrical Capacitance Tomography – network performance and application analysis

Abstract. 3D ECT poses many challenging computational problems, as image reconstruction requires many basic linear algebra operations, especially when the solution is based on the Finite Element Method. To achieve real-time reconstruction, the 3D ECT computational subsystem has to transform capacitance data into an image in fractions of a second. Performing the computations in parallel in a distributed, heterogeneous, multi-GPU environment yields a significant speed-up. Nevertheless, the tests performed clearly show the need for a highly optimized distributed platform that would mitigate existing hardware and software limitations.

Streszczenie. 3D ECT poses many complex computational problems, since image reconstruction requires many basic linear algebra operations, especially when the solutions are based on the Finite Element Method. To achieve real-time reconstruction, the computational system must be able to convert measurement data into an image in fractions of a second. Performing the computations in parallel in a distributed, heterogeneous, multi-GPU environment can speed them up considerably. Nevertheless, the tests carried out clearly show the need to develop a highly optimized, distributed platform that would make it possible to bypass existing hardware and software limitations.
(Distributed, multi-node, heterogeneous multi-GPU system for 3D image reconstruction in electrical capacitance tomography – network performance and application analysis).

Keywords: parallel computations, distributed computations, Electrical Capacitance Tomography, Finite Element Method.
Słowa kluczowe: parallel computations, distributed computations, electrical capacitance tomography, Finite Element Method.

Introduction
Electrical Capacitance Tomography (ECT) is a relatively mature imaging method in industrial process tomography [4]. ECT images materials that differ in dielectric permittivity by measuring capacitance from a set of electrodes. Applications of ECT include the monitoring of oil-gas flows in pipelines, gas-solids flows in pneumatic conveying, flames in combustion, and gravitational flows in silos [1]. Compared with other non-invasive imaging techniques, such as Magnetic Resonance Imaging or X-ray Computed Tomography, ECT offers much higher temporal resolution. This makes ECT a good candidate for a real-time imaging technique capable of long-term monitoring of fast-varying industrial processes. To reach this goal, the 3D ECT computational subsystem should be able to transform capacitance data into an image in fractions of a second, which is hard to achieve because a typical 3D ECT image can be composed of a large number of elements.
3D ECT poses several challenging computational problems that have been reported by many researchers [1,2,3]. Most of the algorithms perform a series of complex algebraic operations on two-dimensional arrays that contain many elements. Nonlinear three-dimensional image reconstruction in 3D capacitance tomography is therefore a complex numerical problem, saturated with linear algebra transformations that cannot be performed efficiently in real time on classic CPUs, even multi-core ones [6].
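To make the linear-algebra workload concrete, the following is a minimal sketch of one commonly used iterative scheme, the Landweber iteration, applied to a linearised model c = S g. The text above does not name a specific algorithm, so the choice of Landweber, the sensitivity matrix S, the relaxation factor alpha and the toy dimensions are all assumptions made for illustration only. Constraints often applied in practice, such as clipping the image values to a valid range, are omitted.

```python
# Hypothetical illustration only: Landweber iteration for a linearised
# model c = S g. S, alpha and the toy sizes are assumptions, not values
# taken from the paper.

def matvec(matrix, vector):
    """Multiply a 2D list (rows x cols) by a vector of length cols."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def transpose(matrix):
    """Return the transpose of a 2D list."""
    return [list(col) for col in zip(*matrix)]


def landweber(S, c, alpha, iterations):
    """Iterate g <- g + alpha * S^T (c - S g), starting from g = 0."""
    St = transpose(S)
    g = [0.0] * len(S[0])
    for _ in range(iterations):
        # Residual between measured and predicted data.
        residual = [ci - pi for ci, pi in zip(c, matvec(S, g))]
        # Back-project the residual into image space.
        correction = matvec(St, residual)
        g = [gi + alpha * di for gi, di in zip(g, correction)]
    return g


# Toy problem: 3 measurements, 2 image elements.
S = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
g_true = [0.2, 0.7]
c = matvec(S, g_true)  # synthetic, noise-free data
# alpha must stay below 2 / (largest eigenvalue of S^T S), here 2/3.
g_est = landweber(S, c, alpha=0.3, iterations=200)
```

Every iteration consists of two matrix-vector products, and each output row of such a product is computed independently of the others, which is the kind of work that can be spread across many parallel execution units.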
In this paper a distributed GPGPU approach is considered as an efficient way to obtain a significant speed-up of the 3D ECT reconstruction process. Assuming that many of the computations can be performed in parallel on modern, fast graphics processors, and by altering the algorithms, the time needed to achieve high-quality image reconstruction can be shortened significantly.

Computations on Graphic Processors
GPGPU (General-Purpose computing on Graphics Processing Units) is a technique of using graphics cards (GPUs, Graphics Processing Units), which normally handle graphics rendering, for computations that are usually handled by processors (CPUs, Central Processing Units). Growing interest in GPU computing began when it became impossible to clock CPUs above a certain level, owing to the limitations of silicon-based transistor technology, while the demand for improvements remained constant. Any gain in the execution speed of sequential programs now comes from architectural improvements of the CPU rather than from higher clock rates, but even this approach has its limits. Parallel programming is not a new idea, though until recently it was reserved for high-performance clusters with many processors. This changed with the introduction of many-core processors to the mainstream market. GPUs fit this trend well and even take it to another level. Compared with CPUs, which today have from 2 to 12 cores, GPUs consist of dozens or even hundreds of smaller, simpler cores designed for high-performance calculations. CPUs are designed to execute a single thread as fast as possible, no matter how unpredictable, diverse or complicated it may be. For that they require additional resources, such as complicated branch prediction mechanisms, cache memory and data prefetching. GPUs, on the other hand, mostly handle data computations that are much simpler in nature, so their execution units, or cores, can be much simpler and therefore smaller (Fig. 1).
As a result, many more such cores fit on a single chip, with numbers reaching dozens or even hundreds. This translates into a much higher number of operations per second than traditional CPUs can achieve. GPUs can therefore run hundreds or even thousands of threads at once, compared with only a few on a CPU.

Distributed computations
The local GPGPU approach can be used to achieve a significant gain in computational power. This solution, however, has a very important drawback. It is easy to increase the computation power by equipping the computer