ISSN 1054-6618, Pattern Recognition and Image Analysis, 2007, Vol. 17, No. 3, pp. 390–398. © Pleiades Publishing, Ltd., 2007.
Partial Evaluation Technique
for Distributed Image Processing
A. Tchernykh^a, A. Cristóbal-Salas^b, V. Kober^a, and I. A. Ovseevich^c
^a Computer Science Department, CICESE Research Center Ensenada, BC, Mexico 22830
^b School of Chemistry Sciences and Engineering, University of Baja California, Tijuana, B.C. Mexico 22390
^c Institute for Information Transmissions Problems, Russian Academy of Sciences, Bol’shoi Karetnyi 19, Moscow, Russia
e-mail: chernykh@cicese.mx; cristobal@uabc.mx; vitally@iitp.ru; ovseev@iitp.ru
Abstract—In this paper, a partial evaluation technique to reduce the communication costs of distributed image
processing is presented. It combines incomplete data structures and partial evaluation with classical program
optimizations such as constant propagation, loop unrolling, and dead-code elimination. Through a detailed
performance analysis, we establish conditions under which the technique is beneficial.
DOI: 10.1134/S1054661807030054
Received January 9, 2007
1. INTRODUCTION
Typical image processing tasks require a large amount of processing power, more than
current state-of-the-art workstations can deliver. Parallel processing using distributed
systems appears to be the only solution to obtain sufficient processing power for handling
all image processing levels. Although the exchange of information remains a critical
bottleneck in such applications, which have inherently high communication costs, the
message passing paradigm has significantly increased in popularity with the proliferation
of clusters and GRID technology. It appears that optimizing a parallel program is as
demanding as designing the parallel algorithm itself. Communication cost depends on many
factors, such as memory manipulation overheads (message preparation, message
interpretation) and network communication delays. Reducing this cost is vital to achieve
good performance. There are several strategies to minimize it, for instance, overlapping
computation and communication, optimizing the network topology, increasing bandwidth,
reducing the number of messages (message coalescing, message caching), and pipelining
messages.
High performance of the fast Fourier transform (FFT) and wavelet transforms is a key issue
in many image applications [1]. There have been several attempts to parallelize and
optimize the FFT. For example, an algorithm suitable for the 64-processor nCUBE 3200
hypercube multicomputer was presented in [2], where a speedup of up to 16 with
64 processors was demonstrated. FFT implementations for hypercube multicomputers and
vector multiprocessors were discussed in [3]. A parallel FFT algorithm for the
64-processor Intel iPSC was described in [4]. Techniques for parallelization of the
multidimensional hypercomplex FFT are considered in [5]. FFT implementations based on
incomplete data structures were presented in [6]. Speedup factors between 1.83 and 5.05
were achieved for shared memory computers when the input size N is known in advance. Good
speedup is achieved despite the significant growth (N log₂ N) of the code size.
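
The code-size growth mentioned above can be made concrete with a small sketch. It is our own illustration, not the implementation of [6]: it assumes a radix-2 decimation-in-time FFT and "unrolls" it for a fixed, known N into a straight-line list of butterfly operations with precomputed twiddle factors. The names `unrolled_fft_ops`, `bit_reverse`, and `run_unrolled` are hypothetical.

```python
import cmath


def unrolled_fft_ops(n):
    """List every butterfly of a radix-2 FFT specialized for a known size n.

    Each entry is (i, j, w): indices of the two operands and the twiddle
    factor, all fixed at specialization time because n is known.
    """
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    ops = []
    size = 2
    while size <= n:
        half = size // 2
        for start in range(0, n, size):
            for k in range(half):
                w = cmath.exp(-2j * cmath.pi * k / size)  # constant-folded
                ops.append((start + k, start + k + half, w))
        size *= 2
    return ops


def bit_reverse(values):
    """Reorder the input into bit-reversed index order."""
    n = len(values)
    bits = n.bit_length() - 1
    out = list(values)
    for i in range(n):
        r = 0
        for b in range(bits):
            if (i >> b) & 1:
                r |= 1 << (bits - 1 - b)
        out[r] = values[i]
    return out


def run_unrolled(ops, values):
    """Execute the straight-line butterfly list on concrete data."""
    x = bit_reverse(values)
    for i, j, w in ops:
        t = w * x[j]
        x[i], x[j] = x[i] + t, x[i] - t
    return x
```

For a power-of-two N, the list holds (N/2) log₂ N butterflies, so the specialized program grows in proportion to N log₂ N, while all loop control and twiddle-factor computation has been resolved before any data arrives.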
Nevertheless, in spite of the reduction in complexity and time, FFT and wavelet transforms
remain expensive, especially on distributed memory parallel computers, where network
latency significantly affects the performance of the algorithm. In most cases, the speedup
gained by parallelization is limited by inter-process communication. That is why
programming for distributed architectures has been somewhat restricted to regular,
coarse-grained, and computation-intensive applications. The FFT exploits fine-grained
parallelism, which means that an improvement at the communication level plays an extremely
important role.
The aim of this work is to demonstrate that the performance benefits of incomplete
information processing and partial evaluation can be brought to distributed image
processing in a high-level manner that is transparent to users. To this end, partial
evaluation is used not only to remove much of the excess overhead of sequential and shared
memory parallel computation, but also, in distributed applications, to reduce the number
of messages when some information about the image or part of the image is known. The work
demonstrates that good parallel speedups are attainable using MPI and that the technique
can be integrated into existing distributed environments. Optimization of the 1D fast
Fourier transform and the 2D Haar wavelet transform is presented.
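
The idea of evaluating early whatever is already known can be sketched on a single level of a 1D Haar transform. This is a minimal illustration under our own assumptions, not the authors' implementation: unknown pixels are marked with `None`, the unnormalized average/difference form of the Haar step is used, and the function names `specialize_haar_level` and `finish_haar_level` are hypothetical. Pairs whose pixels are both known are computed at specialization time. Only the deferred pairs remain for run time, and in a distributed setting we assume these are the only ones that would still need data to be exchanged.

```python
def specialize_haar_level(row):
    """Partially evaluate one Haar level on a row with unknown pixels.

    Known pixels are numbers, unknown pixels are None. Returns
    (averages, details, deferred), where deferred lists the pair indices
    that could not be computed yet.
    """
    if len(row) % 2:
        raise ValueError("row length must be even")
    averages, details, deferred = [], [], []
    for p in range(0, len(row), 2):
        a, b = row[p], row[p + 1]
        if a is not None and b is not None:
            # Both inputs known: fold the result now.
            averages.append((a + b) / 2)
            details.append((a - b) / 2)
        else:
            # At least one input missing: leave a hole for run time.
            averages.append(None)
            details.append(None)
            deferred.append(p // 2)
    return averages, details, deferred


def finish_haar_level(specialized, full_row):
    """Complete only the deferred pairs once the full row is available."""
    averages, details, deferred = specialized
    averages, details = list(averages), list(details)
    for q in deferred:
        a, b = full_row[2 * q], full_row[2 * q + 1]
        averages[q] = (a + b) / 2
        details[q] = (a - b) / 2
    return averages, details
```

With the row `[4, 2, None, 6, 8, 8, None, None]`, two of the four pairs are resolved immediately, and only pairs 1 and 3 are left for `finish_haar_level`.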
The rest of the paper is organized as follows. In Section 2, a general description of the
proposed optimization technique is presented. Partial evaluation, incomplete data
structures and incomplete data structures