Machine Vision and Applications (1997) 10:114–122
© Springer-Verlag 1997

Optic flow estimation by a Hopfield neural network using geometrical constraints

G. Convertino, E. Stella, A. Branca, A. Distante
Istituto Elaborazione Segnali ed Immagini - C.N.R., Via Amendola 166/5, I-70126 Bari, Italy

Received: 26 December 1995 / Accepted: 20 February 1997

Abstract. Sparse optic flow maps are general enough to obtain useful information about camera motion. Usually, correspondences among features over an image sequence are estimated by radiometric similarity. When the camera moves under known conditions, global geometrical constraints can be introduced in order to obtain a more robust estimation of the optic flow. In this paper, a method is proposed for the computation of a robust sparse optic flow (OF) which integrates the geometrical constraints induced by camera motion to verify the correspondences obtained by radiometric-similarity-based techniques. A raw OF map is estimated by matching features by correlation. The verification of the resulting correspondences is formulated as an optimization problem that is implemented on a Hopfield neural network (HNN). Additional constraints imposed in the energy function permit us to achieve subpixel accuracy in the image locations of matched features. Convergence of the HNN is reached in a small enough number of iterations to make the proposed method suitable for real-time processing. It is shown that the proposed method is also suitable for identifying independently moving objects in front of a moving vehicle.

Key words: Optical flow – Hopfield neural network – Feature matching – Trajectory geometric constraints

1 Introduction

The estimation of reliable 2D motion information from an image sequence is an important task for many applications in which the evaluation of 3D motion parameters plays a fundamental role.
Correspondence to: E. Stella

An example is the estimation of the egomotion parameters in autonomous mobile robot navigation. Usually, navigation in indoor environments is performed on a planar surface (and is often called planar navigation), and the motion of the vehicle is characterized by two planar translational components and one rotational component, around an axis normal to the ground. Moreover, vision-based navigation constrains the vehicle to move on trajectories whose curvature radius is not zero, in order to allow an adequate overlap of information among images acquired along the path.

When the vehicle motion is only translational (along the optical axis of the camera), the 2D motion field estimated on the image sequence assumes a typical radial shape: all vectors radiate from a common point, the focus of expansion (FOE), which is a fixed point of the 2D motion field, located on the image plane at the principal point (the intersection between the optical axis and the image plane). If the vehicle motion also has a rotational component, for example when following a curvilinear path, the resulting 2D motion field is influenced by both the translational and rotational components. If the translational component dominates the rotational one (for example, on trajectories with a large curvature radius), the 2D motion field still has a radial shape, but the FOE is no longer located at the principal point. The image position of the FOE represents the heading of the camera (Regan and Beverly 1982).

To estimate the FOE location, sparse information on optic flow (OF) is enough. If the OF map is known, two vectors theoretically suffice to locate the FOE in the image, but in practice, due to noise, reliable estimates can only be obtained by minimizing errors over many vectors.

The estimation of a sparse OF is formulated here as a feature-matching problem. The matching method consists of two steps: feature selection and feature matching.
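As a concrete illustration of this least-squares idea (a sketch, not the paper's HNN-based method), the FOE can be estimated by intersecting the lines defined by each feature point and its flow vector: under pure translation all such lines pass through the FOE, so one solves a small normal-equation system. The helper name `estimate_foe` and the synthetic data below are our own illustrative choices.

```python
import numpy as np

def estimate_foe(points, flows):
    """Least-squares focus-of-expansion estimate (illustrative sketch).
    Each flow vector defines a line through its feature point; under
    pure translation all lines pass through the FOE, so we minimize the
    summed squared perpendicular distance of one point to all lines."""
    points = np.asarray(points, dtype=float)
    flows = np.asarray(flows, dtype=float)
    # Unit normals perpendicular to each flow vector.
    lengths = np.hypot(flows[:, 0], flows[:, 1])
    n = np.stack([-flows[:, 1], flows[:, 0]], axis=1) / lengths[:, None]
    # Each line constraint: n_i . x = n_i . p_i; solve the 2x2 normal equations.
    A = n.T @ n
    b = n.T @ np.einsum("ij,ij->i", n, points)
    return np.linalg.solve(A, b)

# Synthetic purely radial flow expanding from a known FOE.
foe = np.array([40.0, 30.0])
pts = np.array([[10.0, 5.0], [70.0, 60.0], [90.0, 10.0], [20.0, 55.0]])
est = estimate_foe(pts, pts - foe)   # noise-free, so recovery is exact
```

With noisy flow vectors the same normal equations still apply; robustness then comes from averaging the error over many vectors, which is exactly why a sparse but reliable OF map suffices for heading estimation.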
In the literature, several methods to extract features from images are reported. These methods can be classified according to the type of features they select: high-level and low-level techniques. Usually, high-level features (such as lines, curves, closed contours) are considered more reliable for matching, but they require burdensome preprocessing of the images, which represents the limiting factor for real-time applications (Deriche and Faugeras 1990; Sawhney and Hanson 1993). On the other hand, the extraction of low-level features (such as areas of high variance) does not require complex preprocessing, but the correspondence-finding process can be affected by the problem of ambiguous matching (Wu et al. 1995; Krishman and Raviv 1995). In this work, we suggest using low-level features because their selection makes the method more general (no particular structures in the scene are required). To manage the problem