Feature Correspondence: 
A Markov Chain Monte Carlo Approach 
Frank Dellaert, Steven M. Seitz, Sebastian Thrun, and Charles Thorpe 
Department of Computer Science & Robotics Institute 
Carnegie Mellon University 
Pittsburgh, PA 15213 
{dellaert,seitz,thrun,cet}@cs.cmu.edu 
Abstract 
When trying to recover 3D structure from a set of images, the
most difficult problem is establishing the correspondence between
the measurements. Most existing approaches assume that features
can be tracked across frames, whereas methods that exploit rigidity
constraints to facilitate matching do so only under restricted camera
motion. In this paper we propose a Bayesian approach that
avoids the brittleness associated with singling out one "best" correspondence,
and instead considers the distribution over all possible
correspondences. We treat both a fully Bayesian approach that
yields a posterior distribution, and a MAP approach that makes
use of EM to maximize this posterior. We show how Markov chain
Monte Carlo methods can be used to implement these techniques
in practice, and present experimental results on real data.
1 Introduction 
Structure from motion (SFM) addresses the problem of simultaneously recovering
camera pose and a three-dimensional model from a collection of images. This problem
has received considerable attention in the computer vision community [1, 2, 3].
Methods that can robustly reconstruct the 3D structure of environments have a
potentially large impact in many areas of societal importance, such as architecture,
entertainment, space exploration and mobile robotics.
A fundamental problem in SFM is data association, i.e., the question of determining
correspondence between features observed in different images. This problem has
been referred to as the most difficult part of structure recovery [4], and is particularly
challenging if the images have been taken from widely separated viewpoints.
Virtually all existing approaches assume either that the correspondence is known a
priori or that features can be tracked from frame to frame [1, 2]. Methods based
on the robust recovery of epipolar geometry [3, 4] can cope with larger inter-frame
displacements, but still depend on the ability to identify a set of initial correspondences
to seed the robust matching process. In this paper, we are interested in
cases where individual camera images are recorded from vastly different viewpoints,
which renders existing SFM approaches inapplicable. Traditional approaches for
establishing correspondence between sets of 2D features [5, 6, 7] are of limited use
in this domain, as the projected 3D structure can look very different in each image.
This paper proposes a Bayesian approach to data association. Instead of considering
only a single correspondence (which we conjecture to be brittle), our approach
considers whole distributions over correspondences. As a result, our approach is
more robust, and from a Bayesian perspective it is also sound. Unfortunately, no
closed-form solution exists for calculating these distributions conditioned on the
camera images. Therefore, we propose to use the Metropolis-Hastings algorithm, a
popular Markov chain Monte Carlo (MCMC) method, to sample from the posterior.
In particular, we propose two different algorithms. The first method, discussed in
Section 2, is mathematically more powerful but computationally expensive. It uses
MCMC to sample from the joint distribution over both correspondences and
three-dimensional scene structure. While this approach is mathematically elegant from a
Bayesian point of view, we have so far only been able to obtain results for simple,
artificial domains. Thus, to cope with large-scale data sets, we propose in Section 3
a maximum a posteriori (MAP) approach using the Expectation-Maximization
(EM) algorithm to maximize the posterior. Here we use MCMC sampling only for
the data association problem. Simulated annealing is used to reduce the danger of
getting stuck in local minima. Experimental results obtained in realistic domains
and presented in Section 4 suggest that this approach works well in the general
SFM case, and that it scales favorably to complex computer vision problems.
MCMC has been used for data association before by Pasula et al. [8], in the
context of a traffic surveillance application. However, their approach is not directly
applicable to SFM, as the computer vision domain is characterized by a large number
of local minima. Our paper goes beyond theirs in two important aspects: first, we
develop a framework for MCMC sampling over both the data association and the
model, and second, we apply annealing to smooth the posterior so as to reduce the
chance of getting stuck in local minima. In a previous paper [9] we discussed the
idea of using EM for SFM, but without the unifying framework presented below.
2 A Fully Bayesian Approach using MCMC 
Below we derive the general approach for MCMC sampling from the joint posterior
over data association and models. We only show results for a simple example from
pose estimation, as this approach is computationally very demanding. An EM
approach based on the general principles described here, but applicable to larger-scale
problems, will be described in the next section.
2.1 Structure from Motion 
The structure from motion problem is this: given a set of images of a scene, taken
from different viewpoints, recover the 3D structure of the scene along with the camera
parameters. In the feature-based approach to SFM, we consider the situation in
which a set of $N$ 3D features $x_j$ is viewed by a set of $m$ cameras with parameters $m_i$.
As input data we are given the set of 2D measurements $u_{ik}$ in the images, where
$k \in \{1..K_i\}$ and $K_i$ is the number of measurements in the $i$-th image. To model
correspondence information, we introduce for each measurement $u_{ik}$ the indicator
variable $j_{ik}$, indicating that $u_{ik}$ is a measurement of the $j_{ik}$-th feature $x_{j_{ik}}$.
The choice of feature type and camera model determines the measurement function
$h(m_i, x_j)$, predicting the measurement $u_{ik}$ given $m_i$ and $x_j$ (with $j = j_{ik}$):
$$u_{ik} = h(m_i, x_j) + n$$
where $n$ is the measurement noise. Without loss of generality, let us consider the
case in which the features $x_j$ are 3D points and the measurements $u_{ik}$ are points in
the 2D image. In this case the measurement function can be written as a 3D rigid
displacement followed by a projection:
$$h(m_i, x_j) = \Phi\left(R_i (x_j - t_i)\right) \tag{1}$$
where $R_i$ and $t_i$ are the rotation matrix and translation of the $i$-th camera, respectively,
and $\Phi : \mathbb{R}^3 \to \mathbb{R}^2$ is the camera projection model.
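As a minimal sketch of the measurement function (1), the Python below composes the rigid displacement $R_i(x_j - t_i)$ with a projection $\Phi$. The helper names (`rotation_z`, `mat_vec`, `orthographic`, `perspective`, `h`) and the use of a rotation about the z-axis are illustrative assumptions. The orthographic model corresponds to the one used in the experiments of Section 4; the unit-focal-length pinhole is shown only as an alternative.

```python
import math

def rotation_z(angle):
    """3x3 rotation about the z-axis (an illustrative way to build R_i)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def mat_vec(R, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(R[r][c] * v[c] for c in range(3)) for r in range(3)]

def orthographic(p):
    """Orthographic projection R^3 -> R^2: drop the depth coordinate."""
    return (p[0], p[1])

def perspective(p, f=1.0):
    """Pinhole projection with focal length f; assumes p[2] > 0."""
    return (f * p[0] / p[2], f * p[1] / p[2])

def h(R, t, x, project=orthographic):
    """Predicted 2D measurement of 3D point x seen by camera (R, t): Phi(R (x - t))."""
    d = [x[k] - t[k] for k in range(3)]   # translate into the camera frame ...
    return project(mat_vec(R, d))          # ... rotate, then project
```

Swapping `project` is the only change needed to move between camera models; the noise term $n$ would be added separately when simulating measurements.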
2.2 Deriving the Posterior 
Whereas previous methods single out one "best" correspondence across images,
in a Bayesian framework we are interested in characterizing our knowledge about the
unknowns conditioned on the data only, averaging over all possible correspondences.
Thus, we are interested in the posterior distribution $P(\Theta|U)$, where $\Theta$ collects the
unknown model parameters $m_i$ and $x_j$. In the case of unknown correspondence, we
need to sum over all possible assignments $J = \{j_{ik}\}$ to obtain
$$P(\Theta|U) = \sum_J P(J, \Theta|U) \propto P(\Theta) \sum_J P(U|J, \Theta)\, P(J|\Theta) \tag{2}$$
where we have applied Bayes' law and the chain rule. Let us assume for now that
there are no occlusions or spurious measurements, so that $K_i = N$ and $J$ is a set of
$m$ permutations $J_i$ of the indices $1..N$. Then, assuming i.i.d. normally distributed
noise on the measurements, each term in (2) can be calculated using
$$P(J|\Theta) = \left(\frac{1}{N!}\right)^m, \qquad P(U|J, \Theta) = \prod_{i=1}^{m} \prod_{k=1}^{K_i} \mathcal{N}\left(u_{ik};\, h(m_i, x_{j_{ik}}),\, \sigma\right) \tag{3}$$
if each $J_i$ is a permutation, and 0 otherwise. Here $\mathcal{N}(\cdot; \mu, \sigma)$ denotes the normal
distribution with mean $\mu$ and standard deviation $\sigma$. The first identity in (3) holds
if we assume each of the $N!$ possible permutations to be equally likely a priori.
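A minimal sketch of (3) in log space, assuming 2D measurements, isotropic noise with known $\sigma$, and a caller-supplied `predict(i, j)` that stands in for $h(m_i, x_j)$ under the current $\Theta$. The names `is_permutation`, `log_gauss_iso`, `log_joint` and `predict` are hypothetical. Working with logarithms avoids underflow when many Gaussian factors are multiplied.

```python
import math

def is_permutation(Ji, N):
    """True if Ji assigns each of the N features exactly once."""
    return sorted(Ji) == list(range(N))

def log_gauss_iso(u, mean, sigma):
    """log N(u; mean, sigma) for an isotropic 2D Gaussian."""
    d2 = (u[0] - mean[0]) ** 2 + (u[1] - mean[1]) ** 2
    return -d2 / (2 * sigma ** 2) - math.log(2 * math.pi * sigma ** 2)

def log_joint(U, J, predict, N, sigma):
    """log P(U|J,Theta) + log P(J|Theta), following eq. (3).

    U[i][k]: k-th 2D measurement in image i
    J[i][k]: index j_ik of the feature that measurement comes from
    predict(i, j): h(m_i, x_j) for the current Theta (hypothetical callback)
    Returns -inf when some J_i is not a permutation, since then P(J|Theta) = 0.
    """
    m = len(U)
    if any(not is_permutation(Ji, N) for Ji in J):
        return -math.inf
    log_prior_J = -m * math.log(math.factorial(N))   # log (1/N!)^m
    log_lik = sum(log_gauss_iso(U[i][k], predict(i, J[i][k]), sigma)
                  for i in range(m) for k in range(len(U[i])))
    return log_lik + log_prior_J
```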
2.3 Sampling from the Posterior using MCMC 
Unfortunately, direct computation of the total posterior distribution $P(\Theta|U)$ in
(2) is intractable in general, because the number of correspondence assignments
$J$ is combinatorial in the number of features and images. As a solution to this
computational challenge we propose to instead sample from $P(\Theta|U)$. Sampling
directly from $P(\Theta|U)$ is equally difficult, but if we can obtain a sample $\{(\Theta^{(r)}, J^{(r)})\}$
from the joint distribution $P(\Theta, J|U)$, we can simply discard the correspondence
part $J^{(r)}$ to obtain a sample $\{\Theta^{(r)}\}$ from the marginal distribution $P(\Theta|U)$.
To sample from the joint distribution $P(\Theta, J|U)$ we propose to use MCMC sampling,
in particular the Metropolis-Hastings algorithm [10]. This method involves
simulating a Markov chain whose equilibrium distribution is the desired posterior
distribution $P(\Theta, J|U)$. Defining $X \triangleq (J, \Theta)$, the algorithm is:
1. Start with a random initial state $X^{(0)}$.
2. Propose a new state $X'$ using a chosen proposal density $Q(X'; X^{(r)})$.
3. Compute the ratio
$$a = \frac{P(X'|U)}{P(X^{(r)}|U)} \, \frac{Q(X^{(r)}; X')}{Q(X'; X^{(r)})} \tag{4}$$
4. Accept $X'$ as $X^{(r+1)}$ with probability $\min(a, 1)$, otherwise $X^{(r+1)} = X^{(r)}$.
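The four steps above can be sketched as a generic Metropolis-Hastings loop. This is an illustrative Python version under the assumption that the target and proposal are supplied as functions; it works with $\log a$ rather than $a$ for numerical stability. The names `metropolis_hastings`, `log_target`, `propose` and `log_q_ratio` are hypothetical.

```python
import math
import random

def metropolis_hastings(log_target, propose, log_q_ratio, x0, n_steps, rng):
    """Generic Metropolis-Hastings (steps 1-4), in log space.

    log_target(x): log P(x|U) up to an additive constant
    propose(x, rng): draws x' ~ Q(. ; x)
    log_q_ratio(x, x_new): log Q(x; x') - log Q(x'; x)
    Returns the visited states X^(0), ..., X^(n_steps).
    """
    x, lp = x0, log_target(x0)                           # step 1
    chain = [x]
    for _ in range(n_steps):
        x_new = propose(x, rng)                          # step 2
        lp_new = log_target(x_new)
        log_a = lp_new - lp + log_q_ratio(x, x_new)      # step 3: log of ratio (4)
        if log_a >= 0 or rng.random() < math.exp(log_a):
            x, lp = x_new, lp_new                        # step 4: accept w.p. min(a, 1)
        chain.append(x)                                  # on rejection X^(r+1) = X^(r)
    return chain
```

For instance, `metropolis_hastings(lambda x: -0.5 * x * x, lambda x, r: x + r.gauss(0.0, 1.0), lambda x, xn: 0.0, 3.0, 20000, random.Random(0))` samples a one-dimensional standard normal. For the correspondence problem, `log_target` would instead be the logarithm of (3) plus $\log P(\Theta)$, and `propose` would perturb $J$ and $\Theta$.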
Figure 1: Left: A 2D model shape, defined by the 6 feature points $x_j$. Right: Transformed
shape (by a simple rotation) and 6 noisy measurements $u_k$ of the transformed features.
The true rotation is 70 degrees; the noise is zero-mean Gaussian.
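The sequence of tuples $(\Theta^{(r)}, J^{(r)})$ thus generated will be a sample from $P(\Theta, J|U)$ if
the sampler is run sufficiently long. To calculate the acceptance ratio $a$, we assume
that the noise on the feature measurements is normally distributed and isotropic.
Using Bayes' law and eq. (3), we can then rewrite $a$ from (4) as
$$a = \frac{\prod_{i=1}^{m} \prod_{k=1}^{K_i} \mathcal{N}\left(u_{ik};\, h(m'_i, x'_{j'_{ik}}),\, \sigma\right) \, Q(X^{(r)}; X')}{\prod_{i=1}^{m} \prod_{k=1}^{K_i} \mathcal{N}\left(u_{ik};\, h(m^{(r)}_i, x^{(r)}_{j^{(r)}_{ik}}),\, \sigma\right) \, Q(X'; X^{(r)})}$$
Here the factors $P(J|\Theta) = (1/N!)^m$ cancel because both states assign a permutation in
every image, and the prior $P(\Theta)$ cancels when it is taken to be flat.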
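Simplifying the notation by defining $h^{(r)}_{ik} \triangleq h(m^{(r)}_i, x^{(r)}_{j^{(r)}_{ik}})$ and $h'_{ik} \triangleq h(m'_i, x'_{j'_{ik}})$, we obtain
$$a = \frac{Q(X^{(r)}; X')}{Q(X'; X^{(r)})} \exp\left[-\frac{1}{2\sigma^2} \sum_{i,k} \left(\|u_{ik} - h'_{ik}\|^2 - \|u_{ik} - h^{(r)}_{ik}\|^2\right)\right] \tag{5}$$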
The proposal density $Q(\cdot; \cdot)$ is application-dependent, and an example is given below.
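For a symmetric proposal, (5) reduces to comparing squared residuals. A minimal sketch of $\log a$, assuming the predictions $h^{(r)}_{ik}$ and $h'_{ik}$ have already been computed and stored in dictionaries keyed by $(i, k)$ (an illustrative data layout), that both states assign a permutation in every image, and that $P(\Theta)$ is flat. The function name `log_acceptance_ratio` is hypothetical.

```python
def log_acceptance_ratio(U, h_old, h_new, sigma, log_q_ratio=0.0):
    """log a from eq. (5).

    U, h_old, h_new: dicts keyed by (i, k) holding u_ik, h_ik^(r) and h'_ik as 2D tuples.
    log_q_ratio: log Q(X^(r); X') - log Q(X'; X^(r)); zero for a symmetric proposal.
    """
    def sq(a, b):
        return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

    # sum over (i, k) of ||u - h'||^2 - ||u - h^(r)||^2
    s = sum(sq(U[ik], h_new[ik]) - sq(U[ik], h_old[ik]) for ik in U)
    return log_q_ratio - s / (2 * sigma ** 2)
```

A proposal that reduces the total squared residual yields $\log a > 0$ and is always accepted; one that increases it is accepted with probability $\exp(\log a)$.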
2.4 Example: A 2D Pose Estimation Problem 
To illustrate this method, we present a simple example from pose estimation. Assume
we have a 2D model shape, given in the form of a set of 2D points $x_j$, as shown
in Figure 1. We observe an image of this shape which has undergone a rotation
$\theta$ to be estimated. This rotated shape is shown at right in the figure, along with
6 noisy measurements $u_k$ on the feature points. In Figure 2 at left we show the
posterior distribution over the rotation parameter, given the measurements from
Figure 1 and with known correspondence. In this case, the posterior is unimodal.
In the case of unknown correspondence, the posterior conditioned on the data alone
is shown at right in Figure 2 and is a mixture of 6! = 720 functions of the form (3),
with 6 equally likely modes induced by the symmetry of the model shape.
In order to perform MCMC sampling, we implement the proposal step by choosing
randomly between two strategies. (a) In a "small perturbation" we keep the correspondence
assignment $J$ but add a small amount of noise to $\theta$. This serves to explore
the values of $\theta$ within a mode of the posterior probability. (b) In a "long jump", we
completely randomize both $\theta$ and $J$. This provides a way to jump between probability
modes. Note that $Q(X^{(r)}; X')/Q(X'; X^{(r)}) = 1$ for this proposal density, since both
moves are symmetric. The result of the sampling procedure is shown as a histogram of the
rotation parameter $\theta$ in Figure 3. The histogram is a non-parametric approximation to
the analytic posterior shown in Figure 2. The figure shows the results of running a sampler
for 100,000 steps, the first 1000 of which were discarded as a transient. Note that
even for this simple example, there is still considerable correlation in the sample
of 100,000 states, as evidenced by the uneven mass in each of the 6 analytically
predicted modes.
Figure 2: (Left) The posterior distribution over rotation $\theta$ with known correspondence,
and (Right) with unknown correspondence, a mixture with 720 components.
Figure 3: Histogram for the values of $\theta$ obtained in one MCMC run, for the situation in
Figure 1. The MCMC sampler was run for 100,000 steps.
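The two-move sampler described above can be sketched as follows. This is an illustrative reconstruction, not the code used to produce Figure 3: the perturbation step size, the 50/50 choice between moves, and the uniform redraw of $\theta$ over $[0, 2\pi)$ are assumptions, and `log_lik` and `sample_pose` are hypothetical names.

```python
import math
import random

def log_lik(theta, J, X, U, sigma):
    """log P(U | J, theta) up to a constant; J[k] is the model point measured by U[k]."""
    c, s = math.cos(theta), math.sin(theta)
    d2 = 0.0
    for k, j in enumerate(J):
        a, b = X[j]
        d2 += (U[k][0] - (c * a - s * b)) ** 2 + (U[k][1] - (s * a + c * b)) ** 2
    return -d2 / (2 * sigma ** 2)

def sample_pose(X, U, sigma, n_steps, rng, p_jump=0.5, step=0.05):
    """Metropolis-Hastings over X = (J, theta) with the two moves of Section 2.4.

    (a) small perturbation: keep J, add N(0, step^2) noise to theta;
    (b) long jump: redraw theta uniformly and J as a uniform random permutation.
    Both moves are symmetric, so the Q ratio in (4) is 1.
    Returns a list of (theta, J, log-likelihood) tuples, one per step.
    """
    N = len(X)
    theta = rng.uniform(0.0, 2 * math.pi)
    J = list(range(N))
    rng.shuffle(J)
    lp = log_lik(theta, J, X, U, sigma)
    chain = []
    for _ in range(n_steps):
        if rng.random() < p_jump:                              # (b) long jump
            theta_new = rng.uniform(0.0, 2 * math.pi)
            J_new = list(range(N))
            rng.shuffle(J_new)
        else:                                                  # (a) small perturbation
            theta_new = (theta + rng.gauss(0.0, step)) % (2 * math.pi)
            J_new = J
        lp_new = log_lik(theta_new, J_new, X, U, sigma)
        if lp_new >= lp or rng.random() < math.exp(lp_new - lp):
            theta, J, lp = theta_new, J_new, lp_new
        chain.append((theta, tuple(J), lp))
    return chain
```

With a regular hexagon as a stand-in for the 6-fold symmetric shape of Figure 1, the visited values of $\theta$ should concentrate near $70° + k \cdot 60°$; a histogram of the first component of each tuple, after discarding a transient, plays the role of Figure 3.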
3 Maximum a Posteriori Estimation using MCEM 
As illustrated above, sampling from the joint probability over assignments $J$ and
parameters $\Theta$ using MCMC can be very expensive. However, if only a maximum a
posteriori (MAP) estimate is needed, sampling over the joint space can be avoided
by means of the EM algorithm. To obtain the MAP estimate, we need to maximize
$P(\Theta|U)$ as given by (2). This is intractable in general because of the combinatorial
number of terms. The EM algorithm provides a tractable alternative to maximizing
$P(\Theta|U)$, using the correspondence $J$ as a hidden variable [11]. It iterates over:
E-step: Calculate the expected log-posterior $Q^t(\Theta)$:
$$Q^t(\Theta) \triangleq E_{\Theta^t}\{\log P(\Theta|U, J) \,|\, U\} = \sum_J P(J|U, \Theta^t) \log P(\Theta|U, J) \tag{6}$$
where the expectation is taken with respect to the posterior distribution $P(J|U, \Theta^t)$
over all possible correspondence assignments $J$ given the measurement data $U$ and
a current guess $\Theta^t$ for the parameters.
M-step: Re-estimate $\Theta^{t+1}$ by maximizing $Q^t(\Theta)$, i.e., $\Theta^{t+1} = \arg\max_\Theta Q^t(\Theta)$.
Instead of calculating $Q^t(\Theta)$ exactly using (6), which again involves summing over a
combinatorial number of terms, we can replace it by a Monte Carlo approximation:
$$Q^t(\Theta) \approx \frac{1}{R} \sum_{r=1}^{R} \log P(\Theta|U, J^{(r)}) \tag{7}$$
where $\{J^{(r)}\}$ is a sample from $P(J|U, \Theta^t)$ obtained by MCMC sampling. Formally,
this can be justified in the context of Monte Carlo EM (MCEM), a version
of the EM algorithm where the E-step is executed by a Monte Carlo process [11].
Figure 4: Three out of 11 cube images. Although the images were originally taken as a
sequence in time, the ordering of the images is irrelevant to our method.
(Figure 5 panels: t=0, $\sigma$=0.0; t=1, $\sigma$=25.1; t=10, $\sigma$=18.7; t=20, $\sigma$=13.5; t=100, $\sigma$=1.0)
Figure 5: Starting from random structure (t=0) we recover gross 3D structure in the very
first iteration (t=1). As the annealing parameter $\sigma$ is gradually decreased, successively
finer details are resolved (iterations 1, 10, 20, and 100 are shown).
The sampling proceeds as in the previous section, using the Metropolis-Hastings
algorithm, but now with a fixed parameter $\Theta = \Theta^t$. Note that at each iteration the
estimate $\Theta^t$ changes and we sample from a different posterior distribution $P(J|U, \Theta^t)$.
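For the 2D rotation example of Section 2.4, with Gaussian noise and a flat prior on $\theta$, the Monte Carlo objective (7) is, up to an additive constant, $-\frac{1}{2\sigma^2 R}\sum_r \sum_k \|u_k - R(\theta)\,x_{j^{(r)}_k}\|^2$. Expanding the square, only the cross term $-2\,u_k \cdot R(\theta)\,x$ depends on $\theta$, so the objective is $\cos\theta\,C + \sin\theta\,S$ plus a constant, and the M-step has the closed form $\theta = \operatorname{atan2}(S, C)$ (a 2D Procrustes solution). The sketch below assumes this simplified setting; `mc_objective` and `m_step_rotation` are hypothetical names.

```python
import math

def mc_objective(theta, X, U, J_samples, sigma):
    """Monte Carlo estimate (7) of Q^t(theta), up to an additive constant."""
    c, s = math.cos(theta), math.sin(theta)
    total = 0.0
    for J in J_samples:
        for k, j in enumerate(J):
            a, b = X[j]
            total -= ((U[k][0] - (c * a - s * b)) ** 2
                      + (U[k][1] - (s * a + c * b)) ** 2) / (2 * sigma ** 2)
    return total / len(J_samples)

def m_step_rotation(X, U, J_samples):
    """Closed-form argmax of mc_objective over theta (2D Procrustes)."""
    C = S = 0.0
    for J in J_samples:
        for k, j in enumerate(J):
            a, b = X[j]
            C += U[k][0] * a + U[k][1] * b      # sum of u . x
            S += U[k][1] * a - U[k][0] * b      # sum of x cross u
    return math.atan2(S, C)
```

Because every sample $J^{(r)}$ contributes to the same sums $C$ and $S$, the cost of the M-step grows linearly with $R$, and no numerical optimizer is needed in this toy setting.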
In practice it is important to add annealing to this basic EM scheme, to avoid
getting stuck in local minima. In simulated annealing we artificially increase the
noise parameter $\sigma$ for the early iterations, gradually decreasing it to its correct value.
This has two beneficial consequences. First, the posterior distribution $P(J|U, \Theta^t)$
is less peaked when $\sigma$ is high, allowing the MCMC sampler to explore the space of
assignments $J$ more easily. Second, the expected log-posterior $Q^t(\Theta)$ is smoother
and has fewer local maxima for higher values of $\sigma$.
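Putting the pieces together, the following is a hedged sketch of annealed MCEM for the same 2D rotation toy problem, not the 3D implementation used in Section 4. Assumptions: the E-step proposal swaps two entries of $J$ (symmetric, so $Q$ cancels in (5)); $\sigma$ follows a geometric schedule floored at the true noise level; each E-step warm-starts from the last sampled $J$; and all samples of an iteration are used in (7). The function names and default parameters are hypothetical.

```python
import math
import random

def sq_err(theta, J, X, U):
    """Sum of squared residuals ||u_k - R(theta) x_{J[k]}||^2."""
    c, s = math.cos(theta), math.sin(theta)
    total = 0.0
    for k, j in enumerate(J):
        a, b = X[j]
        total += (U[k][0] - (c * a - s * b)) ** 2 + (U[k][1] - (s * a + c * b)) ** 2
    return total

def e_step(theta, J, X, U, sigma, n_steps, rng):
    """Sample J ~ P(J | U, theta) by Metropolis-Hastings with theta held fixed.
    Proposal (an assumption): swap two entries of J; it is symmetric, so Q cancels."""
    J = list(J)
    e = sq_err(theta, J, X, U)
    samples = []
    for _ in range(n_steps):
        p, q = rng.sample(range(len(J)), 2)
        J[p], J[q] = J[q], J[p]
        e_new = sq_err(theta, J, X, U)
        if e_new <= e or rng.random() < math.exp((e - e_new) / (2 * sigma ** 2)):
            e = e_new                              # accept the swap
        else:
            J[p], J[q] = J[q], J[p]                # reject: undo it
        samples.append(tuple(J))
    return samples

def m_step(X, U, samples):
    """Maximize the Monte Carlo objective (7) in closed form (2D Procrustes)."""
    C = S = 0.0
    for J in samples:
        for k, j in enumerate(J):
            a, b = X[j]
            C += U[k][0] * a + U[k][1] * b
            S += U[k][1] * a - U[k][0] * b
    return math.atan2(S, C)

def annealed_mcem(X, U, sigma_final, rng, n_iter=40, sigma0=1.0, decay=0.9, n_mcmc=500):
    """Annealed Monte Carlo EM for the 2D rotation toy problem."""
    theta, J = 0.0, list(range(len(X)))
    for t in range(n_iter):
        sigma = max(sigma_final, sigma0 * decay ** t)   # geometric annealing schedule
        samples = e_step(theta, J, X, U, sigma, n_mcmc, rng)
        theta = m_step(X, U, samples)
        J = list(samples[-1])                           # warm start for the next E-step
    return theta, J
```

Early iterations use a large $\sigma$, so the sampled assignments are diffuse and the M-step only fixes the gross orientation; as $\sigma$ shrinks, the assignments concentrate and $\theta$ settles into one of the symmetric modes.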
4 Results 
To validate our approach we have conducted a number of experiments, one of which
is presented here. The input data in this experiment consisted of 55 manually
selected measurements in each of 11 input images, three of which are shown in
Figure 4. Note that features are not tracked from frame to frame and the images
can be presented in arbitrary order. To initialize, the 11 cameras $m_i$ are all placed
at the origin, looking towards the 55 model points $x_j$, which themselves are normally
distributed at unit distance from the cameras. We used an orthographic projection
model. The EM algorithm was run for 100 iterations, and the sampler for 10,000
steps per image. For this data set the algorithm took about a minute to complete
on a standard PC.
The algorithm converges quickly and consistently to an estimate for the structure and
motion where the correct correspondence is the most probable one, and where all
assignments in the different images agree with each other. A typical run of the
algorithm is shown in Figure 5, where we show a wireframe model of the
recovered structure at several points during the run. There are two important
points to note: (a) the gross structure is recovered in the very first iteration, starting
from random initial structure, and (b) finer details of the structure are gradually
resolved as the annealing parameter $\sigma$ is decreased. The estimate for the structure
after convergence is almost identical to the one found by the factorization method
[1] when the latter is provided with the correct correspondence.
5 Conclusions and Future Directions 
In this paper we presented a theoretically sound method to deal with ambiguous 
feature correspondence, and have shown how Markov chain Monte Carlo sampling 
can be used to obtain practical algorithms. We have detailed this for two cases: (1) 
obtaining a posterior distribution over the parameters $\Theta$, and (2) obtaining a MAP
estimate by means of EM. In future work, we would like to apply these methods 
in other domains where data association plays a central role. In particular, in 
the highly active area of mobile robot mapping, the data association problem is 
currently a major obstacle to building large-scale maps [12, 13]. We conjecture that 
our approach is equally applicable to the robotic mapping problem, and can lead 
to qualitatively new solutions in that domain. 
References 
[1] C. Tomasi and T. Kanade. Shape and motion from image streams under orthography: 
a factorization method. Int. J. of Computer Vision, 9(2):137-154, Nov. 1992. 
[2] R.I. Hartley. Euclidean reconstruction from uncalibrated views. In Application of 
Invariance in Computer Vision, pages 237-256, 1994. 
[3] P.A. Beardsley, P.H.S. Torr, and A. Zisserman. 3D model acquisition from extended 
image sequences. In Eur. Conf. on Computer Vision (ECCV), pages II:683-695, 1996. 
[4] P. Torr, A. Fitzgibbon, and A. Zisserman. Maintaining multiple motion model hypotheses
over many views to recover matching and structure. In Int. Conf. on Computer
Vision (ICCV), pages 485-491, 1998.
[5] G.L. Scott and H.C. Longuet-Higgins. An algorithm for associating the features of 
two images. Proceedings of Royal Society of London, B-244:21-26, 1991. 
[6] L.S. Shapiro and J.M. Brady. Feature-based correspondence: An eigenvector approach.
Image and Vision Computing, 10(5):283-288, June 1992.
[7] S. Gold, A. Rangarajan, C. Lu, S. Pappu, and E. Mjolsness. New algorithms for 2D 
and 3D point matching. Pattern Recognition, 31(8):1019-1031, 1998. 
[8] H. Pasula, S. Russell, M. Ostland, and Y. Ritov. Tracking many objects with many 
sensors. In Int. Joint Conf. on Artificial Intelligence (IJCAI), Stockholm, 1999. 
[9] F. Dellaert, S.M. Seitz, C.E. Thorpe, and S. Thrun. Structure from motion without
correspondence. In IEEE Conf. on Computer Vision and Pattern Recognition
(CVPR), June 2000.
[10] W.R. Gilks, S. Richardson, and D.J. Spiegelhalter, editors. Markov chain Monte 
Carlo in practice. Chapman and Hall, 1996. 
[11] M.A. Tanner. Tools for Statistical Inference. Springer, 1996. 
[12] J.J. Leonard and H.J.S. Feder. A computationally efficient method for large-scale
concurrent mapping and localization. In Proceedings of the Ninth International Symposium
on Robotics Research, Salt Lake City, Utah, 1999.
[13] J.A. Castellanos and J.D. Tardós. Mobile Robot Localization and Map Building: A
Multisensor Fusion Approach. Kluwer Academic Publishers, Boston, MA, 2000.
