Monthly Archives: January 2014

Unit testing with gtest and cmake

After coding two years for Google, I got used to unit tests. Fortunately, part of the google testing framework is open source, it is called gtest. Using gtest is great, but it is usually not installed on the machine on which I want to compile my stuff. One option to address this problem would be to copy gtest sources in all my projects. That is not very satisfying.

I came up with another solution. Since I’m using cmake anyway, I wrote cmake files that would download sources and compile gtest at project compilation time. This solution avoids code duplication and limits system dependency to subversion which is rather common and easy to install on most development platforms.

If you like the simplicity of this approach, have a look at it and let me know if it works for you. Your pull requests on github are welcome if there is something wrong.

Optimizing RANSAC with SSE

When developing computer vision systems that needs to find plane projections from point-to-point correspondences, a RANdom SAmple Consensus (RANSAC) implementation is necessary. The algorithm rejects wrong correspondences (outliers) and find the geometric transformation, usually a homography, explaining inliers. The algorithm randomly picks 4 correspondences, find the corresponding homography, and count how many correspondences it explains. After a fixed number of iteration, the homography with the best support is chosen.

OpenCV offers an implementation in findHomography. It is unfortunately rather slow. A great optimization approach is to leverage SIMD instructions such as SSE or NEON. In theory, SIMD instructions could allow the processor to test 4 homographies in a single pass. However, conditional jumps are forbidden since the execution flow has to be common for the 4 tested homographies.

Computing homographies without conditional jumps

We need a jumpless way of computing a homography from 4 correspondences. Since I’m lazy, I asked maple to generate the function for me. Maple had problem inverting analytically the 8 by 8 matrix to solve the problem. It managed to compute analytically a homography sending the unit square to arbitrary points, though.

Here’s the maple code:

with(linalg);
eq := (x,y) -> <<x[1]| x[2] | 1 | 0| 0| 0| -y[1]*x[1]/1 | -y[1]*x[2]/1>, <0|0|0|x[1]|x[2]|1|- (y[2]/1)*x[1]|- (y[2]/1)*x[2]>>;
mat := < eq([0, 0],x) , eq([0,1],y), eq([1,1],z), eq([1,0],w)>;
inv := inverse(mat);
homo := simplify(evalm(inv &* <x[1],x[2],y[1],y[2],z[1],z[2],w[1],w[2]>));
homo_func := unapply([homo[1],homo[2],homo[3],homo[4],homo[5],homo[6],homo[7],homo[8]],x,y,z,w);
CodeGeneration[C](homo_func, optimize);

This little code produces a rather large function that does not use SIMD instructions at all: it uses double. To compute an arbitrary homography, I just call this function twice, invert one result, and multiply both matrix together. These function do not need conditional jumps.

Thank to the power of C++ (operator overloading in particular), replacing “double” with a SSE type simply amounts to defining a class.

Replacing double with fvec4

Compilers give access to SSE instructions through intrinsics that do not look very friendly. The following example is rather hard to read:

// declares a variable containing "1, 2, 3, 4"
__m128 a = _mm_set_ps(4, 3, 2, 1);
// Adds "7" to all entries of a
a = _mm_add_ps(a, _mm_set1_ps(7));

This version is much easier to read:

fvec4 a(1, 2, 3, 4);
fvec4 a += fvec4(7);

The fvec4 class can be implemented by declaring proper constructors and operators. The only data member is a single __m128 field. It might look like:

struct fvec4 {
    __m128 data;

    fvec4(float a) {data = _mm_set_ps1(a);}
    fvec4(float a, float b, float c, float d) { data = _mm_set_ps(a,b,c,d); }

    fvec4 operator += (const fvec4 &a) {
      this->data = _mm_add_ps(data, a.data);
      return *this;
    }
};

Once we have such a class (see my fvec4 class, used in polyora), we can simply replace “double” by “fvec4″ in the function generated by maple. The result is a function that computes 4 homographies in a single jumpless execution.

Parallel RANSAC loop

We now have everything we need to write our RANSAC loop:

// compute the homography
fvec4 H[3][3];
homography4_from_4corresp(pts1[0], pts1[1], pts1[2], pts1[3],
                          pts2[0], pts2[1], pts2[2], pts2[3],
                          H);
// evaluate support
fvec4 support(0);
for (int i=0; i<n; i++) {
  // Fetch the 4 next correspondences
  p[0] = fvec4( row(i,uv1,stride1)[0] );
  p[1] = fvec4( row(i,uv1,stride1)[1] );
  g[0] = fvec4( row(i,uv2,stride2)[0] );
  g[1] = fvec4( row(i,uv2,stride2)[1] );

  // Projects the points with the homography
  fvec4 t[2];
  homography4_transform(p, H, t);

  // Compute reprojection distances
  fvec4 d = dist2(t,g);

  // Compute support.
  support += (d < threshold) & fvec4(1 - .1 * (d / threshold));
}

And here we go ! See the full implementation here.

Results

Using SSE instructions to find homographies significantly speeds up computation. My implementation is much faster than OpenCV’s findHomography function which is a bit more more accurate, because it uses 64 bits doubles instead of 32 bits floats.