Knowing this, I organize my sources in a particular way. A class is usually declared in a .h file and implemented in a .cpp file. Well, for almost every .h/.cpp pair I define a library and list, in the CMakeLists.txt, the dependencies directly used by the .cpp file.

Such an approach has huge advantages. First, the code base can scale easily, with many utility classes and executables that link only what is required. For example, each .h/.cpp pair should ideally have a unit test, and this test probably needs to be linked against only a small fraction of the code base. Relying on CMake's transitive dependencies is a great way of selecting only what is required, and all that is required.

There is another advantage. Imagine a library L that depends on boost. Of course, all users of L must link against boost. Later in the life of the project, L is refactored to rely on C++11 features and no longer needs boost. Well, simply changing the dependency list of L removes boost from everything that no longer needs it. How convenient!
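Concretely, the boost example can be sketched like this (target names are made up for illustration; `${Boost_LIBRARIES}` would come from a `find_package(Boost ...)` call):

```
# L's dependency on boost is declared once, next to L itself.
add_library(L L.h L.cpp)
target_link_libraries(L ${Boost_LIBRARIES})

# Consumers name only L; CMake links boost transitively.
add_executable(consumerTest consumerTest.cpp)
target_link_libraries(consumerTest L)

# When L is refactored to drop boost, deleting the single
# target_link_libraries(L ${Boost_LIBRARIES}) line removes boost
# from every consumer automatically.
```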

Of course, this scheme blows up the number of libraries, but that is not a problem. Even Visual Studio can manage workspaces with hundreds of projects quite well.

CMakeLists.txt example:

```
cmake_minimum_required(VERSION 2.8)
add_library(logging logging.h logging.cpp)
cxx_test(loggingTest loggingTest.cpp logging gtest_main)
```

I came up with another solution. Since I'm using CMake anyway, I wrote CMake files that download the sources and compile gtest *at project compilation time*. This solution avoids code duplication and limits the system dependency to Subversion, which is rather common and easy to install on most development platforms.
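Such a compile-time download can be sketched with CMake's ExternalProject module (this is an illustration of the idea, not my actual files; the svn URL reflects where gtest was hosted at the time):

```
include(ExternalProject)

# Check out and build gtest when the project itself builds.
ExternalProject_Add(gtest_external
    SVN_REPOSITORY http://googletest.googlecode.com/svn/trunk
    PREFIX ${CMAKE_BINARY_DIR}/gtest
    INSTALL_COMMAND ""   # link gtest from its build tree instead of installing
)
```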

If you like the simplicity of this approach, have a look at it and let me know if it works for you. Pull requests on GitHub are welcome if something is wrong.

OpenCV offers an implementation in findHomography. It is unfortunately rather slow. A great optimization approach is to leverage SIMD instructions such as SSE or NEON. In theory, SIMD instructions allow the processor to test 4 homographies in a single pass. However, conditional jumps are forbidden, since the execution flow has to be common to the 4 tested homographies.

We need a jumpless way of computing a homography from 4 correspondences. Since I'm lazy, I asked Maple to generate the function for me. Maple had problems inverting the 8 by 8 matrix analytically to solve the general problem. It did manage to compute analytically a homography sending the unit square to 4 arbitrary points, though.

Here's the Maple code:

```
with(linalg);
eq := (x,y) -> <<x[1]| x[2] | 1 | 0| 0| 0| -y[1]*x[1]/1 | -y[1]*x[2]/1>, <0|0|0|x[1]|x[2]|1|- (y[2]/1)*x[1]|- (y[2]/1)*x[2]>>;
mat := < eq([0, 0],x) , eq([0,1],y), eq([1,1],z), eq([1,0],w)>;
inv := inverse(mat);
homo := simplify(evalm(inv &* <x[1],x[2],y[1],y[2],z[1],z[2],w[1],w[2]>));
homo_func := unapply([homo[1],homo[2],homo[3],homo[4],homo[5],homo[6],homo[7],homo[8]],x,y,z,w);
CodeGeneration[C](homo_func, optimize);
```

This little code produces a rather large function that does not use SIMD instructions at all: it uses `double`. To compute an arbitrary homography, I just call this function twice, invert one result, and multiply both matrices together. These functions do not need conditional jumps.
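The invert-and-multiply step can stay jumpless too: a 3 by 3 matrix can be inverted up to scale with its adjugate, which needs no pivoting or division. Since a homography is only defined up to scale, dividing by the determinant can be skipped entirely. A sketch in plain C++ (the function names `mat33_adjugate` and `mat33_mul` are mine, not from the generated code):

```
// Branch-free 3x3 inverse up to scale: the adjugate (transpose of the
// cofactor matrix). A * adj(A) = det(A) * I, and for homographies the
// overall scale det(A) is irrelevant -- no test, no jump.
static void mat33_adjugate(const double a[3][3], double r[3][3]) {
    r[0][0] = a[1][1]*a[2][2] - a[1][2]*a[2][1];
    r[0][1] = a[0][2]*a[2][1] - a[0][1]*a[2][2];
    r[0][2] = a[0][1]*a[1][2] - a[0][2]*a[1][1];
    r[1][0] = a[1][2]*a[2][0] - a[1][0]*a[2][2];
    r[1][1] = a[0][0]*a[2][2] - a[0][2]*a[2][0];
    r[1][2] = a[0][2]*a[1][0] - a[0][0]*a[1][2];
    r[2][0] = a[1][0]*a[2][1] - a[1][1]*a[2][0];
    r[2][1] = a[0][1]*a[2][0] - a[0][0]*a[2][1];
    r[2][2] = a[0][0]*a[1][1] - a[0][1]*a[1][0];
}

// Plain 3x3 matrix product, also branch-free.
static void mat33_mul(const double a[3][3], const double b[3][3],
                      double r[3][3]) {
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            r[i][j] = a[i][0]*b[0][j] + a[i][1]*b[1][j] + a[i][2]*b[2][j];
}
```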

Thanks to the power of C++ (operator overloading in particular), replacing `double` with an SSE type simply amounts to defining a class.

Compilers give access to SSE instructions through intrinsics, which do not look very friendly. The following example is rather hard to read:

```
// declares a variable containing "1, 2, 3, 4"
__m128 a = _mm_set_ps(4, 3, 2, 1);
// Adds "7" to all entries of a
a = _mm_add_ps(a, _mm_set1_ps(7));
```

This version is much easier to read:

```
fvec4 a(1, 2, 3, 4);
a += fvec4(7);
```

The fvec4 class can be implemented by declaring the proper constructors and operators. The only data member is a single __m128 field. It might look like:

```
struct fvec4 {
    __m128 data;
    fvec4(float a) { data = _mm_set_ps1(a); }
    // _mm_set_ps takes its arguments in reverse order,
    // so pass (d, c, b, a) to store "a, b, c, d".
    fvec4(float a, float b, float c, float d) { data = _mm_set_ps(d, c, b, a); }
    fvec4 &operator+=(const fvec4 &a) {
        data = _mm_add_ps(data, a.data);
        return *this;
    }
};
```

Once we have such a class (see my fvec4 class, used in polyora), we can simply replace `double` with `fvec4` in the function generated by Maple. The result is a function that computes 4 homographies in a single jumpless execution.
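The RANSAC loop below additionally relies on a lane-wise comparison that produces an all-ones/all-zeros mask, and on a bitwise AND to select values without branching. Here is a minimal sketch of those two operators, assuming an fvec4 like the one above (polyora's actual implementation may differ):

```
#include <xmmintrin.h>

// Minimal fvec4, as assumed above.
struct fvec4 {
    __m128 data;
    fvec4(__m128 d) : data(d) {}
    explicit fvec4(float a) { data = _mm_set_ps1(a); }
};

// Lane-wise "less than": each lane becomes all-ones (true) or all-zeros (false).
inline fvec4 operator<(const fvec4 &a, const fvec4 &b) {
    return fvec4(_mm_cmplt_ps(a.data, b.data));
}

// Bitwise AND: with an all-ones/all-zeros mask on the left, this keeps the
// right operand where the comparison held, and yields 0 elsewhere -- a
// branch-free select.
inline fvec4 operator&(const fvec4 &a, const fvec4 &b) {
    return fvec4(_mm_and_ps(a.data, b.data));
}
```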

We now have everything we need to write our RANSAC loop:

```
// compute 4 homographies at once, one per SSE lane
fvec4 H[3][3];
homography4_from_4corresp(pts1[0], pts1[1], pts1[2], pts1[3],
                          pts2[0], pts2[1], pts2[2], pts2[3],
                          H);

// evaluate support
fvec4 support(0);
fvec4 p[2], g[2];
for (int i = 0; i < n; i++) {
    // Fetch the next correspondence and broadcast it to all 4 lanes
    p[0] = fvec4( row(i, uv1, stride1)[0] );
    p[1] = fvec4( row(i, uv1, stride1)[1] );
    g[0] = fvec4( row(i, uv2, stride2)[0] );
    g[1] = fvec4( row(i, uv2, stride2)[1] );

    // Project the point with the 4 homographies
    fvec4 t[2];
    homography4_transform(p, H, t);

    // Compute the reprojection distances
    fvec4 d = dist2(t, g);

    // Accumulate support: the comparison yields an all-ones/all-zeros
    // mask per lane, and the & selects the weight without branching.
    support += (d < threshold) & fvec4(1 - .1 * (d / threshold));
}
```
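After the loop, `support` holds four scores, one per tested homography. A short scalar epilogue can then extract them and keep the best one; this helper (the name `best_of_4` is mine, not from the original code) runs once per RANSAC iteration, so its branches are harmless:

```
#include <xmmintrin.h>

// Pick the lane with the highest score among the 4 accumulated supports.
// Returns the winning lane index and writes its score to *best_score.
inline int best_of_4(__m128 support, float *best_score) {
    float s[4];
    _mm_storeu_ps(s, support);   // s[i] = support of homography i
    int best = 0;
    for (int i = 1; i < 4; ++i)
        if (s[i] > s[best]) best = i;
    *best_score = s[best];
    return best;
}
```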

And here we go! See the full implementation here.

Using SSE instructions to find homographies significantly speeds up the computation. My implementation is much faster than OpenCV's findHomography, which is a bit more accurate because it uses 64-bit doubles instead of my 32-bit floats.
