<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>opticode.ch blog &#187; SSE</title>
	<atom:link href="http://opticode.ch/blog/tag/sse/feed/" rel="self" type="application/rss+xml" />
	<link>http://opticode.ch/blog</link>
	<description>the fine Art of coding - Julien Pilet</description>
	<lastBuildDate>Mon, 02 Dec 2024 07:54:32 +0000</lastBuildDate>
	<language>en-US</language>
		<sy:updatePeriod>hourly</sy:updatePeriod>
		<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=3.9.40</generator>
	<item>
		<title>Optimizing RANSAC with SSE</title>
		<link>http://opticode.ch/blog/ransac-sse/</link>
		<comments>http://opticode.ch/blog/ransac-sse/#comments</comments>
		<pubDate>Fri, 03 Jan 2014 16:57:00 +0000</pubDate>
		<dc:creator><![CDATA[Julien Pilet]]></dc:creator>
				<category><![CDATA[optimization]]></category>
		<category><![CDATA[RANSAC]]></category>
		<category><![CDATA[SIMD]]></category>
		<category><![CDATA[SSE]]></category>

		<guid isPermaLink="false">http://opticode.ch/?p=8</guid>
		<description><![CDATA[When developing computer vision systems that need to find plane projections from point-to-point correspondences, a RANdom SAmple Consensus (RANSAC) implementation is necessary. The algorithm rejects wrong correspondences (outliers) and finds the geometric transformation, usually a homography, that explains the inliers. The algorithm randomly picks 4 correspondences, finds the corresponding homography, and counts how many correspondences it explains. [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>When developing computer vision systems that need to find plane projections from point-to-point correspondences, a RANdom SAmple Consensus (RANSAC) implementation is necessary. The algorithm rejects wrong correspondences (outliers) and finds the geometric transformation, usually a homography, that explains the inliers. The algorithm randomly picks 4 correspondences, finds the corresponding homography, and counts how many correspondences it explains. After a fixed number of iterations, the homography with the best support is chosen.</p>
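<p>As a point of reference, the "count how many correspondences it explains" step could look like the scalar sketch below. It is not taken from OpenCV or from the implementation discussed in this post: the types <code>Point2</code>, <code>Correspondence</code> and <code>Homography</code> and the function <code>count_inliers</code> are hypothetical names chosen for illustration.</p>

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Hypothetical types, for illustration only.
struct Point2 { double x, y; };
struct Correspondence { Point2 src, dst; };
using Homography = std::array<std::array<double, 3>, 3>;

// Applies a 3x3 homography to a 2D point (homogeneous division included).
inline Point2 apply_homography(const Homography &H, const Point2 &p) {
    const double w = H[2][0] * p.x + H[2][1] * p.y + H[2][2];
    return { (H[0][0] * p.x + H[0][1] * p.y + H[0][2]) / w,
             (H[1][0] * p.x + H[1][1] * p.y + H[1][2]) / w };
}

// Counts the correspondences whose reprojection error is below max_dist.
inline std::size_t count_inliers(const Homography &H,
                                 const std::vector<Correspondence> &matches,
                                 double max_dist) {
    std::size_t inliers = 0;
    for (const Correspondence &m : matches) {
        const Point2 q = apply_homography(H, m.src);
        const double dx = q.x - m.dst.x;
        const double dy = q.y - m.dst.y;
        // Compare squared distances to avoid a square root.
        if (dx * dx + dy * dy < max_dist * max_dist) ++inliers;
    }
    return inliers;
}
```

<p>Note the <code>if</code> inside the loop: this data-dependent branch is exactly what a SIMD version has to replace with something else.</p>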
<p>OpenCV offers an implementation in <a href="http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#findhomography">findHomography</a>. It is unfortunately rather slow. A great optimization approach is to leverage SIMD instructions such as SSE or NEON. In theory, SIMD instructions could allow the processor to test 4 homographies in a single pass. However, conditional jumps are forbidden since the execution flow has to be common for the 4 tested homographies.</p>
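<p>To make the "no conditional jumps" constraint concrete, here is a minimal sketch of how a per-lane <code>if</code>/<code>else</code> can be expressed with a comparison mask instead of a branch. The helper names <code>select_if_less</code> and <code>select_if_less_to_array</code> are hypothetical; only the <code>_mm_*</code> intrinsics are real SSE calls.</p>

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical helper, for illustration only.
// Per lane: result[i] = (x[i] < limit[i]) ? x[i] : fallback[i],
// computed without any branch.
inline __m128 select_if_less(__m128 x, __m128 limit, __m128 fallback) {
    // Lanes where x < limit get all bits set, the other lanes get all bits cleared.
    const __m128 mask = _mm_cmplt_ps(x, limit);
    // Keep x where the mask is set and fallback where it is cleared.
    return _mm_or_ps(_mm_and_ps(mask, x), _mm_andnot_ps(mask, fallback));
}

// Convenience wrapper working on plain arrays of 4 floats.
inline void select_if_less_to_array(const float x[4], const float limit[4],
                                    const float fallback[4], float out[4]) {
    const __m128 r = select_if_less(_mm_loadu_ps(x), _mm_loadu_ps(limit),
                                    _mm_loadu_ps(fallback));
    _mm_storeu_ps(out, r);
}
```

<p>All 4 lanes go through the same instructions whatever the comparison gives, which is what lets 4 hypotheses share one execution flow.</p>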
<h2>Computing homographies without conditional jumps</h2>
<p>We need a jumpless way of computing a homography from 4 correspondences. Since I&#8217;m lazy, I asked Maple to generate the function for me. Maple had trouble analytically inverting the 8 by 8 matrix of the general problem. It did manage, though, to analytically compute a homography that sends the unit square to 4 arbitrary points.</p>
<p>Here&#8217;s the Maple code:</p>
<pre><code>with(linalg);
eq := (x,y) -&gt; &lt;&lt;x[1]| x[2] | 1 | 0| 0| 0| -y[1]*x[1]/1 | -y[1]*x[2]/1&gt;, &lt;0|0|0|x[1]|x[2]|1|- (y[2]/1)*x[1]|- (y[2]/1)*x[2]&gt;&gt;;
mat := &lt; eq([0, 0],x) , eq([0,1],y), eq([1,1],z), eq([1,0],w)&gt;;
inv := inverse(mat);
homo := simplify(evalm(inv &amp;* &lt;x[1],x[2],y[1],y[2],z[1],z[2],w[1],w[2]&gt;));
homo_func := unapply([homo[1],homo[2],homo[3],homo[4],homo[5],homo[6],homo[7],homo[8]],x,y,z,w);
CodeGeneration[C](homo_func, optimize);
</code></pre>
<p>This short program produces a rather large function that does not use SIMD instructions at all: it uses <code>double</code>. To compute an arbitrary homography, I just call this function twice, invert one result, and multiply the two matrices together. These functions do not need conditional jumps.</p>
<p>Thanks to the power of C++ (operator overloading in particular), replacing &#8220;double&#8221; with an SSE type simply amounts to defining a class.</p>
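<p>The composition step can be sketched as follows, in plain scalar C++. This is an illustration under assumptions, not the generated code: <code>Mat3</code>, <code>adjugate</code>, <code>multiply</code> and <code>compose_homography</code> are hypothetical names, and the sketch uses the adjugate rather than a true inverse, which is enough here because a homography is only defined up to scale.</p>

```cpp
#include <array>

// Hypothetical 3x3 matrix type, for illustration only.
using Mat3 = std::array<std::array<double, 3>, 3>;

// Adjugate of a 3x3 matrix: equals det(m) * inverse(m).
// Only multiplications and subtractions, so no data-dependent branch.
inline Mat3 adjugate(const Mat3 &m) {
    Mat3 a;
    a[0][0] = m[1][1] * m[2][2] - m[1][2] * m[2][1];
    a[0][1] = m[0][2] * m[2][1] - m[0][1] * m[2][2];
    a[0][2] = m[0][1] * m[1][2] - m[0][2] * m[1][1];
    a[1][0] = m[1][2] * m[2][0] - m[1][0] * m[2][2];
    a[1][1] = m[0][0] * m[2][2] - m[0][2] * m[2][0];
    a[1][2] = m[0][2] * m[1][0] - m[0][0] * m[1][2];
    a[2][0] = m[1][0] * m[2][1] - m[1][1] * m[2][0];
    a[2][1] = m[0][1] * m[2][0] - m[0][0] * m[2][1];
    a[2][2] = m[0][0] * m[1][1] - m[0][1] * m[1][0];
    return a;
}

// Plain 3x3 matrix product; the loops have fixed trip counts.
inline Mat3 multiply(const Mat3 &a, const Mat3 &b) {
    Mat3 r{};
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            for (int k = 0; k < 3; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// square_to_src sends the unit square to the 4 source points,
// square_to_dst sends it to the 4 destination points.
// The result sends source points to destination points, up to scale.
inline Mat3 compose_homography(const Mat3 &square_to_src, const Mat3 &square_to_dst) {
    return multiply(square_to_dst, adjugate(square_to_src));
}
```

<p>Because every operation is an addition, subtraction or multiplication, the same code keeps working when <code>double</code> is swapped for a 4-lane type.</p>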
<h2>Replacing double with fvec4</h2>
<p>Compilers give access to SSE instructions through intrinsics that do not look very friendly. The following example is rather hard to read:</p>
<pre><code>// declares a variable containing "1, 2, 3, 4"
__m128 a = _mm_set_ps(4, 3, 2, 1);
// Adds "7" to all entries of a
a = _mm_add_ps(a, _mm_set1_ps(7));
</code></pre>
<p>This version is much easier to read:</p>
<pre><code>fvec4 a(1, 2, 3, 4);
a += fvec4(7);
</code></pre>
<p>The fvec4 class can be implemented by declaring proper constructors and operators. The only data member is a single __m128 field. It might look like:</p>
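<pre><code>#include &lt;xmmintrin.h&gt;  // SSE intrinsics

struct fvec4 {
    __m128 data;

    // Broadcasts a to all 4 lanes.
    fvec4(float a) { data = _mm_set1_ps(a); }
    // _mm_setr_ps takes its arguments in memory order, so lane 0 holds a.
    fvec4(float a, float b, float c, float d) { data = _mm_setr_ps(a, b, c, d); }

    // Lane-wise addition; returns a reference so that calls can be chained.
    fvec4 &amp;operator += (const fvec4 &amp;a) {
      data = _mm_add_ps(data, a.data);
      return *this;
    }
};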
</code></pre>
<p>Once we have such a class (<a href="https://github.com/jpilet/polyora/blob/master/polyora/fvec4.h#L39">see my fvec4 class</a>, used in <a href="https://github.com/jpilet/polyora">polyora</a>), we can simply replace &#8220;double&#8221; with &#8220;fvec4&#8221; in the function generated by Maple. The result is <a href="https://github.com/jpilet/polyora/blob/master/polyora/homography4.cpp#L43">a function that computes 4 homographies in a single jumpless execution</a>.</p>
<h2>Parallel RANSAC loop</h2>
<p>We now have everything we need to write our RANSAC loop:</p>
<pre><code>// compute 4 homographies at once, one per SIMD lane
fvec4 H[3][3];
homography4_from_4corresp(pts1[0], pts1[1], pts1[2], pts1[3],
                          pts2[0], pts2[1], pts2[2], pts2[3],
                          H);
// evaluate support
fvec4 support(0);
for (int i=0; i&lt;n; i++) {
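  // Fetch the next correspondence and broadcast it to the 4 lanes,
  // so that each lane tests it against a different homography
  fvec4 p[2], g[2];
  p[0] = fvec4( row(i,uv1,stride1)[0] );
  p[1] = fvec4( row(i,uv1,stride1)[1] );
  g[0] = fvec4( row(i,uv2,stride2)[0] );
  g[1] = fvec4( row(i,uv2,stride2)[1] );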

  // Projects the points with the homography
  fvec4 t[2];
  homography4_transform(p, H, t);

  // Compute reprojection distances
  fvec4 d = dist2(t,g);

  // Compute support.
  support += (d &lt; threshold) &amp; fvec4(1 - .1 * (d / threshold));
}
</code></pre>
<p>And here we go! See the full implementation <a href="https://github.com/jpilet/polyora/blob/master/polyora/homography4.cpp#L200">here</a>.</p>
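<p>The support line in the loop relies on a comparison that yields a bit mask rather than a boolean. The sketch below shows one way such operators could be written. It is a simplified stand-in and not the actual fvec4 class: <code>vec4f</code>, <code>support_contribution</code> and <code>store</code> are hypothetical names.</p>

```cpp
#include <xmmintrin.h>  // SSE intrinsics

// Hypothetical stand-in for a 4-float vector class, for illustration only.
struct vec4f {
    __m128 data;
    explicit vec4f(__m128 v) : data(v) {}
    explicit vec4f(float a) : data(_mm_set1_ps(a)) {}
    vec4f(float a, float b, float c, float d) : data(_mm_setr_ps(a, b, c, d)) {}
};

inline vec4f operator-(const vec4f &a, const vec4f &b) { return vec4f(_mm_sub_ps(a.data, b.data)); }
inline vec4f operator*(const vec4f &a, const vec4f &b) { return vec4f(_mm_mul_ps(a.data, b.data)); }
inline vec4f operator/(const vec4f &a, const vec4f &b) { return vec4f(_mm_div_ps(a.data, b.data)); }

// Comparison: each lane becomes all ones if a < b, all zeros otherwise.
inline vec4f operator<(const vec4f &a, const vec4f &b) { return vec4f(_mm_cmplt_ps(a.data, b.data)); }
// Bitwise AND: with a comparison mask on one side, it zeroes the rejected lanes.
inline vec4f operator&(const vec4f &a, const vec4f &b) { return vec4f(_mm_and_ps(a.data, b.data)); }

// Per lane: 1 - 0.1 * d / threshold if d < threshold, 0 otherwise.
inline vec4f support_contribution(const vec4f &d, const vec4f &threshold) {
    const vec4f score = vec4f(1.0f) - vec4f(0.1f) * (d / threshold);
    return (d < threshold) & score;
}

// Copies the 4 lanes to a plain array.
inline void store(const vec4f &v, float out[4]) { _mm_storeu_ps(out, v.data); }
```

<p>With this scoring, an inlier adds between 0.9 and 1 to the support of its lane, so closer matches count slightly more, while an outlier adds exactly 0.</p>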
<h2>Results</h2>
<p>Using SSE instructions to find homographies significantly speeds up computation. My implementation is much faster than <a href="http://docs.opencv.org/modules/calib3d/doc/camera_calibration_and_3d_reconstruction.html#findhomography">OpenCV&#8217;s findHomography</a> function, which in turn is a bit more accurate because it uses 64-bit doubles instead of 32-bit floats.</p>
]]></content:encoded>
			<wfw:commentRss>http://opticode.ch/blog/ransac-sse/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

