Jekyll2023-06-09T22:24:09+00:00https://ktkaufman03.github.io/feed.xmlKai Kaufman’s tech blogThe ramblings of a cybersecurity student and software (reverse) engineer.Kai Kaufman[email protected]Bringing runtime checks to compile time in Rust2023-04-20T02:30:00+00:002023-04-20T02:30:00+00:00https://ktkaufman03.github.io/blog/2023/04/20/rust-compile-time-checks<h2 id="introduction">Introduction</h2> <p>For the past couple of months, I’ve been participating in the <a href="https://ectf.mitre.org">MITRE Embedded Capture the Flag</a>, or <strong>eCTF</strong> for short, with a team of my <a href="https://wpi.edu">university</a> peers. As the name suggests, the eCTF involves writing secure firmware for a <a href="https://www.ti.com/product/TM4C123GH6PM">Tiva C microcontroller</a>, which presents some interesting challenges. Many teams chose to use C, likely because that’s what’s taught in most (if not all) university-level embedded systems courses. We wanted to live less dangerously, however, and we opted to use <a href="https://rust-lang.org">Rust</a> instead.</p> <p>Using Rust was more or less a no-brainer for us, because of the language’s strong memory-safety guarantees, excellent library ecosystem and developer experience, and… all the other good things about Rust. This isn’t an article about why Rust is the best thing ever - instead, we’re going to look at some <strong>real examples</strong> (from my team’s eCTF work) to see how Rust can be (and was) <em>leveraged</em> to enhance confidence in the correctness of code!</p> <h2 id="quick-primer">Quick primer</h2> <p>My implementations of compile-time checks relied on multiple Rust features, some more obscure than others:</p> <ul> <li><a href="https://doc.rust-lang.org/reference/items/associated-items.html#associated-constants">Associated constants</a> allow us to add constant members to types, either directly or indirectly through a <a href="https://doc.rust-lang.org/book/ch10-02-traits.html">trait</a>. 
Think of these as the equivalent to (for example) <code class="language-plaintext highlighter-rouge">static final</code> class members in Java… but better, because they exist at compile time!</li> <li><a href="https://doc.rust-lang.org/reference/items/generics.html#const-generics">Const generics</a> allow us to pass values, rather than types, as generic parameters. (Before const generics were implemented and stabilized, you couldn’t represent things such as “an array of any size” natively - only by using a library like <a href="https://docs.rs/generic-array/latest/generic_array/">generic_array</a>.)</li> <li><a href="https://rust-lang.github.io/rfcs/2345-const-panic.html">Const panics</a> allow us to abort compilation if necessary, by panicking during constant evaluation. We even get to choose the error message!</li> </ul> <p>With these three tools, a lot can be done.</p> <p>Other terms to know:</p> <ul> <li>A <em>reference</em> is basically a pointer without the footguns. The <a href="https://doc.rust-lang.org/book/ch04-02-references-and-borrowing.html">Rust book</a> goes into more detail. A reference to a type <code class="language-plaintext highlighter-rouge">T</code> is represented as <code class="language-plaintext highlighter-rouge">&amp;T</code>.</li> <li>An <em>array</em> is a fixed-length collection of items of a single type, usually represented as <code class="language-plaintext highlighter-rouge">[T; N]</code> (where <code class="language-plaintext highlighter-rouge">T</code> is the item type, and <code class="language-plaintext highlighter-rouge">N</code> is the length.) 
Arrays <strong>cannot be resized.</strong></li> <li>An <em>array reference</em> is simply a reference to an array, represented as <code class="language-plaintext highlighter-rouge">&amp;[T; N]</code>.</li> <li>A <em>slice</em> (usually represented as <code class="language-plaintext highlighter-rouge">[T]</code>, with <code class="language-plaintext highlighter-rouge">T</code> still being the item type) is a subsection of a larger collection, such as an array. Slices cannot be passed or stored directly - you can only work with <em>references</em> to them.</li> <li>Something being <em>infallible</em> means that it cannot fail. If it does, that’s a bug.</li> </ul> <h2 id="example-1-slicing-and-indexing-arraysarray-references">Example 1: Slicing and indexing arrays/array references</h2> <h3 id="shameless-plug">Shameless plug</h3> <p>I’ve polished and implemented the system described here as part of a <a href="https://github.com/ktkaufman03/static-slicing">library</a> that I released recently. If you like what you see here, maybe consider checking out the more refined version. I cannot overstate how much time it saved me!</p> <h3 id="background">Background</h3> <p>My team’s firmware made extensive use of slicing and indexing in order to (de)serialize data, among many, many other things. 
Ironically, these two simple tasks are where things start to get complicated.</p> <h3 id="the-first-problem-slicing">The first problem: slicing</h3> <p>Consider the following Rust program (which is valid and will compile):</p> <div class="language-rs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span> <span class="c">// yes, I know I don't need to specify the types explicitly.</span> <span class="c">// there's a point to this :)</span> <span class="k">let</span> <span class="n">x</span><span class="p">:</span> <span class="p">[</span><span class="nb">i32</span><span class="p">;</span> <span class="mi">6</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">];</span> <span class="k">let</span> <span class="n">y</span><span class="p">:</span> <span class="nb">i32</span> <span class="o">=</span> <span class="n">x</span><span class="p">[</span><span class="mi">0</span><span class="p">];</span> <span class="k">let</span> <span class="n">z</span> <span class="cm">/*: ??? */</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">x</span><span class="p">[</span><span class="mi">4</span><span class="o">..</span><span class="mi">6</span><span class="p">];</span> <span class="p">}</span> </code></pre></div></div> <p>We haven’t specified the type for <code class="language-plaintext highlighter-rouge">z</code>, so it’s up to the compiler to figure it out. 
What I (and, I suspect, many new users) would <em>expect</em> is for a process like this to occur:</p> <ol> <li><code class="language-plaintext highlighter-rouge">x</code> (which we are slicing) is a <code class="language-plaintext highlighter-rouge">[i32; 6]</code>.</li> <li>The slice index is <code class="language-plaintext highlighter-rouge">4..6</code>; in interval notation, that’s <code class="language-plaintext highlighter-rouge">[4, 6)</code>, and in “list of numbers” that’s 4 and 5. So, we want <em>2 items starting from index 4.</em></li> <li>The result of selecting 2 items should be, well, 2 items, or <code class="language-plaintext highlighter-rouge">[i32; 2]</code> in this case.</li> <li>Since we’re only taking a <em>reference</em> instead of “moving” the actual values, <code class="language-plaintext highlighter-rouge">z</code> should be <code class="language-plaintext highlighter-rouge">&amp;[i32; 2]</code>.</li> </ol> <p>Unfortunately, all we get is a <code class="language-plaintext highlighter-rouge">&amp;[i32]</code> - a reference to a <em>slice.</em> If we try to explicitly specify the type as <code class="language-plaintext highlighter-rouge">&amp;[i32; 2]</code> anyway, we get this error:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>error[E0308]: mismatched types --&gt; src/bin/blogtest.rs:6:24 | 6 | let z: &amp;[i32; 2] = &amp;x[4..6]; | --------- ^^^^^^^^ expected `&amp;[i32; 2]`, found `&amp;[i32]` | | | expected due to this | = note: expected reference `&amp;[i32; 2]` found reference `&amp;[i32]` </code></pre></div></div> <p>The logical next step, then, is to try to <em>convert</em> this <code class="language-plaintext highlighter-rouge">&amp;[i32]</code> into a <code class="language-plaintext highlighter-rouge">&amp;[i32; 2]</code>. 
For infallible conversions, we can use the <code class="language-plaintext highlighter-rouge">into</code> method of the <a href="https://doc.rust-lang.org/std/convert/trait.Into.html"><code class="language-plaintext highlighter-rouge">Into</code></a> trait - since we <em>know</em> that the slice is 2 elements long, maybe <code class="language-plaintext highlighter-rouge">into</code> will work?</p> <div class="language-rs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// let z: &amp;[i32; 2] = &amp;x[4..6];</span> <span class="k">let</span> <span class="n">z</span><span class="p">:</span> <span class="o">&amp;</span><span class="p">[</span><span class="nb">i32</span><span class="p">;</span> <span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">x</span><span class="p">[</span><span class="mi">4</span><span class="o">..</span><span class="mi">6</span><span class="p">]</span><span class="nf">.into</span><span class="p">();</span> </code></pre></div></div> <p>Nope:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>error[E0277]: the trait bound `[i32; 2]: From&lt;&amp;[i32]&gt;` is not satisfied --&gt; src/bin/blogtest.rs:6:33 | 6 | let z: &amp;[i32; 2] = &amp;x[4..6].into(); | ^^^^ the trait `From&lt;&amp;[i32]&gt;` is not implemented for `[i32; 2]` | = help: the following other types implement trait `From&lt;T&gt;`: &lt;[T; LANES] as From&lt;Simd&lt;T, LANES&gt;&gt;&gt; &lt;[bool; LANES] as From&lt;Mask&lt;T, LANES&gt;&gt;&gt; = note: required for `&amp;[i32]` to implement `Into&lt;[i32; 2]&gt;` </code></pre></div></div> <p>I suppose this makes sense - although we know that the conversion would be infallible <em>in this case</em>, we can’t say the same for the <em>general case</em>, and <code class="language-plaintext highlighter-rouge">Into</code> has a strict general-case infallibility requirement:</p> 
<blockquote> <p>Note: This trait must not fail. If the conversion can fail, use TryInto. <em>(source: <a href="https://doc.rust-lang.org/std/convert/trait.Into.html"><code class="language-plaintext highlighter-rouge">Into</code> documentation</a>)</em></p> </blockquote> <p>If we take the hint and try the rather cumbersome <code class="language-plaintext highlighter-rouge">.try_into().unwrap()</code>, it works:</p> <div class="language-rs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// let z: &amp;[i32; 2] = &amp;x[4..6];</span> <span class="k">let</span> <span class="n">z</span><span class="p">:</span> <span class="o">&amp;</span><span class="p">[</span><span class="nb">i32</span><span class="p">;</span> <span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="o">&amp;</span><span class="n">x</span><span class="p">[</span><span class="mi">4</span><span class="o">..</span><span class="mi">6</span><span class="p">]</span><span class="nf">.try_into</span><span class="p">()</span><span class="nf">.unwrap</span><span class="p">();</span> </code></pre></div></div> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> Running `target/debug/blogtest` y = 1 z = [5, 6] </code></pre></div></div> <p>This <em>works</em>, but there are two issues:</p> <ul> <li>It looks weird. If I know something can’t fail, why am I “trying” to do it?</li> <li>It’s really easy to make a typo and end up crashing at runtime. 
(If you get either the length or the index range wrong, you get something called a <code class="language-plaintext highlighter-rouge">TryFromSliceError</code> with no extra information.)</li> </ul> <h3 id="the-second-problem-indexing">The second problem: indexing</h3> <p>Consider the following Rust program:</p> <div class="language-rs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span> <span class="k">let</span> <span class="n">x</span><span class="p">:</span> <span class="p">[</span><span class="nb">i32</span><span class="p">;</span> <span class="mi">6</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">];</span> <span class="k">let</span> <span class="n">y</span> <span class="o">=</span> <span class="n">x</span><span class="p">[</span><span class="mi">8</span><span class="p">];</span> <span class="p">}</span> </code></pre></div></div> <p>Interestingly, this <em>doesn’t</em> compile - we get an error about an “unconditional panic.”</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>error: this operation will panic at runtime --&gt; src/bin/blogtest.rs:3:13 | 3 | let y = x[8]; | ^^^^ index out of bounds: the length is 6 but the index is 8 | = note: `#[deny(unconditional_panic)]` on by default </code></pre></div></div> <p>I’m not entirely sure where this error comes from. 
A slightly different program will <em>compile</em>, but panic at runtime with the same exact “index out of bounds” message:</p> <div class="language-rs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">ops</span><span class="p">::</span><span class="nb">Index</span><span class="p">;</span> <span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span> <span class="k">let</span> <span class="n">x</span><span class="p">:</span> <span class="p">[</span><span class="nb">i32</span><span class="p">;</span> <span class="mi">6</span><span class="p">]</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">];</span> <span class="k">let</span> <span class="n">y</span> <span class="o">=</span> <span class="n">x</span><span class="nf">.index</span><span class="p">(</span><span class="mi">8</span><span class="p">);</span> <span class="p">}</span> </code></pre></div></div> <p>So far so good, as long as we stick to the normal <code class="language-plaintext highlighter-rouge">[indexing]</code> syntax. 
Now, what happens if we change <code class="language-plaintext highlighter-rouge">x</code> from an <em>array</em> to an <em>array reference?</em></p> <div class="language-rs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span> <span class="k">let</span> <span class="n">x</span><span class="p">:</span> <span class="o">&amp;</span><span class="p">[</span><span class="nb">i32</span><span class="p">;</span> <span class="mi">6</span><span class="p">]</span> <span class="o">=</span> <span class="o">&amp;</span><span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">5</span><span class="p">,</span> <span class="mi">6</span><span class="p">];</span> <span class="k">let</span> <span class="n">y</span> <span class="o">=</span> <span class="n">x</span><span class="p">[</span><span class="mi">8</span><span class="p">];</span> <span class="p">}</span> </code></pre></div></div> <p>This compiles just fine, and panics at runtime (same message as before.) How unfortunate - it’s not like we’re missing any information here, since we know the length of the array behind <code class="language-plaintext highlighter-rouge">x</code>, and that tells us that index 8 doesn’t exist!</p> <h3 id="sketching-out-a-solution">Sketching out a solution</h3> <p>Thankfully, there’s a solution to <strong>both issues</strong>: use our own index types! Rust allows us to do this by implementing the <a href="https://doc.rust-lang.org/std/ops/trait.Index.html"><code class="language-plaintext highlighter-rouge">Index&lt;Idx&gt;</code></a> trait, with <code class="language-plaintext highlighter-rouge">Idx</code> being our custom index type. 
If we define our own type, called (for example) <code class="language-plaintext highlighter-rouge">CustomIndex</code>, we can do something like this:</p> <div class="language-rs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">ops</span><span class="p">::</span><span class="nb">Index</span><span class="p">;</span> <span class="k">struct</span> <span class="n">CustomIndex</span><span class="p">;</span> <span class="k">impl</span> <span class="nb">Index</span><span class="o">&lt;</span><span class="n">CustomIndex</span><span class="o">&gt;</span> <span class="k">for</span> <span class="p">[</span><span class="nb">i32</span><span class="p">;</span> <span class="mi">4</span><span class="p">]</span> <span class="p">{</span> <span class="k">type</span> <span class="n">Output</span> <span class="o">=</span> <span class="nb">i32</span><span class="p">;</span> <span class="k">fn</span> <span class="nf">index</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="mi">_</span><span class="p">:</span> <span class="n">CustomIndex</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="o">&amp;</span><span class="nn">Self</span><span class="p">::</span><span class="n">Output</span> <span class="p">{</span> <span class="nd">todo!</span><span class="p">()</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div></div> <p>Obviously this isn’t very useful - we’ve only implemented it for 4-element arrays of signed 32-bit integers, and indexing will always fail - but it’s a start.</p> <p>There are a few constraints we should keep in mind:</p> <ol> <li>There should not be any extra runtime cost when using our custom indexes.</li> <li>Any checks that <em>can</em> be done at compile time <em>should</em> be done at compile time.</li> <li>This all needs to be safe 
<strong>and</strong> sound.</li> </ol> <p>Thankfully, all of these can be satisfied!</p> <h3 id="design-constraint-1-aim-for-zero-cost">Design constraint #1: Aim for zero cost</h3> <p>As it turns out, an easy way to avoid <em>storage</em> costs is to make sure no extra data is being carried around. If we make use of <a href="https://doc.rust-lang.org/nomicon/exotic-sizes.html#zero-sized-types-zsts">zero-sized types</a> (<strong>ZSTs</strong> for short), we can leave no trace of our custom indexing system!</p> <p>For example, consider the following program:</p> <div class="language-rs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">MyZST</span><span class="o">&lt;</span><span class="k">const</span> <span class="n">A</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="k">const</span> <span class="n">B</span><span class="p">:</span> <span class="nb">usize</span><span class="o">&gt;</span><span class="p">;</span> <span class="k">impl</span><span class="o">&lt;</span><span class="k">const</span> <span class="n">A</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="k">const</span> <span class="n">B</span><span class="p">:</span> <span class="nb">usize</span><span class="o">&gt;</span> <span class="n">MyZST</span><span class="o">&lt;</span><span class="n">A</span><span class="p">,</span> <span class="n">B</span><span class="o">&gt;</span> <span class="p">{</span> <span class="k">pub</span> <span class="k">fn</span> <span class="nf">sum</span><span class="p">(</span><span class="k">self</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="nb">usize</span> <span class="p">{</span> <span class="n">A</span> <span class="o">+</span> <span class="n">B</span> <span class="p">}</span> <span class="p">}</span> <span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span 
class="p">{</span> <span class="k">let</span> <span class="n">x</span> <span class="o">=</span> <span class="nn">MyZST</span><span class="p">::</span><span class="o">&lt;</span><span class="mi">67</span><span class="p">,</span> <span class="mi">96</span><span class="o">&gt;</span><span class="nf">.sum</span><span class="p">();</span> <span class="nd">println!</span><span class="p">(</span><span class="s">"x = {}"</span><span class="p">,</span> <span class="n">x</span><span class="p">);</span> <span class="p">}</span> </code></pre></div></div> <p>If we examine the optimized assembly using <a href="https://crates.io/crates/cargo-show-asm"><code class="language-plaintext highlighter-rouge">cargo-show-asm</code></a>, we can see that the <code class="language-plaintext highlighter-rouge">let x = ...</code> line was translated into a single <code class="language-plaintext highlighter-rouge">mov</code> instruction:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cargo asm --bin blogtest 0 --rust Finished release [optimized] target(s) in 0.01s .section .text.blogtest::main,"ax",@progbits .p2align 4, 0x90 .type blogtest::main,@function blogtest::main: ... // blogtest.rs : 5 A + B mov qword ptr [rsp], 163 &lt;----- this is 67+96! ... </code></pre></div></div> <p>There is no evidence of <code class="language-plaintext highlighter-rouge">MyZST</code>’s existence, which is encouraging! 
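</p> <p>The “no storage cost” claim can also be checked without reading assembly, by asking the compiler for the type’s size. The following is a minimal sketch that reuses the <code class="language-plaintext highlighter-rouge">MyZST</code> definition from above; the constant assertion and the printed message are illustrative additions, not something taken from the firmware:</p>
<div class="language-rs highlighter-rouge"><div class="highlight"><pre class="highlight"><code>use std::mem::size_of;

// Same shape as the MyZST type shown earlier: a unit struct whose only
// "contents" are its const generic parameters.
struct MyZST&lt;const A: usize, const B: usize&gt;;

impl&lt;const A: usize, const B: usize&gt; MyZST&lt;A, B&gt; {
    pub fn sum(self) -&gt; usize {
        A + B
    }
}

// Evaluated during compilation: the build fails if the type ever
// starts occupying space.
const _: () = assert!(size_of::&lt;MyZST&lt;67, 96&gt;&gt;() == 0);

fn main() {
    let total = MyZST::&lt;67, 96&gt;.sum();
    // Prints "size = 0, sum = 163".
    println!("size = {}, sum = {}", size_of::&lt;MyZST&lt;67, 96&gt;&gt;(), total);
}
</code></pre></div></div>
<p>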
Returning to our <code class="language-plaintext highlighter-rouge">CustomIndex</code> example, we might modify it like so:</p> <div class="language-rs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">ops</span><span class="p">::</span><span class="nb">Index</span><span class="p">;</span> <span class="k">struct</span> <span class="n">CustomIndex</span><span class="o">&lt;</span><span class="k">const</span> <span class="n">I</span><span class="p">:</span> <span class="nb">usize</span><span class="o">&gt;</span><span class="p">;</span> <span class="k">impl</span><span class="o">&lt;</span><span class="k">const</span> <span class="n">I</span><span class="p">:</span> <span class="nb">usize</span><span class="o">&gt;</span> <span class="nb">Index</span><span class="o">&lt;</span><span class="n">CustomIndex</span><span class="o">&lt;</span><span class="n">I</span><span class="o">&gt;&gt;</span> <span class="k">for</span> <span class="p">[</span><span class="nb">i32</span><span class="p">;</span> <span class="mi">4</span><span class="p">]</span> <span class="p">{</span> <span class="k">type</span> <span class="n">Output</span> <span class="o">=</span> <span class="nb">i32</span><span class="p">;</span> <span class="k">fn</span> <span class="nf">index</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="mi">_</span><span class="p">:</span> <span class="n">CustomIndex</span><span class="o">&lt;</span><span class="n">I</span><span class="o">&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="o">&amp;</span><span class="nn">Self</span><span class="p">::</span><span class="n">Output</span> <span class="p">{</span> <span class="c">// Delegating to Index&lt;usize&gt;</span> <span class="k">self</span><span class="nf">.index</span><span class="p">(</span><span 
class="n">I</span><span class="p">)</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div></div> <p>Of course, this still isn’t ideal - we’ve only implemented indexing for arrays of 4 32-bit signed integers. We can fix that, too:</p> <div class="language-rs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">ops</span><span class="p">::</span><span class="nb">Index</span><span class="p">;</span> <span class="k">struct</span> <span class="n">CustomIndex</span><span class="o">&lt;</span><span class="k">const</span> <span class="n">I</span><span class="p">:</span> <span class="nb">usize</span><span class="o">&gt;</span><span class="p">;</span> <span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="k">const</span> <span class="n">N</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="k">const</span> <span class="n">I</span><span class="p">:</span> <span class="nb">usize</span><span class="o">&gt;</span> <span class="nb">Index</span><span class="o">&lt;</span><span class="n">CustomIndex</span><span class="o">&lt;</span><span class="n">I</span><span class="o">&gt;&gt;</span> <span class="k">for</span> <span class="p">[</span><span class="n">T</span><span class="p">;</span> <span class="n">N</span><span class="p">]</span> <span class="p">{</span> <span class="k">type</span> <span class="n">Output</span> <span class="o">=</span> <span class="n">T</span><span class="p">;</span> <span class="k">fn</span> <span class="nf">index</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="mi">_</span><span class="p">:</span> <span class="n">CustomIndex</span><span class="o">&lt;</span><span class="n">I</span><span class="o">&gt;</span><span class="p">)</span> <span 
class="k">-&gt;</span> <span class="o">&amp;</span><span class="nn">Self</span><span class="p">::</span><span class="n">Output</span> <span class="p">{</span> <span class="c">// Delegating to Index&lt;usize&gt;</span> <span class="k">self</span><span class="nf">.index</span><span class="p">(</span><span class="n">I</span><span class="p">)</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div></div> <p>Now we can use it on arrays of any size that contain items of any type:</p> <div class="language-rs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span> <span class="k">let</span> <span class="n">x</span> <span class="o">=</span> <span class="p">[</span><span class="mi">1u8</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span class="mi">32</span><span class="p">];</span> <span class="k">let</span> <span class="n">y</span> <span class="o">=</span> <span class="n">x</span><span class="p">[</span><span class="nn">CustomIndex</span><span class="p">::</span><span class="o">&lt;</span><span class="mi">5</span><span class="o">&gt;</span><span class="p">];</span> <span class="nd">println!</span><span class="p">(</span><span class="s">"y = {}"</span><span class="p">,</span> <span class="n">y</span><span class="p">);</span> <span class="p">}</span> </code></pre></div></div> <p>And if we check the generated assembly…</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>... // blogtest.rs : 22 let y = x[CustomIndex::&lt;5&gt;]; mov byte ptr [rsp + 7], 32 </code></pre></div></div> <p>Perfect! 
We’re not getting in the way of any optimizations, so we can move on to the next constraint.</p> <h3 id="design-constraint-2-prefer-compile-time-checks">Design constraint #2: Prefer compile-time checks</h3> <p>In order for this to be useful, we should do bounds checking at <em>compile-time</em> rather than <em>runtime.</em> Delegating to a lower-level implementation of <code class="language-plaintext highlighter-rouge">Index</code> - namely, <code class="language-plaintext highlighter-rouge">Index&lt;usize&gt;</code> - doesn’t help us one bit - we’ll still panic at runtime if we try to perform an out-of-bounds access. A potential compile-time-checked version could look like this:</p> <div class="language-rs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">use</span> <span class="nn">std</span><span class="p">::</span><span class="nn">ops</span><span class="p">::</span><span class="nb">Index</span><span class="p">;</span> <span class="k">struct</span> <span class="n">CustomIndex</span><span class="o">&lt;</span><span class="k">const</span> <span class="n">I</span><span class="p">:</span> <span class="nb">usize</span><span class="o">&gt;</span><span class="p">;</span> <span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="k">const</span> <span class="n">N</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="k">const</span> <span class="n">I</span><span class="p">:</span> <span class="nb">usize</span><span class="o">&gt;</span> <span class="nb">Index</span><span class="o">&lt;</span><span class="n">CustomIndex</span><span class="o">&lt;</span><span class="n">I</span><span class="o">&gt;&gt;</span> <span class="k">for</span> <span class="p">[</span><span class="n">T</span><span class="p">;</span> <span class="n">N</span><span class="p">]</span> <span class="p">{</span> <span class="k">type</span> <span class="n">Output</span> 
<span class="o">=</span> <span class="n">T</span><span class="p">;</span> <span class="k">fn</span> <span class="nf">index</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="mi">_</span><span class="p">:</span> <span class="n">CustomIndex</span><span class="o">&lt;</span><span class="n">I</span><span class="o">&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="o">&amp;</span><span class="nn">Self</span><span class="p">::</span><span class="n">Output</span> <span class="p">{</span> <span class="k">const</span> <span class="n">RESULT</span><span class="p">:</span> <span class="p">()</span> <span class="o">=</span> <span class="k">assert</span><span class="o">!</span><span class="p">(</span><span class="n">N</span> <span class="o">&gt;</span> <span class="n">I</span><span class="p">,</span> <span class="s">"Index is out of bounds!"</span><span class="p">);</span> <span class="k">unsafe</span> <span class="p">{</span> <span class="o">&amp;*</span><span class="p">(</span><span class="k">self</span><span class="nf">.as_ptr</span><span class="p">()</span><span class="nf">.add</span><span class="p">(</span><span class="n">I</span><span class="p">)</span> <span class="k">as</span> <span class="o">*</span><span class="k">const</span> <span class="n">T</span><span class="p">)</span> <span class="p">}</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div></div> <p>Unfortunately, this doesn’t compile. The compiler doesn’t like that we’re using <code class="language-plaintext highlighter-rouge">N</code> and <code class="language-plaintext highlighter-rouge">I</code>, which are both considered generic parameters from an “outer function”, inside the definition of the <code class="language-plaintext highlighter-rouge">RESULT</code> constant. Oh well.</p> <p>This is where type system hacking comes into play. 
Instead of doing the check within the <code class="language-plaintext highlighter-rouge">index</code> method, we can delegate it to something else… something that <em>can</em> make use of the generic parameters. For this, we introduce the concept of checker traits: traits whose only job is to carry an associated constant that performs the check.</p> <p>Since we know that we can panic in a <code class="language-plaintext highlighter-rouge">const</code> context, and we know that traits can have associated <code class="language-plaintext highlighter-rouge">const</code> members, <em>and</em> we know that traits support both type and const generics, we can create an <code class="language-plaintext highlighter-rouge">IsValidIndex</code> trait that <code class="language-plaintext highlighter-rouge">CustomIndex</code> will implement, like so:</p> <div class="language-rs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c">// Target is the collection type, e.g., [u32; 17].</span>
<span class="k">trait</span> <span class="n">IsValidIndex</span><span class="o">&lt;</span><span class="n">Target</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">const</span> <span class="n">RESULT</span><span class="p">:</span> <span class="p">();</span>
<span class="p">}</span>

<span class="k">struct</span> <span class="n">CustomIndex</span><span class="o">&lt;</span><span class="k">const</span> <span class="n">I</span><span class="p">:</span> <span class="nb">usize</span><span class="o">&gt;</span><span class="p">;</span>

<span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="k">const</span> <span class="n">I</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="k">const</span> <span class="n">N</span><span class="p">:</span> <span class="nb">usize</span><span class="o">&gt;</span> <span class="n">IsValidIndex</span><span class="o">&lt;</span><span class="p">[</span><span class="n">T</span><span 
class="p">;</span> <span class="n">N</span><span class="p">]</span><span class="o">&gt;</span> <span class="k">for</span> <span class="n">CustomIndex</span><span class="o">&lt;</span><span class="n">I</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">const</span> <span class="n">RESULT</span><span class="p">:</span> <span class="p">()</span> <span class="o">=</span> <span class="k">assert</span><span class="o">!</span><span class="p">(</span><span class="n">N</span> <span class="o">&gt;</span> <span class="n">I</span><span class="p">,</span> <span class="s">"Index is out of bounds!"</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div> <p>This compiles without any trouble, and now all we have to do is integrate it into the <code class="language-plaintext highlighter-rouge">index</code> method…</p> <div class="language-rs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="k">const</span> <span class="n">N</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="k">const</span> <span class="n">I</span><span class="p">:</span> <span class="nb">usize</span><span class="o">&gt;</span> <span class="nb">Index</span><span class="o">&lt;</span><span class="n">CustomIndex</span><span class="o">&lt;</span><span class="n">I</span><span class="o">&gt;&gt;</span> <span class="k">for</span> <span class="p">[</span><span class="n">T</span><span class="p">;</span> <span class="n">N</span><span class="p">]</span> <span class="p">{</span>
    <span class="k">type</span> <span class="n">Output</span> <span class="o">=</span> <span class="n">T</span><span class="p">;</span>

    <span class="k">fn</span> <span class="nf">index</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="mi">_</span><span class="p">:</span> <span class="n">CustomIndex</span><span class="o">&lt;</span><span class="n">I</span><span class="o">&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="o">&amp;</span><span class="nn">Self</span><span class="p">::</span><span class="n">Output</span> <span class="p">{</span>
        <span class="k">let</span> <span class="mi">_</span> <span class="o">=</span> <span class="o">&lt;</span><span class="n">CustomIndex</span><span class="o">&lt;</span><span class="n">I</span><span class="o">&gt;</span> <span class="k">as</span> <span class="n">IsValidIndex</span><span class="o">&lt;</span><span class="n">Self</span><span class="o">&gt;&gt;</span><span class="p">::</span><span class="n">RESULT</span><span class="p">;</span>
        <span class="k">unsafe</span> <span class="p">{</span> <span class="o">&amp;*</span><span class="p">(</span><span class="k">self</span><span class="nf">.as_ptr</span><span class="p">()</span><span class="nf">.add</span><span class="p">(</span><span class="n">I</span><span class="p">)</span> <span class="k">as</span> <span class="o">*</span><span class="k">const</span> <span class="n">T</span><span class="p">)</span> <span class="p">}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div> <p>Now we’ll try to compile this program:</p> <div class="language-rs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="c">// notice that `x` is an array reference now</span>
    <span class="k">let</span> <span class="n">x</span> <span class="o">=</span> <span class="o">&amp;</span><span class="p">[</span><span class="mi">1u8</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">8</span><span class="p">,</span> <span class="mi">16</span><span class="p">,</span> <span 
class="mi">32</span><span class="p">];</span>
    <span class="k">let</span> <span class="n">y</span> <span class="o">=</span> <span class="n">x</span><span class="p">[</span><span class="nn">CustomIndex</span><span class="p">::</span><span class="o">&lt;</span><span class="mi">7</span><span class="o">&gt;</span><span class="p">];</span>
    <span class="nd">println!</span><span class="p">(</span><span class="s">"y = {}"</span><span class="p">,</span> <span class="n">y</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div> <p>We get a compilation error! Our check worked.</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>error[E0080]: evaluation of `&lt;CustomIndex&lt;7&gt; as IsValidIndex&lt;[u8; 6]&gt;&gt;::RESULT` failed
  --&gt; src/bin/blogtest.rs:18:24
   |
18 |     const RESULT: () = assert!(N &gt; I, "Index is out of bounds!");
   |                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the evaluated program panicked at 'Index is out of bounds!', src/bin/blogtest.rs:18:24
</code></pre></div></div> <p>If we were to change <code class="language-plaintext highlighter-rouge">CustomIndex::&lt;7&gt;</code> to <code class="language-plaintext highlighter-rouge">7</code>, the program would compile but crash at runtime.</p> <p>We can go through this whole process again to create a better system for slicing. 
I won’t reiterate all the concepts involved, but here’s a quick and easy implementation:</p> <div class="language-rs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">trait</span> <span class="n">IsValidRangeIndex</span><span class="o">&lt;</span><span class="n">Target</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">const</span> <span class="n">RESULT</span><span class="p">:</span> <span class="p">();</span>
<span class="p">}</span>

<span class="k">pub</span> <span class="k">struct</span> <span class="n">CustomRangeIndex</span><span class="o">&lt;</span><span class="k">const</span> <span class="n">START</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="k">const</span> <span class="n">LENGTH</span><span class="p">:</span> <span class="nb">usize</span><span class="o">&gt;</span><span class="p">;</span>

<span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="k">const</span> <span class="n">START</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="k">const</span> <span class="n">LENGTH</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="k">const</span> <span class="n">N</span><span class="p">:</span> <span class="nb">usize</span><span class="o">&gt;</span> <span class="n">IsValidRangeIndex</span><span class="o">&lt;</span><span class="p">[</span><span class="n">T</span><span class="p">;</span> <span class="n">N</span><span class="p">]</span><span class="o">&gt;</span> <span class="k">for</span> <span class="n">CustomRangeIndex</span><span class="o">&lt;</span><span class="n">START</span><span class="p">,</span> <span class="n">LENGTH</span><span class="o">&gt;</span> <span class="p">{</span>
    <span class="k">const</span> <span class="n">RESULT</span><span class="p">:</span> <span class="p">()</span> <span class="o">=</span> <span class="k">assert</span><span class="o">!</span><span class="p">(</span><span class="n">N</span> <span class="o">&gt;=</span> <span class="n">START</span> <span class="o">+</span> <span class="n">LENGTH</span><span class="p">,</span> <span class="s">"Ending index is out of bounds!"</span><span class="p">);</span>
<span class="p">}</span>

<span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="p">,</span> <span class="k">const</span> <span class="n">START</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="k">const</span> <span class="n">LENGTH</span><span class="p">:</span> <span class="nb">usize</span><span class="p">,</span> <span class="k">const</span> <span class="n">N</span><span class="p">:</span> <span class="nb">usize</span><span class="o">&gt;</span> <span class="nb">Index</span><span class="o">&lt;</span><span class="n">CustomRangeIndex</span><span class="o">&lt;</span><span class="n">START</span><span class="p">,</span> <span class="n">LENGTH</span><span class="o">&gt;&gt;</span> <span class="k">for</span> <span class="p">[</span><span class="n">T</span><span class="p">;</span> <span class="n">N</span><span class="p">]</span> <span class="p">{</span>
    <span class="k">type</span> <span class="n">Output</span> <span class="o">=</span> <span class="p">[</span><span class="n">T</span><span class="p">;</span> <span class="n">LENGTH</span><span class="p">];</span>

    <span class="k">fn</span> <span class="nf">index</span><span class="p">(</span><span class="o">&amp;</span><span class="k">self</span><span class="p">,</span> <span class="mi">_</span><span class="p">:</span> <span class="n">CustomRangeIndex</span><span class="o">&lt;</span><span class="n">START</span><span class="p">,</span> <span class="n">LENGTH</span><span class="o">&gt;</span><span class="p">)</span> <span class="k">-&gt;</span> <span class="o">&amp;</span><span class="nn">Self</span><span class="p">::</span><span class="n">Output</span> <span class="p">{</span>
        <span class="k">let</span> <span class="mi">_</span> <span class="o">=</span> <span class="o">&lt;</span><span class="n">CustomRangeIndex</span><span class="o">&lt;</span><span class="n">START</span><span class="p">,</span> <span class="n">LENGTH</span><span class="o">&gt;</span> <span class="k">as</span> <span class="n">IsValidRangeIndex</span><span class="o">&lt;</span><span class="n">Self</span><span class="o">&gt;&gt;</span><span class="p">::</span><span class="n">RESULT</span><span class="p">;</span>
        <span class="k">unsafe</span> <span class="p">{</span> <span class="o">&amp;*</span><span class="p">(</span><span class="k">self</span><span class="nf">.as_ptr</span><span class="p">()</span><span class="nf">.add</span><span class="p">(</span><span class="n">START</span><span class="p">)</span> <span class="k">as</span> <span class="o">*</span><span class="k">const</span> <span class="p">[</span><span class="n">T</span><span class="p">;</span> <span class="n">LENGTH</span><span class="p">])}</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div> <p>Okay, maybe I lied about it being “quick and easy”, but it’s probably about as simple as you can get. 
By using this, you can write the following program, which will compile and run successfully:</p> <div class="language-rs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">fn</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
    <span class="k">let</span> <span class="n">x</span> <span class="o">=</span> <span class="o">&amp;</span><span class="p">[</span><span class="mi">0x01</span><span class="p">,</span> <span class="mi">0x03</span><span class="p">,</span> <span class="mi">0x00</span><span class="p">,</span> <span class="mi">0x00</span><span class="p">,</span> <span class="mi">0x88</span><span class="p">,</span> <span class="mi">0x77</span><span class="p">,</span> <span class="mi">0x66</span><span class="p">,</span> <span class="mi">0x55</span><span class="p">];</span>
    <span class="k">let</span> <span class="n">y</span> <span class="o">=</span> <span class="nn">u16</span><span class="p">::</span><span class="nf">from_le_bytes</span><span class="p">(</span><span class="n">x</span><span class="p">[</span><span class="nn">CustomRangeIndex</span><span class="p">::</span><span class="o">&lt;</span><span class="mi">4</span><span class="p">,</span> <span class="mi">2</span><span class="o">&gt;</span><span class="p">]);</span>
    <span class="nd">assert_eq!</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="mi">0x7788</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div> <p>You won’t win any code golf contests with this, but I think using something like <code class="language-plaintext highlighter-rouge">CustomRangeIndex</code> is a lot nicer than sprinkling <code class="language-plaintext highlighter-rouge">.try_into().unwrap()</code> everywhere.</p> <h3 id="design-constraint-3-safe-and-sound">Design constraint #3: Safe and sound</h3> <p>The nice thing about compile-time checks is that as long as we do them correctly, we don’t really 
have to <em>worry</em> about creating unsound code. By design, our custom indexing system exposes a fully safe interface and is sound, even though it uses <code class="language-plaintext highlighter-rouge">unsafe</code> internally. Both of these guarantees come from the compile-time checking - we won’t compile code that could access data out of bounds, and there’s no way to <em>bypass</em> the checks and create undefined behavior.</p> <h2 id="example-2-enforcing-data-alignment-requirements">Example 2: Enforcing data alignment requirements</h2> <p>My team’s eCTF firmware made extensive use of the microcontroller’s embedded <a href="https://en.wikipedia.org/wiki/EEPROM">EEPROM</a> as a persistent data store. Like many pieces of hardware, though, this EEPROM had some particular requirements that we needed to respect:</p> <ul> <li>All reads and writes had to be at 4-byte-aligned addresses (<code class="language-plaintext highlighter-rouge">0x0</code>, <code class="language-plaintext highlighter-rouge">0x4</code>, <code class="language-plaintext highlighter-rouge">0x8</code>, etc., all the way up to <code class="language-plaintext highlighter-rouge">0x800</code>.)</li> <li>All read and write sizes had to be multiples of 4 bytes. Reading or writing 5 bytes, for example, was not allowed - 8 was the next valid size after 4.</li> </ul> <p>Both of these problems were addressed using the same checker trait technique. 
For example, to prevent ourselves from accidentally reading or writing a data type of an invalid size, I created an <code class="language-plaintext highlighter-rouge">IsEEPROMCompatible</code> trait that produces a compilation error if the size of the implementing type is not a multiple of 4 bytes:</p> <div class="language-rs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">trait</span> <span class="n">IsEEPROMCompatible</span> <span class="p">{</span>
    <span class="k">const</span> <span class="n">RESULT</span><span class="p">:</span> <span class="p">();</span>
<span class="p">}</span>

<span class="k">impl</span><span class="o">&lt;</span><span class="n">T</span><span class="p">:</span> <span class="n">Sized</span><span class="o">&gt;</span> <span class="n">IsEEPROMCompatible</span> <span class="k">for</span> <span class="n">T</span> <span class="p">{</span>
    <span class="k">const</span> <span class="n">RESULT</span><span class="p">:</span> <span class="p">()</span> <span class="o">=</span> <span class="p">{</span>
        <span class="k">if</span> <span class="nn">core</span><span class="p">::</span><span class="nn">mem</span><span class="p">::</span><span class="nn">size_of</span><span class="p">::</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;</span><span class="p">()</span> <span class="o">%</span> <span class="mi">4</span> <span class="o">!=</span> <span class="mi">0</span> <span class="p">{</span>
            <span class="nd">panic!</span><span class="p">(</span><span class="s">"the size of this type is not a multiple of the EEPROM word size (4 bytes)"</span><span class="p">);</span>
        <span class="p">}</span>
    <span class="p">};</span>
<span class="p">}</span>
</code></pre></div></div> <p>We could then use it in our EEPROM interaction code and be secure in the knowledge that any incorrectly sized access would not compile!</p> <div class="language-rs highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span 
class="k">impl</span><span class="o">&lt;</span><span class="n">Inner</span><span class="o">&gt;</span> <span class="n">EEPROMVar</span><span class="o">&lt;</span><span class="n">Inner</span><span class="o">&gt;</span>
<span class="k">where</span>
    <span class="n">Inner</span><span class="p">:</span> <span class="n">Sized</span> <span class="o">+</span> <span class="n">PartialEq</span><span class="p">,</span>
<span class="p">{</span>
    <span class="k">pub</span> <span class="k">fn</span> <span class="n">new</span><span class="o">&lt;</span><span class="k">const</span> <span class="n">ADDRESS</span><span class="p">:</span> <span class="nb">u32</span><span class="o">&gt;</span><span class="p">()</span> <span class="k">-&gt;</span> <span class="n">Self</span> <span class="p">{</span>
        <span class="c">// Safety check #1: ensure the data type is compatible with</span>
        <span class="c">// being read from/written to EEPROM, i.e., its size is</span>
        <span class="c">// a multiple of 4 bytes.</span>
        <span class="k">let</span> <span class="mi">_</span> <span class="o">=</span> <span class="o">&lt;</span><span class="n">Inner</span> <span class="k">as</span> <span class="n">IsEEPROMCompatible</span><span class="o">&gt;</span><span class="p">::</span><span class="n">RESULT</span><span class="p">;</span>

        <span class="c">// ...</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div> <p>Coincidentally, implementing this check highlighted a previously unnoticed bug in our code: we were, in fact, trying to do an unaligned read from EEPROM, which would have otherwise failed at runtime.</p> <h2 id="conclusions">Conclusions</h2> <p>While Rust is certainly a significant improvement over older languages (such as C) in terms of safety and developer experience, the language lends itself to <em>far</em> more than what is commonly advertised. 
This is also where the language’s relative immaturity becomes more obvious, as awkward workarounds are necessary to deal with language limitations. (Try to implement <code class="language-plaintext highlighter-rouge">Index</code> with a custom index type for both <code class="language-plaintext highlighter-rouge">[T]</code> and <code class="language-plaintext highlighter-rouge">[T; N]</code>. You can’t, because the standard library <em>already provides</em> a blanket implementation for <code class="language-plaintext highlighter-rouge">[T; N]</code> whenever <code class="language-plaintext highlighter-rouge">[T]</code> implements <code class="language-plaintext highlighter-rouge">Index</code> for the same index type, and it’s seemingly impossible to override it.)</p> <p>Despite these difficulties, when all the pieces fall into place it’s quite remarkable what can be achieved with the language’s powerful type system and constant evaluation mechanism. As the language matures and features such as const generics are (hopefully) improved upon, I look forward to seeing what new tricks can be implemented to help make writing good, safe code even easier.</p> <p><strong>Reviving the coolest scanner you’ve never heard of</strong> <br /> 2022-09-04T01:18:00+00:00 <br /> https://ktkaufman03.github.io/blog/2022/09/04/pakon-reverse-engineering</p> <h2 id="introduction">Introduction</h2> <p>Today’s digital cameras are nothing short of incredible when it comes to ease of use and image quality, and many take them for granted, myself included. For those of us in “Generation Z”, though, it’s all too easy to be ignorant of what came <em>before</em> this now ubiquitous technology.</p> <h3 id="a-semi-brief-history">A semi-brief<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup> history</h3> <h4 id="the-rise-of-photo-cd">The rise of Photo CD</h4> <p>In the ’90s and early 2000s, getting film developed wasn’t exactly something for the average person to do at home. 
Instead, “minilabs” were a popular destination - you could take your film negatives to the local pharmacy, and in a reasonable amount of time, you’d have prints of your pictures!</p> <p>Now, prints are wonderful, but in 1990 Kodak came up with a better idea - digitizing photos and putting them on CDs. Thus began the era of the aptly named “Photo CD.” Photo CD was an entire line of products dedicated to the digitization of photos, including film scanners and special Photo CD players. Photo CD enjoyed some popularity for a number of years, but ultimately faded away due to its various issues (much like Kodak’s other attempts at breaking into the digital photography industry.)</p> <h4 id="a-new-player-enters-the-field">A new player enters the field</h4> <p>While Kodak’s Photo CD system languished, a relatively obscure company named Pakon was busy working on its own film scanner. It’s surprisingly difficult to find useful, confirmed information about Pakon, but public records and <a href="https://patents.google.com/patent/US5872591A/en">their first film scanner patent</a> indicate that their scanner work likely began in the early ’90s.</p> <p>Sometime around 2001, Pakon was acquired by Kodak. In the following years, Kodak would go on to release several models of Pakon film scanners - the F-135, F-235, F-335 and all their variants, collectively referred to as “F-X35.” These scanners came with a comprehensive software package for use in minilabs, and an SDK even existed to facilitate the development of specialized clients.</p> <p>The F-X35 scanners boasted high performance, great image quality and post-processing techniques<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup>, and relative ease of use. 
To say that they were popular would be an understatement - they found their way into minilabs all over the United States, including those in such major pharmacy chains as CVS and Walmart.</p> <p><img src="/assets/2022-08-28-pakon-reverse-engineering/f135plus_no_film_loaded.jpg" alt="An image of the Pakon F135 Plus film scanner without any film loaded." width="75%" height="75%" /> <em><strong>Added by popular demand:</strong> Here’s a picture of the scanner that I have access to: a Pakon F135 Plus. Yes, I know it says Kodak, but did you know that some units say Nexlab? It’s only slightly confusing.</em> <br /> <em>Film can be loaded into the feeding mechanism on the right side of the scanner (see the arrow pointing up), and motors pull it through until it comes out on the left side (see the arrow pointing down.) The F235 and F335 have a completely different design, and I don’t have either one to take pictures of - I recommend searching for them on Google Images.</em></p> <h4 id="the-fall-and-rise-of-pakon">The fall (and rise) of Pakon</h4> <p>Sadly, Pakon’s business met an untimely doom as the digital photography industry became dominant, and the company filed for bankruptcy in 2012. While there was still <em>some</em> demand for film scanning, there wasn’t <em>enough</em> demand, and pharmacy minilabs were scaled down. Amazingly, the scanners didn’t go to waste - hobbyists picked them up for pennies on the dollar, and a new community was formed. Today, a <a href="https://www.facebook.com/groups/657750677577306">Facebook group</a> dedicated to the Pakon scanner line boasts over 6,000 members, with a fair amount of weekly activity.</p> <h3 id="pakon-today">Pakon, today</h3> <p>With the software readily available, and the hardware being available as well (albeit for a pretty penny - some scanners go for as much as $2500), one would assume that the story ends here. 
Surprisingly, it <strong>doesn’t</strong> - for reasons we’ll explore together, the final version of the software is only usable on <strong>32-bit Windows XP</strong>! To make matters worse, setting it up tends to be extraordinarily difficult for anyone who doesn’t have a sacrificial computer - in that case, a virtual machine is necessary, which makes everything more complicated and extremely error-prone.</p> <h3 id="why-does-this-even-matter">Why does this even matter?</h3> <p>I’ll be honest - I’m not a photographer, and I personally have no reason to use a Pakon scanner. I happen to have <em>access</em> to one, though, and when I learned about its status as a user’s nightmare, I felt like taking a look. As a college student who is particularly interested in software preservation and reverse engineering, I found these scanners appealing - it turns out that <strong>people still want to use them</strong> despite their difficult nature, and I was interested in making that easier.</p> <p>“But why would anyone still want to use this?”, you might ask. Sure, there are other film scanners that come to mind - Epson makes their own, Nikon used to make their Coolscan scanners, and there are countless others available. Any of these options is perfectly fine, and in fact, some are even <em>better</em> than the Pakon in terms of technical specifications. Where Pakon scanners truly shine is in their seamlessness - unlike other scanners that require film rolls to be pre-cut into strips, the Pakon will accept entire rolls, and does everything for you: detecting frame edges and cropping images appropriately, performing color correction that <em>actually works</em>, reading DX codes if they’re available, and finally, giving you a set of high-quality, effectively noise-free scans. It does all of this at a breakneck pace, too. 
While there are other <em>film scanners</em>, none of them that I’m aware of can even come close to the Pakon’s convenience.</p> <p>The ideal outcome for most of these users would be one where the Pakon scanner software is usable on modern versions of Windows, running on modern workstations - no more sacrificial laptops running 32-bit XP, and no more VMs that are slower than a tortoise.</p> <p>Now, finally, let’s see how we can make that happen. Extremely technical content ahead!</p> <h2 id="glossary">Glossary</h2> <p><strong>TLA, B and C:</strong> Client libraries for the F-235, F-135 and F-335 scanners respectively.</p> <p><strong>TLX:</strong> A wrapper around TLA, B and C. It is <em>the</em> Pakon scanner SDK. (TLA originally held this title, and then TLB and TLC came along.)</p> <p><strong>PSI:</strong> Pakon Scanning Interface. All-in-one desktop app for minilabs to run scans, make CDs, and do all sorts of other things.</p> <p><strong>TLXClientDemo:</strong> A much simpler interface that was meant to serve as a demo for the TLX SDK. It allows the user to control every setting and pretty much do whatever they want within the confines of the SDK. This app is also sometimes referred to as just “TLX”, which isn’t confusing at all!</p> <p><strong>User-mode:</strong> Refers to code running in “userland” - for simplicity’s sake, think of this as the desktop environment that you interact with and run applications in. Anything in userland is subject to various safety checks and interventions to ensure that the entire system can’t be brought down by a single program crashing.</p> <p><strong>Kernel-mode:</strong> Refers to code running at the kernel, or operating system level. User-mode safeguards don’t exist here, and many errors can cause the entire system to crash. 
On Windows, crashes of this nature trigger the notorious Blue Screen of Death, or <strong>BSOD</strong>.</p> <p><strong>Anything not explicitly defined here or anywhere else in this article is assumed to be known by those who choose to read further. If you <em>don’t</em> know, your search engine of choice should come in handy - but in general, I won’t throw anything too obscure at you without explaining it myself.</strong></p> <h2 id="getting-to-work---a-compatibility-investigation">Getting to work - a compatibility investigation</h2> <h3 id="what-are-we-targeting">What are we targeting?</h3> <p>Any external hardware that you want the operating system to be able to interact with requires a <strong>driver.</strong> At a high level, the driver is responsible for accepting commands from the operating system (often on behalf of the user) and processing them in some well-defined way. On Windows, devices are exposed as fake files (similar to Unix’s rule of “everything is a file”) that user-mode applications can open and interact with. These fake files are set up by device drivers.</p> <p>Since Windows has an astonishing compatibility track record for “normal” software, and we’re dealing with some rather obscure and unique hardware that requires a special driver, we can <em>guess</em> that the driver is going to be the source of any issues that come up. To verify this guess, we just have to try running a scan on a 32-bit version of Windows that’s <strong>not</strong> XP. In my own tests, I went with Windows 10, and as soon as the scanning software started to actually do something…</p> <p><em>(pause for dramatic effect)</em></p> <p>The system crashed, shocking absolutely nobody, but greatly disappointing me. Just getting Windows to recognize the scanner’s very existence was a minor production, the details of which I’ve chosen to omit because of how boring they are in comparison to everything else. 
In the end, if it were as simple as manually installing some drivers, it’s very likely that someone would’ve figured it out years ago. Onward!</p> <h3 id="isolating-the-problem">Isolating the problem</h3> <p>In order to determine <em>why</em> the entire system crashes, we’ll have to use a <strong>kernel debugger.</strong> I went with <a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/debugger-download-tools">WinDbg</a>, mainly because it actually works (which is more than can be said of certain other debuggers), but also because I’m already somewhat familiar with it from past kernel adventures.</p> <p>After a bit of setup, my testing virtual machine was all set for debugging. A few minutes later, and round 2 of testing began…</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>REFERENCE_BY_POINTER (18)
Arguments:
Arg1: 00000000, Object type of the object whose reference count is being lowered
Arg2: 8d704d2c, Object whose reference count is being lowered
Arg3: 00000001, Reserved
Arg4: ee751000, Reserved
</code></pre></div></div> <p>We got our first <a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-code-reference2">bug check</a>! Bug checks normally lead to BSODs, but now that we’re using a kernel debugger, we can take a look around before restarting the system. A bit more analysis (powered by WinDbg’s <a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/-analyze"><code class="language-plaintext highlighter-rouge">!analyze</code></a> command) reveals that a call to <code class="language-plaintext highlighter-rouge">ObfDereferenceObject</code> at <code class="language-plaintext highlighter-rouge">F135usb2.sys+0x1db4</code> is to blame.</p> <p>If you have no idea what that meant, I’m happy for you. You’ve spared yourself the immense mental pain that comes with trying to figure out how any of this actually works. 
Now it’s time to go down the first of many rabbit holes - what <em>is</em> <code class="language-plaintext highlighter-rouge">ObfDereferenceObject</code>, what is it meant to do, and why is it catastrophically failing?</p> <h3 id="kernel-lore---what-are-we-looking-at-here">Kernel lore - what are we looking at here?</h3> <p>First and foremost, for reasons that are unclear to me, <code class="language-plaintext highlighter-rouge">ObfDereferenceObject</code> is exposed to driver developers through a <strong>macro</strong><sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> called <code class="language-plaintext highlighter-rouge">ObDereferenceObject</code>. I don’t understand what the point of this even is, but in any case, we can find the documentation for the macro <a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-obdereferenceobject">here.</a></p> <blockquote> <p>The <strong><code class="language-plaintext highlighter-rouge">ObDereferenceObject</code></strong> routine decrements the given object’s reference count and performs retention checks.</p> <div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">ObDereferenceObject</span><span class="p">(</span> <span class="p">[</span><span class="n">in</span><span class="p">]</span> <span class="n">a</span> <span class="p">);</span> </code></pre></div> </div> <p><code class="language-plaintext highlighter-rouge">[in] a</code>: Pointer to the object’s body.</p> </blockquote> <p>This doesn’t seem all that complicated. Before moving on, though, we should make sure we know what an “object” is in this context. According to <a href="https://docs.microsoft.com/en-us/windows/win32/sysinfo/handles-and-objects">Microsoft documentation</a>:</p> <blockquote> <p>An object is a data structure that represents a system resource, such as a file, thread, or graphic image. 
Your application can’t directly access object data, nor the system resource that an object represents. Instead, your application must obtain an object <em>handle</em>, which it can use to examine or modify the system resource. Each handle has an entry in an internally maintained table. Those entries contain the addresses of the resources, and the means to identify the resource type.</p> </blockquote> <p>So, essentially, an “object” in the Windows kernel is just a resource of some sort that can have data associated with it. Furthermore, the “reference count” mentioned earlier is the number of times some bit of code has said “I care about this thing, don’t let it go away!” by using one of the functions in the <code class="language-plaintext highlighter-rouge">ObReferenceObject</code> family, which we’ll learn more about later.</p> <p>Let’s move on - we’re taking the scenic route to figuring out this crash. The ending will shock you.</p> <h3 id="reverse-engineering-the-drivers">Reverse engineering the driver(s)</h3> <p>When trying to fix something, it’s often necessary to actually <em>understand</em> it. This is where reverse engineering (or “reversing”) skills come in handy, and I was up to the task.
Since I was doing my tests with an F-135 scanner, I focused on the F-135 drivers, consisting of 3 files:</p> <ul> <li><code class="language-plaintext highlighter-rouge">F135usb2.sys</code>: The device-specific driver.</li> <li><code class="language-plaintext highlighter-rouge">F235Ldr.sys</code>: A firmware loader driver that is shared by all scanner models.</li> <li><code class="language-plaintext highlighter-rouge">F235Lib.sys</code>: A “framework” driver that is also shared by all scanner models.</li> </ul> <h4 id="recon">Recon</h4> <p>Whenever I reverse engineer software, my first goal is to identify code that came from <em>somewhere else</em> - whether that’s OpenSSL or the sample CD included in an obscure book, I want to know what I can <em>avoid</em> painstakingly reversing. This time was no different, and I got to work trying to figure out exactly what I was looking at.</p> <p>First, I took a look at <code class="language-plaintext highlighter-rouge">F235Ldr.sys</code>. Interestingly, it came with a little bit of useful information:</p> <p><img src="/assets/2022-08-28-pakon-reverse-engineering/f235ldr_metadata.png" alt="Image of F235Ldr.sys metadata. F235Ldr.sys is described as &quot;ezloader&quot; by &quot;Anchor Chips&quot;" /></p> <p>This gave me a real lead to follow. A bit of additional research revealed that the “ezloader” component was part of the “EZ-USB” kit sold by a company called Anchor Chips<sup id="fnref:4" role="doc-noteref"><a href="#fn:4" class="footnote" rel="footnote">4</a></sup>. 
Surprisingly, I was able to <em>find</em> one of these kits on eBay, and a few days later, it arrived:</p> <p><img src="/assets/2022-08-28-pakon-reverse-engineering/anchorchips_box.jpg" width="480" height="640" alt="Image of a box labeled 'The EZ-USB(TM) Integrated Circuit'" /></p> <p>Amusingly, the package had never actually been opened, and contained the original packing slip… with a date several years before I was born, addressed to a company that I don’t think exists anymore.</p> <p><img src="/assets/2022-08-28-pakon-reverse-engineering/anchorchips_packing_slip.jpg" width="480" height="640" alt="Image of a packing slip dated August 10, 1999" /></p> <p>The package contained lots of hardware, some books, and best of all, two CDs with software and documentation! After a quick search, I found what I was looking for: the source code to <code class="language-plaintext highlighter-rouge">ezloader</code>.</p> <p><img src="/assets/2022-08-28-pakon-reverse-engineering/anchorchips_ezloader_srcdir.png" alt="Image of a list of ezloader source files in File Explorer" /></p> <p>The story of <code class="language-plaintext highlighter-rouge">F235Ldr.sys</code> doesn’t quite end here, though. I wanted to make sure that the code I got from the CD was the same as the code in the compiled driver. To do this, I would have to look at <code class="language-plaintext highlighter-rouge">F235Ldr.sys</code> under a microscope. My binary microscope of choice is <a href="https://hex-rays.com/ida-pro/">IDA Pro</a> with the <a href="https://hex-rays.com/decompiler/">Hex-Rays Decompiler</a>.</p> <p>After browsing around <code class="language-plaintext highlighter-rouge">F235Ldr.sys</code> in IDA, and comparing the code to the <code class="language-plaintext highlighter-rouge">ezloader</code> source code, pretty much everything seemed identical. 
There was one difference, however: the version provided by Anchor Chips requires the device firmware to be embedded into the compiled driver, while the Pakon version reads firmware from <a href="https://en.wikipedia.org/wiki/Intel_HEX">Intel HEX</a> files. This was almost certainly done to facilitate the sharing of the driver among all of the different scanner models. No other significant code changes were made.</p> <p>While reversing this driver from scratch would not have been difficult, I was happy to have saved some time.</p> <p>My next target was <code class="language-plaintext highlighter-rouge">F235Lib.sys</code>, which proved to be somewhat more challenging to identify. It, too, had publisher information:</p> <p><img src="/assets/2022-08-28-pakon-reverse-engineering/f235lib_metadata.png" alt="Image of F235Lib.sys metadata." /></p> <p>I have to say, I never would have thought that a driver called <code class="language-plaintext highlighter-rouge">F235Lib</code> would be the “F235 Usb 2.0 Library Driver.” Since this information is practically useless, we’ll once again have to examine the file under the microscope.</p> <p>Upon loading <code class="language-plaintext highlighter-rouge">F235Lib.sys</code> into IDA, I noticed it exported several functions, as would be expected of a library:</p> <p><img src="/assets/2022-08-28-pakon-reverse-engineering/f235lib_exports.png" alt="Image of F235Lib.sys export list displayed in IDA Pro" /></p> <p>I didn’t notice anything that <em>obviously</em> came from somewhere else, and looking at the list of strings in IDA (much like running the <code class="language-plaintext highlighter-rouge">strings</code> command on the file) didn’t reveal anything either. 
So I got to work reverse engineering all the different functions, aided somewhat by the presence of these exported function names - about half of the functions in the driver were named, and the other half were unknown.</p> <p>I made a good amount of progress before I decided to make one last attempt at figuring out if there was more to the story. I searched for the name of one of the functions - <code class="language-plaintext highlighter-rouge">GenericHandlePowerIoctl</code> - and found <a href="https://github.com/artemsv/usbjk/blob/master/DOC/LkWork/src/LKsolutions/Driver/BTV/generic/Power.cpp">a match on GitHub</a>. I noticed the name “Walter Oney” at the top of the file:</p> <div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Power request handler for Generic driver</span> <span class="c1">// Copyright (C) 1999 by Walter Oney</span> <span class="c1">// All rights reserved</span> <span class="c1">// @doc</span> <span class="cp">#include "stddcls.h" </span><span class="c1">// ...</span> </code></pre></div></div> <p>My next search was for “Walter Oney USB”, as I figured this individual had <em>probably</em> (read: definitely) worked on device drivers. That led me to a book called “Programming the Microsoft Windows Driver Model”, <em>written by Oney</em> and published by Microsoft as the official how-to guide for Windows driver programming.</p> <p>A further search for “Programming the Microsoft Windows Driver Model” led me to the <a href="https://resources.oreilly.com/examples/9780735618039">published sample code</a> from the second edition of the book. And sure enough, there was an entire “generic” driver included, with <strong>all of the functions</strong> that I had found within <code class="language-plaintext highlighter-rouge">F235Lib.sys</code>. 
Although I was somewhat disappointed to find some differences between the published sample code and the compiled code I was looking at, I realized that they could all be explained by the passing of time - the sample code I had obtained was from the <em>second</em>, not first edition of the book, and for that reason was likely younger than the Pakon driver.</p> <p>Despite this surprise, I was able to match almost all functions and <em>all</em> data structures in the compiled <code class="language-plaintext highlighter-rouge">F235Lib.sys</code> driver to the sample source code. The Hex-Rays decompiler really came in handy here. As an aside, just look at this sample decompilation:</p> <div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">NTSTATUS</span> <span class="kr">__stdcall</span> <span class="nf">SequenceCompletionRoutine</span><span class="p">(</span><span class="n">PDEVICE_OBJECT</span> <span class="n">junk</span><span class="p">,</span> <span class="n">PIRP</span> <span class="n">Irp</span><span class="p">,</span> <span class="n">PPOWCONTEXT</span> <span class="n">Context</span><span class="p">)</span> <span class="p">{</span> <span class="n">Context</span><span class="o">-&gt;</span><span class="n">status</span> <span class="o">=</span> <span class="n">Irp</span><span class="o">-&gt;</span><span class="n">IoStatus</span><span class="p">.</span><span class="n">Status</span><span class="p">;</span> <span class="n">HandlePowerEvent</span><span class="p">(</span><span class="n">Context</span><span class="p">,</span> <span class="n">AsyncNotify</span><span class="p">);</span> <span class="n">IoFreeIrp</span><span class="p">(</span><span class="n">Irp</span><span class="p">);</span> <span class="k">return</span> <span class="n">STATUS_MORE_PROCESSING_REQUIRED</span><span class="p">;</span> <span class="p">}</span> </code></pre></div></div> <p>and compare it to the original code:</p> <div 
class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">NTSTATUS</span> <span class="nf">SequenceCompletionRoutine</span><span class="p">(</span><span class="n">PDEVICE_OBJECT</span> <span class="n">junk</span><span class="p">,</span> <span class="n">PIRP</span> <span class="n">Irp</span><span class="p">,</span> <span class="n">PPOWCONTEXT</span> <span class="n">ctx</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// SequenceCompletionRoutine</span> <span class="p">(</span><span class="kt">void</span><span class="p">)</span><span class="n">junk</span><span class="p">;</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">status</span> <span class="o">=</span> <span class="n">Irp</span><span class="o">-&gt;</span><span class="n">IoStatus</span><span class="p">.</span><span class="n">Status</span><span class="p">;</span> <span class="n">HandlePowerEvent</span><span class="p">(</span><span class="n">ctx</span><span class="p">,</span> <span class="n">AsyncNotify</span><span class="p">);</span> <span class="n">IoFreeIrp</span><span class="p">(</span><span class="n">Irp</span><span class="p">);</span> <span class="k">return</span> <span class="n">STATUS_MORE_PROCESSING_REQUIRED</span><span class="p">;</span> <span class="p">}</span> <span class="c1">// SequenceCompletionRoutine</span> </code></pre></div></div> <p>The differences are so minor that they might as well not exist. 
There are plenty of excellent examples of the decompiler’s abilities, but this tangent has run its course already.</p> <p>There really isn’t much else to cover for these 2 supplementary drivers - ultimately, they’re just repackaged and modified versions of code written by third parties.</p> <h4 id="back-to-f135usb2">Back to F135usb2</h4> <p>Now that we’ve dealt with the firmware loader and generic driver, it’s time to get back down to business and figure out why we were getting those pesky system crashes.</p> <p>Here’s a reminder of what we were faced with earlier:</p> <blockquote> <p>After a bit of setup, my testing virtual machine was all set for debugging. A few minutes later, and round 2 of testing began…</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>REFERENCE_BY_POINTER (18)
Arguments:
Arg1: 00000000, Object type of the object whose reference count is being lowered
Arg2: 8d704d2c, Object whose reference count is being lowered
Arg3: 00000001, Reserved
Arg4: ee751000, Reserved
</code></pre></div> </div> <p>…</p> <p>A bit more analysis reveals that a call to <code class="language-plaintext highlighter-rouge">ObfDereferenceObject</code> at <code class="language-plaintext highlighter-rouge">F135usb2.sys+0x1db4</code> is to blame.</p> </blockquote> <p>What’s also worth noting about this bug check report is the value of “Arg4” - <code class="language-plaintext highlighter-rouge">0xee751000</code>. According to WinDbg, this bug check can occur</p> <blockquote> <p>when the object’s reference count <strong>drops below zero</strong> whether or not there are open handles to the object; in that case, [Arg4] contains the actual value of the pointer references count.</p> </blockquote> <p><code class="language-plaintext highlighter-rouge">0xee751000</code>, when interpreted as a signed 32-bit integer, is a negative number.
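</p> <p>For the curious, here’s a minimal C++ sketch of that reinterpretation. The helper name <code class="language-plaintext highlighter-rouge">as_signed32</code> is made up for illustration and isn’t part of any Windows API; all it does is copy the same 32 bits into a signed integer.</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;cstdint&gt;
#include &lt;cstring&gt;

// Hypothetical helper (not a Windows API): reinterpret a raw 32-bit value,
// such as Arg4 from the bug check, as a signed two's-complement integer.
std::int32_t as_signed32(std::uint32_t bits) {
    std::int32_t value;
    // Copy the bit pattern unchanged; only the interpretation differs.
    std::memcpy(&amp;value, &amp;bits, sizeof value);
    return value;
}
</code></pre></div></div>
<p>With this helper, <code class="language-plaintext highlighter-rouge">as_signed32(0xee751000u)</code> evaluates to <code class="language-plaintext highlighter-rouge">-294318080</code>, which is nowhere near a plausible reference count.</p> <p>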
My first thought was that a kernel structure was being corrupted, but I couldn’t find anything in the driver that could possibly cause such a thing to happen.</p> <p>With <code class="language-plaintext highlighter-rouge">F135usb2.sys</code> under the binary microscope, let’s try to figure out what’s going on.</p> <p><strong>WARNING: <em>Extremely</em> technical content ahead!</strong> (But if you’ve made it this far, odds are you won’t be scared away now.)</p> <p>According to IDA, the driver’s base address is <code class="language-plaintext highlighter-rouge">0x10000</code>. This means that <code class="language-plaintext highlighter-rouge">F135usb2.sys+0x1db4</code> refers to <code class="language-plaintext highlighter-rouge">0x10000+0x1db4</code>, or <code class="language-plaintext highlighter-rouge">0x11DB4</code>. Going to that address in IDA reveals the following relevant x86 assembly instructions:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>.text:00011DB1 004 8D 4E 34             lea     ecx, [esi+34h] ; Object
.text:00011DB4 004 FF 15 7C 27 01 00    call    ds:ObfDereferenceObject
</code></pre></div></div> <p>The first instruction - <code class="language-plaintext highlighter-rouge">lea ecx, [esi+34h]</code> - is taking the value of the <code class="language-plaintext highlighter-rouge">esi</code> register, adding hexadecimal <code class="language-plaintext highlighter-rouge">34</code> (or decimal 52) to it, and storing the result in the <code class="language-plaintext highlighter-rouge">ecx</code> register. The second instruction is calling the <code class="language-plaintext highlighter-rouge">ObfDereferenceObject</code> function, which uses the <code class="language-plaintext highlighter-rouge">fastcall</code> convention and therefore takes its only argument in <code class="language-plaintext highlighter-rouge">ecx</code>.</p> <p>Some very rough pseudo-C for these 2 instructions is <code class="language-plaintext highlighter-rouge">ObfDereferenceObject(&amp;esi_struct-&gt;field_0x34)</code>.</p> <p>Now, figuring out what the problem was took me longer than I would have liked.
After staring at this code for a fairly long time, though, I finally realized what was going on. The key was to understand what lies at <code class="language-plaintext highlighter-rouge">esi+34h</code>, as opposed to <strong>what <code class="language-plaintext highlighter-rouge">ObfDereferenceObject</code> expects to be given.</strong></p> <p>IDA has a useful feature called “immediate search”, where one can search for constants and offsets used in instructions. Searching for <code class="language-plaintext highlighter-rouge">0x34</code> revealed a block of code at <code class="language-plaintext highlighter-rouge">F135usb2.sys+0x1cef</code> that seemed to be accessing the same field.</p> <p>If you’re wondering where all of the symbol (variable/field/function) names came from, they’re based on my multi-day reverse engineering effort. This particular driver is pretty much a bunch of prewritten code glued together - some from Walter Oney’s book, some from Anchor Chips - with some Pakon-specific code on top, which made it pretty easy to figure out what the original names likely were.</p> <div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="p">(</span> <span class="n">ObReferenceObjectByHandle</span><span class="p">(</span> <span class="n">pRingTail</span><span class="o">-&gt;</span><span class="n">EventScanPacketReady</span><span class="p">,</span> <span class="mi">2u</span><span class="p">,</span> <span class="n">ExEventObjectType</span><span class="p">,</span> <span class="n">ctx</span><span class="o">-&gt;</span><span class="n">RequestorMode</span><span class="p">,</span> <span class="c1">// `EventScanPacketReady` is the field at offset 0x34 in `ctx`. 
</span> <span class="c1">// Corresponding instructions: </span> <span class="c1">// lea ecx, [esi+34h]</span> <span class="c1">// push ecx</span> <span class="o">&amp;</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">EventScanPacketReady</span><span class="p">,</span> <span class="mi">0u</span><span class="p">)</span> <span class="o">&gt;=</span> <span class="mi">0</span> <span class="p">)</span> <span class="p">{</span> <span class="c1">// do some stuff</span> <span class="p">}</span> </code></pre></div></div> <p>This seems like pretty clear evidence that we found the right code. Earlier I mentioned the “<code class="language-plaintext highlighter-rouge">ObReferenceObject</code> family of functions”, and here we see one of them being used: <code class="language-plaintext highlighter-rouge">ObReferenceObjectByHandle</code>. Let’s read the friendly manual!</p> <blockquote> <div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">NTSTATUS</span> <span class="nf">ObReferenceObjectByHandle</span><span class="p">(</span> <span class="p">[</span><span class="n">in</span><span class="p">]</span> <span class="n">HANDLE</span> <span class="n">Handle</span><span class="p">,</span> <span class="p">[</span><span class="n">in</span><span class="p">]</span> <span class="n">ACCESS_MASK</span> <span class="n">DesiredAccess</span><span class="p">,</span> <span class="p">[</span><span class="n">in</span><span class="p">,</span> <span class="n">optional</span><span class="p">]</span> <span class="n">POBJECT_TYPE</span> <span class="n">ObjectType</span><span class="p">,</span> <span class="p">[</span><span class="n">in</span><span class="p">]</span> <span class="n">KPROCESSOR_MODE</span> <span class="n">AccessMode</span><span class="p">,</span> <span class="p">[</span><span class="n">out</span><span class="p">]</span> <span class="n">PVOID</span> <span class="o">*</span><span class="n">Object</span><span 
class="p">,</span> <span class="p">[</span><span class="n">out</span><span class="p">,</span> <span class="n">optional</span><span class="p">]</span> <span class="n">POBJECT_HANDLE_INFORMATION</span> <span class="n">HandleInformation</span> <span class="p">);</span> </code></pre></div> </div> <p><code class="language-plaintext highlighter-rouge">[in] Handle</code>: Specifies an open handle for an object.</p> <p>…</p> <p><code class="language-plaintext highlighter-rouge">[out] Object</code>: Pointer to a variable that receives a pointer to the object’s body. The following table contains the pointer types.</p> <p>…</p> </blockquote> <p>So, let’s get this straight:</p> <ul> <li>The <code class="language-plaintext highlighter-rouge">Object</code> parameter is a “pointer to a variable that receives a pointer to the object’s body.” In other words, it is a <strong>pointer to a pointer</strong>, which allows the kernel to “fill in the blank”, so to speak.</li> <li>The driver is passing <code class="language-plaintext highlighter-rouge">&amp;ctx-&gt;EventScanPacketReady</code> as the <code class="language-plaintext highlighter-rouge">Object</code> parameter. 
<strong>This is totally fine and correct!</strong> <code class="language-plaintext highlighter-rouge">ctx-&gt;EventScanPacketReady</code> will be set by the kernel since we provided a <strong>pointer</strong> (or reference) to it.</li> <li>When this is done, <code class="language-plaintext highlighter-rouge">ctx-&gt;EventScanPacketReady</code> (note the lack of <code class="language-plaintext highlighter-rouge">&amp;</code>) will be a <strong>pointer to the event object’s body.</strong></li> </ul> <p>Now, let’s look at the code that’s calling <code class="language-plaintext highlighter-rouge">ObDereferenceObject</code>:</p> <div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="kr">__stdcall</span> <span class="nf">ReleaseContextResources</span><span class="p">(</span><span class="n">PRWCONTEXT</span> <span class="n">ctx</span><span class="p">)</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span> <span class="p">...</span> <span class="p">)</span> <span class="p">{</span> <span class="n">ObDereferenceObject</span><span class="p">(</span><span class="o">&amp;</span><span class="n">ctx</span><span class="o">-&gt;</span><span class="n">EventScanPacketReady</span><span class="p">);</span> <span class="c1">// ...</span> <span class="p">}</span> <span class="p">}</span> </code></pre></div></div> <p>and at the documentation for <code class="language-plaintext highlighter-rouge">ObDereferenceObject</code>…</p> <blockquote> <p>The <strong><code class="language-plaintext highlighter-rouge">ObDereferenceObject</code></strong> routine decrements the given object’s reference count and performs retention checks.</p> <div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">ObDereferenceObject</span><span class="p">(</span> <span class="p">[</span><span class="n">in</span><span 
class="p">]</span> <span class="n">a</span> <span class="p">);</span> </code></pre></div> </div> <p><code class="language-plaintext highlighter-rouge">[in] a</code>: Pointer to the object’s body.</p> </blockquote> <p>Hold on a second. <code class="language-plaintext highlighter-rouge">ObDereferenceObject</code> wants a “pointer to the object’s body”, but we’re giving it <code class="language-plaintext highlighter-rouge">&amp;ctx-&gt;EventScanPacketReady</code>… which is <em>a pointer to the pointer to the object’s body.</em> The reason the system crashes is because <em>we’re misdirecting it.</em> Remember this from earlier:</p> <blockquote> <p>What’s also worth noting about this bug check report is the value of “Arg4” - <code class="language-plaintext highlighter-rouge">0xee751000</code>. According to WinDbg, this bug check can occur</p> <blockquote> <p>when the object’s reference count <strong>drops below zero</strong> whether or not there are open handles to the object; in that case, [Arg4] contains the actual value of the pointer references count.</p> </blockquote> <p><code class="language-plaintext highlighter-rouge">0xee751000</code>, when interpreted as a signed 32-bit integer, is a negative number.</p> </blockquote> <p>Recall that we’re supposed to provide <code class="language-plaintext highlighter-rouge">ObDereferenceObject</code> with a “pointer to the object’s <em>body.</em>” Every kernel object has a <em>header</em> that comes immediately before the body, and the header stores the reference count! Because we were providing a pointer to the body pointer, rather than the body pointer itself, the kernel was reading the object header from the wrong place. 
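</p> <p>To make that concrete, here’s a toy C++ model of the mistake. Nothing in it is real kernel code: memory is just an array of 32-bit words, an “address” is an index into that array, and every name, layout, and value is invented for illustration. The only thing it borrows from the real kernel is the rule that the reference count sits in the word right before the object’s body.</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;array&gt;
#include &lt;cstddef&gt;
#include &lt;cstdint&gt;

// Toy 32-bit "memory": one slot per word, addresses are array indices.
using Memory = std::array&lt;std::int32_t, 8&gt;;

// Stand-in for ObfDereferenceObject: the header (here, just a reference
// count) is the word immediately before the body. Decrement it and return
// the new count.
std::int32_t toy_dereference(Memory&amp; mem, std::size_t body_addr) {
    return --mem[body_addr - 1];
}

constexpr std::size_t kEventBody = 2;  // event object's body (header at 1)
constexpr std::size_t kCtxField  = 6;  // context field holding the body pointer

// Lay out one "event object" and one "context structure".
Memory make_memory() {
    Memory mem{};
    mem[kEventBody - 1] = 2;       // the event's real reference count
    mem[kEventBody]     = 0x1234;  // arbitrary body data
    mem[kCtxField - 1]  = -7;      // unrelated neighbouring context field
    mem[kCtxField]      = static_cast&lt;std::int32_t&gt;(kEventBody);
    return mem;
}

// Correct: pass the pointer stored in the field (ctx-&gt;EventScanPacketReady).
std::int32_t dereference_correctly(Memory&amp; mem) {
    return toy_dereference(mem, static_cast&lt;std::size_t&gt;(mem[kCtxField]));
}

// Buggy: pass the address of the field itself (&amp;ctx-&gt;EventScanPacketReady).
std::int32_t dereference_buggily(Memory&amp; mem) {
    return toy_dereference(mem, kCtxField);
}
</code></pre></div></div>
<p>In this model, the correct call lowers the event’s count from 2 to 1. The buggy call never touches the event at all: it decrements the unrelated neighbouring word and reports <code class="language-plaintext highlighter-rouge">-8</code> as the “reference count”.</p> <p>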
This explains the weird reference count.</p> <p>Additionally, using WinDbg’s <a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/-object">!object</a> command on the supposed object body pointer (<code class="language-plaintext highlighter-rouge">8d704d2c</code>, from the bug check report) gives us a pretty damning error:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1: kd&gt; !object 8d704d2c
8d704d2c: Not a valid object (ObjectType invalid)
</code></pre></div></div> <p>It’s pretty obvious now that we’ve found the issue. For some reason, Windows XP’s implementation of <code class="language-plaintext highlighter-rouge">ObfDereferenceObject</code> doesn’t validate the object’s reference count, but that seems to have changed in Windows Vista. (This may be one of the only good parts of Vista.) Essentially, the only reason this code worked at all was an implementation detail.</p> <p>The solution is almost infuriating in its simplicity: at <code class="language-plaintext highlighter-rouge">F135usb2.sys+0x1db1</code>, replace <code class="language-plaintext highlighter-rouge">lea ecx, [esi+34h]</code> with <code class="language-plaintext highlighter-rouge">mov ecx, [esi+34h]</code>. Where <code class="language-plaintext highlighter-rouge">lea</code> computes the <em>address</em> of the field, <code class="language-plaintext highlighter-rouge">mov</code> loads the pointer <em>stored in</em> it. Thankfully, this can be done by changing a single byte: <code class="language-plaintext highlighter-rouge">8d 4e 34</code> changes to <code class="language-plaintext highlighter-rouge">8b 4e 34</code>. (A more precise search-replace is: <code class="language-plaintext highlighter-rouge">83 7e 24 00 74 2a 8d 4e 34</code> -&gt; <code class="language-plaintext highlighter-rouge">83 7e 24 00 74 2a 8b 4e 34</code>.)</p> <p>In source code, this would amount to changing <code class="language-plaintext highlighter-rouge">ObDereferenceObject(&amp;ctx-&gt;EventScanPacketReady);</code> to <code class="language-plaintext highlighter-rouge">ObDereferenceObject(ctx-&gt;EventScanPacketReady);</code>.
That’s right - <strong>a single rogue ampersand is to blame for the catastrophic failure.</strong></p> <p>Re-install the patched driver, try a scan, and… it just works! Luckily, the exact same patch can be applied to the drivers for the other scanner models.</p> <p>At this point, I was ready to release what I had and then call it quits. I realized that a 32-bit device driver had no chance of running on 64-bit Windows, and I figured this was likely the end of the road. I had a change of heart, though, and decided to give it a shot anyway.</p> <h2 id="the-64-bit-journey">The 64-bit journey</h2> <p>Much to the chagrin of many people who wish to use very old external devices that require special drivers, Windows doesn’t provide a compatibility layer for kernel drivers like it does for normal user-facing applications with <a href="https://docs.microsoft.com/en-us/windows/win32/winprog64/running-32-bit-applications">WoW64</a>. This means that any driver loaded on 64-bit Windows <strong>must</strong> itself be compiled as 64-bit code. One of the major obstacles to 64-bit support is dealing with the use of “pointer-precision” data - 32-bit systems use 32-bit pointers, but 64-bit systems use 64-bit pointers, and this difference in size has a whole bunch of cascading effects.</p> <p>The “obvious” solution to this problem is to <em>just decompile the driver and recompile it</em>, but that’s much easier said than done. It requires a complete reverse engineering of the original code, especially anything that will end up using pointer-precision data somehow (for example, structures that store pointers). Not one to shy away from a seemingly impossible challenge, I went for it.</p> <h3 id="rebuilding-the-firmware-loader-driver">Rebuilding the firmware loader driver</h3> <p>My first target was the firmware loader driver - <code class="language-plaintext highlighter-rouge">F235Ldr.sys</code>.
My goal with each driver was to get as much code to cleanly decompile as possible, and clean up the rest manually. The loader was easy to deal with, as it only had 24 relatively simple functions.</p> <p><img src="/assets/2022-08-28-pakon-reverse-engineering/f235ldr_re_functions.png" alt="A screenshot of reverse engineered function names shown in IDA Pro." /></p> <p>Interestingly, some of the “Ezusb” functions differed slightly from the sample code that came in my kit - I guess the Pakon developers were working with a later revision. Regardless, within a couple of days I had a 64-bit version of the firmware loader that <em>compiled</em> successfully. After fixing a minor mistake that came about while I was trying to get rid of some deprecation warnings<sup id="fnref:5" role="doc-noteref"><a href="#fn:5" class="footnote" rel="footnote">5</a></sup>, the driver sprang into action and successfully downloaded firmware to the scanner!</p> <h3 id="rebuilding-the-device-driver">Rebuilding the device driver</h3> <p>As we discussed earlier, the Pakon <em>device</em> drivers consist of 2 components: Walter Oney’s “generic” driver (<code class="language-plaintext highlighter-rouge">F235Lib.sys</code>) and the actual device-specific driver (<code class="language-plaintext highlighter-rouge">F*35usb2.sys</code>). I felt like doing things a bit differently - instead of having 2 separate drivers, I chose to merge them. I also took the liberty of using the newer version of Oney’s code, since it was readily available and presumably<sup id="fnref:6" role="doc-noteref"><a href="#fn:6" class="footnote" rel="footnote">6</a></sup> a bit more refined than the Pakon version.
After a <em>lot</em> more decompilation and cleanup, I ran a test…</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>F135USB - Configuring device from Pakon F135USB - Product is F135-USB Film Scanner F135USB - Serial number is xxx-yyy-zz F135USB - Device reports 3 endpoints F135usb2 - To WORKING from STOPPED F135usb2 - PNP Request (IRP_MN_QUERY_CAPABILITIES) F135usb2 - PNP Request (IRP_MN_QUERY_PNP_DEVICE_STATE) F135usb2 - PNP Request (IRP_MN_QUERY_DEVICE_RELATIONS) </code></pre></div></div> <p>Excellent! It seemed to be working - at least until I <em>unplugged</em> the scanner, at which point things started to go wrong. Turns out the Generic driver had a pretty subtle bug…</p> <h4 id="unplug-and-pray">Unplug-and-pray</h4> <p>One of the many things the Generic driver takes care of is supporting <a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/kernel/introduction-to-plug-and-play">Plug and Play</a>. For the most part, this just works and I don’t need to worry about it. 
However, there was one thing that clearly <em>didn’t</em> work - whatever code ran as soon as the device was no longer available.</p> <p>WinDbg helpfully informed me that something was going wrong with these two functions:</p> <div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Code is slightly modified from the original to remove some unimportant details</span> <span class="n">VOID</span> <span class="nf">DeregisterAllInterfaces</span><span class="p">(</span><span class="n">PGENERIC_EXTENSION</span> <span class="n">pdx</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// ...</span> <span class="k">while</span> <span class="p">(</span><span class="o">!</span><span class="n">IsListEmpty</span><span class="p">(</span><span class="o">&amp;</span><span class="n">pdx</span><span class="o">-&gt;</span><span class="n">iflist</span><span class="p">))</span> <span class="p">{</span> <span class="n">PLIST_ENTRY</span> <span class="n">list</span> <span class="o">=</span> <span class="n">RemoveHeadList</span><span class="p">(</span><span class="o">&amp;</span><span class="n">pdx</span><span class="o">-&gt;</span><span class="n">iflist</span><span class="p">);</span> <span class="n">PINTERFACE_RECORD</span> <span class="n">ifp</span> <span class="o">=</span> <span class="n">CONTAINING_RECORD</span><span class="p">(</span><span class="n">list</span><span class="p">,</span> <span class="n">INTERFACE_RECORD</span><span class="p">,</span> <span class="n">list</span><span class="p">);</span> <span class="n">DeregisterInterface</span><span class="p">(</span><span class="n">pdx</span><span class="p">,</span> <span class="n">ifp</span><span class="p">);</span> <span class="p">}</span> <span class="c1">// ...</span> <span class="p">}</span> <span class="n">VOID</span> <span class="nf">DeregisterInterface</span><span class="p">(</span><span class="n">PGENERIC_EXTENSION</span> <span 
class="n">pdx</span><span class="p">,</span> <span class="n">PINTERFACE_RECORD</span> <span class="n">ifp</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// ...</span> <span class="n">RemoveEntryList</span><span class="p">(</span><span class="o">&amp;</span><span class="n">ifp</span><span class="o">-&gt;</span><span class="n">list</span><span class="p">);</span> <span class="c1">// ...</span> <span class="p">}</span> </code></pre></div></div> <p>The calls to <code class="language-plaintext highlighter-rouge">DeregisterInterface</code> from <code class="language-plaintext highlighter-rouge">DeregisterAllInterfaces</code> were failing, and bringing the entire system down with them! <code class="language-plaintext highlighter-rouge">RemoveEntryList</code> in particular was causing this, and figuring out why didn’t take too long: <code class="language-plaintext highlighter-rouge">DeregisterAllInterfaces</code> was removing an item from a list (using <code class="language-plaintext highlighter-rouge">RemoveHeadList</code>), and passing that item to <code class="language-plaintext highlighter-rouge">DeregisterInterface</code>… which then tried to remove it <em>again.</em> Pretty subtle, and it didn’t cause any issues in older versions of Windows, so I can understand how it went unnoticed.</p> <p>Fixing this was trivial - I simply added a flag parameter to <code class="language-plaintext highlighter-rouge">DeregisterInterface</code> to dictate whether <code class="language-plaintext highlighter-rouge">RemoveEntryList</code> should be called.<sup id="fnref:7" role="doc-noteref"><a href="#fn:7" class="footnote" rel="footnote">7</a></sup> From then on, I was free to power cycle the scanner with impunity. My next test: trying to run a scan.</p> <h4 id="i-can-not-has-pictures">I can (not) has pictures?</h4> <p>Considering how well things had gone so far, I was sure that scanning would <em>just work.</em> I launched TLX (remember that?) 
and was perplexed by what happened. TLX has 2 status indicators, one for “Scan” and one for “Save.” <em>Usually</em>, the “Scan” status would start off as “Initializing Scanner” and then quickly change to “Idle”, indicating it was all ready to go. In this test, however, it got stuck on “Initializing Scanner.” Thus began a journey down yet another rabbit hole.</p> <p>In the background, I had also been reversing <code class="language-plaintext highlighter-rouge">TLB.dll</code> in hopes of learning about the communication protocol. Thanks to the abundance of detailed error codes, I was able to assign names to a lot of functions, <em>including</em> some relevant to communication! I suspected that the initialization issue was related to communication, so I started looking for ways to observe the packets being exchanged.</p> <p>My first attempt at this was on the kernel driver side, as I already knew which function in the device driver was responsible for facilitating this type of communication. I added some code to print <a href="https://en.wikipedia.org/wiki/Hex_dump">hex dumps</a> of the packets, and this is what I saw:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Input buffer (client message): 04 03 10 00 85
Output buffer (scanner response): 07 02 10 00
Input buffer: 03 01 00
Output buffer: 07 02 00 09
</code></pre></div></div> <p>Without context this is meaningless, but let’s compare it to the opening of a session recorded on a <em>32-bit</em> system with the original drivers, using WinDbg for the logging this time:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Input buffer: 04 03 10 00 85
Output buffer: 07 02 10 00
Input buffer: 02 04 10 01 8f 00
Output buffer: 07 02 10 00
</code></pre></div></div> <p>Clearly, something strange happened after the first message/response exchange of my 64-bit test.
By reversing TLB’s communication code, I was able to determine the structure of the packets:</p> <table> <thead> <tr> <th>offset</th> <th>type</th> <th>name</th> <th>description</th> </tr> </thead> <tbody> <tr> <td>0</td> <td>byte (enum)</td> <td>type</td> <td>Packet type</td> </tr> <tr> <td>1</td> <td>byte</td> <td>count</td> <td>Packet data length</td> </tr> <tr> <td>2</td> <td>bytes</td> <td>data</td> <td>Packet data (<code class="language-plaintext highlighter-rouge">count</code> bytes, up to 34)</td> </tr> </tbody> </table> <p>The basic structure of the packet data is as follows:</p> <table> <thead> <tr> <th>offset</th> <th>type</th> <th>name</th> <th>description</th> </tr> </thead> <tbody> <tr> <td>0</td> <td>byte (enum)</td> <td>address</td> <td>Unknown purpose, but known possible values</td> </tr> <tr> <td>1</td> <td>bytes</td> <td>data</td> <td>Context-dependent. For scanner, the first byte of this section is a status code.</td> </tr> </tbody> </table> <p>Addresses are:</p> <table> <thead> <tr> <th>Address</th> <th>Name</th> </tr> </thead> <tbody> <tr> <td>0x10</td> <td>AD_HOST</td> </tr> <tr> <td>0x20</td> <td>AD_PICL</td> </tr> <tr> <td>0x22</td> <td>AD_BOOT_PICL</td> </tr> <tr> <td>0x24</td> <td>AD_PICM</td> </tr> <tr> <td>0x26</td> <td>AD_BOOT_PICM</td> </tr> <tr> <td>0x28</td> <td>unknown</td> </tr> <tr> <td>0x40</td> <td>AD_PICL_PLUS</td> </tr> <tr> <td>0x42</td> <td>AD_BOOT_PICL_PLUS</td> </tr> <tr> <td>0x44</td> <td>AD_PICM_PLUS</td> </tr> <tr> <td>0x46</td> <td>AD_BOOT_PICM_PLUS</td> </tr> </tbody> </table> <p>Scanner status codes are:</p> <table> <thead> <tr> <th>Code</th> <th>Meaning</th> </tr> </thead> <tbody> <tr> <td>0</td> <td>Success</td> </tr> <tr> <td>1</td> <td>Packet not acknowledged</td> </tr> <tr> <td>2</td> <td>Invalid packet</td> </tr> <tr> <td>3</td> <td>Invalid checksum</td> </tr> <tr> <td>4-6</td> <td>Something to do with USB?</td> </tr> <tr> <td>7</td> <td>Unknown: “EC_DRV_PacketHostErrorAlgo”</td> </tr> <tr> <td>8</td> 
<td>Success</td> </tr> <tr> <td>9</td> <td>Bus error</td> </tr> </tbody> </table> <p>This tells us that the <strong>4th byte</strong> of a packet from the scanner is a <strong>status code.</strong> If it’s 0 or 8, everything is fine, but if it’s <em>not</em>, something’s gone wrong. Let’s look at the 64-bit test again:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Input buffer (client message): 04 03 10 00 85
Output buffer (scanner response): 07 02 10 00
Input buffer: 03 01 00
Output buffer: 07 02 00 09
</code></pre></div></div> <p>The first exchange in this log is fine - the scanner replies with a status code of 0 - but in the <em>second</em> exchange, the scanner replies with a status code of 9. This indicates a “bus error.” Not only that, but <code class="language-plaintext highlighter-rouge">03 01 00</code> looks <em>nothing at all</em> like <code class="language-plaintext highlighter-rouge">02 04 10 01 8f 00</code>! What happened here?</p> <h4 id="the-perils-of-lying-to-the-os">The perils of lying to the OS</h4> <p>I was so perplexed by this issue that I went right back into my debugger and started tracing the events that unfolded upon launching TLX.
After collecting some data, I was able to identify the source of this weird packet: a function named <code class="language-plaintext highlighter-rouge">CiCmdComm::bDrvGetPpbDeviceReadyNL</code><sup id="fnref:8" role="doc-noteref"><a href="#fn:8" class="footnote" rel="footnote">8</a></sup>.</p> <div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="c1">// ...</span> <span class="k">if</span> <span class="p">(</span> <span class="n">CiCmdComm</span><span class="o">::</span><span class="n">bDrvOpen</span><span class="p">(</span><span class="k">this</span><span class="p">,</span> <span class="n">errorHandler</span><span class="p">)</span> <span class="p">)</span> <span class="p">{</span> <span class="k">if</span> <span class="p">(</span> <span class="n">address</span> <span class="o">==</span> <span class="n">AD_HOST</span> <span class="p">)</span> <span class="p">{</span> <span class="k">return</span> <span class="mi">1</span><span class="p">;</span> <span class="p">}</span> <span class="k">else</span> <span class="p">{</span> <span class="n">InBuffer</span><span class="p">.</span><span class="n">type</span> <span class="o">=</span> <span class="n">PH_READ_STATUS</span><span class="p">;</span> <span class="n">InBuffer</span><span class="p">.</span><span class="n">count</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="c1">// address is junk!!!</span> <span class="n">InBuffer</span><span class="p">.</span><span class="n">address</span> <span class="o">=</span> <span class="n">address</span><span class="p">;</span> <span class="c1">// start a busy loop, sending this packet repeatedly!</span> <span class="p">}</span> <span class="c1">// ...</span> <span class="p">}</span> </code></pre></div></div> <p>Note the comment about <code class="language-plaintext highlighter-rouge">address</code> being “junk” - recall that the <em>third</em> byte of every packet is an “address”, 
and <code class="language-plaintext highlighter-rouge">00</code> does <em>not</em> correspond to a valid address! The question is… where did <code class="language-plaintext highlighter-rouge">00</code> come from?</p> <p>Working my way down the stack, I found that <code class="language-plaintext highlighter-rouge">CiCmdComm::bDrvGetPpbDeviceReadyNL</code> was being called from a function named <code class="language-plaintext highlighter-rouge">CiCmdComm::bDrvWritePacketNL</code>:</p> <div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">__thiscall</span> <span class="n">CiCmdComm</span><span class="o">::</span><span class="n">bDrvWritePacketNL</span><span class="p">(</span><span class="n">CiCmdComm</span> <span class="o">*</span><span class="k">this</span><span class="p">,</span> <span class="n">CiErrorHandler</span> <span class="o">*</span><span class="n">errorHandler</span><span class="p">,</span> <span class="n">PPB_REQUEST</span> <span class="o">*</span><span class="n">requestPacket</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// ...</span> <span class="k">while</span> <span class="p">(</span> <span class="n">CiCmdComm</span><span class="o">::</span><span class="n">bDrvPacketExecuteNL</span><span class="p">(</span><span class="k">this</span><span class="p">,</span> <span class="n">errorHandler</span><span class="p">,</span> <span class="n">requestPacket</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">responsePacket</span><span class="p">)</span> <span class="p">)</span> <span class="p">{</span> <span class="c1">// ...</span> <span class="p">{</span> <span class="c1">// ...</span> <span class="k">if</span> <span class="p">(</span> <span class="o">!</span><span class="n">CiCmdComm</span><span class="o">::</span><span class="n">bDrvGetPpbDeviceReadyNL</span><span class="p">(</span><span class="k">this</span><span 
class="p">,</span> <span class="n">errorHandler</span><span class="p">,</span> <span class="n">requestPacket</span><span class="o">-&gt;</span><span class="n">address</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">responsePacket</span><span class="p">.</span><span class="n">status</span><span class="p">)</span> <span class="o">||</span> <span class="o">!</span><span class="n">CiCmdComm</span><span class="o">::</span><span class="n">bDrvPacketHandleErrorNL</span><span class="p">(</span> <span class="k">this</span><span class="p">,</span> <span class="n">errorHandler</span><span class="p">,</span> <span class="n">requestPacket</span><span class="o">-&gt;</span><span class="n">address</span><span class="p">,</span> <span class="n">requestPacket</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">responsePacket</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">v13</span><span class="p">,</span> <span class="o">&amp;</span><span class="n">v10</span><span class="p">)</span> <span class="p">)</span> <span class="p">{</span> <span class="k">break</span><span class="p">;</span> <span class="p">}</span> <span class="c1">// ...</span> <span class="p">}</span> <span class="c1">// ...</span> <span class="p">}</span> <span class="n">CiErrorHandler</span><span class="o">::</span><span class="n">LogError</span><span class="p">(</span><span class="n">errorHandler</span><span class="p">,</span> <span class="k">this</span><span class="o">-&gt;</span><span class="n">classId</span><span class="p">,</span> <span class="n">FN_bDrvWritePacketNL</span><span class="p">,</span> <span class="n">EC_PreviousError</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">0</span><span class="p">);</span> <span class="k">return</span> <span class="mi">0</span><span class="p">;</span> <span class="p">}</span> </code></pre></div></div> 
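<p>Before digging further, it helps to have a way to read these packets at a glance. Here’s a minimal sketch of a decoder based purely on the packet layout and address table described earlier. It assumes C++17, and every name in it (<code class="language-plaintext highlighter-rouge">DecodedPacket</code>, <code class="language-plaintext highlighter-rouge">decodePacket</code>, <code class="language-plaintext highlighter-rouge">isKnownAddress</code>, <code class="language-plaintext highlighter-rouge">isSuccessStatus</code>) is hypothetical - none of this comes from <code class="language-plaintext highlighter-rouge">TLB.dll</code> or the drivers.</p>

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical helper types and names, for illustration only.
// Layout assumed from the tables above: [type][count][address][data...]
struct DecodedPacket {
    std::uint8_t type;     // offset 0: packet type
    std::uint8_t count;    // offset 1: number of data bytes that follow
    std::uint8_t address;  // offset 2: first data byte
    std::optional<std::uint8_t> status;  // offset 3, if present (scanner responses)
};

constexpr std::size_t kMaxDataLength = 34;  // "up to 34" data bytes per packet

// Returns std::nullopt when the buffer is too short for the length it claims.
std::optional<DecodedPacket> decodePacket(const std::uint8_t* buf, std::size_t len) {
    if (buf == nullptr || len < 3) {
        return std::nullopt;  // need at least type, count, and address
    }
    const std::uint8_t count = buf[1];
    if (count < 1 || count > kMaxDataLength) {
        return std::nullopt;
    }
    if (len < static_cast<std::size_t>(count) + 2) {
        return std::nullopt;  // header promises more data than we were given
    }
    DecodedPacket packet{buf[0], count, buf[2], std::nullopt};
    if (count >= 2) {
        packet.status = buf[3];  // the "4th byte" of a scanner response
    }
    return packet;
}

// Addresses listed in the table above (0x28 is listed, but its name is unknown).
bool isKnownAddress(std::uint8_t address) {
    switch (address) {
        case 0x10: case 0x20: case 0x22: case 0x24: case 0x26:
        case 0x28: case 0x40: case 0x42: case 0x44: case 0x46:
            return true;
        default:
            return false;
    }
}

// Status codes 0 and 8 both mean success, per the status table.
bool isSuccessStatus(std::uint8_t status) {
    return status == 0 || status == 8;
}
```

<p>Under those assumptions, <code class="language-plaintext highlighter-rouge">07 02 10 00</code> decodes to a response with address <code class="language-plaintext highlighter-rouge">0x10</code> (AD_HOST) and a success status, while <code class="language-plaintext highlighter-rouge">07 02 00 09</code> decodes to an address that isn’t in the table and a status of 9 (bus error).</p>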
<p>At this point, I realized I was going to have to get more precise with my debugging if I wanted to figure out exactly where things were going wrong. My first revelation was that the call to <code class="language-plaintext highlighter-rouge">CiCmdComm::bDrvPacketExecuteNL</code> was <strong>corrupting the <code class="language-plaintext highlighter-rouge">requestPacket</code>!</strong></p> <p>In <code class="language-plaintext highlighter-rouge">CiCmdComm::bDrvWritePacketNL</code>, the <code class="language-plaintext highlighter-rouge">requestPacket</code> started out totally valid:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Stack[00001CDC]:04F8FD58 db 4 ; packet type: PH_CMD
Stack[00001CDC]:04F8FD59 db 3 ; data length: 3 bytes
Stack[00001CDC]:04F8FD5A db 10h ; address: AD_HOST
Stack[00001CDC]:04F8FD5B db 0 ; unknown
Stack[00001CDC]:04F8FD5C db 85h ; unknown
</code></pre></div></div> <p>but oddly, after <code class="language-plaintext highlighter-rouge">CiCmdComm::bDrvPacketExecuteNL</code> was called, the first 4 bytes had been zeroed out!</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Stack[00001CDC]:04F8FD58 db 0 ; packet type: PH_INVALID (0)
Stack[00001CDC]:04F8FD59 db 0 ; data length: 0 bytes
Stack[00001CDC]:04F8FD5A db 0 ; address: undefined (0)
Stack[00001CDC]:04F8FD5B db 0 ; unknown
Stack[00001CDC]:04F8FD5C db 85h ; unknown
</code></pre></div></div> <p>Next, I stepped through <code class="language-plaintext highlighter-rouge">CiCmdComm::bDrvPacketExecuteNL</code>, and discovered that the data corruption was happening <em>after the driver had finished processing the request.</em> <code class="language-plaintext highlighter-rouge">CiCmdComm::bDrvPacketExecuteNL</code> uses a function named <a href="https://docs.microsoft.com/en-us/windows/win32/api/ioapiset/nf-ioapiset-deviceiocontrol"><code class="language-plaintext
highlighter-rouge">DeviceIoControl</code></a> to send packets to the driver, and <em>immediately after</em> the call to <code class="language-plaintext highlighter-rouge">DeviceIoControl</code>, I observed the data corruption!</p> <div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">int</span> <span class="n">__thiscall</span> <span class="n">CiCmdComm</span><span class="o">::</span><span class="n">bDrvPacketExecuteNL</span><span class="p">(</span> <span class="n">CiCmdComm</span> <span class="o">*</span><span class="k">this</span><span class="p">,</span> <span class="n">CiErrorHandler</span> <span class="o">*</span><span class="n">errorHandler</span><span class="p">,</span> <span class="n">PPB_PACKET</span> <span class="o">*</span><span class="n">requestPacket</span><span class="p">,</span> <span class="n">PPB_PACKET</span> <span class="o">*</span><span class="n">responsePacket</span><span class="p">)</span> <span class="p">{</span> <span class="c1">// ...</span> <span class="k">if</span> <span class="p">(</span> <span class="o">!</span><span class="n">DeviceIoControl</span><span class="p">(</span> <span class="o">*</span><span class="k">this</span><span class="o">-&gt;</span><span class="n">pDeviceFileHandle</span><span class="p">,</span> <span class="mh">0x222090u</span><span class="p">,</span> <span class="c1">// IO control code for packet exchange</span> <span class="n">requestPacket</span><span class="p">,</span> <span class="c1">// Input buffer</span> <span class="n">requestPacket</span><span class="o">-&gt;</span><span class="n">count</span> <span class="o">+</span> <span class="mi">2</span><span class="p">,</span> <span class="c1">// Size of input buffer</span> <span class="n">responsePacket</span><span class="p">,</span> <span class="c1">// Output buffer</span> <span class="mh">0x40u</span><span class="p">,</span> <span class="c1">// Size of output buffer</span> <span class="o">&amp;</span><span 
class="n">numBytesReturned</span><span class="p">,</span> <span class="o">&amp;</span><span class="k">this</span><span class="o">-&gt;</span><span class="n">m_DriverOverlappedPPB</span><span class="p">)</span> <span class="p">)</span> <span class="p">{</span> <span class="c1">// error handling</span> <span class="p">}</span> <span class="c1">// ... a bunch of other stuff</span> <span class="p">}</span> </code></pre></div></div> <p>Notice the “size of output buffer” supplied to <code class="language-plaintext highlighter-rouge">DeviceIoControl</code> - <code class="language-plaintext highlighter-rouge">0x40</code>, or 64. Therein lies the problem - <strong>the structure used for <code class="language-plaintext highlighter-rouge">responsePacket</code> is only 36 bytes in length.</strong> As a result of this inconsistency, <em>something</em> (possibly the USB stack) was zeroing out memory that it really shouldn’t have, which ultimately led to the strange packet we saw!</p> <p>I have no clue how this code <em>ever</em> worked (thanks, implementation details…), but that’s beside the point. After making another one-byte patch to fix this, I was delighted to see TLX progress from “Initializing Scanner” all the way to “Idle” as I had hoped for!</p> <h4 id="the-final-frontier---scanning">The final frontier - scanning</h4> <p>It turned out that I wasn’t quite out of the woods yet - attempting to run a scan resulted in a mysterious error.</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>CN_CiScanner FN_bCalibrateEndDataFlow EC_WIN_GetOverlappedResult (177) The parameter is incorrect. 
CN_CiScanner FN_bCalibrateFindCorrections EC_PreviousError (25) 0 CN_CiScanner FN_bBeforeScan EC_PreviousError (25) 0 CN_Global FN_FuncScanPictures EC_PreviousError (25) 0 CN_Global FN_FuncScanPictures EC_PreviousError (25) 0 </code></pre></div></div> <p>My driver reported a similar issue:</p> <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>F135USB - RingPacketComplete:103 [ERROR]: Ring packet info has failing status: 80000300 (USBD_STATUS_INVALID_PARAMETER) </code></pre></div></div> <p>I couldn’t figure out where this status code was even <em>coming</em> from at first, but the name gave me a hint: it had something to do with the USB stack. I eventually realized that I had failed to recognize a certain data structure for what it really was, and that some of my driver code was almost certainly incorrect as a result. Ironically, this idea came to me just as I was going to sleep. When I got up the following morning I updated the driver to account for my discovery, crossed my fingers, and clicked the “scan” button for the millionth time…</p> <p>…and it worked! After the equivalent of a full work week dedicated to these custom drivers, they finally worked - not just on Windows 10, but Windows 11 too. Some extra tweaks were necessary to prevent various crashes, like one that only occurred when a USB 3.0 controller was being used, as well as to fix some issues detected by the <a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/devtest/driver-verifier">Driver Verifier</a>.</p> <p><img src="/assets/2022-08-28-pakon-reverse-engineering/example_test_scan2.jpg" alt="Image of a successful scan on Windows 11" /></p> <h2 id="the-end">The end</h2> <p>In the course of this article, the Pakon film scanners went from being a user’s nightmare to being totally usable with modern versions of Windows. 
While not all of the Pakon client software has stood the test of time - the PSI application, for example, isn’t 100% functional - this opens the door for so much more development, potentially including a new scanning client.</p> <p>There’s a lot that I <em>didn’t</em> discuss in this article, including a lot of the details of my reverse engineering process. In the future I might write an article specifically about that, as I learned a lot of valuable information about how the scanner really works that I think is well worth sharing.</p> <p>If you made it this far, I applaud you. Thanks for reading!</p> <hr /> <div class="footnotes" role="doc-endnotes"> <ol> <li id="fn:1" role="doc-endnote"> <p>I’m intentionally omitting discussion of the early days of film photography, partly because I’m not the most educated in that area, but also because we’re talking about minilabs here. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:2" role="doc-endnote"> <p>One such post-processing technique is known as “Digital ICE” (ICE stands for “Image Correction and Enhancement”), and it’s really cool. <a href="https://www.youtube.com/watch?v=E0LVjGp1Wtc">This excellent video</a> goes into more detail about how it works. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:3" role="doc-endnote"> <p>That’s right, we’re talking about C now. Don’t be afraid, it won’t get too horrible. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:4" role="doc-endnote"> <p>Anchor Chips was <a href="https://mergr.com/cypress-semiconductors-acquires-anchor-chips">acquired</a> by Cypress Semiconductor in 1999. Cypress Semiconductor was itself <a href="https://www.infineon.com/cms/en/about-infineon/press/press-releases/2020/INFXX202004-049.html">acquired</a> by Infineon Technologies in 2020. 
At this rate, <a href="https://xkcd.com/605/">we can expect</a> Infineon Technologies to be acquired in 2041. <a href="#fnref:4" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:5" role="doc-endnote"> <p>Older drivers typically use a function called <a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-exallocatepoolwithtag"><code class="language-plaintext highlighter-rouge">ExAllocatePoolWithTag</code></a> to perform memory allocations, with the “tag” being an identifier of some sort for the allocation. In the latest versions of Windows, this function is deprecated in favor of a new one called <a href="https://docs.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-exallocatepool2"><code class="language-plaintext highlighter-rouge">ExAllocatePool2</code></a>, which has a subtly different signature. I did not account for this at first, and as a result, every single memory allocation was doomed to fail. <a href="#fnref:5" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:6" role="doc-endnote"> <p>I didn’t realize what was coming. <a href="#fnref:6" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:7" role="doc-endnote"> <p>I later came up with a slightly more elegant solution, but elegance is not really important at this point. <a href="#fnref:7" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> <li id="fn:8" role="doc-endnote"> <p>My interpretation: “Drv” means “Driver”, “Ppb” refers to the protocol (this is heavily implied in various places), and “NL” means “no lock” (i.e., not using a mutex or spin lock or any form of concurrency control - this aligns with the code) <a href="#fnref:8" class="reversefootnote" role="doc-backlink">&#8617;</a></p> </li> </ol> </div>