Kai Kaufman’s tech blog

Bringing runtime checks to compile time in Rust

2023-04-20T02:30:00+00:00

Introduction

For the past couple of months, I’ve been participating in the MITRE Embedded Capture the Flag, or eCTF for short, with a team of my university peers. As the name suggests, the eCTF involves writing secure firmware for a Tiva C microcontroller, which presents some interesting challenges. Many teams chose to use C, likely because that’s what’s taught in most (if not all) university-level embedded systems courses. We wanted to live less dangerously, however, and we opted to use Rust instead.

Using Rust was more or less a no-brainer for us, because of the language’s strong memory-safety guarantees, excellent library ecosystem and developer experience, and… all the other good things about Rust. This isn’t an article about why Rust is the best thing ever - instead, we’re going to look at some real examples (from my team’s eCTF work) to see how Rust can be (and was) leveraged to enhance confidence in the correctness of code!

Quick primer

My implementations of compile-time checks relied on multiple Rust features, some more obscure than others:

Associated constants allow us to add constant members to types, either directly or indirectly through a trait. Think of these as the equivalent of (for example) static final class members in Java… but better, because they exist at compile time!
Const generics allow us to pass values, rather than types, as generic parameters. (Before const generics were implemented and stabilized, you couldn’t represent things such as “an array of any size” natively - only by using a library like generic_array.)
Const panics allow us to abort compilation if necessary, by panicking during constant evaluation. We even get to choose the error message!

With these three tools, a lot can be done.
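To make the three features a bit more concrete, here is a minimal sketch that uses all of them at once. The names (HasWordCount, Buffer, word_count) are made up for illustration and don’t come from the eCTF code.

```rust
/// Associated constant, declared through a trait.
trait HasWordCount {
    const WORDS: usize;
}

/// Const generic: the length `N` is a value, not a type.
struct Buffer<const N: usize> {
    bytes: [u8; N],
}

impl<const N: usize> HasWordCount for Buffer<N> {
    // Const panic: if `N` is not a multiple of 4, evaluating this constant
    // fails and the build stops with the message below.
    const WORDS: usize = {
        assert!(N % 4 == 0, "buffer length must be a multiple of 4");
        N / 4
    };
}

/// Reading the constant here is what forces it to be evaluated for a given `N`.
fn word_count<const N: usize>(_buf: &Buffer<N>) -> usize {
    <Buffer<N> as HasWordCount>::WORDS
}

fn main() {
    let buf = Buffer::<8> { bytes: [0; 8] };
    println!("{} words in {} bytes", word_count(&buf), buf.bytes.len());
    // Calling `word_count` on a `Buffer::<6>` would be rejected at build time.
}
```

Note that the check only fires for values of N that are actually used with word_count; merely naming the type Buffer::<6> is not enough.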

Other terms to know:

A reference is basically a pointer without the footguns. The Rust book goes into more detail. A reference to a type T is represented as &T.
An array is a fixed-length collection of items of a single type, usually represented as [T; N] (where T is the item type, and N is the length.) Arrays cannot be resized.
An array reference is simply a reference to an array, represented as &[T; N].
A slice (usually represented as [T], with T still being the item type) is a subsection of a larger collection, such as an array. Slices cannot be passed or stored directly - you can only work with references to them.
Something being infallible means that it cannot fail. If it does, that’s a bug.
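As a quick illustration of those terms, here is a small sketch with every type written out. The sum_all helper is hypothetical and only exists to show that a slice reference accepts input of any length.

```rust
/// Sums any run of `i32`s, however long: the parameter is a slice reference.
fn sum_all(items: &[i32]) -> i32 {
    items.iter().sum()
}

fn main() {
    // Array: the length is part of the type.
    let array: [i32; 4] = [10, 20, 30, 40];
    // Reference to a single item.
    let reference: &i32 = &array[0];
    // Array reference: the type still says there are 4 items.
    let array_ref: &[i32; 4] = &array;
    // Slice reference: the length is only known at runtime.
    let slice: &[i32] = &array[1..3];

    println!("first = {}, slice length = {}", reference, slice.len());
    // An array reference converts to a slice reference automatically,
    // but not the other way around.
    println!("{} {}", sum_all(array_ref), sum_all(slice));
}
```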

Example 1: Slicing and indexing arrays/array references

Shameless plug

I’ve polished and implemented the system described here as part of a library that I released recently. If you like what you see here, maybe consider checking out the more refined version. I cannot overstate how much time it saved me!

Background

My team’s firmware made extensive use of slicing and indexing in order to (de)serialize data, among many, many other things. Ironically, these two simple tasks are where things start to get complicated.

The first problem: slicing

Consider the following Rust program (which is valid and will compile):

fn main() {
    // yes, I know I don't need to specify the types explicitly.
    // there's a point to this :)
    let x: [i32; 6] = [1, 2, 3, 4, 5, 6];
    let y: i32 = x[0];
    let z /*: ??? */ = &x[4..6];
}

We haven’t specified the type for z, so it’s up to the compiler to figure it out. What I (and, I suspect, many new users) would expect is for a process like this to occur:

x (which we are slicing) is a [i32; 6].
The slice index is 4..6; in interval notation, that’s [4, 6), and in “list of numbers” that’s 4 and 5. So, we want 2 items starting from index 4.
The result of selecting 2 items should be, well, 2 items, or [i32; 2] in this case.
Since we’re only taking a reference instead of “moving” the actual values, z should be &[i32; 2].

Unfortunately, all we get is a &[i32] - a reference to a slice. If we try to explicitly specify the type as &[i32; 2] anyway, we get this error:

error[E0308]: mismatched types
 --> src/bin/blogtest.rs:6:24
  |
6 |     let z: &[i32; 2] = &x[4..6];
  |            ---------   ^^^^^^^^ expected `&[i32; 2]`, found `&[i32]`
  |            |
  |            expected due to this
  |
  = note: expected reference `&[i32; 2]`
             found reference `&[i32]`

The logical next step, then, is to try to convert this &[i32] into a &[i32; 2]. For infallible conversions, we can use the into method of the Into trait - since we know that the slice is 2 elements long, maybe into will work?

// let z: &[i32; 2] = &x[4..6];
let z: &[i32; 2] = &x[4..6].into();

Nope:

error[E0277]: the trait bound `[i32; 2]: From<&[i32]>` is not satisfied
 --> src/bin/blogtest.rs:6:33
  |
6 |     let z: &[i32; 2] = &x[4..6].into();
  |                                 ^^^^ the trait `From<&[i32]>` is not implemented for `[i32; 2]`
  |
  = help: the following other types implement trait `From<T>`:
            <[T; LANES] as From<Simd<T, LANES>>>
            <[bool; LANES] as From<Mask<T, LANES>>>
  = note: required for `&[i32]` to implement `Into<[i32; 2]>`

I suppose this makes sense - although we know that the conversion would be infallible in this case, we can’t say the same for the general case, and Into has a strict general-case infallibility requirement:

Note: This trait must not fail. If the conversion can fail, use TryInto. (source: Into documentation)

If we take the hint and try the rather cumbersome .try_into().unwrap(), it works:

// let z: &[i32; 2] = &x[4..6];
let z: &[i32; 2] = &x[4..6].try_into().unwrap();

     Running `target/debug/blogtest`
y = 1
z = [5, 6]

This works, but there are two issues:

It looks weird. If I know something can’t fail, why am I “trying” to do it?
It’s really easy to make a typo and end up crashing at runtime. (If you get either the length or the index range wrong, you get something called a TryFromSliceError with no extra information.)
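The second issue is easy to reproduce. In this sketch (the function names are made up), both functions compile, even though the second one asks for three items and promises two. The mistake only shows up as an Err value when the code actually runs:

```rust
use std::array::TryFromSliceError;
use std::convert::TryInto;

/// Correct: the range 4..6 covers exactly two items.
fn last_two(x: &[i32; 6]) -> Result<[i32; 2], TryFromSliceError> {
    x[4..6].try_into()
}

/// Typo: the range 3..6 covers three items, but the return type promises two.
/// The compiler accepts this without complaint.
fn last_two_with_typo(x: &[i32; 6]) -> Result<[i32; 2], TryFromSliceError> {
    x[3..6].try_into()
}

fn main() {
    let x = [1, 2, 3, 4, 5, 6];
    println!("{:?}", last_two(&x));
    // Calling `.unwrap()` on this result would crash the program.
    println!("{:?}", last_two_with_typo(&x));
}
```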

The second problem: indexing

Consider the following Rust program:

fn main() {
    let x: [i32; 6] = [1, 2, 3, 4, 5, 6];
    let y = x[8];
}

Interestingly, this doesn’t compile - we get an error about an “unconditional panic.”

error: this operation will panic at runtime
 --> src/bin/blogtest.rs:3:13
  |
3 |     let y = x[8];
  |             ^^^^ index out of bounds: the length is 6 but the index is 8
  |
  = note: `#[deny(unconditional_panic)]` on by default

I’m not entirely sure where this error comes from. A slightly different program will compile, but panic at runtime with the same exact “index out of bounds” message:

use std::ops::Index;

fn main() {
    let x: [i32; 6] = [1, 2, 3, 4, 5, 6];
    let y = x.index(8);
}

So far so good, as long as we stick to the normal [indexing] syntax. Now, what happens if we change x from an array to an array reference?

fn main() {
    let x: &[i32; 6] = &[1, 2, 3, 4, 5, 6];
    let y = x[8];
}

This compiles just fine, and panics at runtime (same message as before.) How unfortunate - it’s not like we’re missing any information here, since we know the length of the array behind x, and that tells us that index 8 doesn’t exist!

Sketching out a solution

Thankfully, there’s a solution to both issues: use our own index types! Rust allows us to do this by implementing the Index<Idx> trait, with Idx being our custom index type. If we define our own type, called (for example) CustomIndex, we can do something like this:

use std::ops::Index;

struct CustomIndex;
impl Index<CustomIndex> for [i32; 4] {
    type Output = i32;

    fn index(&self, _: CustomIndex) -> &Self::Output {
        todo!()
    }
}

Obviously this isn’t very useful - we’ve only implemented it for 4-element arrays of signed 32-bit integers, and indexing will always fail - but it’s a start.

There are a few constraints we should keep in mind:

There should not be any extra runtime cost when using our custom indexes.
Any checks that can be done at compile time should be done at compile time.
This all needs to be safe and sound.

Thankfully, all of these can be satisfied!

Design constraint #1: Aim for zero cost

As it turns out, an easy way to avoid storage costs is to make sure no extra data is being carried around. If we make use of zero-sized types (ZSTs for short), we can leave no trace of our custom indexing system!

For example, consider the following program:

struct MyZST<const A: usize, const B: usize>;

impl<const A: usize, const B: usize> MyZST<A, B> {
    pub fn sum(self) -> usize {
        A + B
    }
}

fn main() {
    let x = MyZST::<67, 96>.sum();
    println!("x = {}", x);
}

If we examine the optimized assembly using cargo-show-asm, we can see that the let x = ... line was translated into a single mov instruction:

$ cargo asm --bin blogtest 0 --rust
    Finished release [optimized] target(s) in 0.01s

.section .text.blogtest::main,"ax",@progbits
        .p2align        4, 0x90
        .type   blogtest::main,@function
blogtest::main:

                ...
                // blogtest.rs : 5
                A + B
        mov qword ptr [rsp], 163 <----- this is 67+96!
        ...

There is no evidence of MyZST’s existence, which is encouraging! Returning to our CustomIndex example, we might modify it like so:

use std::ops::Index;

struct CustomIndex<const I: usize>;
impl<const I: usize> Index<CustomIndex<I>> for [i32; 4] {
    type Output = i32;

    fn index(&self, _: CustomIndex<I>) -> &Self::Output {
        // Delegating to Index<usize>
        self.index(I)
    }
}

Of course, this still isn’t ideal - we’ve only implemented indexing for arrays of 4 32-bit signed integers. We can fix that, too:

use std::ops::Index;

struct CustomIndex<const I: usize>;
impl<T, const N: usize, const I: usize> Index<CustomIndex<I>> for [T; N] {
    type Output = T;

    fn index(&self, _: CustomIndex<I>) -> &Self::Output {
        // Delegating to Index<usize>
        self.index(I)
    }
}

Now we can use it on arrays of any size that contain items of any type:

fn main() {
    let x = [1u8, 2, 4, 8, 16, 32];
    let y = x[CustomIndex::<5>];
    println!("y = {}", y);
}

And if we check the generated assembly…

...
                // blogtest.rs : 22
                let y = x[CustomIndex::<5>];
        mov byte ptr [rsp + 7], 32

Perfect! We’re not getting in the way of any optimizations, so we can move on to the next constraint.
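Reading assembly isn’t the only way to convince yourself of this. As a small sanity check, std::mem::size_of can confirm that these types really occupy no space. The index_type_size helper below is hypothetical and only there to make the check reusable:

```rust
use std::mem::size_of;

struct MyZST<const A: usize, const B: usize>;
struct CustomIndex<const I: usize>;

/// Size in bytes of the index type for a given `I`.
fn index_type_size<const I: usize>() -> usize {
    size_of::<CustomIndex<I>>()
}

fn main() {
    // Const parameters live in the type, not in the value,
    // so neither struct has anything to store.
    println!("MyZST<67, 96>: {} bytes", size_of::<MyZST<67, 96>>());
    println!("CustomIndex<5>: {} bytes", index_type_size::<5>());
}
```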

Design constraint #2: Prefer compile-time checks

For this to be useful, bounds checking should happen at compile time rather than at runtime. Delegating to a lower-level implementation of Index (namely, Index<usize>) doesn’t help us one bit, since we’ll still panic at runtime if we try to perform an out-of-bounds access. A potential compile-time-checked version could look like this:

use std::ops::Index;

struct CustomIndex<const I: usize>;
impl<T, const N: usize, const I: usize> Index<CustomIndex<I>> for [T; N] {
    type Output = T;

    fn index(&self, _: CustomIndex<I>) -> &Self::Output {
        const RESULT: () = assert!(N > I, "Index is out of bounds!");
        unsafe { &*(self.as_ptr().add(I) as *const T) }
    }
}

Unfortunately, this doesn’t compile. The compiler doesn’t like that we’re using N and I, which are both considered generic parameters from an “outer function”, inside the definition of the RESULT constant. Oh well.

This is where type system hacking comes into play. Instead of doing the check within the index method, we can delegate it to something else… something that can make use of the generic parameters. For this, we introduce the concept of checker traits.

Since we know that we can panic in a const context, and we know that traits can have associated const members, and we know that traits support both normal and const generics, we can create an IsValidIndex trait that CustomIndex will implement, like so:

// Target is the collection type, i.e., [u32; 17].
trait IsValidIndex<Target> {
    const RESULT: ();
}

struct CustomIndex<const I: usize>;

impl<T, const I: usize, const N: usize> IsValidIndex<[T; N]> for CustomIndex<I> {
    const RESULT: () = assert!(N > I, "Index is out of bounds!");
}

This compiles without any trouble, and now all we have to do is integrate it into the index method…

impl<T, const N: usize, const I: usize> Index<CustomIndex<I>> for [T; N] {
    type Output = T;

    fn index(&self, _: CustomIndex<I>) -> &Self::Output {
        let _ = <CustomIndex<I> as IsValidIndex<Self>>::RESULT;
        unsafe { &*(self.as_ptr().add(I) as *const T) }
    }
}

Now we’ll try to compile this program:

fn main() {
    // notice that `x` is an array reference now
    let x = &[1u8, 2, 4, 8, 16, 32];
    let y = x[CustomIndex::<7>];
    println!("y = {}", y);
}

We get a compilation error! Our check worked.

error[E0080]: evaluation of `<CustomIndex<7> as IsValidIndex<[u8; 6]>>::RESULT` failed
  --> src/bin/blogtest.rs:18:24
   |
18 |     const RESULT: () = assert!(N > I, "Index is out of bounds!");
   |                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ the evaluated program panicked at 'Index is out of bounds!', src/bin/blogtest.rs:18:24

If we were to change CustomIndex::<7> to 7, the program would compile but crash at runtime.
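A note for readers on newer toolchains: inline const blocks (stable since Rust 1.79, which is newer than this post) are allowed to mention the surrounding generic parameters, so the check can sit directly inside index without a checker trait. The following is a sketch of that alternative, not the approach used in the eCTF firmware. It also uses ordinary safe indexing instead of the pointer arithmetic shown earlier, on the assumption that the optimizer can remove the redundant runtime check.

```rust
use std::ops::Index;

struct CustomIndex<const I: usize>;

impl<T, const N: usize, const I: usize> Index<CustomIndex<I>> for [T; N] {
    type Output = T;

    fn index(&self, _: CustomIndex<I>) -> &Self::Output {
        // Evaluated at build time for each (N, I) pair that is actually used.
        const { assert!(I < N, "Index is out of bounds!") };
        // The assertion above guarantees that this cannot panic.
        &self.as_slice()[I]
    }
}

fn main() {
    let x = &[1u8, 2, 4, 8, 16, 32];
    let y = x[CustomIndex::<5>];
    println!("y = {}", y);
    // `x[CustomIndex::<7>]` would be rejected at build time.
}
```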

We can go through this whole process again to create a better system for slicing. I won’t reiterate all the concepts involved, but here’s a quick and easy implementation:

trait IsValidRangeIndex<Target> {
    const RESULT: ();
}

pub struct CustomRangeIndex<const START: usize, const LENGTH: usize>;

impl<T, const START: usize, const LENGTH: usize, const N: usize> IsValidRangeIndex<[T; N]> for CustomRangeIndex<START, LENGTH> {
    const RESULT: () = assert!(N >= START + LENGTH, "Ending index is out of bounds!");
}

impl<T, const START: usize, const LENGTH: usize, const N: usize> Index<CustomRangeIndex<START, LENGTH>> for [T; N] {
    type Output = [T; LENGTH];

    fn index(&self, _: CustomRangeIndex<START, LENGTH>) -> &Self::Output {
        let _ = <CustomRangeIndex<START, LENGTH> as IsValidRangeIndex<Self>>::RESULT;
        unsafe { &*(self.as_ptr().add(START) as *const [T; LENGTH])}
    }
}

Okay, maybe I lied about it being “quick and easy”, but it’s probably about as simple as you can get. By using this, you can write the following program, which will compile and run successfully:

fn main() {
    let x = &[0x01, 0x03, 0x00, 0x00, 0x88, 0x77, 0x66, 0x55];
    let y = u16::from_le_bytes(x[CustomRangeIndex::<4, 2>]);
    assert_eq!(y, 0x7788);
}

You won’t win any code golf contests with this, but I think using something like CustomRangeIndex is a lot nicer than sprinkling .try_into().unwrap() everywhere.

Design constraint #3: Safe and sound

The nice thing about compile-time checks is that as long as we do them correctly, we don’t really have to worry about creating unsound code. By design, our custom indexing system is 100% safe and perfectly sound. Both of these guarantees come from the compile-time checking - we won’t compile code that could access data out of bounds, and there’s no way to bypass the checks and create undefined behavior.

Example 2: Enforcing data alignment requirements

My team’s eCTF firmware made extensive use of the microcontroller’s embedded EEPROM as a persistent data store. Like many pieces of hardware, though, this EEPROM had some particular requirements that we needed to respect:

All reads and writes had to be at 4-byte aligned addresses (0x0, 0x4, 0x8, etc, all the way up to 0x800.)
All read and write sizes had to be multiples of 4 bytes. Reading or writing 5 bytes, for example, was not allowed - 8 was the next valid size after 4.

Both of these problems were addressed using the same checker trait technique. For example, to prevent ourselves from accidentally reading or writing a data type of the wrong size, I created an IsEEPROMCompatible trait that would produce a compilation error if the size of the implementing type was not a multiple of 4 bytes:

trait IsEEPROMCompatible {
    const RESULT: ();
}

impl<T: Sized> IsEEPROMCompatible for T {
    const RESULT: () = {
        if core::mem::size_of::<T>() % 4 != 0 {
            panic!("the size of this type is not a multiple of the EEPROM word size (4 bytes)");
        }
    };
}

We could then use it in our EEPROM interaction code and be secure in the knowledge that any incorrect interaction would not compile!

impl<Inner> EEPROMVar<Inner>
where
    Inner: Sized + PartialEq,
{
    pub fn new<const ADDRESS: u32>() -> Self {
        // Safety check #1: ensure the data type is compatible with
        // being read from/written to EEPROM, i.e., its size is
        // a multiple of 4 bytes.
        let _ = <Inner as IsEEPROMCompatible>::RESULT;

        // ...
    }
}

Coincidentally, implementing this check highlighted a previously unnoticed bug in our code: we were, in fact, trying to do an unaligned read from EEPROM, which would have otherwise failed at runtime.
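The address requirement can be handled in the same way. What follows is a sketch rather than the firmware’s actual code: IsValidEepromAddress, EepromAddress and checked_address are hypothetical names, and treating 0x800 as an inclusive upper bound is an assumption based on the requirement as listed earlier.

```rust
trait IsValidEepromAddress {
    const RESULT: ();
}

/// Zero-sized carrier for an address known at compile time (hypothetical).
struct EepromAddress<const ADDRESS: u32>;

impl<const ADDRESS: u32> IsValidEepromAddress for EepromAddress<ADDRESS> {
    const RESULT: () = {
        assert!(ADDRESS % 4 == 0, "EEPROM addresses must be 4-byte aligned");
        // Assumption: 0x800 itself is treated as in range.
        assert!(ADDRESS <= 0x800, "EEPROM address is out of range");
    };
}

/// Returns the address unchanged, but only builds if the checks pass.
fn checked_address<const ADDRESS: u32>() -> u32 {
    let _ = <EepromAddress<ADDRESS> as IsValidEepromAddress>::RESULT;
    ADDRESS
}

fn main() {
    println!("reading from {:#x}", checked_address::<0x10>());
    // `checked_address::<0x11>()` would be rejected at build time.
}
```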

Conclusions

While Rust is certainly a significant improvement over older languages (such as C) in terms of safety and developer experience, the language lends itself to far more than what is commonly advertised. This is also where the language’s relative immaturity begins to become more obvious, as awkward workarounds are necessary to deal with language limitations. (Try to implement Index with a custom index type for both [T] and [T; N]. You can’t, because an implementation for [T; N] is automatically generated if an implementation for [T] exists, and it’s seemingly impossible to override it.)

Despite these difficulties, when all the pieces fall into place it’s quite remarkable what can be achieved with the language’s powerful type system and constant evaluation mechanism. As the language matures and features such as const generics are (hopefully) improved upon, I look forward to seeing what new tricks can be implemented to help make writing good, safe code even easier.

Reviving the coolest scanner you’ve never heard of

2022-09-04T01:18:00+00:00

Introduction

Today’s digital cameras are nothing short of incredible when it comes to ease-of-use and image quality, and many take them for granted, myself included. For those of us in “generation Z”, though, it’s all too easy to be ignorant of what came before this now ubiquitous technology.

A semi-brief¹ history

The rise of Photo CD

In the ’90s and early 2000s, getting film developed wasn’t exactly something for the average person to do at home. Instead, “minilabs” were a popular destination - you could take your film negatives to the local pharmacy, and in a reasonable amount of time, you’d have prints of your pictures!

Now, prints are wonderful, but in 1990 Kodak came up with a better idea - digitizing photos and putting them on CDs. Thus began the era of the aptly named “Photo CD.” Photo CD was an entire line of products dedicated to the digitization of photos, including film scanners and special Photo CD players. Photo CD enjoyed some popularity for a number of years, but ultimately faded away due to its various issues (much like Kodak’s other attempts at breaking into the digital photography industry.)

A new player enters the field

While Kodak’s Photo CD system languished, a relatively obscure company named Pakon was busy working on its own film scanner. It’s surprisingly difficult to find useful, confirmed information about Pakon, but public records and their first film scanner patent indicate that their scanner work likely began in the early ’90s.

Sometime around 2001, Pakon was acquired by Kodak. In the following years, Kodak would go on to release several models of Pakon film scanners - the F-135, F-235, F-335 and all their variants, collectively referred to as “F-X35.” These scanners came with a comprehensive software package for use in minilabs, and an SDK even existed to facilitate the development of specialized clients.

The F-X35 scanners boasted high performance, great image quality and post-processing techniques², and relative ease of use. To say that they were popular would be an understatement - they found their way into minilabs all over the United States, including those in such major pharmacy chains as CVS and Walmart.

Added by popular demand: Here’s a picture of the scanner that I have access to: a Pakon F135 Plus. Yes, I know it says Kodak, but did you know that some units say Nexlab? It’s only slightly confusing.
Film can be loaded into the feeding mechanism on the right side of the scanner (see the arrow pointing up), and motors pull it through until it comes out on the left side (see the arrow pointing down.) The F235 and F335 have a completely different design, and I don’t have either one to take pictures of - I recommend searching for them on Google Images.

The fall (and rise) of Pakon

Sadly, Pakon’s business met an untimely doom as the digital photography industry became dominant, and the company filed for bankruptcy in 2012. While there was still some demand for film scanning, there wasn’t enough demand, and pharmacy minilabs were scaled down. Amazingly, the scanners didn’t go to waste - hobbyists picked them up for pennies on the dollar, and a new community was formed. Today, a Facebook group dedicated to the Pakon scanner line boasts over 6,000 members, with a fair amount of weekly activity.

Pakon, today

With the software readily available, and the hardware being available as well (albeit for a pretty penny - some scanners go for as much as $2500), one would assume that the story ends here. Surprisingly, it doesn’t - for reasons we’ll explore together, the final version of the software is only usable on 32-bit Windows XP! To make matters worse, setting it up tends to be extraordinarily difficult for anyone who doesn’t have a sacrificial computer - in that case, a virtual machine is necessary, which makes everything more complicated and extremely error-prone.

Why does this even matter?

I’ll be honest - I’m not a photographer, and I personally have no reason to use a Pakon scanner. I happen to have access to one, though, and when I learned about its status as a user’s nightmare, I felt like taking a look. As a college student who is particularly interested in software preservation and reverse engineering, these scanners appealed to me - it turns out that people still want to use them despite their difficult nature, and I was interested in making that easier.

“But why would anyone still want to use this?”, you might ask. Sure, there are other film scanners that come to mind - Epson makes their own, Nikon used to make their Coolscan scanners, and there are countless others available. Any of these options is perfectly fine, and in fact, some are even better than the Pakon in terms of technical specifications. Where Pakon scanners truly shine is in their seamlessness - unlike other scanners that require film rolls to be pre-cut into strips, the Pakon will accept entire rolls, and does everything for you: detecting frame edges and cropping images appropriately, performing color correction that actually works, reading DX codes if they’re available, and finally, giving you a set of high-quality, effectively noise-free scans. It does all of this at a breakneck pace, too. While there are other film scanners, none that I’m aware of even come close to the Pakon’s convenience.

The ideal outcome for most of these users would be one where the Pakon scanner software is usable on modern versions of Windows, running on modern workstations - no more sacrificial laptops running 32-bit XP, and no more VMs that are slower than a tortoise.

Now, finally, let’s see how we can make that happen. Extremely technical content ahead!

Glossary

TLA, B and C: Client libraries for the F-235, F-135 and F-335 scanners respectively.

TLX: A wrapper around TLA, B and C. It is the Pakon scanner SDK. (TLA originally held this title, and then TLB and TLC came along.)

PSI: Pakon Scanning Interface. All-in-one desktop app for minilabs to run scans, make CDs, and do all sorts of other things.

TLXClientDemo: A much simpler interface that was meant to serve as a demo for the TLX SDK. It allows the user to control every setting and pretty much do whatever they want within the confines of the SDK. This app is also sometimes referred to as just “TLX”, which isn’t confusing at all!

User-mode: Refers to code running in “userland” - for simplicity’s sake, think of this as the desktop environment that you interact with and run applications in. Anything in userland is subject to various safety checks and interventions to ensure that the entire system can’t be brought down by a single program crashing.

Kernel-mode: Refers to code running at the kernel, or operating system level. User-mode safeguards don’t exist here, and many errors can cause the entire system to crash. On Windows, crashes of this nature trigger the notorious Blue Screen of Death, or BSOD.

Anything not explicitly defined here or anywhere else in this article is assumed to be known by those who choose to read further. If you don’t know, your search engine of choice should come in handy - but in general, I won’t throw anything too obscure at you without explaining it myself.

Getting to work - a compatibility investigation

What are we targeting?

Any external hardware that you want the operating system to be able to interact with requires a driver. At a high level, the driver is responsible for accepting commands from the operating system (often on behalf of the user) and processing them in some well-defined way. On Windows, devices exist as fake files (similar to Unix’s rule of “everything is a file”) that user-mode applications can interact with. These fake files are set up by device drivers.

Since Windows has an astonishing compatibility track record for “normal” software, and we’re dealing with some rather obscure and unique hardware that requires a special driver, we can guess that the driver is going to be the source of any issues that come up. To verify this guess, we just have to try running a scan on a 32-bit version of Windows that’s not XP. In my own tests, I went with Windows 10, and as soon as the scanning software started to actually do something…

(pause for dramatic effect)

The system crashed, shocking absolutely nobody, but greatly disappointing me. Just getting Windows to recognize the scanner’s very existence was a minor production, the details of which I’ve chosen to omit because of how boring they are in comparison to everything else. In the end, if it were as simple as manually installing some drivers, it’s very likely that someone would’ve figured it out years ago. Onward!

Isolating the problem

In order to determine why the entire system crashes, we’ll have to use a kernel debugger. I went with WinDbg, mainly because it actually works (which is more than can be said of certain other debuggers), but also because I’m already somewhat familiar with it from past kernel adventures.

After a bit of setup, my testing virtual machine was all set for debugging. A few minutes later, and round 2 of testing began…

REFERENCE_BY_POINTER (18)
Arguments:
Arg1: 00000000, Object type of the object whose reference count is being lowered
Arg2: 8d704d2c, Object whose reference count is being lowered
Arg3: 00000001, Reserved
Arg4: ee751000, Reserved

We got our first bug check! Bug checks normally lead to BSODs, but now that we’re using a kernel debugger, we can take a look around before restarting the system. A bit more analysis (powered by WinDbg’s !analyze command) reveals that a call to ObfDereferenceObject at F135usb2.sys+0x1db4 is to blame.

If you have no idea what that meant, I’m happy for you. You’ve spared yourself the immense mental pain that comes with trying to figure out how any of this actually works. Now it’s time to go down the first of many rabbit holes - what is ObfDereferenceObject, what is it meant to do, and why is it catastrophically failing?

Kernel lore - what are we looking at here?

First and foremost, for reasons that are unclear to me, ObfDereferenceObject is exposed to driver developers through a macro³ called ObDereferenceObject. I don’t understand what the point of this even is, but in any case, we can find the documentation for the macro here.

The ObDereferenceObject routine decrements the given object’s reference count and performs retention checks.
void ObDereferenceObject(
  [in]  a
);
[in] a: Pointer to the object’s body.

This doesn’t seem all that complicated. Before moving on, though, we should make sure we know what an “object” is in this context. According to Microsoft documentation:

An object is a data structure that represents a system resource, such as a file, thread, or graphic image. Your application can’t directly access object data, nor the system resource that an object represents. Instead, your application must obtain an object handle, which it can use to examine or modify the system resource. Each handle has an entry in an internally maintained table. Those entries contain the addresses of the resources, and the means to identify the resource type.

So, essentially, an “object” in the Windows kernel is just a resource of some sort that can have data associated with it. Furthermore, the “reference count” mentioned earlier is the number of times some bit of code has said “I care about this thing, don’t let it go away!” by using one of the functions in the ObReferenceObject family, which we’ll learn more about later.

Let’s move on - we’re taking the scenic route with figuring out this crash. The ending will shock you.

Reverse engineering the driver(s)

When trying to fix something, it’s often necessary to actually understand it. This is where reverse engineering, or “reversing” skills come in handy, and I was up to the task. Since I was doing my tests with an F-135 scanner, I focused on the F-135 drivers, consisting of 3 files:

F135usb2.sys: The device-specific driver.
F235Ldr.sys: A firmware loader driver that is shared by all scanner models.
F235Lib.sys: A “framework” driver that is also shared by all scanner models.

Recon

Whenever I reverse engineer software, my first goal is to identify code that came from somewhere else - whether that’s OpenSSL or the sample CD included in an obscure book, I want to know what I can avoid painstakingly reversing. This time was no different, and I got to work trying to figure out exactly what I was looking at.

First, I took a look at F235Ldr.sys. Interestingly, it came with a little bit of useful information:

This gave me a real lead to follow. A bit of additional research revealed that the “ezloader” component was part of the “EZ-USB” kit sold by a company called Anchor Chips⁴. Surprisingly, I was able to find one of these kits on eBay, and a few days later, it arrived:

Amusingly, the package had never actually been opened, and contained the original packing slip… with a date several years before I was born, addressed to a company that I don’t think exists anymore.

The package contained lots of hardware, some books, and best of all, two CDs with software and documentation! After a quick search, I found what I was looking for: the source code to ezloader.

The story of F235Ldr.sys doesn’t quite end here, though. I wanted to make sure that the code I got from the CD was the same as the code in the compiled driver. To do this, I would have to look at F235Ldr.sys under a microscope. My binary microscope of choice is IDA Pro with the Hex-Rays Decompiler.

After browsing around F235Ldr.sys in IDA, and comparing the code to the ezloader source code, pretty much everything seemed identical. There was one difference, however: the version provided by Anchor Chips requires the device firmware to be embedded into the compiled driver, while the Pakon version reads firmware from Intel HEX files. This was almost certainly done to facilitate the sharing of the driver among all of the different scanner models. No other significant code changes were made.

While reversing this driver from scratch would not have been difficult, I was happy to have saved some time.

My next target was F235Lib.sys, which proved to be somewhat more challenging to identify. It, too, had publisher information:

I have to say, I never would have thought that a driver called F235Lib would be the “F235 Usb 2.0 Library Driver.” Since this information is practically useless, we’ll once again have to examine the file under the microscope.

Upon loading F235Lib.sys into IDA, I noticed it exported several functions, as would be expected of a library:

I didn’t notice anything that obviously came from somewhere else, and looking at the list of strings in IDA (much like running the strings command on the file) didn’t reveal anything either. So I got to work reverse engineering all the different functions, aided somewhat by the presence of these exported function names - about half of the functions in the driver were named, and the other half were unknown.

I made a good amount of progress before I decided to make one last attempt at figuring out if there was more to the story. I searched for the name of one of the functions - GenericHandlePowerIoctl - and found a match on GitHub. I noticed the name “Walter Oney” at the top of the file:

// Power request handler for Generic driver
// Copyright (C) 1999 by Walter Oney
// All rights reserved
// @doc

#include "stddcls.h"
// ...

My next search was for “Walter Oney USB”, as I figured this individual had probably (read: definitely) worked on device drivers. That led me to a book called “Programming the Microsoft Windows Driver Model”, written by Oney and published by Microsoft as the official how-to guide for Windows driver programming.

A further search for “Programming the Microsoft Windows Driver Model” led me to the published sample code from the second edition of the book. And sure enough, there was an entire “generic” driver included, with all of the functions that I had found within F235Lib.sys. Although I was somewhat disappointed to find some differences between the published sample code and the compiled code I was looking at, I realized that they could all be explained by the passing of time - the sample code I had obtained was from the second edition of the book rather than the first, and for that reason was likely newer than the Pakon driver.

Despite this surprise, I was able to match almost all functions and all data structures in the compiled F235Lib.sys driver to the sample source code. The Hex-Rays decompiler really came in handy here. As an aside, just look at this sample decompilation:

NTSTATUS __stdcall SequenceCompletionRoutine(PDEVICE_OBJECT junk, PIRP Irp, PPOWCONTEXT Context)
{
  Context->status = Irp->IoStatus.Status;
  HandlePowerEvent(Context, AsyncNotify);
  IoFreeIrp(Irp);
  return STATUS_MORE_PROCESSING_REQUIRED;
}

and compare it to the original code:

NTSTATUS SequenceCompletionRoutine(PDEVICE_OBJECT junk, PIRP Irp, PPOWCONTEXT ctx)
	{							// SequenceCompletionRoutine
	(void)junk;
	ctx->status = Irp->IoStatus.Status;
	HandlePowerEvent(ctx, AsyncNotify);
	IoFreeIrp(Irp);
	return STATUS_MORE_PROCESSING_REQUIRED;
	}							// SequenceCompletionRoutine

The differences are so minor that they might as well not exist. There are plenty of excellent examples of the decompiler’s abilities, but this tangent has run its course already.

There really isn’t much else to cover for these 2 supplementary drivers - ultimately, they’re just repackaged and modified versions of code written by third parties.

Back to F135usb2

Now that we’ve dealt with the firmware loader and generic driver, it’s time to get back down to business and figure out why we were getting those pesky system crashes.

Here’s a reminder of what we were faced with earlier:

After a bit of setup, my testing virtual machine was all set for debugging. A few minutes later, and round 2 of testing began…
REFERENCE_BY_POINTER (18)
Arguments:
Arg1: 00000000, Object type of the object whose reference count is being lowered
Arg2: 8d704d2c, Object whose reference count is being lowered
Arg3: 00000001, Reserved
Arg4: ee751000, Reserved
…

A bit more analysis reveals that a call to ObfDereferenceObject at F135usb2.sys+0x1db4 is to blame.

What’s also worth noting about this bug check report is the value of “Arg4” - 0xee751000. According to WinDbg, this bug check can occur

when the object’s reference count drops below zero whether or not there are open handles to the object; in that case, [Arg4] contains the actual value of the pointer references count.

0xee751000, when interpreted as a signed 32-bit integer, is a negative number. My first thought was that a kernel structure was being corrupted, but I couldn’t find anything in the driver that could possibly cause such a thing to happen.
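For the skeptical, here’s a minimal C sketch of that reinterpretation. The helper name is made up for illustration; all it does is copy the bits of the raw value into a signed 32-bit integer.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret the bits of a 32-bit value as a signed integer. */
int32_t reinterpret_as_int32(uint32_t raw)
{
    int32_t value;
    memcpy(&value, &raw, sizeof value); /* copy the bits as-is, no conversion involved */
    return value;
}
```

Feeding it 0xee751000 yields -294318080, which is nowhere near a sane reference count.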

With F135usb2.sys under the binary microscope, let’s try to figure out what’s going on.

WARNING: Extremely technical content ahead! (But if you’ve made it this far, odds are you won’t be scared away now.)

According to IDA, the driver’s base address is 0x10000. This means that F135usb2.sys+0x1db4 refers to 0x10000+0x1db4, or 0x11DB4. Going to that address in IDA reveals the following relevant x86 assembly instructions:

.text:00011DB1 004 8D 4E 34         	lea     ecx, [esi+34h]  ; Object
.text:00011DB4 004 FF 15 7C 27 01 00	call    ds:ObfDereferenceObject

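The first instruction - lea ecx, [esi+34h] - takes the value of the esi register, adds hexadecimal 34 (decimal 52) to it, and stores the resulting address in the ecx register. Note that lea only computes the address; it does not read whatever is stored there. The second instruction calls the ObfDereferenceObject function, which receives its argument in ecx.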

Some very rough pseudo-C for these 2 instructions is ObfDereferenceObject(&esi_struct->field_0x34).

Now, figuring out what the problem was took me longer than I would have liked. After staring at this code for a fairly long time, though, I finally realized what was going on. The key was to understand what lies at esi+34h, as opposed to what ObfDereferenceObject expects to be given.

IDA has a useful feature called “immediate search”, where one can search for constants and offsets used in instructions. Searching for 0x34 revealed a block of code at F135usb2.sys+0x1cef that seemed to be accessing the same field.

If you’re wondering where all of the symbol (variable/field/function) names came from, they’re based on my multi-day reverse engineering effort. This particular driver is pretty much a bunch of prewritten code glued together - some from Walter Oney’s book, some from Anchor Chips - with some Pakon-specific code on top, which made it pretty easy to figure out what the original names likely were.

if ( ObReferenceObjectByHandle(
	pRingTail->EventScanPacketReady,
	2u,
	ExEventObjectType,
	ctx->RequestorMode,
	// `EventScanPacketReady` is the field at offset 0x34 in `ctx`. 
	// Corresponding instructions: 
	// 		lea     ecx, [esi+34h]
	//		push	ecx
	&ctx->EventScanPacketReady,
	0u) >= 0 )
{
	// do some stuff
}

This seems like pretty clear evidence that we found the right code. Earlier I mentioned the “ObReferenceObject family of functions”, and here we see one of them being used: ObReferenceObjectByHandle. Let’s read the friendly manual!

NTSTATUS ObReferenceObjectByHandle(
  [in]            HANDLE                     Handle,
  [in]            ACCESS_MASK                DesiredAccess,
  [in, optional]  POBJECT_TYPE               ObjectType,
  [in]            KPROCESSOR_MODE            AccessMode,
  [out]           PVOID                      *Object,
  [out, optional] POBJECT_HANDLE_INFORMATION HandleInformation
);

[in] Handle: Specifies an open handle for an object.

…

[out] Object: Pointer to a variable that receives a pointer to the object’s body. The following table contains the pointer types.

…

So, let’s get this straight:

The Object parameter is a “pointer to a variable that receives a pointer to the object’s body.” In other words, it is a pointer to a pointer, which allows the kernel to “fill in the blank”, so to speak.
The driver is passing &ctx->EventScanPacketReady as the Object parameter. This is totally fine and correct! ctx->EventScanPacketReady will be set by the kernel since we provided a pointer (or reference) to it.
When this is done, ctx->EventScanPacketReady (note the lack of &) will be a pointer to the event object’s body.
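If the pointer-to-a-pointer business is hard to picture, here is a tiny user-mode sketch of the same out-parameter pattern. Everything in it (the function, the fake object, the “only handle 1 is valid” rule) is invented for illustration and has nothing to do with the real kernel implementation.

```c
#include <stddef.h>

/* Hypothetical stand-in for an event object's body. */
int fake_event_body = 0;

/* `object` points at the caller's pointer variable, so the callee can fill it in. */
int fake_reference_by_handle(int handle, void **object)
{
    if (handle != 1 || object == NULL) {
        return -1;              /* pretend that only handle 1 is valid */
    }
    *object = &fake_event_body; /* "fill in the blank" */
    return 0;
}
```

A caller declares `void *event`, passes `&event`, and afterwards `event` holds the body pointer, while `&event` is still just the address of the caller’s own variable.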

Now, let’s look at the code that’s calling ObDereferenceObject:

void __stdcall ReleaseContextResources(PRWCONTEXT ctx)
{
  if ( ... )
  {
    ObDereferenceObject(&ctx->EventScanPacketReady);
    // ...
  }
}

and at the documentation for ObDereferenceObject…

The ObDereferenceObject routine decrements the given object’s reference count and performs retention checks.
void ObDereferenceObject(
  [in]  a
);
[in] a: Pointer to the object’s body.

Hold on a second. ObDereferenceObject wants a “pointer to the object’s body”, but we’re giving it &ctx->EventScanPacketReady… which is a pointer to the pointer to the object’s body. The system crashes because we’re pointing it at the wrong place. Remember this from earlier:

What’s also worth noting about this bug check report is the value of “Arg4” - 0xee751000. According to WinDbg, this bug check can occur

when the object’s reference count drops below zero whether or not there are open handles to the object; in that case, [Arg4] contains the actual value of the pointer references count.

0xee751000, when interpreted as a signed 32-bit integer, is a negative number.

Recall that we’re supposed to provide ObDereferenceObject with a “pointer to the object’s body.” Every kernel object has a header that comes immediately before the body, and the header stores the reference count! Because we were providing a pointer to the body pointer, rather than the body pointer itself, the kernel was reading the object header from the wrong place. This explains the weird reference count.
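To make that concrete, here’s a rough user-mode model of the mistake. The layouts below are invented and heavily simplified (the real object header is larger and has more fields); the only thing the sketch assumes is the idea described above, namely that the reference count sits at a fixed distance before the body.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified object: a small header directly before the body. */
typedef struct {
    int32_t pointer_count;  /* the reference count lives in the header */
    int32_t handle_count;
    int32_t body[2];        /* what a "pointer to the object's body" points at */
} fake_object;

/* Hypothetical context: two unrelated fields, then the stored body pointer. */
typedef struct {
    int32_t unrelated_a;
    int32_t unrelated_b;
    void   *event_body;
} fake_context;

#define FAKE_HEADER_SIZE offsetof(fake_object, body)

/* Layout assumption of this sketch: the pointer field starts one header past the struct start. */
_Static_assert(offsetof(fake_context, event_body) == FAKE_HEADER_SIZE,
               "layout assumption of this sketch");

/* Step back from the body to where the header should be and read the count. */
int32_t fake_read_pointer_count(const void *body)
{
    int32_t count;
    memcpy(&count, (const unsigned char *)body - FAKE_HEADER_SIZE, sizeof count);
    return count;
}
```

Calling `fake_read_pointer_count(ctx.event_body)` returns the object’s real count. Calling it with `&ctx.event_body` steps backwards from the wrong address and returns whatever happens to be in `ctx.unrelated_a` instead.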

Additionally, using WinDbg’s !object command on the supposed object body pointer (8d704d2c, from the bug check report) gives us a pretty damning error:

1: kd> !object 8d704d2c
8d704d2c: Not a valid object (ObjectType invalid)

It’s pretty obvious now that we’ve found the issue. For some reason, Windows XP’s implementation of ObfDereferenceObject doesn’t validate the object’s reference count, but that seems to have changed in Windows Vista. (This may be one of the only good parts of Vista.) Essentially, the only reason this code worked at all was an implementation detail.

The solution is almost infuriating in its simplicity: at F135usb2.sys+0x1db1, replace lea ecx, [esi+34h] with mov ecx, [esi+34h]. Thankfully, this can be done by changing a single byte: 8d 4e 34 changes to 8b 4e 34. (A more precise search-replace is: 83 7e 24 00 74 2a 8d 4e 34 -> 83 7e 24 00 74 2a 8b 4e 34)
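If you’d rather not poke at the file with a hex editor, the search-replace can be scripted. Below is a minimal sketch in C that works on a driver image already loaded into memory; the function name is hypothetical, and the “refuse unless the pattern appears exactly once” rule is my own safety assumption rather than anything the driver requires.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: turn `lea ecx, [esi+34h]` into `mov ecx, [esi+34h]`.
 * Returns the offset of the patched opcode byte, or -1 if the pattern
 * does not occur exactly once in the image. */
long patch_lea_to_mov(unsigned char *image, size_t size)
{
    static const unsigned char pattern[] = {
        0x83, 0x7e, 0x24, 0x00, 0x74, 0x2a, /* context bytes before the lea */
        0x8d, 0x4e, 0x34                    /* lea ecx, [esi+34h] */
    };
    const size_t n = sizeof pattern;
    size_t matches = 0;
    long opcode_offset = -1;

    for (size_t i = 0; i + n <= size; i++) {
        if (memcmp(image + i, pattern, n) == 0) {
            opcode_offset = (long)(i + 6); /* position of the 0x8d byte */
            matches++;
        }
    }
    if (matches != 1) {
        return -1; /* missing or ambiguous: leave the image untouched */
    }
    image[opcode_offset] = 0x8b; /* 8d (lea) becomes 8b (mov) */
    return opcode_offset;
}
```

Running it a second time on the same image returns -1, since the original byte sequence is gone after the first patch.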

In source code, this would amount to changing ObDereferenceObject(&ctx->EventScanPacketReady); to ObDereferenceObject(ctx->EventScanPacketReady);. That’s right - a single rogue ampersand is to blame for the catastrophic failure.

Re-install the patched driver, try a scan, and… it just works! Luckily, the exact same patch can be applied to the drivers for the other scanner models.

At this point, I was ready to release what I had and then call it quits. I realized that a 32-bit device driver had no chance of running on 64-bit Windows, and I figured this was likely the end of the road. I had a change of heart, though, and decided to give it a shot anyway.

The 64-bit journey

Much to the chagrin of many people who wish to use very old external devices that require special drivers, Windows doesn’t provide a compatibility layer for kernel drivers like it does for normal user-facing applications with WoW64. This means that all drivers must be compiled to run on a 64-bit system. One of the major obstacles to 64-bit support is dealing with the use of “pointer-precision” data - 32-bit systems use 32-bit pointers, but 64-bit systems use 64-bit pointers, and this difference in size has a whole bunch of cascading effects.

The “obvious” solution to this problem is to just decompile the driver and recompile it, but that’s much easier said than done. It requires a complete reverse-engineering of the original code, especially anything that will end up using pointer-precision data somehow (for example, structures that store pointers). Not one to shy away from a seemingly impossible challenge, I decided to give it a shot.

Rebuilding the firmware loader driver

My first target was the firmware loader driver - F235Ldr.sys. My goal with each driver was to get as much code to cleanly decompile as possible, and clean up the rest manually. The loader was relatively easy to deal with, as it only had 24 fairly simple functions.

Interestingly, some of the “Ezusb” functions differed slightly from the sample code that came in my kit - I guess the Pakon developers were working with a later revision. Regardless, within a couple of days I had a 64-bit version of the firmware loader that compiled successfully. After fixing a minor mistake that came about while I was trying to get rid of some deprecation warnings⁵, the driver sprang into action and successfully downloaded firmware to the scanner!

Rebuilding the device driver

As we discussed earlier, the Pakon device drivers consist of 2 components: Walter Oney’s “generic” driver (F235Lib.sys) and the actual device-specific driver (F*35usb2.sys). I felt like doing things a bit differently - instead of having 2 separate drivers, I chose to merge them. I also took the liberty of using the newer version of Oney’s code, since it was readily available and presumably⁶ a bit more refined than the Pakon version. After a lot more decompilation and cleanup, I ran a test…

F135USB - Configuring device from Pakon
F135USB - Product is F135-USB Film Scanner
F135USB - Serial number is xxx-yyy-zz
F135USB - Device reports 3 endpoints
F135usb2 - To WORKING from STOPPED
F135usb2 - PNP Request (IRP_MN_QUERY_CAPABILITIES)
F135usb2 - PNP Request (IRP_MN_QUERY_PNP_DEVICE_STATE)
F135usb2 - PNP Request (IRP_MN_QUERY_DEVICE_RELATIONS)

Excellent! It seemed to be working - at least until I unplugged the scanner, at which point things started to go wrong. Turns out the Generic driver had a pretty subtle bug…

Unplug-and-pray

One of the many things the Generic driver takes care of is supporting Plug and Play. For the most part, this just works and I don’t need to worry about it. However, there was one thing that clearly didn’t work - whatever code ran as soon as the device was no longer available.

WinDbg helpfully informed me that something was going wrong with these two functions:

// Code is slightly modified from the original to remove some unimportant details
VOID DeregisterAllInterfaces(PGENERIC_EXTENSION pdx)
{
	// ...
	while (!IsListEmpty(&pdx->iflist))
	{
		PLIST_ENTRY list = RemoveHeadList(&pdx->iflist);
		PINTERFACE_RECORD ifp = CONTAINING_RECORD(list, INTERFACE_RECORD, list);
		DeregisterInterface(pdx, ifp);
	}
	// ...
}

VOID DeregisterInterface(PGENERIC_EXTENSION pdx, PINTERFACE_RECORD ifp)
{
	// ...
	RemoveEntryList(&ifp->list);
	// ...
}

The calls to DeregisterInterface from DeregisterAllInterfaces were failing, and bringing the entire system down with them! RemoveEntryList in particular was causing this, and figuring out why didn’t take too long: DeregisterAllInterfaces was removing an item from a list (using RemoveHeadList), and passing that item to DeregisterInterface… which then tried to remove it again. Pretty subtle, and it didn’t cause any issues in older versions of Windows, so I can understand how it went unnoticed.

Fixing this was trivial - I simply added a flag parameter to DeregisterInterface to dictate whether RemoveEntryList should be called.⁷ From then on, I was free to power cycle the scanner with impunity. My next test: trying to run a scan.
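The double removal is easy to reproduce outside the kernel. Below is a rough user-mode sketch with a hand-rolled circular doubly linked list. None of these names come from the WDK or from Oney’s code, and the “is this entry still linked?” check is my own addition for illustration; the flag parameter mirrors the fix described above.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct list_entry {
    struct list_entry *flink; /* next entry */
    struct list_entry *blink; /* previous entry */
} list_entry;

void list_init(list_entry *head) { head->flink = head->blink = head; }

void list_insert_tail(list_entry *head, list_entry *e)
{
    e->flink = head;
    e->blink = head->blink;
    head->blink->flink = e;
    head->blink = e;
}

/* An entry is properly linked only if both neighbours point back at it. */
bool list_is_linked(const list_entry *e)
{
    return e->flink->blink == e && e->blink->flink == e;
}

/* Unlinks e from its neighbours; e's own pointers are left stale. */
void list_unlink(list_entry *e)
{
    e->blink->flink = e->flink;
    e->flink->blink = e->blink;
}

list_entry *list_remove_head(list_entry *head)
{
    list_entry *e = head->flink;
    list_unlink(e);
    return e;
}

/* Sketch of the flag-based fix: the caller says whether e is still on the list.
 * Returns false if asked to unlink an entry that is no longer linked. */
bool deregister_entry(list_entry *e, bool still_on_list)
{
    if (still_on_list) {
        if (!list_is_linked(e)) {
            return false; /* unlinking again would write through stale pointers */
        }
        list_unlink(e);
    }
    return true;
}
```

After `list_remove_head`, the removed entry still points at its old neighbours, but they no longer point back at it. That is exactly the state in which a second removal scribbles over the list.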

I can (not) has pictures?

Considering how well things had gone so far, I was sure that scanning would just work. I launched TLX (remember that?) and was perplexed by what happened. TLX has 2 status indicators, one for “Scan” and one for “Save.” Usually, the “Scan” status would start off as “Initializing Scanner” and then quickly change to “Idle”, indicating it was all ready to go. In this test, however, it got stuck on “Initializing Scanner.” Thus began a journey down yet another rabbit hole.

In the background, I had also been reversing TLB.dll in hopes of learning about the communication protocol. Thanks to the abundance of detailed error codes, I was able to assign names to a lot of functions, including some relevant to communication! I suspected that the initialization issue was likely related to communications, so I started looking for ways to see what communications were happening.

My first attempt at this was on the kernel driver side, as I already knew which function in the device driver was responsible for facilitating this type of communication. I added some code to print hex dumps of the packets, and this is what I saw:

Input buffer (client message): 04 03 10 00 85 
Output buffer (scanner response): 07 02 10 00 
Input buffer: 03 01 00
Output buffer: 07 02 00 09

Without context this is meaningless, but let’s compare it to the opening of a session recorded on a 32-bit system with the original drivers, using WinDbg for the logging this time:

Input buffer: 04 03 10 00 85
Output buffer: 07 02 10 00
Input buffer: 02 04 10 01 8f 00
Output buffer: 07 02 10 00

Clearly, something strange happened after the first message/response exchange of my 64-bit test. By reversing TLB’s communication code, I was able to determine the structure of the packets:

offset	type	name	description
0	byte (enum)	type	Packet type
1	byte	count	Packet data length
2	bytes	data	Packet data (`count` bytes, up to 34)

The basic structure of the packet data is as follows:

offset	type	name	description
0	byte (enum)	address	Unknown purpose, but known possible values
1	bytes	data	Context-dependent. For scanner, the first byte of this section is a status code.

Addresses are:

Address	Name
0x10	AD_HOST
0x20	AD_PICL
0x22	AD_BOOT_PICL
0x24	AD_PICM
0x26	AD_BOOT_PICM
0x28	unknown
0x40	AD_PICL_PLUS
0x42	AD_BOOT_PICL_PLUS
0x44	AD_PICM_PLUS
0x46	AD_BOOT_PICM_PLUS

Scanner status codes are:

Code	Meaning
0	Success
1	Packet not acknowledged
2	Invalid packet
3	Invalid checksum
4-6	Something to do with USB?
7	Unknown: “EC_DRV_PacketHostErrorAlgo”
8	Success
9	Bus error

This tells us that the 4th byte of a packet from the scanner is a status code. If it’s 0 or 8, everything is fine, but if it’s not, something’s gone wrong. Let’s look at the 64-bit test again:

Input buffer (client message): 04 03 10 00 85 
Output buffer (scanner response): 07 02 10 00 
Input buffer: 03 01 00
Output buffer: 07 02 00 09

The first exchange in this log is fine - the scanner replies with a status code of 0 - but in the second exchange, the scanner replies with a status code of 9. This indicates a “bus error.” Not only that, but 03 01 00 looks nothing at all like 02 04 10 01 8f 00! What happened here?
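As a sanity check on the packet layout, here’s a small C sketch that pulls the address and status out of a scanner response and checks them against the tables above. The type and function names are hypothetical (TLB.dll’s real ones are unknown to me), and treating a response as type, count, address, status is simply my reading of the reversed structure.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { PPB_HEADER_SIZE = 2, PPB_MAX_DATA = 34 };

/* Hypothetical summary of a scanner response. */
typedef struct {
    uint8_t type;
    uint8_t count;
    uint8_t address;
    uint8_t status;
} ppb_response_summary;

/* Address values taken from the table above (0x28 has no known name). */
bool ppb_address_is_known(uint8_t address)
{
    static const uint8_t known[] = {
        0x10, 0x20, 0x22, 0x24, 0x26, 0x28, 0x40, 0x42, 0x44, 0x46
    };
    for (size_t i = 0; i < sizeof known; i++) {
        if (known[i] == address) {
            return true;
        }
    }
    return false;
}

/* Status codes 0 and 8 both mean success. */
bool ppb_status_is_success(uint8_t status)
{
    return status == 0 || status == 8;
}

/* Parses type, count, address and status; rejects short or oversized packets. */
bool ppb_parse_response(const uint8_t *buf, size_t len, ppb_response_summary *out)
{
    if (len < PPB_HEADER_SIZE) {
        return false;
    }
    uint8_t count = buf[1];
    if (count < 2 || count > PPB_MAX_DATA) {
        return false; /* a response needs at least address + status */
    }
    if (len < (size_t)PPB_HEADER_SIZE + count) {
        return false; /* buffer is shorter than the packet claims */
    }
    out->type = buf[0];
    out->count = count;
    out->address = buf[2];
    out->status = buf[3];
    return true;
}
```

With the two responses from the 64-bit log, 07 02 10 00 parses to address 0x10 with status 0, while 07 02 00 09 parses to an unknown address (0x00) with status 9.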

The perils of lying to the OS

I was so perplexed by this issue that I went right back into my debugger and started tracing the events that unfolded upon launching TLX. After collecting some data, I was able to identify the source of this weird packet: a function named CiCmdComm::bDrvGetPpbDeviceReadyNL⁸.

  // ...
  if ( CiCmdComm::bDrvOpen(this, errorHandler) )
  {
    if ( address == AD_HOST )
    {
      return 1;
    }
    else
    {
      InBuffer.type = PH_READ_STATUS;
      InBuffer.count = 1;

      // address is junk!!!
      InBuffer.address = address;

      // start a busy loop, sending this packet repeatedly!
    }
    // ...
  }

Note the comment about address being “junk” - recall that the third byte of every packet is an “address”, and 00 does not correspond to a valid address! The question is… where did 00 come from?

Working my way down the stack, I found that CiCmdComm::bDrvGetPpbDeviceReadyNL was being called from a function named CiCmdComm::bDrvWritePacketNL:

int __thiscall CiCmdComm::bDrvWritePacketNL(CiCmdComm *this, CiErrorHandler *errorHandler, PPB_REQUEST *requestPacket)
{
  // ...
  while ( CiCmdComm::bDrvPacketExecuteNL(this, errorHandler, requestPacket, &responsePacket) )
  {
    // ...
    {
      // ...
      if ( !CiCmdComm::bDrvGetPpbDeviceReadyNL(this, errorHandler, requestPacket->address, &responsePacket.status)
        || !CiCmdComm::bDrvPacketHandleErrorNL(
              this,
              errorHandler,
              requestPacket->address,
              requestPacket,
              &responsePacket,
              &v13,
              &v10) )
      {
        break;
      }
      // ...
    }
    // ...
  }
  CiErrorHandler::LogError(errorHandler, this->classId, FN_bDrvWritePacketNL, EC_PreviousError, 0, 0, 0);
  return 0;
}

At this point, I realized I was going to have to get more precise with my debugging if I wanted to figure out exactly where things were going wrong. My first revelation was that the call to CiCmdComm::bDrvPacketExecuteNL was corrupting the requestPacket!

In CiCmdComm::bDrvWritePacketNL, the requestPacket started out totally valid:

Stack[00001CDC]:04F8FD58 db 4       ; packet type: PH_CMD
Stack[00001CDC]:04F8FD59 db 3       ; data length: 3 bytes
Stack[00001CDC]:04F8FD5A db 10h     ; address: AD_HOST
Stack[00001CDC]:04F8FD5B db 0       ; unknown
Stack[00001CDC]:04F8FD5C db 85h     ; unknown

but oddly, after CiCmdComm::bDrvPacketExecuteNL was called, the first 4 bytes vanished!

Stack[00001CDC]:04F8FD58 db 0		; packet type: PH_INVALID (0)
Stack[00001CDC]:04F8FD59 db 0		; data length: 0 bytes
Stack[00001CDC]:04F8FD5A db 0		; address: undefined (0)
Stack[00001CDC]:04F8FD5B db 0		; unknown
Stack[00001CDC]:04F8FD5C db 85h		; unknown

Next, I stepped through CiCmdComm::bDrvPacketExecuteNL, and discovered that the data corruption was happening after the driver had finished processing the request. CiCmdComm::bDrvPacketExecuteNL uses the Win32 DeviceIoControl function to send packets to the driver, and immediately after the call to DeviceIoControl, I observed the data corruption!

int __thiscall CiCmdComm::bDrvPacketExecuteNL(
        CiCmdComm *this,
        CiErrorHandler *errorHandler,
        PPB_PACKET *requestPacket,
        PPB_PACKET *responsePacket)
{
  // ...
  if ( !DeviceIoControl(
          *this->pDeviceFileHandle,
          0x222090u,                // IO control code for packet exchange
          requestPacket,            // Input buffer
          requestPacket->count + 2, // Size of input buffer
          responsePacket,           // Output buffer
          0x40u,                    // Size of output buffer
          &numBytesReturned,
          &this->m_DriverOverlappedPPB) )
  {
    // error handling
  }
  // ... a bunch of other stuff
}

Notice the “size of output buffer” supplied to DeviceIoControl - 0x40, or 64. Therein lies the problem - the structure used for responsePacket is only 36 bytes in length, so the call claims 28 bytes that don’t belong to it. As a result of this inconsistency, something (possibly the USB stack) was zeroing out memory that it really shouldn’t have, which ultimately led to the strange packet we saw!
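Here is a rough user-mode model of what that mismatch does. The “exchange” function is a made-up stand-in for whatever was filling the output buffer (I never pinned down which component it was), and the frame layout in the usage note is chosen purely so that the result matches the dump above.

```c
#include <stddef.h>
#include <string.h>

enum { RESPONSE_SIZE = 36, CLAIMED_SIZE = 0x40 };

/* Hypothetical stand-in for the callee: clears the whole buffer it was
 * told about, then writes a 4-byte reply at the start. */
size_t fake_exchange(unsigned char *out, size_t claimed_size)
{
    static const unsigned char reply[] = { 0x07, 0x02, 0x10, 0x00 };
    memset(out, 0, claimed_size);     /* trusts the caller's size */
    memcpy(out, reply, sizeof reply);
    return sizeof reply;
}

/* How many bytes past the real buffer the callee is allowed to touch. */
size_t overrun_bytes(size_t real_size, size_t claimed_size)
{
    return claimed_size > real_size ? claimed_size - real_size : 0;
}
```

Picture a 96-byte scratch frame with the response buffer at offset 0 and the request packet 04 03 10 00 85 at offset 60 (an assumed layout). Calling `fake_exchange` with CLAIMED_SIZE wipes the first four request bytes and leaves the 85 intact, just like the stack dump; calling it with RESPONSE_SIZE leaves the request alone.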

I have no clue how this code ever worked (thanks, implementation details…), but that’s beside the point. After making another one-byte patch to fix this, I was delighted to see TLX progress from “Initializing Scanner” all the way to “Idle” as I had hoped for!

The final frontier - scanning

It turned out that I wasn’t quite out of the woods yet - attempting to run a scan resulted in a mysterious error.

CN_CiScanner FN_bCalibrateEndDataFlow EC_WIN_GetOverlappedResult (177) The parameter is incorrect.
CN_CiScanner FN_bCalibrateFindCorrections EC_PreviousError (25) 0
CN_CiScanner FN_bBeforeScan EC_PreviousError (25) 0
CN_Global FN_FuncScanPictures EC_PreviousError (25) 0
CN_Global FN_FuncScanPictures EC_PreviousError (25) 0

My driver reported a similar issue:

F135USB - RingPacketComplete:103 [ERROR]: Ring packet info has failing status: 80000300 (USBD_STATUS_INVALID_PARAMETER)

I couldn’t figure out where this status code was even coming from at first, but the name gave me a hint: it had something to do with the USB stack. I eventually realized that I had failed to recognize a certain data structure for what it really was, and that some of my driver code was almost certainly incorrect as a result. Ironically, this idea came to me just as I was going to sleep. When I got up the following morning I updated the driver to account for my discovery, crossed my fingers, and clicked the “scan” button for the millionth time…

…and it worked! After the equivalent of a full work week dedicated to these custom drivers, they finally worked - not just on Windows 10, but Windows 11 too. Some extra tweaks were necessary to prevent various crashes, like one that only occurred when a USB 3.0 controller was being used, as well as to fix some issues detected by the Driver Verifier.

The end

In the course of this article, the Pakon film scanners went from being a user’s nightmare to being totally usable with modern versions of Windows. While not all of the Pakon client software has stood the test of time - the PSI application, for example, isn’t 100% functional - this opens the door for so much more development, potentially including a new scanning client.

There’s a lot that I didn’t discuss in this article, including a lot of the details of my reverse engineering process. In the future I might write an article specifically about that, as I learned a lot of valuable information about how the scanner really works that I think is well worth sharing.

If you made it this far, I applaud you. Thanks for reading!

¹ I’m intentionally omitting discussion of the early days of film photography, partly because I’m not the most educated in that area, but also because we’re talking about minilabs here.
² One such post-processing technique is known as “Digital ICE” (ICE stands for “Image Correction and Enhancement”), and it’s really cool. This excellent video goes into more detail about how it works.
³ That’s right, we’re talking about C now. Don’t be afraid, it won’t get too horrible.
⁴ Anchor Chips was acquired by Cypress Semiconductor in 1999. Cypress Semiconductor was itself acquired by Infineon Technologies in 2020. At this rate, we can expect Infineon Technologies to be acquired in 2041.
⁵ Older drivers typically use a function called ExAllocatePoolWithTag to perform memory allocations, with the “tag” being an identifier of some sort for the allocation. In the latest versions of Windows, this function is deprecated in favor of a new one called ExAllocatePool2, which has a subtly different signature. I did not account for this at first, and as a result, every single memory allocation was doomed to fail.
⁶ I didn’t realize what was coming.
⁷ I later came up with a slightly more elegant solution, but elegance is not really important at this point.
⁸ My interpretation: “Drv” means “Driver”, “Ppb” refers to the protocol (this is heavily implied in various places), and “NL” means “no lock” (i.e., not using a mutex or spin lock or any form of concurrency control - this aligns with the code).
