Phillip Schanely (https://pschanely.github.io/feed.xml)

Handling Nondeterminism in CrossHair
2022-05-17 (https://pschanely.github.io/2022/05/17/handling-nondeterminism)

CrossHair v0.0.23 was released today with an important new capability for dealing with nondeterministic behavior. Let’s get into it.

Handling Nondeterminism

CrossHair will fail or even yield an incorrect result if you run it on nondeterministic code.

Here are some common things that cause nondeterminism:

  • Code that uses the current time, e.g. time.time().
  • Code that uses random numbers, e.g. random.randint(...).
  • Code that uses a cache, e.g. functools.lru_cache(...).
  • Code that reads data from the disk or network.

As you might imagine, it’s very common to use functions like these. Now, with the work that Loïc Montandon just integrated, we have ways to check properties even when nondeterministic behavior exists.

By identifying the core functions that contain the nondeterminism, we can now tell CrossHair to skip them and instead allow them to return any value.

One can also apply contracts to these functions to constrain the parameters and return value.
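As a sketch of the idea, a wrapper like the following (the wrapper name is mine, and the actual registration mechanism is described in the plugins docs) carries a contract in CrossHair's docstring syntax; once CrossHair skips the nondeterministic body, the postcondition is all it assumes about the return value:

```python
import time


def current_time() -> float:
    """
    Hypothetical wrapper around time.time(). If CrossHair is told to skip
    the nondeterministic body, the postcondition below is the only thing
    it assumes about the return value.

    post: __return__ >= 0.0
    """
    return time.time()
```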

What’s more, this capability is part of the plugin system, so you can create this kind of behavior for 3rd party modules in a reusable & shareable fashion.

You can read more about this capability in the updated plugins docs!

What’s Up Next?

  • I’m going to do a pass through the standard library and patch the most common sources of nondeterminism (time.time, random, etc) using Loïc’s work. You can follow that effort in this issue.
  • Separately, I’m making some progress refactoring hypothesis to better support the CrossHair integration. Importantly, this will give us access to a large number of properties to test CrossHair. And perhaps start a bug trophy list! You can follow this effort here.

As always, thanks for taking some time to share this journey with me. - Phil

Simple length estimation for variable-width fonts
2022-04-11 (https://pschanely.github.io/2022/04/11/relative-per-character-widths)

This post is much more mundane than CrossHair! But handy, I think.

It’s hard to compute the width of text in a font because the character “i” is likely much more narrow than the character “m”.

If you just want an estimate, you can count characters.

But can we get a better estimate without needing to know the font details?

Recently, I wanted to attempt per-character estimates, and wrote a quick script. Here they are:

{" ": 0.328, "!": 0.381, "\"": 0.475, "#": 0.655, "$": 0.655, "%": 1.069,
"&": 0.9, "'": 0.347, "(": 0.414, ")": 0.414, "*": 0.555, "+": 0.713,
",": 0.328, "-": 0.414, ".": 0.328, "/": 0.346, "0": 0.655, "1": 0.655,
"2": 0.655, "3": 0.655, "4": 0.655, "5": 0.655, "6": 0.655, "7": 0.655,
"8": 0.655, "9": 0.655, ":": 0.346, ";": 0.346, "<": 0.713, "=": 0.713,
">": 0.713, "?": 0.619, "@": 1.201, "A": 0.864, "B": 0.829, "C": 0.862,
"D": 0.897, "E": 0.793, "F": 0.724, "G": 0.931, "H": 0.897, "I": 0.381,
"J": 0.55, "K": 0.864, "L": 0.726, "M": 1.071, "N": 0.897, "O": 0.931,
"P": 0.758, "Q": 0.931, "R": 0.862, "S": 0.758, "T": 0.759, "U": 0.897,
"V": 0.864, "W": 1.173, "X": 0.864, "Y": 0.864, "Z": 0.759, "[": 0.381,
"\\": 0.346, "]": 0.381, "^": 0.583, "_": 0.655, "`": 0.347, "a": 0.619,
"b": 0.655, "c": 0.585, "d": 0.655, "e": 0.619, "f": 0.381, "g": 0.655,
"h": 0.655, "i": 0.312, "j": 0.312, "k": 0.621, "l": 0.312, "m": 1.0,
"n": 0.655, "o": 0.655, "p": 0.655, "q": 0.655, "r": 0.414, "s": 0.55,
"t": 0.346, "u": 0.655, "v": 0.621, "w": 0.897, "x": 0.621, "y": 0.621,
"z": 0.585, "{": 0.509, "|": 0.285, "}": 0.509, "~": 0.698}

Each value in this JSON map is a width relative to the width of a lowercase “m” character. (e.g. the “i” character is 0.312 times as wide as the “m” character)

This data is limited to printable ASCII characters. (the .afm files I found don’t have information for Unicode, sadly) But if you find it useful, feel free to use it in your project.
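Here is a sketch of how you might use the map (the function name is mine, and the dictionary below is abbreviated to a few entries from the full map above):

```python
# A few entries from the full relative-width map above:
CHAR_WIDTHS = {"i": 0.312, "l": 0.312, "m": 1.0, "w": 0.897, " ": 0.328}


def estimate_width(text: str, default: float = 0.655) -> float:
    """Estimate text width in units of one lowercase 'm' width.

    Characters missing from the map fall back to `default`
    (0.655, the width of the digits and many lowercase letters).
    """
    return sum(CHAR_WIDTHS.get(ch, default) for ch in text)


print(round(estimate_width("ill"), 3))  # 0.936: three narrow characters
print(estimate_width("mm"))             # 2.0: two full-width characters
```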

The Details

I easily found some metrics for a few fonts: “Helvetica” and “Times New Roman”. I computed a relative-width map for each, and averaged them to produce the map above. Hopefully combining a serif and a sans-serif font gives us some diversity.

This approach ignores a lot of details that matter, like letter spacing, word spacing, and ligatures.

How much better is it?

Digging up metrics for a third font, “Arial”, I computed the average per-character error compared to that font.

On average, the estimate from the map above differs from the actual width by 6%.

If you instead assume a constant character width, your estimate differs from the actual width by 17% on average.

Using the map is clearly an improvement, but it’s also no substitute for working with the real font you’re using.

CrossHair string support, semver, and formal methods
2022-04-01 (https://pschanely.github.io/2022/04/01/complete-string-support)

Hello friends. This is another bucket-of-updates kind of newsletter.

I’m hoping to write some more in-depth content soon (things like using CrossHair in practice, what CrossHair v1.0 looks like, a comparison with fuzzing). Let me know what you want to hear about too!

String support complete

Since v0.0.20, CrossHair has complete string support! CrossHair can reason about every single string method and every single regular expression feature (albeit with varying effectiveness).

Be sure to let me know how it goes if you try it out.

Contractual SemVer

SemVer is frequently criticized on the basis that every code change has the potential to break someone. We are then challenged to describe what “backwards-compatible” really means.

What if we used contracts to define “backwards-compatible?”

I’ve proposed a variant of SemVer, Contractual SemVer, to explore that idea.

CrossHair can help library authors and consumers using Contractual SemVer, too. See this example.
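To make the idea concrete, here is a toy example of my own (not taken from the proposal), written with contracts in CrossHair's docstring syntax:

```python
def clamp(x: int, lo: int, hi: int) -> int:
    """
    pre: lo <= hi
    post: lo <= __return__ <= hi
    """
    return max(lo, min(hi, x))


# Under a contract-based notion of compatibility, weakening the
# precondition (accepting more inputs) or strengthening the postcondition
# (promising more about the result) keeps existing callers working; the
# reverse direction can break them and would warrant a major version bump.
```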

Formal Methods and CrossHair

At present, CrossHair is a “bug finder” and not a verification tool: in particular, your code can easily have bugs that CrossHair does not discover. See this discussion for details.

However, Loïc Montandon is actively pushing CrossHair in the verification-tool direction. As a result of his work, CrossHair will:

  1. More accurately describe when CrossHair’s symbolic exploration has been exhaustive.
  2. Produce an exhaustive result in more situations.

I am really excited about this work. Stay tuned for the details.

CrossHair unicode and more
2021-10-13 (https://pschanely.github.io/2021/10/13/crosshair-unicode-and-more)

Plenty of things to talk about in this update. I’ll get right to it.

Full Unicode Support

In version 0.0.18, CrossHair can generate counterexample strings in full Unicode. Until now, Unicode support was probably the most glaring omission in CrossHair’s capability list.

It also has expanded symbolic support for regexes and various string methods. (full release notes)

Deal Support

CrossHair can now check contracts written in deal! Check out the details here.

One neat thing about deal in particular: it encourages you to tag side-effects of your functions. CrossHair uses some of these tags to avoid unintended side effects while checking contracts.

EuroPython Talk - Released!

All of the EuroPython recordings have been released!

I shared a talk about contracts, and you can skip ahead to my CrossHair section if you like.

One of the neat things about this section of the talk is that I get into some of the design consequences of developing with contracts specifically (as opposed to related strategies like fuzzing and property-based testing, which can also be CrossHair-facilitated).

Hacktoberfest

CrossHair has a few open issues marked for Hacktoberfest.

That said, one of the biggest ways to help CrossHair is to:

  1. Try using CrossHair while you work on other Hacktoberfest projects!
  2. Find CrossHair bugs. File issues.
  3. File a pull request against CrossHair with failing unit test(s) (with @pytest.mark.skip).
  4. Watch me joyously merge them and get your PR count even higher.

The Feels

Watching Python take the lead position in the TIOBE index this month, I am filled with optimism. Python, the language, has some properties that are particularly suited to the strengths of SMT solvers. (arbitrary-size integers, for one)

We just need to wield that strength.

Let’s get to work.

An early Hypothesis-CrossHair integration
2021-08-25 (https://pschanely.github.io/2021/08/25/hypothesis-crosshair-integration)

TLDR: As of today, CrossHair can check Hypothesis tests. (but it’s bad at it right now!)


The Story

I recently mentioned that property-based testing and contracts are similar.

This similarity has prompted a variety of great thought and discussions about whether CrossHair’s concolic execution engine could be applied to Hypothesis, including this longstanding github issue.

The idea of applying both fuzzing and symbolic approaches to the same set of specifications isn’t new, though these tend to be bleeding-edge projects (proptest in Rust can be both fuzzed and sometimes verified)!

Late last year, Zac Hatfield-Dodds and I brainstormed about this and dug into the challenges. This year, Matthew Law experimented with a hypothesis-strategy-introspection approach, which I’ll touch on below. And today, I’ve released a comprehensive-but-slow proof of concept (mostly in commits 1, 2, and 3).

Show it to me!

Sometimes, Hypothesis doesn’t pick the right inputs for your test; an example:

from hypothesis import given
import hypothesis.strategies as st

def round_to_millions(number):
    return ((number + 500_000) // 1_000_000) * 1_000_000

@given(st.integers())
def test_round(number):
    difference = abs(number - round_to_millions(number))
    assert difference < 500_000  # <- This should be "<=", not "<"

Hypothesis doesn’t readily guess that it needs to try a number like 500_000 to make this test fail.
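You can verify the boundary by hand: at exactly 500_000 (and at -500_000, the counterexample CrossHair reports below), the difference is exactly 500_000, so the strict < fails:

```python
def round_to_millions(number):
    return ((number + 500_000) // 1_000_000) * 1_000_000


# At the boundary, the rounding lands a full 500_000 away:
print(round_to_millions(500_000))                    # 1000000
print(abs(500_000 - round_to_millions(500_000)))     # 500000, which is not < 500_000
print(abs(-500_000 - round_to_millions(-500_000)))   # 500000 again
```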

But now you can run CrossHair on the same test file:

$ crosshair watch --analysis_kind=hypothesis test_rounding.py

And, after almost 3 minutes(!), it’ll find a counterexample:

I found an exception while running your function.
test_rounding.py:12:
|@given(st.integers())
|def test_round(number):
>    assert abs(number - round_to_millions(number)) < 500_000

AssertionError: 
when calling test_round(number = -500000)

Three minutes seems like a long time for a constraint solver to find this input!

In fact, CrossHair’s contract-based version of the same problem can be solved in seconds. What gives?

CrossHair is symbolically executing all of the hypothesis code which generates function inputs. That code is pretty sophisticated - it’s essentially parsing values out of a byte sequence. So CrossHair needs to do a lot of work before it even gets to the body of the test.

Some datatypes are harder to generate than others; for example, CrossHair cannot analyze Hypothesis string inputs in any reasonable timeframe.

What’s Next?

Matthew’s work above can help us detect some common cases when we don’t need to run the hypothesis input generation code. I think this makes a lot of sense in cases where this doesn’t add too much complexity to CrossHair or dependencies on Hypothesis internals - I’m hoping to incorporate at least some of this work soon.

A more ambitious arc is to introduce a new “mid-level” representation into Hypothesis itself! We’d add a layer above the byte string that can generate a fixed set of richer types: ints, strings, and floats.

The mid-level would have bi-directional transformations with the byte string format. Then, CrossHair can start at the mid-level representation; this lets us more directly apply a lot of the type-specific reasoning that SMT solvers are known for.

Zac has some ideas about how this mid-level representation could be added to Hypothesis. If you’re interested in helping to bring more effective symbolic reasoning to Hypothesis, we’d love your help - don’t hesitate to reach out!

Contracts Propagate Requirements
2021-07-30 (https://pschanely.github.io/2021/07/30/contracts-propogate-expectations)

Contracts and property-based testing have a lot of overlap: they both can be used to check arbitrary behaviors of your code. But contracts propagate, and that has big implications.

I was very happy to share a EuroPython talk with Marko Ristin-Kaufmann and Lauren De bruyn today. We introduced the concept of code contracts and two tools that can be used to check contracts, icontract-hypothesis and CrossHair.

One obvious difference between contracts and property-based tests: only contracts can be run in staging/production.

But I emphasized another important difference:

Unlike property-based tests, contract requirements propagate through your codebase.

I’ll explain with the example I used in my talk. Imagine we’re building an online shopping site, and need a function to compute the total price for an order:

from typing import Dict, List

class LineItem:
    item_id: str
    quantity: int

def compute_total(items: List[LineItem], prices: Dict[str, float]) -> float:
    total = 0.0
    for item in items:
        total += prices[item.item_id] * item.quantity
    return total

One might imagine that we want every order total to be greater than zero. It’s easy enough to make a postcondition for that (in the icontract syntax):

@ensure(lambda result: result > 0)
def compute_total(items: List[LineItem], prices: Dict[str, float]) -> float:
    ...

CrossHair quickly points out several ways to break this postcondition.

We might dutifully handle each of these cases. There are four of them:

# There is at least one item:
@require(lambda items: len(items) > 0)

# Each quantity is at least one:
@require(lambda items: all(i.quantity > 0 for i in items))

# Every item has a price:
@require(lambda items, prices: all(i.item_id in prices for i in items))

# Every price is greater than zero:
@require(lambda prices: all(p > 0 for p in prices.values()))

After we add all these preconditions, CrossHair is satisfied.
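Put together, the guarded function looks something like this (the preconditions are written as plain asserts here so the sketch stands alone without the icontract decorators):

```python
from typing import Dict, List


class LineItem:
    def __init__(self, item_id: str, quantity: int):
        self.item_id = item_id
        self.quantity = quantity


def compute_total(items: List[LineItem], prices: Dict[str, float]) -> float:
    # The four preconditions from above, as plain asserts:
    assert len(items) > 0                            # there is at least one item
    assert all(i.quantity > 0 for i in items)        # each quantity is at least one
    assert all(i.item_id in prices for i in items)   # every item has a price
    assert all(p > 0 for p in prices.values())       # every price is greater than zero
    total = 0.0
    for item in items:
        total += prices[item.item_id] * item.quantity
    return total
```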

This is an awful lot of work, and we didn’t find a single bug in our function. Was it worth it?

If this were a property-based test, probably not.

But as a contract, these new preconditions propagate to callers. And, via more contracts, to the callers of callers. This kind of propagation is very similar to what happens when you change a type: the change ripples throughout the system, causing you to notice all the other places that need to be changed.

Zooming out to a wider scale, each of the preconditions implies requirements for other parts of the system. Let’s look at each of these preconditions again, this time while “zoomed out”:

@require(lambda items: len(items) > 0)

You cannot check out if your cart is empty!

@require(lambda items: all(i.quantity > 0 for i in items))

Setting a quantity to zero should remove the entire line item!

@require(lambda items, prices: all(i.item_id in prices for i in items))

You can’t add something to a cart that doesn’t have a price!

@require(lambda prices: all(p > 0 for p in prices.values()))

When parsing prices from a feed or table, verify the prices are nonzero!

All of these requirements are important and could be easy to forget. Contracts, along with tooling like icontract-hypothesis and CrossHair, help us discover them.

Sometimes, we don’t want to propagate requirements, and that’s fine too. This comes back to a point that Marko made today - all these testing approaches (unit tests, property-based tests, contracts) are complementary.

UPDATE 2021-10-12: The talk recording has been released! You can jump to the CrossHair part here!

A little about how CrossHair works: Part 1
2020-02-04 (https://pschanely.github.io/2020/02/04/how-crosshair-works-part-1)

CrossHair is an ambitious project, and a lot of it seems pretty magical. It’s less magical than you might think, and Python does a lot of the heavy lifting for us.

This is part one in hopefully a series of posts about how CrossHair works.

I recommend the following reading as well, depending on your familiarity with other work in this space.

  • First, I can’t recommend the fuzzing book enough. CrossHair’s approach largely corresponds to the chapter on Concolic Fuzzing. Some parts of the Symbolic Fuzzing chapter are relevant as well. Unlike concolic execution, CrossHair’s values start each path execution as purely symbolic values, and do not get concrete values until they are needed.

  • CrossHair uses the Z3 SMT solver to perform its deductions. This Python-specific introduction is a good place to get a feel for what it does. Please note that I’ll say “Z3” below, but many statements about Z3 apply to all SMT solvers and/or the SMT-LIB language.

The Basic Idea

CrossHair just repeatedly calls the function you want to analyze and passes in special objects that behave like things your function expects.

CrossHair doesn’t do any kind of AST or bytecode analysis. It just calls your function.

These “special objects” behave differently on each execution. That’s how CrossHair explores different paths through your function.

These special CrossHair objects hold one or more Z3 expressions which are used in the Z3 solver. Here are some examples:

When your function takes a parameter with a given Python type, we supply an object of the corresponding CrossHair type, which holds an expression with the corresponding Z3 sort:

Python type | CrossHair type | Z3 sort
int         | SmtInt         | IntSort()
bool        | SmtBool        | BoolSort()
str         | SmtStr         | StringSort()
dict        | SmtDict        | ArraySort(K, V), plus IntSort() for the length

Let’s Explore

We can initialize a CrossHair object by giving it a name:

>>> crosshair_x = SmtInt('x')

We can access the .var attribute of any CrossHair object to get the Z3 variable(s) that it holds:

>>> crosshair_x.var
x
>>> type(crosshair_x.var)
<class 'z3.z3.ArithRef'>

This takes the Z3 variable we just defined and adds one to it:

>>> expr = crosshair_x.var + z3.IntVal(1)
>>> expr
x + 1
>>> type(expr)
<class 'z3.z3.ExprRef'>

We can create CrossHair objects not only for fresh variables, but also for Z3 expressions. So, if we wanted to wrap x + 1 back into a CrossHair object, we’d write:

>>> SmtInt(crosshair_x.var + z3.IntVal(1))

The SmtInt class defines the __add__ method so that you don’t have to spell that out, though. You can just say crosshair_x + 1, and SmtInt does the necessary unwrapping and re-wrapping:

>>> type(crosshair_x + 1)
<class 'crosshair.libimpl.builtinslib.SmtInt'>

SmtInt also defines the comparison methods so that they return symbolic booleans:

>>> type(crosshair_x >= 0)
<class 'crosshair.libimpl.builtinslib.SmtBool'>

The symbolic boolean holds an equivalent Z3 expression:

>>> (crosshair_x >= 0).var
0 <= x

So far, everything is symbolic. But eventually, the Python interpreter needs a real value; consider:

>>> if crosshair_x > 0:
...     print('bigger than zero')

Should this execute the print or not? When Python executes the if statement, it calls __bool__ on the SmtBool object. This method does something very special. It consults Z3:

  • If the Z3 boolean expression must be True (or False), just return that value.
  • Otherwise, decide it to be True or False randomly. Take that decision and add it to the set of Z3 constraints for this execution path. Return the (concrete) bool that we decided.
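Here is a highly simplified sketch of that __bool__ trick, with a random choice standing in for the Z3 consultation (toy code of my own, not CrossHair's actual classes):

```python
import random


class PathConstraints:
    """Records the branch decisions made along one execution path."""

    def __init__(self):
        self.decisions = []


path = PathConstraints()


class SymBool:
    """Toy symbolic boolean that decides its value lazily, like SmtBool."""

    def __init__(self, expr: str):
        self.expr = expr

    def __bool__(self):
        # CrossHair would first ask Z3 whether the expression is forced to
        # one value; here we just pick randomly and record the decision as
        # a path constraint for this execution.
        decision = random.choice([True, False])
        path.decisions.append((self.expr, decision))
        return decision


if SymBool("x > 0"):
    print("took the positive branch")
else:
    print("took the non-positive branch")
print(path.decisions)  # e.g. [('x > 0', True)]
```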

CrossHair will remember what decisions it has made so that it can make different decisions on future executions. Ultimately, we’re looking for some target thing to happen: an exception to be raised, or a postcondition to return False. When that happens, we ask Z3 for a model and report it as a counterexample.

That’s the core of how CrossHair works.

Simple, right?

Well, if CrossHair has one accomplishment, it’s that it tries hard to get the details right. And there are a lot of details.

Here are some of the topics I’m considering talking about next. Let me know which ones interest you the most!

  • Balancing the amount of work done inside and outside the solver.
  • Developing heuristics for effective path exploration.
  • Dealing with the cases that Z3 cannot handle. (concrete/symbolic scaling)
  • Interpreting logic that’s implemented in C.
  • Reconciling semantic differences between Python and Z3.
  • Dealing with mutable values.
  • Dealing with potentially aliased mutable values (x is y).
  • Creating symbolics for your custom classes.
  • Reconciling error behavior (ValueErrors, TypeErrors).
  • Implicitly converting types accurately.
  • Managing evaluation order. (under-approximation and over-approximation tactics)
  • Creating symbolics for base classes, or even for object.