Integrate formal SMT-based methods by FrNecas · Pull Request #322 · diffkemp/diffkemp

FrNecas · 2024-03-18T18:48:44Z

This PR introduces an SMT-based comparison of short sequential snippets using the Z3 solver. The general approach can be described as follows:

When we find a difference and no pattern is applicable, we try to check equivalence using an SMT solver.
The differing code snippets to be analyzed are determined by searching for a synchronization point, i.e. a pair of instructions after which the code can be synchronized until the end of the basic block.
We construct a formula expressing the equivalence of the identified snippets: if the inputs are the same and the blocks are executed, the outputs of the blocks must be the same. Since checking satisfiability is a hard problem, we apply a timeout.
Furthermore, the extension supports the inverse-branching-condition pattern in more cases, e.g. it can detect a more complex change such as icmp slt %1, 101 being an inverse of icmp sgt %1, 100 if %1 is an integer.
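
To make the last point concrete, here is a minimal sketch (not the PR's implementation) that checks by brute force that the two comparisons are inverses of each other. It enumerates all 8-bit signed values instead of calling Z3; the helper name is_inverse and the 8-bit width are assumptions chosen for illustration only.

```python
def is_inverse(cond_a, cond_b, bits=8):
    """Return True if cond_a(x) == (not cond_b(x)) for every signed integer
    representable in the given number of bits (exhaustive check)."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return all(cond_a(x) != cond_b(x) for x in range(lo, hi + 1))


# Models of `icmp slt %1, 101` and `icmp sgt %1, 100` on plain integers.
slt_101 = lambda x: x < 101
sgt_100 = lambda x: x > 100

# A pair that is NOT an inverse: both conditions are false for x == 100.
slt_100 = lambda x: x < 100

if __name__ == "__main__":
    print(is_inverse(slt_101, sgt_100))
    print(is_inverse(slt_100, sgt_100))
```

An SMT solver answers the same question symbolically, so it does not need to enumerate values and scales to 32- and 64-bit operands.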

Something to discuss:

The new component is off by default and must be enabled using --use-smt. Do we always want to build this extension? The problem is that it introduces z3 as a dependency, which is quite significant. However, since SMTBlockComparator is tightly coupled with DifferentialFunctionComparator, trying to exclude it from the build unless explicitly turned on would require quite a lot of preprocessor if-s. Furthermore, even if it were conditionally removed from simpll, it would still be part of the Python frontend CLI, which could be quite confusing.

Depends on: #323 #325 (needed for correct functionality in some more complicated cases, e.g. the KABI experiment)

FrNecas · 2024-03-19T14:00:53Z

Hm, trying to run this on the KABI experiments results in a segfault... I guess more debugging time :/

lenticularis39 · 2024-03-19T21:10:38Z

Since LLVM is built for performance, it usually just segfaults when something goes wrong; one needs to use GDB extensively :D


viktormalik · 2024-07-03T05:21:24Z

> The new component is off by default and must be enabled using --use-smt. Do we always want to build this extension? The problem is that it introduces z3 as a dependency, which is quite significant. However, since SMTBlockComparator is tightly coupled with DifferentialFunctionComparator, trying to exclude it from the build unless explicitly turned on would require quite a lot of preprocessor if-s. Furthermore, even if it were conditionally removed from simpll, it would still be part of the Python frontend CLI, which could be quite confusing.

I would say, let's not complicate things now and always build the extension in. If we wanted to make the z3 dependency optional, probably the easiest way is to have an alternative (ifdef-ed) version of SMTBlockComparator which has empty implementations of the exported functions.

FWIW, I would also love to make it on by default but IIRC, the overhead is still quite high, especially due to finding the snippets.

viktormalik

Reviewed the first 2 commits so far. There's a couple of points for discussion so I'm posting them straight away.

Also, could you please expand the commit messages? I prefer a commit to explain what it does (even if it is obvious) so that its intentions are clear when we get back to it in future.

viktormalik · 2024-07-03T05:25:31Z

diffkemp/cli.py

+    compare_ap.add_argument("--use-smt",
+                            help="Use SMT-based checking of short snippets",
+                            action="store_true")


Point for discussion: should we make the contents of this PR a new "builtin pattern" instead of a new CLI option? I can imagine that we'll eventually have more patterns which require SMT and we'd want to allow selecting just some. Using any such pattern would automatically turn on the SMT support.

Interesting idea. I'd suggest keeping it as an option for now -- all patterns are enabled by default, right? Making this a pattern that's disabled by default feels a bit inconsistent. Furthermore, I perceive the builtin patterns as something rather lightweight and quite fast which definitely is not the case here. We can obviously change it later if we have more SMT-based patterns.

> Interesting idea. I'd suggest keeping it as an option for now -- all patterns are enabled by default, right? Making this a pattern that's disabled by default feels a bit inconsistent.

No, there are patterns which are off by default, namely control-flow-only and type-casts.

> Furthermore, I perceive the builtin patterns as something rather lightweight and quite fast which definitely is not the case here. We can obviously change it later if we have more SMT-based patterns.

Yes, but it feels like changing it later would remove the new --use-smt option (or at least change it), which is not a nice breakage of the CLI. I'm a bit indecisive here, too, but I don't see any fundamental difference between this and built-in patterns. We'd just have to come up with a name for it.

viktormalik · 2024-07-03T05:29:57Z

diffkemp/simpll/DifferentialFunctionComparator.cpp

            // Try to find a match by moving one of the instruction iterators
            // forward (find a code relocation).
-            if (config.Patterns.Relocations
+            if (config.Patterns.Relocations && !suppressRelocationsAndSMT


The suppressRelocationsAndSMT parameter feels a bit clumsy, TBH. What if we find out that we need to turn off more builtin patterns in future?

Two alternatives come to mind:

1. Turn off config.Patterns.Relocations and config.UseSMT for the particular call of cmpBasicBlocksFromInstructions.

2. Make SMTComparator use a separate instance of DifferentialFunctionComparator having a different config.

Yeah, I wasn't a big fan of this either but couldn't think of a better solution. The first alternative feels quite dirty as well: Config represents the user configuration and should therefore IMO be constant throughout the whole run. Furthermore, I am not sure if it is even possible to use it, since cmpBasicBlocks is declared as const inside the LLVM API, i.e., it cannot change the values of data fields. So even if we changed config to non-const, it probably wouldn't be possible to modify its fields inside SMTBlockComparator since it is called from cmpBasicBlocks.

I can try the second alternative but there may be other drawbacks. One thing that comes to mind -- DifferentialFunctionComparator has some state, e.g. some patterns keep track of various information. We would somehow have to transfer this information (the current constructor is rather basic and accepts only a few arguments). Maintaining the list of fields that need to be transferred for the new DifferentialFunctionComparator instance to be fully functional seems quite tedious. WDYT?

> Config represents the user configuration and should therefore IMO be constant throughout the whole run.

Fair point. OTOH, I think that it could be beneficial to have a way to run DifferentialFunctionComparator methods (preferably in the current context) with a different configuration, especially for tasks other than the analysis itself. For instance, when searching for the code snippets, I can imagine that we'd like to skip difference localisation as it just slows it down. Likewise, we may not want to count extended statistics for some runs.
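
One way to reconcile both views (a constant user configuration, yet a different configuration for auxiliary runs) is to derive a modified copy rather than mutate the original. The sketch below shows the idea in Python for brevity; the real Config is a C++ class in simpll, and the field names relocations and use_smt are hypothetical stand-ins for config.Patterns.Relocations and config.UseSMT.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Config:
    """Immutable stand-in for the user configuration (hypothetical fields)."""
    relocations: bool = True
    use_smt: bool = True


def config_for_snippet_search(user_config: Config) -> Config:
    """Return a copy with relocations and SMT disabled; the user's
    configuration object itself is never modified."""
    return replace(user_config, relocations=False, use_smt=False)


if __name__ == "__main__":
    user = Config()
    search = config_for_snippet_search(user)
    print(user)
    print(search)
```

The derived copy can be handed to whichever call needs the restricted behaviour, which avoids both a growing list of suppress-style parameters and mutation of shared state.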


viktormalik

Adding a couple of lower-level comments for the code itself.

Also, I'm wondering if the SMT-based inverse branching condition "pattern" could be somehow implemented closer to the original inverse branching condition pattern or it needs to be kept separate.


viktormalik · 2024-07-09T13:23:21Z

tests/unit_tests/simpll/CMakeLists.txt

+# Try to find system-wide Z3 (e.g. local build)
+find_package(Z3 CONFIG)
+if (NOT ${Z3_FOUND})
+    # Use our FindZ3.cmake (e.g. nix build)
+    find_package(Z3 REQUIRED MODULE)
+endif()
+
+target_include_directories(runTests PRIVATE ${Z3_CXX_INCLUDE_DIRS})
+target_link_libraries(runTests gtest simpll-lib ${llvm_libs} ${Z3_LIBRARIES})
+target_compile_options(runTests PRIVATE ${Z3_COMPONENT_CXX_FLAGS})


IIUC, Z3 is already linked with simpll-lib, no need to link it here again. In addition, tests do not use any Z3 includes so we don't need to add include directories either.

I would expect this as well but compilation of tests fails without this, which is weird... Maybe we are doing something wrong in our test setup?

FrNecas · 2024-10-27T19:01:35Z

I've rebased the PR and addressed the comments (hopefully all of them, other than the more high-level comment on whether this should be a pattern or an option, which we probably need to discuss more closely). This is probably ready to be reviewed again.

FrNecas · 2024-11-03T12:54:44Z

Option converted to a pattern (named smt-sequential-blocks -- if you find a better name, happy to rename). I decided to keep the smt-timeout option not specific to the pattern for now, as I believe that even if we add new patterns based on SMT solving (like the more complex branching stuff), it doesn't hurt if they share the same timeout -- if a user increases the timeout, they are fine with diffkemp running longer and so it doesn't really matter if the longer runtime comes from our current sequential-blocks pattern or from some other more complex patterns.
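
The resulting design can be sketched as follows: SMT support is switched on implicitly whenever an SMT-based pattern is enabled, and all such patterns share one timeout. This is an illustration only, not DiffKemp's code; the names SMT_PATTERNS and smt_required are hypothetical, while the pattern names come from this thread.

```python
# Patterns that need the SMT solver. Only smt-sequential-blocks exists today;
# the set is where future SMT-based patterns would be registered.
SMT_PATTERNS = frozenset({"smt-sequential-blocks"})


def smt_required(enabled_patterns):
    """SMT support is needed as soon as any SMT-based pattern is enabled."""
    return any(pattern in SMT_PATTERNS for pattern in enabled_patterns)


if __name__ == "__main__":
    print(smt_required({"control-flow-only", "type-casts"}))
    print(smt_required({"type-casts", "smt-sequential-blocks"}))
```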

viktormalik

Looks great! Just some naming stuff to resolve, otherwise it's pretty much ready.


The new component is intended to be used whenever a difference is found and no built-in (or custom) pattern can be used. It then tries to refute or confirm the found semantic difference using an SMT solver. Since SMT solving is time-consuming (and the solution is more of a proof of concept for now), we hide this capability behind an option.

Before encoding the equivalence of differing blocks into an SMT formula, we need to find the code blocks to compare. We do so similarly to how relocations are detected, i.e., we search for a synchronization point after which the remainder of the basic blocks are semantically equivalent.

We encode the semantic equivalence of the detected blocks into an SMT formula as follows. The final formula has the form precondition && block1 && block2 && !postcondition, where precondition encodes the equivalence of input variables of the block (based on varmap), block1 and block2 encode the semantics of the compared blocks and postcondition encodes the equivalence of output variables of the blocks (i.e. those variables that are used outside the block). This commit adds implementation of the precondition and encoding of blocks. The interesting part is encoding of blocks, where we make use of LLVM IR's SSA property to build a new SMT variable for each LLVM IR register. The encoding makes use of pointer addresses of the LLVM instructions. Using this encoding, we can conveniently compare the operation performed by each instruction type.
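
The shape of this formula can be illustrated with a small sketch that searches for a model of precondition && block1 && block2 && !postcondition by enumeration. This is not the PR's code: it replaces Z3 with an exhaustive search over 4-bit values, and the two example blocks (a multiplication by 2 versus a left shift by 1) are assumptions chosen for illustration. If no model exists, the formula is unsatisfiable and the blocks are equivalent.

```python
from itertools import product

BITS = 4                      # small bit width so enumeration stays cheap
MASK = (1 << BITS) - 1


def block_left(x):
    # Models: %a = mul %x, 2
    return (x * 2) & MASK


def block_right(x):
    # Models: %b = shl %x, 1
    return (x << 1) & MASK


def find_counterexample(left, right):
    """Search for inputs satisfying
    precondition && block1 && block2 && !postcondition.
    Returns the input pair if one exists, otherwise None (blocks equivalent)."""
    for x_l, x_r in product(range(1 << BITS), repeat=2):
        precondition = x_l == x_r          # inputs are mapped to each other
        if not precondition:
            continue
        out_l = left(x_l)                  # semantics of block1
        out_r = right(x_r)                 # semantics of block2
        postcondition = out_l == out_r     # outputs must agree
        if not postcondition:
            return (x_l, x_r)
    return None


if __name__ == "__main__":
    print(find_counterexample(block_left, block_right))
    print(find_counterexample(block_left, lambda x: (x + 1) & MASK))
```

A solver performs the same search symbolically, which is why a timeout is needed: the search space is not enumerated, but satisfiability checking can still be expensive.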

There are two types of output variables that need to be considered when constructing the postcondition: 1) variables used in the same basic block, and 2) variables used outside the current basic block. For the first case, we have already analyzed these variables/instructions when searching for a synchronization point, i.e., we already know which variables should be semantically equivalent. Based on this, we can construct the postcondition formula. For the second case, on the other hand, we do not know the relationship between variables in the left and right block; therefore, for now, we only consider the simplest case where there is exactly 1 variable of this type on both the left and the right side.

This extension aims to facilitate a more advanced case of the inverse-branch-condition pattern, where the condition is not refactored to its syntactic negation but is still semantically inverted, for example replacing x < 101 with x > 100 (where x is an integer) and then swapping branches. To facilitate analysis of these cases, whenever the initial SMT solving phase fails to prove equivalence, we check if there are possibly invertible conditions. If there are, we try to invert them and check semantic equivalence again.

With the use of SMT solving, we potentially explore a lot more paths leading to a larger vertex graph (due to search for a synchronization point). This is a temporary workaround before we figure out how to reduce the call graph size in a more efficient way.

Based on experiments, SMT solving in the observed cases only takes a couple of milliseconds, or seconds at worst. We also try to reduce unnecessary SMT solving by checking whether we support all the instructions in the blocks to be analyzed.
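
The pre-check described here can be sketched as a simple filter that runs before the solver is invoked. The opcode set below is invented for illustration and is not the list supported by SMTBlockComparator; blocks are modelled as plain lists of opcode names.

```python
# Hypothetical set of opcodes the encoder knows how to translate.
SUPPORTED_OPCODES = frozenset(
    {"add", "sub", "mul", "and", "or", "xor", "shl", "icmp"}
)


def all_supported(block, supported=SUPPORTED_OPCODES):
    """True if every instruction in the block can be encoded."""
    return all(opcode in supported for opcode in block)


def should_try_smt(left_block, right_block):
    """Skip the solver entirely when either side has an unsupported instruction."""
    return all_supported(left_block) and all_supported(right_block)


if __name__ == "__main__":
    print(should_try_smt(["add", "icmp"], ["sub", "icmp"]))
    print(should_try_smt(["add", "call"], ["add"]))
```

Rejecting such snippets up front is cheap compared to building a formula that cannot be completed anyway.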

viktormalik

Looks good!

We have CI failure, though 😢

It was decided that this is basically a pattern and that in the future, more patterns may build on SMT solving which would make the option confusing. The smt-timeout option is kept instead of making it pattern specific -- for now it seems fine if all patterns based on SMT solving share the same timeout.

viktormalik · 2024-11-25T08:46:28Z

The CI failure is not caused by this PR, seems to be some Ubuntu repo issue.

viktormalik · 2024-11-26T08:32:15Z

And we're done here 🥳

Many thanks!

FrNecas added the enhancement label Mar 18, 2024

FrNecas requested a review from viktormalik March 18, 2024 18:48

FrNecas force-pushed the fnecas-smt branch 2 times, most recently from 0316ae0 to a02f38b Compare March 19, 2024 13:20

FrNecas force-pushed the fnecas-smt branch 3 times, most recently from b9c8b3e to 40def05 Compare March 26, 2024 16:11

FrNecas force-pushed the fnecas-smt branch 4 times, most recently from 6468230 to faf9037 Compare April 16, 2024 11:39

FrNecas force-pushed the fnecas-smt branch 2 times, most recently from 2a17f91 to 9171e14 Compare April 22, 2024 14:19

FrNecas force-pushed the fnecas-smt branch from ec3cae1 to 54b563b Compare April 29, 2024 13:33

PLukas2018 reviewed Jun 10, 2024


FrNecas mentioned this pull request Jun 28, 2024 in: simplifycfg pass sometimes represents && with select instead of and instruction #308 (open)

viktormalik reviewed Jul 3, 2024


viktormalik reviewed Jul 9, 2024


FrNecas force-pushed the fnecas-smt branch from 54b563b to b1e73b1 Compare October 27, 2024 18:55

FrNecas requested review from PLukas2018 and viktormalik October 27, 2024 18:55

FrNecas force-pushed the fnecas-smt branch 2 times, most recently from 17d2227 to c068e05 Compare October 27, 2024 19:01

FrNecas force-pushed the fnecas-smt branch from 558ab2d to bc02b96 Compare November 3, 2024 12:59

viktormalik reviewed Nov 14, 2024


FrNecas added 9 commits November 17, 2024 14:46

Optimize timeouts

18cdbaf


Add Z3 to dependencies in the installation guide

5163bde

Add tests for SmtBlockComparator

ada53d9

FrNecas force-pushed the fnecas-smt branch 2 times, most recently from 978a904 to b68f599 Compare November 17, 2024 14:03

FrNecas requested a review from viktormalik November 17, 2024 14:10

viktormalik approved these changes Nov 19, 2024


FrNecas force-pushed the fnecas-smt branch from 4a29d40 to a1876cb Compare November 25, 2024 06:35

viktormalik merged commit 298da70 into diffkemp:master Nov 26, 2024
