William Schultz. My personal website.
https://will62794.github.io/
Mon, 16 Mar 2026 17:44:17 +0000 (Jekyll v3.10.0)
<h1>Canonicalized Distributed Protocol Specs</h1>
<style>
pre {
font-size: 0.75em;
line-height: 1.5;
margin: 1.2em 0;
padding: 1.1em 1.4em;
border-radius: 10px;
border: none;
background: linear-gradient(98deg, #f9fafc 0%, #eef1f5 100%);
box-shadow: 0 3px 18px 0 rgba(80,100,138,0.07);
color: #222;
font-family: 'Fira Mono', 'Consolas', 'SFMono-Regular', Menlo, Monaco, monospace;
overflow-x: auto;
transition: background 0.25s;
}
pre:hover {
background: linear-gradient(98deg, #f1f5fb 0%, #e5eaf3 100%);
}
</style>
<p>Formal descriptions of message passing distributed protocols are complex and heterogeneous. In theory, writing a formal spec of a distributed protocol is a good way to formalize and communicate its precise behavior. In practice, though, many of these specs become quite <a href="https://github.com/ongardie/raft.tla/blob/master/raft.tla">large</a> and <a href="https://github.com/Vanlightly/vsr-tlaplus/blob/main/vsr-revisited/paper/VSR.tla">challenging to digest</a>. They use different message formats and patterns for how information is communicated between nodes, making protocol comprehension and modification tedious and <a href="https://jira.mongodb.org/browse/SERVER-34728">error-prone</a>. There are <a href="https://groups.google.com/g/raft-dev/c/cBNLTZT2q8o">long discussions</a> around the various message types used and comparisons between Raft and Viewstamped Replication.</p>
<!-- , an EPaxos [spec](https://github.com/efficient/epaxos/blob/791b115669fca472d3136f6a2eda46c00b3f8251/tla%2B/EgalitarianPaxos.tla#L61-L90) has 9 different message types, and in general [these specs](https://github.com/ongardie/raft.tla/blob/master/raft.tla) just become pretty large and challenging to digest succinctly. -->
<!-- These specs become complex and difficult to understand when specified at sufficient level of detail to fully capture fine-grained, asynchronous message passing details [1,2,3]. -->
<p>I’ve found that the way these protocols are described also often blurs the separation between (1) the messaging-specific details and communication patterns of a protocol and (2) the essential behavior required for ensuring correctness.
It would be nice to have a better <em>canonical</em> format for describing and modeling distributed protocols, one that makes their similarities and differences clearer, and that potentially also facilitates mechanical derivation of protocol optimizations, modifications, etc., without obscuring things with too many implementation-specific choices.</p>
<p>Raft, for example, chooses two specific message types, <em>RequestVote</em> and <em>AppendEntries</em>, to implement its protocol behavior, along with a host of other specific state variables for tracking protocol progress.
What does a version of Raft look like if we abstract away concrete message types and communication patterns, i.e., specify it in what we can call a “canonicalized” message passing form? We can take a very simple approach and see how far it takes us.</p>
<p>Conceptually, we will express protocols in a model where all actions on a given node follow a simple, common template:</p>
<ol>
<li><strong>Read</strong> its local state and optionally a message from the network.</li>
<li><strong>Update</strong> its local state based on this read.</li>
<li><strong>Broadcast</strong> its entire updated state into the network as a new message.</li>
</ol>
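<p>As a rough illustration outside of TLA+, this template can be sketched in a few lines of Python (all names here are hypothetical, not from the spec below): each action is a function from (local state, received message) to an updated local state, and every successful update is followed by a broadcast of the full state.</p>

```python
import copy

# Sketch of the canonical action template. A message is just a frozen
# snapshot of the sender's entire local state, tagged with its identity.
network = set()  # plays the role of the global "msgs" variable

def broadcast_universal_msg(node_id, state):
    network.add((node_id, tuple(sorted(state.items()))))

def step(node_id, state, action, msg=None):
    # 1. Read local state (and optionally a message), 2. update local
    # state, 3. broadcast the full updated state into the network.
    new_state = action(copy.deepcopy(state), msg)
    if new_state is None:  # None signals a failed precondition
        return state
    broadcast_universal_msg(node_id, new_state)
    return new_state

# One example action in this template: adopt a higher term seen remotely.
def update_term(state, msg):
    sender, fields = msg
    remote = dict(fields)
    if remote["currentTerm"] > state["currentTerm"]:
        state["currentTerm"] = remote["currentTerm"]
        return state
    return None
```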
<p>We don’t impose any message type details on communication between nodes, so we can think of every action as reading some message from the network and updating local state appropriately in response. More simply, since every message is a full recording of a node’s local state at sending time, we can view every action as reading the remote (past) state of some other node and acting in response.</p>
<p>As an example, we can apply this to a version of the originally published <a href="https://github.com/ongardie/raft.tla/blob/master/raft.tla">Raft TLA+ spec</a>, which contains roughly 9 distinct, core protocol actions. If we write a version of this spec in a “canonical” form, we end up with the following election-related actions: <code class="language-plaintext highlighter-rouge">GrantVote</code>, <code class="language-plaintext highlighter-rouge">RecordGrantedVote</code>, and <code class="language-plaintext highlighter-rouge">BecomeLeader</code>:</p>
<pre>
<span style="color: green">\* Server i grants its vote to a candidate server.</span>
<b>GrantVote</b>(i, m) ==
/\ m.currentTerm >= currentTerm[i]
/\ state[i] = Follower
/\ LET j == m.from
logOk == \/ LastTerm(m.log) > LastTerm(log[i])
\/ /\ LastTerm(m.log) = LastTerm(log[i])
/\ Len(m.log) >= Len(log[i])
grant == /\ m.currentTerm >= currentTerm[i]
/\ logOk
/\ votedFor[i] \in {Nil, j} IN
/\ votedFor' = [votedFor EXCEPT ![i] = IF grant THEN j ELSE votedFor[i]]
/\ currentTerm' = [currentTerm EXCEPT ![i] = m.currentTerm]
/\ UNCHANGED <<state, candidateVars, leaderVars, logVars>>
/\ BroadcastUniversalMsg(i)
<span style="color: green">\* Server i records a vote that was granted for it in its current term.</span>
<b>RecordGrantedVote</b>(i, m) ==
/\ m.currentTerm = currentTerm[i]
/\ state[i] = Candidate
/\ votesGranted' =
[votesGranted EXCEPT ![i] =
<span style="color: green">\* The sender must have voted for us in this term.</span>
votesGranted[i] \cup IF (i = m.votedFor) THEN {m.from} ELSE {}]
/\ UNCHANGED <<serverVars, votedFor, leaderVars, logVars, msgs>>
<span style="color: green">\* Candidate i becomes a leader.</span>
<b>BecomeLeader</b>(i) ==
/\ state[i] = Candidate
/\ votesGranted[i] \in Quorum
/\ state' = [state EXCEPT ![i] = Leader]
/\ nextIndex' = [nextIndex EXCEPT ![i] = [j \in Server |-> Len(log[i]) + 1]]
/\ matchIndex' = [matchIndex EXCEPT ![i] = [j \in Server |-> 0]]
/\ UNCHANGED <<currentTerm, votedFor, candidateVars, logVars, msgs>>
/\ BroadcastUniversalMsg(i)
</pre>
<p>where each action is parameterized on a message <code class="language-plaintext highlighter-rouge">m</code> whose fields exactly match the state variables on a local node, and the <a href="https://github.com/will62794/dist-protocol-canonicalization/blob/b80954af376903f503002b3608d1fefcf119573e/code/RaftAsyncUniversal/RaftAsyncUniversal.tla#L111-L122"><code class="language-plaintext highlighter-rouge">BroadcastUniversalMsg</code></a> operator simply pushes a node’s full, updated state into the network as a new message, stored in a global <code class="language-plaintext highlighter-rouge">msgs</code> state variable.</p>
<pre>
<b>BroadcastUniversalMsg</b>(s) ==
msgs' = msgs \cup {[
from |-> s,
currentTerm |-> currentTerm'[s],
state |-> state'[s],
votedFor |-> votedFor'[s],
log |-> log'[s],
commitIndex |-> commitIndex'[s]
]}
</pre>
<p>We can do this similarly for the core log replication related actions:</p>
<pre>
<span style="color: green">\* Server i appends a new log entry from some other server.</span>
<b>AppendEntry</b>(i, m) ==
/\ m.currentTerm = currentTerm[i]
/\ state[i] \in { Follower } \* is this precondition necessary?
\* Can always append an entry if we are a prefix of the other log, and will only
\* append if other log actually has more entries than us.
/\ IsPrefix(log[i], m.log)
/\ Len(m.log) > Len(log[i])
\* Only update logs in this action. Commit learning is done separately.
/\ log' = [log EXCEPT ![i] = Append(log[i], m.log[Len(log[i]) + 1])]
/\ UNCHANGED <<candidateVars, commitIndex, leaderVars, votedFor, currentTerm, state>>
/\ BroadcastUniversalMsg(i)
<span style="color: green">\* Server i learns that another server has applied an entry up to some point in its log.</span>
<b>LeaderLearnsOfAppliedEntry</b>(i, m) ==
/\ state[i] = Leader
\* Entry is applied in current term.
/\ m.currentTerm = currentTerm[i]
\* Only need to update if newer.
/\ Len(m.log) > matchIndex[i][m.from]
\* Follower must have a matching log entry.
/\ Len(m.log) \in DOMAIN log[i]
/\ m.log[Len(m.log)] = log[i][Len(m.log)]
\* Update matchIndex to highest index of their log.
/\ matchIndex' = [matchIndex EXCEPT ![i][m.from] = Len(m.log)]
/\ UNCHANGED <<serverVars, candidateVars, logVars, nextIndex, msgs>>
<span style="color: green">\* Leader advances its commit index.</span>
<b>AdvanceCommitIndex</b>(i) ==
/\ state[i] = Leader
/\ LET \* The maximum indexes for which a quorum agrees
agreeIndexes == {index \in 1..Len(log[i]) : Agree(i, index) \in Quorum}
\* New value for commitIndex'[i]
newCommitIndex ==
IF /\ agreeIndexes /= {}
/\ log[i][Max(agreeIndexes)] = currentTerm[i]
THEN Max(agreeIndexes)
ELSE commitIndex[i]
IN
/\ commitIndex[i] < newCommitIndex \* only enabled if it actually advances
/\ commitIndex' = [commitIndex EXCEPT ![i] = newCommitIndex]
/\ UNCHANGED <<serverVars, candidateVars, leaderVars, log>>
/\ BroadcastUniversalMsg(i)
</pre>
<p>This type of specification approach strips message type and communication pattern specific details out of the protocol. All we do is define actions that can read some past state of another node and make updates based on it. In this model, we can view a protocol as specified simply in terms of (1) its state variables and (2) its actions, each of which is a read of some (current or past) node state followed by a local state update.</p>
<h3 id="history-queries">History Queries</h3>
<p>We can push this specification approach further, simplifying some actions to express their reads entirely in terms of <em>history queries</em>, rather than incrementally updating and reading an auxiliary variable. For example, for the <code class="language-plaintext highlighter-rouge">BecomeLeader</code> action, it is really just waiting until the <code class="language-plaintext highlighter-rouge">votesGranted</code> variable has accumulated the right internal state so that it can safely transition to a leader state. If we ignore this variable entirely, we can express the action precondition with one big precondition query like this:</p>
<pre>
<span style="color: green">\* Candidate i becomes a leader.</span>
<b>BecomeLeader</b>(i, Q) ==
/\ state[i] = Candidate
<span style="background-color: #ccffcc">/\ \A j \in Q : \E m \in msgs : m.currentTerm = currentTerm[i] /\ m.from = j /\ m.votedFor = i</span>
/\ state' = [state EXCEPT ![i] = Leader]
/\ nextIndex' = [nextIndex EXCEPT ![i] = [j \in Server |-> Len(log[i]) + 1]]
/\ matchIndex' = [matchIndex EXCEPT ![i] = [j \in Server |-> 0]]
/\ UNCHANGED <<currentTerm, votedFor, candidateVars, logVars, msgs>>
/\ BroadcastUniversalMsg(i)
</pre>
<p>which checks for the appropriate quorum of voters given the set of messages (states) in the network.</p>
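<p>To make the shape of this history query concrete outside of TLA+, here is a hypothetical Python rendering of the highlighted precondition, treating the network as a collection of full-state snapshots:</p>

```python
def become_leader_precondition(i, current_term, quorum, msgs):
    """History query: every member of the quorum has, at some point, sent
    a message showing it voted for node i in i's current term. Each
    message is a dict snapshot of the sender's full local state."""
    return all(
        any(m["currentTerm"] == current_term
            and m["from"] == j
            and m["votedFor"] == i
            for m in msgs)
        for j in quorum
    )
```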
<p>We can do something similar for the log replication related actions. <code class="language-plaintext highlighter-rouge">LeaderLearnsOfAppliedEntry</code>, which records log application progress from other nodes, can be folded into <code class="language-plaintext highlighter-rouge">AdvanceCommitIndex</code> in the same way:</p>
<pre>
<span style="color: green">\* Leader advances its commit index.</span>
<b>AdvanceCommitIndex</b>(i, Q, newCommitIndex) ==
/\ state[i] = Leader
/\ newCommitIndex > commitIndex[i]
<span style="background-color: #ccffcc">/\ \A j \in Q : \E m \in msgs :
/\ m.from = j
/\ Len(m.log) >= newCommitIndex
/\ log[i][newCommitIndex] = m.log[newCommitIndex]
/\ m.currentTerm = currentTerm[i]</span>
/\ commitIndex' = [commitIndex EXCEPT ![i] = newCommitIndex]
/\ UNCHANGED <<serverVars, candidateVars, leaderVars, log>>
/\ BroadcastUniversalMsg(i)
</pre>
<!-- So, for example, `RecordGrantedVote` is simply checking for some `votedFor` value, and recording this state into a local variable `votesGranted`. Similarly, `BecomeLeader` is simply reading the (current) `votesGranted` state and setting some `state` variable. -->
<!-- This canonical description model also reduces the possible design space of protocols. For example, given only `state` and `currentTerm` variables, what are our possible options for implementing a protocol that ensures Election Safety? Everyone can just become leader at term when they decide to, but to ensure safety, they must check that no one else is currently leader in the term they want to go to. -->
<p>Applying this history query specification approach, we end up with a <a href="https://github.com/will62794/dist-protocol-canonicalization/blob/16b93ab4d26d7abdcd5e4fbb6306db5c1cd6d898/code/RaftAsyncUniversal/RaftAsyncUniversal.tla#L280-L295">simplified set of actions</a> for the protocol:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">BecomeCandidate</code></li>
<li><code class="language-plaintext highlighter-rouge">GrantVote</code></li>
<li><code class="language-plaintext highlighter-rouge">BecomeLeader</code></li>
<li><code class="language-plaintext highlighter-rouge">ClientRequest</code></li>
<li><code class="language-plaintext highlighter-rouge">AppendEntry</code></li>
<li><code class="language-plaintext highlighter-rouge">TruncateEntry</code></li>
<li><code class="language-plaintext highlighter-rouge">AdvanceCommitIndex</code></li>
<li><code class="language-plaintext highlighter-rouge">LearnCommit</code></li>
<li><code class="language-plaintext highlighter-rouge">UpdateTerm</code></li>
</ul>
<p>where the previously required <code class="language-plaintext highlighter-rouge">RecordGrantedVote</code> and <code class="language-plaintext highlighter-rouge">LeaderLearnsOfAppliedEntry</code> actions have been subsumed into the <code class="language-plaintext highlighter-rouge">BecomeLeader</code> and <code class="language-plaintext highlighter-rouge">AdvanceCommitIndex</code> actions respectively, as well as their associated state variables <code class="language-plaintext highlighter-rouge">votesGranted</code> and <code class="language-plaintext highlighter-rouge">matchIndex</code>.</p>
<p>Simplifying the action structure by utilizing history queries can also have a non-trivial impact on model checking performance, as we are able to cut out a number of intermediate steps from the protocol. For example, in one experiment, even for a relatively small model (3 servers, <code class="language-plaintext highlighter-rouge">MaxTerm = 2</code>, <code class="language-plaintext highlighter-rouge">MaxLogLen=1</code>), running the original spec with <code class="language-plaintext highlighter-rouge">RecordGrantedVote</code> and
<code class="language-plaintext highlighter-rouge">LeaderLearnsOfAppliedEntry</code> actions enabled generates 2,060,946 distinct states. With these actions disabled and using the history query based spec, only 27,062 distinct states were generated, a roughly 76x reduction.</p>
<h3 id="query-incrementalization">Query Incrementalization</h3>
<p>Specifying a protocol in terms of history queries is conceptually satisfying and a nice way to abstract away more of the lower level protocol details. It moves the protocol further away from a practical implementation, though, since it’s not realistic for a node to have the ability to continuously read and query over the entire history of all states of other nodes. We can bridge this over to practical implementations, though, by viewing this as an <a href="https://materialize.com/blog/ivm-database-replica/">incremental view maintenance</a> problem.</p>
<p>That is, in a real system, we essentially want to maintain the correct output of these precondition queries based on the current state of the network. We can view this as an online maintenance problem i.e. instead of computing the query output over a giant batch of historical messages, we update the output of the query incrementally as each new message arrives.
This is a formal way to map between the abstract, query-oriented protocol specification and a more practical, operational algorithmic implementation. It is also sufficiently general: as long as we know that the queries we write down can be computed incrementally, any protocol specified in this manner could, in principle, be automatically “incrementalized” into a practical, operational version.</p>
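<p>As a concrete, hypothetical sketch, the quorum-of-votes history query from <code class="language-plaintext highlighter-rouge">BecomeLeader</code> could be maintained incrementally by folding each arriving message into a running vote set, which is essentially what the original <code class="language-plaintext highlighter-rouge">votesGranted</code> variable was doing:</p>

```python
class IncrementalVoteQuery:
    """Incrementally maintains the output of the history query 'has every
    member of some quorum voted for this node in this term?' as messages
    arrive one at a time, instead of rescanning the full history."""

    def __init__(self, node_id, term, quorum_size):
        self.node_id = node_id
        self.term = term
        self.quorum_size = quorum_size
        self.votes = set()  # derived view, analogous to votesGranted[i]

    def on_message(self, m):
        # Delta step: a new message can only ever add to the view.
        if m["currentTerm"] == self.term and m["votedFor"] == self.node_id:
            self.votes.add(m["from"])

    def satisfied(self):
        return len(self.votes) >= self.quorum_size
```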
<p>A lot of previous work has explored the <a href="https://ecommons.cornell.edu/server/api/core/bitstreams/ef203133-30b8-45e8-a504-53b3b5443632/content">foundations</a> of evaluating these types of (first order logic) queries incrementally, particularly in the <a href="https://corescholar.libraries.wright.edu/knoesis/352/">context of Datalog</a>. I’m not as clear, though, what work has been done on automatically “incrementalizing” these types of queries into practical, operational versions for realistic protocols like Raft. <a href="https://speakerdeck.com/jhellerstein/hydroflow-a-compiler-target-for-fast-correct-distributed-programs">Hydroflow</a> might be the closest project tackling similar ideas.</p>
<!-- From an efficiency and optimization perspective, we can also deal with the reasonable objection that passing around every node's full state for any real protocol is infeasible e.g. you can't be passing around an entire Raft log in every message, even though it's easy to do in an abstract spec. So, we can also define transformation functions that operate on the full state of a node for sake of a practical efficiency. For example, if we send a message that contains a node's full state $$s$$, we can send $$m = f(s)$$ into the network, and assume the receiver can easily compute $$s = f^{-1}(m)$$ to get the full state back, so that we could still express our protocol logic in terms of the full state. -->
<!-- For example, in Raft as classically defined, an AppendEntries message may only send one new log entry (or a chunk of them) from a primary to a follower. This is based on a local computation, though, based on its knowledge of its own log and the log application progress (`matchIndex`) of the follower node. So, we can think of this as applying some transformation function $$f$$ on these local state variables to produce a message format that is efficient to send across the network. -->
<h3 id="related-work">Related Work</h3>
<p>This approach is similar to past work on the <a href="https://link.springer.com/article/10.1007/s00446-009-0084-6">Heard-Of Model</a>, and also a specification approach taken in some <a href="https://dl.acm.org/doi/10.14778/3137765.3137778">PaxosStore specifications</a> from WeChat that they refer to as <em>semi-symmetric</em> message passing. The notion of specifying protocols as queries over histories also has been around for a while. This includes the foundational work done on <a href="https://www2.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-173.pdf">Dedalus</a> and <a href="https://bloom-lang.net/">Bloom</a> by Peter Alvaro and also on <a href="https://dl.acm.org/doi/10.1145/2994595">DistAlgo</a>. My understanding is that this also overlapped somewhat with the “relational transducer” model for declarative networking used in <a href="https://netdb.cis.upenn.edu/fvn/ndlogsemantics.pdf">NDLog</a> and <a href="https://arxiv.org/pdf/1012.2858">similar techniques</a>. The general idea of a history-oriented approach to specification has appeared in a kind of folk way in some of Lamport’s <a href="https://github.com/tlaplus/Examples/blob/9ac1cdc8d54ce619105ffed96a7c9b52041733ae/specifications/Paxos/Paxos.tla#L108-L141">original specs</a> of Paxos. Similar concepts also appear in posts on a <a href="https://quint-lang.org/posts/soup#long-story-short">message soup</a> approach to modeling. I believe the <a href="https://hydro.run/papers/hydroflow-thesis.pdf">Hydroflow work</a> is also more recently taking these ideas further by concretely exploring ways to incrementally compute (e.g. compile) network or dataflow queries.</p>
Sat, 07 Mar 2026 00:00:00 +0000
https://will62794.github.io/distributed-systems/2026/03/07/canonical-dist-protocols.html
<h1>Verified Transpilation with Claude</h1>
<p>We can check correctness of a TLA+ specification using the <a href="https://github.com/tlaplus/tlaplus">TLC</a> model checker, which will exhaustively explore a spec’s reachable states to check that a specified property (i.e. an invariant) holds.
TLC was originally developed <a href="https://link.springer.com/chapter/10.1007/3-540-48153-2_6">over 20 years ago</a> and has had a lot of development effort put into it. It is a mature and performant tool, but it is written in Java and is essentially a dynamic interpreter for the TLA+ language. So it likely remains far from the theoretical upper limit of performance for checking finite, explicit-state system models, meaning there are still <a href="https://conf.tlapl.us/2018/kuppe.pdf">performance gains</a> to be had by moving to a lower-level representation for model checking. This is the approach taken by other state-of-the-art model checkers like <a href="https://spinroot.com/spin/whatispin.html">SPIN</a>, which generates C code for model checking that can be compiled and run natively rather than dynamically interpreting the model code.</p>
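<p>Explicit-state checking of this kind is, at its core, a breadth-first exploration of the reachable state graph. A minimal Python sketch over a toy transition system (not TLA+ itself) looks like:</p>

```python
from collections import deque

def explore(init_states, next_states):
    """Exhaustive breadth-first exploration of all reachable states.
    States must be hashable; next_states maps a state to its successors."""
    seen = set(init_states)
    queue = deque(init_states)
    while queue:
        s = queue.popleft()
        for t in next_states(s):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

# Toy system: a counter modulo 5 that can also be reset to zero.
def successors(s):
    (x,) = s
    return [((x + 1) % 5,), (0,)]
```

<p>Tools like TLC additionally hash each state to a fingerprint so the visited set stays compact, which is where the fingerprint-collision probability estimates in its output come from.</p>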
<h2 id="transpiling-with-claude">Transpiling with Claude</h2>
<p>In general, this kind of lower-level translation task for TLA+ is relatively nontrivial: it requires compiling TLA+ constructs down into some lower level representation (e.g. C/C++ data structures) for compilation and native execution. Building any general approach here requires a somewhat detailed understanding of the language and existing interpreter implementations, and of how to translate specs into a lower level representation while accurately preserving their semantics.</p>
<p>Instead of building a whole compilation engine, we can try asking Claude to do these as one-off translations for us. This is a kind of standard transpilation/compilation task, but in a “bespoke” way, since we’re not aiming to build any kind of generic compiler, and can also take advantage of any details specific to the given problem instance (more and more software problems seem to be falling under this type of “bespoke” category with LLMs). Some <a href="https://arxiv.org/abs/2406.03003">other folks</a> have tried doing this recently for a variety of standard programming languages.</p>
<p>Since we already have TLC as an existing, reference interpreter, we can also ask Claude to generate an automated validation harness for us i.e. one that checks (at least for finite domains), that the output of the optimized C++ version of the model exactly matches that from the original TLA+ model. This gives us a convenient kind of (approximately) verified compilation step for going from high level TLA+ spec to a lower level model.</p>
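<p>The core of such a harness is simple: load both JSON dumps and compare the state sets. Since the two tools need not compute identical fingerprints, a hypothetical sketch like the following (assuming both dumps share a <code class="language-plaintext highlighter-rouge">{ "states": [...] }</code> JSON format) compares canonical serializations of the state values instead:</p>

```python
import json

def canonical(state_val):
    # Canonical, order-independent serialization of one state's value.
    return json.dumps(state_val, sort_keys=True)

def compare_dumps(tlc_dump, cpp_dump):
    """Compare two {"states": [{"fp": ..., "val": ..., "initial": ...}]}
    dumps, returning the states unique to each side (both empty on a
    successful validation)."""
    tlc = {canonical(s["val"]) for s in tlc_dump["states"]}
    cpp = {canonical(s["val"]) for s in cpp_dump["states"]}
    return tlc - cpp, cpp - tlc
```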
<p>We can easily try this out for a given TLA+ spec by condensing this whole workflow into a prompt to Claude Code (i.e. wrap it into a <a href="https://code.claude.com/docs/en/skills">skill</a>). The prompt itself was developed over a few rounds of trial and error and refinement, to make sure Claude knew how to generate scripts with the right arguments, compare outputs properly, etc. The overall prompt is as follows:</p>
<style>
pre {
white-space: pre-wrap; /* Since CSS 2.1 */
white-space: -moz-pre-wrap; /* Mozilla, since 1999 */
white-space: -pre-wrap; /* Opera 4-6 */
white-space: -o-pre-wrap; /* Opera 7 */
word-wrap: break-word; /* Internet Explorer 5.5+ */
font-size: 14px;
border: 1px solid #ccc;
}
.language-markdown{
font-size: 12px;
}
</style>
<figure class="highlight"><pre><code class="language-markdown" data-lang="markdown"><span class="gu">## Generate Optimized C++ version of TLA+ Spec</span>
Take the chosen TLA+ spec (ask the user for which one) and generate a C++ program that generates its full reachable state space as the model checker would do but in a way as optimized as possible for a C++ implementation. Do this single threaded, and check with the user for how to instantiate the constant finite parameters in the compiled C++ version. Ensure that the C++ version dumps out all states in a standard JSON format, and can output this to a JSON file. Assume a general JSON dump format that contains a state array like <span class="sb">`{ "states": [ {fp: <uint64_t>, val: <JSON>, initial: <boolean>}, ... ]}`</span>, where 'val' is the actual JSON representation of that state, and 'fp' is some hash/fingerprint for that state. Also add an option to run this state space exploration with JSON dumping disabled.
Finally generate a Makefile with a simple, barebones default target for building it.
<span class="gu">## Validate Conformance between TLC and C++ version </span>
Now validate to make sure that the set of states generated and dumped into JSON by the C++ version match the set of states generated and dumped by TLC in JSON. Generate a Python script that runs TLC to generate the same state space and dump it to JSON using the tla2tools-checkall.jar binary which supports a <span class="sb">`-dump json states.json`</span> argument, and then have the script validate that the states match between the TLC output and the C++ generated state space.
Generate a simple validation report in Markdown after completing this.
<span class="gu">## Benchmark Throughput Difference</span>
Measure the throughput (states/second) difference in the state space generation states between TLC and the C++ version. Check with the user for the finite model config parameters to use for this run, and update the generated C++ version of the spec to account for this if needed. You can do this benchmark by measuring the total runtime of TLC for an exhaustive run, and measuring its time duration and from this compute distinct states per second, and doing this similarly for the C++ version. When doing this, disable JSON dumping for both TLC and C++ to avoid the associated overhead. In order to measure the throughput of TLC, make sure to use the time duration reported by the final output of TLC. You don't need to do multiple runs of each, a single run is fine.
Generate a simple markdown report file on the results once the benchmark is complete.</code></pre></figure>
<p>From within a <a href="https://github.com/will62794/model-compiler">repo</a>, we can <a href="https://github.com/will62794/model-compiler/blob/main/.claude/commands/compile_tla.md">store this</a> as a Markdown file under <code class="language-plaintext highlighter-rouge">~/.claude/commands</code> and then open up Claude Code and run the <code class="language-plaintext highlighter-rouge">compile_tla</code> command, which will then prompt us to get started with a given TLA+ spec.</p>
<h2 id="benchmarks">Benchmarks</h2>
<p>We can start with a test on the <a href="https://github.com/will62794/model-compiler/blob/main/TwoPhase/TwoPhase.tla"><code class="language-plaintext highlighter-rouge">TwoPhase.tla</code></a> specification, a standard TLA+ example and benchmark modeling two-phase commit. If we start up Claude Code and run our compilation command on this spec, Claude chugs away, with a few interaction points from the user (e.g. to confirm finite model parameters, etc.) and we can see it generate the following validation report, for a finite model with 4 transaction resource managers:</p>
<figure class="highlight"><pre><code class="language-markdown" data-lang="markdown"><span class="gh"># TwoPhase Spec Validation Report</span>
<span class="gu">## Configuration</span>
<span class="p">-</span> <span class="gs">**Spec**</span>: TwoPhase.tla
<span class="p">-</span> <span class="gs">**Resource Managers**</span>: 4 (rm1, rm2, rm3, rm4)
<span class="p">-</span> <span class="gs">**Depth Limit**</span>: Unlimited (full state space)
<span class="gu">## Results Summary</span>
| Metric | TLC | C++ |
|--------|-----|-----|
| Total States | 1568 | 1568 |
| Initial States | 1 | 1 |
| Runtime | 1.90s | 0.48s |
<span class="gu">## State Comparison</span>
<span class="p">
-</span> <span class="gs">**Common states**</span>: 1568
<span class="p">-</span> <span class="gs">**Only in TLC**</span>: 0
<span class="p">-</span> <span class="gs">**Only in C++**</span>: 0
<span class="gu">## Validation Status: PASSED</span></code></pre></figure>
<p>As a sanity check, we can go into this spec’s directory and take a look. Claude generated a <a href="https://github.com/will62794/model-compiler/blob/60c3c076f34d0a2984143205096b952d657c66eb/TwoPhase/saved_outputs/TwoPhase.cpp">456 line C++ file</a>, <code class="language-plaintext highlighter-rouge">TwoPhase.cpp</code>, that when compiled and run produces:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>./twophase
TwoPhase State Space Generator <span class="o">(</span>C++<span class="o">)</span>
Configuration: NUM_RM <span class="o">=</span> 4
Depth limit: unlimited
JSON output: disabled
Exploration complete.
States found: 1568
Transitions: 5377
Duration: 0.000155417 seconds
Throughput: 10088986 states/second
</code></pre></div></div>
<p>If we run TLC with the same model parameters, we get the following:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Model checking completed. No error has been found.
Estimates of the probability that TLC did not check all reachable states
because two distinct states had the same fingerprint:
calculated (optimistic): val = 3.2E-13
5378 states generated, 1568 distinct states found, 0 states left on queue.
The depth of the complete state graph search is 14.
The average outdegree of the complete state graph is 1 (minimum is 0, the maximum 9 and the 95th percentile is 4).
Finished in 00s at (2026-01-20 21:42:36)
</code></pre></div></div>
<p>which serves as a strong extra sanity check that the C++ model is doing the right thing. Even generating exactly the correct number of distinct, reachable states would be hard to cheat, and the generated Python <a href="https://github.com/will62794/model-compiler/blob/81be9fb8c91e87cb354c4982e0ea915c0c59ef4f/TwoPhase/saved_outputs/validate.py">validation script</a> should also ensure that the generated JSON state spaces match exactly between TLC and the C++ version.</p>
<p>As a few extra sanity “spot checks”, we can also run a few manual queries on the JSON outputs from TLC and the C++ version. As an example, one of the generated JSON states from TLC looks like the following:</p>
<div class="language-json highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="p">{</span><span class="w">
</span><span class="nl">"fp"</span><span class="p">:</span><span class="w"> </span><span class="mi">12161962213042174405</span><span class="p">,</span><span class="w">
</span><span class="nl">"val"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"rmState"</span><span class="p">:</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="nl">"rm1"</span><span class="p">:</span><span class="w"> </span><span class="s2">"working"</span><span class="p">,</span><span class="w">
</span><span class="nl">"rm2"</span><span class="p">:</span><span class="w"> </span><span class="s2">"working"</span><span class="p">,</span><span class="w">
</span><span class="nl">"rm3"</span><span class="p">:</span><span class="w"> </span><span class="s2">"working"</span><span class="p">,</span><span class="w">
</span><span class="nl">"rm4"</span><span class="p">:</span><span class="w"> </span><span class="s2">"working"</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"tmState"</span><span class="p">:</span><span class="w"> </span><span class="s2">"init"</span><span class="p">,</span><span class="w">
</span><span class="nl">"tmPrepared"</span><span class="p">:</span><span class="w"> </span><span class="p">[],</span><span class="w">
</span><span class="nl">"msgs"</span><span class="p">:</span><span class="w"> </span><span class="p">[]</span><span class="w">
</span><span class="p">},</span><span class="w">
</span><span class="nl">"initial"</span><span class="p">:</span><span class="w"> </span><span class="kc">true</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>so even counting the occurrences of a few field values serves as a reasonable validation hash on the outputs e.g.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ for val in "aborted" "working" "prepared"; do grep -o $val tlc_states.json | wc -l && grep -o $val cpp_states.json | wc -l; done
4144
4144
1120
1120
2272
2272
</code></pre></div></div>
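<p>The same spot check can also be scripted, e.g. as a small Python sketch of this occurrence-count comparison (the helper names here are hypothetical, purely for illustration):</p>

```python
def value_counts(text, values):
    # Count raw occurrences of each field value in a JSON dump string.
    return {v: text.count(v) for v in values}

def spot_check(tlc_text, cpp_text, values=("aborted", "working", "prepared")):
    # The dumps pass the spot check iff every value occurs the same
    # number of times in both outputs.
    return value_counts(tlc_text, values) == value_counts(cpp_text, values)

# Usage against the dumped files from above:
# with open("tlc_states.json") as f1, open("cpp_states.json") as f2:
#     assert spot_check(f1.read(), f2.read())
```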
<p>In general, it feels helpful and important to have these kinds of lightweight, human verifiable “spot checks”. Even with a nicely verifiable task like the one here, asking the model to “verify all its outputs” still doesn’t engender a strong enough degree of trust that it did the right thing, at least for experimental tasks like these where you are less confident in its outputs.</p>
<p>After running the benchmarking step, Claude also generated this report:</p>
<figure class="highlight"><pre><code class="language-markdown" data-lang="markdown"><span class="gh"># TwoPhase State Space Generation Benchmark</span>
<span class="gu">## Configuration</span>
<span class="p">
-</span> <span class="gs">**Spec**</span>: TwoPhase.tla
<span class="p">-</span> <span class="gs">**Resource Managers**</span>: 8 (rm1, rm2, rm3, rm4, rm5, rm6, rm7, rm8)
<span class="p">-</span> <span class="gs">**JSON Output**</span>: Disabled (pure exploration benchmark)
<span class="p">-</span> <span class="gs">**Workers**</span>: 1 (single-threaded)
<span class="p">-</span> <span class="gs">**Benchmark Iterations**</span>: 3
<span class="gu">## Results</span>
| Metric | TLC | C++ |
|--------|-----|-----|
| States | 1,745,408 | 1,745,408 |
| Avg Duration | 49.9251s | 0.851219s |
| Min Duration | 43.0652s | 0.808712s |
| Avg Throughput | 34,961 states/s | 2,050,480 states/s |
| Max Throughput | 40,529 states/s | 2,158,257 states/s |
<span class="gu">## Speedup</span>
<span class="p">
-</span> <span class="gs">**Average throughput speedup**</span>: C++ is <span class="gs">**58.7x faster**</span> than TLC
<span class="p">-</span> <span class="gs">**Peak throughput speedup**</span>: C++ is <span class="gs">**53.3x faster**</span> than TLC</code></pre></figure>
<p>showing that the C++ version achieved over a 50x throughput speedup over TLC for a larger parameter configuration (8 resource managers).</p>
<p>This is pretty cool, and impressive that Claude is able to generate what seems to be a semantically accurate translation of the high-level spec essentially in one shot. It also seems reasonable that validating these kinds of translation steps at smaller finite parameters would be sufficient to assume generalization to larger parameter configurations, e.g. when it is desirable to run larger model checking jobs but infeasible to do full validation at those larger parameters.</p>
<h3 id="abstractdynamicraft">AbstractDynamicRaft</h3>
<p>We can run the same command for another spec, an <a href="https://github.com/will62794/model-compiler/blob/main/AbstractDynamicRaft/AbstractDynamicRaft.tla">abstracted variant of Raft</a> that also includes basic dynamic reconfiguration functionality. This time Claude produces a <a href="https://github.com/will62794/model-compiler/blob/81be9fb8c91e87cb354c4982e0ea915c0c59ef4f/AbstractDynamicRaft/saved_outputs/AbstractDynamicRaft.cpp">740-line C++ file</a> and generates the following validation report:</p>
<figure class="highlight"><pre><code class="language-markdown" data-lang="markdown"><span class="gh"># AbstractDynamicRaft Validation Report</span>
<span class="gu">## Summary</span>
| Metric | Value |
|--------|-------|
| TLC States | 470098 |
| C++ States | 470098 |
| Common States | 470098 |
| Only in TLC | 0 |
| Only in C++ | 0 |
| TLC Initial States | 7 |
| C++ Initial States | 7 |
<span class="gu">## Result</span>
<span class="gs">**✓ PASSED**</span>: The state spaces match exactly.
The C++ implementation correctly generates the same state space as TLC.
<span class="gu">## Configuration</span>
<span class="p">
-</span> <span class="gs">**Spec**</span>: AbstractDynamicRaft.tla
<span class="p">-</span> <span class="gs">**Server**</span>: {n1, n2, n3}
<span class="p">-</span> <span class="gs">**MaxTerm**</span>: 2
<span class="p">-</span> <span class="gs">**MaxLogLen**</span>: 2
<span class="p">-</span> <span class="gs">**MaxConfigVersion**</span>: 2
<span class="p">-</span> <span class="gs">**InitTerm**</span>: 0</code></pre></figure>
<p>followed by the benchmark report:</p>
<figure class="highlight"><pre><code class="language-markdown" data-lang="markdown"><span class="gh"># AbstractDynamicRaft Benchmark Report</span>
<span class="gu">## Summary</span>
Comparison of state space exploration throughput between TLC (Java) and optimized C++ implementation.
| Metric | TLC | C++ |
|--------|-----|-----|
| Distinct States | 470,098 | 470,098 |
| Duration | 60.00s | 1.696s |
| Throughput | 7,835 states/sec | 277,180 states/sec |
<span class="gu">## Performance Analysis</span>
<span class="p">
-</span> <span class="gs">**C++ Speedup**</span>: <span class="gs">**35.4x**</span> faster than TLC
<span class="p">-</span> <span class="gs">**TLC Throughput**</span>: 7,835 states/second
<span class="p">-</span> <span class="gs">**C++ Throughput**</span>: 277,180 states/second
<span class="gu">## Configuration</span>
<span class="p">
-</span> <span class="gs">**Spec**</span>: AbstractDynamicRaft.tla
<span class="p">-</span> <span class="gs">**Server**</span>: {n1, n2, n3}
<span class="p">-</span> <span class="gs">**MaxTerm**</span>: 2
<span class="p">-</span> <span class="gs">**MaxLogLen**</span>: 2
<span class="p">-</span> <span class="gs">**MaxConfigVersion**</span>: 2
<span class="p">-</span> <span class="gs">**InitTerm**</span>: 0
<span class="gu">## Notes</span>
<span class="p">
-</span> JSON state dumping was disabled for both TLC and C++ during benchmarking
<span class="p">-</span> TLC was run with <span class="sb">`-Xmx8g`</span> heap and <span class="sb">`-XX:+UseParallelGC`</span>
<span class="p">-</span> C++ was compiled with <span class="sb">`-O3 -march=native -flto`</span> optimizations
<span class="p">-</span> Single-threaded execution for both</code></pre></figure>
<p>showing a roughly 35x speedup over TLC when running with the C++ version.</p>
<h3 id="bakery">Bakery</h3>
<p>Finally, we can try out one more benchmark, a <a href="https://github.com/will62794/model-compiler/blob/main/Bakery/Bakery.tla">specification</a> of Lamport’s Bakery algorithm for mutual exclusion, with the following validation results:</p>
<figure class="highlight"><pre><code class="language-markdown" data-lang="markdown"><span class="gh"># TLA+ / C++ State Space Validation Report</span>
<span class="gu">## Specification</span>
<span class="p">-</span> <span class="gs">**TLA+ Spec**</span>: Bakery.tla (Lamport's Bakery Algorithm)
<span class="p">-</span> <span class="gs">**Configuration**</span>: N=2, Nat={0,1,2,3}
<span class="gu">## Results</span>
| Metric | TLC | C++ |
|--------|-----|-----|
| Distinct States | 7161 | 7161 |
<span class="gu">## Validation Status</span>
<span class="gs">**PASSED**</span>: State spaces are identical.</code></pre></figure>
<p>followed by the benchmarking report:</p>
<figure class="highlight"><pre><code class="language-markdown" data-lang="markdown"><span class="gh"># TLA+ / C++ State Space Exploration Benchmark Report</span>
<span class="gu">## Specification</span>
<span class="p">-</span> <span class="gs">**TLA+ Spec**</span>: Bakery.tla (Lamport's Bakery Algorithm)
<span class="p">-</span> <span class="gs">**Configuration**</span>: N=3, Nat={0,1,2,3}
<span class="p">-</span> <span class="gs">**Distinct States**</span>: 6,016,610
<span class="gu">## Benchmark Configuration</span>
<span class="p">-</span> JSON state dumping: <span class="gs">**disabled**</span> for both tools (measuring pure exploration throughput)
<span class="p">-</span> TLC: Single worker thread with parallel GC
<span class="p">-</span> C++: Single-threaded BFS exploration with O3 optimization
<span class="gu">## Results</span>
| Metric | TLC | C++ |
|--------|-----|-----|
| Total Time | 102 sec | 5.45 sec |
| Distinct States | 6,016,610 | 6,016,610 |
| Throughput | ~58,986 states/sec | ~1,103,780 states/sec |
<span class="gu">## Performance Comparison</span>
| Metric | Value |
|--------|-------|
| <span class="gs">**Speedup**</span> | <span class="gs">**18.7x**</span> |
| C++ / TLC Throughput Ratio | 18.71 |
<span class="gu">## Analysis</span>
The C++ implementation achieves approximately <span class="gs">**18.7x higher throughput**</span> than TLC for the Bakery algorithm state space exploration.</code></pre></figure>
<p>Again, along with validation success, we get almost a 19x throughput speedup.</p>
<h3 id="final-thoughts">Final Thoughts</h3>
<p>This is another impressive, general capability of coding agents, and also an example that re-frames the types of programming tasks we might traditionally care about solving. In a “classical” view of programming, the only natural way to solve this task would be to build a general-purpose compiler, but LLMs let us consider making these one-off tasks solvable in a <a href="https://www.geoffreylitt.com/2023/03/25/llm-end-user-programming.html"><em>bespoke</em> way</a>, sidestepping the task of building a compiler altogether. So, the intelligence and generality of LLMs can reduce the hardness of the problems that need to be solved, at least for problems that have this “bespoke” quality to them.</p>
<p>It is worth pointing out a variety of caveats that still limit this approach as a practical, real-world solution. First, all of the above was limited to single-threaded execution, while TLC is able to safely run many model checking workers in parallel. Parallelism requires extra care around concurrency control and efficient data structure design, e.g. a shared, concurrent BFS queue must be managed between workers, as well as the state hash (fingerprint) set, which has been a source of nontrivial <a href="https://lemmster.de/talks/MSc_MarkusAKuppe_1497363471.pdf">performance engineering work</a> in the past. Additionally, one of TLC’s unique features is its ability to spill states to disk when they are too large to fit in memory. The above approach would be fundamentally memory limited, though with modern machines this is becoming less of a concern. Nevertheless, this is still quite a promising solution for the inner loop of any model checking or verification task, which ultimately requires fast generation and evaluation of a spec’s transition relation in order to enumerate reachable states.</p>
<p>A recurring pattern in this type of experiment is the balance of trust between you and the agent. Having a verifiable feedback loop helps a lot, but even so, it still felt necessary to review the outputs at different stages and manually verify that things were actually being done correctly and not cheated in odd ways, for example checking that the generated state spaces actually contain real, nonempty sets of states.</p>
<p>There also seemed to be a related, missing feature here, somewhat similar to the <a href="https://arxiv.org/abs/2501.07278">continual learning</a> idea: being confident that the LLM is developing a deeper understanding of the problem at hand, to the point where you trust it to take on things more autonomously. It felt hard to predict when and where it would or wouldn’t make the same mistake twice, or go off on small, unexpected tangents. A lot of these issues can be addressed with careful trial and error and prompt refinement, but there may still be a better kind of “teaching” workflow here, to get an agent up to speed on a new problem and have it record the knowledge it learned more durably. At one point it seemed promising to start off with a “teaching” session, having Claude learn about the workflow and develop its <em>own</em>, repeatable prompt for the task, but I didn’t go very far with this.</p>
<p>As with many LLM-oriented tasks, the determinism of the outputs of these types of workflows was also hazy to understand, and there often seemed to be a better, ideal breakdown of a workflow into steps that are truly “non-deterministic” or LLM-driven and those that can be cached as relatively deterministic scripts (e.g. the validation scripts). When starting out, though, it is easiest to write everything up as a single agent prompt and re-run the workflow from scratch to test it out and experiment. Working with Claude in this way also makes it really nice to think about these experimentation workflows “end to end”, without focusing on chaining together various Python scripts, bash scripts, compilation steps, etc. In a sense, a true kind of <a href="https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/">end-to-end research/experimentation workflow</a>, especially when going further and generating whole written reports or visuals from an experiment, something that is typically quite manual and requires a lot of analysis and stitching things together.</p>
<p><em>All the tests here were run with Claude Code v2.1.14 on Opus 4.5, on a 2024 Apple M3 MacBook Pro, with the code and Claude prompts found <a href="https://github.com/will62794/model-compiler">here</a>.</em></p>
Tue, 20 Jan 2026 00:00:00 +0000
https://will62794.github.io/verification/llms/compilation/2026/01/20/bespoke-compilation.html
https://will62794.github.io/verification/llms/compilation/2026/01/20/bespoke-compilation.htmlverificationllmscompilationGit for Transactions<p>The idealized model of a transactional data storage system is one of a sequential, serializable system, where clients can submit transactions and the system ensures the outcomes are as if those transactions were executed against a single copy of that data. In practice, performance limitations of this model have historically pushed systems to explore a wide set of alternative, weakly consistent models.</p>
<!-- An alternate approach is to re-examine the application level requirements we really want out of weakly consistent systems. In general, strongly consistent systems try to offer a sequential (e.g. linearizable/serializable) data store as the fundamental abstraction to a client. Alternatively, weakly consistent systems do not provide such a clean abstraction. -->
<p>In the weakly consistent world, one approach is to define some reasonable isolation or consistency level that is <a href="http://www.bailis.org/papers/ramp-sigmod2014.pdf">applicable to a wide enough range of applications</a>, or allow the application to <a href="https://learn.microsoft.com/en-us/azure/cosmos-db/consistency-levels">tune the consistency level</a> to its specific needs. Another perspective is to lean even more strongly into the detailed mechanics of the weakly consistent model, abandoning the strictly sequential view of storage, and expose this flexibility to users. This is the type of approach explored in <a href="https://www.cs.cornell.edu/~youerpu/papers/2016-sigmod-tardis.pdf">TARDiS</a> (SIGMOD 2016), a concurrency control approach for transactional storage systems that essentially throws out the sequential data store model, and instead adopts weak consistency through an explicit <em>branch and merge</em> concurrency model.</p>
<div style="text-align: center;">
<img src="/assets/git-for-txns/basic-merging-2.png" alt="TARDiS Model" width="530" />
</div>
<p>Essentially, TARDiS adopts a view of transactional isolation and consistency in the style of <a href="https://en.wikipedia.org/wiki/Git">Git</a>: at the start of a transaction, a client forks history onto their own local branch, and is able to perform reads and writes in isolation on this branch. When they have completed their operations, they can “merge” their changes back into a main branch of history. TARDiS leaves this merging task to the application, rather than to the underlying storage layer.</p>
<h2 id="branch-and-merge-transactions">Branch and Merge Transactions</h2>
<p>The proposed TARDiS system consists of a transactional key-value store that tracks conflicting execution branches with 3 mechanisms:</p>
<ol>
<li>branch on conflict</li>
<li>inter-branch isolation</li>
<li>application-specific merge</li>
</ol>
<p>In a standard transactional data store, we can imagine that the entire system consists of a single, linear history of states. As transactions commit, the effects of their write operations are applied to the latest state in this linear history, and new transactions may read from some state in this history (e.g. from either the latest state or some historical snapshot).</p>
<p>TARDiS breaks from this model and instead includes an explicit notion of branching in its data store abstraction. That is, when clients execute transactions, they may do so in <em>single mode</em>, which means they are executing their transactions against a chosen branch, or in <em>merge mode</em>, which allows them to explicitly decide how to merge together conflicting changes across branches. As in Git, branches are isolated from concurrent transactions, so each can be viewed conceptually as its own linear, sequential thread of history by an application.</p>
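<p>As a rough illustration of this model, the following is a minimal Python sketch of a branching state DAG with single-mode transactions. The API and representation here are hypothetical, purely for illustration, and are not the paper’s actual interface:</p>

```python
import itertools

class StateDAG:
    """Hypothetical sketch of a TARDiS-style branching store: each state
    is an immutable key-value snapshot linked to its parent state(s)."""

    def __init__(self):
        self._ids = itertools.count()
        self.root = next(self._ids)
        self.snapshots = {self.root: {}}  # state id -> kv map
        self.parents = {self.root: []}    # state id -> parent state ids

    def begin(self, read_state):
        # Single mode: fork a private, isolated copy of the chosen read state.
        return dict(self.snapshots[read_state])

    def commit(self, commit_state, kv):
        # Append a new state as a child of `commit_state`. Two commits
        # against the same parent fork the history into two branches.
        sid = next(self._ids)
        self.snapshots[sid] = dict(kv)
        self.parents[sid] = [commit_state]
        return sid

dag = StateDAG()
t1 = dag.begin(dag.root); t1["x"] = 1
t2 = dag.begin(dag.root); t2["x"] = 2  # concurrent, isolated transaction
a = dag.commit(dag.root, t1)
b = dag.commit(dag.root, t2)           # history now has two branches
```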
<h3 id="begin-and-commit">Begin and Commit</h3>
<p>In this model, there are a few natural modifications to the lifecycle of a transaction. First, when a transaction begins, it is not obvious where the transaction will “start from”, since there is no longer a global, sequential state history. So, a transaction first needs some strategy for selecting a branch to begin execution against, i.e. which state in the history DAG it will begin from, called its <em>read state</em>. Similarly, upon commit, a transaction can choose a <em>commit state</em>, the state in the history DAG to which it will append its new changes.</p>
<div style="text-align: center;">
<img src="/assets/git-for-txns/tardis-branching.png" alt="TARDiS Model" width="600" />
</div>
<p>There is also a notion of <em>begin</em> and <em>end constraints</em>, which place extra validity conditions on a transaction’s start and commit, allowing a user more control over the degree of local branching. Essentially, <em>begin</em> constraints place conditions on what read states are valid for a transaction to choose from, and <em>end</em> constraints place conditions on whether a transaction is valid to successfully commit.</p>
<div style="text-align: center;">
<img src="/assets/git-for-txns/begin-commit-constraints.png" alt="TARDiS Model" width="470" />
</div>
<p>These constraints can be used and composed to guarantee the properties of various standard database isolation levels e.g. snapshot isolation or serializability.</p>
<p>For example, to achieve <em>serializability</em>, one can combine the constraints of</p>
<ul>
<li>Begin Constraint: <em>Ancestor</em></li>
<li>End Constraint: <em>Serializability</em>, <em>No Branching</em></li>
</ul>
<p>These constraints require that a transaction starts from a read state that is the child of its latest committed transaction, and enforce that, upon commit, the new state does not fork the history. They also implicitly require tracking the read and write sets of transactions on a branch, since for serializability we may need to validate that no concurrent transactions intersected with our read/write sets.</p>
<p>There are also constraints for ensuring <em>snapshot isolation</em>, e.g. doing something similar to the serializability constraints but validating only write-write conflicts between transactions. The paper does not go into depth on the formal definitions of these constraints, but my impression is that they are sufficient to provide guarantees analogous to these standard isolation levels.</p>
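<p>To make the shape of such commit-time validation concrete, here is a hedged Python sketch of end constraints in this style. The function names and exact conflict rules are assumptions for illustration, since the paper does not give the formal definitions:</p>

```python
def serializable_commit_ok(read_set, write_set, concurrent_txns):
    # Sketch of a serializability end constraint: abort if any transaction
    # that committed on the branch after our read state wrote something we
    # read or wrote. `concurrent_txns` is a list of (reads, writes) sets.
    for _, other_writes in concurrent_txns:
        if other_writes & read_set or other_writes & write_set:
            return False
    return True

def snapshot_commit_ok(write_set, concurrent_txns):
    # Snapshot isolation variant: only write-write conflicts abort.
    return all(not (other_writes & write_set)
               for _, other_writes in concurrent_txns)
```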
<h3 id="merging">Merging</h3>
<p>To make the concept of branch merging explicit, TARDiS includes a concept of <em>merge transactions</em>. Conceptually, these can be viewed as similar to standard transactions in <em>single mode</em>, except that they may operate on multiple <em>read states</em> (i.e. multiple branches). These merge transactions are also a bit special in that they are given access to additional structure about the global state DAG, most notably</p>
<ul>
<li><em>Fork Points</em>: the fork point in the history between the set of states being merged</li>
<li><em>Conflict Writes</em>: the set of conflicting writes that occurred on the set of branches being merged.</li>
</ul>
<p>Access to this information allows merge transactions to explicitly resolve conflicts between branches in an application-specific manner. For example, the paper considers a simple counter value that has diverged among conflicting branches. Given the values on each branch and the fork point, a merge transaction can compute a new, resolved value by adding each branch’s difference from the fork-point value to the value at the fork point.</p>
<div style="text-align: center;">
<img src="/assets/git-for-txns/counter-code.png" alt="TARDiS Model" width="460" />
</div>
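<p>A minimal sketch of that counter merge logic in Python (the function name and signature are hypothetical, following the description above):</p>

```python
def merge_counter(fork_value, branch_values):
    # Resolve a diverged counter by applying each branch's delta
    # relative to the value at the fork point.
    return fork_value + sum(v - fork_value for v in branch_values)

# Fork point at 10; one branch incremented to 13 (+3), another to 12 (+2).
merge_counter(10, [13, 12])  # -> 15
```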
<p><br /></p>
<h2 id="concluding-thoughts">Concluding Thoughts</h2>
<p>It is interesting to note how the ideas in this paper echo work that came just a bit later, in Crooks’ work on a <a href="https://www.cs.cornell.edu/lorenzo/papers/Crooks17Seeing.pdf">state-based isolation formalism</a>, which appeared in PODC 2017; it seems similar ideas were being developed concurrently. For example, the notions of <em>read states</em> and <em>end constraints</em> appear quite analogous to the “read state” and “commit test” concepts in the client-centric formulation. In general, both papers share a common conceptual core of viewing transactional isolation models as centered around <em>state-centric histories</em>, i.e. the database moves through a sequence of states over time, new transactions may conceptually read from one of these states, and upon commit may create a new state, appending to this history.</p>
<p>TARDiS is an interesting attempt at managing weakly consistent data interfaces in a more principled manner. On the flip side, my intuition is that managing and merging these branches in a complex application would become burdensome and unintuitive for most application developers. For software and systems builders familiar with Git, DAGs, etc., this may be more palatable, but even in Git I find it rare to deal with merging more than 1-2 branches at a time, and even then, resolving merge conflicts can still be somewhat tedious. Perhaps this type of system, though, would be effective as a more internal layer that other tools/apps could build on top of, rather than one that users interface with directly. Regardless, I think the ideas in the paper are productive and useful as an alternative model for conceptualizing transactions in general, and weak consistency or isolation models especially.</p>
<p>They also note that similar ideas have been explored in the past, including <a href="https://dl.acm.org/doi/10.5555/1267680.1267707">Olive</a> and <a href="https://www.cs.utexas.edu/~lorenzo/corsi/cs380d/papers/p172-terry.pdf">Bayou</a>, and there is sort of a <a href="https://www.dolthub.com/blog/2024-07-08-are-git-branches-mvcc/">folk understanding</a> of the <a href="https://buttondown.com/jaffray/archive/git-workflow-is-snapshot-isolated/">underlying relationships</a> between multiversion concurrency control, Git, snapshot transactions, etc. This also bears similarities to other earlier work on <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/ecr-esop2012.pdf">eventually consistent transactions</a>, and to some more recent practical systems primitives like the <a href="https://github.com/facebook/rocksdb/wiki/merge-operator">Merge operator</a> in RocksDB.</p>
Tue, 14 Oct 2025 00:00:00 +0000
https://will62794.github.io/databases/transactions/isolation/2025/10/14/git-for-transactions.html
https://will62794.github.io/databases/transactions/isolation/2025/10/14/git-for-transactions.htmldatabasestransactionsisolationOn Writing, Specification, and Outputs<p>In Andy Grove’s <a href="https://www.goodreads.com/book/show/324750.High_Output_Management">High Output Management</a>, on his experiences from management at Intel, he makes a comment about the value of writing “reports” in a business or organizational setting:</p>
<!-- pg. 48 -->
<blockquote>
<p>But reports also have another totally different function. As they are formulated and written, the author is forced to be more precise than he might be verbally. Hence their value stems from the discipline and the thinking the writer is forced to impose upon himself as he identifies and deals with trouble spots in his presentation. Reports are more a <strong>medium</strong> of <strong>self-discipline</strong> than a way to communicate information. Writing the report is important; reading it often is not.</p>
</blockquote>
<p>This comment felt nicely portable to a core type of value derived from the use of formal methods in an engineering and design context, somewhat analogous to Leslie Lamport’s quip on writing and mathematics:</p>
<blockquote>
<p>Writing is nature’s way of letting you know how sloppy your thinking is…Mathematics is nature’s way of letting you know how sloppy your writing is.</p>
</blockquote>
<p>and one that has been <a href="https://www.youtube.com/watch?v=pnfrWPFWbAA">re-emphasized</a> across industry, e.g. recently <a href="https://cacm.acm.org/practice/systems-correctness-practices-at-amazon-web-services/">by folks</a> at AWS:</p>
<blockquote>
<p>First, the act of deeply thinking about and formally writing down distributed protocols forces a structured way of thinking that leads to deeper insights about the structure of protocols and the problem to be solved.</p>
</blockquote>
<p>In the setting of Grove’s book, as its title suggests, a core theme is about measuring <em>outputs</em> of your work, not merely <em>activity</em>. For programmers or other types of lower-level individual contributors, outputs may be significantly easier to measure quantitatively, e.g. lines of code written, features shipped, bugs fixed, etc. For managers (or, more generally, a broader class of modern knowledge workers or researchers), these outputs may be less tangible and harder to measure concretely.</p>
<p>A main idea of his framework is that, more abstractly, output can essentially be measured as the output of a team you manage and/or the output of the other people or teams in an organization that you influence. A related aspect of this framework is one of Grove’s alternate definitions of “manager”, which he calls a <em>know-how</em> manager:</p>
<!-- page 40 -->
<blockquote>
<p>If the manager is a knowledge specialist, a <em>know-how</em> manager, his potential for influencing neighboring organizations is enormous. The internal consultant who supplies needed insight to a group struggling with a problem will affect the work and the output of the entire group…Thus, the definition of a “manager” should be broadened: individual contributors, who gather and disseminate know-how and information should also be seen as middle managers, because they exert great power within the organization.</p>
</blockquote>
<p>This <em>know-how manager</em> concept is perhaps natural in organizations where individual engineers may hold significant influence even if they have not formally assumed a “management” role. It also broadens the space of possible outputs one might produce. That is, if disseminating know-how or transferring knowledge to others is one way to increase the output of an organization, we may view the writing process similarly. Deepening and expanding one’s own understanding of a problem by writing (or specifying) is an activity that later increases the output of the organization, when others need to come to understand or extend that problem or system. If writing or specifying deepens one’s understanding of a problem or domain, this should ultimately be valuable, since it implies additional leverage in the future, when that person serves as a core contributor of insights or knowledge on a domain that impacts many others in the organization.</p>
<p>Further applying this “output-oriented” framework to Grove’s first statement above would imply that a detailed written report (or a formal specification, analogously) may not necessarily be a direct <em>output</em> (if no-one reads it, how could it be?). Rather, the valuable output associated with writing a report may be decoupled from the written report itself. Instead, it is associated with the way it impacts the eventual output of a team, or organization. That is, the process of writing itself and the <em>understanding, clarity, and knowledge</em> gained in the process is more closely tied to output here, though less tangible and harder to measure concretely.</p>
<p>The ideas presented by Grove here also felt somewhat related and applicable to a <a href="https://youtu.be/vtIzMaLkCaM?feature=shared&t=1288">classic talk</a> on academic writing from Larry McEnerney, where he notes:</p>
<blockquote>
<p>In the real world you’re going to stop paying your readers to care about what’s inside of your head…You think writing is communicating ideas to your readers. It is not…It’s not conveying your ideas to your reader…It’s changing their ideas.</p>
</blockquote>
<p>So, in this setting (academic writing), we can consider a fundamental output to be <em>changing the reader’s ideas</em>. This framing abstracts the process away from the concrete written document itself. That is, the output of an academic researcher is not papers, strictly, but the degree to which they can change the behavior or ideas of others who consume their work. I think this also serves as a decent proxy for the concept of <em>impact</em>, especially in modern academic research, where a published tool, benchmark, dataset, blog post, talk, lecture, or tweet may have equal if not more impact (e.g. effect on others’ ideas/behaviors) than a traditional academic paper.</p>
<p>I found Grove’s writing on this topic a helpful lens to think about the value of writing, and particularly the use of formal methods and formal specification in an engineering context, especially given that I have often been discouraged when discussions in this domain search for justifications of value based on how well a formal specification can map to a running system implementation, or help test and validate that system implementation more effectively. Those are certainly valuable auxiliary goals, but I find it helpful to separate them from the other, core value in developing these kinds of specifications, which may often be quite abstract or difficult to measure, similar to writing a report that no-one ever reads.</p>
<!-- I also find the following a helpful thought exercise: if I wrote a detailed formal specification of a problem to the point of convincing myself of a detailed, rigorous problem statement and solution, and then threw away the specification entirely, would this still have been a valuable process for me and/or the broader organization? Similarly, I find it is often not always a useful framing to measure the success of a specification process in terms of effective long-term maintenance of that specification and its conformance with an underlying system implementation. That is, we wouldn't consider it a negative outcome of a design process if our written design document becomes out of sync with the system implementation and is not maintained in accordance over time. It is obvious and natural to expect this of design documents, so it should not be unexpected for specifications. -->
Fri, 19 Sep 2025 00:00:00 +0000
https://will62794.github.io/formal-methods/specification/writing/2025/09/19/on-writing-and-specification.html
https://will62794.github.io/formal-methods/specification/writing/2025/09/19/on-writing-and-specification.htmlformal-methodsspecificationwritingLogless Raft<p>The standard use of Raft is for implementing a fault tolerant, replicated state machine by means of a replicated <em>log</em>, maintained at each server within a replication group. Depending on the nature of the state we want to replicate, we can employ a simpler variant of Raft that achieves the same essential correctness properties. We can call this <em>logless Raft</em> and it can be useful when we are only replicating a single, small piece of state (e.g. configuration, metadata, etc.) between servers.</p>
<h3 id="simplifying-log-management">Simplifying Log Management</h3>
<p>There is a lot of machinery included in the standard descriptions of Raft related to the intricacies of replicating log entries between servers, recording the applied indices of the log on each server, etc. (e.g. <code class="language-plaintext highlighter-rouge">matchIndex</code>, <code class="language-plaintext highlighter-rouge">nextIndex</code>, <code class="language-plaintext highlighter-rouge">commitIndex</code>). There are also strategies specifically for dealing with cleanup and garbage collection of stale, divergent logs, <a href="https://github.com/ongardie/raft.tla/blob/6ecbdbcf1bcde2910367cdfd67f31b0bae447ddd/raft.tla#L375-L382">handled</a> as part of the <code class="language-plaintext highlighter-rouge">AppendEntries</code> request/response flow.</p>
<div style="text-align: center">
<img src="/assets/logless-raft/raft-algo1.png" alt="Logless Raft Diagram" width="260" />
<img src="/assets/logless-raft/raft-algo2.png" alt="Logless Raft Diagram" width="260" />
</div>
<p>Most of this log and index management machinery is bookkeeping around what log entries a node (e.g. a leader) should send to other nodes (<code class="language-plaintext highlighter-rouge">nextIndex</code>), what entries other nodes have received so far (<code class="language-plaintext highlighter-rouge">matchIndex</code>), and which entries have been marked as committed (<code class="language-plaintext highlighter-rouge">commitIndex</code>). These details may be required from an implementation perspective, but from a protocol correctness perspective they are somewhat extraneous.</p>
<p>We can reduce this complexity with a variant of Raft that gets rid of the lower-level implementation details around log index management and propagation of this information. Instead, Raft servers can send their <em>entire logs</em> to each other in each message. Receiving nodes can, based on their local log state and the log they received, determine which entries (if any) they can go ahead and append to their own log. Individual nodes no longer track any of the <code class="language-plaintext highlighter-rouge">nextIndex</code>/<code class="language-plaintext highlighter-rouge">matchIndex</code> bookkeeping variables, and the information flow between leaders/followers can also become more symmetric, e.g. both can propagate their entire logs to each other as a way of communicating new updates or feedback about which log entries have been appended.</p>
<h3 id="log-merging-vs-log-replication">Log Merging vs. Log Replication</h3>
<p>In this model, both log append and log truncation operations, normally incremental processes that may occur via repeated rounds of <code class="language-plaintext highlighter-rouge">AppendEntries</code> messages from a leader, are subsumed into a single <em>log merge</em> operation. That is, when a node \(i\) receives a log from node \(j\), it determines whether it can install this incoming log based on certain conditions.</p>
<p>At a high level, these conditions can be expressed as a check of whether node \(i\)’s own log, \(log[i]\), is a prefix of \(log[j]\). If so, it is safe for the node to extend its log to the received log, by updating \(log[i]\) to the value of \(log[j]\). If \(log[i]\) is not a prefix of \(log[j]\), then the node must check for a “staleness” or “divergence” condition, by comparing the last term of both logs. If \(i\)’s log has an older last term than \(log[j]\), then it is safe to replace \(log[i]\) with \(log[j]\). Otherwise, it is not safe for the node to modify its own log.</p>
<p>In both cases, this “prefix” check can be implemented in Raft by simply comparing the last term of each log, similar to how logs are compared in standard vote requests in Raft. That is, if the terms of the last entry in each log are the same, then the prefix check can be done by comparing log lengths; otherwise, the check is done by comparing the terms of the last entry in each log, with newer terms taking precedence.</p>
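<p>As a rough sketch of this merge rule (not taken verbatim from the TLA+ specification), we can model a log simply as a list of entry terms and compare logs by (last term, length):</p>

```python
def last_term(log):
    """Term of a log's last entry, or 0 for an empty log."""
    return log[-1] if log else 0

def merge(local, incoming):
    """Return the log a node should hold after receiving `incoming`.

    Logs are modeled as lists of entry terms. The "prefix" and
    "staleness" checks reduce to comparing (last term, length).
    """
    if last_term(incoming) > last_term(local):
        # Local log is stale/divergent: safe to replace it wholesale.
        return list(incoming)
    if last_term(incoming) == last_term(local) and len(incoming) > len(local):
        # Same last term, so the local log is a prefix: safe to extend.
        return list(incoming)
    # Otherwise, keep the local log unchanged.
    return local
```

<p>Note that a log like <code class="language-plaintext highlighter-rouge">[1, 2]</code> is preserved even against a longer log <code class="language-plaintext highlighter-rouge">[1, 1, 1]</code>, since newer terms take precedence over length.</p>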
<p>A simplified version of this Raft variant is defined in <a href="https://github.com/will62794/raft-logless/blob/main/AbstractRaft.tla">this TLA+ specification</a> (along with an <a href="https://will62794.github.io/spectacle/#!/home?specpath=https%3A%2F%2Fraw.githubusercontent.com%2Fwill62794%2Fraft-logless%2Frefs%2Fheads%2Fmain%2FAbstractRaft.tla&constants%5BServer%5D=%7Bs1%2Cs2%2Cs3%7D&constants%5BSecondary%5D=Secondary&constants%5BPrimary%5D=Primary&constants%5BNil%5D=Nil&constants%5BInitTerm%5D=0&constants%5BMaxTerm%5D=3&constants%5BMaxLogLen%5D=3&trace=318c702a">explorable version</a>). In that specification the <code class="language-plaintext highlighter-rouge">MergeEntries</code> action represents the key “log merge” operation, and encodes the log prefix checking rules for both append and/or garbage collection.</p>
<h3 id="a-closer-look-at-raft-log-structure">A Closer Look at Raft Log Structure</h3>
<p>We can gain some additional intuition on the above merging view with another, closer look at the way that logs are structured across nodes in classic Raft. Specifically, we can view the set of all node logs as forming a global <em>log tree</em> structure, where each node’s local log is a “view” on this global tree, i.e. a local log can be seen as a path in this tree. Over time, new branches may be created or pruned from this tree (e.g. via log truncation), and nodes may sync their local logs to come back in sync with (newer) branches.</p>
<p>We can illustrate this more concretely if we look at a sample protocol behavior through this lens. The diagram below shows a behavior from the above TLA+ specification of the abstract variant of Raft with a configuration of 4 servers (<code class="language-plaintext highlighter-rouge">{n1,n2,n3,n4}</code>). The log tree structure shown is defined so that nodes correspond to log entries (i.e. <code class="language-plaintext highlighter-rouge">(index,term)</code> pairs) and edges correspond to adjacent log entries in some given log across any node. The log tree is also annotated with each node’s current “position” in the tree, i.e. the log entry that corresponds to its current last log entry (nodes with an empty log are simply omitted in those annotations), and entries marked as committed are highlighted in green. A special “root” node in gray denotes an empty log, the initial state for all nodes.</p>
<table style="margin: 0 auto;min-width: 780px;">
<tr>
<td style="text-align: left; font-size: 12px;padding: 15px"><b>State 0</b>: Initial State</td>
<td style="text-align: left;padding: 18px">
<img src="/assets/logless-raft/imgs/log_tree_0.png" alt="Logless Raft Diagram" height="50.160000000000004" />
</td>
</tr>
<tr>
<td style="text-align: left; font-size: 12px;padding: 15px"><b>State 1</b>: BecomeLeader(n1, ['n1', 'n2', 'n3'])</td>
<td style="text-align: left;padding: 18px">
<img src="/assets/logless-raft/imgs/log_tree_1.png" alt="Logless Raft Diagram" height="50.160000000000004" />
</td>
</tr>
<tr>
<td style="text-align: left; font-size: 12px;padding: 15px"><b>State 2</b>: ClientRequest(n1)</td>
<td style="text-align: left;padding: 18px">
<img src="/assets/logless-raft/imgs/log_tree_2.png" alt="Logless Raft Diagram" height="50.160000000000004" />
</td>
</tr>
<tr>
<td style="text-align: left; font-size: 12px;padding: 15px"><b>State 3</b>: ClientRequest(n1)</td>
<td style="text-align: left;padding: 18px">
<img src="/assets/logless-raft/imgs/log_tree_3.png" alt="Logless Raft Diagram" height="50.160000000000004" />
</td>
</tr>
<tr>
<td style="text-align: left; font-size: 12px;padding: 15px"><b>State 4</b>: ClientRequest(n1)</td>
<td style="text-align: left;padding: 18px">
<img src="/assets/logless-raft/imgs/log_tree_4.png" alt="Logless Raft Diagram" height="50.160000000000004" />
</td>
</tr>
<tr>
<td style="text-align: left; font-size: 12px;padding: 15px"><b>State 5</b>: MergeEntries(n2, n1)</td>
<td style="text-align: left;padding: 18px">
<img src="/assets/logless-raft/imgs/log_tree_5.png" alt="Logless Raft Diagram" height="50.160000000000004" />
</td>
</tr>
<tr>
<td style="text-align: left; font-size: 12px;padding: 15px"><b>State 6</b>: MergeEntries(n3, n1)</td>
<td style="text-align: left;padding: 18px">
<img src="/assets/logless-raft/imgs/log_tree_6.png" alt="Logless Raft Diagram" height="50.160000000000004" />
</td>
</tr>
<tr>
<td style="text-align: left; font-size: 12px;padding: 15px"><b>State 7</b>: CommitEntry(n1, ['n1', 'n2', 'n3'])</td>
<td style="text-align: left;padding: 18px">
<img src="/assets/logless-raft/imgs/log_tree_7.png" alt="Logless Raft Diagram" height="50.160000000000004" />
</td>
</tr>
<tr>
<td style="text-align: left; font-size: 12px;padding: 15px"><b>State 8</b>: ClientRequest(n1)</td>
<td style="text-align: left;padding: 18px">
<img src="/assets/logless-raft/imgs/log_tree_8.png" alt="Logless Raft Diagram" height="50.160000000000004" />
</td>
</tr>
<tr>
<td style="text-align: left; font-size: 12px;padding: 15px"><b>State 9</b>: BecomeLeader(n2, ['n2', 'n3', 'n4'])</td>
<td style="text-align: left;padding: 18px">
<img src="/assets/logless-raft/imgs/log_tree_9.png" alt="Logless Raft Diagram" height="50.160000000000004" />
</td>
</tr>
<tr>
<td style="text-align: left; font-size: 12px;padding: 15px"><b>State 10</b>: ClientRequest(n2)</td>
<td style="text-align: left;padding: 18px">
<img src="/assets/logless-raft/imgs/log_tree_10.png" alt="Logless Raft Diagram" height="109.78" />
</td>
</tr>
<tr>
<td style="text-align: left; font-size: 12px;padding: 15px"><b>State 11</b>: MergeEntries(n3, n2)</td>
<td style="text-align: left;padding: 18px">
<img src="/assets/logless-raft/imgs/log_tree_11.png" alt="Logless Raft Diagram" height="109.78" />
</td>
</tr>
<tr>
<td style="text-align: left; font-size: 12px;padding: 15px"><b>State 12</b>: MergeEntries(n4, n2)</td>
<td style="text-align: left;padding: 18px">
<img src="/assets/logless-raft/imgs/log_tree_12.png" alt="Logless Raft Diagram" height="109.78" />
</td>
</tr>
<tr>
<td style="text-align: left; font-size: 12px;padding: 15px"><b>State 13</b>: CommitEntry(n2, ['n2', 'n3', 'n4'])</td>
<td style="text-align: left;padding: 18px">
<img src="/assets/logless-raft/imgs/log_tree_13.png" alt="Logless Raft Diagram" height="109.78" />
</td>
</tr>
<tr>
<td style="text-align: left; font-size: 12px;padding: 15px"><b>State 14</b>: ClientRequest(n2)</td>
<td style="text-align: left;padding: 18px">
<img src="/assets/logless-raft/imgs/log_tree_14.png" alt="Logless Raft Diagram" height="109.78" />
</td>
</tr>
<tr>
<td style="text-align: left; font-size: 12px;padding: 15px"><b>State 15</b>: BecomeLeader(n3, ['n1', 'n3', 'n4'])</td>
<td style="text-align: left;padding: 18px">
<img src="/assets/logless-raft/imgs/log_tree_15.png" alt="Logless Raft Diagram" height="109.78" />
</td>
</tr>
<tr>
<td style="text-align: left; font-size: 12px;padding: 15px"><b>State 16</b>: ClientRequest(n3)</td>
<td style="text-align: left;padding: 18px">
<img src="/assets/logless-raft/imgs/log_tree_16.png" alt="Logless Raft Diagram" height="139.04" />
</td>
</tr>
<tr>
<td style="text-align: left; font-size: 12px;padding: 15px"><b>State 17</b>: MergeEntries(n1, n2)</td>
<td style="text-align: left;padding: 18px">
<img src="/assets/logless-raft/imgs/log_tree_17.png" alt="Logless Raft Diagram" height="109.78" />
</td>
</tr>
</table>
<!-- <div style="text-align: center">
<img src="/assets/logless-raft/log_tree_filmstrip.png" alt="Logless Raft Diagram" width="380">
</div> -->
<p>When a new leader gets elected, a “fork” may be created in this tree, if the new leader did not contain all previously created (but uncommitted) log entries. For example, this first occurs in State 10, when node <code class="language-plaintext highlighter-rouge">n2</code> has become leader and written a new entry but without the log entry <code class="language-plaintext highlighter-rouge">(4,1)</code> created by <code class="language-plaintext highlighter-rouge">n1</code>. Similarly, another fork is created when a branch via <code class="language-plaintext highlighter-rouge">n3</code> is created in State 16.</p>
<p>Note also that local log “pointers” move along paths in this tree as new logs are replicated or “merged” around. For example, in the State 10 to State 11 transition, <code class="language-plaintext highlighter-rouge">n3</code> replicates the log from <code class="language-plaintext highlighter-rouge">n2</code>, and so moves its pointer in the tree ahead to entry <code class="language-plaintext highlighter-rouge">(4,2)</code>. Note also that due to the key “log matching” property that is maintained in Raft, <code class="language-plaintext highlighter-rouge">(index, term)</code> pairs uniquely identify prefixes/paths within this tree.</p>
<p>Pruning of branches in this tree occurs when a node with an old/stale log merges its log with a newer one. In standard Raft, this pruning also occurs, but typically in stages, e.g. first a node truncates its log, and then it replicates new entries to come into alignment with an up-to-date branch. For example, in State 17, <code class="language-plaintext highlighter-rouge">n1</code> has merged itself onto the newer branch of <code class="language-plaintext highlighter-rouge">n2</code>, pruning its older, stale branch ending in entry <code class="language-plaintext highlighter-rouge">(4,1)</code>.</p>
<p>This perspective on Raft logs helps to provide intuition on the “merging” strategy we outlined above. Local logs can be seen as views or paths in this global tree structure, and replication of logs between nodes can be viewed as a way of bringing divergent branches back in sync and replicating a branch to a sufficient number of nodes to ensure safe commit.
Note that this <a href="https://decentralizedthoughts.github.io/2021-07-17-simplifying-raft-with-chaining/">blog post</a> on chaining in Raft puts forth similar perspectives, partially through the lens of blockchain protocols.</p>
<h3 id="going-logless">Going Logless</h3>
<p>In this abstract, “merging”-based variant of Raft, lower-level log management operations have been abstracted away. That is, the entire log is a monolithic piece of state that is replicated between nodes in one shot, and we only care about some notion of logical “ordering” between two different logs, which is determined by the “last term” ordering condition described above.</p>
<p>In this monolithic log model, we only care about a comparison between the ends of each log. So, it is relatively straightforward to see that we can view such a protocol as simply storing a “rolled up” log at each node, i.e. storing the full piece of state that corresponds to applying all entries in a log, tagged with the “(last index, last term)” of that log. When we propagate logs around, we don’t actually need to send the whole log, but only the state corresponding to application of that log’s entries. And we can easily compare two pieces of this state by simply comparing the tagged index/term values.</p>
<!-- With the key property that we prefer nodes to accept only logically "newer" logs. This can be modeled by simply viewing the log as an arbitrary piece of state that is tagged with a version/index number, which is increased every time a new version is created. -->
<p>From this perspective, we can now imagine a variant of Raft that stores some arbitrary piece of state, which gets updated “in-place” via client operations at a leader node. This state is propagated to followers via messages that contain the entire state, and they decide whether or not to install the newer state based on this simple “merging” logic, which performs the logical version comparison between their own local state and the state they received. When we write a new entry down on a leader, we can simply update that state in-place and increment the “index” (which can perhaps more appropriately be called an object “version”) for the local state.</p>
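<p>As a minimal sketch of this idea (with illustrative names, not drawn from any particular implementation), each node can hold a single state object tagged with a (term, version) pair, and install an incoming state only if it is logically newer, using the same comparison rule as the log merge:</p>

```python
from dataclasses import dataclass, field

@dataclass
class LoglessReplica:
    """A 'logless' node: a single rolled-up state tagged with
    (term, version) in place of a full log."""
    state: dict = field(default_factory=dict)
    version: int = 0   # plays the role of the last log index
    term: int = 0      # plays the role of the last log term

    def client_write(self, updates):
        # A leader updates the state in place and bumps the version.
        self.state.update(updates)
        self.version += 1

    def receive(self, state, version, term):
        # Install the incoming state only if it is logically newer,
        # comparing term first, then version (as in the log merge rule).
        if (term, version) > (self.term, self.version):
            self.state = dict(state)
            self.version, self.term = version, term
```

<p>A follower that receives a state tagged with an older (term, version) pair simply ignores it, just as a node with a newer log ignores a stale incoming log.</p>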
<h3 id="related-work">Related Work</h3>
<p>There are many other versions of logless or “register” style consensus algorithms. Recent proposals like <a href="https://arxiv.org/abs/1802.07000">CASPaxos</a> and <a href="https://arxiv.org/pdf/2001.03362">RMWPaxos</a> try to do something similar for Paxos-based systems, and there is also a history of literature on “<a href="https://groups.csail.mit.edu/tds/papers/Lynch/FTCS97.pdf">atomic registers</a>”, implementing this type of primitive in a distributed fashion. This <a href="https://distributedthoughts.com/2017/03/27/log-less-consensus/">post</a> from the author of <a href="https://arxiv.org/pdf/1702.04242">Bizur</a> also discusses similar ideas.</p>
<p>I haven’t seen this logless variation specifically appear in the context of a Raft-based protocol, though it is essentially similar to the ideas employed in the design of a <a href="https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.OPODIS.2021.26">new reconfiguration protocol</a> within MongoDB’s Raft-based consensus system. It is also somewhat informative to derive this logless variant through a series of relatively straightforward modifications to standard, “log-based” versions of Raft.</p>
Mon, 25 Aug 2025 00:00:00 +0000
https://will62794.github.io/distributed-systems/consensus/2025/08/25/logless-raft.html
https://will62794.github.io/distributed-systems/consensus/2025/08/25/logless-raft.htmldistributed-systemsconsensusSimple Serializable Snapshot Isolation<p>In <a href="https://arxiv.org/abs/2405.18393"><em>A Critique of Snapshot Isolation</em></a>, published in EuroSys 2012, the authors present <em>write-snapshot isolation</em>, a simple but clever approach to making snapshot isolation serializable. This work was published a few years after Michael Cahill’s original work on <a href="https://courses.cs.washington.edu/courses/cse444/08au/544M/READING-LIST/fekete-sigmod2008.pdf"><em>Serializable Snapshot Isolation</em></a> (TODS 2009), and around a similar time as the work of Dan Ports on <a href="https://dl.acm.org/doi/10.14778/2367502.2367523">implementing serializable snapshot isolation</a> (VLDB 2012), which applied Cahill’s ideas in PostgreSQL.</p>
<p>At the highest level, the idea of this paper is that instead of detecting and aborting “write-write” conflicts, as is done in <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/tr-95-51.pdf">classic snapshot isolation</a>, it is sufficient to guarantee serializability by instead detecting and preventing “read-write” conflicts. That is, a conflict where one transaction writes to a key that is read by another concurrent transaction. They also show that, at least for some workloads, there is no significant fundamental concurrency/performance impact of this approach vs. snapshot isolation.</p>
<h2 id="snapshot-isolation">Snapshot Isolation</h2>
<p>Classic snapshot isolation ensures that each transaction observes a consistent snapshot of the database, and prevents conflicting writes by concurrent transactions. There are standard lock-based and lock-free implementations of SI, both of which basically rely on the assignment of a “read” and “commit” timestamp to each transaction. That is, a centralized <em>timestamp oracle</em> is used to assign timestamps for ordering transactions. A transaction \(T_i\) with read timestamp \(T_s(T_i)\) will read the latest version of data with commit timestamp \(\delta < T_s(T_i)\). Two transactions conflict if they (1) write to the same row \(r\) and (2) have temporal overlap: \(T_s(T_i) < T_c(T_j)\) and \(T_s(T_j) < T_c(T_i)\) (i.e. their read and commit timestamp spans overlap).</p>
<div style="text-align: center">
<img src="/assets/diagrams/critique-of-si/classic-si.png" alt="Write-snapshot isolation lock-free algorithm" width="400px" />
</div>
<p>Google’s <a href="https://www.usenix.org/legacy/event/osdi10/tech/full_papers/Peng.pdf">Percolator system</a> implemented a standard lock-based version of SI, which adds <em>lock</em> and <em>write</em> columns, where the <em>write</em> column maintains the commit timestamp. Basically, it runs a 2PC algorithm, and updates the <em>lock</em> column on all modified rows during the first phase of 2PC. If a transaction tries to write into a locked item, it may either wait, abort, or force the transaction holding that lock to abort. In the second phase of 2PC, the data is then updated with the commit timestamp and the locks are removed. Slow or failed transactions that are holding locks, though, may prevent others from making progress.</p>
<p>A basic lock-free implementation of snapshot isolation can be done using a centralized oracle that is responsible for receiving commit requests from all transactions and checking for conflicts.</p>
<div style="text-align: center">
<img src="/assets/diagrams/critique-of-si/lock-free-si.png" alt="Write-snapshot isolation lock-free algorithm" width="480px" />
</div>
<p>This algorithm checks, for each row \(R\) modified by a transaction, whether there is temporal overlap with any other transaction on that row, i.e. whether any other transaction has concurrently written to it. If so, the transaction must be aborted. Otherwise, it is assigned a new commit timestamp and allowed to commit, marking each of its modified rows with the newly chosen commit timestamp.</p>
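<p>A minimal sketch of this commit-time check (with illustrative names, assuming a single-threaded oracle) can track, per row, the commit timestamp of its latest writer:</p>

```python
class SIOracle:
    """Sketch of a centralized timestamp oracle for classic snapshot
    isolation: a committing transaction aborts if any row in its write
    set was committed by another transaction after its read timestamp."""
    def __init__(self):
        self.ts = 0
        self.last_commit = {}  # row -> commit timestamp of its latest writer

    def start(self):
        # Assign a read timestamp to a new transaction.
        self.ts += 1
        return self.ts

    def commit(self, read_ts, write_set):
        # Write-write conflict: some row in the write set was committed
        # by a temporally overlapping transaction.
        if any(self.last_commit.get(r, 0) > read_ts for r in write_set):
            return None  # abort
        self.ts += 1
        for r in write_set:
            self.last_commit[r] = self.ts
        return self.ts  # commit timestamp
```

<p>With two concurrent transactions writing the same row, whichever commits second sees the other’s newer commit timestamp on that row and aborts.</p>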
<h2 id="serializability">Serializability</h2>
<p>The paper first examines the question of what role write-write conflicts play in snapshot isolation and serializability. The most standard example of non-serializable snapshot isolation histories are those containing <em><a href="https://jepsen.io/consistency/phenomena/a5b">write skew</a></em> anomalies, where transactions don’t write to conflicting keys, but may update keys in a way that violates some global constraint.</p>
<p>They note, though, that aborting transactions on write-write conflicts is also overly restrictive in some ways i.e. transactions will be aborted in some cases even if no serialization anomaly would manifest. They consider a modified variant of the <em><a href="https://jepsen.io/consistency/phenomena/p4">lost update</a></em> anomaly, like</p>
\[r_1(x) \, \, w_2(x) \, \, w_1(x) \, \, c_1 \, \, c_2\]
<p>Standard write-write conflict checks will abort one of these transactions unnecessarily, since a lost update anomaly won’t actually manifest here.
As they summarize:</p>
<blockquote>
<p>In other words, write-write conflict avoidance of snapshot isolation, besides allowing some histories that are not serializable, unnecessarily lowers the concurrency of transactions by preventing some valid, serializable histories.</p>
</blockquote>
<h2 id="making-snapshot-isolation-serializable">Making Snapshot Isolation Serializable</h2>
<p>Instead of detecting write-write conflicts of concurrent transactions, as done under classic snapshot isolation, they introduce <em>write-snapshot isolation</em> (WSI), which instead detects and aborts <em>read-write</em> conflicts. They state the conflict conditions more formally as:</p>
<ol>
<li>RW-spatial overlap: \(T_j\) writes into row \(r\) and \(T_i\) reads from row \(r\);</li>
<li>RW-temporal overlap: \(T_s(T_i) < T_c(T_j) < T_c(T_i)\).</li>
</ol>
<p>Essentially, if a transaction \(T_j\) is concurrent with \(T_i\) and writes a key \(k\) that \(T_i\) reads from, this is manifested as a conflict and \(T_i\) must be prevented from committing. Most importantly, write-snapshot isolation is sufficient to strengthen snapshot isolation to be fully serializable.</p>
<div style="text-align: center">
<img src="/assets/diagrams/critique-of-si/write-si-diagram.png" alt="Write-snapshot isolation lock-free algorithm" width="380px" />
</div>
<!-- ### Read-Only Transactions -->
<p>They also point out that the simple condition of checking for read-write conflicts is not quite precise enough, and would, by default, lead to unnecessary aborts of read-only transactions. Read-only transactions needn’t abort even if they fall under the conflict detection condition for write-snapshot isolation, since they don’t affect the values read by other, concurrent transactions.</p>
<p>They prove that write-snapshot isolation is serializable, by basically showing that you can use commit timestamps of transactions for a serial ordering, and that read-write conflict detection is sufficient to ensure that all transaction reads would be equivalent to those read in a serial history, since they are not allowed to proceed if they conflict with a concurrent write into their read set. And, similarly, the output of writes from each transaction is maintained and respects the commit timestamp ordering.</p>
<p>They present a lock-free implementation of write-snapshot isolation, which augments the classic SI approach by recording both the read set \(R_r\) and write set \(R_w\) of each transaction, which are used upon transaction commit at an “oracle”.</p>
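<p>A sketch of this commit check (illustrative names, assuming a single-threaded oracle): relative to classic SI’s write-write check, it tests the transaction’s <em>read set</em> against recently committed writes, and exempts read-only transactions:</p>

```python
class WSIOracle:
    """Sketch of the write-snapshot isolation commit check: detect
    read-write conflicts instead of write-write conflicts."""
    def __init__(self):
        self.ts = 0
        self.last_commit = {}  # row -> commit timestamp of its latest writer

    def start(self):
        # Assign a read timestamp to a new transaction.
        self.ts += 1
        return self.ts

    def commit(self, read_ts, read_set, write_set):
        # Read-only transactions (empty write set) never need to abort.
        if write_set:
            # Read-write conflict: a temporally overlapping transaction
            # committed a write into this transaction's read set.
            if any(self.last_commit.get(r, 0) > read_ts for r in read_set):
                return None  # abort
        self.ts += 1
        for r in write_set:
            self.last_commit[r] = self.ts
        return self.ts  # commit timestamp
```

<p>Under this check, a write skew pair (two concurrent transactions that each read both rows and write different ones) cannot both commit, since the second committer finds a concurrent write in its read set.</p>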
<div style="text-align: center">
<img src="/assets/diagrams/critique-of-si/write-si-lock-free-algo" alt="Write-snapshot isolation lock-free algorithm" width="450px" />
</div>
<p>This is a nice idea since it is mostly the same as write-write conflict detection of SI, just a bit generalized to handle reads as well as writes. Marc Brooker makes some similar observations in a related <a href="https://brooker.co.za/blog/2024/12/17/occ-and-isolation.html">blog post</a>.</p>
<h2 id="performance">Performance</h2>
<p>Their approach raises the question of how different classic SI is from write-snapshot isolation in terms of histories that are allowed or proscribed. Intuitively, it doesn’t seem that there would be something inherently more restrictive about the prevention of read-write conflicts vs. write-write conflicts.</p>
<p>They compare the concurrency level offered by a centralized, lock-free implementation of write-snapshot isolation with that of a <a href="https://dl.acm.org/doi/10.1109/DSNW.2011.5958809">standard snapshot isolation implementation</a>. They implemented both snapshot isolation and write-snapshot isolation in HBase (an open-source clone of <a href="https://static.googleusercontent.com/media/research.google.com/en//archive/bigtable-osdi06.pdf">BigTable</a>) to test this. Overall, they test a YCSB workload variant with both normally distributed and zipfian (modeling the case where some items are extremely popular) row selection, and find that there is essentially minimal performance difference between the two, at least for these (somewhat artificial) workloads.</p>
<!-- ## Comparison with Other Approahces -->
<div style="text-align: center">
<img src="/assets/diagrams/critique-of-si/eval1.png" alt="Write-snapshot isolation lock-free algorithm" width="810px" />
</div>
<p>They find similar results for abort rate comparison between SI and WSI, with the latter being slightly higher, but the overall difference is negligible.</p>
<div style="text-align: center">
<img src="/assets/diagrams/critique-of-si/eval2.png" alt="Write-snapshot isolation lock-free algorithm" width="810px" />
</div>
<h2 id="concluding-thoughts">Concluding Thoughts</h2>
<p>Overall, this paper provides a nice perspective on snapshot isolation in general, and a re-consideration of its underlying assumptions. One takeaway is a reinforcement of the somewhat arbitrary delineations between isolation levels. For example, snapshot isolation is intuitive in some ways i.e. every transaction reads from a consistent snapshot, but the notion of write-write conflicts is somewhat arbitrary. This paper kind of sheds light on that by showing that, in some sense, read-write conflicts are actually the more “natural” type of conflict you would care about, at least in the sense that they give you a more fundamental guarantee i.e. serializability. I’m not sure that the specific anomalies allowed under SI (i.e. write skew) are fundamentally intuitive in any way.</p>
<!-- Furthermore, it also helps provide some insight on the question of basically why a set of given transactions as executed by a databases will or will not satisfy serializability. Read-write dependencies seem to capture this more fundamentally. In other words, if a set of transaction execute concurrently, serializability issues arise if there is some way that the execution of those transactions will not be equivalent to executing them sequentially. And, this will only occur when there are some types of dependencies between these transactions. Namely, when one transaction reads from a key that another transaction writes to. Read-write dependencies are the somewhat natural dependency category you care about when considering serializability anomalies (at least at levels with snapshot read guarantees). -->
<p>Note that <a href="http://pmg.csail.mit.edu/papers/adya-phd.pdf">Adya style formalisms</a> and considerations of these types of anomalies center around the concept of <em>anti-dependencies</em> and their appearance in cycles (e.g. the <a href="https://jepsen.io/consistency/phenomena/g2">G2 anomaly class</a>). Adya defines an anti-dependency for a transaction that writes a newer version of a value read by another transaction. Cahill’s <a href="https://dl.acm.org/doi/10.1145/1620585.1620587">work on serializable snapshot isolation</a> builds on <a href="https://dsf.berkeley.edu/cs286/papers/ssi-tods2005.pdf">earlier results from Fekete</a> (TODS 2005), which showed that any non-serializable SI history must contain a cycle with two consecutive anti-dependency edges, and furthermore, that each of these edges involves two transactions that are active concurrently. For example, in the classic write skew anomaly, such a cycle exists with just two transactions, each with a mutual anti-dependency on the other, satisfying Fekete’s condition. Cahill’s technique basically tracks incoming and outgoing \(rw\) dependencies. This bears similarities to the global approach of checking read set / write set conflicts between transactions, as done in WSI, but uses per-transaction metadata.</p>
<!-- They do note, however, that -->
<!-- Adya defines anti-dependency for a transaction that writes a newer version of a value read by another transaction. -->
<!--
I think it's useful to think about a starting point of something like [RAMP transactions](http://www.bailis.org/papers/ramp-sigmod2014.pdf), which essentially provide snapshot-based reads of classic snapshot isolation, but with no write-write conflicts. Classic SI augements this with write-write conflicts, whereas WSI -->
<p>Some of the ideas from this paper have been more directly implemented in <a href="https://hypermode.com/blog/badger-txn">transactional systems like Badger</a>, and are similar to how serializable transactions are <a href="https://dl.acm.org/doi/pdf/10.1145/3318464.3386134">implemented in CockroachDB</a>.</p>
Tue, 13 May 2025 00:00:00 +0000
https://will62794.github.io/databases/transactions/isolation/2025/05/13/simple-serializable-snapshot-isolation.html
https://will62794.github.io/databases/transactions/isolation/2025/05/13/simple-serializable-snapshot-isolation.htmldatabasestransactionsisolationTransactions as Transformers<p>Database transactions are traditionally modeled as a sequence of read/write operations on a set of keys, where each read operation returns some value and each write sets a key to some value. This is reflected in most of the formalisms that define various transactional isolation semantics (<a href="https://pmg.csail.mit.edu/papers/icde00.pdf">Adya</a>, <a href="https://www.cs.cornell.edu/lorenzo/papers/Crooks17Seeing.pdf">Crooks</a>, etc.).</p>
<p>For many isolation levels used in practice in modern database systems (e.g. snapshot isolation and above), we can alternatively view transactions as <em>state transformers</em>. That is, instead of a lower-level sequence of read/write operations, a transaction can be viewed as a function that takes in a current state and returns a set of modifications to a subset of database keys, based on values it read from that state. This view is not fully general in its applicability to all isolation levels (e.g. read committed), but we can explore this perspective and how it simplifies various aspects of reasoning about existing isolation levels and their anomalies.</p>
<p><!-- this may not be the best model, and leads to some unnecessary confusion and complexity. --></p>
<h3 id="state-transformer-model">State Transformer Model</h3>
<p>Most standard formalisms represent a transaction as a sequence of read/write operations over a subset of some fixed set of database keys and values, e.g.</p>
\[T:
\begin{cases}
&r(x,v_0) \\
&r(y, v_1) \\
&w(x, v_2) \\
&r(z, v_0)
\end{cases}\]
<p>For transactions operating at isolation levels that read from a consistent database snapshot, though (e.g. <a href="https://drops.dagstuhl.de/storage/00lipics/lipics-vol042-concur2015/LIPIcs.CONCUR.2015.58/LIPIcs.CONCUR.2015.58.pdf">Read Atomic</a> and stronger), we can think of transactions as more cleanly partitioned between a “read phase” and an “update phase”. That is, we can consider the “input” of a transaction as the subset of keys it reads from its snapshot, and its “output” as writes to some subset of keys, each of which can depend, at most, on some subset of keys read from that transaction’s snapshot. In other words, at levels like snapshot isolation, although a transaction may seem to pass through many stages, doing reads and writes internally, we can compact these stages down into a single read phase and a single update phase.</p>
<p>We can formalize this idea into the view of transactions as <em>state transformers</em>. For example, for a database with a key set \(\mathcal{K}=\{x,y,z\}\), we can consider an example of a transaction modeled in this way:</p>
\[T:
\begin{cases}
&\mathcal{R}=\{x,y\} \\
&x' = f_x(y,z) \\
&y' = f_y()
\end{cases}\]
<p>In this representation, \(\mathcal{R}=\{x,y\}\) is the set of keys read by the transaction upfront, and each \(f_k\) is a <em>key transformer</em> function i.e. a pure function describing the updates that get applied to each key \(k\) that is being updated by that transaction. Each such function can optionally depend on the values read from the current snapshot state for that transaction. We can refer to the set of keys passed as arguments to a key transformer as the <em>update dependencies</em> of a key.</p>
<p>Note that we separate transactions into “read” and “update” portions, where the read-only phase is considered to happen upfront, and its key set may or may not differ from the keys appearing in the update dependencies of the key transformers. We might choose a different model in which transactions consist only of the key transformer functions, but the above formulation also lets us include read-only transactions in our model.</p>
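<p>As a concrete illustration, here is one minimal way to encode this model in Python. This is just a sketch: the representation (a read set plus per-key dependency/function pairs) and the particular transformer bodies are illustrative assumptions, not part of any formal definition above.</p>

```python
# A minimal, hypothetical encoding of the state transformer model.
# A transaction is a read set plus one "key transformer" per updated key;
# each transformer is a pure function of the key values it depends on.

# Database state: key -> value
snapshot = {"x": 0, "y": 1, "z": 2}

# Transaction T from the text: reads {x, y}; x' = f_x(y, z), y' = f_y().
# The lambda bodies here are made up purely for illustration.
T = {
    "reads": {"x", "y"},
    "transformers": {
        # key: (update dependency keys, pure function of those dependencies)
        "x": (("y", "z"), lambda y, z: y + z),  # f_x(y, z)
        "y": ((), lambda: 42),                  # f_y(): a constant ("blind") write
    },
}

def apply_txn(state, txn):
    """Apply every key transformer against a single snapshot of `state`."""
    new_state = dict(state)
    for key, (deps, f) in txn["transformers"].items():
        new_state[key] = f(*(state[d] for d in deps))
    return new_state

print(apply_txn(snapshot, T))  # {'x': 3, 'y': 42, 'z': 2}
```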
<!-- More formally, we simply define a transaction as a set of key transformer functions $$F_\mathcal{W_T}$$, where $$\mathcal{W_T} \subseteq \mathcal{K}$$ is the set of keys that the transaction updates, and each transformer function $$f_k$$ for $$k \in \mathcal{W_T}$$ is a function over some subset of key dependencies $$\mathcal{D_k} \subseteq \mathcal{K}$$. -->
<h3 id="isolation-anomalies-and-lost-update">Isolation Anomalies and Lost Update</h3>
<p>Viewing transaction operations as being composed of these key transformer functions, i.e. functions that read some values and produce some writes, helps clarify some awkward aspects of existing transaction isolation models and their treatment of anomalies, particularly because most traditional transaction formalisms don’t make these kinds of update/transformer operations an explicit, first-class member of their formal model.</p>
<p>For example, we can consider some standard treatments of the <em>lost update</em> anomaly. In the <a href="https://software.imdea.org/~andrea.cerone/works/Framework.pdf">Cerone 2015</a> framework, they represent transactions as sequences of read/write operations over a set of keys.</p>
<div style="text-align: center">
<img src="/assets/diagrams/txn-transformers/cerone-defs-new.png" alt="Transaction Isolation Models" width="710" style="border: 1px solid gray;padding: 5px;margin-bottom: 5px;" />
</div>
<p>When they define the <em>lost update</em> anomaly in their model, though, they must skirt the issue somewhat by resorting to a notion of “application code” that could have produced this sequence of writes:</p>
<div style="text-align: center">
<img src="/assets/diagrams/txn-transformers/cerone-lost-update-explanation.png" alt="Transaction Isolation Models" width="700" style="border: 1px solid gray;padding: 5px;margin-bottom: 25px;" />
</div>
<div style="text-align: center">
<img src="/assets/diagrams/txn-transformers/cerone-lost-update.png" alt="Transaction Isolation Models" width="410" style="border: 1px solid gray;padding: 2px;" />
</div>
<p>This is common across many <a href="https://www.cs.umb.edu/~poneil/ROAnom.pdf">other descriptions of the <em>lost update</em> anomaly</a>. One unsatisfying aspect of these descriptions is that the underlying formalism doesn’t accurately capture the underlying reason the anomaly occurs. In addition, lost update is often presented as the canonical anomaly that “write-write” conflicts in snapshot isolation (SI) exist to prevent. But if we removed the reads from \(T_1\) and \(T_2\) in the above example, this SI approach would still need to abort one of the transactions, and it’s not really clear why that is necessary, since the notion of “lost update” goes away when both transactions are doing only writes, albeit to the same key.</p>
<p><!-- along with the standard explanation that snapshot isolation specifically prevents lost update anomalies by preventing write-write conflicts between concurrent transactions. These explanations can be somewhat unsatisfying, though, and the formal representation of these issues don't do a great job of illustrating the general, underlying issue at play. --></p>
<!-- In the base formalism, though, it's not quite clear exactly why exactlywe consider the above example an anomaly, and why abort of write-write conflicts necessarily remedies the issue. For example, given two write-only transactions that write to the same key, basic snapshot isolation approaches will abort one of them, but this clearly won't cause an anomaly, and it seems confusing as to why such writes would need to be aborted in the first place. -->
<p>One view is that anomalies like <em>lost update</em> (the specific anomaly that write-write conflicts in snapshot isolation are supposed to prevent) are fundamentally unnatural to express without a model that accounts for the true “update” semantics (e.g. read-write dependencies) between transactions. In other words, the underlying problem of “lost update” arises specifically when a write depends on a value that was read in the same transaction. Most formalisms don’t make this “write that depends on a read” semantics explicit, though, and so resort to a vague notion of “application code” that might have performed such updates.</p>
<!-- In other words, if two transactions conflict by writing to the same key, what's the problem? One of them will commit after the other, and the database state will then reflect this as it should, and from an external observer's perspective (i.e. another transaction), this is no different than if the two transactions had executed in some serial order. These type of anomalies only "make sense" by resorting to some vague higher level "application code" notion. What we really care about is whether the value of that write was computed based specifically on the values that it read. Most existing formalisms don't make this explicit, and just kind of gloss over it with the mention of "application code". -->
<p>In the state transformer model, we might say that a more precise definition of <em>lost update</em> is the case where two transactions \(T_1\) and \(T_2\) update the same key \(x\) via key transformers \(f^{T_1}_x\) and \(f^{T_2}_x\) <em>and</em> \(x\) is a dependency of one of these transformer functions e.g.</p>
\[\begin{aligned}
f^{T_1}_x(x) = x + 1 \\
f^{T_2}_x(x) = x + 3
\end{aligned}\]
<p>That is, a lost update is a problem specifically because of the read-write dependency that exists between the two transactions. This creates a potential serializability anomaly: if you execute two transactions with transformers as in the above example, the order of these transactions matters for the final outcome, since they incur a semantic (read-write) dependency on each other. That is, if they both execute on the same data snapshot and are allowed to commit, the result will be semantically incorrect, i.e. you really have “lost” one of the updates, since the outcome will be either \(x=1\) or \(x=3\), but not \(x=4\) as it should be (assuming \(x=0\) in the shared snapshot).</p>
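<p>We can see this concretely with a small sketch that runs the two transformers above, first against a shared snapshot and then in serial composition. The variable names here are purely illustrative.</p>

```python
# Lost update: both transformers depend on the value of x that they read.
f_t1 = lambda x: x + 1   # f_x^{T1}(x) = x + 1
f_t2 = lambda x: x + 3   # f_x^{T2}(x) = x + 3

x0 = 0  # value of x in the shared snapshot

# Concurrent execution: both transactions compute against the same snapshot,
# so whichever write commits last determines the final state.
concurrent_outcomes = [f_t1(x0), f_t2(x0)]   # x ends up as 1 or 3

# Serial execution composes the transformers instead (either order gives 4).
serial = f_t2(f_t1(x0))

print(concurrent_outcomes, serial)  # [1, 3] 4
```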
<p>Similarly, such an anomaly can also arise with a different dependency structure e.g.</p>
\[\begin{aligned}
f^{T_1}_x() &= 6 \\
f^{T_2}_x(x) &= x + 3
\end{aligned}\]
<p>In this case, the order of execution <em>can</em> matter, but if these transactions are concurrent and \(T_1\)’s write were allowed to “win”, then we end up in the state \(x=6\), which is equivalent to the scenario where the transactions executed serially with \(T_2\) going first. If \(T_2\)’s write “wins”, though, then (again assuming \(x=0\) initially) we end up in a state where \(x=3\), which is not equivalent to either serial execution of these transactions, which produce \(x=9\) (\(T_1 \rightarrow T_2\)) or \(x=6\) (\(T_2 \rightarrow T_1\)).</p>
<p>Finally, we can also have a case where both transactions perform “blind” writes to the same key, incurring no dependency on each other, e.g.</p>
\[\begin{aligned}
f^{T_1}_x() &= 6 \\
f^{T_2}_x() &= 3
\end{aligned}\]
<p>In this case, no true lost update anomaly can manifest, since the resulting state after both transactions commit will always be equivalent to their execution in some sequential order. Essentially, existing transaction formalisms can be seen as behaving as in this case, i.e. as if all key transformers have no key dependencies. That is, they always write “constant” values, i.e. values that are not dependent on anything the transaction read.
<!-- This is the case because a semantic notion of "dependence" is not explicitly representable in most of these formalisms. --></p>
<!-- In such a world, we might argue that "lost update" isn't a "true" anomaly at all, since if two transactions conflict by writing to the same key, what's the problem? One of them will commit after the other, and the database state will then reflect this as it should, and from an external observer's perspective (i.e. another transaction), this is no different than if the two transactions had executed in some serial order. -->
<h3 id="write-skew-and-a-generalized-view-of-anomalies">Write Skew and a Generalized View of Anomalies</h3>
<p>This transformer model also gives us a way to see that <em>lost update</em> is really a special case of a more general class of anomalies. In particular, two transactions writing to the same key is not a fundamental condition for this class of anomalies; it just happens to hold in the <em>lost update</em> special case.</p>
<p>For example, we can also consider <em>write skew</em>, the canonical anomaly permitted under snapshot isolation, within this framework. The classic write skew example manifests when two transactions write to non-intersecting key sets, but both update keys in a way that may break some external “semantic” constraint. As illustrated again in Cerone by example:</p>
<div style="text-align: center">
<img src="/assets/diagrams/txn-transformers/cerone-write-skew.png" alt="Transaction Isolation Models" width="710" style="border: 1px solid gray;padding: 2px;" />
</div>
<p>We can represent this example in the state transformer model as something like:</p>
\[T_1: \quad
\begin{aligned}
f_x(x,y) &= \text{if } (x + y) > 100 \text{ then } (x - 100) \text{ else } x \\
\end{aligned}\]
\[T_2: \quad
\begin{aligned}
f_y(x,y) &= \text{if } (x + y) > 100 \text{ then } (y - 100) \text{ else } y
\end{aligned}\]
<p>In this case, even though transactions \(T_1\) and \(T_2\) write to disjoint keys, the key transformers in each transaction depend on both keys, \(x\) and \(y\), due to their conditional update logic.
Again, the core problem arises from the read-write dependencies between these transactions: the writes of one transaction change values in the update dependency key set of the other’s key transformers. Thus, their order of execution matters, and so the resulting state will not be equivalent to some serial execution.</p>
<p>Viewed from this perspective, it is easier to understand <em>lost update</em> and <em>write skew</em> as special cases of a more general class of anomalies that can arise when there are data dependencies between the <em>write set</em> of one transaction and the <em>update dependency set</em> of another. This provides a more general view of this type of anomaly, i.e. such anomalies arise when there exist read-write dependencies between a set of transactions.</p>
<p>For example, we can also consider subtle variations on the standard write-skew example:</p>
\[\begin{aligned}
T_1: \quad &f_x(y) = y - 50 \\ \\
T_2: \quad &f_y(x,y) = y + x
\end{aligned}\]
<p>This isn’t quite the same as the classical write skew constraint-violation example, but it can still cause a serialization anomaly. For example, from a starting state \((x=100, y=200)\), executing both transactions concurrently (i.e. against the same snapshot) leaves us in the state</p>
\[(x=150, y=300)\]
<p>whereas serial execution gives us either</p>
\[(x=150, y=350)_{T_1 \rightarrow T_2}\]
\[(x=250, y=300)_{T_2 \rightarrow T_1}\]
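<p>A minimal Python sketch of the execution just described, checking the arithmetic for the concurrent case and both serial orders:</p>

```python
# The write-skew variant above, executed from (x=100, y=200).
f_x = lambda y: y - 50        # T1: x' = f_x(y)
f_y = lambda x, y: y + x      # T2: y' = f_y(x, y)

s0 = {"x": 100, "y": 200}

# Concurrent: both transformers read from the shared snapshot s0.
concurrent = {"x": f_x(s0["y"]), "y": f_y(s0["x"], s0["y"])}

# Serial T1 -> T2.
s1 = dict(s0, x=f_x(s0["y"]))
serial_12 = dict(s1, y=f_y(s1["x"], s1["y"]))

# Serial T2 -> T1.
s2 = dict(s0, y=f_y(s0["x"], s0["y"]))
serial_21 = dict(s2, x=f_x(s2["y"]))

print(concurrent)   # {'x': 150, 'y': 300}
print(serial_12)    # {'x': 150, 'y': 350}
print(serial_21)    # {'x': 250, 'y': 300}
```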
<!-- From one perspective, we could say this is simply the case that occurs when the "read sets" and "write sets" of transactions intersect. This is not fully accurate, though. -->
<p>The state transformer model also sheds light on various quirks of standard transaction isolation definitions and implementations that appear somewhat confusing or ad hoc when examined in more detail.
For example, in classic snapshot isolation, write-write conflict handling forces one of two transactions to abort if they both write to the same key. In theory, this behavior exists to prevent lost update anomalies, as mentioned above. In reality, though, this is a very coarse-grained and conservative way to prevent these anomalies. What we really care about is aborting a write that <em>depended</em> on a value that was written by another transaction. Aborting on write conflicts is just one way to prevent the particular “lost update” special case (i.e. when two transactions directly write to the same key), and does nothing to handle write skew and the more general class of update-based anomalies. Moreover, it is overly conservative: if two transactions do no reads but write to the same key, there is no strict need to abort either of them, though one of them must be aborted under most snapshot isolation implementations.</p>
<p>Similarly, there are related workarounds in other models where a basic notion of “read set”/“write set” conflicts is not precise enough. For example, in <a href="https://arxiv.org/abs/2405.18393"><em>A Critique of Snapshot Isolation</em></a>, the authors make the simple but clever observation that detecting <em>read-write</em> conflicts (rather than <em>write-write</em>) is sufficient to make snapshot isolation serializable. But even here, they have to add a special case for read-only transactions, e.g.</p>
<blockquote>
<p>Plainly, since a read-only transaction does not perform any writes, it does not affect the values read by other transactions, and therefore does not affect the concurrent transactions as well. Because the reads in both snapshot isolation and write-snapshot isolation are performed on a fixed snapshot of the database that is determined by the transaction start timestamp, the return value of a read operation is always the same, independent of the real time that the read is executed. Hence, a read-only transaction is not affected by concurrent transactions and intuitively does not have to be aborted….In other words, the read-only transactions are not checked for conflicts and hence never abort.</p>
</blockquote>
<p>When viewed in the state transformer model, we can see why read-only transactions need not be checked for conflicts: none of their reads are used in any update dependency key set. Anomalies of this class only arise when reads are used as part of a true “update”, which most models just don’t explicitly represent.</p>
<p>Another related aspect is that the transformer-based view can, in theory, assist in finer-grained conflict analysis between transactions. For example, in the <em>write-snapshot</em> isolation approach, any transaction that does writes may be prone to abort due to a read-write conflict (i.e. it read a key that was written by a concurrent transaction). If transactions have very large read sets, this makes them very prone to aborts, since their read conflict surface area is large. But we should really only be concerned with keys that are read <em>and</em> used in some update dependency set of a key transformer of that transaction; e.g. a transaction may read thousands of keys while only a small number of them are used in an update dependency.</p>
<p>The state transformer model helps us see many of these special anomaly cases in a unified way. For example, two write-only transactions that write to the same key can be understood as both having transformer functions with empty key dependency sets, so no conflict manifests under the conflict rules of this model. Similarly for read-only transactions: if a transaction does reads but none of those reads are actually used as dependencies of a key transformer function, no conflict needs to manifest.</p>
<p>Overall, given a set of transactions that may execute concurrently, we can say that, in our state transformer model, serialization anomalies (and therefore conflicts) may arise if there exist dependencies between the write set of one transaction and the key transformer dependency set of another. This generalizes both lost update and write skew into a broader, unified class of anomalies. Furthermore, it becomes clear that special cases like “write conflicts” or “lost updates” are not fundamentally about “transactions writing to the same key”, but are rather instances of the general problem of these update dependency relationships.</p>
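<p>This generalized conflict condition is simple enough to sketch directly. The hypothetical encoding below (per-key dependency sets, as in the earlier examples) flags a potential conflict exactly when one transaction’s write set intersects the other’s update dependency set, which classifies both lost update and write skew as conflicts while letting blind writes through.</p>

```python
# A transaction is a dict of key transformers: key -> (dependency keys, fn).

def write_set(txn):
    return set(txn["transformers"])

def update_deps(txn):
    return {d for deps, _ in txn["transformers"].values() for d in deps}

def may_conflict(t1, t2):
    """Generalized condition: one txn writes into the other's update deps."""
    return bool(write_set(t1) & update_deps(t2) or
                write_set(t2) & update_deps(t1))

# Lost update: both write x, and both transformers depend on x.
lost_update = ({"transformers": {"x": (("x",), lambda x: x + 1)}},
               {"transformers": {"x": (("x",), lambda x: x + 3)}})

# Blind writes: both write x, but with no dependencies -- no conflict needed.
blind = ({"transformers": {"x": ((), lambda: 6)}},
         {"transformers": {"x": ((), lambda: 3)}})

# Write skew: disjoint write sets, but each depends on the other's write.
skew = ({"transformers": {"x": (("x", "y"), lambda x, y: x)}},
        {"transformers": {"y": (("x", "y"), lambda x, y: y)}})

print(may_conflict(*lost_update), may_conflict(*blind), may_conflict(*skew))
# True False True
```

Note that a read-only transaction has an empty write set and no transformers, so it trivially never conflicts under this rule, matching the special case discussed above.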
<p><!-- is no need to special case read-only transactions, since they already satisfy the condition we defined above which is about the dependency set of each key transformer. This provides a more unifying view to understand when serialization anomalies will arise. Furthermore, we can make a finer-grained distinction that allows transactions to proceed if the key transformer updates are "constant". --></p>
<!--
### Fekete's Read-Only Anomaly
What about Fekete's [read-only snapshot transaction anomaly](https://www.cs.umb.edu/~poneil/ROAnom.pdf)? The state transformer view also provides a simplified view on this. Fekete's original example is given as follows:
$$
T_1:
\begin{cases}
&r(y,0) \\
&w(x,-11)
\end{cases}
$$
$$
T_2:
\begin{cases}
&r(x,0) \\
&r(y,0) \\
&w(y,20)
\end{cases}
$$
$$
T_3:
\begin{cases}
&r(x,0) \\
&r(y,20) \\
\end{cases}
$$
He claims that this is a "read-only" transaction anomaly since if you remove $$T_3$$ then the execution of $$T_1$$ and $$T_2$$ in isolation is serializable. But I think this argument is a bit misleading, since if you look at a prior example from the same paper of basic *write skew*, it is shown as follows, for 2 transactions:
$$
T_1:
\begin{cases}
&r(x,70) \\
&r(y,80) \\
&w(x,-30)
\end{cases}
$$
$$
T_2:
\begin{cases}
&r(x,70) \\
&r(y,80) \\
&w(y,-20)
\end{cases}
$$
It seems that almost the same argument would apply here. That is, why can't we say that $$T_1$$ and $$T_2$$ serializable (if we remove $$T_1$$'s read of $$x$$) for the same reason as in the ROA scenario, even though this is claimed to exhibit a "write skew" anomaly? Again, the problem here is that with "blind writes" there isn't a precise way to define these type of anomalies without resorting to explicit, operation level "update" semantics. In the state transformer model, Fekete's write skew example would more accurately be represented as:
$$
T_1:
\begin{cases}
&\mathcal{R}=\{x,y\} \\
&x' = \text{if } x + y > 100 \text{ then } (x - 100) \text{ else } x
\end{cases}
$$
$$
T_2:
\begin{cases}
&\mathcal{R}=\{x,y\} \\
&y' = \text{if } x + y > 100 \text{ then } (y - 100) \text{ else } y \\
\end{cases}
$$
So, one way to view this is that the read-only anomaly is really just a way that write skew becomes "visible" in the default existing formal model (???)
This demystifies the "read-only" anomaly somewhat, showing that it isn't really fundamentally different from write skew case, but just an awkward artifact of the way that many default formalisms express transactions and anomalies. -->
<h3 id="merging-and-deterministic-scheduling">Merging and Deterministic Scheduling</h3>
<p>This state transformer view of transactions also opens up a few interesting questions about whether we can be smarter when thinking about conflicts. That is, if transactions are formally expressed in this state transformer structure, we could consider cases where, instead of aborting transactions that encounter certain types of conflicts (e.g. write-write), we semantically merge the effects of their key transformers into a unified operation that reflects the correct, sequential execution of both transformers. This is in essence similar to <a href="https://inria.hal.science/inria-00555588v1/document">CRDT-based ideas</a>, but applied in the context of a more classic transaction processing paradigm. Similarly, we might consider related ideas on <a href="https://www.cs.cornell.edu/~matthelb/papers/morty-eurosys23.pdf">“re-execution”</a>, e.g. we could imagine that, if a conflict is detected from a concurrent transaction writing into your update dependency key set, it may be fine to simply re-compute the result of your update based on the written value, dynamically “correcting” the serialization anomaly.</p>
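<p>A minimal sketch of the re-execution idea, using the earlier lost update transformers: if \(T_1\) commits first, \(T_2\)’s transformer can simply be re-evaluated against the committed value rather than aborted. (This is a toy illustration of the idea, not the algorithm of any particular system.)</p>

```python
# Re-execution sketch: recompute a conflicting transformer instead of aborting.
f_t1 = lambda x: x + 1   # T1's transformer on x
f_t2 = lambda x: x + 3   # T2's transformer on x

x0 = 0                   # shared snapshot value
x_after_t1 = f_t1(x0)    # T1 commits first: x = 1

# Naive snapshot execution of T2 would commit f_t2(x0) = 3, losing T1's update.
# Re-execution instead recomputes T2's transformer on the committed state:
x_final = f_t2(x_after_t1)
print(x_final)  # 4 -- equivalent to the serial order T1 -> T2
```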
<p>Similarly, if transactions can be represented in this fashion for most practical systems/isolation levels, this raises the question of whether we can also apply ideas from deterministic transaction scheduling (i.e. <a href="https://cs.yale.edu/homes/thomson/publications/calvin-sigmod12.pdf">Calvin</a>-style), since in theory the read/write sets of a transaction are known upfront. In practice, many transactions still determine their full read/write sets dynamically (or via predicates) as they execute, but it may still be useful to model transactions within this formalism, even if it doesn’t fully align with the execution model in practice.</p>
<!-- Also, even in Adya type models, which explicitly model read and write dependencies between transactions, they still don't make explicit this finer-grained notion of update dependencies, since they just determine a "read dependency" as one that may lead to an update based anomaly if the transaction uses that read value in an update expression. For read only transactions, though, we don't need to be worried about these anomalies, since the values read are not being used in update computations. (???) -->
<p>Note that the existing world of stored procedure transactions, and one-shot transaction models like those described in <a href="https://research.google/pubs/spanner-googles-globally-distributed-database-2/">Spanner</a>, share similarities with this state transformer view of transactions. It is also similar to formal reasoning arguments that move all reads to the beginning of a transaction, and all writes to the commit point, to simplify reasoning.
It may be helpful, though, to make this type of model a more fundamental part of the isolation formalism itself.</p>
Sun, 04 May 2025 00:00:00 +0000
https://will62794.github.io/databases/transactions/isolation/2025/05/04/transformers-not-transactions.html
https://will62794.github.io/databases/transactions/isolation/2025/05/04/transformers-not-transactions.htmldatabasestransactionsisolationModern Views of Transaction Isolation<p>There have been many attempts to formalize the <a href="https://jepsen.io/consistency">zoo of various transaction isolation and consistency concepts</a> over the years. It is <a href="https://dl.acm.org/doi/pdf/10.1145/3035918.3056096">not always clear</a>, though, to what extent these attempts have clarified things, especially when each approach has introduced new variations of complexity and formal notation. The rise of distributed storage and database systems and the need to reason about isolation in these contexts has likely <a href="https://www.youtube.com/watch?v=ecZp6cWhDjg">worsened the situation</a>.</p>
<!-- If we want to say anything precise about transaction isolation, we do need some formal model in which to reason about it. -->
<p>There are a host of proposed formalisms that all approach the problem from different angles, with different frameworks, notations, etc. (<a href="http://pmg.csail.mit.edu/papers/adya-phd.pdf">Adya</a>, <a href="https://drops.dagstuhl.de/storage/00lipics/lipics-vol042-concur2015/LIPIcs.CONCUR.2015.58/LIPIcs.CONCUR.2015.58.pdf">Cerone</a>, <a href="https://dl.acm.org/doi/10.1145/3087801.3087802">Crooks</a>). They are all quite dense and differ in nontrivial ways, so it is helpful to try to understand some of the common underlying concepts between them. In particular, there are two “modern” (post-<a href="http://pmg.csail.mit.edu/papers/adya-phd.pdf">Adya 1999</a>) formalisms of isolation, <a href="https://drops.dagstuhl.de/storage/00lipics/lipics-vol042-concur2015/LIPIcs.CONCUR.2015.58/LIPIcs.CONCUR.2015.58.pdf">Cerone 2015</a> and <a href="https://dl.acm.org/doi/10.1145/3087801.3087802">Crooks 2017</a>, which take a similar “read-centric” view of isolation. Their surface details and formalizations appear quite different, but they share many similarities in their core ideas.</p>
<!-- They are also both notably distant from the foundational work of [Adya 1999](https://pmg.csail.mit.edu/papers/adya-phd.pdf). -->
<!-- which are both notably distant from the foundational work of [Adya 1999](https://pmg.csail.mit.edu/papers/adya-phd.pdf) and bear similarities in their core ideas. -->
<h2 id="modern-isolation-formalisms">Modern Isolation Formalisms</h2>
<p>A unifying concept of essentially any transaction isolation formalism is that an isolation definition can be viewed as a <em>condition over a set of committed transactions</em>. That is, given some set of transactions that were committed by a database system, these transactions either satisfy a given isolation level or not, based on the sequence of read and write operations present in each of these transactions.</p>
<div style="text-align: center">
<img src="/assets/diagrams/txn-isolation/txnvis1-CommittedTxns.drawio.png" alt="Transaction Isolation Models" width="550" />
</div>
<p>Note that a core aspect of any formal isolation definition is that it places conditions on <em>how reads observe database state</em>. If we have a set of transactions that only perform writes, we might have some intuitive correctness notion for how a database should execute these transactions, but we can’t make such a definition formal unless there exist some read operations that may observe the effects of other transactions’ writes. So, we can say that, to a first degree, a transaction isolation definition should be about <em>conditions on the set of values that a transaction can read</em>.
<!-- We can keep this "read-centric" view in mind in context of some of the modern formalisms for transaction isolation. -->
<!-- There are two notable, modern models of transaction isolation that try to capture some of this intuition formally: -->
The <a href="https://drops.dagstuhl.de/storage/00lipics/lipics-vol042-concur2015/LIPIcs.CONCUR.2015.58/LIPIcs.CONCUR.2015.58.pdf">2015 model of Cerone et al.</a> and the subsequent <a href="https://dl.acm.org/doi/10.1145/3087801.3087802">Crooks 2017 client-centric model</a> both approach isolation from a similar, “read-centric” view.</p>
<p>Under this “read-centric” view, when we think about how to define an isolation level, we should first be concerned with how we define what values a transaction can read. If our isolation level makes no restrictions on this, then a transaction can read any value (which is, in practice, how you might define a level like <em>read uncommitted</em>). More sensibly, we would expect a transaction to read states that are <em>reasonable</em>, in some sense. More concretely, we should expect transactions to actually read values written by other transactions. This could be a starting definition for isolation (similar to <em>read committed</em>), and one step up in strength from allowing transactions to read any possible value.</p>
<p>There are some other reasonable constraints, though. Basically, we likely expect that the possible states we read from came about through some “reasonable” execution of the transactions we gave to the database. One “reasonable” type of execution would be to execute these transactions in some sequential order. This is, for example, what we would expect out of a database system if we gave it a series of transactions one by one, with no concurrent overlap between transactions (e.g. the classic notion of serializability). The Cerone and Crooks models both allow for a more precise formalization of these ideas.</p>
<h3 id="cerone-2015">Cerone 2015</h3>
<p>The Cerone paper, <a href="https://drops.dagstuhl.de/storage/00lipics/lipics-vol042-concur2015/LIPIcs.CONCUR.2015.58/LIPIcs.CONCUR.2015.58.pdf"><em>A Framework for Transactional Consistency Models with Atomic Visibility</em></a>, starts with a core simplifying assumption of <em>atomic visibility</em>, which is that either all or none of the operations of a transaction can become visible to other transactions. This means that their model cannot represent isolation levels like <em>Read Committed</em>, which is weaker than <em>Read Atomic</em>, the weakest level their model can express.</p>
<p>Their model encodes the intuitive idea of “read-centric” isolation by first defining a <em>visibility</em> relation between transactions, i.e. a way of defining which transactions are visible to other transactions. That is, if a transaction reads a key, which other transactions’ writes should it observe? The model defines this in terms of <em>abstract executions</em>, where an abstract execution consists of a set of committed transactions (called a <em>history</em> \(\mathcal{H}\)) along with two relations over this set:</p>
<ul>
<li><strong>Visibility</strong> (\(\mathsf{VIS} \subseteq \mathcal{H} \times \mathcal{H}\)): acyclic relation where \(T \overset{\mathsf{VIS}}{\rightarrow} S\) means that \(S\) is aware of \(T\).</li>
<li><strong>Arbitration</strong> (\(\mathsf{AR} \subseteq \mathcal{H} \times \mathcal{H}\)): total order such that \(\mathsf{AR} \supseteq \mathsf{VIS}\), where \(T \overset{\mathsf{AR}}{\rightarrow} S\) means that the writes of \(S\) supersede those written by \(T\) (essentially, it only meaningfully orders writes by concurrent transactions).</li>
</ul>
<p>Basically, \(\mathsf{VIS}\) is a partial ordering of transactions in a history, and \(\mathsf{AR}\) is a total order on transactions that is a superset of \(\mathsf{VIS}\) (i.e. any edge in \(\mathsf{VIS}\) is also by default an edge in \(\mathsf{AR}\)). Note that \(\mathsf{AR}\) is a total order, so every two transactions are comparable by this ordering even if, in some cases (as discussed below), this ordering is not relevant, and could be omitted.</p>
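<p>These two structural conditions can be checked mechanically. The sketch below (a hypothetical encoding of mine, not from the paper) represents \(\mathsf{VIS}\) and \(\mathsf{AR}\) as sets of transaction pairs and verifies that \(\mathsf{AR}\) is a strict total order containing \(\mathsf{VIS}\):</p>

```python
from itertools import combinations

def valid_abstract_execution(txns, vis, ar):
    """Check the structural conditions on an abstract execution:
    AR is a strict total order on the transactions, and AR contains
    VIS (which also makes VIS acyclic, since it sits inside a total
    order)."""
    ar, vis = set(ar), set(vis)
    # Totality/antisymmetry: each pair is ordered in exactly one direction.
    for a, b in combinations(txns, 2):
        if ((a, b) in ar) == ((b, a) in ar):
            return False
    # Transitivity of AR.
    for (a, b) in ar:
        for (c, d) in ar:
            if b == c and (a, d) not in ar:
                return False
    # Every visibility edge must also be an arbitration edge.
    return vis <= ar

txns = ["T1", "T2", "T3"]
ar = [("T1", "T2"), ("T2", "T3"), ("T1", "T3")]   # total order: T1 < T2 < T3
vis = [("T1", "T2"), ("T2", "T3")]                # partial order inside AR
print(valid_abstract_execution(txns, vis, ar))    # True
```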
<figure style="text-align: center" id="fig1">
<div style="text-align: center;padding:30px;">
<img src="/assets/diagrams/txn-isolation/txnvis1-SampleHistory.drawio.png" alt="Transaction Isolation Models" width="450" />
</div>
<figcaption>Figure 1: A history \(\mathcal{H}\) of committed transactions with a possible visibility and arbitration relation. Note that \(\mathsf{AR}\) is a total order, so we can visualize this by ordering all transactions in some linear (left-to-right) order.</figcaption>
</figure>
<p>A consistency model (e.g. isolation level) is then defined as a set of <em>consistency axioms</em> constraining executions, where a consistency model allows histories for which there exists an abstract execution satisfying the axioms. In other words, given a set of transactions that executed against the database, they satisfy a consistency/isolation level if there exists an abstract execution that obeys the axioms of that consistency/isolation level, meaning that there exists a \(\mathsf{VIS}\) and \(\mathsf{AR}\) relation over this set that satisfies the axioms.</p>
<p>The weakest isolation level defined in the Cerone model, <em>Read Atomic</em>, imposes only two conditions: <em>internal</em> and <em>external</em> consistency, which are defined intuitively as:</p>
<ul>
<li>\(I\small{NT}\) (internal consistency): a read from an object returns the same value as the last write to or read from that object within the transaction.</li>
<li>\(E\small{XT}\) (external consistency): the value returned by an external read of an object \(x\) in \(T\) is determined by the transactions \(\mathsf{VIS}\)-preceding \(T\) that write to \(x\). If there are no such transactions, then \(T\) reads the initial value 0. Otherwise, it reads the final value written by the last such transaction in \(\mathsf{AR}\).</li>
</ul>
<p>Internal consistency is a bit tedious from a formal perspective and is the less interesting condition, stating essentially that you read your own writes within a transaction. External consistency is the more important condition and depends on the visibility relation, stating that a transaction will read the value written by the latest transaction preceding it in the visibility relation, with conflicts decided by the arbitration relation. Note that if the read and write sets of two transactions are disjoint, then the visibility relation is essentially irrelevant for them, so it doesn’t matter whether such an edge is or isn’t included in \(\mathsf{VIS}\).</p>
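<p>As a rough illustration of the \(E\small{XT}\) condition, the sketch below (my own encoding, with hypothetical names) computes the value an external read must return: the final write of the \(\mathsf{AR}\)-latest visible writer, or the initial value 0 if no visible transaction wrote the key:</p>

```python
def ext_read_value(t, key, vis, ar_order, writes, init=0):
    """The value an external read of `key` in transaction `t` must return
    under EXT: the final value written by the AR-latest VIS-predecessor
    of `t` that writes to `key`, or `init` if there is no such transaction.
    vis: set of (T, S) pairs (T visible to S); ar_order: all transactions
    listed in arbitration order; writes: {txn: {key: final value}}."""
    visible_writers = [s for s in ar_order
                       if (s, t) in vis and key in writes.get(s, {})]
    if not visible_writers:
        return init
    return writes[visible_writers[-1]][key]      # AR-latest writer wins

writes = {"T1": {"x": 1}, "T2": {"x": 2}}
vis = {("T1", "T3"), ("T2", "T3")}               # T3 sees both writers
print(ext_read_value("T3", "x", vis, ["T1", "T2", "T3"], writes))  # 2
```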
<p>So, at this weakest defined isolation level, <em>Read Atomic</em>, we can think about a whole batch of committed transactions, where the only restriction that we place on their reads is that they observe the effects of some other transaction(s) in this set, determined by each transaction’s incoming visibility (\(\mathsf{VIS}\)) edges (e.g. as illustrated in <a href="#fig1">Figure 1</a>). If multiple transactions among the incoming visibility edges wrote to conflicting key sets, then \(\mathsf{AR}\) exists to arbitrate between them, determining which write is observed. Note also that \(\mathsf{AR}\) is a total order, so we can think about (and visualize) it as a global, linear ordering of all transactions, as illustrated by the left-to-right ordering in <a href="#fig1">Figure 1</a>. In some cases this total ordering is not relevant to the semantics of transactions, but we can imagine that it always exists in the background. Also, note that \(\mathsf{AR}\) is a superset of \(\mathsf{VIS}\), which means you can’t have a visibility edge that goes “backwards” in this arbitration total order.</p>
<p>The underlying model requires that the visibility relation is acyclic, but without any other restrictions there are still some unintuitive semantics allowed at this weakest definition, with <em>causality violations</em> as the notable example. Basically, the visibility relation is not, by default, required to be transitive at <em>Read Atomic</em>, so a transaction can observe the effects of some other transaction that itself observed an “earlier” transaction, while not observing the effects of that “earlier” transaction, as shown by the example below with 3 transactions (i.e. \(T_3\) observes the effect of \(T_2\) via \(y\), and \(T_2\) observes the effect of \(T_1\) via \(x\), but \(T_3\) does not observe the effect of \(T_1\)’s write to \(x\)).</p>
<div style="text-align: center">
<img src="/assets/diagrams/txn-isolation/txnvis1-CausalViolation.drawio.png" alt="Transaction Isolation Models" width="420" />
</div>
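<p>Such missing transitive visibility edges can be enumerated directly. This small sketch (my own encoding, not from the paper) flags every pair of transactions whose omission from \(\mathsf{VIS}\) constitutes a causality violation of this kind:</p>

```python
def causal_violations(vis):
    """Missing transitive visibility edges: pairs (A, C) where A -> B -> C
    exists in VIS but A -> C does not. A transitivity requirement on VIS
    (as in Causal Consistency) forbids any such pair."""
    vis = set(vis)
    return {(a, d) for (a, b) in vis for (c, d) in vis
            if b == c and a != d and (a, d) not in vis}

# The 3-transaction example above: T2 sees T1 and T3 sees T2,
# but T3 misses T1's write.
print(causal_violations({("T1", "T2"), ("T2", "T3")}))   # {('T1', 'T3')}
```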
<p>Moving up the isolation hierarchy from <em>Read Atomic</em> in Cerone’s model, we start strengthening requirements on what the reads in transactions can observe. In their framework, this starts by first adding a transitivity condition on visibility (\(T{\small{RANS}}V{\small{IS}}\)), to get <em>Causal Consistency</em>. This is then extended to <em>Parallel Snapshot Isolation (PSI)</em> and <em>Prefix Consistency</em>, two levels that are not strictly comparable to each other in the hierarchy (see <a href="#fig2">Figure 2</a>).</p>
<figure style="text-align: center" id="fig2">
<div style="text-align: center">
<img src="/assets/fig1-framework-atomic-viz.png" alt="Transaction Isolation Models" width="790" />
</div>
<figcaption>Figure 2: Consistency models in the Cerone framework.</figcaption>
</figure>
<p>Note that the \(N{\small{O}}C{\small{ONFLICT}}\) condition enforced at PSI is the first condition that is not related to the <em>values observed by reads</em>. Rather, it places conditions on valid cases of conflicting writes between transactions.</p>
<p>Similarly, there is a notable transition from PSI to Prefix Consistency in this hierarchy, corresponding to a switch from <em>partial</em> to <em>total</em> ordering requirements on the visibility relation. Basically, the \(P\small{REFIX}\) condition requires that if \(T\) observes \(S\), then it also observes all \(\mathsf{AR}\) predecessors of \(S\). In the example below, which illustrates the <em>long fork</em> anomaly of PSI, transactions \(T_3\) and \(T_4\) can be understood as observing the effects of \(T_1\) and \(T_2\) in “different orders” i.e. for \(T_3\) it appears as if \(T_1 \rightarrow T_2\), but for \(T_4\) it appears as if \(T_2 \rightarrow T_1\).</p>
<figure style="text-align: center">
<img src="/assets/diagrams/txn-isolation/txnvis1-LongFork.drawio.png" alt="Transaction Isolation Models" width="580" style="display: block; margin-left: auto; margin-right: auto;" />
<figcaption style="text-align: center;width:730px;margin:auto;margin-top:10px;">Case of long fork anomaly allowed under Parallel Snapshot Isolation. Omission of the dotted visibility edge \(\mathsf{VIS}_{\small{PREFIX}}\) enables this anomaly, but its existence is forced under the \(P\small{REFIX}\) condition (e.g. at full snapshot isolation).</figcaption>
</figure>
<p>Under the \(P\small{REFIX}\) condition, the arbitration ordering of \(\mathsf{AR}\) between \(T_1\) and \(T_2\) comes into play, effectively enforcing a fixed order on how \(T_1\) and \(T_2\) are observed by other transactions. That is, in the above example, if \(T_3\) observes \(T_1\), then by \(P\small{REFIX}\) it must also observe \(T_1\)’s \(\mathsf{AR}\) predecessor \(T_2\). Similarly, \(T_4\) is then only required to observe \(T_2\), conforming to the \(T_2 \rightarrow T_1\) ordering enforced by \(\mathsf{AR}\). Since \(\mathsf{AR}\) is a total order, this condition is basically saying that if you observe any transaction from some set, then you are forced to observe all of its \(\mathsf{AR}\) predecessors in that set, in the fixed order decided by arbitration. So, this effectively forces visibility to be totally ordered for concurrent transactions.</p>
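<p>A rough check of the \(P\small{REFIX}\) condition can be sketched as follows (my own hypothetical encoding, with \(\mathsf{AR}\) given as a list of transactions in arbitration order), using the long-fork shape from the figure above:</p>

```python
def satisfies_prefix(vis, ar_order):
    """PREFIX: if T observes S, then T also observes every AR-predecessor
    of S, so concurrent writers are seen by everyone in one fixed order."""
    vis = set(vis)
    pos = {t: i for i, t in enumerate(ar_order)}
    for (s, t) in vis:
        for p in ar_order[:pos[s]]:          # all AR-predecessors of s
            if p != t and (p, t) not in vis:
                return False
    return True

# Long-fork shape: T3 sees T1 but misses T1's AR-predecessor T2.
ar_order = ["T2", "T1", "T3", "T4"]
vis = {("T1", "T3"), ("T2", "T4")}
print(satisfies_prefix(vis, ar_order))                   # False
print(satisfies_prefix(vis | {("T2", "T3")}, ar_order))  # True
```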
<p>If we move all the way to serializability, the conditions are strengthened to \(T{\small{OTAL}}V{\small{IS}}\), requiring simply that \(\mathsf{VIS}\) is a total order (along with the \(I\small{NT}\) and \(E\small{XT}\) conditions).
<a href="https://arxiv.org/abs/2405.18393">A Critique of Snapshot Isolation</a>, though, offers another approach to formalizing serializability: we instead alter snapshot isolation to prevent <em>read-write</em> conflicts instead of <em>write-write</em> conflicts i.e. if a transaction’s read set is written to by a concurrent transaction, then we must abort it. This is an alternative way to formalize serializability that mirrors more closely the \(N{\small{O}}C{\small{ONFLICT}}\) strengthening added for snapshot isolation levels.</p>
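<p>The contrast between the two conflict rules can be sketched over read/write key sets (my own illustration; the dictionary encoding is an assumption), using the classic write-skew shape where the write-write check of snapshot isolation passes but the read-write check does not:</p>

```python
def ww_conflict(t1, t2):
    """Snapshot isolation's write-write rule: concurrent transactions
    must not write a common key (otherwise one of them aborts)."""
    return bool(t1["writes"] & t2["writes"])

def rw_conflict(t1, t2):
    """The serializable variant: abort if either concurrent transaction
    writes a key that the other one reads."""
    return bool((t1["reads"] & t2["writes"]) or (t2["reads"] & t1["writes"]))

# Classic write skew: disjoint write sets but crossing read-write edges.
t1 = {"reads": {"y"}, "writes": {"x"}}
t2 = {"reads": {"x"}, "writes": {"y"}}
print(ww_conflict(t1, t2))   # False: both commit under snapshot isolation
print(rw_conflict(t1, t2))   # True: one must abort for serializability
```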
<p>Note that the <em>Read Atomic</em> isolation model (the weakest expressed in the Cerone formalism) can be viewed as an interesting “boundary” in isolation strength since, for something weaker like <em>Read Committed</em>, you only need to ensure that reads within a transaction of a key \(k\) read the value written by <em>some</em> other transaction to key \(k\). At such a weak level, there is no restriction on reading from a “consistent” state across keys. So, the weakest interpretation of read committed might be simply that any read can read any value that was written to that key at some point by any transaction in the history. This may not even impose any notion of ordering on transactions, since you really only care about your consistency guarantees at the level of a single key.</p>
<p>The <em>Read Atomic</em> model was first discussed in <a href="http://www.bailis.org/papers/ramp-sigmod2014.pdf">Bailis’ 2014 paper</a> on RAMP transactions. Note that <em>Read Atomic</em> can be viewed as similar to Snapshot Isolation but with an allowance for concurrent updates (i.e. it does not prevent write-write conflicts). This was also preceded by their earlier proposal of <a href="https://www.vldb.org/pvldb/vol7/p181-bailis.pdf"><em>Monotonic Atomic View</em></a> (MAV) isolation, which is strictly weaker than <em>Read Atomic</em>. Essentially, MAV ensures that once you observe any effect of a transaction you subsequently observe all of its effects, but it doesn’t require that reads all come from the same, fixed database snapshot (i.e. it is stronger than <em>Read Committed</em> but weaker than <em>Read Atomic</em>).</p>
<h3 id="crooks-2017">Crooks 2017</h3>
<p>While the Cerone 2015 formalization starts with the visibility and arbitration ordering concepts, the Crooks formalism, presented in <a href="https://dl.acm.org/doi/10.1145/3087801.3087802"><em>Seeing is Believing: A Client-Centric Specification of Database Isolation</em></a>, takes a different starting point, though there are underlying similarities. Crooks similarly defines isolation over a set of committed transactions, but formalizes its definitions in terms of <em>executions</em>, which are simply totally ordered sequences of these transactions.</p>
<figure style="text-align: center">
<div style="text-align: center;padding:30px;">
<img src="/assets/diagrams/txn-isolation/txnvis1-ReadStates.drawio.png" alt="Transaction Isolation Models" width="550" />
</div>
<figcaption>An <i>execution</i> of transactions with associated read states \(s_i\) in Crooks model.</figcaption>
</figure>
<p>The basic idea of Crooks’ formalism is centered on a <em>state-based</em> or <em>client-centric</em> view of isolation. That is, the values observed by any transaction are determined by <em>read states</em>, the states that the database passed through as it executed the transactions according to the chosen execution ordering. In a sense, this is similar to the notion of serializability as classically defined i.e. the values observed by each transaction being consistent with <em>some</em> sequential execution ordering that could have occurred.</p>
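<p>A minimal sketch of this read-state idea (my own encoding, not from the paper): given an execution order, the candidate read states are simply the database states produced after each transaction commits:</p>

```python
def read_states(execution, init=0, keys=("x", "y")):
    """The sequence of complete database states produced by applying an
    execution (a total order of transactions) one transaction at a time.
    In Crooks' model, reads must be explainable by some such state."""
    state = {k: init for k in keys}
    states = [dict(state)]
    for txn in execution:
        for op, key, val in txn:
            if op == "w":
                state[key] = val
        states.append(dict(state))       # state after this commit
    return states

t1 = [("w", "x", 1)]
t2 = [("w", "x", 2), ("w", "y", 2)]
for s in read_states([t1, t2]):
    print(s)   # prints the three states the database passes through
```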
<p>This is ultimately quite similar to the Cerone view, since the visibility relation (\(\mathsf{VIS}\)) serves a similar purpose i.e. by picking out which transactions’ writes are visible to you. Cerone doesn’t formulate this in terms of “read states” as Crooks does, but essentially the same idea is present i.e. the “read state” in the Cerone model is created by applying the writes of your \(\mathsf{VIS}\)-preceding transactions.</p>
<p>Crooks’ model also has a technical difference from the Cerone model, in that it allows expression of weaker models like <em>Read Committed</em>, since it does not make the <em>atomic visibility</em> assumption that Cerone does. It does this by allowing each read operation of a transaction to potentially read from a <em>different</em> read state, allowing for expression of the fractured reads anomaly that the Cerone model cannot represent.</p>
<figure style="text-align: center" id="fig3">
<div style="text-align: center;padding:5px;">
<img src="/assets/crooks-commit-tests.png" alt="Transaction Isolation Models" width="630" />
</div>
<figcaption>Figure 3: Execution of transactions with associated read states in Crooks model.</figcaption>
</figure>
<p>Crooks is also naturally able to represent <em>Read Atomic</em>, though. The formal definition (as shown in <a href="#fig3">Figure 3</a>) is somewhat dense (note that \(sf_o\) represents the first read state for an operation \(o\)), but intuitively it says that if an operation \(o\) observes the writes of a transaction \(T_i\), all subsequent operations reading a key in \(T_i\)’s write set must read from a state that includes \(T_i\)’s effects.</p>
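<p>As a rough sketch of the fractured-reads intuition (my own simplified encoding, which checks a reader against a single writer transaction and ignores overwrites by later transactions):</p>

```python
def has_fractured_read(reader_ops, writer_writes):
    """Detect a fractured read against one writer: once the reader has
    observed any of the writer's (atomic) writes, a later read of another
    key in the writer's write set must not return a stale pre-writer value.
    reader_ops: ordered (key, value) reads; writer_writes: {key: value}."""
    seen_writer = False
    for key, val in reader_ops:
        if key in writer_writes:
            if val == writer_writes[key]:
                seen_writer = True
            elif seen_writer:
                return True              # saw the writer, then missed it
    return False

writer = {"x": 1, "y": 1}                # written atomically by one txn
print(has_fractured_read([("x", 1), ("y", 0)], writer))  # True: fractured
print(has_fractured_read([("x", 1), ("y", 1)], writer))  # False
```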
<p>While the Crooks and Cerone models differ on the surface and in their formal details, they can be viewed as quite similar in their core ideas, which are about first establishing what possible values a transaction can read. We can roughly map Cerone’s model to Crooks’ model as well. We can consider the \(\mathsf{AR}\) total order of Cerone as analogous to the “execution order” used in the Crooks model, which is also a total order of transactions. The \(\mathsf{VIS}\) relation of Cerone is then akin to the selection of read states in Crooks’ model. That is, each transaction in the chosen total order picks out some transactions that are visible to it, and reads values accordingly. In Crooks’ model, based on the read state you pick, the transactions visible to you (as in Cerone) would be determined by the transactions preceding that read state.</p>
<p><em>Published Mon, 17 Mar 2025. <a href="https://will62794.github.io/formal-methods/specification/2025/03/17/transaction-isolation-models.html">Permalink</a>.</em></p>
<h2 id="interactive-formal-specifications">Interactive Formal Specifications</h2>
<p>Formal specifications <a href="https://github.com/elastic/elasticsearch-formal-models">have</a> <a href="https://www.datadoghq.com/blog/engineering/formal-modeling-and-simulation/">become</a> <a href="https://www.amazon.science/publications/how-amazon-web-services-uses-formal-methods">a core part</a> of rigorous distributed systems design and verification, but existing tools have been lacking in providing good interfaces for interacting with, exploring, visualizing, and sharing these specifications and models in a portable and effective manner.</p>
<div style="text-align: center;">
<img src="/assets/interactive-formal-specs/ReconfigBroken2-Screenshot 2025-11-21 at 10.28.23 AM.png" alt="TLA+ Web Explorer Visualization" style="width: 68%; height: auto; border:solid 1px; border-radius: 8px;" width="95%" />
<div style="font-size: 0.85em; color: #555; margin-top: 6px;">
<span>Visualization of dynamic reconfiguration behavior in Raft.</span>
</div>
</div>
<p><a href="https://github.com/will62794/tla-web">Spectacle</a> aims to address this shortcoming by providing a browser-based tool for exploring and visualizing formal specifications written in <a href="https://lamport.azurewebsites.net/tla/tla.html">the TLA+ specification language</a>. It takes inspiration from past attempts at building similar tools, like Diego Ongaro’s <a href="https://www.usenix.org/system/files/login/articles/login_fall16_06_ongaro.pdf">Runway</a>, but it builds on top of TLA+, taking advantage of an existing, well-defined formal specification language, rather than trying to build a new language alongside the tool.</p>
<h3 id="the-javascript-interpreter">The JavaScript Interpreter</h3>
<p>At the core of the tool is a native JavaScript interpreter for TLA+. <a href="https://github.com/tlaplus/tlaplus">TLC</a> is the primary existing interpreter and model checker for TLA+ specifications, and it is mature, well-maintained, and has been optimized for performance over many years. It is, however, a somewhat complex and intricate codebase, written in Java, so it was not a great candidate for integration into a browser-based tool that would allow for dynamic interaction with specifications.</p>
<p>One could build a type of language server into TLC that allows for remote interaction, but this seemed likely to provide a less than ideal interactive experience, and would require an external server to be maintained and available whenever the tool is used. The now defunct <a href="https://github.com/Z3Prover/z3/discussions/5473">Rise4Fun</a> site from Microsoft Research illustrates the pitfalls of relying on a remote service for running these types of tools.</p>
<p>The development of a JavaScript interpreter for TLA+ was enabled by earlier work on <a href="https://ahelwer.ca/post/2023-01-11-tree-sitter-tlaplus/">building a tree-sitter parser for TLA+</a>, which can be compiled to <a href="https://webassembly.org/">WebAssembly</a> and run in the browser. Parsing TLA+ itself is a non-trivial task, so the development of this browser-based parser was a big step forward in enabling the interpreter. The interpreter itself is written entirely in vanilla JavaScript, and <a href="https://github.com/will62794/spectacle/blob/master/js/eval.js">currently consists of</a> around 5000 lines of code. The goal is for the interpreter semantics to conform as closely as possible to TLC semantics, which we try to achieve via a conformance testing approach that compares the results of the JavaScript interpreter and TLC <a href="https://will62794.github.io/spectacle/test.html">on a large corpus of TLA+ specifications</a>.</p>
<p>A benefit of this interpreter implementation is its ability to dynamically evaluate TLA+ specifications and expressions in the browser. For example, the demo below shows the dynamic evaluation of initial states for a single variable declaration (e.g. <code class="language-plaintext highlighter-rouge">VARIABLE x</code>):</p>
<div style="
display: flex;
flex-direction: row;
width: 98%;
margin: 0 auto 24px auto;
gap: 0;
min-height: 180px;
border-radius: 12px;
box-shadow: 0 1px 7px rgba(0,0,0,0.04);
background: #f5f6fa;">
<!-- Left: Input -->
<div style="
flex: 1 1 0;
border-right: 1px solid #e0e0e0;
padding: 26px 20px 26px 20px;
display: flex;
flex-direction: column;
justify-content: flex-start;">
<div style="font-size: 0.96em; font-weight: 500; margin-bottom: 8px; color: #444;">Initial state expression</div>
<textarea id="tla-repl-input" placeholder="Enter state expression (e.g. x \in {1,2,3})" style="
font-size: 14px;
padding: 12px;
width: 100%;
resize: vertical;
min-height: 90px;
max-height: 250px;
font-family: 'Fira Mono', 'Consolas', 'Menlo', monospace;
border-radius: 8px;
border: 1px solid #cacaca;
background: #fff;
box-sizing: border-box;"></textarea>
</div>
<!-- Right: Output -->
<div style="
flex: 1 1 0;
padding: 26px 20px 26px 20px;
display: flex;
flex-direction: column;
justify-content: flex-start;">
<div style="font-size: 0.96em; font-weight: 500; margin-bottom: 8px; color: #444;">Initial states generated</div>
<div id="tla-init-states" style="
font-size: 14px;
font-family: 'Fira Mono', 'Consolas', 'Menlo', monospace;
border: solid 1px #ccc;
padding: 15px 14px;
border-radius: 8px;
min-height: 70px;
background: #fff;"></div>
</div>
</div>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/lodash.min.js"></script>
<script src="/assets/tla-web-embed/js/hash-sum/hash-sum.js"></script>
<script src="/assets/tla-web-embed/js/eval.js"></script>
<script>LANGUAGE_BASE_URL = "js";</script>
<script src="/assets/tla-web-embed/js/tree-sitter.js"></script>
<script>
//
// Main script that sets up and runs the interpreter.
//
let tree;
let parser;
let languageName = "tlaplus";
let enableEvalTracing = false;
/**
* Main UI initialization logic.
*/
async function init() {
await TreeSitter.init();
parser = new TreeSitter();
let tree = null;
var ASSIGN_PRIMED = false;
// Load the tree-sitter TLA+ parser.
let language;
const url = `/assets/tla-web-embed/${LANGUAGE_BASE_URL}/tree-sitter-${languageName}.wasm`;
try {
language = await TreeSitter.Language.load(url);
} catch (e) {
console.error(e);
return;
}
tree = null;
parser.setLanguage(language);
// Define a very simple spec inline.
// This can also be fetched from a remote URL.
let inputExpr = document.getElementById('tla-repl-input');
inputExpr.addEventListener('input', function() {
// Re-parse and evaluate spec with new input value
let specText = `
---- MODULE test ----
VARIABLE x
Init == ${inputExpr.value}
Next == x' = x
====`;
document.getElementById('tla-init-states').innerHTML = "";
// Parse the spec
let spec = new TLASpec(specText, "");
spec.parse().then(function () {
// Initialize the interpreter after parsing the spec
let interp = new TlaInterpreter();
// Generate initial states
let initStates = interp.computeInitStates(spec.spec_obj, {}, false);
console.log("Init states:", initStates);
document.getElementById('tla-init-states').innerHTML = "";
initStates.forEach(state => {
document.getElementById('tla-init-states').innerHTML += state.toString().substring(2) + "<br>\n";
});
});
});
}
// Initialize things.
init();
</script>
<h3 id="interactive-trace-exploration">Interactive Trace Exploration</h3>
<p>A core feature of the tool is the ability to load a TLA+ specification and <em>interactively</em> explore its behaviors. It provides the capability for a user to, from any current state, select an enabled action to transition to a next state, and also allows for back-tracking in the current trace. It also supports <em>trace expressions</em>: arbitrary TLA+ expressions that are evaluated at each state of the current trace.</p>
<p>For example, below shows a partial trace of the <a href="https://github.com/will62794/tla-web/blob/07c093c27a0886c70cbbf1ab1c1b7188caf4ca3d/specs/TwoPhase.tla">two-phase commit protocol specification</a> in the tool:</p>
<div style="text-align: center;">
<img src="/assets/interactive-formal-specs/2pc-partial-Screenshot 2025-11-21 at 10.16.00 AM.png" alt="TLA+ Web Explorer Visualization" style="width:90%; height:auto; border:solid 1px; border-radius: 8px;" width="95%" />
</div>
<p>The tool also provides the ability to easily share traces via static links, which can be reloaded in a new browser window while retaining the generated trace and its existing parameters/settings. This provides a universal, portable way to share system traces, something that was quite awkward with existing tools. For example, here is a link showing two-phase commit <a href="https://will62794.github.io/spectacle/#!/home?specpath=.%2Fspecs%2FTwoPhase.tla&initPred=Init&nextPred=Next&constants%5BRM%5D=%7Brm1%2Crm2%2Crm3%7D&trace=736d33ec%2Cb018b670_0890eb82%2C51a8a4d6_1b98ff2e%2C51194ae5_567308aa%2C5eb7f37c_0890eb82%2C2350e907_1b98ff2e%2C5e3c1474_567308aa%2C05ef0b8c%2C4300e35b_0890eb82%2Cf5692076_1b98ff2e%2C0029e85c_567308aa">driving all the way to commit</a>, and another link showing it <a href="https://will62794.github.io/spectacle/#!/home?specpath=.%2Fspecs%2FTwoPhase.tla&initPred=Init&nextPred=Next&constants%5BRM%5D=%7Brm1%2Crm2%2Crm3%7D&trace=736d33ec%2C5dbafbf8_0890eb82%2C2ba67f94_1b98ff2e%2C25f7a42e_567308aa%2Cb9c48c8a">driving through to abort</a>. It is also easy to link to system traces/counterexamples that illustrate interesting behaviors and/or edge cases of different protocols e.g. here is a <a href="https://will62794.github.io/spectacle/#!/home?specpath=.%2Fspecs%2FAbstractRaft.tla&constants%5BServer%5D=%7Bs1%2Cs2%2C%20s3%7D&constants%5BSecondary%5D=%22Secondary%22&constants%5BPrimary%5D=%22Primary%22&constants%5BNil%5D=%22Nil%22&constants%5BInitTerm%5D=0&initPred=Init&nextPred=Next&trace=318c702a%2C0785f33f_f64845d8%2Cbbf1576c_f013900a%2C79ad3285_f013900a%2C708acdc2_35419b82%2C3ef71dbf_26e3c58d%2C3c79f059_35419b82%2C9f67014c_f64845d8">link</a> to a case of a Raft leader being elected, writing a log entry and then committing it across all nodes.</p>
<p>In addition to the trace exploration and expression features, the tool also provides a basic REPL interface, which allows arbitrary expressions to be evaluated in the context of the currently loaded specification. This feature mostly subsumes <a href="https://github.com/will62794/tlaplus_repl">previous attempts</a> at providing a REPL-like interface for TLA+ specifications.</p>
<h3 id="visualization">Visualization</h3>
<p>The above features are effective for exploring and understanding a specification, but in some cases it can be helpful to have a more polished and visual way to understand a system and its states/behaviors. Currently, the tool provides a very simple, SVG-based DSL for defining visualizations directly in a TLA+ specification itself, rather than requiring a separate interface/language for defining visualizations.</p>
<p>For example, here is a <a href="https://will62794.github.io/spectacle/#!/home?specpath=.%2Fspecs%2FCabbageGoatWolf.tla&initPred=Init&nextPred=Next&trace=f3cb45ca%2C4357915f_7da698e2%2C126ae834_bf3b326e%2C76c2f092_652fccef%2C7229f089_f598e730%2C29e91cea_2ac3323e%2C50fe2821_bf3b326e%2C1d26e01c_9abe74ba%2C5f98d202_f598e730%2C3a9fa186_34b35f78%2Ca49994fc_bf3b326e%2Ceec0674a_652fccef%2C2afe63ed_f598e730%2C2883b61a_7da698e2%2C73ea1058_bf3b326e">simple visualization</a> of the final state of the <a href="https://en.wikipedia.org/wiki/Wolf,_goat_and_cabbage_problem">Wolf, Goat, and Cabbage</a> puzzle solution:</p>
<div style="text-align: center;">
<img src="/assets/interactive-formal-specs/CabbageGoatSolution-Screenshot 2025-11-21 at 10.20.08 AM.png" alt="TLA+ Web Explorer Visualization" style="width: 95%; height: auto; border:solid 1px; border-radius: 8px;" width="95%" />
</div>
<p>and here is <a href="https://will62794.github.io/spectacle/#!/home?specpath=.%2Fspecs%2FAbstractRaft.tla&constants%5BServer%5D=%7Bs1%2Cs2%2C%20s3%7D&constants%5BSecondary%5D=%22Secondary%22&constants%5BPrimary%5D=%22Primary%22&constants%5BNil%5D=%22Nil%22&constants%5BInitTerm%5D=0&initPred=Init&nextPred=Next&trace=318c702a%2C0785f33f_f64845d8%2Cbbf1576c_f013900a%2C2bf180d4_5a6a532a%2C4a68b9f3_1fec2cce%2C0aa370b2_4e287604%2C6ac87ace_4e287604">a visualization</a> of an abstract Raft specification with an elected leader and some log entries replicated across nodes:</p>
<div style="text-align: center;">
<img src="/assets/interactive-formal-specs/RaftLogs-Screenshot 2025-11-21 at 10.21.29 AM.png" alt="TLA+ Web Explorer Visualization" style="width: 95%; height: auto; border:solid 1px; border-radius: 8px;" width="95%" />
</div>
<p>The visualization DSL can currently be defined directly in the TLA+ specification itself, as seen <a href="https://github.com/will62794/tla-web/blob/07c093c27a0886c70cbbf1ab1c1b7188caf4ca3d/specs/CabbageGoatWolf.tla#L74-L107">here</a>, and provides a set of basic SVG primitives that can be <a href="https://github.com/will62794/tla-web/blob/07c093c27a0886c70cbbf1ab1c1b7188caf4ca3d/specs/CabbageGoatWolf.tla#L109-L156">arranged and positioned in hierarchical groups</a>, following standard SVG conventions. In the future, these visualization primitives could be expanded with a variety of richer structures (e.g. graphs, lists, charts, <a href="https://andrewcmyers.github.io/constrain/">constraint-based approaches</a>, etc.), but for now even this simple set of primitives allows for a variety of helpful system visualizations.</p>
<h3 id="conclusion">Conclusion</h3>
<p>Overall, the vision is for Spectacle to be complementary to existing TLA+ tooling. For example, it is expected that TLC will remain the primary tool for model checking of non-trivial TLA+ specifications, since it is still the most performant tool for doing so. The goal is for Spectacle to be a tool for prototyping, exploring, and understanding specs, and sharing the results of these explorations in a convenient and portable manner, aspects that few existing tools in the ecosystem excel at.</p>
Thu, 12 Dec 2024 00:00:00 +0000
https://will62794.github.io/formal-methods/specification/2024/12/12/interactive-formal-specs.html
https://will62794.github.io/formal-methods/specification/2024/12/12/interactive-formal-specs.htmlformal-methodsspecificationDecomposing Protocols with Interaction Graphs<!-- When verifying complex protocols, it is often useful to break them up into smaller components, verify each component separately, and then compose the results to verify the overall protocol. Ideally we would like to be able to break down a protocol into as small components as possible, verify each component separately, and then compose the results to verify the overall protocol. There are a few approaches to doing this for protocols, which we can do by analyzing the interactions between components, and abstracting sub-components based on these interactions. -->
<!-- ## Protocol Decomposition via Interaction Graphs -->
<p>Concurrent and distributed protocols can be formally viewed as a set of logical <em>actions</em>, each of which symbolically describes the allowed state transitions of the system. We can analyze the structure of a protocol’s actions to understand the interactions between them, and to reason about a protocol’s underlying compositional structure.
<!-- e.g. for improving verification efficiency if possible. --></p>
<p>One approach to decomposing a protocol into subcomponents is to break up its actions into disjoint subsets, and view each disjoint subset of actions as a separate logical component. This is a useful starting point for decomposition of protocols since actions represent the atomic units of concurrent behavior within a protocol specification. We can also use this basic type of decomposition to define various formal notions of <em>interaction</em> between individual actions or subcomponents of a protocol.</p>
<!-- , which illustrates the logical interaction structure of a protocol and can also be used for accelerating verification for some protocols with the adequate interaction structure. -->
<p>As a simple example, consider the following protocol specification:</p>
\[\small
\begin{align*}
&\text{VARIABLES } a,b,c \\[0.4em]
&{Init } \triangleq \\
& \quad \land \, a = 0 \\
& \quad \land \, b = 0 \\
& \quad \land \, c = 0 \\[0.4em]
& IncrementA
\triangleq \\
& \quad \land \, b = 0 \\
& \quad \land \, a' = a + b \\
&\quad \land {\text{UNCHANGED }} \langle b,c \rangle \\[0.4em]
& IncrementB
\triangleq \\
& \quad \land \, b' = b + c \\
& \quad \land \, \text{UNCHANGED } \langle a,c \rangle \\[0.4em]
& IncrementC \triangleq \\
&\quad \land \, c < cycle \\
&\quad \land \, c' = (c + 1) \% cycle \\
&\quad \land \, \text{UNCHANGED } \langle a,b \rangle
\\[0.4em]
&Next \triangleq \\
&\quad \lor IncrementA \\
&\quad \lor IncrementB \\
&\quad \lor IncrementC \\
\\
&Inv \triangleq a \in \{0,1\} \quad \text{(* top-level invariant. *)} \\
& L1 \triangleq b \in \{0,1\} \\
& L2 \triangleq c \in \{0,1\}
\end{align*}\]
<p>In this case, we can consider decomposing the protocol into two logical sub-components,</p>
\[\begin{align*}
M_1 &= \{IncrementA\} & \qquad Vars(M_1)=\{a,b\} \\
M_2 &= \{IncrementB, IncrementC\} & \qquad Vars(M_2)=\{b,c\}
\end{align*}\]
<p>with the state variables associated with each component.</p>
<p>In this case, it is clear that the logical <em>interaction</em> between \(M_1\) and \(M_2\) can be defined in terms of their single shared variable, \(b\). Furthermore, this interaction is “uni-directional” in terms of the data flow between components, i.e. only \(M_1\) reads from \(b\) and only \(M_2\) writes to \(b\). In this simple case of interaction it is also clear that, for example, verification of \(M_1\)’s behaviors should only depend on the behavior of the interaction variable \(b\). The full behavior of \(M_2\) is irrelevant to the behavior of \(M_1\), enabling a natural type of compositional verification.</p>
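<p>As a rough sketch of this read/write analysis (a hypothetical helper, not part of the author's tooling, with each component's read/write sets extracted by hand from the spec above), we can classify the direction of interaction between \(M_1\) and \(M_2\):</p>

```python
# Hand-extracted read/write sets for each component (an assumption, not
# parsed from the TLA+ spec): IncrementA reads {a,b} and writes {a};
# IncrementB and IncrementC together read {b,c} and write {b,c}.
M1 = {"reads": {"a", "b"}, "writes": {"a"}}        # {IncrementA}
M2 = {"reads": {"b", "c"}, "writes": {"b", "c"}}   # {IncrementB, IncrementC}

def interaction(m1, m2):
    """Classify the dataflow between two components over shared variables."""
    shared = (m1["reads"] | m1["writes"]) & (m2["reads"] | m2["writes"])
    m1_to_m2 = m1["writes"] & m2["reads"]  # data flowing M1 -> M2
    m2_to_m1 = m2["writes"] & m1["reads"]  # data flowing M2 -> M1
    if m1_to_m2 and not m2_to_m1:
        kind = "uni-directional (M1 -> M2)"
    elif m2_to_m1 and not m1_to_m2:
        kind = "uni-directional (M2 -> M1)"
    elif m1_to_m2 and m2_to_m1:
        kind = "bi-directional"
    else:
        kind = "independent"
    return shared, kind

print(interaction(M1, M2))  # -> ({'b'}, 'uni-directional (M2 -> M1)')
```

Here the only shared variable is \(b\), and the dataflow is uni-directional from \(M_2\) into \(M_1\), matching the discussion above.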
<!-- That is, we can consider all behaviors of $$M_2$$, projected to the interaction variable $$b$$, and then verify $$M_1$$ against this behavior. -->
<p>More generally, if we consider every action of a protocol as its own, fine-grained component, with associated read/write variables, we can check pairwise interactions between all actions of an original protocol to produce an <em>interaction graph</em>, as shown below. This then serves as a starting point for understanding the interaction between protocol actions and the potential boundaries for protocol decomposition.</p>
<p align="center">
<img src="https://github.com/will62794/ipa/blob/main/specs/M_uni/M_uni_interaction_graph.png?raw=true" alt="Example Protocol Interaction Graph" width="430" />
</p>
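<p>An interaction graph like the one above can be derived with a simple pairwise check over hand-extracted read/write sets (a minimal illustrative sketch, not the author's actual tool): an edge from one action to another indicates that the first writes a variable the second reads.</p>

```python
# Read/write sets for the example protocol's actions, extracted by hand
# (an assumption for illustration, not parsed from the TLA+ spec).
actions = {
    "IncrementA": {"reads": {"a", "b"}, "writes": {"a"}},
    "IncrementB": {"reads": {"b", "c"}, "writes": {"b"}},
    "IncrementC": {"reads": {"c"},      "writes": {"c"}},
}

def interaction_edges(actions):
    """Edge (A1, A2) labeled with the variables A1 writes and A2 reads."""
    edges = {}
    for a1, rw1 in actions.items():
        for a2, rw2 in actions.items():
            if a1 == a2:
                continue  # ignore self-interaction in this sketch
            shared = rw1["writes"] & rw2["reads"]
            if shared:
                edges[(a1, a2)] = shared
    return edges

print(interaction_edges(actions))
# e.g. IncrementB -> IncrementA via {'b'}, IncrementC -> IncrementB via {'c'}
```

This recovers the acyclic, uni-directional dataflow structure of the example: \(IncrementC\) feeds \(IncrementB\) via \(c\), which in turn feeds \(IncrementA\) via \(b\).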
<p>As another example, we can consider <a href="https://github.com/will62794/ipa/blob/main/specs/consensus_epr/consensus_epr.tla">this simplified consensus protocol</a> for selecting a value among a set of nodes via a simple leader election protocol. The protocol has 5 actions, related to nodes sending out vote requests and votes, a node processing received votes and getting elected as leader, and a leader deciding on a value. We can examine this protocol’s interaction graph as follows:</p>
<figure id="consensus-interaction-graph">
<p align="center">
<img src="https://github.com/will62794/ipa/blob/main/specs/consensus_epr/consensus_epr_interaction_graph.png?raw=true" alt="Simple Consensus Protocol Interaction Graph" width="730" />
</p>
<figcaption>Figure 1. Interaction graph for simple consensus protocol.</figcaption>
</figure>
<p>Here, we see that its interaction graph admits a simple, acyclic structure, with uni-directional dataflow between nearly all actions.
<!-- We can utilize this for accelerating verification as we discuss below. --></p>
<p>We can see another example of an interaction graph, for the two-phase commit protocol, based on its formal specification <a href="https://github.com/will62794/scimitar/blob/main/benchmarks/TwoPhase.tla">here</a>:</p>
<figure id="2pc-interaction-graph">
<p align="center">
<img src="https://github.com/will62794/ipa/blob/8cbb8e01a7640a13504a4f2577088a906ada2077/specs/TwoPhase/TwoPhase_interaction_graph.png?raw=true" alt="Two Phase Commit Protocol Interaction Graph" width="750" />
</p>
<figcaption>Figure 2. Two phase commit protocol interaction graph.</figcaption>
</figure>
<p>This interaction graph, annotated with the interaction variables along its edges, makes explicit the logical dataflow between actions of the protocol, and also suggests natural action groupings for decomposition: specifically, into the resource manager (\(RM\)) sub-component and the transaction manager (\(TM\)) sub-component, i.e.</p>
\[\small
\begin{align*}
&RM = \{RMRcvAbortMsg, RMRcvCommitMsg, RMPrepare, RMChooseToAbort\} \\
&TM = \{TMRcvPrepare, TMAbort, TMCommit\}
\end{align*}\]
<figure>
<p align="center">
<img src="https://github.com/will62794/ipa/blob/main/specs/TwoPhase/TwoPhase_interaction_graph_partitioned.png?raw=true" alt="Two Phase Commit Protocol Interaction Graph" width="750" />
</p>
<figcaption>Figure 3. Two phase commit protocol interaction graph from <a href="#2pc-interaction-graph">Figure 2</a> with partitioned components shown.</figcaption>
</figure>
<p>For example, we can note that the only outgoing dataflow from the \(RM\) set of actions is via the \(msgsPrepared\) variable, which is read via \(TMRcvPrepare\). The only incoming dataflow to the resource manager sub-component is via the \(msgsAbort\) and \(msgsCommit\) variables, which are written to by the transaction manager.</p>
<p>This matches our intuitive notions of the protocol whereby the resource manager and transaction manager behave as logically separate processes, and only interact via the relevant message channels (\(msgsAbort\), \(msgsCommit\), and \(msgsPrepared\)).</p>
<h2 id="compositional-verification">Compositional Verification</h2>
<p>The decomposition concepts above provide a way to view a protocol in terms of how its fine-grained atomic sub-components interact. We can, in some cases, utilize this structure for a kind of compositional verification when a protocol’s interaction graph is amenable.</p>
<h3 id="simple-consensus-protocol">Simple Consensus Protocol</h3>
<p>For example, we can consider the interaction graph of the simple consensus protocol from above. Its mostly acyclic interaction graph (<a href="#consensus-interaction-graph">Figure 1</a>) makes it directly amenable to a simple form of efficient, compositional verification. If we want to verify the core safety property of this protocol, \(NoConflictingValues\), which states that no two nodes decide on distinct values, we can check this with the TLC model checker in a few seconds using a model with 3 nodes (\(Node=\{n1,n2,n3\}\)), generating a reachable state space with 110,464 states.</p>
<p>From the protocol’s interaction graph, however, it is easy to see that the actions \(\{SendRequestVote, SendVote\}\) operate independently of the rest of the protocol, interacting only via writes to the \(vote\_msg\) variable. So, one approach to verifying this protocol is to start by verifying the \(\{SendRequestVote, SendVote\}\) actions independently of the rest of the protocol, and then verify the rest of the protocol against this behavior. More specifically, the overall protocol only depends on the observable behavior of this \(\{SendRequestVote, SendVote\}\) sub-component with respect to the \(vote\_msg\) variable.</p>
<p>For example, if we model check the protocol with the pruned transition relation of</p>
\[\begin{align*}
&Next_A \triangleq \\
& \quad \vee SendRequestVote\\
& \quad \vee SendVote \\
\end{align*}\]
<p>we generate 16,128 distinct reachable states, a ~7x reduction from the full state space. Now, since the only “interaction variable” between this \(Next_A\) sub-protocol and the rest of the protocol is the \(vote\_msg\) variable, we could project the state space of \(Next_A\) to the \(vote\_msg\) variable and verify the rest of the protocol against this projected state space.</p>
<p>With an explicit state model checker, we could directly compute this projection by generating and projecting the full state graph, and using this projected state graph as the “environment” under which to verify the rest of the protocol. Alternatively, we can come up with an <em>abstraction</em> of the \(Next_A\) protocol that reflects the external behavior of the interaction variable \(vote\_msg\) adequately.</p>
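<p>As a toy illustration of the explicit-state projection approach (using a small, hypothetical state-graph fragment rather than real TLC output), we can collapse states that agree on the interaction variable and lift the transitions accordingly:</p>

```python
# A sketch of projecting an explicit state graph onto a single interaction
# variable. States that agree on the projected variable collapse into one
# abstract state; transitions are lifted, dropping stuttering steps.

def project(states, transitions, var):
    # Map each concrete state (a dict of variable -> value) to its
    # (hashable) value of `var`.
    abs_state = {sid: frozenset(state[var]) for sid, state in states.items()}
    abs_states = set(abs_state.values())
    abs_transitions = {
        (abs_state[s], abs_state[t]) for (s, t) in transitions
        if abs_state[s] != abs_state[t]  # drop stuttering steps
    }
    return abs_states, abs_transitions

# Hypothetical fragment: vote_msg modeled as a set of (src, dst) pairs.
states = {
    0: {"vote_msg": set(), "voted": set()},
    1: {"vote_msg": set(), "voted": {"n1"}},          # differs only in `voted`
    2: {"vote_msg": {("n1", "n2")}, "voted": {"n1"}},
}
transitions = {(0, 1), (1, 2)}
abs_states, abs_edges = project(states, transitions, "vote_msg")
print(len(abs_states))  # states 0 and 1 collapse -> 2 abstract states
```

The projected graph is then the "environment" over \(vote\_msg\) against which the rest of the protocol could be checked.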
<p>For example, consider the following abstract model over the single \(vote\_msg\) variable that logically merges the \(SendRequestVote\) and \(SendVote\) actions into one atomic action:</p>
\[\begin{align*}
&SendRequestVote\_SendVote(src, dst) \triangleq \\
&\quad \wedge \, \nexists m \in vote\_msg : m[1] = src \\
&\quad \wedge \, vote\_msg' = vote\_msg \cup \{\langle src,dst \rangle\}
% &\quad \wedge \, \text{UNCHANGED } \langle vote\_request\_msg, voted, votes, leader, decided \rangle\\
\end{align*}\]
<p>This atomic action adds a new message into \(vote\_msg\) only if no existing node has already put such a message into \(vote\_msg\) (i.e. since nodes can’t vote twice in the original protocol).</p>
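<p>To get a feel for the abstract action's behavior, here is a small explicit-state exploration of it (a sketch, not TLC; the node names and the set representation of \(vote\_msg\) are assumptions): from each state, any node that has not yet sent a vote message may add one \((src, dst)\) pair.</p>

```python
# Explicit-state exploration of the abstract SendRequestVote_SendVote action
# over 2 nodes. The precondition mirrors the abstract spec: no existing
# message in vote_msg with the same source node.
from itertools import product

NODES = ["n1", "n2"]  # assumed node set for illustration

def successors(vote_msg):
    srcs_used = {m[0] for m in vote_msg}
    for src, dst in product(NODES, NODES):
        if src not in srcs_used:  # precondition: no existing msg from src
            yield frozenset(vote_msg | {(src, dst)})

def reachable():
    init = frozenset()  # vote_msg starts empty
    seen, frontier = {init}, [init]
    while frontier:
        s = frontier.pop()
        for t in successors(s):
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return seen

print(len(reachable()))
```

With 2 nodes this yields 9 reachable \(vote\_msg\) values: the empty set, 4 single-message states, and 4 two-message states; each node contributes at most one message, as in the original protocol.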
<p>We can formally check that this is a valid abstraction of the \(Next_A\) sub-protocol by showing a refinement between them e.g. showing that every behavior of \(Next_A\) is a valid behavior of this abstract spec:</p>
\[(Init \wedge \square [Next_A]_{vars}) \Rightarrow (Init \wedge \square [SendRequestVote\_SendVote]_{vote\_msg})\]
<p>Verifying this refinement is one way of ensuring that the abstract spec preserves the “externally observable” transitions of this sub-component (e.g. with respect to the \(vote\_msg\) variable).</p>
<p>Due to the acyclic nature of this protocol’s interaction graph, we could continue applying this compositional rule to further accelerate verification, but even with this initial reduction, we can see significant improvement. That is, now that we have an abstraction of the \(\{SendRequestVote, SendVote\}\) sub-protocol that preserves its interactions with the rest of the protocol, we can try verifying the rest of the protocol against this abstraction e.g.</p>
\[\begin{align*}
& Next_B \triangleq \\
&\quad \vee \wedge \exists i,j \in Node : SendRequestVote\_SendVote(i,j) \\
&\quad \quad \wedge \text{UNCHANGED } \langle vote\_request\_msg,voted,votes,leader,decided \rangle \\
&\quad \vee \exists i,j \in Node : RecvVote(i,j) \\
&\quad \vee \exists i \in Node, Q \in Quorum : BecomeLeader(i,Q) \\
&\quad \vee \exists i \in Node, v \in Value : Decide(i,v)
\end{align*}\]
<p>Model checking the above protocol (\(Next_B\)) with TLC produces 514 distinct reachable states, a >200x reduction from the original state space.</p>
<p>So, in this case, with only a simple dataflow/interaction analysis, we were able to reduce the largest model checking sub-problem by a factor of ~7x, since model checking of the \(Next_A\) sub-protocol was the most expensive verification sub-problem.
<!-- e.g. since we would need to verify that $$Next_A$$ is a valid abstraction of the $$\{SendRequestVote, SendVote\}$$ sub-protocol. --></p>
<!--
### Two Phase Commit Protocol
Any actions taken by `RMChooseToAbort` can only *disable* the `RMPrepare` action, right? So, from an external view, it should be safe to absorb `RMChooseToAbort` into `RMPrepare`, since it could only restrict the behaviors of `RMPrepare`, and there is not true "data flow" from `RMChooseToAbort` to `RMPrepare`, since the update expressions of`RMPrepare` don't actually depend on `RMChooseToAbort` writes?
Is there an "interaction preserving abstraction" that exists for the transaction manager sub-component in the two-phase commit protocol? Well, if we break down the protocol into transaction manager and resource manager sub-components, then we know the only interaction points between these two sub-components are via the $$\{msgsCommit, msgsAbort\}$$ (written to by RM, read by TM) and the $$msgsPrepared$$ (read by TM, written to by RM) variables.
From the perspective of the transaction manager, all it knows about is the view of the `msgsPrepared` variable, and it simply waits until it is filled up with enough resource managers. So, we can consider this abstraction of the `RM`:
$$
\begin{align*}
&RMAtomic(rm) \triangleq \\
&\quad \land msgsCommit = \{\} \\
&\quad \land msgsAbort = \{\} \\
&\quad \land msgsPrepared' = msgsPrepared \cup \{[type \mapsto Prepared, rm \mapsto rm]\} \\
&\quad \land \text{UNCHANGED } \langle tmState, tmPrepared, rmState, msgsCommit, msgsAbort \rangle
\end{align*}
$$
We can model check the original two-phase commit protocol with 4 resource managers, $$RM=\{rm1,rm2,rm3,rm4\}$$, for the main $$Consistency$$ safety property,
$$
\small
Consistency \triangleq \forall rm_1, rm_2 \in RM : \neg (rmState[rm_1] = aborted \wedge rmState[rm_2] = committed)
$$
and find that it has 1568 reachable states. If we instead model check the protocol against the $$RMAtomic$$ abstraction
$$
\begin{align*}
Next_{TwoPhase_A} &\triangleq \\
&\lor RMAtomic(rm) \triangleq \\
&\lor \exists rm \in RM : TMRcvPrepared(rm) \\
&\lor \exists rm \in RM : TMAbort(rm) \\
&\lor \exists rm \in RM : TMCommit(rm)
\end{align*}
$$
we find 163 reachable states, a ~10x reduction.
In this example, though, the original interaction between these two logical sub-components ($$RM$$ and $$TM$$) was not as simple as the acyclic dataflow of the simple consensus protocol, so just doing the above verification step is not sufficient to establish the top level safety property. That is, we actually need to show formally that this `RMAtomic` abstraction is truly "interaction preserving". That is, we need to prove that it would behave the same as the original component with respect to the interaction variables, $$\{msgsCommit, msgsAbort, msgsPrepared\}$$. In general, one way to show this would be to show that the RMAtomic component is, formally, an abstraction of the original $$RM$$ component, i.e. that $$RM$$ is a refinement of $$RMAtomic$$, roughly, that
$$
Next_{RM} \Rightarrow RMAtomic
$$
In general, though proving this refinement may be hard, and require development of auxiliary invariants to constrain the interaction variables suitably (?) to prove this step refinement condition.
-->
<!-- In this case, it is fairly easy to intuitively see why this is true. For example, we can first consider actions that write to `msgsPrepared` in the original component and the abstracted one. Only the `RMPrepare` action of the original sub-component do this, -->
<!--
## Generalized Interaction Semantics
Note that the above notions of interaction between protocol actions are based on static (i.e. syntactic) checks and so they are, in fact, conservative. That is, they may syntactically determine that two actions interact, even when they, in a semantic sense, do not. For this, we need a more general notion of "interaction".
As a concrete example, consider that even if an action $$A$$ writes to a variable that another action $$B$$ reads, this does not necessarily mean that the two actions interact. If both share variable $$x$$, and $$A$$ and $$B$$ are defined as follows:
$$
\begin{aligned}
&A \triangleq \\
&\quad \wedge \, x = 1 \\
&\quad \wedge \, x' = 2 \\[0.5em]
&B \triangleq \\
&\quad \wedge \, x = 0 \\
&\quad \wedge \, x' = 3
\end{aligned}
$$
then these two actions don't truly "interact". In a sense, the actions of $$A$$ will always be "invisible" to $$B$$, since they have no effect on whether $$B$$ is enabled/disabled or on the outcome after a $$B$$ action is taken.
Intuitively, we can say that one action $$A$$ "interacts" with another action $$B$$ if action $$A$$ can "affect" $$B$$. More concretely, $$A$$ could either:
1. Enable or disable $$B$$.
2. Affect the resulting state after a $$B$$ action is taken.
This gives rise to a more precise notion of interaction compared to our syntactic, read/write definition from above. Note that this notion of interaction we define (conversely, "independence") bears similarity to the independence notions used in classical [partial order reduction](https://www.cs.cmu.edu/~emc/15817-f08/lectures/partialorder.pdf) techniques.
Related ideas also appear in [early papers](https://www-old.cs.utah.edu/docs/techreports/2003/pdf/UUCS-03-028.pdf) on symbolic partial order reduction, which use a SAT solver to check these independence conditions.
We can formally encode the two interaction properties above for generic actions $$A_1, A_2$$, as follows, defined as temporal logic formulas stating whether $$A_1$$ "interacts with" / "affects" $$A_2$$:
<figure id="semantic-interaction">
$$
\begin{aligned}
Enabledness \triangleq& \, \square[A_1 \Rightarrow ({A_2}^{Pre} \Leftrightarrow {A_{2}^{Pre}}')]_{vars} \\
Commutativity \triangleq& \, \square[A_1 \Rightarrow (A_{2}^{Post} \Leftrightarrow {A_{2}^{Post}}')]_{vars}
\end{aligned}
$$
<figcaption>Figure 5. Semantic interaction conditions between one action and another.</figcaption>
</figure>
where $${A_i}^{Pre}$$ represents the formula of $$A_i$$'s precondition, and $$A_i^{Post}$$ represent the list of $$A_i$$'s update expressions (i.e. its postcondition expressions).
Essentially, the $$Enabledness$$ condition states that if $$A_2$$ is enabled/disabled in a current state, then after an $$A_1$$ transition, $$A_2$$ is still enabled/disabled. Similarly, $$Commutativity$$ states that if an $$A_1$$ step is taken, the update expressions of $$A_2$$ are unchanged. Note that we can in theory check these conditions symbolically or, for small enough protocols, using an explicit state tool like TLC, given we define the set of type-correct states (similar to how TLC can be [used to check inductive invariants](https://lamport.azurewebsites.net/tla/inductive-invariant.pdf)).
-->
<!--
DISABLE this semantic graph details temporarily.
For example, in the case of the simple consensus protocol from <a href="#consensus-interaction-graph">above</a>, its semantic interaction graph based on these new property definitions turns out to be the same as the one based on read/write interactions, since the read/write relationships already capture the semantic interaction accurately.
For two-phase commit, however, its semantic interaction graph differs slightly from the original one <a href="#2pc-interaction-graph">above</a>, as follows:
<figure id="2pc-semantic-interaction-graph">
<p align="center">
<img src="https://github.com/will62794/ipa/blob/8cbb8e01a7640a13504a4f2577088a906ada2077/specs/TwoPhase/TwoPhase_semantic_interaction_graph.png?raw=true" alt="Two Phase Commit Protocol Interaction Graph" width="750">
</p>
<figcaption>Figure 4. Interaction graph for the two-phase commit protocol, based on the semantic independence.</figcaption>
</figure>
We can see, for example, that the $$RMRcvAbortMsg$$ and $$RMRcvCommitMsg$$ actions are determined as interacting in the original, syntactic interaction graph, but in the refined, semantic interaction graph, they do not interact. This makes sense if we look at these underlying actions:
$$
\small
\begin{aligned}
&RMRcvCommitMsg(rm) \triangleq \\
&\quad \land \, \langle \text{Commit} \rangle \in msgsCommit \\
&\quad \land \, rmState' = [rmState \text{ EXCEPT }![rm] = \text{committed}] \\
&\quad \land \, \text{UNCHANGED } \langle tmState, tmPrepared, msgsPrepared, msgsCommit, msgsAbort \rangle \\[1em]
&RMRcvAbortMsg(rm) \triangleq \\
&\quad \land \, \langle \text{Abort} \rangle \in msgsAbort \\
&\quad \land \, rmState' = [rmState \text{ EXCEPT }![rm] = \text{aborted}] \\
&\quad \land \, \text{UNCHANGED } \langle tmState, tmPrepared, msgsPrepared, msgsCommit, msgsAbort \rangle
\end{aligned}
$$
From a naive syntactic analysis, we observe that both actions read from the $$rmState$$ variable (e.g. in their postcondition), and both write to that variable as well, so we determine that they interact. Semantically, though, the updates of both actions don't depend on the value of $$rmState$$, so writes to that variable shouldn't "affect" either actions. Thus, these two actions can be considered as semantically independent. This leads to the slightly refined version of the interaction graph shown in the [figure above](#2pc-semantic-interaction-graph), where we still include arrows representing read/write dependencies between actions, but *only* if those actions semantically interact by the conditions above.
-->
<!-- From the interaction graph [above](#2pc-semantic-interaction-graph), we can apply some simple rewrites to derive an interaction prerserving abstraction. If we take the $$RMChooseToAbort$$, we can try to rewrite this somehow to preserve its interactions with the rest of the components. It interacts with $$RMRcvAbortMsg$$ and $$RMRcvCommitMsg$$ only via $$rmState$$, and similarly for $$RMPrepare$$, which is in fact the only action that can observe its transitions. So, we what if we merge it with $$RMPrepare$$? If we do this, then we need to preserve this merged node's interaction with $$RMPrepare$$. -->
<!-- We know that $$RMChooseToAbort$$ transitions a resource manager to state `"aborted"` if that resource manager's state is currently `"working"`, so we need to preserve these externally visible transitions. The only way that $$RMChooseToAbort$$ can affect $$RMPrepare$$ -->
<!-- $$
\small
RMChooseToAbort(rm) \triangleq \neg \langle \text{Commit} \rangle \in msgsCommit \Rightarrow RMChooseToAbort(rm)
$$ -->
<!-- The above definitions provide a more precise notion of interaction between two actions, for which the syntactic checks we defined above are an overapproximation.
In practice, there may be a tradeoff between the read/write, syntactic interaction analysis and the semantic interaction analysis. The former can in theory be done statically, based only on syntactic analysis of actions, whereas the semantic notions of interaction may require some symbolic analysis e.g. checking the independence conditions properly may in general require a SAT/SMT query. In general, though, this may be worth it if the semantic interactions can help us reduce verification times significantly. Especially since these independence conditions can be generated automatically, without any kind of special synthesis or learning procedure needed (e.g. in the case of inductive/loop invariant synthesis). -->
<!-- ## Conditional Interaction
TODO. Explore conditional interaction for Paxos based ballots.
-->
<!-- ## Questions
- **TODO:** how exactly do we check that one abstraction is "interaction preserving" w.r.t some interaction variable, like in the consensus_epr example? just a refinement check?
- Note that for some interactions that are "read only", this may be even a more fine-grained distinction in the sense that the read variable may only appear in the precondition of an action, and so may only *restrict* the behavior of the component that reads from this variable.
- Can we have some simple graph-based rewriting rules that are permitted as valid "merging" of multiple actions as long as we preserve externally visible behavior?
- Can we have another finer-grained interaction notion that reflects the "can enable" vs "can disable" distinction?
- Can you also do "conditional" interaction? i.e. interaction might occur between two Raft actions in general, but may not occur between those actions executed across different term boundaries? -->
<h2 id="conclusions">Conclusions</h2>
<p>The ideas and techniques discussed above are similar to various types of compositional verification techniques that have been applied in other contexts. Similar ideas are utilized in the “interaction preserving abstraction” techniques of <a href="https://arxiv.org/abs/2202.11385">this paper</a>, and also in the work on <em><a href="https://iandardik.github.io/assets/papers/recomp_fmcad24.pdf">recomposition</a></em>, which builds similar techniques into the TLC model checker. The concept of using dataflow to analyze distributed and concurrent protocols has also appeared in past work (e.g. <a href="https://www.cs.cornell.edu/~krzys/krzys_debs2009.pdf">distributed data flow</a>), and in more <a href="https://dl.acm.org/doi/10.1145/3639257">recent work</a> on using a Datalog-like language to automatically optimize distributed protocols using pre-defined rewrite rules.</p>
<p>Note that the code used to model the protocols above and generate their associated interaction graphs can be found <a href="https://github.com/will62794/ipa">here</a>.</p>
Mon, 02 Dec 2024 00:00:00 +0000
https://will62794.github.io/distributed-systems/verification/2024/12/02/interaction-graphs.html
https://will62794.github.io/distributed-systems/verification/2024/12/02/interaction-graphs.htmldistributed-systemsverification