Sanjay Seenivasan

On WALs (Write Ahead Logs) and fsync()

2024-01-06T00:00:00+00:00

This post will discuss CWal, a write ahead log (WAL) implementation I wrote in C++ (tested on Ubuntu 22.04). Here is a direct link: https://github.com/sjay05/cwal.

In a database system, a write ahead log records operations sequentially in a log file, and the log is flushed to disk before any expensive database-wide commits are performed.

This preserves the durability of the data, as it is recoverable in the case of a power failure/system crash.

Log Entries:

Each log entry consists of a byte_len field giving the size of the data in bytes, the data itself, and a 4-byte CRC checksum (CRC_CHECKSUM).

struct LogEntry {
  uint64_t byte_length;
  std::string data;
};

----|-------------------|--------------------|-----------------------|-----
... | uint64_t byte_len | const string* data | uint32_t CRC_CHECKSUM | ... 
    | (8 bytes)         | (byte_len bytes)   | (4 bytes)             |
----|-------------------|--------------------|-----------------------|-----
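A minimal sketch of how an entry with this layout could be serialized and checked. The names `Crc32`, `EncodeEntry`, and `EntryIsValid` are hypothetical and are not taken from CWal. The sketch also assumes that the checksum covers only the data bytes, that the common reflected CRC-32 polynomial is used, and that the length is stored in host byte order.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Bitwise CRC-32 with the reflected polynomial 0xEDB88320.
// Assumed variant for illustration; CWal may compute its checksum differently.
uint32_t Crc32(const std::string& data) {
  uint32_t crc = 0xFFFFFFFFu;
  for (unsigned char byte : data) {
    crc ^= byte;
    for (int bit = 0; bit < 8; bit++) {
      // Shift right; XOR in the polynomial when the dropped bit was 1.
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  }
  return crc ^ 0xFFFFFFFFu;
}

// Lays out one entry as: byte_len (8 bytes) | data | CRC_CHECKSUM (4 bytes).
std::string EncodeEntry(const std::string& data) {
  const uint64_t byte_len = data.size();
  const uint32_t crc = Crc32(data);
  std::string out(sizeof(byte_len) + data.size() + sizeof(crc), '\0');
  std::memcpy(&out[0], &byte_len, sizeof(byte_len));
  std::memcpy(&out[sizeof(byte_len)], data.data(), data.size());
  std::memcpy(&out[sizeof(byte_len) + data.size()], &crc, sizeof(crc));
  return out;
}

// Returns true if `buf` holds exactly one entry whose length field and
// checksum are both consistent with its data.
bool EntryIsValid(const std::string& buf) {
  uint64_t byte_len = 0;
  uint32_t stored = 0;
  if (buf.size() < sizeof(byte_len) + sizeof(stored)) return false;
  std::memcpy(&byte_len, buf.data(), sizeof(byte_len));
  if (buf.size() - sizeof(byte_len) - sizeof(stored) != byte_len) return false;
  std::memcpy(&stored, buf.data() + sizeof(byte_len) + byte_len, sizeof(stored));
  return Crc32(buf.substr(sizeof(byte_len), byte_len)) == stored;
}
```

A reader that replays the log after a crash could use a check like `EntryIsValid` to detect a torn or corrupted final entry and stop there.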

fsync() system call:

On Linux, neither the C-style write() nor the C++ std::basic_ostream<CharT,Traits>::write() guarantees that data reaches the disk immediately, because writes first land in the page cache (also called the disk cache).

The page cache is held in RAM and exists to speed up frequent system calls that target the disk. Disk updates are therefore deferred: modified pages stay in memory, absorbing further read/write calls, and are typically flushed to disk only some seconds later.

Data that exists only in the page cache is volatile, so this alone will not work for write ahead logs. However, using the fsync system call, we can force the OS to flush a file's cached data to the disk.

fsync(2) - Linux Manual Page

NAME
   fsync - synchronize a file's in-core state with that on disk

SYNOPSIS
   #include <unistd.h>

   int fsync(int fd);
   int fdatasync(int fd);

The fd (file descriptor) may be obtained by opening the file with open(). fsync() will return once all buffers have been flushed to permanent storage, which achieves durable storage for the Write Ahead Log.
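A minimal sketch of a durable append built from open(), write(), and fsync(). The helper name `DurableAppend` is hypothetical and is not CWal's API. Opening and closing the file on every call is only for brevity, and a real log would keep the descriptor open.

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <cerrno>
#include <string>

// Appends `bytes` to the file at `path` and reports success only after
// fsync() has returned successfully. Hypothetical helper for illustration.
bool DurableAppend(const std::string& path, const std::string& bytes) {
  int fd = open(path.c_str(), O_WRONLY | O_CREAT | O_APPEND, 0644);
  if (fd < 0) return false;

  bool ok = true;
  size_t done = 0;
  while (ok && done < bytes.size()) {
    // write() may write fewer bytes than requested, so loop until done.
    ssize_t n = write(fd, bytes.data() + done, bytes.size() - done);
    if (n < 0) {
      if (errno == EINTR) continue;  // interrupted by a signal, retry
      ok = false;
    } else {
      done += static_cast<size_t>(n);
    }
  }

  // Without this call the data may still sit only in the page cache.
  if (ok && fsync(fd) != 0) ok = false;
  if (close(fd) != 0) ok = false;
  return ok;
}
```

Note that, per the fsync(2) manual page, syncing a file does not necessarily ensure that its directory entry has reached the disk. A newly created log file also needs an fsync() on a descriptor for its containing directory.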

This leads to several questions. What is the exact expense of the fsync() syscall? Does every log append require an fsync, or can appends be batched up into blocks and committed when ready?

Benchmarks

CWal uses two types of benchmarks. The first is benchmark_write(const int LOG_LENGTH, const int LOG_SIZE, bool RFLUSH, bool SYNC, const int SYNC_PERIOD), which appends a total of LOG_LENGTH logs with data of LOG_SIZE bytes each.

RFLUSH indicates whether a routine flush is performed after each append.
SYNC indicates whether a routine fsync is performed, once every SYNC_PERIOD operations.

In particular, benchmark_write runs in $\mathcal{O}(\text{LOG_LENGTH} \cdot \text{LOG_SIZE})$ time. We set LOG_LENGTH * LOG_SIZE ~= 1e6.

Reg. Benchmark: 1000 entries | data_length = 1000 | Flush? No | Sync? No
==> 2931 ms | 2.93 ms/log

Reg. Benchmark: 1000 entries | data_length = 1000 | Flush? Yes | Sync? No
==> 3556 ms | 3.56 ms/log

Reg. Benchmark: 1000 entries | data_length = 1000 | Flush? No | Sync? Yes | SYNC_PERIOD = 1
==> 107601 ms | 107.60 ms/log

Reg. Benchmark: 1000 entries | data_length = 1000 | Flush? No | Sync? Yes | SYNC_PERIOD = 10
==> 63919 ms | 63.92 ms/log

It is clear that fsync() is very expensive: syncing after every append makes the run roughly 37 times slower (107601 ms versus 2931 ms). However, it is unreasonable for a database to wait for 1000 log entries to be flushed before its state is modified. So, we can try to batch the logs into segments of a set size.

At the end of each batch, CWal would start overwriting the previous logs, as this is more time efficient than truncating the file. The database would also then commit its changes to the disk, so the previous logs would no longer be required.

The function batched_sync_benchmarks(const int LOG_LENGTH, const int LOG_SIZE, const int BATCH_SIZE) has one new argument, BATCH_SIZE. We stick with the same specifications of LOG_LENGTH and LOG_SIZE. Two values of BATCH_SIZE we can experiment with are $\sqrt{\text{LOG_LENGTH}} \approx 31$ and $\sqrt[3]{\text{LOG_LENGTH}} = 10$.

Batched Sync Benchmark: 1000 entries | data_length = 1000 | BATCH_SIZE = 31
==> 98265 ms | 98.27 ms/log

Batched Sync Benchmark: 1000 entries | data_length = 1000 | BATCH_SIZE = 10
==> 101750 ms | 101.75 ms/log

We can see slight improvements over the time of the third regular benchmark (SYNC_PERIOD = 1).

Hence, with continued tweaking a user can dictate how often the write ahead log performs fsync() operations, and batch the logs appropriately.
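As a rough illustration of that trade-off, here is a sketch of a policy object that decides when to call fsync(). `SyncPolicy` and `CountSyncs` are hypothetical names, not part of CWal, and the sketch only counts syncs instead of performing them.

```cpp
#include <cassert>

// Decides when to fsync: once every `period` appends.
// Hypothetical helper for illustration, not CWal's API.
class SyncPolicy {
 public:
  explicit SyncPolicy(int period) : period_(period) { assert(period > 0); }

  // Call after each append; returns true when the caller should fsync() now.
  bool OnAppend() {
    if (++pending_ < period_) return false;
    pending_ = 0;
    return true;
  }

  // True if some appended entries are not yet covered by an fsync.
  bool HasUnsynced() const { return pending_ > 0; }

 private:
  int period_;
  int pending_ = 0;
};

// Counts how many fsync() calls `appends` log appends would trigger,
// including one final sync for a trailing partial batch.
int CountSyncs(int appends, int period) {
  SyncPolicy policy(period);
  int syncs = 0;
  for (int i = 0; i < appends; i++) {
    if (policy.OnAppend()) syncs++;  // a real log would call fsync(fd) here
  }
  if (policy.HasUnsynced()) syncs++;
  return syncs;
}
```

With 1000 appends, a period of 31 needs 33 syncs instead of 1000. The cost is that entries appended since the last sync can be lost in a crash.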

Extensions for CWal

Asynchronous write ahead logging
Copies of log files for redundancy
Experiment with memory mapped files for WalReader IO performance

Prefix Digits - An Outline

2022-10-24T00:00:00+00:00

Problem Link: https://dmoj.ca/problem/pdigit

This post will discuss the solution for the problem linked above, which I created for a mini-contest on DMOJ.

Statement

You are given two integers ~n~ and ~k~, and can perform operations on ~n~.

Each operation allows you to prepend a digit ~d~ ~(0 \le d \le 9)~ to ~n~, and it is your task to determine if there exists a sequence of operations such that ~n~ will end up being divisible by ~k~.

Note: ~n~ and ~k~ can be fairly large with bounds ~(1 \le n, k \le 10^9)~, and you are required to answer ~t~ test cases.

Subtask 1

~1 \le k \le 9~

Since divisibility rules exist for every ~k~ from ~1~ to ~9~, we can use logic to solve for each case.

Note: This subtask doesn’t exist in the linked problem.

Subtask 2

~1 \le t \le 10^5~

~1 \le n, k \le 10^9~

Step 1

Since ~t~ can be as large as ~10^5~, we are looking for an ~\mathcal{O}(T \cdot \log N)~ solution, that is, linear in the number of test cases with some form of log factor, unless this problem can be solved in constant time per test case.

Next, notice that the integer ~n~, after say ~m~ operations, can also be represented with an equation. Suppose ~d_1, d_2, d_3, \dots, d_m~ are the digits prepended to ~n~ in order from ~1~ to ~m~.

All the operations can be represented as the addition of one integer with digits ~d_1, d_2, d_3, \dots, d_m~ to the front of ~n~.

So, let ~y~ be the integer with digits ~d_1, d_2, d_3, \dots, d_m~.

Example: Prepend The Integer 23 to 45:

Our resultant value will have a length of ~2 + 2 = 4~, and we can picture this operation to be ~2300 + 45 = 2345~.

Notice that we create ~0~s in the positions that the ~45~ will occupy. The number ~2300~ is created by multiplying ~23 \cdot 10^2~.

Generalization: Prepend Integer ~y~ to ~n~:

Define the length of an integer to be ~\text{len}(x)~. For example, ~\text{len}(342) = 3~.

The new integer ~n~, with ~y~ prepended is:

\[10^{\text{len}(n)} \cdot y + n\]
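The formula can be checked with a short sketch. `PowTenLen` and `Prepend` are hypothetical helper names introduced here, and 64-bit integers are assumed so that ~10^{\text{len}(n)}~ fits for ~n \le 10^9~.

```cpp
#include <cassert>
#include <cstdint>

// Returns 10^len(n), where len(n) is the number of decimal digits of n >= 1.
int64_t PowTenLen(int64_t n) {
  assert(n >= 1);
  int64_t p = 1;
  while (n > 0) {
    p *= 10;
    n /= 10;
  }
  return p;
}

// Value obtained by writing the digits of y in front of n:
// 10^len(n) * y + n. The caller must keep the result within int64_t.
int64_t Prepend(int64_t y, int64_t n) {
  return PowTenLen(n) * y + n;
}
```

For the example above, `Prepend(23, 45)` evaluates ~10^2 \cdot 23 + 45 = 2345~.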

Step 2

How do we represent that a number ~n~ is divisible by ~k~ with an equation? We can write this as ~n \equiv 0 \pmod k~, where ~n~ is congruent to ~0 \pmod k~.

Since we figured out how to represent the final value of ~n~, our congruence is:

\[10^{\text{len}(n)} \cdot y + n \equiv 0 \pmod k\]

Our congruence has the form of a linear congruence ~ax \equiv b \pmod m~, where ~a = 10^{\text{len}(n)}~, ~x = y~, ~b = -n~ and ~m = k~.

The problem only asks whether a sequence of operations exists (YES or NO), which is the same as asking whether the congruence has any solution for ~y~.

A congruence of the form ~ax \equiv b \pmod m~, has a solution when ~\text{gcd}(a, m)~ is a divisor of ~b~.

Therefore we output YES when ~\gcd(10^{\text{len}(n)}, k)~ is a divisor of ~-n \bmod k~, and NO otherwise.
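A minimal sketch of this check for a single test case. The function name `CanMakeDivisible` is hypothetical. Because ~g = \gcd(10^{\text{len}(n)}, k)~ divides ~k~, testing whether ~g~ divides ~-n \bmod k~ is equivalent to testing whether ~g~ divides ~n~, which is what the code does. It assumes C++17 for `std::gcd`.

```cpp
#include <cassert>
#include <cstdint>
#include <numeric>

// Returns true if some digits can be prepended to n so that the result is
// divisible by k. Expects 1 <= n, k <= 1e9, so 10^len(n) fits in int64_t.
bool CanMakeDivisible(int64_t n, int64_t k) {
  assert(n >= 1 && k >= 1);
  int64_t pow10 = 1;  // becomes 10^len(n)
  for (int64_t t = n; t > 0; t /= 10) {
    pow10 *= 10;
  }
  const int64_t g = std::gcd(pow10, k);
  // g divides k, so g | (-n mod k) holds exactly when g | n.
  return n % g == 0;
}
```

For instance, `CanMakeDivisible(4, 8)` holds because ~24~ is divisible by ~8~, while `CanMakeDivisible(1, 2)` fails because every number ending in ~1~ is odd.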

AAC1 P5 - Odd Alpacas

2021-07-01T00:00:00+00:00

Problem Link: https://dmoj.ca/problem/aac1p5

This post will discuss the solution for the problem linked above. I created this problem along with Sam Liu for Animal Contest 1 on DMOJ.

Statistics:

Served as P5 of a 6-problem set.
~9~ correct submissions during contest.
~29.33\%~ AC rate (including subtasks).

Statement

You are given a tree of ~N~ nodes and ~N - 1~ weighted edges connecting ~u_i~ and ~v_i~ with weight ~w_i~ for ~1 \le i \le N - 1~.

Let the “length” of a path ~(x, y)~ be the sum of weights on the edges from node ~x~ to node ~y~.

Let ~x~ be the number of even length paths, and ~y~ be the number of odd length paths.

By changing the weight of one edge, minimize ~|x - y|~.

Note: You are allowed to modify ~0~ edges.

Subtask 1

The constraint ~1 \le N \le 200~ was set on purpose to allow brute force solutions to pass for ~10\%~ of points.

First, notice how the modification of an edge modifies a path length.

Suppose a path consists of edges with the following weight parities:

\[\text{len} = \text{odd} + \text{even} + \text{odd} + \text{even}\]

The parity of ~\text{len}~ would only change if one of the 4 parities also changed. This is either changing an ~\text{odd}~ to an ~\text{even}~ or the other way around.

Hence, for this subtask we can simulate changing the parity for each edge. Once that is done, how can we find ~x~ and ~y~?

For each node ~v~ ~(1 \le v \le N)~, run a DFS under the assumption that ~v~ is an endpoint of a path. By maintaining a distance array that tracks parity, ~|x - y|~ can be found easily.

Time Complexity: ~\mathcal{O}(N^3)~

Code Snippets

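// Fills dist[] with path lengths from the DFS root, flipping the parity of
// the edge (mod_x, mod_y) on the fly.
void Dfs(int v, int pr) {
  for (const auto& p : g[v]) {
    int to = p.first;
    int w = p.second;
    if (to == pr) {
      continue;
    }
    if (min(to, v) == mod_x && max(to, v) == mod_y) {
      w ^= 1;  // toggle the parity of the modified edge
    }
    dist[to] = dist[v] + w;
    Dfs(to, v);
  }
}

If mod_x and mod_y are the endpoints of the edge we are modifying, we can check for that edge and do w ^= 1 to switch its parity.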

int x = get<0>(e);
int y = get<1>(e);
if (x > y) {
  swap(x, y);
}
// Dfs() compares against (min, max), so store the endpoints in sorted order.
mod_x = x;
mod_y = y;
{
  long long odd = 0;
  long long even = 0;
  for (int i = 0; i < n; i++) {
    dist.assign(n, 0);
    Dfs(i, -1);
    for (int j = 0; j < n; j++) {
      if (i == j) {
        continue;
      }
      odd += (dist[j] % 2 == 1);
      even += (dist[j] % 2 == 0);
    }
  }
  // Every path was counted once from each of its endpoints.
  odd /= 2;
  even /= 2;
  ans = min(ans, abs(odd - even));
}

For each edge ~(x, y)~, we can set these as mod_x and mod_y and run a dfs for each node from ~1~ to ~N~.

Note: This implementation uses 0-based indexing. Hence the nodes are labeled from ~0~ to ~N - 1~.

Subtask 2

Constraints in this subtask (~1 \le N \le 2 \times 10^3~) were set to allow for a more optimized brute force to pass.

If ~N = 2 \times 10^3~, an ~\mathcal{O}(N^2)~ algorithm with about ~4 \times 10^6~ operations will pass.

We can draw inspiration from a very common ~\text{LCA}~ property used to find distance between two nodes in a tree:

If ~\text{dist}[x]~ is the distance from the root (~1~):

\[\text{dist}(x, y) = \text{dist}[x] + \text{dist}[y] - 2 \times \text{dist}[\text{lca}(x, y)]\]

Notice that any number (odd or even), when multiplied by ~2~, will always give an even result. Since ~2 \times \text{dist}[\text{lca}(x, y)]~ will always be even, the parity of ~\text{dist}(x, y)~ will be determined by ~\text{dist}[x]~ and ~\text{dist}[y]~.

So:

If ~\text{dist}[x] + \text{dist}[y]~ is odd, ~\text{dist}(x, y)~ will be odd.
If ~\text{dist}[x] + \text{dist}[y]~ is even, ~\text{dist}(x, y)~ will be even.

Let ~\alpha = \text{dist}[x] + \text{dist}[y]~.

Now, we want to be able to count ~x~ and ~y~ in ~\mathcal{O}(N)~ time, since we are trying each of the ~N - 1~ edges.

To count all paths with ~\alpha \equiv 1 \pmod 2~, we can multiply the number of nodes ~v~ with odd distance by the number of nodes with even distance. Let this result be ~\text{odd}~.

For the other case ~\alpha \equiv 0 \pmod 2~, we can subtract the number of odd paths from the total number of paths: (~\frac{n \cdot (n - 1)}{2} - \text{odd}~).

Time Complexity: ~\mathcal{O}(N^2)~

Code Snippets

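long long odd = 0;   // nodes at odd distance from the root
long long even = 0;  // nodes at even distance from the root
for (int i = 0; i < n; i++) {
  odd += (dist[i] % 2 == 1);
  even += (dist[i] % 2 == 0);
}
long long o_cnt = odd * even;  // one endpoint odd, the other even
// 1LL forces 64-bit arithmetic in case n is declared as int.
long long e_cnt = 1LL * n * (n - 1) / 2 - o_cnt;
ans = min(ans, abs(o_cnt - e_cnt));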

For each edge ~i~ (~1 \le i \le N - 1~), we run a DFS and calculate ~|x - y|~ like so.
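A self-contained sketch of the counting step for one fixed set of edge weights. The function name `CountPathParities` and the adjacency-list input are assumptions made for this example. It uses an explicit stack instead of recursion and roots the tree at node ~0~.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Returns {odd_paths, even_paths} over all unordered pairs of distinct nodes.
// g[v] lists (neighbor, weight) pairs of a tree with nodes 0..n-1.
std::pair<long long, long long> CountPathParities(
    const std::vector<std::vector<std::pair<int, int>>>& g) {
  assert(!g.empty());
  const int n = static_cast<int>(g.size());
  std::vector<int> parity(n, -1);  // parity of dist[v]; -1 means unvisited
  std::vector<int> stack = {0};
  parity[0] = 0;
  while (!stack.empty()) {
    int v = stack.back();
    stack.pop_back();
    for (const auto& p : g[v]) {
      if (parity[p.first] != -1) continue;
      parity[p.first] = parity[v] ^ (p.second & 1);
      stack.push_back(p.first);
    }
  }
  long long odd_nodes = 0;
  for (int x : parity) odd_nodes += x;
  const long long even_nodes = n - odd_nodes;
  // A path is odd exactly when its endpoints have different root parities.
  const long long odd_paths = odd_nodes * even_nodes;
  const long long even_paths = 1LL * n * (n - 1) / 2 - odd_paths;
  return {odd_paths, even_paths};
}
```

On the path ~0 - 1 - 2~ with both weights equal to ~1~, the root parities are ~0, 1, 0~, giving ~2~ odd paths and ~1~ even path.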

Subtask 3

For the final subtask with ~(1 \le N \le 2 \times 10^5)~, an ~\mathcal{O}(N)~ algorithm must be derived.

The intended solution still goes on the assumption that all edges must be tried, but has a constant-time way of finding ~x~ and ~y~ for each edge.

Define an “odd” node to be an arbitrary node ~v~ such that ~\text{dist}[v] \equiv 1 \pmod 2~.

Define an “even” node to be an arbitrary node ~v~ such that ~\text{dist}[v] \equiv 0 \pmod 2~.

Suppose the parity of an edge ~(x, y)~ is changed. What paths are affected by this edge?

We can make the claim that:

Any path ~(u, v)~ that uses edge ~(x, y)~ will have exactly one of ~u~ or ~v~ in the subtree of edge ~(x, y)~, that is, the subtree rooted at the deeper endpoint of the edge.

Take the tree above: if the edge ~(3, 5)~ (highlighted in green) was modified, notice that the grey, red, and blue paths all have an end vertex in the subtree of ~(3, 5)~.

We can notice that all even nodes will swap to odd nodes and vice versa in this subtree, because the path from the root to any node in the subtree must pass through the modified edge.

Hence, if we keep a counter for the number of odd and even nodes in each subtree, when the time comes to modify an edge ~(x, y)~ we can swap the odd and even nodes appropriately and calculate ~x~ and ~y~ with the formula described in Subtask 2.

Time Complexity: ~\mathcal{O}(N)~

Code Snippets

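long long ov = 0, ev = 0;  // odd and even nodes in the whole tree
for (int i = 0; i < n; i++) {
  ov += (d[i] % 2 == 1);
  ev += (d[i] % 2 == 0);
}
long long o_cnt = ov * ev;
// 1LL forces 64-bit arithmetic in case n is declared as int.
long long e_cnt = 1LL * n * (n - 1) / 2 - o_cnt;
long long ans = abs(o_cnt - e_cnt);  // answer when no edge is modified
for (const auto& e : es) {
  int x, y, z;
  tie(x, y, z) = e;
  if (dep[x] < dep[y]) {
    swap(x, y);  // make x the deeper endpoint, so x roots the edge's subtree
  }
  // odd[x] and even[x] count the odd and even nodes in the subtree of x;
  // flipping the edge swaps their roles.
  long long o_aux = ov - odd[x] + even[x];
  long long e_aux = ev - even[x] + odd[x];
  long long new_o_cnt = o_aux * e_aux;
  long long new_e_cnt = 1LL * n * (n - 1) / 2 - new_o_cnt;
  ans = min(ans, abs(new_o_cnt - new_e_cnt));
}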
cout << ans << '\n';
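The snippet above relies on per-subtree counters that are not shown. Below is a self-contained sketch of one way to compute them and finish the solution. The function name `MinParityGap`, the BFS-order accumulation, and the choice of node ~0~ as root are assumptions of this example, not necessarily how the reference solution is written.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <tuple>
#include <utility>
#include <vector>

// Minimum |odd_paths - even_paths| after flipping the parity of at most one
// edge. `es` holds (u, v, w) edges of a tree on nodes 0..n-1.
long long MinParityGap(int n, const std::vector<std::tuple<int, int, int>>& es) {
  assert(n >= 1);
  std::vector<std::vector<std::pair<int, int>>> g(n);
  for (const auto& e : es) {
    int u, v, w;
    std::tie(u, v, w) = e;
    g[u].push_back({v, w & 1});
    g[v].push_back({u, w & 1});
  }
  // BFS from node 0 records parents and root-distance parities.
  std::vector<int> parent(n, -1), parity(n, 0), order = {0};
  std::vector<bool> seen(n, false);
  seen[0] = true;
  for (std::size_t i = 0; i < order.size(); i++) {
    int v = order[i];
    for (const auto& p : g[v]) {
      if (seen[p.first]) continue;
      seen[p.first] = true;
      parent[p.first] = v;
      parity[p.first] = parity[v] ^ p.second;
      order.push_back(p.first);
    }
  }
  // Reverse BFS order visits children before parents, so subtree counts
  // can be pushed upward in a single pass.
  std::vector<long long> odd(n, 0), even(n, 0);
  for (int i = n - 1; i >= 0; i--) {
    int v = order[i];
    if (parity[v] == 1) odd[v]++; else even[v]++;
    if (parent[v] != -1) {
      odd[parent[v]] += odd[v];
      even[parent[v]] += even[v];
    }
  }
  const long long total = 1LL * n * (n - 1) / 2;
  const long long ov = odd[0], ev = even[0];
  long long ans = std::llabs(ov * ev - (total - ov * ev));  // no edge changed
  for (int v = 1; v < n; v++) {  // try the edge (parent[v], v)
    long long o_aux = ov - odd[v] + even[v];
    long long e_aux = ev - even[v] + odd[v];
    long long new_o_cnt = o_aux * e_aux;
    ans = std::min(ans, std::llabs(new_o_cnt - (total - new_o_cnt)));
  }
  return ans;
}
```

Iterating over non-root nodes covers every edge exactly once, since each edge joins a node to its parent. That also makes the depth comparison from the snippet above unnecessary here.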