A high-performance C program that reads hundreds of millions of integers from multiple files, sorts them in parallel using POSIX threads and Mergesort, and writes the result to a single sorted output file β all with a pipeline optimized at every stage.
500,000,000 integers sorted in under 20 seconds with 12 sorter threads.
This program solves a classic systems programming challenge at scale:
- Read β distribute input files across reader threads, each parsing integers from disk as fast as possible
- Sort β merge all data into a single array, split it evenly across sorter threads, and sort in parallel using Mergesort
- Write β merge all sorted subarrays into a single output file using a min-heap for O(N log T) k-way merge
This is v2 of the project β a full refactor from the first implementation, featuring better code organization, reduced complexity, and major performance optimizations throughout the entire pipeline.
βββββββββββββββββββββββββββββββββββββββββββββββ
β Input Files (N files) β
ββββββββββββββββββ¬βββββββββββββββββββββββββββββ
β distributed equally
ββββββββββΌβββββββββ
β Reader Threads β (T threads, each reads a file subset)
β fread + manual β β buffered I/O, 4MB block reads
β integer parser β
ββββββββββ¬βββββββββ
β merge into single array
ββββββββββΌβββββββββ
β Sorter Threads β (T threads, each sorts N/T integers)
β Mergesort β β in-place parallel sort
ββββββββββ¬βββββββββ
β sorted subarrays
ββββββββββΌβββββββββ
β Writer (1 thd) β β min-heap k-way merge
β fast_itoa + β β buffered fwrite, no fprintf
β fwrite buffer β
ββββββββββ¬βββββββββ
β
ββββββββββΌβββββββββ
β Output File β
βββββββββββββββββββ
All tests run on 500,000,000 integers spread across 5 files (100M integers each).
| Case | Threads (-t) |
Reader Time | Sorter Time | Writer Time | Total Time |
|---|---|---|---|---|---|
| 1 | 1 | 3.816s | 56.262s | 9.840s | 70.21s |
| 2 | 2 | 2.916s | 28.247s | 10.023s | 41.44s |
| 3 | 4 | 2.516s | 14.471s | 10.627s | 27.87s |
| 4 | 6 | 1.575s | 10.168s | 10.418s | 22.43s |
| 5 | 12 | 1.066s | 6.505s | 11.489s | 19.35s |
Case 1 (1 thread): ββββββββββββββββββββββββββββββββββββββββ 70.21s (1.0Γ)
Case 2 (2 threads): ββββββββββββββββββββββββ 41.44s (1.7Γ)
Case 3 (4 threads): ββββββββββββββββ 27.87s (2.5Γ)
Case 4 (6 threads): βββββββββββββ 22.43s (3.1Γ)
Case 5 (12 threads): βββββββββββ 19.35s (3.6Γ)
- Sorter scales nearly linearly with threads β doubling threads roughly halves sort time (56s β 28s β 14s β 10s β 6.5s)
- Reader also benefits from parallelism but is I/O-bound, so gains taper off after 4β5 threads
- Writer time stays nearly constant (~10β11s) across all cases β it is the irreducible sequential bottleneck, writing 500M integers to disk
- Sorter threads can exceed reader threads β since there are only 5 input files, Cases 4 and 5 (
-t 6,-t 12) cap reader threads at 5 but still spin up the full T sorter threads, maximizing CPU utilization during the sort phase
The reader was rewritten from a fscanf-based approach to a high-throughput buffered streaming parser.
// β Old β one function call per integer
while (fscanf(fp, "%d", &num) == 1) { ... }
// β
New β bulk block reads into a 4MB buffer
while ((bytes = fread(buffer, 1, BUF_SIZE, file)) > 0) { ... }Block reads reduce system call overhead by orders of magnitude and maximize disk throughput.
// β
Simple, branch-predictable, CPU-efficient
num = num * 10 + (c - '0');Eliminates all the overhead inside fscanf: format parsing, type dispatch, and internal locking.
A lightweight state machine handles number boundaries correctly across buffer splits:
if (c >= '0' && c <= '9') {
num = num * 10 + (c - '0');
reading = 1;
} else if (reading) {
array[count++] = num;
num = 0; reading = 0;
}| Area | Old | New |
|---|---|---|
| Initial array capacity | 10,000 | 10,000,000 β fewer realloc calls |
| EOF handling | Implicit | Explicit flush of last number |
| State reset between files | None | num = 0; reading = 0; |
| Error handling | Basic | Frees memory and closes file on failure |
The writer was transformed from a CPU-bound, syscall-heavy loop into a cache-friendly streaming pipeline.
// β Old β O(N Γ T) comparisons
for (int i = 0; i < threads_count; i++) {
if (current < min_value) { ... }
}
// β
New β O(N log T) with a min-heap
HeapNode min = heap_pop(&heap);
heap_push(&heap, next_value);For 500M integers and 6 threads, this eliminates billions of redundant comparisons.
// β Old β format parsing + internal locking on every integer
fprintf(fp, "%d\n", value);
// β
New β inline integer-to-ASCII, zero overhead
pos += fast_itoa(min.value, buffer + pos);fast_itoa is a fully inlined custom converter with no format parsing, no type handling, no stdio overhead.
// Accumulate in user-space buffer, flush only when full
if (pos + 16 >= BUFFER_SIZE) {
fwrite(buffer, 1, pos, pfile);
pos = 0;
}
// Plus stdio-level buffer for double buffering
setvbuf(pfile, NULL, _IOFBF, BUFFER_SIZE);Reduces write syscalls from millions to a handful.
| Aspect | Old | New |
|---|---|---|
| Merge algorithm | O(N Γ T) linear scan | O(N log T) min-heap |
| Integer serialization | fprintf |
fast_itoa + fwrite |
| Syscall count | Millions | Handful |
| Cache behavior | Poor (pointer chasing) | Sequential, prefetch-friendly |
| Scalability | Degrades with T | Scales with log(T) |
This project requires a Linux system with the POSIX threads library (pthreads).
Clone the repository
git clone https://github.com/hitalloazevedo/mergesort
cd mergesort/v2Compile
makeRun
./build/main -t <T> <F1> <F2> ... <FN> -o <output_file>| Argument | Description |
|---|---|
-t <T> |
Number of threads (used for both reader and sorter phases) |
<Fn> |
Input file paths (must be inside /data) |
-o <OF> |
Output file path |
Example β run predefined test cases
./case1.sh # -t 1 (1 thread)
./case2.sh # -t 2 (2 threads)
./case3.sh # -t 4 (4 threads)
./case4.sh # -t 6 (6 threads)
./case5.sh # -t 12 (12 threads)Input files must be located inside
/data. Extensions.dator.txtare recommended.
| Property | Value |
|---|---|
| Input files | 5 files |
| Integers per file | 100,000,000 |
| Total integers | 500,000,000 |
| Integer range | 0 β 5,000,000 (random order) |
| Sort algorithm | Mergesort |
| Output | Single ascending-sorted file |
Hitallo Alves Teles Azevedo
This project is licensed under the MIT License.