🧵 Parallel Integer Sorter — Multithreaded Read → Sort → Write Pipeline

A high-performance C program that reads hundreds of millions of integers from multiple files, sorts them in parallel using POSIX threads and Mergesort, and writes the result to a single sorted output file — all with a pipeline optimized at every stage.

500,000,000 integers sorted in under 20 seconds with 12 sorter threads.

📋 Table of Contents

🔍 Overview

This program solves a classic systems programming challenge at scale:

Read — distribute input files across reader threads, each parsing integers from disk as fast as possible
Sort — merge all data into a single array, split it evenly across sorter threads, and sort in parallel using Mergesort
Write — merge all sorted subarrays into a single output file using a min-heap for O(N log T) k-way merge

This is v2 of the project — a full refactor from the first implementation, featuring better code organization, reduced complexity, and major performance optimizations throughout the entire pipeline.

🏗️ Architecture

┌─────────────────────────────────────────────┐
│              Input Files (N files)           │
└────────────────┬────────────────────────────┘
                 │  distributed equally
        ┌────────▼────────┐
        │  Reader Threads  │  (T threads, each reads a file subset)
        │  fread + manual  │  ← buffered I/O, 4MB block reads
        │  integer parser  │
        └────────┬────────┘
                 │  merge into single array
        ┌────────▼────────┐
        │  Sorter Threads  │  (T threads, each sorts N/T integers)
        │    Mergesort     │  ← in-place parallel sort
        └────────┬────────┘
                 │  sorted subarrays
        ┌────────▼────────┐
        │  Writer (1 thd)  │  ← min-heap k-way merge
        │  fast_itoa +     │  ← buffered fwrite, no fprintf
        │  fwrite buffer   │
        └────────┬────────┘
                 │
        ┌────────▼────────┐
        │   Output File    │
        └─────────────────┘

📊 Performance Results

All tests run on 500,000,000 integers spread across 5 files (100M integers each).

Case	Threads (`-t`)	Reader Time	Sorter Time	Writer Time	Total Time
1	1	3.816s	56.262s	9.840s	70.21s
2	2	2.916s	28.247s	10.023s	41.44s
3	4	2.516s	14.471s	10.627s	27.87s
4	6	1.575s	10.168s	10.418s	22.43s
5	12	1.066s	6.505s	11.489s	19.35s

Speedup vs. Single Thread

Case 1 (1 thread):   ████████████████████████████████████████  70.21s  (1.0×)
Case 2 (2 threads):  ████████████████████████  41.44s  (1.7×)
Case 3 (4 threads):  ████████████████  27.87s  (2.5×)
Case 4 (6 threads):  █████████████  22.43s  (3.1×)
Case 5 (12 threads): ███████████  19.35s  (3.6×)

Key Observations

Sorter scales nearly linearly with threads — doubling threads roughly halves sort time (56s → 28s → 14s → 10s → 6.5s)
Reader also benefits from parallelism but is I/O-bound, so gains taper off after 4–5 threads
Writer time stays nearly constant (~10–11s) across all cases — it is the irreducible sequential bottleneck, writing 500M integers to disk
Sorter threads can exceed reader threads — since there are only 5 input files, Cases 4 and 5 (-t 6, -t 12) cap reader threads at 5 but still spin up the full T sorter threads, maximizing CPU utilization during the sort phase

⚡ Optimizations

📖 Reader Optimizations

The reader was rewritten from a fscanf-based approach to a high-throughput buffered streaming parser.

I/O Strategy: `fscanf` → `fread`

// ❌ Old — one function call per integer
while (fscanf(fp, "%d", &num) == 1) { ... }

// ✅ New — bulk block reads into a 4MB buffer
while ((bytes = fread(buffer, 1, BUF_SIZE, file)) > 0) { ... }

Block reads reduce system call overhead by orders of magnitude and maximize disk throughput.

Manual Integer Parsing

// ✅ Simple, branch-predictable, CPU-efficient
num = num * 10 + (c - '0');

Eliminates all the overhead inside fscanf: format parsing, type dispatch, and internal locking.

Stateful Streaming Parser

A lightweight state machine handles number boundaries correctly across buffer splits:

if (c >= '0' && c <= '9') {
    num = num * 10 + (c - '0');
    reading = 1;
} else if (reading) {
    array[count++] = num;
    num = 0; reading = 0;
}

Other Improvements

Area	Old	New
Initial array capacity	10,000	10,000,000 — fewer `realloc` calls
EOF handling	Implicit	Explicit flush of last number
State reset between files	None	`num = 0; reading = 0;`
Error handling	Basic	Frees memory and closes file on failure

✍️ Writer Optimizations

The writer was transformed from a CPU-bound, syscall-heavy loop into a cache-friendly streaming pipeline.

Algorithmic Improvement: Linear Scan → Min-Heap

// ❌ Old — O(N × T) comparisons
for (int i = 0; i < threads_count; i++) {
    if (current < min_value) { ... }
}

// ✅ New — O(N log T) with a min-heap
HeapNode min = heap_pop(&heap);
heap_push(&heap, next_value);

For 500M integers and 6 threads, this eliminates billions of redundant comparisons.

Eliminating `fprintf`

// ❌ Old — format parsing + internal locking on every integer
fprintf(fp, "%d\n", value);

// ✅ New — inline integer-to-ASCII, zero overhead
pos += fast_itoa(min.value, buffer + pos);

fast_itoa is a fully inlined custom converter with no format parsing, no type handling, no stdio overhead.

Buffered Output

// Accumulate in user-space buffer, flush only when full
if (pos + 16 >= BUFFER_SIZE) {
    fwrite(buffer, 1, pos, pfile);
    pos = 0;
}
// Plus stdio-level buffer for double buffering
setvbuf(pfile, NULL, _IOFBF, BUFFER_SIZE);

Reduces write syscalls from millions to a handful.

Summary Table

Aspect	Old	New
Merge algorithm	O(N × T) linear scan	O(N log T) min-heap
Integer serialization	`fprintf`	`fast_itoa` + `fwrite`
Syscall count	Millions	Handful
Cache behavior	Poor (pointer chasing)	Sequential, prefetch-friendly
Scalability	Degrades with T	Scales with log(T)

🚀 How to Run

This project requires a Linux system with the POSIX threads library (pthreads).

Clone the repository

git clone https://github.com/hitalloazevedo/mergesort
cd mergesort/v2

Compile

make

Run

./build/main -t <T> <F1> <F2> ... <FN> -o <output_file>

Argument	Description
`-t <T>`	Number of threads (used for both reader and sorter phases)
`<Fn>`	Input file paths (must be inside `/data`)
`-o <OF>`	Output file path

Example — run predefined test cases

./case1.sh   # -t 1  (1 thread)
./case2.sh   # -t 2  (2 threads)
./case3.sh   # -t 4  (4 threads)
./case4.sh   # -t 6  (6 threads)
./case5.sh   # -t 12 (12 threads)

Input files must be located inside /data. Extensions .dat or .txt are recommended.

🧪 Test Configuration

Property	Value
Input files	5 files
Integers per file	100,000,000
Total integers	500,000,000
Integer range	0 – 5,000,000 (random order)
Sort algorithm	Mergesort
Output	Single ascending-sorted file

👤 Author

Hitallo Alves Teles Azevedo

📝 License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
v1		v1
v2		v2
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🧵 Parallel Integer Sorter — Multithreaded Read → Sort → Write Pipeline

📋 Table of Contents

🔍 Overview

🏗️ Architecture

📊 Performance Results

Speedup vs. Single Thread

Key Observations

⚡ Optimizations

📖 Reader Optimizations

I/O Strategy: `fscanf` → `fread`

Manual Integer Parsing

Stateful Streaming Parser

Other Improvements

✍️ Writer Optimizations

Algorithmic Improvement: Linear Scan → Min-Heap

Eliminating `fprintf`

Buffered Output

Summary Table

🚀 How to Run

🧪 Test Configuration

👤 Author

📝 License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🧵 Parallel Integer Sorter — Multithreaded Read → Sort → Write Pipeline

📋 Table of Contents

🔍 Overview

🏗️ Architecture

📊 Performance Results

Speedup vs. Single Thread

Key Observations

⚡ Optimizations

📖 Reader Optimizations

I/O Strategy: fscanf → fread

Manual Integer Parsing

Stateful Streaming Parser

Other Improvements

✍️ Writer Optimizations

Algorithmic Improvement: Linear Scan → Min-Heap

Eliminating fprintf

Buffered Output

Summary Table

🚀 How to Run

🧪 Test Configuration

👤 Author

📝 License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

I/O Strategy: `fscanf` → `fread`

Eliminating `fprintf`

Packages