Skip to content

hitalloazevedo/parallel-sorting

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

51 Commits
 
 
 
 
 
 
 
 

Repository files navigation

🧵 Parallel Integer Sorter — Multithreaded Read → Sort → Write Pipeline

C POSIX Threads Platform License

A high-performance C program that reads hundreds of millions of integers from multiple files, sorts them in parallel using POSIX threads and Mergesort, and writes the result to a single sorted output file — all with a pipeline optimized at every stage.

500,000,000 integers sorted in under 20 seconds with 12 sorter threads.


📋 Table of Contents


🔍 Overview

This program solves a classic systems programming challenge at scale:

  1. Read — distribute input files across reader threads, each parsing integers from disk as fast as possible
  2. Sort — merge all data into a single array, split it evenly across sorter threads, and sort in parallel using Mergesort
  3. Write — merge all sorted subarrays into a single output file using a min-heap for O(N log T) k-way merge

This is v2 of the project — a full refactor from the first implementation, featuring better code organization, reduced complexity, and major performance optimizations throughout the entire pipeline.


🏗️ Architecture

┌─────────────────────────────────────────────┐
│              Input Files (N files)           │
└────────────────┬────────────────────────────┘
                 │  distributed equally
        ┌────────▼────────┐
        │  Reader Threads  │  (T threads, each reads a file subset)
        │  fread + manual  │  ← buffered I/O, 4MB block reads
        │  integer parser  │
        └────────┬────────┘
                 │  merge into single array
        ┌────────▼────────┐
        │  Sorter Threads  │  (T threads, each sorts N/T integers)
        │    Mergesort     │  ← in-place parallel sort
        └────────┬────────┘
                 │  sorted subarrays
        ┌────────▼────────┐
        │  Writer (1 thd)  │  ← min-heap k-way merge
        │  fast_itoa +     │  ← buffered fwrite, no fprintf
        │  fwrite buffer   │
        └────────┬────────┘
                 │
        ┌────────▼────────┐
        │   Output File    │
        └─────────────────┘

📊 Performance Results

All tests run on 500,000,000 integers spread across 5 files (100M integers each).

Case Threads (-t) Reader Time Sorter Time Writer Time Total Time
1 1 3.816s 56.262s 9.840s 70.21s
2 2 2.916s 28.247s 10.023s 41.44s
3 4 2.516s 14.471s 10.627s 27.87s
4 6 1.575s 10.168s 10.418s 22.43s
5 12 1.066s 6.505s 11.489s 19.35s

Speedup vs. Single Thread

Case 1 (1 thread):   ████████████████████████████████████████  70.21s  (1.0×)
Case 2 (2 threads):  ████████████████████████  41.44s  (1.7×)
Case 3 (4 threads):  ████████████████  27.87s  (2.5×)
Case 4 (6 threads):  █████████████  22.43s  (3.1×)
Case 5 (12 threads): ███████████  19.35s  (3.6×)

Key Observations

  • Sorter scales nearly linearly with threads — doubling threads roughly halves sort time (56s → 28s → 14s → 10s → 6.5s)
  • Reader also benefits from parallelism but is I/O-bound, so gains taper off after 4–5 threads
  • Writer time stays nearly constant (~10–11s) across all cases — it is the irreducible sequential bottleneck, writing 500M integers to disk
  • Sorter threads can exceed reader threads — since there are only 5 input files, Cases 4 and 5 (-t 6, -t 12) cap reader threads at 5 but still spin up the full T sorter threads, maximizing CPU utilization during the sort phase

⚡ Optimizations

📖 Reader Optimizations

The reader was rewritten from a fscanf-based approach to a high-throughput buffered streaming parser.

I/O Strategy: fscanffread

// ❌ Old — one function call per integer
while (fscanf(fp, "%d", &num) == 1) { ... }

// ✅ New — bulk block reads into a 4MB buffer
while ((bytes = fread(buffer, 1, BUF_SIZE, file)) > 0) { ... }

Block reads reduce system call overhead by orders of magnitude and maximize disk throughput.

Manual Integer Parsing

// ✅ Simple, branch-predictable, CPU-efficient
num = num * 10 + (c - '0');

Eliminates all the overhead inside fscanf: format parsing, type dispatch, and internal locking.

Stateful Streaming Parser

A lightweight state machine handles number boundaries correctly across buffer splits:

if (c >= '0' && c <= '9') {
    num = num * 10 + (c - '0');
    reading = 1;
} else if (reading) {
    array[count++] = num;
    num = 0; reading = 0;
}

Other Improvements

Area Old New
Initial array capacity 10,000 10,000,000 — fewer realloc calls
EOF handling Implicit Explicit flush of last number
State reset between files None num = 0; reading = 0;
Error handling Basic Frees memory and closes file on failure

✍️ Writer Optimizations

The writer was transformed from a CPU-bound, syscall-heavy loop into a cache-friendly streaming pipeline.

Algorithmic Improvement: Linear Scan → Min-Heap

// ❌ Old — O(N × T) comparisons
for (int i = 0; i < threads_count; i++) {
    if (current < min_value) { ... }
}

// ✅ New — O(N log T) with a min-heap
HeapNode min = heap_pop(&heap);
heap_push(&heap, next_value);

For 500M integers and 6 threads, this eliminates billions of redundant comparisons.

Eliminating fprintf

// ❌ Old — format parsing + internal locking on every integer
fprintf(fp, "%d\n", value);

// ✅ New — inline integer-to-ASCII, zero overhead
pos += fast_itoa(min.value, buffer + pos);

fast_itoa is a fully inlined custom converter with no format parsing, no type handling, no stdio overhead.

Buffered Output

// Accumulate in user-space buffer, flush only when full
if (pos + 16 >= BUFFER_SIZE) {
    fwrite(buffer, 1, pos, pfile);
    pos = 0;
}
// Plus stdio-level buffer for double buffering
setvbuf(pfile, NULL, _IOFBF, BUFFER_SIZE);

Reduces write syscalls from millions to a handful.

Summary Table

Aspect Old New
Merge algorithm O(N × T) linear scan O(N log T) min-heap
Integer serialization fprintf fast_itoa + fwrite
Syscall count Millions Handful
Cache behavior Poor (pointer chasing) Sequential, prefetch-friendly
Scalability Degrades with T Scales with log(T)

🚀 How to Run

This project requires a Linux system with the POSIX threads library (pthreads).

Clone the repository

git clone https://github.com/hitalloazevedo/mergesort
cd mergesort/v2

Compile

make

Run

./build/main -t <T> <F1> <F2> ... <FN> -o <output_file>
Argument Description
-t <T> Number of threads (used for both reader and sorter phases)
<Fn> Input file paths (must be inside /data)
-o <OF> Output file path

Example — run predefined test cases

./case1.sh   # -t 1  (1 thread)
./case2.sh   # -t 2  (2 threads)
./case3.sh   # -t 4  (4 threads)
./case4.sh   # -t 6  (6 threads)
./case5.sh   # -t 12 (12 threads)

Input files must be located inside /data. Extensions .dat or .txt are recommended.


🧪 Test Configuration

Property Value
Input files 5 files
Integers per file 100,000,000
Total integers 500,000,000
Integer range 0 – 5,000,000 (random order)
Sort algorithm Mergesort
Output Single ascending-sorted file

👤 Author

Hitallo Alves Teles Azevedo


📝 License

This project is licensed under the MIT License.

About

Multithreaded C program using POSIX threads to read, sort, and merge hundreds of millions of integers across multiple files - featuring buffered fread parsing, parallel Mergesort, and O(N log T) min-heap output.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors