Skip to content

hitalloazevedo/parallel-sorting

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

51 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

🧡 Parallel Integer Sorter β€” Multithreaded Read β†’ Sort β†’ Write Pipeline

C POSIX Threads Platform License

A high-performance C program that reads hundreds of millions of integers from multiple files, sorts them in parallel using POSIX threads and Mergesort, and writes the result to a single sorted output file β€” all with a pipeline optimized at every stage.

500,000,000 integers sorted in under 20 seconds with 12 sorter threads.


πŸ“‹ Table of Contents


πŸ” Overview

This program solves a classic systems programming challenge at scale:

  1. Read β€” distribute input files across reader threads, each parsing integers from disk as fast as possible
  2. Sort β€” merge all data into a single array, split it evenly across sorter threads, and sort in parallel using Mergesort
  3. Write β€” merge all sorted subarrays into a single output file using a min-heap for O(N log T) k-way merge

This is v2 of the project β€” a full refactor from the first implementation, featuring better code organization, reduced complexity, and major performance optimizations throughout the entire pipeline.


πŸ—οΈ Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚              Input Files (N files)           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 β”‚  distributed equally
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚  Reader Threads  β”‚  (T threads, each reads a file subset)
        β”‚  fread + manual  β”‚  ← buffered I/O, 4MB block reads
        β”‚  integer parser  β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 β”‚  merge into single array
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚  Sorter Threads  β”‚  (T threads, each sorts N/T integers)
        β”‚    Mergesort     β”‚  ← in-place parallel sort
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 β”‚  sorted subarrays
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚  Writer (1 thd)  β”‚  ← min-heap k-way merge
        β”‚  fast_itoa +     β”‚  ← buffered fwrite, no fprintf
        β”‚  fwrite buffer   β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                 β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”
        β”‚   Output File    β”‚
        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

πŸ“Š Performance Results

All tests run on 500,000,000 integers spread across 5 files (100M integers each).

Case Threads (-t) Reader Time Sorter Time Writer Time Total Time
1 1 3.816s 56.262s 9.840s 70.21s
2 2 2.916s 28.247s 10.023s 41.44s
3 4 2.516s 14.471s 10.627s 27.87s
4 6 1.575s 10.168s 10.418s 22.43s
5 12 1.066s 6.505s 11.489s 19.35s

Speedup vs. Single Thread

Case 1 (1 thread):   β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ  70.21s  (1.0Γ—)
Case 2 (2 threads):  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ  41.44s  (1.7Γ—)
Case 3 (4 threads):  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ  27.87s  (2.5Γ—)
Case 4 (6 threads):  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ  22.43s  (3.1Γ—)
Case 5 (12 threads): β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ  19.35s  (3.6Γ—)

Key Observations

  • Sorter scales nearly linearly with threads β€” doubling threads roughly halves sort time (56s β†’ 28s β†’ 14s β†’ 10s β†’ 6.5s)
  • Reader also benefits from parallelism but is I/O-bound, so gains taper off after 4–5 threads
  • Writer time stays nearly constant (~10–11s) across all cases β€” it is the irreducible sequential bottleneck, writing 500M integers to disk
  • Sorter threads can exceed reader threads β€” since there are only 5 input files, Cases 4 and 5 (-t 6, -t 12) cap reader threads at 5 but still spin up the full T sorter threads, maximizing CPU utilization during the sort phase

⚑ Optimizations

πŸ“– Reader Optimizations

The reader was rewritten from a fscanf-based approach to a high-throughput buffered streaming parser.

I/O Strategy: fscanf β†’ fread

// ❌ Old β€” one function call per integer
while (fscanf(fp, "%d", &num) == 1) { ... }

// βœ… New β€” bulk block reads into a 4MB buffer
while ((bytes = fread(buffer, 1, BUF_SIZE, file)) > 0) { ... }

Block reads reduce system call overhead by orders of magnitude and maximize disk throughput.

Manual Integer Parsing

// βœ… Simple, branch-predictable, CPU-efficient
num = num * 10 + (c - '0');

Eliminates all the overhead inside fscanf: format parsing, type dispatch, and internal locking.

Stateful Streaming Parser

A lightweight state machine handles number boundaries correctly across buffer splits:

if (c >= '0' && c <= '9') {
    num = num * 10 + (c - '0');
    reading = 1;
} else if (reading) {
    array[count++] = num;
    num = 0; reading = 0;
}

Other Improvements

Area Old New
Initial array capacity 10,000 10,000,000 β€” fewer realloc calls
EOF handling Implicit Explicit flush of last number
State reset between files None num = 0; reading = 0;
Error handling Basic Frees memory and closes file on failure

✍️ Writer Optimizations

The writer was transformed from a CPU-bound, syscall-heavy loop into a cache-friendly streaming pipeline.

Algorithmic Improvement: Linear Scan β†’ Min-Heap

// ❌ Old β€” O(N Γ— T) comparisons
for (int i = 0; i < threads_count; i++) {
    if (current < min_value) { ... }
}

// βœ… New β€” O(N log T) with a min-heap
HeapNode min = heap_pop(&heap);
heap_push(&heap, next_value);

For 500M integers and 6 threads, this eliminates billions of redundant comparisons.

Eliminating fprintf

// ❌ Old β€” format parsing + internal locking on every integer
fprintf(fp, "%d\n", value);

// βœ… New β€” inline integer-to-ASCII, zero overhead
pos += fast_itoa(min.value, buffer + pos);

fast_itoa is a fully inlined custom converter with no format parsing, no type handling, no stdio overhead.

Buffered Output

// Accumulate in user-space buffer, flush only when full
if (pos + 16 >= BUFFER_SIZE) {
    fwrite(buffer, 1, pos, pfile);
    pos = 0;
}
// Plus stdio-level buffer for double buffering
setvbuf(pfile, NULL, _IOFBF, BUFFER_SIZE);

Reduces write syscalls from millions to a handful.

Summary Table

Aspect Old New
Merge algorithm O(N Γ— T) linear scan O(N log T) min-heap
Integer serialization fprintf fast_itoa + fwrite
Syscall count Millions Handful
Cache behavior Poor (pointer chasing) Sequential, prefetch-friendly
Scalability Degrades with T Scales with log(T)

πŸš€ How to Run

This project requires a Linux system with the POSIX threads library (pthreads).

Clone the repository

git clone https://github.com/hitalloazevedo/mergesort
cd mergesort/v2

Compile

make

Run

./build/main -t <T> <F1> <F2> ... <FN> -o <output_file>
Argument Description
-t <T> Number of threads (used for both reader and sorter phases)
<Fn> Input file paths (must be inside /data)
-o <OF> Output file path

Example β€” run predefined test cases

./case1.sh   # -t 1  (1 thread)
./case2.sh   # -t 2  (2 threads)
./case3.sh   # -t 4  (4 threads)
./case4.sh   # -t 6  (6 threads)
./case5.sh   # -t 12 (12 threads)

Input files must be located inside /data. Extensions .dat or .txt are recommended.


πŸ§ͺ Test Configuration

Property Value
Input files 5 files
Integers per file 100,000,000
Total integers 500,000,000
Integer range 0 – 5,000,000 (random order)
Sort algorithm Mergesort
Output Single ascending-sorted file

πŸ‘€ Author

Hitallo Alves Teles Azevedo


πŸ“ License

This project is licensed under the MIT License.

About

Multithreaded C program using POSIX threads to read, sort, and merge hundreds of millions of integers across multiple files - featuring buffered fread parsing, parallel Mergesort, and O(N log T) min-heap output.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors