This project focuses on the performance analysis of heterogeneous server clusters and explores the effectiveness of different load balancing strategies. In modern cloud infrastructure, servers often have varying processing capabilities. Standard routing algorithms that ignore this heterogeneity can lead to severe bottlenecks.
Using discrete event simulation (via the Python Ciw library), this project models a network of servers and compares a standard "Shortest Queue" algorithm against a custom "Heterogeneity-Aware" routing policy. The goal is to find the optimal strategy that minimizes average customer wait time while maximizing resource utilization.
The simulated queuing network consists of 12 dynamically configurable nodes defined in config.yml:

- Node 1 (Dispatcher): The entry point of the system. It performs no processing (service time = 0) and is strictly responsible for routing incoming requests to the worker nodes. Arrivals follow an exponential distribution ($\lambda=1.0$ jobs/sec).
- Nodes 2-12 (Heterogeneous Workers): 11 worker nodes divided into three speed tiers, with service times following an exponential distribution:
  - Slow Workers (Nodes 2-6): average service time of 12 sec.
  - Medium Workers (Nodes 7-9): average service time of 8 sec.
  - Fast Workers (Nodes 10-12): average service time of 4 sec.
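The tier layout above can be captured as a small Python structure (a sketch; the actual config.yml schema and the names `TIERS`/`total_capacity` are assumptions), which also lets us sanity-check stability: the aggregate service capacity of all workers comfortably exceeds the arrival rate of 1.0 jobs/sec.

```python
# Sketch of the worker tiers from config.yml (structure and names assumed).
# Service times are exponential, so each worker's rate = 1 / mean service time.
TIERS = {
    "slow":   {"nodes": list(range(2, 7)),   "mean_service": 12.0},  # Nodes 2-6
    "medium": {"nodes": list(range(7, 10)),  "mean_service": 8.0},   # Nodes 7-9
    "fast":   {"nodes": list(range(10, 13)), "mean_service": 4.0},   # Nodes 10-12
}
ARRIVAL_RATE = 1.0  # lambda, jobs/sec at the dispatcher

def total_capacity(tiers):
    """Aggregate service capacity in jobs/sec across all workers."""
    return sum(len(t["nodes"]) / t["mean_service"] for t in tiers.values())

capacity = total_capacity(TIERS)  # 5/12 + 3/8 + 3/4 ~= 1.54 jobs/sec
print(f"capacity = {capacity:.3f} jobs/sec, load bound = {ARRIVAL_RATE / capacity:.3f}")
```

Since the offered load (about 0.65 of total capacity) is below 1, the network is stable regardless of routing policy; routing only determines how the waiting is distributed.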
**Algorithm 1 (Shortest Queue):** A baseline policy that routes each task to the worker with the absolute shortest queue (including the job currently in service). It completely ignores the processing speed of the nodes. In case of a tie, the node with the lowest ID is selected.
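As a minimal sketch, the baseline selection rule can be written as a single function (the `queue_lengths` mapping of node ID to jobs-in-system is an assumed interface, not the project's actual API):

```python
def shortest_queue(queue_lengths):
    """Baseline router: pick the worker with the fewest jobs in system
    (queued + currently in service), ignoring node speed.

    `queue_lengths` maps worker node ID -> number of jobs at that node.
    Iterating over the sorted IDs means min() keeps the first (lowest-ID)
    node among any tie, matching the tie-breaking rule described above.
    """
    return min(sorted(queue_lengths), key=lambda node: queue_lengths[node])
```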
**Algorithm 2 (Heterogeneity-Aware):** A smarter approach that attempts to exploit the faster servers. It sorts workers by their current load. If the least-loaded worker is "Fast", it is selected immediately. If the least-loaded worker is "Slow" or "Medium", the algorithm searches for a "Fast" node whose load is within a configurable tolerance of the minimum. If one is found, the fast node takes the job, prioritizing speed over a slightly shorter queue; otherwise, the job falls back to the least-loaded node.
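The decision logic above can be sketched as follows. This is an illustrative reconstruction, not the project's actual implementation: the function name, the `fast_nodes` set, and measuring the tolerance in whole jobs are all assumptions.

```python
def heterogeneity_aware(queue_lengths, fast_nodes, tolerance):
    """Sketch of the Heterogeneity-Aware policy described above.

    `queue_lengths` maps worker node ID -> jobs in system; `fast_nodes`
    is the set of Fast-tier node IDs; `tolerance` is the extra load (in
    jobs) we accept at a Fast node in exchange for its higher speed.
    """
    # Least-loaded worker overall (lowest ID wins ties).
    best = min(sorted(queue_lengths), key=lambda node: queue_lengths[node])
    if best in fast_nodes:
        return best  # least-loaded worker is already Fast: take it immediately
    # Otherwise look for a Fast node within `tolerance` jobs of the minimum.
    candidates = [n for n in sorted(fast_nodes)
                  if queue_lengths[n] - queue_lengths[best] <= tolerance]
    # Prefer the Fast candidate; fall back to the least-loaded node.
    return candidates[0] if candidates else best
```

With `tolerance = 0` this degenerates to plain Shortest Queue (with a Fast-node preference only on exact ties), while large tolerances route almost everything to the Fast tier.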
To ensure statistical validity, the simulation is fully automated via run_experiments.py:
- Duration: 24-hour simulation time with an additional 1-hour warm-up period to reach a steady state.
- Monte Carlo: 20 independent runs per experiment.
- Metrics: 95% confidence intervals ($\alpha=0.05$) are calculated for average waiting time and resource utilization.
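A confidence interval over the 20 independent runs can be computed as sketched below. This uses a normal approximation via the standard library; whether run_experiments.py uses a normal or a Student-t critical value is an assumption, and the function name is illustrative.

```python
from statistics import NormalDist, mean, stdev

def confidence_interval(samples, alpha=0.05):
    """Return (mean, half-width) of a (1 - alpha) confidence interval
    over independent run results, using a normal approximation.
    For alpha = 0.05 the critical value is about 1.96."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * stdev(samples) / len(samples) ** 0.5
    return mean(samples), half_width

# e.g. average waits from the 20 Monte Carlo runs -> reported as mean ± half-width
```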
A parameter sweep was used to tune the tolerance threshold of the Heterogeneity-Aware algorithm. Comparing the baseline against the tuned algorithm:
| Metric | Algorithm 1 (Shortest Queue) | Algorithm 2 (Heterogeneity-Aware) |
|---|---|---|
| Average Wait Time | 0.78 ± 0.008 sec | 0.257 ± 0.0005 sec |
| Average Utilization | 75.07% ± 0.08% | 62.82% ± 0.14% |
Conclusion: Algorithm 2 reduces average wait time by ~67% (0.78 sec → 0.257 sec). While Algorithm 1 shows higher resource utilization (75%), that is a negative indicator here: tasks get trapped at "Slow" nodes (12 sec/job) simply because their queues were temporarily short. Algorithm 2 clears the workload much faster by routing heavily to the 4-second nodes, so its lower overall utilization is actually the desirable outcome.
- Install dependencies: `pip install ciw pyyaml matplotlib numpy`
- Run the automated experiment script: `python run_experiments.py`
- The script automatically generates the statistical summary (`simulation_results.txt`) and all relevant plots in the root directory.


