This project focuses on the performance analysis of heterogeneous server clusters and explores the effectiveness of different load balancing strategies. In modern cloud infrastructure, servers often have varying processing capabilities. Standard routing algorithms that ignore this heterogeneity can lead to severe bottlenecks.
Using discrete event simulation (via the Python Ciw library), this project models a network of servers and compares a standard "Shortest Queue" algorithm against a custom "Heterogeneity-Aware" routing policy. The goal is to find the optimal strategy that minimizes average customer wait time while maximizing resource utilization.
The simulated queuing network consists of 12 dynamically configurable nodes defined in config.yml:

- Node 1 (Dispatcher): The entry point of the system. It performs no processing (service time = 0) and is strictly responsible for routing incoming requests to the worker nodes. Arrivals follow an exponential distribution ($\lambda=1.0$ jobs/sec).
- Nodes 2-12 (Heterogeneous Workers): 11 worker nodes divided into three speed tiers, with service times following an exponential distribution:
  - Slow Workers (Nodes 2-6): average service time of 12 sec.
  - Medium Workers (Nodes 7-9): average service time of 8 sec.
  - Fast Workers (Nodes 10-12): average service time of 4 sec.
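The tier layout above can be captured as a small Python structure (a sketch; the actual config.yml schema and the names `TIERS`/`total_capacity` are assumptions), which also lets us sanity-check stability: the aggregate service capacity of all workers comfortably exceeds the arrival rate of 1.0 jobs/sec.

```python
# Sketch of the worker tiers from config.yml (structure and names assumed).
# Service times are exponential, so each worker's rate = 1 / mean service time.
TIERS = {
    "slow":   {"nodes": list(range(2, 7)),   "mean_service": 12.0},  # Nodes 2-6
    "medium": {"nodes": list(range(7, 10)),  "mean_service": 8.0},   # Nodes 7-9
    "fast":   {"nodes": list(range(10, 13)), "mean_service": 4.0},   # Nodes 10-12
}
ARRIVAL_RATE = 1.0  # lambda, jobs/sec at the dispatcher

def total_capacity(tiers):
    """Aggregate service capacity in jobs/sec across all workers."""
    return sum(len(t["nodes"]) / t["mean_service"] for t in tiers.values())

capacity = total_capacity(TIERS)  # 5/12 + 3/8 + 3/4 ~= 1.54 jobs/sec
print(f"capacity = {capacity:.3f} jobs/sec, load bound = {ARRIVAL_RATE / capacity:.3f}")
```

Since the offered load (about 0.65 of total capacity) is below 1, the network is stable regardless of routing policy; routing only determines how the waiting is distributed.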
**Algorithm 1 (Shortest Queue):** A baseline policy that routes each task to the worker with the absolute shortest queue (including the job currently in service). It completely ignores the processing speed of the nodes. In case of a tie, the node with the lowest ID is selected.
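As a minimal sketch, the baseline selection rule can be written as a single function (the `queue_lengths` mapping of node ID to jobs-in-system is an assumed interface, not the project's actual API):

```python
def shortest_queue(queue_lengths):
    """Baseline router: pick the worker with the fewest jobs in system
    (queued + currently in service), ignoring node speed.

    `queue_lengths` maps worker node ID -> number of jobs at that node.
    Iterating over the sorted IDs means min() keeps the first (lowest-ID)
    node among any tie, matching the tie-breaking rule described above.
    """
    return min(sorted(queue_lengths), key=lambda node: queue_lengths[node])
```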
**Algorithm 2 (Heterogeneity-Aware):** A smarter approach that attempts to exploit the faster servers. It sorts workers by their current load. If the least-loaded worker is "Fast", it is selected immediately. If the least-loaded worker is "Slow" or "Medium", the algorithm searches for a "Fast" node whose load is within a configurable tolerance of the minimum. If one is found, the fast node takes the job, prioritizing speed over a slightly shorter queue; otherwise, the job falls back to the least-loaded node.
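The decision logic above can be sketched as follows. This is an illustrative reconstruction, not the project's actual implementation: the function name, the `fast_nodes` set, and measuring the tolerance in whole jobs are all assumptions.

```python
def heterogeneity_aware(queue_lengths, fast_nodes, tolerance):
    """Sketch of the Heterogeneity-Aware policy described above.

    `queue_lengths` maps worker node ID -> jobs in system; `fast_nodes`
    is the set of Fast-tier node IDs; `tolerance` is the extra load (in
    jobs) we accept at a Fast node in exchange for its higher speed.
    """
    # Least-loaded worker overall (lowest ID wins ties).
    best = min(sorted(queue_lengths), key=lambda node: queue_lengths[node])
    if best in fast_nodes:
        return best  # least-loaded worker is already Fast: take it immediately
    # Otherwise look for a Fast node within `tolerance` jobs of the minimum.
    candidates = [n for n in sorted(fast_nodes)
                  if queue_lengths[n] - queue_lengths[best] <= tolerance]
    # Prefer the Fast candidate; fall back to the least-loaded node.
    return candidates[0] if candidates else best
```

With `tolerance = 0` this degenerates to plain Shortest Queue (with a Fast-node preference only on exact ties), while large tolerances route almost everything to the Fast tier.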
To ensure statistical validity, the simulation is fully automated via run_experiments.py:
- Duration: 24-hour simulation time with an additional 1-hour warm-up period to reach a steady state.
- Monte Carlo: 20 independent runs per experiment.
- Metrics: 95% confidence intervals ($\alpha=0.05$) are calculated for average waiting time and resource utilization.
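A confidence interval over the 20 independent runs can be computed as sketched below. This uses a normal approximation via the standard library; whether run_experiments.py uses a normal or a Student-t critical value is an assumption, and the function name is illustrative.

```python
from statistics import NormalDist, mean, stdev

def confidence_interval(samples, alpha=0.05):
    """Return (mean, half-width) of a (1 - alpha) confidence interval
    over independent run results, using a normal approximation.
    For alpha = 0.05 the critical value is about 1.96."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z * stdev(samples) / len(samples) ** 0.5
    return mean(samples), half_width

# e.g. average waits from the 20 Monte Carlo runs -> reported as mean ± half-width
```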
A parameter sweep was used to tune the tolerance threshold of the Heterogeneity-Aware algorithm. Comparing the baseline against the tuned algorithm:
| Metric | Algorithm 1 (Shortest Queue) | Algorithm 2 (Heterogeneity-Aware) |
|---|---|---|
| Average Wait Time | 0.78 ± 0.008 sec | 0.257 ± 0.0005 sec |
| Average Utilization | 75.07% ± 0.08% | 62.82% ± 0.14% |
Conclusion: Algorithm 2 reduces average wait time by ~67% (0.78 sec → 0.257 sec). While Algorithm 1 shows higher resource utilization (75%), that is a negative indicator here: tasks get trapped at "Slow" nodes (12 sec/job) simply because their queues were temporarily short. Algorithm 2 clears the workload much faster by routing heavily to the 4-second nodes, so its lower overall utilization is actually the desirable outcome.
- Install dependencies: `pip install ciw pyyaml matplotlib numpy`
- Run the automated experiment script: `python run_experiments.py`
- The script automatically generates the statistical summary (`simulation_results.txt`) and all relevant plots in the root directory.


