- OS: Linux Mint
- GPU: NVIDIA RTX 3090 (24GB)
- RAM: 16GB
Modeling and simulation of queueing models for Large Language Model (LLM) inference systems, including M/M/1, M/M/k, M/G/1, G/M/1, and G/G/1.

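This project uses standard queueing models described by Kendall's notation A/B/C, where A is the arrival process, B is the service-time distribution, and C is the number of servers. The symbols used here are:
- M (Markovian): Poisson arrivals (exponential inter-arrival times) or exponential service times. Both are memoryless.
- G (General): An arbitrary probability distribution.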
- 1 or k: The number of servers available.
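
The A/B/C notation above can be read mechanically. The following is a minimal sketch, not code from this project: `QueueSpec` and `parse_kendall` are hypothetical names, and only the symbols M, G, a positive integer, and `k` are accepted because those are the ones listed above.

```python
from dataclasses import dataclass

# Hypothetical helper names; only the symbols M, G, 1 and k come from the text above.
_DISTRIBUTIONS = {"M": "Markovian", "G": "General"}


@dataclass(frozen=True)
class QueueSpec:
    arrivals: str  # arrival process (first symbol)
    service: str   # service-time distribution (second symbol)
    servers: str   # number of servers: a positive integer or the symbol "k"


def parse_kendall(notation: str) -> QueueSpec:
    """Split an A/B/C string into its three parts and validate each symbol."""
    parts = notation.strip().split("/")
    if len(parts) != 3:
        raise ValueError(f"expected A/B/C, got {notation!r}")
    arrivals, service, servers = parts
    for symbol in (arrivals, service):
        if symbol not in _DISTRIBUTIONS:
            raise ValueError(f"unknown distribution symbol {symbol!r}")
    is_count = servers.isdecimal() and int(servers) >= 1
    if not (is_count or servers == "k"):
        raise ValueError(f"invalid server count {servers!r}")
    return QueueSpec(_DISTRIBUTIONS[arrivals], _DISTRIBUTIONS[service], servers)
```

For example, `parse_kendall("M/G/1")` yields Markovian arrivals, a General service distribution, and one server.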
Supported Models:
- M/M/1
  - Arrivals: Poisson (Random).
  - Service: Exponential (Random).
  - Servers: 1.
  - Description: The classic "Hello World" of queueing theory. Simple random arrivals and service times with a single processor.
- M/M/k
  - Arrivals: Poisson (Random).
  - Service: Exponential (Random).
  - Servers: $k$ (Multiple).
  - Description: A multi-server version of M/M/1. Think of a bank with a single line feeding into $k$ open teller windows.
- M/G/1
  - Arrivals: Poisson (Random).
  - Service: General (Any distribution).
  - Servers: 1.
  - Description: Arrivals are Poisson, but the service time may follow any distribution, not necessarily exponential (e.g., a fixed, deterministic time or a heavy-tailed distribution). Often analyzed using the Pollaczek–Khinchine formula.
- G/M/1
  - Arrivals: General (Any distribution).
  - Service: Exponential (Random).
  - Servers: 1.
  - Description: The mirror image of M/G/1. Service times are exponential, but inter-arrival times follow an arbitrary distribution (e.g., bursty traffic).
- G/G/1
  - Arrivals: General.
  - Service: General.
  - Servers: 1.
  - Description: The most general single-server model. Both inter-arrival and service times can follow any distribution. No exact closed-form solution exists in general, so it is typically handled via approximation or simulation.
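
As a rough illustration of how the single-server models above relate to each other, the sketch below simulates a generic first-come, first-served single-server queue with the Lindley recursion, $W_{n+1} = \max(0, W_n + S_n - A_{n+1})$, and compares it with the closed-form mean waiting times for M/M/1 and M/G/1 (Pollaczek–Khinchine). This is an assumption about one reasonable approach, not necessarily how this project implements its simulators. All function names are hypothetical, and the rates used in the usage note are arbitrary.

```python
import random


def simulate_single_server(sample_interarrival, sample_service, n_jobs, seed=0):
    """Estimate the mean time in queue (excluding service) for a FCFS G/G/1 queue.

    Both samplers take a random.Random instance and return a positive float.
    """
    rng = random.Random(seed)
    wait = 0.0   # waiting time of the current job; the first job never waits
    total = 0.0
    for _ in range(n_jobs):
        total += wait
        service = sample_service(rng)
        gap = sample_interarrival(rng)  # time until the next job arrives
        # Lindley recursion: leftover work seen by the next arrival.
        wait = max(0.0, wait + service - gap)
    return total / n_jobs


def mm1_mean_wait(lam, mu):
    """Mean time in queue for M/M/1: rho / (mu - lam), valid for rho < 1."""
    rho = lam / mu
    if rho >= 1.0:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return rho / (mu - lam)


def mg1_mean_wait(lam, mean_service, second_moment_service):
    """Pollaczek-Khinchine mean time in queue: lam * E[S^2] / (2 * (1 - rho))."""
    rho = lam * mean_service
    if rho >= 1.0:
        raise ValueError("unstable queue: utilization must be below 1")
    return lam * second_moment_service / (2.0 * (1.0 - rho))


if __name__ == "__main__":
    lam, mu = 0.5, 1.0  # arbitrary example rates (utilization 0.5)

    # M/M/1: exponential inter-arrival and service times.
    mm1 = simulate_single_server(
        lambda rng: rng.expovariate(lam),
        lambda rng: rng.expovariate(mu),
        n_jobs=200_000,
    )
    print("M/M/1 simulated:", mm1, "analytic:", mm1_mean_wait(lam, mu))

    # M/D/1 (a special case of M/G/1): fixed service time of 1/mu.
    md1 = simulate_single_server(
        lambda rng: rng.expovariate(lam),
        lambda rng: 1.0 / mu,
        n_jobs=200_000,
    )
    print("M/D/1 simulated:", md1, "analytic:", mg1_mean_wait(lam, 1.0 / mu, 1.0 / mu**2))
```

With these rates the analytic mean waits are 1.0 for M/M/1 and 0.5 for M/D/1, so the simulated values should land near those figures. Swapping in a non-exponential inter-arrival sampler gives a G/M/1 or G/G/1 run, which has no matching formula in this sketch.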