What Is Queuing Theory?

Queuing theory is the mathematical study of waiting lines — how arrivals, service, and the number of servers combine to produce waiting and congestion. It applies anywhere demand competes for limited capacity: call centers, checkout lines, hospital beds, network packets, and machines on a production floor. Its purpose is practical: predict waiting times and queue lengths, and size systems so they meet a service-level target without wasteful over-provisioning.

Kendall Notation

Queues are described with Kendall's A/B/c notation. The first letter is the arrival process, the second the service-time distribution, and the number is how many parallel servers share the queue. "M" stands for Markovian (memoryless): Poisson arrivals in the first position, exponential service times in the second. Common cases:

  • M/M/1 — Poisson arrivals, exponential service, one server.
  • M/M/c — Poisson arrivals, exponential service, c servers, one shared queue.
  • G/G/1 — General (arbitrary) arrival and service distributions, one server.

The Core Parameters

  • λ (lambda) — mean arrival rate (e.g., customers per hour).
  • μ (mu) — mean service rate per server (customers per hour one server can handle).
  • ρ (rho) — utilization (traffic intensity). For one server, ρ = λ/μ; for c servers, ρ = λ/(cμ). Stability requires ρ < 1.

Little's Law

The most general and useful result in queuing is Little's Law:

L = λ × W

where L is the average number of items in the system, λ is the arrival rate, and W is the average time an item spends in the system. It also applies to the queue alone: Lq = λ × Wq. Little's Law is astonishingly general — it holds for any stable system regardless of distributions or queue discipline — and it links throughput, inventory, and flow time, which is why it shows up in manufacturing as well as service. You can experiment with it on the Little's Law calculator.
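As a minimal sketch, Little's Law can be rearranged to solve for whichever quantity is unknown. The function names and sample numbers below are illustrative assumptions, not part of any particular calculator.

```python
def little_l(arrival_rate: float, time_in_system: float) -> float:
    """Little's Law: L = lambda * W (average number of items in the system)."""
    return arrival_rate * time_in_system


def little_w(items_in_system: float, arrival_rate: float) -> float:
    """Little's Law rearranged: W = L / lambda (average time in the system)."""
    if arrival_rate <= 0:
        raise ValueError("arrival_rate must be positive")
    return items_in_system / arrival_rate


# Hypothetical numbers: 8 arrivals per hour, each spending 0.5 hour in the system.
print(little_l(8, 0.5))  # 4.0 items in the system on average
print(little_w(4, 8))    # 0.5 hour in the system on average
```

The same two functions work for the queue alone if Lq and Wq are passed in place of L and W.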

The M/M/1 Queue

For a single-server M/M/1 system with ρ = λ/μ < 1, the standard results are:

Metric                      Formula
Utilization                 ρ = λ/μ
Avg number in system, L     ρ / (1 − ρ)
Avg number in queue, Lq     ρ² / (1 − ρ)
Avg time in system, W       1 / (μ − λ)
Avg time in queue, Wq       ρ / (μ − λ)

Example. If λ = 8/hour and μ = 10/hour, then ρ = 0.8, L = 0.8/0.2 = 4 customers in the system, and W = 1/(10−8) = 0.5 hour = 30 minutes. Notice how a modest 80% utilization already produces a 30-minute average time in system, five times the 6-minute mean service time (1/μ).
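The M/M/1 formulas can be bundled into one small function. This is a sketch under the stated M/M/1 assumptions; the names mm1 and MM1Metrics are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class MM1Metrics:
    rho: float  # utilization
    L: float    # average number in system
    Lq: float   # average number in queue
    W: float    # average time in system
    Wq: float   # average time in queue


def mm1(lam: float, mu: float) -> MM1Metrics:
    """Steady-state M/M/1 metrics for arrival rate lam and service rate mu."""
    if lam <= 0 or mu <= 0:
        raise ValueError("lam and mu must be positive")
    rho = lam / mu
    if rho >= 1:
        raise ValueError("unstable queue: need lam < mu")
    return MM1Metrics(
        rho=rho,
        L=rho / (1 - rho),
        Lq=rho**2 / (1 - rho),
        W=1 / (mu - lam),
        Wq=rho / (mu - lam),
    )


m = mm1(lam=8, mu=10)  # rates per hour, as in the example
print(f"rho = {m.rho:.2f}, L = {m.L:.2f}, Lq = {m.Lq:.2f}")
print(f"W = {m.W * 60:.1f} min, Wq = {m.Wq * 60:.1f} min")
```

With the example's rates, the results satisfy Little's Law for both the system (L = λ × W) and the queue (Lq = λ × Wq).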

The M/M/c Queue

With c parallel servers feeding one queue, utilization is ρ = λ/(cμ). The formulas are more involved (they use the Erlang-C probability that an arrival must wait), but the key insight is pooling: one shared queue feeding many servers dramatically outperforms many separate single-server queues at the same total capacity, because a free server is never idle while someone waits in a different line. This is why banks and airports funnel everyone into a single serpentine line. You can model multi-server systems on the queuing calculator.
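To make the pooling effect concrete, here is a sketch of the standard Erlang-C formula for the probability that an arrival must wait, used to compare two separate single-server lines against one pooled two-server queue. The function names and the rates (8 arrivals per hour per line, 10 services per hour per server) are illustrative assumptions.

```python
from math import factorial


def erlang_c(lam: float, mu: float, c: int) -> float:
    """Probability that an arrival must wait in an M/M/c queue."""
    a = lam / mu   # offered load
    rho = a / c    # utilization per server
    if rho >= 1:
        raise ValueError("unstable queue: need lam < c * mu")
    tail = a**c / factorial(c) / (1 - rho)
    head = sum(a**k / factorial(k) for k in range(c))
    return tail / (head + tail)


def mmc_wq(lam: float, mu: float, c: int) -> float:
    """Average time in queue for M/M/c: Wq = P(wait) / (c*mu - lam)."""
    return erlang_c(lam, mu, c) / (c * mu - lam)


# Two separate lines, each with 8 arrivals/hour and one server at 10/hour.
separate = mmc_wq(lam=8, mu=10, c=1)
# One shared line with all 16 arrivals/hour feeding both servers.
pooled = mmc_wq(lam=16, mu=10, c=2)

print(f"separate lines: {separate * 60:.1f} min average wait")
print(f"pooled queue:   {pooled * 60:.1f} min average wait")
```

Both configurations run at ρ = 0.8 with the same total capacity, yet under these assumptions the pooled queue cuts the average wait from about 24 minutes to under 11.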

The Utilization Trap

Look again at the M/M/1 formulas: every waiting metric contains a 1/(1−ρ) term (the time metrics hide it inside μ − λ, since 1/(μ − λ) = (1/μ) × 1/(1 − ρ)). As ρ climbs toward 1, that term explodes. Waiting time does not rise linearly with utilization; it rises hyperbolically, steepening sharply near full load.

Utilization ρ     L = ρ/(1−ρ)
0.50              1.0
0.80              4.0
0.90              9.0
0.95              19.0
0.99              99.0

This is the utilization trap: pushing a system toward 100% utilization in the name of efficiency causes waiting to skyrocket. Systems that must respond quickly deliberately run with slack capacity. There is no way around the math: only lowering utilization (adding servers, speeding up service, or reducing demand) or reducing variability tames the queue.

Kingman's Equation: Variability Matters

M/M models assume exponential inter-arrival and service times, which have a coefficient of variation of exactly 1. Real systems vary more or less than that, and variability is a second lever on waiting. Kingman's equation (the VUT equation) approximates the average queue wait for a general single-server (G/G/1) system as a product of three factors:

Wq ≈ ( ρ / (1 − ρ) ) × ( (Ca² + Cs²) / 2 ) × ts

  • Utilization factor ρ/(1−ρ) — the same explosive term as before.
  • Variability factor (Ca² + Cs²)/2 — the average of the squared coefficients of variation of inter-arrival times (Ca) and service times (Cs).
  • Time factor ts — the mean service time.

The lesson is profound: waiting is driven by utilization and variability together. Even at fixed utilization, cutting variability (more uniform arrivals via appointments, more consistent service via standard work) directly reduces waiting. This is a quantitative argument for the lean obsession with reducing variation.
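The VUT product is simple to compute directly. The sketch below holds utilization fixed at 0.8 with a 6-minute mean service time and varies only the coefficients of variation; the function name and the scenario values are assumptions for illustration.

```python
def kingman_wq(rho: float, ca: float, cs: float, ts: float) -> float:
    """Kingman (VUT) approximation of average queue wait for a G/G/1 system."""
    if not 0 <= rho < 1:
        raise ValueError("rho must satisfy 0 <= rho < 1")
    utilization = rho / (1 - rho)
    variability = (ca**2 + cs**2) / 2
    return utilization * variability * ts


ts = 6.0  # mean service time in minutes (hypothetical)
scenarios = [
    ("exponential arrivals and service (Ca = Cs = 1)", 1.0, 1.0),
    ("more consistent service (Cs = 0.5)", 1.0, 0.5),
    ("smoother arrivals too (Ca = Cs = 0.5)", 0.5, 0.5),
]
for label, ca, cs in scenarios:
    print(f"{label}: {kingman_wq(0.8, ca, cs, ts):.1f} min")
```

With Ca = Cs = 1 the result is 24 minutes, which matches the M/M/1 formula Wq = ρ/(μ − λ) at λ = 8 and μ = 10 per hour. Halving both coefficients of variation cuts the wait to 6 minutes at the same utilization.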

Using Queuing Theory in Practice

  • Sizing servers — find the smallest c that keeps Wq below a service target.
  • Pooling decisions — combine separate lines into one shared queue to cut waits.
  • Setting target utilization — accept slack capacity to avoid the utilization trap.
  • Attacking variability — appointments, leveling, and standard work shrink waits without adding capacity.
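The server-sizing step in the list above can be sketched as a simple search: start at the smallest stable server count and add servers until the M/M/c queue wait meets the target. This assumes Poisson arrivals and exponential service; the function names, the max_servers guard, and the example rates are hypothetical.

```python
from math import factorial


def mmc_wq(lam: float, mu: float, c: int) -> float:
    """Average time in queue for a stable M/M/c system (Erlang-C based)."""
    a = lam / mu
    rho = a / c
    tail = a**c / factorial(c) / (1 - rho)
    head = sum(a**k / factorial(k) for k in range(c))
    p_wait = tail / (head + tail)
    return p_wait / (c * mu - lam)


def min_servers(lam: float, mu: float, target_wq: float, max_servers: int = 1000) -> int:
    """Smallest c with average queue wait <= target_wq (same time unit as 1/mu)."""
    c = int(lam / mu) + 1  # smallest c with lam < c * mu, so the queue is stable
    while c <= max_servers:
        if mmc_wq(lam, mu, c) <= target_wq:
            return c
        c += 1
    raise ValueError("target not reachable within max_servers")


# Hypothetical target: 16 arrivals/hour, 10 services/hour per server,
# average queue wait of at most 2 minutes.
c = min_servers(lam=16, mu=10, target_wq=2 / 60)
print(c, f"{mmc_wq(16, 10, c) * 60:.2f} min")
```

In this example two servers are stable but leave an average wait above 10 minutes, so the search settles on three. A real service target is often stated as a percentile rather than an average, which would need a different check inside the loop.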