What does M/M/1 mean?

M/M/1 is Kendall notation for a queue with Markovian (Poisson) arrivals, Markovian (exponential) service times, and 1 server. The first M is the arrival process, the second M is the service distribution, and the number is the count of servers. M/M/c is the same but with c parallel servers sharing one queue.

What is Little's Law?

Little's Law states L = λW, where L is the average number of items in the system, λ is the average arrival rate, and W is the average time an item spends in the system. It is remarkably general — it holds for almost any stable queuing system regardless of arrival or service distribution — and links inventory, throughput, and flow time.

What is the utilization trap?

The utilization trap is the counterintuitive fact that waiting time grows nonlinearly and explodes as utilization (ρ) approaches 100%. Running a system at very high utilization to "be efficient" causes wait times to skyrocket because the 1/(1−ρ) term in queuing formulas blows up. Some slack is necessary to keep waits reasonable.

What does Kingman's equation tell you?

Kingman's equation approximates the average wait in a general single-server queue (G/G/1) as the product of three factors: a utilization term ρ/(1−ρ), a variability term combining the squared coefficients of variation of arrivals and service (Ca²+Cs²)/2, and the mean service time. It shows that wait time is driven by utilization AND variability — reducing variability cuts waiting even at fixed utilization.

Engineering·12 min read·February 9, 2026

🚶 Queuing Theory Guide: M/M/1, M/M/c, Little's Law, and Kingman's Equation

What is queuing theory?

Queuing theory is the mathematical study of waiting lines. It models how customers (or jobs, calls, parts) arrive, wait, and are served, and predicts metrics like average wait time, queue length, and server utilization. It helps size systems — how many servers, tellers, or machines are needed to meet a service target.

A practical guide to queuing theory — the M/M/1 and M/M/c models, Little's Law, Kingman's equation for variability, the utilization trap, and how to size service systems.

What Is Queuing Theory?

Queuing theory is the mathematical study of waiting lines — how arrivals, service, and the number of servers combine to produce waiting and congestion. It applies anywhere demand competes for limited capacity: call centers, checkout lines, hospital beds, network packets, and machines on a production floor. Its purpose is practical: predict waiting times and queue lengths, and size systems so they meet a service-level target without wasteful over-provisioning.

Kendall Notation

Queues are described with A/B/c notation. The first letter is the arrival process, the second the service-time distribution, and the number is how many parallel servers share the queue. "M" stands for Markovian — Poisson arrivals or exponential service times (memoryless). Common cases:

M/M/1 — Poisson arrivals, exponential service, one server.
M/M/c — Poisson arrivals, exponential service, c servers, one shared queue.
G/G/1 — General (arbitrary) arrival and service distributions, one server.

The Core Parameters

λ (lambda) — mean arrival rate (e.g., customers per hour).
μ (mu) — mean service rate per server (customers per hour one server can handle).
ρ (rho) — utilization (traffic intensity). For one server, ρ = λ/μ; for c servers, ρ = λ/(cμ). Stability requires ρ < 1.
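As a minimal sketch of these definitions, the Python below computes ρ and checks the stability condition. The function names utilization and is_stable are illustrative assumptions, not part of any library.

```python
def utilization(arrival_rate: float, service_rate: float, servers: int = 1) -> float:
    """Traffic intensity rho = lambda / (c * mu).

    arrival_rate and service_rate must use the same time unit
    (for example, customers per hour).
    """
    if arrival_rate < 0 or service_rate <= 0 or servers < 1:
        raise ValueError("need arrival_rate >= 0, service_rate > 0, servers >= 1")
    return arrival_rate / (servers * service_rate)


def is_stable(arrival_rate: float, service_rate: float, servers: int = 1) -> bool:
    """A queue is stable only when rho < 1."""
    return utilization(arrival_rate, service_rate, servers) < 1.0


print(utilization(8, 10))        # one server: 8 / 10
print(utilization(24, 10, 3))    # three servers: 24 / (3 * 10)
print(is_stable(10, 10))         # rho = 1 is not stable
```

Note that ρ = 1 counts as unstable: with any randomness in arrivals or service, a server that is busy 100% of the time can never work off a backlog.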

Little's Law

The most general and useful result in queuing is Little's Law:

L = λ × W

where L is the average number of items in the system, λ is the arrival rate, and W is the average time an item spends in the system. It also applies to the queue alone: L_q = λ × W_q. Little's Law is astonishingly general — it holds for any stable system regardless of distributions or queue discipline — and it links throughput, inventory, and flow time, which is why it shows up in manufacturing as well as service. You can experiment with it on the Little's Law calculator.
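Little's Law is simple enough to sketch in a few lines of Python. The helper names little_l and little_w are hypothetical, and the factory numbers in the second call are made up purely for illustration.

```python
def little_l(arrival_rate: float, avg_time: float) -> float:
    """Little's Law: L = lambda * W (average items = rate * average time)."""
    return arrival_rate * avg_time


def little_w(avg_items: float, arrival_rate: float) -> float:
    """Little's Law rearranged: W = L / lambda."""
    if arrival_rate <= 0:
        raise ValueError("arrival_rate must be positive")
    return avg_items / arrival_rate


# Service example: 8 customers/hour, each spending 0.5 hour in the system.
print(little_l(8, 0.5))     # average number of customers present

# Illustrative factory example: 120 units of work in process,
# throughput of 40 units/day, so average flow time is in days.
print(little_w(120, 40))
```

The only requirement is consistent units: if the rate is per hour, the time must be in hours.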

The M/M/1 Queue

For a single-server M/M/1 system with ρ = λ/μ < 1, the standard results are:

Metric	Formula
Utilization	ρ = λ/μ
Avg number in system, L	ρ / (1 − ρ)
Avg number in queue, L_q	ρ² / (1 − ρ)
Avg time in system, W	1 / (μ − λ)
Avg time in queue, W_q	ρ / (μ − λ)

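Example. If λ = 8/hour and μ = 10/hour, then ρ = 0.8, L = 0.8/0.2 = 4 customers in the system, and W = 1/(10−8) = 0.5 hour = 30 minutes. Notice that a seemingly modest 80% utilization already produces a 30-minute average time in system, five times the 6-minute mean service time (1/μ), with 24 of those minutes spent waiting in the queue.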
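The table and the worked example can be reproduced with a short Python sketch. The function name mm1_metrics and the dictionary keys are assumptions chosen for this illustration.

```python
def mm1_metrics(arrival_rate: float, service_rate: float) -> dict:
    """Steady-state M/M/1 results; both rates must share a time unit."""
    if arrival_rate <= 0 or service_rate <= 0:
        raise ValueError("rates must be positive")
    rho = arrival_rate / service_rate
    if rho >= 1:
        raise ValueError("unstable queue: need arrival_rate < service_rate")
    return {
        "rho": rho,                                   # utilization
        "L": rho / (1 - rho),                         # avg number in system
        "Lq": rho**2 / (1 - rho),                     # avg number in queue
        "W": 1 / (service_rate - arrival_rate),       # avg time in system
        "Wq": rho / (service_rate - arrival_rate),    # avg time in queue
    }


m = mm1_metrics(8, 10)
print({k: round(v, 3) for k, v in m.items()})
# {'rho': 0.8, 'L': 4.0, 'Lq': 3.2, 'W': 0.5, 'Wq': 0.4}
```

The results also satisfy Little's Law: L = λW = 8 × 0.5 = 4 and L_q = λW_q = 8 × 0.4 = 3.2.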

The M/M/c Queue

With c parallel servers feeding one queue, utilization is ρ = λ/(cμ). The formulas are more involved (they use the Erlang-C probability that an arrival must wait), but the key insight is pooling: one shared queue feeding many servers dramatically outperforms many separate single-server queues at the same total capacity, because a free server is never idle while someone waits in a different line. This is why banks and airports funnel everyone into a single serpentine line. You can model multi-server systems on the queuing calculator.
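To make the pooling effect concrete, here is a hedged Python sketch of the standard Erlang-C formula and the resulting M/M/c queue wait. The function names erlang_c and mmc_wait_in_queue are illustrative, and the comparison reuses the rates from the M/M/1 example (λ = 8/hour per line, μ = 10/hour per server).

```python
from math import factorial


def erlang_c(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Probability that an arrival has to wait in an M/M/c queue."""
    a = arrival_rate / service_rate          # offered load
    rho = a / servers                        # per-server utilization
    if rho >= 1:
        raise ValueError("unstable queue: need arrival_rate < servers * service_rate")
    tail = a**servers / factorial(servers) / (1 - rho)
    head = sum(a**k / factorial(k) for k in range(servers))
    return tail / (head + tail)


def mmc_wait_in_queue(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Average time in queue: W_q = P(wait) / (c * mu - lambda)."""
    p_wait = erlang_c(arrival_rate, service_rate, servers)
    return p_wait / (servers * service_rate - arrival_rate)


# Two separate lines, each with 8 arrivals/hour and one server at 10/hour...
separate = mmc_wait_in_queue(8, 10, 1)
# ...versus one shared line with 16 arrivals/hour feeding both servers.
pooled = mmc_wait_in_queue(16, 10, 2)
print(f"separate lines: {separate * 60:.1f} min, pooled: {pooled * 60:.1f} min")
```

Under these assumptions both setups run at ρ = 0.8, yet the pooled queue wait is roughly 10.7 minutes against 24 minutes for the separate lines. With servers = 1 the formula reduces to the M/M/1 result W_q = ρ/(μ − λ).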

The Utilization Trap

Look again at the M/M/1 formulas: every waiting metric contains a 1/(1−ρ) term, either explicitly or through μ − λ = μ(1 − ρ). As ρ climbs toward 1, that term explodes. Waiting time does not rise linearly with utilization; it rises hyperbolically near full load.

Utilization ρ	L = ρ/(1−ρ)
0.50	1.0
0.80	4.0
0.90	9.0
0.95	19.0
0.99	99.0

This is the utilization trap: pushing a system toward 100% utilization in the name of efficiency causes waiting to skyrocket. Systems that must respond quickly deliberately run with slack capacity. There is no way around the math: for a given demand, only adding capacity (more or faster servers) or reducing variability tames the queue.
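The table can be regenerated with a few lines of Python, which makes it easy to try other utilization levels. The function name avg_in_system is an assumption for this sketch, and it covers the M/M/1 case only.

```python
def avg_in_system(rho: float) -> float:
    """M/M/1 average number in system, L = rho / (1 - rho)."""
    if not 0 <= rho < 1:
        raise ValueError("rho must satisfy 0 <= rho < 1")
    return rho / (1 - rho)


print("rho    L")
for rho in (0.50, 0.80, 0.90, 0.95, 0.99):
    print(f"{rho:.2f}  {avg_in_system(rho):5.1f}")

# Squeezing out the last few points of utilization is what hurts:
# going from 90% to 95% roughly doubles the number in the system.
print(round(avg_in_system(0.95) / avg_in_system(0.90), 2))
```

The ratio in the last line shows why the final percentage points are the expensive ones: a five-point gain in utilization costs about a doubling of congestion.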

Kingman's Equation: Variability Matters

M/M models assume exponentially distributed inter-arrival and service times, which have a coefficient of variation of exactly 1. Real systems vary more or less than that, and variability is a second lever on waiting. Kingman's equation (the VUT equation) approximates the average queue wait for a general single-server (G/G/1) system as a product of three factors:

W_q ≈ ( ρ / (1 − ρ) ) × ( (C_a² + C_s²) / 2 ) × t_s

Utilization factor ρ/(1−ρ) — the same explosive term as before.
Variability factor (C_a² + C_s²)/2 — the average of the squared coefficients of variation of inter-arrival times (C_a) and service times (C_s).
Time factor t_s — the mean service time.

The lesson is profound: waiting is driven by utilization and variability together. Even at fixed utilization, cutting variability (more uniform arrivals via appointments, more consistent service via standard work) directly reduces waiting. This is a quantitative argument for the lean obsession with reducing variation.
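A short Python sketch shows the variability lever in action. The function name kingman_wq and its parameter names are assumptions for this illustration; the inputs reuse the M/M/1 example (ρ = 0.8, mean service time 0.1 hour).

```python
def kingman_wq(rho: float, ca2: float, cs2: float, mean_service_time: float) -> float:
    """Kingman (VUT) approximation of the average wait in a G/G/1 queue.

    rho: utilization (0 <= rho < 1)
    ca2: squared coefficient of variation of inter-arrival times
    cs2: squared coefficient of variation of service times
    mean_service_time: t_s, in the time unit you want W_q reported in
    """
    if not 0 <= rho < 1:
        raise ValueError("rho must satisfy 0 <= rho < 1")
    if ca2 < 0 or cs2 < 0 or mean_service_time < 0:
        raise ValueError("variability and time inputs must be non-negative")
    utilization_factor = rho / (1 - rho)
    variability_factor = (ca2 + cs2) / 2
    return utilization_factor * variability_factor * mean_service_time


# Exponential arrivals and service (C_a = C_s = 1): matches M/M/1, W_q = 0.4 h.
print(round(kingman_wq(0.8, 1.0, 1.0, 0.1), 3))
# Same utilization, perfectly consistent service (C_s = 0): wait is halved.
print(round(kingman_wq(0.8, 1.0, 0.0, 0.1), 3))
# Appointments smooth arrivals too (illustrative C_a^2 = 0.25, C_s^2 = 0.25).
print(round(kingman_wq(0.8, 0.25, 0.25, 0.1), 3))
```

With C_a = C_s = 1 the approximation reproduces the M/M/1 queue wait ρ/(μ − λ), which is a useful sanity check. The third case uses made-up variability values to show that waiting falls to a quarter of the M/M/1 figure with no change in utilization.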

Using Queuing Theory in Practice

Sizing servers — find the smallest c that keeps W_q below a service target.
Pooling decisions — combine separate lines into one shared queue to cut waits.
Setting target utilization — accept slack capacity to avoid the utilization trap.
Attacking variability — appointments, leveling, and standard work shrink waits without adding capacity.
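The server-sizing step can be sketched as a simple search over c using the M/M/c queue wait. This assumes Poisson arrivals and exponential service; the function names mmc_wq and smallest_server_count, the max_servers cap, and the example rates (50 arrivals/hour, 10 served/hour per server, 2-minute target) are all hypothetical.

```python
from math import factorial


def mmc_wq(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Average M/M/c time in queue via Erlang C; infinite if unstable."""
    a = arrival_rate / service_rate          # offered load
    rho = a / servers                        # per-server utilization
    if rho >= 1:
        return float("inf")
    tail = a**servers / factorial(servers) / (1 - rho)
    head = sum(a**k / factorial(k) for k in range(servers))
    p_wait = tail / (head + tail)            # probability an arrival waits
    return p_wait / (servers * service_rate - arrival_rate)


def smallest_server_count(arrival_rate: float, service_rate: float,
                          target_wq: float, max_servers: int = 100) -> int:
    """Smallest c whose average queue wait is at or below target_wq."""
    for c in range(1, max_servers + 1):
        if mmc_wq(arrival_rate, service_rate, c) <= target_wq:
            return c
    raise ValueError("target not reachable within max_servers")


# 50 arrivals/hour, each server handles 10/hour, target W_q of 2 minutes.
c = smallest_server_count(50, 10, 2 / 60)
print(c, f"{mmc_wq(50, 10, c) * 60:.2f} min")
```

In this example five servers would be unstable (ρ = 1) and six leave an average queue wait of about 3.5 minutes, so the search settles on seven, which corresponds to running at roughly 71% utilization. That gap between "enough capacity on paper" and "enough capacity to meet the target" is the slack the utilization trap demands.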

Topics covered

queuing theory, queueing theory, M/M/1, M/M/c, Little's Law, Kingman's equation, utilization, waiting time, queue length, arrival rate, service rate, capacity planning, operations research, congestion, VUT equation, server utilization
