What is the difference between standard deviation and variance?

Variance (s² or σ²) is the average of the squared deviations from the mean, so its units are squared (e.g., mm²). Standard deviation is the square root of variance, which brings the units back to the original scale (e.g., mm), making it much easier to interpret alongside the raw measurements. In practice, standard deviation is the number engineers report and compare against tolerances, while variance is mostly a computational stepping stone.

How do I know whether to use the binomial or normal distribution?

Use the binomial distribution when you are counting a fixed number of discrete pass/fail trials, such as the number of defective parts in a sample of 50. Use the normal distribution when you are modeling a continuous measurement, such as a dimension, weight, or voltage, that can take any value within a range. As a rule of thumb, if the quantity is measured on a continuous scale and its histogram looks roughly bell-shaped, the normal distribution is the right tool; if you are counting successes out of a fixed number of trials, reach for the binomial.

What does a p-value of 0.03 actually mean?

It means that if the null hypothesis were true, there would only be a 3% chance of observing a sample result as extreme as (or more extreme than) the one you actually got. Because 0.03 is below the common 0.05 threshold, you would typically reject the null hypothesis in favor of the alternative. It does not mean there is a 3% chance the null hypothesis is true, nor a 97% chance the alternative is true — the p-value is a statement about the data given the null hypothesis, not a statement about the hypotheses themselves.

Why does Probability and Statistics appear on every FE exam discipline?

NCEES treats probability and statistics as a foundational engineering competency because every discipline deals with variability, measurement uncertainty, and data-driven decisions, whether that is material strength scatter in structural engineering, defect rates in manufacturing, or signal noise in electrical systems. Rather than being discipline-specific, it is considered part of the common engineering toolkit, which is why every version of the FE exam devotes a section to it alongside topics like ethics and engineering economics.

Probability & Statistics for Engineers: The Basics Behind Quality, Reliability & Risk

A practical refresher on probability and statistics for engineers, covering descriptive stats, distributions, z-scores, hypothesis testing, and their use in quality control and reliability.

Why Engineers Need Statistics

Every real-world measurement carries some amount of scatter. A batch of resistors rated at 100 Ω will not all measure exactly 100 Ω; a production run of steel bolts will not all fail at precisely the same torque; a fleet of pumps will not all run for exactly the same number of hours before maintenance is due. Probability and statistics give engineers a disciplined way to describe that scatter, quantify risk, and make defensible decisions in the presence of uncertainty.

This matters far beyond the classroom. Statistics is how a quality engineer decides whether a production line has drifted out of tolerance, how a reliability engineer predicts when a component is likely to fail, and how a structural engineer sets a safety factor that accounts for variability in material strength. It is also, not coincidentally, one of the few topics tested on every single version of the NCEES FE exam — mechanical, electrical, civil, chemical, industrial, and every other discipline all include a Probability and Statistics section. This guide walks through the core ideas in the order you would actually use them: describing data, modeling randomness, computing probabilities, and drawing conclusions from samples.

Descriptive Statistics: Summarizing a Data Set

Before you can reason about randomness, you need a compact way to describe a batch of measurements. Three quantities do most of the work: the mean, the variance, and the standard deviation.

Mean, Median, and Mode

The mean (μ for a population, x̄ for a sample) is the arithmetic average:

x̄ = (Σx_i) / n

The median is the middle value when the data are sorted — it is less sensitive to outliers than the mean, which matters when a single bad sensor reading could otherwise skew your average. The mode is simply the most frequently occurring value. For symmetric, well-behaved data these three measures cluster close together; a large gap between mean and median is a signal that your data are skewed or contain outliers worth investigating.
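To see how differently the mean and median react to one bad reading, here is a minimal Python sketch using the standard-library statistics module. The readings are hypothetical values invented for illustration; the last one stands in for a faulty sensor.

```python
import statistics

# Hypothetical readings; the final value mimics a single faulty sensor reading.
clean = [10.1, 10.2, 10.2, 10.3, 10.4]
with_outlier = clean + [25.0]

clean_mean = statistics.mean(clean)
clean_median = statistics.median(clean)
clean_mode = statistics.mode(clean)

outlier_mean = statistics.mean(with_outlier)
outlier_median = statistics.median(with_outlier)

print(f"clean:        mean={clean_mean:.2f}  median={clean_median:.2f}  mode={clean_mode}")
print(f"with outlier: mean={outlier_mean:.2f}  median={outlier_median:.2f}")
```

In this made-up data set the single outlier pulls the mean from about 10.24 to about 12.70, while the median barely moves (10.20 to 10.25).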

Variance and Standard Deviation

The mean tells you where the data are centered, but not how spread out they are. That is the job of variance and standard deviation. For a sample of n measurements:

s² = Σ(x_i − x̄)² / (n − 1)

s = √(s²)

Squaring the deviations before averaging ensures positive and negative differences from the mean don't cancel out, and dividing by (n − 1) rather than n (called Bessel's correction) gives an unbiased estimate of the population variance when working from a sample. The standard deviation s is preferred for reporting because it carries the same units as the original measurement — if you measure shaft diameter in millimeters, s is in millimeters too, while variance would be in mm².

Worked Example: Computing Mean and Standard Deviation

A quality technician measures the wall thickness (in mm) of five injection-molded parts pulled from a production run: 2.05, 2.10, 1.98, 2.02, 2.15.

Step 1 — Mean:

x̄ = (2.05 + 2.10 + 1.98 + 2.02 + 2.15) / 5 = 10.30 / 5 = 2.06 mm

Step 2 — Deviations from the mean: −0.01, 0.04, −0.08, −0.04, 0.09

Step 3 — Squared deviations: 0.0001, 0.0016, 0.0064, 0.0016, 0.0081 → sum = 0.0178

Step 4 — Sample variance: s² = 0.0178 / (5 − 1) = 0.00445 mm²

Step 5 — Standard deviation: s = √0.00445 ≈ 0.0667 mm

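So this sample has a mean wall thickness of 2.06 mm with a standard deviation of about 0.067 mm. If the design tolerance is 2.00 ± 0.10 mm (limits of 1.90 and 2.10 mm), the process is running about 0.06 mm above nominal: one of the five parts (2.15 mm) already exceeds the upper limit, and that limit sits only about 0.6 standard deviations above the sample mean. Five parts are far too few for a firm conclusion, but as you'll see later, a formal capability study checks exactly this: how the spread and centering of the process compare to the tolerance band itself.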
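The hand calculation above can be checked with a few lines of Python. This is a minimal sketch using the standard-library statistics module; the variable names are illustrative. Note that statistics.variance and statistics.stdev divide by (n − 1), while statistics.pvariance divides by n.

```python
import statistics

# Wall-thickness measurements from the worked example, in mm.
thickness_mm = [2.05, 2.10, 1.98, 2.02, 2.15]

mean_mm = statistics.mean(thickness_mm)
sample_var = statistics.variance(thickness_mm)       # divides by n - 1 (Bessel's correction)
sample_std = statistics.stdev(thickness_mm)          # square root of the sample variance
population_var = statistics.pvariance(thickness_mm)  # divides by n, shown for comparison

print(f"mean               = {mean_mm:.2f} mm")
print(f"sample variance    = {sample_var:.5f} mm^2")
print(f"sample std dev     = {sample_std:.4f} mm")
print(f"variance with /n   = {population_var:.5f} mm^2")
```

Dividing by n instead of (n − 1) gives 0.00356 mm² rather than 0.00445 mm², which shows how much Bessel's correction matters for a sample this small.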

Basic Probability Rules

Probability quantifies how likely an event is, on a scale from 0 (impossible) to 1 (certain). A few rules cover the majority of engineering applications.

Independent vs. Dependent Events

Two events are independent if the outcome of one has no effect on the probability of the other. For independent events A and B, the probability that both occur is simply the product:

P(A and B) = P(A) × P(B)

This shows up constantly in reliability work: if two redundant sensors each have a 2% chance of failing on a given day, and their failures are independent, the chance both fail on the same day is 0.02 × 0.02 = 0.0004, or 0.04% — a dramatic improvement over relying on a single sensor.
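The multiplication rule extends to any number of independent units. The sketch below uses a hypothetical helper, prob_all_fail, and assumes every unit has the same failure probability and that failures really are independent (no shared power supply, common wiring, or other common-cause failure).

```python
def prob_all_fail(p_single: float, n_units: int) -> float:
    """Probability that all n independent units fail, each with probability p_single."""
    return p_single ** n_units

p_one = prob_all_fail(0.02, 1)
p_two = prob_all_fail(0.02, 2)
p_three = prob_all_fail(0.02, 3)

for n, p in ((1, p_one), (2, p_two), (3, p_three)):
    print(f"{n} sensor(s): P(all fail) = {p:.6f}")
```

If the independence assumption does not hold, the real probability of a simultaneous failure can be much higher than the product suggests.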

Events are dependent when the outcome of one changes the probability of the other — for example, drawing a defective part from a bin without replacement changes the odds for the next draw, since the bin composition has changed.
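A small sketch makes the difference concrete. The bin contents here (20 parts, 3 of them defective) are hypothetical numbers chosen for illustration, and exact fractions are used so the arithmetic is easy to follow.

```python
from fractions import Fraction

# Hypothetical bin: 20 parts, 3 of which are defective.
total_parts = 20
defective = 3

# Drawing without replacement: the second probability depends on the first draw.
p_first = Fraction(defective, total_parts)
p_second_given_first = Fraction(defective - 1, total_parts - 1)
p_both_without_replacement = p_first * p_second_given_first

# Drawing with replacement: the draws are independent, so the probabilities match.
p_both_with_replacement = p_first * p_first

print("without replacement:", p_both_without_replacement, float(p_both_without_replacement))
print("with replacement:   ", p_both_with_replacement, float(p_both_with_replacement))
```

Under these assumptions the chance of two defectives in a row is 3/190 (about 1.6%) without replacement, compared with 9/400 (2.25%) if the first part were put back.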

Conditional Probability

Conditional probability, written P(A | B), is the probability that A occurs given that B has already occurred:

P(A | B) = P(A and B) / P(B)

This is the backbone of diagnostic testing and inspection logic. Suppose 3% of parts from a supplier are defective (P(D) = 0.03), and a particular inspection test correctly flags 95% of defective parts (P(flag | D) = 0.95) but also produces a 2% false-positive rate on good parts (P(flag | D′) = 0.02). Conditional probability, combined with Bayes' theorem, lets you work backward to answer the more useful question: "Given that a part was flagged, what's the actual chance it's defective?" These calculations routinely surprise people — false positives from a large pool of good parts can outnumber true positives from a small pool of bad ones, which is exactly why inspection systems need to be tuned carefully rather than trusted blindly.
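The backward calculation for the numbers above can be sketched as follows. The function name is illustrative, and the second call uses a hypothetical 1% defect rate, which is not part of the example, to show how the answer shifts when defects are rarer.

```python
def prob_defective_given_flag(p_defective, p_flag_given_defective, p_flag_given_good):
    """Bayes' theorem: P(D | flag) = P(flag | D) P(D) / P(flag)."""
    true_positives = p_flag_given_defective * p_defective
    false_positives = p_flag_given_good * (1 - p_defective)
    return true_positives / (true_positives + false_positives)

# Numbers from the example: 3% defective, 95% detection, 2% false positives.
posterior = prob_defective_given_flag(0.03, 0.95, 0.02)

# Hypothetical comparison: same test, but only 1% of parts are defective.
posterior_rare = prob_defective_given_flag(0.01, 0.95, 0.02)

print(f"P(defective | flagged), 3% defect rate: {posterior:.3f}")
print(f"P(defective | flagged), 1% defect rate: {posterior_rare:.3f}")
```

With the example's numbers, only about 59.5% of flagged parts are actually defective. At the hypothetical 1% defect rate the figure drops to roughly 32%, so false positives outnumber true positives.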

Probability Distributions Engineers Actually Use

A probability distribution describes how likely each possible outcome of a random variable is. Two distributions cover the vast majority of introductory engineering statistics: the binomial for counting discrete successes/failures, and the normal for continuous measurements.

The Binomial Distribution

The binomial distribution applies when you have a fixed number of independent trials (n), each with the same probability of "success" (p) — such as a part being defective. The probability of exactly k successes out of n trials is:

P(X = k) = C(n, k) × p^k × (1 − p)^(n − k)

where C(n, k) = n! / (k!(n − k)!) is the number of ways to choose k items from n. This is the natural model for acceptance sampling: if a lot has a 1% true defect rate and you inspect 20 units, the binomial distribution tells you the probability of finding exactly 0, 1, 2, or more defective units in that sample.
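For the acceptance-sampling case just described (n = 20, p = 0.01), the formula can be evaluated directly. This is a minimal sketch using math.comb from the standard library; binomial_pmf is an illustrative helper name.

```python
from math import comb

def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a binomial distribution with n trials and success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n_inspected = 20
defect_rate = 0.01

p_zero = binomial_pmf(0, n_inspected, defect_rate)
p_one = binomial_pmf(1, n_inspected, defect_rate)
p_two = binomial_pmf(2, n_inspected, defect_rate)
p_three_or_more = 1 - (p_zero + p_one + p_two)

print(f"P(0 defective)  = {p_zero:.4f}")
print(f"P(1 defective)  = {p_one:.4f}")
print(f"P(2 defective)  = {p_two:.4f}")
print(f"P(3 or more)    = {p_three_or_more:.4f}")
```

Roughly 82% of such samples would contain no defective units at all, which is why a clean sample of 20 says little about a lot with a 1% defect rate.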

The Normal Distribution

The normal distribution (or Gaussian distribution) is the familiar bell curve, and it is the default model for continuous physical measurements such as dimensions, weights, voltages, and strengths. The justification is the central limit theorem: the sum of many small, independent random effects tends toward a normal shape, largely regardless of how each individual effect is distributed. A normal distribution is fully described by just two parameters: its mean μ and standard deviation σ.

Roughly 68% of values fall within μ ± 1σ, about 95% fall within μ ± 2σ, and about 99.7% fall within μ ± 3σ — the well-known 68-95-99.7 rule.
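These percentages come straight from the cumulative distribution function of the standard normal. A short sketch using statistics.NormalDist from the Python standard library reproduces them:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Fraction of values within k standard deviations of the mean, for k = 1, 2, 3.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, fraction in within.items():
    print(f"within ±{k}σ: {fraction:.4f}")
```

The computed fractions are about 0.6827, 0.9545, and 0.9973, which round to the familiar 68-95-99.7 figures.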

Z-Scores

To find probabilities for a specific normal distribution, you convert a raw value x into a z-score, which measures how many standard deviations x is from the mean:

z = (x − μ) / σ

Once converted to a z-score, any normal distribution can be evaluated using a single standard normal table (mean 0, standard deviation 1).

z-score	Area to the left (cumulative probability)	Practical meaning
−3.0	0.0013	Far below the mean; extremely rare
−2.0	0.0228	About 2.3% of values fall this low or lower
−1.0	0.1587	About 15.9% fall this low or lower
0.0	0.5000	Exactly at the mean
+1.0	0.8413	About 84.1% fall at or below this value
+2.0	0.9772	About 97.7% fall at or below this value
+3.0	0.9987	Only about 0.13% exceed this value

Worked Example: Z-Score for a Manufacturing Tolerance

A machine shop produces shafts with diameters that are normally distributed with a mean μ = 25.00 mm and standard deviation σ = 0.08 mm. The design specification requires the diameter to be no greater than 25.16 mm. What fraction of shafts is expected to exceed this upper limit?

Step 1 — Compute the z-score:

z = (25.16 − 25.00) / 0.08 = 0.16 / 0.08 = 2.0

Step 2 — Look up the cumulative area: From the table above, z = 2.0 corresponds to a cumulative probability of 0.9772 — meaning 97.72% of shafts fall at or below 25.16 mm.

Step 3 — Find the tail probability: P(X > 25.16) = 1 − 0.9772 = 0.0228, or about 2.28%.

So roughly 2.3% of production (about 1 in 44 shafts) is expected to exceed the upper spec limit. If that scrap rate is too high, the engineer's options are to tighten the process (reduce σ through better tooling or fixturing), shift the process mean lower if any lower limit leaves room for it, or, if the design allows, relax the upper limit.
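The same result can be obtained without a printed table. The sketch below uses statistics.NormalDist with the shaft parameters from the example; the variable names are illustrative.

```python
from statistics import NormalDist

# Shaft diameters from the example: mean 25.00 mm, standard deviation 0.08 mm.
shaft = NormalDist(mu=25.00, sigma=0.08)
upper_limit_mm = 25.16

z = (upper_limit_mm - shaft.mean) / shaft.stdev
p_exceed = 1 - shaft.cdf(upper_limit_mm)

print(f"z = {z:.2f}")
print(f"P(X > {upper_limit_mm} mm) = {p_exceed:.4f}")
print(f"about 1 in {1 / p_exceed:.0f} shafts")
```

The computed tail probability is about 0.0228, matching the table lookup.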

Expected Value

The expected value E(X) of a random variable is its long-run average outcome, weighted by probability:

E(X) = Σ [x_i × P(x_i)]

Expected value is the natural tool for cost and risk trade-offs. Suppose a component fails with probability 0.5% per year, and a failure costs $50,000 in downtime and repairs, while a preventive maintenance program that eliminates the failure mode costs $500/year. The expected annual cost of doing nothing is 0.005 × $50,000 = $250/year — less than the cost of the maintenance program, so on a pure expected-cost basis, the program isn't justified unless the consequence of failure includes non-monetary risk (safety, reputational damage) that changes the calculus. This is exactly the kind of reasoning behind risk matrices, insurance pricing, and maintenance-interval optimization.
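The comparison above, plus the break-even failure probability at which the maintenance program would pay for itself, can be sketched in a few lines. The helper name expected_value is illustrative, and the break-even figure assumes, as the example does, that the program fully eliminates the failure mode.

```python
def expected_value(outcomes):
    """Sum of value * probability over (value, probability) pairs."""
    return sum(value * prob for value, prob in outcomes)

failure_cost = 50_000        # dollars per failure
p_failure = 0.005            # 0.5% chance of failure per year
maintenance_cost = 500       # dollars per year

# Doing nothing: pay the failure cost with probability p, otherwise pay nothing.
cost_do_nothing = expected_value([(failure_cost, p_failure), (0, 1 - p_failure)])

# Failure probability at which the two options have equal expected cost.
break_even_probability = maintenance_cost / failure_cost

print(f"expected annual cost, no maintenance: ${cost_do_nothing:.2f}")
print(f"annual cost, maintenance program:     ${maintenance_cost:.2f}")
print(f"break-even failure probability:       {break_even_probability:.2%}")
```

On these numbers the program only breaks even on pure expected cost once the annual failure probability reaches 1%.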

Hypothesis Testing: Drawing Conclusions from Samples

Descriptive statistics summarize the data you have; hypothesis testing lets you draw a conclusion about a larger population or process based on a limited sample — while being honest about how confident you can actually be.

Null and Alternative Hypotheses

Every hypothesis test starts with two competing claims. The null hypothesis (H₀) represents the default assumption — typically "no change" or "no difference," such as "the new alloy has the same average tensile strength as the old one." The alternative hypothesis (H₁ or H_a) is what you suspect might actually be true — "the new alloy has a different (or higher, or lower) average tensile strength." The test procedure never technically "proves" H₀ true; it either rejects H₀ in favor of H₁, or fails to reject H₀ due to insufficient evidence.

The p-value, Conceptually

The p-value answers a specific question: if the null hypothesis were actually true, how likely would it be to see a sample result as extreme as (or more extreme than) the one you observed, just by chance? A small p-value (conventionally below 0.05) means your observed result would be unlikely under H₀, so you reject H₀ in favor of H₁. A large p-value means the data are quite consistent with H₀, so you fail to reject it. The 0.05 threshold is a convention, not a law of nature — safety-critical applications often demand much stricter thresholds, since the cost of a wrong conclusion is higher.

A p-value is not the probability that H₀ is true, and it is not the probability the result happened "by chance" in some vague sense — it is specifically the probability of the observed (or more extreme) data, computed under the assumption that H₀ holds. This distinction trips up even experienced practitioners, so it's worth internalizing.
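To make the definition concrete, here is a sketch of a two-sided one-sample z-test for the alloy scenario. Every number in it is hypothetical (a reference mean of 420 MPa, a known population standard deviation of 25 MPa, and a sample of 30 specimens averaging 430 MPa), and it assumes σ is known; with an estimated standard deviation and a small sample, a t-test would be the usual choice.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical values for illustration only.
mu_0 = 420.0         # mean tensile strength under H0, MPa
sigma = 25.0         # population standard deviation, assumed known, MPa
n = 30               # number of specimens tested
sample_mean = 430.0  # observed sample mean, MPa

standard_error = sigma / sqrt(n)
z = (sample_mean - mu_0) / standard_error

# Two-sided p-value: probability of a result at least this extreme in either direction.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

For these made-up inputs the p-value comes out a little under 0.03, so H₀ would be rejected at the 0.05 level but not at a stricter 0.01 level.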

Confidence Intervals

Rather than a single yes/no test, a confidence interval gives a range of plausible values for an unknown population parameter, along with a stated confidence level. A "95% confidence interval for the mean tensile strength is 410 to 430 MPa" means that if you repeated the sampling process many times and built an interval the same way each time, about 95% of those intervals would contain the true population mean. Wider intervals (say, 99% confidence) give more certainty of capturing the true value but at the cost of precision; narrower intervals are more precise but riskier. Confidence intervals are often more informative than a bare p-value because they show not just whether there's a detectable effect, but roughly how large it is and how much uncertainty remains.
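The trade-off between confidence level and width shows up directly in the calculation. The sketch below builds z-based intervals from hypothetical inputs (sample mean 420 MPa, known σ of 25.5 MPa, n = 25) chosen so that the 95% interval lands near the 410 to 430 MPa figure quoted above. It assumes σ is known; when σ is estimated from a small sample, a t-based interval would be somewhat wider.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, confidence):
    """Two-sided confidence interval for a mean when sigma is assumed known."""
    z_star = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * sigma / sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Hypothetical values for illustration only.
sample_mean, sigma, n = 420.0, 25.5, 25

low_95, high_95 = z_interval(sample_mean, sigma, n, 0.95)
low_99, high_99 = z_interval(sample_mean, sigma, n, 0.99)

print(f"95% CI: {low_95:.1f} to {high_95:.1f} MPa")
print(f"99% CI: {low_99:.1f} to {high_99:.1f} MPa")
```

The 99% interval is wider than the 95% interval for the same data, which is the price of the extra confidence.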

Where This Shows Up in Real Engineering

Quality Control and Six Sigma

Statistical process control charts plot sample measurements over time with control limits typically set at ±3σ from the process mean; a point outside those limits signals the process may have shifted and needs investigation before it produces out-of-spec parts. Six Sigma methodology takes its name from a related but stricter idea: a "six sigma" process is one where the tolerance limits sit a full six standard deviations from the mean, corresponding to roughly 3.4 defects per million opportunities once the conventional allowance for a 1.5σ long-term shift in the process mean is included. Process capability indices such as C_p and C_pk formalize the comparison between how spread out a process is (σ) and how wide the tolerance band is, giving quality engineers a single number to track improvement against.
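As a rough illustration, the sketch below applies the standard definitions C_p = (USL − LSL) / 6σ and C_pk = min(USL − μ, μ − LSL) / 3σ to the wall-thickness example from earlier (2.00 ± 0.10 mm, sample mean 2.06 mm, s ≈ 0.0667 mm). Using a five-part sample standard deviation in place of σ is an assumption made only for illustration; a real capability study needs far more data. The last lines reproduce the 3.4 defects-per-million figure under the conventional 1.5σ shift.

```python
from statistics import NormalDist

def cp(usl, lsl, sigma):
    """Potential capability: tolerance width relative to six standard deviations."""
    return (usl - lsl) / (6 * sigma)

def cpk(usl, lsl, mu, sigma):
    """Actual capability: distance from the mean to the nearer limit, over 3 sigma."""
    return min(usl - mu, mu - lsl) / (3 * sigma)

# Wall-thickness example: tolerance 2.00 ± 0.10 mm, sample mean and std dev in mm.
usl, lsl = 2.10, 1.90
mu, sigma = 2.06, 0.0667

cp_value = cp(usl, lsl, sigma)
cpk_value = cpk(usl, lsl, mu, sigma)
print(f"Cp  = {cp_value:.2f}")
print(f"Cpk = {cpk_value:.2f}")

# Six Sigma figure: limits at ±6σ with the mean shifted 1.5σ toward one limit,
# so the limits sit 4.5σ and 7.5σ away from the shifted mean.
std_normal = NormalDist()
defect_fraction = std_normal.cdf(-4.5) + std_normal.cdf(-7.5)
dpmo = defect_fraction * 1_000_000
print(f"defects per million opportunities = {dpmo:.1f}")
```

A C_pk well below 1, as here, means the nearer tolerance limit lies inside the process's own ±3σ spread.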

Reliability Engineering and Failure Rates

Reliability engineers model time-to-failure using distributions like the exponential (for constant failure rates) and Weibull (for failure rates that change with age, capturing infant-mortality and wear-out effects). Quantities like MTBF (mean time between failures) and MTTF (mean time to failure) are direct applications of expected value, and reliability calculations for redundant systems lean heavily on the independent-events multiplication rule discussed earlier — which is precisely why critical aerospace and medical systems use triple- or quadruple-redundant components.
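For the constant-failure-rate case, reliability over a mission time t is R(t) = exp(−t / MTBF), and a parallel-redundant system with n independent units fails only if all n fail. The sketch below uses hypothetical numbers (an MTBF of 10,000 hours and a 1,000-hour mission) and assumes fully independent, identical units.

```python
from math import exp

def reliability_exponential(t_hours, mtbf_hours):
    """Probability a unit survives to time t under a constant failure rate."""
    return exp(-t_hours / mtbf_hours)

def reliability_parallel(r_single, n_units):
    """Probability at least one of n independent, identical units survives."""
    return 1 - (1 - r_single) ** n_units

# Hypothetical values for illustration only.
mtbf_hours = 10_000
mission_hours = 1_000

r_single = reliability_exponential(mission_hours, mtbf_hours)
r_double = reliability_parallel(r_single, 2)
r_triple = reliability_parallel(r_single, 3)

print(f"single unit:      {r_single:.5f}")
print(f"dual redundant:   {r_double:.5f}")
print(f"triple redundant: {r_triple:.5f}")
```

Under these assumptions a single unit has about a 9.5% chance of failing during the mission, while a triple-redundant arrangement brings the chance of total failure below 0.1%.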

Risk Analysis

Engineering risk is frequently expressed as the product of probability and consequence — an expected-value framework at heart. Flood-return-period estimates, structural safety factors calibrated against material strength variability, and failure-mode-and-effects analysis (FMEA) risk priority numbers are all, underneath the terminology, applications of the same probability and statistics toolkit covered in this guide.

The FE Exam

Whatever discipline-specific FE exam you sit for — mechanical, electrical, civil, industrial, chemical, environmental, or otherwise — the NCEES exam specifications include a dedicated Probability and Statistics section. Expect questions on measures of central tendency and dispersion, probability of combined and conditional events, expected value, and interpreting the normal distribution using z-scores, often pulled straight from the NCEES Reference Handbook's statistics tables. Because this topic is universal across every discipline, it is one of the highest-leverage areas to review regardless of which FE exam you're taking.
