Least-squares fit, r & R²

Linear Regression & Correlation Calculator

Fit a least-squares regression line to your (x, y) data — get the slope, intercept, correlation coefficient r, and R² — visualise the scatter and fitted line, and predict y for any x. Everything recomputes live in your browser.

Regression line
ŷ = 0.0533 + 1.9943·x

Statistics
Slope b₁: 1.9943
Intercept b₀: 0.0533
Correlation r: 0.9986
R² (coeff. of determination): 0.9972
n (data points): 6
x̄: 3.5
ȳ: 7.033

Prediction
ŷ = 14.0133 at x = 7
Scatter plot & fitted line (axes: x, y)
b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
b₀ = ȳ − b₁·x̄
r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)²·Σ(y − ȳ)²)
R² = r²

About the Linear Regression & Correlation Calculator

Simple linear regression fits a straight line ŷ = b₀ + b₁x through a set of (x, y) data points using the least-squares criterion — minimizing the sum of squared vertical distances between the points and the line. This calculator returns the slope, intercept, Pearson correlation coefficient r, and the coefficient of determination R², draws the scatter plot with the fitted line, and predicts y for any x you choose. Everything is computed live in your browser.

The least-squares slope and intercept

The least-squares line is the one that minimizes the sum of squared residuals (vertical gaps between each point and the line). Its slope is b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², the ratio of the covariation of x and y to the variation in x. The intercept follows from the fact that the least-squares line always passes through the centroid (x̄, ȳ): b₀ = ȳ − b₁·x̄. The slope tells you the expected change in y per one-unit increase in x; the intercept is the fitted value of y when x = 0 (meaningful only if x = 0 is within or near the data range).
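The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual code: the function name fit_line and the sample data are hypothetical, chosen only to show the arithmetic.

```python
from statistics import fmean

def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar, y_bar = fmean(xs), fmean(ys)
    # Covariation of x and y, and variation in x, about their means.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # forces the line through the centroid
    return b0, b1

# Hypothetical data, not the calculator's sample set.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = fit_line(xs, ys)
print(f"y_hat = {b0:.4f} + {b1:.4f}*x")
```

For this made-up data the slope comes out near 1.99 and the intercept near 0.05, and evaluating the line at x̄ = 3 returns ȳ = 6.02, which is the centroid property in action.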

Correlation coefficient r

The Pearson correlation coefficient r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)²·Σ(y − ȳ)²) measures the strength and direction of the linear relationship, ranging from −1 to +1. A value near +1 means a strong positive linear relationship, near −1 a strong negative one, and near 0 little or no linear relationship. The sign of r always matches the sign of the slope. Importantly, r captures only LINEAR association — a strong curved relationship can have an r near zero, which is why you should always look at the scatter plot.
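A short Python sketch makes the "linear only" caveat concrete. The function name pearson_r and both datasets are assumptions made for illustration: one dataset lies exactly on a line, the other exactly on a parabola that is symmetric about x = 0.

```python
from math import sqrt
from statistics import fmean

def pearson_r(xs, ys):
    """Pearson correlation coefficient computed from deviations about the means."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y has no variation")
    return sxy / sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]
r_line = pearson_r(xs, [2 * x + 1 for x in xs])  # points exactly on a line
r_curve = pearson_r(xs, [x * x for x in xs])     # points exactly on a parabola
print(r_line, r_curve)
```

The straight-line data gives r = 1, while the parabola gives r = 0 even though y is completely determined by x: the positive and negative deviations cancel in the numerator.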

Coefficient of determination R²

In simple linear regression with an intercept, R² equals r² and represents the proportion of the total variation in y that is explained by the regression on x. An R² of 0.95 means 95% of the variability in y is accounted for by the linear relationship with x, leaving 5% unexplained (random scatter or other factors). R² ranges from 0 to 1, and a higher value indicates a closer fit. A high R² does not prove causation, nor does it guarantee the model is appropriate: outliers, nonlinearity, and a narrow x-range can all distort it. Always pair R² with a residual check.
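The "proportion of variation explained" reading can be checked numerically. The sketch below computes R² two ways: as 1 − SSE/SST from the residuals, and as r² from the correlation formula. The names r_squared, SSE (sum of squared residuals), and SST (total sum of squares about ȳ) are labels introduced here, and the data is hypothetical.

```python
from statistics import fmean

def r_squared(xs, ys):
    """Return (R² from residuals, r² from the correlation formula)."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sst = sum((y - y_bar) ** 2 for y in ys)  # total variation in y
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Variation left unexplained by the fitted line.
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return 1 - sse / sst, sxy ** 2 / (sxx * sst)

# Hypothetical data, not the calculator's sample set.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
from_residuals, from_r = r_squared(xs, ys)
print(from_residuals, from_r)
```

Both values agree to floating-point precision (about 0.997 for this data). The agreement holds for a straight-line fit with an intercept; it is not a general property of other models.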

Using regression to predict

Once fitted, the line predicts y for a new x as ŷ = b₀ + b₁·x. Prediction is most reliable for x values inside the range of your observed data (interpolation). Predicting far outside that range (extrapolation) is risky: the linear relationship may not hold beyond where it was observed, and small slope errors are magnified at distant x values. In industrial and quality work, regression is used for calibration curves, learning curves, demand-versus-price relationships, tool-wear trends, and as the engine behind least-squares trend forecasting.
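As a rough sketch, a prediction helper can report whether the requested x falls outside the observed range. The coefficients below are the rounded values shown in the sample output above; the observed range of 1 to 6 is an assumption (it is consistent with n = 6 and x̄ = 3.5, but the page does not list the data), and the function name predict is hypothetical.

```python
def predict(b0, b1, x_new, x_min, x_max):
    """Return (y_hat, extrapolated) for a fitted line y_hat = b0 + b1*x."""
    y_hat = b0 + b1 * x_new
    # True when x_new lies outside the range the line was fitted on.
    extrapolated = not (x_min <= x_new <= x_max)
    return y_hat, extrapolated

# Rounded coefficients from the sample output; the x-range is assumed.
B0, B1 = 0.0533, 1.9943
y_hat, extrapolated = predict(B0, B1, x_new=7, x_min=1, x_max=6)
print(f"y_hat = {y_hat:.4f}, extrapolated = {extrapolated}")
```

With these rounded coefficients the result is about 14.0134, a hair off the 14.0133 displayed above, presumably because the calculator works from unrounded values. Under the assumed range, x = 7 is flagged as an extrapolation.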

Frequently asked questions

What is the difference between correlation r and R²?

Correlation r measures the strength and direction of a linear relationship and ranges from −1 to +1. R² is simply r squared, so it ranges from 0 to 1 and is interpreted as the fraction of the variation in y explained by x. r tells you direction (positive or negative); R² tells you explanatory power but loses the sign.

Does a high R² mean x causes y?

No. Regression and correlation describe association, not causation. A high R² means x is a good linear predictor of y in your data, but the real driver could be a third variable, or the relationship could be coincidental. Causation requires controlled experiments or strong domain reasoning, not just a good fit.

Why should I look at the scatter plot, not just r?

Because r and R² measure only linear association. Anscombe’s quartet is the classic example: four very different datasets, including a perfect curve and one dominated by a single outlier, that share nearly identical slope, intercept, and r. The scatter plot reveals nonlinearity, outliers, and clustering that the summary statistics hide.
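The effect is easy to reproduce. The sketch below uses the first two Anscombe datasets as they are commonly tabulated (set I is roughly linear with scatter, set II is a smooth curve); the function name summarize is an illustrative choice, not part of the calculator.

```python
from math import sqrt
from statistics import fmean

def summarize(xs, ys):
    """Return (b0, b1, r) for a least-squares line through the points."""
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1, sxy / sqrt(sxx * syy)

# Anscombe's quartet, sets I and II (they share the same x values).
X = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
Y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
Y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]

s1 = summarize(X, Y1)
s2 = summarize(X, Y2)
for name, (b0, b1, r) in (("I", s1), ("II", s2)):
    print(f"set {name}: b0 = {b0:.2f}, b1 = {b1:.2f}, r = {r:.2f}")
```

Both sets fit to roughly ŷ = 3.00 + 0.50·x with r near 0.82, yet a plot of set II shows a clear curve that a straight line misrepresents.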

Can I trust predictions outside my data range?

Be cautious. Predicting within the observed x-range (interpolation) is generally safe if the linear model fits. Predicting outside it (extrapolation) assumes the same straight-line relationship continues, which often fails — physical limits, saturation, or new regimes can change the behavior. Treat extrapolated values as rough estimates at best.

What does the slope b₁ represent physically?

The slope is the estimated change in y for each one-unit increase in x. For example, if y is cost and x is units produced, a slope of 4.2 means each additional unit adds about $4.20 of cost on average within the data range. Its sign shows whether y rises or falls as x increases.
