Fit a least-squares regression line to your (x, y) data — get the slope, intercept, correlation coefficient r, and R² — visualise the scatter and fitted line, and predict y for any x. Everything recomputes live in your browser.
Simple linear regression fits a straight line ŷ = b₀ + b₁x through a set of (x, y) data points using the least-squares criterion — minimizing the sum of squared vertical distances between the points and the line. This calculator returns the slope, intercept, Pearson correlation coefficient r, and the coefficient of determination R², draws the scatter plot with the fitted line, and predicts y for any x you choose. Everything is computed live in your browser.
The least-squares line is the one that minimizes the sum of squared residuals (vertical gaps between each point and the line). Its slope is b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)², the ratio of the covariation of x and y to the variation in x. The intercept follows from the fact that the least-squares line always passes through the centroid (x̄, ȳ): b₀ = ȳ − b₁·x̄. The slope tells you the expected change in y per one-unit increase in x; the intercept is the fitted value of y when x = 0 (meaningful only if x = 0 is within or near the data range).
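As a rough illustration, the slope and intercept formulas above can be computed in a few lines of plain Python. This is only a sketch: the function name `fit_line` and the five data points are made up for the example and are not the calculator's own code.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the least-squares line y_hat = b0 + b1 * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Covariation of x and y, and variation in x
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

# Made-up illustrative data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
print(f"slope = {b1:.3f}, intercept = {b0:.3f}")  # slope = 0.600, intercept = 2.200
```

Here x̄ = 3 and ȳ = 4, so the fitted line 2.2 + 0.6·3 = 4 does pass through the centroid.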
The Pearson correlation coefficient r = Σ(x − x̄)(y − ȳ) / √(Σ(x − x̄)²·Σ(y − ȳ)²) measures the strength and direction of the linear relationship, ranging from −1 to +1. A value near +1 means a strong positive linear relationship, near −1 a strong negative one, and near 0 little or no linear relationship. The sign of r always matches the sign of the slope. Importantly, r captures only LINEAR association — a strong curved relationship can have an r near zero, which is why you should always look at the scatter plot.
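The point about curved relationships can be checked with a small sketch. The helper name `pearson_r` and both datasets are invented for illustration. The second dataset is an exact parabola, y = x², on x values symmetric around zero.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when x or y has no variation")
    return sxy / math.sqrt(sxx * syy)

# Roughly linear, made-up data
r_line = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])

# A perfect curve: y is fully determined by x, yet there is no linear trend
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x ** 2 for x in curve_x]
r_curve = pearson_r(curve_x, curve_y)

print(f"r (linear-ish data) = {r_line:.3f}")  # r (linear-ish data) = 0.775
print(f"r (parabola)        = {r_curve:.3f}")  # r (parabola)        = 0.000
```

The parabola gives r = 0 even though y depends entirely on x, which is exactly the case a scatter plot would expose.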
For a simple linear regression with an intercept, R² equals r² and represents the proportion of the total variation in y that is explained by the regression on x. An R² of 0.95 means 95% of the variability in y is accounted for by the linear relationship with x, leaving 5% unexplained (random scatter or other factors). R² ranges from 0 to 1, and a higher value indicates a closer fit. A high R² does not prove causation, nor does it guarantee the model is appropriate: outliers, nonlinearity, and a narrow x-range can all distort it. Always pair R² with a residual check.
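One way to see the "proportion explained" reading is to compute R² from the residuals, as 1 − SS_res / SS_tot, and compare it with r². The sketch below does both on made-up data. The function name `r_squared_two_ways` is hypothetical, and the equality it demonstrates assumes a straight-line fit with an intercept.

```python
def r_squared_two_ways(xs, ys):
    """Return (R² from residuals, r² from the correlation formula)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)  # total variation in y
    if sxx == 0 or ss_tot == 0:
        raise ValueError("x and y must both vary")
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    # Unexplained variation: squared vertical gaps between points and line
    ss_res = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    from_residuals = 1 - ss_res / ss_tot
    from_correlation = sxy ** 2 / (sxx * ss_tot)
    return from_residuals, from_correlation

# Made-up illustrative data
r2_resid, r2_corr = r_squared_two_ways([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(f"{r2_resid:.3f} {r2_corr:.3f}")  # 0.600 0.600
```

For this data SS_tot = 6 and SS_res = 2.4, so 60% of the variation in y is explained by the line and 40% is left in the residuals.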
Once fitted, the line predicts y for a new x as ŷ = b₀ + b₁·x. Prediction is most reliable for x values inside the range of your observed data (interpolation). Predicting far outside that range (extrapolation) is risky: the linear relationship may not hold beyond where it was observed, and small slope errors are magnified at distant x values. In industrial and quality work, regression is used for calibration curves, learning curves, demand-versus-price relationships, tool-wear trends, and as the engine behind least-squares trend forecasting.
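A prediction helper can make the interpolation/extrapolation distinction explicit by reporting whether the requested x lies outside the observed range. The sketch below is one possible approach, not the calculator's implementation. The name `predict`, the coefficients, and the x-range are assumed values chosen for illustration.

```python
def predict(b0, b1, x, x_min, x_max):
    """Return (y_hat, is_extrapolation) for a fitted line y_hat = b0 + b1 * x.

    x_min and x_max are the smallest and largest x values in the data
    the line was fitted to.
    """
    y_hat = b0 + b1 * x
    is_extrapolation = not (x_min <= x <= x_max)
    return y_hat, is_extrapolation

# Assumed coefficients from a fit on data with x between 1 and 5
b0, b1 = 2.2, 0.6
y_in, flag_in = predict(b0, b1, 3.5, x_min=1, x_max=5)
y_out, flag_out = predict(b0, b1, 20, x_min=1, x_max=5)

print(f"x = 3.5 -> {y_in:.2f} (extrapolation: {flag_in})")   # x = 3.5 -> 4.30 (extrapolation: False)
print(f"x = 20  -> {y_out:.2f} (extrapolation: {flag_out})")  # x = 20  -> 14.20 (extrapolation: True)
```

The flag does not make the extrapolated value any more trustworthy; it only signals that the prediction rests on the assumption that the line continues beyond the data.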
Correlation r measures the strength and direction of a linear relationship and ranges from −1 to +1. R² is simply r squared, so it ranges from 0 to 1 and is interpreted as the fraction of the variation in y explained by x. r tells you direction (positive or negative); R² tells you explanatory power but loses the sign.
A high R² does not mean that x causes y. Regression and correlation describe association, not causation. A high R² means x is a good linear predictor of y in your data, but the real driver could be a third variable, or the relationship could be coincidental. Causation requires controlled experiments or strong domain reasoning, not just a good fit.
A scatter plot is essential because r and R² measure only linear association. Anscombe’s quartet is the famous example: four very different datasets, including a smooth curve and one dominated by a single outlier, that share nearly identical slope, intercept, and r. The scatter plot reveals nonlinearity, outliers, and clustering that the summary statistics hide.
Be cautious when predicting beyond the range of your data. Predicting within the observed x-range (interpolation) is generally safe if the linear model fits. Predicting outside it (extrapolation) assumes the same straight-line relationship continues, which often fails because physical limits, saturation, or new regimes can change the behavior. Treat extrapolated values as rough estimates at best.
The slope is the estimated change in y for each one-unit increase in x. For example, if y is cost in dollars and x is units produced, a slope of 4.2 means each additional unit adds about $4.20 of cost on average within the data range. Its sign shows whether y rises or falls as x increases.