Linear Regression in R

Last Updated on August 22, 2019

In this post, you will discover 4 recipes for linear regression in R.

You can copy and paste the recipes in this post to get a jump-start on your own problem or to learn and practice linear regression in R.

Let’s get started.

Each example in this post uses the longley dataset provided in the datasets package that comes with R. The longley dataset contains 7 economic variables observed yearly from 1947 to 1962. Six of them are used as predictors of the seventh, the number of people employed.
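
Before fitting anything, it can help to look at the data. The short sketch below only uses base R functions; the correlation table is my addition, included because the later recipes are aimed at correlated predictors.

```r
# Load the dataset and inspect its shape and columns
data(longley)
dim(longley)      # 16 rows (one per year) by 7 columns
names(longley)    # six predictors plus the response, Employed

# Pairwise correlations; several predictors are strongly correlated
round(cor(longley), 2)
```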

Ordinary Least Squares Regression

Ordinary Least Squares (OLS) regression is a linear model that seeks to find a set of coefficients for a line/hyper-plane that minimise the sum of the squared errors.

# load data
data(longley)
# fit model
fit <- lm(Employed~., longley)
# summarize the fit
summary(fit)
# make predictions
predictions <- predict(fit, longley)
# summarize accuracy (mean squared error on the training data)
mse <- mean((longley$Employed - predictions)^2)
print(mse)
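
To make the least-squares criterion concrete, here is a minimal sketch with a single predictor, where the minimising coefficients have a well-known closed form. Using GNP alone is my choice for illustration (the recipe above uses every predictor), and slope, intercept and simple_fit are names introduced here.

```r
data(longley)
x <- longley$GNP
y <- longley$Employed

# Closed-form least-squares estimates for one predictor
slope <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

# The same model fitted with lm
simple_fit <- lm(Employed ~ GNP, data = longley)

print(c(intercept, slope))
print(coef(simple_fit))

# Check that the two sets of coefficients agree up to floating-point error
print(all.equal(unname(coef(simple_fit)), c(intercept, slope)))
```

With several predictors the same idea applies, but the solution is computed with matrix methods, which is what lm does for you.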

Learn more about the lm function and the stats package.

Stepwise Linear Regression

Stepwise Linear Regression is a method that uses linear regression to discover which subset of attributes in the dataset results in the best-performing model. It is stepwise because each iteration changes the set of attributes and fits a new model to evaluate that set. The step function in R compares the candidate models by AIC.

# load data
data(longley)
# fit model
base <- lm(Employed~., longley)
# summarize the fit
summary(base)
# perform step-wise feature selection
fit <- step(base)
# summarize the selected model
summary(fit)
# make predictions
predictions <- predict(fit, longley)
# summarize accuracy (mean squared error on the training data)
mse <- mean((longley$Employed - predictions)^2)
print(mse)
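
The recipe above prints a long log while it searches. As a rough sketch of how to see only the outcome, the code below silences the log with trace = 0 and lists which predictors were kept and which were dropped. The variable name kept is mine; with no scope argument, step should default to backward elimination, so the search starts from the full model.

```r
data(longley)
base <- lm(Employed ~ ., data = longley)

# trace = 0 suppresses the per-step log
fit <- step(base, trace = 0)

# Predictors kept by the search
kept <- attr(terms(fit), "term.labels")
print(kept)

# Predictors that were dropped
print(setdiff(attr(terms(base), "term.labels"), kept))

# AIC of the full model and of the selected model
print(c(full = AIC(base), selected = AIC(fit)))
```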

Learn more about the step function and the stats package.

Principal Component Regression

Principal Component Regression (PCR) fits a linear regression model on the components produced by a Principal Component Analysis (PCA) of the predictors, rather than on the predictors themselves. PCR is useful when the data has highly correlated predictors.

# load the package
library(pls)
# load data
data(longley)
# fit model
fit <- pcr(Employed~., data=longley, validation="CV")
# summarize the fit
summary(fit)
# make predictions using all 6 components
predictions <- predict(fit, longley, ncomp=6)
# summarize accuracy (mean squared error on the training data)
mse <- mean((longley$Employed - predictions)^2)
print(mse)
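
To show what PCR is doing, the sketch below rebuilds the idea by hand with prcomp and lm from base R instead of the pls package. It is an illustration under my own assumptions (predictors centred but not scaled, and helper names such as pcr_mse invented here), not a description of how pcr is implemented.

```r
data(longley)
X <- as.matrix(longley[, names(longley) != "Employed"])
y <- longley$Employed

# PCA of the centred, unscaled predictors
pca <- prcomp(X, center = TRUE, scale. = FALSE)

# Regress the response on the first k component scores
# and return the training mean squared error
pcr_mse <- function(k) {
  scores <- pca$x[, seq_len(k), drop = FALSE]
  mean(residuals(lm(y ~ scores))^2)
}
mse_by_k <- sapply(1:6, pcr_mse)
print(round(mse_by_k, 4))

# With all six components the fit should match ordinary least squares
ols_mse <- mean(residuals(lm(Employed ~ ., data = longley))^2)
print(all.equal(mse_by_k[6], ols_mse, tolerance = 1e-6))
```

Because the recipe above predicts with ncomp=6, which is every available component, its training error should be the same as the plain OLS fit. Fewer components trade some training accuracy for a simpler model.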

Learn more about the pcr function and the pls package.

Partial Least Squares Regression

Partial Least Squares (PLS) Regression fits a linear model on a small number of components that are projections of the predictors. Unlike the components used in PCR, these are chosen with the response in mind. Like PCR, PLS is appropriate for data with highly correlated predictors.

# load the package
library(pls)
# load data
data(longley)
# fit model
fit <- plsr(Employed~., data=longley, validation="CV")
# summarize the fit
summary(fit)
# make predictions using all 6 components
predictions <- predict(fit, longley, ncomp=6)
# summarize accuracy (mean squared error on the training data)
mse <- mean((longley$Employed - predictions)^2)
print(mse)
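
As a rough illustration of how PLS differs from PCR, the sketch below builds only the first PLS component by hand in base R, using the usual single-response construction: the weight vector points in the direction where the centred predictors covary most with the response. The names w, t1 and pls1_fit are mine, and this is not a claim about the algorithm plsr runs internally.

```r
data(longley)
X <- as.matrix(longley[, names(longley) != "Employed"])
y <- longley$Employed

# Centre the predictors and the response
Xc <- scale(X, center = TRUE, scale = FALSE)
yc <- y - mean(y)

# First PLS weight vector: unit-length direction of maximal covariance with y
w <- drop(crossprod(Xc, yc))
w <- w / sqrt(sum(w^2))

# First PLS score: projection of the centred predictors onto w
t1 <- drop(Xc %*% w)

# One-component fit: regress the response on the score
pls1_fit <- lm(y ~ t1)
pls1_mse <- mean(residuals(pls1_fit)^2)

print(round(w, 3))
print(pls1_mse)
```

The first principal component used by PCR is computed from the predictors alone, whereas w above depends on the response as well.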

Learn more about the plsr function and the pls package.

Summary

In this post you discovered 4 recipes for creating linear regression models in R and making predictions using those models.

Chapter 6 of Applied Predictive Modeling by Kuhn and Johnson provides an excellent introduction to linear regression with R for beginners. Practical Regression and Anova using R (PDF) by Faraway provides a more in-depth treatment.

About Jason Brownlee

Jason Brownlee, PhD is a machine learning specialist who teaches developers how to get results with modern machine learning methods via hands-on tutorials.
