Last Updated on August 22, 2019

In this post you will discover 4 recipes for linear regression for the R platform.

You can copy and paste the recipes in this post to get a jump-start on your own problem, or to learn and practice linear regression in R.

Let’s get started.

Each example in this post uses the longley dataset provided in the datasets package that comes with R. The longley dataset contains 7 economic variables observed yearly from 1947 to 1962 (16 observations). The recipes use six of them to predict the seventh, the number of people employed.
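Before fitting anything, it can help to look at the data. The following is a minimal sketch that uses only base R; the variable name `predictors` is introduced here for illustration. It prints the shape and structure of the data frame and the pairwise correlations among the six predictors, which is useful background for the later recipes aimed at correlated predictors.

```r
# Inspect the longley data frame that ships with R's datasets package
data(longley)

dim(longley)   # number of rows (years) and columns (variables)
str(longley)   # column names and types

# Pairwise correlations among the six predictors (every column except Employed)
predictors <- longley[, names(longley) != "Employed"]
round(cor(predictors), 2)
```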

## Ordinary Least Squares Regression

Ordinary Least Squares (OLS) regression is a linear model that seeks the set of coefficients for a line or hyperplane that minimises the sum of the squared errors.

    # load data
    data(longley)
    # fit model using all other variables as predictors
    fit <- lm(Employed ~ ., longley)
    # summarize the fit
    summary(fit)
    # make predictions on the training data
    predictions <- predict(fit, longley)
    # summarize accuracy (mean squared error)
    mse <- mean((longley$Employed - predictions)^2)
    print(mse)

Learn more about the **lm** function and the stats package.
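To make the least-squares objective concrete, here is a small sketch that solves for the coefficients directly from the design matrix and checks them against `lm`. The helper `sse` and the names `X`, `y` and `beta` are introduced here for illustration. A QR decomposition is used instead of the textbook normal equations because the longley predictors are strongly collinear, which makes inverting the cross-product matrix numerically fragile.

```r
# Solve the least-squares problem by hand and compare with lm()
data(longley)

# Design matrix (with intercept column) and response vector
X <- model.matrix(Employed ~ ., longley)
y <- longley$Employed

# Least-squares coefficients via QR decomposition
beta <- qr.coef(qr(X), y)

# The quantity OLS minimises: the sum of squared errors for coefficients b
sse <- function(b) sum((y - X %*% b)^2)

# Reference fit
fit <- lm(Employed ~ ., longley)

# The hand-computed coefficients should agree with lm()
print(all.equal(unname(beta), unname(coef(fit))))

# Nudging the coefficients away from the solution increases the error
print(sse(beta))
print(sse(beta * 1.001))
```

Dividing `sse(beta)` by the number of rows gives the same training MSE that the recipe above prints.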

## Stepwise Linear Regression

Stepwise Linear Regression is a method that uses linear regression to discover which subset of attributes in the dataset results in the best performing model. It is step-wise because each iteration of the method changes the set of attributes and fits a model to evaluate the performance of that set.

    # load data
    data(longley)
    # fit the full model as the starting point
    base <- lm(Employed ~ ., longley)
    # summarize the fit
    summary(base)
    # perform step-wise feature selection
    fit <- step(base)
    # summarize the selected model
    summary(fit)
    # make predictions on the training data
    predictions <- predict(fit, longley)
    # summarize accuracy (mean squared error)
    mse <- mean((longley$Employed - predictions)^2)
    print(mse)

Learn more about the **step** function and the stats package.
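To see what the search actually did, the sketch below inspects the model returned by `step`. By default `step` compares candidate models by AIC and, when no `scope` is given, searches backward from the starting model. The argument `trace = 0` only suppresses the per-step printout. Note that the MSE in the recipe above is measured on the training data, where a model with fewer predictors cannot fit better than the full model, so AIC is the more relevant comparison here.

```r
# Inspect the result of step-wise selection
data(longley)

base <- lm(Employed ~ ., longley)
fit <- step(base, trace = 0)   # trace = 0 hides the step-by-step log

# Formula of the selected model
print(formula(fit))

# Record of the terms removed at each step and the resulting AIC
print(fit$anova)

# AIC of the full model versus the selected model (lower is better)
print(c(full = AIC(base), selected = AIC(fit)))
```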

## Principal Component Regression

Principal Component Regression (PCR) fits a linear regression model on the principal components of the predictors, as produced by a Principal Component Analysis (PCA), rather than on the predictors themselves. PCR is useful when the data has highly correlated predictors.

    # load the package
    library(pls)
    # load data
    data(longley)
    # fit model with cross-validation
    fit <- pcr(Employed ~ ., data = longley, validation = "CV")
    # summarize the fit
    summary(fit)
    # make predictions using all 6 components
    predictions <- predict(fit, longley, ncomp = 6)
    # summarize accuracy (mean squared error)
    mse <- mean((longley$Employed - predictions)^2)
    print(mse)

Learn more about the **pcr** function and the pls package.
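The recipe above predicts with all 6 components, in which case PCR should reproduce the ordinary least squares fit. The sketch below checks that and uses the cross-validated error from `RMSEP` to suggest a smaller number of components. The seed value and the names `cv_error` and `best_ncomp` are assumptions made for illustration, and because the cross-validation segments are random, the suggested number may change with the seed.

```r
# Choose the number of components from the cross-validated error
library(pls)
data(longley)

set.seed(7)   # arbitrary seed so the random CV segments are repeatable
fit <- pcr(Employed ~ ., data = longley, validation = "CV")

# Cross-validated RMSEP for 0 (intercept only) through 6 components
cv <- RMSEP(fit, estimate = "CV")
print(cv)

# Number of components with the lowest cross-validated error
cv_error <- cv$val[1, 1, ]
best_ncomp <- unname(which.min(cv_error)) - 1
print(best_ncomp)

# With all 6 components, PCR predictions should match the OLS fitted values
ols <- lm(Employed ~ ., longley)
pcr_all <- drop(predict(fit, longley, ncomp = 6))
print(all.equal(unname(pcr_all), unname(fitted(ols)), tolerance = 1e-6))
```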

## Partial Least Squares Regression

Partial Least Squares (PLS) Regression fits a linear model in a lower-dimensional projection of the predictors, choosing the projection directions using the response as well as the predictors. Like PCR, PLS is appropriate for data with highly correlated predictors.

    # load the package
    library(pls)
    # load data
    data(longley)
    # fit model with cross-validation
    fit <- plsr(Employed ~ ., data = longley, validation = "CV")
    # summarize the fit
    summary(fit)
    # make predictions using all 6 components
    predictions <- predict(fit, longley, ncomp = 6)
    # summarize accuracy (mean squared error)
    mse <- mean((longley$Employed - predictions)^2)
    print(mse)

Learn more about the **plsr** function and the pls package.
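Because PLS builds its components with the response in view while PCR does not, the two methods can differ noticeably when only a few components are kept. The sketch below tabulates the training MSE of both for 1 to 6 components. The helper `train_mse` and the data frame `comparison` are hypothetical names introduced here. These are training errors only, so treat the table as a way to see how the methods differ, not as an estimate of predictive accuracy.

```r
# Compare training MSE of PCR and PLS for each number of components
library(pls)
data(longley)

pcr_fit <- pcr(Employed ~ ., data = longley)
pls_fit <- plsr(Employed ~ ., data = longley)

# Training MSE of a fitted pls-package model when using k components
train_mse <- function(model, k) {
  predicted <- drop(predict(model, longley, ncomp = k))
  mean((longley$Employed - predicted)^2)
}

comparison <- data.frame(
  ncomp = 1:6,
  pcr = sapply(1:6, function(k) train_mse(pcr_fit, k)),
  pls = sapply(1:6, function(k) train_mse(pls_fit, k))
)
print(comparison)
```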

## Summary

In this post you discovered 4 recipes for creating linear regression models in R and making predictions using those models.

Chapter 6 of Applied Predictive Modeling by Kuhn and Johnson provides an excellent introduction to linear regression with R for beginners. Practical Regression and Anova using R (PDF) by Faraway provides a more in-depth treatment.

#### About Jason Brownlee

Jason Brownlee, PhD is a machine learning specialist who teaches developers how to get results with modern machine learning methods via hands-on tutorials.