
How to Build an Ensemble Of Machine Learning Algorithms in R

Last Updated on August 22, 2019

Ensembles can give you a boost in accuracy on your dataset.

In this post you will discover how you can create three of the most powerful types of ensembles in R.

This case study will step you through Boosting, Bagging and Stacking and show you how you can continue to ratchet up the accuracy of the models on your own datasets.

Discover how to prepare data, fit machine learning models and evaluate their predictions in R with my new book, including 14 step-by-step tutorials, 3 projects, and full source code.

Let’s get started.

Build an Ensemble Of Machine Learning Algorithms in R
Photo by Barbara Hobbs, some rights reserved.

Increase The Accuracy Of Your Models

It can take time to find well performing machine learning algorithms for your dataset. This is because of the trial and error nature of applied machine learning.

Once you have a shortlist of accurate models, you can use algorithm tuning to get the most from each algorithm.

Another approach that you can use to increase accuracy on your dataset is to combine the predictions of multiple different models together.

This is called an ensemble prediction.

Combine Model Predictions Into Ensemble Predictions

The three most popular methods for combining the predictions from different models are:

  • Bagging. Building multiple models (typically of the same type) from different subsamples of the training dataset.
  • Boosting. Building multiple models (typically of the same type) each of which learns to fix the prediction errors of a prior model in the chain.
  • Stacking. Building multiple models (typically of differing types) and a supervisor model that learns how best to combine the predictions of the primary models.
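As a rough illustration of the bagging idea, here is a toy sketch in base R (not the caret implementation used below): bootstrap resamples each train a trivially simple "model", and their predictions are combined by majority vote. The class labels are made up for illustration.

```r
# Toy sketch of bagging in base R: bootstrap resamples of a training set,
# one simple "model" per resample (here: predict the majority class seen),
# combined by majority vote. Illustrative only -- caret does this properly.
set.seed(7)
y <- c(rep("good", 6), rep("bad", 4))   # tiny made-up class labels

votes <- replicate(25, {
  resample <- sample(y, replace = TRUE)   # bootstrap sample
  names(which.max(table(resample)))       # each "model" votes
})

prediction <- names(which.max(table(votes)))  # majority vote across models
prediction
```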

This post will not explain each of these methods. It assumes you are generally familiar with machine learning algorithms and ensemble methods and that you are looking for information on how to create ensembles with R.


Ensemble Machine Learning in R

You can create ensembles of machine learning algorithms in R.

There are three main techniques that you can use to create an ensemble of machine learning algorithms in R: Boosting, Bagging and Stacking. In this section, we will look at each in turn.

Before we start building ensembles, let’s define our test set-up.

Test Dataset

All of the examples of ensemble predictions in this case study will use the ionosphere dataset.

This dataset is available from the UCI Machine Learning Repository. It describes radar returns from the ionosphere and whether or not each return shows evidence of structure. The problem is a binary classification task with 351 instances and 34 numeric attributes.

Let’s load the libraries and the dataset.

# Load libraries
library(mlbench)
library(caret)
library(caretEnsemble)

# Load the dataset
data(Ionosphere)
dataset <- Ionosphere
dataset <- dataset[,-2]
dataset$V1 <- as.numeric(as.character(dataset$V1))


Note that the first attribute was a factor (0,1) and has been transformed to be numeric for consistency with all of the other numeric attributes. Also note that the second attribute is a constant and has been removed.
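The as.character() step in the conversion above matters: calling as.numeric() directly on a factor returns the internal level codes, not the values. A quick base-R illustration:

```r
# Converting a factor of "0"/"1" strings to numbers:
f <- factor(c("0", "1", "1", "0"))

as.numeric(f)                # level codes: 1 2 2 1 (not what we want)
as.numeric(as.character(f))  # actual values: 0 1 1 0
```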

Here is a sneak-peek at the first few rows of the ionosphere dataset.

> head(dataset)
V1 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15
1 1 0.99539 -0.05889 0.85243 0.02306 0.83398 -0.37708 1.00000 0.03760 0.85243 -0.17755 0.59755 -0.44945 0.60536
2 1 1.00000 -0.18829 0.93035 -0.36156 -0.10868 -0.93597 1.00000 -0.04549 0.50874 -0.67743 0.34432 -0.69707 -0.51685
3 1 1.00000 -0.03365 1.00000 0.00485 1.00000 -0.12062 0.88965 0.01198 0.73082 0.05346 0.85443 0.00827 0.54591
4 1 1.00000 -0.45161 1.00000 1.00000 0.71216 -1.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 -1.00000
5 1 1.00000 -0.02401 0.94140 0.06531 0.92106 -0.23255 0.77152 -0.16399 0.52798 -0.20275 0.56409 -0.00712 0.34395
6 1 0.02337 -0.00592 -0.09924 -0.11949 -0.00763 -0.11824 0.14706 0.06637 0.03786 -0.06302 0.00000 0.00000 -0.04572
V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28
1 -0.38223 0.84356 -0.38542 0.58212 -0.32192 0.56971 -0.29674 0.36946 -0.47357 0.56811 -0.51171 0.41078 -0.46168
2 -0.97515 0.05499 -0.62237 0.33109 -1.00000 -0.13151 -0.45300 -0.18056 -0.35734 -0.20332 -0.26569 -0.20468 -0.18401
3 0.00299 0.83775 -0.13644 0.75535 -0.08540 0.70887 -0.27502 0.43385 -0.12062 0.57528 -0.40220 0.58984 -0.22145
4 0.14516 0.54094 -0.39330 -1.00000 -0.54467 -0.69975 1.00000 0.00000 0.00000 1.00000 0.90695 0.51613 1.00000
5 -0.27457 0.52940 -0.21780 0.45107 -0.17813 0.05982 -0.35575 0.02309 -0.52879 0.03286 -0.65158 0.13290 -0.53206
6 -0.15540 -0.00343 -0.10196 -0.11575 -0.05414 0.01838 0.03669 0.01519 0.00888 0.03513 -0.01535 -0.03240 0.09223
V29 V30 V31 V32 V33 V34 Class
1 0.21266 -0.34090 0.42267 -0.54487 0.18641 -0.45300 good
2 -0.19040 -0.11593 -0.16626 -0.06288 -0.13738 -0.02447 bad
3 0.43100 -0.17365 0.60436 -0.24180 0.56045 -0.38238 good
4 1.00000 -0.20099 0.25682 1.00000 -0.32382 1.00000 bad
5 0.02431 -0.62197 -0.05707 -0.59573 -0.04608 -0.65697 good
6 -0.07859 0.00732 0.00000 0.00000 -0.00039 0.12011 bad



For more information, see the description of the Ionosphere dataset on the UCI Machine Learning Repository.

See this summary of published world-class results on the dataset.

1. Boosting Algorithms

We can look at two of the most popular boosting machine learning algorithms:

  • C5.0
  • Stochastic Gradient Boosting

Below is an example of the C5.0 and Stochastic Gradient Boosting (using the gbm implementation) algorithms in R. Both algorithms include parameters that are not tuned in this example.

# Example of Boosting Algorithms
control <- trainControl(method="repeatedcv", number=10, repeats=3)
seed <- 7
metric <- "Accuracy"
# C5.0
set.seed(seed)
fit.c50 <- train(Class~., data=dataset, method="C5.0", metric=metric, trControl=control)
# Stochastic Gradient Boosting
set.seed(seed)
fit.gbm <- train(Class~., data=dataset, method="gbm", metric=metric, trControl=control, verbose=FALSE)
# summarize results
boosting_results <- resamples(list(c5.0=fit.c50, gbm=fit.gbm))
summary(boosting_results)
dotplot(boosting_results)


We can see that the C5.0 algorithm produces a more accurate model with an accuracy of 94.58%.

Models: c5.0, gbm
Number of resamples: 30

Accuracy
       Min. 1st Qu. Median   Mean 3rd Qu. Max. NA's
c5.0 0.8824  0.9143 0.9437 0.9458  0.9714    1    0
gbm  0.8824  0.9143 0.9429 0.9402  0.9641    1    0


Boosting Machine Learning Algorithms in R

Learn more about caret boosting models here: Boosting Models.

2. Bagging Algorithms

Let’s look at two of the most popular bagging machine learning algorithms:

  • Bagged CART
  • Random Forest

Below is an example of the Bagged CART and Random Forest algorithms in R. Both algorithms include parameters that are not tuned in this example.

# Example of Bagging algorithms
control <- trainControl(method="repeatedcv", number=10, repeats=3)
seed <- 7
metric <- "Accuracy"
# Bagged CART
set.seed(seed)
fit.treebag <- train(Class~., data=dataset, method="treebag", metric=metric, trControl=control)
# Random Forest
set.seed(seed)
fit.rf <- train(Class~., data=dataset, method="rf", metric=metric, trControl=control)
# summarize results
bagging_results <- resamples(list(treebag=fit.treebag, rf=fit.rf))
summary(bagging_results)
dotplot(bagging_results)


We can see that random forest produces a more accurate model with an accuracy of 93.25%.

Models: treebag, rf
Number of resamples: 30

Accuracy
          Min. 1st Qu. Median   Mean 3rd Qu. Max. NA's
treebag 0.8529  0.8946 0.9143 0.9183  0.9440    1    0
rf      0.8571  0.9143 0.9420 0.9325  0.9444    1    0


Bagging Machine Learning Algorithms in R

Learn more about caret bagging models here: Bagging Models.

3. Stacking Algorithms

You can combine the predictions of multiple caret models using the caretEnsemble package.

Given a list of caret models, the caretStack() function can be used to specify a higher-order model to learn how to best combine the predictions of sub-models together.

Let’s first look at creating 5 sub-models for the ionosphere dataset, specifically:

  • Linear Discriminant Analysis (LDA)
  • Classification and Regression Trees (CART)
  • Logistic Regression (via Generalized Linear Model or GLM)
  • k-Nearest Neighbors (kNN)
  • Support Vector Machine with a Radial Basis Kernel Function (SVM)

Below is an example that creates these 5 sub-models. Note the new helpful caretList() function provided by the caretEnsemble package for creating a list of standard caret models.

# Example of Stacking algorithms
# create submodels
control <- trainControl(method="repeatedcv", number=10, repeats=3, savePredictions=TRUE, classProbs=TRUE)
algorithmList <- c('lda', 'rpart', 'glm', 'knn', 'svmRadial')
set.seed(seed)
models <- caretList(Class~., data=dataset, trControl=control, methodList=algorithmList)
results <- resamples(models)
summary(results)
dotplot(results)


We can see that the SVM creates the most accurate model with an accuracy of 94.66%.

Models: lda, rpart, glm, knn, svmRadial
Number of resamples: 30

Accuracy
            Min. 1st Qu. Median   Mean 3rd Qu.   Max. NA's
lda       0.7714  0.8286 0.8611 0.8645  0.9060 0.9429    0
rpart     0.7714  0.8540 0.8873 0.8803  0.9143 0.9714    0
glm       0.7778  0.8286 0.8873 0.8803  0.9167 0.9722    0
knn       0.7647  0.8056 0.8431 0.8451  0.8857 0.9167    0
svmRadial 0.8824  0.9143 0.9429 0.9466  0.9722 1.0000    0


Comparison of Sub-Models for Stacking Ensemble in R

When we combine the predictions of different models using stacking, it is desirable that the predictions made by the sub-models have low correlation. This would suggest that the models are skillful but in different ways, allowing a new classifier to figure out how to get the best from each model for an improved score.

If the predictions of the sub-models were highly correlated (>0.75), then they would be making the same or very similar predictions most of the time, reducing the benefit of combining them.

# correlation between results
modelCor(results)
splom(results)


We can see that all pairs of predictions have generally low correlation. The two methods with the highest correlation between their predictions are Logistic Regression (GLM) and kNN at 0.517 correlation which is not considered high (>0.75).

                lda     rpart       glm       knn svmRadial
lda       1.0000000 0.2515454 0.2970731 0.5013524 0.1126050
rpart     0.2515454 1.0000000 0.1749923 0.2823324 0.3465532
glm       0.2970731 0.1749923 1.0000000 0.5172239 0.3788275
knn       0.5013524 0.2823324 0.5172239 1.0000000 0.3512242
svmRadial 0.1126050 0.3465532 0.3788275 0.3512242 1.0000000


Correlations Between Predictions Made By Sub-Models in Stacking Ensemble
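If you prefer to flag highly correlated pairs programmatically rather than by eye, you can scan the upper triangle of the correlation matrix against the 0.75 threshold. A small base-R sketch on a made-up matrix (in practice you would pass the output of modelCor(results) instead of m):

```r
# Flag model pairs whose prediction correlation exceeds a threshold.
# 'm' is a made-up symmetric correlation matrix standing in for modelCor(results).
m <- matrix(c(1.00, 0.50, 0.80,
              0.50, 1.00, 0.30,
              0.80, 0.30, 1.00), nrow = 3,
            dimnames = list(c("lda", "knn", "glm"), c("lda", "knn", "glm")))

high <- which(m > 0.75 & upper.tri(m), arr.ind = TRUE)  # pairs above 0.75
data.frame(model1      = rownames(m)[high[, "row"]],
           model2      = colnames(m)[high[, "col"]],
           correlation = m[high])
```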

Let’s combine the predictions of the classifiers using a simple linear model.

# stack using glm
stackControl <- trainControl(method="repeatedcv", number=10, repeats=3, savePredictions=TRUE, classProbs=TRUE)
set.seed(seed)
stack.glm <- caretStack(models, method="glm", metric="Accuracy", trControl=stackControl)
print(stack.glm)


We can see that we have lifted the accuracy to 94.99% which is a small improvement over using SVM alone. This is also an improvement over using random forest alone on the dataset, as observed above.

A glm ensemble of 2 base models: lda, rpart, glm, knn, svmRadial

Ensemble results:
Generalized Linear Model

1053 samples
   5 predictor
   2 classes: 'bad', 'good'

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 3 times)
Summary of sample sizes: 948, 947, 948, 947, 949, 948, ...
Resampling results

  Accuracy  Kappa     Accuracy SD  Kappa SD
  0.949996  0.891494  0.02121303   0.04600482


We can also use more sophisticated algorithms to combine predictions in an effort to tease out when best to use the different methods. In this case, we can use the random forest algorithm to combine the predictions.

# stack using random forest
set.seed(seed)
stack.rf <- caretStack(models, method="rf", metric="Accuracy", trControl=stackControl)
print(stack.rf)


We can see that this has lifted the accuracy to 96.26%, an impressive improvement over SVM alone.

A rf ensemble of 2 base models: lda, rpart, glm, knn, svmRadial

Ensemble results:
Random Forest

1053 samples
   5 predictor
   2 classes: 'bad', 'good'

No pre-processing
Resampling: Cross-Validated (10 fold, repeated 3 times)
Summary of sample sizes: 948, 947, 948, 947, 949, 948, ...
Resampling results across tuning parameters:

  mtry  Accuracy   Kappa      Accuracy SD  Kappa SD
  2     0.9626439  0.9179410  0.01777927   0.03936882
  3     0.9623205  0.9172689  0.01858314   0.04115226
  5     0.9591459  0.9106736  0.01938769   0.04260672

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was mtry = 2.
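The case study evaluates everything with cross-validation on the full dataset. To use a stacked model on genuinely unseen rows, you could hold out a portion of the data before training, then call predict() on the held-out part. Below is a minimal base-R sketch of such a split (the 80/20 proportion is an arbitrary choice, and the commented lines show where the caretList/caretStack calls from above would slot in):

```r
# Hold out roughly 20% of the 351 rows before training, then predict on them.
set.seed(7)
n <- 351                                      # rows in the ionosphere dataset
train_idx <- sample(n, size = round(0.8 * n)) # indices of the training rows

length(train_idx)      # 281 training rows
n - length(train_idx)  # 70 held-out rows

# With the caret objects from above you would then (sketch):
# models <- caretList(Class~., data=dataset[train_idx, ], trControl=control, methodList=algorithmList)
# stack.glm <- caretStack(models, method="glm", metric="Accuracy", trControl=stackControl)
# predictions <- predict(stack.glm, newdata=dataset[-train_idx, ])
```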



You Can Build Ensembles in R

You do not need to be an R programmer. You can copy and paste the sample code from this blog post to get started. Study the functions used in the examples using the built-in help in R.

You do not need to be a machine learning expert. Creating ensembles can be very complex if you are doing it from scratch. The caret and caretEnsemble packages allow you to start creating and experimenting with ensembles even if you don't have a deep understanding of how they work. Read up on each type of ensemble to get more out of them later.

You do not need to collect your own data. The data used in this case study came from the mlbench package. You can use standard machine learning datasets like this to learn, use and experiment with machine learning algorithms.

You do not need to write your own ensemble code. Some of the most powerful algorithms for creating ensembles are provided by R, ready to run. Use the examples in this post to get started right now. You can always adapt them to your specific cases or try out new ideas with custom code later.

Summary

In this post you discovered that you can use ensembles of machine learning algorithms to improve the accuracy of your models.

You discovered three types of ensembles of machine learning algorithms that you can build in R:

  • Boosting
  • Bagging
  • Stacking

You can use the code in this case study as a template on your current or next machine learning project in R.

Next Step

Did you work through the case study?

  1. Start your R interactive environment.
  2. Type or copy-paste all of the code in this case study.
  3. Take the time to understand each part of the case study using the help for R functions.

Do you have any questions about this case study or using ensembles in R? Leave a comment and ask and I will do my best to answer.
