Last Updated on August 22, 2019
R is the most popular platform for applied machine learning. When you want to get serious with applied machine learning you will find your way into R.
It is very powerful because so many machine learning algorithms are provided. A problem is that the algorithms are all provided by third parties, which makes their usage very inconsistent. This slows you down, a lot, because you have to learn how to model data and how to make predicts with each algorithm in each package, again and again.
In this post, you will discover how you can overcome this difficulty with machine learning algorithms in R, with pre-prepared recipes that follow a consistent structure.
Discover how to prepare data, fit machine learning models and evaluate their predictions in R with my new book, including 14 step-by-step tutorials, 3 projects, and full source code.
Let’s get started.
What You Will Learn
Lots of Algorithms, Little Consistency
The R ecosystem is enormous. Open source third party packages provide this power, allowing academics and professionals to get the most powerful algorithms available into the hands of us practitioners.
A problem that I experienced when starting out with R was that the usage to each algorithm differs from package to package. This inconsistency also extends to the documentation, with some providing worked example for classification but ignoring regression and others not providing examples at all.
All this means that if you want to try a few different algorithms from different packages, you must spend time figuring out how to fit and make predictions with each method in turn. This takes a lot of time, especially with the spotty examples and vignettes.
I summarize these difficulties as follows:
- Inconsistent: Algorithm implementations vary in the way a model is fit to data and the way a model is used to generate predictions. This means that you have to study each package and each algorithm implementation just to put together a working example, let alone adapt it to your problem.
- Decentralized: Algorithm are implemented across different packages and it can be hard to locate which packages provide an implementation of the algorithm you need, let alone which package provides the most popular implementation. Additionally, the documentation for one package may be spread across multiple help files, website and vignettes. This means you have to do a lot of searching just to locate an algorithm, let alone compile a list of algorithms from which you can choose.
- Incomplete: Algorithm documentation is almost always partially complete. An example usage may or may not be provided, if it is, it may or may not be demonstrated on a canonical problem. This means you have no obvious way to quickly understand how to use an implementation.
- Complexity: Algorithms vary in their complexity of implementation and description. This can take it’s toll on you as you jump from package to package. You want to focus on how to get the most from the algorithm and its parameters, and not burn energy on parsing reams of PDFs just to get a hello world.
Need more Help with R for Machine Learning?
Take my free 14-day email course and discover how to use R on your project (with sample code).
Click to sign-up and also get a free PDF Ebook version of the course.
Start Your FREE Mini-Course Now!
Build an Algorithm Recipe Book
You could get a lot more done if you had an algorithm recipe book you could look up and find examples of machine learning algorithms in R that you could copy-and-paste and adapt for your specific problem.
For this the recipe book approach to work, it would have to confirm to some key principles:
- Standalone: Each code example must be standalone, complete and ready to execute.
- Just Code: Each recipe must focuses on the code with minimal exposition on machine learning theory (there are amazing books for that, don’t mix these concerns).
- Simplicity: Each recipe must be presented in the most common use case, which is probably what you are looking to do when you look it up. You want to consult the official documentation only to look up the parameters so that you can get the most from the algorithm.
- Portable: All recipes must be provided in a single reference that can be searched and printed, browsed and looked up (a recipe book).
- Consistent: All code examples are presented consistently and follow the same code structure and style conventions (load data, fit model, make prediction).
An algorithm recipe book would give you the ability to wield the R platform for machine learning and solve complex problems.
- You could apply algorithms and features directly.
- You could discover the code you need.
- You could understand what is going on with a glance.
- You could own the recipes and use and organize them the way you want.
- You could get the most out of the algorithms and features.
Algorithm Recipes in R
I have already blocked out examples of what these recipes could look like.
I have provided example machine learning recipes in R, grouped by algorithm type or similarity, as follows:
- Linear Regression: Ordinary Least Squares Regression, Stepwise Regression, Principal Component Regression and Partial Least Squares Regression.
- Penalized-Linear Regression: Ridge Regression, Least Absolute Shrinkage and Selection Operator (LASSO) and ElasticNet
- Non-Linear Regression: Multivariate Adaptive Regression Spines (MARS), Support Vector Machine (SVM), k-Nearest Neighbor (kNN) and Neural Network.
- Non-Linear Decision Tree Regression: Classification and Regression Trees (CART), Conditional Decision Trees, Modal Trees, Rule Systems, Bagging CART, Random Forest, Gradient Boosted Machines (GBM) and Cubist.
- Linear Classification: Logistic Regression, Linear Discriminant Analysis (LDA) and Partial Least Squares Discriminant Analysis.
- Non-Linear Classification: Mixture Discriminant Analysis (MDA), Quadratic Discriminant Analysis (QDA), Regularized Discriminant Analysis (RDA), Neural Network, Flexible Discriminant Analysis (FDA), Support Vector Machine (SVM), k-Nearest Neighbor (kNN) and Naive Bayes.
- Non-Linear Decision Tree Classification: Classification and Regression Trees (CART), C4.5, PART, Bagging CART, Random Forest, Gradient Boosted Machines (GBM) and Boosted C5.0.
I think these recipes really fit the bill of this mission.
Summary
In this post, you discovered the popularity and power of machine learning in R, but the cost of that power is the time required to harness it.
You discovered that one approach to addressing this limitation in R is to devise a recipe book of complete and standalone machine learning algorithms that you can look up and apply to your specific problems, as needed.
Finally, you saw examples of machine learning algorithm recipes in R for a wide range of algorithm type.
If you found this approach useful, I’d love to hear about it.
Discover Faster Machine Learning in R!
Develop Your Own Models in Minutes
…with just a few lines of R code
Discover how in my new Ebook:
Machine Learning Mastery With R
Covers self-study tutorials and end-to-end projects like:
Loading data, visualization, build models, tuning, and much more…
Finally Bring Machine Learning To Your Own Projects
Skip the Academics. Just Results.
See What’s Inside
About Jason Brownlee
Jason Brownlee, PhD is a machine learning specialist who teaches developers how to get results with modern machine learning methods via hands-on tutorials.