Last Updated on June 7, 2016
You want to learn machine learning to have more opportunities at work or to get a job. You may already be working as a data scientist or machine learning engineer and looking to improve your skills.
It is about as easy to pigeonhole machine learning skills as it is programming skills (you can’t).
In business, a wide array of tasks require some skill in data mining and machine learning, from data-analysis work to full systems architecture and integration.
Nevertheless, there are common tasks and common skills that you will want to develop, just as there are for an aspiring software developer.
In this post we will look at 5 key areas where you might want to develop skills and the types of activities that you could take on to practice in those areas.
What You Will Learn
1. Machine Learning Foundations and Theory
You need to know the basics. This includes definitions and core principles.
You should know what supervised and unsupervised learning are, with examples of each. You should know what over-learning and under-learning are (you will more often see them called overfitting and underfitting).
You should also know the importance of estimating a predictive model's performance on unseen data, common methods for doing so and common problems.
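To make the idea of estimating performance on unseen data concrete, here is a minimal sketch of k-fold cross-validation written with only the Python standard library. It is one common method among several, not the only one. The toy labels, the function names and the majority-class baseline "model" are all assumptions made up for illustration.

```python
import random


def k_fold_indices(n_rows, k, seed=1):
    """Split row indices 0..n_rows-1 into k shuffled, near-equal folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def majority_class(train_labels):
    """Baseline 'model': always predict the most common training label."""
    return max(set(train_labels), key=train_labels.count)


def cross_validate(labels, k=5, seed=1):
    """Return the baseline's accuracy on each held-out fold."""
    scores = []
    for held_out in k_fold_indices(len(labels), k, seed):
        held = set(held_out)
        # Fit only on rows that are NOT in the held-out fold.
        train = [labels[i] for i in range(len(labels)) if i not in held]
        prediction = majority_class(train)
        # Score only on rows the model has never seen.
        correct = sum(1 for i in held_out if labels[i] == prediction)
        scores.append(correct / len(held_out))
    return scores


# Hypothetical toy labels: 70 rows of class "a" and 30 rows of class "b".
labels = ["a"] * 70 + ["b"] * 30
scores = cross_validate(labels, k=5)
mean_accuracy = sum(scores) / len(scores)
print("fold accuracies:", scores)
print("mean accuracy: %.3f" % mean_accuracy)
```

The point of the sketch is the separation: each fold is scored by a model that never saw it. Swapping the baseline for a real algorithm leaves the evaluation loop unchanged.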
You also need to know some of the theoretical underpinnings, including familiarity with notations and algorithm descriptions using probability theory, linear algebra and information theory.
You may need to dip into some introductory books and maybe the introductory sections of some textbooks. Ease in, and continue to tie what you learn to actual problems or datasets that interest you.
2. Machine Learning Algorithms
You need to know machine learning algorithms.
You can wave your arms and comment that this algorithm is good for that situation, but most of that is rubbish. Good results are found through lots of empirical testing of algorithms and algorithm parameters.
What you can and should learn is what algorithms are out there, what general classes they fall into, and how they work.
Read, study and even construct your own algorithm descriptions from multiple applied and theoretical sources.
Implement algorithms from scratch to familiarize yourself with the myriad of micro-decisions any given algorithm implementation must make to be usable.
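As an illustration of those micro-decisions, here is a minimal from-scratch sketch of k-nearest neighbors classification using only the Python standard library. The algorithm choice, the function names and the toy data are my assumptions for the example. The comments mark three decisions that a textbook description usually leaves open.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two equal-length numeric rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_rows, train_labels, query, k=3):
    """Predict a label for `query` by majority vote of its k nearest rows."""
    # Micro-decision 1: which distance measure? Euclidean is used here.
    distances = [(euclidean(row, query), i) for i, row in enumerate(train_rows)]
    # Micro-decision 2: how to break distance ties? Sorting the
    # (distance, index) pairs favours rows that appear earlier.
    distances.sort()
    nearest = [train_labels[i] for _, i in distances[:k]]
    # Micro-decision 3: how to break vote ties? Counter keeps first-seen
    # order for equal counts, so the closer neighbour's label wins.
    return Counter(nearest).most_common(1)[0][0]


# Hypothetical toy data: two well-separated groups of 2-D points.
train_rows = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),
              (6.0, 6.0), (7.0, 7.5), (6.5, 8.0)]
train_labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(train_rows, train_labels, (1.2, 1.4)))
print(knn_predict(train_rows, train_labels, (6.8, 7.0)))
```

None of these three choices is the "right" one. They are simply the kind of thing you only notice when you write the code yourself.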
Experiment with algorithms. Study their behaviors and the effects that their parameters have on them, generalized across multiple standard problem instances.
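A small experiment of this kind might look like the sketch below, which sweeps one parameter (k in a compact k-nearest neighbors classifier) across two synthetic problem instances. Everything here is an assumption for illustration: the 1-D problem generator, the noise levels, the split sizes and the values of k. Real experiments would use standard datasets and repeated resampling rather than a single split.

```python
import random
from collections import Counter


def make_problem(n, noise, seed):
    """Hypothetical 1-D problem: label is 1 when x > 0.5, then the
    label is flipped with probability `noise`."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label
        rows.append((x, label))
    return rows


def knn_accuracy(train, test, k):
    """Accuracy of a k-nearest neighbors vote on the test rows."""
    correct = 0
    for x, label in test:
        nearest = sorted(train, key=lambda row: abs(row[0] - x))[:k]
        vote = Counter(lbl for _, lbl in nearest).most_common(1)[0][0]
        correct += vote == label
    return correct / len(test)


problems = {
    "clean": make_problem(200, 0.0, seed=1),
    "noisy": make_problem(200, 0.2, seed=2),
}
k_values = [1, 5, 15]

results = {}
for name, rows in problems.items():
    # Rows are generated independently, so a simple slice is a fair split.
    train, test = rows[:150], rows[150:]
    for k in k_values:
        results[(name, k)] = knn_accuracy(train, test, k)
        print("%-6s k=%-3d accuracy=%.2f" % (name, k, results[(name, k)]))
```

Reading the printed table across problems, rather than on one dataset, is what lets you start to generalize about how a parameter behaves.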
3. Machine Learning Tools
You need to be able to get things done, and that requires tools.
You cannot realize everything from first principles every time you need it. That’s ridiculous.
You must learn what tools are out there, what you can use and when you should use them.
Learn a handful of machine learning tools and libraries, and use them in anger on standard datasets and competitions. Learn their features. Learn what algorithms they provide and the quirks of those implementations.
I recommend at least Weka, scikit-learn and R. I have a ton of blog posts with recipes, just search.
4. Machine Learning Problems
The sibling to machine learning algorithms is problems. They are twins and cannot be separated.
You must study machine learning problems. This includes case studies, such as results from competitions and in papers.
It also critically includes how to address problems: the machine learning problem-solving process, and how to work a problem end-to-end from description to presenting results.
Further, what tools do you use in the process, at which step, and how can you take the results from one step to the next? What are the criteria by which success is judged at each step?
If you come from an engineering background then the algorithms will come easily. The problem solving, on the other hand, requires study and hard work. You must become the scientist, and formulate and objectively test hypotheses.
Programming does not require this skill (well, fault finding would be an exception).
5. Staying Up-To-Date
You must stay up to date.
Sure, this means running some deep learning because that’s a hot fad right now. It very likely can deliver state-of-the-art results on that hard problem you’re slogging through.
It also means keeping abreast of the news, of developments in the tools (changelogs, conferences, etc.), and of developments in the theory and algorithms (research papers, blogs, conference videos, etc.).
Tech changes quickly, and this is high-tech. It changes faster. Expect and cultivate this change.
This was a short post, but I think an important one.
We touched on 5 areas of machine learning you should be cultivating to meet your goals in the field.
Those five areas again are:
- Machine Learning Foundations and Theory: Build a solid foundation of definitions, terminology, principles and theory.
- Machine Learning Algorithms: Read and study algorithms, implement algorithms from scratch and experiment on them to build intuition for how they work and why.
- Machine Learning Tools: Learn and use machine learning tools and libraries in anger to make efficient use of your time.
- Machine Learning Problems: Study problem case studies, and continuously study the process of applied machine learning (or KDD or whatever you want to call it).
- Staying Up-To-Date: Keep yourself up to date with the algorithms, problems, tools and even the hype. Expect everything to keep moving.
You cannot do it all at once. Pick one area and spend some time with it, then change it up.
Which of these five areas are you currently cultivating?