Choose the correct ML algorithm for your problem

Mathanraj Sharma
May 6, 2019

Have you ever asked yourself, “Which algorithm should I use to train my model?” Read on for some tips that can come in handy in such situations.

First, understand the problem you are going to solve

ML problems are commonly grouped into three categories.

  1. Supervised Learning: When you are working with a labeled dataset (each example comes with a known target), your problem falls under this category. Supervised problems split into two kinds. If the target (expected output) is a continuous numerical value, you should use a regression algorithm. If the target is a class or categorical value, you should use a classification algorithm.
  2. Unsupervised Learning: If your dataset is unlabeled and you are trying to discover structure in it, such as natural groupings, it is unsupervised learning. There is no target column; the algorithm works only from the input features. In this case, you will typically use a clustering algorithm.
  3. Reinforcement Learning: A type of learning where a software agent is expected to take suitable actions in an environment in order to maximize a reward (the signal that encodes the expected behavior).
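
As a rough illustration of the three categories above, the helper below maps a few facts about a problem to a category. The function name `suggest_category` and its parameters are hypothetical, made up for this sketch, and the way it tells regression from classification (float targets versus anything else) is a simplifying assumption rather than a general rule.

```python
def suggest_category(has_labels, target_values=None, learns_from_rewards=False):
    """Return a rough problem category; a heuristic sketch, not a rule."""
    if learns_from_rewards:
        # An agent acting in an environment to maximize a reward.
        return "reinforcement learning"
    if not has_labels:
        # No target column: look for structure such as clusters.
        return "unsupervised learning (e.g. clustering)"
    # Labeled data: assume float targets mean a continuous quantity.
    if target_values and all(isinstance(v, float) for v in target_values):
        return "supervised learning: regression"
    return "supervised learning: classification"


print(suggest_category(True, [199.5, 250.0, 310.25]))   # e.g. prices
print(suggest_category(True, ["spam", "ham", "spam"]))  # e.g. email labels
print(suggest_category(False))                          # no labels at all
```

In practice the target's meaning matters more than its Python type: integer class codes are still a classification target, which is why this sketch only treats floats as continuous.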

Second, understand the data you have

After the first step, you should have narrowed down your algorithm selection: you now know which of the three major categories your problem falls under. But many algorithms are available within each category.

For example, if supervised learning is your category, there are algorithms like linear regression, logistic regression, Naive Bayes, decision trees, k-nearest neighbors, similarity learning, neural networks and so on. So now the question is which one to choose.

To decide, you need to understand your data very well. Build hypotheses, visualize and analyze your data, and find out the data types of your features. This matters because algorithms like linear regression and logistic regression expect numerical inputs, so categorical features must be encoded first, whereas tree-based algorithms like random forest can, in some implementations, work with categorical data more directly.
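
To make the point about data types concrete, here is a minimal sketch that inspects a tiny table and one-hot encodes its categorical column so that a model expecting numbers could use it. The rows, the column names, and the helpers `column_types` and `one_hot` are all hypothetical, invented for illustration; a real project would more likely rely on a data-frame library.

```python
# Hypothetical toy data: two numerical columns and one categorical column.
rows = [
    {"age": 25, "region": "north", "income": 40000.0},
    {"age": 32, "region": "south", "income": 52000.0},
    {"age": 47, "region": "north", "income": 61000.0},
]


def column_types(rows):
    """Label each column 'numerical' or 'categorical' from its values."""
    types = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        is_number = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )
        types[name] = "numerical" if is_number else "categorical"
    return types


def one_hot(rows, column):
    """Replace a categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}={category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


print(column_types(rows))
print(one_hot(rows, "region")[0])
```

After encoding, every column is numerical, which is the form that linear and logistic regression expect.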

Third, determine the size of your dataset and the number of features

Determine how much data you have. On a smaller dataset, simple algorithms like linear regression will work decently and generalize well, whereas complex algorithms like neural networks can overfit.
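
The overfitting risk on small datasets can be sketched with a toy experiment. Note the swap: instead of a neural network, a degree-7 polynomial that passes through every training point stands in for the "complex" model, because it keeps the example short and dependency-free. The data (eight points on the line y = 2x + 1 with alternating noise) and all function names are assumptions made for this sketch.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_interpolating_polynomial(xs, ys):
    """Lagrange polynomial that passes through every training point."""
    def predict(x):
        total = 0.0
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            basis = 1.0
            for m, xm in enumerate(xs):
                if m != j:
                    basis *= (x - xm) / (xj - xm)
            total += yj * basis
        return total
    return predict


def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Eight training points on y = 2x + 1, with alternating +/-0.5 noise.
train_x = [float(i) for i in range(8)]
train_y = [2 * x + 1 + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(train_x)]
# Held-out points halfway between the training points, without noise.
test_x = [x + 0.5 for x in train_x[:-1]]
test_y = [2 * x + 1 for x in test_x]

line = fit_line(train_x, train_y)
poly = fit_interpolating_polynomial(train_x, train_y)

for name, model in [("line", line), ("degree-7 polynomial", poly)]:
    print(name,
          "train MSE:", round(mse(model, train_x, train_y), 4),
          "held-out MSE:", round(mse(model, test_x, test_y), 4))
```

The polynomial's training error is zero because it memorizes the noisy points, yet its held-out error is far larger than the line's. That gap between training and held-out error is the overfitting this section warns about.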

The number of features matters too: some complex algorithms take much longer to train and predict as the feature count grows. In such cases, an SVM is often a better choice, since it tends to cope well with high-dimensional data. Linear regression, again, performs decently with a large number of features.

Fourth, determine the accuracy you need

The accuracy you need depends on the application. For some problems an adequate level of accuracy is enough, while sensitive cases demand higher accuracy. So it depends on the end result you are working toward.

Either way, the required accuracy plays a major role in selecting an algorithm. If adequate accuracy is enough, you can go with a simple algorithm, which saves a ton of computing power and time.

Fifth, Human Insight

Even after considering all of the above, you may end up with two or three (or more) possible algorithms to choose from. Start with the least complex algorithm and try the possibilities one by one until you find one that meets your needs.
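
The "simplest first" search can be sketched as a loop over candidates ordered by complexity, stopping at the first one that reaches the accuracy you need. Everything here is an assumption for illustration: the three toy models, their ordering, the one-feature dataset, and the helper `pick_simplest` are hypothetical stand-ins for real algorithms and real data.

```python
from collections import Counter


def fit_majority(xs, ys):
    """Always predict the most common training label."""
    label = Counter(ys).most_common(1)[0][0]
    return lambda x: label


def fit_stump(xs, ys):
    """One-threshold rule: predict 1 when x is above the best cut point."""
    points = sorted(set(xs))
    cuts = [(a + b) / 2 for a, b in zip(points, points[1:])]

    def train_accuracy(cut):
        return sum((x > cut) == bool(y) for x, y in zip(xs, ys)) / len(xs)

    best = max(cuts, key=train_accuracy)
    return lambda x: int(x > best)


def fit_nearest_neighbor(xs, ys):
    """Predict the label of the closest training point (1-NN)."""
    return lambda x: min(zip(xs, ys), key=lambda p: abs(p[0] - x))[1]


def accuracy(model, xs, ys):
    """Fraction of points the model labels correctly."""
    return sum(model(x) == y for x, y in zip(xs, ys)) / len(xs)


def pick_simplest(candidates, train, valid, target):
    """Try candidates in order; return the first that reaches the target."""
    for name, fit in candidates:
        model = fit(*train)
        score = accuracy(model, *valid)
        if score >= target:
            return name, score
    return None, None


# Ordered from least to most complex (an assumed ordering).
candidates = [
    ("majority class", fit_majority),
    ("decision stump", fit_stump),
    ("1-nearest neighbor", fit_nearest_neighbor),
]
train = ([1, 2, 3, 4, 5, 7, 8, 9], [0, 0, 0, 0, 0, 1, 1, 1])
valid = ([0.5, 2.5, 3.5, 6.5, 7.5, 9.5], [0, 0, 0, 1, 1, 1])

print(pick_simplest(candidates, train, valid, target=0.9))
```

With a target of 0.9 the majority-class baseline falls short, so the search moves on and stops at the decision stump without ever training the nearest-neighbor model. Lowering the target makes the loop stop even earlier, which is the time and compute saving described under the fourth tip.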

Don’t forget to always use your human insight. Estimate the computing power, time and resources each candidate algorithm may require to produce the accuracy you need.

To do this you need to understand the concepts behind different types of algorithms and the complexity behind them.

I hope you now understand the things you need to consider when selecting an algorithm.

Applaud and Share if you like this. Please respond (comment) if I have missed anything.

Written by Mathanraj Sharma

Machine Learning Engineer at H2O.ai | Maker | Developer | Dev Blogger | Like my stories? Buy me a ☕ https://ko-fi.com/mathanraj_sharma
