All you need to know about Support Vector Machines

Mathanraj Sharma
4 min read · May 15, 2019


Photo credit: Packtpub

The Support Vector Machine (SVM) is a simple classification algorithm every machine learning practitioner should have in their toolbox. Let’s first understand how it works and then look at its pros and cons.

As said earlier, it is a supervised learning algorithm used for classification, i.e., when the labels are categorical. SVM takes labeled data points (features) as input and returns the hyperplane that separates those data points into the expected categories (classes).
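As a quick illustration of that input/output contract, here is a minimal sketch using scikit-learn’s SVC; the toy data points are made up purely for this example:

```python
# A minimal sketch: fit an SVM on labeled points, predict new ones.
# The toy data below is made up purely for illustration.
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [2, 2], [3, 3]]   # data points (features)
y = [0, 0, 1, 1]                       # their class labels

clf = SVC(kernel="linear")             # linear kernel: a flat hyperplane
clf.fit(X, y)                          # learns the separating hyperplane

print(clf.predict([[0.5, 0.5], [2.5, 2.5]]))  # expected: [0 1]
```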

Understanding hyperplanes is essential to understanding SVM. Simply put, a hyperplane is a decision boundary that helps classify the data points: points falling on different sides of the hyperplane are treated as separate classes.

As the number of features increases, the dimension of the hyperplane increases as well. With N features, the resulting hyperplane has N−1 dimensions.
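In symbols (using the standard hyperplane equation), a hyperplane is the set of points satisfying a single linear equation:

```latex
\mathcal{H} \;=\; \{\, x \in \mathbb{R}^{N} \;:\; w^{\top}x + b = 0 \,\}
```

One linear equation on N coordinates leaves N−1 degrees of freedom, which is where the N−1 comes from: a line when there are two features, a plane when there are three.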

After SVM finds a separating hyperplane, it tries to maximize the margin. Margin here means the distance from the hyperplane to the nearest data points (also known as support vectors). The orientation and position of the hyperplane depend heavily on these nearest data points.
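With a fitted linear SVM you can inspect those nearest points directly; in the standard formulation the margin width works out to 2/||w||. A sketch (the attribute names are scikit-learn’s, the data is made up):

```python
# Inspecting the support vectors and margin of a fitted linear SVM.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [2, 2], [3, 3]])
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear", C=1000).fit(X, y)  # large C ~ hard margin

print(clf.support_vectors_)      # the nearest points on each side
w = clf.coef_[0]                 # normal vector of the hyperplane
print(2 / np.linalg.norm(w))     # margin width = 2 / ||w||
```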

Best Hyperplane(with maximized margin) for two features

You might ask, “Why maximize the margin?” The maximized margin gives some reinforcement, which helps the model classify future data points with more confidence.

How does SVM work when there are outliers in the dataset?

As we saw earlier, the position and orientation of the hyperplane are highly influenced by the closest data points. So if outliers exist in the dataset, the algorithm looks for the hyperplane that reasonably separates the classes, trading off the width of the margin against a few misclassified or borderline points near the boundary.
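In scikit-learn this tolerance is governed by the C parameter (introduced in detail below). A sketch with a hand-crafted outlier, to show the trade-off:

```python
# Effect of an outlier on the hyperplane under different C values.
# The data points here are made up purely for illustration.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0],      # class 0 cluster
              [3, 3], [3, 4], [4, 3],      # class 1 cluster
              [1.2, 1.2]])                 # outlier labeled as class 1
y = np.array([0, 0, 0, 1, 1, 1, 1])

for C in (0.1, 1000):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    errors = (clf.predict(X) != y).sum()
    # Low C typically tolerates the outlier (one training error) and keeps
    # a wide margin; high C bends the boundary to fit every point.
    print(f"C={C}: {errors} training error(s)")
```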

With Outlier

SVM on Non-linear Data Points

So far we have seen examples with linearly separable data points. Look at the example below: can you imagine what the hyperplane would look like? Do you think it is possible to classify these data points using SVM?

Guess the hyperplane (decision boundary) for these non-linear data points

Yes, it is possible with the help of a kernel function. Simply put, kernels are mathematical functions passed to the SVM as a parameter. A kernel takes the input data points and converts them into the form the SVM needs to find the hyperplane.

Kernel functions map the non-linearly separable data into a higher-dimensional space where it becomes linearly separable, and the hyperplane is found there. Using the same kernel function, the decision boundary (hyperplane) is then mapped back onto the original non-linear data.
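Here is a sketch of that idea with scikit-learn on the classic two-circles dataset, comparing a linear kernel against the RBF kernel:

```python
# Non-linear classification with the RBF kernel (scikit-learn).
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: not separable by any straight line in 2D
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)     # kernel lifts the data implicitly

print("linear:", linear.score(X, y))  # poor, typically around 0.5
print("rbf:   ", rbf.score(X, y))     # typically near 1.0
```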

How kernels help to find the decision boundary in non-linear data

Let’s see how it works on our example:

Linearity found by the kernel

Decision boundary on non-linear data points

Parameters of SVM

If you google “sklearn SVM” you will find the documentation for the scikit-learn SVM model, where you can see the details of the parameters passed to the SVM. Let me introduce the three main parameters to consider.

  1. Kernel: As we saw before, the kernel function is a parameter chosen according to the linearity of our data points. Refer to the scikit-learn documentation for the available kernel functions and their uses.
  2. C (regularization): Controls the trade-off between the smoothness of the decision boundary and the correctness of the classification, i.e., when C is high the model tries to classify all the training points correctly, which also raises the chance of overfitting.
  3. Gamma: Defines how much influence each training sample has on the decision boundary, i.e., with high gamma only nearby points influence the boundary, while with low gamma far-away points are also considered.
When C is high, SVM tries to fit all the data points; note the less smooth decision boundary

When gamma is low, SVM also considers far-away points when deciding the decision boundary

Play around with the hyperparameters and see the different results you get when trying out SVM classification.
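One systematic way to do that tuning is a cross-validated grid search; a minimal sketch over C and gamma:

```python
# Hyperparameter tuning for SVM via grid search (scikit-learn).
from sklearn.datasets import make_circles
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=0)

param_grid = {
    "C": [0.1, 1, 10, 100],       # regularization strength
    "gamma": [0.01, 0.1, 1, 10],  # RBF kernel width
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```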

Advantages first,

  1. The regularization parameter helps to avoid overfitting.
  2. The kernel trick makes it possible to classify both linear and non-linear data.
  3. SVM uses convex optimization, which ensures the result is a global minimum.
  4. SVM also supports semi-supervised learning.

Now let’s see the disadvantages:

  1. In general, SVM is slower at training and prediction, especially on large datasets.

This is the only disadvantage I found; if you know any others, please respond (comment) to this story.

Hope you now have a clear picture of the SVM algorithm. Applaud and share with your friends!
