All you need to know about Support Vector Machines

Mathanraj Sharma
4 min read · May 15, 2019


Photo credit: Packtpub

The Support Vector Machine (SVM) is a simple classification algorithm every machine learning practitioner should have in their toolbox. Let’s first understand how it works and then look at its pros and cons.

As said earlier, it is a supervised learning algorithm used for classification, i.e., when the labels are categorical. SVM takes labeled data points (features) as input and returns the hyperplane that separates those data points into the expected categories (classes).
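As a quick illustration of that input/output contract, here is a minimal sketch using scikit-learn’s SVC; the toy data points are made up purely for this example:

```python
# A minimal sketch: fit an SVM on labeled points, predict new ones.
# The toy data below is made up purely for illustration.
from sklearn.svm import SVC

X = [[0, 0], [1, 1], [2, 2], [3, 3]]   # data points (features)
y = [0, 0, 1, 1]                       # their class labels

clf = SVC(kernel="linear")             # linear kernel: a flat hyperplane
clf.fit(X, y)                          # learns the separating hyperplane

print(clf.predict([[0.5, 0.5], [2.5, 2.5]]))  # expected: [0 1]
```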

Understanding hyperplanes is essential to understanding SVM. Simply put, a hyperplane is a decision boundary that helps classify the data points: points falling on different sides of the hyperplane are treated as separate classes.

As the number of features increases, the dimension of the hyperplane increases as well. With N features, the resulting hyperplane has N−1 dimensions.
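In symbols (using the standard hyperplane equation), a hyperplane is the set of points satisfying a single linear equation:

```latex
\mathcal{H} \;=\; \{\, x \in \mathbb{R}^{N} \;:\; w^{\top}x + b = 0 \,\}
```

One linear equation on N coordinates leaves N−1 degrees of freedom, which is where the N−1 comes from: a line when there are two features, a plane when there are three.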

After SVM finds a separating hyperplane, it tries to maximize the margin. Margin here means the distance from the hyperplane to the nearest data points (also known as support vectors). The orientation and position of the hyperplane depend heavily on these nearest data points.
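With a fitted linear SVM you can inspect those nearest points directly; in the standard formulation the margin width works out to 2/||w||. A sketch (the attribute names are scikit-learn’s, the data is made up):

```python
# Inspecting the support vectors and margin of a fitted linear SVM.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [2, 2], [3, 3]])
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear", C=1000).fit(X, y)  # large C ~ hard margin

print(clf.support_vectors_)      # the nearest points on each side
w = clf.coef_[0]                 # normal vector of the hyperplane
print(2 / np.linalg.norm(w))     # margin width = 2 / ||w||
```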

Best Hyperplane(with maximized margin) for two features

You might ask, “Why maximize the margin?” The maximized margin gives some reinforcement, which helps the model classify future data points with more confidence.

How does SVM work when there are outliers in the dataset?

As we saw earlier, the position and orientation of the hyperplane are highly influenced by the closest data points. So if outliers exist in the dataset, the algorithm looks for the hyperplane that reasonably separates the classes, trading off the width of the margin against a few misclassified or borderline points near the boundary.
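In scikit-learn this tolerance is governed by the C parameter (introduced in detail below). A sketch with a hand-crafted outlier, to show the trade-off:

```python
# Effect of an outlier on the hyperplane under different C values.
# The data points here are made up purely for illustration.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0],      # class 0 cluster
              [3, 3], [3, 4], [4, 3],      # class 1 cluster
              [1.2, 1.2]])                 # outlier labeled as class 1
y = np.array([0, 0, 0, 1, 1, 1, 1])

for C in (0.1, 1000):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    errors = (clf.predict(X) != y).sum()
    # Low C typically tolerates the outlier (one training error) and keeps
    # a wide margin; high C bends the boundary to fit every point.
    print(f"C={C}: {errors} training error(s)")
```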

With Outlier

SVM on Non-linear Data Points

So far we have seen examples with linearly separable data points. Look at the example below: can you imagine what the hyperplane would look like? Do you think it is possible to classify these data points using SVM?

Guess the hyperplane (decision boundary) for these non-linear data points

Yes, it is possible with the help of a kernel function. Simply put, kernels are mathematical functions passed to the SVM as a parameter. A kernel takes the input data points and converts them into the form the SVM needs to find the hyperplane.

Kernel functions map the non-linearly separable data into a higher-dimensional space where it becomes linearly separable, and the hyperplane is found there. Using the same kernel function, the decision boundary (hyperplane) is then mapped back onto the original non-linear data.
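Here is a sketch of that idea with scikit-learn on the classic two-circles dataset, comparing a linear kernel against the RBF kernel:

```python
# Non-linear classification with the RBF kernel (scikit-learn).
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: not separable by any straight line in 2D
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)     # kernel lifts the data implicitly

print("linear:", linear.score(X, y))  # poor, typically around 0.5
print("rbf:   ", rbf.score(X, y))     # typically near 1.0
```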

How kernels help to find the decision boundary in non-linear data

Let’s see how it works on our example:

Linearity found by the kernel

Decision boundary on non-linear data points

Parameters of SVM

If you google “sklearn SVM” you will find the documentation for the scikit-learn SVM model, where you can see the details of the parameters passed to the SVM. Let me introduce the three main parameters to consider.

  1. Kernel: As we saw before, the kernel function is a parameter chosen according to the linearity of our data points. Refer to the scikit-learn documentation for the available kernel functions and their uses.
  2. C (regularization): Controls the trade-off between the smoothness of the decision boundary and the correctness of the classification, i.e., when C is high the model tries to classify all the training points correctly, which also raises the chance of overfitting.
  3. Gamma: Defines how much influence each training sample has on the decision boundary, i.e., with high gamma only nearby points influence the boundary, while with low gamma far-away points are also considered.
When C is high, SVM tries to fit all the data points; note the less smooth decision boundary

When gamma is low, SVM also considers far-away points when deciding the decision boundary

Play around with the hyperparameters and see the different results you get when trying out SVM classification.
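One systematic way to do that tuning is a cross-validated grid search; a minimal sketch over C and gamma:

```python
# Hyperparameter tuning for SVM via grid search (scikit-learn).
from sklearn.datasets import make_circles
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.1, random_state=0)

param_grid = {
    "C": [0.1, 1, 10, 100],       # regularization strength
    "gamma": [0.01, 0.1, 1, 10],  # RBF kernel width
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```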

Advantages first,

  1. The regularization parameter helps to avoid overfitting.
  2. The kernel trick makes it possible to classify both linear and non-linear data.
  3. SVM uses convex optimization, which ensures the result is a global minimum.
  4. SVM also supports semi-supervised learning.

Now let’s see the disadvantages:

  1. In general, SVM is slower at training and prediction, especially on large datasets.

This is the only disadvantage I found; if you know any others, please respond (comment) to this story.

Hope you now have a clear picture of the SVM algorithm. Applaud and share with your friends!
