Softmax Activation Function

Mathanraj Sharma
3 min read · Jul 4, 2019


Whenever we run into a classification problem in neural networks, the word “Softmax” shows up along the way. We all know that it is an activation function, but what is actually going on behind the scenes? Let’s hear the story of Softmax…

Before that, let us understand why we need such a function. If we need a “Yes”/“No” answer, or need to predict the probability of an event occurring, we have the step function and the sigmoid function. If you don’t know about them, leave a comment and I will write about them too.

But what if we have more than two classes to classify? Suppose we have to build a model that tells us which animal our camera has captured. Assume the animals and their probabilities are:

  1. Dog: P(Dog) = 0.67
  2. Cat: P(Cat) = 0.24
  3. Lion: P(Lion) = 0.09

Note that the probabilities have to add up to 1, i.e. sum(P(all classes)) = 1. Now imagine we have a linear model that predicts these three animals from a set of features like teeth, tail, color, fangs, hair, etc. We can compute a linear function of the features for each animal, and assume that gives us these scores:

  1. Dog gets a score of 2
  2. Cat gets a score of 1
  3. Lion gets a score of 0

Note: If our linear boundary is represented by y = b + w1x1 + w2x2 + … + wnxn, the score is the output of substituting in b, the w values, and the x values (the coordinates of a data point).
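As a rough illustration of how one such score is computed, here is a small sketch. The feature values, weights, and bias are made up for this example (they are not from a real model), and were chosen so the result matches the Dog score of 2.

```python
import numpy as np

# Hypothetical numbers, chosen only for illustration.
x = np.array([1.0, 0.5, 2.0])      # feature values x1, x2, x3
w = np.array([0.5, -1.0, 0.75])    # weights w1, w2, w3
b = 0.5                            # bias term

# score = b + w1*x1 + w2*x2 + w3*x3
score = b + np.dot(w, x)
print(score)  # 2.0
```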

Now the question is: how can we turn these scores into probabilities?

Remember, we need to satisfy two conditions:

  1. The probabilities have to sum to 1
  2. A higher score has to give a higher probability: since the Dog has a higher score than the Cat and the Lion, it has to get the highest probability, and likewise the Cat has to get a higher probability than the Lion.

A simple approach we might all think of is Score(x) / Sum(All Scores), but it runs into problems. First, what if an animal has a negative score? A probability can’t be negative. Second, what if Sum(All Scores) = 0? Since a linear model can produce negative scores, this is possible, and we all know we can’t divide by 0.
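Both problems are easy to reproduce. The sketch below uses a hypothetical helper, `naive_normalize`, and made-up score lists to show the naive approach working on non-negative scores and then failing in the two ways just described.

```python
def naive_normalize(scores):
    """Divide each score by the sum of all scores (the naive idea above)."""
    total = sum(scores)
    return [s / total for s in scores]

# Works when all scores are non-negative and do not sum to 0.
print(naive_normalize([2, 1, 0]))

# Problem 1: a negative score yields a negative "probability".
print(naive_normalize([2, 1, -1]))   # [1.0, 0.5, -0.5]

# Problem 2: scores that sum to 0 cannot be normalized at all.
try:
    naive_normalize([1, 0, -1])
except ZeroDivisionError as err:
    print("cannot normalize:", err)
```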

So what is the solution? Is there a function that turns negative values into positive ones?

Absolutely yes! The exponential of any real number is positive. So instead of Score(x) / Sum(All Scores), let’s try exp(Score(x)) / Sum(exp(All Scores)). Now the outputs are:

  1. P(Dog) = exp(2)/(exp(2) + exp(1) + exp(0)) = 0.67
  2. P(Cat) = exp(1)/(exp(2) + exp(1) + exp(0)) = 0.24
  3. P(Lion) = exp(0)/(exp(2) + exp(1) + exp(0)) = 0.09

Hooray! We have found the secret behind the so-called Softmax function. And notice that since the exponential function is always increasing, we get a higher probability for Dog than for the other two classes, and also a higher probability for Cat than for Lion, as we expected.
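For readers who prefer a formula, the same idea can be written compactly. The notation here is an assumption added for illustration: z_1, …, z_K are the K class scores and σ(z)_i is the probability assigned to class i.

```latex
\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \qquad i = 1, \dots, K
```

Every term is positive, the K outputs sum to 1, and z_i > z_j implies σ(z)_i > σ(z)_j, which are exactly the conditions we asked for.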

Softmax
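import numpy as np

L = [5, 6, 7]

def softmax(L):
    expL = np.exp(L)        # exponentiate every score
    sumExpL = sum(expL)     # normalizing constant
    result = []
    for i in expL:
        # each probability is exp(score) / sum of all exp(scores)
        result.append(i * 1.0 / sumExpL)
    return result

print(softmax(L))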
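The same computation can also be written without the explicit loop. The variant below is a sketch rather than part of the original snippet: the name `softmax_stable` is made up here, and the max-subtraction step is a commonly used trick that the article itself does not cover. It is applied to the Dog, Cat, and Lion scores from earlier.

```python
import numpy as np

def softmax_stable(scores):
    """Softmax over a 1-D sequence of scores, using NumPy array operations."""
    z = np.asarray(scores, dtype=float)
    # Subtracting the maximum does not change the result, because the common
    # factor exp(-max) cancels between numerator and denominator, but it keeps
    # np.exp from overflowing on large scores.
    exp_z = np.exp(z - np.max(z))
    return exp_z / exp_z.sum()

probs = softmax_stable([2, 1, 0])        # Dog, Cat, Lion scores
print(np.round(probs, 2))                # [0.67 0.24 0.09]
print(softmax_stable([1000, 999, 998]))  # same probabilities, no overflow
```

Because only the differences between scores matter, [1000, 999, 998] gives the same probabilities as [2, 1, 0].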

I hope you now understand the concept behind the Softmax function. Your feedback is warmly welcome. Don’t forget to clap and follow me for more updates.


Written by Mathanraj Sharma

Machine Learning Engineer at H2O.ai | Maker | Developer | Dev Blogger | Like my stories? Buy me a ☕ https://ko-fi.com/mathanraj_sharma
