# Mathematics of Deep Learning

Aloha

In the previous post we looked at the basic reasons behind the success of Deep Learning and built an understanding of commonly used activation functions. In this post we'll look at the mathematics behind Deep Learning. Many beginners in the field know Deep Learning minus the Mathematics, usually because they can't see the basic intuition behind the mathematical side of Deep Learning. I'll try to cover most of it in this post. So without wasting any more time, let's begin!

Derivative: The derivative of a function of a real variable measures the sensitivity of the function's value (output) to a change in its argument (input).

Position = x(t) and Speed = dx(t)/dt, where Speed gives us the rate of change of the position function x(t) w.r.t. time.
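As a quick sketch of this idea (the position function x(t) and step size h here are illustrative assumptions, not from the post), a derivative can be approximated numerically with a central difference:

```python
def derivative(f, t, h=1e-5):
    """Central-difference approximation of df/dt at the point t."""
    return (f(t + h) - f(t - h)) / (2 * h)

# Hypothetical position function: x(t) = 3t^2, so speed dx/dt = 6t
def x(t):
    return 3 * t ** 2

speed = derivative(x, t=2.0)  # analytically 6 * 2 = 12
```

Shrinking h makes the approximation tighter, which mirrors the limit definition of the derivative.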

Partial Derivative: When our function has more than one input, we need to specify which variable we are differentiating with respect to. Let's look at the most common example used to explain gradient descent (we'll look into it again later), i.e., the Mountain Descent example.

Let’s say that elevation(y) at any point is a function of North(n) and East(e). (Obviously South = -n and West = -e)

Therefore, y = f(n, e). The rates of change of elevation along n and e are ∂y/∂n and ∂y/∂e respectively. But the direction of fastest change downhill (down, as we are considering Mountain Descent) won't necessarily lie along either single axis; right?

It'll be in whichever direction the hill descends most steeply in the 2-D plane (our n-e plane). Look at the image below for a better understanding.

In the 2-D plane of n and e, the direction of most abrupt change is a 2-D vector whose components are the partial derivatives w.r.t. each variable.

We call this vector the Gradient (∇, the del operator).

The magnitude of the gradient is the value of this steepest slope. The gradient is an operation that takes a function of multiple variables and returns a vector. How is the gradient different from the derivative? The gradient is a vector-valued function, as opposed to the derivative, which is scalar-valued.

The components of this vector are all the partial derivatives of the function, i.e., ∇y = (∂y/∂n, ∂y/∂e).
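As a hedged sketch of this (the elevation function f below is a made-up example, not from the post), the gradient can be approximated component-wise, differentiating w.r.t. one variable while holding the other fixed:

```python
def gradient(f, n, e, h=1e-5):
    """Approximate the gradient (df/dn, df/de) at (n, e) via central differences."""
    dfdn = (f(n + h, e) - f(n - h, e)) / (2 * h)  # vary n, hold e fixed
    dfde = (f(n, e + h) - f(n, e - h)) / (2 * h)  # vary e, hold n fixed
    return (dfdn, dfde)

# Hypothetical elevation: f(n, e) = n^2 + 2e^2, so the gradient is (2n, 4e)
def f(n, e):
    return n ** 2 + 2 * e ** 2

grad = gradient(f, n=1.0, e=1.0)  # analytically (2.0, 4.0)
```

Each component answers "how fast does elevation change if I move only north, or only east?", and together they point in the direction of steepest ascent.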

Thus if we want to go downhill, all we have to do is walk in the direction opposite to the gradient.

Let us take a function f with only one variable, w. Also, consider a value of w, w1, and its corresponding output, f(w1). To find a point w2 closer to the minimum of f(w), we step in the direction opposite to the derivative at w1. (Similarly, in the case of multiple variables, this direction is given by the Gradient.)
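To make the idea concrete, here is a minimal gradient-descent sketch for the one-variable case (the function f, learning rate, and starting point are illustrative assumptions, not from the post):

```python
def descend(df, w, lr=0.1, steps=100):
    """Repeatedly step opposite to the derivative df to approach a minimum of f."""
    for _ in range(steps):
        w = w - lr * df(w)  # w2 = w1 - learning_rate * f'(w1)
    return w

# Hypothetical f(w) = (w - 3)^2, whose derivative is 2(w - 3); minimum at w = 3
df = lambda w: 2 * (w - 3)

w_min = descend(df, w=0.0)  # converges toward 3.0
```

The same update rule extends to many variables by replacing the derivative with the gradient, which is exactly how neural network weights are trained.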