This post examines four common deep learning activation functions:

- ReLU
- Leaky ReLU
- sigmoid
- tanh

```
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

#Create array of possible z values
z = np.linspace(-5, 5, num=1000)

def draw_activation_plot(a, quadrants=2, y_ticks=[0], two_quad_y_lim=[0, 5], four_quad_y_lim=[-1, 1]):
    """
    Draws plot of activation function

    Parameters
    ----------
    a : Output of activation function over domain z.
    quadrants : The number of quadrants in the plot (options: 2 or 4).
    y_ticks : Ticks to show on the y-axis.
    two_quad_y_lim : The limit of the y-axis for 2 quadrant plots.
    four_quad_y_lim : The limit of the y-axis for 4 quadrant plots.
    """
    #Create figure and axis
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)

    #Move left axis to the center
    ax.spines['left'].set_position('center')

    #Remove top and right axes
    ax.spines['right'].set_color('none')
    ax.spines['top'].set_color('none')

    #Set x and y labels
    plt.xlabel('z')
    plt.ylabel('a')

    #Set ticks
    plt.xticks([])
    plt.yticks(y_ticks)

    #Set y limits
    plt.ylim(two_quad_y_lim)

    #4 quadrant adjustments
    if quadrants == 4:
        #Move bottom axis to the center
        ax.spines['bottom'].set_position('center')
        #Move x and y labels for readability
        ax.yaxis.set_label_coords(.48, .75)
        ax.xaxis.set_label_coords(.75, .48)
        #Set y_lim for 4 quadrant graphs
        plt.ylim(four_quad_y_lim)

    #Plot z vs. activation function
    plt.plot(z, a);
```

## 1. ReLU

A great default choice for hidden layers. It is frequently used in industry and is almost always adequate for the problem at hand.

Although ReLU is not differentiable at z=0, this is rarely a problem in practice since an input of exactly 0 is rare. The derivative at z=0 can simply be defined as 0 or 1 without issue.

```
relu = np.maximum(z,0)
draw_activation_plot(relu)
```
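As a minimal sketch of that convention (the choice of 0 at z=0 is an assumption here, not fixed by the math, and `relu_grad` is my own name), the piecewise derivative can be computed with `np.where`:

```
#Sketch: ReLU "derivative", with the value at z=0 set to 0 by convention
relu_grad = np.where(z > 0, 1.0, 0.0)
draw_activation_plot(relu_grad, y_ticks=[0, 1], two_quad_y_lim=[-0.1, 1.1])
```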

## 2. Leaky ReLU

A variant of ReLU that gives negative inputs a small non-zero slope (here 0.01) instead of clipping them to 0, which helps avoid "dead" neurons that never activate. A sketch with the slope as a tunable parameter follows the plot below.

```
leaky_ReLU = np.maximum(0.01*z,z)
draw_activation_plot(leaky_ReLU)
```
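The 0.01 slope on the negative side is a hyperparameter. As a small sketch (the `leaky_relu` function and its `alpha` parameter are my own, not from the original notebook), it can be made explicit:

```
#Sketch: leaky ReLU with the negative-side slope as a tunable hyperparameter
def leaky_relu(z, alpha=0.01):
    return np.maximum(alpha * z, z)

#A larger alpha makes the negative-side slope easier to see on a 4-quadrant plot
draw_activation_plot(leaky_relu(z, alpha=0.1), quadrants=4, y_ticks=[-1, 0, 1])
```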

## 3. sigmoid

Almost never used except in the output layer for binary classification. Its most useful feature is that it guarantees an output between 0 and 1.

However, when z is very large or very negative, the derivative of the sigmoid function is close to zero, which can slow down gradient descent.

```
sigmoid = 1/(1+np.exp(-z))
draw_activation_plot(sigmoid,y_ticks=[0,1], two_quad_y_lim=[0,1])
```
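As a rough illustration of that slowdown (a sketch added here, not part of the original notebook; `sigmoid_grad` is my own name), the derivative sigmoid(z)·(1 − sigmoid(z)) peaks at 0.25 at z=0 and shrinks quickly as |z| grows:

```
#Sketch: the sigmoid derivative shrinks rapidly as |z| grows
sigmoid_grad = sigmoid * (1 - sigmoid)
for z_val in [0, 2, 5]:
    idx = np.argmin(np.abs(z - z_val))   #index of the z closest to z_val
    print(f"z ~ {z_val}: sigmoid'(z) ~ {sigmoid_grad[idx]:.4f}")
draw_activation_plot(sigmoid_grad, y_ticks=[0, 0.25], two_quad_y_lim=[0, 0.3])
```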

## 4. tanh

Similar in shape to the sigmoid but zero-centered, with outputs between -1 and 1. This usually makes it a better choice than sigmoid for hidden layers, although it still saturates for large |z|.

```
tanh = (np.exp(z)-np.exp(-z))/(np.exp(z)+np.exp(-z))
draw_activation_plot(tanh,y_ticks=[-1,0,1],quadrants=4)
```
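One useful way to relate the two curves (a sketch added here, not from the original): tanh is a scaled and shifted sigmoid, tanh(z) = 2·sigmoid(2z) − 1, so it shares the sigmoid's saturating tails while being zero-centered. The identity can be checked numerically:

```
#Sketch: verify tanh(z) = 2*sigmoid(2z) - 1 and compare with NumPy's built-in
tanh_from_sigmoid = 2/(1 + np.exp(-2*z)) - 1
print(np.allclose(tanh, tanh_from_sigmoid))   #True
print(np.allclose(tanh, np.tanh(z)))          #True
```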