I have recently started DeepLearning.AI’s Deep Learning Specialization on Coursera. Below are my lecture notes from the second week of the first course. The lectures examined vectorized logistic regression as a neural network, in preparation for more complex neural networks.

I. Logistic Regression

Logistic Regression Input and Output

Given x where x \in {\rm I\!R}^{n_{x}} ,

We want \hat{y} = P(y=1| x)

Parameters:
w \in {\rm I\!R}^{n_{x}}
b \in {\rm I\!R}


Output option 1 (linear regression): \hat{y} = w^T x + b

  • \hat{y} is not always \in [0,1], which makes classification awkward

Output option 2 (logistic regression): \hat{y} = \sigma(w^T x + b)

  • Provides output in [0,1] for easy binary classification (usually \hat{y}>0.5 designated as class 1 and \hat{y}\leq0.5 designated as class 0).
  • Uses the “sigmoid” function \sigma(z) = \frac{1}{1+e^{-z}}, visualized below.
In [1]:
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

#Create z and sigma
z = np.linspace(-5,5)
sigma = 1/(1+np.exp(-z))

#Draw prediction cut-off line
plt.axhline(0.5, color='black',ls='--')

#Label axis
plt.xlabel('z')
plt.ylabel(r'$\hat{y}$')

#Plot graph
plt.tick_params(axis='x', bottom=False, labelbottom=False)
plt.plot(z,sigma,'-',lw=3);

Logistic Regression Loss and Cost Function

Loss Function:

For an individual instance, the loss function is:
L(\hat{y},y) = -(y\log\hat{y} + (1-y)\log(1-\hat{y}))

Intuition:

  • If y= 1: L(\hat{y},y) = -\log\hat{y}
    • Minimizing this will ensure that \log\hat{y} is large which will ensure that \hat{y} is large, i.e. \hat{y} close to 1.
  • If y= 0: L(\hat{y},y) = -\log(1-\hat{y})
    • Minimizing this will ensure that \log(1-\hat{y}) is large, which will ensure that \hat{y} is small, i.e. \hat{y} close to 0 (see the sketch after this list).
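Here is a minimal sketch (my own example, not from the lecture) that evaluates the loss for a few predictions to make the intuition concrete:

import numpy as np

def loss(y_hat, y):
    #Cross-entropy loss for a single instance
    return -(y*np.log(y_hat) + (1-y)*np.log(1-y_hat))

print(loss(0.9, 1))  #~0.105: y=1 and y_hat close to 1, small loss
print(loss(0.1, 1))  #~2.303: y=1 but y_hat close to 0, large loss
print(loss(0.1, 0))  #~0.105: y=0 and y_hat close to 0, small loss
print(loss(0.9, 0))  #~2.303: y=0 but y_hat close to 1, large loss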

Cost Function:

Across the training set of m examples, the cost function is:
J(w,b)=-\frac{1}{m}\sum_{i=1}^{m}[y^{(i)}\log\hat{y}^{(i)} + (1-y^{(i)})\log(1-\hat{y}^{(i)})]
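A minimal vectorized sketch of this cost (my own helper, not from the lecture), assuming the predictions A and labels Y are stored as (1, m) numpy arrays:

import numpy as np

def cost(A, Y):
    #Cross-entropy cost averaged over the m training examples
    m = Y.shape[1]
    return -np.sum(Y*np.log(A) + (1-Y)*np.log(1-A))/m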

Gradient Descent

J(w,b) is a convex function, so gradient descent will not get stuck in a local minimum.


Gradient Descent Algorithm:

For cost function J(w,b) and learning rate \alpha,
repeat {
w:=w-\alpha\frac{\partial J(w,b)}{\partial w}
b:=b-\alpha\frac{\partial J(w,b)}{\partial b}
}
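To see the update rule in isolation, here is a small sketch (my own example) running gradient descent on the one-dimensional convex function J(w) = (w-3)^2, whose derivative is 2(w-3):

alpha = 0.1           #learning rate
w = 0.0               #initial parameter
for i in range(100):
    dw = 2*(w - 3)    #derivative of J(w) = (w-3)**2
    w = w - alpha*dw  #gradient descent update
print(w)              #converges towards the minimum at w = 3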


II. Implementing Vectorized Logistic Regression in Python

Note: Capital letters indicate a matrix rather than a single training instance.

Vectorization

Vectorization is the art of removing explicit for loops. In Python, for loops are much slower than numpy’s optimized matrix operations.

To compute the product of W transpose and X, use numpy’s np.dot() function and the .T attribute for the transpose:

z = np.dot(W.T,X) + b
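As a rough illustration of the speed difference, here is my own timing sketch (not from the lecture; exact numbers depend on the machine):

import time
import numpy as np

n_x, m = 100, 10000
W = np.random.randn(n_x, 1)
X = np.random.randn(n_x, m)
b = 0.0

#Explicit Python loops
tic = time.time()
z_loop = np.zeros((1, m))
for j in range(m):
    for i in range(n_x):
        z_loop[0, j] += W[i, 0]*X[i, j]
    z_loop[0, j] += b
print('Loops:      {:.3f} s'.format(time.time() - tic))

#Vectorized version
tic = time.time()
z_vec = np.dot(W.T, X) + b
print('Vectorized: {:.3f} s'.format(time.time() - tic))

print(np.allclose(z_loop, z_vec))  #both give the same result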

Partial Derivatives

In code, we represent the partial derivatives as follows:
dw=\frac{\partial J(w,b)}{\partial w}
db=\frac{\partial J(w,b)}{\partial b}

After taking partial derivatives through a computation graph, we find that \frac{\partial L}{\partial z^{(i)}} (the change in the loss with respect to z for a single training example) is equal to:

\hat{y}^{(i)}-y^{(i)}

Stacked across all m examples, these values form a row vector that we will represent in code as dZ.

Single Vectorized Step of Logistic Regression

Calculate Z:
Z = np.dot(W.T,X) + b

Calculate A (convert Z to the [0,1] range with the sigmoid function):
A = 1/(1 + np.exp(-Z))

Calculate dZ (the change in cost with respect to Z):
dZ = A - Y

Calculate dw and db (the gradients for the weights and bias):
dw = np.dot(X, dZ.T)/m
db = np.sum(dZ)/m
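Putting the whole step together as runnable numpy code (a sketch on randomly generated dummy data; the variable names follow the conventions above):

import numpy as np

#Dummy data: n_x features, m training examples, binary labels
n_x, m = 4, 10
X = np.random.randn(n_x, m)
Y = (np.random.rand(1, m) > 0.5).astype(float)

#Initialize parameters
W = np.zeros((n_x, 1))
b = 0.0

#Forward pass
Z = np.dot(W.T, X) + b   #shape (1, m)
A = 1/(1 + np.exp(-Z))   #sigmoid, predictions in [0, 1]

#Backward pass
dZ = A - Y               #shape (1, m)
dw = np.dot(X, dZ.T)/m   #shape (n_x, 1)
db = np.sum(dZ)/m        #scalar

#One gradient descent update with learning rate alpha
alpha = 0.01
W = W - alpha*dw
b = b - alpha*db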


III. Processing Images for Classification and Array Notes

Image Processing

Images are stored as three n x m matrices of pixel intensities, one per colour channel, i.e. an array of shape (n, m, 3). To classify them we need to reshape them into a single (n x m x 3, 1) column vector.

In [2]:
#Create dummy image
n=64
m=100

img = np.random.randn(n,m,3)

print('Shape of standard image: {}'.format(img.shape))
Shape of standard image: (64, 100, 3)
In [3]:
#Prepare for training or classification
reshaped_img = img.reshape((img.shape[0]*img.shape[1]*3,1))

print('Shape of reshaped image: {}'.format(reshaped_img.shape))
Shape of reshaped image: (19200, 1)

Broadcasting

Broadcasting refers to numpy’s automatic expansion of array shapes so that element-wise operations can be applied to arrays of different sizes.

Given an (m,n) array, adding/subtracting/multiplying/dividing by an array or real number of the following shapes will expand it as follows (see the sketch after this list):

  • (1,n) -> (m,n) – created by copying the row m times
  • (m,1) -> (m,n) – created by copying the column n times
  • k -> (m,n) – created by filling an (m,n) matrix with k.
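A quick sketch of these rules with numpy (my own example):

import numpy as np

A = np.arange(6).reshape(2, 3)   #shape (2, 3), so m=2 and n=3

row = np.array([[10, 20, 30]])   #shape (1, 3): copied down the 2 rows
col = np.array([[100], [200]])   #shape (2, 1): copied across the 3 columns
k = 5                            #scalar: fills a (2, 3) matrix with 5

print(A + row)   #[[10 21 32] [13 24 35]]
print(A + col)   #[[100 101 102] [203 204 205]]
print(A + k)     #[[ 5  6  7] [ 8  9 10]]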

Rank 1 Arrays and Assert

Avoid using rank 1 arrays:
These arrays have a shape of (n,). Use reshape to give them a shape of (n,1) or (1,n) to avoid tricky bugs in code.

E.g. use:
X = np.zeros((5,1))

Instead of:
X = np.zeros(5)

Use assert to check array shape:
assert(X.shape==(5,1))
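A short sketch (my own example) of why rank 1 arrays are error-prone:

import numpy as np

a = np.random.randn(5)       #rank 1 array, shape (5,)
print(a.shape)               #(5,)
print(np.dot(a, a.T).shape)  #() – a.T does nothing here, the result is a scalar

b = np.random.randn(5, 1)    #proper column vector, shape (5, 1)
print(np.dot(b, b.T).shape)  #(5, 5) outer product, as expected

a = a.reshape((5, 1))        #reshape fixes a rank 1 array
assert(a.shape == (5, 1))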
