These solutions are for reference only.

Try to solve the questions on your own first.

If you get stuck along the way, you can refer to these solutions.

There are different sets of questions.

-----------------------------------------------------------------------------------------

## Neural Networks: Learning

TOTAL POINTS 5

EXPLANATION:

This version is correct, as it takes the “outer product” of the two vectors $\delta^{(3)}$ and $a^{(2)}$, which is a matrix such that the (i,j)-th entry is $\delta_i^{(3)} \cdot (a^{(2)})_j$, as desired.
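
As a small illustration of the outer product described above, here is a minimal sketch in plain Python. The helper `outer` and the numeric values of `delta3` and `a2` are made up for illustration and are not part of the quiz.

```python
def outer(delta, a):
    """Return the matrix whose (i, j) entry is delta[i] * a[j]."""
    return [[d * x for x in a] for d in delta]

# Hypothetical values: 2 error terms in layer 3, 3 activations in layer 2.
delta3 = [0.5, -1.0]
a2 = [1.0, 2.0, 3.0]

# The result has one row per entry of delta3 and one column per entry of a2.
product = outer(delta3, a2)
for row in product:
    print(row)
```

Each row is one component of `delta3` multiplied across all of `a2`, so the shape is (length of `delta3`) by (length of `a2`).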

EXPLANATION:
Theta1 has 15 elements, so Theta2 begins at index 16 and ends at index 16 + 24 - 1 = 39.
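
The index arithmetic above can be checked with a short sketch. The shapes 5x3 and 4x6 are assumptions chosen only because they give 15 and 24 elements; the helpers `unroll` and `reshape` are hypothetical and mimic column-major unrolling as in Octave's `A(:)`. Note that the quiz indices are 1-based, while Python slices are 0-based.

```python
def unroll(matrix):
    """Flatten a matrix column by column into a flat list."""
    rows, cols = len(matrix), len(matrix[0])
    return [matrix[r][c] for c in range(cols) for r in range(rows)]

def reshape(flat, rows, cols):
    """Rebuild a rows x cols matrix from a column-major flat list."""
    return [[flat[c * rows + r] for c in range(cols)] for r in range(rows)]

# Assumed shapes: Theta1 is 5x3 (15 elements), Theta2 is 4x6 (24 elements).
Theta1 = [[r * 3 + c for c in range(3)] for r in range(5)]
Theta2 = [[100 + r * 6 + c for c in range(6)] for r in range(4)]

theta_vec = unroll(Theta1) + unroll(Theta2)

# 1-based, inclusive indices from the explanation: 16 through 39.
start, end = 16, 16 + 24 - 1
theta2_flat = theta_vec[start - 1:end]   # convert to a 0-based Python slice
recovered = reshape(theta2_flat, 4, 6)

print(len(theta_vec), start, end, recovered == Theta2)
```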

EXPLANATION:

EXPLANATION:

Using gradient checking can help verify if one's implementation of backpropagation is bug-free. (TRUE)
If the gradient computed by backpropagation is the same as one computed numerically with gradient checking, this is very strong evidence that you have a correct implementation of backpropagation.
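
A minimal sketch of the comparison described above, assuming a toy two-parameter cost function in place of a real network. Here `analytic_grad` stands in for the gradient that backpropagation would return, and `numerical_grad` uses a two-sided difference; all names and values are illustrative.

```python
def cost(theta):
    """Toy cost function standing in for J(theta)."""
    return theta[0] ** 2 + 3.0 * theta[0] * theta[1] + theta[1] ** 3

def analytic_grad(theta):
    """Hand-derived gradient; plays the role of the backpropagation output."""
    return [2.0 * theta[0] + 3.0 * theta[1],
            3.0 * theta[0] + 3.0 * theta[1] ** 2]

def numerical_grad(f, theta, eps=1e-4):
    """Two-sided difference: (f(theta + eps) - f(theta - eps)) / (2 * eps)."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += eps
        minus[i] -= eps
        grad.append((f(plus) - f(minus)) / (2.0 * eps))
    return grad

theta = [1.5, -0.5]
g_analytic = analytic_grad(theta)
g_numeric = numerical_grad(cost, theta)

# A tiny difference suggests the analytic gradient is implemented correctly.
max_diff = max(abs(a - n) for a, n in zip(g_analytic, g_numeric))
print(max_diff)
```

The numerical version calls `cost` twice per parameter, which is why it is typically used only as a check and not for training.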

If our neural network overfits the training set, one reasonable step to take is to increase the regularization parameter λ. (TRUE)
Just as with logistic regression, a large value of λ will penalize large parameter values, thereby reducing the chances of overfitting the training set.

OTHER STATEMENTS THAT CAN APPEAR IN QUESTION 4:

For computational efficiency, after we have performed gradient checking to verify that our backpropagation code is correct, we usually disable gradient checking before using backpropagation to train the network. (TRUE)

Computing the gradient of the cost function in a neural network has the same efficiency when we use backpropagation or when we numerically compute it using the method of gradient checking. (FALSE)

Gradient checking is useful if we are using one of the advanced optimization methods (such as in fminunc) as our optimization algorithm. However, it serves little purpose if we are using gradient descent. (FALSE)

EXPLANATION:

Suppose you are training a neural network using gradient descent.  Depending on your random initialization, your algorithm may converge to different local optima (i.e., if you run the algorithm twice with different random initializations, gradient descent may converge to two different solutions). (TRUE)
=>The cost function for a neural network is non-convex, so it may have multiple minima. Which minimum you find with gradient descent depends on the initialization.
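
To see how the starting point can decide which minimum is reached, here is a minimal sketch using a one-dimensional non-convex function as a stand-in for a neural network cost. The function, learning rate, and starting points are assumptions picked for illustration.

```python
def grad(x):
    """Derivative of the toy cost f(x) = x**4 - 3*x**2 + x, which has two minima."""
    return 4.0 * x ** 3 - 6.0 * x + 1.0

def gradient_descent(x0, lr=0.01, steps=1000):
    """Plain gradient descent from the starting point x0."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Two different initializations end up in two different minima.
x_left = gradient_descent(-2.0)
x_right = gradient_descent(2.0)
print(x_left, x_right)
```

Both runs stop where the gradient is essentially zero, but one lands at a negative `x` and the other at a positive `x`.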

If we are training a neural network using gradient descent, one reasonable "debugging" step to make sure it is working is to plot J(Θ) as a function of the number of iterations, and make sure it is decreasing (or at least non-increasing) after each iteration. (TRUE)
=>Since gradient descent uses the gradient to take a step toward parameters with lower cost (i.e., lower J(Θ)), the value of J(Θ) should be equal or less at each iteration if the gradient computation is correct and the learning rate is set properly.
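
The debugging step above can be sketched as follows, assuming a simple quadratic cost in place of a real network. The helper names and the two learning rates are illustrative; the second learning rate is deliberately too large to show what a failing check looks like.

```python
def cost(theta):
    """Toy cost J(theta) = (theta - 3)^2."""
    return (theta - 3.0) ** 2

def grad(theta):
    return 2.0 * (theta - 3.0)

def run(lr, steps=30, theta=0.0):
    """Run gradient descent and record J after every iteration."""
    history = [cost(theta)]
    for _ in range(steps):
        theta -= lr * grad(theta)
        history.append(cost(theta))
    return history

def is_non_increasing(history):
    """True if J never goes up from one iteration to the next."""
    return all(b <= a for a, b in zip(history, history[1:]))

good = run(lr=0.1)   # reasonable learning rate: J should keep dropping
bad = run(lr=1.1)    # learning rate too large: J grows instead
print(is_non_increasing(good), is_non_increasing(bad))
```

In practice the recorded values would be plotted against the iteration number; the check here is just the programmatic version of looking at that plot.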

---------------------------------------------------------------------------------

Reference: Coursera
