Coursera: Machine Learning-Andrew NG(Week 3) Quiz - Logistic Regression

These solutions are for reference only.

Try to solve the quiz on your own first; if you get stuck, you can refer to these solutions.

There are different sets of questions, and we have provided the variations of particular questions at the end.

Read each question carefully before marking your answer.


Logistic Regression



Our estimate for P(y=1|x;θ) is 0.7. => Because hθ(x) is the model's estimate of P(y=1|x;θ), and hθ(x) = 0.7.

Our estimate for P(y=0|x;θ) is 0.3. => Because P(y=0|x;θ) = 1 - P(y=1|x;θ) = 1 - 0.7 = 0.3.
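
As a quick check of the arithmetic above, here is a minimal Python sketch. The function name class_probabilities is only an illustrative helper, not something from the course material.

```python
def class_probabilities(h):
    """Given h = hθ(x), return (P(y=1|x;θ), P(y=0|x;θ))."""
    # The two probabilities must sum to one.
    return h, 1.0 - h

p_y1, p_y0 = class_probabilities(0.7)
# Rounded for display, since 1.0 - 0.7 is not exactly 0.3 in floating point.
print(round(p_y1, 4), round(p_y0, 4))
```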


J(θ) will be a convex function, so gradient descent should converge to the global minimum. (true)
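
To illustrate what convexity buys us, the sketch below runs plain batch gradient descent on the logistic regression cost from two different starting points. The tiny dataset, the learning rate and the iteration count are all made-up assumptions for illustration. With a small enough learning rate the cost should not increase between iterations, and both runs should end up at roughly the same cost.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def cost_and_grad(theta, X, y):
    """Logistic regression cost J(θ) and its gradient, averaged over m examples."""
    m = len(y)
    cost = 0.0
    grad = [0.0] * len(theta)
    for xi, yi in zip(X, y):
        h = sigmoid(sum(t * v for t, v in zip(theta, xi)))
        cost += -(yi * math.log(h) + (1 - yi) * math.log(1.0 - h))
        for j, v in enumerate(xi):
            grad[j] += (h - yi) * v
    return cost / m, [g / m for g in grad]

# Hypothetical, non-separable toy data; the first column is the intercept term x0 = 1.
X = [[1.0, 0.5], [1.0, 1.5], [1.0, 2.0], [1.0, 3.0], [1.0, 3.5], [1.0, 4.5]]
y = [0, 0, 1, 0, 1, 1]

def gradient_descent(theta, alpha=0.1, iters=5000):
    """Return the final θ and the cost recorded before each update."""
    history = []
    for _ in range(iters):
        c, g = cost_and_grad(theta, X, y)
        history.append(c)
        theta = [t - alpha * gj for t, gj in zip(theta, g)]
    return theta, history

theta_a, hist_a = gradient_descent([0.0, 0.0])
theta_b, hist_b = gradient_descent([3.0, -2.0])
print(hist_a[-1], hist_b[-1])  # the two final costs should be very close
```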

Adding polynomial features (e.g., instead using hθ(x) = g(θ0 + θ1x1 + θ2x2 + θ3x1^2 + θ4x1x2 + θ5x2^2)) could increase how well we can fit the training data. (true)
=>Adding new features can only improve the fit on the training set: since setting θ3 = θ4 = θ5 = 0 makes the hypothesis the same as the original one, gradient descent will use those features (by making the corresponding parameters non-zero) only if doing so improves the training set fit.
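
The argument above can be checked numerically. The sketch below compares a hypothesis with only linear terms against one with the extra quadratic terms, with θ3 = θ4 = θ5 set to zero. The parameter values and test points are hypothetical; the point is only that the two hypotheses give the same output.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def h_linear(theta, x1, x2):
    # hθ(x) = g(θ0 + θ1*x1 + θ2*x2)
    return sigmoid(theta[0] + theta[1] * x1 + theta[2] * x2)

def h_poly(theta, x1, x2):
    # hθ(x) = g(θ0 + θ1*x1 + θ2*x2 + θ3*x1^2 + θ4*x1*x2 + θ5*x2^2)
    return sigmoid(theta[0] + theta[1] * x1 + theta[2] * x2
                   + theta[3] * x1 ** 2 + theta[4] * x1 * x2 + theta[5] * x2 ** 2)

theta_linear = [-1.0, 0.5, 2.0]              # hypothetical parameters
theta_poly = theta_linear + [0.0, 0.0, 0.0]  # extra parameters switched off
points = [(0.0, 0.0), (1.0, -2.0), (3.5, 0.25)]

same = all(abs(h_linear(theta_linear, a, b) - h_poly(theta_poly, a, b)) < 1e-12
           for a, b in points)
print(same)
```

So the larger model can always reproduce the smaller one, and its best training cost can be no worse.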

Other statements that can occur in this question:

At the optimal value of θ (e.g., found by fminunc), we will have J(θ) ≥ 0. (true)

A variation of the 3rd question is provided at the end.


The cost function J(θ) for logistic regression trained with m ≥ 1 examples is always greater than or equal to zero. (true)
=>The cost for any example x(i) is always ≥ 0, since it is the negative log of a quantity between zero and one. The cost function J(θ) is a sum over the cost of each example, so the cost function itself must be greater than or equal to zero.
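
A small sketch of this reasoning, assuming the usual per-example cost (-log(h) when y = 1, -log(1 - h) when y = 0) and averaging over the examples. The predictions and labels below are made up for illustration.

```python
import math

def example_cost(h, y):
    """Cost of one example for a prediction 0 < h < 1 and a label y in {0, 1}."""
    return -math.log(h) if y == 1 else -math.log(1.0 - h)

def J(hs, ys):
    """Average of the per-example costs (a sum scaled by 1/m)."""
    return sum(example_cost(h, y) for h, y in zip(hs, ys)) / len(ys)

hs = [0.9, 0.2, 0.6, 0.01]  # hypothetical predictions hθ(x(i))
ys = [1, 0, 1, 1]           # hypothetical labels y(i)

costs = [example_cost(h, y) for h, y in zip(hs, ys)]
total = J(hs, ys)
print(costs, total)  # every entry is non-negative, and so is the average
```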

The sigmoid function is never greater than one. (true)
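
To see this numerically, here is a sketch of the sigmoid g(z) = 1 / (1 + e^(-z)), written in a form that avoids overflow for large negative z. The sample inputs are arbitrary.

```python
import math

def sigmoid(z):
    """g(z) = 1 / (1 + e^(-z)), computed without overflowing for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # safe: z < 0, so exp(z) is at most 1
    return e / (1.0 + e)

values = [sigmoid(z) for z in (-1000.0, -5.0, 0.0, 5.0, 1000.0)]
print(values)
# Mathematically 0 < g(z) < 1; in floating point the extremes may round
# to exactly 0.0 or 1.0, but the result never exceeds one.
```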

Other statements that can occur in this question:

The one-vs-all technique allows you to use logistic regression for problems in which each y(i) comes from a fixed, discrete set of values. (true)
=>If each y(i) is one of k different values, we can give each one a label y(i) ∈ {1, 2, ..., k} and use one-vs-all as described in the lecture.
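
A minimal sketch of the prediction step in one-vs-all, assuming one logistic regression classifier has already been trained per label. The parameter values and the helper name predict_one_vs_all are hypothetical; the idea is to evaluate every classifier and pick the label whose classifier is most confident.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_one_vs_all(thetas, x):
    """thetas maps each label to [θ0, θ1, ...]; x holds the features without the bias term."""
    scores = {
        label: sigmoid(theta[0] + sum(t * xi for t, xi in zip(theta[1:], x)))
        for label, theta in thetas.items()
    }
    # Choose the label whose classifier reports the highest probability.
    return max(scores, key=scores.get), scores

# Hypothetical parameters for k = 3 classifiers.
thetas = {
    1: [2.0, -1.0, -1.0],
    2: [-3.0, 1.0, 0.0],
    3: [-3.0, 0.0, 1.0],
}

label, scores = predict_one_vs_all(thetas, [5.0, 0.5])
print(label, scores)
```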


In this figure, the prediction changes from negative (y = 0) to positive (y = 1) as x1 moves from the left of 6 to the right of 6, which matches the given values of θ.
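
The question's θ values are not reproduced here, so the sketch below assumes θ0 = -6, θ1 = 1, θ2 = 0 purely as an illustration of a boundary at x1 = 6. With those assumed values, the model predicts y = 1 whenever θ0 + θ1x1 + θ2x2 ≥ 0, i.e. whenever x1 ≥ 6, regardless of x2.

```python
def predict(theta, x1, x2):
    """Predict y = 1 when θ0 + θ1*x1 + θ2*x2 >= 0 (equivalently hθ(x) >= 0.5)."""
    z = theta[0] + theta[1] * x1 + theta[2] * x2
    return 1 if z >= 0 else 0

theta = [-6.0, 1.0, 0.0]  # assumed values, chosen so the boundary is x1 = 6

left = predict(theta, 5.0, 3.0)     # a point to the left of x1 = 6
right = predict(theta, 7.0, -2.0)   # a point to the right of x1 = 6
print(left, right)
```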




Variations of the 5th question:

Variations of the 3rd question:


Reference: Coursera