Applied Machine Learning in Python (Kevyn Collins-Thompson): Week 1 Quiz Answers

These solutions are for reference only.

It is recommended that you solve the assignments and quizzes yourself; only then does completing the course make sense.
If you get stuck, refer to these solutions.

Make sure you understand each solution.
Don't just copy and paste it.

----------------------------------------------------------------------------------------------

1.
Select the option that correctly completes the sentence:
Training a model using labeled data and using this model to predict
the labels for new data is known as ____________.

Supervised Learning
Density Estimation
Clustering
Unsupervised Learning
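
As a minimal sketch of the workflow this question describes (fit a model on labeled data, then predict labels for new data), the snippet below uses scikit-learn's KNeighborsClassifier. The data values and variable names are hypothetical and chosen only for illustration.

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical labeled training data: one feature, two classes.
X_train = [[1.0], [1.2], [0.8], [5.0], [5.3], [4.9]]
y_train = [0, 0, 0, 1, 1, 1]

# Learn from the labeled examples.
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X_train, y_train)

# Predict labels for new, unlabeled points.
X_new = [[1.1], [5.1]]
predictions = model.predict(X_new)
print(predictions)  # [0 1]
```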

----------------------------------------------------------------------------------------------

2.
Select the option that correctly completes the sentence:
Modeling the features of an unlabeled dataset to find hidden
structure is known as ____________.

Supervised Learning
Regression
Unsupervised Learning
Classification
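
For contrast with the labeled case, here is a small sketch of finding structure in unlabeled data. It assumes k-means clustering (scikit-learn's KMeans) as one example technique; the points are hypothetical, and no labels are passed to the model.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical unlabeled data: two visually separate groups of points.
X = np.array([
    [1.0, 1.1], [0.9, 1.0], [1.2, 0.8],
    [8.0, 8.2], [7.9, 8.1], [8.3, 7.8],
])

# No y is given: the model groups points using the features alone.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(X)
print(cluster_ids)
```

The cluster numbers themselves are arbitrary; what matters is that the first three points share one cluster id and the last three share the other.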

----------------------------------------------------------------------------------------------

3.
Select the option that correctly completes the sentence:
Training a model using categorically labelled data to predict labels
for new data is known as __________.

Regression
Clustering
Classification
Feature Extraction

----------------------------------------------------------------------------------------------

4.
Select the option that correctly completes the sentence:
Training a model using labelled data where the labels are
continuous quantities to predict labels for new data is known as
__________.

Feature Extraction
Regression
Classification
Clustering
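
The difference between categorical and continuous labels (questions 3 and 4) can be sketched with two nearest-neighbor models from scikit-learn. The feature values and labels below are hypothetical.

```python
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# Hypothetical one-feature dataset with two kinds of labels.
X = [[1.0], [2.0], [3.0], [4.0]]
y_category = ["small", "small", "large", "large"]  # categorical labels
y_quantity = [10.0, 20.0, 30.0, 40.0]              # continuous labels

# Categorical labels: the model predicts one of the known categories.
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y_category)
label = clf.predict([[3.6]])[0]

# Continuous labels: the model predicts a number (here the mean of
# the two nearest training labels).
reg = KNeighborsRegressor(n_neighbors=2).fit(X, y_quantity)
value = reg.predict([[2.5]])[0]

print(label, value)  # large 25.0
```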

----------------------------------------------------------------------------------------------

5.
Using the data for classes 0, 1, and 2 plotted below, what class
would a KNeighborsClassifier classify the new point as for k = 1 and
k = 3?

• k=1: Class 2
• k=3: Class 1

• k=1: Class 1
• k=3: Class 0

• k=1: Class 0
• k=3: Class 1

• k=1: Class 1
• k=3: Class 2

• k=1: Class 0
• k=3: Class 2
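
The plot for this question is not reproduced here, so the sketch below uses hypothetical points that are NOT the ones from the quiz figure. It only illustrates how the predicted class can change between k = 1 (the single nearest point decides) and k = 3 (majority vote among the three nearest points).

```python
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical 2-D training points (not the points from the quiz figure).
X_train = [
    [0.1, 0.0],    # class 2, closest to the new point
    [0.3, 0.0],    # class 0
    [0.0, 0.35],   # class 0
    [-3.0, -3.0],  # class 0
    [3.0, 3.0],    # class 1
    [3.2, 3.0],    # class 1
    [-3.0, 3.0],   # class 2
]
y_train = [2, 0, 0, 0, 1, 1, 2]
new_point = [[0.0, 0.0]]

predictions = {}
for k in (1, 3):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    predictions[k] = int(knn.predict(new_point)[0])

print(predictions)  # {1: 2, 3: 0}
```

To answer the actual question, apply the same reasoning to the plotted points: find the nearest point for k = 1, then take the majority class among the three nearest points for k = 3.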

----------------------------------------------------------------------------------------------

6.
Which of the following is true for the nearest neighbor classifier
(Select all that apply):

A higher value of k leads to a more complex decision
boundary

Partitions observations into k clusters where each
observation belongs to the cluster with the nearest mean

Memorizes the entire training set

Given a data instance to classify, computes the
probability of each possible class using a statistical
model of the input features
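
To make the "memorizes the entire training set" option concrete, here is a from-scratch sketch of a nearest neighbor classifier. The class name SimpleKNN and its attributes are hypothetical and are not scikit-learn's implementation; the point is that fitting only stores the data, and all the work happens at prediction time.

```python
from collections import Counter

import numpy as np


class SimpleKNN:
    """Illustrative k-nearest-neighbor classifier (hypothetical, not scikit-learn's)."""

    def __init__(self, k=1):
        self.k = k

    def fit(self, X, y):
        # "Training" is just storing every training example.
        self.X_ = np.asarray(X, dtype=float)
        self.y_ = np.asarray(y)
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        preds = []
        for x in X:
            # Euclidean distance from x to every stored training point.
            dists = np.linalg.norm(self.X_ - x, axis=1)
            nearest = np.argsort(dists, kind="stable")[: self.k]
            # Majority vote among the k nearest labels.
            votes = Counter(self.y_[nearest].tolist())
            preds.append(votes.most_common(1)[0][0])
        return np.array(preds)


X_train = [[1.0], [1.2], [0.8], [5.0], [5.3], [4.9]]
y_train = [0, 0, 0, 1, 1, 1]
model = SimpleKNN(k=3).fit(X_train, y_train)
predictions = model.predict([[1.1], [5.1]])
print(model.X_.shape, predictions)  # (6, 1) [0 1]
```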

----------------------------------------------------------------------------------------------

7.
Why is it important to examine your dataset as a first step in
applying machine learning? (Select all that apply):

See what type of cleaning or preprocessing still needs to
be done

You might notice missing data

Gain insight on what machine learning model might be
appropriate, if any

Get a sense for how difficult the problem might be

It is not important
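
A first look at a dataset might be sketched as follows, assuming the data is loaded into a pandas DataFrame. The column names and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical dataset with a couple of missing values.
df = pd.DataFrame({
    "height_cm": [170.0, 165.0, np.nan, 180.0],
    "weight_kg": [65.0, 58.0, 72.0, np.nan],
    "label": ["a", "b", "a", "b"],
})

print(df.shape)       # number of rows and columns
print(df.dtypes)      # column types hint at needed preprocessing
print(df.describe())  # value ranges for the numeric columns

# Count missing values per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)
```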

----------------------------------------------------------------------------------------------

8.
The key purpose of splitting the dataset into training and test sets
is:

To estimate how well the learned model will generalize
to new data

To reduce the amount of labelled data needed for
evaluating classifier accuracy

To reduce the number of features we need to consider
as input to the learning algorithm

To speed up the training process
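
The sketch below illustrates why a held-out test set is useful. It assumes synthetic, noisy data generated with NumPy (not any course dataset). A 1-nearest-neighbor model scores perfectly on the data it memorized, so only the test score says anything about new data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data (an assumption for illustration): the label depends on
# the first feature plus noise, so it cannot be predicted perfectly.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# Each training point is its own nearest neighbor, so this is 1.0.
train_accuracy = knn.score(X_train, y_train)
# Accuracy on unseen points is the estimate of generalization.
test_accuracy = knn.score(X_test, y_test)

print(train_accuracy, test_accuracy)
```

The training accuracy here is 1.0 by construction; the test accuracy is typically lower and is the more honest estimate.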

----------------------------------------------------------------------------------------------

9.
The purpose of setting the random_state parameter in
train_test_split is: (Select all that apply)

To avoid predictable splitting of the data

To make experiments easily reproducible by always
using the same partitioning of the data

To avoid bias in data splitting

To split the data into similar subsets so that bias is not
introduced into the final results
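
The effect of random_state can be checked directly. In this sketch (toy arrays, arbitrary seed value 42), calling train_test_split twice with the same random_state gives the same partition each time.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Same seed, same shuffling, same partition.
split_a = train_test_split(X, y, random_state=42)
split_b = train_test_split(X, y, random_state=42)

same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
print(same_split)  # True
```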

----------------------------------------------------------------------------------------------

10.
Given a dataset with 10,000 observations and 50 features plus one
label, what would be the dimensions of X_train, y_train, X_test, and
y_test? Assume a train/test split of 75%/25%.

• X_train: (2500, )
• y_train: (2500, 50)
• X_test: (7500, )
• y_test: (7500, 50)

• X_train: (10000, 28)
• y_train: (10000, )
• X_test: (10000, 12)
• y_test: (10000, )

• X_train: (2500, 50)
• y_train: (2500, )
• X_test: (7500, 50)
• y_test: (7500, )

• X_train: (7500, 50)
• y_train: (7500, )
• X_test: (2500, 50)
• y_test: (2500, )

• X_train: (10000, 50)
• y_train: (10000, )
• X_test: (10000, 50)
• y_test: (10000, )
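
The shapes can be verified with placeholder arrays of the stated size. This sketch assumes scikit-learn's train_test_split with test_size=0.25; the arrays are filled with zeros because only the dimensions matter.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 10,000 observations, 50 features, one label each.
X = np.zeros((10000, 50))
y = np.zeros(10000)

# 75% of the rows go to training, 25% to testing; columns are unchanged.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

print(X_train.shape, y_train.shape)  # (7500, 50) (7500,)
print(X_test.shape, y_test.shape)    # (2500, 50) (2500,)
```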
