MultipleChoice

Multiple Choice Questions

Suppose we model the boundary of the map of India with the equation y = f(x) = ax + b, and learn the values of the scalars a and b from 100,000 coordinates sampled uniformly from that boundary. Which of the following problems is our model likely to suffer from?

High bias
High variance
Low entropy
High entropy

Answer - High Bias
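
The high-bias behaviour can be illustrated with a minimal sketch. It assumes a unit circle as a stand-in for a closed map boundary and uses evenly spaced points instead of random samples so the result is reproducible; `fit_line` and `boundary_mse` are hypothetical helper names. The error of the best straight line does not shrink as more points are added, which is the signature of underfitting.

```python
import math

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def boundary_mse(n_points):
    """Fit a line to n_points on a unit circle (assumed stand-in boundary)."""
    thetas = [2 * math.pi * k / n_points for k in range(n_points)]
    xs = [math.cos(t) for t in thetas]
    ys = [math.sin(t) for t in thetas]
    a, b = fit_line(xs, ys)
    mse = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys)) / n_points
    return a, b, mse

# The mean squared error stays at about 0.5 whether 100 or 100,000 points are used.
for n in (100, 100_000):
    a, b, mse = boundary_mse(n)
    print(n, round(a, 6), round(b, 6), round(mse, 6))
```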

Consider an A versus B two-class classifier. Recall(A) = 100% implies which of the following:

Precision(A) < 100%
Recall(B) = 100%
Precision(B) < 100%
None of the above

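Answer - None of the above. Recall(A) = 100% means no A is misclassified as B, which gives Precision(B) = 100% (whenever B is predicted at all) but does not constrain Precision(A) or Recall(B).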
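
To see what Recall(A) = 100% does and does not constrain, the four metrics can be computed from confusion counts. The sketch below uses hypothetical counts and helper names (`precision_recall`, `metrics_when_recall_a_is_perfect`); the only fixed assumption is that every A item is predicted as A.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) for one class from its confusion counts."""
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    recall = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, recall

def metrics_when_recall_a_is_perfect(n_a, n_b, b_predicted_as_a):
    """All n_a items of class A are predicted as A; some B items may leak into A."""
    # Class A: every A is found (FN = 0); leaked B items are false positives.
    p_a, r_a = precision_recall(n_a, b_predicted_as_a, 0)
    # Class B: no A is ever predicted as B (FP = 0); leaked B items are false negatives.
    p_b, r_b = precision_recall(n_b - b_predicted_as_a, 0, b_predicted_as_a)
    return p_a, r_a, p_b, r_b

perfect = metrics_when_recall_a_is_perfect(50, 50, 0)   # no B leaks into A
leaky = metrics_when_recall_a_is_perfect(50, 50, 10)    # 10 B items predicted as A
print(perfect)
print(leaky)
```

In both hypothetical cases Recall(A) and Precision(B) are 100%, while Precision(A) and Recall(B) depend on how many B items are predicted as A.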

Normalization is one part of data pre-processing. We do it in order to:

Scale features
Improve the performance of the machine learning algorithm applied afterwards
Remove elements with missing values from the set
Fill missing values in the input data with the average value

Advantages of Naive Bayes classifier are

It performs well in multi-class prediction.
It performs well in the case of dependent predictors.
It performs well with categorical input variables compared to numerical variable(s).
It performs well when a categorical value was not observed in the training data set.

Answer - 1, 3 (multi-class prediction; categorical input variables)

Decision trees have the following properties

Over-fitting does not occur when decision trees are used
They work well with continuous variables
Type of input data doesn't matter
It's not a parametric method

Answer - 2-3-4 (in particular, it is a non-parametric method)

k-Means clustering method has the following properties

Each new element is assigned to a class based on its distance to that class
It's not a parametric method
Each element belongs just to one class.
It's a supervised machine learning method

Answer - 2-3

Support vector machine is used in the following cases

Binary classification for linearly separable classes
Multi-class classification for linearly separable classes
Binary classification for classes that are not linearly separable
Finding number of clusters that input points belong to

Answer - 3

Logistic regression is used in the following cases

There is a set of vectors, each belonging to one of two classes labeled 'Yes' and 'No'. For a new vector, we should say which class it belongs to.
Predicting the target value for an input vector when the dependency between input and output variables is nonlinear.
There is a set of vectors, each belonging to one of two classes labeled 'Yes' and 'No'. For a new vector, we should calculate the probability that it belongs to the class labeled 'Yes'.
Detecting the most significant feature (attribute) in the input vector of attributes.

Linear regression is used in the following cases

Predicting a continuous target value for an input vector
Detecting the degree of influence of an input attribute x_i on the target value y
When we assume that the dependency between input and output data is linear
Deciding, for linearly separable classes, whether an input vector belongs to the first or the second class

Answer - 1, 3

You have two attributes, Attribute1 and Attribute2, in a dataset. Attribute2 is always many times larger than Attribute1. What kind of data pre-processing would you apply to ensure that both attributes are given equal importance by the learning algorithm? Choose all that apply.

Standardization
Normalization
Min-Max Transformation
Log transformation

Answer - 2-3-4

You are learning a logistic regression model with 1000 attributes. You would like to get a sparse solution for the coefficients. How could you achieve this?

By applying PCA
Chi-square feature selection
L1 Regularization
L2 Regularization

When applied to a different validation dataset composed of 100,000 records, your final model has the following confusion matrix when predicting the value of the churned field: True positive: 15,000; True negative: 79,000; False positive: 1,000; False negative: 5,000.

What is the accuracy and the recall of this model?

Accuracy: 84% Recall: 75%
Accuracy: 94% Recall: 98.75%
Accuracy: 94% Recall: 75%
Accuracy: 84% Recall: 98.75%
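
The two metrics follow directly from the confusion-matrix counts given in the question. A short sketch of the arithmetic, where only the variable names are assumptions:

```python
# Confusion-matrix counts taken from the question.
tp, tn, fp, fn = 15_000, 79_000, 1_000, 5_000

total = tp + tn + fp + fn        # should equal the 100,000 records
accuracy = (tp + tn) / total     # fraction of all records classified correctly
recall = tp / (tp + fn)          # fraction of actual positives that were found

print(f"accuracy={accuracy:.2%} recall={recall:.2%}")
```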

What can you try to do to improve this new model?

Use regularization in your cost function
Remove one or more features from the dataset
Acquire or engineer more features in the dataset
Collect more data

You are learning a logistic regression model with 1000 attributes. You would like to avoid giving too much weight to a small number of attributes. How could you achieve this?

By applying PCA
Chi-square feature selection
L1 Regularization
L2 Regularization

I have a 3-layer neural network with seven inputs, two hidden layers of 4 neurons each, and one neuron in the output layer. When we say N-layer neural network, we do not count the input layer. The layers are fully connected and a bias is used for each node in the hidden layers. What is the total number of neurons and the total number of learnable parameters in this network?

16, 57
8, 48
10, 51
9, 57

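Answer - 9, 57 (the count of 57 includes a bias on the output neuron; with biases on the hidden layers only, the count would be 56)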
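
The counts can be checked with a small sketch. `count_network` is a hypothetical helper, and the `bias_on_output` flag is an assumption added to show both readings of the question: 57 parameters if the output neuron also has a bias, 56 if only the hidden layers do.

```python
def count_network(n_inputs, layer_sizes, bias_on_output=True):
    """Count neurons and learnable parameters of a fully connected network.

    layer_sizes lists the hidden layers followed by the output layer;
    the input layer is not counted as a layer of neurons.
    """
    neurons = sum(layer_sizes)
    params = 0
    fan_in = n_inputs
    last = len(layer_sizes) - 1
    for i, size in enumerate(layer_sizes):
        params += fan_in * size              # one weight per incoming connection
        if i != last or bias_on_output:
            params += size                   # one bias per neuron in this layer
        fan_in = size
    return neurons, params

print(count_network(7, [4, 4, 1]))                        # (9, 57)
print(count_network(7, [4, 4, 1], bias_on_output=False))  # (9, 56)
```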

Some recommended ways to initialize the weights of a neural network are

Random initialization
Small Random initialization
Zero initialization
Sparse Initialization
Batch Normalization
Random initialization with variance calibration

Answer - Random

Some cases where the problem of vanishing gradients could occur are

Relu activation
CNN
Zero initialization
Sigmoid activation
RNN

Answer - Sigmoid

A box contains 731 black balls and 2000 white balls. The following process is to be repeated as long as possible. Arbitrarily select two balls from the box. If they are of the same colour, throw them out and put a black ball into the box (enough extra black balls are available to do this). If they are of different colours, place the white ball back into the box and throw the black ball away.
Which of the following is correct?

The process can be applied indefinitely without any a priori bound
The process will stop with a single white ball in the box
The process will stop with a single black ball in the box
The process will stop with the box empty
None of the above

Answer - The process will stop with a single black ball in the box
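
The reasoning behind this answer: every step removes exactly one ball from the box, so the process stops with one ball left, and the number of white balls only ever changes by 2, so it stays even and must end at zero. A minimal simulation sketch, tracking only the counts (`run_process` is a hypothetical helper; the random draws are an assumption standing in for "arbitrarily select"):

```python
import random

def run_process(black, white, rng):
    """Repeat the two-ball process until fewer than two balls remain."""
    while black + white >= 2:
        total = black + white
        # Draw two balls without replacement, tracking only their colours.
        first_white = rng.random() < white / total
        remaining_white = white - 1 if first_white else white
        second_white = rng.random() < remaining_white / (total - 1)
        if first_white and second_white:
            white -= 2      # two whites thrown out ...
            black += 1      # ... and one black put in
        else:
            # Two blacks: both out, one black in. Mixed: white back, black out.
            black -= 1
    return black, white

print(run_process(731, 2000, random.Random(0)))  # (1, 0)
```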

The hour and minute hands of a clock meet at noon and again at midnight. In between they meet N times, where N is:

6
11
12
13
None of the above

Answer - None of the above (N = 10)
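
The count can be derived from the hand speeds: with t in hours after noon, the minute hand is at 360t degrees and the hour hand at 30t degrees, so they coincide when 330t is a multiple of 360, that is at t = 12k/11. A short sketch that enumerates these times exactly (variable names are assumptions):

```python
from fractions import Fraction

# Meeting times in hours after noon: t = 12k/11 for k = 0, 1, ..., 11.
meeting_times = [Fraction(12 * k, 11) for k in range(12)]

# Exclude the meetings at noon (t = 0) and midnight (t = 12) themselves.
strictly_between = [t for t in meeting_times if 0 < t < 12]

print(len(strictly_between))  # 10
```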

Given 10 tosses of a coin with probability of heads = 0.4 and probability of tails = 1 - 0.4 = 0.6, the probability of at least one head is

(0.4)^10
1 - (0.4)^10
1 - (0.6)^10
(0.6)^10
10(0.4)(0.6)^9

Answer - 1 - (0.6)^10
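
The answer uses the complement rule: "at least one head" is the opposite of "all ten tosses are tails". A small sketch that checks it against a direct binomial sum (variable names are assumptions):

```python
from math import comb

p_head = 0.4
n = 10

# Complement rule: 1 - P(no heads in n tosses).
p_at_least_one = 1 - (1 - p_head) ** n

# Cross-check: sum the binomial probabilities of exactly k heads for k = 1..n.
p_by_sum = sum(
    comb(n, k) * p_head**k * (1 - p_head) ** (n - k) for k in range(1, n + 1)
)

print(round(p_at_least_one, 6), round(p_by_sum, 6))
```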

A cube whose faces are colored is split into 1000 small cubes of equal size. The cubes thus obtained are mixed thoroughly. The probability that a cube drawn at random will have exactly two colored faces is

0.096
0.12
0.104
0.24
None of the above

Answer - 0.096 (each of the 12 edges holds 8 non-corner cubes with exactly two colored faces, so 8 * 12 / 1000 = 0.096)
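
The count can also be confirmed by brute force. The sketch below assumes the large cube is cut into a 10 x 10 x 10 grid and counts, for each small cube, how many of its coordinates lie on an outer face (variable names are assumptions):

```python
from collections import Counter

n = 10  # 10 x 10 x 10 = 1000 small cubes
counts = Counter()

for x in range(n):
    for y in range(n):
        for z in range(n):
            # A coordinate of 0 or n - 1 means one face lies on the painted surface.
            painted_faces = sum(c in (0, n - 1) for c in (x, y, z))
            counts[painted_faces] += 1

prob_two = counts[2] / n**3
print(counts[2], prob_two)  # 96 0.096
```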
