
COMP 551 - Midterm

MCGILL UNIVERSITY

WINTER SEMESTER, 2022

Campus: Montreal - Downtown

COMPUTER SCIENCE

Applied Machine Learning Course

NOTE: You need to submit your write-up by 07:00am March 22nd in Mycourses under the assignment tab. Please note that your submitted answers must be typed and machine readable and may be checked with text-matching software. You may treat the exam as open book and consult relevant materials, but you may not copy and paste exact text or images from the slides or other sources. All answers need to be in your own wording and reflect your own understanding.

By writing my name below, I confirm that all my exam work will be done entirely by myself, with no help from others. I will not provide any information about the exam's contents and/or my solutions to other people until after March 30th, 2022.

McGill ID No: Student name:

SECTION A: ML Fundamentals

This part checks your understanding of the basic concepts and algorithms in Machine Learning. Please use text, formulas, images (created by yourself), or any combination of them to answer the questions.

1. Explain each of the following models in a paragraph (use about 200 words or less). Discuss when they work (e.g. what kind of data the method is useful for, model complexity, whether they can do classification, regression, or both, etc.), how they work (explain the parameters and discuss what they are learning and how), why they work (discuss their inductive bias), and what needs to be considered specifically for them to make them work (do you need to do data normalization, regularization, etc.).

(a) Nearest Neighbours

(b) Decision Trees

(c) Naive Bayes

(d) Linear Regression

(e) Logistic Regression

(f) Softmax Regression

(g) Multilayer Perceptron (MLP)

(h) Convolutional Neural Networks (CNN)


2. Compare the following models with regard to each other. Use about 50 words or less per comparison. Focus on their key differences and/or similarities in terms of the data/tasks they can be applied to, model complexity and efficiency, loss function, etc.

(a) Linear Regression vs. Logistic Regression

(b) Logistic Regression vs. Softmax Regression

(c) Logistic Regression vs. Naive Bayes Classifier

(d) Logistic Regression vs. Multilayer Perceptron (MLP)

(e) Multilayer Perceptron (MLP) vs. Convolutional Neural Networks (CNN)

3. Explain each of the following concepts discussed in the course in a few lines (use about 50 words or less).

(a) Over-fitting and Under-fitting

(b) Bias and Variance trade-off

(c) Regularization

(d) Generalization

(e) Hyper-parameter

4. Explain the gradient descent approach in a short paragraph (use less than 100 words) and discuss what the Adam (Adaptive Moment Estimation) algorithm does to make it work better.
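
As a point of reference for the gradient descent part of this question, here is a minimal Python sketch (not part of the exam) of the plain update rule $w^{\{t\}} \leftarrow w^{\{t-1\}} - \alpha\,\nabla J(w^{\{t-1\}})$ on a toy one-dimensional objective; the objective and learning rate are illustrative choices, not from the exam.

# Minimal sketch: plain gradient descent on the toy objective J(w) = (w - 3)^2,
# whose gradient is 2(w - 3). Both are illustrative.
def grad_J(w):
    return 2.0 * (w - 3.0)

w = 0.0        # initial parameter
alpha = 0.1    # learning rate
for _ in range(100):
    w = w - alpha * grad_J(w)   # w <- w - alpha * grad J(w)

print(w)   # approaches the minimizer w = 3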

5. Explain Maximum Likelihood Estimation (MLE) in the context of fitting a model to the given data and how it differs from the Bayesian approach; discuss the MAP estimate and how it relates to the two, and explain whether it results in lower or higher variance. (Use about 200 words or less.)
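
For reference, a compact way to write the three quantities this question relates (these are textbook definitions, added here only as a reminder, not taken from the exam text):

$\hat{w}_{\text{MLE}} = \arg\max_w\, \log p(\mathcal{D} \mid w)$

$\hat{w}_{\text{MAP}} = \arg\max_w\, \big[\log p(\mathcal{D} \mid w) + \log p(w)\big]$

$p(w \mid \mathcal{D}) \propto p(\mathcal{D} \mid w)\, p(w) \quad \text{(full Bayesian posterior)}$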

SECTION B: ML Practitioner’s Knowledge

This part checks your depth of understanding of the different concepts with more specific questions.

6. When using the gradient descent algorithm, are we guaranteed to find a local minimum? Please explain.

7. Can we use gradient descent to solve a linear regression problem? If so, could it result in multiple locally optimal solutions?


8. How will the bias and variance of a trained model change with each of the following? E.g., answer "less bias but more variance", or "less bias but variance stays the same", etc.

(a) increasing the number of data points the model learns from

(b) increasing k in a k-nearest neighbour model

(c) pruning a decision tree

(d) increasing the regularization parameter (λ) in Ridge regression

(e) adding dropout to an MLP

(f) reducing the batch size when training an MLP with stochastic gradient descent

9. Is regularization more important for a model that has higher or lower expressive power? Please discuss.

10. With back-propagation, can we learn the globally optimal solution for a fully connected feed-forward network with one hidden layer (a two-layer MLP)? Please explain.

SECTION C: ML Inner Workings Knowledge

This part checks your depth of understanding of the different algorithms with more specific questions.

11. What is the prediction for x = [1 0 1], when using a Gaussian Naive Bayes model that is trained using maximum likelihood on the following data:
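
The following Python sketch illustrates only the mechanics of fitting a Gaussian Naive Bayes model by maximum likelihood (class priors, per-feature means and variances) and predicting the class with the highest posterior; the arrays X and y below are hypothetical placeholders, not the question's training data.

import numpy as np

# Sketch of the Gaussian Naive Bayes mechanics only; X and y are HYPOTHETICAL
# placeholders, not the exam's training data.
X = np.array([[1, 0, 1],
              [1, 1, 1],
              [0, 0, 1],
              [0, 1, 0]], dtype=float)
y = np.array([1, 1, 0, 0])

x_new = np.array([1, 0, 1], dtype=float)

log_posteriors = {}
for c in np.unique(y):
    Xc = X[y == c]
    prior = len(Xc) / len(X)            # MLE class prior
    mu = Xc.mean(axis=0)                # MLE per-feature mean
    var = Xc.var(axis=0) + 1e-9         # MLE per-feature variance (tiny value avoids /0)
    log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (x_new - mu) ** 2 / var)
    log_posteriors[c] = np.log(prior) + log_lik

print(max(log_posteriors, key=log_posteriors.get))   # predicted class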

12. What is the prediction for input x = [1 0 1], when using a Softmax regression model with the following weights:

$w = \begin{bmatrix} 1 & 2 & 3 \\ 1 & 0 & 0 \\ 0 & 2 & 1 \end{bmatrix}$
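
A short Python sketch of how such a prediction can be checked numerically, assuming each row of w is the weight vector of one class and there is no bias term (both are assumptions, since the question does not state them):

import numpy as np

# Sketch: softmax prediction for the weights in question 12, assuming each row
# of w is one class's weight vector and there is no bias term.
W = np.array([[1, 2, 3],
              [1, 0, 0],
              [0, 2, 1]], dtype=float)
x = np.array([1, 0, 1], dtype=float)

z = W @ x                                             # per-class scores
p = np.exp(z - z.max()) / np.exp(z - z.max()).sum()   # numerically stable softmax
print(p, p.argmax())                                  # class probabilities and predicted class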

13. What is the maximum likelihood estimate for the parameter w when using the following cost function and training data? Please write the derivations.

$J(w) = \frac{1}{2} \sum_n \left( y^{(n)} - w x^{(n)} \right)^2$

$D = \{(1, 1), (1, 3), (2, 1), (2, 5), (2, 6), (3, 1), (3, 8), (3, 4), (5, 10), (6, 10)\}$
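
A quick Python sketch for numerically checking a hand derivation: setting $dJ/dw = 0$ for this cost gives $w^{*} = \sum_n x^{(n)} y^{(n)} / \sum_n (x^{(n)})^2$, which the snippet below evaluates on D.

import numpy as np

# Sketch: numerical check for question 13. Setting dJ/dw = 0 for
# J(w) = 1/2 * sum_n (y_n - w x_n)^2 gives w* = sum(x*y) / sum(x^2).
D = [(1, 1), (1, 3), (2, 1), (2, 5), (2, 6), (3, 1), (3, 8), (3, 4), (5, 10), (6, 10)]
x = np.array([d[0] for d in D], dtype=float)
y = np.array([d[1] for d in D], dtype=float)

w_mle = np.sum(x * y) / np.sum(x ** 2)
print(w_mle)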


14. In class, we discussed that using a linear regression model ($\hat{y} = w^T x + b$) for binary classification, where you set the targets to $\{0, 1\}$, is not a good idea, since the L2 loss ($L_2(y, \hat{y}) = \frac{1}{2}(y - \hat{y})^2$) used by the linear regression model may penalize confident correct predictions. Can we fix this by using a modified hinge loss defined as:

$L(y, \hat{y}) = \begin{cases} \max(0, \hat{y}) & \text{if } y = 0 \\ 1 - \min(1, \hat{y}) & \text{if } y = 1 \end{cases}$
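
A small Python sketch of the modified hinge loss as written above (cases over the binary target y, applied to the real-valued prediction $\hat{y}$), showing that confident correct predictions incur zero loss, unlike the L2 loss:

# Sketch of the modified hinge loss from question 14.
def modified_hinge(y, y_hat):
    if y == 0:
        return max(0.0, y_hat)
    else:  # y == 1
        return 1.0 - min(1.0, y_hat)

print(modified_hinge(0, -3.0))   # 0.0: confident correct prediction, no penalty
print(modified_hinge(1, 5.0))    # 0.0: confident correct prediction, no penalty
print(modified_hinge(1, 0.2))    # 0.8: unconfident prediction is penalized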

15. If we change the logistic regression model to use $\tilde{\sigma}(z) = \frac{e^{-z}}{1 + e^{-z}}$ instead of the original sigmoid function, $\sigma(z) = \frac{1}{1 + e^{-z}}$, and train it using the same binary cross-entropy loss, what would happen to the parameters and predictions of the model? How will they change? Please explain and justify your answer.
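
A short numerical check that may help with this question: the modified activation satisfies $\tilde{\sigma}(z) = 1 - \sigma(z) = \sigma(-z)$, which the Python sketch below verifies.

import numpy as np

# Sketch: sigma_tilde(z) = e^{-z} / (1 + e^{-z}) equals 1 - sigma(z) and sigma(-z).
sigma = lambda z: 1.0 / (1.0 + np.exp(-z))
sigma_tilde = lambda z: np.exp(-z) / (1.0 + np.exp(-z))

z = np.linspace(-5, 5, 11)
print(np.allclose(sigma_tilde(z), 1.0 - sigma(z)))   # True
print(np.allclose(sigma_tilde(z), sigma(-z)))        # True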

16. What happens when you increase the momentum in the Adam algorithm (i.e., increasing $\beta_1$)? What about $\beta_2$? Recall that in Adam we have:

$M^{\{t\}} \leftarrow \beta_1 M^{\{t-1\}} + (1 - \beta_1)\,\nabla J(w^{\{t-1\}})$

$S^{\{t\}} \leftarrow \beta_2 S^{\{t-1\}} + (1 - \beta_2)\,\nabla J(w^{\{t-1\}})^2$

$w^{\{t\}} \leftarrow w^{\{t-1\}} - \frac{\alpha}{\sqrt{\hat{S}^{\{t\}}} + \epsilon}\,\hat{M}^{\{t\}}$
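
A minimal Python sketch of these update rules on a toy scalar objective, including the bias-corrected $\hat{M}$ and $\hat{S}$; the objective and the hyperparameter values are illustrative, not from the exam.

import numpy as np

# Sketch of Adam following the equations above (scalar parameter).
# Hyperparameter values are common illustrative defaults.
beta1, beta2, alpha, eps = 0.9, 0.999, 0.01, 1e-8

def grad_J(w):                 # toy objective J(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

w, M, S = 0.0, 0.0, 0.0
for t in range(1, 1001):
    g = grad_J(w)
    M = beta1 * M + (1 - beta1) * g            # first-moment (momentum) estimate
    S = beta2 * S + (1 - beta2) * g ** 2       # second-moment estimate
    M_hat = M / (1 - beta1 ** t)               # bias correction
    S_hat = S / (1 - beta2 ** t)
    w = w - alpha * M_hat / (np.sqrt(S_hat) + eps)

print(w)   # approaches the minimizer w = 3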

17. Consider a two-layer multilayer perceptron model given by $\hat{y} = \sigma(W\,\mathrm{ReLU}(Vx))$, where $\sigma(x) = (1 + e^{-x})^{-1}$ and $\mathrm{ReLU}(x) = \max(0, x)$, and the loss function is set to $L = |\hat{y} - y|$. Consider learning the parameters of this model with stochastic gradient descent (SGD) with a learning rate of 0.5, and assume

$W^{\{t\}} = \begin{bmatrix} -1 & 1 \end{bmatrix}$ and $V^{\{t\}} = \begin{bmatrix} 1 & 0 \\ 0 & -1 \end{bmatrix}$.

Now consider a training example with x = [2 −2] and y = 0, and calculate $V^{\{t+1\}}$ and $W^{\{t+1\}}$.
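
A Python sketch one could use to verify a hand derivation for this question: a forward pass followed by manual backpropagation through the absolute-error loss, the sigmoid, and the ReLU, and a single SGD step with learning rate 0.5. It assumes the model output is $\hat{y}$ as written above and uses the subgradient $+1$ of $|\hat{y} - y|$ when $\hat{y} > y$.

import numpy as np

# Sketch: check a hand derivation of one SGD step for question 17.
x = np.array([2.0, -2.0])
y = 0.0
W = np.array([[-1.0, 1.0]])          # shape (1, 2)
V = np.array([[1.0, 0.0],
              [0.0, -1.0]])          # shape (2, 2)
lr = 0.5

# Forward pass
a = V @ x                            # hidden pre-activation
h = np.maximum(0.0, a)               # ReLU
z = W @ h                            # output pre-activation
y_hat = 1.0 / (1.0 + np.exp(-z))     # sigmoid
loss = np.abs(y_hat - y)

# Backward pass for L = |y_hat - y| (here y_hat > y, so dL/dy_hat = +1)
dL_dyhat = np.sign(y_hat - y)
dL_dz = dL_dyhat * y_hat * (1.0 - y_hat)   # sigmoid derivative
dL_dW = np.outer(dL_dz, h)
dL_dh = W.T @ dL_dz
dL_da = dL_dh * (a > 0)                    # ReLU derivative
dL_dV = np.outer(dL_da, x)

print(W - lr * dL_dW)   # W^{t+1}
print(V - lr * dL_dV)   # V^{t+1}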

18. Consider the following input and convolution filter. When using zero padding of 1 and a stride of 2, what is the output y[1, 1], assuming indexing starts at zero?
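
Here is a generic Python sketch of how a zero-padded, stride-2 convolution output can be computed; the input x and filter k below are hypothetical placeholders rather than the question's, and the sketch uses cross-correlation (no filter flipping), as is common in ML usage.

import numpy as np

# Sketch of a 2D convolution with zero padding 1 and stride 2.
# The input and filter below are HYPOTHETICAL placeholders.
x = np.array([[1, 2, 0],
              [0, 1, 3],
              [2, 1, 1]], dtype=float)
k = np.array([[1, 0],
              [0, -1]], dtype=float)

pad, stride = 1, 2
xp = np.pad(x, pad)                      # zero padding of 1 on every side

out_h = (xp.shape[0] - k.shape[0]) // stride + 1
out_w = (xp.shape[1] - k.shape[1]) // stride + 1
y = np.zeros((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        patch = xp[i * stride: i * stride + k.shape[0],
                   j * stride: j * stride + k.shape[1]]
        y[i, j] = np.sum(patch * k)      # cross-correlation

print(y[1, 1])                           # indexing starts at zero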

SECTION D: ML Application Scenarios

This part checks your understanding of how to apply the techniques we discussed in the course. Please use text, formulas, images (created by yourself), or any combination of them to answer the questions.

19. Consider that you are applying a linear regression model to estimate corn yield. The data contains many irrelevant features or measurements, as it is not known what actually impacts crop productivity. Would you use Lasso or Ridge regression? Please explain and justify your answer.
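
A small scikit-learn sketch of the kind of experiment that supports the answer, on synthetic data where only two of twenty features are relevant; the data, penalty strengths, and seed are illustrative choices, not part of the exam.

import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Sketch with SYNTHETIC data: 2 relevant features + 18 irrelevant ones.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)   # only features 0 and 1 matter

lasso = Lasso(alpha=0.1).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

print((lasso.coef_ == 0).sum())   # L1: many irrelevant coefficients become exactly 0
print((ridge.coef_ == 0).sum())   # L2: coefficients are shrunk but rarely exactly 0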


20. Consider that you have a spam detection model (spam = 1, not spam = 0) to filter the messages sent to you. You can control a parameter in the model to increase or decrease the recall. If you set the parameter to have higher recall, do you expect to see more or less spam in your inbox? Do you expect to see more or fewer legitimate emails being forwarded to your spam folder? Please explain.
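
A toy Python sketch of the underlying trade-off: lowering the decision threshold on a spam score raises recall for spam but also flags more legitimate emails. The scores and labels below are made up purely for illustration.

import numpy as np

# Toy sketch: recall vs. false positives at two decision thresholds.
scores = np.array([0.95, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10])
labels = np.array([1,    1,    1,    0,    1,    0,    0   ])   # 1 = spam

for thresh in (0.7, 0.3):
    pred = (scores >= thresh).astype(int)
    recall = (pred[labels == 1] == 1).mean()         # fraction of spam caught
    false_pos = ((pred == 1) & (labels == 0)).sum()  # legitimate emails sent to spam
    print(thresh, recall, false_pos)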

21. Assume you are working with a company that wants to design a fake-news detection classifier for news articles. The company is providing you with 1000 example articles they have manually labeled as fake or not fake and is asking you to develop a model for them using deep learning (e.g., an MLP). Assume that they have a good feature extractor that converts a given text to a vector, which serves as the input for your model. Please discuss and justify your answers to the following questions:

(a) How will you split this dataset to design/train/test your model?

(b) What would you do if, after training the first model you try, your training loss is too high? Could asking for more data to be labeled help?

(c) What would you do if your training loss is low but your loss on the validation set varies too much between different runs?

(d) How confident would you be that your trained model can detect fake news when deployed, assuming you are able to achieve low validation/test loss? Please discuss.

22. Assume you are working in a hospital to build a system that helps doctors with cancer diagnosis. You have been given example data from 1000 patients with their different test measurements and diagnosis outcomes (no cancer, brain cancer, lung cancer, etc.). Please discuss and justify your answers to the following questions:

(a) What classifier will you use for this task? Explain and justify your choice.

(b) Considering that some tests are more expensive to run (e.g., performing MRIs is more costly for the hospital compared to blood tests), which means some features are more costly to obtain for a new patient, how would you modify your model to be able to work with fewer features and ask for more only when needed?

ANSWER SHEET FOLLOWS

ANSWER SHEET

By writing my name below, I confirm that all my exam work will be done entirely by myself, with no help from others. I will not provide any information about the exam's contents and/or my solutions to other people until after March 30th, 2022.

McGill ID No: Student name:

This exam is marked out of 180 points and contributes 30% of your final grade. The grade breakdown is provided below. For multi-part questions, points are equally distributed.

