Date: 2020-11-13 11:36

2020/11/7 Quiz: Practice Exam Quiz

Practice Exam Quiz

Started: Nov 7 at 17:25

Quiz Instructions

Academic Integrity Declaration

By commencing and/or submitting this assessment I agree that I have read and understood the

University’s policy on academic integrity. (https://academicintegrity.unimelb.edu.au/)

I also agree that:

1. Unless paragraph 2 applies, the work I submit will be original and solely my own work (cheating);

2. I will not seek or receive any assistance from any other person (collusion) except where the work

is for a designated collaborative task, in which case the individual contributions will be indicated;

and,

3. I will not use any sources without proper acknowledgment or referencing (plagiarism).

4. Where the work I submit is a computer program or code, I will ensure that:

   1. any code I have copied is clearly noted by identifying the source of that code at the start of the
      program or in a header file or, that comments inline identify the start and end of the copied
      code; and

   2. any modifications to code sourced from elsewhere will be commented upon to show the nature
      of the modification.

This exam begins at 12.00 PM Australian Eastern Standard Time (AEST) on Tuesday 23/06/2020 in

Canvas (lms.unimelb.edu.au). The exam must be completed by 2.15 PM AEST on Tuesday

23/06/2020. This exam has 15 minutes of reading time, and 120 minutes of writing time.

Answer all questions

Question 1 1 pts

Select all that are correct statements in the context of data linkage.

Assuming each record is allocated to exactly one block and that all blocks are equally
sized, a blocking method that produces more blocks will have a higher reduction ratio.

For any blocking function, blocking reduces the original complexity of O(n^2) for pairwise
comparison to a linear complexity.

The Pair Completeness score is likely to decrease if the sizes of all blocks are large.
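
The reduction ratio mentioned in the first statement can be explored numerically. The sketch below is only an illustration under stated assumptions: records come from a single dataset that is compared with itself, every record sits in exactly one block, and all blocks have the same size. The function name `reduction_ratio` is hypothetical and not part of the exam.

```python
from math import comb


def reduction_ratio(n_records: int, n_blocks: int) -> float:
    """Fraction of all-pairs comparisons that blocking avoids.

    Assumes each record is in exactly one block and all blocks are equally
    sized, so n_blocks must divide n_records.
    """
    if n_blocks <= 0 or n_records % n_blocks != 0:
        raise ValueError("n_blocks must be positive and divide n_records")
    block_size = n_records // n_blocks
    # Only records inside the same block are compared with each other.
    comparisons_with_blocking = n_blocks * comb(block_size, 2)
    comparisons_without_blocking = comb(n_records, 2)
    return 1 - comparisons_with_blocking / comparisons_without_blocking


if __name__ == "__main__":
    for blocks in (1, 10, 100):
        print(blocks, round(reduction_ratio(1000, blocks), 4))
```

Running it for 1000 records shows how the ratio changes as the number of equally sized blocks grows, which is one way to test the first statement.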

https://canvas.lms.unimelb.edu.au/courses/12027/quizzes/77429/take 2/12

Question 2 4 pts

Consider the following XML file:

<?xml version="1.0"?>

<subject code="COMP20008">

<URL> https://handbook.unimelb.edu.au/subjects/comp20008

</url>

<name> Elements of Data Processing </name>

</subject>

<semester>1</semester>

<year/>

(a) Modify the XML so that it is well formed.

(b) Explain why the data format is said to be semi-structured.

Question 3 4 pts

Consider the following temperature data from various weather stations in Victoria:

16, 12, 15, 18, 13, 43, 10

The values are comma separated.

(a) Will the value 43 be classified as an outlier on a Tukey boxplot? Demonstrate

how you arrive at the conclusion.

(b) Suggest an imputation method for the data and justify your choice.
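
One way to check part (a) is to compute Tukey's fences (Q1 − 1.5·IQR and Q3 + 1.5·IQR) directly. The sketch below is not a model answer: it assumes the quartile convention of Python's `statistics.quantiles` with its default "exclusive" method, and other conventions give slightly different fences. The helper name `tukey_fences` is hypothetical.

```python
from statistics import quantiles

# Temperature readings copied from the question.
temperatures = [16, 12, 15, 18, 13, 43, 10]


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    # quantiles() sorts the data itself; n=4 yields Q1, median, Q3.
    q1, _median, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower, upper = tukey_fences(temperatures)
outliers = [v for v in temperatures if v < lower or v > upper]
print(f"fences: [{lower}, {upper}]  outliers: {outliers}")
```

Under this convention Q1 = 12 and Q3 = 18, so the IQR is 6, the fences are 3 and 27, and 43 falls outside the upper fence.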

Question 4 2 pts

Consider the following two plots:

Plot (1) is a VAT plot

Plot (2) is a scatter plot of the first 2 Principal Components of the data.

[Figure: plot (1)]

[Figure: plot (2)]

The data scientist states that the two plots are created from the same dataset. Do

you believe the statement? Justify your answer.

Question 5 3 pts

Consider a dataset with 10000 rows and 500 features. Give three reasons why we

might want to apply PCA while analysing the dataset.

Question 6 8 pts

a) Explain, with examples, what supervised and unsupervised learning are and what

the key differences are. (4 points)

b) Assume you need to build a model from medical data that predicts if a patient

suffers from a particular illness or not. How would you decide whether to use

supervised or unsupervised learning? (4 points)

Question 7 4 pts

Assume you use k-nn clustering on a data set. Describe a method for choosing the

best value for k.

Question 8 4 pts

You work in a bank and are in charge of classifying customer data into two groups:

loans that are likely to be repaid and loans that are likely not to be repaid. You

have come up with two feature sets that give you the same accuracy using a

decision tree algorithm. However, one set gives relatively more false positives,

whilst the other gives relatively more false negatives. Explain how you would

choose the 'best' set.

Question 9 6 pts

In the 1980s, many regression-type models forecast that the world would run out of

oil by 2010. Clearly we still have oil. Explain what went wrong and how you would

build a better forecasting model.

Question 10 4 pts

Revise the following regular expression meta operators:

( ) [ ] { } . * + ? ^ $ | \

For each of the following, give a couple of examples of strings which the regular

expression would match. Describe (colloquially, in a manner that a non-technical

person would understand) the set of strings that the pattern is designed to match.

(a) /[a-zA-Z]+/

(b) /p[aeiou]{0,2}t/
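
Candidate strings can be tried against the two patterns with Python's `re` module. This is a sketch, not a model answer: the surrounding slashes in the question are delimiters and are dropped in Python, and because neither pattern is anchored with ^ or $, it assumes the pattern may match anywhere inside a longer string. The helper `first_match` is a hypothetical name.

```python
import re

letters = re.compile(r"[a-zA-Z]+")          # pattern (a)
p_vowels_t = re.compile(r"p[aeiou]{0,2}t")  # pattern (b)


def first_match(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    found = pattern.search(text)
    return found.group(0) if found else None


if __name__ == "__main__":
    for text in ("hello world", "1234", "abc123"):
        print(repr(text), "->", first_match(letters, text))
    for text in ("pt", "pat", "pout", "spit", "paeit"):
        print(repr(text), "->", first_match(p_vowels_t, text))
```

Strings for which the helper returns `None` contain no match at all, which helps when describing the set of matching strings in plain language.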

Question 11 3 pts

Given the following set of training instances (A, B, C, D):

Instance   Feature 1   Feature 2   Feature 3   Class
A          sunny       hot         high        N
B          sunny       mild        medium      N
C          overcast    mild        high        Y
D          overcast    mild        medium      Y

Show how to use “information gain” to perform “filter-based feature selection”. Select

the best 2 features.
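
The calculation can be cross-checked with a short script. The sketch below assumes the usual definition of information gain (entropy of the class minus the weighted entropy of the class within each feature value, using log base 2). The names `entropy` and `information_gain` are illustrative helpers, not part of the exam.

```python
from collections import Counter
from math import log2

# Training instances A-D from the question: (Feature 1, Feature 2, Feature 3, Class).
ROWS = [
    ("sunny", "hot", "high", "N"),
    ("sunny", "mild", "medium", "N"),
    ("overcast", "mild", "high", "Y"),
    ("overcast", "mild", "medium", "Y"),
]
FEATURES = ["Feature 1", "Feature 2", "Feature 3"]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, col):
    """Class entropy minus the weighted class entropy after splitting on column col."""
    groups = {}
    for row in rows:
        groups.setdefault(row[col], []).append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - remainder


gains = {name: information_gain(ROWS, i) for i, name in enumerate(FEATURES)}
best_two = sorted(gains, key=gains.get, reverse=True)[:2]
print(gains, best_two)
```

A filter-based selection then simply keeps the features with the highest gains, independently of any downstream classifier.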

Question 12 3 pts

Given a user query to be applied on a dataset, differential privacy involves adding

noise to the true result for the query, and then returning the noisy result to the user.

Explain two factors which influence how much noise should be added to the query

result. For each factor, you should explain how it is related to the level of noise that

gets added.
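
As a concrete illustration, the sketch below assumes the Laplace mechanism, which the question does not name. In that mechanism the noise scale is the query's global sensitivity divided by the privacy budget epsilon. The function names `laplace_scale` and `noisy_answer` are hypothetical.

```python
import random


def laplace_scale(sensitivity: float, epsilon: float) -> float:
    """Scale b of the Laplace noise: b = sensitivity / epsilon."""
    if sensitivity < 0 or epsilon <= 0:
        raise ValueError("sensitivity must be >= 0 and epsilon must be > 0")
    return sensitivity / epsilon


def noisy_answer(true_result: float, sensitivity: float, epsilon: float) -> float:
    """Return true_result plus Laplace(0, b) noise."""
    scale = laplace_scale(sensitivity, epsilon)
    if scale == 0:
        return float(true_result)
    # The difference of two independent exponentials with mean b is Laplace(0, b).
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return true_result + noise


if __name__ == "__main__":
    # A counting query changes by at most 1 when one person is added or removed.
    for eps in (0.1, 1.0):
        print(eps, laplace_scale(1.0, eps), noisy_answer(100, 1.0, eps))
```

Under this assumption, a larger sensitivity or a smaller epsilon both increase the scale and therefore the typical size of the noise.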

Question 13 1 pts

In the context of recommender systems, which of the following is correct about

collaborative filtering methods?

We can use Pearson correlation as the similarity measure.

User based methods are personalised but item based methods are not.

Item based and user based methods are mathematically similar, and both have similar
performance.

Item based methods have the cold-start problem, but user based methods do not have
the problem.

Question 14 1 pts

A dataset has 2 numeric attributes x and y. From which plot(s) can one observe

the distribution of the y attribute?

A scatter plot of the x and y attributes

A parallel coordinate plot

A boxplot

A bar plot

Question 15 1 pts

In the context of privacy, which of the statements about k-anonymity and l-diversity

is correct:

A dataset satisfies l-diversity if there are at least l combinations of values of the
quasi-identifiers for every sensitive attribute value.

The value of l will be no greater than k.

A dataset satisfies k-anonymity if every record in the data is indistinguishable from at
least k − 1 other records with respect to each individual attribute.

Question 16 1 pts

Which of the following are true? Select all correct answers:

.xls is structured data.

.csv is structured data.

An image file is semi-structured data.

pdf is unstructured data.
