
Date: 2023-04-20 09:22

COMP3425 and COMP8410 Data Mining S1 2023

Assignment 2: Description of Data

Data and Metadata


The data supplied for the assignment arises from The Australian Data Archive’s ANU Poll

Dataverse [1]. As a student of the course, you are assumed to accept the Terms and Conditions

of Use reproduced below. Please read them carefully. The custodian of the data has requested

you delete your data at the end of the course.


In particular, the data captures the results of a survey poll conducted in 2019 on the topic of

attitudes and behaviours towards Universities, amongst other things. You can find a complete

description of the purpose of the poll and coding of the data (metadata) and also a descriptive

summary of the poll results here:

https://dataverse.ada.edu.au/dataset.xhtml?persistentId=doi:10.26193/GOVGBB

The data is provided to you for the assignment in two forms. The first is the original dataset

as downloaded from the ADA called 2.ANUPoll2019RoleOfGovernment_CSV_01445.csv, in

comma-separated-values format. This data is described by the metadata in

1. ADA.CODEBOOK.01445.xlsx and the corresponding question text in

1. ADA.QUESTIONNAIRE.01445.pdf.


The second is a form derived from the original, pre-processed for the COMP3425 data mining

assignment, in comma-separated-values format called 3425_data.csv. Below you will find a

description of the pre-processing undertaken and this, in addition to the original metadata,

will be needed to assist your understanding of the data.


If you are a COMP3425 (undergraduate) student, you must work with the pre-processed

dataset 3425_data.csv.


If you are a COMP8410 (postgraduate) student, you may use either the original or the

pre-processed data, or both. The original will give you more opportunity to show off your technical

skills and creativity, while the pre-processed one is more constrained but may save time,

requiring you to spend less effort understanding the data, and helping to avoid some data

errors. The same rubric will be used for marking in both cases, but the original dataset provides

an extended learning experience and better opportunity for higher marks. Even if you use the

original data, you may find it useful to observe the pre-processing that has been undertaken to

produce 3425_data.csv to seed ideas or to solve problems you encounter.


Pre-processing applied with Excel to derive 3425_data.csv


Only a selection of the original attributes have been retained.

The Q15_safe_gambler column has been added, based on respondents’ answers to

questions Q15a-i, which have answers that range from almost always to never.

Q15_safe_gambler is a normalized number in the range [0,1] that shows the rarity of

the various problem gambling behaviours raised in Q15a-i. Refused and Don’t know

options are replaced by the midpoint value for each question, and the field is null

when the Q15 questions were not asked.

Q15_safe_gambler = IF(NOT(Q14=" "),
  ((IF(OR(Q15a=-98, Q15a=-99), 2.5, Q15a)
  +(IF(OR(Q15b=-98, Q15b=-99), 2.5, Q15b))
  +(IF(OR(Q15c=-98, Q15c=-99), 2.5, Q15c))
  +(IF(OR(Q15d=-98, Q15d=-99), 2.5, Q15d))
  +(IF(OR(Q15e=-98, Q15e=-99), 2.5, Q15e))
  +(IF(OR(Q15f=-98, Q15f=-99), 2.5, Q15f))
  +(IF(OR(Q15g=-98, Q15g=-99), 2.5, Q15g))
  +(IF(OR(Q15h=-98, Q15h=-99), 2.5, Q15h))
  +(IF(OR(Q15i=-98, Q15i=-99), 2.5, Q15i)))-9)/27,
  "")
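For readers working outside Excel, the same calculation can be sketched in Python. This is an illustrative sketch only, not the code used to build 3425_data.csv. The function name `safe_gambler` and its parameters are hypothetical. The 1 to 4 answer coding is inferred from the midpoint of 2.5 and the (sum - 9)/27 normalisation, not stated explicitly in the text.

```python
from typing import Optional, Sequence

MISSING_CODES = {-98, -99}  # the two codes the Excel formula replaces
MIDPOINT = 2.5              # midpoint value substituted for those codes


def safe_gambler(q14, q15_answers: Sequence[float]) -> Optional[float]:
    """Mirror the Excel formula: normalised score in [0, 1], or None.

    q14 is the raw Q14 cell; a blank cell means Q15 was not asked.
    q15_answers holds the nine answers Q15a..Q15i (assumed coded 1 to 4).
    """
    if q14 is None or str(q14).strip() == "":
        return None  # Excel version returns an empty string here
    if len(q15_answers) != 9:
        raise ValueError("expected nine answers, Q15a to Q15i")
    total = sum(MIDPOINT if a in MISSING_CODES else a for a in q15_answers)
    # The sum ranges from 9 to 36, so subtracting 9 and dividing by 27
    # maps it onto [0, 1].
    return (total - 9) / 27
```

Under these assumptions, nine answers of 4 give a score of 1, nine answers of 1 give 0, and a respondent whose answers are all -98 or -99 lands on 0.5.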


The binary undecided voter column was added based on the given answer to Q4, and

is TRUE when the answer to Q4 is one of -98, -99, 95, 97 and FALSE otherwise. That

is, IF(OR(OR(OR(Q4=-99, Q4=-98),Q4=95), Q4=97),TRUE,FALSE).
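The nested OR above is a set-membership test, which can be sketched in Python as follows. The function name `is_undecided_voter` is hypothetical and used for illustration only.

```python
# The four Q4 codes that the Excel formula treats as "undecided".
UNDECIDED_CODES = {-99, -98, 95, 97}


def is_undecided_voter(q4) -> bool:
    """Return True when the Q4 answer is one of the undecided codes."""
    return q4 in UNDECIDED_CODES


# Example with a few Q4 values (illustrative inputs, not real survey rows).
flags = [is_undecided_voter(v) for v in (-99, 1, 95, 97, 3)]
```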

For two categorical columns, nominal Q2 and nominal StateMap, double quotation

marks were added to all non-empty cells. For the rest of the categorical columns,

you can use the same approach to help Rattle recognise categorical data in a column

if necessary. For example, for nominal StateMap, the formula CONCATENATE("""",

StateMap, """") is used. For nominal Q2, the formula CONCATENATE("""", TEXT(Q2,

"0"), """") is used.
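The quoting step can also be sketched in Python for any further column that needs it. This is a minimal sketch under stated assumptions: the function name `quote_nominal` is hypothetical, and the sample value "NSW" is a placeholder, not taken from the data. Numeric codes are assumed to be whole numbers, so the rounding differences between Excel's TEXT(x, "0") and Python's formatting do not arise.

```python
def quote_nominal(value) -> str:
    """Wrap a non-empty cell in double quotes; leave empty cells empty."""
    if value is None or str(value).strip() == "":
        return ""
    if isinstance(value, (int, float)):
        # Rough counterpart of TEXT(value, "0"): no decimal places.
        value = format(value, ".0f")
    return f'"{value}"'


# Placeholder inputs: a text cell, a numeric code, and an empty cell.
quoted = [quote_nominal(v) for v in ("NSW", 3.0, "")]
```

Writing such values with a CSV library may add its own escaping around the quotes, so it is worth checking the output file in a text editor before loading it into Rattle.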


References


[1] Biddle, Nicholas; and Reddy, Karuna, 2019, “ANU Poll 2019: Role of the University”,

doi:10.26193/GOVGBB


Terms and Conditions of Use


This data has been distributed exclusively for students of COMP3425 and COMP8410 S1

2023. Data must be destroyed at the end of the course but may be re-obtained by

request to the Australian Data Archive.


Furthermore, from https://dataverse.ada.edu.au/dataset.xhtml?persistentId=doi:10.26193/GOVGBB,


I acknowledge that:


1. Use of the material is restricted to use for analytical purposes and that this means that I can only

use the material to produce information of an analytical nature.


Examples of such uses are: (a) the manipulation of data to produce means, correlations or other

descriptive summary measures; (b) the estimation of population characteristics from sample data;

(c) the use of data as input to mathematical models and for other types of analyses (e.g. factor

analysis); and (d) to provide graphical and pictorial representation of characteristics of the

population or sub-sets of the population.


2. The material is not to be used for any non-analytical purposes, or for commercial or financial gain,

without the express written permission of the Australian Data Archive.

Examples of non-analytical purposes are: (a) transmitting or allowing access to the data in part or

whole to any other person / Department / Organisation not a party to this undertaking; and (b)

attempting to match unit record data in whole or in part with any other information for the

purposes of attempting to identify individuals.


3. Outputs (such as statistics, tables and graphs) obtained from analysis of these data may be further

disseminated provided that I:

(a) acknowledge both the original depositors and the Australian Data Archive; (b) acknowledge

another archive where the data file is made available through the Australian Data Archive by

another archive; and (c) declare that those who carried out the original analysis and collection of the

data bear no responsibility for the further analysis or interpretation of it.


4. Use of the material is solely at my risk and I indemnify the Australian Data Archive and its host

institution, The Australian National University.


5. The Australian Data Archive and its host institution, The Australian National University, shall not

be held liable for any breach of this undertaking.


6. The Australian Data Archive and its host institution, The Australian National University, shall not

be held responsible for the accuracy and completeness of the material supplied.

