COURSEWORK RUBRIC
Module Code and Name: BS3519 ‘Exploratory Data Analysis’
Weighting: 100%
Deadline: 7 January 2026
Word Count: There is a page limit of 10 pages comprising screenshots from software and brief commentary upon them.
GENERAL GUIDANCE
Referencing:
This is required when you are discussing the source of your data or if you use some methodology other than those supplied in the module materials. You should quote any software code that you use and indicate whether it has been derived via any Artificial Intelligence tool such as ChatGPT (see below). There is no need to reference any software code provided in the module material.
Generic Marking Criteria
See Appendix 1.
Peer assessment (Only applicable to group work)
Buddy Check ☐
Other ☐
N/A
Submission
Students must submit their assignment in two parts, via two separate links: 1) the write-up as a PDF and 2) the data as a CSV file (reformatted ready for analysis). Both must be submitted on Learning Central by 11.00 a.m. UK time on Wednesday 7th January 2026. A submission point will be created there for that purpose 4 weeks before the deadline. Work submitted late without an extension will be capped at the pass mark for the first 24 hours and set to zero after that time.
Should you experience any difficulties in submitting your work, please contact the UG Hub (CARBS [email protected]) immediately.
Extenuating Circumstances
If you experience extenuating circumstances which mean you are unable to submit your assessment on time, or where you have submitted your work but feel that your circumstances are related to a long-term health condition, protected characteristic and/or caring responsibility, you can declare your extenuating circumstances and request one of the following remedies:
⦁ An extension to your submission deadline
⦁ A deferral to your next submission opportunity
⦁ An Exam Board remedy (only applies to students who have participated in an assessment but have circumstances related to a long-term health condition, protected characteristic and/or caring responsibilities; evidence required)
You will NOT be able to declare your extenuating circumstances more than 2 weeks before a deadline.
Mark Return and Feedback
Students should expect their mark and the mark distribution of the whole cohort (as part of generic feedback) to be returned on 3 February 2026. However, until these marks are ratified by the Examination Board, any marks given will be provisional. Individual feedback will be provided to each candidate.
COURSE TASKS
Objective of the Assignment
The principal objective is to report on your Exploratory Data Analysis of data which you will have sourced as specified in the first few weeks of the module.
Note that, for assignments submitted in the Resit Period, individual data sets will be supplied since there will be insufficient time to carry out a comparable data choice and acquisition process.
Guidelines on choosing data
Three data sources will be specified: a company information database, a global economic data repository and another repository detailing incidents of crime in the UK. Between them, these data sources cover a variety of topics, and it will be a good idea to choose data relating to an area for which you have some affinity. Your set of data does not need to have a direct or obvious connection to business since almost any topic can have business relevance. For example, crime has a major socioeconomic impact in the UK.
Typically, participants will have thousands of cases in their data set. To get the most out of the data analysis, you need to have at least hundreds of cases (rows) and at least four variables (columns) - which is a very small data set in today’s terms.
You will be guided through techniques for preparing data prior to analyzing them.
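As an illustration of the size guideline above, the following is a minimal R sketch for checking whether a data set has enough rows and columns. The function name `check_data_size` and the thresholds (100 rows, 4 columns) are assumptions made for this example, not part of the module materials, and the built-in `airquality` data set stands in for your own data.

```r
# Hypothetical helper: checks a data frame against the suggested minimum size.
# The defaults (100 rows, 4 columns) are one reading of "at least hundreds of
# cases" and "at least four variables"; adjust them as you see fit.
check_data_size <- function(df, min_rows = 100, min_cols = 4) {
  stopifnot(is.data.frame(df))
  list(
    rows = nrow(df),
    cols = ncol(df),
    enough_rows = nrow(df) >= min_rows,
    enough_cols = ncol(df) >= min_cols
  )
}

# Example with a built-in data set. For your own data you would first load it,
# e.g. with read.csv() on your own file.
size_report <- check_data_size(airquality)
print(size_report)
```

A check like this is only a starting point: a data set can meet the minimum size and still need considerable preparation before analysis.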
Using ChatGPT
This Artificial Intelligence tool, and others like it, offer a very useful way of exploring an area of interest. Experience with such tools shows that they can occasionally provide completely inaccurate information, and that some of their outputs may be so generalized, or ‘cooked up’, as to be meaningless. Hence it is important to check the validity of the results by using rephrased queries and by running the same queries with similar AI tools. We will be using the R Statistical Programming Environment, and you will be provided with what amount to templates of code which you can modify as needed. Hence, you do not need to have any coding experience to take this module, though you will no doubt acquire some coding ability, which is a very ‘saleable’ transferable skill. If you use ChatGPT to generate any R code, the same caveat applies to the code as is mentioned above.
Useful areas to cover in the analysis of your data set
Description of the data – variables, background information about the data (‘metadata’, as they are called).
Any ethical issues which you think may be important. These may be related to the topic under study, the metadata, the data themselves, any individuals involved whether subjects or respondents, the data collectors or you as the researcher.
Explanation of what is happening in the analysis methods (include any R code you use)
Your findings in the form of hypotheses and features of the data that the EDA methods suggest to you.
N.B. Remember that EDA is about finding hypotheses, trends, outliers, patterns, etc. in data and NOT about confirming hypotheses with your data. Of course, you may well find features of the data which one might have expected, and this is fine. Report any trends, relationships, clusters and groupings, outliers and exceptions, problems with missing values, unusual features, etc. within the data.
Use screenshots from software, text output from R, etc. to illustrate your findings.
Don’t forget to report negative results and conclusions (e.g. things that you tried but didn’t bring out any features in the data), but, please, still describe briefly the methods you used in such cases.
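Several of the areas listed above (missing values, outliers, relationships between variables) can be explored with a few lines of base R. The sketch below is only an illustration under stated assumptions: it uses the built-in `airquality` data set as a stand-in for your own data, and the variables chosen (`Ozone`, `Temp`) are arbitrary examples rather than anything prescribed by the module.

```r
# Built-in example data; a stand-in for your own data set.
df <- airquality

# Count the missing values in each variable.
missing_counts <- colSums(is.na(df))

# Univariate summaries (quartiles, mean, number of NAs) for every variable.
print(summary(df))

# Candidate outliers in one variable, using the boxplot rule
# (points beyond 1.5 * IQR from the hinges). NAs are dropped by boxplot.stats.
ozone_outliers <- boxplot.stats(df$Ozone)$out

# Bivariate association between two variables, using only the rows
# in which both values are present.
cor_ozone_temp <- cor(df$Ozone, df$Temp, use = "complete.obs")

print(missing_counts)
print(ozone_outliers)
print(cor_ozone_temp)
```

Output like this suggests features to investigate further; in keeping with EDA, it generates hypotheses rather than confirming them.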
What constitutes a good assignment?
People will have varying assignment topics linked to the variables being acquired and will have acquired different data. However, a good assignment needs to have:
Background information on the topic or subject area from which the data came and a description of the data, e.g. their origin, particular circumstances, any peculiarities they might have, any difficulties in acquiring and formatting them, etc. [20%]
An outline of the methodologies (univariate, bivariate, multivariate, etc.) you used in analysing the data. [20%]
The hypotheses and data features (trends, outliers, associations between variables, clusters, etc.) which have emerged from your analysis. (As mentioned above, don’t forget to include negative results such as methods you tried which unexpectedly didn’t show anything of interest.) [20%]
Coherence (not just a list of things you did) and a clear presentation [20%]
Evidence of an innovative approach - so that the reader can see that you were willing to try new things [10%]
Ethical issues which you feel could apply to the assignment [10%]
The percentages in square brackets indicate the weight given to each factor in the assessment. For indicative levels of performance on each of these criteria, see Appendix 1.