Understanding Data and their Environment
Assessment 1: provenance report
Note: these assessment tasks are group tasks and will be marked based on a single submission.
Upload via Blackboard:
1. PROV-N text file (as a .txt or .provn file)
2. Report with diagrams (as PDF)
a. Remember to indicate your team name (e.g. “Blue”) in both the report and the PROV-N file!
Your task is to formalise the provenance of your week 1-3 group work as PROV-N, from the point the group designed the survey questions through to, and including, the generation of the metadata schema. For added complexity you should also include the indicator task (even if your indicator did not use your own survey). Do not include the generation of the initial provenance graph in the group exercise, or this week 4 work.
Note: For this task, you should focus on your group’s actual process as students, rather than the process that you were simulating.
Part 1: PROV-N text file
You will write a single PROV-N text file that includes the statements you deem sufficient to describe the provenance of the data. As this is done after the fact, it does not have to be fully accurate, but it should aim to be representative.
Consider modelling the provenance in this order:
a. Responsibility view – which agents were attributed for which engagements? Were any external actors involved? (but: use Alice/Bob/Charlie-style placeholders instead of actual personal names within your group)
b. Data flow view – how did the information move from one entity to another? Which pre-existing entities were sourced? Did any entities evolve over time?
c. Process view – which activities were performed that directly or indirectly led to the above entities? Include digital activities as well as essential non-digital activities.
d. Add attributes for types, roles, plans and any other properties.
Name your own custom types and attributes, or explore schema.org (for instance http://schema.org/Dataset) and Dublin Core Terms (http://purl.org/dc/terms/). Add the right prefix statements to the PROV-N! See References below.
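The three views and the prefix statements can be sketched together as below. This is a minimal illustration, not a model answer: the ex: namespace, every ex:... identifier and all timestamps are hypothetical placeholders. The Python wrapper only assembles the text; the PROV-N itself is the list of statements.

```python
# Sketch only: the ex: namespace and every ex:... identifier are hypothetical.
# The prov: prefix is reserved in PROV-N, so it is not declared here.
PREFIXES = {
    "ex": "http://example.org/blue/",
    "schema": "http://schema.org/",
    "dcterms": "http://purl.org/dc/terms/",
}

STATEMENTS = [
    # a. Responsibility view
    "agent(ex:alice, [prov:type='prov:Person'])",
    "wasAssociatedWith(ex:designSurvey, ex:alice, -, [prov:role='ex:questionAuthor'])",
    "wasAttributedTo(ex:surveyQuestions, ex:alice)",
    # b. Data flow view
    'entity(ex:surveyQuestions, [dcterms:title="Survey questions"])',
    "entity(ex:surveyResponses, [prov:type='schema:Dataset'])",
    "wasDerivedFrom(ex:surveyResponses, ex:surveyQuestions)",
    # c. Process view (approximate, made-up timestamps)
    "activity(ex:designSurvey, 2024-10-01T10:00:00, 2024-10-01T11:00:00)",
    "wasGeneratedBy(ex:surveyQuestions, ex:designSurvey, 2024-10-01T11:00:00)",
]


def to_provn(prefixes, statements):
    """Wrap prefix declarations and statements in document ... endDocument."""
    lines = ["document"]
    lines += [f"  prefix {name} <{iri}>" for name, iri in prefixes.items()]
    lines += [f"  {statement}" for statement in statements]
    lines.append("endDocument")
    return "\n".join(lines) + "\n"


provn_text = to_provn(PREFIXES, STATEMENTS)
print(provn_text)
```

The printed text can be saved as a .provn file. Grouping the statements by view in the source is only for readability, since statement order carries no meaning in PROV-N.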
Think about the level of granularity at which you want to detail the provenance (e.g. scope of entities and activities, which relations to include). Try to make sure the provenance trace is internally consistent. You can justify your modelling decisions in the report, but in the PROV-N try to showcase your knowledge and explore detail levels beyond the week 4 lab exercises.
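One aspect of internal consistency is that every identifier used in a relation is also declared as an entity, activity or agent. The sketch below is a rough, regex-based heuristic for spotting identifiers that are referenced but never declared. It is not a PROV-N parser: it assumes one statement per line and ignores bundles, and the sample identifiers are hypothetical.

```python
import re

DECLARATIONS = ("entity", "activity", "agent")
# Rough pattern for a qualified name such as ex:alice (timestamps do not match).
QNAME = re.compile(r"^[A-Za-z_][\w-]*:[A-Za-z_][\w.-]*$")


def split_statement(line):
    """Return (name, args) for a line like 'used(ex:a, ex:e)', else None."""
    match = re.match(r"\s*(\w+)\((.*)\)\s*$", line)
    if not match:
        return None
    name, body = match.groups()
    body = re.sub(r"\[.*\]", "", body)   # drop the attribute list
    body = body.split(";", 1)[-1]        # drop an optional relation id 'id; '
    args = [arg.strip() for arg in body.split(",") if arg.strip()]
    return name, args


def undeclared_identifiers(provn_text):
    """List identifiers referenced in relations but never declared."""
    declared, referenced = set(), set()
    for line in provn_text.splitlines():
        parsed = split_statement(line)
        if parsed is None:
            continue  # document, endDocument, prefix lines, blank lines
        name, args = parsed
        ids = [arg for arg in args if QNAME.match(arg)]
        if name in DECLARATIONS:
            declared.update(ids[:1])
        else:
            referenced.update(ids)
    return sorted(referenced - declared)


# Hypothetical sample: ex:lectureNotes is used but never declared.
SAMPLE = """document
  prefix ex <http://example.org/blue/>
  agent(ex:alice)
  entity(ex:surveyQuestions)
  activity(ex:designSurvey, 2024-10-01T10:00:00, 2024-10-01T11:00:00)
  wasGeneratedBy(ex:surveyQuestions, ex:designSurvey, 2024-10-01T11:00:00)
  wasAssociatedWith(ex:designSurvey, ex:alice, -, [prov:role='ex:author'])
  used(ex:designSurvey, ex:lectureNotes)
endDocument
"""

print(undeclared_identifiers(SAMPLE))
```

An undeclared identifier is not a syntax error in PROV-N, so a validator may not report it. A check like this is mainly useful for catching typos when different team members write different views.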
Avoid using personal information (it is not important to this assessment who did what in the group work, but it’s important to show that everyone contributed). You can use approximate date/time stamps, but make sure they are chronologically consistent.
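Chronological consistency can be checked mechanically. The sketch below assumes two simple rules: an activity should not end before it starts, and an entity should be generated between the start and end of the activity that generated it. The identifiers and timestamps are hypothetical, and the second activity is deliberately wrong to show what gets reported.

```python
from datetime import datetime

# Hypothetical, approximate timestamps for illustration only.
ACTIVITIES = {
    "ex:designSurvey": ("2024-10-01T10:00:00", "2024-10-01T11:00:00"),
    # Deliberate mistake: this activity ends before it starts.
    "ex:collectResponses": ("2024-10-08T10:00:00", "2024-10-08T09:00:00"),
}

# (entity, generating activity, generation time)
GENERATIONS = [
    ("ex:surveyQuestions", "ex:designSurvey", "2024-10-01T11:00:00"),
    ("ex:surveyResponses", "ex:collectResponses", "2024-10-08T10:30:00"),
]


def chronology_problems(activities, generations):
    """Return human-readable descriptions of timestamp inconsistencies."""
    problems = []
    parsed = {
        activity: (datetime.fromisoformat(start), datetime.fromisoformat(end))
        for activity, (start, end) in activities.items()
    }
    for activity, (start, end) in parsed.items():
        if start > end:
            problems.append(f"{activity} ends before it starts")
    for entity, activity, time in generations:
        start, end = parsed[activity]
        when = datetime.fromisoformat(time)
        if not (start <= when <= end):
            problems.append(f"{entity} generated outside {activity}")
    return problems


for problem in chronology_problems(ACTIVITIES, GENERATIONS):
    print(problem)
```

The same idea extends to usage times or to derivation chains (a derived entity should not be generated before its source), if you record those timestamps.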
Try to make sure the PROV-N is syntactically valid; however, it is better to submit a more complete (but syntactically invalid) PROV-N file than an incomplete (but valid) file.
Then try to programmatically generate a diagram from the PROV-N, saved as an SVG vector image or a PNG bitmap image. You can use either the PROV Toolbox command line tool, the ProvStore web service, or both. Note that if you want to show the interactive diagrams, you may have to take a screenshot.
Tip: The order of statements in PROV-N does not matter, so if you get an error message without a line number, work on a copy and delete 50% of the lines (except the document... endDocument statements). If the error disappears, the bug is in the half you deleted; otherwise it is in the half you kept. Repeat on that half until you hit the bug. (This is binary search applied to debugging! https://en.wikipedia.org/wiki/Binary_search_algorithm).
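The halving procedure in the tip can be sketched as follows. This assumes exactly one faulty statement in a non-empty list. The looks_valid function is a hypothetical stand-in that only checks balanced parentheses; in practice you would run the real validator (PROV Toolbox or ProvStore) on each half by hand.

```python
def looks_valid(statements):
    """Hypothetical stand-in for a real validator: checks parentheses only."""
    return all(s.count("(") == s.count(")") for s in statements)


def find_bad_statement(statements, is_valid):
    """Binary search for the single invalid statement in a non-empty list."""
    low, high = 0, len(statements)  # the bad statement is in statements[low:high]
    while high - low > 1:
        mid = (low + high) // 2
        if is_valid(statements[low:mid]):
            low = mid   # first half is fine, so the bug is in the second half
        else:
            high = mid  # the bug is in the first half
    return statements[low]


# Hypothetical statements; the third one is missing its closing parenthesis.
STATEMENTS = [
    "entity(ex:surveyQuestions)",
    "agent(ex:alice)",
    "activity(ex:designSurvey",
    "wasAttributedTo(ex:surveyQuestions, ex:alice)",
    "used(ex:designSurvey, ex:surveyQuestions)",
]

print(find_bad_statement(STATEMENTS, looks_valid))
```

Each round halves the search space, so even a file with a few hundred statements needs fewer than ten validation runs.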
Note: The provconvert command line tool in PROV Toolbox gives more detailed error messages on validation but can be harder to install and use. The ProvStore service is hosted by KCL and may be shut down by November 2024.
Assessment: One member of the team to upload a PROV-N text file in Blackboard (see Submission Instructions below)
NB: Include any diagrams in the report (see below)
Part 2: Textual report
Write a brief report (word limit: 500 excluding headings/captions), considering:
1. How did you decide which entities, activities and agents you needed?
How did you decide on their identifiers and types? You can justify here any activities/entities/agents you did not include.
2. How would your provenance look different if you had modelled it instead for:
a. lower granularity (simplified for wider audience)
b. higher granularity (detailed for researchers interviewing the same subjects)
Indicate what you would add/remove/change, and what design decisions this would imply.
3. Did you find any part of writing provenance easy? What was most challenging? Reflect on any team disagreements on how to model the provenance.
Tip: To include a high resolution diagram in the report you may need to convert it to SVG rather than PNG.
Assessment: Upload the report to Blackboard, preferably as PDF (see below).
How to collaborate on the assessed work?
This assessed work is marked on a group-wide basis. You can use your Group room in Microsoft Teams to organise the work between you. Try to split the tasks so all can contribute (e.g. split PROV-N tasks by view, split report writing by sections).
You can share a Word document using OneDrive with the team to allow collaborative editing, rather than passing the baton. For Blackboard, remember to save the PROV-N file as a plain-text file, not a Word document.
Set internal deadlines for team-wide reviews and discussions. Don’t let one person dominate, even if they claim to understand the topic better. Make sure you have enough time for final editing so the PROV-N is consistent (e.g. identifiers are the same across views), the report is coherent and below the word limit, and the updated diagrams are included.