Posted: 2022-07-25 08:51

Module Code: CMT224

Module Title: Social Computing

Lecturer: Dr Liam Turner

Assessment Title: Social Computing Portfolio

Assessment Number: 1

Date Set: 18th July 2022

Submission Date and Time: 8th August 2022 at 9:30am

Return Date: 5th September 2022


This assignment is worth 100% of the total marks available for this module. If coursework is

submitted late (and where there are no extenuating circumstances):


1. If the assessment is submitted no later than 24 hours after the deadline, the mark for the assessment will be capped at the minimum pass mark;

2. If the assessment is submitted more than 24 hours after the deadline, a mark of 0 will be given for the assessment.


Your submission must include the official Coursework Submission Cover sheet, which can be

found here:


https://docs.cs.cf.ac.uk/downloads/coursework/Coversheet.pdf


Submission Instructions



Description | Type | Name
Cover sheet | Compulsory; one PDF (.pdf) file | [student_number].pdf
Part 1 Notebook (using the template provided on Learning Central) | Compulsory; one IPython Notebook file (.ipynb) | [student_number]-part-1.ipynb
Part 2 Notebook (using the template provided on Learning Central) | Compulsory; one IPython Notebook file (.ipynb) | [student_number]-part-2.ipynb
Part 3 Notebook (using the template provided on Learning Central) | Compulsory; one IPython Notebook file (.ipynb) | [student_number]-part-3.ipynb


Any code submitted will be run on a system equivalent to your University provided laptop

and must be submitted as stipulated in the instructions above.


Any deviation from the submission instructions above (including the number and types of

files submitted) will result in a mark of zero for the assessment or question part.


Staff reserve the right to invite students to a meeting to discuss coursework submissions.



Assignment


You are tasked with analysing various datasets representing different types of social and

communication behaviour. These datasets are provided as files and can be found alongside

this coursework pro-forma on Learning Central. You should ONLY use the files provided as

they are intentionally modified subsets of public datasets (see footnote 1).


Alongside the dataset files, there are 3 (THREE) IPython notebooks, named part-1.ipynb,

part-2.ipynb, and part-3.ipynb, which you should solely use to complete the assignment and

submit these in line with the Submission Instructions section above. The cells in each

completed notebook will be run in the order that they appear. You do not need to resubmit

the dataset files.


You are required to address 16 total questions across the 3 parts. Each part is made up of 1

or 2 tasks containing multiple questions. These questions are also listed below for

convenience.


For EACH question in EACH notebook:


1. Complete the cell below each question marked with “#CODE:” with the Python code

needed to generate any new information you need for your answer. This information

should be output when the cell is run and any floating-point values should be

presented to 2 decimal places unless they are less than 0.01.


2. Complete the cell below this marked with “ANSWER:” with your answer to the

question, referring to the information outputted above (as well as any previous cell if

needed). In doing so, briefly explain your approach and methods/measures used to

answer the question and justify any choices made. Each answer cell should (ideally)

be no more than 125 words.
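
The floating-point rule in step 1 can be sketched as a small helper. This is only an illustration: the brief does not say how values below 0.01 should be shown, so the 2-significant-figure fallback and the name `fmt` are assumptions.

```python
def fmt(value: float) -> str:
    """Format a float for notebook output.

    Values with a magnitude of at least 0.01 (and exact zero) are shown to
    2 decimal places. Smaller non-zero values are shown to 2 significant
    figures; this fallback is an assumption, not something the brief states.
    """
    if value != 0 and abs(value) < 0.01:
        return f"{value:.2g}"
    return f"{value:.2f}"


print(fmt(0.456))    # 2 decimal places
print(fmt(12.0))     # trailing zeros are kept
print(fmt(0.00123))  # below 0.01, so significant figures are used instead
```

Keeping the formatting in one function means every code cell presents numbers the same way.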


Each question is worth 6 marks (making a total of 96/100 possible marks) and a further 4

marks (4/100) will be awarded for the overall usability and readability of the notebooks

submitted. Marks will be awarded using the criteria described in the Criteria for assessment

section below.


You may use any Python packages locally installed or installable via pip on your University provided laptop. “%pip install [package_name]” commands should be placed in the cell below “Install Python packages (pip only)” provided at the top of each notebook. “import [package_name]” lines for all packages required for the notebook to be run successfully should be placed in the cell under “Import Python packages” provided at the top of each notebook. You may add additional cells throughout the notebooks, but this should be minimised.


1. Jure Leskovec and Andrej Krevl. SNAP Datasets: Stanford Large Network Dataset Collection. http://snap.stanford.edu/data

Questions (Duplicated from the notebook files)


Part 1: Social media behaviour data


Task 1 of 1


Examine the Graph Modelling Language (gml) files

"socialmedia_cmt224r_reply_network.gml" (reply network) and

"socialmedia_cmt224r_social_network.gml" (social network) which represent Twitter data

between a sample of users over several days at the time of the Higgs boson particle discovery.

Both networks are directed and share the same ids for nodes (anonymised Twitter users).

However, the shared user ids are contained within the "label" attribute in the .gml files, not

the node "id" attribute of each individual .gml file.
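
Why the "label" attribute matters can be shown with a minimal sketch on made-up data. The dictionaries below are hypothetical stand-ins for two parsed .gml files; none of the ids come from the real datasets. Libraries such as NetworkX expose a `label` parameter on `read_gml` for the same purpose, which may be more convenient in practice.

```python
# Toy stand-ins for two parsed .gml files (all values are made up).
# Each file numbers its nodes independently in "id"; "label" holds the shared user id.
reply_id_to_label = {0: "u17", 1: "u42", 2: "u99"}
social_id_to_label = {0: "u42", 1: "u99", 2: "u17"}

reply_edges_by_id = [(0, 1), (2, 1)]
social_edges_by_id = [(2, 0), (1, 0)]


def relabel(edges, id_to_label):
    """Rewrite (source_id, target_id) pairs as (source_label, target_label) pairs."""
    return {(id_to_label[s], id_to_label[t]) for s, t in edges}


reply_edges = relabel(reply_edges_by_id, reply_id_to_label)
social_edges = relabel(social_edges_by_id, social_id_to_label)

# Only after relabelling do the two edge sets refer to the same users.
print(sorted(reply_edges & social_edges))
print(set(reply_edges_by_id) & set(social_edges_by_id))  # comparing raw ids finds nothing
```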


In the reply network, an edge from a node, u, to some other node, v, indicates that u replied to a Tweet made by v during the time period. Replies are also Tweets. Edges are weighted, with the weight representing the number of times this happened over the time period.

In the social network, an edge from node u to v indicates that u follows v on the social media platform.
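
The direction and weight conventions above can be sketched on toy data. The user names, weights, and helper names are hypothetical, and measuring mutual connections as the share of directed edges with a reverse edge is only one possible definition.

```python
# Hypothetical toy data; user names and weights are made up.
# reply_weights[(a, b)] = number of times a replied to a Tweet made by b.
reply_weights = {("ann", "bob"): 3, ("bob", "ann"): 1, ("cat", "bob"): 2}
# (a, b) in follows means a follows b.
follows = {("ann", "bob"), ("bob", "ann"), ("cat", "ann")}


def in_neighbours(edges, user):
    """Users with an edge pointing at `user` (followers, or repliers)."""
    return {s for s, t in edges if t == user}


def out_neighbours(edges, user):
    """Users that `user` points at (followees, or users replied to)."""
    return {t for s, t in edges if s == user}


def mutual_fraction(edges):
    """Fraction of directed edges whose reverse edge is also present."""
    edges = set(edges)
    if not edges:
        return 0.0
    return sum((t, s) in edges for s, t in edges) / len(edges)


print(sorted(in_neighbours(follows, "ann")))         # followers of ann
print(sorted(out_neighbours(reply_weights, "cat")))  # users cat replied to
# Total replies received by bob: sum the weights of edges pointing at bob.
print(sum(w for (s, t), w in reply_weights.items() if t == "bob"))
print(f"{mutual_fraction(follows):.2f}")
```

Note that the number of distinct repliers (an in-degree) and the number of replies received (a weighted in-degree) are different quantities.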


Using these networks, answer the following questions:


Q1. How does the topological structure of the reply network differ from the social network

in terms of the fraction of mutual connections (i.e., users that follow each other) and

the number of connected groups of users?


Q2. Do the 20 users that follow the most other users also reply to the largest number of users?


Q3. To what extent does the number of followers a user has in the social network correlate

with the number of users that have replied to them?


Q4. Do users typically ONLY reply to Tweets, are ONLY replied to, or BOTH?


Q5. Of the users that ONLY reply to Tweets, how many ONLY do so to those users they are

following?


Q6. How many users have ONLY mutual following connections AND ONLY mutual reply

connections with these SAME users?




Part 2: Email behaviour data


Task 1 of 2


Examine the file "emails_cmt224r.edgelist" which represents email behaviour at an organisation. Each line contains two numbers, u and v, separated by a blank space. Consider each number as an identifier for an individual in an organisation, with the space on each line representing that the individual, u, sent at least one email to another individual, v, at some point. Model the data using an appropriate network representation and answer the following questions:


Q1. How many individuals have a higher or lower ratio of mutual connections than the

ratio of mutual connections found in the overall network?


Q2. Are occurrences of induced, connected subgraphs of 3 individuals (triads) with only

mutual connections more abundant in the network than those with a mixture of

asymmetric and mutual edges?


Q3. Using the largest strongly connected component (where at least one path exists between each individual and all others), could the connectivity be suggested to be reflective of a small world phenomenon in comparison to a comparable random network?
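
The edge list format described in Task 1 can be read with the standard library alone. The sketch below uses an in-memory sample in place of the real file, and treating each line as a directed edge from sender to recipient is one possible reading of the brief rather than the required model.

```python
import io

# Hypothetical stand-in for the edge list file; the ids are made up.
sample = io.StringIO("1 2\n2 1\n2 3\n3 4\n4 2\n")

successors = {}  # sender -> set of recipients
for line in sample:
    parts = line.split()
    if len(parts) != 2:
        continue  # skip blank or malformed lines
    sender, recipient = (int(p) for p in parts)
    successors.setdefault(sender, set()).add(recipient)
    successors.setdefault(recipient, set())  # make sure recipients appear as nodes

edges = {(s, r) for s, targets in successors.items() for r in targets}
# A pair is mutual when both directions are present; frozenset ignores direction.
mutual_pairs = {frozenset((s, r)) for s, r in edges if (r, s) in edges}

print(len(successors), "individuals,", len(edges), "directed edges")
print(len(mutual_pairs), "mutual pair(s)")
```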


Task 2 of 2


Examine the JSON file "emails_cmt224_departments.json" (departments file). Keys in the

departments file represent individuals using the same ids as in the

"emails_cmt224r.edgelist" file in Part 2, Task 1 and the values represent a department id

that the individual can be attributed to. Using the contents of the departments file in

combination with the network in Part 2, Task 1, answer the following questions:


Q1. Using the connections that individuals have in the network, are they more likely to

mix with others in their department or those with a similar number of connections?


Q2. Are all departments with 12 or more members more tightly connected amongst themselves in comparison to all individuals across the overall network, irrespective of their department? In this context, 'more tightly connected' is defined as having less sparsity in the connections among members AND more clustered connections. In addition to answering the overall question as yes or no, provide a list of departments this is true for (if any) and not true for (if any).
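
Combining the departments file with the network can be sketched as follows, assuming integer node ids and a directed network. The JSON text, edges, and function names are hypothetical. Only the sparsity half of 'more tightly connected' is illustrated, using directed density; a clustering measure is deliberately left out.

```python
import json

# Hypothetical stand-ins for the departments file and the email network.
departments_text = '{"1": 10, "2": 10, "3": 10, "4": 20}'
edges = {(1, 2), (2, 1), (2, 3), (3, 4)}

# JSON object keys are always strings, so convert them to match integer node ids.
department_of = {int(person): dept for person, dept in json.loads(departments_text).items()}


def directed_density(nodes, edges):
    """Edges present among `nodes` divided by the n * (n - 1) possible directed edges."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    internal = sum(1 for s, t in edges if s in nodes and t in nodes and s != t)
    return internal / (n * (n - 1))


# Group individuals by department id.
members = {}
for person, dept in department_of.items():
    members.setdefault(dept, set()).add(person)

same_department = sum(department_of[s] == department_of[t] for s, t in edges)

print(f"{same_department / len(edges):.2f} of edges stay within a department")
print(f"overall density: {directed_density(department_of, edges):.2f}")
print(f"department 10 density: {directed_density(members[10], edges):.2f}")
```

The key conversion is worth checking first: if the network uses string ids instead, the `int(...)` call should be dropped so that lookups still match.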

Part 3: Peer-to-peer message behaviour data


Task 1 of 2


Examine the file "p2p_msg_cmt224r.csv" which represents messaging behaviour between

users on a messaging platform. Each row has four columns, representing a single event where

a person (person_a) messaged another person (person_b) on some date (date) at some time

of day (time). From this, answer the following questions:


Q1. Select a suitable network structure and build a network to represent social

connections based on the messaging behaviour that took place in the first 7 days. In

doing so, assume that one or more messages from one person to another represents

a MUTUAL underlying social connection (i.e., regardless of whether person_a

messaged person_b, vice versa, or both).


Q2. Build another suitable network to represent social connections based on ALL message behaviour in the dataset. In doing so, assume that one or more messages from one person to another represents a MUTUAL underlying social connection (i.e., regardless of whether person_a messaged person_b, vice versa, or both). Can the social phenomenon, ‘Triadic Closure’, be supported for the COMMON nodes that exist in both the network created from data from the first 7 days (i.e., from Task 1, Q1) and the network built from all message behaviour?


Q3. Using the largest connected component of the cumulative network constructed in Task 1, Q2, what are the average and standard deviation of the MAXIMUM degree of separation between an individual and all others?
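
The event format and the mutual-connection assumption used in Task 1 can be sketched with the standard library. Several details here are assumptions: the presence of a header row, the ISO date and 24-hour time formats, and reading "the first 7 days" as the 7 calendar days starting on the earliest date in the data.

```python
import csv
import io
from datetime import datetime, timedelta

# Hypothetical stand-in for the CSV file. The column names follow the brief;
# the header row and the date and time formats are assumptions.
sample = io.StringIO(
    "person_a,person_b,date,time\n"
    "1,2,2020-01-01,09:15:00\n"
    "2,1,2020-01-03,18:40:00\n"
    "2,3,2020-01-07,23:59:00\n"
    "3,4,2020-01-08,00:01:00\n"
)

events = []
for row in csv.DictReader(sample):
    when = datetime.strptime(f"{row['date']} {row['time']}", "%Y-%m-%d %H:%M:%S")
    events.append((when, row["person_a"], row["person_b"]))

# One reading of "the first 7 days": the 7 calendar days starting on the earliest date.
start = min(when for when, _, _ in events).replace(hour=0, minute=0, second=0, microsecond=0)
cutoff = start + timedelta(days=7)


def undirected_edges(events):
    """Collapse directed message events into unordered pairs (mutual connections)."""
    return {frozenset((a, b)) for _, a, b in events if a != b}


first_week = undirected_edges(e for e in events if e[0] < cutoff)
all_time = undirected_edges(events)

print(len(first_week), "connections in the first 7 days;", len(all_time), "overall")
```

Using `frozenset` pairs means a message from person_a to person_b and one in the reverse direction collapse into the same connection.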


Task 2 of 2


Using the largest connected component of the social network constructed from all data in Task 1, Q2, assume the role of an outsider with complete visibility of the network that now wishes to spread a hypothetical message such that everyone in the component would know the information it contained as quickly as possible. Assume that messages will now spread in sequential timesteps using the following mechanism. If an individual is told the message at timestep t, the individual will forward the message to all of their direct connections at timestep t+1. Individuals can therefore be told the message more than once. From this, answer the following questions using network analysis measures as part of the approach:


Q1. Assume that you have to choose 1 individual to tell the message to at timestep 0.

What set of individuals could you choose this individual from and how many

timesteps would be needed for everyone in the component to receive the message?


Q2. Assume that you have to choose 5 individuals to tell the message to at timestep 0.

Provide an example set of 5 individuals that would result in the message being

received by everyone in the component in fewer timesteps than in Q1.
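
The forwarding mechanism described in Task 2 behaves like a breadth-first spread from the chosen individuals, which can be sketched on a toy component. The graph and the function name below are hypothetical, and this is an illustration of the mechanism under that reading, not a prescribed method.

```python
from collections import deque

# Hypothetical undirected toy component: a path a-b-c-d-e with a branch c-f.
neighbours = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b", "d", "f"},
    "d": {"c", "e"},
    "e": {"d"},
    "f": {"c"},
}


def timesteps_to_reach_all(neighbours, seeds):
    """Timestep at which the last individual first receives the message.

    Seeds are told at timestep 0; anyone told at timestep t forwards to all
    direct connections at t + 1, which is a multi-source breadth-first search.
    Assumes every node is reachable from at least one seed.
    """
    told_at = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        person = queue.popleft()
        for other in neighbours[person]:
            if other not in told_at:  # repeat deliveries do not change first receipt
                told_at[other] = told_at[person] + 1
                queue.append(other)
    return max(told_at.values())


# With a single seed, the result is that node's eccentricity.
single = {node: timesteps_to_reach_all(neighbours, [node]) for node in neighbours}
best = min(single.values())
print(sorted(node for node, steps in single.items() if steps == best), best)
print(timesteps_to_reach_all(neighbours, ["b", "d", "f"]))
```

With one seed, the candidates are the nodes of minimum eccentricity; with several seeds, spreading them across the component can reduce the number of timesteps.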


Learning Outcomes Assessed


1. Analyse fundamental traits of complex networks by synthesising theoretical concepts

and methodologies from graph theory.

2. Evaluate and implement computational approaches to model and visualise complex

social phenomena.

3. Design and create software to investigate or support human interaction behaviour.


Criteria for assessment


Credit will be awarded against the following criteria. There are 100 marks available for this

assignment. Each of the 16 questions is worth 6 marks, split between up to 3 marks for the

approach and implementation and up to 3 marks for the explanations and justifications of

the approach and implementation. This totals 96/100 possible marks. Marks will be

awarded using the following criteria:

