THE HONG KONG UNIVERSITY OF SCIENCE & TECHNOLOGY

Department of Computer Science and Engineering

MSBD5008: Introduction to Social Computing

Fall 2020 Assignment 1

IMPORTANT NOTES

Your grade will be based on correctness and clarity.

Late submission: 25 marks will be deducted for every 24 hours after the deadline.

ZERO-Tolerance on Plagiarism: All involved parties will get zero marks.

NetworkX

In this question, you are required to use NetworkX to do basic data analysis on a Wikipedia vote network dataset. It contains 7,115 nodes and 103,689 (directed) edges. The dataset can be downloaded from http://snap.stanford.edu/data/wiki-Vote.html.

1. Use the function nx.read_edgelist() to load the dataset Wiki-Vote.txt.

2. Output the following information related to degree:

average degree, average in-degree, average out-degree;

degree distribution (plot both the degree and frequency in log scale);

density (E/N²), where E is the number of edges and N is the number of nodes;

3. Find the largest strongly connected component (giant component), and output the number of nodes in it;

4. Output the following information about this giant component related to distance and clustering:

distribution of path lengths;

average path length;

distribution of clustering coefficient;

average clustering coefficient.

5. Treat the network as undirected. Output the following information related to degree:

average degree;

degree distribution (plot both the degree and frequency in log scale);

density (E/N²).
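The steps above can be sketched as follows. This is a minimal sketch: the real dataset is loaded with the commented-out nx.read_edgelist() call when Wiki-Vote.txt is available, and a tiny stand-in graph is used here so the snippet runs on its own; the helper function name is illustrative.

```python
import networkx as nx

def degree_and_density(G):
    """Average degree and density E/N^2 of a graph."""
    N, E = G.number_of_nodes(), G.number_of_edges()
    return sum(d for _, d in G.degree()) / N, E / N ** 2

# 1. Real dataset (uncomment when Wiki-Vote.txt is in the working directory):
# G = nx.read_edgelist("Wiki-Vote.txt", create_using=nx.DiGraph(), nodetype=int)
# Tiny stand-in graph so the sketch is self-contained:
G = nx.DiGraph([(1, 2), (2, 3), (3, 1), (3, 4)])

# 2. Degree information on the directed graph.
avg_deg, density = degree_and_density(G)
avg_in = sum(d for _, d in G.in_degree()) / G.number_of_nodes()
avg_out = sum(d for _, d in G.out_degree()) / G.number_of_nodes()
# For the degree distribution, count how often each degree occurs and
# plot degree vs. frequency on log-log axes (e.g. with matplotlib).

# 3. Giant (largest strongly connected) component.
giant = G.subgraph(max(nx.strongly_connected_components(G), key=len))

# 4. Distance and clustering on the giant component.
avg_path_len = nx.average_shortest_path_length(giant)
avg_clust = nx.average_clustering(giant)

# 5. Undirected view of the same network.
U = G.to_undirected()
avg_deg_u, density_u = degree_and_density(U)
```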


Deep Graph Library (DGL)

In this question, you are required to use DGL to build a graph neural network for node classification. The dataset

hw_dataset.pkl can be downloaded from https://drive.google.com/file/d/1cZo93mIX37kI0wBKxulWE8CwSvjfUGZH/view?usp=sharing

1. Load the dataset with the following commands:

import pickle as pkl
dataset = pkl.load(open("hw_dataset.pkl", "rb"))

This file contains a dictionary object with the following information of a directed graph:

nodes: a list containing the ids of all the nodes in the graph;

labels: a list containing the label of each node;

num_classes: the total number of node labels;

features: a matrix of size number-of-nodes × feature-dimensionality;

source_nodes: a list containing the source node-id of each (directed) edge;

target_nodes: a list containing the target node-id of each (directed) edge;

train_mask: a list (of values "True" or "False") indicating whether each node is used in the training set or not;

val_mask: this has the same format as train_mask, and shows whether each node is used in the validation set or not.

2. You have to use the graph neural network model dgl.nn.pytorch.conv.GINConv in DGL. It implements the following neighborhood aggregation:

h_i^(l+1) = f_Θ( (1 + ε) · h_i^(l) + aggregate({ h_j^(l) : j ∈ N(i) }) ),

where N(i) is the set of neighbors of node i, f_Θ is a learnable apply function (e.g., an MLP), and ε is a scalar that can optionally be learned.

This model includes the graph neural network model discussed in class, but is more general. For details, read

https://docs.dgl.ai/api/python/nn.pytorch.html#dgl.nn.pytorch.conv.GINConv.

Your task is to find a model with high node classification accuracy. Your grade will be based on your model’s node

classification accuracy on a test set (which is hidden from you). We will use the following code to test your model.

Your code should include a test function (with your model and a mask as inputs) so that we do not need to retrain

your model.

load_checkpoint("best_model.pth", model)
# the test_mask here is hidden from you; you can replace it with val_mask.
accuracy = test(model, test_mask)
print("Testing Acc {:.4f}".format(accuracy))

Please also use the following function to save your final model:

def save_checkpoint(checkpoint_path, model):
    # state_dict: a Python dictionary that maps each layer of the model
    # to its parameter tensor
    state = {'state_dict': model.state_dict()}
    torch.save(state, checkpoint_path)
    print('model saved to %s' % checkpoint_path)

save_checkpoint("best_model.pth", model)


and the following function to reload your model for evaluation:

def load_checkpoint(checkpoint_path, model):
    state = torch.load(checkpoint_path)
    model.load_state_dict(state['state_dict'])
    print('model loaded from %s' % checkpoint_path)

load_checkpoint("best_model.pth", model)

Submission Guidelines

Please submit two Python notebooks (A1.ipynb and A2.ipynb) and a report (report.pdf) for your results and conclusions.

Zip all the files into A1_awangab_12345678 (replace awangab with your UST account and 12345678 with your student ID). Please submit the assignment by uploading the compressed file to Canvas.

Note that the assignment should be clearly legible; you may lose points if it is difficult to read. Plagiarism will lead to zero points on this assignment.

