代做Science 320 S2、代写Python设计、代做Canvas留学生、Python编程设计调试-代写Python编程

联系方式

QQ：99515681
邮箱：99515681@qq.com
工作时间：8:00-23:00
微信：codinghelp

您当前位置：首页 >> Python编程Python编程

代做Science 320 S2、代写Python设计、代做Canvas留学生、Python编程设计调试

日期：2019-09-27 10:05

Computer Science 320 S2 (2019)

Assignment 4

Due date Sep 28, 2019 23:59pm

Answer all of the following questions. There are 10 points, which contribute 5% of your total course

marks. Submit a properly typesetted pdf file (LATEX preferred) of your answers to Canvas before the

deadline. There is no automarker for the Python program. To aid the markers, scanned handwritten

solutions or late submissions are NOT accepted.

Email spam filtering

You are implementing a new email spam filtering for the University of Auckland. If the email comes from

one of these trusted addresses, it is not a spam. Otherwise, it will be a spam. Since the server memory

is very limited, you can not keep the list of trusted addresses on its memory (if an email address requires

25 bytes on average, it would take 25 GB to store A in the memory).

After searching on Google, you have found a very memory-efficient solution. That is, instead of

keeping A on memory, you will construct a bit array B of size n representing for the set A. You choose

n = 8000, 000, 000 bits so that you need only 1 GB of memory. The construction is as follows.

• B is initialized by 0s.

• You choose a hash function h : a 7→ [0, n). In other words, the input of h will be an string a (e.g.

email address) and the hash value will be an integer ranging between 0 and n.

• For each trusted email address a ∈ A, you hash a into one of n buckets, and set that bit to 1. In

order words, you simply set B[h(a)] = 1.

The filtering mechanism works as follows.

• When you receive a new email from the address a0, you compute the hash value h(a0).• If B[h(a0)] = 1, you consider that this email is not a spam and let it go through.• If B[h(a0)] = 0, you consider that this email is a spam and discard it.

Theoretical questions for measuring the performance of the filtering (5 pts):

1. Illustrate that if a new email from the address a ∈ A, it always gets through (1 pts).

2. Given any position 0 ≤ i < n, what is the probability that B[i] = 1 (2 pts).

3. Given a spam email from the address a

0 ∈/ A, what is the probability that it gets through (2 pts).

Practical implementation for measuring the performance of the filtering (5 pts):

Write a Python script to implement this email spam filtering technique given the scaled setting. Your

data structure B has size n = 8000, 000 bits. For simplicity, assume that your trusted email addresses

and spam email are presented as integers. In particular, trusted email addresses A = {1, 2, ..., 1000000}

and spam email addresses are any integer x > 1000, 000. You are free to choose your hash function to

hash an integer into the range [0, n).

1. Verify that any email from the address 1 ≤ a ≤ 1000, 000 it always gets through (1 pts).

2. Compute the probability that a spam email going through your filter given this setting. (1 pts).

3. Generate 1000 random integers x > 1000, 000 as spam email addresses and compute the number of

spam emails going through your filter. Verify this value with your theoretical value from the step

2 (3 pts).

【返回顶部】【打印本稿】【关闭本页】

【上一篇】：COMP3211代做、代写Python编程语言、代做programming留学生、Python代写

【下一篇】：COMP3211代做、代写Python编程语言、代做programming留学生、Python代写

联系方式

最新辅导

热门辅导

您当前位置：首页 >> Python编程Python编程

代做Science 320 S2、代写Python设计、代做Canvas留学生、Python编程设计调试

日期：2019-09-27 10:05

相关文章