联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp

您当前位置:首页 >> C/C++编程C/C++编程

日期:2022-04-20 08:39

COMP 2150 – Winter 2022 – Assignment 4

Due April 25 at 11:59 PM

In this assignment, you will implement javascript classes to support Huffman data compression.

Data compression (as used in zip and rar, for instance) is the process of taking large files and

making them smaller by analyzing their contents. In this assignment the algorithm we will use is

called Huffman encoding, and it works by building a binary tree that encodes each character as

a sequence of 0s and 1s. The encoding is variable-length: some characters will be encoded as

a short sequence of 0s and 1s, while other (more rare) characters will be encoded with longer

sequences.

Part 1: Hash Table Dictionary

Create a javascript class called Dictionary that is implemented as a hash table. In the

Dictionary, each entry is a Key-Value pair, and there can't be two Key-Value pairs in a dictionary

with the same Key (that is, keys are unique in a dictionary). The Key will be an object that has a

hashVal method (see below for the Hashable class and hashVal information). A Value can be

anything (except undefined).

When initialized, the Dictionary should be given a size as a parameter1

. The Dictionary will

create an array of that size as a field. When adding or searching for a Key-Value pair in the

Dictionary, the position of the pair in the hash table will be the key’s hashVal modulo the length

of the array.

The hash table should use separate chaining. If there is a collision between two elements based

on their hashVal, the hash table should place both elements in a linked list of elements at the

array position. The order of elements in the linked list at a position is not important.

The Dictionary should implement the following methods:

1. put(k,v): takes a Hashable key k and a value v and adds it to the dictionary, if it does not

exist. If a key-value pair with k as the key already exists, the current value is replaced

with v. The method does not have a return value.

2. get(k): takes a Hashable key k and returns the value v associated with it, if it appears in

the dictionary. The method should return undefined if the key does not appear in the

dictionary.

3. contains(k): takes a Hashable key k and determines if it is a key in the dictionary.

Returns a boolean value.

4. isEmpty(): returns a boolean depending on whether the dictionary is empty or not.

For each of the first three methods, the method should ensure that the key parameter has type

Hashable (or has a hashVal() method, and an equals() method, as appropriate), and it should

throw an error if the condition does not hold.

1 When using Dictionaries in this assignment, you should set the size to a large value of your choice.

Hashable

You should also implement a hierarchy of Hashable classes that can be stored as Keys in the

hash table. The Hashable class should be abstract (using the techniques shown in class) with

an abstract hashVal() and an abstract equals(x) method. You should implement at least the

following two Hashable subclasses:

• IntHash, whose hashVal function is the value of the integer. Two IntHash objects are

equal if they contain the same integer value.

• StringHash, whose hashVal function is the following expression:

s[0]*pn-1 + s[1]*pn-2 + ...+ s[n-3]*p2 + s[n-2]*p + s[n-1]

where s[i] is the ASCII value of the i-th character of the string, n is the length of the

string, and p is a prime.2 You can assume that all characters in the strings are ASCII

characters. Two StringHash objects are equal if they contain the same strings.

Note: to get the ASCII value of a character from a string, you can use .charCodeAt():

x.charCodeAt(i) gives the ASCII value of the character at position i of the string x.

The subclasses can have additional methods if you need them, including getters. You may

implement further Hashable subclasses, but you are not required to do so.

2 In your code you should use a prime number of your choice that is less than 1000.

Part 2: Huffman Trees

To allow us to complete Part 3, you will create a class to represent a Huffman Tree, which is a

type of binary tree used in Huffman Encoding (in Part 3). A Huffman Tree has the following

properties:

1. Each tree is a binary tree. The children are not ordered in the same way as a binary

search tree – we only talk about “left” and “right” subtrees. The tree is also not structured

in a balanced way: left and right subtrees of a node, for instance, may have different

heights.

2. Every internal node has two children.

3. Leaf nodes in a Huffman tree are labelled with a single character. Internal nodes in the

tree are not labelled with any data.

4. Each Huffman Tree has a weight, which is a floating point value between zero and one.

Create a class that represents Huffman Trees. You will need the following operations:

1. Creating a new tree: you should be able to create a tree that has one node, based on

two parameters: a single character and a weight. This will create a tree with one node (a

leaf node) and the weight given.

2. Combine two trees to create a new tree: you should be able to create a tree from two

other trees (the left and right subtrees). The new tree should have a new root node,

whose children are the two subtrees given, and the weight of the new tree is the sum of

the weights of the two subtrees given.

3. You should implement a compareTo method for Huffman Trees. This method should

take a parameter (another Huffman Tree) and return +1,0 or -1 to represent whether the

parameter tree comes before (+1) or after (-1) the tree whose method is being called.

For any two trees, the tree with the lowest weight should come first. If the two trees

have the same weight, the tree that contains (in any leaf) the smallest character (in

order) should come first.3 For example, if two trees had the same weight and if one tree

contained the characters “C”,”O”,”M”,”P” and the other contained “E”,”N”,”G”,”L”, then the

first tree would come first (it contains “C”). Because of how Huffman Trees are built,

you do not have to worry about two trees containing the same smallest character (but if

they did, you could return 0).

4. You will additionally need a search or traversal method in your Huffman Trees. This

should be able to determine, for each character in the tree, the path (left/right moves)

from the root to the leaf labelled with that character. This is described more in step 4 of

the algorithm in Part 3.

3 To compare characters, use the less than operator < in javascript.

Part 3: Huffman Encoding

After creating your Dictionary and Huffman Tree classes, use them to implement Huffman

encoding. Huffman encoding is used to compress text files by replacing characters in the text

with sequences of bits (0-1).4

The algorithm works as follows:

1. The input to the algorithm is a text file.

2. Before encoding anything, the entire file should be read, and character frequencies are

calculated for each character that appears in the file. You then calculate, for each

character X in the file “what is the percentage of the file that is the character X?” (as a

number between 0 and 1). Store this information in a Dictionary from Part 1. This

calculation should be done for all characters, including spaces, newlines, etc.

3. Next, the Huffman Encoding is constructed. The Huffman Encoding is built by

constructing a set of Huffman trees, and then joining these trees until one final Huffman

tree remains.

a. Initially, we construct a tree for each character in the file. The weight used for

these initial trees are the percentages (0-1) calculated in step 2.

b. At each step, we choose the two Huffman Trees in the set that are smallest,

according to the order given by the compareTo algorithm in Part 2. We join

these two trees, using the joining algorithm in Part 2. (The left tree should be the

smaller tree using compareTo). Remove the two trees that were joined from the

set of Huffman Trees and add the new tree (thus, the set will have one fewer tree

at each step).

c. Repeat this until all of the Huffman Trees are combined into one tree, which we

call T.

To support this step, you may need to construct additional javascript classes.

4. Then, the Huffman code is calculated. For each different character X that appears in the

file, calculate the path from the root of T to the leaf that is labelled with the character X.

Build a string that represents this path to X: Each time the path moves from a node to

the left child, add a zero to the string and each time it moves to the right child, add a 1 to

the string.

5. Read the characters from the input file again. For each character in the input file, write

the binary encoding sequence from step 4 to the output file. Add spaces between each

symbol but no newline characters, except for one at the end.

See an example run of Huffman encoding at the end of the assignment.

Your Huffman Encoder class should have a constructor and one method called encode(). The

constructor should take the name of the file that should be encoded. The encode() method

should open a new output file (by adding – not replacing - the “.huff” extension to the original file

name), open the input file, and write the codes for the Huffman algorithm to the output file. (You

should write the values as strings, with a single space separating the codes. Do not output any

newline symbols, except for one at the end. Yes, the output file will be larger than the input file.)

You do not need to implement a decoder.

4 You will notice after finishing this assignment that this technique does not compress files – this is because we are

encoding 0s and 1s as characters, not as bits. This saves us the issue of having to deal with individual bits and

writing them to files.

Hand-in

Submit all source code for all classes. Submit all files on umlearn.

You will be provided a test text file for your encoder before the deadline. Submit your output on

thatfile with your submission.

You MUST submit all of your files in a zip file. Additionally, you MUST follow these rules:

• All of the files required for your project should be in a single directory.

• Include a README.TXT file that describes exactly how to run your code from the

command line. The markers will be using these instructions exactly and if your code

does not run directly, you will lose marks.

The easier it is for your assignment to mark, the more marks you are likely to get. Do yourself a

favour.

Example Huffman encoding

Consider the string “TOBEORNOTTOBETHATISTHEBANANA”. The Huffman algorithm first

calculates the following weights:

Character Weight

T 0.21428571428571427

O 0.14285714285714285

B 0.10714285714285714

E 0.10714285714285714

R 0.03571428571428571

N 0.10714285714285714

H 0.07142857142857142

A 0.14285714285714285

I 0.03571428571428571

S 0.03571428571428571

The tree joinings that occur are:

1. Join I and R into a tree (weight 0.07142857142857142)

2. Join S and H into a tree (weight 0.10714285714285714)

3. Join I,R and B into a tree (weight 0.17857142857142855)

4. Join E and S,H into a tree (weight 0.21428571428571427)

5. Join N and A into a tree (weight 0.25)

6. Join O and I,R,B into a tree (weight 0.3214285714285714)

7. Join E,S,H and T into a tree (weight 0.42857142857142855)

8. Join N,A and O,I,R,B (weight 0.5714285714285714)

9. Join N,A,O,I,R,B and E,S,H,T into a final tree.

The final tree is illustrated below.

Thus, the Huffman encoding for the file is:

T 01

E 000

S 0010

H 0011

N 100

A 101

O 110

I 11100

R 11101

B 1111

E N


版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp