联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp

您当前位置:首页 >> Java编程Java编程

日期:2022-03-30 11:28

CS 2210B - Data Structures and Algorithms

Assignment 4 (Programming)

Total Marks: 100

Due Date: March 31, 2022

1 Overview

In this assignment, you will do an investigation in the area of Natural Language Processing (NLP). At

the the same time, you will practice the AVL tree data structure. You will need to use AVL trees for

two steps of the assignment. In NLP, one often needs to count how many times each particular word

occurs in a text (or collection of texts). Let r be the number of times a particular word occurs in the

text. For example, consider the following text:

“I would like to eat an apple, a banana, and a peach. I already ate a banana.”

The word “a” occurs 3 times, the word “an” occurs 1 time, etc. For language modeling, one often

needs to know how many distinct words occur exactly r times in the text. Let N (r) be the number

of distinct words that occur exactly r times. For example, consider the text above again. The word

“I” occurs 2 times, the word “banana” occurs 2 times, the word “a” occurs 3 times. The remaining

10 words occur exactly once. Therefore in this case,

N (3) = 1, since one word (“a”) occurs 3 times in the text;

N (2) = 2, since two words (“I”,“banana”) occur 2 times in the text;

N (1) = 10, since ten words (“would”, “like”, “to”, “eat”, “an”, “apple”, “and”, “peach”,

“already”, “ate”) occur once in the text.

In NLP, N (r)’s need to be computed to build language models. It is also quite interesting to look at

N (r)’s. What we find is that a few words, such as “a”, “the”, “and”, are used very often in English

(that is for large r, N (r) is small). However, about half of all words in text occur only once or twice

(that is for small r, N (r) is very large). To explain this phenomenon, a “Principle of Least Effort”

has been developed. This principle roughly states that people act so as to minimize their probable

average rate of work. Both speaker and listener are trying to minimize their effort. The speaker’s effort

is conserved by having a small vocabulary of common words, and the listener’s effort is lessened by

having a large vocabulary of rarer words (the more there are words, the less ambiguous are messages,

and it’s easier to understand them). The maximally economical compromise between these competing

needs is to have a few very common words and a large amount of rare words. In this assignment, you

are to compute efficiently N (r)’s from a given text. Your main program will be called ’calculateNr’,

and should read the text file specified by a command line argument: ’CalculateNr text.txt’. It should

output N (r)s separately on each line. For our example text above, the program’s output should list

all N (r)’s sorted by the increasing value of r:

N(1) = 10

N(2) = 2

page 2 of 6

N(3) = 1

You have to write all the code yourself. You cannot use code from the textbook, Internet, and any

other sources. You may use java build-in class for linked list java.util.LinkedList and the Iterator

interface java.util.Iterator.

2 Implementation

You are to implement program very efficiently using the AVL tree data structure. First read all the

words from the input file, using the code provided in FileWordRead.java. As you read the words,

insert them in your AVL tree, let’s call this tree T1. The word itself will be the key. The value will be

an integer, which specifies how often the word occurs in the input file. The first time you see a word,

say word ’cat’ you should create an entry (’cat’,1) and insert it into your AVL tree. The second time

you see word ’cat’, you should update the count of the entry (’cat’,1’) to be (’cat’,2’). In our example

text: ‘I would like to eat an apple, a banana, and a peach. I already ate a banana.”, after you finish

reading all the words, the AVL tree T1 should contain the following entries:

(I,2), (would,1), (like,1), (to,1), (eat,1), (an,3),

(apple,1), (banana,2), (and,1), (peach,1), (already,1),

(ate,1).

After this first step, you know how often each word occurs in the input file. Now you need to count

how many words occur exactly 1 time, 2 times, etc. The efficient way of doing this is with another

AVL tree. You construct a new AVL tree, let’s call it T2. The entries in T2 are different from entries

in T1, since the purpose of T2 is different. In T2, the key is the word rate r. Notice that r is in the

value field of the first tree T1. The value field of an entry in T2 is the count of how many words have

the rate r.

Now we are ready to construct T2 from T1. Delete entries from T1 until T1 is empty. Recall that each

entry in T1 is of form (“word”, r), where r is how often the “word” occurs in the text file. For each

deleted entry (“word”, r), if the r is not present in T2 yet, insert a new (r, 1) entry in T2. If key r is

already present in T2, say we have an entry (r, count) in T2, update this entry to (r, count+ 1), since

you found one more word that occurs exactly r times in the input file. After you are done inserting

entries into T2, for any entry (r, c) in T2 c is the count of how many words occur in text exactly r

times. For our example text above:

”I would like to eat an apple, a banana, and a peach. I already ate a banana.”

after you are done reinserting entries from T1 into T2, T2 should contain the following entries:

(1,10), (2,2), (3,1).

When you are done inserting into T2, simply go over the tree in inorder traversal and print out the key

and value fields to the standard output. Notice that the keys in T1 are of type String, but the keys in

T2 are of type Integer. We will use Java Generics for both the keys and the value fields of the Entry

Class.

page 3 of 6

3 Classes Provided

3.1 FileWordRead

This is a program for reading words from file. I wanted to make sure you read exactly the same words

from file as my test program, so you have to use this program. You need the following methods from

this class:

FileWordRead(String name): This is a constructor which takes the name of the file to read

tokens from.

public Iterator getIterator(): Returns iterator over words read from the input

file. Here is how to use the iterator:

FileWordRead words = new FileWordRead(fileName);

Iterator it = words.getIterator(); // grab the iterator into variable it

while (it.hasNext()) { // Check if anything is left in the iterator

String next = it.next(); // get the next item in the iterator

....

3.2 AVLTree

This class implements various AVL operations and is a main supporting file to start with. It has

provided implementation details that would be helpful in presenting the solution. The description of

the methods are found in section 4. The provided methods are mentioned as follows,

/* Function to check if tree is empty */

public boolean isEmpty()

/* Make the tree logically empty */

public void makeEmpty()

/* Function to insert data */

public void insert(int data)

/* Function to get height of node */

private int height(AVLNode t )

/* Function to max of left/right node */

private int max(int lhs, int rhs)

/* Function to insert data recursively */

private AVLNode insert(int x, AVLNode t)

/* Rotate binary tree node with left child */

private AVLNode rotateWithLeftChild(AVLNode k2)

/* Rotate binary tree node with right child */

private AVLNode rotateWithRightChild(AVLNode k1)

page 4 of 6

/* Double rotate binary tree node: first left child * with its right child; then node k3 with new

left child */

private AVLNode doubleWithLeftChild(AVLNode k3)

/* Double rotate binary tree node: first right child * with its left child; then node k1 with new

right child */

private AVLNode doubleWithRightChild(AVLNode k1)

/* Functions to count number of nodes */

public int countNodes()

/* Functions to search for an element */

public boolean search(int val)

/* Function for inorder traversal */

public void inorder()

/* Function for preorder traversal */

public void preorder()

/* Function for postorder traversal */

public void postorder()

3.3 Test

This is a test program for your tree implementation. Compile and run Test.java with your tree

implementation. It has 11 tests, and will tell you whether your implementation passes/fails these

tests. First 10 tests are worth 4 points each, test 11 is worth 10 points, it’s most expensive since it

checks if your tree is a valid AVL tree.

3.4 AVLNode

The AVL Node class implements the structure of node. You will also need it in the AVLTree class

whose partial implementation is provided to you. The detail implementation of AVL Node is also

provided in the file mentioned in section 3.2.

4 Classes to Implement

With reference to provided code, assembly of code is required to create a logical output as elicited in

Test.java.

4.1 AVLTree

This class will extend the implementation of provided AVLTree class. For this class, you must imple-

ment all and only the following public methods in addition to the methods already implemented as

provided in the AVLTree.java:

page 5 of 6

public isBST(t1): This method will return true if the provided tree is BST.

public isBalanced(t1): This method will return true if given BST tree is also AVL.

You can implement any other methods that you want to in this class, but they must be declared as

private methods.

4.2 AVLTreeException

This is the class for AVL tree exception.

4.3 calculateNr

This class contains the main method, declared with the usual method header: public static void

main(String[] args) You can implement any other methods that you want in this class, but they

must be declared as private methods.

In your class AVLTree, you must have a private object of type

5 Coding Style

Your mark will be based partly on your coding style.

Variable and method names should be chosen to reflect their purpose in the program.

Comments, indenting, and whitespace should be used to improve readability.

No variable declarations should appear outside methods (“instance variables”) unless they con-

tain data which is to be maintained in the object from call to call. In other words, variables

which are needed only inside methods, whose value does not have to be remembered until the

next method call, should be declared inside those methods.

6 AVL Tree Implementation vs. Binary Search Tree

If you cannot get the AVL tree to re-balance correctly, you can implement a binary search tree(in which

case you should still have all the classes/method names as specified). The only difference between the

BST and the AVL tree is that BST does not have to be balanced, you do not have to implement

triNodeRestructuring for BST. You will loose 10 marks on Test 11 if you implement BST instead of

AVL tree.

6.1 Grading

Your grade will be computed as follows.

Program compiles, produces a meaningful output: 15 marks

page 6 of 6

Test pass: 50 marks. First 10 tests are worth 4 marks, Test 11 is worth 10 marks.

Coding Style: 20 marks

calculateNr program implementation: 15 marks.

7 Important Note

To be eligible for full marks, your programming assignment must run on the departmental com-

puting equipment. You may develop assignments on your home computer, but you must allow

for the amount of time it will take to get the final programs working on Computer Science’s

machines. All programming assignments must be submitted through OWL.

? The late penalty for programming assignment is [2.5i] (2.5 to the i-th power, rounded to the

nearest integer), where i > 0 is the number of days you are late. So if you hand in your

assignment 1 day late, you will be penalized 3%, a delay of 2 days will decrease your grade by

6%, 3 days is penalized 16% and 4 days takes 39% off your grade. You cannot be more than 4

days late.


版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp