

Date: 2019-07-10 10:34

CSCI 1300

Summer 2019

Assignment 4

Task 1

Task 1.1

Write a function called read_data that takes the following inputs:

● An array of doubles -- call it A

● The size of the array -- call it size

This function will read up to size numbers from a file called numbers.txt. As the name suggests, this file contains a space-separated list of numbers. Each number read from the file will be stored in A. The function will keep reading numbers from numbers.txt until it either reaches the end of the file or fills the array. The function will return the count of numbers that it was able to read.
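
One possible shape for read_data is sketched below. It is an illustration under a few assumptions rather than the required solution: the return type is taken to be int, and a file that cannot be opened is treated the same as an empty file (the function returns 0).

```cpp
#include <fstream>

// Reads up to `size` numbers from numbers.txt into A.
// Returns how many numbers were actually stored.
int read_data(double A[], int size) {
    std::ifstream in("numbers.txt");
    int count = 0;
    // Stop when the array is full or when extraction fails (end of file,
    // a token that is not a number, or a file that could not be opened).
    while (count < size && in >> A[count]) {
        ++count;
    }
    return count;
}
```

Checking count < size before extracting means the loop never writes past the end of the array, even when the file holds more numbers than the array can store.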

Task 1.2

Write your main. Your main will test your read_data function by filling an array with double values. Use the value returned by read_data to determine how many numbers were read into the array, and treat that value as the new size of the array. Call your file task1.cpp

Task 2

Task 2.1

Write a function called count_words that takes a single argument, a string filename.

● The function will open a file with the provided filename for reading.

● The goal of this function is to count the number of occurrences of each word in the file.

○ We will define a “word” as a sequence of characters separated by whitespace.

● We will accomplish this using an unordered_map data structure from the C++ standard library (keyed by a string, storing an integer value).

○ As we read each word in the file, we can count its occurrences in the following way:

// unordered_map keyed by string, storing an integer
unordered_map<string, int> word_counter;

// adds one to the count of the string "some_word"
word_counter["some_word"] += 1;

● Once all words have been read and counted, the function returns the unordered_map.
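
Putting these pieces together, count_words could look roughly like the sketch below. It assumes the filename is passed as a const std::string reference and that a file which cannot be opened simply yields an empty map; neither detail is fixed by the task.

```cpp
#include <fstream>
#include <string>
#include <unordered_map>

// Counts how often each whitespace-separated word occurs in the file.
std::unordered_map<std::string, int> count_words(const std::string& filename) {
    std::unordered_map<std::string, int> word_counter;
    std::ifstream in(filename);
    std::string word;
    // operator>> skips leading whitespace and reads one token at a time,
    // which matches the definition of a "word" used here.
    while (in >> word) {
        // operator[] inserts a missing key with the value 0 before adding 1.
        word_counter[word] += 1;
    }
    return word_counter;
}
```

Because operator[] value-initializes missing entries, there is no need to check whether a word has been seen before incrementing its count.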

Task 2.2

In your main, we wish to count the occurrences of words in two files:

● trainneg.txt

● trainpos.txt

We will use the count_words function to return two unordered_maps, one for each file. We will then write to two files:

● count_neg.txt

● count_pos.txt

Each output file will contain the list of unique words from trainneg.txt or trainpos.txt, respectively, along with each word's count.

Format your files in the following way:

● WORD

● NUM_OCCURENCES

Name your file task2.cpp
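
Writing a map of counts to a file could be sketched as follows. The helper name write_counts is hypothetical, and the sketch assumes each output line holds a word followed by a space and its count; adjust the separator if the expected layout differs.

```cpp
#include <fstream>
#include <string>
#include <unordered_map>

// Hypothetical helper: writes one "WORD NUM_OCCURENCES" pair per line.
// Returns false if the file could not be opened or written.
bool write_counts(const std::unordered_map<std::string, int>& counts,
                  const std::string& filename) {
    std::ofstream out(filename);
    if (!out) {
        return false;
    }
    // Iteration order of an unordered_map is unspecified, so the lines
    // are not written in any particular order.
    for (const auto& entry : counts) {
        out << entry.first << ' ' << entry.second << '\n';
    }
    out.flush();
    return static_cast<bool>(out);
}
```

In main this would be called once per map, for example with "count_neg.txt" for the counts from trainneg.txt and "count_pos.txt" for the counts from trainpos.txt.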

Extra Credit 2 -- 30pts

Our trainneg.txt/trainpos.txt files are more than simple collections of words. Each line in these files contains a movie review: trainneg.txt contains negative reviews and trainpos.txt contains positive reviews. So in our previous task, we counted the occurrences of words in both negative and positive movie reviews. We can use this data to predict whether a particular movie review is positive or negative based on the words used in the review! Your task is to write a program that does the following:

main:

● Reads two files produced from our previous task:

○ count_neg.txt

○ count_pos.txt

● Stores the words and their respective counts in two unordered_maps
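
Loading a count file back into a map might look like the sketch below. The helper name read_counts is hypothetical, and the sketch assumes each word is followed by its integer count with whitespace in between. Since operator>> treats spaces and newlines alike, it works whether the word and count share a line or sit on consecutive lines.

```cpp
#include <fstream>
#include <string>
#include <unordered_map>

// Hypothetical helper: reads alternating word / count tokens from the file.
std::unordered_map<std::string, int> read_counts(const std::string& filename) {
    std::unordered_map<std::string, int> counts;
    std::ifstream in(filename);
    std::string word;
    int occurrences = 0;
    // The loop ends at end of file or at the first malformed pair.
    while (in >> word >> occurrences) {
        counts[word] = occurrences;
    }
    return counts;
}
```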

A separate function:

● This function will take the following inputs:

○ a string that represents a single movie review

○ The two unordered_maps that contain the word counts for words in positive/negative reviews

● The function will be used to classify the movie review as positive or negative (What return type do you think you would want for such a function?).

● How do we classify a movie review given our word counts?

○ We can use a simple probabilistic model called the Naive Bayes Classifier to accomplish this task.

○ I recommend reading the following book chapter to learn about the Naive Bayes Classifier: https://web.stanford.edu/~jurafsky/slp3/6.pdf

○ TL;DR: We can use the word counts for positive/negative reviews, along with a strong simplifying assumption (that the words in a review are independent of one another given the review's class), to calculate the probability that a review is positive or negative.

○ For those of you who are unfamiliar with basic probability notation, here is a short primer:

■ Let P(x) denote the probability that ‘x’ occurs.

■ Let P(x | y) denote the probability that ‘x’ occurs given that ‘y’ has occurred.
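
As a rough illustration, the classifier can compare P(review | positive) with P(review | negative), where each is the product of P(word | class) over the words in the review. The sketch below is one way to do this, not a prescribed solution, and it makes several assumptions that the task does not state: both classes are treated as equally likely beforehand, add-one (Laplace) smoothing is used so unseen words do not produce a probability of zero, logarithms are summed instead of multiplying probabilities to avoid underflow, ties count as positive, and the names WordCounts, log_likelihood and is_positive are made up for the example.

```cpp
#include <cmath>
#include <sstream>
#include <string>
#include <unordered_map>

using WordCounts = std::unordered_map<std::string, int>;

// Sum of log P(word | class) over the review's words, with add-one smoothing:
// P(word | class) = (count(word) + 1) / (total_words + vocab_size).
double log_likelihood(const std::string& review, const WordCounts& counts,
                      double total_words, double vocab_size) {
    std::istringstream words(review);
    std::string word;
    double score = 0.0;
    while (words >> word) {
        auto it = counts.find(word);
        double c = (it == counts.end()) ? 0.0 : it->second;
        score += std::log((c + 1.0) / (total_words + vocab_size));
    }
    return score;
}

// Returns true if the review looks positive, false if it looks negative.
// Assumes equal prior probability for the two classes.
bool is_positive(const std::string& review, const WordCounts& pos,
                 const WordCounts& neg) {
    double pos_total = 0.0;
    double neg_total = 0.0;
    for (const auto& e : pos) pos_total += e.second;
    for (const auto& e : neg) neg_total += e.second;
    // Vocabulary size: number of distinct words appearing in either map.
    double vocab = static_cast<double>(pos.size());
    for (const auto& e : neg) {
        if (pos.count(e.first) == 0) vocab += 1.0;
    }
    return log_likelihood(review, pos, pos_total, vocab) >=
           log_likelihood(review, neg, neg_total, vocab);
}
```

The totals and vocabulary size are recomputed on every call to keep the sketch short; when classifying many reviews it would be cheaper to compute them once and pass them in.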

Back in main:

● Read two files:

○ testpos.txt

○ testneg.txt

● These files will contain movie reviews for us to test the effectiveness of our classifier.

● For each review in these files, classify it as either positive or negative.

● Count the number of correct classifications and determine the accuracy of your classifier.

● See if you can get greater than 60% classification accuracy.
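
The accuracy bookkeeping could be organized along the lines of the sketch below. Tally, tally_file and accuracy are hypothetical names, the classifier is passed in as a callable so the sketch does not depend on any particular classification function, and blank lines in the test files are assumed to be skippable.

```cpp
#include <fstream>
#include <functional>
#include <string>

// Hypothetical record of how many reviews were seen and classified correctly.
struct Tally {
    int correct = 0;
    int total = 0;
};

// Classifies each line (one review per line) of the file and counts how many
// results match the label that every review in that file is expected to have.
Tally tally_file(const std::string& filename, bool expected_positive,
                 const std::function<bool(const std::string&)>& classify) {
    Tally t;
    std::ifstream in(filename);
    std::string review;
    while (std::getline(in, review)) {
        if (review.empty()) continue;  // assumption: ignore blank lines
        ++t.total;
        if (classify(review) == expected_positive) ++t.correct;
    }
    return t;
}

// Accuracy = correct classifications / total reviews, across both files.
double accuracy(const Tally& a, const Tally& b) {
    int total = a.total + b.total;
    if (total == 0) return 0.0;
    return static_cast<double>(a.correct + b.correct) / total;
}
```

In main, tally_file would be called once for testpos.txt with expected_positive set to true and once for testneg.txt with it set to false, and an accuracy above 0.6 would meet the target.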

Name your file task2E.cpp

Zip all your files and submit the archive to Moodle.

