联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp

您当前位置:首页 >> C/C++编程C/C++编程

日期:2022-07-18 08:31

Assignment 2: Record Reader

CSE30: Computer Organization and Systems, Summer Session 1

2022

Instructors: John Eldon and Nishant Bhaskar

Due: Friday July 15th, 2022 @ 11:59PM

Please read over the entire assignment before starting to get a sense of what you will need to

get done in the next week. REMEMBER: Everyone procrastinates but it is important to know

that you are procrastinating and still leave yourself enough time to finish. Start early, start often.

You MUST run the assignment on the pi-cluster. You HAVE to SSH: You will not be able to

compile or run the assignment otherwise. The assignments are getting longer as this course

progresses.

ACADEMIC INTEGRITY REMINDER: you should do this assignment entirely on

your own with help only from course staff. Consulting with other students (past

or present) who are not in the course may result in an academic integrity violation

which can have serious consequences.

Need help or instructions? See CSE 30 FAQs

Table of Contents

1. Learning Goals

2. Assignment Overview

3. Getting Started

a. Developing Your Program

b. Running Your Program

4. How the Program Works

5. Submission and Grading

a. Submitting

b. Grading Breakdown [50 pts]

Learning Goals

● Parsing input

● Using pointers in C

● Using arrays of pointers

● Allocating memory at runtime with malloc()

Assignment Overview

A common format for moving data between systems is called a “Delimiter Separated Values”

(DSV) file. A DSV file uses a delimiter (',', '#', ' ', '\t', etc) to separate values on a line

of data1. Each line of the file, or “data record”, ends with a newline2 ('\n'). Each record

consists of one or more data fields (columns) separated by the delimiter. Fields are numbered

from left to right starting with column 1. A DSV file stores tabular data (numbers and text) in

plain text (ASCII strings). In a proper DSV file, each line will always have the same number of

fields (columns).

In the example above we have a sample of a Call Detail Record (CDR) in DSV format with the

delimiter as a comma. A CDR file describes a cell phone voice call from one phone to another

phone. Each record has 10 fields or columns. In this example, the first record of the file is a

label for that column (field). Each column can be empty, and the last column is never followed

by a ','. It always ends with a '\n' for every record.

The data that you will deal with in this assignment is sent in a DSV file where the delimiter is

whitespace.You must pick out the right columns from these DSVs to analyze.

2 In Unix systems, lines generally end with '\n' which is ASCII 0x0a (linefeed). On Windows systems,

the end of line is sometimes 2 characters "\r\n" ASCII 0x0d (carriage return) followed by ASCII 0x0a

(linefeed). The Wikipedia page describes this awkward situation. Windows’ use of CRLF dates back to

MS-DOS (1981). Unix, of course, is older. TOPS-10 (c 1967) used CRLF so it predates Unix. The terms

Linefeed and Carriage Return refer to old teletypes: A linefeed would scroll the paper to the next line and

a carriage return would move the print head to the left side of the paper. The Teletype Model 33 is a

famous one. (Professor Chin used this in elementary school!)

1 You may have noticed this option when saving a spreadsheet (Save As CSV file).

Getting Started

Developing Your Program

For this class, you MUST compile and run your programs on the pi-cluster. To access the

server, you will need your cs30wi22xx student account, and know how to SSH into the cluster.

We’ve provided you with the starter code at the following link:

https://github.com/cse30-su122/A2-starter.git

1. Download the files in the repository.

a. You can either use

git clone https://github.com/cse30-su122/A2-starter.git

directly from pi-cluster, or download the repo locally and scp the folder to

pi-cluster if that doesn’t work.

2. Fill out the fields in the README before turning in.

Running Your Program

We’ve provided you with a Makefile so compiling your program should be easy!

Additionally, we’ve placed the reference solution binary at:

/home/linux/ieng6/cs30wi22/public/bin/reader-a2-ref

You can use reader-a2-ref as a command as follows:

/home/linux/ieng6/cs30s122/public/bin/reader-a2-ref -c num_cols

< input-file

Makefile: The Makefile provided will create a reader executable from the source files

provided. Compile it by typing make into the terminal. By default, it will run with warnings turned

on. To compile without warnings, run make WARN= instead. Run make clean to remove files

generated by make.

How the Program Works

You will be given the number of columns in each record of the file and a list of column indices.

There may also be a flag, -c. Read input from stdin, parse it as a DSV, and print the

specified columns in the given order. If the -c flag was provided, you also need to print

statistics about the number of records in the input and the longest field (in terms of number of

characters).

Executable called with stdin from the command-line:

./reader [-c] num_cols

The angle brackets around “list of cols ...” indicates it is a mandatory list.

It may look like the program hangs. It is waiting for you to type input and enter it on the

command-line. Type what you want and press enter to send that line to stdin. If you want a

more streamlined way to enter text, redirect stdin as shown below.

Executable called with stdin redirected from a file:

./reader [-c] num_cols < input_filename

“< input_filename” is not part of the command-line arguments, and you will find neither

“<” nor “input_filename” in argv! The “<” is the indirection operator. It is not part of C,

and is in fact part of the bash shell. It sends the contents of input_filename to your

program’s stdin. You can read more about redirection operators on GNU.org.

Delimiters

In our DSVs, fields will be delimited by whitespace of any length. For example, 2 spaces

(' ') is a single delimiter, as is 1 space (' '), 1 tab ('\t'), 3 tabs ('\t\t\t'), or 1 space

and 1 tab (' \t'), or similar combinations. Therefore, the following two DSVs would,

functionally, have the same records.

It means no worries

For the\trest of your days

It's our problem-free philosophy

Hakuna\t\t Matata!

It means no worries

For the rest of your days

It's our problem-free\tphilosophy

Hakuna \t Matata!

Once you have read one record (line) from stdin, your job is to delimit it and print out the

columns corresponding to the indices that are passed in.

Program Implementation

You will be given the following as command-line arguments:

● The number of columns in a record

● A list of column indices

○ This list of indices always comes after the number of columns.

● Optionally: The -c flag. This flag can only appear as the first argument.

Note: You should only choose from the following library functions: malloc(), free(),

atoi(), printf(), fprintf(), getline(), strcmp(), strlen(), and exit() in your

code. You may not use any other function such as strtok().

Input guarantees

You do not have to handle degenerate input - input that violates any of these guarantees.

● All records will have at least one column.

● All the records in the file will have the same number of columns.

● No record will begin with whitespace.

● There will not be any empty columns. (In fact, it is impossible to create one: Our delimiter

is any amount of whitespace, and the only remaining possible location, the beginning, is

ruled out because no record can begin with whitespace.)

● The -c flag only appears as the first argument. If it appears, it will be as argv[1].

● The value provided for “number of columns” on the command line will be positive.

● The indices are valid decimal integers.

● The indices provided will be in-bounds, i.e. if num_col is the number of columns in a

record, then the indices will always be in the range [-num_col, num_col - 1]

inclusive.

● The only whitespace characters will be space (' '), tab ('\t'), and newline ('\n').

○ Only ' ' and '\t' can be delimiters, since '\n' indicates the end of a line.

Implementing Option Processing

First we will implement option processing in order to parse the optional -c flag.

The -c flag can only be used as the first argument. This means that, if it appears, it will always

be as argv[1]. You can compare two strings by using strcmp(). Its function signature

appears below. strcmp returns 0 if str1 and str2 are identical.

int strcmp(const char *str1, const char *str2)

You should set up a variable to indicate whether the -c flag appeared as argv[1]. You will use

it when you are collecting the data for statistics to print at the end.

Parsing the command-line arguments

The first argument that is not the -c flag will always be the number of columns in each record,

num_cols.

./reader 10 4 -1 9 4 ./reader -c 25 -10 11

In the leftmost example above, this would be the string “10”. In the rightmost example, this

would be the string “25”. As usual, you should convert this string to a number that you can use.

We will come back to num_cols in the next section.

The arguments following num_cols will be a list of columns that we want to print out, in the

order we want to print them. Indices can be negative as well as positive! We are following

Python’s indexing convention for the column indices, where negative indices count backwards

from the end of the array. For instance, if we have the array

int array[] = {10, 20, 30, 40};

then array[-1] would be 40, and array[-4] would be 10.

There can be duplicate numbers listed, and this list can be of any length. We won’t know until

runtime. To accommodate this, we need to use malloc() to allocate a dynamic amount of

memory for our storage (see example below).

Create an array of ints to store this list of indices. The number of elements in the array is equal

to the number of columns that will be written to stdout. This array's size is determined by the

number of arguments passed on the command line after num_cols. Each entry will contain the

column index to be output, and the order of the indices within this array is the order to be

written. Let’s assume there were 4 output column arguments: 4, -1, 9, 4. Then the output buffer,

named *obuf here, would look like this.

Think about how you could calculate the number of indices, to allocate memory for int *obuf,

without needing to iterate over argv. argc may be useful here. However, you will need to

iterate over argv at least once to fill out the contents of obuf.

Parsing the record from stdin

To prepare for parsing the record from stdin, use malloc() to allocate an array of pointers to

chars. This array is named **buf in the picture below. The number of array entries is the

same as the number of columns in a record, num_cols, which we found earlier. The purpose of

this array is to store the beginning of each column of a record. Since the record is an array of

chars, a location within it will be the location of a char, and therefore be of type char *.

As an example, say we are processing records that have 10 columns. The input processing

array will look like this.

If we had the following record while processing records with 5 columns:

I never thought hyenas essential

Then buf would contain the following pointers:

Pointer to

“I” in “I”

Pointer to “n” in

“never”

Pointer to “t” in

“thought”

Pointer to “h” in

“hyenas”

Pointer to “e” in

“essential”

I never thought hyenas essential

Here is how you can allocate an array of 20 int pointers and an array of 15 chars (char*).

.Processing the input from stdin

Read one line from stdin into an input buffer, using getline(). The function prototype for

getline is as follows:

ssize_t getline(char** lineptr, size_t* n, FILE*stream);

The last parameter to getline is a FILE *stream. In the previous assignment, this was

returned from fopen(). In C, there are 3 FILE pointers that are automatically opened for you :

stdin (standard input), stdout (standard output), and stderr (standard error). Input from

the terminal or from a redirected file is associated with stdin. Output to the terminal or a

redirected file are associated with stdout and stderr.

getline takes in a pointer to an input buffer (somewhere to store the characters it read), a

pointer to a variable to store the number of characters it read, and the source to read from. In

this example, let's call this pointer readbuffer. You will need to set this input buffer

readbuffer as NULL so that getline can use it. getline internally does a malloc, which

means there is no need to malloc space for readbuffer. However, it does require that

readbuffer is initialized to NULL the first time getline is called with readbuffer3.

In our case, the source will be stdin. The characters read into the input buffer will always be

terminated with a '\0'. Try man getline for more information.

Once you’ve read the line, fill out the input array (char **buf). No record will start with

whitespace, so the first character of the record is the beginning of the first column. Set the first

element of the input array (buf[0]) to point to the beginning of the input buffer readbuffer.

Walk through the rest of readbuffer to continue filling out buf. As you walk through, search

for whitespace, and stop once you see the '\0' that indicates the end of the buffer. For each

whitespace character located directly after a non-whitespace character, or the newline character

after the last column, replace it with a '\0' to terminate that substring. Continue walking until

you see another non-whitespace character. This next non-whitespace character is the first

character of the next field, so store a pointer to it in the next entry of buf.

At the end of processing the input line, you should have an array buf filled with pointers to the

start of each column in the record, and each column is a properly '\0'-terminated C string.

You should not need to zero out the array before each pass of getline; getline will

overwrite the contents anyway.

3 Something to think about, why does this variable need to be char ** instead of char *?

Before processing a record

Say we have an input buffer readbuffer that getline reads into, and it looks like this

pre-processing. Empty cells represent the space character.

T h e P r i d e \t \t L a n d s \n \0

After processing a record

This is what readbuffer would look like post-processing. Red cells are those that were

replaced with '\0's, while blue cells are the whitespace characters that remained unchanged.

T h e \0 P r i d e \0 \t L a n d s \0 \0

The input array buf would have these contents. Notice that buf[2] does not point to the blue

cell in readbuffer. You should not assume that the next character after a whitespace is going

to be the start of the next record.

Pointer to “T” in “The” Pointer to “P” in “Pride” Pointer to “L” in “Lands”

T h e \0 P r i d e \0 \t L a n d s \0 \0

Printing

Once you have filled out buf, print the elements in buf corresponding to the indices you stored

in obuf. Loop over obuf and for each number ind in it, print buf[ind] to stdout using

printf. Why does this work? You’ve null-terminated each of the individual columns in the

record, which means each of these substrings is now a proper C string.

Place a space between every word you print out. However, do not print a space at the end of

the line.

Loop back to the call to getline and repeat the process until you read EOF, indicating that

there is no more input.

Statistics

If the -c flag is set, collect some statistics about the input while you are parsing it.

● Number of records in the input

● Longest field in the input (in terms of number of characters)

After you are done processing every record in the input, print out these statistics to stdout.

Formatting strings for these statistics are provided in the starter code, and you need to calculate

the numbers as stated.

Examples

# Contents of circle-of-life.txt

In the circle of life

It's the wheel of fortune

It's the leap of faith

It's the band of hope

'Til we find our place

# Command-line

./reader 5 0 3 < circle-of-life.txt

# Output

In of

It's of

It's of

It's of

'Til our

# Contents of circle-of-life.txt

In the circle of life

It's the wheel of fortune

It's the leap of faith

It's the band of hope

'Til we find our place

# Command-line

./reader 5 2 -1 < circle-of-life.txt

# Output

circle life

wheel fortune

leap faith

band hope

find place

# Contents of be-prepared.txt

I know that your powers of retention

Are as wet as a warthog's backside

But thick as you are, pay attention

My words are a matter of pride

# Command-line

./reader 7 -7 1 -5 3 -3 5 -1 < be-prepared.txt

# Output

I know that your powers of retention

Are as wet as a warthog's backside

But thick as you are, pay attention

My words are a matter of pride

# Contents of be-prepared.txt

I know that your powers of retention

Are as wet as a warthog's backside

But thick as you are, pay attention

My words are a matter of pride

# Command-line

./reader -c 7 -7 1 -5 3 -3 5 -1 < be-prepared.txt

# Output

I know that your powers of retention

Are as wet as a warthog's backside

But thick as you are, pay attention

My words are a matter of pride

Number of lines: 4

Longest field: 9 characters

Midpoint

This part of the assignment is due earlier than the full assignment, on

Sunday 7/10 at 11:59 pm PST. There are no late submissions on the

Midpoint.

Complete the Gradescope assignment “Midpoint: Programming Assignment 2”, an Online

Assignment that is done entirely through Gradescope. This assignment consists of a few quiz

questions and a free-response question where you will document your pseudocode or the

algorithm in plain English. This is a planning document and does not need to reflect your final

implementation, although you are encouraged to keep it up to date.

Describe the complete algorithm for your program. Cover all the steps required to parse the

input arguments, get a line of input from stdin, identify where each column begins, print the

specified columns, and collect statistics.

Submission and Grading

Submitting

1. Submit your files to Gradescope under the assignment titled “Programming Assignment

2”.

You will submit the following files:

reader.c

README.md

You can upload multiple files to Gradescope by holding CTRL (? on a Mac) while you

are clicking the files. You can also hold SHIFT to select all files between a start point

and an endpoint.

Alternatively, you can place all files in a folder and upload the folder to the assignment.

Gradescope will upload all files in the folder. You can also zip all of the files and upload

the .zip to the assignment. Ensure that the files you submit are not in a nested folder.

2. After submitting, the autograder will run a few tests:

a. Check that all required files were submitted.

b. Check that reader compiles.

c. Runs some sanity tests on the resulting reader executable.

Grading Breakdown [50 pts]

Make sure to check the autograder output after submitting! We will be running

additional tests after the deadline passes to determine your final grade. Also, throughout this

course, make sure to write your own test cases. It is bad practice to rely on the minimal

public autograder tests as this is an insufficient test of your program.

To encourage you to write your own tests, we are not providing any public tests that have

not already been detailed in this writeup.

The assignment will be graded out of 50 points and will be allocated as follows:

● Midpoint writeup: 5 points. This part of the assignment is due earlier than the full

assignment, on Sunday 7/10 at 11:59 pm PST. Complete the Gradescope

assignment “Midpoint: Programming Assignment 2”.

● Code compiles without warnings: 1 point.

● No Memory Leaks : 4 points

● Public tests with the provided examples : 16 points

● Private tests with hidden test cases : 24 points

NOTE: The tests expect an EXACT output match with the reference binary. There will be

NO partial credit for any differences in the output. Test your code - do NOT rely on the

autograder for program validation (read this sentence again).

Make sure your assignment compiles correctly through the provided Makefile on the

pi-cluster without warnings. Any assignment that does not compile will receive 0 credit.

Note: You should only choose from the following library functions: malloc(), free(),

atoi(), printf(), fprintf(), getline(), strcmp(), strlen(), and exit() in your

code. You may not use any other function such as strtok().

Checking For Exact Output Match

A common technique is to redirect the outputs to files and compare against the reference

solution4:

./your-program args > output; our-reference args > ref

diff -s output ref

This command will output lines that differ with a < or > in front to tell which file the line came

from.

4 You might want to check out vimdiff on the pi-cluster


版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp