联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp

您当前位置:首页 >> C/C++编程C/C++编程

日期:2021-10-13 10:05

CSC 230 Project 3

Shell Pattern Matcher

Shell programs like the Bourne shell or bash (the default shell on the common platform) support a pattern matching syntax. This

is used for matching filenames or other syntax in shell commands. It lets us use a question mark to match any single character,

or an asterisk to match any sequence of zero or more characters. Other characters just match themselves. For this project, you're

going to write a program that uses the shell pattern syntax to match lines from a given file.

Our program will be called match. You can run it as follows to match lines from a file named file-d.txt that end with ".html".

The sample input file-d.txt is provided with the starter. It contains a bunch of randomly generated filenames.

$ ./match '*.html' file-d.txt

special-blue-monster.html

little-orange-shoe.html

old-pink-hat.html

special-red-shoe.html

first-gray-shoe.html

dirty-pink-car.html

big-blue-rock.html

special-pink-monster.html

dirty-gray-hat.html

special-green-rock.html

big-blue-car.html

first-orange-rock.html

old-red-car.html

special-red-rock.html

old-gray-monster.html

Or, you can run the match program as follows, to match all the lines from this file that have three characters between two dashes

and a 3-character sequence at the end after a dot. For this particular input file, the only lines that match have the word "red" in

the middle and end with either ".exe", ".txt", ".mp3" or ".jpg". The "-n" option tells the program to report line numbers with each

line it matches.

$ ./match -n '*-???-*.???' file-d.txt

3 little-red-shoe.exe

14 first-red-hat.mp3

16 first-red-monster.exe

19 first-red-shoe.txt

26 dirty-red-book.exe

29 little-red-rock.mp3

38 little-red-car.txt

40 old-red-hat.jpg

48 old-red-rock.exe

51 dirty-red-rock.jpg

58 special-red-monster.exe

64 dirty-red-car.txt

65 first-red-rock.txt

88 dirty-red-monster.txt

92 little-red-rock.jpg

100 first-red-book.mp3

As in project 2, you'll be developing this project using git for revision control. You should be able to just unpack the starter into

the p3 directory of your cloned repo to get started. See the Getting Started section for instructions.

This homework supports a number of our course objectives. See the Learning Outcomes section for a list.

Rules for Project 3

You get to complete this project individually. If you're unsure what's permitted, you can have a look at the academic integrity

guidelines in the course syllabus.

In the design section, you'll see some instructions for how your implementation is expected to work. Be sure you follow these

rules. It's not enough to just turn in a working program; your program needs to follow the design constraints we've asked you to

follow.

Requirements

You're going to write a program named match. It will let the user match lines from a text input file against a given pattern. It also

supports command-line options to include line numbers in the output and to invert the match (just printing lines that don't match

the pattern).

2021/10/11 CSC230 Project 3 –

https://www.csc2.ncsu.edu/courses/csc230/proj/p3/p3.html 2/13

Input and Output

The match program will print a report of matching lines to standard output. It will read input from a file given on the commandline.

The input file will always be the last command-line argument. For example, when run as follows, the program will be reading

input from a file named file-c.txt:

$ ./match 'a???w' file-c.txt

When reporting matching lines to standard output, each output line should be limited to at most 80 characters. This is a standard

width for terminal windows. If the output line requires more than 80 characters, you will print up to character 78, then print ".."

at the end to indicate that you weren't able to show the entire line. For example, consider an input file containing the following

two lines:

This is a sort line.

This is a much longer line, containing a bunch of extra characters that won't fit in just 80 columns of output.

If both of these lines matched the pattern, then they would be printed as follows. The entire first line will fit, but, for the second

line, we can only print the first 78 characters, then ".." to show there was more of this line that we weren't able to report.

This is a sort line.

This is a much longer line, containing a bunch of extra characters that won't ..

The 80-character limit is just a restriction on output length. The program can read longer lines from the input (it just can't report

the whole line). Input lines may be no more than 1023 characters long (not counting line termination and not counting the null

terminator that might be used to store a string). If the input file contains a line that's longer than this, the program will not print

anything to standard output. Instead, it will terminate with an exit status of 1 and print the following line to standard error:

Line too long

The program can report on up to 1000 matched lines. If there are too many matching lines, it will not print anything to standard

output. Instead, it will terminate with an exit status of 1 and print the following line to standard error:

Too many matches

Pattern Matching

The second-to-last command-line argument gives a pattern to match against lines from the input file. This pattern uses some of

the same syntax used by the shell when you match things like filenames. The pattern has to match the entire input line; it doesn't

count as a match if the pattern just matches a substring in the input line.

Inside a pattern, most characters are considered ordinary characters; they just match themselves. For example, the letter 'a' in a

pattern will just match the letter 'a' in a line from the input file. A pattern like "abc" will only match a line "abc" in the input file.

In a pattern, a couple of characters are considered special. These are sometimes called metacharacters in pattern matching.

These let you give patterns that can match multiple different lines from the input file:

The question mark can match any single character from a line. For example, a pattern like "a?c" would match any line that

started with 'a', ended with 'c' and with any single character in between. So, "a?c" would match the line "abc", "aac", "a-c",

etc. You can use multiple question marks to match arbitrary characters anywhere in the line. For example "?bc?" would

match any 4-character line with the letters "bc" in the middle.

*

The star will match any sequence of zero or more characters. For example, 'a*b' will match any line (of two characters or

more) that starts with 'a' and ends with 'b'. Or, the pattern "*x*" will match any sequence of characters that contains the

letter 'x'.

In our implementation, we won't permit more than one consecutive star in the pattern; a pattern with multiple stars in a row

will be considered invalid. Supporting multiple consecutive stars makes pattern matching a little more difficult (just a little

bit), but it doesn't give us any additional expressive power in the pattern. Two or more consecutive stars will match exactly

the same things as just one star. So we will reject patterns like this and simplify our pattern matching code a little bit.

There's also an optional extension to the pattern syntax that you can implement for up to 8 points of extra credit. It's described

below in the Extra Credit section below.

Since our program is using the same pattern matching syntax as the shell, we'll usually need to put quotes around a pattern when

we run our program. Otherwise, the shell will try to match the pattern against filenames in the current directory. For example, if

we tried to run our program as follows, the shell would replace the pattern with a list of all the files in the current directory. This

would happen before our program even started.

# This won't work.

$ match * input-file.txt

# This will protect the pattern from the shell.

$ match '*' input-file.txt

2021/10/11 CSC230 Project 3 –

https://www.csc2.ncsu.edu/courses/csc230/proj/p3/p3.html 3/13

Command-Line Options

The match program supports two optional command-line arguments, -n and -v. These may or may not be included on the

command line, and they may be given in any order (or, even, given multiple times). If given, they will occur before the pattern

argument. Giving either of these arguments multiple times is the same as giving that argument just once. So, "./match -n -v -n

'abc' file.txt" is the same as "./match -n -v 'abc' file.txt".

Line Number Report

If -n, is given on the command line, then output should give the line number where each matching line occurred in the input file.

Line numbers count from 1. The line number should be given right-justified at the start of each output line. It should be followed

by a space, then the contents of the matching line from the input.

In the output, the field width for the line number should be determined by the largest line number that actually occurs in the

output. In the following output, for example, we only need one space for the line number.

1 This is line one

5 This is line five

However, in the following output, we need to use a two-digit line number, so every output line will use two colums to report the

line number:

1 This is line one

5 This is line five

82 This is line eighty two

Or, if we had to report a line with a 3-digit line number, the output would look like:

1 This is line one

5 This is line five

82 This is line eighty two

902 This is line nine hundred and two

Notice that the width of the line number field depends on the lines that are actually reported, not just on the number of lines in

the input file. Also notice that the 80-character output limit applies to the entire output line (including the optional line-number

field). So, for example, if the line number takes three characters, then, with the space, you only have 76 characters remaining to

show the matching line from the input).

Match Inversion

If -v, is given on the command line, then the program should report lines that don't match the pattern, rather than lines that do.

Invalid Arguments

If the program is given an invalid pattern (one with two * characters in a row, or, for the extra credit, a pattern with a bad

character class), it will print the following line to standard error and terminate with an exit status of 1. In this error message, pat

is the pattern given on the command line:

Invalid pattern: pat

If the program is given invalid command line arguments, it will print the following usage message to standard error and then

terminate with an exit status of 1. Command line arguments would be invalid if some option other than -n or -v was given, or if

the pattern or filename arguments were not present:

usage: match [-n] [-v] pattern file

If the program is given an input file that can't be opened, it will print the following message to standard error and then terminate

with an exit status of 1. Here, filename is the name of the file given on the command line:

Can't open file: filename

Design

Your program will be implemented four components:

input.c and input.h

This component contains code to read lines of text from the input file.

list.c and list.h

This component contains code to save matching lines from the input file and report them when the program is done.

pattern.c and pattern.h

This is probably the most interesting component. It contains code to match lines of the input against the pattern.

2021/10/11 CSC230 Project 3 –

https://www.csc2.ncsu.edu/courses/csc230/proj/p3/p3.html 4/13

match.c

This is the top-level main component. It contains the main() function and uses the other components to implement the

program's functionality.

The following figure shows the dependency structure of our components. You can see we've tried to make sure there are no

dependency cycles among the components. The match component can use the other three, but pattern, list and input are all

independent. They don't need to use code provided by each other.

Figure: dependencies among the program's components

Functions

You'll implement and use the following functions, to break the program into smaller components and to help simplify your toplevel

match.c component. Remember that functions that are defined in one component and used in another should be prototyped

(and documented) in the header. You can add other functions as you like, to help simplify your implementation. Functions that

are for internal use in a component should be given internal linkage (static) and should not be prototyped in the header.

The input component will just provide one function that the main component will use:

bool readLine( FILE *fp, char line[], int capacity )

The readLine() function will attempt to read a single line of text from the given file and store it as a string in the given

character array. The capacity argument tells it how long the array is, so it can exit and print an appropriate error message if

one of the input lines is too long. It returns true if it is successful reading another line of text, or false if there are no more

lines to read.

The list component will provide the following two functions, for recording matching lines and reporting them at the end:

void addLine( int lno, char const line[] )

This function adds the given line to the list of matching lines. The line parameter gives the contents of the line and the lno

parameter gives the line number where this line occurred in the input.

void printList( bool numberFlag )

This function prints all the matching lines (the ones previously recorded via addLine). If numberFlag is true, it prints the line

number field at the start of each line. Otherwise, it just prints the text of the matching line.

The pattern component provides the following two functions for checking the validity of a pattern and for matching an input line

against a pattern.

bool validPattern( char const pat[] )

Given a pattern, this function will check to make sure the pattern is valid and return true or false accordingly. Unless you do

the extra credit, the only thing that will make a pattern invalid is if it has two star characters in a row.

bool matchPattern( char const pat[], char const line[] )

This function returns true if the given line of text matches the given pattern. The "Pattern Matching" section below gives you

some guidance about how to implement this.

Match List Representation

The list component is responsible for remembering all the matching lines. Inside, it will use (static) global arrays to remember the

text of all the matching lines, along with a line number for each matching line and any other state you need in order to remember

all the matching lines.

You can use arrays to record the text of the matching lines and their line numbers. The line number can just be stored in an array

of ints. The text of the lines themselves can be stored in a two-dimensional array of char, with a row for each line of text.

You won't need to use dynamically allocated memory for this project (we'll use that on project 4). Since there's a fixed limit to the

number of lines the program can store and the length of each line you'll have to report, you can make fixed-sized arrays to hold

all the information needed by the list component. If these limits are exceeded, you're just expected to print an error message and

terminate.

Global Variables

2021/10/11 CSC230 Project 3 –

https://www.csc2.ncsu.edu/courses/csc230/proj/p3/p3.html 5/13

Don't use any global variables other than the ones you'll use to represent the record of matching lines in the list component.

Functions should be able to get everything else they need from their parameters passed to them.

Pattern Matching

Matching an input line against a pattern will require working through each character of the input line and keeping up with how

much of the pattern has been matched so far. This will be kind of like matching two strings, but the special characters in the

pattern will make things a little more interesting. We'll really be using a type of finite state machine called a Nondeterministic

Finite Automaton (NFA). You'll learn all about these in CSC 333. For now, we can cover enough to be able to match simple

patterns like the ones we're working with.

State Machine Overview

We can model a pattern like "a?c" with the following finite state machine. The circles (states) represent how much of the pattern

we have matched so far. The arrows show what character has to occur in the input to get from one state to another. We'll match

a character from the input by following an arrow that's has the same symbol on it. A question mark on an arrow indicates that

you can get from one state to another by matching any single character.

NFA to match a?c

In this automaton, state zero indicates that you have not matched anything from the input line yet. You can get to state 1 after

matching a letter 'a'. From state 1, you can get to state 2 after matching any character. Getting from state 2 to state 3 requires

matching a 'c' from the input. The double circle on state 3 indicates that it is a final/accepting state. If we can match the whole

input line and get to this state, then the input line matches the pattern.

Imagine that we are matching a line containing "abc". Initially, if we haven't looked any characters in this line, the finite state

machine would be in state 0.

After matching the first character from "abc", the state machine would be in state 1.

After matching the next character from "abc", the state machine could transition to state 2, following the arrow with the question

mark on it to match the 'b' in "abc".

After matching the next character from "abc", the state machine would be in state 3. So, we reached a final state after matching

the whole input line. The pattern matches the input line, "abc".

State Machine Representation

2021/10/11 CSC230 Project 3 –

https://www.csc2.ncsu.edu/courses/csc230/proj/p3/p3.html 6/13

How can we implement a state machine line the one shown above? We don't really have to implement anything that looks like this

drawing, with circles and arrows. We can think of the text of the pattern itself as a representation of the state machine. This is

shown in the following figure.

We can think of each state as the index into a position in the pattern. In sate zero, we need to match the character 'a' (just like

state zero in the figure above). In state 1, we've already matched the "a" and we need match the question mark next. In state 2,

we've already matched the "a?" in the pattern and we need to match the 'c' next. In state 3, we've already matched the whole

pattern. Notice that we have one extra state at the end. The pattern is only three characters long, but we need states 0, 1, 2 and

3. From a given state, you can look at the corresponding above character of the pattern to see what you need to match next. For

example, if you're in state 2, you need to match a 'c' next in the input.

We will use an array of boolean flags to keep up with the state of the automaton. The picture above shows just the flag for state 2

set. This would indicate that we've matched the "a?" part of the pattern but we still need to match the 'c'. Why do we need an

array of flags for keeping up with the state? Couldn't we just use an integer to store the current sate? We'll need this to support

the '*' character in a pattern. To handle star, we will need to consider more than one possible state that the finite state machine

could reach. We'll use the array of flags to keep up with the set of states the machine could reach by matching a given sequence

of characters.

Matching With the Stars

Consider the pattern, "ab*d?f". The following figure shows how this pattern could be represented as a state machine. This figure

shows how to handle a star in a pattern.

After matching the 'b' right before the star, we can match any sequence of zero or more characters before matching the 'd' right

after the star. To match zero characters with the star, we can go straight from state 1 to state 3. This will require matching a 'b'

and then a 'd' immediately afterward. The arrow under state 2 lets us do that. To match exactly one character with the star, we

can go from state 1 to state 2, matching a 'b'. Then, we can match any one character going from state 2 to state 3, then we

would have to match a 'd' to get past state 3. To match more than one character with the star, we can take the loop above state

2 to match any number of characters before going to state 3 on the last character matched by the star.

When there's a star in the pattern, it means we will have choices as we traverse the state machine. You can see this in the figure

above. From state 1, there are two places we could go while matching a 'b'. From state 2, there are two places we could go while

matching any character. In general, after matching some part of an input line, there may be more than one state we could reach.

For example, on the input "abcdef", we could reach state 6 by going from through states 0 --> 1 --> 2 --> 3 --> 4 --> 5 --> 6.

Or, we could get to state 2 via 0 --> 1 --> 2 --> 2 --> 2 --> 2 --> 2. Or, we could reach state 3, by going 0 --> 1 --> 2 --> 2 --

> 2 --> 2 --> 3. As long as there's a way to reach the last state while matching all the characters of the input line, then the input

line matches the pattern.

To figure out if it's possible to reach the last state, we'll use flags to keep up with which states could be reached after matching

some of an input line.

Pattern Matching Example

The following figure shows how we'll represent the pattern "ab*d?f". Initially, before matching any characters from the input line,

we will be in state 0, so just the flag for state 0 is set:

2021/10/11 CSC230 Project 3 –

https://www.csc2.ncsu.edu/courses/csc230/proj/p3/p3.html 7/13

Let's say we're trying to match an input line, "abccdxcdxf". First we have to match the letter 'a'. From state 0 (the only flag that's

set in the picture above), matching an 'a' will take us to state 1. You can see this in the state machine figure; matching an 'a'

from state 0 takes you to state 1. You can also see this in the figure directly above. If you're in state 0, the next part of the

pattern to match is the letter 'a'. It's the character from the pattern right above the check mark. So after matching the first

character in the input line, state 1 is the only state we can reach:

Next, we have to match the 'b' from the input line. From state 1, matching a 'b' can take us to either state 2 or state 3. You an

see this in the finite state machine above. This is what will always happen when the next character in the pattern is a star. The

star can either match some characters from the input line (going to state 2 lets us do that), or it could match no characters from

the input line (going two states ahead, to state 3 lets us do that).

Next, we have to match a 'c' from the input line. From the picture above, we could get to either state 2 or state 3 after reading

just the first two characters of the input line. From state 2, matching the 'c' could take us to state 2 or state 3. From state 3, we

can't match the 'c'. So after matching the first three characters of the input line, we can still just get to state 2 or state 3:

The next character on the input line is another 'c'. Here, the same same happens as before. We could reach state 2 or 3 before

matching this 'c', so we can still reach states 2 or 3 after matching this 'c'.

Next, we have to match the 'd'. From state 2, matching a 'd' could take us to state 2 or state 3. From state 3, matching a 'd'

would take us to state 4. So, after matching the first five characters in the input line, "abccd", we could get to states 2, 3 or 4:

Next, we have to match an 'x'. From state 2, matching an 'x' could take us to state 2 or 3. From state 3, we can't match the 'x'.

From state 4, we could match an 'x' using the question mark and get to state 5. So, after matching "abccdx", we could get to

states 2, 3 or 5:

2021/10/11 CSC230 Project 3 –

https://www.csc2.ncsu.edu/courses/csc230/proj/p3/p3.html 8/13

Next, we have to match a 'c'. From state 2, matching a 'c' could take us to states 2 or 3. From state 3 and state 5, we can't match

a 'c', so after matching the first 7 characters from the input line, we can only get to state 2 or 3:

Next, we have to match a 'd'. From state 2, matching a 'd' can take us to state 2 or 3. From state 3, matching a 'd' takes us to

state 4, so we could reach either states 2, 3 or 4:

Next, we have to match an 'x'. From state 2, matching an 'x' could take us to states 2 or 3. From state 3, we can't match an 'x' .

From state 4, we can match the 'x' with the question mark to get to state 5. So, matching the first 9 characters of the input line

could get us to state 2, 3 or 5.

Finally, we have to match the 'f' at the end of the input line. From state 2, matching 'f' can take us to state 2 or 3. From state 3,

we can't match the 'f'. From state 5, matching the 'f' takes us to state 6, so matching the whole input line could let us reach state

2, 3 or 6:

Now, we're done with this input line. We've matched the whole line, and it's possible to reach the last state (state 6). So the input

line matches the pattern. If our program was printing lines that match this pattern, we would want to print this input line (or,

really, save it for printing later).

Pattern Matching Implementation

You'll implement your matchPattern() function like the example shown above. You'll use an array of flags (one element longer

than the pattern) to store the set of states you can reach as you match characters from the current input line. For each character

in the input line, you'll compute the new set of states you can reach after matching that character. You'll probably want two

arrays for this, one holding the set of states you could reach before the current character, the other holding the new set of states

you can reach after matching the current character.

The starter includes a reportFlags() function to help you see what's going on in your implementation as you're matching a pattern

against an input line. It's fairly simple. Given a string for a pattern and an array of state flags, it will print out out the flags below

the pattern, kind of like what's shown in the pictures above. For example, given the pattern and the state array for the last figure

in the example above, it would print the following output. The caret characters show the reachable states.

2021/10/11 CSC230 Project 3 –

https://www.csc2.ncsu.edu/courses/csc230/proj/p3/p3.html 9/13

ab*d?f

^^ ^

Patterns Starting with a Star

When you're matching a pattern, you normally start in state 0, before you've matched any characters from the input line. If the

pattern starts with a star, then the starting state will be either state 0 or state 1. Starting in state zero will let you match one or

more characters from the input line with the star. If you don't need to match anything with the star (it's OK for the star to match

zero characters), then starting in state 1 will let you do that. That way, you're not trying to modify an array while you're still

depending on the values it used to contain.

The following shows how we could match an input line of "12ab89" with the pattern "*ab*". Here, we're showing the output of the

reportFlags() function described above. The following shows the output of this function after matching each character from the

input line. I added a line above each call to reportFlags() to say how much of the input line had been matched.

Initial state(s)

*ab*

^^

After matching "1"

*ab*

^^

After matching "12"

*ab*

^^

After matching "12a"

*ab*

^^^

After matching "12ab"

*ab*

^^ ^^

After matching "12ab8"

*ab*

^^ ^^

After matching "12ab89"

*ab*

^^ ^^

At the bottom, we see that we can reach the last state after matching the whole input line, so the pattern matches the input line.

Magic Numbers

Be sure to avoid magic numbers in your source code. Use the preprocessor to give a meaningful name to all the important, nonobvious

values you need to use. For example, you can define constants for the maximum length of a line of the input file, or the

limit on the width of lines in the output. With some optional and some required arguments on the command line, it might be good

to have some constants for how many required arguments there need to be at the end. There are already standard constants you

should use for the exit status when your program terminates.

For constants that are explained right in the line of code where they occur, I wouldn't call these magic numbers. For example, if

you need to read two integers, you might write something like the following. Here, the value 2 wouldn't be considered a magic

number. It's explained right there in the format string, where we say we'd like to parse two integers. Defining a named constant

to use here would probably just make the program harder to maintain.

if ( fscanf( stream, "%d%d", &a, &b ) != 2 ) {

....

}

Extra Credit

There's some shell pattern syntax we aren't supporting in our application. For 8 points of extra credit, you can implement one

additional feature. In addition to the '?' and '*' special characters, the shell lets you match against sets of characters. The syntax

for this is like the character class syntax supported by printf and by regular expression. Inside a pattern, you can put any

sequence of characters inside square brackets. This will match any one of the given characters. So, for example, the pattern

"ab[cde]f' will match the strings "abcf", "abdf" or "abef". It won't match the string "abcdf", since the "[cde]" syntax only

matches a single character, not a sequence of characters.

The starter includes an extra test input file, file-ec-1.test for trying out extra credit test cases. It contains lines like

test-n.txt for numbers n from 1 to 100. If you've implemented the extra credit, you should be able to run your program as

follows to get all the strings for lines 1 .. 5.

2021/10/11 CSC230 Project 3 –

https://www.csc2.ncsu.edu/courses/csc230/proj/p3/p3.html 10/13

$ ./match 'test-[12345].txt' file-ec-1.txt

test-1.txt

test-2.txt

test-3.txt

test-4.txt

test-5.txt

Or, you can run it as follows, to match any numbers that start with 3 or 4 and end with 6, 7 or 8.

$ ./match 'test-[34][678].txt' file-ec-1.txt

test-36.txt

test-37.txt

test-38.txt

test-46.txt

test-47.txt

test-48.txt

It's OK for characters ? and * to occur inside a character class, but they just match occurrences of ? or * (i.e., these two

characters are not considered special when they occur inside a character class). A character class must start with a [ and end

with ] and it must not contain square brackets inside. Every occurrence of ] in the pattern much match a previous occurrence of [

(so, you can't nest character classes and the pattern can't end inside a character class). If a pattern violates these rules, your

program will handle it just like a pattern with two * characters in a row. For example:

$ ./match 'test-[34][678.txt' file-ec-1.txt

Invalid pattern: test-[34][678.txt

Build Automation

You get to create your own Makefile for this project (called Makefile with a capital 'M', no filename extension). Its default target

should build your match program, compiling each source file to an object file, then linking to produce an executable. Your Makefile

should correctly describe the project's dependencies, so if just one source file changes it rebuilds just the parts of the project that

need to be rebuilt.

As in the previous project, your Makefile must have a clean rule, to remove any temporary files in the project (object files,

executables, temporary output files, etc). If you run make as follows, it should delete just the temporary files:

$ make clean

Testing

The starter includes a test script, along with test input files and expected outputs. When we grade your program, we'll test it with

this script, along with a few other test inputs we're not giving you. To run the automated test script, you should be able to enter

the following:

$ chmod +x test.sh # probably just need to do this once

$ ./test.sh

This will automatically build your program using your Makefile and see how it behaves on the test inputs.

You probably won't pass all the tests the first time, and the test script won't work until you have a working Makefile. Until then,

you can run tests yourself. You can use commands like the following to compile your program (although your Makefile will be

more efficient, compiling source files separately and then linking them together):

$ gcc -Wall -std=c99 -g match.c pattern.c list.c input.c -o match

If you want to try your program out on one of the provided test cases, you can enter commands like the following. Here, we're

doing the same thing test 10 does from the test script. We're running the program with file-c.txt as input, matching any lines

that start with '`z''. We're telling it to write the output to a file named output.txt and to capture standard error to stderr.txt. After

running the program, we check to make sure it exited successfully. Then, we compare its output to the expected output.

$ ./match 'z*' file-c.txt > output.txt 2> stderr.txt

$ echo $?

0

$ diff output.txt expected-10.txt

Test 18 is an error test. We can try it out as follows. Here, we run the match program, then make sure it exits with a status of 1.

After that, we check to make sure it printed the right error message to standard error:

$ ./match 'a**b' file-c.txt > output.txt 2> stderr.txt

$ echo $?

1

$ diff stderr.txt error-18.txt

Test Inputs

2021/10/11 CSC230 Project 3 –

https://www.csc2.ncsu.edu/courses/csc230/proj/p3/p3.html 11/13

We've prepared several test cases to help you make sure your program is working right. The test cases all use the following

sample input files:

file-a.txt is a file with one line, "abc".

file-b.txt is a file with 676 lines, with all possible two-character strings of capital letters (e.g., AA, AB, AC ...)

file-c.txt is a file with 1000 lines, each one containing a different, random sequence of 5 letters, digits and underscores.

file-d.txt is a file with 100 lines, each containing a different randomly-generated filename. This is the kind of thing you'd

normally use a shell pattern with.

file-e.txt has several lines that for testing against the "ab*d?f" illustrated in the Pattern Matching section above.

file-f.txt has more than 1000 lines of different lengths, some of them longer than 80 characters.

file-g.txt has just a few lines, but one of them is 1024 characters long.

The test driver uses the files above for several different tests. Here's a summary of the tests cases.

1. This test uses file-a.txt. It uses a pattern, "abc". It should match the one line of text in this file.

2. This test uses file-a.txt. It uses a pattern, "xyz", which should not match anything from the file.

3. This test uses a pattern with a question mark to match all lines in file-b.txt that end with the letter Z.

4. This test uses a pattern with three question marks to match all lines of file-c.txt that start with a and end with w (there are 5

of them).

5. This test uses a pattern with four question marks to match all lines of fie-c.txt that have an underscore in the middle.

6. This test uses a pattern with five question marks to match all 1000 lines from file-c.txt

7. This test finds the the line "IT" in file-b.txt. It uses the -n option to report the line where it occurs.

8. This is just like test 6, but it uses the -n flag to list each line with a line number.

9. This test uses the pattern, * to match every line in file-b.txt.

10. This test uses the pattern, z* to match every line in file-c.txt starts with a z.

11. This test uses the pattern, *a* to match every line in file-c.txt that contains the letter 'a'.

12. This test uses the pattern, *.mp3 to match every line in file-d.txt that looks like an mp3 file.

13. This test matches lines from file-d.txt that contain '-b'.

14. This test uses the -v and -n options. It reports lines from file-d.txt that don't start with the word "big".

15. This test uses the pattern 'ab*d?f' to match lines from file-e.txt.

16. This test matches lines from file-f.txt that start with the letter A.

17. This test is just like test 16, but it uses the -n option to also report line numbers.

18. This is an error test. It uses a pattern with two stars in a row.

19. This is an error test. It uses an invalid command-line argument.

20. This is an error test. It gives a name for a file that doesn't exist.

21. This is an error test. It matches more than 1000 lines from file-f.txt.

22. This is an error test. There's an input line that's longer than 1023 characters.

Grading

The grade for your program will depend mostly on how well it functions. We'll also expect it to have a working Makefile, to

compile cleanly, to follow the style guide and to follow the design given for the program.

Compiling cleanly on the common platform: 10 points

Working Makefile: 8 points

Correct behavior on all tests: 80 points

Programs follow the style guide: 20 points

Support for character classes: 8 extra credit points

Deductions

Up to -50 percent for not following the required design.

2021/10/11 CSC230 Project 3 –

https://www.csc2.ncsu.edu/courses/csc230/proj/p3/p3.html 12/13

Up to -30 percent for failing to submit required files or submitting files with the wrong name.

-20 percent penalty for late submission.

Getting Started

To get started on this project, you'll need to clone your NCSU github repo and unpack the given starter into the p3 directory of

your repo. You'll submit by adding and committing files into your repo and pushing the changes back up to the NCSU github.

Clone your Repository

You should have already cloned your assigned NCSU github repo when you were working on project 2. If you haven't already

done this, go back to the assignment for project 2 and follow the instructions for for cloning your repo.

Unpack the starter into your cloned repo

You will need to copy and unpack the project 3 starter. We're providing this file as a compressed tar archive, starter3.tgz. You can

get a copy of the starter by using the link in this document, or you can use the following curl command to download it from your

shell prompt.

$ curl -O https://www.csc2.ncsu.edu/courses/csc230/proj/p3/starter3.tgz

Temporarily, put your copy of the starter in the p3 directory of your cloned repo. Then, you should be able to unpack it with the

following command:

$ tar xzvpf starter3.tgz

Once you start working on the project, be sure you don't accidentally commit the starter to your repo. After you've successfully

unpacked it, you may want to delete the starter from your p3 directory, or move it out of your repo.

$ rm starter3.tgz

Instructions for Submission

If you've set up your repository properly, pushing your changes to your assigned CSC230 repository should be all that's required

for submission. When you're done, we're expecting your repo to contain the following files. You can use the web interface on

github.ncsu.edu to confirm that the right versions of all your files made it.

match.c : main implementation file, created by you.

pattern.c / pattern.h : code for the pattern component, with a little bit of the implementation file provided with the

starter.

list.c / list.h : Code for the list component, with a little bit of the header provided with the starter.

input.c / input.h : Code for the input component, created by you.

Makefile : the project's Makefile, created by you.

file-*.txt : Example input files for the test cases, provided in the starter.

expected-*.txt : Expected output files for the test cases, provided in the starter.

error-*.txt : Expected stderr output for the error tests, provided in the starter.

test.sh : test script, provided with the starter.

.gitignore : a file provided with the starter, to tell git not to track temporary files for this project.

Pushing your Changes

To submit your project, you'll need to commit your changes to your cloned repo, then push them to the NCSU github. Project 2

has more detailed instructions for doing this, but I've also summarized them here.

Whenever you create a new file that needs to go into your repo, you need to stage it for the next commit using the add

command:

$ git add some-new-file

Then, before you commit, it's a good idea to check to make sure your index has the right files staged:

$ git status

2021/10/11 CSC230 Project 3 –

https://www.csc2.ncsu.edu/courses/csc230/proj/p3/p3.html 13/13

Once you've added any new files, you can use a command like the following to commit them, along with any changes to files that

were already being tracked:

$ git commit -am "<meaningful message for future self>"

Of course, you haven't really submitted anything until you push your changes up to the NCSU github:

$ git push

Checking Jenkins Feedback

Checking jenkins feedback is similar to the previous homework. Visit our Jenkins system at http://go.ncsu.edu/jenkins-csc230

and you'll see a new build job for project 3. This job polls your repo periodically for changes and rebuilds and tests your project

automatically whenever it sees a change.

Learning Outcomes

The syllabus lists a number of learning outcomes for this course. This assignment is intended to support several of theses:

Write small to medium C programs having several separately-compiled modules.

Explain what happens during preprocessing, lexical analysis, parsing, code generation, code optimization, linking, and

execution. Explain the function and organization of relevant, intermediate formats, including pre-processor expanded source

code, object files and executables. Compare and contrast the build and execute behavior between C and Java.

Identify errors that can occur during various compilation phases, identify relevant error messages and warnings, and

appropriately correct these errors.

Correctly identify error messages and warnings from the preprocessor, compiler, and linker, and avoid them.

Find and eliminate runtime errors using a combination of logic, language understanding, trace printout, and gdb or a similar

command-line debugger.

Interpret and explain data types, conversions between data types, and the possibility of overflow and underflow.

Explain, inspect, and implement programs using structures such as enumerated types, unions, and constants and arithmetic,

logical, relational, assignment, and bitwise operators.

Trace and reason about variables and their scope in a single function, across multiple functions, and across multiple modules.

Use the C preprocessor to control tracing of programs, compilation for different systems, and write simple macros.

Write, debug, and modify programs using library utilities, including, but not limited to assert, the math library, the string

library, random number generation, variable number of parameters, standard I/O, and file I/O.

Use simple command-line tools to design, document, debug, and maintain their programs.

Use an automatic packaging tool, such as make or ant, to distribute and maintain software that has multiple compilation

units.

Use a version control tool, such as subversion (svn) or Git, to track changes and do parallel development of software.

Distinguish key elements of the syntax (what's legal), semantics (what does it do), and pragmatics (how is it used) of a

programming language.

Describe and demonstrate how to avoid the implications of common programming errors that lead to security vulnerabilities,

such as buffer overflows and injection attacks.


版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp