
10/19/2020 Program 1

https://www.ics.uci.edu/~pattis/ICS-33/assignments/program1/program.html 10/14

Problem #4: Non-Deterministic FA

Write the required functions and script that solve, for a Non-Deterministic Finite Automaton (NDFA), the same problem that was solved for a Deterministic Finite Automaton in Problem #3 (above). Read about the differences between these two automata (below). Hint: adapt your code for the FA problem to solve the more general NDFA problem.

A non-deterministic finite automaton (NDFA) is a machine described by its states and its transitions: each transition for a state specifies an input and a set of states (more than one is allowed) that input can lead to; sets with more than one state are what make it non-deterministic. We can illustrate an NDFA as a graph with state labels in circles and edge labels for transitions (see below). The critical difference between an FA and an NDFA is that an NDFA can have multiple edges with the same label going to different states (we'll see how to represent and simulate such transitions below).

Input and Output:

Read a file that describes an NDFA: each line contains a state and an arbitrary number of input->state transitions. Build a dictionary such that each key is a str state and whose associated value is another dictionary specifying all the transitions from that state: this second dictionary has keys that are str inputs and associated values that are sets of str states: all the states a particular input can lead to. The first token on each line is the str state and the remaining tokens (always coming in pairs) are str inputs and states. All tokens (which can comprise any number of characters) are separated by one semicolon character. We annotate this dictionary as {str:{str:{str}}}.

For example, the input file ndfaendin01.txt contains the following lines (which could appear in this order, or any other, and still specify the same NDFA):

start;0;start;1;start;0;near
near;1;end
end

Here is a picture of the endin01 NDFA. It graphically illustrates the three states (start, near, and end) and their transitions, using inputs (0 and 1).

Here, the state start is a key in the main dictionary. Its value is a dictionary with two key/value pairs: 0 mapping to the set containing start and near, and 1 mapping to the set containing just start. It means that in the start state, if the input is a 0 the NDFA can stay in the start state or it can go to the near state; if the input is a 1 the NDFA must stay in the start state. Similarly, the next line means that in the near state, if the input is a 1 the NDFA must go into the end state. The last line means that the end state has no transitions out of it.
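In code, the dictionary built from ndfaendin01.txt would look like the following (a sketch of the expected structure, not required code):

```python
# The {str: {str: {str}}} dictionary for the endin01 NDFA: each state maps
# to a dict of inputs, and each input maps to the set of states it can lead to.
endin01 = {
    'start': {'0': {'start', 'near'}, '1': {'start'}},
    'near':  {'1': {'end'}},
    'end':   {},                 # no transitions out of the end state
}
```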

Print the NDFA, one state (and its transitions) per line; the states are printed alphabetically and the transition dictionary for each state is printed as a list of input/set of state items (2-tuples) such that these are printed alphabetically by the inputs, and the set of states for each input is printed as an alphabetically sorted list (e.g., near comes before start). Note that the state end is a key in the main dictionary, whose associated transitions are an empty dictionary.

For example, the file above would produce:

Non-Deterministic Finite Automaton Dump: state: list of transitions
end transitions: []
near transitions: [('1', ['end'])]
start transitions: [('0', ['near', 'start']), ('1', ['start'])]

Note that there are multiple data files for this program: ndfaendin01.txt, ndfatrain.txt, and ndfare.txt; test/debug your program on the first file; when you are done, test it on the last file. Draw the NDFA represented by each file to ensure that your code correctly prints and computes with it.

Next, repeatedly read and process lines from a second input file, computing the results of the non-deterministic finite automaton on the specified start-state with its inputs; then print out the results in a special form. Each line in the file contains a start-state followed by a sequence of inputs (all separated by semicolons). The start-state will be a state in the NDFA (it is a key in the outer dictionary); the inputs may specify legal or illegal transitions (may or may not be keys in some inner dictionary).


For example, the input file ndfainputendin01.txt contains the following two lines:

start;1;0;1;1;0;1
start;1;0;1;1;0;0

The first line means the start-state is start and the inputs are 1, 0, 1, 1, 0, and 1.

The result of processing each line is to print the start-state, and then each input and the new states (plural) it could transition to (the could is what makes it non-deterministic), and finally print the stop-states. For the ndfaendin01 NDFA and the first line in this file, it should print

Start state = start
Input = 1; new possible states = ['start']
Input = 0; new possible states = ['near', 'start']
Input = 1; new possible states = ['end', 'start']
Input = 1; new possible states = ['start']
Input = 0; new possible states = ['near', 'start']
Input = 1; new possible states = ['end', 'start']
Stop state(s) = ['end', 'start']

Note that the set of states it might be in is printed as an alphabetized list. Also note especially that in the start state, if the input is a 0, then the NDFA can either remain in the start state or go into the near state. For this program, we keep track of all states that the NDFA can be in, using a set of new possible states. For the next input, 1, we can be either in the start state (from the start state; an input of 1 allows us to stay in the start state) or the end state (from the near state; an input of 1 allows us to transition to the end state). Thus, we keep track of the set of states the NDFA can be in, and the new set of states the NDFA can be in after processing the next input. In this example, because 'end' is included in the stop-states, this input does end in 01.

For any state that does not have a transition specifying the current input, ignore that input for that state. For example, if near is one of the possible states and 0 is the input, ignore the 0 for the near state.

Functions and Script:

Write the following functions and script. I am providing line counts for these function bodies not as requirements, but to indicate the lengths of well-written Pythonic code.

read_ndfa has an open (file) parameter; it returns the dictionary representing the non-deterministic finite automaton. Hint: I used slicing and the zip function to build the inner dictionaries, and I called the setdefault function on the inner dict; alternatively I could have built it with defaultdicts from the standard collections module (body is 9 lines).

ndfa_as_str has a dictionary parameter (representing the NDFA); it returns a multi-line string (each line is ended by '\n'), which when printed shows the contents of the NDFA in the appropriate textual form: sorted alphabetically by state, with a state's transitions sorted by their input values, and sorted by states if an input results in multiple states (body is 4 lines; can you do it in 1?).
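A minimal sketch of ndfa_as_str, assuming the line format shown in the dump above:

```python
def ndfa_as_str(ndfa):
    """Return one line per state, alphabetically, with each state's
    transitions shown as sorted (input, sorted-list-of-states) 2-tuples."""
    return ''.join(
        f'{state} transitions: '
        f'{sorted((inp, sorted(states)) for inp, states in trans.items())}\n'
        for state, trans in sorted(ndfa.items()))
```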

process has a dictionary parameter (representing the NDFA), a str parameter (representing the start-state), and a list parameter (representing a list of str inputs); it returns a list that contains the start-state followed by tuples that show the input and resulting set of states after each transition. For the example shown above, process returns the following list.

['start', ('1', {'start'}), ('0', {'near', 'start'}), ('1', {'end', 'start'}), ('1', {'start'}), ('0', {'near', 'start'}), ('1', {'end', 'start'})]

Finally, remember that if an input is illegal for the current state (is not a key in some transition for the current state), just ignore it. But if the input leads to no possible states (the empty set of states), terminate processing there (body is 12 lines).
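The state-tracking described above can be sketched as follows (one reasonable reading of the spec; in particular, this version stops before recording an empty set of states):

```python
def process(ndfa, start, inputs):
    """Return [start, (input, set-of-states), ...], tracking every state the
    NDFA could be in after each input."""
    result = [start]
    current = {start}
    for inp in inputs:
        # union the targets from every possible current state; a state with
        # no transition for this input is simply ignored
        new = set()
        for state in current:
            new |= ndfa[state].get(inp, set())
        if not new:            # no possible states at all: terminate here
            break
        result.append((inp, new))
        current = new
    return result
```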

interpret has a list parameter (the list result produced by process); it returns a multi-line string (each line is ended by '\n'), which when printed illustrates the results of processing an NDFA on an input in the appropriate textual form. Note that in this output the sets computed in process appear as lists sorted alphabetically by state. See how it prints the example list argument shown above in the Sample Interaction below (body is 5 lines).
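A sketch of interpret matching the trace lines shown in the Sample Interaction (it assumes the list contains at least one transition tuple after the start-state):

```python
def interpret(result):
    """Turn the list produced by process into the multi-line trace string;
    each set of states is shown as an alphabetically sorted list."""
    lines = [f'Start state = {result[0]}\n']
    for inp, states in result[1:]:
        lines.append(f'Input = {inp}; new possible states = {sorted(states)}\n')
    lines.append(f'Stop state(s) = {sorted(result[-1][1])}\n')
    return ''.join(lines)
```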

Write a script at the bottom of this module (in if __name__ == '__main__':) that prompts the user to enter the file describing the NDFA, prints it, prompts the user to enter the file containing lines of start-states and inputs, and simulates the NDFA on each line, printing the results in the appropriate textual form (body is 7 lines).

Sample Interaction:

The program, as specified, will have the following interaction: user-typed information appears in italics. Your output should "match" this one.

Provide the file name storing a Non-Deterministic Finite Automaton: ndfaendin01.txt
Non-Deterministic Finite Automaton Dump: state: list of transitions
end transitions: []
near transitions: [('1', ['end'])]
start transitions: [('0', ['near', 'start']), ('1', ['start'])]
Provide the file name storing a sequence of start-states and their subsequent inputs: ndfainputendin01.txt


Tracing NDFA (from a start-state)
Start state = start
Input = 1; new possible states = ['start']
Input = 0; new possible states = ['near', 'start']
Input = 1; new possible states = ['end', 'start']
Input = 1; new possible states = ['start']
Input = 0; new possible states = ['near', 'start']
Input = 1; new possible states = ['end', 'start']
Stop state(s) = ['end', 'start']

Tracing NDFA (from a start-state)
Start state = start
Input = 1; new possible states = ['start']
Input = 0; new possible states = ['near', 'start']
Input = 1; new possible states = ['end', 'start']
Input = 1; new possible states = ['start']
Input = 0; new possible states = ['near', 'start']
Input = 0; new possible states = ['near', 'start']
Stop state(s) = ['near', 'start']

In Week #2 of this course we will cover EBNF and regular expressions, which relate to the files below. You can run these files on your code to ensure they produce the correct results.

The ndfatrain.txt file is a non-deterministic finite automaton that determines whether or not a train (a sequence of characters representing different kinds of cars) is a legal train according to Chapter Exercise #7 in the EBNF lecture. Its input file is ndfainputtrain.txt, which starts with a legal train (one that ends with the state done as one possible state) followed by an illegal train (one that does not end with the state done as one possible state).

The ndfare.txt file is a non-deterministic finite automaton translation of the regular expression ((a*|b)cd)+. Its input file is ndfainputre.txt, which starts with a match (one that ends with the state last as one possible state) followed by a non-match (one that does not end with the state last as one possible state).

Problem #5: Synonym Finder

Problem Summary:

Write the required functions and script that prompts the user to enter the names of any number of text files; reads these text files (creating one special semantic dictionary); optionally prints the dictionary in a special form; prompts the user to enter the name of a file containing a series of synonym problems (given a word, which of a series of words is its best synonym); then uses the semantic dictionary to try to solve each synonym problem, printing the results of each problem in a special form, and finally printing the percentage of problems that it solved correctly.

The program will read the text file(s) to learn which words are similar to which others: each word in the semantic dictionary is associated with a context dictionary storing the words that appear in the same sentences (and how often they appear in the same sentences). The program will use a metric function to determine how similar any two words are by comparing their context dictionaries. By this process, the program can determine whether two words are more or less likely to be synonyms.

Input and Output:

After repeatedly prompting for the text file names (ensuring each one can be read: try to open it and handle any exceptions), read each file, a sentence at a time, building a semantic dictionary of the form {str:{str:int}}. The outer keys of this dictionary are all the words in the text files; each word is associated with an inner context dictionary whose keys are all the other words that appear in sentences with the outer word; each inner key is associated with the number of times it appears in sentences with the outer key. All words in the sentences should be converted to lower-case letters and all punctuation should be removed; certain common English function words should also be removed as "noise words".

An easy way to read the files, one sentence at a time, is to iterate over the result returned by calling the helper function sentence_at_a_time (which I have supplied at the top of synonym.py). When passed an open file to read from and a set of words to ignore, this function returns an object that we can iterate over (one sentence at a time) using a for loop. For example, if a file named trivial.txt contained the text

I went to the gym this
morning. Later in the
morning I rested; I was tired!

then executing the code

for s in sentence_at_a_time(open('trivial.txt'), {'in', 'the', 'to'}):
    print(s)

would print

['i', 'went', 'gym', 'this', 'morning']
['later', 'morning', 'i', 'rested']
['i', 'was', 'tired']

Note that sentence_at_a_time will correctly return full sentences (each as a list of lower-case words), whether they appear on one or multiple lines. For now, just assume that this function works correctly (you can certainly experiment with it on small text files); later in the quarter we will be able to understand its body: when we learn about regular expressions (in week 2) and implementing iterators with generator functions (in week 4).

Print all the associations in the semantic dictionary, one per line in standard lexical order (with the inner dictionaries also printed in standard lexical order); after printing all associations, print the length of the smallest and largest inner dictionaries, using the following form. For example, the file trivial.txt shown above (ignoring the words in, the, and to) would print as:

Semantic Dictionary
context for gym = i@1, morning@1, this@1, went@1
context for i = gym@1, later@1, morning@2, rested@1, this@1, tired@1, was@1, went@1
context for later = i@1, morning@1, rested@1
context for morning = gym@1, i@2, later@1, rested@1, this@1, went@1
context for rested = i@1, later@1, morning@1
context for this = gym@1, i@1, morning@1, went@1
context for tired = i@1, was@1
context for was = i@1, tired@1
context for went = gym@1, i@1, morning@1, this@1
min/max context lengths = 2/8

Most of the words in these sentences are unique, but the words i and morning appeared in two sentences, so they have the largest inner dictionaries: morning appears in 2 sentences with i, so i also appears in 2 sentences with morning. Also note that the context for any word will not contain that same word.

Read a file that contains a series of synonym problems (one per line), with each line in the following form:

- A word
- Any number of other words separated by spaces: the choices for its possible synonyms
- The correct choice of the synonym

For example, one line in this kind of input file might look like

draw walk eat paint paint

indicating the problem is to find the synonym of draw from the choices walk, eat, and paint, where the correct answer is paint.

For each line in the file of synonym problems, it will print either that (a) it found the correct synonym; (b) it failed to find the correct synonym, because it chose incorrectly; or (c) it failed to find the correct synonym because the metric function raised an exception on one of the possible synonyms.

In these cases it will display a message like one of the following (note that here the strings appear in quotes; Hint: what is the difference between the str and repr functions? Which does print use by default?):

Correct: 'picture' is most like 'painting' from ['painting', 'chair']
Metric failure: could not choose synonym for 'earnest' from ['serious', 'amusing']

Functions and Script:

Write the following functions and script. I am providing line counts for these function bodies not as requirements, but to indicate the lengths of well-written Pythonic code.

build_semantic_dictionary has a list of open files parameter (the training file(s)) and an open file parameter (the words-to-ignore file); it returns a semantic dictionary ({str:{str:int}}), in which each unignored word is associated with a context dictionary whose keys are all the unignored words that appear in the same sentences with it, and whose associated value is the number of times each key appears in the same sentences as the word: in the sentence I really really like you., the context dictionary for each word in this sentence will count really twice (except the context dictionary for the word really doesn't contain that word). Hints: process each training file by reading it (using sentence_at_a_time, which is passed a set of all the words to be ignored: remember to rstrip them when reading the file); process each word in a sentence by incrementing the count in its context dictionary of every other word in the sentence. Recall that a word that is a key in the semantic dictionary should not be a key in its own context dictionary (body is 9 lines containing four nested loops).

dict_as_str has a dictionary parameter (representing the semantic dictionary); it returns a multi-line string (each line is ended by '\n'), which when printed shows the contents of the semantic dictionary followed by the min/max context dictionary lengths in the appropriate textual form shown above (body is 5 lines).

cosine_metric has two context dictionaries as parameters; it returns a float indicating how close the context dictionaries are (a higher number is better). Here is the formula for computing the cosine_metric (the standard cosine-similarity formula): sum the products cd1[k]*cd2[k] over all words k, then divide by sqrt(sum of cd1[k]^2) * sqrt(sum of cd2[k]^2). The formula will always have a value between 0 and 1. Note that if either context dictionary is empty, the denominator will be 0 and Python will raise a ZeroDivisionError exception, which this function should propagate, not handle with a try/except.

For example, if cd1 = {'a':1, 'b':2, 'c':3} and cd2 = {'a':5, 'c':7, 'd':8}, then we would calculate the cosine_metric as (1*5 + 2*0 + 3*7) / (sqrt(1**2 + 2**2 + 3**2) * sqrt(5**2 + 7**2 + 8**2)) = 26/sqrt(14*138), which is approximately 0.59.

Hint: recall the get method on dictionaries can supply a default value to return if the key is not in the dictionary.
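Assuming the pictured formula is the standard cosine similarity (which matches the 0-to-1 range and the empty-dictionary ZeroDivisionError described above), the function can be sketched as:

```python
from math import sqrt

def cosine_metric(cd1, cd2):
    """Cosine similarity of two context dictionaries, treating each as a
    sparse vector of word counts; missing words default to 0 via dict.get."""
    numerator = sum(count * cd2.get(word, 0) for word, count in cd1.items())
    denominator = (sqrt(sum(c * c for c in cd1.values())) *
                   sqrt(sum(c * c for c in cd2.values())))
    return numerator / denominator  # ZeroDivisionError propagates if a dict is empty
```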

most_similar has a str parameter (the target word, to find the synonym of), a list of str parameter (the synonym candidates), a semantic dictionary parameter (of the context dictionaries for words), and a metric function parameter (of which cosine_metric is one example); it returns the synonym candidate whose metric is largest when compared with the target word. This function should raise the ZeroDivisionError exception if any word has no associated context dictionary in the semantic dictionary.
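One way to sketch most_similar: substituting an empty context dictionary for any word missing from the semantic dictionary makes the metric's denominator 0, so the required ZeroDivisionError arises naturally (a design assumption, not the mandated approach):

```python
def most_similar(word, choices, sd, metric):
    """Return the choice whose metric against the target word is largest.
    A word missing from sd gets an empty context dict, so a cosine-style
    metric raises ZeroDivisionError, which propagates to the caller."""
    return max(choices, key=lambda c: metric(sd.get(word, {}), sd.get(c, {})))
```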

similarity_test has an open file parameter (the file of synonym problems), a semantic dictionary parameter, and a metric function parameter; it returns a descriptive str showing the results of all its attempts to solve synonym problems (whose last line is the percentage of problems it solved correctly). See the three types of results illustrated in the section above (body is 16 lines).

Write a script at the bottom of this module (in if __name__ == '__main__':) that prompts the user for (a) the names of the training files (the default answer should be no-more, and unopenable files should be rejected), then builds the semantic dictionary from these files; (b) whether to print the semantic dictionary; (c) the name of the synonym problem file (rejecting any unopenable file; here use goody.safe_open), then attempts to solve all these problems and prints all the results.

Sample Interaction:

The program, as specified, will have the following interaction: user-typed information appears in italics. Your output should match the form of this one. It took my code about 5 seconds to execute build_semantic_dictionary.

Provide the name of a text file for training (or type no-more)[no-more]: bible.txt
Provide the name of a text file for training (or type no-more)[no-more]: archie_comics.txt
file named archie_comics.txt rejected: cannot be opened
Provide the name of a text file for training (or type no-more)[no-more]: war_and_peace.txt
Provide the name of a text file for training (or type no-more)[no-more]:
Dump (True or False) Semantic dictionary?[False]:
Enter name of problem file[synonym_problems.txt]: simple_problems.txt

Correct: 'earnest' is most like 'serious' from ['serious', 'amusing']
Incorrect: 'picture' is most like 'painting', not 'table' from ['house', 'painting', 'table', 'chair']
Incorrect: 'vexed' is most like 'annoyed', not 'amused' from ['amused', 'annoyed']
Correct: 'watch' is most like 'see' from ['hear', 'see', 'smell']
Metric failure: could not choose synonym for 'thief' from ['banker', 'robber', 'postman']
40.0% correct

It took my code about 8 seconds to execute the batch self-check file for this assignment. The "slow" lines were 13 (a method you write) and 23 (loading a huge file that I have precomputed for you, used in many later tests).