
ANN workshop using WEKA and second part of assignment

You may find this material useful if you're stuck or want more information about what you are

working on. Anything marked with * is particularly useful for your assignment, so it would be a good

idea to become familiar with it in your own time.

Download and install Weka from https://www.cs.waikato.ac.nz/ml/weka/ . Version 3.8 is the latest

stable version. You may be redirected to SourceForge for the download.

http://www.cs.waikato.ac.nz/~ml/weka/arff.html * - Information about the .arff format. This

contains additional information about .arff files that you may find useful in your own work.

http://en.wikipedia.org/wiki/Backpropagation - More about the backpropagation algorithm used by

the multi-layered perceptron model.

http://en.wikipedia.org/wiki/Delta_rule - Information about the delta rule used by the multi-layered

perceptron model for learning.

http://en.wikipedia.org/wiki/Machine_vision#Image_processing * - A few approaches to problems

similar to those you will be required to work on in your assignment.

Activity 1: Introduction to ANNs in WEKA

1. Open WEKA. On SCMS machines, you will find the software package under Start – All

Programs – Research Tools. Click Explorer when the Weka GUI appears. You should now see

this:


2. Click on ‘Open file…’ and load the “iris.arff” file available from AUT OnLine.[1] The easiest approach is to download the file first from AUT OnLine onto a memory stick, where you can then store the results of the workshop. You can use the ‘browse’ option under ‘Open file…’ to navigate to your memory stick. You should now see this:

3. Go to the Classify tab, click on ‘Choose’, go to the ‘functions’ folder, and choose the

MultilayerPerceptron function.

4. Next to the ‘Choose’ button, you should see MultilayerPerceptron, along with some letters

and numbers. Click to the right of those letters and numbers. You should see a window called the GenericObjectEditor with multiple drop-down selectors, some text fields, and two buttons (next screenshot). This is the settings page for the MultilayerPerceptron function. If you click on ‘More’, you will get an additional window of details and information about the various options and parameters you can set for the perceptron.

[1] Look under ‘Resources’ on AUT OnLine under the Nature Inspired Computing COMP701 pages.

5. Set the GUI selector to ‘true’, the ‘hiddenLayers’ field to “10” (without quotes), and the

nominalToBinaryFilter, normalizeAttributes and normalizeNumericClass selectors to ‘false’.

‘10’ here refers to the number of hidden nodes in the hidden layer, not the number of

layers. Lastly, set the trainingTime text field to 5000, and click OK (see next picture). You can

get more information on what each field of the perceptron GUI does by clicking on ‘More’.

6. You are now back to the Weka GUI. Select the ‘Use training set’ option, and click Start. You

should see a new window appear with some circles and lines drawn on it. This is a graphical

representation of your neural network before it starts learning.


Answer the following questions.

Q1. What colour are the input layer, hidden layer and output layer?

Q2. How many nodes are in each of the layers?

7. Click the Start button, and then observe the Error per Epoch label. Wait until all 5000 epochs

are complete, and note the final value of the Error per Epoch label. Then click Accept. Check

the confusion matrix at the bottom of the results window and the information immediately

above the matrix. An example is below.

Q3. For your run, how many instances have been correctly classified and which ones were

incorrectly classified?

Q4. What is the mean absolute error?

Q5. You can identify the overall accuracy of your ANN by looking at the weighted average figure

under the TP rate. What is it?
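The figures Q3–Q5 ask about all come from the confusion matrix. As a minimal sketch of how the counts relate (the matrix values below are made up for illustration, not taken from any particular run):

```python
# Overall accuracy = correctly classified instances (the diagonal of
# the confusion matrix) divided by all instances. The weighted-average
# TP rate Weka reports works out to this same number, because each
# class's TP rate is weighted by that class's share of the instances.
# The 3x3 matrix below is illustrative only.
confusion = [
    [50, 0, 0],   # actual Iris-setosa
    [0, 48, 2],   # actual Iris-versicolor
    [0, 1, 49],   # actual Iris-virginica
]

correct = sum(confusion[i][i] for i in range(len(confusion)))
total = sum(sum(row) for row in confusion)
accuracy = correct / total
print(f"{correct}/{total} correctly classified, accuracy = {accuracy:.3f}")
```

Here 147 of 150 instances sit on the diagonal, so the accuracy is 0.98; the three off-diagonal entries are the incorrectly classified instances of the kind Q3 asks you to find.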


Q6. Scroll to the top of the ‘classifier output’ window. Now scroll down again. What transfer

function is being used to calculate the output of neurons?
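The transfer function Q6 asks about can be written in a couple of lines. This is the standard logistic sigmoid, shown as a sketch rather than Weka's exact implementation:

```python
import math

# The logistic sigmoid squashes a node's weighted net input into the
# range (0, 1); Weka's classifier output labels such units "Sigmoid Node".
def sigmoid(net_input):
    return 1.0 / (1.0 + math.exp(-net_input))

print(sigmoid(0.0))    # -> 0.5, the midpoint of the curve
```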

8. Click on the MultilayerPerceptron window entry again. This time, change the number in the

‘hiddenLayers’ field to ‘0’. Press OK and then Start.

Q7: Looking at the new architecture, in what way is the neural network different?

9. Press Start on the ANN GUI. Wait until the ANN stops and press ‘Accept’. Now look at the

results window in Classifier output.

Q8: How many instances have been correctly classified and which ones were incorrectly classified?

Q9: What is the mean absolute error?

Q10: What is the overall accuracy?

Q11: Why is there a difference in accuracy in comparison to your previous run?

10. Your first workshop task is to adjust the number of hidden units until you get identical

accuracy figures to Q5.

Q12: What is the minimum number of hidden units to achieve comparable results to your answer

to Q5 above?

11. Go back to the MultilayerPerceptron window, click and open up the ANN GUI. This time,

change the learning rate to 0.0000001 and the momentum to 0.0000001. Leave everything

else alone (5000 epochs, your choice of hidden units). Click OK. Click Start in the Weka GUI

and Start in the ANN GUI.

Q13: What are the error per epoch and accuracy rates now? Why is there a difference between

these results and earlier results?

12. Repeat step 11 but this time change the learning rate and momentum to 0.9. Re-run the

ANN.

Q14: What are the error per epoch and accuracy rates now? Why is there a difference between

these results and earlier results?

13. Repeat step 11 but this time change the learning rate and momentum to 0.1. Re-run the

ANN.

Q14b: What are the error per epoch and accuracy rates now? What can you conclude about the

learning rate and momentum values?
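The effect steps 11–13 demonstrate can be sketched with the delta-rule update described in the links at the top of this document. This is a simplified single-weight model with a constant error signal (in a real run the error changes every epoch), so treat it as an illustration of step size only, not of Weka's training loop:

```python
# dw = learning_rate * error * input + momentum * previous_dw
# With lr = momentum = 1e-7 the weight barely moves in 5000 epochs;
# with lr = momentum = 0.9 the steps are very large. The constant
# error signal is a simplification for illustration.
def final_weight(lr, momentum, epochs, error=0.2, inp=1.0):
    w, prev_dw = 0.5, 0.0          # arbitrary starting weight
    for _ in range(epochs):
        dw = lr * error * inp + momentum * prev_dw
        w, prev_dw = w + dw, dw
    return w

tiny = final_weight(lr=1e-7, momentum=1e-7, epochs=5000)
big = final_weight(lr=0.9, momentum=0.9, epochs=5000)
print(f"lr=1e-7: weight moved by {tiny - 0.5:.6f}")
print(f"lr=0.9:  weight moved by {big - 0.5:.1f}")
```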

14. Separately, open the ‘iris.arff’ file in Notepad and examine its contents. Do not double-click

the file, otherwise Weka will fire up again. Your task is to understand the file contents,

especially the required file format. The critical parts are those lines that do not start with

‘%’, which means a comment in arff format. So you should concentrate on those parts of the


file starting with ‘@’. Use the first link at the top of this document to answer the following

questions.

Q15: What are the three compulsory parts of any arff file and what do they signify?

Q16: How many different attribute value types are there (e.g. ‘real’, etc)?

Q17: How are class values specified and where must class values appear in the file?

Q18: How are values within a ‘sample’ separated?

15. Your next task is to understand how a training-test regime can be created. You should still

have the Notepad iris file open. Create a new file. Go down to line 64 of the ‘iris.arff’ file and

copy and paste the Relation and Attribute headers to the new file (i.e. lines 64-72 inclusive).

16. Now, at random, cut and paste one example each of ‘Iris-setosa’, ‘Iris-versicolor’ and ‘Iris-virginica’ to your new file. It is important that you remove these three samples from the ‘iris.arff’ file. Make sure that, in your new file, each sample is on its own line.

17. Your new file should now contain copies of lines 64-72 of the ‘iris.arff’ file and only three

different iris samples, with their class values. Save your edited ‘iris.arff’ file as

‘iristraining.arff’ and the new file as ‘iristesting.arff’. (By default, Notepad appends ‘.txt’ to each file. To ensure that the suffix is ‘.arff’, enclose the training and testing file names in double quotes when Notepad asks for a file name.)
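The same split can be scripted instead of done by hand in Notepad. In this sketch, the small ARFF_TEXT string stands in for the real ‘iris.arff’ contents; in practice you would read the downloaded file instead:

```python
# Copy the @RELATION/@ATTRIBUTE/@DATA header to both files, hold out
# one sample per class for testing, and keep the rest for training.
# ARFF_TEXT is a tiny stand-in for the real iris.arff contents.
ARFF_TEXT = """\
% Comment lines start with '%' and are ignored.
@RELATION iris
@ATTRIBUTE sepallength REAL
@ATTRIBUTE sepalwidth REAL
@ATTRIBUTE petallength REAL
@ATTRIBUTE petalwidth REAL
@ATTRIBUTE class {Iris-setosa,Iris-versicolor,Iris-virginica}
@DATA
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
7.0,3.2,4.7,1.4,Iris-versicolor
6.4,3.2,4.5,1.5,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
"""

header, data, in_data = [], [], False
for line in ARFF_TEXT.splitlines():
    if not line.strip() or line.startswith("%"):
        continue                          # skip blanks and arff comments
    if in_data:
        data.append(line)
    else:
        header.append(line)
        if line.lower().startswith("@data"):
            in_data = True

# Hold out the first sample of each class for testing; the rest train.
test_set, train_set, seen = [], [], set()
for sample in data:
    cls = sample.split(",")[-1]
    (train_set if cls in seen else test_set).append(sample)
    seen.add(cls)

training_file = "\n".join(header + train_set) + "\n"
testing_file = "\n".join(header + test_set) + "\n"
print(f"{len(train_set)} training, {len(test_set)} test samples")
```

Writing `training_file` to ‘iristraining.arff’ and `testing_file` to ‘iristesting.arff’ gives the same pair of files as the manual procedure above.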

18. Keep Notepad open, however, in case you need to correct any editing errors you have

made: Weka will inform you when you load these files if they are correct or incorrect. If the

files are not formatted properly, you will get an error message when opening these files,

with a file line number where the error has occurred. If you do get an error, you can switch

to Notepad to identify and correct the error before opening the files again.

19. Return to Weka. Under ‘Preprocess’, open the ‘iristraining.arff’ file. Under ‘Classify’,

continue to use the MultilayerPerceptron. However, this time, check the ‘Supplied test set’ option and select your just-generated ‘iristesting.arff’ file.

20. Use whatever parameters you wish for the perceptron. It is recommended that you keep the

GUI (true), autobuild (true), hidden layers (‘a’ or a number you choose), the default learning

and momentum rates and training time as 5000. All other options should be false.

21. Click on Start (Neural Network GUI). After the perceptron finishes, click Accept and look at

the Classifier output window in Weka. Go to the top of the window and scroll down. At the

bottom you will see information on how the perceptron, after training on 147 samples, has

classified the three ‘unseen’ or test cases.

22. Note that this is how you will test your mutated characters for your assignment. That is,

each of your mutated character sets will need to be named as a ‘supplied test set’ after you

have trained your ANN on the original, unmutated characters as the ‘use train set’. When


you test your ANN on a supplied test set, the learning algorithm is switched off and the weights of your trained ANN are ‘clamped’ (i.e. they will no longer change when presented with samples of your test sets). So you are testing your mutated character sets using the weights obtained from your original, unmutated training set of characters.

Q19: What is the mean absolute error for your three test cases?

Q20: What was the classification accuracy on the three test cases?

23. Congratulations. If you have come this far, you are almost ready to start your ANN

assignment. But first, here are some more options for you to try.

24. Run a basic perceptron on the iris data using 10 hidden units, as described in paragraph 5

above. Use the training set only. Then run the perceptron with ‘normalizeAttributes’ set first to

False and then to True. Is there a difference in the resulting confusion matrices? You will

probably not use this feature for your assignment, but you should be aware that

normalization is available. When should you normalize the attributes, and why? Use the

internet to answer this question. You should set this option back to ‘False’ when you finish.

25. Open the ‘capitalletters.arff’ file in Weka. Also, open this file in Notepad and understand its

contents. You can use this file as a template for your assignment but you will need to

understand its contents.

26. Train a perceptron on this capital letters file. Use the ‘Use training set’ option (i.e. uncheck

the ‘Supplied test set’ option first). Experiment with this capital letters file and create some

test sets. If you can do this, you are now ready to start the second part of your assignment

(i.e. the ANN part of your two-part coursework). But here are two final questions.

Q21: What does ‘a’ mean in the hidden layers window of the perceptron GUI? (See

http://weka.sourceforge.net/doc.stable/weka/classifiers/functions/MultilayerPerceptron.

html, ‘setHiddenLayers’)

Q22: What do you notice about the thresholds of the sigmoid nodes in the classifier output

window? Are they really thresholds? What are they? See the discussion at

http://comments.gmane.org/gmane.comp.ai.weka/27885


NIC ANN assignment details 2020

The aim of this assignment is for you to undertake a neural network exercise in simplified CAPTCHA

recognition. You are recommended to use Weka for this part of the assignment but you can also use

Matlab if you wish. Matlab has some advanced features but is also difficult to use for new users.

Your report on your ANN experiments will form the second part of your one (single) report covering

both the GA assignment and the ANN assignment. Each is effectively worth 50%, together

making up 100% of your paper mark.

1. The first part of your assignment 2 is as follows. You will design, develop, train and test a

neural network that can recognise 10 correctly formatted visual patterns of your choice.

You will therefore need to generate 10 training samples (all as binary input matrices)

containing 1s and 0s that represent the pixels of the visual patterns you are attempting to

classify.

Once you have trained your ANN to recognise the 10 patterns, you will test the ability of

your ANN to still correctly identify the test patterns after you have mutated the patterns one

bit at a time. YOU MUST NOT RE-TRAIN YOUR ANN ON THE TEST PATTERNS. You must use

the same weights you obtained on the training set to test the mutated patterns without any

further changes to the weights.

This is the recommended procedure:

a. Generate 10 training samples. Don’t forget to add the desired output information.

Do not mutate any of these samples in the training set.

b. Design an ANN for training on these 10 patterns. You must use the ‘Use training set’

option here after reading the patterns into the neural network using the ‘Explorer’

tab. Run the ANN on your character training set. All training is finished at this stage.

c. Once you have trained the network, generate three test files from your training

patterns, as follows:

i. Test1 that contains the same 10 patterns as the training set, except that one

‘1’ bit somewhere in each input pattern is changed to ‘0’, or ‘0’ is changed to

‘1’.

ii. Test2 that contains the same 10 patterns as the training set, except that two

‘1’ bits somewhere in each input pattern are changed to ‘0’, or two ‘0’s are

changed to ‘1’.

iii. Test3 that contains the same 10 patterns as the training set, except that

three ‘1’ bits somewhere in each input pattern are changed to ‘0’, or three

‘0’s are changed to ‘1’.

d. Test your previously trained network first with Test1, then Test2 and finally Test3.

You must enter these files in the ‘Supplied test set’ option. Make a note of the


results (i.e. the accuracy and other measures of the test sets). DO NOT RE-TRAIN

YOUR ANN ON THE TEST SET EXAMPLES!

e. Write a report (maximum 5 pages) which includes details of how you generated your

training set, how you represented the patterns in a pixel matrix, how effective your

ANN was at learning the training patterns including details of all parameters used,

and how accurate your trained ANN was on the three test sets. In your conclusion

evaluate what you have done, including references to the ability of the network to

degrade gracefully in the face of noise.
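Step (c) of the recommended procedure can be sketched in code. This assumes each pattern is a flat list of 0/1 pixels with the class label carried separately; the pattern shown is the 5x5 ‘E’ used as a worked example later in this handout:

```python
import random

def mutate(pattern, n_flips, rng):
    """Copy `pattern` with n_flips distinct bits flipped (1->0 or 0->1)."""
    mutated = list(pattern)
    for i in rng.sample(range(len(mutated)), n_flips):
        mutated[i] = 1 - mutated[i]
    return mutated

rng = random.Random(42)            # fixed seed so runs are repeatable
e_pattern = [0,1,1,1,0,
             0,1,0,0,0,
             0,1,1,1,0,
             0,1,0,0,0,
             0,1,1,1,0]            # 5x5 'E', flattened row by row

# Test1/Test2/Test3: the same pattern with 1, 2 and 3 bits flipped.
test_sets = {n: mutate(e_pattern, n, rng) for n in (1, 2, 3)}
for n, mutated in test_sets.items():
    flipped = sum(a != b for a, b in zip(e_pattern, mutated))
    print(f"Test{n}: {flipped} bit(s) flipped")
```

In practice you would apply `mutate` to each of your 10 training patterns and write the results, with their unchanged class labels, into the Test1, Test2 and Test3 .arff files.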

2. Consider one or more of the following variations:

a. Amend the ANN architecture so that the network returns good results despite the

increasing severity of changes to the test patterns;

b. Identify at what point the ANN fails to recognise characters, no matter what you do

to try to improve the architecture.

c. Instead of only removing or only adding bits in a given test pattern, both remove

and add bits at random within the same pattern.

d. You may wish to compare your ANN results with another of the methods under

Functions in Weka.

3. If you are interested in using a different language, read

http://panl10n.net/english/Outputs%20Phase%202/CCs/Laos/Papers/2008/LaoOCRTechnicalReport.pdf

You will find an example .arff file containing all 26 English capital letters in the Resources pages

of Nature Inspired Computing COMP701 on AUT OnLine. These are meant to guide your

assignment. If, however, you wish to use this arff file for your own assignment, please be aware

that you may not get the best marks for that part of the marksheet dealing with ‘Generating

training and test sets’. See the final page of this handout for the marksheet.

Further information now follows. Please also look at the slides (with audio) that provide further

information on assignment 2 in the Assessment folder of Blackboard.


Further information

You have two challenges.

Data representation

The first challenge will be for you to find and represent 10 visual patterns. You need to decide on a

suitably sized pixel matrix to be able to represent your 10 symbols, characters, etc. For instance,

imagine you use a 5x5 matrix for representing uppercase English letters. Then ‘A’ could be represented as:

0,1,1,1,0
1,0,0,0,1
1,1,1,1,1
1,0,0,0,1
1,0,0,0,1

‘E’ could be represented by:

0,1,1,1,0
0,1,0,0,0
0,1,1,1,0
0,1,0,0,0
0,1,1,1,0

A 5x5 matrix is obviously limited in its representational capability. You will probably need to use at

least a 5x7, 7x7 or even 10x10 matrix. The number of input units and the number of attributes will

always equal the size of your pixel matrix. That is, a 5x7 matrix will give you 35 input

nodes/attributes.

For Weka, you will need to declare all attributes. One way to represent these characters is to declare

attributes by their row and column number starting from the top left hand corner. So, for ‘E’ above:

Position11 = 0 (i.e. row 1, column 1; top left hand corner)

Position12 = 1 (i.e. row 1, column 2; second column along the top)

where the x and y in ‘Positionxy’ refer to the row and column numbers, respectively, of the pixel

matrix (starting from the top left hand corner and working across and down). Notice how a ‘1’

means that that particular position in the matrix is occupied by part of the symbol/letter and a ‘0’

means blank.

You will need to identify a suitably large matrix for increasingly sophisticated patterns. Once you

have done that, you can then declare all the positions one by one using Weka declaration format:

@ATTRIBUTE Position11 {0,1}

@ATTRIBUTE Position12 {0,1}

…

@ATTRIBUTE Position55 {0,1}

for as many attributes as you have pixel positions. The data will then follow the @DATA declaration as a series of 1s and 0s for each sample. For instance, for ‘E’:

0,1,1,1,0,0,1,0,0,0,0,1,1,1,0,0,1,0,0,0,0,1,1,1,0 followed by the desired output value ‘E’ for this sample.

Please look at the ‘iris.arff’ file for another example of how data is stored and presented to Weka.
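The flattening described above can be checked with a few lines; the 5x5 ‘E’ grid here reproduces exactly the data row shown above:

```python
# The 5x5 pixel matrix for 'E', flattened row by row from the top-left
# corner into the Position11, Position12, ... attribute order, with the
# desired output value 'E' appended.
E = [
    [0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 1, 0],
]

data_row = ",".join(str(bit) for row in E for bit in row) + ",E"
print(data_row)
# -> 0,1,1,1,0,0,1,0,0,0,0,1,1,1,0,0,1,0,0,0,0,1,1,1,0,E
```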

Representing the target value

The second challenge is how to represent the desired target value for each sample. If you are

recognizing letters, you can attach, for example, ‘E’ as the desired value at the end of the sample

above. You will need to declare

@ATTRIBUTE class {A,B,C,D,E,F,G,H,I,J}


as part of the declarations in your Weka file (if you are trying to learn the letters A-J), or whatever

class values you desire.

If you are trying to recognise other images, such as road signs, you can specify targets such as “stop” and “give_way”. So your target values do not have to be single characters. But whatever your target values, you will need to declare them all after an @ATTRIBUTE label. You can call

your target attribute whatever you like. In the sample above, ‘class’ was used as the label, but you

may wish to use, say, “street_sign” as the attribute label.
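Writing out every Position declaration by hand is tedious, so a short generator can do it. The names below (relation ‘captcha’, attribute label ‘class’, labels A–J) follow the examples in this handout and are placeholders for your own choices:

```python
# Generate the @RELATION/@ATTRIBUTE/@DATA header for an
# n_rows x n_cols pixel matrix plus a nominal class attribute.
# Note: the PositionRC naming is unambiguous only up to 9 rows and
# 9 columns; for larger matrices, separate row and column with an
# underscore (e.g. Position10_3).
def make_arff_header(n_rows, n_cols, labels, relation="captcha"):
    lines = [f"@RELATION {relation}"]
    for r in range(1, n_rows + 1):
        for c in range(1, n_cols + 1):
            lines.append(f"@ATTRIBUTE Position{r}{c} {{0,1}}")
    lines.append(f"@ATTRIBUTE class {{{','.join(labels)}}}")
    lines.append("@DATA")
    return "\n".join(lines)

header = make_arff_header(5, 5, list("ABCDEFGHIJ"))
print(header.splitlines()[1])   # -> @ATTRIBUTE Position11 {0,1}
```

The data rows for each pattern then follow the @DATA line, one sample per line, in the same attribute order.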

Presentation of report

You will be provided with example ANN reports to help with your presentation of experimental

results. As with the first assignment, you must keep to a maximum page limit of 5 pages (including

all references, tables and figures), using IEEE format.


Marksheet for NIC Coursework – Part 2 (ANNs)

1. Background understanding (20%)

2. Generating training and test sets (20%)

3. ANN design, architecture and parameters (20%)

4. Experiments and Results (20%)

5. Discussion, Conclusion, Further Work (20%)

Marker’s summary of Part 2 (out of 100; your mark will be divided by 2

to reduce it to a mark out of 50):

