Erick Purwanto and Teng Ma – July 2023

CPT111 2223 Resit-CW Task Sheet

Overview

Resit Coursework (Resit-CW) is the final coursework component of the course for resit students. It contributes 100% of your final mark.

You will use the object-oriented techniques, file processing, and data structures you have learned throughout the semester to solve a problem that consists of two main tasks. In addition, you will create a video presentation to showcase your problem-solving knowledge and algorithm analysis skills, mainly involving string processing. You need to complete the Java Code, the Ethic Quiz, and the MP4 and PPT presentation.

Timeline

Resit 1st Week, July 11, 2023
    Resit-CW package is released, containing this task sheet, skeleton codes, and partial test cases.

Resit 2nd Week, July 21, 2023
    Resit-CW Java Code Online Quiz and Ethic Online Quiz are open at 18:00 CST and closed at 23:59 CST.

Resit 2nd Week, July 17, 2023 to July 21, 2023
    Video MP4 Dropbox and PPT Dropbox are open on July 17, 2023, 14:00 CST and closed on July 21, 2023, 23:59 CST.

Late Submission Period
    5% lateness penalty per day, only for Video and PPT.
    No lateness allowed for Code / Ethic Quiz.

July 28, 2023, 23:59 CST
    End of Late Submission Period.
    No submissions are accepted thereafter.

University Lateness Policy

Late submission of the Video MP4 and PPT is allowed, with a penalty, for a maximum of 5 days. There will be no late Code or Ethic Quiz submissions since feedback is given by the quiz. This is consistent with the University lateness policy of not having a late submission period for assessments with feedback.

Outline

The rest of the task sheet will describe the background of the problem, detailed

specification of the two main tasks, and the deliverables you have to submit.


Resit-CW – DNA for Profiling and Disease Detection

Background

DNA carries the genetic information in living beings. Interestingly, it has been used in the criminal justice system for profiling work, as well as for disease diagnosis in medicine. In this resit coursework, your task is to develop algorithms for those two purposes.

DNA

Deoxyribonucleic acid (DNA) is a sequence of molecules called nucleotides, arranged

into a double helix shape. Each nucleotide of DNA contains one of four different

bases: Adenine (A), Cytosine (C), Guanine (G), or Thymine (T).

Every human cell has billions of these nucleotides arranged in sequence. Some portions of this sequence are the same, or very similar, across almost all humans. However, some portions of the sequence have a higher genetic diversity and thus vary more across the population.

Short Tandem Repeats (STRs)

One place where DNA tends to have high genetic diversity is in Short Tandem

Repeats (STRs). An STR is a short sequence of DNA bases that is repeated

continuously numerous times at specific locations in DNA. The number of times any

particular STR repeats varies a lot among different people.


In the DNA samples below, for example, Alice has the STR AAGT repeated back-to-back three times in her DNA, while Bob has the same STR repeated back-to-back four times.

DNA Profiling and Database

DNA profiling is a procedure used to identify individuals on the basis of their unique genetic makeup. Recording the STR counts of the population in a DNA database, and searching that database first, can help speed up the identification process.

Using multiple STRs improves the accuracy of DNA profiling. If the probability that two people have the same count for a single STR is 5% and we look at 10 different STRs, then the probability that two DNA samples match solely by chance (assuming independence of all STRs) is 0.05^10, or about 1 in 10 trillion. So, if two DNA samples match in the number of continuous repeats for each of the STRs, we can be confident that they came from the same person.

Let us have a very simple DNA database in the form of a CSV file. Each row

corresponds to an individual, and each column corresponds to a particular STR.

For example, database.csv contains:

name,AAGT,ACTC,TATG
Alice,22,35,18
Bob,16,20,18


The data in the above CSV file would suggest that Alice has the sequence AAGT

repeated 22 times consecutively somewhere in her DNA, the sequence ACTC

repeated 35 times, and TATG repeated 18 times. Bob, meanwhile, has those same

three STRs repeated 16 times, 20 times, and 18 times, respectively.
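The CSV layout described above could be loaded roughly as follows. This is only a sketch under stated assumptions: the class name CsvDatabaseSketch and its fields are hypothetical and not part of the required API, and the rows are passed in as a list of lines rather than read from a file so that the parsing step can be shown on its own.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of the required public API.
class CsvDatabaseSketch {
    // STR names taken from the header row, in column order.
    final ArrayList<String> strNames = new ArrayList<>();
    // Each individual's name mapped to their STR counts, in the same column order.
    final TreeMap<String, ArrayList<Integer>> counts = new TreeMap<>();

    CsvDatabaseSketch(List<String> lines) {
        // Header row, e.g. "name,AAGT,ACTC,TATG"; the first column is skipped.
        String[] header = lines.get(0).trim().split(",");
        for (int i = 1; i < header.length; i++) {
            strNames.add(header[i].trim());
        }
        // Every later row: a name followed by one count per STR.
        for (int r = 1; r < lines.size(); r++) {
            String line = lines.get(r).trim();
            if (line.isEmpty()) {
                continue; // assumption: blank lines are tolerated and ignored
            }
            String[] cells = line.split(",");
            ArrayList<Integer> row = new ArrayList<>();
            for (int i = 1; i < cells.length; i++) {
                row.add(Integer.parseInt(cells[i].trim()));
            }
            counts.put(cells[0].trim(), row);
        }
    }

    public static void main(String[] args) {
        CsvDatabaseSketch db = new CsvDatabaseSketch(Arrays.asList(
                "name,AAGT,ACTC,TATG", "Alice,22,35,18", "Bob,16,20,18"));
        System.out.println(db.strNames);            // [AAGT, ACTC, TATG]
        System.out.println(db.counts.get("Alice")); // [22, 35, 18]
    }
}
```

In an actual constructor the lines would come from the named file, for example through java.nio.file.Files.readAllLines. Keeping the header in a separate list preserves the column order, which a TreeMap keyed by STR name would not.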

Next, a DNA sequence is queried against the database. Given that sequence of DNA,

how can one identify to whom it belongs? Well, for example, one may first search for

the longest length of consecutive repeats of AAGT in the sequence, followed

similarly by ACTC and TATG. If one then found that the longest sequence of AAGTs is

22 repeats long, ACTCs is 35 repeats long, and TATGs is 18; one may as a result

conclude that the DNA was Alice's. Finally, it's also possible that after one takes the


counts for each of the STRs, it doesn't match anyone in the DNA database, in which

case one reports no match.
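The search for the longest run of consecutive repeats described above might be sketched as follows. The class and method names (StrRunSketch, longestRun) are hypothetical, and the approach is a straightforward scan from every starting position, chosen for clarity rather than speed.

```java
// Hypothetical helper for illustration; not part of the required public API.
class StrRunSketch {
    // Returns the largest number of back-to-back copies of str found anywhere in dna.
    static int longestRun(String dna, String str) {
        if (str.isEmpty()) {
            throw new IllegalArgumentException("STR must not be empty");
        }
        int best = 0;
        for (int start = 0; start < dna.length(); start++) {
            int run = 0;
            int pos = start;
            // Count copies of str that follow one another directly from this start.
            while (dna.startsWith(str, pos)) {
                run++;
                pos += str.length();
            }
            best = Math.max(best, run);
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(longestRun("TTAAGTAAGTAAGTCC", "AAGT")); // 3
        System.out.println(longestRun("AAGTCAAGTAAGT", "AAGT"));    // 2
    }
}
```

In the second call the first AAGT stands alone, so the answer is the later run of two. Counting every occurrence instead of the longest unbroken run would give 3 there, which is not what the profile comparison needs.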

One of your tasks is to write a program that first takes a CSV file containing STR counts for a list of individuals, builds a DNA database of your own, takes another TXT file that contains a DNA sequence, and then outputs to whom the DNA belongs or reports no match.

Huntington's Disease Diagnosis

Huntington’s disease (HD) is an inherited and terminal neurological disorder. It is a

condition that stops parts of the brain working properly over time, and is usually

fatal after a period of up to 20 years.

At this time, there is no cure for HD. However, in 1993, a group of scientists discovered a very accurate genetic test for diagnosing HD. The gene that causes HD is located on Chromosome 4 and contains consecutive repeats of CAG. The normal range of CAG repeats is between 10 and 35. Individuals with HD have between 36 and 180 repeats.

Doctors use a certain DNA test to count the number of CAG repeats; and consult the

following table to produce a diagnosis:

Number of Repeats    Diagnosis
0 - 9                Faulty Test
10 - 35              Normal
36 - 39              High Risk
40 - 180             Huntington's
>= 181               Faulty Test

Your other task is to write a method that, based on the DNA sequence read before, analyzes that sequence for Huntington's disease and produces a diagnosis following the table above.
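The diagnosis table can be expressed as a chain of range checks. The sketch below uses a hypothetical class and method name (HdDiagnosisSketch, diagnose) and assumes the repeat count passed in is the length of the longest consecutive run of CAG in the sequence, which the task sheet implies but does not spell out.

```java
// Hypothetical helper for illustration; not part of the required public API.
class HdDiagnosisSketch {
    // Maps a CAG repeat count to a diagnosis string, following the table above.
    static String diagnose(int cagRepeats) {
        if (cagRepeats < 10) {
            return "Faulty Test";   // 0 - 9
        }
        if (cagRepeats <= 35) {
            return "Normal";        // 10 - 35
        }
        if (cagRepeats <= 39) {
            return "High Risk";     // 36 - 39
        }
        if (cagRepeats <= 180) {
            return "Huntington's";  // 40 - 180
        }
        return "Faulty Test";       // >= 181
    }

    public static void main(String[] args) {
        System.out.println(diagnose(35)); // Normal
        System.out.println(diagnose(36)); // High Risk
    }
}
```

The boundary values (9, 10, 35, 36, 39, 40, 180, 181) are the cases most worth checking by hand.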


Specification and Deliverables

In this section, you will find details about your implementation and the files that you

have to submit.

Specification and Use Cases

Your implementation must satisfy the following specification and use cases:

1. You will implement your program in DnaProfileDiagnosis.java.

2. A new object of DnaProfileDiagnosis is created by calling the DnaProfileDiagnosis constructor. The name of the CSV file containing the DNA database will be passed to the constructor.

3. Your program should open the CSV file and read its contents into the instance variables. You may assume that the first row of the CSV file contains the column names. The first column will be the word name and the remaining columns will be the STR sequences. Each of the following rows will contain an individual's name and the corresponding STR counts.

4. The name of the TXT file containing the DNA sequence will be passed to the readDna instance method. Your program should open the TXT file and read its contents into the instance variables.

5. The DNA sequence in the TXT file may contain some whitespace (spaces, tabs, newlines). Your program should remove any whitespace before storing and computing on it.

6. The method checkProfile can then be called, after setting the query sequence. Your algorithm will try to match the STR counts of the database and the DNA sequence. If a match is found, the name of the individual will be returned as a String, such as "Alice". Otherwise, the String "None matches" will be returned.
   You may assume the STR counts will not match more than one individual.

7. Calling the checkProfile method before setting the DNA sequence will cause an IllegalArgumentException to be thrown.

8. The method diagnoseHd can also be called after setting the DNA sequence.
   Your algorithm will perform a diagnosis based on the CAG repeats and the table in the previous section. The output of the method will be one of the following Strings: "Faulty Test", "Normal", "High Risk", or "Huntington's".

9. Calling the diagnoseHd method before setting the DNA sequence will cause an IllegalArgumentException to be thrown.

10. Further readDna calls may be made to change the DNA sequence.
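The whitespace removal and the "DNA not yet set" check from the list above might look like the sketch below. The class DnaHolderSketch and its methods are hypothetical. The sequence is held in a plain String field purely for illustration, and the sketch does not try to settle whether that satisfies the data structure requirement of the coursework.

```java
// Hypothetical helper for illustration; not part of the required public API.
class DnaHolderSketch {
    // Stays null until a sequence has been set.
    private String dna;

    // Stores the sequence with all spaces, tabs, and newlines removed.
    void setDna(String rawText) {
        dna = rawText.replaceAll("\\s+", "");
    }

    // Returns the stored sequence, or throws if none has been set yet.
    String requireDna() {
        if (dna == null) {
            throw new IllegalArgumentException("DNA sequence has not been set");
        }
        return dna;
    }

    public static void main(String[] args) {
        DnaHolderSketch holder = new DnaHolderSketch();
        holder.setDna("AAG T\n\tCAG \r\n");
        System.out.println(holder.requireDna()); // AAGTCAG
    }
}
```

A later call to setDna simply overwrites the field, which matches the requirement that readDna may be called again to change the sequence. In an actual readDna the raw text would first be read from the named TXT file.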


Instance Variable and Complexity Requirements

In this Resit Coursework, to store, query, and compute on the DNA database and the DNA sequence, you must use ArrayList and/or TreeMap, and their methods. Failing to satisfy this requirement by using other data structures will result in 0 marks.
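As one possible way to do the profile lookup with these two structures, the sketch below compares a query's STR counts against each database row. Everything here is an assumption for illustration: the class and method names are hypothetical, the database is taken to be a TreeMap from name to an ArrayList of counts in column order, and the no-match text is passed in as a parameter rather than hard-coded.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of the required public API.
class ProfileMatchSketch {
    // Returns the name whose stored counts equal queryCounts, or noMatch if none do.
    static String match(TreeMap<String, ArrayList<Integer>> database,
                        ArrayList<Integer> queryCounts, String noMatch) {
        for (Map.Entry<String, ArrayList<Integer>> entry : database.entrySet()) {
            // ArrayList.equals compares element by element, in order.
            if (entry.getValue().equals(queryCounts)) {
                return entry.getKey();
            }
        }
        return noMatch;
    }

    public static void main(String[] args) {
        TreeMap<String, ArrayList<Integer>> db = new TreeMap<>();
        db.put("Alice", new ArrayList<>(Arrays.asList(22, 35, 18)));
        db.put("Bob", new ArrayList<>(Arrays.asList(16, 20, 18)));
        ArrayList<Integer> query = new ArrayList<>(Arrays.asList(16, 20, 18));
        System.out.println(match(db, query, "no match")); // Bob
    }
}
```

Because the sheet states that at most one individual will match, returning on the first equal row is sufficient.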

There are no requirements on the running time of your program.

Public API

public class DnaProfileDiagnosis {

    // build a database from database.csv
    public DnaProfileDiagnosis(String database)

    // store a dna sequence with no whitespace from dna.txt
    public void readDna(String dna)

    // based on the STR counts, return either a name in
    // database, or "None Matches"
    // throws IllegalArgumentException if dna has not been set
    public String checkProfile()

    // based on the CAG repeats, return either "Faulty Test",
    // "Normal", "High Risk", or "Huntington's"
    // throws IllegalArgumentException if dna has not been set
    public String diagnoseHd()

}

Sample Client

Your program should behave as the example below:

public class TestCoursework {

    public static void main(String[] args) {
        DnaProfileDiagnosis test = new DnaProfileDiagnosis(db1);

        test.readDna(dna1);
        System.out.println(test.checkProfile()); // Alice
        System.out.println(test.diagnoseHd());   // Normal

        test.readDna(dna2);
        System.out.println(test.checkProfile()); // Bob
        System.out.println(test.diagnoseHd());   // Huntington's

        DnaProfileDiagnosis test2 = new DnaProfileDiagnosis(db2);
        System.out.println(test2.checkProfile()); // IllegalArgumentException thrown
    }

}


Video Requirements

Create a video and make a submission to Learning Mall with the following

requirements:

1. The video must contain a description and discussion of the algorithms you use to complete both the profiling and the diagnosis tasks, followed by their running time analysis.

2. The length of the video must be less than or equal to 4 minutes.
   Violating the length requirement will result in 0 marks for your video grade.

3. Your video must show your face for the purpose of authenticity verification.
   Violating the face requirement will result in 0 marks for your video grade.

4. You may want to make your video look nicer; however, the grade will not be based on its looks. Only the quality and clarity of the algorithm description, discussion, and analysis will count.
   A simple recording of a PPT explanation, with the presenter's face shown in a box through screen sharing in BBB or Tencent Meeting, would be sufficient.

5. Submit the following to Learning Mall:
   a. The video file in .mp4
   b. The PPT file you used to create the video

Grades

The marks of your submission:

1. Correctness of all the methods: 70 marks
   (your code will be tested on a new set of test cases)

2. Algorithm discussion, analysis and clarity of the video: 25 marks

3. Ethics Online Quiz: 5 marks

Total: 100 marks

Academic Integrity

1. Plagiarism, e.g. copying materials from other sources without proper acknowledgement, copying, or collusion are serious academic offences. Plagiarism, copying, or collusion will not be tolerated and will be dealt with in accordance with the University Code of Practice on Academic Integrity.

2. In some cases, individual students may be invited to explain parts of their code in person, and if they fail to demonstrate an understanding of the code, no credit will be given for that part.

3. In more severe cases, the violation will be reported to the Exam Officer for further investigation and will be permanently recorded in the student's official academic transcript.

