
Date: 2021-03-17 11:39

CSE 231 Spring 2021

Computer Project #07

Assignment Overview

This assignment focuses on the implementation of Python programs to read files and process data by

using lists and functions.

It is worth 55 points (5.5% of course grade) and must be completed no later than 11:59 PM on

Monday, March 15.

Assignment Deliverable

The deliverable for this assignment is the following file:

proj07.py – the source code for your Python program

Be sure to use the specified file name and to submit it for grading via Mimir before the project

deadline.

Assignment Background

One commonly hears reference to “the one percent” referring to the people whose income is in the

top 1% of incomes. What is the data behind that number and where do others fall? Using the

National Average Wage Index (AWI), an index used by the Social Security Administration to gauge

an individual's earnings for the purpose of calculating their retirement benefit, we can answer such

questions.

In this project, you will process AWI data. Example data for 2019 is provided in the file

year2019.txt (2019 is the most recent year of complete data). The data is a table with the first

row as the title and the second row defining the data fields; remaining rows are data. The URL for

the data is: https://www.ssa.gov/cgi-bin/netcomp.cgi?year=2019

Here is the second line of data from the file followed by descriptions of the data. Notice that some

data are ints and some are floats:

5,000.00 — 9,999.99 12,620,757 32,801,513 19.37150 93,403,927,820.81 7,400.82

Column 0 is bottom of this income range.

Column 1 is the dash separating the bottom of the range from the top (see note below).

Column 2 is the top of this income range (see note below).

Column 3 is the number of individuals in the income range.

Column 4 is the cumulative number of individuals in this income range and all lower ranges.

Column 5 is the Column 4 value represented as a cumulative percentage of all individuals.

Column 6 is the combined income of all the individuals in this range of income.

Column 7 is the average income of individuals in this range of income.

Note: The final row of the file is different than all the others. You must account for that.

Assignment Specifications

The program must provide the following functions to extract some statistics.

a) def open_file():

Prompts the user to enter a year number for the data file. The program will check whether

the year is between 1990 and 2019 (both inclusive). If year number is valid, the program will

try to open data file with file name ‘yearXXXX.txt’, where XXXX is the year. Appropriate

error message should be shown if the data file cannot be opened or if the year number is

invalid. The year is invalid if it is not a number between 1990 and 2019, inclusive; the

invalid-year error is shown in this case. If the year is valid but the file does not exist, the

file-name error is shown instead. This function will loop until it receives proper input and

successfully opens the file. It returns a file pointer and year. Hint: use string concatenation

to construct the file name.

i. Parameters: None

ii. Display: prompt and error message

iii. Return: file pointer and int
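
One possible shape for this function is sketched below. It is only a sketch: the prompt and error wording are copied from the Sample Output section, and catching FileNotFoundError is an assumption about which failure "cannot be opened" refers to.

```python
def open_file():
    """Prompt for a year until a data file opens; return (file pointer, year)."""
    while True:
        year_str = input("Enter a year where 1990 <= year <= 2019: ")
        try:
            year = int(year_str)
        except ValueError:
            # The input was not a whole number at all.
            print("Error in year. Please try again.")
            continue
        if not 1990 <= year <= 2019:
            print("Error in year. Please try again.")
            continue
        # Build the file name by string concatenation, as the hint suggests.
        filename = "year" + str(year) + ".txt"
        try:
            fp = open(filename)
        except FileNotFoundError:
            print("Error in file name:", filename, "Please try again.")
            continue
        return fp, year
```

The loop only exits through the return statement, so every invalid year or missing file leads back to the prompt.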

b) def handle_commas(s,T) → int or float or None

The parameters are s, a string, and T, a string. The expected values of T are int and

float; any other value returns None. If the value of T is int, the string s will be

converted to an int and that int value will be returned. Similar for float. If a value of

s cannot be converted to an int or float, None will be returned (hint: use try-except).

Note: this is the same function we had in Project 5.

i. Parameters: str, str

ii. Display: nothing

iii. Returns: int or float or None
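
A minimal sketch of this function follows, assuming T arrives as the strings "int" and "float" (as in the handle_commas function test listed later in this document). Removing the commas with str.replace is one way to do it, not necessarily the intended one.

```python
def handle_commas(s, T):
    """Convert s to an int or float after removing commas; None on failure."""
    s = s.replace(",", "")
    try:
        if T == "int":
            return int(s)
        if T == "float":
            return float(s)
    except ValueError:
        # s did not hold a value of the requested type, e.g. "aaa" or "1234.56" as int
        return None
    return None  # T was neither "int" nor "float"


print(handle_commas("1,234", "int"))       # 1234
print(handle_commas("1,234.56", "float"))  # 1234.56
print(handle_commas("1,234.56", "int"))    # None
```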

c) def read_file(fp):

The function uses the file pointer parameter to read the data file. This function returns a list

of tuples where each tuple is the data on one line of the file, and is a mix of ints and floats as

follows:

tup = ((float, float), int, int, float, float, float)

the tuple is filled with the following data:

( (column 0, column 2), column 3, column 4, column 5, column 6, column 7)

Note that the numbers have commas that you should handle (Hint: use the handle_commas

function). There are also two header lines to skip. Also, the last line of the file has words

where data is supposed to be. Find which column this affects, and record that column as

None

i. Parameter: file pointer

ii. Display: nothing

iii. Return: list of tuples
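
The sketch below shows one way the reading step could look. It assumes that the columns are separated by whitespace and that every data line, including the last one, splits into eight fields, so that the words on the last line land in columns 1 and 2. Check the actual file before relying on that. handle_commas is repeated here only to keep the example self-contained.

```python
def handle_commas(s, T):
    """Convert s to an int or float after removing commas; None on failure."""
    s = s.replace(",", "")
    try:
        if T == "int":
            return int(s)
        if T == "float":
            return float(s)
    except ValueError:
        return None
    return None


def read_file(fp):
    """Return a list of tuples, one per data line of the file."""
    data_list = []
    fp.readline()  # skip the title line
    fp.readline()  # skip the line of field names
    for line in fp:
        cols = line.split()
        if len(cols) < 8:
            continue  # skip blank or short lines (an assumption, not in the spec)
        # Column 1 (the dash) is ignored. A word in column 2 becomes None
        # because handle_commas cannot convert it to a float.
        income_range = (handle_commas(cols[0], "float"),
                        handle_commas(cols[2], "float"))
        data_list.append((income_range,
                          handle_commas(cols[3], "int"),
                          handle_commas(cols[4], "int"),
                          handle_commas(cols[5], "float"),
                          handle_commas(cols[6], "float"),
                          handle_commas(cols[7], "float")))
    return data_list
```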

d) def get_range(data_list, percent):

Takes a list of data (output from the read_file function) and a percent and returns data

for the first data line whose cumulative percentage (Column 5 in the data file) is greater than

or equal to the percent parameter. The function should return a tuple of the salary range

(Columns 0 and 2 in the file data) the cumulative percentage value (Column 5 in the data

file) and the average income (Column 7 in the data file):

( (column 0, column 2), column 5, column 7)

For testing using the 2014 data and a percent value of 90 your function will return

((90000.0, 94999.99), 90.80624, 92420.5)

i. Parameters: list of tuples, float

ii. Display: nothing

iii. Return: tuple
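
The search described above can be sketched as a single pass over the list. The three sample rows are copied from the year2014.txt listing in the read_file function test. Returning None when no row qualifies is an assumption, since the assignment does not say what should happen in that case.

```python
def get_range(data_list, percent):
    """Return ((bottom, top), cumulative percent, average) for the first
    row whose cumulative percentage is at least percent."""
    for income_range, _count, _cumulative, cum_percent, _total, average in data_list:
        if cum_percent >= percent:
            return income_range, cum_percent, average
    return None  # no row reached the requested percent (assumed behaviour)


sample_rows = [
    ((85000.0, 89999.99), 1975400, 141929101, 89.72248, 172719042418.7, 87434.97),
    ((90000.0, 94999.99), 1714370, 143643471, 90.80624, 158442931588.44, 92420.5),
    ((95000.0, 99999.99), 1486636, 145130107, 91.74604, 144858203365.61, 97440.26),
]
print(get_range(sample_rows, 90))  # ((90000.0, 94999.99), 90.80624, 92420.5)
```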

e) def get_percent(data_list, income):

Takes a list of data (output from the read_file function) and an income and returns the

income range (Columns 0 and 2 in the file) that contains the specified income, together with

the corresponding cumulative percentage (Column 5 in the file):

( (column 0, column 2), column 5 )

For testing using the 2014 data and an income value of 150,000 your function will return

((150000.0, 154999.99), 96.87301)

i. Parameters: list of tuples, float

ii. Display: nothing

iii. Return: tuple
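
A possible sketch of this lookup follows. It relies on the rows being sorted by income, as they are in the data file, so the first row whose top is at or above the income is taken as the containing range. It treats a top of None (the last row) as open-ended, which is an interpretation rather than something the assignment states. The sample rows are copied from the year2014.txt listing in the read_file function test.

```python
def get_percent(data_list, income):
    """Return ((bottom, top), cumulative percent) for the range holding income."""
    for row in data_list:
        top = row[0][1]
        # Rows are in ascending order, so the first range whose top is not
        # below the income is the one we want. The last range has top None.
        if top is None or income <= top:
            return row[0], row[3]
    return None  # only reached for an empty list


sample_rows = [
    ((145000.0, 149999.99), 419003, 152855720, 96.62989, 61787674520.19, 147463.56),
    ((150000.0, 154999.99), 384581, 153240301, 96.87301, 58607775121.57, 152393.84),
    ((50000000.0, None), 134, 158186786, 100.0, 11564829969.82, 86304701.27),
]
print(get_percent(sample_rows, 150000))  # ((150000.0, 154999.99), 96.87301)
```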

f) def find_average(data_list):

Takes a list of data (output from the read_file function) and returns the average salary.

Round the result to cents (i.e. two decimal places) before returning the value.

Hints:

i. This is NOT (!) the average of the last column of data. It is not mathematically valid to

find an average by finding the average of averages—for example, in this case there are

many more in the lowest category than in the highest category.

ii. How many wage earners are considered in finding the average (denominator)? There

are a couple of ways to determine this. I think the easiest uses the “cumulative number”

column (Column 4 in the file), but using Column 3 is not hard and may make more

sense to some students.

iii. How does one find the total dollar value of income (numerator)? Notice that Column 6

in the file is the combined income of all the individuals in this range of income.

For testing your function notice that for the 2014 data the average should be $44,569.20.

That value is listed on the web page referenced above.

iv. Parameters: list of tuples

v. Display: nothing

vi. Return: float # rounded to two decimal places
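
The hints above amount to dividing total income by total earners. The sketch below takes the denominator from the last cumulative count. The two rows are made-up numbers chosen so the arithmetic is easy to follow: the true average is 12000 / 4 = 3000.0, while the average of the two per-range averages would be 5000.0.

```python
def find_average(data_list):
    """Return total income divided by total earners, rounded to cents."""
    total_income = sum(row[4] for row in data_list)  # Column 6 in the file
    total_people = data_list[-1][2]                  # last cumulative count (Column 4)
    return round(total_income / total_people, 2)


# Hypothetical data: 3 earners sharing 3000.0, then 1 earner with 9000.0.
toy_rows = [
    ((0.01, 4999.99), 3, 3, 75.0, 3000.0, 1000.0),
    ((5000.0, None), 1, 4, 100.0, 9000.0, 9000.0),
]
print(find_average(toy_rows))  # 3000.0
```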

g) def find_median(data_list):

Takes a list of data (output from the read_file function) and returns the median income.

Unfortunately, this file of data is not sufficient to find the true median so we need to

approximate it (at least 50%).

i. Here is the rule we will use: find the data line whose cumulative percentage (Column 5)

is closest to 50% and return its average income (Column 7). If two data lines are equally

close, return the smaller.

ii. Hint: Python’s abs() function (absolute value) is potentially useful here.

iii. Hint: your get_range() function should be useful here. The get_range()

function returns the first tuple where the cumulative percentage is greater than or equal

to a particular percentage. For the median the percentage is 50%.

iv. For testing your function, using our rule, the median income for the 2014 data is

$27,457.00

v. Parameters: list of tuples

vi. Display: nothing

vii. Return: float
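
The rule in item i can also be written with min() and abs() directly, without going through get_range(). That is the variant sketched here, and it is only one of several workable approaches. Python's min() returns the first of several equally small items, and the rows are in ascending income order, so a tie resolves to the smaller income. The sample rows are copied from the year2014.txt listing in the read_file function test.

```python
def find_median(data_list):
    """Return the average income of the row whose cumulative percentage
    is closest to 50."""
    closest = min(data_list, key=lambda row: abs(row[3] - 50))
    return closest[5]


sample_rows = [
    ((20000.0, 24999.99), 10918555, 71176882, 44.99547, 245317570246.88, 22467.95),
    ((25000.0, 29999.99), 10192863, 81369745, 51.43903, 279865461187.05, 27457.0),
    ((30000.0, 34999.99), 9487840, 90857585, 57.4369, 307828947411.16, 32444.58),
]
print(find_median(sample_rows))  # 27457.0
```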

h) def do_plot(x_vals,y_vals,year) provided by us takes two equal-length lists of

numbers and plots them. You have to fill in the two labels (replace the empty string with the

appropriate string). Note that if you plot the whole file of data, the income ranges are so

skewed that the result is a nearly vertical plot at the leftmost edge so close to the edge that

you cannot see it in the plot—it looks like nothing was plotted. Plotting the lowest 40

income ranges results in a more easily readable plot.
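
Preparing the two lists for do_plot() could look like the sketch below. plot_values is a hypothetical helper name, not part of the assignment. The choice of Column 0 for x and Column 5 for y follows the main() description in this document.

```python
def plot_values(data_list, n=40):
    """Return (x_vals, y_vals) for the lowest n income ranges."""
    lowest = data_list[:n]
    x_vals = [row[0][0] for row in lowest]  # bottom of each range (Column 0)
    y_vals = [row[3] for row in lowest]     # cumulative percentage (Column 5)
    return x_vals, y_vals
```

The two returned lists always have the same length, which is what do_plot() expects.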

i) def main():

a) Open the file

b) Print the year.

c) Read the file

d) Print the average income.

e) Print the median income.

f) Prompt for plotting (yes/no).

If yes, plot the data: cumulative percentage (Column 5 in the file (y values)) vs. income

(Column 0 in the file (x values)). Call the do_plot() function to plot the data. Plot the

lowest 40 income ranges.

g) Loop, prompting for either “r” for range, “p” for percent, or nothing

i. r: prompt for a percent and output the income that is below that percent. The percent

needs to be valid (between 0 and 100 inclusive). Hint: Call the get_range()

function to get the range of income at that percentage. The bottom of that income range

is what we are looking for.

ii. p: prompt for an income and output the percent that earned more. The income needs

to be valid (positive). Hint: Call the get_percent() function to get the

corresponding cumulative percentage.

iii. if only a carriage-return is entered, halt the program

This is a new and different requirement. Hint: if someone simply hits the Enter key,

what will be the value input?
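
Step g can be sketched as a sentinel loop: input() returns the empty string when only Enter is pressed, and that ends the loop. choice_loop is a hypothetical helper name. The prompts and result lines are copied from the Sample Output section, while the two error messages are invented here because the assignment does not give their wording. Compact versions of get_range and get_percent are included only to keep the example self-contained, and non-numeric input is not handled.

```python
def get_range(data_list, percent):
    for row in data_list:
        if row[3] >= percent:
            return row[0], row[3], row[5]


def get_percent(data_list, income):
    for row in data_list:
        if row[0][1] is None or income <= row[0][1]:
            return row[0], row[3]


def choice_loop(data_list):
    """Hypothetical helper for step g of main()."""
    prompt = "Enter a choice to get (r)ange, (p)ercent, or nothing to stop: "
    choice = input(prompt)
    while choice != "":  # pressing Enter alone yields the empty string
        if choice == "r":
            percent = float(input("Enter a percent: "))
            if 0 <= percent <= 100:
                income_range, _, _ = get_range(data_list, percent)
                print("{:.2f}% of incomes are below ${:,.2f} ."
                      .format(percent, income_range[0]))
            else:
                print("Error in percent. Please try again.")  # assumed wording
        elif choice == "p":
            income = float(input("Enter an income: "))
            if income > 0:
                _, cum_percent = get_percent(data_list, income)
                print("An income of ${:,.2f} is in the top {:.2f}% of incomes."
                      .format(income, cum_percent))
            else:
                print("Error in income. Please try again.")  # assumed wording
        choice = input(prompt)
```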

Assignment Notes

1. Items 1-9 of the Coding Standard will be enforced for this project.

2. Files for year2000.txt, year2014.txt and year2019.txt are provided so that you

can test your program.

3. For output you need to insert commas. The format specification supports this: for example, if

you would have formatted a floating-point value without commas as {:<12.2f}, you can simply

insert a comma before the dot, as in {:<12,.2f}.
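
As a quick illustration of note 3, the value below is the 2019 average from Test 1 in the Sample Output. The variable names are chosen only for this example.

```python
average = 51916.27

# Left-aligned in a field of 12 characters, two decimals, with a thousands comma.
padded = "{:<12,.2f}".format(average)
print(repr(padded))  # '51,916.27   '

# The same comma option without a field width, as used in a sentence.
line = "The average income was ${:,.2f}".format(average)
print(line)  # The average income was $51,916.27
```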

Sample Output

Test 1

Enter a year where 1990 <= year <= 2019: 2019

For the year 2019:

The average income was $51,916.27

The median income was $32,452.59

Do you want to plot the data (yes/no): no

Enter a choice to get (r)ange, (p)ercent, or nothing to stop: r

Enter a percent: 90

90.00% of incomes are below $100,000.00 .

Enter a choice to get (r)ange, (p)ercent, or nothing to stop: p

Enter an income: 100000

An income of $100,000.00 is in the top 90.01% of incomes.

Enter a choice to get (r)ange, (p)ercent, or nothing to stop:

Test 2 (no plotting)

Enter a year where 1990 <= year <= 2019: 2000

For the year 2000:

The average income was $30,846.09

The median income was $22,458.80

Do you want to plot the data (yes/no): no

Enter a choice to get (r)ange, (p)ercent, or nothing to stop: r

Enter a percent: 40

40.00% of incomes are below $15,000.00 .

Enter a choice to get (r)ange, (p)ercent, or nothing to stop: p

Enter an income: 50000

An income of $50,000.00 is in the top 87.41% of incomes.

Enter a choice to get (r)ange, (p)ercent, or nothing to stop:

Test 2 (plotting)

Enter a year where 1990 <= year <= 2019: 2000

For the year 2000:

The average income was $30,846.09

The median income was $22,458.80

Do you want to plot the data (yes/no): yes

Enter a choice to get (r)ange, (p)ercent, or nothing to stop:

Test 3

Enter a year where 1990 <= year <= 2019: xxx

Error in year. Please try again.

Enter a year where 1990 <= year <= 2014: 1900

Error in year. Please try again.

Enter a year where 1990 <= year <= 2014: 1999

Error in file name: year1999.txt Please try again.

Enter a year where 1990 <= year <= 2014: 2014

For the year 2014:

The average income was $44,569.20

The median income was $27,457.00

Do you want to plot the data (yes/no): no

Enter a choice to get (r)ange, (p)ercent, or nothing to stop: r

Enter a percent: 70

70.00% of incomes are below $45,000.00 .

Enter a choice to get (r)ange, (p)ercent, or nothing to stop: p

Enter an income: 150000

An income of $150,000.00 is in the top 96.87% of incomes.

Enter a choice to get (r)ange, (p)ercent, or nothing to stop:

Function Test: read_file

year2014.txt

[((0.01, 4999.99), 22574440, 22574440, 14.27075, 46647919125.68, 2066.4),

((5000.0, 9999.99), 13848841, 36423281, 23.02549, 102586913092.61, 7407.62),

((10000.0, 14999.99), 12329270, 48752551, 30.81961, 153566802438.45, 12455.47),

((15000.0, 19999.99), 11505776, 60258327, 38.09315, 200878198035.07, 17458.9),

((20000.0, 24999.99), 10918555, 71176882, 44.99547, 245317570246.88, 22467.95),

((25000.0, 29999.99), 10192863, 81369745, 51.43903, 279865461187.05, 27457.0),

((30000.0, 34999.99), 9487840, 90857585, 57.4369, 307828947411.16, 32444.58),

((35000.0, 39999.99), 8578215, 99435800, 62.85974, 321200755103.44, 37443.78),

((40000.0, 44999.99), 7553972, 106989772, 67.63509, 320563569965.15, 42436.43),

((45000.0, 49999.99), 6542882, 113532654, 71.77126, 310391706424.23, 47439.6),

((50000.0, 54999.99), 5723269, 119255923, 75.38931, 300016377448.51, 52420.46),

((55000.0, 59999.99), 4846517, 124102440, 78.4531, 278354367841.41, 57433.9),

((60000.0, 64999.99), 4201232, 128303672, 81.10897, 262203932128.68, 62411.2),

((65000.0, 69999.99), 3573471, 131877143, 83.36799, 240948179180.4, 67426.93),

((70000.0, 74999.99), 3094739, 134971882, 85.32437, 224145278103.36, 72427.85),

((75000.0, 79999.99), 2684481, 137656363, 87.0214, 207853372824.62, 77427.77),

((80000.0, 84999.99), 2297338, 139953701, 88.4737, 189370862869.17, 82430.56),

((85000.0, 89999.99), 1975400, 141929101, 89.72248, 172719042418.7, 87434.97),

((90000.0, 94999.99), 1714370, 143643471, 90.80624, 158442931588.44, 92420.5),

((95000.0, 99999.99), 1486636, 145130107, 91.74604, 144858203365.61, 97440.26),

((100000.0, 104999.99), 1309068, 146439175, 92.57358, 134083282259.67,

102426.52), ((105000.0, 109999.99), 1117128, 147556303, 93.27979,

120020513136.11, 107436.67), ((110000.0, 114999.99), 977055, 148533358, 93.89745,

109855105705.14, 112434.93), ((115000.0, 119999.99), 865889, 149399247, 94.44483,

101693061676.62, 117443.53), ((120000.0, 124999.99), 773339, 150172586, 94.93371,

94660281091.31, 122404.64), ((125000.0, 129999.99), 673971, 150846557, 95.35977,

85886152964.93, 127433.01), ((130000.0, 134999.99), 595827, 151442384, 95.73643,

78899843713.01, 132420.73), ((135000.0, 139999.99), 527341, 151969725, 96.0698,

72476546845.3, 137437.72), ((140000.0, 144999.99), 466992, 152436717, 96.36501,

66519743635.12, 142443.0), ((145000.0, 149999.99), 419003, 152855720, 96.62989,

61787674520.19, 147463.56), ((150000.0, 154999.99), 384581, 153240301, 96.87301,

58607775121.57, 152393.84), ((155000.0, 159999.99), 335391, 153575692, 97.08503,

52801735517.69, 157433.37), ((160000.0, 164999.99), 296048, 153871740, 97.27218,

48087213596.86, 162430.46), ((165000.0, 169999.99), 265309, 154137049, 97.4399,

44426198104.69, 167450.78), ((170000.0, 174999.99), 239515, 154376564, 97.59131,

41304379348.95, 172450.07), ((175000.0, 179999.99), 216255, 154592819, 97.72802,

38370042895.27, 177429.62), ((180000.0, 184999.99), 200592, 154793411, 97.85483,

36588064085.78, 182400.42), ((185000.0, 189999.99), 179005, 154972416, 97.96799,

33554727208.93, 187451.34), ((190000.0, 194999.99), 165277, 155137693, 98.07247,

31807897759.84, 192452.05), ((195000.0, 199999.99), 154070, 155291763, 98.16987,

30425466536.83, 197478.2), ((200000.0, 249999.99), 1039897, 156331660, 98.82726,

230863458226.21, 222006.08), ((250000.0, 299999.99), 565105, 156896765, 99.1845,

153945762663.99, 272419.75), ((300000.0, 349999.99), 333584, 157230349, 99.39537,

107708119615.81, 322881.55), ((350000.0, 399999.99), 219923, 157450272, 99.5344,

82117070706.61, 373390.1), ((400000.0, 449999.99), 151162, 157601434, 99.62996,

63997346472.5, 423369.28), ((450000.0, 499999.99), 108881, 157710315, 99.69879,

51583042398.64, 473756.14), ((500000.0, 999999.99), 345935, 158056250, 99.91748,

230331407862.96, 665822.79), ((1000000.0, 1499999.99), 65548, 158121798,

99.95892, 78672933288.58, 1200233.92), ((1500000.0, 1999999.99), 24140,

158145938, 99.97418, 41431838733.52, 1716314.78), ((2000000.0, 2499999.99),

12137, 158158075, 99.98185, 26997226154.27, 2224373.91), ((2500000.0,

2999999.99), 6871, 158164946, 99.98619, 18747446313.27, 2728488.77), ((3000000.0,

3499999.99), 4799, 158169745, 99.98923, 15507304422.66, 3231361.62), ((3500000.0,

3999999.99), 3258, 158173003, 99.99129, 12166741762.34, 3734420.43), ((4000000.0,

4499999.99), 2353, 158175356, 99.99277, 9970953222.98, 4237549.18), ((4500000.0,

4999999.99), 1822, 158177178, 99.99393, 8633941395.34, 4738716.46), ((5000000.0,

9999999.99), 6468, 158183646, 99.99802, 43887775808.42, 6785370.41),

((10000000.0, 19999999.99), 2230, 158185876, 99.99942, 30065006121.19,

13482065.53), ((20000000.0, 49999999.99), 776, 158186652, 99.99992,

22450911983.01, 28931587.61), ((50000000.0, None), 134, 158186786, 100.0,

11564829969.82, 86304701.27)]

Function Test: find_average

Instructor: 44569.2

Student: 44569.2

Function Test: find_median

year2014.txt

Instructor: 27457.0

Student: 27457.0

--------------------

year2019.txt

Instructor: 32452.59

Student: 32452.59

Function Test: get_range

year2014.txt; get_range(data,90)

Instructor: ((90000.0, 94999.99), 90.80624, 92420.5)

Student: ((90000.0, 94999.99), 90.80624, 92420.5)

--------------------

year2014.txt,get_range(data,50)

Instructor: ((25000.0, 29999.99), 51.43903, 27457.0)

Student: ((25000.0, 29999.99), 51.43903, 27457.0)

--------------------

year2000.txt,get_range(data,90)

Instructor: ((60000.0, 64999.99), 91.31401, 62377.2)

Student: ((60000.0, 64999.99), 91.31401, 62377.2)

Function Test: get_percent

year2014.txt; get_percent(data,150000)

Instructor: ((150000.0, 154999.99), 96.87301)

Student: ((150000.0, 154999.99), 96.87301)

--------------------

year2014.txt,get_percent(data,50000)

Instructor: ((50000.0, 54999.99), 75.38931)

Student: ((50000.0, 54999.99), 75.38931)

--------------------

year2000.txt,get_percent(data,150000)

Instructor: ((150000.0, 154999.99), 98.72567)

Student: ((150000.0, 154999.99), 98.72567)

Function Test: handle_commas

s,T: 5 int

Instructor: 5

Student : 5

--------------------

s,T: 5.3 float

Instructor: 5.3

Student : 5.3

--------------------

s,T: 1,234 int

Instructor: 1234

Student : 1234

--------------------

s,T: 1,234.56 float

Instructor: 1234.56

Student : 1234.56

--------------------

s,T: 5.3 xxx

Instructor: None

Student : None

--------------------

s,T: aaa int

Instructor: None

Student : None

--------------------

s,T: 1,234.56 int

Instructor: None

Student : None

=====================================================

Scoring Rubric

Computer Project #07 Scoring Summary

General Requirements

______ 5 pts Coding Standard 1-9

(descriptive comments, function header, etc...)

Implementation:

__0__ (5 pts) open_file (manual grading)

__0__ (3 pts) Function Test handle_commas

__0__ (8 pts) Function Test read_file

__0__ (5 pts) Function Test find_average

__0__ (6 pts) Function Test find_median

__0__ (5 pts) Function Test get_range

__0__ (5 pts) Function Test get_percent

__0__ (5 pts) Test 1

__0__ (2 pts) Test 2 (no plotting)

__0__ (2 pts) Test 2 (plotting) (manual grading)

__0__ (4 pts) Test 3

