Date: 2023-04-18 09:07

ENGG2112 Coding Assignment

Due on 23 April 2023, 11:59pm

10 APRIL 2023

Instructions

This is an individual assignment and the submitted work must be your original work. You are allowed to discuss the method of solution with others; however, the submitted code must be written entirely by you.

Submit your work as a Python notebook in the template provided.

Submissions must be made through Canvas only, and not by e-mail. The

deadline will be strictly enforced: 11:59pm on 23 April 2023. (Students with

disability adjustments will be contacted separately.)

Please plan your time according to your own ability and schedule, seek help

from the teaching team and peers in a timely fashion, and try not to ask for

deadline extensions.

Download the file House_Rent_Dataset.csv from the Canvas website. Use the notebook template ENGG 2112 coding assignment 2023 (1).ipynb to answer the following questions.

Description of the Dataset

This dataset contains information on more than 4,700 houses, apartments and flats available for rent, with parameters such as BHK, Rent, Size, No. of Floors, Area Type, Area Locality, City, Furnishing Status, Type of Tenant Preferred, No. of Bathrooms and Point of Contact.

Dataset Glossary

BHK: Number of Bedrooms, Hall, Kitchen
Rent: Weekly Rent of the Property
Size: Size of the Property in Square Feet
Floor: Floor location of the property and total number of floors in the building (e.g. Ground out of 2, 3 out of 5, etc.)
Area Type: Size of the property calculated on either Super Area, Carpet Area or Build Area
Area Locality: Locality of the Property
City: City where the Property is located
Furnishing Status: Furnished, semi-furnished or unfurnished
Tenant Preferred: Type of tenant preferred by the owner or agent
Bathroom: Number of bathrooms
Point of Contact: Person to contact for more information

Problem 1

(This problem is worth 2 marks in each part, 6 marks in total.)

1. Find the minimum, maximum and average rent in the entire dataset. Assign

these values to the variables rent_min, rent_max and rent_avg respectively.

2. Find the subset of data records that satisfies the following conditions:

  • The posted date is in June 2022 (i.e. 1st to 30th June 2022).
  • Information on the property should be obtained from the agent.
  • The size of the property is at least 1,000 square feet.

Create the dataframe df_q2 to hold the data, and determine the number of eligible records/samples. Put this value in the variable num_rows.

3. In the first cell, plot a histogram of property sizes. In the second cell, plot a

scatter plot of property size versus date of posting. Ensure that the date is in

ascending (i.e. chronological) order.
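
As a rough illustration of the steps in Problem 1, the sketch below runs them on a small made-up table rather than the real file. The column names "Posted On", "Rent", "Size" and "Point of Contact", and the value "Contact Agent", are assumptions about House_Rent_Dataset.csv and should be checked against the actual data.

```python
import pandas as pd

# Toy stand-in for House_Rent_Dataset.csv. The column names and the
# "Contact Agent" / "Contact Owner" values are assumptions, not confirmed
# by the assignment text; all numbers are made up.
df = pd.DataFrame({
    "Posted On": ["2022-07-02", "2022-06-01", "2022-06-30",
                  "2022-06-15", "2022-05-18"],
    "Rent": [30000, 20000, 45000, 15000, 10000],
    "Size": [1500, 1200, 1000, 800, 1100],
    "Point of Contact": ["Contact Agent", "Contact Agent", "Contact Agent",
                         "Contact Agent", "Contact Owner"],
})

# Parse the date strings so they can be filtered and sorted as dates.
df["Posted On"] = pd.to_datetime(df["Posted On"])

# Part 1: summary statistics of the rent column.
rent_min = df["Rent"].min()
rent_max = df["Rent"].max()
rent_avg = df["Rent"].mean()

# Part 2: build one boolean mask per condition, then combine them with &.
in_june_2022 = (df["Posted On"].dt.year == 2022) & (df["Posted On"].dt.month == 6)
via_agent = df["Point of Contact"] == "Contact Agent"
large_enough = df["Size"] >= 1000

df_q2 = df[in_june_2022 & via_agent & large_enough]
num_rows = len(df_q2)

# Part 3: sort chronologically before plotting size against posting date.
df_sorted = df.sort_values("Posted On")
```

On this toy table two rows pass all three conditions. The sorted frame can then be handed to a plotting library for the histogram and the scatter plot.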

Problem 2

1. (3 marks) Use the columns “BHK”, “Size”, “Area Type” and “Bathroom” to build a linear regression model to predict the rent of a property. Convert all the categorical data into binary variables using one-hot encoding. Use 75% of the data for training, with the random state set to 2112. Find the coefficient of determination $R^2$ and the mean squared error, and store these values in the variables R2 and mse respectively.

2. (6 marks) Use the columns “BHK”, “Size”, “Floor”, “Area Type” and “Bathroom” to build the following three classifiers to predict the furnishing status of the property:

a) Logistic regression with max_iter = 1000.

b) Multi-layer perceptron with one hidden layer of 100 neurons, maximum

number of iterations = 500, and random state = 2112.

c) Gaussian Naïve Bayes.

Process the data as in Problem 2.1 above. In addition, for the “Floor” column, extract the information into two new columns, “Floor_new” and “Total_floor”, containing the floor location of the property and the total number of floors in the building, respectively. Transform the “Floor” information as follows: Ground → 0, Upper Basement → -1, Lower Basement → -2. Insert the two new columns into the dataframe and delete the original column “Floor”.

Compare the performance of the three classifiers on the test data. The evaluation metrics are F1 score and accuracy, both stored in the variable result.
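
The floor transformation described above can be sketched in plain Python as follows. The helper names floor_to_int and parse_floor are hypothetical, and the handling of entries without an "out of" part is an assumption, since the assignment text does not say what to do with them.

```python
# Special labels named in the assignment text and their numeric codes.
SPECIAL_FLOORS = {"ground": 0, "upper basement": -1, "lower basement": -2}


def floor_to_int(label):
    """Convert one floor label such as 'Ground' or '3' to an integer."""
    key = label.strip().lower()
    if key in SPECIAL_FLOORS:
        return SPECIAL_FLOORS[key]
    return int(key)


def parse_floor(text):
    """Split 'X out of Y' into the pair (floor, total_floors).

    Assumption: when the entry has no 'out of' part, the total is returned
    as None and the caller decides how to fill it.
    """
    parts = text.split(" out of ")
    floor = floor_to_int(parts[0])
    total = int(parts[1]) if len(parts) == 2 else None
    return floor, total


# The examples from the glossary plus the two basement labels.
examples = ["Ground out of 2", "3 out of 5",
            "Upper Basement out of 4", "Lower Basement out of 2"]
parsed = [parse_floor(e) for e in examples]
```

Applied to every entry of the “Floor” column, the two elements of each pair would populate the two new columns before the original column is dropped.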

Problem 3

(5 marks) Use the columns “BHK”, “Size” and “Bathroom” in a K-nearest neighbours (KNN) predictor of rent and furnishing status. The model needs to be built from scratch, i.e. without using any pre-packaged function that implements KNN in existing libraries. The data should first be pre-processed using min-max normalization, i.e. replace each feature $x_i$, whose minimum and maximum values across the dataset are $x_{i,\min}$ and $x_{i,\max}$, with the normalized feature $(x_i - x_{i,\min}) / (x_{i,\max} - x_{i,\min})$.

Test your function knn using the two sample data records provided in the last two

cells of the Python answer notebook.
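
One possible shape for such a function is sketched below, using only the standard library and made-up data. The signature of knn, the Euclidean distance, the mean of the neighbours' rents and the majority vote for status are all assumptions; the notebook template may expect something different.

```python
import math
from collections import Counter


def min_max_fit(rows):
    """Return one (min, max) pair per feature for a list of feature rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def min_max_apply(row, bounds):
    """Scale one row with (x - min) / (max - min), feature by feature."""
    return [
        (x - lo) / (hi - lo) if hi > lo else 0.0  # guard constant features
        for x, (lo, hi) in zip(row, bounds)
    ]


def knn(train_x, train_rent, train_status, query, k=3):
    """Predict (rent, furnishing status) for one query row.

    Assumptions not stated in the assignment: Euclidean distance, rent as
    the mean over the k neighbours, status by majority vote.
    """
    bounds = min_max_fit(train_x)
    scaled_train = [min_max_apply(row, bounds) for row in train_x]
    scaled_query = min_max_apply(query, bounds)

    # Sort training indices by distance to the query and keep the k closest.
    order = sorted(
        range(len(scaled_train)),
        key=lambda i: math.dist(scaled_train[i], scaled_query),
    )
    nearest = order[:k]

    rent = sum(train_rent[i] for i in nearest) / len(nearest)
    status = Counter(train_status[i] for i in nearest).most_common(1)[0][0]
    return rent, status


# Toy data: each row is [BHK, Size, Bathroom]; all values are made up.
train_x = [[1, 400, 1], [2, 800, 1], [2, 900, 2], [3, 1500, 2], [4, 2500, 3]]
train_rent = [8000, 15000, 18000, 30000, 60000]
train_status = ["Unfurnished", "Semi-Furnished", "Semi-Furnished",
                "Furnished", "Furnished"]

rent_pred, status_pred = knn(train_x, train_rent, train_status,
                             [2, 850, 1], k=3)
```

The query row is scaled with the bounds computed from the training rows, so that it lives in the same normalized space as the data it is compared against.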

