Date: 2023-10-11 09:43

COMP6311-Advanced Data Analytics

Assignment 1 (Due date: 23:59, 9 October 2023)

Introduction

Suppose there is a real estate company based in the United States that specializes in selling apartments in various locations. In the real estate market, apartment prices are influenced by a variety of factors, including location, size, noise level, and air quality. To help real estate investors make informed decisions, the company regularly releases information on apartment sales in different areas. This comprehensive sales data lets investors design accurate and effective apartment price prediction systems. By analyzing the sales data, investors can identify latent patterns and develop predictive models that support data-driven decisions on apartment transactions.

Datasets

The datasets are described as follows:

1. Train_Data.csv contains 4000 samples of basic real estate information. The target variable is the Total cost. The columns are:

● Property size – number of rooms in the house.

● Community safety score – the higher the safer.

● Residence space – square feet area of the living rooms.

● Building space – square feet area of the whole building.

● Noise level – the lower the value, the greater the noise.

● Waterfront – whether the house is on the waterfront.

● View – Number of viewings before the house is sold.

● Air quality index – the higher the value, the better the air quality.

● Aboveground area – square feet area of the aboveground part of the house.

● Basement area – square feet area of the basement in the house.

● Construction year – the year in which the house was built.

● Decoration year – the year in which the house was decorated.

● District – the address of the house.

● City – the city in which the house is located.

● Zip code – the zip code of the house.

● Region – the region of the house.

● Exchange rate – the exchange rate between the US dollar and the Hong Kong dollar when the house is sold.

● Unit price of residence space – the unit price of residence space (US dollar).

● Unit price of building space – the unit price of building space (US dollar).

● Total cost – the total price of residence and building space (Hong Kong dollar).

2. Test_Data.csv contains 400 samples of basic real estate information; the total cost is unknown.

Task

Task 1: Total Cost Comparison

Please compare the average total cost (in Hong Kong dollars) of the “economical houses” with that of all the houses in each city. Here, total cost = (unit price of residence space * residence space + unit price of building space * building space) * exchange rate, and an “economical house” must satisfy both of the following requirements:

(i) its construction year is after 1995 (not including 1995), and

(ii) its residence ratio = residence space / (residence space + building space) is greater than 24% (not including 24%).
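
The total cost formula and the two “economical house” requirements can be sketched as two small Python functions. This is only an illustration: the function and parameter names are hypothetical, the sample values are made up, and treating a house with zero total space as not economical is an assumption the assignment does not address.

```python
def total_cost_hkd(res_price, res_space, bld_price, bld_space, exchange_rate):
    """Total cost in HKD: USD unit prices times areas, times the exchange rate."""
    return (res_price * res_space + bld_price * bld_space) * exchange_rate


def is_economical(construction_year, res_space, bld_space):
    """True if built after 1995 and the residence ratio is strictly above 24%."""
    total_space = res_space + bld_space
    if total_space <= 0:
        # Assumption: a row with no recorded space cannot qualify.
        return False
    ratio = res_space / total_space
    return construction_year > 1995 and ratio > 0.24


# Made-up sample values, for illustration only.
cost = total_cost_hkd(100.0, 1000.0, 10.0, 3000.0, 7.8)
economical = is_economical(2001, 1000.0, 3000.0)  # ratio = 0.25
```

Both comparisons are strict, so a house built in 1995 or one with a ratio of exactly 24% is excluded.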

You are required to perform the calculation with MapReduce, using 5 mappers and 2 reducers.

MapReduce example:

https://colab.research.google.com/drive/1cqgjCH9ZCXedswxmND5u3Ma3HIC68gkY?usp=sharing

This is an example of implementing MapReduce in Google Colab. You can access Colab resources for free by logging in with your Google account.
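
As a rough illustration of how the Task 1 job could be structured, the sketch below simulates 5 mappers and 2 reducers in plain Python. It is not the approach used in the linked Colab notebook, whose contents are not reproduced here. The dictionary keys (`city`, `year`, `res_space`, and so on), the CRC32 partitioner, and the sample rows are all assumptions made for the example; the real column names come from Train_Data.csv.

```python
import zlib
from collections import defaultdict

NUM_MAPPERS = 5
NUM_REDUCERS = 2


def mapper(rows):
    """Emit (city, (econ_sum, econ_count, all_sum, all_count)) for each house."""
    for r in rows:
        cost = (r["res_price"] * r["res_space"]
                + r["bld_price"] * r["bld_space"]) * r["rate"]
        space = r["res_space"] + r["bld_space"]
        econ = r["year"] > 1995 and space > 0 and r["res_space"] / space > 0.24
        yield r["city"], ((cost, 1, cost, 1) if econ else (0.0, 0, cost, 1))


def partition(city):
    """Route a city to one reducer using a hash that is stable across runs."""
    return zlib.crc32(city.encode("utf-8")) % NUM_REDUCERS


def reducer(pairs):
    """Sum the partial tuples per city, then convert the sums to averages."""
    totals = defaultdict(lambda: [0.0, 0, 0.0, 0])
    for city, values in pairs:
        for i, v in enumerate(values):
            totals[city][i] += v
    # The economical average is None when a city has no economical houses.
    return {city: (es / ec if ec else None, s / c)
            for city, (es, ec, s, c) in totals.items()}


def run_job(rows):
    # Split the input across the mappers in round-robin fashion.
    chunks = [rows[i::NUM_MAPPERS] for i in range(NUM_MAPPERS)]
    # Shuffle: all pairs for one city land in the same reducer bucket.
    buckets = [[] for _ in range(NUM_REDUCERS)]
    for chunk in chunks:
        for city, values in mapper(chunk):
            buckets[partition(city)].append((city, values))
    result = {}
    for bucket in buckets:
        result.update(reducer(bucket))
    return result


# Made-up rows, for illustration only.
rows = [
    {"city": "Seattle", "year": 2001, "res_space": 1000, "bld_space": 3000,
     "res_price": 100, "bld_price": 10, "rate": 8},
    {"city": "Seattle", "year": 1990, "res_space": 1000, "bld_space": 1000,
     "res_price": 50, "bld_price": 50, "rate": 8},
    {"city": "Kent", "year": 2005, "res_space": 100, "bld_space": 900,
     "res_price": 100, "bld_price": 10, "rate": 8},
]
result = run_job(rows)
```

Emitting sums and counts rather than per-mapper averages matters here: averages of averages would be wrong whenever the mappers see different numbers of houses for the same city.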

Task 2: Total Cost Classification

Suppose you are a real estate investor who does not know the unit prices of a house (for either residence space or building space). You need to remove the columns Unit price of residence space and Unit price of building space from Train_Data.csv, and design a machine learning or deep learning model that predicts the total cost of each house. Then, you need to evaluate the model's performance using Test_Data.csv.

You are only required to predict the price range of the total cost for each sample in Test_Data.csv. The label takes one of four classes:

● 1: the total cost is less than 300000 HKD (i.e., 0 <= total cost < 300000).

● 2: the total cost is greater than or equal to 300000 HKD and less than 500000 HKD (i.e., 300000 <= total cost < 500000).

● 3: the total cost is greater than or equal to 500000 HKD and less than 700000 HKD (i.e., 500000 <= total cost < 700000).

● 4: the total cost is greater than or equal to 700000 HKD (i.e., 700000 <= total cost).
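
The four class boundaries can be turned into training labels with a short helper. The sketch below is one possible way to do it; the names `THRESHOLDS` and `cost_to_label` are hypothetical, and rejecting negative costs is an assumption based on the lower bound of class 1.

```python
from bisect import bisect_right

# Lower bounds (in HKD) of classes 2, 3 and 4, taken from the class definitions.
THRESHOLDS = [300_000, 500_000, 700_000]


def cost_to_label(total_cost):
    """Map a total cost in HKD to its class label, 1 through 4."""
    if total_cost < 0:
        raise ValueError("total cost must be non-negative")
    # bisect_right counts the thresholds that are <= total_cost, so a cost
    # exactly on a boundary falls into the higher class.
    return bisect_right(THRESHOLDS, total_cost) + 1


labels = [cost_to_label(c) for c in (0, 299_999.99, 300_000, 650_000, 700_000)]
```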

Submission Format

1. For Task 1, first save your MapReduce results in a single file named “mapreduce_result.csv”, which contains the columns (1) city, (2) average total cost of economical houses, and (3) average total cost of all houses. Then package mapreduce_result.csv and your MapReduce source code as a zip file named student_ID_task1.zip.

2. For Task 2, use your model to predict the total cost, and fill in your results in Test_Data.csv. Then package Test_Data.csv and the source code for model training as a zip file named student_ID_task2.zip.
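
A minimal sketch of writing the three-column Task 1 result with Python's standard `csv` module follows. The header names, the `write_result` helper, and the choice to leave the cell empty for a city with no economical houses are assumptions; the assignment only fixes the file name and the meaning of the three columns. The sample averages are made up.

```python
import csv
import io


def write_result(fileobj, averages):
    """Write one row per city: city, economical average, overall average."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["city", "avg_total_cost_economical", "avg_total_cost_all"])
    for city in sorted(averages):
        econ_avg, all_avg = averages[city]
        # Assumption: leave the cell empty when there are no economical houses.
        writer.writerow([city, "" if econ_avg is None else econ_avg, all_avg])


# Made-up averages, written to an in-memory buffer for demonstration.
buffer = io.StringIO()
write_result(buffer, {"Seattle": (1040000.0, 920000.0), "Kent": (None, 152000.0)})
csv_text = buffer.getvalue()
```

To produce the actual submission file, the same function could be called with `open("mapreduce_result.csv", "w", newline="")` in place of the buffer.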

Grading Criteria

● The program must be clearly commented, and a detailed Readme file must be provided.

● Task 1: we will check the results and the logic of your MapReduce implementation.

● Task 2: we will compare your predicted results in Test_Data.csv with the ground-truth values; performance is evaluated by Top-1 accuracy.
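
For a model that outputs a single class per sample, Top-1 accuracy is the fraction of samples whose predicted label equals the true label. The sketch below shows how it could be checked on a held-out part of the training data; the function name and the sample labels are made up, and the graders' actual script is not known.

```python
def top1_accuracy(predicted, actual):
    """Fraction of positions where the predicted label equals the true label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one sample is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Made-up labels: three of the four predictions match.
accuracy = top1_accuracy([1, 2, 3, 4], [1, 2, 4, 4])
```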

