Date: 2020-11-28 11:14

HW1 (Due date: 11/12).

Upload your answers in a Word document to Canvas with your name in the filename. For code questions, paste your code and the corresponding results (printed outputs and figures) into Word and add some explanation. Alternatively, you can submit a Jupyter notebook file with comments. For hand-calculation problems, you may write the answers in Word, or write them on paper, take a picture, and paste the picture into Word.

1. Gradient accumulation for a 2-layer neural network with 2-dimensional input (5 points).

x = [11; 5]; W1 = [1.1, 2.1, 4; 0.8, 3.2, 3.3] (a 2x3 matrix); b1 = -1.3; W2 = [1/8; 1/6; 1/7] (a 3x1 matrix); b2 = 0.1. The network is $y = W_2^T (W_1^T x + b_1) + b_2$. The observation is y* = 20. Use squared loss. Calculate:

(a) What is dL/dW1? (Hint: you should have a matrix of 6 values here.)

(b) With a learning rate of 0.01, if you update the weights, what are the new weights?

(c) After the update, what is the new loss?
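
One way to check a hand calculation for this problem is to code the chain rule directly. The sketch below is plain Python and rests on assumptions not stated in the problem: the network is taken to be purely linear, y = W2^T (W1^T x + b1) + b2, with the scalar b1 added to every hidden unit; "squared loss" is taken to mean L = (y - y*)^2 without a 1/2 factor; and only W1 is updated. The function names are illustrative.

```python
def forward(x, W1, b1, W2, b2):
    """Return (h, y) for y = W2^T (W1^T x + b1) + b2."""
    n_in, n_hid = len(x), len(W2)
    # h[j] = sum_i W1[i][j] * x[i] + b1 (the scalar b1 is shared by all hidden units)
    h = [sum(W1[i][j] * x[i] for i in range(n_in)) + b1 for j in range(n_hid)]
    y = sum(W2[j] * h[j] for j in range(n_hid)) + b2
    return h, y


def squared_loss(y, y_obs):
    # Assumption: no 1/2 factor in front of the square.
    return (y - y_obs) ** 2


def grad_W1(x, W1, b1, W2, b2, y_obs):
    """dL/dW1 as a matrix with the same 2x3 shape as W1."""
    _, y = forward(x, W1, b1, W2, b2)
    dL_dy = 2.0 * (y - y_obs)
    # Chain rule: dL/dW1[i][j] = dL/dy * dy/dh[j] * dh[j]/dW1[i][j] = dL_dy * W2[j] * x[i]
    return [[dL_dy * W2[j] * x[i] for j in range(len(W2))] for i in range(len(x))]


def sgd_update(W, grad, lr):
    """One gradient-descent step: W <- W - lr * grad."""
    return [[w - lr * g for w, g in zip(w_row, g_row)] for w_row, g_row in zip(W, grad)]


x = [11.0, 5.0]
W1 = [[1.1, 2.1, 4.0], [0.8, 3.2, 3.3]]
b1 = -1.3
W2 = [1 / 8, 1 / 6, 1 / 7]
b2 = 0.1
y_obs = 20.0

g = grad_W1(x, W1, b1, W2, b2, y_obs)
W1_new = sgd_update(W1, g, lr=0.01)  # only W1 is updated in this sketch
loss_before = squared_loss(forward(x, W1, b1, W2, b2)[1], y_obs)
loss_after = squared_loss(forward(x, W1_new, b1, W2, b2)[1], y_obs)
print(g, W1_new, loss_before, loss_after)
```

If part (b) is meant to update every parameter, the same chain rule gives dL/dW2[j] = dL/dy * h[j], dL/db2 = dL/dy, and dL/db1 = dL/dy * (W2[0] + W2[1] + W2[2]).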

2. Clustering and dimensionality reduction. From the CAMELS dataset (attributes.csv), we can extract the following attributes: (i) aridity: annual potential evapotranspiration (PET) divided by precipitation; (ii) precipitation seasonality index; (iii) fraction of precipitation falling as snow.

(a) Define the distance as the Euclidean distance in the above three indices. Run a k-means clustering of the CAMELS basins. How many clusters should you set? Show the total_sum_of_squared_distance vs. k plot to justify your choice.
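
The sum-of-squared-distances curve for part (a) could be produced along the following lines. This is a minimal sketch using scikit-learn's `KMeans`, whose `inertia_` attribute is the total squared Euclidean distance of samples to their nearest centroid. The data here are synthetic stand-ins, since the column names in attributes.csv are not given; `inertia_curve` and `X_demo` are hypothetical names.

```python
import numpy as np
from sklearn.cluster import KMeans


def inertia_curve(X, k_values, seed=0):
    """Return the total within-cluster sum of squared distances for each k."""
    inertias = []
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
        inertias.append(km.inertia_)  # squared Euclidean distance to nearest centroid, summed
    return inertias


# Synthetic stand-in for the (aridity, seasonality, snow fraction) table:
# three well-separated blobs in 3-D, 50 points each.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 0.0], [0.0, 5.0, 5.0]])
X_demo = np.vstack([c + 0.5 * rng.standard_normal((50, 3)) for c in centers])

k_values = list(range(1, 9))
inertias = inertia_curve(X_demo, k_values)
for k, val in zip(k_values, inertias):
    print(k, round(val, 1))
```

Plotting `inertias` against `k_values` (for example with matplotlib's `plt.plot`) gives the elbow plot; the k where the curve flattens is a common choice. Whether to standardize the three indices before clustering is a judgment call the problem leaves open, because unscaled Euclidean distance lets the attribute with the largest range dominate.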

(b) There are 17 attributes in attributes.csv. Use principal component analysis to find the first two principal components. Make a scatter plot of the basins on a 2D plot with PC-1 and PC-2 as the axes. Better yet, use colors to indicate which cluster each basin belongs to.
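
A possible shape for the PCA step in part (b), sketched with scikit-learn's `StandardScaler` and `PCA`. Standardizing before PCA is an assumption made here so that attributes with large units do not dominate; the 17-column matrix is synthetic, and `first_two_pcs` is a hypothetical helper name.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def first_two_pcs(X):
    """Standardize the columns of X, then project onto the first two principal components."""
    X_std = StandardScaler().fit_transform(X)
    pca = PCA(n_components=2)
    scores = pca.fit_transform(X_std)  # shape (n_samples, 2): PC-1 and PC-2 coordinates
    return scores, pca.explained_variance_ratio_


# Synthetic stand-in for the 17-attribute table: two latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 2))
X_demo = latent @ rng.standard_normal((2, 17)) + 0.1 * rng.standard_normal((200, 17))

scores, ratio = first_two_pcs(X_demo)
print(scores.shape, ratio)
```

For the colored scatter plot, the cluster labels from a fitted `KMeans` (its `labels_` attribute) can be passed as the `c` argument of matplotlib's `plt.scatter(scores[:, 0], scores[:, 1], c=labels)`.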

3. Boosting and feature importance. Still working with the CAMELS dataset, extract annual average runoff from runoff_mm.csv (we did this in hw1). Together with the 17 attributes, you have 18 attributes. Normalize the attributes first.

(a) Write a loop that, in each iteration, predicts one of the attributes using xgboost with the remaining 17 attributes as inputs. You can predict all 18 attributes with this loop. Which attribute has the highest predictability?

(b) For the most predictable attribute you found in (a), use permutation_importance to rank the feature importance.
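
A minimal sketch of ranking features with `sklearn.inspection.permutation_importance`, which shuffles one input column at a time and records the drop in the model's score. To keep the example dependent on scikit-learn only, `GradientBoostingRegressor` stands in for the xgboost model; a fitted xgboost regressor should be usable in the same call. The data and feature names are synthetic.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: the target depends strongly on f0, weakly on f1, not at all on f2 and f3.
rng = np.random.default_rng(0)
X = rng.standard_normal((400, 4))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(400)
feature_names = ["f0", "f1", "f2", "f3"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = GradientBoostingRegressor(random_state=0).fit(X_tr, y_tr)

# Importance is measured on held-out data as the mean score drop over n_repeats shuffles.
result = permutation_importance(model, X_te, y_te, n_repeats=10, random_state=0)
order = np.argsort(result.importances_mean)[::-1]  # most important first
ranking = [(feature_names[i], float(result.importances_mean[i])) for i in order]
for name, value in ranking:
    print(name, round(value, 3))
```

Computing the importances on the test split rather than the training split is a design choice made here; it reflects which features matter for generalization rather than for fitting.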

4. Neural network training. Use 80% of the basins for training and 20% for testing. Report both the train and the test metrics.

(a) For the problem of predicting annual average runoff_mm using the other 17 attributes, write PyTorch code to train a 2-layer neural network.
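
A training loop for part (a) could be sketched as follows. It assumes "2-layer" means one hidden layer plus a linear output layer, uses full-batch Adam with mean squared error as the metric, and picks the hidden size, learning rate, and epoch count arbitrarily. The 17-feature data are synthetic, and `train_regressor` is a hypothetical name.

```python
import torch
from torch import nn


def train_regressor(X_tr, y_tr, hidden=16, epochs=200, lr=1e-2, seed=0):
    """Train Linear -> ReLU -> Linear with full-batch Adam; return model and loss history."""
    torch.manual_seed(seed)
    model = nn.Sequential(nn.Linear(X_tr.shape[1], hidden), nn.ReLU(), nn.Linear(hidden, 1))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    history = []
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X_tr), y_tr)
        loss.backward()
        opt.step()
        history.append(loss.item())
    return model, history


# Synthetic stand-in for 17 normalized attributes and a runoff target.
torch.manual_seed(0)
X = torch.randn(500, 17)
y = X[:, :3].sum(dim=1, keepdim=True) + 0.1 * torch.randn(500, 1)

# Random 80/20 split of the samples.
perm = torch.randperm(500)
n_train = int(0.8 * 500)
train_idx, test_idx = perm[:n_train], perm[n_train:]

model, history = train_regressor(X[train_idx], y[train_idx])
with torch.no_grad():
    train_mse = nn.functional.mse_loss(model(X[train_idx]), y[train_idx]).item()
    test_mse = nn.functional.mse_loss(model(X[test_idx]), y[test_idx]).item()
print(train_mse, test_mse)
```

Reporting `train_mse` next to `test_mse` shows the generalization gap the problem asks about; with real data, any normalization statistics should be computed on the training basins only.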

(b) Write a two-layer MLP as an autoencoder for the 17 catchment attributes (not including runoff) with a hidden size of 4 or 6. What reconstruction error do you get for these two setups?
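
The two autoencoder setups in part (b) can be compared with a sketch like the one below. It assumes an encoder of Linear(17, hidden) with a tanh activation, a Linear(hidden, 17) decoder, and mean squared reconstruction error; none of these choices come from the problem statement. The data are synthetic, with five latent factors, and `train_autoencoder` is a hypothetical name.

```python
import torch
from torch import nn


def train_autoencoder(X_tr, X_te, hidden, epochs=300, lr=1e-2, seed=0):
    """Train a 17 -> hidden -> 17 autoencoder; return (train_mse, test_mse)."""
    torch.manual_seed(seed)
    n = X_tr.shape[1]
    model = nn.Sequential(nn.Linear(n, hidden), nn.Tanh(), nn.Linear(hidden, n))
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X_tr), X_tr)  # the input is also the target
        loss.backward()
        opt.step()
    with torch.no_grad():
        return loss_fn(model(X_tr), X_tr).item(), loss_fn(model(X_te), X_te).item()


# Synthetic stand-in for 17 attributes driven by 5 latent factors, then z-scored.
torch.manual_seed(0)
latent = torch.randn(500, 5)
X = latent @ torch.randn(5, 17) + 0.1 * torch.randn(500, 17)
X = (X - X.mean(0)) / X.std(0)
X_tr, X_te = X[:400], X[400:]

errors = {h: train_autoencoder(X_tr, X_te, hidden=h) for h in (4, 6)}
for h, (tr, te) in errors.items():
    print(h, tr, te)
```

Because the inputs are z-scored, an error near 1.0 means the bottleneck recovers almost nothing, and an error near 0 means near-perfect reconstruction, which makes the two hidden sizes directly comparable.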

