
Date: 2022-11-19 03:55


SDSC4001 (Semester B, 2022)

Foundation of Reinforcement Learning

Assignment 2

All questions are weighted equally. For questions that require Python, please submit the .py file, not a screenshot of the code.

Question 1. Use the idea of the Bellman operator T_π to solve the following fixed-point equation:

    v_π = P_π v_π,

where v_π ∈ R^3 and P_π is a 3×3 matrix whose entries were garbled in extraction (the surviving fragments are "3/4 1/4 0" and "1/4 3/4"). By applying T_π iteratively, solve the above problem (in Python) for all combinations of γ = 0.9, 0.999 and stopping tolerance ε = 0.5, 0.01. What is the difference in terms of convergence? Discuss your results and please submit your Python code.
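Applying T_π iteratively means repeating v ← r + γ P v until successive iterates change by less than the tolerance. Below is a minimal sketch of that loop. Because the assignment's matrix is garbled in this copy, the matrix P and reward vector r here are placeholders (an assumed tridiagonal stochastic matrix and r = 1); substitute the actual values from the assignment before use.

```python
import numpy as np

# Placeholder data: the assignment's 3x3 matrix was garbled in extraction,
# so P below is only an assumed stochastic matrix, and r an assumed reward.
P = np.array([[0.75, 0.25, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.25, 0.75]])
r = np.ones(3)

def bellman_solve(P, r, gamma, eps, max_iter=100_000):
    """Iterate v <- r + gamma * P v until the sup-norm change drops below eps."""
    v = np.zeros(len(r))
    for k in range(max_iter):
        v_new = r + gamma * P @ v
        if np.max(np.abs(v_new - v)) < eps:
            return v_new, k + 1   # converged value and iteration count
        v = v_new
    return v, max_iter

for gamma in (0.9, 0.999):
    for eps in (0.5, 0.01):
        v, iters = bellman_solve(P, r, gamma, eps)
        print(f"gamma={gamma}, eps={eps}: {iters} iterations, v={v}")
```

Since T_π is a γ-contraction, the error shrinks by a factor γ per iteration, so larger γ and smaller ε should both require noticeably more iterations.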

Question 2. As a stock enthusiast, Tom decides to participate in the stock market and make extremely risky investments every Friday.

He plans to start with an initial budget of $300, and he will stop when he either (i) loses it all or (ii) at least doubles his money (i.e. reaches ≥ $600).

On each Friday, he would invest in expiring option contracts that either (a) give him an immediate reward at the end of that day or (b) lose all the money he invested that day.

He has two strategies. On each Friday, he would choose either

Strategy A: Invest $100. With probability 0.45, it will return $200 (i.e. net gain $100), and $0 (i.e. net gain −$100) otherwise.

Strategy B: Invest $100. With probability 0.4, it will return $300 (i.e. net gain $200), and $0 (i.e. net gain −$100) otherwise.


His discount factor is γ = 1 (or you may use γ = 0.99999 as an approximation for computation).

Based on the above information,

(a) Model this problem as an MDP. Define the state space, action space, reward function, and the transition kernel.

(b) Suppose Tom chooses one strategy (A or B) at the beginning and sticks with it until the process stops. His good friend, Pete, claims that Strategy A is the better choice. Do you agree with him? Explain your answer. You may use Python to help with the computations; in that case, please submit your Python code as part of your submission, and include the code that will print/display your result(s).
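One way to compare the two fixed strategies is iterative policy evaluation on the induced Markov chain. The sketch below is one reasonable formalization, not necessarily the one the grader expects: states are wealth levels $0, $100, ..., with $0 and ≥ $600 absorbing, and the per-Friday reward is the net gain.

```python
import numpy as np

GAMMA = 0.99999  # the assignment's suggested approximation of gamma = 1

def evaluate(strategy, gamma=GAMMA, tol=1e-10):
    """Expected discounted total net gain of always playing one strategy.

    States index wealth in units of $100: 0..7, where 0 and >= 6 ($600)
    are absorbing (wealth can overshoot to $700 under Strategy B).
    """
    outcomes = {"A": [(0.45, 100), (0.55, -100)],   # (probability, net gain)
                "B": [(0.40, 200), (0.60, -100)]}[strategy]
    v = np.zeros(8)
    while True:
        v_new = v.copy()
        for s in range(1, 6):                       # non-absorbing: $100..$500
            v_new[s] = sum(p * (g + gamma * v[min(max(s + g // 100, 0), 7)])
                           for p, g in outcomes)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new

vA, vB = evaluate("A"), evaluate("B")
print(f"V_A($300) = {vA[3]:.2f}, V_B($300) = {vB[3]:.2f}")
```

Intuitively, each bet under A has negative expected net gain (0.45·100 − 0.55·100 = −$10) while each bet under B has positive expected net gain (0.4·200 − 0.6·100 = +$20), so the evaluation should favor B.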

Note: Questions 3 and 4 require the book “Reinforcement Learning: An Introduction” (2nd edition), which can be found here: http://incompleteideas.net/book/RLbook2018.pdf

Question 3. Choose either (a) or (b) below.

(a) Write Python code to reproduce the right figure in Example 6.2 (Random Walk) on page 125

from the book. Please submit your code.

(b) Write Python code to reproduce the lower figure in Example 6.6 (Cliff Walking) on page 132

from the book. Please submit your code.
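For option (a), the core of the reproduction is TD(0) prediction on the five-state random walk. The sketch below shows one run of that core loop under my reading of the example (states A–E, start at C, reward +1 only on the right exit, true values 1/6..5/6); the actual figure additionally averages the RMS error over many runs and step sizes, which is left to the reader.

```python
import random

def td0_random_walk(episodes=100, alpha=0.1, seed=0):
    """TD(0) value estimates for the 5-state random walk (states A..E)."""
    rng = random.Random(seed)
    V = [0.5] * 5                    # initial value 0.5 per the example
    for _ in range(episodes):
        s = 2                        # every episode starts in C
        while True:
            s2 = s + rng.choice((-1, 1))
            if s2 == -1:             # left terminal: reward 0
                V[s] += alpha * (0 - V[s]); break
            if s2 == 5:              # right terminal: reward +1
                V[s] += alpha * (1 - V[s]); break
            V[s] += alpha * (0 + V[s2] - V[s])   # reward 0 in the interior
            s = s2
    return V

print(td0_random_walk())
```

Plotting these estimates after 0, 1, 10, and 100 episodes against the true values 1/6, ..., 5/6 reproduces the left panel; the right panel plots RMS error versus episodes for several α values.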

Question 4. Write Python code to reproduce Figure 13.1 on page 328 from the book (softmax

policy is used). Please submit your code.
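A minimal sketch of REINFORCE with a softmax policy on the short-corridor gridworld of that chapter is below, under stated simplifications: a single preference parameter θ for "right" (rather than the book's feature-vector parameterization), one run with one step size, and γ = 1. The actual figure, as I recall it, averages total reward per episode over many runs for step sizes around 2^−12 to 2^−14.

```python
import math, random

def run_episode(theta, rng, max_steps=1000):
    """Short corridor: states 0,1,2, terminal 3; state 1 reverses actions."""
    s, traj = 0, []
    for _ in range(max_steps):
        p_right = 1 / (1 + math.exp(-theta))     # softmax over 2 actions
        a = 1 if rng.random() < p_right else -1  # +1 = right, -1 = left
        traj.append(a)
        move = -a if s == 1 else a               # state 1 is reversed
        s = max(0, s + move)                     # left wall at state 0
        if s == 3:
            break
    return traj                                  # reward is -1 per step

def reinforce(episodes=200, alpha=2**-12, seed=0):
    rng, theta = random.Random(seed), 0.0
    totals = []
    for _ in range(episodes):
        traj = run_episode(theta, rng)
        T = len(traj)
        totals.append(-T)                        # total reward of episode
        for t, a in enumerate(traj):
            G = -(T - t)                         # undiscounted return from t
            p_right = 1 / (1 + math.exp(-theta))
            grad = (1 - p_right) if a == 1 else -p_right  # d/dtheta ln pi(a)
            theta += alpha * G * grad            # REINFORCE update
    return totals

print(reinforce()[-1])
```

To approach the figure, average `totals` over on the order of 100 seeded runs per step size and plot total reward against episode number.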


