
Date: 2022-11-19 03:55


SDSC4001 (Semester B, 2022)

Foundation of Reinforcement Learning

Assignment 2

All questions are weighted equally. For questions that require Python, please submit the .py file, not a screenshot of the code.

Question 1. Use the idea of the Bellman operator T_π to solve the following fixed-point equation:

    v_π = P_π v_π,

where v_π ∈ R^3 and P_π is a 3×3 matrix whose entries were garbled in extraction (the surviving fragments are "3/4 1/4 0" and "1/4 3/4"). By applying T_π iteratively, solve the above problem (in Python) for all combinations of γ = 0.9, 0.999 and stopping tolerance ε = 0.5, 0.01. What is the difference in terms of convergence? Discuss your results and please submit your Python code.
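Applying T_π iteratively means repeating v ← r + γ P v until successive iterates change by less than the tolerance. Below is a minimal sketch of that loop. Because the assignment's matrix is garbled in this copy, the matrix P and reward vector r here are placeholders (an assumed tridiagonal stochastic matrix and r = 1); substitute the actual values from the assignment before use.

```python
import numpy as np

# Placeholder data: the assignment's 3x3 matrix was garbled in extraction,
# so P below is only an assumed stochastic matrix, and r an assumed reward.
P = np.array([[0.75, 0.25, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.25, 0.75]])
r = np.ones(3)

def bellman_solve(P, r, gamma, eps, max_iter=100_000):
    """Iterate v <- r + gamma * P v until the sup-norm change drops below eps."""
    v = np.zeros(len(r))
    for k in range(max_iter):
        v_new = r + gamma * P @ v
        if np.max(np.abs(v_new - v)) < eps:
            return v_new, k + 1   # converged value and iteration count
        v = v_new
    return v, max_iter

for gamma in (0.9, 0.999):
    for eps in (0.5, 0.01):
        v, iters = bellman_solve(P, r, gamma, eps)
        print(f"gamma={gamma}, eps={eps}: {iters} iterations, v={v}")
```

Since T_π is a γ-contraction, the error shrinks by a factor γ per iteration, so larger γ and smaller ε should both require noticeably more iterations.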

Question 2. As a stock enthusiast, Tom decides to participate in the stock market and make extremely risky investments every Friday.

He plans to start with an initial budget of $300, and he will stop when he either (i) loses it all or (ii) at least doubles his money (i.e. reaches ≥ $600).

On each Friday, he would invest in expiring option contracts that either (a) give him an immediate reward at the end of that day or (b) lose all the money he invested that day.

He has two strategies. On each Friday, he would choose either

Strategy A: Invest $100. With probability 0.45, it will return $200 (i.e. net gain $100), and $0 (i.e. net gain −$100) otherwise.

Strategy B: Invest $100. With probability 0.4, it will return $300 (i.e. net gain $200), and $0 (i.e. net gain −$100) otherwise.


His discount factor is γ = 1 (or you may use γ = 0.99999 as an approximation for computation).

Based on the above information,

(a) Model this problem as an MDP. Define the state space, action space, reward function, and the transition kernel.

(b) Suppose Tom chooses one strategy (A or B) at the beginning and sticks with it until the process stops. His good friend, Pete, claims that Strategy A is the better choice. Do you agree with him? Explain your answer. You may use Python to help with the computations; in that case, please submit your Python code as part of your submission, and include the code that will print/display your result(s).
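One way to compare the two fixed strategies is iterative policy evaluation on the induced Markov chain. The sketch below is one reasonable formalization, not necessarily the one the grader expects: states are wealth levels $0, $100, ..., with $0 and ≥ $600 absorbing, and the per-Friday reward is the net gain.

```python
import numpy as np

GAMMA = 0.99999  # the assignment's suggested approximation of gamma = 1

def evaluate(strategy, gamma=GAMMA, tol=1e-10):
    """Expected discounted total net gain of always playing one strategy.

    States index wealth in units of $100: 0..7, where 0 and >= 6 ($600)
    are absorbing (wealth can overshoot to $700 under Strategy B).
    """
    outcomes = {"A": [(0.45, 100), (0.55, -100)],   # (probability, net gain)
                "B": [(0.40, 200), (0.60, -100)]}[strategy]
    v = np.zeros(8)
    while True:
        v_new = v.copy()
        for s in range(1, 6):                       # non-absorbing: $100..$500
            v_new[s] = sum(p * (g + gamma * v[min(max(s + g // 100, 0), 7)])
                           for p, g in outcomes)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new
        v = v_new

vA, vB = evaluate("A"), evaluate("B")
print(f"V_A($300) = {vA[3]:.2f}, V_B($300) = {vB[3]:.2f}")
```

Intuitively, each bet under A has negative expected net gain (0.45·100 − 0.55·100 = −$10) while each bet under B has positive expected net gain (0.4·200 − 0.6·100 = +$20), so the evaluation should favor B.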

Note: Questions 3 and 4 require the book “Reinforcement Learning: An Introduction” (2nd edition), which can be found here: http://incompleteideas.net/book/RLbook2018.pdf

Question 3. Choose either (a) or (b) below.

(a) Write Python code to reproduce the right figure in Example 6.2 (Random Walk) on page 125

from the book. Please submit your code.

(b) Write Python code to reproduce the lower figure in Example 6.6 (Cliff Walking) on page 132

from the book. Please submit your code.
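For option (a), the core of the reproduction is TD(0) prediction on the five-state random walk. The sketch below shows one run of that core loop under my reading of the example (states A–E, start at C, reward +1 only on the right exit, true values 1/6..5/6); the actual figure additionally averages the RMS error over many runs and step sizes, which is left to the reader.

```python
import random

def td0_random_walk(episodes=100, alpha=0.1, seed=0):
    """TD(0) value estimates for the 5-state random walk (states A..E)."""
    rng = random.Random(seed)
    V = [0.5] * 5                    # initial value 0.5 per the example
    for _ in range(episodes):
        s = 2                        # every episode starts in C
        while True:
            s2 = s + rng.choice((-1, 1))
            if s2 == -1:             # left terminal: reward 0
                V[s] += alpha * (0 - V[s]); break
            if s2 == 5:              # right terminal: reward +1
                V[s] += alpha * (1 - V[s]); break
            V[s] += alpha * (0 + V[s2] - V[s])   # reward 0 in the interior
            s = s2
    return V

print(td0_random_walk())
```

Plotting these estimates after 0, 1, 10, and 100 episodes against the true values 1/6, ..., 5/6 reproduces the left panel; the right panel plots RMS error versus episodes for several α values.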

Question 4. Write Python code to reproduce Figure 13.1 on page 328 from the book (softmax

policy is used). Please submit your code.
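A minimal sketch of REINFORCE with a softmax policy on the short-corridor gridworld of that chapter is below, under stated simplifications: a single preference parameter θ for "right" (rather than the book's feature-vector parameterization), one run with one step size, and γ = 1. The actual figure, as I recall it, averages total reward per episode over many runs for step sizes around 2^−12 to 2^−14.

```python
import math, random

def run_episode(theta, rng, max_steps=1000):
    """Short corridor: states 0,1,2, terminal 3; state 1 reverses actions."""
    s, traj = 0, []
    for _ in range(max_steps):
        p_right = 1 / (1 + math.exp(-theta))     # softmax over 2 actions
        a = 1 if rng.random() < p_right else -1  # +1 = right, -1 = left
        traj.append(a)
        move = -a if s == 1 else a               # state 1 is reversed
        s = max(0, s + move)                     # left wall at state 0
        if s == 3:
            break
    return traj                                  # reward is -1 per step

def reinforce(episodes=200, alpha=2**-12, seed=0):
    rng, theta = random.Random(seed), 0.0
    totals = []
    for _ in range(episodes):
        traj = run_episode(theta, rng)
        T = len(traj)
        totals.append(-T)                        # total reward of episode
        for t, a in enumerate(traj):
            G = -(T - t)                         # undiscounted return from t
            p_right = 1 / (1 + math.exp(-theta))
            grad = (1 - p_right) if a == 1 else -p_right  # d/dtheta ln pi(a)
            theta += alpha * G * grad            # REINFORCE update
    return totals

print(reinforce()[-1])
```

To approach the figure, average `totals` over on the order of 100 seeded runs per step size and plot total reward against episode number.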


