联系方式

  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp

您当前位置:首页 >> Python编程Python编程

日期:2019-12-20 09:08

(Exercise 9) For this week’s exercise I will provide some code below to help you reach

Part II. Recall the Gambler’s Ruin problem on the integers {0, 1, 2, . . . , 10} with bets

at each stage of an amount a ∈ [1, 2, . . . , x], and a fixed (independent) probability

p of winning each bet. A very useful way to formulate Gambler’s Ruin has already

been seen in the notes:

? to have no rewards at all, because instead we . . .

? place a boundary condition on the value function, essentially enforcing that

Vn(x) = 1 for any x ≥ 10. (This is like receiving a reward of +1 after reaching

an x ≥ 10.)

? Note the reward is obtained after moving, not during the action, so steprewards

are 0 and also independent of the action taken.

What we therefore do is as follows: let VT (x) represent the maximized reward

over T steps following an optimal betting strategy starting with wealth x. Then if

during our T steps we reach a state ≥ 10 we will have collected reward of +1, if

we haven’t reached such a state (including all cases where we have gone bankrupt

already) we have a reward of 0. As discussed in lectures this approach will maximize

the probability of reaching such a state ≥ 10.

In part (h) of Exercise 9 you should use the following ‘shell’ for a function:

findValueFunction <? function ( T , p ) {

steps <? 0

# In s e r t your code here c rea ting V

# In s e r t your code here c rea ting W, pu t ting . . .

# in boundary values

while ( steps < T ) {

# In s e r t code here f o r pu t ting in . . .

# boundary values o f W

# In s e r t loop code here f o r pu t ting . . .

# values i n t o W using Op timali ty . . .

# Equation , V , and also s t o rin g the . . .

# best choice a? f o r each s ta te x

V <? W steps <? steps + 1}

return ( l i s t ( value=W, act ions=<your . best . action . vector > )}

findValueFunction(100,0.4), for example, will now work out V100(x) when p = 0.4.

Can you see why? It even tells you which action to optimally take initially.

(The idea here is that the code increments t from 1, with V always containing

Vt?1 and being used (at each x) to work out Vt which is then stored in W. Then t

is incremented, W stored into V and the process repeated. The value of t although

not explicit, is always steps+1.)

3-30

Exercise 9 (Al-Khwarizmi). Part I: Solving Gambler’s Ruin.

Our value function for VT (x) satisfies this optimality equation:

VT (x) = max

1≤a≤x

pVT ?1(x + a) + (1 ? p)VT ?1(x ? a),

which holds for all T ≥ 1, and x ∈ {1, 2, 3, . . . , 9}. We also have boundary conditions:

? For all T ≥ 0: VT (0) = 0 and VT (x) = 1 for any x ≥ 10;

? V0(x) = 0 for x ≤ 9;

You will use this Optimality Equation (for each x ∈ {1, 2, . . . , 9}) to calculate Vn

from Vn?1 for each n ≥ 1 until you have found V1000(x). (I suggest you fix p = 0.4

until Part II). Parts (a)-(b)-(c) are intended to be very quick, the exercise really starts

at (d).

(a) Why is X = {0, 1, 2, . . . , 18} the full set of reachable state while following the

game rules? For which values of x ∈ X do you already explicilty know the values

of Vn(x), for every n ≥ 0? (i.e. boundary conditions)

(b) Create a vector, V of length 19, to hold the values of V0(·). Note that in R, the

first element of a vector is called element 1, so V[i ] will need to contain V0(i?1).

(c) Create another vector, W, this time holding the values of V1(·). Only insert

values from the boundary conditions. Values not known from the boundary

conditions should be set equal to NA.

(d) Define a function of x, a, p and VT ?1 which calculates the right-hand side (RHS)

bracket of the optimality equation for the given values.

(e) Identify the valid choices a ∈ Ax. Construct a loop over all valid (x, a) combinations

(we only need 1 ≤ x ≤ 9, and I suggest an outer loop over x, and inner

loop over a). Test it by printing all (x, a) pairs.

(f) Inside the loop, add a step to call the function defined in (d) for each a ∈ Ax in

order to find the best choice a at each x. Then for each x ∈ {1, 2, . . . , 9} store

the best choice of a in some vector you create.

(g) In your loop above, use the the vector W to store the values of VT (x) calculated

as the best value of the RHS of the optimality equation when given x and VT ?1.

(You can use V to store VT ?1, and W for VT .)

(h) Copy your code above inside the function shell described in the accompanying

purple block from the notes (see previous page) and ...

Part II: Use your value function function to answer questions... most importantly

comment on the answers!

? Find V1(x), V2(x) and V3(x) for x ∈ {1, 2, . . . , 9}, when p = 0.6. Also finding

the optimal first initial actions.

? How does V1000(5) vary for p ∈ { 110 ,13,12,34}?

? What is the initial optimal betting action for each state when p = 0.4 and

T = 20?

? How do V1000(x) and V999(x) compare?

? Feel free to answer other questions you pose yourself too!

3-31


版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。 站长地图

python代写
微信客服:codinghelp