(Exercise 9) For this week’s exercise I will provide some code below to help you reach
Part II. Recall the Gambler’s Ruin problem on the integers {0, 1, 2, . . . , 10} with bets
at each stage of an amount a ∈ [1, 2, . . . , x], and a fixed (independent) probability
p of winning each bet. A very useful way to formulate Gambler’s Ruin has already
been seen in the notes:
• to have no rewards at all, because instead we . . .
• place a boundary condition on the value function, essentially enforcing that
Vn(x) = 1 for any x ≥ 10. (This is like receiving a reward of +1 after reaching
an x ≥ 10.)
• Note the reward is obtained after moving, not during the action, so steprewards
are 0 and also independent of the action taken.
What we therefore do is as follows: let VT (x) represent the maximized reward
over T steps following an optimal betting strategy starting with wealth x. Then if
during our T steps we reach a state ≥ 10 we will have collected reward of +1, if
we haven’t reached such a state (including all cases where we have gone bankrupt
already) we have a reward of 0. As discussed in lectures this approach will maximize
the probability of reaching such a state ≥ 10.
In part (h) of Exercise 9 you should use the following ‘shell’ for a function:
findValueFunction <− function ( T , p ) {
steps <− 0
# In s e r t your code here c rea ting V
# In s e r t your code here c rea ting W, pu t ting . . .
# in boundary values
while ( steps < T ) {
# In s e r t code here f o r pu t ting in . . .
# boundary values o f W
# In s e r t loop code here f o r pu t ting . . .
# values i n t o W using Op timali ty . . .
# Equation , V , and also s t o rin g the . . .
# best choice a∗ f o r each s ta te x
V <− W steps <− steps + 1}
return ( l i s t ( value=W, act ions=<your . best . action . vector > )}
findValueFunction(100,0.4), for example, will now work out V100(x) when p = 0.4.
Can you see why? It even tells you which action to optimally take initially.
(The idea here is that the code increments t from 1, with V always containing
Vt−1 and being used (at each x) to work out Vt which is then stored in W. Then t
is incremented, W stored into V and the process repeated. The value of t although
not explicit, is always steps+1.)
3-30
Exercise 9 (Al-Khwarizmi). Part I: Solving Gambler’s Ruin.
Our value function for VT (x) satisfies this optimality equation:
VT (x) = max
1≤a≤x
pVT −1(x + a) + (1 − p)VT −1(x − a),
which holds for all T ≥ 1, and x ∈ {1, 2, 3, . . . , 9}. We also have boundary conditions:
• For all T ≥ 0: VT (0) = 0 and VT (x) = 1 for any x ≥ 10;
• V0(x) = 0 for x ≤ 9;
You will use this Optimality Equation (for each x ∈ {1, 2, . . . , 9}) to calculate Vn
from Vn−1 for each n ≥ 1 until you have found V1000(x). (I suggest you fix p = 0.4
until Part II). Parts (a)-(b)-(c) are intended to be very quick, the exercise really starts
at (d).
(a) Why is X = {0, 1, 2, . . . , 18} the full set of reachable state while following the
game rules? For which values of x ∈ X do you already explicilty know the values
of Vn(x), for every n ≥ 0? (i.e. boundary conditions)
(b) Create a vector, V of length 19, to hold the values of V0(·). Note that in R, the
first element of a vector is called element 1, so V[i ] will need to contain V0(i−1).
(c) Create another vector, W, this time holding the values of V1(·). Only insert
values from the boundary conditions. Values not known from the boundary
conditions should be set equal to NA.
(d) Define a function of x, a, p and VT −1 which calculates the right-hand side (RHS)
bracket of the optimality equation for the given values.
(e) Identify the valid choices a ∈ Ax. Construct a loop over all valid (x, a) combinations
(we only need 1 ≤ x ≤ 9, and I suggest an outer loop over x, and inner
loop over a). Test it by printing all (x, a) pairs.
(f) Inside the loop, add a step to call the function defined in (d) for each a ∈ Ax in
order to find the best choice a at each x. Then for each x ∈ {1, 2, . . . , 9} store
the best choice of a in some vector you create.
(g) In your loop above, use the the vector W to store the values of VT (x) calculated
as the best value of the RHS of the optimality equation when given x and VT −1.
(You can use V to store VT −1, and W for VT .)
(h) Copy your code above inside the function shell described in the accompanying
purple block from the notes (see previous page) and ...
Part II: Use your value function function to answer questions... most importantly
comment on the answers!
• Find V1(x), V2(x) and V3(x) for x ∈ {1, 2, . . . , 9}, when p = 0.6. Also finding
the optimal first initial actions.
• How does V1000(5) vary for p ∈ { 110 ,13,12,34}?
• What is the initial optimal betting action for each state when p = 0.4 and
T = 20?
• How do V1000(x) and V999(x) compare?
• Feel free to answer other questions you pose yourself too!
3-31
版权所有:留学生编程辅导网 2020 All Rights Reserved 联系方式:QQ:99515681 微信:codinghelp 电子信箱:99515681@qq.com
免责声明:本站部分内容从网络整理而来,只供参考!如有版权问题可联系本站删除。