ECE 277, WINTER 2024
GPU Programming
LAB 3: Reinforcement learning: Q-learning (Multi-Agent, CUDA multithreads)
This lab requires you to design multiple agents that interact with the environment using the reinforcement learning algorithm in Figure 1. Specifically, you need to design multiple agents that maximize rewards from the mine game environment using Q-learning. The agents should interact with the given mine game environment shown in Figure 2.
Figure 1: Parallel reinforcement learning.
Figure 2: 32x32 mine game environment.
• The number of agents: 128
• Action: right:0, down:1, left:2, up:3
• Reward: flag: +1, mine: -1, otherwise: 0
• State: the (x, y) current position of an agent, in a coordinate system with (0,0) at the top-left corner
• An episode restarts once the number of active agents falls below 20%.
• You need to track which agents are active in each episode. You must prevent inactive agents from taking actions and from updating the Q-table, since the environment returns invalid rewards for inactive agents.
• The initial states of the environment are randomized every episode.
• Environment elements such as the mine distribution and the flag position are randomized every game.
• Agent_i should return action[Agent_i] for its corresponding current state, cstate[Agent_i].
• Agent_i should update the centralized Q-table using its current state, cstate[Agent_i], its next state, nstate[Agent_i], and its reward, rewards[Agent_i].
• All agents share a single centralized Q-table (a centralized-learning, decentralized-execution approach).
• You should initialize the Q-table using a multithreaded CUDA kernel instead of host-side calls such as cudaMemset (see the sketch after this list).
• In the learning environment display, TA is the total number of agents, FA is the percentage of agents that caught the flag, and AA is the percentage of active agents in the current episode.
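As a minimal sketch of the Q-table initialization requirement above, the kernel below zeroes a flat Q-table of 32 x 32 states times 4 actions with one thread per entry. The kernel name, launch configuration, and host wrapper are illustrative assumptions, not part of the provided interface.

#define BOARD_W 32
#define BOARD_H 32
#define NUM_ACTIONS 4

__global__ void init_qtable_kernel(float *qtable, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)
        qtable[idx] = 0.0f;   // every state-action value starts at zero
}

// Hypothetical helper, assuming d_qtable was allocated with cudaMalloc;
// it could be called from agent_init().
void init_qtable(float *d_qtable)
{
    int n = BOARD_W * BOARD_H * NUM_ACTIONS;
    int block = 256;
    int grid = (n + block - 1) / block;
    init_qtable_kernel<<<grid, block>>>(d_qtable, n);
}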
You should not modify any of the given code except CMakeLists, which you edit to register your own files. You only need to add your agent code to the lab project.
You have to use CUDA to program a multi-agent reinforcement learning algorithm.
Interface pointers of all the extern functions are allocated in device (GPU) memory, not CPU memory. The function below is an informative RL environment routine showing when and how the agent functions are called.

extern void agent_init();
extern void agent_init_episode();
extern float agent_adjust_epsilon();
extern short* agent_action(int2* cstate);
extern void agent_update(int2* cstate, int2* nstate, float* rewards);
int qlearningCls::learning(int *board, unsigned int &episode, unsigned int &steps)
{
    if (m_episode == 0 && m_steps == 0) {               // only for first episode
        env.reset(m_sid);
        agent_init();                                    // clear action + init Q-table + self initialization
    } else {
        active_agent = check_status(board, env.m_state, flag_agent);
        if (m_newepisode) {
            env.reset(m_sid);
            agent_init_episode();                        // set all agents to active status
            float epsilon = agent_adjust_epsilon();      // adjust epsilon
            m_steps = 0;
            printf("EP=%4d, eps=%4.3f\n", m_episode, epsilon);
            m_episode++;
        } else {
            short *action = agent_action(env.d_state[m_sid]);
            env.step(m_sid, action);
            agent_update(env.d_state[m_sid], env.d_state[m_sid ^ 1], env.d_reward);
            m_sid ^= 1;
            episode = m_episode;
            steps = m_steps;
        }
    }
    m_steps++;
    env.render(board, m_sid);
    return m_newepisode;
}
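For illustration, agent_action() can select moves with an epsilon-greedy rule, one thread per agent. The sketch below is only one possible approach; the curand state array (assumed to be seeded with curand_init during agent_init()), the persistent d_action buffer, and the flat Q-table layout are assumptions, not the provided interface.

#include <curand_kernel.h>

#define NUM_AGENTS 128
#define BOARD_W 32
#define NUM_ACTIONS 4

__global__ void agent_action_kernel(int2 *cstate, short *d_action, float *qtable,
                                    curandState *rng, float epsilon)
{
    int aid = blockIdx.x * blockDim.x + threadIdx.x;
    if (aid >= NUM_AGENTS) return;

    int2 s = cstate[aid];
    int base = (s.y * BOARD_W + s.x) * NUM_ACTIONS;   // flat index of Q(s, a = 0)

    short action;
    if (curand_uniform(&rng[aid]) < epsilon) {
        // explore: pick a uniformly random action
        action = (short)(curand(&rng[aid]) % NUM_ACTIONS);
    } else {
        // exploit: pick the greedy action argmax_a Q(s, a)
        action = 0;
        float best = qtable[base];
        for (int a = 1; a < NUM_ACTIONS; ++a) {
            if (qtable[base + a] > best) { best = qtable[base + a]; action = (short)a; }
        }
    }
    d_action[aid] = action;
}

A host-side agent_action() wrapper would launch this kernel (e.g. one block of 128 threads) and return the device pointer d_action, matching the extern declaration above.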
The provided parameters are just for reference.
γ = 0.9; α = 0.1; 0.1 ≤ ε − δε ≤ 1.0; δε = 0.001
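Using the reference parameters above (assuming γ is the discount factor and α the learning rate), agent_update() can apply the standard Q-learning update Q(s,a) ← Q(s,a) + α (r + γ max_a' Q(s',a') − Q(s,a)). The sketch below uses one thread per agent; the d_action and d_active arrays are assumed to be maintained by the agent code, and inactive agents are skipped as required.

#define NUM_AGENTS 128
#define BOARD_W 32
#define NUM_ACTIONS 4
#define GAMMA 0.9f
#define ALPHA 0.1f

__global__ void agent_update_kernel(int2 *cstate, int2 *nstate, float *rewards,
                                    float *qtable, short *d_action, int *d_active)
{
    int aid = blockIdx.x * blockDim.x + threadIdx.x;
    if (aid >= NUM_AGENTS || !d_active[aid]) return;   // inactive agents must not update the Q-table

    int2 s  = cstate[aid];
    int2 s1 = nstate[aid];
    float r = rewards[aid];

    int cur  = (s.y  * BOARD_W + s.x)  * NUM_ACTIONS + d_action[aid];
    int next = (s1.y * BOARD_W + s1.x) * NUM_ACTIONS;

    // max_a' Q(s', a')
    float best_next = qtable[next];
    for (int a = 1; a < NUM_ACTIONS; ++a)
        best_next = fmaxf(best_next, qtable[next + a]);

    // Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    // Note: several agents may occupy the same cell, so a full solution may need atomics;
    // terminal transitions (flag or mine) would normally drop the gamma * max term.
    qtable[cur] += ALPHA * (r + GAMMA * best_next - qtable[cur]);

    // an agent that hit the flag (+1) or a mine (-1) becomes inactive for the rest of the episode
    if (r != 0.0f) d_active[aid] = 0;
}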
Submit only your agent files to the assignment.
Programming language: CUDA