
Date: 2024-11-21 09:41

ARS - Coursework Guide – 24/25

Version History

1.0 29/09/24 First version.

1.1 12/11/24 Fleshed out marking criteria for task 2 report

Summary

Title: Reinforcement Learning using Gymnasium environments

Hand-in: Programs AND a written report will need to be submitted online via Moodle. Check

the module’s Moodle page for the precise deadline.

Late policy: The coursework deadlines (task 1 and task 2) are absolute. Late submissions are

subject to a 5% deduction of the overall coursework mark per day.

Informal Description

The coursework consists of two tasks as described below. Your aim is to build several reinforcement

learning agents and to design, implement and run several basic research-based experiments. You

will hand in software and a report that discusses your work on these tasks. Briefly, task 1 is about

implementing some basic RL prototypes (with noise injection and basic modularity) for your chosen

environment(s) and identification of key literature, gaps, and research questions, whereas task 2 is

about designing, developing and running experiments based on the research questions identified in

task 1.

Aims and Outcomes

• If you take the labs seriously, at the end of the semester you should be:

o comfortable with implementing and modifying reinforcement learning agents,

o capable of adapting your RL solutions to different kinds of robotic problems with

well-defined states, actions and rewards,

o comfortable with neural network approaches for mapping complex, high-dimensional

states to actions (if you choose to use neural-network-based RL solutions),

o comfortable with setting up experiments pertaining to noise and studying and

mitigating its impact,

o comfortable with designing modular AI solutions,

o capable of scanning the literature in order to understand modern RL techniques, and

incorporating/extending these in your own solutions,

o capable of identifying gaps, and/or weaknesses/limitations in state-of-the-art

research, and using this to define research questions for guiding your research,

o capable of studying and evaluating algorithm performance objectively,

o capable of designing innovative algorithms and experiments, and reporting the

results of these in a clear and well-structured manner.

Rough Timetable

Week  Main Lab        Main activities
1     01/10/24        Getting started. Familiarization with Gymnasium.
2     08/10/24        Task 1.
3     15/10/24        Task 1.
4     22/10/24        Task 1.
5     (28|29)/10/24   Task 1. Demos for task 1 (we may need both Mon. & Tue. slots).
6     05/11/24        Task 2.
7     12/11/24        Task 2.
8     19/11/24        Task 2.
9     26/11/24        Task 2.
10    (02|03)/12/24   Task 2. Demos for task 2 (we may need both Mon. & Tue. slots).

Laboratory notes

• You will work individually.

• We need to start working hard from the very first day to make the most of the lab sessions.

In the first week you will learn the basics of Gymnasium, will experiment with several

environments, and will even try some small heuristics on simple control problems (e.g.

Cart Pole).

• Rough time estimation:

o Total hours: 20 credits ≈ 200 hours

o Subtract lectures (22 hours) and labs (20 hours): 200 - 42 = 158 hours

o Divide the remainder by 12 weeks: 158 / 12 ≈ 13 hours per week for everything

else, e.g.: studying, researching, reading, thinking, coding, testing, analyzing, writing.

Getting Started

Preliminary steps

• Check the following three main Gymnasium resources:

o Farama’s general documentation page for Gymnasium.

o Basic usage page in the above documentation.

o Gymnasium GitHub page – includes installation instructions.

• Install Gymnasium.

• For the purpose of the coursework it is sufficient to work with the “classic control” set of

environments; however, do feel free to install and use other categories of environments (e.g.

MuJoCo and Atari), if you wish.

• Go through the Basic Usage page.

• You can install Gymnasium on your own machines, or in your local directory in UNM’s HPC, or you

can also use Google Colaboratory. Please note that in the past there were ways to render

environments properly in Colab (e.g. have a look at this tutorial); however, this may change

from time to time. For an example of a Jupyter notebook for the cart pole example, refer to

the module’s Moodle page. I suggest not bothering with rendering, except for some

debugging exercises, since performance metrics are the key concern.

• As mentioned, if you want to use any of the MuJoCo environments you can. DeepMind

recently bought MuJoCo and made it open source, which means there are no more licensing

issues. You are not required to use MuJoCo, but if you really want to, you are free to install

it, and get the environments set up.

• To see what environments are available use:

import gymnasium as gym

print(gym.envs.registry.keys())

• To better understand some Gymnasium environments consult this Wiki or scroll to

“environments” in Gymnasium’s GitHub page, and search for your environment. For

example, for the Cart Pole environment, have a look at this page.

Try to come up with some heuristic solutions for Cart Pole

• Try to come up with some simple heuristics to keep the pole up based on your

understanding of the environment. You can start from and modify the (failing) heuristic

example provided in the Moodle page (i.e. sol-H1-cart-pole-v0).

• Difficult? Let's see whether reinforcement learning helps.
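As a starting point for this exercise, here is a minimal sketch of one possible hand-written heuristic. It is not the sol-H1-cart-pole-v0 example from Moodle. It assumes the standard CartPole observation layout (cart position, cart velocity, pole angle, pole angular velocity) and the two discrete actions 0 (push left) and 1 (push right). The names heuristic_action, run_episode and velocity_weight are hypothetical and chosen for illustration only.

```python
"""Hand-written CartPole heuristic (no learning involved)."""


def heuristic_action(observation, velocity_weight=0.5):
    """Return 0 (push left) or 1 (push right) for a CartPole observation.

    Observation layout: [cart position, cart velocity,
    pole angle, pole angular velocity].
    velocity_weight is an illustrative tuning knob (an assumption),
    not a value taken from the coursework material.
    """
    _, _, pole_angle, pole_angular_velocity = observation
    # Push toward the side the pole is leaning or about to lean.
    lean = pole_angle + velocity_weight * pole_angular_velocity
    return 1 if lean > 0 else 0


def run_episode(env_id="CartPole-v1", seed=0, max_steps=500):
    """Run one episode with the heuristic and return the total reward."""
    # Imported lazily so the policy itself can be tested without Gymnasium.
    import gymnasium as gym

    env = gym.make(env_id)
    observation, _ = env.reset(seed=seed)
    total_reward = 0.0
    for _ in range(max_steps):
        action = heuristic_action(observation)
        observation, reward, terminated, truncated, _ = env.step(action)
        total_reward += reward
        if terminated or truncated:
            break
    env.close()
    return total_reward


# Example usage (requires Gymnasium to be installed):
#     print("Total reward:", run_episode())
```

Keeping the policy as a pure function of the observation makes it easy to swap in a learned policy later, and to test the decision rule without creating an environment.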

Have a look at a Q-learning solution

• Example: s1cart-pole-v0-sol1.

• Try to run the code.

• Read the code. Try to understand it as much as possible, although note that it will only fully

make sense once we have done Q-Learning in the lectures.
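While reading the provided solution, it may help to have the core of tabular Q-learning in view. The sketch below is not the code of s1cart-pole-v0-sol1; it is a generic illustration of the standard update rule with epsilon-greedy action selection. The helper names (discretize, epsilon_greedy, q_update, make_q_table) and the default values for bin_width, alpha and gamma are assumptions made for this example.

```python
import random
from collections import defaultdict


def make_q_table(n_actions):
    """Q-table whose unseen states default to all-zero action values."""
    return defaultdict(lambda: [0.0] * n_actions)


def discretize(observation, bin_width=0.25):
    """Map a continuous observation to a hashable tuple of bin indices."""
    return tuple(int(round(value / bin_width)) for value in observation)


def epsilon_greedy(q_table, state, n_actions, epsilon, rng=random):
    """Pick a random action with probability epsilon, else a greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    values = q_table[state]
    return max(range(n_actions), key=lambda a: values[a])


def q_update(q_table, state, action, reward, next_state, done,
             alpha=0.1, gamma=0.99):
    """Apply Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    # No bootstrapping from the next state once the episode has ended.
    best_next = 0.0 if done else max(q_table[next_state])
    target = reward + gamma * best_next
    q_table[state][action] += alpha * (target - q_table[state][action])
```

In a training loop, each environment step would call discretize on the observation, choose an action with epsilon_greedy, and then call q_update with the observed reward and next state.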

Task Description

• Requirements for Task 1:

o Title. Prototypes, literature, gaps, and research questions.

o Prototypes:

▪ Environment selection. Select two environments to work on throughout

the whole assignment. Select one environment from within the control

category (e.g. CartPole-v1) and one environment from any category

(including the control one). Please recall that different environments

may impose significant changes to your reinforcement learning

algorithm since, for example, they may involve continuous action spaces,

or other representational differences. To simplify matters you might

want to constrain yourself to environments with discrete action spaces.

▪ Core method required: reinforcement learning. If you want to use other

methods for other integrated modules, that is fine.

▪ Additional requirements: (1) noise injection at the inputs and/or

outputs, (2) some modularity (e.g. RL component and denoising

component).

▪ Aim: for each environment develop at least one viable proof of concept

based on RL.

o Literature:

▪ Steps:

• Explore the recent RL literature in relation to the topic of noise

and/or modularity.

• Select 1-3 good papers from the date range 2022-2023 and

highlight their gaps (i.e. limitations and/or open

questions/problems). Note that although these 1-3 papers will

be your “core/seed” papers, you should still study the literature

more broadly (i.e. your report should cite other papers apart

from the core papers).

• Select your gaps for further investigation. Justify your choices.

• Design at least 2 research questions based on your selected

gaps.

▪ Aim: clearly outline 1-3 selected papers, overall gaps, selected gaps, and

research questions. Note that it is crucial for the papers, gaps and

research questions to be 100% credible, i.e.: (1) the papers must be

recent and good, (2) the gaps must be genuine open problems, and (3)

the research questions must sit squarely in the gaps and must point in

useful directions.

▪ Constraint 1: Every student must have a different set of core papers

and/or a different set of gaps and/or a different set of research

questions (RQs). Once a student has defined their selected papers, gaps,

and RQs, they must email them to me, in order for me to check and

approve them. Please note that this process will operate on a “first

come first served” basis. Please also note that if two students share the

same papers, they can still be different in terms of the chosen gaps or

RQs, however, it is preferable if all elements are distinct.

▪ Constraint 2: The selected research questions must include, or focus on,

(1) noise, (2) modularity, or (3) both.

• Requirements for Task 2:

o Title. Research questions and experiments.

o Environment selection. You must use the same two environments you selected

for task 1.

o Core method required: reinforcement learning. As before, if you want to use

other methods for other integrated modules, that is fine.

o Goals. Keywords: novel experiments and insights. The aim of this task is for you

to design, develop, run, and analyze experiments that address the research

questions you listed in task 1. The main tasks would be: (1) design experiments

that address the research questions, (2) implement the experiments, (3) debug

and finetune your code, (4) run the experiments and collect results, (5) analyze

the results and assess whether they answered the research questions, (6) either

proceed back to step 1 with adjustments to the experiments/solutions, or

proceed with additional experiments (depending on time and completion

status). Document your findings.

• Requirements for all tasks (i.e. tasks 1 and 2):

o Performance. Define one or more valid performance measures, apart from the

default/compulsory one, i.e. the average number of episodes needed before

learning a problem (see below for more information).

o Evaluation. Run your experiments and report your results for both of your

chosen environments consistently.

o Four I’s. Try to maximize your work along the following dimensions: (1)

informedness (i.e. it is based on a solid understanding of the literature), (2)

innovativeness (i.e. novel), (3) inventiveness (i.e. not technically trivial), (4)

impactfulness (e.g. generates new knowledge).

o Core themes. The core themes for both tasks are: (1) reinforcement learning, (2)

noise, (3) modularity. Please note that the research questions can be exclusively

about noise, or modularity, or both, however, the models must always include

elements of noise and modularity.

• Demo. Show and explain the performance of your solutions, and the results of your

experiments.
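The noise-injection and modularity requirements above can be met in many ways. The following is a minimal sketch of one possible arrangement, assuming Gaussian noise on the inputs, random action replacement on the outputs, and a moving-average filter as the denoising component. All class and function names here are hypothetical, and nothing in the coursework prescribes these particular noise models or this filter.

```python
import random
from collections import deque


class GaussianObservationNoise:
    """Input-noise module: adds zero-mean Gaussian noise to each reading."""

    def __init__(self, sigma, seed=None):
        self.sigma = sigma
        self.rng = random.Random(seed)

    def __call__(self, observation):
        return [value + self.rng.gauss(0.0, self.sigma) for value in observation]


class MovingAverageDenoiser:
    """Denoising module: averages the last `window` observations it received."""

    def __init__(self, window=3):
        self.history = deque(maxlen=window)

    def reset(self):
        """Clear the history; call this at the start of every episode."""
        self.history.clear()

    def __call__(self, observation):
        self.history.append(list(observation))
        count = len(self.history)
        return [sum(column) / count for column in zip(*self.history)]


class ActionNoise:
    """Output-noise module: replaces the action with a random one at a given rate."""

    def __init__(self, n_actions, flip_probability, seed=None):
        self.n_actions = n_actions
        self.flip_probability = flip_probability
        self.rng = random.Random(seed)

    def __call__(self, action):
        if self.rng.random() < self.flip_probability:
            return self.rng.randrange(self.n_actions)
        return action


def modular_step(observation, noise, denoiser, policy, action_noise):
    """Chain the modules: sensor noise -> denoiser -> RL policy -> action noise."""
    return action_noise(policy(denoiser(noise(observation))))
```

Because each module is a separate callable, an experiment can switch the denoiser on or off, or vary sigma and flip_probability, without touching the RL component.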

Performance Evaluation

• Since you will be injecting noise into your sensor data and/or actions, your results are

not directly comparable to solutions on external leaderboards (e.g.:

https://github.com/openai/gym/wiki/Leaderboard). Your focus will be on internal

comparisons (i.e. your own experimental conditions) and innovation.

• One key performance measure that you should recall is the number of episodes required

before solving the problem. In other words, here you are interested in the speed of

learning. Care must be taken in being explicit and consistent regarding what constitutes

having solved the problem.
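One way to make the "episodes before solving" measure explicit is sketched below. It assumes that "solved" means the mean return over the last `window` episodes reaches a chosen `threshold`; both values are assumptions that must be stated and kept consistent in your own experiments. The function names episodes_to_solve and summarize_runs are hypothetical.

```python
from statistics import mean, stdev


def episodes_to_solve(episode_returns, threshold, window=100):
    """Return the 1-based episode at which the problem is first 'solved'.

    'Solved' here means the mean return over the last `window` episodes
    is at least `threshold`. Returns None if that never happens.
    """
    running_sum = 0.0
    for index, episode_return in enumerate(episode_returns):
        running_sum += episode_return
        if index >= window:
            # Drop the return that just left the sliding window.
            running_sum -= episode_returns[index - window]
        if index + 1 >= window and running_sum / window >= threshold:
            return index + 1
    return None


def summarize_runs(runs, threshold, window=100):
    """Average episodes-to-solve over independent runs (e.g. different seeds)."""
    solved = [episodes_to_solve(r, threshold, window) for r in runs]
    solved = [s for s in solved if s is not None]
    return {
        "n_runs": len(runs),
        "n_solved": len(solved),
        "mean_episodes": mean(solved) if solved else None,
        "std_episodes": stdev(solved) if len(solved) > 1 else None,
    }
```

Reporting n_solved alongside the mean matters: runs that never reach the threshold are excluded from the average, so the mean alone can look better than the agent really is.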

Assessment – Overall

Each component is listed with its marks (out of 100), a description, and the main criteria.

Task 1 - demo (5 marks)
  Description: Demo of work so far.
  Main criteria: Evidence of understanding of the base code. Evidence of solid understanding of literature, gaps, questions, and innovation.

Task 1 - report (20 marks)
  Description: Report (1-2 pages) summarizing task 1.
  Main criteria: Are the core papers (1-3) well explained? Are the overall gaps well identified and explained? Are the selected gaps justified properly? Are the research questions grounded in the gaps, and are they clear, concrete, and heading in the right direction?

Task 2 - demo (5 marks)
  Description: Demo of work so far.
  Main criteria: Evidence of understanding of the base code. Good explanation of gaps, question, experimental design, results, analyses, and conclusions. Solid argumentation vis-à-vis the 4 I’s. Strong justifications and arguments. Clear communication.

Task 2 - paper (50 marks)
  Description: Mini-conference paper (4 pages) summarizing all of the work done on both tasks.
  Main criteria: Are the structure, grammar and argumentation of the paper/report good? Are the introduction, background, methods, results and analyses clear, comprehensive and insightful? Does the paper show critical and creative thinking?

Task 2 - software (20 marks)
  Description: Multiple files organized with a clear structure.
  Main criteria: Is the code complete? Is the code well-designed, clean, elegant, and well commented? Is the code complex/challenging enough?

Assessment Criteria for the Report (task 1) and Paper (task 2)

• 1st an excellent, well-written report/paper demonstrating extensive understanding and

good insight.

• 2:1 a comprehensive, well-written report/paper demonstrating thorough understanding and

some insight.

• 2:2 a competent report/paper demonstrating good understanding of the implementation.

• 3rd an adequate report/paper covering all specified topics at a basic level of understanding.

• F an inadequate report/paper failing to cover the specified topics.

Report guide (task 1)

• The report for task 1 has no fixed format, as long as it is well structured and well organized.

The only constraint is that it should be 1-2 pages long. No appendices are allowed, and to be

fair to all, no material on page 3 onwards (if you exceed 2 pages) will be included in the

assessment. The font size of the main text should not be smaller than 11.

• This report will exclusively focus on: (1) a very brief summary of your prototypes, (2) brief

summaries of your selected core papers, and why they were chosen, (3) lengthier

explanations on the weaknesses/gaps of the papers, (4) an explanation and justification of

your selected gaps, and (5) an explanation and justification of your research questions, and

how they are grounded in the gaps.

Paper Guide (task 2)

You should design your final report as a conference paper. The paper should contain:

• [8 marks] Introduction (about 1 page). Brief explanation of the motivation and main

concepts, a problem statement, an extremely brief overview of the key papers and their

gaps, the research questions, and a brief summary of your main contributions. Key marking

criteria: (1) Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation,

(5) Insightfulness, (6) Critical and creative thinking.

• [8 marks] Background (about 0.5 pages). Brief overview of the field and the key papers

closely related to your work (this will include the core 1-3 papers and other relevant papers).

The core selected papers with their gaps, and why they were chosen, must be

clearly explained. Key marking criteria: (1) Structure and grammar, (2) Clarity, (3)

Comprehensiveness, (4) Argumentation, (5) Insightfulness, (6) Critical and creative thinking.

• [8 marks] Methods (about 1 page). A detailed and concise description of how you

implemented task 2 (e.g. algorithms and experimental design). Key marking criteria: (1)

Structure and grammar, (2) Clarity, (3) Comprehensiveness, (4) Argumentation.

• [10 marks] Results (about 1 page). An overview of your key results encompassing

performance measures and other results leading to insights about the problem and/or your

solutions. Key marking criteria: (1) Structure and grammar, (2) Clarity, (3)

Comprehensiveness, (4) Argumentation, (5) Insightfulness.

• [10 marks] Discussion (about 0.5 pages). Your interpretation of the results, your conclusions,

and proposed future work. Key marking criteria: (1) Structure and grammar, (2) Clarity, (3)

Comprehensiveness, (4) Argumentation, (5) Insightfulness, (6) Critical and creative thinking.

• [6 marks] References & Appendices (not included in the word count). Key marking criteria:

(1) Consistency of references, (2) Comprehensiveness of references, (3) Structure and clarity

of appendices, (4) Insightfulness of appendices.

Note: Writing a concise report/paper is a core part of the assignment. The total number of pages for

your paper (i.e. main sections, excluding references and Appendices) cannot exceed 4 pages (with a

minimum page margin of 2.5cm on each side), using single line spacing, a two-column format, and a

minimum font size of 11.

