Assignment 2: Monte-Carlo (MC) Learning





Reinforcement Learning

1 Introduction
The goal of this assignment is to experiment with Monte-Carlo (MC) learning and Temporal-Difference (TD) learning. MC and TD methods learn directly from episodes of experience, without knowledge of the MDP model. A TD method can learn after every step, while an MC method requires a complete episode before it can update its value estimates. Your goal is to implement MC and TD methods and test them in the small gridworld.
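
For reference, the standard incremental update rules (with step size α, discount γ, and return G_t; these symbols are not defined in the assignment itself) are:

  MC:    V(s_t) ← V(s_t) + α [G_t − V(s_t)]
  TD(0): V(s_t) ← V(s_t) + α [r_{t+1} + γ V(s_{t+1}) − V(s_t)]

The MC target G_t is only known once the episode ends, whereas the TD(0) target uses the current estimate V(s_{t+1}) and is available after a single step.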
2 Small Gridworld
Figure 1: Gridworld
As shown in Fig. 1, each cell in the gridworld represents a state. Let s_t denote the state at cell t, so the state space is S = {s_t | t ∈ {0, ..., 35}}. States s_1 and s_35 are terminal; from every other (non-terminal) state the agent can move one cell to the north, east, south, or west, so the action space is A = {n, e, s, w}. Note that actions leading out of the grid leave the state unchanged. Each movement yields a reward of -1 until a terminal state is reached.
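
The dynamics above can be sketched as a single step function. This is only an illustrative sketch; the names (`N`, `TERMINALS`, `step`, etc.) are my own and are not prescribed by the assignment:

```python
# Minimal sketch of the 6x6 gridworld: states 0..35 laid out row-major,
# states 1 and 35 terminal, reward -1 per step, and actions that would
# leave the grid keep the state unchanged.
N = 6
TERMINALS = {1, 35}
MOVES = {'n': (-1, 0), 'e': (0, 1), 's': (1, 0), 'w': (0, -1)}

def step(state, action):
    """Apply one action; return (next_state, reward)."""
    if state in TERMINALS:
        return state, 0          # terminal states absorb with no reward
    r, c = divmod(state, N)
    dr, dc = MOVES[action]
    nr, nc = r + dr, c + dc
    if 0 <= nr < N and 0 <= nc < N:   # off-grid moves leave state unchanged
        r, c = nr, nc
    return r * N + c, -1
```

For example, `step(0, 'n')` stays in state 0 (the move would leave the grid) and still costs -1, while `step(0, 'e')` reaches the terminal state 1.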
3 Experiment Requirements
• Programming language: python3
• You should implement both the first-visit and every-visit MC methods and TD(0) to evaluate a uniform random policy π(n|·) = π(e|·) = π(s|·) = π(w|·) = 0.25.
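
As a starting point, first-visit MC and TD(0) evaluation of the uniform random policy might be sketched as below. This is a hedged sketch, not the required solution; all helper names, the episode-sampling scheme (random non-terminal starts), and the hyperparameter values are my own assumptions:

```python
import random

# 6x6 gridworld from the assignment: states 1 and 35 terminal, reward -1
# per step, off-grid moves leave the state unchanged.
N, TERMINALS, GAMMA = 6, {1, 35}, 1.0
MOVES = {'n': (-1, 0), 'e': (0, 1), 's': (1, 0), 'w': (0, -1)}

def step(s, a):
    r, c = divmod(s, N)
    dr, dc = MOVES[a]
    nr, nc = r + dr, c + dc
    if 0 <= nr < N and 0 <= nc < N:
        r, c = nr, nc
    return r * N + c, -1

def episode(start):
    """Roll out the uniform random policy; returns [(state, reward), ...]."""
    s, traj = start, []
    while s not in TERMINALS:
        s2, rew = step(s, random.choice('nesw'))
        traj.append((s, rew))
        s = s2
    return traj

def first_visit_mc(num_episodes=2000):
    V = [0.0] * (N * N)
    counts = [0] * (N * N)
    starts = [s for s in range(N * N) if s not in TERMINALS]
    for _ in range(num_episodes):
        traj = episode(random.choice(starts))
        G, first_returns = 0.0, {}
        for s, rew in reversed(traj):          # accumulate return backwards
            G = GAMMA * G + rew
            first_returns[s] = G               # overwritten until first visit remains
        for s, G in first_returns.items():
            counts[s] += 1
            V[s] += (G - V[s]) / counts[s]     # incremental mean of returns
    return V

def td0(num_episodes=2000, alpha=0.05):
    V = [0.0] * (N * N)
    starts = [s for s in range(N * N) if s not in TERMINALS]
    for _ in range(num_episodes):
        s = random.choice(starts)
        while s not in TERMINALS:
            s2, rew = step(s, random.choice('nesw'))
            V[s] += alpha * (rew + GAMMA * V[s2] - V[s])  # TD(0) update
            s = s2
    return V
```

The every-visit variant differs from `first_visit_mc` only in that every occurrence of a state in an episode contributes a return, rather than just the first; both estimates should converge to negative values for non-terminal states and 0 for the two terminal states.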
4 Report and Submission
• Your reports and source files (.py) should be compressed and named after