How does the Monte Carlo prediction method compute the Value Function?

1 Answer

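Let's recap the definition of the value function. The value function, or the value of a state s, is the expected return the agent would obtain starting from state s and following the policy π. It can be expressed as:

$$V^{\pi}(s) = \mathbb{E}_{\tau \sim \pi}\left[ R(\tau) \mid s_0 = s \right]$$

where τ is a trajectory generated by following π, R(τ) is its return, and s_0 is its starting state.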

To approximate the value of a state using the Monte Carlo method, we do the following:

1. Sample N episodes (trajectories) by following the given policy π. The approximation improves as N grows.
2. Compute the value of each state as the average of the returns observed from that state across the sampled episodes.
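The two steps above can be sketched in Python. This is a minimal illustration, not a definitive implementation: `mc_prediction`, `sample_episode`, and `toy_episode` are hypothetical names introduced here, an episode is assumed to be a list of (state, reward) pairs, and the sketch assumes the first-visit variant (only the first occurrence of a state in an episode counts), which the answer above does not specify.

```python
import random
from collections import defaultdict


def mc_prediction(sample_episode, num_episodes, gamma=1.0):
    """First-visit Monte Carlo estimate of V(s) under a fixed policy.

    sample_episode is a hypothetical callable that runs the policy for one
    episode and returns a list of (state, reward) pairs, where reward is the
    reward received after leaving that state.
    """
    return_sum = defaultdict(float)  # sum of returns observed from each state
    visit_count = defaultdict(int)   # number of episodes that visited each state

    for _ in range(num_episodes):
        episode = sample_episode()   # step 1: sample an episode with the policy

        # Walk backwards so the return from each step is built incrementally.
        G = 0.0
        first_visit_return = {}
        for state, reward in reversed(episode):
            G = reward + gamma * G
            # Overwriting while moving backwards keeps the earliest visit's return.
            first_visit_return[state] = G

        for state, state_return in first_visit_return.items():
            return_sum[state] += state_return
            visit_count[state] += 1

    # Step 2: the value estimate is the average return per state.
    return {s: return_sum[s] / visit_count[s] for s in return_sum}


# Hypothetical one-step task: from state "A" the reward is 0 or 1
# with equal probability, so the true value of "A" is 0.5.
rng = random.Random(0)


def toy_episode():
    return [("A", float(rng.random() < 0.5))]


values = mc_prediction(toy_episode, num_episodes=10_000)
print(values["A"])  # should be close to the true value 0.5
```

The estimate for a state is averaged only over the episodes that actually visited it, so states that are rarely reached under the policy get noisier estimates.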

In a nutshell, the Monte Carlo prediction method generates N episodes using the given policy and estimates the value of a state as the average return observed from that state across these N episodes, instead of computing the expected return exactly.
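In symbols, the contrast between the exact expectation and the sample average can be written as follows. The notation R_i(s) for the return observed from state s in the i-th sampled episode is introduced here for illustration, and the formula assumes s is visited in every one of the N episodes; otherwise N would be the number of episodes that visit s.

$$V^{\pi}(s) = \mathbb{E}_{\tau \sim \pi}\left[ R(\tau) \mid s_0 = s \right] \approx \frac{1}{N} \sum_{i=1}^{N} R_i(s)$$

By the law of large numbers, the average on the right converges to the expectation on the left as N grows, which is why a larger N gives a better approximation.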
...