How does the Monte Carlo prediction method compute the Value Function?

Question

How does the Monte Carlo prediction method compute the Value Function?

1 Answer

sharadyadav1986 · Answer 1 · 2023-05-05T18:10:57+0000

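Let's recap the definition of the value function. The value function, or the value of a state s, is the expected return the agent would obtain starting from state s and following the policy π. It can be expressed as:

$$V^{\pi}(s) = \mathbb{E}_{\tau \sim \pi}\left[ R(\tau) \mid s_0 = s \right]$$

where τ is a trajectory generated by following π and R(τ) is its return.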

In order to approximate the value of a state using the Monte Carlo method, we do the following:

1. Sample N episodes (trajectories) following the given policy π. The approximation improves as N increases.
2. Compute the value of a state as the average return observed from that state across the sampled episodes.
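The two steps above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not a reference implementation: the environment is a made-up 5-state random walk, the names `sample_episode`, `mc_prediction`, and `random_policy` are hypothetical, and the first-visit variant of the averaging is used (each state contributes at most one return per episode).

```python
import random
from collections import defaultdict

def sample_episode(policy, rng, start=2, n_states=5):
    """Hypothetical toy environment: a 1-D walk over states 0..n_states-1.
    Both ends are terminal; stepping onto the right end gives reward 1.
    Returns a list of (state, reward) pairs for the non-terminal steps."""
    state, episode = start, []
    while 0 < state < n_states - 1:
        next_state = state + policy(state, rng)  # policy returns -1 or +1
        reward = 1.0 if next_state == n_states - 1 else 0.0
        episode.append((state, reward))
        state = next_state
    return episode

def mc_prediction(policy, n_episodes, gamma=1.0, seed=0):
    """First-visit Monte Carlo prediction: average the sampled returns per state."""
    rng = random.Random(seed)
    total_return = defaultdict(float)
    visit_count = defaultdict(int)
    for _ in range(n_episodes):
        # Step 1: sample one episode by following the policy.
        episode = sample_episode(policy, rng)
        # Walk backward so the return from each step accumulates in one pass.
        # Overwriting per state leaves the return from its earliest visit.
        g = 0.0
        first_visit_return = {}
        for state, reward in reversed(episode):
            g = reward + gamma * g
            first_visit_return[state] = g
        for state, ret in first_visit_return.items():
            total_return[state] += ret
            visit_count[state] += 1
    # Step 2: the value estimate is the average return of each state.
    return {s: total_return[s] / visit_count[s] for s in total_return}

def random_policy(state, rng):
    """Assumed policy for the demo: move left or right with equal probability."""
    return rng.choice((-1, 1))

values = mc_prediction(random_policy, n_episodes=20000)
for s in sorted(values):
    print(s, round(values[s], 3))
```

For this particular toy walk the true values of states 1, 2, and 3 are 0.25, 0.5, and 0.75, so the printed estimates should land close to those numbers and move closer as `n_episodes` grows.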

In a nutshell, the Monte Carlo prediction method generates N episodes using the given policy π and estimates the value of each state as the average return from that state across those N episodes, rather than computing the expected return exactly.
