Eligibility Traces

Forward view: eligibility traces bridge TD and Monte Carlo methods.
Backward view: an eligibility trace is a temporary record of the occurrence of an event, such as the visiting of a state or the taking of an action.

The trace marks the memory parameters associated with the event as eligible for undergoing learning changes
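
The backward view can be illustrated with a minimal sketch, assuming one common bookkeeping scheme (an accumulating trace): every trace fades a little on each step, and the trace of the state just visited is bumped by 1. The function name `update_traces` and the `decay` parameter are hypothetical names chosen for this example; `decay` stands in for whatever per-step fading factor the learning method uses.

```python
def update_traces(traces, visited_state, decay):
    """Return new traces after one time step (accumulating-trace sketch).

    traces: dict mapping state -> current trace value.
    decay:  assumed per-step fading factor in [0, 1); hypothetical parameter.
    """
    # Every existing record fades, so older events become less eligible.
    new_traces = {state: decay * value for state, value in traces.items()}
    # The event that just occurred is marked as (more) eligible.
    new_traces[visited_state] = new_traces.get(visited_state, 0.0) + 1.0
    return new_traces


traces = {}
for state in ["A", "B", "A"]:
    traces = update_traces(traces, state, decay=0.5)

print(traces)  # state A was visited twice, so its trace is the largest
```

States with a larger trace are the ones whose values would receive a larger share of the next learning update.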

TD 1 step:

R_{t} = r_{t+1} + γV_{t}(s_{t+1})

TD 2 steps:

R_{t} = r_{t+1} + γr_{t+2} + γ^2V_{t}(s_{t+2})

TD n steps:

R_{t} = r_{t+1} + γr_{t+2} + γ^2r_{t+3} + ... + γ^{n-1}r_{t+n} + γ^nV_{t}(s_{t+n})
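
The n-step return can be computed directly from a recorded episode. The following is a rough sketch rather than a reference implementation. It assumes `rewards[k]` stores r_{k+1} and `values[k]` stores the current estimate V(s_k); the function name and the list layout are choices made for this example. It also assumes the terminal state has value 0, so the bootstrap term is dropped when t+n reaches the end of the episode.

```python
def n_step_return(rewards, values, t, n, gamma):
    """n-step return from time t.

    rewards[k] holds r_{k+1}, the reward received after leaving s_k.
    values[k]  holds the current estimate V(s_k).
    """
    T = len(rewards)               # episode length
    steps = min(n, T - t)          # do not read rewards past the end
    # Discounted sum of the real rewards: r_{t+1} + gamma r_{t+2} + ...
    ret = sum(gamma**k * rewards[t + k] for k in range(steps))
    # Bootstrap from the value estimate only if s_{t+n} is non-terminal.
    if t + n < T:
        ret += gamma**n * values[t + n]
    return ret


rewards = [1.0, 0.0, 2.0]   # r_1, r_2, r_3
values = [0.5, 1.0, 4.0]    # V(s_0), V(s_1), V(s_2)

print(n_step_return(rewards, values, t=0, n=1, gamma=0.5))  # 1.5
print(n_step_return(rewards, values, t=0, n=2, gamma=0.5))  # 2.0
```

With n = 1 this reduces to the one-step TD target, and with n = 2 to the two-step target above.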

Monte Carlo:

R_{t} = r_{t+1} + γr_{t+2} + γ^2r_{t+3} + ... + γ^{T-t-1}r_{T}
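
The Monte Carlo return uses only real rewards up to the end of the episode, with no value estimate involved. As a small sketch, assuming `rewards[k]` stores r_{k+1} (the function name and list layout are choices made for this example), it can be accumulated backwards from the last reward:

```python
def monte_carlo_return(rewards, t, gamma):
    """Full discounted return from time t to the end of the episode.

    rewards[k] holds r_{k+1}, the reward received after leaving s_k.
    """
    ret = 0.0
    # Work backwards: the return at each step is the reward plus gamma
    # times the return that follows it.
    for reward in reversed(rewards[t:]):
        ret = reward + gamma * ret
    return ret


rewards = [1.0, 0.0, 2.0]   # r_1, r_2, r_3
print(monte_carlo_return(rewards, t=0, gamma=0.5))  # 1.5
```

This is the n-step return taken all the way to the terminal step, n = T - t, where nothing is left to bootstrap from.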
