Eligibility Traces
Definition
- bridge TD to Monte Carlo methods (forward view)
- a temporary record of the occurrence of an event, such as the visiting of a state or the taking of an action (backward view)
The trace marks the memory parameters associated with the event as eligible for undergoing learning changes.
TD 1 step:
R_{t} = r_{t+1} + γV_{t}(s_{t+1})
TD 2 steps:
R_{t} = r_{t+1} + γr_{t+2} + γ^2V_{t}(s_{t+2})
TD n steps:
R_{t} = r_{t+1} + γr_{t+2} + γ^2r_{t+3} + ... + γ^{n-1}r_{t+n} + γ^nV_{t}(s_{t+n})
Monte Carlo:
R_{t} = r_{t+1} + γr_{t+2} + γ^2r_{t+3} + ... + γ^{T-t-1}r_{T}
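The returns above differ only in how many real rewards are summed before bootstrapping from the value estimate. A minimal Python sketch of that idea follows. The function name `n_step_return` and the array layout are assumptions made for illustration, not part of the notes: `rewards[k]` is taken to hold r_{k+1}, `values[k]` to hold V(s_k), and the episode is assumed to end at T = len(rewards) with a terminal value of 0.

```python
from typing import Sequence


def n_step_return(
    rewards: Sequence[float],
    values: Sequence[float],
    t: int,
    n: int,
    gamma: float,
) -> float:
    """n-step return from time t (illustrative sketch).

    Assumed layout (hypothetical, chosen for this example):
      rewards[k] = r_{k+1}, the reward received after leaving state s_k
      values[k]  = V(s_k), the current value estimate of state s_k
      T = len(rewards) is the terminal time; the terminal value is 0.
    """
    T = len(rewards)
    horizon = min(t + n, T)  # never look past the end of the episode

    # Discounted sum of the real rewards r_{t+1}, ..., r_{horizon}.
    ret = 0.0
    for k in range(t, horizon):
        ret += gamma ** (k - t) * rewards[k]

    # Bootstrap from V(s_{t+n}) only if the episode has not ended by then.
    if horizon < T:
        ret += gamma ** n * values[horizon]
    return ret


def monte_carlo_return(rewards: Sequence[float], t: int, gamma: float) -> float:
    """Full return from time t: all rewards to the end, no bootstrapping."""
    return n_step_return(rewards, [], t, len(rewards) - t, gamma)
```

With rewards = [1.0, 2.0, 3.0], values = [0.5, 1.0, 1.5], and gamma = 0.5, the 1-step return from t = 0 is 1 + 0.5·1.0 = 1.5, the 2-step return is 1 + 0.5·2 + 0.25·1.5 = 2.375, and any n ≥ 3 gives the Monte Carlo return 1 + 0.5·2 + 0.25·3 = 2.75. In this sketch, Monte Carlo is simply the case where n reaches the end of the episode.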