Reinforcement Learning

Basic Reinforcement Learning: MDPs, value functions, policy and value iteration, Monte Carlo control, on-policy vs. off-policy learning, and off-policy importance sampling, with Python implementations and examples linked below

[Overview on Github] [Gridworld, Blackjack]
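
As a quick illustration of value iteration, here is a minimal sketch on a made-up three-state chain MDP. The transition table `P`, the discount `GAMMA`, and the function name `value_iteration` are assumptions chosen for this example and are not taken from the linked Gridworld or Blackjack code.

```python
# Hypothetical 3-state chain: states 0 and 1 are non-terminal, state 2 is terminal.
# Actions: 0 = left, 1 = right. Transitions are deterministic.
# P[s][a] = (next_state, reward)
P = {
    0: {0: (0, 0.0), 1: (1, 0.0)},
    1: {0: (0, 0.0), 1: (2, 1.0)},
}
TERMINAL = 2
GAMMA = 0.9


def value_iteration(P, gamma, terminal, tol=1e-10):
    """Apply the Bellman optimality backup until the values stop changing."""
    V = {s: 0.0 for s in P}
    V[terminal] = 0.0  # the terminal state has value 0 by definition
    while True:
        delta = 0.0
        for s in P:
            # Best one-step lookahead over all actions available in s.
            best = max(r + gamma * V[s2] for (s2, r) in P[s].values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {
        s: max(P[s], key=lambda a: P[s][a][1] + gamma * V[P[s][a][0]])
        for s in P
    }
    return V, policy


V, policy = value_iteration(P, GAMMA, TERMINAL)
print(V, policy)
```

In this toy chain the greedy policy moves right in both non-terminal states, and the value of state 0 is the reward of state 1 discounted by one step.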

Temporal-Difference Learning: TD(0), SARSA, Q-learning, Expected SARSA, Double Q-learning

[Random Walk] [Windy Grid] [Cliff Walking] [Max Bias]
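
The control methods listed above differ mainly in the bootstrap target they use. The sketch below puts the three targets side by side. The function names, the list-of-lists `Q` table, and the epsilon-greedy behaviour policy are assumptions made for this illustration, not the code behind the linked examples.

```python
def sarsa_target(Q, r, s_next, a_next, gamma):
    # On-policy: bootstrap from the action actually taken in s_next.
    return r + gamma * Q[s_next][a_next]


def q_learning_target(Q, r, s_next, gamma):
    # Off-policy: bootstrap from the greedy action in s_next.
    return r + gamma * max(Q[s_next])


def expected_sarsa_target(Q, r, s_next, gamma, epsilon):
    # Bootstrap from the expected value under an epsilon-greedy policy.
    q = Q[s_next]
    n = len(q)
    greedy = max(range(n), key=lambda a: q[a])
    probs = [epsilon / n + (1.0 - epsilon if a == greedy else 0.0) for a in range(n)]
    return r + gamma * sum(p * v for p, v in zip(probs, q))


def td_update(Q, s, a, target, alpha):
    # Shared update rule: move Q[s][a] a fraction alpha toward the target.
    Q[s][a] += alpha * (target - Q[s][a])


# Toy table with two states and two actions (values are arbitrary).
# A terminal state would be represented by a row of zeros.
Q = [[0.0, 0.0], [1.0, 3.0]]
t_sarsa = sarsa_target(Q, r=1.0, s_next=1, a_next=0, gamma=0.5)
t_q = q_learning_target(Q, r=1.0, s_next=1, gamma=0.5)
t_exp = expected_sarsa_target(Q, r=1.0, s_next=1, gamma=0.5, epsilon=0.2)
td_update(Q, s=0, a=1, target=t_q, alpha=0.1)
print(t_sarsa, t_q, t_exp, Q[0][1])
```

Double Q-learning keeps two such tables and uses one to pick the greedy action and the other to evaluate it, which reduces the maximization bias that the max in the Q-learning target introduces.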

Policy Gradient and Actor-Critic: REINFORCE, REINFORCE-baseline, Actor-Critic

[Short Corridor] [Cartpole Gaussian policy]

n-step Bootstrapping: TBD

Tabular Model-based Methods: TBD

Function Approximations: TBD

Eligibility Traces: TBD

All About Cartpole: Cartpole solved with classic RL algorithms, including Q-learning, SARSA(\(\lambda\)), DQN, REINFORCE, REINFORCE-baseline, Actor-Critic, …

[Cartpole Q, SARSA] [Cartpole REINFORCE]


Bias and Variance: expectation, variance, bias, biased naive sample variance, bias-variance trade-off

[Bias Variance example]
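
To make the bias of the naive sample variance concrete, the sketch below enumerates every equally likely sample of size `n` from a fair 0/1 coin and averages both estimators exactly. The coin population and the helper names are assumptions chosen for this illustration and are independent of the linked example.

```python
from itertools import product


def naive_variance(xs):
    # Divides by n: the biased "naive" estimator.
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / n


def unbiased_variance(xs):
    # Divides by n - 1 (Bessel's correction).
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)


values = (0.0, 1.0)   # fair coin: each value has probability 1/2
true_variance = 0.25  # p * (1 - p) with p = 1/2
n = 3

# All 2**n samples are equally likely, so a plain average is the exact expectation.
samples = list(product(values, repeat=n))
expected_naive = sum(naive_variance(s) for s in samples) / len(samples)
expected_unbiased = sum(unbiased_variance(s) for s in samples) / len(samples)

print(expected_naive, expected_unbiased, true_variance)
```

The naive estimator averages to (n - 1)/n times the true variance, so it underestimates on average, while the n - 1 version matches the true variance in expectation.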