Reinforcement Learning
Basic Reinforcement Learning: MDPs, value functions, policy/value iteration, Monte Carlo control, on-policy vs. off-policy methods, and off-policy importance sampling, with Python implementations and examples (see the value-iteration sketch after the links):
[Overview on GitHub] [Gridworld, Blackjack]
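A minimal value-iteration sketch on a toy gridworld. The grid size, rewards, and terminal states below are illustrative assumptions, not necessarily those of the linked Gridworld example:

```python
import numpy as np

# Value iteration on a toy 4x4 gridworld: -1 reward per step, two terminal corners.
N = 4                                            # grid is N x N, state = (row, col)
GAMMA = 1.0                                      # undiscounted episodic task
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]     # up, down, left, right
TERMINALS = {(0, 0), (N - 1, N - 1)}

def step(s, a):
    """Deterministic transition: move if possible, stay put at walls, reward -1."""
    if s in TERMINALS:
        return s, 0.0
    r, c = s[0] + a[0], s[1] + a[1]
    if not (0 <= r < N and 0 <= c < N):          # bumped into a wall
        r, c = s
    return (r, c), -1.0

V = {(r, c): 0.0 for r in range(N) for c in range(N)}
while True:
    delta = 0.0
    for s in V:
        if s in TERMINALS:
            continue
        # Bellman optimality backup: V(s) <- max_a [ r + gamma * V(s') ]
        best = max(reward + GAMMA * V[s2]
                   for s2, reward in (step(s, a) for a in ACTIONS))
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < 1e-6:
        break

print(np.array([[V[(r, c)] for c in range(N)] for r in range(N)]))
```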
Temporal-Difference Learning: TD(0), SARSA, Q-learning, Expected SARSA, Double Q-learning (see the Q-learning sketch after the links)
[Random Walk] [Windy Grid] [Cliff Walking] [Max Bias]
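A minimal tabular Q-learning sketch. The `env.reset()` / `env.step(a)` interface returning `(state, reward, done)` is an assumed toy interface for illustration, not a specific library's API:

```python
import random
from collections import defaultdict

ALPHA, GAMMA, EPS = 0.1, 0.99, 0.1

def epsilon_greedy(Q, s, n_actions):
    """Behavior policy: explore with probability EPS, otherwise act greedily."""
    if random.random() < EPS:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[(s, a)])

def q_learning(env, n_actions, episodes=500):
    Q = defaultdict(float)                       # Q[(state, action)] -> value
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = epsilon_greedy(Q, s, n_actions)
            s2, r, done = env.step(a)
            # Off-policy target: bootstrap from the greedy action in s2.
            target = r + (0.0 if done else
                          GAMMA * max(Q[(s2, b)] for b in range(n_actions)))
            Q[(s, a)] += ALPHA * (target - Q[(s, a)])
            s = s2
    return Q
```

SARSA differs only in the target: it bootstraps from the action actually taken next (on-policy) instead of the greedy maximum.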
Policy Gradient and Actor-Critic: REINFORCE, REINFORCE-baseline, Actor-Critic (see the REINFORCE sketch after the links)
[Short Corridor] [Cartpole Gaussian policy]
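A minimal REINFORCE sketch with a tabular softmax policy and no baseline. The discrete-state environment interface is an assumption for illustration, not the linked Short Corridor or CartPole code:

```python
import numpy as np

ALPHA, GAMMA = 0.01, 1.0

def softmax(x):
    z = x - x.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce(env, n_states, n_actions, episodes=1000):
    theta = np.zeros((n_states, n_actions))      # policy preferences per state
    for _ in range(episodes):
        # Generate one full episode under the current policy.
        episode, s, done = [], env.reset(), False
        while not done:
            a = np.random.choice(n_actions, p=softmax(theta[s]))
            s2, r, done = env.step(a)
            episode.append((s, a, r))
            s = s2
        # Monte Carlo return G_t, then ascent on G_t * grad log pi(a|s)
        # (the gamma^t factor is omitted; it equals 1 since GAMMA = 1 here).
        G = 0.0
        for s, a, r in reversed(episode):
            G = r + GAMMA * G
            grad_log = -softmax(theta[s])        # gradient of log-softmax ...
            grad_log[a] += 1.0                   # ... w.r.t. theta[s]
            theta[s] += ALPHA * G * grad_log
    return theta
```

REINFORCE-baseline subtracts a learned state value from G before the update, which lowers variance without changing the expected gradient.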
n-step Bootstrapping: TBD
Tabular Model-based Methods: TBD
Function Approximations: TBD
Eligibility Traces: TBD
All About Cartpole: CartPole implementations with classic RL algorithms (Q-learning, SARSA(\(\lambda\)), DQN, REINFORCE, REINFORCE-baseline, Actor-Critic, …); see the discretized Q-learning sketch after the links
[Cartpole Q, SARSA] [Cartpole REINFORCE]
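A minimal sketch of epsilon-greedy Q-learning on CartPole with a coarse state discretization, assuming the Gymnasium API; the bin counts and bounds are illustrative choices, not necessarily those used in the linked notebooks:

```python
import math
import random
from collections import defaultdict
import gymnasium as gym          # assumes the Gymnasium API; classic gym differs slightly

# Coarse bins over (cart position, cart velocity, pole angle, pole angular velocity).
BINS = (6, 6, 12, 12)
LOW  = (-2.4, -3.0, -math.radians(12), -math.radians(50))
HIGH = ( 2.4,  3.0,  math.radians(12),  math.radians(50))

def discretize(obs):
    """Clip each observation dimension and map it to a bin index."""
    idx = []
    for o, lo, hi, n in zip(obs, LOW, HIGH, BINS):
        o = min(max(o, lo), hi)
        idx.append(int((o - lo) / (hi - lo) * (n - 1)))
    return tuple(idx)

env = gym.make("CartPole-v1")
Q = defaultdict(float)
alpha, gamma, eps = 0.1, 0.99, 0.1

for episode in range(2000):
    obs, _ = env.reset()
    s, done = discretize(obs), False
    while not done:
        a = env.action_space.sample() if random.random() < eps else \
            max(range(env.action_space.n), key=lambda b: Q[(s, b)])
        obs, r, terminated, truncated, _ = env.step(a)
        done = terminated or truncated
        s2 = discretize(obs)
        target = r + (0.0 if terminated else
                      gamma * max(Q[(s2, b)] for b in range(env.action_space.n)))
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2
```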
Others
Bias and Variance: expectation, variance, bias, the bias of the naive sample variance estimator, and the bias-variance trade-off (a quick numerical check follows)
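A quick numerical check that the naive sample variance (dividing by \(n\)) underestimates the true variance by a factor of \((n-1)/n\), while Bessel's correction (dividing by \(n-1\)) is unbiased; the sample size and distribution are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 5, 200_000
samples = rng.normal(loc=0.0, scale=1.0, size=(trials, n))    # true variance = 1

naive    = samples.var(axis=1, ddof=0).mean()   # E[ (1/n)     * sum (x - xbar)^2 ]
unbiased = samples.var(axis=1, ddof=1).mean()   # E[ (1/(n-1)) * sum (x - xbar)^2 ]

print(f"naive (ddof=0):    {naive:.4f}   (expected (n-1)/n = {(n - 1) / n:.4f})")
print(f"unbiased (ddof=1): {unbiased:.4f}   (expected 1.0000)")
```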