Model-based RL

Background

Drawbacks of model-free RL

(-) high sample complexity

(-) needs to sample unsafe outcomes in order to learn to avoid them

(-) poor stability and reproducibility

Model-based RL is used for

robotic control
safety, both for humans and for the agents themselves
human-AI interaction: minimizing the risks
games: AlphaGo
science: chemical synthesis planning
operations research: low-cost energy allocation

What is a model?

Definition: a model is a representation that explicitly encodes knowledge about the structure of the environment and task.

a transition/dynamics model: s_(t+1) = f_s(s_t, a_t)
a model of rewards: r_(t+1) = f_r(s_t, a_t)
an inverse transition/dynamics model: a_t = f_s^(-1)(s_t, s_(t+1))
a model of distance: d_ij = f_d(s_i, s_j)
a model of future returns: G_t = Q(s_t, a_t) or G_t = V(s_t)
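
To make the model types above concrete, here is a minimal sketch on a toy 1-D environment. Everything in it is an assumption made for illustration and does not come from the tutorial: the state is a position on a line, the action is a step added to it, the goal sits at 5.0, and the function names (dynamics, reward, inverse_dynamics, distance, rollout_return, plan) are hypothetical. In practice these functions would usually be learned from data rather than written by hand.

```python
from typing import Sequence

GOAL = 5.0  # assumed goal position for the toy task


def dynamics(s: float, a: float) -> float:
    """Transition model: predicts s_(t+1) from (s_t, a_t)."""
    return s + a


def reward(s: float, a: float) -> float:
    """Reward model: predicts r_(t+1) from (s_t, a_t).

    Here the reward is the negative distance to the goal after the step.
    """
    return -abs(dynamics(s, a) - GOAL)


def inverse_dynamics(s: float, s_next: float) -> float:
    """Inverse model: recovers the action a_t that leads from s_t to s_(t+1)."""
    return s_next - s


def distance(s_i: float, s_j: float) -> float:
    """Distance model: d_ij between two states."""
    return abs(s_i - s_j)


def rollout_return(s: float, actions: Sequence[float], gamma: float = 1.0) -> float:
    """Return G_t of an action sequence, computed by simulating the model."""
    g, discount = 0.0, 1.0
    for a in actions:
        g += discount * reward(s, a)
        s = dynamics(s, a)
        discount *= gamma
    return g


def plan(s: float, candidates: Sequence[float] = (-1.0, 0.0, 1.0)) -> float:
    """One-step planner: picks the candidate action with the best predicted reward."""
    return max(candidates, key=lambda a: reward(s, a))
```

The planner never touches a real environment: it queries only the model, which is what lets model-based methods trade real samples for simulated ones.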

Refs

https://sites.google.com/view/mbrl-tutorial

https://kargarisaac.github.io/blog/reinforcement%20learning/mbrl/jupyter/2020/10/26/mbrl.html
