Model-based RL
Background
Model-free RL
(-) high sample complexity
(-) needs to sample unsafe outcomes
(-) stability and reproducibility issues
Model-based RL is used for:
- robotic control
- safety, for humans and for the agents themselves
- human-AI interaction, minimizing the risks
- games, e.g. AlphaGo
- science, e.g. chemical synthesis planning
- operations research, e.g. low-cost energy allocation
What is a model?
Def: a model is a representation that explicitly encodes knowledge about the structure of the environment and task.
- a transition/dynamics model: s_(t+1) = f_s(s_t, a_t)
- a model of rewards: r_(t+1) = f_r(s_t, a_t)
- an inverse transition/dynamics model: a_t = f_s^(-1)(s_t, s_(t+1))
- a model of distance: d_ij = f_d(s_i, s_j)
- a model of future returns: G_t = Q(s_t, a_t) or G_t = V(s_t)
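Example (sketch): the five kinds of model listed above can be made concrete on a tiny tabular problem. The environment below (a 5-state chain with a terminal goal) and all names in it (env_step, f_s, f_r, f_s_inv, f_d, V, GAMMA) are assumptions chosen for illustration, not taken from the referenced tutorial. Each model is just a lookup table or a small function built from observed transitions.

```python
from collections import deque

# Hypothetical toy environment: states 0..4 on a line, goal at the right end.
N_STATES = 5
ACTIONS = (-1, +1)      # step left or step right
GOAL = N_STATES - 1     # terminal state, no transitions start here
GAMMA = 0.9             # discount factor (assumed value)


def env_step(s, a):
    """True environment: move along the chain, clamped at the left wall."""
    s_next = min(max(s + a, 0), GOAL)
    r = 1.0 if s_next == GOAL else 0.0
    return s_next, r


# Gather experience by trying every action in every non-terminal state.
transitions = []
for s in range(GOAL):
    for a in ACTIONS:
        s_next, r = env_step(s, a)
        transitions.append((s, a, s_next, r))

# Fit the tabular models from the observed transitions only.
f_s = {}      # transition model:  (s_t, a_t)     -> s_(t+1)
f_r = {}      # reward model:      (s_t, a_t)     -> r_(t+1)
f_s_inv = {}  # inverse model:     (s_t, s_(t+1)) -> a_t
for s, a, s_next, r in transitions:
    f_s[(s, a)] = s_next
    f_r[(s, a)] = r
    f_s_inv[(s, s_next)] = a


def f_d(s_i, s_j):
    """Distance model: fewest model-predicted steps from s_i to s_j (BFS)."""
    frontier, seen = deque([(s_i, 0)]), {s_i}
    while frontier:
        s, d = frontier.popleft()
        if s == s_j:
            return d
        for a in ACTIONS:
            s_next = f_s.get((s, a))
            if s_next is not None and s_next not in seen:
                seen.add(s_next)
                frontier.append((s_next, d + 1))
    return None  # s_j is unreachable under the learned model


# Model of future returns: value iteration run on the learned models,
# never on the real environment.
V = {s: 0.0 for s in range(N_STATES)}
for _ in range(50):
    for s in range(GOAL):  # V[GOAL] stays 0 because the goal is terminal
        V[s] = max(f_r[(s, a)] + GAMMA * V[f_s[(s, a)]] for a in ACTIONS)

print(f_s[(2, 1)], f_r[(3, 1)], f_s_inv[(2, 1)], f_d(0, GOAL), V)
```

In this deterministic toy case a dictionary is enough. With continuous or stochastic dynamics the same roles would typically be filled by learned function approximators, and the inverse model may be ambiguous when several actions lead to the same next state.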
Refs
https://sites.google.com/view/mbrl-tutorial
https://kargarisaac.github.io/blog/reinforcement%20learning/mbrl/jupyter/2020/10/26/mbrl.html