Grid RBF - SARSA, Q

RBF

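An RBF (radial basis function) feature models the data with smooth, bell-shaped circles. Its value decays gradually with the distance from the center, instead of switching off sharply at the edge of a circle as a binary (coarse-coding) feature does.

Because the value is graded, an RBF feature tells us how close a data point is to each centroid, however far apart they are.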
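A small comparison of the two kinds of feature, assuming a single center at 0 and an arbitrary radius/width of 0.2. binary_feature and rbf_feature are illustrative names, not from any library. The binary feature is 1 inside the circle and 0 outside; the RBF value shrinks smoothly with distance but never reaches 0.

```python
import numpy as np

def binary_feature(s, c, radius):
    # Coarse-coding style feature: 1 inside the circle, 0 outside (sharp cut-off)
    return (np.abs(s - c) <= radius).astype(float)

def rbf_feature(s, c, sigma):
    # Gaussian RBF feature: decays smoothly with distance, never exactly 0
    return np.exp(-(s - c) ** 2 / (2 * sigma ** 2))

s = np.array([0.0, 0.1, 0.2, 0.3, 0.5])   # data points at growing distance from the center
c = 0.0                                   # a single center (assumed)
binary = binary_feature(s, c, radius=0.2)
smooth = rbf_feature(s, c, sigma=0.2)
print(binary)                 # jumps from 1 to 0 between 0.2 and 0.3
print(np.round(smooth, 3))    # strictly decreasing, still > 0 at 0.5
```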

Simple Implementation

1d: 4 and 8 RBFs (4xRBF, 8xRBF), centers evenly spaced over 0~1, data range 0~1

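$$\mathrm{rbf}_i(s) = \exp\left(-\frac{\|s-c_i\|^2}{2\sigma_i^2}\right)$$

$$\mathrm{nrbf}_i(s) = \frac{\mathrm{rbf}_i(s)}{\sum_j \mathrm{rbf}_j(s)}$$

The sum in the denominator runs over all centers c_j.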
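A minimal vectorized sketch of both formulas, assuming 1-d states and one shared σ for all centers. rbf_features and nrbf_features are hypothetical names, and unlike the loop version below they return shape (data points, centers). The check at the end confirms the defining property of nrbf: at every data point the normalized features sum to 1.

```python
import numpy as np

def rbf_features(s, centers, sigma):
    """rbf_i(s) = exp(-||s - c_i||^2 / (2 sigma^2)) for 1-d states; returns shape (len(s), n)."""
    s = np.atleast_1d(s)[:, None]                                   # (nd, 1)
    return np.exp(-(s - centers[None, :]) ** 2 / (2 * sigma ** 2))  # (nd, n) via broadcasting

def nrbf_features(s, centers, sigma):
    """nrbf_i(s) = rbf_i(s) / sum_j rbf_j(s): the features of each data point sum to 1."""
    r = rbf_features(s, centers, sigma)
    return r / r.sum(axis=1, keepdims=True)

centers = np.linspace(0, 1, 4)          # 4 centers evenly spaced on [0, 1]
sigma = (centers[1] - centers[0]) / 2   # sigma = spacing / 2, the width rule used in rbf()
data = np.linspace(0, 1, 100)
phi = nrbf_features(data, centers, sigma)
print(phi.shape)                          # (100, 4)
print(np.allclose(phi.sum(axis=1), 1.0))  # True
```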

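import numpy as np
import matplotlib.pyplot as plt

def rbf(n,nd,data):
    # n Gaussian RBFs with centers evenly spaced on [0,1], evaluated at nd data points
    wid=1./(n-1)          # spacing between neighbouring centers
    den=2*(wid/2.)**2     # 2*sigma^2, with sigma=wid/2
    c=np.zeros(n)
    for i in range(n):
        c[i]=i*wid        # centers 0, wid, 2*wid, ..., 1

    res=np.zeros((n,nd))  # res[i,j] = rbf_i(data[j])

    for i in range(n):
        for j in range(nd):
            res[i,j]=np.exp(-(data[j]-c[i])**2/den)

    return res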

#4*rbf
n=4 #number of rbf
nd=100 #data size
data=np.linspace(0,1,num=nd) #data range 0~1

out4=rbf(n,nd,data)
plt.plot(out4.T)
# one tick at the data index of each center (indices run 0..nd-1), labelled with the center value
plt.xticks(np.linspace(0,nd-1,num=n),np.around(np.linspace(0.0,1.0,num=n),decimals=2))
plt.title('4*RBF')
plt.show()

#4*nrbf: normalize across the n RBFs at each data point (sum over centers, axis 0)
nout4=out4/np.sum(out4,axis=0,keepdims=True)

plt.plot(nout4.T)
plt.xticks(np.linspace(0,nd-1,num=n),np.around(np.linspace(0.0,1.0,num=n),decimals=2))
plt.title('4*NRBF')
plt.show()

#8*rbf
n=8
out8=rbf(n,nd,data)
plt.plot(out8.T)
plt.xticks(np.linspace(0,nd-1,num=n),np.around(np.linspace(0.0,1.0,num=n),decimals=2))
plt.title('8*RBF')
plt.show()

#8*nrbf: normalize across the n RBFs at each data point (sum over centers, axis 0)
nout8=out8/np.sum(out8,axis=0,keepdims=True)

plt.plot(nout8.T)
plt.xticks(np.linspace(0,nd-1,num=n),np.around(np.linspace(0.0,1.0,num=n),decimals=2))
plt.title('8*NRBF')
plt.show()
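Why σ = wid/2? At a neighbouring center the exponent is wid²/(2σ²) = wid²/(2·wid²/4) = 2, so every RBF falls to exp(-2) ≈ 0.135 there, whatever n is. That is why the 4- and 8-RBF plots have the same shape, only compressed. A quick numeric check (value_at_neighbor is a hypothetical helper):

```python
import numpy as np

def value_at_neighbor(n):
    """Value of one RBF at the adjacent center, using the width rule of rbf(): sigma = wid/2."""
    wid = 1. / (n - 1)            # center spacing
    den = 2 * (wid / 2.) ** 2     # 2*sigma^2
    return np.exp(-wid ** 2 / den)

for n in (4, 8, 16):
    print(n, round(float(value_at_neighbor(n)), 3))   # 0.135 for every n, i.e. exp(-2)
```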

2d 4x4 RBF

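def rbf_2d(n,ns,nd,data):
    # n[0] x n[1] grid of Gaussian RBFs over [0,1]^2, evaluated on an nd x nd grid of points
    wid=1./(n-1.)          # center spacing along each axis
    sig=wid[0]/2.          # one shared sigma (assumes a square grid, n[0]==n[1])
    den=2*sig**2
    c=np.zeros((np.prod(n),ns))
    for i in range(n[0]):
        for j in range(n[1]):
            c[i*n[1]+j,:]=(i*wid[0],j*wid[1])   # center k = i*n[1]+j

    data_x,data_y=np.meshgrid(data,data)
    res=np.zeros((np.prod(n),nd,nd))   # res[k] = image of RBF k over the point grid

    for k in range(np.prod(n)):        # all n[0]*n[1] RBFs, not a hard-coded 16
        for i in range(nd):
            for j in range(nd):
                res[k,i,j]=np.exp(-np.linalg.norm([data_x[i,j]-c[k,0],data_y[i,j]-c[k,1]])**2/den)

    return res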

ns=2 #2d rbf
n=4*np.ones(ns).astype(int)  #number of rbf along each axis (4x4=16)
nd=100 #data size
data=np.linspace(0,1,num=nd)
out=rbf_2d(n,ns,nd,data)

#show the RBFs on the diagonal (k=0,5,10,15)
plt.imshow(out[0]+out[5]+out[10]+out[15])
plt.colorbar()
plt.show()

#show all 16
plt.imshow(np.sum(out,axis=0))
plt.colorbar()
plt.show()

The 16 centers c built inside rbf_2d (row k is center k = i*4+j):
array([[0.        , 0.        ],
       [0.        , 0.33333333],
       [0.        , 0.66666667],
       [0.        , 1.        ],
       [0.33333333, 0.        ],
       [0.33333333, 0.33333333],
       [0.33333333, 0.66666667],
       [0.33333333, 1.        ],
       [0.66666667, 0.        ],
       [0.66666667, 0.33333333],
       [0.66666667, 0.66666667],
       [0.66666667, 1.        ],
       [1.        , 0.        ],
       [1.        , 0.33333333],
       [1.        , 0.66666667],
       [1.        , 1.        ]])
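The SARSA and Q(λ) flows below need Φ(s) for one state rather than a whole image over the grid. A minimal sketch, assuming the same 4x4 grid, center order and σ = wid/2 as rbf_2d; grid_centers and phi_2d are hypothetical names. For the state (0.5, 0.5) the four inner centers (k = 5, 6, 9, 10) are equally far away, so they get equal feature values.

```python
import numpy as np

def grid_centers(n=(4, 4)):
    """Centers on an evenly spaced n[0] x n[1] grid over [0,1]^2, ordered k = i*n[1] + j as in c."""
    xs = np.linspace(0, 1, n[0])
    ys = np.linspace(0, 1, n[1])
    gx, gy = np.meshgrid(xs, ys, indexing='ij')        # gx[i, j] = xs[i], gy[i, j] = ys[j]
    return np.stack([gx.ravel(), gy.ravel()], axis=1)  # shape (n[0]*n[1], 2)

def phi_2d(s, centers, sigma):
    """Feature vector Phi(s) of one 2-d state s: one Gaussian RBF value per center."""
    d2 = np.sum((centers - np.asarray(s, dtype=float)) ** 2, axis=1)  # squared distance to each center
    return np.exp(-d2 / (2 * sigma ** 2))

centers = grid_centers()
sigma = (1. / 3) / 2                    # sigma = wid/2 with wid = 1/(4-1), as in rbf_2d
fs = phi_2d((0.5, 0.5), centers, sigma)
print(fs.shape)                         # (16,)
```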

GD-SARSA(λ)

Gradient-Descent SARSA with Eligibility Traces

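$$\theta \leftarrow \theta + \alpha\,\delta\,e$$

$$\delta = r + \gamma Q(s',a') - Q(s,a)$$

$$e \leftarrow \gamma\lambda e + \nabla_\theta Q(s,a)$$

$$e \leftarrow \gamma\lambda e + \Phi(s) \qquad \text{since } Q(s,a) = \theta_a^\top \Phi(s)$$

Here θ has one weight column θ_a per action, so the gradient Φ(s) is added to the column of e that belongs to action a.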
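One update of these equations with made-up numbers (4 features, 2 actions; α, γ, λ, Φ(s), Φ(s') are arbitrary). Because θ starts at zero, δ = r, and only the column of action a changes, by α·δ·Φ(s).

```python
import numpy as np

alpha, gamma, lam = 0.1, 0.9, 0.8        # arbitrary step size, discount, trace decay
theta = np.zeros((4, 2))                 # weights, one column per action
e = np.zeros_like(theta)                 # eligibility traces, same shape as theta

phi_s = np.array([0.1, 0.6, 0.3, 0.0])   # Phi(s), made-up feature values
phi_s2 = np.array([0.0, 0.2, 0.7, 0.1])  # Phi(s'), made-up feature values
a, a2, r = 0, 1, 1.0                     # action a in s, next action a' in s', reward r

q_sa = theta[:, a] @ phi_s               # Q(s,a)   = theta_a^T Phi(s)
q_s2a2 = theta[:, a2] @ phi_s2           # Q(s',a') = theta_a'^T Phi(s')
delta = r + gamma * q_s2a2 - q_sa        # TD error
e = gamma * lam * e                      # decay the old traces ...
e[:, a] += phi_s                         # ... and add grad_theta Q(s,a) = Phi(s) in column a
theta += alpha * delta * e               # theta <- theta + alpha*delta*e
print(theta)
```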

Algorithm flow:

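init θ                    θ: one weight column per action, Q(s,·)=θ.T*Φ(s)
for each episode:
    e=0                   e: traces, same shape as θ
    s=env.start()
    fs=Φ(s)
    a=rand()       or  a=εgreedy(Q=θ.T*fs,ε)

    for each step:
        e(i,a)<-1       i in fs (replacing traces); fs has 16 entries for a 2d 4x4 RBF grid
        #e(i,a)<-e(i,a)+1     accumulating traces; for real-valued RBF features: e(:,a)<-e(:,a)+fs

        s',r,done=env.step(a)
        if done:
            δ=r-Q(s,a)      no bootstrap from a terminal state
            θ=θ+αδe
            break
        fs'=Φ(s')
        a'=εgreedy(Q'=θ.T*fs',ε)

        δ=r+γQ'(s',a')-Q(s,a)      Q(s)=θ.T*fs, Q'(s')=θ.T*fs'
        θ=θ+αδe
        e=γλe
        s,fs,a=s',fs',a'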
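The flow above can be sketched as runnable Python. Everything environment-related is an assumption for illustration: Corridor is a hypothetical 1-d task (state in [0,1], actions left/right by 0.1, reward -1 per step, goal at 1.0), and phi, eps_greedy and the values of α, γ, λ, ε are arbitrary, untuned choices. Because RBF features are real-valued, the sketch uses accumulating traces e(:,a) += Φ(s) instead of e(i)<-1.

```python
import numpy as np

class Corridor:
    """Hypothetical toy task: s in [0, 1], action 0 = left, 1 = right (step 0.1), reward -1, goal s = 1."""
    def start(self):
        self.s = 0.0
        return self.s

    def step(self, a):
        move = 0.1 if a == 1 else -0.1
        self.s = round(min(max(self.s + move, 0.0), 1.0), 10)  # clip and round away float drift
        return self.s, -1.0, self.s >= 1.0

def phi(s, n=8):
    """1-d Gaussian RBF features, centers evenly spaced on [0, 1], sigma = spacing / 2."""
    wid = 1.0 / (n - 1)
    c = np.arange(n) * wid
    return np.exp(-(s - c) ** 2 / (2 * (wid / 2) ** 2))

def eps_greedy(q, eps, rng):
    """Epsilon-greedy action over the values q, ties broken at random."""
    if rng.random() < eps:
        return int(rng.integers(len(q)))
    return int(rng.choice(np.flatnonzero(q == q.max())))

def gd_sarsa_lambda(env, n_feat=8, n_act=2, episodes=300, alpha=0.05,
                    gamma=0.99, lam=0.8, eps=0.1, max_steps=500, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros((n_feat, n_act))              # Q(s, a) = theta[:, a] @ phi(s)
    for _ in range(episodes):
        e = np.zeros_like(theta)                   # eligibility traces
        fs = phi(env.start(), n_feat)
        a = eps_greedy(theta.T @ fs, eps, rng)
        for _ in range(max_steps):
            e[:, a] += fs                          # accumulating trace: e <- e + grad Q(s, a)
            s2, r, done = env.step(a)
            delta = r - theta[:, a] @ fs           # r - Q(s, a)
            if done:
                theta += alpha * delta * e         # no bootstrap from a terminal state
                break
            fs2 = phi(s2, n_feat)
            a2 = eps_greedy(theta.T @ fs2, eps, rng)
            delta += gamma * theta[:, a2] @ fs2    # + gamma * Q(s', a')
            theta += alpha * delta * e
            e *= gamma * lam
            fs, a = fs2, a2
    return theta

def greedy_rollout_length(env, theta, n_feat=8, max_steps=100):
    """Steps the greedy policy needs from s = 0 to reach the goal (max_steps if it never does)."""
    s = env.start()
    for t in range(1, max_steps + 1):
        s, _, done = env.step(int(np.argmax(theta.T @ phi(s, n_feat))))
        if done:
            return t
    return max_steps

env = Corridor()
theta = gd_sarsa_lambda(env)
print(greedy_rollout_length(env, theta))
```

With θ initialized to zero and rewards of -1, untried actions look optimistic, which drives early exploration. The printed value is the greedy step count from s = 0; the shortest possible path is 10 steps.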

GD-Watkins's Q(λ)

Algorithm flow:

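init θ
for each episode:
    e=0
    s=env.start()
    fs=Φ(s)

    for each step:
        a,greedy=εgreedy(Q=θ.T*fs,ε)
        if not greedy:
            e=0*e        exploratory action: cut the traces of earlier state-action pairs
        e(i,a)<-1       i in fs; fs has 16 entries for a 2d 4x4 RBF grid
        #e(i,a)<-e(i,a)+1

        s',r,done=env.step(a)
        if done:
            δ=r-Q(s,a)
        else:
            fs'=Φ(s')
            δ=r+γmax_a'Q(s',a')-Q(s,a)     Q(s)=θ.T*fs, Q(s')=θ.T*fs'

        θ=θ+αδe          update before cutting, so the current transition is never lost
        if done: break
        e=γλe
        s,fs=s',fs'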
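A minimal sketch of a single GD-Watkins's Q(λ) step, in the same order as the flow above: cut the traces on an exploratory action, add Φ(s), compute δ with the max over next actions, update θ, then decay. watkins_q_lambda_step and its default α, γ, λ are hypothetical; the two calls use made-up one-hot features to show the trace cut.

```python
import numpy as np

def watkins_q_lambda_step(theta, e, fs, a, greedy, r, fs2, done,
                          alpha=0.1, gamma=0.9, lam=0.8):
    """One GD-Watkins's Q(lambda) step with linear Q(s, a) = theta[:, a] @ fs.
    Updates theta and e in place and returns the TD error."""
    if not greedy:
        e[:] = 0.0                    # exploratory action: cut the traces of earlier pairs
    e[:, a] += fs                     # add grad_theta Q(s, a) = Phi(s) in column a
    delta = r - theta[:, a] @ fs      # r - Q(s, a)
    if not done:
        delta += gamma * np.max(theta.T @ fs2)   # + gamma * max_a' Q(s', a')
    theta += alpha * delta * e
    e *= gamma * lam                  # decay traces for the next step
    return delta

theta = np.zeros((3, 2))              # 3 features, 2 actions
e = np.zeros_like(theta)
# step 1: greedy action 0, reward 1
watkins_q_lambda_step(theta, e, np.array([1., 0, 0]), 0, True, 1.0, np.array([0., 1, 0]), False)
# step 2: exploratory action 1, reward 1, episode ends
watkins_q_lambda_step(theta, e, np.array([0., 1, 0]), 1, False, 1.0, np.array([0., 0, 1]), True)
print(theta)
```

Because the second action is exploratory, the trace left by step 1 is cut and θ[0, 0] stays at 0.1. Passing greedy=True in the second call would keep the decayed trace and give θ[0, 0] = 0.1 + 0.1·0.72 = 0.172.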


Jiexin Wang
