PCA

PCA can be defined as the orthogonal projection of the data onto a lower dimensional linear space, known as the principal subspace, such that the variance of the projected data is maximized.

PCA can also be defined as the linear projection that minimizes the average projection cost, defined as the mean squared distance between the data points and their projections.

Maximum Variance Formulation

Consider a data set of N observations {x_n}, n = 1, …, N, where each x_n is a Euclidean variable with dimensionality D

X = [ x_1, x_2, ..., x_N ]   N x D  ->  N x M

               data dimension D      reduced dimension M
observation 1 [      x_1      ]     [     x_1_low      ]
            2 [      x_2      ]  -> [     x_2_low      ]
            3 [      x_3      ]     [     x_3_low      ]
            .                               .
            .                               .
            N [      x_N      ]     [     x_N_low      ]

Goal: project the data onto a space of dimensionality M, where M<D, while maximizing the variance of the projected data (assume M is given)

As an example, consider the projection onto a 1-D space (M=1)
We can define the direction of this space using a D-dimensional vector u1; note that u1 is a unit vector, u1'u1=1

Each data point x_n is projected onto a scalar value u1'x_n : (1xD)(Dx1)
The mean of the projected data is u1'm, where m is the sample mean

m = 1/N Σ_N x_n

m - Dx1

The variance of the projected data is u1'Su1:

1/N Σ_N (u1'x_n - u1'm)^2 = 1/N Σ_N u1'(x_n-m)(x_n-m)'u1 = u1'Su1

where S is the data covariance matrix

S = 1/N Σ_N (x_n-m)(x_n-m)'

We then maximize the projected variance:

max u1'Su1 w.r.t. u1
subject to u1'u1=1

To enforce the constraint, we introduce a Lagrange multiplier λ1

u1'Su1 + λ1(1-u1'u1)

Setting the derivative with respect to u1 equal to zero, we see that this quantity has a stationary point when

Su1 = λ1u1

u1 must be an eigenvector of S

Left-multiplying by u1' and using u1'u1=1 gives

u1'Su1 = λ1

Therefore, the variance is maximized when we set u1 equal to the eigenvector having the largest eigenvalue λ1; this eigenvector is known as the first principal component
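
A quick numerical sanity check of this result (a minimal sketch on random data; the variable names are illustrative, not from the text above):

import numpy as np

np.random.seed(0)
X=np.random.randn(100,5)               #100 observations, D=5
m=X.mean(axis=0)
S=(X-m).T.dot(X-m)/len(X)              #D x D covariance matrix (1/N scaling)
vals,vecs=np.linalg.eigh(S)            #eigh returns eigenvalues in ascending order
u1=vecs[:,-1]                          #eigenvector with the largest eigenvalue
best=u1.dot(S).dot(u1)                 #projected variance along u1, equals the largest eigenvalue
for _ in range(1000):                  #no random unit direction gives larger projected variance
    u=np.random.randn(5)
    u/=np.linalg.norm(u)
    assert u.dot(S).dot(u)<=best+1e-9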

PCA limitations:

  • if the data does not follow a multidimensional Gaussian distribution, PCA may not give the best principal components
  • if the data has a wave-type (nonlinear) structure, the wave shape gets distorted after projection, as illustrated by the sketch below
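
A small illustration of the second point (a sketch, not part of the original notes): points on a sine wave in 2-D, reduced to one principal component and reconstructed; the wave shape collapses onto a straight line.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

t=np.linspace(0,4*np.pi,200)
xw=np.column_stack((t,np.sin(t)))                     #200x2 points lying on a wave
pca=PCA(n_components=1)
xw_rec=pca.inverse_transform(pca.fit_transform(xw))   #project to 1-D and map back to 2-D
plt.plot(xw[:,0],xw[:,1],'.',label='original wave')
plt.plot(xw_rec[:,0],xw_rec[:,1],'.',label='after PCA (1 component)')
plt.legend()
plt.show()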

PCA toy example

Reduce 3D samples to 2D and 1D

Generate multivariate Gaussian data X with

mu1 = [0, 0, 0]          mu2 = [1, 1, 1]

       [1, 0, 0]                [1, 0, 0]
sig1 = [0, 1, 0]         sig2 = [0, 1, 0]
       [0, 0, 1]                [0, 0, 1]

    [x1_1, x1_2, ..., x1_20]
X1= [x2_1, x2_2, ..., x2_20]
    [x3_1, x3_2, ..., x3_20]

    [x1_1, x1_2, ..., x1_20]
X2= [x2_1, x2_2, ..., x2_20]
    [x3_1, x3_2, ..., x3_20]

X = [X1, X2]   (concatenated along the columns to 3x40)

import numpy as np
mu1=np.array([0,0,0])
sig1=np.array([[1,0,0],[0,1,0],[0,0,1]])
x1=np.random.multivariate_normal(mu1,sig1,20).T

#x1: 3x20  x1[0] first dimension, x1[1] second dimension, x1[2] third dimension  

mu2=np.array([1,1,1])
sig2=np.array([[1,0,0],[0,1,0],[0,0,1]])
x2=np.random.multivariate_normal(mu2,sig2,20).T

#merge data to 3x40
x=np.concatenate((x1,x2),axis=1)

from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt  

fig=plt.figure()                                                        
ax=fig.add_subplot(111,projection='3d')                                
ax.plot(x1[0,:],x1[1,:],x1[2,:],'o',color='r',label='x1')
ax.plot(x2[0,:],x2[1,:],x2[2,:],'o',color='b',label='x2')
ax.legend()

Compute the observation mean

m = 1/N Σ_N x_i
N = 40
#m has shape (3,): the mean of each of the three dimensions
m=np.mean(x,axis=1)

>>m
array([0.3019378 , 0.58810653, 0.74258542])

Compute the scatter matrix

S = Σ_N (x_i-m)(x_i-m)'

[x1_i - m0]        [x1_i - m0]'
[x2_i - m1]  .dot  [x2_i - m1]        (3x1)(1x3) outer product
[x3_i - m2]        [x3_i - m2]

i = 1~40
N = 40
s=np.zeros((3,3))
for i in range(x.shape[1]):
    d=x[:,i].reshape(3,1)-m.reshape(3,1)   #(3,1) deviation of observation i from the mean
    s+=d.dot(d.T)                          #accumulate the (3,3) outer product

>>s
array([[45.54147306, 13.25957003,  4.34813046],
       [13.25957003, 61.72462824, 17.32890804],
       [ 4.34813046, 17.32890804, 68.53181199]])
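
The same scatter matrix can be computed without the loop (a small sketch using the arrays already defined above):

xc=x-m.reshape(3,1)          #center every observation
s_vec=xc.dot(xc.T)           #(3x40)(40x3) = sum of the outer products
print(np.allclose(s_vec,s))  #True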

Compute the covariance matrix (alternative to the scatter matrix)
Similar to S, but with the scaling factor 1/(N-1)

cov=np.cov([x[0,:],x[1,:],x[2,:]])

>>cov
array([[1.16773008, 0.33998898, 0.11149052],
       [0.33998898, 1.58268278, 0.44433098],
       [0.11149052, 0.44433098, 1.75722595]])

>>cov*39
array([[45.54147306, 13.25957003,  4.34813046],
       [13.25957003, 61.72462824, 17.32890804],
       [ 4.34813046, 17.32890804, 68.53181199]])

Compute eigenvectors and corresponding eigenvalues
Check that the eigenvectors of S and cov are the same; only the eigenvalues are scaled

eig_val,eig_vec=np.linalg.eig(s)

>>eig_val
array([86.31710459, 37.10687219, 52.37393651])

>>eig_vec
array([[-0.28648657, -0.79372883, -0.53658175],
       [-0.65026563,  0.57239385, -0.49951966],
       [-0.70361925, -0.205815  ,  0.68011773]])

eig_val_,eig_vec_=np.linalg.eig(cov)

>>eig_val_
array([2.21325909, 0.95145826, 1.34292145])

>>eig_vec_
array([[-0.28648657, -0.79372883, -0.53658175],
       [-0.65026563,  0.57239385, -0.49951966],
       [-0.70361925, -0.205815  ,  0.68011773]])
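
As expected from s = 39*cov, the eigenvalues differ by the factor N-1 = 39 while the eigenvectors are identical (a quick check on the results above; here the two eig calls happen to return the eigenpairs in the same order):

print(np.allclose(eig_val,39*eig_val_))  #True
print(np.allclose(eig_vec,eig_vec_))     #True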

Check the calculation:

Σv = λv

Σ - the scatter (or covariance) matrix
v - eigenvector
λ - eigenvalue

eigv0=eig_vec[:,0].reshape(3,1)
eigv1=eig_vec[:,1].reshape(3,1)
eigv2=eig_vec[:,2].reshape(3,1)

>>s.dot(eigv0)
array([[-24.72869154],
       [-56.12904639],
       [-60.73437668]])

>>eig_val[0] * eigv0
array([[-24.72869154],
       [-56.12904639],
       [-60.73437668]])

>>s.dot(eigv1)
array([[-29.45279443],
       [ 21.23974555],
       [ -7.63715082]])

>>eig_val[1] * eigv1
array([[-29.45279443],
       [ 21.23974555],
       [ -7.63715082]])

>>s.dot(eigv2)
array([[-28.10289876],
       [-26.16181078],
       [ 35.62044305]])

>>eig_val[2] * eigv2
array([[-28.10289876],
       [-26.16181078],
       [ 35.62044305]])
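
The same check can be done for all eigenpairs at once, since S V = V diag(λ) (a one-line sketch with the arrays already computed):

print(np.allclose(s.dot(eig_vec),eig_vec*eig_val))  #column i satisfies S v_i = λ_i v_i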

Visualize the eigenvectors

from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.mplot3d import proj3d
from matplotlib.patches import FancyArrowPatch

class Arrow3D(FancyArrowPatch):
    def __init__(self, xs, ys, zs, *args, **kwargs):
        FancyArrowPatch.__init__(self, (0,0), (0,0), *args, **kwargs)
        self._verts3d = xs, ys, zs

    def draw(self, renderer):
        xs3d, ys3d, zs3d = self._verts3d
        xs, ys, zs = proj3d.proj_transform(xs3d, ys3d, zs3d, renderer.M)
        self.set_positions((xs[0],ys[0]),(xs[1],ys[1]))
        FancyArrowPatch.draw(self, renderer)

fig=plt.figure()
ax=fig.add_subplot(111, projection='3d')
ax.plot(x[0,:],x[1,:],x[2,:],'o',color='g')
ax.plot([m[0]],[m[1]],[m[2]],'o',color='r')
for v in eig_vec.T:
    a = Arrow3D([m[0],v[0]],[m[1],v[1]],[m[2],v[2]],mutation_scale=20,lw=3,arrowstyle="-|>",color="r")
    ax.add_artist(a)

Eigenvectors only define the directions of the new axes, since they all have the same unit length 1

>>np.linalg.norm(eig_vec[0,:])
1.0

>>np.linalg.norm(eig_vec[1,:])                                          
1.0000000000000002

>>np.linalg.norm(eig_vec[2,:])                                          
0.9999999999999999

Sort by eigenvalues

eig=[(np.abs(eig_val[i]),eig_vec[:,i]) for i in range(len(eig_val))]
eig.sort(key=lambda x:x[0], reverse=True)

>>eig
[(86.31710458559974, array([-0.28648657, -0.65026563, -0.70361925])),
 (52.37393651019379, array([-0.53658175, -0.49951966,  0.68011773])),
 (37.10687218897268, array([-0.79372883,  0.57239385, -0.205815  ]))]
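
The sorted eigenvalues also show how much of the total variance each component explains (a small sketch; the exact ratios depend on the random samples):

tot=sum(val for val,_ in eig)
var_exp=[val/tot for val,_ in eig]
print(var_exp)   #roughly [0.49, 0.30, 0.21] for the eigenvalues shown above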

From 3D to 2D: combine the two eigenvectors with the largest eigenvalues into a projection matrix w: 3x2
Transform the samples onto the new subspace with y=w'x, 2x40 <- (2x3)(3x40)

w=np.hstack((eig[0][1].reshape(3,1),eig[1][1].reshape(3,1)))
y=w.T.dot(x)

plt.plot(y[0,0:20],y[1,0:20],'o',color='r',label='y1')
plt.plot(y[0,20:40],y[1,20:40],'o',color='b',label='y2')
plt.legend()

From 3D to 1D: w: 3x1, y=w'x, 1x40 <- (1x3)(3x40)

w=eig[0][1].reshape(3,1)
y=w.T.dot(x)

plt.plot(y[0,0:20],20*[1],'o',color='r',label='y1')  
plt.plot(y[0,20:40],20*[1],'o',color='b',label='y2')
plt.legend()

Other libraries

from matplotlib.mlab import PCA as mlabPCA   #note: mlab.PCA has been removed from recent matplotlib releases

mlab=mlabPCA(x.T)

plt.plot(mlab.Y[0:20,0],mlab.Y[0:20,1],'o',color='r',label='y1')
plt.plot(mlab.Y[20:40,0],mlab.Y[20:40,1],'o',color='b',label='y2')
plt.legend()

from sklearn.decomposition import PCA as sklPCA

skl=sklPCA(n_components=2)
y=skl.fit_transform(x.T)

plt.plot(y[0:20,0],y[0:20,1],'o',color='r',label='y1')
plt.plot(y[20:40,0],y[20:40,1],'o',color='b',label='y2')
plt.legend()
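
sklearn centers the data internally and the sign of each component is arbitrary, so the projected coordinates can be shifted and flipped relative to the manual y=w'x, but the principal directions themselves should agree (a quick check; components_ holds the directions as rows):

print(skl.components_)       #2x3, rows are the two principal directions
print(eig[0][1],eig[1][1])   #should match the rows above up to sign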

PCA for MNIST

pre-process: scale the pixel values to [0, 1]; sklearn's PCA then centers the data internally

import numpy as np
import matplotlib.pyplot as plt
from itertools import product
from sklearn.decomposition import PCA
from keras.datasets import mnist  
from sklearn.utils import shuffle

#mnist=fetch_mldata("MNIST original")
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x=x_train/255.
x=x.reshape(60000,784)
pca=PCA(n_components=2)
x_=pca.fit_transform(x) #output

#plt.hsv() #set colormap
plt.jet()
sca=plt.scatter(x_[:1000,0],x_[:1000,1],c=y_train[:1000])  
plt.legend(*sca.legend_elements())
plt.show()   
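
To see how much of the total variance the 2-D projection keeps (a quick check using sklearn's explained_variance_ratio_ attribute):

print(pca.explained_variance_ratio_)        #variance fraction explained by each of the 2 components
print(pca.explained_variance_ratio_.sum())  #total fraction kept by the first two components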

[Figures: the 2-D projection plotted for the first 1000 data points and for all 60000 data points]
