abstract

a generative model \(G\) - captures the data distribution

a discriminative model \(D\) - estimates the probability that a sample came from the training data rather than \(G\)

The training objective for \(G\) is to maximize the probability of \(D\) making a mistake
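this adversarial setup is usually written as a two-player minimax game over a value function \(V(D, G)\), where \(p_z(z)\) is a prior on the generator's input noise:

\[\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]\]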

intro

Discriminative models map a high-dimensional, rich sensory input to a class label

Generative models have had difficulty with

1) approximating many intractable probabilistic computations that arise in maximum likelihood estimation and related strategies

2) leveraging the benefits of piecewise linear units in the generative context

\(G\) - a team of counterfeiters trying to produce fake currency and use it without detection

\(D\) - the police, trying to detect the counterfeit currency

competition in this game drives both teams to improve their methods until the counterfeits are indistinguishable from the genuine articles

undirected graphical models with latent variables, e.g. restricted Boltzmann machines (RBMs), deep Boltzmann machines (DBMs)

deep belief networks (DBNs)

score matching, noise-contrastive estimation (NCE)

softmax

Gibbs or Boltzmann distribution

\[\frac{e^{Q_t(a)/\tau}}{\sum_{b=1}^{n} e^{Q_t(b)/\tau}}\]

\(\tau\) - temperature

a high \(\tau\) causes the actions to be nearly equiprobable

a low \(\tau\) causes a greater difference in selection probability for actions that differ in their value estimates

\(\tau \rightarrow 0\) - same as greedy action selection
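a quick numpy sketch of the effect of \(\tau\) (the action values and temperatures here are just illustrative):

import numpy as np

def boltzmann(q, tau):
    # Gibbs/Boltzmann (softmax) distribution over action values q at temperature tau
    e = np.exp(np.asarray(q) / tau)
    return e / e.sum()

q = [1.0, 3.0, 2.0]
print(boltzmann(q, 100.0))  # high tau: roughly [0.33, 0.34, 0.33] -> nearly equiprobable
print(boltzmann(q, 0.1))    # low tau: roughly [0.00, 1.00, 0.00] -> nearly greedy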

note: when the number of actions is 2, softmax reduces to the sigmoid
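to see this, write the two-action case and divide the numerator and denominator by \(e^{Q_t(1)/\tau}\):

\[\frac{e^{Q_t(1)/\tau}}{e^{Q_t(1)/\tau} + e^{Q_t(2)/\tau}} = \frac{1}{1 + e^{-(Q_t(1) - Q_t(2))/\tau}} = \sigma\!\left(\frac{Q_t(1) - Q_t(2)}{\tau}\right)\]

i.e. a sigmoid applied to the (scaled) difference of the two action values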

softmax in machine learning

“Any time we wish to represent a probability distribution over a discrete variable with n possible values, we may use the softmax function”

  • activation function in nn

when the nn is configured to output probabilities over N class labels (multi-class classification)

  • can be used as an activation function for a hidden layer, when the model internally needs to choose or weight multiple different inputs at a bottleneck or concatenation layer

  • softmax is a softened version of the argmax ("hard max") function: instead of putting all of the weight on the index of the largest value in a list, it spreads probability over all the values

    given [1,3,2]

    hard max returns [0, 1, 0]

    softmax returns [0.09, 0.67, 0.24]

import numpy as np

def softmax(vector):
    # exponentiate each element, then normalize so the outputs sum to 1
    e = np.exp(vector)
    return e / e.sum()

softmax([1, 3, 2])

>> array([0.09003057, 0.66524096, 0.24472847])

from scipy.special import softmax

softmax([1, 3, 2])

>> array([0.09003057, 0.66524096, 0.24472847])

encoded class labels (one-hot encoding):

class 0: [1,0,0]
class 1: [0,1,0]
class 2: [0,0,1]

in the softmax case, the prediction for the same example is a probability distribution rather than a one-hot vector:

class 1: [0.09, 0.67, 0.24]

  • the error between the expected and predicted multinomial probability distributions is often calculated using cross-entropy; this error is then used to update the model
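a minimal sketch of that calculation, assuming a one-hot target for class 1 and the softmax prediction from the example above:

import numpy as np

target = np.array([0.0, 1.0, 0.0])                          # one-hot encoding of class 1
predicted = np.array([0.09003057, 0.66524096, 0.24472847])  # softmax output

# cross-entropy: -sum(target * log(predicted)); with a one-hot target this is
# just the negative log-probability assigned to the true class
cross_entropy = -np.sum(target * np.log(predicted))
print(cross_entropy)  # ~0.408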

we may want to convert the probabilities back into an integer encoded class label
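e.g. by taking the argmax of the predicted distribution (using the same example vector):

import numpy as np

probs = np.array([0.09003057, 0.66524096, 0.24472847])
label = int(np.argmax(probs))  # index of the largest probability
print(label)  # 1 -> class 1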