Generative Adversarial Nets
abstract
a generative model \(G\) - captures the data distribution
a discriminative model \(D\) - estimates the probability that a sample came from the training data rather than \(G\)
Training \(G\) amounts to maximizing the probability of \(D\) making a mistake
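the paper formalizes this as a two-player minimax game over the value function \(V(D, G)\), where \(p_z(z)\) is a prior on the input noise:
\[\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))]\]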
introduction
Discriminative models map a high-dimensional, rich sensory input to a class label
Generative models have had difficulty with
1) approximating many intractable probabilistic computations that arise in maximum likelihood estimation and related strategies
2) leveraging the benefits of piecewise linear units in the generative context
\(G\) - a team of counterfeiters trying to produce fake currency and use it without detection
\(D\) - the police, trying to detect the counterfeit currency
competition in this game drives both teams to improve their methods until the counterfeits are indistinguishable from the genuine articles
related work
undirected graphical models with latent variables, e.g. restricted Boltzmann machines (RBMs), deep Boltzmann machines (DBMs)
deep belief networks (DBNs)
score matching, noise-contrastive estimation (NCE)
softmax
Gibbs or Boltzmann distribution
\[\frac{\exp\left(Q_t(a)/\tau\right)}{\sum_{b=1}^{n} \exp\left(Q_t(b)/\tau\right)}\]
\(\tau\) - temperature
high \(\tau\) causes the actions to be all (nearly) equiprobable
low \(\tau\) causes a greater difference in selection probability for actions that differ in their value estimates
\(\tau \rightarrow 0\) - same as greedy action selection
note: when the number of actions is 2, softmax reduces to the sigmoid function
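a minimal numpy sketch (the helper name softmax_with_temperature is mine, not from any library) illustrating the effect of \(\tau\) and the two-action sigmoid equivalence:

import numpy as np

def softmax_with_temperature(q, tau):
    # Gibbs/Boltzmann distribution over action values q at temperature tau
    e = np.exp(np.asarray(q, dtype=float) / tau)
    return e / e.sum()

q = [1.0, 3.0, 2.0]
softmax_with_temperature(q, tau=100.0)  # high tau: nearly equiprobable, ~[0.33, 0.34, 0.33]
softmax_with_temperature(q, tau=0.1)    # low tau: close to greedy selection, ~[0.00, 1.00, 0.00]

# with 2 actions, softmax of the values equals the sigmoid of their difference
q2 = [1.0, 3.0]
p = softmax_with_temperature(q2, tau=1.0)
np.isclose(p[0], 1.0 / (1.0 + np.exp(-(q2[0] - q2[1]))))  # True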
softmax in machine learning
“Any time we wish to represent a probability distribution over a discrete variable with n possible values, we may use the softmax function”
- activation function in a nn, when the nn is configured to output N class labels (multi-class classification)
- can be used as an activation function for a hidden layer, when the model internally needs to choose or weight multiple different inputs at a bottleneck or concatenation layer
- softmax is a softened version of the argmax function that returns the index of the largest value in a list
given [1,3,2]
hard max returns [0, 1, 0]
softmax returns [0.09, 0.67, 0.24]
import numpy as np

def softmax(vector):
    # exponentiate each element, then normalize so the outputs sum to 1
    e = np.exp(vector)
    return e / e.sum()

softmax([1, 3, 2])
>> array([0.09003057, 0.66524096, 0.24472847])
from scipy.special import softmax
softmax([1,3,2])
>>array([0.09003057, 0.66524096, 0.24472847])
encoded class labels (one-hot encoding):
class 0: [1,0,0]
class 1: [0,1,0]
class 2: [0,0,1]
in the softmax case:
class 1: [0.09, 0.67, 0.24]
- the error between the expected and predicted multinomial probability distributions is often calculated using cross-entropy; this error is then used to update the model
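a minimal sketch of that calculation (the cross_entropy helper here is mine, not from a library):

import numpy as np

def cross_entropy(target_one_hot, predicted_probs):
    # negative log probability assigned to the true class
    eps = 1e-12  # avoid log(0)
    return -np.sum(target_one_hot * np.log(predicted_probs + eps))

cross_entropy(np.array([0, 1, 0]), np.array([0.09, 0.67, 0.24]))
>> 0.4004775665...  # -log(0.67)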
we may want to convert the probabilities back into an integer encoded class label
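e.g. with numpy's argmax:

import numpy as np

probs = np.array([0.09, 0.67, 0.24])
np.argmax(probs)  # index of the largest probability = integer class label
>> 1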