NADS: Neural Architecture Distribution Search for Uncertainty Awareness
ICML 2020 · 08-28-2020
Motivation
Out-of-distribution (OOD) errors are common in machine learning systems when the test data comes from a distribution different from the training data.
Existing OOD detection approaches are prone to errors; sometimes OOD examples are even assigned higher likelihoods than in-distribution data.
There is currently no well-established guiding principle for designing OOD detection architectures that can accurately quantify uncertainty.
NADS is proposed for designing uncertainty-aware architectures; it can be used to optimize a stochastic OOD detection objective and to construct an ensemble of models for OOD detection.
Idea

For simplicity, suppose the architecture has a single layer with $K$ candidate operations.
Each operation $i$ has a corresponding weight $\phi_i$ ($i = 1, \dots, K$). [This could be done as in DARTS.]

In this special case, we can view the architecture as $\alpha \in \{0,1\}^K$. [Comment by myself: if a zero-operation is included, then $\alpha$ is a dummy vector; otherwise it is a one-hot vector.] https://machinelearningmastery.com/one-hot-encoding-for-categorical-data/
Concretely, let $b = [b_1, b_2, \dots, b_K] \in \{0,1\}^K$ denote the random categorical indicator vector sampled from the probability vector $\phi = [\phi_1, \phi_2, \dots, \phi_K]$. [See the bottom figure from Tianhao's presentation.]

Sample $\alpha$ (equivalent to $b$ in this setting) $\sim \mathrm{Multi}(\sigma(\phi))$. [This can be viewed as: the probability of $\alpha$ is a function of $\phi$, i.e., $P_\phi(\alpha)$.]
Given $\alpha$, we can calculate the random output $y$ of the hidden layer for input data $x$: $y = \sum_{i=1}^{K} b_i \, o_i(x)$.
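A minimal NumPy sketch of this single-layer setting (my own code, not from the paper; the three ops are hypothetical stand-ins for real candidate operations):

```python
import numpy as np

rng = np.random.default_rng(0)

K = 3                                      # number of candidate operations
phi = np.array([0.5, 1.0, -0.2])           # architecture logits, one per op
probs = np.exp(phi) / np.exp(phi).sum()    # sigma(phi): softmax over the K ops

# Hypothetical candidate operations o_i (stand-ins for conv / pool / identity).
ops = [lambda x: x,
       lambda x: np.tanh(x),
       lambda x: np.maximum(x, 0.0)]

x = rng.standard_normal(5)                 # dummy input

b = rng.multinomial(1, probs)              # one-hot indicator b in {0,1}^K
y = sum(b_i * op(x) for b_i, op in zip(b, ops))   # y = sum_i b_i o_i(x)
print("sampled op:", int(b.argmax()), "y:", y)
```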
The original objective: maximize the Widely Applicable Information Criterion (WAIC) of the training data.
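The equation itself did not carry over into these notes; as I understand it from the paper, the objective has the standard WAIC form (my notation):

```latex
\mathrm{WAIC}(\mathcal{D}) =
  \mathbb{E}_{\alpha \sim P_\phi}\!\left[\textstyle\sum_{i}\log p(x_i \mid \alpha)\right]
  - \operatorname{Var}_{\alpha \sim P_\phi}\!\left[\textstyle\sum_{i}\log p(x_i \mid \alpha)\right]
```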

After Monte Carlo sampling (mentioned above), the expectation and variance are replaced by sample estimates over architectures $\alpha_s \sim P_\phi(\alpha)$:
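A sketch of the Monte Carlo estimate (my code; `log_likelihood` is a hypothetical stand-in for what a trained model would return):

```python
import numpy as np

rng = np.random.default_rng(1)

def log_likelihood(alpha_idx: int) -> float:
    """Hypothetical stand-in for sum_i log p(x_i | alpha). In NADS this would
    come from a flow-based generative model built from architecture alpha."""
    return -10.0 - 0.5 * alpha_idx + 0.1 * rng.standard_normal()

probs = np.array([0.2, 0.5, 0.3])   # P_phi(alpha) over K = 3 architectures
S = 100                             # number of Monte Carlo samples
alphas = rng.choice(len(probs), size=S, p=probs)
lls = np.array([log_likelihood(a) for a in alphas])

# Replace E[.] and Var[.] in the WAIC with their sample estimates.
waic_hat = lls.mean() - lls.var()
print("MC estimate of WAIC:", waic_hat)
```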

To make optimization tractable, the Gumbel-Softmax reparameterization is used to relax the discrete mask $b$ into a continuous random variable $\tilde{b}$.
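A sketch of the relaxation (my code; the temperature `tau` is a hyperparameter I chose for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def gumbel_softmax(phi: np.ndarray, tau: float = 0.5) -> np.ndarray:
    g = -np.log(-np.log(rng.uniform(size=phi.shape)))  # Gumbel(0, 1) noise
    z = (phi + g) / tau
    z = z - z.max()                        # for numerical stability
    return np.exp(z) / np.exp(z).sum()     # relaxed mask b_tilde on the simplex

phi = np.array([0.5, 1.0, -0.2])
b_tilde = gumbel_softmax(phi, tau=0.5)     # continuous, sums to 1
print(b_tilde)  # as tau -> 0, b_tilde approaches a one-hot sample of b
```

Because $\tilde{b}$ is a differentiable function of $\phi$ and the noise, gradients can flow back to the architecture parameters during search.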
Note:
MC sampling
flow-based generative model
reparameterization method: Gumbel-Softmax
how does it work for OOD detection? (see the sketch below)
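My reading of how the searched distribution is used at test time (names and the threshold below are illustrative): sample an ensemble of architectures, score each input by its WAIC under the ensemble, and flag low-scoring inputs as OOD.

```python
import numpy as np

def waic_score(log_probs: np.ndarray) -> float:
    """log_probs: log p(x | alpha_s) for one input x under S sampled
    architectures. Low WAIC => high epistemic uncertainty => likely OOD."""
    return float(log_probs.mean() - log_probs.var())

# Hypothetical ensemble log-likelihoods for one in-dist and one OOD input.
ll_in  = np.array([-3.1, -3.0, -3.2, -2.9])   # consistent across architectures
ll_out = np.array([-2.0, -9.5, -4.7, -12.3])  # architectures disagree wildly

threshold = -5.0  # would be tuned on held-out in-distribution data
for name, ll in [("in-dist", ll_in), ("ood", ll_out)]:
    s = waic_score(ll)
    print(name, "WAIC:", round(s, 2), "-> OOD" if s < threshold else "-> in")
```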
Reference
Ardywibowo, R., Boluki, S., Gong, X., Wang, Z., and Qian, X. "NADS: Neural Architecture Distribution Search for Uncertainty Awareness." ICML 2020.