NADS: Neural Architecture Distribution Search for Uncertainty Awareness
ICML 2020
Last updated: 08-28-2020
Out-of-distribution (OOD) errors are common in machine learning systems when test data comes from a distribution different from that of the training data.
Existing OOD detection approaches are error-prone; in some cases, OOD examples are even assigned higher likelihoods than in-distribution data.
There is currently no well-established guiding principle for designing OOD detection architectures that can accurately quantify uncertainty.
NADS is proposed for designing uncertainty-aware architectures: it learns a distribution over architectures by optimizing a stochastic OOD detection objective, and the learned distribution can then be used to construct an ensemble of models for OOD detection.
For simplicity, suppose the architecture has one operator with K candidate operations o_1, ..., o_K.
Each operation o_i has a corresponding weight α_i [this could be done as in DARTS].
In this special case, we can view the architecture as determined by an indicator vector b [comment by myself: if the zero-operation is added, then b can be an all-zero (dummy) vector; otherwise it's a one-hot vector] https://machinelearningmastery.com/one-hot-encoding-for-categorical-data/
Actually, let b denote the random categorical indicator vector sampled from the probability vector α = (α_1, ..., α_K). [See the bottom figure from Tianhao's presentation]
Sampling b ~ Categorical(α) (which is equivalent to sampling an architecture in this setting) [it could be viewed as: the probability of b is a function of α].
Given b, we can calculate the random output of the hidden layer given input data x: e(x) = Σ_i b_i · o_i(x).
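A minimal numpy sketch of this sampling step: draw a one-hot mask b from Categorical(α) and mix the candidate operations. The particular operations and probabilities below are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical candidate operations for one search cell (illustrative).
ops = [
    lambda x: np.maximum(x, 0.0),  # ReLU
    lambda x: np.tanh(x),          # tanh
    lambda x: x,                   # identity (skip connection)
]

alpha = np.array([0.6, 0.3, 0.1])  # operation probabilities, sum to 1

def sample_mask(alpha, rng):
    """Sample a one-hot indicator vector b ~ Categorical(alpha)."""
    k = rng.choice(len(alpha), p=alpha)
    b = np.zeros(len(alpha))
    b[k] = 1.0
    return b

def layer_output(x, b):
    """Random layer output e(x) = sum_i b_i * o_i(x)."""
    return sum(b_i * op(x) for b_i, op in zip(b, ops))

x = np.array([-1.0, 2.0])
b = sample_mask(alpha, rng)
y = layer_output(x, b)
```

Because b is one-hot, exactly one candidate operation is active per sample, so each draw of b corresponds to one concrete architecture.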
The original objective: maximize the Widely Applicable Information Criterion (WAIC) of the training data, WAIC(x) = E_θ[log p(x | θ)] − Var_θ[log p(x | θ)], where θ denotes an architecture sampled from the distribution parameterized by α.
After Monte Carlo sampling (mentioned above), both the expectation and the variance are approximated by averages over S sampled architectures θ_1, ..., θ_S.
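The Monte Carlo estimate of the WAIC score for a single data point can be sketched as follows (a sketch, assuming we already have log-likelihoods log p(x | θ_s) from S sampled architectures):

```python
import numpy as np

def waic_score(loglik_samples):
    """Monte Carlo WAIC estimate for one data point x.

    loglik_samples: log p(x | theta_s) for S architectures theta_s
    sampled from the search distribution.
    WAIC(x) ~= mean_s[log p(x|theta_s)] - var_s[log p(x|theta_s)]
    """
    loglik_samples = np.asarray(loglik_samples)
    return loglik_samples.mean() - loglik_samples.var()

# Toy example: 5 sampled architectures give these log-likelihoods.
score = waic_score([-2.0, -2.1, -1.9, -2.0, -2.0])
```

The variance term penalizes points on which the sampled architectures disagree, so a high mean likelihood alone is not enough to score well.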
To make the optimization tractable, the Gumbel-Softmax reparameterization is used to relax the discrete mask b to a continuous random variable b̃.
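A minimal sketch of the Gumbel-Softmax relaxation (temperature value is illustrative): perturb log α with Gumbel noise and apply a softmax, yielding a sample that is continuous, and hence differentiable in α, yet approaches a one-hot vector as the temperature τ → 0.

```python
import numpy as np

def gumbel_softmax(alpha, tau, rng):
    """Relaxed sample b_tilde from Categorical(alpha).

    b_tilde = softmax((log alpha + g) / tau), with g_i ~ Gumbel(0, 1).
    As tau -> 0, b_tilde approaches a one-hot sample.
    """
    g = rng.gumbel(size=len(alpha))
    logits = (np.log(alpha) + g) / tau
    logits -= logits.max()          # for numerical stability
    e = np.exp(logits)
    return e / e.sum()

rng = np.random.default_rng(0)
alpha = np.array([0.6, 0.3, 0.1])
b_soft = gumbel_softmax(alpha, tau=0.5, rng=rng)
```

Because b_soft depends smoothly on α, gradients of the WAIC objective can flow back to the architecture parameters through standard backpropagation.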
Key techniques and open questions:
- MC sampling
- flow-based generative model
- reparameterization method: Gumbel-Softmax reparameterization
- How does it work for OOD detection?
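One plausible answer, sketched below under the assumption that the learned architecture distribution yields an ensemble of density models: score each test point by its WAIC over the ensemble and flag low-scoring points as OOD. The `log_prob` interface and the threshold are hypothetical, for illustration only.

```python
import numpy as np

def ood_score(x, models):
    """WAIC-based OOD score: higher means more in-distribution.

    `models` is an ensemble of density models sampled from the learned
    architecture distribution; each is assumed to expose a hypothetical
    log_prob(x) method (illustrative interface, not the paper's API).
    """
    ll = np.array([m.log_prob(x) for m in models])
    return ll.mean() - ll.var()

def is_ood(x, models, threshold):
    """Flag x as OOD when its WAIC falls below a chosen threshold."""
    return ood_score(x, models) < threshold
```

The variance term makes the score sensitive to ensemble disagreement, which is exactly where single-model likelihoods tend to fail on OOD inputs.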