NADS: Neural Architecture Distribution Search for Uncertainty Awareness

ICML 2020 08-28-2020


Motivation

Out-of-distribution (OOD) errors are common in machine learning systems when the test data comes from a distribution different from the training data.

Existing OOD detection approaches are prone to errors; sometimes OOD examples are even assigned higher likelihoods than in-distribution data.

There is currently no well-established guiding principle for designing OOD detection architectures that can accurately quantify uncertainty.

NADS is proposed to search for a distribution over uncertainty-aware architectures; this distribution is used to optimize a stochastic OOD detection objective and to construct an ensemble of models for OOD detection.

Idea

The original objective: maximize the Widely Applicable Information Criterion (WAIC) of the training data.
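The equation image did not survive extraction; as a sketch reconstructed from the standard WAIC definition (not copied from the paper), the search maximizes, over the architecture distribution $p_\phi(\alpha)$ with network weights $\theta$:

```latex
% WAIC of the training set D under the architecture distribution p_phi(alpha):
% reward a high expected log-likelihood, penalize its variance across
% architectures alpha sampled from p_phi.
\max_{\phi}\; \sum_{x \in \mathcal{D}}
  \Big( \mathbb{E}_{\alpha \sim p_{\phi}}\big[\log p_{\theta}(x \mid \alpha)\big]
      - \mathrm{Var}_{\alpha \sim p_{\phi}}\big[\log p_{\theta}(x \mid \alpha)\big] \Big)
```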

After approximating the expectation and variance by Monte Carlo sampling of architectures:
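A sketch of the Monte Carlo estimate, assuming $M$ architectures $\alpha_1,\dots,\alpha_M \sim p_\phi$:

```latex
% Sample mean of the log-likelihood per data point ...
\hat{\mu}(x) = \frac{1}{M}\sum_{m=1}^{M} \log p_{\theta}(x \mid \alpha_m)

% ... and the resulting Monte Carlo estimate of the WAIC objective.
\max_{\phi}\; \sum_{x \in \mathcal{D}}
  \Big( \hat{\mu}(x)
      - \frac{1}{M}\sum_{m=1}^{M}\big(\log p_{\theta}(x \mid \alpha_m) - \hat{\mu}(x)\big)^{2} \Big)
```

Since the $\alpha_m$ are sampled from $p_\phi$, gradients with respect to $\phi$ need a reparameterization, which is where the Gumbel-Softmax trick noted below comes in.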

Note:

  • MC sampling

  • flow-based generative model

  • reparameterization trick: Gumbel-Softmax

  • how does it work for OOD detection? (see the sketch below)
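On the last point, my (hedged) understanding is that the learned architecture distribution yields an ensemble of generative models, and a test input is scored by a WAIC-style criterion over that ensemble, with low scores flagged as OOD. A minimal NumPy sketch (function names are my own, not from the paper):

```python
import numpy as np

def waic_score(log_likelihoods):
    """WAIC-style score for a single input: mean log-likelihood across the
    ensemble minus its variance. `log_likelihoods` has shape (M,), one entry
    per sampled architecture / ensemble member."""
    return np.mean(log_likelihoods) - np.var(log_likelihoods)

def is_ood(log_likelihoods, threshold):
    # Low score = low or highly inconsistent likelihoods -> flag as OOD.
    return waic_score(log_likelihoods) < threshold
```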

Reference

For simplicity, suppose the architecture has a single operator slot with $K$ candidate operations.

Each operation $i$ has a corresponding weight $\phi_i$ $(i=1,\dots,K)$. [This could be done as in DARTS]

In this special case, we can view the architecture as $\alpha \in \{0,1\}^K$. [Comment by myself: if a zero-operation is added, then $\alpha$ is a dummy vector; otherwise, it is a one-hot vector]

Let $b=[b_1,b_2,\dots,b_K] \in \{0,1\}^K$ denote the random categorical indicator vector sampled from the probability vector $\phi=[\phi_1,\phi_2,\dots,\phi_K]$. [See the bottom figure from Tianhao's presentation]

Sampling $\alpha$ (equivalent to $b$ in this setting) follows $\alpha \sim \mathrm{Multi}(\sigma(\phi))$. [It can be viewed as: the probability of $\alpha$ is a function of $\phi$, namely $P_\phi(\alpha)$]

Given $\alpha$, the random output $y$ of the hidden layer for input data $x$ is $y=\sum_{i=1}^{K} b_i\, o_i(x)$.

To make optimization tractable, the Gumbel-Softmax reparameterization is used to relax the discrete mask $b$ into a continuous random variable $\tilde{b}$.
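As an illustration, here is a minimal PyTorch sketch of such a Gumbel-Softmax relaxed mixed operation (my own sketch, not the authors' code; the class name and the toy candidate operations are made up):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelMixedOp(nn.Module):
    """Relaxed mixed operation: y = sum_i b_tilde_i * o_i(x), where b_tilde is
    a Gumbel-Softmax sample driven by the architecture logits phi."""
    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)                    # K candidate operations o_i
        self.phi = nn.Parameter(torch.zeros(len(ops)))   # architecture weights phi_i

    def forward(self, x, tau=1.0, hard=False):
        # hard=True returns a one-hot mask (a discrete alpha) with a
        # straight-through gradient; hard=False keeps the soft relaxation b_tilde.
        b_tilde = F.gumbel_softmax(self.phi, tau=tau, hard=hard)
        return sum(b * op(x) for b, op in zip(b_tilde, self.ops))

# Toy usage with made-up candidate operations.
ops = [nn.Conv2d(8, 8, 3, padding=1), nn.Conv2d(8, 8, 5, padding=2), nn.Identity()]
layer = GumbelMixedOp(ops)
y = layer(torch.randn(2, 8, 16, 16))   # same shape as a single candidate's output
```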

  • NADS paper (ICML 2020): https://proceedings.icml.cc/static/paper_files/icml/2020/5738-Paper.pdf
  • Flow-based deep generative models (Lilian Weng): https://lilianweng.github.io/lil-log/2018/10/13/flow-based-deep-generative-models.html
  • Improving out-of-distribution detection (Google AI Blog): https://ai.googleblog.com/2019/12/improving-out-of-distribution-detection.html
  • One-hot encoding for categorical data (Machine Learning Mastery): https://machinelearningmastery.com/one-hot-encoding-for-categorical-data/
[Figure: idea of the paper]