NADS: Neural Architecture Distribution Search for Uncertainty Awareness

ICML 2020 08-28-2020

Motivation

Out-of-distribution (OOD) errors are common in machine learning systems when the test data come from a distribution different from that of the training data.

Existing OOD detection approaches are error-prone, and sometimes even assign higher likelihoods to OOD examples than to in-distribution data.

There is currently no well-established guiding principle for designing OOD detection architectures that can accurately quantify uncertainty.

NADS is proposed for designing uncertainty-aware architectures; it can be used to optimize a stochastic OOD detection objective and to construct an ensemble of models that performs OOD detection.

Idea

[Figure: idea of the paper]

For simplicity, suppose the architecture has a single operator with $K$ candidate operations.

Each operation $i$ has a corresponding weight $\phi_i$ ($i=1,\dots,K$). [This could be done by DARTS]

In this special case, we can view the architecture $\alpha$ as a vector in $\{0,1\}^K$. [Comment by myself: if the zero operation is added, then $\alpha$ is a dummy vector; otherwise, it's a one-hot vector.] https://machinelearningmastery.com/one-hot-encoding-for-categorical-data/

Actually, let $b=[b_1,b_2,\dots,b_K]\in\{0,1\}^K$ denote the random categorical indicator vector sampled from the probability vector $\phi=[\phi_1,\phi_2,\dots,\phi_K]$. [See the bottom figure from Tianhao's presentation]

Sampling: $\alpha \sim \mathrm{Multi}(\sigma(\phi))$, where $\alpha$ is equivalent to $b$ in this setting. [It could be viewed as: the probability of $\alpha$ is a function of $\phi$, written $P_\phi(\alpha)$.]

Given $\alpha$, we can calculate the random output $y$ of the hidden layer for input data $x$, where $o_i$ is the $i$-th candidate operation: $y=\sum_{i=1}^{K} b_i\, o_i(x)$
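The sampling step and the masked output above can be sketched in a few lines of NumPy. This is only an illustration under assumptions of mine: $\sigma$ is taken to be the softmax, the three candidate operations in `ops` are hypothetical stand-ins (not the operations used in the paper), and the helper names `softmax`, `sample_one_hot`, and `mixed_output` are made up for this note.

```python
import numpy as np

def softmax(phi):
    """Numerically stable softmax over a 1-D array of weights."""
    z = phi - np.max(phi)
    e = np.exp(z)
    return e / e.sum()

def sample_one_hot(phi, rng):
    """Draw b ~ Multi(softmax(phi)) as a one-hot vector of length K."""
    probs = softmax(phi)
    idx = rng.choice(len(phi), p=probs)
    b = np.zeros(len(phi))
    b[idx] = 1.0
    return b

def mixed_output(b, ops, x):
    """Compute y = sum_i b_i * o_i(x)."""
    return sum(b_i * op(x) for b_i, op in zip(b, ops))

# Hypothetical candidate operations (assumption: stand-ins for real layers).
ops = [lambda x: x, lambda x: 2.0 * x, lambda x: np.zeros_like(x)]
phi = np.array([0.5, 1.5, -1.0])   # one weight per operation
rng = np.random.default_rng(0)
x = np.array([1.0, -2.0, 3.0])

b = sample_one_hot(phi, rng)       # exactly one entry equals 1
y = mixed_output(b, ops, x)        # equals the output of the selected operation
```

Because $b$ is one-hot, the sum simply selects the output of one operation; the randomness of $y$ comes entirely from the draw of $b$.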

The original objective (maximizing the Widely Applicable Information Criterion (WAIC) of the training data):

After Monte Carlo sampling (mentioned above):

To make optimization tractable, the Gumbel-Softmax reparameterization is used to relax the discrete mask $b$ into a continuous random variable $\tilde b$.
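A minimal sketch of the relaxation, assuming the standard Gumbel-Softmax form $\tilde b = \mathrm{softmax}((\phi + g)/\tau)$ with $g_i \sim \mathrm{Gumbel}(0,1)$ and a temperature $\tau$. The temperature values and the function name `gumbel_softmax` are my own choices for illustration, not settings taken from the paper.

```python
import numpy as np

def gumbel_softmax(phi, tau, rng):
    """Relaxed sample b_tilde = softmax((phi + g) / tau), g_i ~ Gumbel(0, 1)."""
    # Gumbel(0, 1) noise via inverse transform; low > 0 keeps log(u) finite.
    u = rng.uniform(low=1e-12, high=1.0, size=phi.shape)
    g = -np.log(-np.log(u))
    z = (phi + g) / tau
    z = z - z.max()                # stabilize the exponentials
    e = np.exp(z)
    return e / e.sum()

phi = np.array([0.5, 1.5, -1.0])   # same hypothetical weights as before
# Reusing the seed gives both calls the same noise, so only tau differs.
b_warm = gumbel_softmax(phi, tau=1.0, rng=np.random.default_rng(0))
b_cold = gumbel_softmax(phi, tau=0.1, rng=np.random.default_rng(0))
```

For a fixed noise draw, lowering $\tau$ pushes $\tilde b$ toward a one-hot vector, while a larger $\tau$ gives a smoother vector. Since $\tilde b$ is a differentiable function of $\phi$, gradients can pass through the sample in an autodiff framework (NumPy itself does not compute gradients).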

Note:

  • MC sampling

  • flow-based generative model

  • reparameterization method: Gumbel-Softmax reparameterization

  • how does it work for OOD detection?
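On the last point, one way to picture the ensemble step is a per-input WAIC score. The sketch below assumes the form commonly used for likelihood-based OOD detection, namely the mean minus the variance of the log-likelihood across ensemble members; these notes do not spell out the formula, so treat it as my assumption. The log-likelihood values are made up for illustration, and `waic_score` is a hypothetical helper name.

```python
import numpy as np

def waic_score(log_liks):
    """Per-input score: mean minus variance of log-likelihood over the ensemble.

    log_liks has shape (M, N): M ensemble members, N inputs.
    """
    log_liks = np.asarray(log_liks, dtype=float)
    return log_liks.mean(axis=0) - log_liks.var(axis=0)

# Made-up log-likelihoods from M=3 sampled architectures on N=2 inputs.
# Column 0: members agree; column 1: members disagree strongly.
log_liks = np.array([
    [-100.0,  -90.0],
    [-101.0, -130.0],
    [ -99.0, -110.0],
])
scores = waic_score(log_liks)
```

Under this assumed form, an input on which the sampled models disagree gets a lower score even when its average likelihood is similar, so a lower score suggests the input is more likely OOD. Any decision threshold would have to be chosen on held-out data.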
