NADS: Neural Architecture Distribution Search for Uncertainty Awareness

ICML 2020 08-28-2020


Motivation

Out-of-distribution (OOD) errors are common in machine learning systems when the test data comes from a distribution different from the training data.

Existing OOD detection approaches are prone to errors; sometimes OOD examples are even assigned higher likelihoods than in-distribution data.

There is currently no well-established guiding principle for designing OOD detection architectures that can accurately quantify uncertainty.

NADS is proposed to search for a distribution over uncertainty-aware architectures; this distribution is used to optimize a stochastic OOD detection objective and to construct an ensemble of models for OOD detection.

Idea

The original objective: maximize the Widely Applicable Information Criterion (WAIC) of the training data.
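The equation image did not survive extraction; as a sketch reconstructed from the standard WAIC definition (not copied from the paper), the search maximizes, over the architecture distribution $p_\phi(\alpha)$ with network weights $\theta$:

```latex
% WAIC of the training set D under the architecture distribution p_phi(alpha):
% reward a high expected log-likelihood, penalize its variance across
% architectures alpha sampled from p_phi.
\max_{\phi}\; \sum_{x \in \mathcal{D}}
  \Big( \mathbb{E}_{\alpha \sim p_{\phi}}\big[\log p_{\theta}(x \mid \alpha)\big]
      - \mathrm{Var}_{\alpha \sim p_{\phi}}\big[\log p_{\theta}(x \mid \alpha)\big] \Big)
```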

After approximating the expectation and variance by Monte Carlo sampling of architectures:
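A sketch of the Monte Carlo estimate, assuming $M$ architectures $\alpha_1,\dots,\alpha_M \sim p_\phi$:

```latex
% Sample mean of the log-likelihood per data point ...
\hat{\mu}(x) = \frac{1}{M}\sum_{m=1}^{M} \log p_{\theta}(x \mid \alpha_m)

% ... and the resulting Monte Carlo estimate of the WAIC objective.
\max_{\phi}\; \sum_{x \in \mathcal{D}}
  \Big( \hat{\mu}(x)
      - \frac{1}{M}\sum_{m=1}^{M}\big(\log p_{\theta}(x \mid \alpha_m) - \hat{\mu}(x)\big)^{2} \Big)
```

Since the $\alpha_m$ are sampled from $p_\phi$, gradients with respect to $\phi$ need a reparameterization, which is where the Gumbel-Softmax trick noted below comes in.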

Note:

  • MC sampling

  • flow-based generative model

  • reparameterization trick: Gumbel-Softmax

  • how does it work for OOD detection? (see the sketch below)
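On the last point, my (hedged) understanding is that the learned architecture distribution yields an ensemble of generative models, and a test input is scored by a WAIC-style criterion over that ensemble, with low scores flagged as OOD. A minimal NumPy sketch (function names are my own, not from the paper):

```python
import numpy as np

def waic_score(log_likelihoods):
    """WAIC-style score for a single input: mean log-likelihood across the
    ensemble minus its variance. `log_likelihoods` has shape (M,), one entry
    per sampled architecture / ensemble member."""
    return np.mean(log_likelihoods) - np.var(log_likelihoods)

def is_ood(log_likelihoods, threshold):
    # Low score = low or highly inconsistent likelihoods -> flag as OOD.
    return waic_score(log_likelihoods) < threshold
```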

Reference

For simplicity, suppose the architecture has a single operator slot with $K$ candidate operations.

Each operation $i$ has a corresponding weight $\phi_i$ $(i=1,\dots,K)$. [This could be done as in DARTS]

In this special case, we can view the architecture as $\alpha \in \{0,1\}^K$. [Comment by myself: if a zero-operation is added, then $\alpha$ is a dummy vector; otherwise, it is a one-hot vector]

Let $b=[b_1,b_2,\dots,b_K] \in \{0,1\}^K$ denote the random categorical indicator vector sampled from the probability vector $\phi=[\phi_1,\phi_2,\dots,\phi_K]$. [See the bottom figure from Tianhao's presentation]

Sampling $\alpha$ (equivalent to $b$ in this setting) follows $\alpha \sim \mathrm{Multi}(\sigma(\phi))$. [It can be viewed as: the probability of $\alpha$ is a function of $\phi$, namely $P_\phi(\alpha)$]

Given $\alpha$, the random output $y$ of the hidden layer for input data $x$ is $y=\sum_{i=1}^{K} b_i\, o_i(x)$.

To make optimization tractable, the Gumbel-Softmax reparameterization is used to relax the discrete mask $b$ into a continuous random variable $\tilde{b}$.
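As an illustration, here is a minimal PyTorch sketch of such a Gumbel-Softmax relaxed mixed operation (my own sketch, not the authors' code; the class name and the toy candidate operations are made up):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GumbelMixedOp(nn.Module):
    """Relaxed mixed operation: y = sum_i b_tilde_i * o_i(x), where b_tilde is
    a Gumbel-Softmax sample driven by the architecture logits phi."""
    def __init__(self, ops):
        super().__init__()
        self.ops = nn.ModuleList(ops)                    # K candidate operations o_i
        self.phi = nn.Parameter(torch.zeros(len(ops)))   # architecture weights phi_i

    def forward(self, x, tau=1.0, hard=False):
        # hard=True returns a one-hot mask (a discrete alpha) with a
        # straight-through gradient; hard=False keeps the soft relaxation b_tilde.
        b_tilde = F.gumbel_softmax(self.phi, tau=tau, hard=hard)
        return sum(b * op(x) for b, op in zip(b_tilde, self.ops))

# Toy usage with made-up candidate operations.
ops = [nn.Conv2d(8, 8, 3, padding=1), nn.Conv2d(8, 8, 5, padding=2), nn.Identity()]
layer = GumbelMixedOp(ops)
y = layer(torch.randn(2, 8, 16, 16))   # same shape as a single candidate's output
```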

  • NADS paper (ICML 2020): https://proceedings.icml.cc/static/paper_files/icml/2020/5738-Paper.pdf
  • Flow-based deep generative models (Lilian Weng): https://lilianweng.github.io/lil-log/2018/10/13/flow-based-deep-generative-models.html
  • Improving out-of-distribution detection (Google AI Blog): https://ai.googleblog.com/2019/12/improving-out-of-distribution-detection.html
  • One-hot encoding for categorical data (Machine Learning Mastery): https://machinelearningmastery.com/one-hot-encoding-for-categorical-data/
[Figure: idea of the paper]