ANIL (Almost No Inner Loop)

ICLR 2020 8-23-2020

Motivation

Idea of MAML: build a meta-learner that learns an initialization useful for learning many different tasks, so that it can adapt to a specific task quickly (within a few gradient steps) and efficiently (with only a few examples).

It can also be viewed as a bi-level optimization problem, with two types of parameter updates: the inner loop and the outer loop.

  • The inner loop takes the meta-initialization and performs task-specific adaptation to new tasks.

  • The outer loop updates the meta-initialization of the neural network parameters to a setting from which the inner loop can adapt quickly to new tasks.

Conjecture/hypothesis of the authors of ANIL:

"we can obtain the same rapid learning performance of MAML solely through feature reuse."

Rapid Learning vs Feature Reuse

Rapid learning:

"In rapid learning, the meta-initialization in the outer loop results in a parameter setting that is favorable for fast learning, thus significant adaptation to new tasks can rapidly take place in the inner loop. "

"In feature reuse, the meta-initialization already contains useful features that can be reused, so little adaptation on the parameters is required in the inner loop."

"To prove feature reuse is a competitive alternative to rapid learning in MAML, the authors proposed a simplified algorithm, ANIL, where the inner loop is removed for all but the task-specific head of the underlying neural network during training and testing."

ANIL

  • base model/learner: a neural network architecture (e.g., a CNN)

  • $\theta$: the meta-initialization parameters of the feature-extractor layers of the network

  • $w$: the meta-initialization parameters of the head (final classification) layer

  • $\phi_{\theta}$: the feature extractor parametrized by $\theta$

  • $\hat{y} = w^{T}\phi_{\theta}(x)$: label prediction
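
To make the notation concrete, here is a minimal PyTorch sketch; the single-linear-layer feature extractor, the dimensions, and all variable names are illustrative assumptions, not taken from the paper or its code.

```python
import torch

# Illustrative dimensions (assumed, not from the paper).
d_in, d_feat, n_classes = 784, 64, 5

theta = torch.randn(d_in, d_feat, requires_grad=True)   # feature-extractor parameters (theta)
w = torch.randn(d_feat, n_classes, requires_grad=True)  # head parameters (w)

def phi(x, theta):
    """phi_theta(x): one linear layer + ReLU, standing in for a CNN backbone."""
    return torch.relu(x @ theta)

x = torch.randn(10, d_in)     # a batch of inputs
y_hat = phi(x, theta) @ w     # y_hat = w^T phi_theta(x)
```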

Outer loop

Given $\theta_i$ and $w_i$ at iteration step $i$, the outer loop updates both sets of parameters via gradient descent:

$$
\theta_{i+1} = \theta_i - \alpha\nabla_{\theta_i}\mathcal{L}\left({w^{\prime}_i}^{T}\phi_{\theta^{\prime}_i}(x),\, y\right) \\
w_{i+1} = w_i - \alpha\nabla_{w_i}\mathcal{L}\left({w^{\prime}_i}^{T}\phi_{\theta^{\prime}_i}(x),\, y\right)
$$
  • $\mathcal{L}$: the loss for one task (or a batch of tasks), computed on the query set

  • $\alpha$: the meta (outer-loop) learning rate

  • $\theta^{\prime}_i$: task-specific (task-adapted) parameters obtained after one or several inner-loop steps starting from $\theta_i$

  • $w^{\prime}_i$: task-specific (task-adapted) parameters obtained after one or several inner-loop steps starting from $w_i$

  • $(x, y)$: samples from the query set
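
A rough sketch of this outer-loop step is below. It is first-order only (the gradient is taken at the adapted parameters and applied directly to the meta-initialization, ignoring the second-order terms of the exact MAML/ANIL update), and `theta_prime` / `w_prime` are stand-ins for the task-adapted parameters produced by the inner loop described next; all shapes and names are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

d_in, d_feat, n_classes = 784, 64, 5
alpha = 1e-3                                      # meta (outer-loop) learning rate

theta = torch.randn(d_in, d_feat)                 # meta-initialization theta_i
w = torch.randn(d_feat, n_classes)                # meta-initialization w_i

# Stand-ins for the task-adapted parameters theta'_i, w'_i from the inner loop.
theta_prime = theta.clone().requires_grad_(True)
w_prime = w.clone().requires_grad_(True)

x_q = torch.randn(10, d_in)                       # query-set inputs
y_q = torch.randint(0, n_classes, (10,))          # query-set labels

logits = torch.relu(x_q @ theta_prime) @ w_prime  # w'^T phi_theta'(x)
loss = F.cross_entropy(logits, y_q)               # query-set loss L
g_theta, g_w = torch.autograd.grad(loss, [theta_prime, w_prime])

theta = theta - alpha * g_theta                   # theta_{i+1} = theta_i - alpha * grad
w = w - alpha * g_w                               # w_{i+1}     = w_i     - alpha * grad
```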

Inner loop (one step for illustration)

$$
{\color{red} \theta^{\prime}_{i} = \theta_{i}} \\
w^{\prime}_i = w_i - \beta\nabla_{w_i}\mathcal{L}\left(w_i^{T}\phi_{\theta_i}(x),\, y\right)
$$
  • $\beta$: the inner-loop learning rate

  • $\mathcal{L}$: the loss for one task, computed on the support set

  • $(x, y)$: samples from the support set

In contrast, the inner loop in MAML adapts both the feature extractor and the head:

$$
{\color{red} \theta^{\prime}_i = \theta_i - \beta\nabla_{\theta_i}\mathcal{L}\left(w_i^{T}\phi_{\theta_i}(x),\, y\right)} \\
w^{\prime}_i = w_i - \beta\nabla_{w_i}\mathcal{L}\left(w_i^{T}\phi_{\theta_i}(x),\, y\right)
$$
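
The sketch below makes the contrast concrete: one inner-loop step on a task's support set, first as ANIL does it (head only) and then as MAML does it (all parameters). The shapes, the single-linear-layer feature extractor, and the learning rate are illustrative assumptions; `create_graph=True` is what lets the outer loop later differentiate through these steps.

```python
import torch
import torch.nn.functional as F

d_in, d_feat, n_classes = 784, 64, 5
beta = 0.01                                              # inner-loop learning rate

theta = torch.randn(d_in, d_feat, requires_grad=True)    # meta-initialization theta_i
w = torch.randn(d_feat, n_classes, requires_grad=True)   # meta-initialization w_i

x_s = torch.randn(25, d_in)                              # support-set inputs
y_s = torch.randint(0, n_classes, (25,))                 # support-set labels

logits = torch.relu(x_s @ theta) @ w                     # w^T phi_theta(x)
loss = F.cross_entropy(logits, y_s)                      # support-set loss L

# ANIL inner loop: adapt only the head; the feature extractor keeps its meta-init.
(g_w,) = torch.autograd.grad(loss, [w], create_graph=True)
theta_prime_anil = theta                                 # theta'_i = theta_i
w_prime_anil = w - beta * g_w                            # w'_i = w_i - beta * grad

# MAML inner loop: adapt both the feature extractor and the head.
g_theta, g_w = torch.autograd.grad(loss, [theta, w], create_graph=True)
theta_prime_maml = theta - beta * g_theta
w_prime_maml = w - beta * g_w
```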

Advantages

  • ANIL is much more computationally efficient than MAML, since it performs far fewer parameter updates in the inner loop.

  • Its performance is comparable to that of MAML.

References:

https://openreview.net/forum?id=rkgMkCEtPB
https://maithraraghu.com/assets/files/RapidLearningFeatureReuse.pdf
https://maithraraghu.com/assets/files/ICLR2020_ANIL.pdf
http://learn2learn.net/tutorials/anil_tutorial/ANIL_tutorial/