
MAML, FO-MAML, Reptile

Gradient-based meta-learning algorithms


MAML (Bi-level optimization)

Problem setting

Meta-Train

Notations:

  • $\theta^*_{ML}$ : optimal meta-learned parameters

  • $\phi_i$ : task-specific parameters for task $i$

  • $M$ : the number of tasks in meta-train; $i$ indexes the tasks

  • $\mathcal{D}^{tr}_i$ : support set, $\mathcal{D}^{test}_i$ : query set of task $i$

  • $\mathcal{L}(\phi, \mathcal{D})$ : loss function of a parameter vector and a dataset

  • $\phi_i = \mathcal{A}lg(\theta, \mathcal{D}^{tr}_i) = \theta - \alpha\nabla_{\theta}\mathcal{L}(\theta, \mathcal{D}^{tr}_i)$ : one (or multiple) steps of gradient descent initialized at $\theta$ [inner level of MAML]
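With this notation, the meta-train problem is the standard bi-level objective of MAML-style methods (the $\frac{1}{M}$ averaging is a common convention):

```latex
\theta^*_{ML} = \arg\min_{\theta} \frac{1}{M}\sum_{i=1}^{M}\mathcal{L}\big(\mathcal{A}lg(\theta, \mathcal{D}^{tr}_i),\, \mathcal{D}^{test}_i\big)
             = \arg\min_{\theta} \frac{1}{M}\sum_{i=1}^{M}\mathcal{L}\big(\phi_i,\, \mathcal{D}^{test}_i\big)
```

The inner level is the adaptation $\mathcal{A}lg$; the outer level evaluates the adapted parameters $\phi_i$ on the query set.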

Meta-test

Gradient-based Meta-Learning

  • Task $t$, $\mathcal{T}_t$, is associated with a finite dataset $\mathcal{D}_t = \{\mathbf{x}_{t,n}\}_{n=1}^{N_t}$

  • Task $t$, $\mathcal{T}_t$, is split into $\mathcal{D}_t^{train}, \mathcal{D}_t^{val}$

  • meta parameters $\boldsymbol{\phi} \in \mathbb{R}^{D}$

  • task-specific parameters $\boldsymbol{\theta}_t \in \mathbb{R}^{D}$

  • loss function $\ell(\mathcal{D}_t; \boldsymbol{\theta}_t)$

Algorithm 1 shows the structure of a typical gradient-based meta-learning algorithm, which could be instantiated as:

  • MAML

  • iMAML

  • Reptile

  1. TASKADAPT: task adaptation (inner loop)

  2. The meta-update $\Delta_t$ specifies the contribution of task $t$ to the meta parameters (outer loop)
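The two-step structure can be sketched end-to-end. Below is a minimal NumPy sketch with a toy quadratic loss per task; the function names, step sizes, and the Reptile-style meta-update are illustrative assumptions, not from the notes:

```python
import numpy as np

# Toy quadratic task loss: l(D; theta) = 0.5 * ||theta - mean(D)||^2,
# so its gradient is simply theta - mean(D).
def task_grad(theta, data):
    return theta - data.mean(axis=0)

def task_adapt(phi, data, alpha=0.1, steps=3):
    # TASKADAPT: inner-loop gradient descent initialized at the meta parameters
    theta = phi.copy()
    for _ in range(steps):
        theta = theta - alpha * task_grad(theta, data)
    return theta

def reptile_delta(phi, theta, data):
    # one possible per-task meta-update rule (Reptile-style): Delta_t = phi - theta_t
    return phi - theta

def meta_learn(tasks, phi, delta_fn, beta=0.5, meta_steps=200):
    # Algorithm-1 skeleton: adapt per task, then apply the averaged meta-update
    for _ in range(meta_steps):
        deltas = [delta_fn(phi, task_adapt(phi, d), d) for d in tasks]
        phi = phi - beta * np.mean(deltas, axis=0)
    return phi
```

With two tasks whose data means differ, the meta parameters settle between them, which is exactly the "good initialization" the outer loop is after.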

MAML

  1. Task adaptation: minimize the training loss $\ell_t^{train}(\boldsymbol{\theta}_t) = \ell(\mathcal{D}_t^{train}; \boldsymbol{\theta}_t)$ by gradient descent w.r.t. the task parameters.

  2. Meta parameter update: gradient descent on the validation loss $\ell_t^{val}(\boldsymbol{\theta}_t) = \ell(\mathcal{D}_t^{val}; \boldsymbol{\theta}_t)$, which yields the meta update (gradient) for task $t$: $\Delta_t^{\text{MAML}} = \nabla_{\phi}\,\ell_t^{val}(\boldsymbol{\theta}_t(\phi))$

This approach treats the task parameters as a function of the meta parameters, and hence requires back-propagation through the entire $L$-step task adaptation process. When $L$ is large, this becomes computationally prohibitive.
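For a single inner gradient step, the chain rule through the adaptation can be written out by hand. A sketch on the same toy quadratic loss, where the training Hessian is exactly $\mathbf{I}$, so $\partial\boldsymbol{\theta}_t/\partial\phi = (1-\alpha)\mathbf{I}$ (names and constants are illustrative):

```python
import numpy as np

alpha = 0.1  # inner-loop step size (illustrative)

def maml_meta_grad(phi, d_train, d_val):
    # One-step inner loop on l_train(theta) = 0.5 * ||theta - mean(d_train)||^2:
    # theta(phi) = phi - alpha * grad_train(phi)
    theta = phi - alpha * (phi - d_train.mean(axis=0))
    # Back-propagating through the inner step needs d theta / d phi.
    # For this quadratic loss the Hessian is I, so the Jacobian is (1 - alpha) * I:
    g_val = theta - d_val.mean(axis=0)   # grad of the val loss at theta
    return (1.0 - alpha) * g_val         # Delta_t^MAML = (d theta/d phi)^T g_val
```

The $(1-\alpha)$ factor is the one-step special case of the Jacobian product that, for $L$ steps, forces back-propagation through the whole adaptation trajectory.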

Reptile

Reptile optimizes $\theta_t$ on the entire dataset $\mathcal{D}_t$ and moves $\phi$ towards the adapted task parameters, yielding $\Delta_t^{\text{Reptile}} = \phi - \boldsymbol{\theta}_t$
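On the same toy quadratic loss, a Reptile update might look like this (a sketch; step sizes and names are illustrative):

```python
import numpy as np

def reptile_update(phi, data, alpha=0.1, inner_steps=5, eps=0.5):
    # Adapt on the *entire* task dataset -- Reptile uses no train/val split
    theta = phi.copy()
    for _ in range(inner_steps):
        theta = theta - alpha * (theta - data.mean(axis=0))  # toy quadratic grad
    delta = phi - theta          # Delta_t^Reptile = phi - theta_t
    return phi - eps * delta     # i.e. move phi towards the adapted theta_t
```

No gradient of a validation loss is ever taken w.r.t. $\phi$, which is why Reptile avoids second-order terms entirely.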

iMAML

iMAML introduces an L2 regularizer $\frac{\lambda}{2}\|\boldsymbol{\theta}_t - \phi\|^2$ to the training loss, and optimizes the task parameters on the regularized training loss.

Provided that this task adaptation process converges to a stationary point, implicit differentiation enables the computation of the meta gradient based only on the final solution of the adaptation process: $\Delta_t^{\mathrm{iMAML}} = \left(\mathbf{I} + \frac{1}{\lambda}\nabla^2_{\boldsymbol{\theta}_t}\ell_t^{train}(\boldsymbol{\theta}_t)\right)^{-1} \nabla_{\boldsymbol{\theta}_t}\ell_t^{val}(\boldsymbol{\theta}_t)$
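The implicit update amounts to a linear solve against the training-loss Hessian at the adapted solution. A sketch on the toy quadratic loss (`lam`, the names, and passing the Hessian explicitly are illustrative assumptions; in practice the solve is done matrix-free, e.g. with conjugate gradient):

```python
import numpy as np

lam = 1.0  # strength of the L2 proximal regularizer (illustrative)

def imaml_meta_grad(theta_star, d_val, hess_train):
    # Implicit meta-gradient: solve (I + (1/lam) * H_train) x = grad_val,
    # using only the final adapted solution theta_star -- no backprop
    # through the adaptation trajectory is needed.
    g_val = theta_star - d_val.mean(axis=0)   # toy quadratic val gradient
    A = np.eye(theta_star.size) + hess_train / lam
    return np.linalg.solve(A, g_val)
```

For the toy loss the Hessian is $\mathbf{I}$, so with $\lambda = 1$ the implicit gradient is exactly half the validation gradient.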

Derivative process

FO-MAML
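FO-MAML drops the second-order term of the MAML meta-gradient: it treats the adapted parameters $\boldsymbol{\theta}_t$ as independent of $\phi$ and uses the validation gradient at the adapted parameters directly:

```latex
\Delta_t^{\text{FO-MAML}} = \nabla_{\boldsymbol{\theta}_t}\, \ell_t^{val}(\boldsymbol{\theta}_t)
```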

Reptile

How iMAML generalizes them

The goal of meta-learning is to learn meta-parameters that produce good task-specific parameters after adaptation.
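One limiting case connects the updates (a sketch worth checking against the iMAML paper): as $\lambda \to \infty$ the regularizer pins $\boldsymbol{\theta}_t$ near $\phi$, the preconditioner approaches the identity, and the implicit gradient reduces to the first-order update:

```latex
\Delta_t^{\mathrm{iMAML}}
= \Big(\mathbf{I} + \tfrac{1}{\lambda}\nabla^2_{\boldsymbol{\theta}_t}\ell_t^{train}(\boldsymbol{\theta}_t)\Big)^{-1}
  \nabla_{\boldsymbol{\theta}_t}\ell_t^{val}(\boldsymbol{\theta}_t)
\;\xrightarrow{\;\lambda\to\infty\;}\;
\nabla_{\boldsymbol{\theta}_t}\ell_t^{val}(\boldsymbol{\theta}_t)
= \Delta_t^{\text{FO-MAML}}
```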