Meta-Learning of Neural Architectures for Few-Shot Learning

CVPR 2020 8-22-2020

Motivation

Few-shot learning is typically done with a fixed neural architecture. This paper proposes MetaNAS, the first method which fully integrates NAS with gradient-based meta learning.

MetaNAS allows adapting architectures to novel tasks from only a few data points with just a few steps of a gradient-based task optimizer. MetaNAS can therefore generate task-specific architectures that are adapted to each task separately, starting from a jointly meta-learned meta-architecture.

Marrying Gradient-based Meta Learning and Gradient-based NAS

  • $\alpha_{meta}$ : meta-learned architecture

  • $w_{meta}$ : corresponding meta-learned weights for the architecture

  • Task $\mathcal{T}_i$ : $(\mathcal{D}_i^{tr}, \mathcal{D}_i^{test})$

Meta-objective:

$$
\begin{aligned}
\min_{\alpha, w} \mathcal{L}_{meta}(\alpha, w, p^{train}, \Phi^k)
&= \min_{\alpha, w} \sum_{\mathcal{T}_i \sim p^{train}} \mathcal{L}_i\left(\Phi^k(\alpha, w, \mathcal{D}_i^{tr}), \mathcal{D}_i^{test}\right) \\
&= \min_{\alpha, w} \sum_{\mathcal{T}_i \sim p^{train}} \mathcal{L}_i\left((\alpha_{\mathcal{T}_i}^{*}, w_{\mathcal{T}_i}^{*}), \mathcal{D}_i^{test}\right)
\end{aligned}
$$

where $\alpha_{\mathcal{T}_i}^{*}, w_{\mathcal{T}_i}^{*} = \Phi^k(\alpha, w, \mathcal{D}_i^{tr}) = \operatorname{argmin}_{\alpha, w} \hat{\mathcal{L}}_i(\alpha, w, \mathcal{D}_i^{tr})$ are the task-specific architecture and weights after $k$ gradient steps of the task optimizer, which can be approximated with SGD.

  • $\mathcal{L}_i$ : query loss for task $i$

  • $\hat{\mathcal{L}}_i$ : support loss for task $i$

Inner loop: update $\alpha$ and $w$ with architecture learning rate $\xi_{task}$ and weight learning rate $\lambda_{task}$:

$$
\begin{aligned}
\left(\begin{array}{c} \alpha^{j+1} \\ w^{j+1} \end{array}\right)
&= \Phi\left(\alpha^{j}, w^{j}, \mathcal{D}_{i}^{tr}\right) \\
&= \left(\begin{array}{c}
\alpha^{j} - \xi_{task} \nabla_{\alpha} \mathcal{L}_{\mathcal{T}_i}\left(\alpha^{j}, w^{j}, \mathcal{D}_{i}^{tr}\right) \\
w^{j} - \lambda_{task} \nabla_{w} \mathcal{L}_{\mathcal{T}_i}\left(\alpha^{j}, w^{j}, \mathcal{D}_{i}^{tr}\right)
\end{array}\right)
\end{aligned}
$$
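A rough sketch of what this inner loop might look like in PyTorch (not the official MetaNAS code; `model_fn`, `loss_fn`, and the parameter lists are hypothetical placeholders):

```python
import torch

def adapt_to_task(meta_alpha, meta_w, support_batches, model_fn, loss_fn,
                  xi_task=0.01, lambda_task=0.1, k=5):
    """Task optimizer Phi^k: k gradient steps on the support loss, updating both
    the architecture parameters (alpha) and the weights (w)."""
    # Copy the meta-parameters so they are not modified in place.
    alpha = [a.clone().detach().requires_grad_(True) for a in meta_alpha]
    w = [p.clone().detach().requires_grad_(True) for p in meta_w]

    for _ in range(k):
        for x, y in support_batches:
            loss = loss_fn(model_fn(x, alpha, w), y)              # support loss
            g_alpha = torch.autograd.grad(loss, alpha, retain_graph=True)
            g_w = torch.autograd.grad(loss, w)
            with torch.no_grad():
                for a, g in zip(alpha, g_alpha):
                    a -= xi_task * g                              # architecture step
                for p, g in zip(w, g_w):
                    p -= lambda_task * g                          # weight step
    return alpha, w                                               # (alpha*_Ti, w*_Ti)
```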

Outer loop update:

$$
\begin{aligned}
\left(\begin{array}{c} \alpha_{meta}^{i+1} \\ w_{meta}^{i+1} \end{array}\right)
&= \Psi^{MAML}\left(\alpha_{meta}^{i}, w_{meta}^{i}, p^{train}, \Phi^{k}\right) \\
&= \left(\begin{array}{c}
\alpha_{meta}^{i} - \xi_{meta} \nabla_{\alpha} \mathcal{L}_{meta}\left(\alpha_{meta}^{i}, w_{meta}^{i}, p^{train}, \Phi^{k}\right) \\
w_{meta}^{i} - \lambda_{meta} \nabla_{w} \mathcal{L}_{meta}\left(\alpha_{meta}^{i}, w_{meta}^{i}, p^{train}, \Phi^{k}\right)
\end{array}\right)
\end{aligned}
$$
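A hedged sketch of a first-order MAML-style outer step under the same assumptions as above (second-order terms through $\Phi^k$ are dropped for simplicity; `adapt_fn` is the inner-loop sketch from before, and the task objects are hypothetical):

```python
import torch

def maml_outer_step(meta_alpha, meta_w, tasks, adapt_fn, model_fn, loss_fn,
                    xi_meta=1e-3, lambda_meta=1e-3):
    """First-order MAML-style meta-update: evaluate the query loss at the
    task-adapted parameters and apply its gradient to the meta-parameters."""
    alpha_grads = [torch.zeros_like(a) for a in meta_alpha]
    w_grads = [torch.zeros_like(p) for p in meta_w]

    for task in tasks:
        alpha_t, w_t = adapt_fn(meta_alpha, meta_w, task.support, model_fn, loss_fn)
        x, y = task.query
        loss = loss_fn(model_fn(x, alpha_t, w_t), y)              # query loss L_i
        g_alpha = torch.autograd.grad(loss, alpha_t, retain_graph=True)
        g_w = torch.autograd.grad(loss, w_t)
        for acc, g in zip(alpha_grads, g_alpha):
            acc += g
        for acc, g in zip(w_grads, g_w):
            acc += g

    with torch.no_grad():
        for a, g in zip(meta_alpha, alpha_grads):
            a -= xi_meta * g
        for p, g in zip(meta_w, w_grads):
            p -= lambda_meta * g
```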

Reptile can also be used instead of MAML here:

$$
\begin{aligned}
\left(\begin{array}{c} \alpha_{meta}^{i+1} \\ w_{meta}^{i+1} \end{array}\right)
&= \Psi^{Reptile}\left(\alpha_{meta}^{i}, w_{meta}^{i}, p^{train}, \Phi^{k}\right) \\
&= \left(\begin{array}{c}
\alpha_{meta}^{i} + \xi_{meta} \sum_{\mathcal{T}_i} \left(\alpha_{\mathcal{T}_i}^{*} - \alpha_{meta}^{i}\right) \\
w_{meta}^{i} + \lambda_{meta} \sum_{\mathcal{T}_i} \left(w_{\mathcal{T}_i}^{*} - w_{meta}^{i}\right)
\end{array}\right)
\end{aligned}
$$
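The Reptile variant avoids query-loss gradients altogether; a minimal sketch, again with hypothetical task objects and the `adapt_fn` from above:

```python
import torch

def reptile_outer_step(meta_alpha, meta_w, tasks, adapt_fn, model_fn, loss_fn,
                       xi_meta=0.5, lambda_meta=0.5):
    """Reptile-style meta-update: move the meta-parameters toward the
    task-adapted parameters, summed over the sampled tasks."""
    alpha_dirs = [torch.zeros_like(a) for a in meta_alpha]
    w_dirs = [torch.zeros_like(p) for p in meta_w]

    for task in tasks:
        alpha_t, w_t = adapt_fn(meta_alpha, meta_w, task.support, model_fn, loss_fn)
        for d, a_task, a_meta in zip(alpha_dirs, alpha_t, meta_alpha):
            d += (a_task - a_meta).detach()                       # alpha*_Ti - alpha_meta
        for d, w_task, w_meta in zip(w_dirs, w_t, meta_w):
            d += (w_task - w_meta).detach()                       # w*_Ti - w_meta

    with torch.no_grad():
        for a_meta, d in zip(meta_alpha, alpha_dirs):
            a_meta += xi_meta * d
        for w_meta, d in zip(meta_w, w_dirs):
            w_meta += lambda_meta * d
```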

Task-dependent Architecture Adaptation

The paper introduces two modifications to the DARTS-style mixture operations so that, after task adaptation, the architecture parameters are already close to a discrete (one-hot) choice. Hard-pruning the adapted architecture then changes the network only marginally, which removes the need for retraining.
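As a toy illustration of the "close to one-hot" idea (not the paper's exact formulation), the snippet below shows how a low-temperature softmax over an edge's architecture parameters concentrates the mixture on a single candidate operation, so discretizing afterwards barely changes the output; `alpha_edge` and `op_outputs` are hypothetical names.

```python
import torch
import torch.nn.functional as F

def soft_pruned_mixture(alpha_edge, op_outputs, temperature=0.1):
    """Toy illustration: a low-temperature softmax over one edge's architecture
    parameters puts almost all weight on a single candidate operation, so
    hard-pruning the mixture afterwards barely changes the edge's output."""
    mix = F.softmax(alpha_edge / temperature, dim=-1)             # nearly one-hot
    soft_out = sum(m * o for m, o in zip(mix, op_outputs))        # one-shot mixture
    hard_out = op_outputs[int(torch.argmax(alpha_edge))]          # pruned architecture
    return soft_out, hard_out
```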

References:


  • Elsken et al., "Meta-Learning of Neural Architectures for Few-Shot Learning", CVPR 2020: https://openaccess.thecvf.com/content_CVPR_2020/papers/Elsken_Meta-Learning_of_Neural_Architectures_for_Few-Shot_Learning_CVPR_2020_paper.pdf
  • arXiv version: https://arxiv.org/pdf/1911.11090.pdf