Modular Meta Learning

CoRL 2018


Motivation

Previous approaches to meta-learning have focused on finding distributions or initial values of parameters.

The authors' objective is similar, but rather than focusing on transferring information about parameter values, they focus on finding a set of reusable modules that can form components of a solution to a new task, possibly with a small amount of tuning.

The authors provide an algorithm, called BounceGrad, which learns a set of modules and then combines them appropriately for a new task.

Objective

Given the specification of a composition rule and a basis set of modules, $(\mathcal{C}, F, \Theta)$ represents a set of possible functional input-output mappings that will serve as the hypothesis space for the meta-test task.

$F$ is a basis set of modules, which are functions $f_1, f_2, \dots, f_k$.

Each function has a parametric form $y = f_i(x; \theta_i)$, where $\theta_i$ is a fixed-dimensional vector of parameters.

In this work, all the $f_i$ are neural networks, potentially with different architectures, and the parameters $\Theta = (\theta_1, \dots, \theta_k)$ are the weights of the neural networks, which differ among the modules. Some example structures (see the code sketch after this list):

  • Single module: $h(x) = f_i(x)$

  • A fixed compositional structure: $h(x) = f_i(x) + f_j(x)$

  • A weighted ensemble.
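
These compositions are straightforward to express in code. Below is a minimal, hypothetical PyTorch sketch (not the authors' implementation): each basis module $f_i$ is a small MLP with its own parameters $\theta_i$, and a structure simply wires module outputs together. The sizes and the module architecture are arbitrary placeholders.

```python
# Hypothetical sketch of a module basis F and a few simple structures.
import torch
import torch.nn as nn

def make_module(in_dim=1, hidden=32, out_dim=1):
    # One basis module f_i(x; theta_i): a small MLP with its own weights.
    return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

modules = nn.ModuleList([make_module() for _ in range(4)])  # F = {f_1, ..., f_k}

x = torch.randn(8, 1)
h_single = modules[0](x)                              # h(x) = f_i(x)
h_sum = modules[1](x) + modules[2](x)                 # h(x) = f_i(x) + f_j(x)
w = torch.softmax(torch.randn(len(modules)), dim=0)   # ensemble weights
h_ensemble = sum(w_i * f(x) for w_i, f in zip(w, modules))  # weighted ensemble
```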

$\mathbb{S}$: the set of possible structures, and $S \in \mathbb{S}$ is a particular structure generated by $\mathcal{C}$. This approach has two phases: an offline meta-learning phase and an online meta-test learning phase.

Meta-learning phase: we take training and validation data sets for tasks $1, \dots, k$ as input and generate a parametrization $\Theta$ for each module. The objective is to construct modules that will work together as good building blocks for future tasks.

At meta-learning time, $\mathbb{S}$ is specified, and the objective is to find parameter values $\Theta$ that constitute a set of modules that can be recombined to effectively solve each of the training tasks.

The validation set is used for the meta-training tasks to avoid choosing $\Theta$ in a way that overfits.

The training objective is to find $\Theta$ that minimizes the average generalization performance of the hypotheses $S^{*}_{\Theta}$ using parameter set $\Theta$. (The paper states this objective in a figure that is not reproduced here; a reconstruction follows.)
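
Since the figure is missing from these notes, here is a reconstruction of that objective from the definitions above; it is a paraphrase, so the exact notation in the paper may differ. For each task $j$, the structure is chosen on the training set, while $\Theta$ is scored on the validation set:

$$
S^{*}_{\Theta}(D) = \arg\min_{S \in \mathbb{S}} \mathcal{L}\!\left(S_{\Theta},\, D\right),
\qquad
\Theta^{*} = \arg\min_{\Theta} \sum_{j=1}^{k} \mathcal{L}\!\left(\big[S^{*}_{\Theta}(D_j^{\text{train}})\big]_{\Theta},\; D_j^{\text{valid}}\right)
$$

where $\mathcal{L}(h, D)$ denotes the loss of hypothesis $h$ on data set $D$, and $S_{\Theta}$ is the function obtained by plugging the module parameters $\Theta$ into structure $S$.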

Meta-test learning phase: we take a training data set for the meta-test task as input, as well as $\mathbb{S}$ and $\Theta$; the output is a compositional form $S \in \mathbb{S}$, which includes a selection of modules $f_1, \dots, f_m$ to be used in that form. Since $\Theta$ is already specified, the choice of $S$ completely determines a mapping from inputs to outputs.

This is essentially a bi-level optimization problem, solved with an iterative (alternating) optimization method: BounceGrad alternates simulated-annealing moves over the structure $S$ with gradient steps on the module parameters $\Theta$ (a rough sketch follows the notes below).

Notes:

  • Simulated Annealing

  • iterative optimization vs. bilevel optimization
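
A rough, hypothetical sketch of that alternation (my own, not the released code): `propose_structure`, `compose`, `loss_fn`, and `task` are placeholder helpers, and the annealing schedule is omitted.

```python
# Hypothetical sketch of one BounceGrad-style step (not the authors' code):
# simulated annealing over structures + a gradient step on shared parameters.
import math
import random
import torch

def bouncegrad_step(modules, structure, task, temperature, optimizer,
                    propose_structure, compose, loss_fn):
    x, y = task["train"]

    # --- Simulated-annealing move over the structure S ---
    candidate = propose_structure(structure)      # random local edit of S
    with torch.no_grad():
        cur_loss = loss_fn(compose(modules, structure)(x), y)
        new_loss = loss_fn(compose(modules, candidate)(x), y)
    accept_prob = math.exp(min(0.0, (cur_loss - new_loss).item() / temperature))
    if random.random() < accept_prob:             # always accept improvements,
        structure = candidate                     # sometimes accept worse moves

    # --- Gradient step on Theta through the (possibly updated) structure ---
    optimizer.zero_grad()
    loss = loss_fn(compose(modules, structure)(x), y)
    loss.backward()
    optimizer.step()
    return structure
```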

Reference

https://arxiv.org/abs/1806.10166
https://github.com/FerranAlet/modular-metalearning
https://phillipi.github.io/6.882/2020/notes/The%20problem%20of%20very%20little%20data/Modular%20Meta-Learning.pdf
https://www.youtube.com/watch?v=sdkEP7RfO60
https://docs.google.com/presentation/d/1XqZoJDRMf1sMSRuAoTMNCTU8-7EJxqRcpOQIEh8Whik/edit#slide=id.g910f4e9d00_0_1415
https://lis.csail.mit.edu/alet/NRI_modular_metalearning_slides.pdf