Modular Meta-Learning

CoRL 2018

Motivation

Previous approaches to meta-learning have focused on finding distributions over parameters or good initial parameter values.

Our objective is similar, but rather than focusing on transferring information about parameter values, we focus on finding a set of reusable modules that can form components of a solution to a new task, possibly with a small amount of tuning.

The authors provide an algorithm, called BounceGrad, which learns a set of modules and then combines them appropriately for a new task.

Objective

Given the specification of a composition rule and a basis set of modules, $(\mathcal{C}, F, \Theta)$ represents a set of possible functional input-output mappings that will serve as the hypothesis space for the meta-test task.

$F$ is a basis set of modules, which are functions $f_1, f_2, \dots, f_k$.

Each function has a parametric form $y = f_i(x; \theta_i)$, where $\theta_i$ is a fixed-dimensional vector of parameters.

In this work, all the $f_i$ are neural networks, potentially with different architectures, and the parameters $\Theta = (\theta_1, \dots, \theta_k)$ are the weights of the neural networks, which differ among the modules. Some examples of modules (illustrated in the sketch after this list):

  • Single module: $h(x) = f_i(x)$

  • A fixed compositional structure: $h(x) = f_i(x) + f_j(x)$

  • A weighted ensemble: $h(x) = \sum_i w_i f_i(x)$
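
A minimal sketch of these module and composition types, assuming PyTorch; the names here (`Module`, `modules`, `k`, the helper functions) are illustrative, not from the authors' code:

```python
import torch
import torch.nn as nn

class Module(nn.Module):
    """One basis module f_i(x; theta_i): a small MLP with its own weights."""
    def __init__(self, in_dim=1, hidden=32, out_dim=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

    def forward(self, x):
        return self.net(x)

k = 4                                                 # size of the basis set F
modules = nn.ModuleList(Module() for _ in range(k))   # f_1..f_k; Theta = their weights

def single(i):
    """h(x) = f_i(x)"""
    return lambda x: modules[i](x)

def fixed_sum(i, j):
    """h(x) = f_i(x) + f_j(x)"""
    return lambda x: modules[i](x) + modules[j](x)

def weighted_ensemble(w):
    """h(x) = sum_i w_i * f_i(x), with fixed scalar weights w."""
    return lambda x: sum(wi * f(x) for wi, f in zip(w, modules))
```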

Objective Function in the paper

$\mathbb{S}$ is the set of possible structures, and $S \in \mathbb{S}$ is a particular structure generated by $\mathcal{C}$. This approach has two phases: an off-line meta-learning phase and an on-line meta-test learning phase.

Meta-learning phase: we take training and validation data sets for tasks $1, \dots, k$ as input and generate a parametrization $\theta_i$ for each module, collectively $\Theta$. The objective is to construct modules that will work together as good building blocks for future tasks.

At meta-learning time, $\mathbb{S}$ is specified, and the objective is to find parameter values $\Theta$ that constitute a set of modules that can be recombined to effectively solve each of the training tasks.

A validation set is used for each of the meta-training tasks to avoid choosing $\Theta$ in a way that overfits the training data.

The training objective is to find $\Theta$ that minimizes the average generalization loss of the hypotheses $S^*_{\Theta}$ chosen using parameter set $\Theta$:

$$\Theta^* = \arg\min_{\Theta} \sum_{j=1}^{k} \mathcal{L}\left(h_{S_j^*(\Theta),\,\Theta},\ D_j^{\text{val}}\right), \quad \text{where } S_j^*(\Theta) = \arg\min_{S \in \mathbb{S}} \mathcal{L}\left(h_{S,\Theta},\ D_j^{\text{train}}\right)$$
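
A minimal sketch of this bi-level objective, reusing `modules` and `k` from the sketch above and assuming, for concreteness, that every structure is an index pair $(i, j)$ composed as $h(x) = f_i(x) + f_j(x)$ with mean-squared-error loss; all names are illustrative:

```python
import itertools
import torch

STRUCTURES = list(itertools.product(range(k), repeat=2))  # the discrete set of structures

def h_of(S):
    """Instantiate the hypothesis h_{S,Theta} for a structure S = (i, j)."""
    i, j = S
    return lambda x: modules[i](x) + modules[j](x)

def mse(h, data):
    x, y = data
    return torch.mean((h(x) - y) ** 2)

def best_structure(D_train):
    """Inner problem: S*(Theta) = argmin over S of L(h_{S,Theta}, D_train)."""
    return min(STRUCTURES, key=lambda S: mse(h_of(S), D_train).item())

def meta_objective(tasks):
    """Outer objective: sum over tasks of the validation loss at S*(Theta)."""
    return sum(mse(h_of(best_structure(D_tr)), D_val).item()
               for D_tr, D_val in tasks)
```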

Meta-test learning phase: we take a training data set for the meta-test task as input, as well as $\mathbb{S}$ and $\Theta$; the output is a compositional form $S \in \mathbb{S}$, which includes a selection of modules $f_1, \dots, f_m$ to be used in that form. Since $\Theta$ is already specified, the choice of $S$ completely determines a mapping from inputs to outputs.
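
Under the same assumptions as the sketch above, the meta-test phase reduces to the discrete search with $\Theta$ frozen (the toy data here is only for illustration):

```python
x = torch.linspace(-1, 1, 50).unsqueeze(1)
new_task_train = (x, torch.sin(3 * x))    # toy meta-test training set
S_star = best_structure(new_task_train)   # search over structures only
predict = h_of(S_star)                    # S plus the frozen Theta fix the mapping
```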

This is a bi-level optimization problem, and the authors solve it with an iterative method: BounceGrad alternates simulated annealing steps over the discrete structure $S$ with stochastic gradient descent steps on the module parameters $\Theta$.
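
A rough sketch of that alternation, continuing the assumptions above; this is my reading of the scheme, not the authors' implementation (in particular, the uniform proposal distribution and the linear temperature schedule are placeholders):

```python
import math
import random

def bounce_grad(tasks, steps=1000, lr=1e-3):
    opt = torch.optim.Adam(modules.parameters(), lr=lr)
    S = [random.choice(STRUCTURES) for _ in tasks]  # current structure per task
    for t in range(steps):
        T = max(1e-3, 1.0 - t / steps)              # cooling temperature
        loss = 0.0
        for j, (D_train, D_val) in enumerate(tasks):
            # "Bounce": propose a structure and accept it with the Metropolis
            # rule on the task's training loss.
            S_new = random.choice(STRUCTURES)
            old = mse(h_of(S[j]), D_train).item()
            new = mse(h_of(S_new), D_train).item()
            if new < old or random.random() < math.exp((old - new) / T):
                S[j] = S_new
            # "Grad": accumulate the validation loss of the current structure,
            # mirroring the train/validation split in the objective above.
            loss = loss + mse(h_of(S[j]), D_val)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return S
```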

Notes:

  • Simulated Annealing (the generic acceptance rule is sketched below)

  • Iterative optimization vs. bi-level optimization
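
For reference, the generic Metropolis acceptance rule used in simulated annealing (the paper's exact proposal and schedule may differ): a proposed structure $S'$ replaces the current $S$ with probability

$$P(\text{accept}) = \min\!\left(1,\ \exp\!\left(\frac{\mathcal{L}(S) - \mathcal{L}(S')}{T}\right)\right)$$

where $T$ is a temperature that decreases over time, so the search accepts worse structures early on (exploration) and becomes greedy as $T \to 0$.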

Reference

Ferran Alet, Tomás Lozano-Pérez, Leslie P. Kaelbling. "Modular meta-learning." Conference on Robot Learning (CoRL), 2018.
