Modular Meta-Learning

CoRL 2018

Motivation

Previous approaches to meta-learning have focused on finding distributions over parameters or good initial parameter values.

Our objective is similar, but rather than focusing on transferring information about parameter values, we focus on finding a set of reusable modules that can form components of a solution to a new task, possibly with a small amount of tuning.

The authors provide an algorithm, called BounceGrad, which learns a set of modules and then combines them appropriately for a new task.

Objective

Given the specification of a composition rule and a basis set of modules, $(\mathcal{C}, F, \Theta)$ represents a set of possible functional input-output mappings that will serve as the hypothesis space for the meta-test task.

$F$ is a basis set of modules, which are functions $f_1, f_2, \dots, f_k$.

Each function has a parametric form $y = f_i(x; \theta_i)$, where $\theta_i$ is a fixed-dimensional vector of parameters.

In this work, all the $f_i$ are neural networks, potentially with different architectures, and the parameters $\Theta = (\theta_1, \dots, \theta_k)$ are the weights of the neural networks, which differ among the modules. Some examples of modules (illustrated in the sketch after this list):

  • Single module: $h(x) = f_i(x)$

  • A fixed compositional structure: $h(x) = f_i(x) + f_j(x)$

  • A weighted ensemble: $h(x) = \sum_i w_i f_i(x)$
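
A minimal sketch of these module and composition types, assuming PyTorch; the names here (`Module`, `modules`, `k`, the helper functions) are illustrative, not from the authors' code:

```python
import torch
import torch.nn as nn

class Module(nn.Module):
    """One basis module f_i(x; theta_i): a small MLP with its own weights."""
    def __init__(self, in_dim=1, hidden=32, out_dim=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, out_dim))

    def forward(self, x):
        return self.net(x)

k = 4                                                 # size of the basis set F
modules = nn.ModuleList(Module() for _ in range(k))   # f_1..f_k; Theta = their weights

def single(i):
    """h(x) = f_i(x)"""
    return lambda x: modules[i](x)

def fixed_sum(i, j):
    """h(x) = f_i(x) + f_j(x)"""
    return lambda x: modules[i](x) + modules[j](x)

def weighted_ensemble(w):
    """h(x) = sum_i w_i * f_i(x), with fixed scalar weights w."""
    return lambda x: sum(wi * f(x) for wi, f in zip(w, modules))
```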

Objective Function in the paper

$\mathbb{S}$ is the set of possible structures, and $S \in \mathbb{S}$ is a particular structure generated by $\mathcal{C}$. This approach has two phases: an off-line meta-learning phase and an on-line meta-test learning phase.

Meta-learning phase: we take training and validation data sets for tasks $1, \dots, k$ as input and generate a parametrization $\theta_i$ for each module, collectively $\Theta$. The objective is to construct modules that will work together as good building blocks for future tasks.

At meta-learning time, $\mathbb{S}$ is specified, and the objective is to find parameter values $\Theta$ that constitute a set of modules that can be recombined to effectively solve each of the training tasks.

A validation set is used for each of the meta-training tasks to avoid choosing $\Theta$ in a way that overfits the training data.

The training objective is to find $\Theta$ that minimizes the average generalization loss of the hypotheses $S^*_{\Theta}$ chosen using parameter set $\Theta$:

$$\Theta^* = \arg\min_{\Theta} \sum_{j=1}^{k} \mathcal{L}\left(h_{S_j^*(\Theta),\,\Theta},\ D_j^{\text{val}}\right), \quad \text{where } S_j^*(\Theta) = \arg\min_{S \in \mathbb{S}} \mathcal{L}\left(h_{S,\Theta},\ D_j^{\text{train}}\right)$$
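
A minimal sketch of this bi-level objective, reusing `modules` and `k` from the sketch above and assuming, for concreteness, that every structure is an index pair $(i, j)$ composed as $h(x) = f_i(x) + f_j(x)$ with mean-squared-error loss; all names are illustrative:

```python
import itertools
import torch

STRUCTURES = list(itertools.product(range(k), repeat=2))  # the discrete set of structures

def h_of(S):
    """Instantiate the hypothesis h_{S,Theta} for a structure S = (i, j)."""
    i, j = S
    return lambda x: modules[i](x) + modules[j](x)

def mse(h, data):
    x, y = data
    return torch.mean((h(x) - y) ** 2)

def best_structure(D_train):
    """Inner problem: S*(Theta) = argmin over S of L(h_{S,Theta}, D_train)."""
    return min(STRUCTURES, key=lambda S: mse(h_of(S), D_train).item())

def meta_objective(tasks):
    """Outer objective: sum over tasks of the validation loss at S*(Theta)."""
    return sum(mse(h_of(best_structure(D_tr)), D_val).item()
               for D_tr, D_val in tasks)
```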

Meta-test learning phase: we take a training data set for the meta-test task as input, as well as $\mathbb{S}$ and $\Theta$; the output is a compositional form $S \in \mathbb{S}$, which includes a selection of modules $f_1, \dots, f_m$ to be used in that form. Since $\Theta$ is already specified, the choice of $S$ completely determines a mapping from inputs to outputs.
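
Under the same assumptions as the sketch above, the meta-test phase reduces to the discrete search with $\Theta$ frozen (the toy data here is only for illustration):

```python
x = torch.linspace(-1, 1, 50).unsqueeze(1)
new_task_train = (x, torch.sin(3 * x))    # toy meta-test training set
S_star = best_structure(new_task_train)   # search over structures only
predict = h_of(S_star)                    # S plus the frozen Theta fix the mapping
```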

This is a bi-level optimization problem, and the authors solve it with an iterative method: BounceGrad alternates simulated annealing steps over the discrete structure $S$ with stochastic gradient descent steps on the module parameters $\Theta$.
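
A rough sketch of that alternation, continuing the assumptions above; this is my reading of the scheme, not the authors' implementation (in particular, the uniform proposal distribution and the linear temperature schedule are placeholders):

```python
import math
import random

def bounce_grad(tasks, steps=1000, lr=1e-3):
    opt = torch.optim.Adam(modules.parameters(), lr=lr)
    S = [random.choice(STRUCTURES) for _ in tasks]  # current structure per task
    for t in range(steps):
        T = max(1e-3, 1.0 - t / steps)              # cooling temperature
        loss = 0.0
        for j, (D_train, D_val) in enumerate(tasks):
            # "Bounce": propose a structure and accept it with the Metropolis
            # rule on the task's training loss.
            S_new = random.choice(STRUCTURES)
            old = mse(h_of(S[j]), D_train).item()
            new = mse(h_of(S_new), D_train).item()
            if new < old or random.random() < math.exp((old - new) / T):
                S[j] = S_new
            # "Grad": accumulate the validation loss of the current structure,
            # mirroring the train/validation split in the objective above.
            loss = loss + mse(h_of(S[j]), D_val)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return S
```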

Notes:

  • Simulated Annealing (the generic acceptance rule is sketched below)

  • Iterative optimization vs. bi-level optimization
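
For reference, the generic Metropolis acceptance rule used in simulated annealing (the paper's exact proposal and schedule may differ): a proposed structure $S'$ replaces the current $S$ with probability

$$P(\text{accept}) = \min\!\left(1,\ \exp\!\left(\frac{\mathcal{L}(S) - \mathcal{L}(S')}{T}\right)\right)$$

where $T$ is a temperature that decreases over time, so the search accepts worse structures early on (exploration) and becomes greedy as $T \to 0$.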

Reference

Ferran Alet, Tomás Lozano-Pérez, Leslie P. Kaelbling. "Modular meta-learning." Conference on Robot Learning (CoRL), 2018.
