MAML, FO-MAML, Reptile
Gradient-based optimization meta-learning algorithms
- $\theta^*$: optimal meta-learned parameters
- $\phi_i$: task-specific parameters for task $i$
- $M$: the number of tasks in meta-train; $i$: the index of a task
- $\mathcal{D}_i^{tr}$, $\mathcal{D}_i^{val}$: support set and query set of task $i$
- $\mathcal{L}(\theta, \mathcal{D})$: loss function with parameter vector $\theta$ and dataset $\mathcal{D}$
- $Alg(\theta, \mathcal{D}_i^{tr})$: one (or multiple) steps of gradient descent initialized at $\theta$ [inner level of MAML]
- Task $\mathcal{T}_i$ is associated with a finite dataset $\mathcal{D}_i$
- $\mathcal{T}_i$: task $i$
- $\theta$: meta parameters
- $\phi_i$: task-specific parameters
- $\mathcal{L}$: loss function
Algorithm 1 shows the structure of a typical gradient-based meta-learning algorithm, which could be instantiated as:
MAML
iMAML
Reptile
- TASKADAPT: task adaptation (the inner loop)
- The meta update $g_i$ specifies the contribution of task $i$ to the meta parameters (the outer loop).
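This two-level structure can be sketched as follows. This is a minimal skeleton, not any library's API: `task_adapt`, `meta_update_fn`, and the uniform averaging over tasks are placeholder choices that each algorithm fills in differently.

```python
import numpy as np

def meta_train(theta, tasks, task_adapt, meta_update_fn, meta_lr=0.1, steps=100):
    """Generic structure of a gradient-based meta-learning loop (Algorithm 1).

    task_adapt(theta, task)          -> adapted task parameters phi_i (inner loop)
    meta_update_fn(theta, phi, task) -> meta gradient g_i for task i  (outer loop)
    """
    for _ in range(steps):
        g = np.zeros_like(theta)
        for task in tasks:
            phi = task_adapt(theta, task)            # TASKADAPT: inner loop
            g += meta_update_fn(theta, phi, task)    # contribution g_i of task i
        theta = theta - meta_lr * g / len(tasks)     # outer gradient step on theta
    return theta
```

MAML, Reptile, and iMAML then differ only in how `task_adapt` computes $\phi_i$ and how `meta_update_fn` computes $g_i$.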
- Task adaptation: minimize the training loss by gradient descent w.r.t. the task parameters: $\phi_i = Alg(\theta, \mathcal{D}_i^{tr})$
- Meta parameter update: gradient descent on the validation loss, yielding the meta update (gradient) for task $i$: $g_i = \nabla_\theta \mathcal{L}(\phi_i(\theta), \mathcal{D}_i^{val})$
This approach treats the task parameters $\phi_i$ as a function of the meta parameters $\theta$, and hence requires back-propagation through the entire $L$-step task adaptation process. When $L$ is large, this becomes computationally prohibitive. First-order MAML (FO-MAML) avoids this cost by ignoring the Jacobian of the adaptation process, i.e. treating $\phi_i$ as if it did not depend on $\theta$: $g_i = \nabla_{\phi}\mathcal{L}(\phi_i, \mathcal{D}_i^{val})$.
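A toy 1-D example makes the back-propagation through adaptation concrete. The quadratic losses, task centers, and learning rate below are illustrative choices, not from any paper; they show that the MAML meta gradient carries the Jacobian factor $d\phi/d\theta$ that FO-MAML (first-order MAML) drops.

```python
# Toy task: support loss 0.5*(theta - c_tr)^2, query loss 0.5*(phi - c_val)^2.
alpha = 0.1            # inner-loop learning rate
theta = 2.0            # current meta parameter
c_tr, c_val = 0.0, 1.0 # support / query "targets" (illustrative)

# Inner loop: one gradient step on the support loss.
phi = theta - alpha * (theta - c_tr)       # phi = (1 - alpha) * theta

# MAML meta gradient: d/dtheta L(phi(theta), query)
#   = (phi - c_val) * dphi/dtheta, with dphi/dtheta = (1 - alpha) here.
g_maml = (phi - c_val) * (1 - alpha)

# FO-MAML drops the Jacobian dphi/dtheta (treats it as identity):
g_fomaml = phi - c_val
```

With a single inner step the two differ only by the factor $(1 - \alpha)$; after $L$ steps the MAML gradient accumulates a product of $L$ such Jacobians, which is exactly what makes large $L$ expensive.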
Reptile optimizes the task parameters on the entire dataset $\mathcal{D}_i$ (no support/query split), and moves $\theta$ towards the adapted task parameters, yielding $g_i = \theta - \phi_i$.
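A minimal Reptile step on a toy quadratic task can be sketched as below; the quadratic loss and all hyperparameter values are illustrative assumptions, not from the Reptile paper.

```python
import numpy as np

def reptile_step(theta, c, alpha=0.1, inner_steps=5, eps=0.5):
    """One Reptile meta update on a toy quadratic task L(phi) = 0.5*||phi - c||^2.

    The inner loop runs plain SGD on the whole task dataset (no support/query
    split); the meta update then moves theta toward the adapted parameters phi.
    """
    phi = theta.copy()
    for _ in range(inner_steps):
        phi -= alpha * (phi - c)          # gradient of the quadratic task loss
    g = theta - phi                       # Reptile meta gradient g_i
    return theta - eps * g                # equivalently: theta + eps * (phi - theta)
```

Note that no second-order information (and no validation set) is needed: the meta gradient is just the displacement produced by the inner loop.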
iMAML adds an L2 regularizer $\frac{\lambda}{2}\lVert \phi - \theta \rVert^2$ to the training loss, and optimizes the task parameters on the regularized training loss: $\phi_i = \arg\min_{\phi}\, \mathcal{L}(\phi, \mathcal{D}_i^{tr}) + \frac{\lambda}{2}\lVert \phi - \theta \rVert^2$
Provided that this task adaptation process converges to a stationary point, implicit differentiation enables the computation of the meta gradient based only on the final solution of the adaptation process, without back-propagating through the optimization path: $g_i = \left(I + \frac{1}{\lambda}\nabla^2_{\phi}\mathcal{L}(\phi_i, \mathcal{D}_i^{tr})\right)^{-1}\nabla_{\phi}\mathcal{L}(\phi_i, \mathcal{D}_i^{val})$
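In one dimension the implicit meta gradient can be checked by hand. The sketch below uses assumed toy quadratic train/validation losses, where the regularized stationary point has a closed form and the train Hessian is $1$; it verifies that the implicit formula matches the explicit derivative $\frac{d}{d\theta}\mathcal{L}(\phi^*(\theta), \mathcal{D}^{val})$.

```python
# Toy 1-D iMAML: train loss 0.5*(phi - a)^2, val loss 0.5*(phi - b)^2,
# regularized adaptation objective 0.5*(phi - a)^2 + (lam/2)*(phi - theta)^2.
lam, theta, a, b = 2.0, 0.0, 1.0, 3.0   # illustrative values

# Adaptation converges to the stationary point of the regularized objective:
# (phi - a) + lam*(phi - theta) = 0  =>  phi* = (a + lam*theta) / (1 + lam)
phi_star = (a + lam * theta) / (1.0 + lam)

# Implicit meta gradient (I + (1/lam)*Hessian_train)^{-1} * grad_val,
# using only phi*, not the optimization path. The train Hessian here is 1.
g_implicit = (phi_star - b) / (1.0 + 1.0 / lam)

# Explicit check: dphi*/dtheta = lam / (1 + lam), so by the chain rule
# g = (phi_star - b) * lam / (1 + lam) -- the same quantity.
g_explicit = (phi_star - b) * lam / (1.0 + lam)
```

In higher dimensions the matrix inverse is not formed explicitly; it is typically approximated, e.g. by conjugate gradient on Hessian-vector products.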