ANIL (Almost No Inner Loop)
ICLR 2020 8-23-2020
Motivation
Idea of MAML: build a meta-learner that learns a set of initialization parameters useful across different tasks, so that it can adapt to a specific task quickly (within a few gradient steps) and efficiently (with only a few examples).
MAML can also be viewed as a bi-level optimization problem. Two types of parameter updates are required:
the inner loop and the outer loop
The inner loop takes the initialization and performs the task-specific adaptation to new tasks.
The outer loop updates the meta-initialization of the neural network parameters to a setting that enables fast adaptation to new tasks in the inner loop.
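A minimal sketch of this bi-level structure, written in JAX with a toy single-layer model and mean-squared-error task loss (the model, loss, step sizes, and function names here are illustrative assumptions, not the paper's setup):

```python
import jax
import jax.numpy as jnp

def model(params, x):
    # toy single-layer model standing in for the full network
    return x @ params["W"] + params["b"]

def task_loss(params, x, y):
    return jnp.mean((model(params, x) - y) ** 2)

def inner_loop(params, support_x, support_y, alpha=0.01, steps=5):
    # inner loop: a few task-specific gradient steps starting from the meta-initialization
    for _ in range(steps):
        grads = jax.grad(task_loss)(params, support_x, support_y)
        params = jax.tree_util.tree_map(lambda p, g: p - alpha * g, params, grads)
    return params

def outer_loss(meta_params, task):
    # query-set loss of the task-adapted parameters, as a function of the meta-initialization
    support_x, support_y, query_x, query_y = task
    adapted = inner_loop(meta_params, support_x, support_y)
    return task_loss(adapted, query_x, query_y)

def outer_step(meta_params, task, eta=0.001):
    # outer loop: move the meta-initialization along the meta-gradient
    grads = jax.grad(outer_loss)(meta_params, task)
    return jax.tree_util.tree_map(lambda p, g: p - eta * g, meta_params, grads)
```

Because `jax.grad(outer_loss)` differentiates through the inner-loop steps, the outer update optimizes the initialization for post-adaptation performance rather than for its own loss.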
Conjecture/hypothesis of the authors of ANIL:
"we can obtain the same rapid learning performance of MAML solely through feature reuse."
Rapid Learning vs Feature Reuse
Rapid learning:
"In rapid learning, the meta-initialization in the outer loop results in a parameter setting that is favorable for fast learning, thus significant adaptation to new tasks can rapidly take place in the inner loop. "
"In feature reuse, the meta-initialization already contains useful features that can be reused, so little adaptation on the parameters is required in the inner loop."
"To prove feature reuse is a competitive alternative to rapid learning in MAML, the authors proposed a simplified algorithm, ANIL, where the inner loop is removed for all but the task-specific head of the underlying neural network during training and testing."
ANIL
base model/learner: a neural network architecture (e.g., a CNN)
$\theta$: the set of meta-initialization parameters of the feature-extraction (body) layers of the neural network architecture
$w$: the set of meta-initialization parameters of the head layer $g_w$ (the final classification layer)
$f_\theta$: the feature extractor parameterized by $\theta$
$\hat{y} = g_w(f_\theta(x))$: the label prediction (the head applied to the extracted features)
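A sketch of this body/head split, assuming a small fully-connected body in place of the CNN (the layer sizes and the names `init_params`, `features`, `predict` are illustrative, not from the paper):

```python
import jax
import jax.numpy as jnp

def init_params(key, in_dim=784, hidden=64, n_classes=5):
    k1, k2 = jax.random.split(key)
    theta = {"W1": jax.random.normal(k1, (in_dim, hidden)) * 0.01,  # body (feature extractor) parameters
             "b1": jnp.zeros(hidden)}
    w = {"W2": jax.random.normal(k2, (hidden, n_classes)) * 0.01,   # head (final classification layer) parameters
         "b2": jnp.zeros(n_classes)}
    return theta, w

def features(theta, x):
    # f_theta: the feature extractor
    return jnp.tanh(x @ theta["W1"] + theta["b1"])

def predict(theta, w, x):
    # y_hat = g_w(f_theta(x)): the head applied to the extracted features (class logits)
    return features(theta, x) @ w["W2"] + w["b2"]

theta, w = init_params(jax.random.PRNGKey(0))  # meta-initialization to be trained by the outer loop
```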
Outer loop
Given $\theta_m$ and $w_m$ at iteration step $m$, the outer loop updates both sets of parameters via gradient descent:

$$\theta_{m+1} = \theta_m - \eta \,\nabla_{\theta_m} \mathcal{L}\big(\theta'_m, w'_m; \mathcal{D}^{\text{query}}\big)$$

$$w_{m+1} = w_m - \eta \,\nabla_{w_m} \mathcal{L}\big(\theta'_m, w'_m; \mathcal{D}^{\text{query}}\big)$$

$\mathcal{L}$: the loss for one task (or averaged over several tasks), computed on the query set
$\eta$: the meta (outer-loop) learning rate
$\theta'_m$: the task-specific (task-adapted) parameters after one/several inner-loop steps from $\theta_m$; in ANIL the feature extractor is not adapted in the inner loop, so $\theta'_m = \theta_m$
$w'_m$: the task-specific (task-adapted) parameters after one/several inner-loop steps from $w_m$
$\mathcal{D}^{\text{query}}$: samples from the query set
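A sketch of this outer-loop step under the same hypothetical body/head model, with cross-entropy assumed as the task loss; the head-only adaptation inside `query_loss` is the ANIL inner loop spelled out in the next section:

```python
import jax
import jax.numpy as jnp

def predict(theta, w, x):
    # body f_theta followed by head g_w, as in the sketch above
    return jnp.tanh(x @ theta["W1"] + theta["b1"]) @ w["W2"] + w["b2"]

def loss(theta, w, x, y):
    # assumed task loss: cross-entropy over the predicted logits (y holds integer labels)
    logp = jax.nn.log_softmax(predict(theta, w, x))
    return -jnp.mean(jnp.take_along_axis(logp, y[:, None], axis=1))

def outer_step(theta, w, task, alpha=0.01, eta=0.001):
    support_x, support_y, query_x, query_y = task

    def query_loss(theta, w):
        # one inner-loop step on the head only (see the inner-loop section below)
        g_w = jax.grad(loss, argnums=1)(theta, w, support_x, support_y)
        w_adapted = jax.tree_util.tree_map(lambda p, g: p - alpha * g, w, g_w)  # w'_m
        # query-set loss at (theta'_m = theta_m, w'_m)
        return loss(theta, w_adapted, query_x, query_y)

    g_theta, g_w = jax.grad(query_loss, argnums=(0, 1))(theta, w)
    theta_next = jax.tree_util.tree_map(lambda p, g: p - eta * g, theta, g_theta)  # theta_{m+1}
    w_next = jax.tree_util.tree_map(lambda p, g: p - eta * g, w, g_w)              # w_{m+1}
    return theta_next, w_next
```

Because `query_loss` is differentiated with respect to both $\theta_m$ and $w_m$, the meta-gradient flows through the head adaptation, so the feature extractor is still trained by the outer loop even though it is never adapted in the inner loop.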
Inner loop (one step for illustration)
$$w'_m = w_m - \alpha \,\nabla_{w_m} \mathcal{L}\big(\theta_m, w_m; \mathcal{D}^{\text{support}}\big)$$

$\alpha$: the learning rate in the inner loop
$\mathcal{L}(\,\cdot\,; \mathcal{D}^{\text{support}})$: the loss for one/several tasks, computed on the support set
$\mathcal{D}^{\text{support}}$: samples from the support set

Only the head parameters $w_m$ are adapted in the inner loop; the feature-extractor parameters $\theta_m$ stay at their meta-initialization.
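A sketch of this single head-only step, assuming the same hypothetical body/head model and cross-entropy support loss as above:

```python
import jax
import jax.numpy as jnp

def support_loss(theta, w, x, y):
    # assumed support-set loss: cross-entropy over the head's logits
    logits = jnp.tanh(x @ theta["W1"] + theta["b1"]) @ w["W2"] + w["b2"]
    logp = jax.nn.log_softmax(logits)
    return -jnp.mean(jnp.take_along_axis(logp, y[:, None], axis=1))

def inner_step(theta, w, support_x, support_y, alpha=0.01):
    # one ANIL inner-loop step: gradient with respect to the head parameters only (argnums=1);
    # the body parameters theta stay at their meta-initialization
    g_w = jax.grad(support_loss, argnums=1)(theta, w, support_x, support_y)
    return jax.tree_util.tree_map(lambda p, g: p - alpha * g, w, g_w)
```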
In contrast, the inner loop in MAML adapts both the feature extractor and the head:

$$\theta'_m = \theta_m - \alpha \,\nabla_{\theta_m} \mathcal{L}\big(\theta_m, w_m; \mathcal{D}^{\text{support}}\big)$$

$$w'_m = w_m - \alpha \,\nabla_{w_m} \mathcal{L}\big(\theta_m, w_m; \mathcal{D}^{\text{support}}\big)$$
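For contrast, a sketch of the corresponding MAML inner step, reusing the hypothetical `support_loss` helper from the previous sketch; here every parameter is adapted:

```python
import jax

def maml_inner_step(theta, w, support_x, support_y, alpha=0.01):
    # MAML adapts all parameters in the inner loop: gradients w.r.t. both theta and w
    g_theta, g_w = jax.grad(support_loss, argnums=(0, 1))(theta, w, support_x, support_y)
    theta_prime = jax.tree_util.tree_map(lambda p, g: p - alpha * g, theta, g_theta)  # theta'_m
    w_prime = jax.tree_util.tree_map(lambda p, g: p - alpha * g, w, g_w)              # w'_m
    return theta_prime, w_prime
```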
Advantages:
ANIL is much more computationally efficient than MAML, since far fewer parameters need to be updated in the inner loop.
Its performance is comparable with that of MAML.
Reference:
Raghu, A., Raghu, M., Bengio, S., & Vinyals, O. (2020). Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML. ICLR 2020.