Motivation
Few-shot learning is typically done with a fixed neural architecture. This paper proposes MetaNAS, the first method which fully integrates NAS with gradient-based meta learning.
MetaNAS allows adapting architectures to novel tasks from only a few data points with just a few steps of a gradient-based task optimizer. This lets MetaNAS generate task-specific architectures that are adapted to each task separately (but starting from a jointly meta-learned meta-architecture).
Marrying Gradient-based Meta Learning and Gradient-based NAS
$\alpha_{\text{meta}}$: meta-learned architecture
$w_{\text{meta}}$: corresponding meta-learned weights for the architecture
Task $T_i$: $(D_i^{tr}, D_i^{test})$
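As an illustrative sketch (not from the paper), a task can be held as a support/query pair; `FewShotTask`, `support` and `query` are hypothetical names:

```python
from dataclasses import dataclass
import torch

@dataclass
class FewShotTask:
    """A few-shot task T_i = (D_i^tr, D_i^test)."""
    support: tuple[torch.Tensor, torch.Tensor]  # (inputs, labels) = D_i^tr, used for adaptation
    query: tuple[torch.Tensor, torch.Tensor]    # (inputs, labels) = D_i^test, used for evaluation
```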
Meta-objective:
$$
\begin{aligned}
\min_{\alpha, w} \mathcal{L}_{\text{meta}}(\alpha, w, p_{\text{train}}, \Phi_k)
&= \min_{\alpha, w} \sum_{T_i \sim p_{\text{train}}} \mathcal{L}_i\big(\Phi_k(\alpha, w, D_i^{tr}), D_i^{test}\big) \\
&= \min_{\alpha, w} \sum_{T_i \sim p_{\text{train}}} \mathcal{L}_i\big((\alpha^*_{T_i}, w^*_{T_i}), D_i^{test}\big)
\end{aligned}
$$

where $(\alpha^*_{T_i}, w^*_{T_i}) = \Phi_k(\alpha, w, D_i^{tr}) \approx \arg\min_{\alpha, w} \hat{\mathcal{L}}_i(\alpha, w, D_i^{tr})$ are the task-specific architecture and parameters after $k$ gradient steps; the $\arg\min$ is approximated by $\Phi_k$, i.e., $k$ steps of SGD.
$\mathcal{L}_i$: query loss for task $i$ (evaluated on $D_i^{test}$)
$\hat{\mathcal{L}}_i$: support loss for task $i$ (evaluated on $D_i^{tr}$)
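A minimal sketch of estimating this meta-objective on a batch of sampled tasks, assuming hypothetical `task_optimizer` ($\Phi_k$) and `query_loss` ($\mathcal{L}_i$) callables:

```python
def meta_loss(alpha_meta, w_meta, tasks, task_optimizer, query_loss, k=5):
    """Monte-Carlo estimate of L_meta over a batch of tasks sampled from p_train:
    adapt (alpha, w) with k steps of the task optimizer Phi_k on each support set,
    then sum the query losses of the adapted models."""
    total = 0.0
    for task in tasks:
        # (alpha*_Ti, w*_Ti) = Phi_k(alpha_meta, w_meta, D_i^tr)
        alpha_i, w_i = task_optimizer(alpha_meta, w_meta, task.support, steps=k)
        # L_i((alpha*_Ti, w*_Ti), D_i^test)
        total = total + query_loss(alpha_i, w_i, task.query)
    return total
```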
Inner loop: update $\alpha$ and $w$ with architecture learning rate $\xi_{\text{task}}$ and weight learning rate $\lambda_{\text{task}}$:
$$
\begin{pmatrix} \alpha_{j+1} \\ w_{j+1} \end{pmatrix}
= \Phi(\alpha_j, w_j, D_i^{tr})
= \begin{pmatrix} \alpha_j - \xi_{\text{task}} \nabla_\alpha \hat{\mathcal{L}}_i(\alpha_j, w_j, D_i^{tr}) \\ w_j - \lambda_{\text{task}} \nabla_w \hat{\mathcal{L}}_i(\alpha_j, w_j, D_i^{tr}) \end{pmatrix}
$$
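A sketch of $\Phi$ as $k$ plain SGD steps on both $\alpha$ and $w$, assuming a hypothetical `support_loss` callable that evaluates $\hat{\mathcal{L}}_i$ on $D_i^{tr}$; the copies are detached from the meta-parameters, so this matches first-order / Reptile-style meta-training rather than full second-order MAML:

```python
import torch

def task_optimizer(alpha_meta, w_meta, support, steps=5,
                   xi_task=1e-3, lambda_task=1e-2, support_loss=None):
    """Phi_k: k joint gradient steps on the architecture parameters alpha and
    the weights w, starting from the meta-learned (alpha_meta, w_meta).
    support_loss is a hypothetical callable evaluating \hat{L}_i(alpha, w, D_i^tr)."""
    # Work on detached copies so the meta-parameters are left untouched.
    alpha = [a.detach().clone().requires_grad_(True) for a in alpha_meta]
    w = [p.detach().clone().requires_grad_(True) for p in w_meta]
    for _ in range(steps):
        loss = support_loss(alpha, w, support)                  # \hat{L}_i(alpha_j, w_j, D_i^tr)
        g_alpha = torch.autograd.grad(loss, alpha, retain_graph=True)
        g_w = torch.autograd.grad(loss, w)
        with torch.no_grad():                                   # plain SGD updates
            alpha = [a - xi_task * g for a, g in zip(alpha, g_alpha)]
            w = [p - lambda_task * g for p, g in zip(w, g_w)]
        alpha = [a.requires_grad_(True) for a in alpha]
        w = [p.requires_grad_(True) for p in w]
    return alpha, w                                             # (alpha*_Ti, w*_Ti)
```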
Outer loop update:

$$
\begin{pmatrix} \alpha_{\text{meta}}^{\,i+1} \\ w_{\text{meta}}^{\,i+1} \end{pmatrix}
= \Psi_{\text{MAML}}(\alpha_{\text{meta}}^{\,i}, w_{\text{meta}}^{\,i}, p_{\text{train}}, \Phi_k)
= \begin{pmatrix} \alpha_{\text{meta}}^{\,i} - \xi_{\text{meta}} \nabla_\alpha \mathcal{L}_{\text{meta}}(\alpha_{\text{meta}}^{\,i}, w_{\text{meta}}^{\,i}, p_{\text{train}}, \Phi_k) \\ w_{\text{meta}}^{\,i} - \lambda_{\text{meta}} \nabla_w \mathcal{L}_{\text{meta}}(\alpha_{\text{meta}}^{\,i}, w_{\text{meta}}^{\,i}, p_{\text{train}}, \Phi_k) \end{pmatrix}
$$
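A sketch of this outer update in its first-order approximation (the exact $\Psi_{\text{MAML}}$ differentiates through $\Phi_k$; here the query-loss gradient at the adapted parameters stands in for the full meta-gradient). `task_optimizer` and `query_loss` are the hypothetical helpers assumed above:

```python
import torch

def maml_outer_step(alpha_meta, w_meta, tasks, task_optimizer, query_loss,
                    xi_meta=1e-3, lambda_meta=1e-3, k=5):
    """One Psi_MAML-style meta update in its first-order approximation:
    the query-loss gradient at the adapted (alpha*_Ti, w*_Ti) is used in place
    of the full gradient of L_meta through Phi_k."""
    g_alpha = [torch.zeros_like(a) for a in alpha_meta]
    g_w = [torch.zeros_like(p) for p in w_meta]
    for task in tasks:
        alpha_i, w_i = task_optimizer(alpha_meta, w_meta, task.support, steps=k)
        loss = query_loss(alpha_i, w_i, task.query)   # L_i((alpha*_Ti, w*_Ti), D_i^test)
        ga = torch.autograd.grad(loss, alpha_i, retain_graph=True)
        gw = torch.autograd.grad(loss, w_i)
        g_alpha = [acc + g for acc, g in zip(g_alpha, ga)]
        g_w = [acc + g for acc, g in zip(g_w, gw)]
    with torch.no_grad():
        alpha_meta = [a - xi_meta * g for a, g in zip(alpha_meta, g_alpha)]
        w_meta = [p - lambda_meta * g for p, g in zip(w_meta, g_w)]
    return alpha_meta, w_meta
```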
Reptile can also be used instead of MAML here:

$$
\begin{pmatrix} \alpha_{\text{meta}}^{\,i+1} \\ w_{\text{meta}}^{\,i+1} \end{pmatrix}
= \Psi_{\text{Reptile}}(\alpha_{\text{meta}}^{\,i}, w_{\text{meta}}^{\,i}, p_{\text{train}}, \Phi_k)
= \begin{pmatrix} \alpha_{\text{meta}}^{\,i} + \xi_{\text{meta}} \sum_{T_i} (\alpha^*_{T_i} - \alpha_{\text{meta}}^{\,i}) \\ w_{\text{meta}}^{\,i} + \lambda_{\text{meta}} \sum_{T_i} (w^*_{T_i} - w_{\text{meta}}^{\,i}) \end{pmatrix}
$$
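The Reptile update only needs the adapted parameters themselves; a sketch, again with the hypothetical `task_optimizer` from above:

```python
import torch

def reptile_outer_step(alpha_meta, w_meta, tasks, task_optimizer,
                       xi_meta=0.1, lambda_meta=0.1, k=5):
    """One Psi_Reptile meta update: move (alpha_meta, w_meta) towards the
    task-adapted (alpha*_Ti, w*_Ti), summed over the task batch."""
    d_alpha = [torch.zeros_like(a) for a in alpha_meta]
    d_w = [torch.zeros_like(p) for p in w_meta]
    for task in tasks:
        alpha_i, w_i = task_optimizer(alpha_meta, w_meta, task.support, steps=k)
        d_alpha = [d + (ai.detach() - a) for d, ai, a in zip(d_alpha, alpha_i, alpha_meta)]
        d_w = [d + (wi.detach() - p) for d, wi, p in zip(d_w, w_i, w_meta)]
    with torch.no_grad():
        alpha_meta = [a + xi_meta * d for a, d in zip(alpha_meta, d_alpha)]
        w_meta = [p + lambda_meta * d for p, d in zip(w_meta, d_w)]
    return alpha_meta, w_meta
```

The summed differences correspond directly to the $\sum_{T_i}(\cdot^* - \cdot_{\text{meta}})$ terms above, scaled by $\xi_{\text{meta}}$ and $\lambda_{\text{meta}}$.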
Task-dependent Architecture Adaptation
Two modifications are introduced to remove the need for retraining the task-adapted architecture.
Reference:
Elsken, T., Staffler, B., Metzen, J. H., & Hutter, F. (2020). Meta-Learning of Neural Architectures for Few-Shot Learning. CVPR 2020.