Motivation
Few-shot learning is typically done with a fixed neural architecture. This paper proposes MetaNAS, the first method that fully integrates neural architecture search (NAS) with gradient-based meta-learning.
MetaNAS adapts architectures to novel tasks from only a few data points, using just a few steps of a gradient-based task optimizer. Each task therefore gets its own task-specific architecture, adapted separately but starting from a jointly meta-learned meta-architecture.
Marrying Gradient-based Meta Learning and Gradient-based NAS
$\alpha_{meta}$: the meta-learned architecture
$w_{meta}$: the corresponding meta-learned weights for the architecture
Task $T_i$: $(D_i^{tr}, D_i^{test})$
Meta-objective:
$$
\begin{aligned}
\min_{\alpha, w} \mathcal{L}_{meta}(\alpha, w, p_{train}, \Phi^k)
&= \min_{\alpha, w} \sum_{T_i \sim p_{train}} \mathcal{L}_i\big(\Phi^k(\alpha, w, D_i^{tr}),\, D_i^{test}\big) \\
&= \min_{\alpha, w} \sum_{T_i \sim p_{train}} \mathcal{L}_i\big((\alpha^*_{T_i}, w^*_{T_i}),\, D_i^{test}\big)
\end{aligned}
$$
where $(\alpha^*_{T_i}, w^*_{T_i}) = \Phi^k(\alpha, w, D_i^{tr}) \approx \arg\min_{\alpha, w} \hat{\mathcal{L}}_i(\alpha, w, D_i^{tr})$ are the task-specific architecture and weights after $k$ gradient steps of the task optimizer; the $\arg\min$ is approximated using SGD.
$\mathcal{L}_i$: query loss for task $i$ (evaluated on $D_i^{test}$)
$\hat{\mathcal{L}}_i$: support loss for task $i$ (evaluated on $D_i^{tr}$)
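As a concrete illustration, here is a minimal PyTorch sketch of this objective under a toy setup: a single DARTS-style mixed operation with two candidate ops. All function names, shapes, and step sizes are illustrative assumptions, not the paper's implementation. `phi_k` runs $k$ joint gradient steps of the task optimizer on the support set, and `meta_objective` sums the query losses of the adapted $(\alpha^*_{T_i}, w^*_{T_i})$.

```python
import torch
import torch.nn.functional as F

# Toy DARTS-style mixed operation: softmax(alpha) mixes two candidate ops.
# alpha: architecture logits (shape [2]); w: dict of operation weights.
def mixed_op(x, alpha, w):
    ops = [x @ w["lin"], torch.relu(x @ w["relu_lin"])]
    mix = F.softmax(alpha, dim=0)
    return mix[0] * ops[0] + mix[1] * ops[1]

def support_loss(alpha, w, x, y):   # \hat{L}_i on D_i^tr
    return F.cross_entropy(mixed_op(x, alpha, w), y)

def query_loss(alpha, w, x, y):     # L_i on D_i^test
    return F.cross_entropy(mixed_op(x, alpha, w), y)

def phi_k(alpha, w, d_tr, k=5, xi_task=0.05, lam_task=0.05):
    """Task optimizer Phi^k: k joint gradient steps on the support loss,
    adapting both the architecture alpha and the weights w."""
    for _ in range(k):
        loss = support_loss(alpha, w, *d_tr)
        grads = torch.autograd.grad(loss, [alpha] + list(w.values()))
        alpha = alpha - xi_task * grads[0]
        w = {n: p - lam_task * g for (n, p), g in zip(w.items(), grads[1:])}
    return alpha, w  # (alpha*_Ti, w*_Ti)

def meta_objective(alpha, w, tasks, k=5):
    """L_meta: sum of query losses after per-task adaptation with Phi^k."""
    total = 0.0
    for d_tr, d_test in tasks:
        alpha_t, w_t = phi_k(alpha, w, d_tr, k=k)
        total = total + query_loss(alpha_t, w_t, *d_test)
    return total

# Illustrative usage: 4 toy tasks of 5-way classification on 16-dim features.
alpha_meta = torch.zeros(2, requires_grad=True)
w_meta = {"lin": (0.1 * torch.randn(16, 5)).requires_grad_(),
          "relu_lin": (0.1 * torch.randn(16, 5)).requires_grad_()}
tasks = [((torch.randn(10, 16), torch.randint(0, 5, (10,))),
          (torch.randn(10, 16), torch.randint(0, 5, (10,))))
         for _ in range(4)]
print(meta_objective(alpha_meta, w_meta, tasks).item())
```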
Inner loop update of $\alpha$ and $w$, with architecture learning rate $\xi_{task}$ and weight learning rate $\lambda_{task}$:
$$
\begin{pmatrix} \alpha^{j+1} \\ w^{j+1} \end{pmatrix}
= \Phi(\alpha^j, w^j, D_i^{tr})
= \begin{pmatrix}
\alpha^j - \xi_{task} \nabla_\alpha \hat{\mathcal{L}}_i(\alpha^j, w^j, D_i^{tr}) \\
w^j - \lambda_{task} \nabla_w \hat{\mathcal{L}}_i(\alpha^j, w^j, D_i^{tr})
\end{pmatrix}
$$
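Written out in code, one such step might look as follows; this is a sketch reusing `support_loss` from the snippet above, with placeholder step sizes:

```python
def phi(alpha_j, w_j, d_tr, xi_task=0.05, lam_task=0.05):
    """One step of the task optimizer Phi: a joint gradient step on the support
    loss, with architecture learning rate xi_task and weight learning rate lam_task."""
    loss = support_loss(alpha_j, w_j, *d_tr)
    grads = torch.autograd.grad(loss, [alpha_j] + list(w_j.values()))
    alpha_next = alpha_j - xi_task * grads[0]
    w_next = {n: p - lam_task * g for (n, p), g in zip(w_j.items(), grads[1:])}
    return alpha_next, w_next
```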
Outer loop update:
$$
\begin{pmatrix} \alpha^{i+1}_{meta} \\ w^{i+1}_{meta} \end{pmatrix}
= \Psi_{MAML}(\alpha^i_{meta}, w^i_{meta}, p_{train}, \Phi^k)
= \begin{pmatrix}
\alpha^i_{meta} - \xi_{meta} \nabla_\alpha \mathcal{L}_{meta}(\alpha^i_{meta}, w^i_{meta}, p_{train}, \Phi^k) \\
w^i_{meta} - \lambda_{meta} \nabla_w \mathcal{L}_{meta}(\alpha^i_{meta}, w^i_{meta}, p_{train}, \Phi^k)
\end{pmatrix}
$$
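The exact $\Psi_{MAML}$ update differentiates through the $k$ inner steps; the sketch below instead uses the common first-order approximation (query-loss gradients taken at the adapted parameters and applied to the meta-parameters), reusing `phi_k` and `query_loss` from the earlier snippet. Step sizes are illustrative.

```python
def maml_meta_update(alpha_meta, w_meta, task_batch, xi_meta=1e-2, lam_meta=1e-2, k=5):
    """First-order MAML-style outer update of (alpha_meta, w_meta)."""
    g_alpha = torch.zeros_like(alpha_meta)
    g_w = {n: torch.zeros_like(p) for n, p in w_meta.items()}
    for d_tr, d_test in task_batch:
        alpha_t, w_t = phi_k(alpha_meta, w_meta, d_tr, k=k)   # inner adaptation
        loss = query_loss(alpha_t, w_t, *d_test)              # L_i on D_i^test
        grads = torch.autograd.grad(loss, [alpha_t] + list(w_t.values()))
        g_alpha += grads[0]
        for (n, _), g in zip(w_t.items(), grads[1:]):
            g_w[n] += g
    # Re-leaf the updated meta-parameters so the next meta-iteration starts fresh.
    alpha_new = (alpha_meta - xi_meta * g_alpha).detach().requires_grad_()
    w_new = {n: (p - lam_meta * g_w[n]).detach().requires_grad_()
             for n, p in w_meta.items()}
    return alpha_new, w_new
```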
Reptile can also be used here instead of MAML:
$$
\begin{pmatrix} \alpha^{i+1}_{meta} \\ w^{i+1}_{meta} \end{pmatrix}
= \Psi_{Reptile}(\alpha^i_{meta}, w^i_{meta}, p_{train}, \Phi^k)
= \begin{pmatrix}
\alpha^i_{meta} + \xi_{meta} \sum_{T_i} (\alpha^*_{T_i} - \alpha^i_{meta}) \\
w^i_{meta} + \lambda_{meta} \sum_{T_i} (w^*_{T_i} - w^i_{meta})
\end{pmatrix}
$$
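Reptile avoids second-order terms entirely by moving the meta-parameters toward the task-adapted ones. A minimal sketch, again reusing `phi_k` from the earlier snippet, with illustrative step sizes:

```python
def reptile_meta_update(alpha_meta, w_meta, task_batch, xi_meta=0.3, lam_meta=0.3, k=5):
    """Reptile-style outer update: move (alpha_meta, w_meta) toward the
    task-adapted (alpha*_Ti, w*_Ti), summed over the task batch."""
    d_alpha = torch.zeros_like(alpha_meta)
    d_w = {n: torch.zeros_like(p) for n, p in w_meta.items()}
    for d_tr, _ in task_batch:                      # Reptile does not use D_i^test
        alpha_t, w_t = phi_k(alpha_meta, w_meta, d_tr, k=k)
        d_alpha += (alpha_t - alpha_meta).detach()
        for n in d_w:
            d_w[n] += (w_t[n] - w_meta[n]).detach()
    # Re-leaf the updated meta-parameters for the next meta-iteration.
    alpha_new = (alpha_meta + xi_meta * d_alpha).detach().requires_grad_()
    w_new = {n: (w_meta[n] + lam_meta * d_w[n]).detach().requires_grad_()
             for n in w_meta}
    return alpha_new, w_new
```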
Task-dependent Architecture Adaptation
Two modifications remove the need to retrain the task-specific architectures after pruning them to discrete architectures.
Reference:
Elsken, T., Staffler, B., Metzen, J. H., and Hutter, F. Meta-Learning of Neural Architectures for Few-Shot Learning. CVPR 2020.