Network Architecture Search for Domain Adaptation

arXiv, 10-2-2020


Motivation

While existing domain adaptation models typically learn a feature mapping from one domain to another or derive a joint representation across domains, they have limited capacity to derive an optimal neural architecture tailored to domain transfer.

To efficiently devise a neural architecture across different data domains, the authors propose a novel learning task called NASDA (Neural Architecture Search for Domain Adaptation).

The ultimate goal of NASDA is to minimize the validation loss on the target domain. We postulate that a solution to NASDA should not only minimize the validation loss on the source domain, but also reduce the domain gap between the source and target.

Learning Objective
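Concretely, the bilevel objective reads roughly as follows (a sketch reconstructed from the definitions below, with $L_{val}^{s}$ / $L_{train}^{s}$ assumed to denote the source validation / training losses and λ a trade-off weight):

$$\min_{\alpha}\; L_{val}^{s}\big(w^{*}(\alpha), \alpha\big) + \lambda \, disc\big(\Phi^{*}(\mathbf{x}^{s}), \Phi^{*}(\mathbf{x}^{t})\big) \quad \text{s.t.} \quad w^{*}(\alpha) = \arg\min_{w} L_{train}^{s}(w, \alpha)$$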

Here $\Phi^{*} = \Phi_{\alpha, w^{*}(\alpha)}$, and $disc(\Phi^{*}(\mathbf{x}^s), \Phi^{*}(\mathbf{x}^t))$ denotes the domain discrepancy between the source and target.

Note that in unsupervised domain adaptation, $L_{train}^t$ and $L_{test}^t$ cannot be computed directly due to the lack of labels in the target domain.

The algorithm comprises two training phases, as shown in the figure above. The first is the neural architecture search phase, which derives an optimal neural architecture $\alpha^{*}$ by following the learning objective above. The second phase learns a good feature generator with a task-specific loss, based on the $\alpha^{*}$ derived in the first phase.

NAS for Domain Adaptation

Inspired by gradient-based hyperparameter optimization, we treat the architecture parameters α as a special type of hyperparameter. This implies a bilevel optimization problem with α as the upper-level variable and w as the lower-level variable. In practice, we use the MK-MMD (multiple kernel maximum mean discrepancy) to evaluate the domain discrepancy, so the optimization takes the bilevel form above with the MK-MMD as the discrepancy term.

Here λ is the trade-off hyperparameter between the source validation loss and the MK-MMD loss.
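A minimal PyTorch-style sketch of one search step, assuming a first-order (DARTS-like) alternation between the weight and architecture updates; `search_step`, `phi`, `head`, and the Gaussian-kernel `gaussian_mmd` stand-in for the MK-MMD are illustrative names, not the paper's code:

```python
import torch
import torch.nn.functional as F

def gaussian_mmd(xs, xt, sigmas=(1.0, 2.0, 4.0)):
    """Multi-kernel (Gaussian) MMD estimate between source and target features;
    a simplified stand-in for the MK-MMD discrepancy term."""
    def k(a, b):
        d2 = torch.cdist(a, b).pow(2)  # pairwise squared distances
        return sum(torch.exp(-d2 / (2 * s ** 2)) for s in sigmas)
    return k(xs, xs).mean() + k(xt, xt).mean() - 2 * k(xs, xt).mean()

def search_step(phi, head, w_opt, arch_opt, src_train, src_val, tgt_x, lam=1.0):
    """phi: supernet feature extractor (weights w and architecture params alpha);
    head: classifier on top of phi; w_opt / arch_opt optimize w / alpha."""
    (xs_tr, ys_tr), (xs_val, ys_val) = src_train, src_val

    # Lower level: update the network weights w on the source training loss.
    w_opt.zero_grad()
    F.cross_entropy(head(phi(xs_tr)), ys_tr).backward()
    w_opt.step()

    # Upper level (first-order approximation): update alpha on the source
    # validation loss plus lambda times the source/target feature discrepancy.
    arch_opt.zero_grad()
    loss_alpha = F.cross_entropy(head(phi(xs_val)), ys_val) + lam * gaussian_mmd(phi(xs_val), phi(tgt_x))
    loss_alpha.backward()
    arch_opt.step()
```

A full bilevel treatment would differentiate through the inner weight update (as in second-order DARTS); the first-order alternation above only sketches the overall structure.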

Adversarial Training for Domain Adaptation

With the neural architecture search above, we have derived the optimal cell structure $\alpha^{*}$ for domain adaptation. We then stack the cells to build our feature generator G. Assume C consists of N independent classifiers $\{C^{(i)}\}_{i=1}^N$, and denote by $p_i(y|x)$ the K-way probabilistic output of $C^{(i)}$, where K is the number of categories.

The high-level intuition is to consolidate the feature generator G so that it makes the diversified classifiers C generate similar outputs. To this end, the training process consists of three steps (see the sketch below):

(1) train G and C on $D^s$ to obtain task-specific features,

(2) fix G and train C so that $\{C^{(i)}\}_{i=1}^N$ produce diversified outputs,

(3) fix C and train G to minimize the output discrepancy between the classifiers.

Reference: Maximum Classifier Discrepancy for Unsupervised Domain Adaptation
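A hedged PyTorch-style sketch of these three steps, following the MCD recipe referenced above; `adaptation_step`, `G`, `Cs`, and `classifier_discrepancy` are illustrative names and simplifications, not the paper's exact procedure:

```python
import torch.nn.functional as F

def classifier_discrepancy(probs):
    """Mean pairwise L1 distance between the N classifiers' probability outputs."""
    pairs = [(i, j) for i in range(len(probs)) for j in range(i + 1, len(probs))]
    return sum((probs[i] - probs[j]).abs().mean() for i, j in pairs) / len(pairs)

def adaptation_step(G, Cs, opt_g, opt_c, src_batch, tgt_x):
    """G: feature generator stacked from the searched cells; Cs: list of N classifier heads."""
    xs, ys = src_batch

    # Step 1: train G and C on the labeled source domain (task-specific features).
    opt_g.zero_grad(); opt_c.zero_grad()
    sum(F.cross_entropy(C(G(xs)), ys) for C in Cs).backward()
    opt_g.step(); opt_c.step()

    # Step 2: fix G; train C to keep fitting the source data while *maximizing*
    # the discrepancy of their outputs on target features (diversify the classifiers).
    opt_c.zero_grad()
    src_feat, tgt_feat = G(xs).detach(), G(tgt_x).detach()
    probs_t = [F.softmax(C(tgt_feat), dim=1) for C in Cs]
    (sum(F.cross_entropy(C(src_feat), ys) for C in Cs) - classifier_discrepancy(probs_t)).backward()
    opt_c.step()

    # Step 3: fix C; train G to *minimize* the output discrepancy on target data.
    opt_g.zero_grad()
    probs_t = [F.softmax(C(G(tgt_x)), dim=1) for C in Cs]
    classifier_discrepancy(probs_t).backward()
    opt_g.step()
```

Steps 2 and 3 form the minimax game that consolidates G: the classifiers are pushed apart on target samples, and G is then updated so that their predictions agree again.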

Reference

https://arxiv.org/pdf/2008.05706.pdf