Learning Causal Models Online

arXiv, 10-9-2020


Motivation

Predictive models – learned from observational data not covering the complete data distribution – can rely on spurious correlations in the data for making predictions. These correlations make the models brittle and hinder generalization. One solution for achieving strong generalization is to incorporate causal structures in the models; such structures constrain learning by ignoring correlations that contradict them. However, learning these structures is a hard problem in itself. Moreover, it is not clear how to combine the machinery of causality with online continual learning. In this work, we take an indirect approach to discovering causal models. Instead of searching for the true causal model directly, we propose an online algorithm that continually detects and removes spurious features. Our algorithm is based on the idea that the correlation of a spurious feature with a target is not constant over time. As a result, the weight associated with that feature is constantly changing. We show that by continually removing such features, our method converges to solutions that have strong generalization. Moreover, our method combined with random search can also discover non-spurious features from raw sensory data. Finally, our work highlights that the information present in the temporal structure of the problem – destroyed by shuffling the data – is essential for detecting spurious features online.

Problem Setup

Learning to make predictions in a Markov decision process (MDP): $(\mathcal{S}, \mathcal{A}, r, p)$

  • set of states $\mathcal{S}$

  • set of actions $\mathcal{A}$

  • reward function $r$

  • transition model $p$

In a prediction problem, the agent has to learn a function $f_{\theta}(s_t, a_t)$ to predict a target $y_t$ using parameters $\theta$. As the agent transitions to the new state $s_{t+1}$, it receives the ground-truth label $\hat{y}_t$ from the environment and accumulates regret given by $\mathcal{L}(y_t, \hat{y}_t)$. The agent can use this feedback to update its estimate of $f_{\theta}$.
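To make the setup concrete, below is a minimal Python sketch of this prediction loop. The environment interface (`env.reset`, `env.step`), the `features` helper, the linear form of $f_{\theta}$, and the squared-error regret are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def run_prediction_agent(env, features, n_features, alpha=0.01, n_steps=10_000):
    """Hypothetical interface: env.reset() -> (s, a); env.step(a) -> (s_next, a_next, y_true);
    features(s, a) -> np.ndarray of feature values for the current state-action pair."""
    w = np.zeros(n_features)                    # parameters theta of a linear predictor
    regret = 0.0
    s, a = env.reset()
    for _ in range(n_steps):
        x = features(s, a)                      # features of (s_t, a_t)
        y_pred = w @ x                          # prediction y_t = f_theta(s_t, a_t)
        s, a, y_true = env.step(a)              # transition to s_{t+1}, receive ground-truth label
        regret += 0.5 * (y_pred - y_true) ** 2  # accumulate regret L(y_t, y_hat_t)
        w -= alpha * (y_pred - y_true) * x      # update the estimate of f_theta online
    return w, regret
```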

Objective

Consider $n$ features $x = f_1, f_2, \ldots, f_n$ that can be linearly combined using parameters $w_1, w_2, \ldots, w_n$ to predict a target $y$. Moreover, assume that all features are binary (0 or 1). Given these features, our goal is to identify and remove the spurious ones.
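Written out, the target is predicted by the linear combination

$$y \approx \sum_{i=1}^{n} w_i\, f_i, \qquad f_i \in \{0, 1\}.$$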

We define a feature $f_i$ to have a spurious correlation with the target $y$ if the expected value of the target given $f_i$ is not constant in temporally distant parts of the MDP, i.e. $\mathbb{E}[y \mid f_i = 1]$ slowly changes as the agent interacts with the world.

Detecting Spurious Features

For a linear prediction problem, detecting whether the $i$-th feature is spurious is equivalent to tracking the stability of $w_i$ across time, i.e., if the online learner is always learning from the most recent data with the following update:
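For a linear predictor with squared loss and step size $\alpha$ (assumptions of this note's sketch rather than details quoted from the paper), one such per-sample gradient update is

$$w_i \leftarrow w_i - \alpha\,\big(y_t - \hat{y}_t\big)\, f_i,$$

where $y_t$ is the prediction and $\hat{y}_t$ the ground-truth label, as above.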

then the weight corresponding to a feature with a constant expected value, $\mathbb{E}[y \mid f_i = 1] = c$, would converge to a fixed magnitude, whereas if $\mathbb{E}[y \mid f_i = 1]$ is changing, $w_i$ would track this change by changing its magnitude over time. This implies that weights that are constantly changing in a stationary prediction problem encode spurious correlations. We can approximate the change in the weight $w_i$ over time by estimating its variance online. Our hypothesis is that spurious features have weights with high variance.
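A minimal Python sketch of this detection idea, assuming a linear predictor trained by online SGD, an exponentially weighted estimate of each weight's mean and variance, and an arbitrary variance threshold (the decay rate and threshold are illustrative, not values from the paper):

```python
import numpy as np

def detect_spurious_features(stream, n_features, alpha=0.01, decay=0.99, threshold=1e-3):
    """stream yields (x, y) pairs in temporal order, where x is a binary feature vector.
    Returns indices of features whose weights keep changing (suspected spurious features)."""
    w = np.zeros(n_features)
    w_mean = np.zeros(n_features)                  # running mean of each weight
    w_var = np.zeros(n_features)                   # running variance of each weight
    for x, y in stream:
        y_pred = w @ x
        w -= alpha * (y_pred - y) * x              # always learn from the most recent sample
        w_mean = decay * w_mean + (1 - decay) * w  # exponentially weighted first moment
        w_var = decay * w_var + (1 - decay) * (w - w_mean) ** 2  # and second (central) moment
    return np.flatnonzero(w_var > threshold)       # high-variance weights -> suspected spurious
```

Stable features should see $w_i$ converge, so its running variance shrinks; features whose correlation with the target drifts keep pulling $w_i$ around and accumulate variance. Note that shuffling the stream would destroy exactly the temporal signal this check relies on.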


Reference

https://arxiv.org/pdf/2006.07461.pdf