Conditional Neural Processes (CNPs)

ICML 2018 | 10-23-2020

Motivation

CNPs combine benefits of NNs and GPs:

  • the flexibility of stochastic processes such as GPs

  • structured as NNs and trained via gradient descent directly from data

Model

We have a function $f$ with $f(x_i) = y_i$, where $x_i$ is the input and $y_i$ the output.

$f$ is drawn from $P$, a distribution over functions.

Define two sets:

  • Observations: $O = \{(x_i, y_i)\}$

  • Targets: $T = \{x_j\}$

Our goal: given some observations, make predictions at unseen target inputs at test time, just as in supervised learning.
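
As a concrete illustration of the two sets, here is a small sketch with made-up data; the variable names and the 5/5 split are illustrative, not from the paper.

```python
import torch

# Hypothetical example: 10 points observed from one function f ~ P.
x = torch.linspace(-2, 2, 10).unsqueeze(-1)  # inputs,  shape (10, 1)
y = torch.sin(x)                             # outputs, shape (10, 1)

# Randomly split into observations O = {(x_i, y_i)} and target inputs T = {x_j}.
perm = torch.randperm(x.size(0))
ctx_idx, tgt_idx = perm[:5], perm[5:]
x_context, y_context = x[ctx_idx], y[ctx_idx]  # O: what the model conditions on
x_target = x[tgt_idx]                          # T: where we want predictions
```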

The architecture of our model captures this task:

  • $r_i$ are the representations of the pairs $\{(x_i, y_i)\}$

  • $r$ is the overall representation obtained by summing all $r_i$

  • $h_{\theta}$ and $g_{\theta}$ are NNs

  • $\phi_i$ parametrizes the output distribution (either a Gaussian or a categorical distribution); a code sketch of the full architecture follows this list
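
Below is a minimal PyTorch sketch of this architecture, assuming 1-D inputs and outputs and a Gaussian output distribution. The layer sizes, class name, and the softplus trick for the scale are my own illustrative choices, not details from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CNP(nn.Module):
    """Minimal CNP sketch: encoder h_theta, aggregation, decoder g_theta."""

    def __init__(self, x_dim=1, y_dim=1, r_dim=128, hidden=128):
        super().__init__()
        # h_theta: encodes each observed pair (x_i, y_i) into a representation r_i.
        self.h = nn.Sequential(
            nn.Linear(x_dim + y_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, r_dim),
        )
        # g_theta: maps (x_j, r) to phi_j, here the mean and scale of a Gaussian.
        self.g = nn.Sequential(
            nn.Linear(x_dim + r_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * y_dim),
        )

    def forward(self, x_context, y_context, x_target):
        # r_i = h_theta(x_i, y_i) for every observation, shape (n, r_dim).
        r_i = self.h(torch.cat([x_context, y_context], dim=-1))
        # Aggregate into a single r by summing (any commutative operation,
        # e.g. the mean, keeps the model permutation invariant in O).
        r = r_i.sum(dim=0, keepdim=True)                    # (1, r_dim)
        # Decode each target x_j conditioned on the same r, giving phi_j.
        r_rep = r.expand(x_target.size(0), -1)
        phi = self.g(torch.cat([x_target, r_rep], dim=-1))  # (m, 2 * y_dim)
        mu, raw_sigma = phi.chunk(2, dim=-1)
        sigma = 0.1 + 0.9 * F.softplus(raw_sigma)  # common trick: keep the scale positive
        return mu, sigma
```

Because $r$ is a fixed-size summary of $O$, the decoder's cost per target does not grow with the number of observations, which is where the $O(m+n)$ complexity below comes from.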

Key properties of the model:

  • CNPs are conditional distributions over functions, trained to model the empirical conditional distributions of functions $f \sim P$ (see the training sketch after this list).

  • CNPs are permutation invariant in $O$ and $T$.

  • scalable, achieving a running time complexity of $O(m+n)$ for making $m$ predictions with $n$ observations.
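
To make the training claim concrete, here is a hedged sketch of a typical CNP training loop: sample a function, split its points into observations and targets, and minimize the negative log-likelihood of the targets under the predicted Gaussians. The sine-curve task sampler, learning rate, and split sizes are placeholders, not values from the paper.

```python
import torch

model = CNP()  # the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for step in range(10_000):
    # Placeholder task sampler: one random sine curve per step stands in for f ~ P.
    x = torch.rand(20, 1) * 4 - 2
    y = torch.sin(3.0 * x + 6.28 * torch.rand(1))

    # Random split into observations O and targets T, as in the setup above.
    perm = torch.randperm(x.size(0))
    n_ctx = torch.randint(3, 10, (1,)).item()
    ctx, tgt = perm[:n_ctx], perm[n_ctx:]

    # One encoder pass per observation and one decoder pass per target: O(m + n).
    mu, sigma = model(x[ctx], y[ctx], x[tgt])

    # Maximize log p(y_j | phi_j), i.e. minimize the Gaussian negative log-likelihood.
    loss = -torch.distributions.Normal(mu, sigma).log_prob(y[tgt]).mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```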

Reference

Garnelo, M., Rosenbaum, D., Maddison, C. J., Ramalho, T., Saxton, D., Shanahan, M., Teh, Y. W., Rezende, D. J., Eslami, S. M. A. "Conditional Neural Processes." ICML 2018. arXiv:1807.01613.
