ICML 2018 · 10-23-2020
CNPs combine the benefits of NNs and GPs:
the flexibility of stochastic processes such as GPs
structured as NNs and trained via gradient descent directly from data
We have a function $f(x_i) = y_i$ with input $x_i$ and output $y_i$.
$f$ is drawn from $P$, a distribution over functions.
Define two sets (a toy sampler sketch follows this list):
Observations: $O = \{(x_i, y_i)\}$
Targets: $T = \{x_j\}$
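Below is a minimal PyTorch sketch of one way such observation/target splits could be generated for training, assuming functions are drawn from a squared-exponential GP prior. `sample_task`, its argument names, and the input range are illustrative assumptions, not details from the paper.

```python
import torch

def sample_task(num_context=10, num_target=40, length_scale=1.0):
    """Draw one function from a squared-exponential GP prior (one convenient choice
    of P) and split its evaluations into observations O = {(x_i, y_i)} and targets
    T = {x_j}. The sampler and its ranges are illustrative, not from the paper."""
    n = num_context + num_target
    x = torch.rand(n, 1) * 4.0 - 2.0                          # inputs in [-2, 2]
    sq_dists = (x - x.T) ** 2                                  # pairwise squared distances
    cov = torch.exp(-0.5 * sq_dists / length_scale ** 2) + 1e-6 * torch.eye(n)
    y = torch.distributions.MultivariateNormal(torch.zeros(n), cov).sample().unsqueeze(-1)
    # The first num_context points form O; the remaining inputs are the targets T
    # (their y's are only used as training labels, never shown to the model).
    return (x[:num_context], y[:num_context]), (x[num_context:], y[num_context:])
```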
Our goal: given some observations, make predictions at unseen target inputs at test time, just like supervised learning.
The architecture of our model captures this task (see the sketch after this list):
$r_i$ are the representations of the pairs $\{(x_i, y_i)\}$
$r$ is the overall representation obtained by summing all $r_i$
$h_\theta$ and $g_\theta$ are NNs
$\phi_i$ parametrizes the output distribution (either a Gaussian or a categorical distribution)
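As a rough illustration of the architecture above, here is a hedged PyTorch sketch: `h_theta` encodes each $(x_i, y_i)$ pair into $r_i$, the $r_i$ are summed into $r$, and `g_theta` maps each target $x_j$ together with $r$ to the parameters $\phi_j$ of a Gaussian. Layer sizes and the softplus transform on the standard deviation are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNP(nn.Module):
    """Sketch of the CNP architecture: h_theta encodes each (x_i, y_i) pair into r_i,
    the r_i are summed into a single representation r, and g_theta decodes (x_j, r)
    into phi_j = (mu_j, sigma_j). Sizes are illustrative, not the paper's exact setup."""

    def __init__(self, x_dim=1, y_dim=1, r_dim=128, hidden=128):
        super().__init__()
        self.h_theta = nn.Sequential(                      # encoder: (x_i, y_i) -> r_i
            nn.Linear(x_dim + y_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, r_dim))
        self.g_theta = nn.Sequential(                      # decoder: (x_j, r) -> phi_j
            nn.Linear(x_dim + r_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * y_dim))

    def forward(self, x_obs, y_obs, x_tgt):
        r_i = self.h_theta(torch.cat([x_obs, y_obs], dim=-1))   # (n, r_dim)
        r = r_i.sum(dim=0, keepdim=True)                         # commutative aggregation
        r = r.expand(x_tgt.shape[0], -1)                         # share r across all m targets
        phi = self.g_theta(torch.cat([x_tgt, r], dim=-1))        # (m, 2 * y_dim)
        mu, raw_sigma = phi.chunk(2, dim=-1)
        sigma = 0.1 + 0.9 * F.softplus(raw_sigma)                # one way to keep sigma positive
        return mu, sigma
```

Because the aggregation is a sum, the prediction is unchanged under any reordering of the observations; the mean would work equally well as the commutative aggregator.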
CNPs are conditional distributions over functions, trained to model the empirical conditional distributions of functions $f \sim P$.
CNPs are permutation invariant in $O$ and $T$.
CNPs are scalable, achieving a running-time complexity of $\mathcal{O}(m + n)$ for making $m$ predictions from $n$ observations (a training-loss sketch follows below).
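Continuing the sketch above, one plausible training objective is the negative conditional log-likelihood of the target outputs under the predicted Gaussians. `cnp_loss` and the commented training step are assumptions that tie together the hypothetical `CNP` and `sample_task` from the earlier snippets.

```python
import torch

def cnp_loss(model, x_obs, y_obs, x_tgt, y_tgt):
    """Negative log p(y_T | x_T, O) under the factorized Gaussian output head.
    One forward pass touches each of the n observations once (encoder) and each
    of the m targets once (decoder), which is where the O(m + n) cost comes from."""
    mu, sigma = model(x_obs, y_obs, x_tgt)
    return -torch.distributions.Normal(mu, sigma).log_prob(y_tgt).mean()

# Hypothetical training step, reusing the sketches above:
# (x_obs, y_obs), (x_tgt, y_tgt) = sample_task(num_context=10, num_target=40)
# loss = cnp_loss(cnp, x_obs, y_obs, x_tgt, y_tgt)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```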
https://arxiv.org/pdf/1807.01613.pdf
https://vimeo.com/312299226
https://www.martagarnelo.com/projects
https://github.com/deepmind/neural-processes