# Probabilistic Neural Architecture Search

## Motivation

Most existing NAS methods cannot be applied directly to large-scale problems because of their prohibitive computational complexity or high memory usage.

This paper proposes a probabilistic approach to NAS (PARSEC) that drastically reduces memory requirements while maintaining state-of-the-art computational complexity. This makes it possible to search directly over more complex architectures and larger datasets.

* PARSEC is built on a **memory-efficient sampling procedure** that learns a **probability distribution over high-performing neural network architectures**.
* This framework also makes it possible to **transfer the distribution of architectures learnt on smaller problems to larger ones**, further reducing the computational cost.

## Importance-weighted Monte Carlo empirical Bayes

![](https://1687130946-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MEnQbUIupyAn8eMmrmG%2F-MHgOGYrZCFmmHebvD-Y%2F-MHisOQuk6Xx9NtADoVK%2Feq3.png?alt=media\&token=57dd8c2b-cb66-489a-838b-5c273a7999a3)

* $$p(\alpha|\pi):$$ a prior on the choices of inputs and operations that define the cell, where the hyper-parameters $$\pi$$ are the probabilities of the different choices.
* $$y:$$ target
* $$\mathbf{X}:$$ input
* $$v:$$ network weights
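The sketch below shows how one might sample from the prior $$p(\alpha|\pi)$$. It assumes, for illustration only, that a cell is encoded as a set of independent categorical decisions, each picking one input or one operation. The decision names, option lists, and the helpers `uniform_pi`, `sample_architecture`, and `log_prior` are hypothetical and are not taken from the paper.

```python
import math
import random

# Hypothetical cell encoding: each decision picks exactly one option.
# pi[d][j] is the prior probability of option j for decision d.
CHOICES = {
    "node1_input": ["x_prev", "x_prev_prev"],
    "node1_op": ["conv3x3", "conv5x5", "max_pool", "identity"],
}


def uniform_pi(choices):
    """Start from a uniform prior over the options of every decision."""
    return {d: [1.0 / len(opts)] * len(opts) for d, opts in choices.items()}


def sample_architecture(pi, rng):
    """Draw alpha ~ p(alpha | pi): one option index per decision."""
    return {d: rng.choices(range(len(p)), weights=p)[0] for d, p in pi.items()}


def log_prior(alpha, pi):
    """log p(alpha | pi) = sum over independent decisions of log pi[d][alpha[d]]."""
    return sum(math.log(pi[d][j]) for d, j in alpha.items())


rng = random.Random(0)
pi = uniform_pi(CHOICES)
alpha = sample_architecture(pi, rng)
print({d: CHOICES[d][j] for d, j in alpha.items()})
print(log_prior(alpha, pi))  # equals log(1/2) + log(1/4) under the uniform prior
```

Only the sampled architecture would need to be instantiated for a forward pass. That is one way to read the memory savings described in the Motivation.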

Given the estimator:

![](https://1687130946-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MEnQbUIupyAn8eMmrmG%2F-MHgOGYrZCFmmHebvD-Y%2F-MHitYziVrf4Uw0CWf9y%2F6.png?alt=media\&token=0c4d5df4-4b6c-4b14-8a0f-7179a914cdc8)

![](https://1687130946-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MEnQbUIupyAn8eMmrmG%2F-MHgOGYrZCFmmHebvD-Y%2F-MHiu1BPmAkIGsjToIFd%2F7.png?alt=media\&token=5cd5378a-d36f-460d-b711-9fa5f81855a2)

{% hint style="success" %}
From equation (7):

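$$\nabla\_{\boldsymbol{v}, \boldsymbol{\pi}} \log p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\pi}) = \frac{1}{p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\pi})} \int \nabla\_{\boldsymbol{v}, \boldsymbol{\pi}} p(\boldsymbol{y}, \boldsymbol{\alpha} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\pi}) \, \mathrm{d} \boldsymbol{\alpha} = \frac{1}{p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\pi})} \int \nabla\_{\boldsymbol{v}, \boldsymbol{\pi}} \log p(\boldsymbol{y}, \boldsymbol{\alpha} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\pi}) \, p(\boldsymbol{y}, \boldsymbol{\alpha} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\pi}) \, \mathrm{d} \boldsymbol{\alpha}$$

Factorising the joint as $$p(\boldsymbol{y}, \boldsymbol{\alpha} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\pi}) = p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\alpha}) \, p(\boldsymbol{\alpha} \mid \boldsymbol{\pi})$$ turns the integral into an expectation under the prior. This expectation is then replaced by a Monte Carlo average over $$\boldsymbol{\alpha}\_k \sim p(\boldsymbol{\alpha} \mid \boldsymbol{\pi})$$, $$k = 1, \dots, K$$:

$$= \frac{1}{p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\pi})} {\color{red}\int} \nabla\_{\boldsymbol{v}, \boldsymbol{\pi}} \log p(\boldsymbol{y}, \boldsymbol{\alpha} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\pi}) \, p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\alpha}) \, {\color{red}p(\boldsymbol{\alpha} \mid \boldsymbol{\pi}) \, \mathrm{d} \boldsymbol{\alpha}}$$

$$\approx \frac{1}{p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\pi})} {\color{red}\frac{1}{K} \sum\_{k=1}^K} \nabla\_{\boldsymbol{v}, \boldsymbol{\pi}} \log p(\boldsymbol{y}, {\color{red}\boldsymbol{\alpha}\_k} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\pi}) \, p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, {\color{red}\boldsymbol{\alpha}\_k})$$

Since $$\log p(\boldsymbol{y}, \boldsymbol{\alpha}\_k \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\pi}) = \log p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\alpha}\_k) + \log p(\boldsymbol{\alpha}\_k \mid \boldsymbol{\pi})$$, and the first term depends only on $$\boldsymbol{v}$$ while the second depends only on $$\boldsymbol{\pi}$$:

$$= \frac{1}{K \, p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\pi})} \sum\_{k=1}^K \left( \nabla\_{\boldsymbol{v}} \log p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\alpha}\_k) + \nabla\_{\boldsymbol{\pi}} \log p(\boldsymbol{\alpha}\_k \mid \boldsymbol{\pi}) \right) p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\alpha}\_k)$$

$$= \sum\_{k=1}^K \frac{p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\alpha}\_k)}{K \, p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\pi})} \nabla\_{\boldsymbol{v}} \log p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\alpha}\_k) + \sum\_{k=1}^K \frac{p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\alpha}\_k)}{K \, p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\pi})} \nabla\_{\boldsymbol{\pi}} \log p(\boldsymbol{\alpha}\_k \mid \boldsymbol{\pi})$$

Estimating the evidence with the same samples, $$p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\pi}) \approx \frac{1}{K} \sum\_{j=1}^K p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\alpha}\_j)$$, turns each coefficient into a self-normalised importance weight $$\omega\_k = \frac{p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\alpha}\_k)}{\sum\_{j=1}^K p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\alpha}\_j)}$$, with $$\sum\_k \omega\_k = 1$$.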
{% endhint %}
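The sketch below implements the $$\pi$$-part of this importance-weighted estimator for a single categorical decision. It assumes, for illustration, that the decision's probabilities are parameterised as $$\pi = \mathrm{softmax}(\theta)$$ with hypothetical logits $$\theta$$, so that $$\nabla\_\theta \log \pi\_a = \mathrm{onehot}(a) - \pi$$. The paper may parameterise $$\pi$$ differently. The evidence is estimated with the same $$K$$ samples, which gives self-normalised weights $$\omega\_k = p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\alpha}\_k) / \sum\_j p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\alpha}\_j)$$. These weights are computed from log-likelihoods with log-sum-exp.

The toy per-option likelihoods $$L\_j$$ are made up. They let the estimate be compared with the exact gradient $$\nabla\_{\theta\_i} \log \sum\_j \pi\_j L\_j = \pi\_i L\_i / \sum\_j \pi\_j L\_j - \pi\_i$$.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax: pi = softmax(theta)."""
    m = max(logits)
    exps = [math.exp(t - m) for t in logits]
    total = sum(exps)
    return [e / total for e in exps]


def importance_weights(log_liks):
    """omega_k = p(y|X,v,alpha_k) / sum_j p(y|X,v,alpha_j), via log-sum-exp."""
    m = max(log_liks)
    log_evidence = m + math.log(sum(math.exp(ll - m) for ll in log_liks))
    return [math.exp(ll - log_evidence) for ll in log_liks]


def logit_gradient(samples, log_liks, logits):
    """Importance-weighted estimate of grad_theta log p(y|X,v,pi).

    samples[k]  : option index chosen by alpha_k ~ p(alpha|pi)
    log_liks[k] : log p(y|X,v,alpha_k) for that sample
    """
    pi = softmax(logits)
    omega = importance_weights(log_liks)
    grad = [0.0] * len(logits)
    for w, a in zip(omega, samples):
        # grad_theta log pi_a = onehot(a) - pi under the softmax assumption
        for j, p in enumerate(pi):
            grad[j] += w * ((1.0 if j == a else 0.0) - p)
    return grad, omega


# Toy check with made-up per-option likelihoods lik[j] = p(y|X,v,alpha=j).
rng = random.Random(0)
logits = [0.0, 0.0, 0.0]
lik = [0.2, 0.5, 0.9]
pi = softmax(logits)
samples = rng.choices(range(len(pi)), weights=pi, k=20000)
grad, omega = logit_gradient(samples, [math.log(lik[a]) for a in samples], logits)

evidence = sum(p * lk for p, lk in zip(pi, lik))
exact = [p * lk / evidence - p for p, lk in zip(pi, lik)]
print("estimate:", [round(g, 3) for g in grad])
print("exact:   ", [round(e, 3) for e in exact])
```

For the network weights, the same $$\omega\_k$$ multiply $$\nabla\_{\boldsymbol{v}} \log p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\alpha}\_k)$$. In an autodiff framework, one way to obtain this gradient is to minimise $$-\sum\_k \omega\_k \log p(\boldsymbol{y} \mid \boldsymbol{X}, \boldsymbol{v}, \boldsymbol{\alpha}\_k)$$ with the $$\omega\_k$$ held constant.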

![](https://1687130946-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MEnQbUIupyAn8eMmrmG%2F-MHgOGYrZCFmmHebvD-Y%2F-MHitc2UelxP0q3-QDWR%2F8.png?alt=media\&token=8f450a7f-62fe-4552-812a-0556446ac0f7)

## Reference

* <https://arxiv.org/pdf/1902.05116.pdf>
