ICML 18 10-26-2020
The original recurrent back-propagation (RBP) is not efficient, because its gradient involves an inverse matrix.
Instead of computing that inverse, this paper uses a Neumann series to approximate it, which reduces the cost.
The idea is similar to the hyperparameter optimization paper.
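As a rough illustration of the Neumann series trick (a minimal sketch, not the paper's implementation): for a matrix A with spectral radius below 1, (I - A)^{-1} v = v + A v + A^2 v + ..., so truncating the sum gives an approximate inverse-vector product using only matrix-vector multiplications. The names `matvec` and `neumann_solve` and the toy values of `A` and `v` are made up for this example. In the paper's setting, A would presumably be the transposed Jacobian of the recurrent update at its fixed point.

```python
def matvec(A, v):
    """Multiply a square matrix (given as a list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def neumann_solve(A, v, num_terms):
    """Approximate (I - A)^{-1} v by the truncated Neumann series
    v + A v + A^2 v + ... + A^num_terms v.

    Only matrix-vector products are used; no inverse is ever formed.
    The series converges only if the spectral radius of A is below 1.
    """
    term = list(v)    # current term A^k v, starting at k = 0
    total = list(v)   # running partial sum
    for _ in range(num_terms):
        term = matvec(A, term)
        total = [t + s for t, s in zip(total, term)]
    return total


# Toy example (hypothetical values, chosen so the series converges).
A = [[0.2, 0.1],
     [0.0, 0.3]]
v = [1.0, 2.0]
approx = neumann_solve(A, v, num_terms=50)
```

Each extra term costs one matrix-vector product, so the truncation length trades accuracy for compute. With `num_terms=0` the result is just `v`.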
http://proceedings.mlr.press/v80/liao18c/liao18c.pdf