# Ensemble Quasi-Newton

In this project we are looking to compute samples \(x\in X\subseteq\mathbb{R}^N\) weighted according to a probability distribution \(\pi(x)\propto\exp(-V(x))\), where the dimension \(N\) is large and \(V\) is sometimes called the *potential energy* or the *negative log posterior*. The energy landscape of \(\pi(x)\) may be poorly scaled, so that some directions are explored much more slowly than others. Consider a valley-like landscape: if we take isotropic steps that are too large, we hit a steep incline and are bounced out.

In schemes for conventional over-damped dynamics (such as Euler-Maruyama integration of Brownian dynamics), we make moves like

\[x_{k+1} \leftarrow x_k + \delta t \nabla \log\pi(x_k) + \sqrt{2 \delta t} R_{k},\] at iteration \(k\), for a random number \(R_k \sim N(0,I)\). If the timestep \(\delta t>0\) is too large, then we will (proverbially) ping off the edges of the valley and the system will go off to infinity. Ideally, we would like to *pre-condition* the system, so that moves in the direction perpendicular to the valley's edges are made longer and we move more efficiently.
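As a minimal sketch of this update (in NumPy, with a hypothetical ill-conditioned Gaussian target chosen purely for illustration), the Euler-Maruyama scheme can be written as:

```python
import numpy as np

def euler_maruyama(grad_log_pi, x0, dt, n_steps, rng):
    """Unadjusted Euler-Maruyama integration of over-damped Langevin dynamics."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        R = rng.standard_normal(x.shape)              # R_k ~ N(0, I)
        x = x + dt * grad_log_pi(x) + np.sqrt(2.0 * dt) * R
    return x

# Illustrative valley-like target: V(x) = x^T A x / 2 with one stiff direction.
A = np.diag([1.0, 100.0])
grad_log_pi = lambda x: -A @ x                        # grad log pi = -grad V
rng = np.random.default_rng(0)
x_final = euler_maruyama(grad_log_pi, [1.0, 1.0], dt=1e-3, n_steps=5000, rng=rng)
```

With an isotropic step, stability is limited by the stiffest direction (here the second coordinate); pushing `dt` much larger makes the iterates blow up, which is exactly the behaviour that motivates preconditioning.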

The difficulty lies in figuring out the direction in which to bias the movement. In Newton-iteration schemes the Hessian matrix provides this information, but second derivatives are not always available and are expensive to compute at each step. Instead, we approximate the inverse Hessian with the covariance of independent copies of the system.
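As a quick sanity check of this idea (not part of the scheme itself): for a Gaussian target \(\pi(x)\propto\exp(-x^\top A x/2)\), the exact covariance is \(A^{-1}\), the inverse Hessian of \(V\), so a sample covariance estimates the Newton preconditioner without any second derivatives. A sketch, with an assumed diagonal \(A\):

```python
import numpy as np

# For a Gaussian target pi(x) ~ exp(-x^T A x / 2), cov(x) = inv(A): the
# sample covariance of points drawn from pi estimates the inverse Hessian
# of V without evaluating any second derivatives.
rng = np.random.default_rng(1)
A = np.diag([1.0, 100.0])                       # Hessian of V
samples = rng.multivariate_normal(np.zeros(2), np.linalg.inv(A), size=50_000)
B = np.cov(samples, rowvar=False)               # covariance estimate
# B approaches inv(A) = diag(1, 0.01) as the sample size grows.
```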

Consider an *ensemble* of \(L\) independent walkers, with positions \(Q = (x^{(i)})_{i=1,\ldots,L}\), and use the update for walker \(i\)

\[x^{(i)}_{k+1} \leftarrow x^{(i)}_k + \delta t\, B(Q_{[i]}) \nabla \log\pi(x^{(i)}_k) + \sqrt{2 \delta t}\, B(Q_{[i]})^{1/2} R_k,\]

where \(B(Q):=\text{cov}(Q)\) is the Euclidean \(N\times N\) covariance matrix, and \(Q_{[i]}\) is the set of all positions *except* walker \(i\). Excluding walker \(i\) removes the divergence term, as then \(\nabla_{i} \cdot B(Q_{[i]})\equiv 0\). This choice of \(B\) makes the dynamics *affine invariant* for a large enough (i.e. \(O(N)\)) number of walkers \(L\), so the performance will be the same regardless of rescalings of the problem. This is a hugely important feature when we are trying to sample from distributions without physical intuition.
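A minimal NumPy sketch of one such ensemble step, under stated assumptions: the target and step size are illustrative, and the matrix square root \(B^{1/2}\) is taken via an eigendecomposition, which is one implementation choice among several.

```python
import numpy as np

def ensemble_step(Q, grad_log_pi, dt, rng):
    """One preconditioned step for all walkers: walker i uses the sample
    covariance of the *other* walkers, B(Q_[i]), as its preconditioner."""
    L, N = Q.shape
    Q_new = np.empty_like(Q)
    for i in range(L):
        others = np.delete(Q, i, axis=0)        # Q_[i]: every walker but i
        B = np.atleast_2d(np.cov(others, rowvar=False))
        # Symmetric square root of B (symmetric PSD) via eigendecomposition.
        w, U = np.linalg.eigh(B)
        sqrtB = U @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ U.T
        R = rng.standard_normal(N)
        Q_new[i] = Q[i] + dt * B @ grad_log_pi(Q[i]) + np.sqrt(2.0 * dt) * sqrtB @ R
    return Q_new

# Example with an ill-conditioned Gaussian target (illustrative choice).
A = np.diag([1.0, 100.0])
grad_log_pi = lambda x: -A @ x
rng = np.random.default_rng(2)
Q = rng.standard_normal((8, 2))                 # L = 8 walkers in N = 2 dims
Q = ensemble_step(Q, grad_log_pi, dt=1e-2, rng=rng)
```

Leaving walker \(i\) out of its own covariance estimate is what makes the divergence term vanish, so the update needs no derivatives of \(B\).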

In the article we go into more detail about the convergence results and present numerical experiments demonstrating the power of this formalism. We also introduce a second-order method with momentum for cases where energy barriers stymie progress.

For more details, check out the article and the Python code.