During the last phase of my PhD I worked with Dr. Payel Das, Dr. Simon Hadfield, Dr. Yunpeng Li and Dr. Paula Gherghinescu on a project aimed at accelerating Hamiltonian Monte-Carlo. The project was quite a cross-disciplinary effort, as we had to combine ideas from (astro)physics and geometric mechanics with computer science and probabilistic mathematics.
In this study we include results examining the trade-off between the cost of building the maps that accelerate the sampling and the speed-up they deliver; the techniques for building these maps are developing rapidly, though, and may soon become more efficient.
This research project has been a profound experience for me, I think because the topics it covers were largely new to both me and Payel. Over the course of the project I've been on a journey from learning contemporary machine learning techniques and probabilistic programming, to galactic dynamics, to transport theory, and finally to geometric mechanics. Awesome stuff.
Hamiltonian Monte-Carlo (HMC) is a Monte-Carlo sampling/inference technique that uses ideas from classical mechanics to guide the sampler through the posterior. It is efficient in high dimensions, and generally uncovers the posterior landscape faster than standard Metropolis-Hastings.
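To make the mechanics concrete, here is a minimal sketch of a single HMC step, assuming we have a log-density `log_prob` and its gradient `grad_log_prob` (both placeholder names, not part of our code):

```python
import numpy as np

def hmc_step(q, log_prob, grad_log_prob, step_size=0.1, n_steps=20,
             rng=np.random.default_rng()):
    """One Hamiltonian Monte-Carlo step: simulate the dynamics with a
    leapfrog integrator, then accept or reject the endpoint (Metropolis)."""
    p = rng.standard_normal(q.shape)  # resample the momentum
    q_new, p_new = q.copy(), p.copy()
    # Leapfrog integration; note that every step costs a gradient evaluation.
    p_new += 0.5 * step_size * grad_log_prob(q_new)
    for _ in range(n_steps - 1):
        q_new += step_size * p_new
        p_new += step_size * grad_log_prob(q_new)
    q_new += step_size * p_new
    p_new += 0.5 * step_size * grad_log_prob(q_new)
    # Metropolis acceptance based on the change in total energy.
    h_old = -log_prob(q) + 0.5 * p @ p
    h_new = -log_prob(q_new) + 0.5 * p_new @ p_new
    return q_new if np.log(rng.uniform()) < h_old - h_new else q
```

Those repeated gradient evaluations inside the leapfrog loop are exactly the cost our method tries to avoid.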
The core idea of this project is to use a series of transformations to map the dynamical system into a linearized coordinate system. In such a space, evolving the system forward in time is trivial and, most importantly, requires no costly gradient calculations.
The first transformation maps the target distribution (the posterior) to a known, controlled base distribution, and the second maps that system into action-angle coordinates. Combining these two transformations does enable (significantly) faster sampling of (complex) posterior shapes, but there is a price to pay: learning the transformation between the target and the base distribution requires training, which is not cheap.
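To see why the action-angle step is cheap, here is a toy sketch (with illustrative names, not our actual implementation): for a Hamiltonian written in action-angle coordinates, the actions are conserved and the angles advance at a constant rate, so time evolution is a single linear update with no gradient calculations at all.

```python
import numpy as np

def evolve_action_angle(theta0, J, omega, t):
    """Evolve Hamiltonian dynamics in action-angle coordinates:
    the actions J are conserved and the angles advance linearly at
    the frequencies omega = dH/dJ. No gradients of the posterior
    are needed, however far forward in time we integrate."""
    return (theta0 + omega * t) % (2.0 * np.pi), J

# Toy example: a 3-D harmonic oscillator (the dynamics of a
# standard-normal base distribution), where all frequencies are 1.
rng = np.random.default_rng(0)
theta0 = rng.uniform(0.0, 2.0 * np.pi, size=3)  # initial angles
J = rng.exponential(size=3)                     # conserved actions
theta_t, J_t = evolve_action_angle(theta0, J, omega=np.ones(3), t=1.7)
```

In outline, the full method evolves the dynamics with linear updates like this one and then maps the result back through the (inverses of the) two transformations to obtain samples of the target posterior.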
In the paper we broadly discuss the following three things.
We first discuss the integration-by-transformation that underlies our new method, and introduce the relevant concepts.
We then dig into how the accuracy of AAHMC compares to an independent, trusted calibration method, and in particular how the quality of the map affects that accuracy. We also compare AAHMC to the popular NUTS sampler.
Finally, we run a series of variations on the training and architecture of the transformations to understand how robust our method is.
We are currently wrapping up the project and aim to publish the paper in a computational physics journal. Once the paper is submitted I will write up a more detailed description of the project here.
Recently we applied for a low-TRL early-stage research and development scheme grant (https://www.ukri.org/opportunity/early-stage-research-and-development-scheme/) to continue this research and the development and application of AAHMC sampling. While we sadly did not win the grant, the process of applying for it did solidify my belief in the merit of this project.
As such, I aim to continue this line of research with two 'extensions' of the basic method laid out in the first paper. Both extensions target the method's efficiency: one by making better use of already-generated samples, and the other by using those samples more effectively during the training of the transformations.