Approximate inference tutorial

This repo covers the main methods for performing approximate inference in Bayesian statistics.

Contents:

  1. Markov Chain Monte Carlo (MCMC)
  2. Approximate Bayesian Computation (ABC)
  3. Variational Inference (VI)
  4. Sequential Monte Carlo (SMC)
  5. Amortised Inference

Why approximate?

Let's look at Bayes' rule:

$$p(h \mid d) = \frac{p(d \mid h)\, p(h)}{p(d)}$$

Expanding the denominator, we have:

$$p(h \mid d) = \frac{p(d \mid h)\, p(h)}{\int p(d \mid h')\, p(h')\, dh'}$$

where the evidence (marginal likelihood) is

$$p(d) = \int p(d \mid h)\, p(h)\, dh$$

  • $p(h \mid d)$ : the posterior; the probability of the hypothesis (e.g. that a parameter takes a certain value) given the data
  • $p(d \mid h)$ : the likelihood of observing the data given the hypothesis
  • $p(h)$ : the prior probability of the hypothesis
  • $p(d)$ : the evidence; the probability of observing the data averaged over all hypotheses

It is $p(d)$ that makes this equation so difficult to solve in high dimensions, i.e. when the number of parameters to estimate (and hence the dimension of the integral) is large.

We are left in the following situation:

We have a distribution of the form

$$p(h \mid d) = \frac{f(h)}{Z}$$

where $f(h) = p(d \mid h)\, p(h)$ is easy to compute but the normalising constant $Z = p(d)$ is very hard. Because $Z$ is so hard to compute exactly in most complicated (non-conjugate) models, we approximate the posterior instead.
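To make the asymmetry concrete, here is a minimal sketch (a hypothetical one-parameter Gaussian model, invented for illustration): evaluating the unnormalised density $f(h)$ at any point is a couple of arithmetic operations, while the normalising constant $Z$ requires integrating $f$ over the whole parameter space. In one dimension we can still brute-force the integral on a grid; in high dimensions we cannot.

```python
import numpy as np

def f(h, d=1.2):
    """Unnormalised posterior f(h) = p(d | h) * p(h): cheap to evaluate.

    Illustrative model: standard normal prior p(h) = N(0, 1) and a single
    observation d with Gaussian likelihood p(d | h) = N(h, 1).
    """
    prior = np.exp(-0.5 * h**2) / np.sqrt(2 * np.pi)
    likelihood = np.exp(-0.5 * (d - h) ** 2) / np.sqrt(2 * np.pi)
    return likelihood * prior

# Evaluating f at a point is trivial...
print(f(0.5))

# ...but Z = p(d) requires integrating f over ALL of parameter space.
# A 1D grid (Riemann sum) still works here; this is exactly what stops
# scaling once h has many dimensions.
hs = np.linspace(-10, 10, 10001)
Z = np.sum(f(hs)) * (hs[1] - hs[0])
print(Z)  # the evidence p(d)
```

(For this conjugate Gaussian toy model $Z$ happens to have a closed form, $p(d) = \mathcal{N}(d;\, 0,\, 2)$; the grid sum is only there to mimic the general, non-conjugate situation.)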

Why do we need to compute the evidence?

Because if we want to work out the probability of a particular hypothesis given the data, we need to normalise by the total probability of the data under all possible hypotheses.

In the finite/discrete case this looks like the familiar formula we learned in school:

$$P(A) = \frac{\text{number of outcomes in } A}{\text{total number of equally likely outcomes}}$$

In Bayesian inference the same principle applies, but instead of counting outcomes, we sum or integrate over hypotheses, weighting each by its prior plausibility and the likelihood of the data under it:

$$p(d) = \sum_{h} p(d \mid h)\, p(h) \qquad \text{or} \qquad p(d) = \int p(d \mid h)\, p(h)\, dh$$

This term ensures that the posterior $p(h \mid d)$ is a proper probability distribution (i.e. it sums or integrates to 1).
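The discrete case fits in a few lines. The sketch below uses hypothetical numbers (three hypotheses about a coin's bias, one observed head); dividing the unnormalised weights by the evidence is all it takes to get a posterior that sums to 1.

```python
# Three hypotheses about a coin, with made-up priors and likelihoods.
priors = {"fair": 0.5, "biased_heads": 0.25, "biased_tails": 0.25}
likelihoods = {"fair": 0.5, "biased_heads": 0.9, "biased_tails": 0.1}  # p(head | h)

# Unnormalised posterior weight p(d | h) * p(h) for each hypothesis.
unnorm = {h: likelihoods[h] * priors[h] for h in priors}

# The evidence p(d): total probability of the data over all hypotheses.
evidence = sum(unnorm.values())

# Normalising turns the weights into a proper distribution.
posterior = {h: w / evidence for h, w in unnorm.items()}
print(posterior)
```

With only three hypotheses the sum is trivial; the whole difficulty in the continuous, high-dimensional case is that this sum becomes an intractable integral.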

All of the methods covered in this tutorial repo are ways of avoiding having to compute the evidence term in its entirety.

A sketch to demonstrate how hard computing the evidence can be:
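The same point can be made numerically: if we tried to compute the evidence on a grid with $k$ points per parameter, a $D$-parameter model would need $k^D$ likelihood evaluations, which explodes long before $D$ reaches realistic model sizes.

```python
# Grid-based evaluation of the evidence needs k**D likelihood
# evaluations for D parameters with k grid points each.
k = 100  # grid points per dimension
for D in [1, 2, 5, 10, 20]:
    print(f"D={D:>2}: {k**D:.1e} likelihood evaluations")
```

Already at $D = 20$ this is $10^{40}$ evaluations, which is why every method in this repo sidesteps the exact computation.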
