- Aishwarya Pothula

# Variational Inference

**Introduction**

I am writing this blog in an attempt to understand the various concepts presented in the papers "Probing Physics Knowledge Using Tools from Developmental Psychology" and "Generative Temporal Models with Memory". The former uses variational inference to probe an artificial agent's acquisition of intuitive physics concepts such as *object permanence, unchangeableness, continuity, solidity* and *containment*. This post is my attempt to summarize my understanding of variational inference.


**What is Inference?**

Inference in machine learning refers to the process of presenting new instances of data to a trained model so that it can make predictions about them. We will talk about variational inference in a bit.

**Problem and Solution**

Oftentimes, exact inference in probabilistic models is not tractable. For example, computing the posterior over latent variables requires evaluating an integral over all latent configurations, which has no closed form in most interesting models. Variational inference is one of the methods used to approximate inference in such probabilistic models.
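To make the intractability concrete, here is the standard setup (a generic illustration, not specific to either paper): for a model with latent variables z and observations x, Bayes' rule gives

```latex
p(z \mid x) \;=\; \frac{p(x \mid z)\, p(z)}{\int p(x \mid z')\, p(z')\, dz'}
```

The denominator integrates over every possible latent configuration, and for high-dimensional or richly structured z this integral generally cannot be computed exactly.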

Reference: https://www.cs.jhu.edu/~jason/tutorials/variational.html

Variational inference works by positing a family of distributions Q intended to approximate the original probability distribution of the data. Inference is then turned into an optimization problem: selecting the best possible distribution q ∈ Q under some definition of best.
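A minimal sketch of "inference as optimization", with made-up numbers: suppose the target p is a discrete distribution over four states, and our chosen family Q consists of Binomial(3, θ) distributions indexed by θ. We scan θ for the member of Q closest to p (closeness here is the KL divergence, defined in the next section).

```python
import math

# Hypothetical target distribution p over four states (invented for illustration).
p = [0.05, 0.20, 0.45, 0.30]

def binom3(theta):
    """pmf of Binomial(n=3, theta): our chosen variational family Q, indexed by theta."""
    return [math.comb(3, k) * theta**k * (1 - theta)**(3 - k) for k in range(4)]

def kl(q, p):
    """KL(q || p) for discrete distributions on the same support."""
    return sum(qk * math.log(qk / pk) for qk, pk in zip(q, p) if qk > 0)

# "Inference as optimization": scan the family for the q closest to p in KL.
thetas = [i / 1000 for i in range(1, 1000)]
best_theta = min(thetas, key=lambda t: kl(binom3(t), p))
best_q = binom3(best_theta)
print(best_theta, kl(best_q, p))
```

Real variational inference uses gradient-based optimization over much richer families rather than a grid scan, but the structure of the problem is the same: search Q for the q that minimizes a divergence to p.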

**Definition of Best**

One of the measures used to determine the best-suited q is the Kullback-Leibler divergence (KL divergence). More specifically, the KL divergence captures the degree of dissimilarity between the approximating distribution q and the original distribution p. It has the following properties:

KL(q‖p) ≥ 0 for all p, q

KL(q‖p) = 0 iff q = p

This means that the KL divergence is non-negative for all distributions p, q, and is zero, indicating no dissimilarity, exactly when q = p.
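These properties are easy to check numerically. The sketch below (with two invented discrete distributions) shows that the KL divergence is positive when the distributions differ, zero when they are identical, and, worth noting, not symmetric in its arguments:

```python
import math

def kl(q, p):
    """KL(q || p) for discrete distributions over the same support."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

# Two made-up distributions over three states.
p = [0.6, 0.3, 0.1]
q = [0.2, 0.5, 0.3]

print(kl(q, p))  # positive: q and p differ
print(kl(p, p))  # zero: identical distributions
print(kl(q, p), kl(p, q))  # the two directions give different values
```

The asymmetry matters in practice: variational inference conventionally minimizes KL(q‖p), not KL(p‖q), and the two objectives favor different approximations.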

**Why is it called variational ?**

The optimization problem in variational inference is to choose the q ∈ Q that minimizes the KL divergence between q and p. This also answers our question of why it is called variational: the name comes from the calculus of variations, the branch of mathematics concerned with optimizing over functions; here, the optimization is over the distributions q in the family Q.
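One standard identity (stated here without derivation) shows why this minimization is feasible even though p itself is intractable. For a model p(x, z) with posterior p(z | x),

```latex
\mathrm{KL}\!\left(q(z)\,\|\,p(z \mid x)\right) \;=\; \log p(x) \;-\; \mathbb{E}_{q}\!\left[\log p(x, z) - \log q(z)\right]
```

The second term on the right is known as the evidence lower bound (ELBO). Since log p(x) is a fixed constant with respect to q, minimizing the KL divergence is equivalent to maximizing the ELBO, which involves only quantities we can evaluate.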