Motivation:

Earlier today I was talking to a researcher about how well a normal distribution could approximate a uniform distribution over an interval $[a,b] \subset \mathbb{R}$. I gave a few arguments for why I thought a normal distribution wouldn’t be a good fit, but I didn’t have the exact answer off the top of my head, so I decided to find out. Although the following analysis involves nothing fancy, I consider it useful: it generalises easily to higher dimensions (i.e. multivariate uniform distributions), and we arrive at a result which I wouldn’t consider intuitive.

For those who appreciate numerical experiments, I wrote a small TensorFlow script to accompany this blog post.

Statement of the problem:

We would like to minimise the KL-Divergence:

$$\mathcal{D}_{KL}(P \,\|\, Q) = \int_{-\infty}^{\infty} p(x) \ln \frac{p(x)}{q(x)} \, dx$$

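where $P$ is the target uniform distribution and $Q$ is the approximating Gaussian:

$$p(x) = \begin{cases} \dfrac{1}{b-a} & x \in [a,b] \\ 0 & \text{otherwise} \end{cases}$$

and

$$q(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$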

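Now, since $\lim_{x \to 0} x\ln(x) = 0$, the region outside $[a,b]$, where $p(x) = 0$, contributes nothing to the integral. If we assume that $(a,b)$ is fixed, our loss may therefore be expressed in terms of $\mu$ and $\sigma$:

$$\mathcal{L}(\mu,\sigma) = -\ln(b-a) + \ln\sigma + \frac{1}{2}\ln(2\pi) + \frac{(b-\mu)^3 - (a-\mu)^3}{6\sigma^2(b-a)}$$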
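
As a sanity check on this step, the sketch below compares a closed-form expression for $\mathcal{D}_{KL}(\mathcal{U}(a,b) \,\|\, \mathcal{N}(\mu,\sigma^2))$ with a direct midpoint-rule approximation of the integral over $[a,b]$. The function names and the test values of $a$, $b$, $\mu$ and $\sigma$ are illustrative choices, not taken from the accompanying script.

```python
import math


def kl_closed_form(a, b, mu, sigma):
    """Closed-form KL(U(a,b) || N(mu, sigma^2)) in nats."""
    delta = b - a
    # (1 / delta) * integral over [a, b] of (x - mu)^2
    second_moment = ((b - mu) ** 3 - (a - mu) ** 3) / (3.0 * delta)
    return (-math.log(delta) + math.log(sigma)
            + 0.5 * math.log(2.0 * math.pi)
            + second_moment / (2.0 * sigma ** 2))


def kl_midpoint_rule(a, b, mu, sigma, n=10_000):
    """Same quantity from a midpoint Riemann sum of p(x) * ln(p(x) / q(x))."""
    delta = b - a
    h = delta / n
    p = 1.0 / delta  # uniform density on [a, b]
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        log_q = (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
                 - (x - mu) ** 2 / (2.0 * sigma ** 2))
        total += p * (math.log(p) - log_q) * h
    return total


# Arbitrary test point; the two numbers should agree closely.
print(kl_closed_form(0.0, 2.0, 0.3, 1.5))
print(kl_midpoint_rule(0.0, 2.0, 0.3, 1.5))
```

The integration is restricted to $[a,b]$ precisely because the integrand vanishes wherever $p(x) = 0$.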

Minimising with respect to $\mu$ and $\sigma$:

We can easily show that the mean and variance of the Gaussian which minimises $\mathcal{L}(\mu,\sigma)$ correspond to the mean and variance of a uniform distribution over $[a,b]$:

$$\mu = \frac{a+b}{2}, \qquad \sigma^2 = \frac{(b-a)^2}{12}$$
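
For readers who want the intermediate step, the stationarity conditions can be sketched as follows. This assumes the loss is $\mathcal{D}_{KL}(P \,\|\, Q)$ with the integral restricted to $[a,b]$ and written out in full, including the constant terms:

$$
\begin{aligned}
\mathcal{L}(\mu,\sigma) &= -\ln(b-a) + \ln\sigma + \tfrac{1}{2}\ln(2\pi) + \frac{1}{2\sigma^2(b-a)}\int_a^b (x-\mu)^2 \, dx, \\
\frac{\partial \mathcal{L}}{\partial \mu} &= \frac{1}{\sigma^2(b-a)}\int_a^b (\mu - x) \, dx = \frac{1}{\sigma^2}\left(\mu - \frac{a+b}{2}\right) = 0 \;\implies\; \mu = \frac{a+b}{2}, \\
\frac{\partial \mathcal{L}}{\partial \sigma} &= \frac{1}{\sigma} - \frac{1}{\sigma^3(b-a)}\int_a^b (x-\mu)^2 \, dx = 0 \;\implies\; \sigma^2 = \frac{1}{b-a}\int_a^b (x-\mu)^2 \, dx = \frac{(b-a)^2}{12}.
\end{aligned}
$$

The last equality uses $\mu = \frac{a+b}{2}$, for which $\int_a^b (x-\mu)^2 \, dx = \frac{(b-a)^3}{12}$.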

Although I wouldn’t have guessed this result, the careful reader will notice that it readily generalises to higher dimensions.

Analysing the loss with respect to optimal Gaussians:

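After entering the optimal values of $\mu$ and $\sigma$ into $\mathcal{L}(\mu,\sigma)$ and simplifying the resulting expression we have the following residual loss:

$$\mathcal{L}^* = -\ln(b-a) + \ln\frac{b-a}{\sqrt{12}} + \frac{1}{2}\ln(2\pi) + \frac{1}{2} = \frac{1}{2}\left(1 + \ln\frac{\pi}{6}\right) \approx 0.1765$$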

I find this result surprising because I didn’t expect the dependence on $\Delta = b-a$ to vanish. That said, my current intuition for this result is that if we tried fitting $\mathcal{U}(a,b)$ to $\mathcal{N}(\mu,\sigma)$ we would obtain:

$$a = \mu - \sqrt{3}\,\sigma, \qquad b = \mu + \sqrt{3}\,\sigma$$

so this minimisation problem corresponds to a linear re-scaling of the uniform parameters in terms of $\mu$ and $\sigma$.
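
The independence from $\Delta$ is easy to check numerically. The sketch below evaluates the KL divergence at the moment-matched Gaussian, $\mu = \frac{a+b}{2}$ and $\sigma = \frac{b-a}{\sqrt{12}}$, for a few arbitrarily chosen intervals and compares the results with $\frac{1}{2}\left(1 + \ln\frac{\pi}{6}\right)$. The helper names and the list of intervals are illustrative assumptions.

```python
import math


def kl_uniform_to_gaussian(a, b, mu, sigma):
    """Closed-form KL(U(a,b) || N(mu, sigma^2)) in nats."""
    delta = b - a
    second_moment = ((b - mu) ** 3 - (a - mu) ** 3) / (3.0 * delta)
    return (-math.log(delta) + math.log(sigma)
            + 0.5 * math.log(2.0 * math.pi)
            + second_moment / (2.0 * sigma ** 2))


def residual_loss(a, b):
    """KL divergence at the moment-matched Gaussian for U(a, b)."""
    mu = 0.5 * (a + b)
    sigma = (b - a) / math.sqrt(12.0)
    return kl_uniform_to_gaussian(a, b, mu, sigma)


# Intervals spanning several orders of magnitude in width.
intervals = [(0.0, 1.0), (-5.0, 20.0), (1e-3, 2e-3), (100.0, 1e6)]
residuals = [residual_loss(a, b) for a, b in intervals]
expected = 0.5 * (1.0 + math.log(math.pi / 6.0))

for (a, b), r in zip(intervals, residuals):
    print(f"[{a}, {b}]: {r:.6f}")
print(f"closed form: {expected:.6f}")
```

Every interval gives the same residual, which is what one would expect from the fact that the KL divergence is unchanged when both distributions are shifted and rescaled together.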

Remark:

The reader may experiment with the following TensorFlow function which outputs the approximating mean and variance of a Gaussian given a uniform distribution on the interval $[a,b]$.
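
As a rough stand-in for experimentation, here is a plain-Python sketch. It is not the TensorFlow function referred to above: it runs ordinary gradient descent on the closed-form loss with hand-derived gradients, and the name `fit_gaussian_to_uniform`, the learning rate and the step count are all hypothetical choices. It also assumes it is acceptable to fit on the standardised interval $[0,1]$ and map the result back to $[a,b]$, which relies on the KL divergence being invariant under the affine map $x \mapsto (x-a)/(b-a)$.

```python
import math


def fit_gaussian_to_uniform(a, b, lr=0.05, steps=2000):
    """Gradient descent on KL(U(a,b) || N(mu, sigma^2)).

    Returns the (mean, variance) of the fitted Gaussian. Illustrative
    helper only: it fits on the standardised interval [0, 1] and then
    rescales the answer to [a, b].
    """
    if not b > a:
        raise ValueError("need a < b")

    # Deliberately poor starting point: N(0, 1) against U(0, 1).
    mu, log_sigma = 0.0, 0.0
    for _ in range(steps):
        var = math.exp(2.0 * log_sigma)
        # E[(u - mu)^2] for u ~ U(0, 1)
        second_moment = ((1.0 - mu) ** 3 + mu ** 3) / 3.0
        # Gradients of ln(sigma) + second_moment / (2 * var)
        grad_mu = (mu - 0.5) / var
        grad_log_sigma = 1.0 - second_moment / var
        mu -= lr * grad_mu
        log_sigma -= lr * grad_log_sigma

    delta = b - a
    mean = a + delta * mu
    variance = delta ** 2 * math.exp(2.0 * log_sigma)
    return mean, variance


mean, variance = fit_gaussian_to_uniform(2.0, 5.0)
print(mean, variance)
```

Optimising $\ln\sigma$ rather than $\sigma$ keeps the standard deviation positive without any explicit constraint, and the fitted values can be compared against $\frac{a+b}{2}$ and $\frac{(b-a)^2}{12}$.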