Earlier today I was talking to a researcher about how well a normal distribution could approximate a uniform distribution over an interval $[a, b]$. I gave a few arguments for why I thought a normal distribution wouldn’t be a good fit, but I didn’t have the exact answer off the top of my head, so I decided to find out. Although the following analysis involves nothing fancy, I consider it useful: it generalises easily to higher dimensions (i.e. multivariate uniform distributions), and we arrive at a result which I wouldn’t consider intuitive.

For those who appreciate numerical experiments, I wrote a small TensorFlow script to accompany this blog post.

Statement of the problem:

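We would like to minimise the KL-Divergence:

$$D_{KL}(P \| Q) = \int_a^b p(x) \ln \frac{p(x)}{q(x)} \, dx$$

where $P$ is the target uniform distribution on $[a, b]$ and $Q$ is the approximating Gaussian:

$$p(x) = \frac{1}{b-a}, \quad x \in [a, b]$$

$$q(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$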
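Now, given that $\int_a^b p(x) \ln p(x) \, dx = -\ln(b-a)$, if we assume that $[a, b]$ is fixed our loss may be expressed in terms of $\mu$ and $\sigma$:

$$\mathcal{L}(\mu, \sigma) = -\int_a^b p(x) \ln q(x) \, dx = \ln \sigma + \frac{1}{2}\ln(2\pi) + \frac{(b-\mu)^3 - (a-\mu)^3}{6\sigma^2 (b-a)}$$

Minimising with respect to $\mu$ and $\sigma$:

$$\frac{\partial \mathcal{L}}{\partial \mu} = -\frac{a + b - 2\mu}{2\sigma^2} = 0, \qquad \frac{\partial \mathcal{L}}{\partial \sigma} = \frac{1}{\sigma} - \frac{(b-\mu)^3 - (a-\mu)^3}{3\sigma^3 (b-a)} = 0$$

We can easily show that the mean and variance of the Gaussian which minimises $\mathcal{L}$ correspond to the mean and variance of a uniform distribution over $[a, b]$:

$$\mu^* = \frac{a+b}{2}, \qquad (\sigma^*)^2 = \frac{(b-a)^2}{12}$$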
Although I wouldn’t have guessed this result, the careful reader will notice that it readily generalises to higher dimensions.
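
As a quick sanity check on the moment-matching result, here is a minimal pure-Python sketch (not the TensorFlow script mentioned earlier). It assumes the interval is written $[a, b]$ and compares the closed-form mean and variance against a brute-force grid search over the part of the KL divergence that depends on $\mu$ and $\sigma$. The helper names `fit_gaussian_to_uniform`, `loss` and `grid_minimise` are made up for illustration.

```python
import math


def fit_gaussian_to_uniform(a, b):
    """Closed-form minimiser of KL(U(a, b) || N(mu, sigma^2)): match the first two moments."""
    mu = (a + b) / 2.0
    var = (b - a) ** 2 / 12.0
    return mu, var


def loss(mu, sigma, a, b):
    """Cross-entropy -E_p[ln q]: the only part of the KL divergence that depends on mu and sigma."""
    # E_p[(x - mu)^2] for x ~ U(a, b), integrated in closed form.
    second_moment = ((b - mu) ** 3 - (a - mu) ** 3) / (3.0 * (b - a))
    return math.log(sigma) + 0.5 * math.log(2.0 * math.pi) + second_moment / (2.0 * sigma ** 2)


def grid_minimise(a, b, steps=201):
    """Brute-force search on a (mu, sigma) grid; a hypothetical helper used only for checking."""
    width = b - a
    best_value, best_mu, best_sigma = float("inf"), None, None
    for i in range(steps):
        mu = a + width * i / (steps - 1)
        for j in range(1, steps):  # start at 1 so that sigma > 0
            sigma = width * j / (steps - 1)
            value = loss(mu, sigma, a, b)
            if value < best_value:
                best_value, best_mu, best_sigma = value, mu, sigma
    return best_mu, best_sigma


a, b = -1.0, 3.0
mu_star, var_star = fit_gaussian_to_uniform(a, b)
mu_grid, sigma_grid = grid_minimise(a, b)
print("closed form:", mu_star, var_star)
print("grid search:", mu_grid, sigma_grid ** 2)
```

The grid search should agree with the closed form only up to the grid spacing (here `(b - a) / 200`), so the two printed variances will be close but not identical.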

Analysing the loss with respect to optimal Gaussians:

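After entering the optimal values of $\mu$ and $\sigma$ into $D_{KL}(P \| Q)$ and simplifying the resulting expression we have the following residual loss:

$$D_{KL}(P \| Q^*) = -\ln(b-a) + \frac{1}{2}\ln\left(\frac{2\pi (b-a)^2}{12}\right) + \frac{1}{2} = \frac{1}{2}\ln\left(\frac{\pi e}{6}\right) \approx 0.1765$$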
I find this result surprising because I didn’t expect the dependence on $a$ and $b$ to vanish. That said, my current intuition for this result is that if we tried fitting $\mathcal{U}(a, b)$ to $\mathcal{N}(\mu, \sigma^2)$ we would obtain:

$$a = \mu - \sqrt{3}\,\sigma, \qquad b = \mu + \sqrt{3}\,\sigma$$

so this minimisation problem corresponds to a linear re-scaling of the uniform parameters in terms of $\mu$ and $\sigma$.
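
The claim that the residual loss does not depend on the interval can be checked numerically. The sketch below is pure Python and assumes the interval is written $[a, b]$; the function name `kl_uniform_to_best_gaussian` is made up for illustration. It estimates the KL divergence between the uniform distribution and its moment-matched Gaussian with a midpoint rule, for three quite different intervals, and prints the estimates next to $\frac{1}{2}\ln(\pi e / 6)$.

```python
import math


def kl_uniform_to_best_gaussian(a, b, n=10_000):
    """Midpoint-rule estimate of KL(U(a, b) || N(mu*, sigma*^2)) in nats."""
    mu = (a + b) / 2.0            # moment-matched mean
    var = (b - a) ** 2 / 12.0     # moment-matched variance
    p = 1.0 / (b - a)             # uniform density on [a, b]
    h = (b - a) / n               # width of each integration cell
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        log_q = -0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)
        total += p * (math.log(p) - log_q) * h
    return total


closed_form = 0.5 * math.log(math.pi * math.e / 6.0)
intervals = [(0.0, 1.0), (-5.0, 5.0), (2.0, 1000.0)]
estimates = [kl_uniform_to_best_gaussian(a, b) for a, b in intervals]
print("closed form:", closed_form)
for (a, b), estimate in zip(intervals, estimates):
    print((a, b), estimate)
```

All three estimates should match the closed form to within the quadrature error, whatever the width or location of the interval.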


The reader may experiment with the following TensorFlow function, which outputs the approximating mean and variance of a Gaussian given a uniform distribution on the interval $[a, b]$.