Earlier today I was talking to a researcher about how well a normal distribution can approximate a uniform distribution over an interval $[a, b]$. I gave a few arguments for why I thought a normal distribution wouldn’t be a good approximation, but I didn’t have the exact answer off the top of my head, so I decided to find out. Although the following analysis involves nothing fancy, I consider it useful as it’s easily generalised to higher dimensions (i.e. multivariate uniform distributions) and we arrive at a result which I wouldn’t consider intuitive.
For those who appreciate numerical experiments, I wrote a small TensorFlow script to accompany this blog post.
Statement of the problem:
We would like to minimise the KL divergence:

$$ D_{\mathrm{KL}}(U \,\|\, N) = \int_a^b u(x) \log \frac{u(x)}{n(x)} \, dx $$

where $U$ is the target uniform distribution, with density $u(x) = \frac{1}{b-a}$ on $[a, b]$, and $N$ is the approximating Gaussian:

$$ n(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right) $$
Now, given that $\mathbb{E}_U\!\left[(X - \mu)^2\right] = \left(\mu - \frac{a+b}{2}\right)^2 + \frac{(b-a)^2}{12}$, if we assume that $U$ is fixed our loss may be expressed in terms of $\mu$ and $\sigma$:

$$ \mathcal{L}(\mu, \sigma) = -\log(b - a) + \frac{1}{2}\log(2\pi\sigma^2) + \frac{\left(\mu - \frac{a+b}{2}\right)^2 + \frac{(b-a)^2}{12}}{2\sigma^2} $$
Minimising with respect to $\mu$ and $\sigma$:
Setting the partial derivatives of $\mathcal{L}$ to zero, we can easily show that the mean and variance of the Gaussian which minimises $\mathcal{L}$ correspond to the mean and variance of a uniform distribution over $[a, b]$:

$$ \mu^* = \frac{a + b}{2}, \qquad (\sigma^*)^2 = \frac{(b - a)^2}{12} $$
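As a quick numerical sanity check (not part of the original post), a NumPy grid search over $(\mu, \sigma^2)$ should land on the closed-form optimum; the function name `kl_uniform_gaussian` and the example interval $U(0, 2)$ are illustrative choices:

```python
import numpy as np

def kl_uniform_gaussian(a, b, mu, sigma2):
    """Closed-form KL(U(a,b) || N(mu, sigma2)):
    -log(b-a) + 0.5*log(2*pi*sigma2) + E_U[(X-mu)^2]/(2*sigma2)."""
    mean_u = (a + b) / 2.0
    var_u = (b - a) ** 2 / 12.0
    return (-np.log(b - a)
            + 0.5 * np.log(2 * np.pi * sigma2)
            + ((mu - mean_u) ** 2 + var_u) / (2 * sigma2))

# Grid search for U(0, 2); the minimiser should sit near
# mu = (a+b)/2 = 1.0 and sigma2 = (b-a)^2/12 = 1/3.
a, b = 0.0, 2.0
mus = np.linspace(0.0, 2.0, 201)
sigma2s = np.linspace(0.05, 1.0, 191)
M, S = np.meshgrid(mus, sigma2s)
losses = kl_uniform_gaussian(a, b, M, S)
i, j = np.unravel_index(np.argmin(losses), losses.shape)
print(M[i, j], S[i, j])
```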
Although I wouldn’t have guessed this in advance, the careful reader will notice that this result readily generalises to higher dimensions.
Analysing the loss with respect to optimal Gaussians:
After entering the optimal values of $\mu$ and $\sigma$ into $\mathcal{L}$ and simplifying the resulting expression we have the following residual loss:

$$ \mathcal{L}(\mu^*, \sigma^*) = \frac{1}{2}\left(1 + \log\frac{\pi}{6}\right) \approx 0.176 \text{ nats} $$
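The interval-independence is easy to verify numerically: the residual loss below comes out the same for every interval. This is a small sketch; `residual_kl` is an assumed name, and the squared-error term has already been reduced to $\frac{(\sigma^*)^2}{2(\sigma^*)^2} = \frac{1}{2}$:

```python
import numpy as np

def residual_kl(a, b):
    """KL(U(a,b) || N) evaluated at the optimum
    mu* = (a+b)/2, sigma*^2 = (b-a)^2/12."""
    sigma2 = (b - a) ** 2 / 12.0
    # The expected squared error equals sigma*^2, so the last term is 1/2.
    return -np.log(b - a) + 0.5 * np.log(2 * np.pi * sigma2) + 0.5

for a, b in [(0.0, 1.0), (-3.0, 5.0), (10.0, 10.5)]:
    print(residual_kl(a, b))  # same value every time, ~0.1765
```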
I find this result surprising because I didn’t expect the dependence on $a$ and $b$ to vanish. That said, my current intuition for this result is that if we tried fitting $N(\mu, \sigma^2)$ to $U(0, 1)$ we would obtain:

$$ \mu^* = \frac{1}{2}, \qquad (\sigma^*)^2 = \frac{1}{12} $$

so this minimisation problem corresponds to a linear re-scaling of the uniform parameters in terms of $a$ and $b$, and the KL divergence is invariant under such invertible affine changes of variable.
The reader may experiment with the following TensorFlow function, which outputs the mean and variance of the approximating Gaussian given a uniform distribution on the interval $[a, b]$.
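The original script is not reproduced on this page; the sketch below (with the assumed name `fit_gaussian_to_uniform`) implements the closed-form solution derived above. Since it only uses overloaded arithmetic operators, the same function works unchanged on Python floats, NumPy arrays, or TensorFlow tensors:

```python
def fit_gaussian_to_uniform(a, b):
    """Return (mean, variance) of the KL-optimal Gaussian for Uniform(a, b).

    Uses the closed-form optimum derived above:
        mu* = (a + b) / 2,   sigma*^2 = (b - a)^2 / 12
    Works on floats as well as NumPy arrays or TensorFlow tensors,
    since only overloaded arithmetic operators are used.
    """
    mu = (a + b) / 2.0
    var = (b - a) ** 2 / 12.0
    return mu, var

mu, var = fit_gaussian_to_uniform(0.0, 1.0)
print(mu, var)  # 0.5 and 1/12 ~ 0.0833
```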