The MathWorks - MATLAB News & Notes - Normal Behavior

Tags:

This 2001 article on MathWorks’ Matlab News and Notes by Cleve Moler explains in plain terms 3 algorithms to generate normal distributions.


Almost all algorithms for generating normally distributed random numbers are based on transformations of uniform distributions. The simplest way to generate an m-by-n matrix with approximately normally distributed elements is to use the expression

sum(rand(m,n,12),3) - 6

This works because R = rand(m,n,p) generates a three-dimensional uniformly distributed array and sum(R,3) sums along the third dimension. The result is a two-dimensional array with elements drawn from a distribution with mean p/2 and variance p/12 that approaches a normal distribution as p increases. If we take p = 12, we get a pretty good approximation to the normal distribution and we get the variance to be equal to one without any additional scaling. There are two difficulties with this approach. It requires twelve uniforms to generate one normal, so it is slow. And the finite p approximation causes it to have poor behavior in the tails of the distribution.

This is waranted by the Central Limit Theorm, but the method is costly (12 uniform -> 1 approximately normal) and inaccurate at the tails (the range is within 0 adn 12).

Older versions of MATLAB - before MATLAB 5 - used the polar algorithm. This generates two values at a time. It involves finding a random point in the unit circle by generating uniformly distributed points in the [0,1]x[0,1] square and rejecting any points outside of the circle. Points in the square are represented by vectors with two components. The rejection portion of the code is:

r = 2;
while r > 1
   u = 2*rand(2,1)-1
   r = u’*u
end

For each point accepted, the polar transformation

v = sqrt(-2*log(r)/r)*u

produces a vector with two independent normally distributed elements. This algorithm does not involve any approximations, so it has the proper behavior in the tails of the distribution. But it is moderately expensive. Over 21% of the uniform numbers are rejected when they fall outside of the circle and the square root and logarithm calculations contribute significantly to the cost.

 Here is what’s most interesting:

Beginning with MATLAB 5, and continuing with MATLAB 6, randn uses a sophisticated table lookup algorithm developed by George Marsaglia of Florida State University. Marsaglia calls his approach the "ziggurat" algorithm. Ziggurats are ancient Mesopotamian terraced temple mounds that, mathematically, are two-dimensional step functions. A one-dimensional ziggurat underlies Marsaglia’s algorithm.

  The ziggurats of ancient Mesopotamia are all in ruins today, but the Mayan pyramid of Kukulkan illustrates the same step-function structure.   Step-function, or ziggurat, covering of the probability density function for the normal distribution.

For a specified number, n, of sections, it is possible to solve a transcendental equation to find , the point where the infinite tail meets the first rectangular section. In our picture with n = 8, it turns out that = 2.34. In the actual code with n = 128, = 3.4426. Once is known, it is easy to compute the common area of the sections and the other right-hand end points, . It is also possible to compute , the fraction of each section that lies underneath the section above it. Let’s call these fractional sections the "core" of the ziggurat. The right-hand edge of the core is the dotted blue line in our picture. The computation of these and is done in an initialization section of code that is run only once in any MATLAB session.

After the initialization, normally distributed random numbers can be computed very quickly. The key portion of the code computes a single random integer, k, between 1 and n, and a single uniformly distributed random number, u, between -1 and 1. A check is then made to see if u falls in the core of the k-th section. If it does, then we know that is the x-coordinate of a point under the pdf and this value can be returned as one sample from the normal distribution.
This is the part I don’t understand — why is the outcome "exactly normal?"

 

It is important to realize that, even though the ziggurat step function only approximates the probability density function, the resulting distribution is exactly normal. Decreasing n decreases the amount of storage required for the tables and increases the fraction of time that extra computation is required, but does not affect the accuracy. Even with n = 8, we would have to do the more costly corrections almost 23% of the time, instead of less than 3%, but we would still get an exact normal distribution.

But apparently the result is good: 

With this algorithm, MATLAB 6 can generate normally distributed random numbers as fast as it can generate uniformly distributed ones. In fact, MATLAB on my 800 MHz Pentium III laptop can generate over 10 million random numbers from either distribution in less than one second.

Leave a Reply

If the above Image does not contain text, use this secure code: DF9KOXKA