Random Numbers:
Python Module rv

Reference

Release 1.1
November 27, 1998

Ivan Frohne
4461 Doubletree Road
Wasilla, Alaska 99654-9461 U.S.A.
email: statistics@frohne.westhost.com


Copyright © 1998 by Ivan Frohne; Wasilla, Alaska, U.S.A.

All Rights Reserved

Permission to use, copy, modify and distribute this software and its documentation for any purpose, free of charge, is granted subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the software.

THE SOFTWARE AND DOCUMENTATION IS PROVIDED WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHOR OR COPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM OR DAMAGES IN A CONTRACT ACTION, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR ITS DOCUMENTATION.


Contents


Introduction

Release 1.1 of Python module rv is an easy-to-use collection of random number generators, with utilities to simplify housekeeping procedures. There are 4 modern uniform(0,1) generators, and 20 non-uniform generators for commonly encountered continuous probability distributions like the normal or Chi-square. Generators producing pseudo-random integers from binomial, Poisson, and 5 other discrete distributions are new in rv 1.1. Points may be generated in or on a simplex or sphere of any size and dimension, and there are routines to sub sample with or without replacement from, and to randomly permute the elements of, a list or array.

The uniform(0,1) generators are reasonably fast and have long periods. Three of the four were thoroughly tested statistically by their proposers. Algorithms used for the non-uniform generators are a compromise between simplicity and efficiency.

No user initialization is needed; a new random series seed is derived from the system clock when rv is imported, after which all features of the module are available. The number of pseudo-random uniform numbers generated in the current series is retrieved by a function call, as is a string containing a nominal (one-line) description of the uniform algorithm in use, and the seed that started the selected generator. The user can restart the uniform generator, specifying a different generator algorithm if desired, and either supply an integer seed or allow one to be calculated from the clock. The state of the module can be saved to disk at any stage, and it can be set to precisely that state at a later time.

The module concludes with a self-testing script that verifies the uniform generators; checks the state-saving utility; and exercises, tests and displays execution times and statistics for all the generators.


Uniform(0,1) Generator Algorithms

Each of the uniform random number generators supplied in module rv returns floating point numbers in the open interval (0, 1). Uses of uniform random numbers u commonly involve the evaluation of expressions such as 1/u and log u/(1-u), and restricting uniforms to the open interval is a time and labor-saving convenience that introduces no theoretical difficulties.

Flip

The fastest uniform generator, 'Flip', is from Donald E. Knuth (The Stanford GraphBase: A Platform for Combinatorial Computing, 1993, Addison-Wesley, ISBN 0-201-54275-7; pp. 216-220). It is a subtractive recurrence employing a series of 55 integers. The exact period is unknown, but it exceeds 3.6×1016 and may be as large as 3.9×1025. The algorithm has apparently not been extensively tested statistically, but if speed is the overriding requirement this is the generator of choice. Hardware requirements are two's complement arithmetic and 32-bit integers. The pseudo-random numbers delivered range from 1 / (231 + 1) =  4.66×10-10 to 1 - 1 /  (231 + 1) = 0.999,999,999,534.

CMRG

A Postscript file (May, 1998) describing this recently proposed uniform generator,'CMRG', can be downloaded from L'Ecuyer. A published account will appear in the journal Operations Research. This generator is the combined parallel multiple recursive sequence with two components of order three identified by its author, Pierre L'Ecuyer, as MRG32k3a. The period is 3.1×1057 and thorough testing by L'Ecuyer revealed no statistical faults. 'CMRG' is only 75% as fast as 'Flip', but notice that its period is at least 1032 times as long. The pseudo-random numbers generated range from 1 / (232 - 208) =  2.33×10-10 to 1 - 1 /  (232 - 208) = 0.999,999,999,767.

SMRG

'SMRG' is a simple multiplicative congruential recursive generator with modulus 261 -  1, the Mersenne prime following 231 - 1. It was proposed by Pei-Chi Wu in "Multiplicative, Congruential Random-Number Generators with Multiplier ±2k1 ±2k2 and Modulus 2p - 1," ACM Transactions on Mathematical Software, June, 1997, Vol. 23, No. 2, pp 255-265. The multiplier is 37458191 mod(261 - 1). Because 37 is the minimal primitive root of 261 - 1, the generator has the full period of 261 - 2, about 2.3×1018. Wu found this multiplier to have the best performance in spectral tests through dimension 8 of any multiplier of the type 37k  mod(261 - 1) with 1 <=  k <= 1,000,000. Generated pseudo-random numbers range from 4.34×10-19 to 1 -  2-53 = 0.999,999,999,999,999,890: all bits are random, an advantage over 31- or 32-bit generators. 'SMRG' delivers random numbers at about 85% the speed of 'Flip'.

Twister

Makoto Matsumoto and Takuji Nishimura described this generator, which has an astronomically long period exceeding 106000, in "Mersenne Twister: A 623-Dimensionally Equidistributed Uniform Pseudo-Random Number Generator," ACM Transactions on Modeling and Computer Simulation (January 1998), vol. 8, no. 1, pp. 3-30. 'Twister' is based on linear recurrences modulo two, and has excellent statistical properties. It is identified by its proposers as MT19937, which stands for "Mersenne Twister with period 219937 - 1". It employs a series of 623 integers. The generated pseudo-random numbers range from 2.33×10-10 to 1 - 1  / (232 + 1) = 0.999,999,999,767. 'Twister' is the slowest of the four generators, about 40% as fast as 'Flip'.


Basic Usage

The default uniform(0,1) random number generator is 'CMRG'. To get the next random number from the current generator, just reference rv.random. Here's an example of elementary interactive rv use:

>>> import rv
>>> rv.random()
0.535447730305
>>> rv.initial_seed()     # What seed was used?
748346677L
>>> rv.random_algorithm() # Which algorithm was it?
"'CMRG': Combined Multiplicative Recursion MRG32k3a (L'Ecuyer, 1998)"
>>>
>>> for j in range(100): s = rv.random() # Generate 100 more.
>>> print s               # s is the 101th uniform generated.
0.530929800225
>>> rv.random_count()     # How many was that?
101L
>>>
>>> rv.save_state()       # Save module state in default file.
>>> for j in range(100): s = rv.random() # Generate 100 more.
>>> print s               # s is the 201th uniform generated.
0.247631267996
>>> rv.initialize(file_string = 'default')   # Restore.
>>> for j in range(100): s = rv.random()     # Generate 100.
>>> print s               # s should =  201th uniform above.
0.247631267996            # The state was successfully restored.
>>> rv.random_count()     # number generated in this series
201L

Uniform(0,1) Generator

random(buffer=None)
Return a uniformly distributed random number from the current generator algorithm in the open interval (0, 1). If buffer is supplied, it must be a mutable sequence (list or array); it is filled with random numbers and None is returned. The default generator algorithm is 'CMRG'.

0.0 < random()  <  1.0

Some interactive usage examples:

>>> import rv, array
>>> rv.random()
0.253942895438
>>> empty_list = []
>>> rv.random(buffer=empty_list)
0.125938460594   # returned because buffer is empty.
>>> another_list = 2*[0.0]
>>> rv.random(another_list)   # None returned here.
>>> another_list
[0.509709359483, 0.131553253443]
>>> an_array = array.array('d', 2*[0.0]) # float[2] array. 
>>> rv.random(an_array)
>>> an_array
array('d', [0.356453425341, 0.978563485744])
>>> rv.random_count()
6L


Utilities

initial_seed()
Return the (Python long integer) seed that started this uniform generator. No arguments.


initialize(file_string='' , algorithm='CMRG' , seed=0L)
Set the uniform(0,1) pseudo-random number generator and starting seed, or read and restore a former rv state. If file_string is specified, it should be a legitimate [path]/filename containing the result of a previous save_state. In this case, algorithm and seed are ignored. file_string may also be a single space or the string 'default', indicating that the file name is PRVSTATE.DAT and it is to be found in the current directory. If file_string is not given or is null, the supplied algorithm and seed (or their defaults) are used to start a new random series. Acceptable algorithm values are 'CMRG', 'Flip', 'SMRG', or 'Twister'; they are not case-sensitive. If seed is supplied, it should be a Python long integer.


random_algorithm()
Return a one-line (string) description of the uniform algorithm in use. No arguments.


random_count()
Return the number of uniform(0,1) random numbers generated so far in the current series. No arguments.


save_state(file_string= 'PRVSTATE.DAT')
Store the state of the selected uniform generator, and any non-uniform generators which have a variable state (currently only normal). These are saved to the file specified by file_string, which is backed up first to file_string.BAK if it exists.



Non-uniform Generators

Generators for most well-known continuous and discrete probability distributions are included in rv. All require distribution parameters, some of which must be positive or non-negative. There are frequently no obvious defaults for parameter values, but all non-uniform generator parameters do have specified defaults.

Like random, all the non-uniform(0,1) generators operate in two modes. If the only arguments supplied are distributional parameters, or if there are no arguments, a single pseudo-random variate from the selected distribution is returned. For example, after x = rv.normal(5.0, 2.5) executes, x is a pseudo-random variate from the normal distribution with mean 5.0 and standard deviation 2.5. But if mseq is a list or array of length 100, say, rv.normal(5.0, 2.5, buffer=mseq) results in mseq being filled with 100 normal(5.0, 2,5) pseudo-randoms. None is returned. The last parameter is always buffer, and the keyword may be omitted, of course: rv.normal(5.0, 2.5, mseq) is fine.

Generator names are lower case unless the name is proper name or pseudonym. For example: gamma(), Poisson(), von_Mises(), Student().


beta(mu=4.0, nu=2.0, buffer=None)
Return pseudo-random variates from the beta distribution with mean mu / (mu + nu). The beta distribution is also known as the incomplete beta function. Both mu and nu must be positive.

0.0 <= beta(mu, nu) <= 1.0


binomial (trials=25, pr_success=0.5, buffer=None)
Return pseudo-random (non-negative) integers from a binomial distribution. trials must be a positive integer, and 0 <= pr_success <= 1.


Cauchy(median=0.0, scale=1.0, buffer=None)
Return pseudo-random variates from the Cauchy distribution with median (center of symmetry) median and scale scale. All moments, among them the mean and variance, are undefined. Student's t distribution with one degree of freedom is the Cauchy distribution with median 0 and scale 1.


chi_square(df=10.0, buffer=None)
Return pseudo-random variates from the chi square distribution with df degrees of freedom. The degrees of freedom need not be integral, but df must be positive.

chi_square(df) >= 0


choice(seq=[0, 1], buffer=None)
Return element k with probability 1.0/len(seq) from the non-empty sequence seq; k = 0,  1,...,len(seq) - 1.


exponential(scale=1.0, buffer=None)
Return pseudo-random variates from the exponential distribution with mean scale, which must be positive.

exponential(scale) >= 0


Fisher_F( numdf=2.0, denomdf=10.0, buffer=None)
Return pseudo-random variates from the F (variance-ratio) distribution with numdf degrees of freedom in the numerator of the ratio, and denomdf degrees of freedom in the denominator. Neither parameter need be integral, but both must be positive.

Fisher_F(numdf, denomdf) >= 0


gamma(mu=2.0, buffer=None)
Return pseudo-random variates from the standard gamma distribution with shape mu, which must be positive but need not be integral. The mean is also mu.

gamma(mu) >= 0


geometric( pr_failure=0.5, buffer=None)
Return pseudo-random (non-negative) integers from a geometric distribution. If k has the geometric distribution with probability of failure q it is the number of failures before the first success in a series of Bernoulli trials with probability of success 1 - q.


Gumbel( mode=1.0, scale=1.0, buffer=None)
Return pseudo-random variates from the Gumbel (extreme value) distribution with mode mode; and scale scale, which must be positive.


hypergeometric( bad=10, good=25, sample=10, buffer=None)
Return pseudo-random (non-negative) integers from a hypergeometric distribution. If k is hypergeometric, it is the number of "bad" items in a sample without replacement from a population of size bad + good. bad, good, and sample must be positive integers.


Laplace( mu=0.0, scale=1.0, buffer=None)
Return pseudo-random variates from the Laplace (double exponential) distribution with mean mu; and scale scale, which must be positive.


logarithmic(p=0.5, buffer=None)
Return positive pseudo-random integers from the logarithmic (also called the logarithmic series) distribution with shape parameter p; 0 < p < 1.


lognormal( mu=0.0, sigma=1.0, buffer=None)
Return pseudo-random variates from the lognormal distribution with median exp(mu); and shape sigma, which must be positive. The exponential of a normal random variate with mean mu and standard deviation sigma is lognormal with the same parameters.

lognormal( mu, sigma) >= 0


negative_binomial( r=1.0, pr_failure=0.5, buffer=None)
Return pseudo-random (non-negative) integers from a negative binomial distribution with parameters r and pr_failure. The parameter r must be positive, and 0 <= pr_failure <= 1.


normal( mu=0.0, sigma=1.0, buffer=None)
Return pseudo-random variates from the normal (gaussian) distribution with mean mu and standard deviation sigma; the latter must be positive.


Pareto( mode=1.0, shape=4.0, buffer=None)
Return pseudo-random variates from the Pareto distribution with mode mode and scale scale, both of which must be positive.

Pareto( mode, scale)  >= mode


Poisson(rate=1.0, buffer=None)
Return pseudo-random (non-negative) integers from a Poisson distribution. rate must be positive.


randint(lowint=0, upint=1, buffer=None)
Return pseudo-random integers k with probability 1.0/(upint - lowint + 1).

lowint <= randint(lowint, upint)  <=  upint


Rayleigh(mode=1.0, buffer=None)
Return pseudo-random variates from the Rayleigh distribution with scale (and mode) mode, which must be positive.

Rayleigh(mode) >= 0


Student_t(df=100.0, buffer=None)
Return pseudo-random variates following Student's  t distribution with df degrees of freedom, which must be positive but need not be integral.


triangular( left=0.0, mode=0.5, right=1.0, buffer=None)
Return pseudo-random variates from the triangular distribution with one vertex at left, another at right, and the projection of the third on the line between the first two at mode. Notice that none of the angles in the triangle are obtuse, and left <= mode <= right.

left <= triangular( left, mode, right) <= right


von_Mises( mean=0.0, shape=1.0, buffer=None)
Return pseudo-random variates from the von Mises distribution with mean mean and shape shape; the latter must be positive.

-math.pi <= von_Mises (mean, shape) <= +math.pi


uniform( lower=-0.5, upper=+0.5, buffer=None)
Return pseudo-random uniform variates from the interval ( lower, upper). Depending on the parameter values, either endpoint may or may not be returned, although neither endpoint will occur with the default interval (-0.5, +0.5).


Wald( mean=1.0, scale=1.0, buffer=None)
Return pseudo-random variates from the Wald (inverse gaussian) distribution with mean mean; and shape shape, both of which must be positive.

Wald(mean, scale) > 0


Weibull( scale=1.0, shape=0.5, buffer=None)
Return pseudo-random variates from the Weibull distribution with characteristic life scale and shape shape, both of which must be positive.

Weibull(scale, shape) > 0


Zipf(a=4.0, buffer=None)
Return pseudo-random (positive) integers from a Zipf distribution. The parameter a must be larger than 1.0.



Geometrical Point Generators, and Routines for Sampling and Permutations

in_simplex(mseq=5*[0.0], bound=1.0)
Return the coordinates of a pseudo-random point in the simplex of dimension d = len(mseq) and bound b = bound in a sequence of the same type and length as mseq. Such a simplex is defined by all points z = (z1, z2, ..., zd) with z1 + z2 + ... +  zd <= b and all zi >= 0. A two-dimensional simplex with bound b is the region enclosed by the horizontal (x) and vertical (y) axes and the line y = b - x, for example. The argument mseq must be a mutable sequence, and bound must be positive.


in_sphere(center=5*[0.0], radius=1.0)
Return the coordinates of a pseudo-random point in the sphere of dimension d = len(center) and radius r = radius centered at center, in a sequence (list or array) of the same length and type as center. If d is 1, r or -r is returned; if d is 2, the returned point is in the circle of radius r and center center. In general, point will have the coordinates (z1, z2, ..., zd), where x12 + x22 + ... + xd2 <= r2 and zi = xi - center[i]. The argument center must be a mutable sequence, and radius must be positive.


on_simplex(mseq=5*[0.0], bound=1.0)
Return the coodinates of a pseudo-random point on the boundary of the simplex of dimension d = len(seq) and bound b = bound in a sequence of the same type and length as mseq. A boundary point of the simplex has the sum of its (all non-negative) coordinates equal to b. For example, if d = 2, the boundary is the line y = b - x. The argument mseq must be a mutable sequence (list or array) and bound must be positive.


on_sphere(center=5*[0.0], radius=1.0)
Return the coordinates of a pseudo-random point on the surface of the sphere of dimension len(center) and radius radius centered at center in a sequence of the same type and length as center. The argument center must be a mutable sequence (list or array) and radius must be positive.


permutation(elements=[0, 1, 2, 3, 4])
Return a pseudo-random permutation of the items supplied in elements as a mutable sequence (list or array) of the same type and length. elements must be a mutable sequence.


sample(inlist=[0,1,2], sample_size=2)
Return a sequence of the same type as inlist containing a simple random sample of size sample_size from the items in the mutable sequence (list or array) inlist. Sampling is with replacement; duplicates may occur in the returned sequence. The length of the returned sequence is sample_size, which may be any positive integer.


smart_sample( inlist=[0,1,2], sample_size=2)
Return a sequence of the same type as inlist containing a sample taken without replacement of items in inlist, which must be a mutable sequence (list or array). Since sampling is without replacement, there is no possibility of duplicates in the returned sequence unless there are duplicates in inlist. The length of the returned sequence is the smaller of sample_size and len(inlist).


Discussion

Python standard module whrandom has a uniform(0,1) generator and functions to randomly select sequence elements and generate random integers. The fundamental design of rv builds on that of whrandom. All functions in whrandom are present in rv with the exception of whrandom.seed(), which is replaced by rv.initialize(). A minor difference should not cause any distress:
0 <= whrandom.random()  < 1, but
0 < rv.random() < 1.

The fastest rv uniform generator, 'Flip', is about 15% slower than whrandom.random(). However, the period for whrandom.random() is 6.95×1012, compared to a lower bound of 3.6×1016 for 'Flip'. Long periods are important because generators with good statistical properties tend to have long periods, and because only a small proportion of the period should be utilized. One authority (Ripley, B. D., Statistical Simulation, Wiley, 1987, p. 26) recommends a period exceeding 200 times the square of the number of random numbers needed. Monte Carlo and statistical resampling experiments not uncommonly require more than 1010 random numbers, which would necessitate a period of at least 2×1022 using Ripley's criterion.

'CMRG' and 'Twister' are slower than whrandom.random() by about 35% and 70%, respectively, but they do have very long periods (larger than 1057) and also have been tested carefully for statistical anomalies by their proposers. The new uniform(0,1) generator 'SMRG' is 30% slower than whrandom.random() but provides doubles with all bits random and values ranging from 4.3×10-19 to 1 -  1.1×10-16 compared to from about 10-10 to 1 - 10-10 for the others.

On a 200 MHz Pentium-II PC, whrandom.random() produces about 20,000 random numbers per second. Rates for the rv 1.1 algorithms range from 6,000/s for 'Twister' to 17,000/s for 'Flip'. Higher speeds result from the use of rv.random( buffer=x), where x is a double array, because function call overhead is avoided. For len(x) =  100, rv 1.1 uniform(0,1) generation rates improve to 7,400/s for 'Twister', and to nearly 32,000/s for 'Flip'. The faster the generator the more impressive the improvement, since function call overhead is proportionately larger for smaller internal function execution times.

It is now widely accepted that all random number algorithms have statistical peculiarities, however undiscovered they may be. To compensate for this sorry state of affairs, it is prudent to confirm interesting Monte Carlo or simulation results by repeating the run using a different uniform(0,1) random number algorithm. This is easy to arrange in rv.

Another bothersome detail involves the specification of an initial seed for the uniform(0,1) generator. Should the seed be renewed once every run, more frequently, or never? For casual use, there is rarely any reason to specify a seed at all with rv; the seed will be constructed automatically using the system clock and some bit-twiddling. Even for more intensive or sensitive applications, moreover, frequently specifying new seeds is worrisome because this might result in reuse of portions of the pseudo-random series. To avoid this possibility it is preferable to save the module state at the end of each run, initializing to this saved state at the beginning of the subsequent run. Periods of the four generators will usually be long enough to guarantee that the random number stream will never repeat. The save_state facility is also useful when it is necessary to repeat precisely a simulation or Monte Carlo run.

Random variates from all the non-uniform univariate continuous probability distributions included in Python standard module random are also available from rv, although the names have been shortened. In addition, generators for binomial, Cauchy, chi-square, F, geometric, Gumbel (extreme value), hypergeometric, Laplace (double exponential), logarithmic (logarithmic series), logistic, negative binomial, Student's t, Rayleigh, triangular, Wald (inverse gaussian), and Zipf distributions are included in rv.


What's New in Python Module rv 1.1


Email bug reports or comments to
Ivan Frohne
4461 Doubletree Road
Wasilla, Alaska 99654-9461 U.S.A.