Release 1.1
November 27, 1998
Ivan Frohne
4461 Doubletree Road
Wasilla, Alaska 99654-9461 U.S.A.
email: statistics@frohne.westhost.com
Copyright © 1998 by Ivan Frohne; Wasilla, Alaska, U.S.A.
All Rights Reserved
Permission to use, copy, modify and distribute this software and its documentation for any purpose, free of charge, is granted subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the software.
THE SOFTWARE AND DOCUMENTATION IS PROVIDED WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHOR OR COPYRIGHT HOLDER BE LIABLE FOR ANY CLAIM OR DAMAGES IN A CONTRACT ACTION, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR ITS DOCUMENTATION.
Release 1.1 of Python module rv is an easy-to-use collection of random number generators, with utilities to simplify housekeeping procedures. There are 4 modern uniform(0,1) generators, and 20 non-uniform generators for commonly encountered continuous probability distributions like the normal or Chi-square. Generators producing pseudo-random integers from binomial, Poisson, and 5 other discrete distributions are new in rv 1.1. Points may be generated in or on a simplex or sphere of any size and dimension, and there are routines to sub sample with or without replacement from, and to randomly permute the elements of, a list or array.
The uniform(0,1) generators are reasonably fast and have long periods. Three of the four were thoroughly tested statistically by their proposers. Algorithms used for the non-uniform generators are a compromise between simplicity and efficiency.
No user initialization is needed; a new random series seed is derived from the system clock when rv is imported, after which all features of the module are available. The number of pseudo-random uniform numbers generated in the current series is retrieved by a function call, as is a string containing a nominal (one-line) description of the uniform algorithm in use, and the seed that started the selected generator. The user can restart the uniform generator, specifying a different generator algorithm if desired, and either supply an integer seed or allow one to be calculated from the clock. The state of the module can be saved to disk at any stage, and it can be set to precisely that state at a later time.
The module concludes with a self-testing script that verifies the uniform generators; checks the state-saving utility; and exercises, tests and displays execution times and statistics for all the generators.
Each of the uniform random number generators supplied in module rv returns floating point numbers in the open interval (0, 1). Uses of uniform random numbers u commonly involve the evaluation of expressions such as 1/u and log u/(1-u), and restricting uniforms to the open interval is a time and labor-saving convenience that introduces no theoretical difficulties.
The fastest uniform generator, 'Flip', is from Donald E. Knuth (The Stanford GraphBase: A Platform for Combinatorial Computing, 1993, Addison-Wesley, ISBN 0-201-54275-7; pp. 216-220). It is a subtractive recurrence employing a series of 55 integers. The exact period is unknown, but it exceeds 3.6×1016 and may be as large as 3.9×1025. The algorithm has apparently not been extensively tested statistically, but if speed is the overriding requirement this is the generator of choice. Hardware requirements are two's complement arithmetic and 32-bit integers. The pseudo-random numbers delivered range from 1 / (231 + 1) = 4.66×10-10 to 1 - 1 / (231 + 1) = 0.999,999,999,534.
A Postscript file (May, 1998) describing this recently proposed uniform generator,'CMRG', can be downloaded from L'Ecuyer. A published account will appear in the journal Operations Research. This generator is the combined parallel multiple recursive sequence with two components of order three identified by its author, Pierre L'Ecuyer, as MRG32k3a. The period is 3.1×1057 and thorough testing by L'Ecuyer revealed no statistical faults. 'CMRG' is only 75% as fast as 'Flip', but notice that its period is at least 1032 times as long. The pseudo-random numbers generated range from 1 / (232 - 208) = 2.33×10-10 to 1 - 1 / (232 - 208) = 0.999,999,999,767.
'SMRG' is a simple multiplicative congruential recursive generator with
modulus 261 - 1, the Mersenne prime following
231 - 1. It was proposed by Pei-Chi Wu in
"Multiplicative, Congruential Random-Number Generators with Multiplier
±2k
Makoto Matsumoto and Takuji Nishimura described this generator, which has an astronomically long period exceeding 106000, in "Mersenne Twister: A 623-Dimensionally Equidistributed Uniform Pseudo-Random Number Generator," ACM Transactions on Modeling and Computer Simulation (January 1998), vol. 8, no. 1, pp. 3-30. 'Twister' is based on linear recurrences modulo two, and has excellent statistical properties. It is identified by its proposers as MT19937, which stands for "Mersenne Twister with period 219937 - 1". It employs a series of 623 integers. The generated pseudo-random numbers range from 2.33×10-10 to 1 - 1 / (232 + 1) = 0.999,999,999,767. 'Twister' is the slowest of the four generators, about 40% as fast as 'Flip'.
The default uniform(0,1) random number generator is 'CMRG'. To get the next random number from the current generator, just reference rv.random. Here's an example of elementary interactive rv use:
>>> import rv >>> rv.random() 0.535447730305 >>> rv.initial_seed() # What seed was used? 748346677L >>> rv.random_algorithm() # Which algorithm was it? "'CMRG': Combined Multiplicative Recursion MRG32k3a (L'Ecuyer, 1998)" >>> >>> for j in range(100): s = rv.random() # Generate 100 more. >>> print s # s is the 101th uniform generated. 0.530929800225 >>> rv.random_count() # How many was that? 101L >>> >>> rv.save_state() # Save module state in default file. >>> for j in range(100): s = rv.random() # Generate 100 more. >>> print s # s is the 201th uniform generated. 0.247631267996 >>> rv.initialize(file_string = 'default') # Restore. >>> for j in range(100): s = rv.random() # Generate 100. >>> print s # s should = 201th uniform above. 0.247631267996 # The state was successfully restored. >>> rv.random_count() # number generated in this series 201L
0.0 < random() < 1.0
Some interactive usage examples:
>>> import rv, array
>>> rv.random()
0.253942895438
>>> empty_list = []
>>> rv.random(buffer=empty_list)
0.125938460594 # returned because buffer is empty.
>>> another_list = 2*[0.0]
>>> rv.random(another_list) # None returned here.
>>> another_list
[0.509709359483, 0.131553253443]
>>> an_array = array.array('d', 2*[0.0]) # float[2] array.
>>> rv.random(an_array)
>>> an_array
array('d', [0.356453425341, 0.978563485744])
>>> rv.random_count()
6L
Generators for most well-known continuous and discrete probability distributions are included in rv. All require distribution parameters, some of which must be positive or non-negative. There are frequently no obvious defaults for parameter values, but all non-uniform generator parameters do have specified defaults.
Like random, all the non-uniform(0,1) generators operate in two modes. If the only arguments supplied are distributional parameters, or if there are no arguments, a single pseudo-random variate from the selected distribution is returned. For example, after x = rv.normal(5.0, 2.5) executes, x is a pseudo-random variate from the normal distribution with mean 5.0 and standard deviation 2.5. But if mseq is a list or array of length 100, say, rv.normal(5.0, 2.5, buffer=mseq) results in mseq being filled with 100 normal(5.0, 2,5) pseudo-randoms. None is returned. The last parameter is always buffer, and the keyword may be omitted, of course: rv.normal(5.0, 2.5, mseq) is fine.
Generator names are lower case unless the name is proper name or pseudonym. For example: gamma(), Poisson(), von_Mises(), Student().
0.0 <= beta(mu, nu) <= 1.0
chi_square(df) >= 0
exponential(scale) >= 0
Fisher_F(numdf, denomdf) >= 0
gamma(mu) >= 0
lognormal( mu, sigma) >= 0
Pareto( mode, scale) >= mode
lowint <= randint(lowint, upint) <= upint
Rayleigh(mode) >= 0
left <= triangular( left, mode, right) <= right
-math.pi <= von_Mises (mean, shape) <= +math.pi
Wald(mean, scale) > 0
Weibull(scale, shape) > 0
Python standard module whrandom has a uniform(0,1) generator
and functions to randomly select sequence elements and generate random
integers. The fundamental design of rv builds on that of
whrandom. All functions in whrandom are
present in rv with the exception of
whrandom.seed(), which is replaced by
rv.initialize(). A minor difference should not cause any
distress:
0 <= whrandom.random()
< 1, but
0 < rv.random() < 1.
The fastest rv uniform generator, 'Flip', is about 15% slower than whrandom.random(). However, the period for whrandom.random() is 6.95×1012, compared to a lower bound of 3.6×1016 for 'Flip'. Long periods are important because generators with good statistical properties tend to have long periods, and because only a small proportion of the period should be utilized. One authority (Ripley, B. D., Statistical Simulation, Wiley, 1987, p. 26) recommends a period exceeding 200 times the square of the number of random numbers needed. Monte Carlo and statistical resampling experiments not uncommonly require more than 1010 random numbers, which would necessitate a period of at least 2×1022 using Ripley's criterion.
'CMRG' and 'Twister' are slower than whrandom.random() by about 35% and 70%, respectively, but they do have very long periods (larger than 1057) and also have been tested carefully for statistical anomalies by their proposers. The new uniform(0,1) generator 'SMRG' is 30% slower than whrandom.random() but provides doubles with all bits random and values ranging from 4.3×10-19 to 1 - 1.1×10-16 compared to from about 10-10 to 1 - 10-10 for the others.
On a 200 MHz Pentium-II PC, whrandom.random() produces about 20,000 random numbers per second. Rates for the rv 1.1 algorithms range from 6,000/s for 'Twister' to 17,000/s for 'Flip'. Higher speeds result from the use of rv.random( buffer=x), where x is a double array, because function call overhead is avoided. For len(x) = 100, rv 1.1 uniform(0,1) generation rates improve to 7,400/s for 'Twister', and to nearly 32,000/s for 'Flip'. The faster the generator the more impressive the improvement, since function call overhead is proportionately larger for smaller internal function execution times.
It is now widely accepted that all random number algorithms have statistical peculiarities, however undiscovered they may be. To compensate for this sorry state of affairs, it is prudent to confirm interesting Monte Carlo or simulation results by repeating the run using a different uniform(0,1) random number algorithm. This is easy to arrange in rv.
Another bothersome detail involves the specification of an initial seed for the uniform(0,1) generator. Should the seed be renewed once every run, more frequently, or never? For casual use, there is rarely any reason to specify a seed at all with rv; the seed will be constructed automatically using the system clock and some bit-twiddling. Even for more intensive or sensitive applications, moreover, frequently specifying new seeds is worrisome because this might result in reuse of portions of the pseudo-random series. To avoid this possibility it is preferable to save the module state at the end of each run, initializing to this saved state at the beginning of the subsequent run. Periods of the four generators will usually be long enough to guarantee that the random number stream will never repeat. The save_state facility is also useful when it is necessary to repeat precisely a simulation or Monte Carlo run.
Random variates from all the non-uniform univariate continuous probability distributions included in Python standard module random are also available from rv, although the names have been shortened. In addition, generators for binomial, Cauchy, chi-square, F, geometric, Gumbel (extreme value), hypergeometric, Laplace (double exponential), logarithmic (logarithmic series), logistic, negative binomial, Student's t, Rayleigh, triangular, Wald (inverse gaussian), and Zipf distributions are included in rv.
Email bug reports or comments to
Ivan Frohne
4461 Doubletree Road
Wasilla, Alaska 99654-9461 U.S.A.