332 lines
17 KiB
ReStructuredText
332 lines
17 KiB
ReStructuredText
=======================================
|
||
NEP 19 — Random Number Generator Policy
|
||
=======================================
|
||
|
||
:Author: Robert Kern <robert.kern@gmail.com>
|
||
:Status: Final
|
||
:Type: Standards Track
|
||
:Created: 2018-05-24
|
||
:Updated: 2019-05-21
|
||
:Resolution: https://mail.python.org/pipermail/numpy-discussion/2018-July/078380.html
|
||
|
||
Abstract
|
||
--------
|
||
|
||
For the past decade, NumPy has had a strict backwards compatibility policy for
|
||
the number stream of all of its random number distributions. Unlike other
|
||
numerical components in ``numpy``, which are usually allowed to return
|
||
different when results when they are modified if they remain correct, we have
|
||
obligated the random number distributions to always produce the exact same
|
||
numbers in every version. The objective of our stream-compatibility guarantee
|
||
was to provide exact reproducibility for simulations across numpy versions in
|
||
order to promote reproducible research. However, this policy has made it very
|
||
difficult to enhance any of the distributions with faster or more accurate
|
||
algorithms. After a decade of experience and improvements in the surrounding
|
||
ecosystem of scientific software, we believe that there are now better ways to
|
||
achieve these objectives. We propose relaxing our strict stream-compatibility
|
||
policy to remove the obstacles that are in the way of accepting contributions
|
||
to our random number generation capabilities.
|
||
|
||
|
||
The Status Quo
|
||
--------------
|
||
|
||
Our current policy, in full:
|
||
|
||
A fixed seed and a fixed series of calls to ``RandomState`` methods using the
|
||
same parameters will always produce the same results up to roundoff error
|
||
except when the values were incorrect. Incorrect values will be fixed and
|
||
the NumPy version in which the fix was made will be noted in the relevant
|
||
docstring. Extension of existing parameter ranges and the addition of new
|
||
parameters is allowed as long the previous behavior remains unchanged.
|
||
|
||
This policy was first instated in Nov 2008 (in essence; the full set of weasel
|
||
words grew over time) in response to a user wanting to be sure that the
|
||
simulations that formed the basis of their scientific publication could be
|
||
reproduced years later, exactly, with whatever version of ``numpy`` that was
|
||
current at the time. We were keen to support reproducible research, and it was
|
||
still early in the life of ``numpy.random``. We had not seen much cause to
|
||
change the distribution methods all that much.
|
||
|
||
We also had not thought very thoroughly about the limits of what we really
|
||
could promise (and by “we” in this section, we really mean Robert Kern, let’s
|
||
be honest). Despite all of the weasel words, our policy overpromises
|
||
compatibility. The same version of ``numpy`` built on different platforms, or
|
||
just in a different way could cause changes in the stream, with varying degrees
|
||
of rarity. The biggest is that the ``.multivariate_normal()`` method relies on
|
||
``numpy.linalg`` functions. Even on the same platform, if one links ``numpy``
|
||
with a different LAPACK, ``.multivariate_normal()`` may well return completely
|
||
different results. More rarely, building on a different OS or CPU can cause
|
||
differences in the stream. We use C ``long`` integers internally for integer
|
||
distribution (it seemed like a good idea at the time), and those can vary in
|
||
size depending on the platform. Distribution methods can overflow their
|
||
internal C ``longs`` at different breakpoints depending on the platform and
|
||
cause all of the random variate draws that follow to be different.
|
||
|
||
And even if all of that is controlled, our policy still does not provide exact
|
||
guarantees across versions. We still do apply bug fixes when correctness is at
|
||
stake. And even if we didn’t do that, any nontrivial program does more than
|
||
just draw random numbers. They do computations on those numbers, transform
|
||
those with numerical algorithms from the rest of ``numpy``, which is not
|
||
subject to so strict a policy. Trying to maintain stream-compatibility for our
|
||
random number distributions does not help reproducible research for these
|
||
reasons.
|
||
|
||
The standard practice now for bit-for-bit reproducible research is to pin all
|
||
of the versions of code of your software stack, possibly down to the OS itself.
|
||
The landscape for accomplishing this is much easier today than it was in 2008.
|
||
We now have ``pip``. We now have virtual machines. Those who need to
|
||
reproduce simulations exactly now can (and ought to) do so by using the exact
|
||
same version of ``numpy``. We do not need to maintain stream-compatibility
|
||
across ``numpy`` versions to help them.
|
||
|
||
Our stream-compatibility guarantee has hindered our ability to make
|
||
improvements to ``numpy.random``. Several first-time contributors have
|
||
submitted PRs to improve the distributions, usually by implementing a faster,
|
||
or more accurate algorithm than the one that is currently there.
|
||
Unfortunately, most of them would have required breaking the stream to do so.
|
||
Blocked by our policy, and our inability to work around that policy, many of
|
||
those contributors simply walked away.
|
||
|
||
|
||
Implementation
|
||
--------------
|
||
|
||
Work on a proposed new Pseudo Random Number Generator (PRNG) subsystem is
|
||
already underway in the randomgen_
|
||
project. The specifics of the new design are out of scope for this NEP and up
|
||
for much discussion, but we will discuss general policies that will guide the
|
||
evolution of whatever code is adopted. We will also outline just a few of the
|
||
requirements that such a new system must have to support the policy proposed in
|
||
this NEP.
|
||
|
||
First, we will maintain API source compatibility just as we do with the rest of
|
||
``numpy``. If we *must* make a breaking change, we will only do so with an
|
||
appropriate deprecation period and warnings.
|
||
|
||
Second, breaking stream-compatibility in order to introduce new features or
|
||
improve performance will be *allowed* with *caution*. Such changes will be
|
||
considered features, and as such will be no faster than the standard release
|
||
cadence of features (i.e. on ``X.Y`` releases, never ``X.Y.Z``). Slowness will
|
||
not be considered a bug for this purpose. Correctness bug fixes that break
|
||
stream-compatibility can happen on bugfix releases, per usual, but developers
|
||
should consider if they can wait until the next feature release. We encourage
|
||
developers to strongly weight user’s pain from the break in
|
||
stream-compatibility against the improvements. One example of a worthwhile
|
||
improvement would be to change algorithms for a significant increase in
|
||
performance, for example, moving from the `Box-Muller transform
|
||
<https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform>`_ method of
|
||
Gaussian variate generation to the faster `Ziggurat algorithm
|
||
<https://en.wikipedia.org/wiki/Ziggurat_algorithm>`_. An example of a
|
||
discouraged improvement would be tweaking the Ziggurat tables just a little bit
|
||
for a small performance improvement.
|
||
|
||
Any new design for the random subsystem will provide a choice of different core
|
||
uniform PRNG algorithms. A promising design choice is to make these core
|
||
uniform PRNGs their own lightweight objects with a minimal set of methods
|
||
(randomgen_ calls them “BitGenerators”). The broader set of non-uniform
|
||
distributions will be its own class that holds a reference to one of these core
|
||
uniform PRNG objects and simply delegates to the core uniform PRNG object when
|
||
it needs uniform random numbers (randomgen_ calls this the Generator). To
|
||
borrow an example from randomgen_, the
|
||
class ``MT19937`` is a BitGenerator that implements the classic Mersenne Twister
|
||
algorithm. The class ``Generator`` wraps around the BitGenerator to provide
|
||
all of the non-uniform distribution methods::
|
||
|
||
# This is not the only way to instantiate this object.
|
||
# This is just handy for demonstrating the delegation.
|
||
>>> bg = MT19937(seed)
|
||
>>> rg = Generator(bg)
|
||
>>> x = rg.standard_normal(10)
|
||
|
||
We will be more strict about a select subset of methods on these BitGenerator
|
||
objects. They MUST guarantee stream-compatibility for a specified set
|
||
of methods which are chosen to make it easier to compose them to build other
|
||
distributions and which are needed to abstract over the implementation details
|
||
of the variety of BitGenerator algorithms. Namely,
|
||
|
||
* ``.bytes()``
|
||
* ``integers()`` (formerly ``.random_integers()``)
|
||
* ``random()`` (formerly ``.random_sample()``)
|
||
|
||
The distributions class (``Generator``) SHOULD have all of the same
|
||
distribution methods as ``RandomState`` with close-enough function signatures
|
||
such that almost all code that currently works with ``RandomState`` instances
|
||
will work with ``Generator`` instances (ignoring the precise stream
|
||
values). Some variance will be allowed for integer distributions: in order to
|
||
avoid some of the cross-platform problems described above, these SHOULD be
|
||
rewritten to work with ``uint64`` numbers on all platforms.
|
||
|
||
.. _randomgen: https://github.com/bashtage/randomgen
|
||
|
||
|
||
Supporting Unit Tests
|
||
:::::::::::::::::::::
|
||
|
||
Because we did make a strong stream-compatibility guarantee early in numpy’s
|
||
life, reliance on stream-compatibility has grown beyond reproducible
|
||
simulations. One use case that remains for stream-compatibility across numpy
|
||
versions is to use pseudorandom streams to generate test data in unit tests.
|
||
With care, many of the cross-platform instabilities can be avoided in the
|
||
context of small unit tests.
|
||
|
||
The new PRNG subsystem MUST provide a second, legacy distributions class that
|
||
uses the same implementations of the distribution methods as the current
|
||
version of ``numpy.random.RandomState``. The methods of this class will have
|
||
strict stream-compatibility guarantees, even stricter than the current policy.
|
||
It is intended that this class will no longer be modified, except to keep it
|
||
working when numpy internals change. All new development should go into the
|
||
primary distributions class. Bug fixes that change the stream SHALL NOT be
|
||
made to ``RandomState``; instead, buggy distributions should be made to warn
|
||
when they are buggy. The purpose of ``RandomState`` will be documented as
|
||
providing certain fixed functionality for backwards compatibility and stable
|
||
numbers for the limited purpose of unit testing, and not making whole programs
|
||
reproducible across numpy versions.
|
||
|
||
This legacy distributions class MUST be accessible under the name
|
||
``numpy.random.RandomState`` for backwards compatibility. All current ways of
|
||
instantiating ``numpy.random.RandomState`` with a given state should
|
||
instantiate the Mersenne Twister BitGenerator with the same state. The legacy
|
||
distributions class MUST be capable of accepting other BitGenerators. The
|
||
purpose
|
||
here is to ensure that one can write a program with a consistent BitGenerator
|
||
state with a mixture of libraries that may or may not have upgraded from
|
||
``RandomState``. Instances of the legacy distributions class MUST respond
|
||
``True`` to ``isinstance(rg, numpy.random.RandomState)`` because there is
|
||
current utility code that relies on that check. Similarly, old pickles of
|
||
``numpy.random.RandomState`` instances MUST unpickle correctly.
|
||
|
||
|
||
``numpy.random.*``
|
||
::::::::::::::::::
|
||
|
||
The preferred best practice for getting reproducible pseudorandom numbers is to
|
||
instantiate a generator object with a seed and pass it around. The implicit
|
||
global ``RandomState`` behind the ``numpy.random.*`` convenience functions can
|
||
cause problems, especially when threads or other forms of concurrency are
|
||
involved. Global state is always problematic. We categorically recommend
|
||
avoiding using the convenience functions when reproducibility is involved.
|
||
|
||
That said, people do use them and use ``numpy.random.seed()`` to control the
|
||
state underneath them. It can be hard to categorize and count API usages
|
||
consistently and usefully, but a very common usage is in unit tests where many
|
||
of the problems of global state are less likely.
|
||
|
||
This NEP does not propose removing these functions or changing them to use the
|
||
less-stable ``Generator`` distribution implementations. Future NEPs
|
||
might.
|
||
|
||
Specifically, the initial release of the new PRNG subsystem SHALL leave these
|
||
convenience functions as aliases to the methods on a global ``RandomState``
|
||
that is initialized with a Mersenne Twister BitGenerator object. A call to
|
||
``numpy.random.seed()`` will be forwarded to that BitGenerator object. In
|
||
addition, the global ``RandomState`` instance MUST be accessible in this
|
||
initial release by the name ``numpy.random.mtrand._rand``: Robert Kern long ago
|
||
promised ``scikit-learn`` that this name would be stable. Whoops.
|
||
|
||
In order to allow certain workarounds, it MUST be possible to replace the
|
||
BitGenerator underneath the global ``RandomState`` with any other BitGenerator
|
||
object (we leave the precise API details up to the new subsystem). Calling
|
||
``numpy.random.seed()`` thereafter SHOULD just pass the given seed to the
|
||
current BitGenerator object and not attempt to reset the BitGenerator to the
|
||
Mersenne Twister. The set of ``numpy.random.*`` convenience functions SHALL
|
||
remain the same as they currently are. They SHALL be aliases to the
|
||
``RandomState`` methods and not the new less-stable distributions class
|
||
(``Generator``, in the examples above). Users who want to get the fastest, best
|
||
distributions can follow best practices and instantiate generator objects explicitly.
|
||
|
||
This NEP does not propose that these requirements remain in perpetuity. After
|
||
we have experience with the new PRNG subsystem, we can and should revisit these
|
||
issues in future NEPs.
|
||
|
||
|
||
Alternatives
|
||
------------
|
||
|
||
Versioning
|
||
::::::::::
|
||
|
||
For a long time, we considered that the way to allow algorithmic improvements
|
||
while maintaining the stream was to apply some form of versioning. That is,
|
||
every time we make a stream change in one of the distributions, we increment
|
||
some version number somewhere. ``numpy.random`` would keep all past versions
|
||
of the code, and there would be a way to get the old versions.
|
||
|
||
We will not be doing this. If one needs to get the exact bit-for-bit results
|
||
from a given version of ``numpy``, whether one uses random numbers or not, one
|
||
should use the exact version of ``numpy``.
|
||
|
||
Proposals of how to do RNG versioning varied widely, and we will not
|
||
exhaustively list them here. We spent years going back and forth on these
|
||
designs and were not able to find one that sufficed. Let that time lost, and
|
||
more importantly, the contributors that we lost while we dithered, serve as
|
||
evidence against the notion.
|
||
|
||
Concretely, adding in versioning makes maintenance of ``numpy.random``
|
||
difficult. Necessarily, we would be keeping lots of versions of the same code
|
||
around. Adding a new algorithm safely would still be quite hard.
|
||
|
||
But most importantly, versioning is fundamentally difficult to *use* correctly.
|
||
We want to make it easy and straightforward to get the latest, fastest, best
|
||
versions of the distribution algorithms; otherwise, what's the point? The way
|
||
to make that easy is to make the latest the default. But the default will
|
||
necessarily change from release to release, so the user’s code would need to be
|
||
altered anyway to specify the specific version that one wants to replicate.
|
||
|
||
Adding in versioning to maintain stream-compatibility would still only provide
|
||
the same level of stream-compatibility that we currently do, with all of the
|
||
limitations described earlier. Given that the standard practice for such needs
|
||
is to pin the release of ``numpy`` as a whole, versioning ``RandomState`` alone
|
||
is superfluous.
|
||
|
||
|
||
``StableRandom``
|
||
::::::::::::::::
|
||
|
||
A previous version of this NEP proposed to leave ``RandomState`` completely
|
||
alone for a deprecation period and build the new subsystem alongside with new
|
||
names. To satisfy the unit testing use case, it proposed introducing a small
|
||
distributions class nominally called ``StableRandom``. It would have provided
|
||
a small subset of distribution methods that were considered most useful in unit
|
||
testing, but not the full set such that it would be too likely to be used
|
||
outside of the testing context.
|
||
|
||
During discussion about this proposal, it became apparent that there was no
|
||
satisfactory subset. At least some projects used a fairly broad selection of
|
||
the ``RandomState`` methods in unit tests.
|
||
|
||
Downstream project owners would have been forced to modify their code to
|
||
accommodate the new PRNG subsystem. Some modifications might be simply
|
||
mechanical, but the bulk of the work would have been tedious churn for no
|
||
positive improvement to the downstream project, just avoiding being broken.
|
||
|
||
Furthermore, under this old proposal, we would have had a quite lengthy
|
||
deprecation period where ``RandomState`` existed alongside the new system of
|
||
BitGenerator and Generator classes. Leaving the implementation of
|
||
``RandomState`` fixed meant that it could not use the new BitGenerator state
|
||
objects. Developing programs that use a mixture of libraries that have and
|
||
have not upgraded would require managing two sets of PRNG states. This would
|
||
notionally have been time-limited, but we intended the deprecation to be very
|
||
long.
|
||
|
||
The current proposal solves all of these problems. All current usages of
|
||
``RandomState`` will continue to work in perpetuity, though some may be
|
||
discouraged through documentation. Unit tests can continue to use the full
|
||
complement of ``RandomState`` methods. Mixed ``RandomState/Generator``
|
||
code can safely share the common BitGenerator state. Unmodified ``RandomState``
|
||
code can make use of the new features of alternative BitGenerator-like settable
|
||
streams.
|
||
|
||
|
||
Discussion
|
||
----------
|
||
|
||
- `NEP discussion <https://mail.python.org/pipermail/numpy-discussion/2018-June/078126.html>`_
|
||
- `Earlier discussion <https://mail.python.org/pipermail/numpy-discussion/2018-January/077608.html>`_
|
||
|
||
|
||
Copyright
|
||
---------
|
||
|
||
This document has been placed in the public domain.
|