CofeehousePy/deps/numpy/doc/neps/nep-0027-zero-rank-arrarys.rst

255 lines
9.2 KiB
ReStructuredText

=========================
NEP 27 — Zero Rank Arrays
=========================
:Author: Alexander Belopolsky (sasha), transcribed Matt Picus <matti.picus@gmail.com>
:Status: Final
:Type: Informational
:Created: 2006-06-10
:Resolution: https://mail.python.org/pipermail/numpy-discussion/2018-October/078824.html
.. note ::
NumPy has both zero rank arrays and scalars. This design document, adapted
from a `2006 wiki entry`_, describes what zero rank arrays are and why they
exist. It was transcribed 2018-10-13 into a NEP and links were updated.
The pull request sparked `a lively discussion`_ about the continued need
for zero rank arrays and scalars in NumPy.
Some of the information here is dated, for instance indexing of 0-D arrays
now is now implemented and does not error.
Zero-Rank Arrays
----------------
Zero-rank arrays are arrays with shape=(). For example:
>>> x = array(1)
>>> x.shape
()
Zero-Rank Arrays and Array Scalars
----------------------------------
Array scalars are similar to zero-rank arrays in many aspects::
>>> int_(1).shape
()
They even print the same::
>>> print int_(1)
1
>>> print array(1)
1
However there are some important differences:
* Array scalars are immutable
* Array scalars have different python type for different data types
Motivation for Array Scalars
----------------------------
Numpy's design decision to provide 0-d arrays and array scalars in addition to
native python types goes against one of the fundamental python design
principles that there should be only one obvious way to do it. In this section
we will try to explain why it is necessary to have three different ways to
represent a number.
There were several numpy-discussion threads:
* `rank-0 arrays`_ in a 2002 mailing list thread.
* Thoughts about zero dimensional arrays vs Python scalars in a `2005 mailing list thread`_]
It has been suggested several times that NumPy just use rank-0 arrays to
represent scalar quantities in all case. Pros and cons of converting rank-0
arrays to scalars were summarized as follows:
- Pros:
- Some cases when Python expects an integer (the most
dramatic is when slicing and indexing a sequence:
_PyEval_SliceIndex in ceval.c) it will not try to
convert it to an integer first before raising an error.
Therefore it is convenient to have 0-dim arrays that
are integers converted for you by the array object.
- No risk of user confusion by having two types that
are nearly but not exactly the same and whose separate
existence can only be explained by the history of
Python and NumPy development.
- No problems with code that does explicit typechecks
``(isinstance(x, float)`` or ``type(x) == types.FloatType)``. Although
explicit typechecks are considered bad practice in general, there are a
couple of valid reasons to use them.
- No creation of a dependency on Numeric in pickle
files (though this could also be done by a special case
in the pickling code for arrays)
- Cons:
- It is difficult to write generic code because scalars
do not have the same methods and attributes as arrays.
(such as ``.type`` or ``.shape``). Also Python scalars have
different numeric behavior as well.
- This results in a special-case checking that is not
pleasant. Fundamentally it lets the user believe that
somehow multidimensional homoegeneous arrays
are something like Python lists (which except for
Object arrays they are not).
Numpy implements a solution that is designed to have all the pros and none of the cons above.
Create Python scalar types for all of the 21 types and also
inherit from the three that already exist. Define equivalent
methods and attributes for these Python scalar types.
The Need for Zero-Rank Arrays
-----------------------------
Once the idea to use zero-rank arrays to represent scalars was rejected, it was
natural to consider whether zero-rank arrays can be eliminated altogether.
However there are some important use cases where zero-rank arrays cannot be
replaced by array scalars. See also `A case for rank-0 arrays`_ from February
2006.
* Output arguments::
>>> y = int_(5)
>>> add(5,5,x)
array(10)
>>> x
array(10)
>>> add(5,5,y)
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: return arrays must be of ArrayType
* Shared data::
>>> x = array([1,2])
>>> y = x[1:2]
>>> y.shape = ()
>>> y
array(2)
>>> x[1] = 20
>>> y
array(20)
Indexing of Zero-Rank Arrays
----------------------------
As of NumPy release 0.9.3, zero-rank arrays do not support any indexing::
>>> x[...]
Traceback (most recent call last):
File "<stdin>", line 1, in ?
IndexError: 0-d arrays can't be indexed.
On the other hand there are several cases that make sense for rank-zero arrays.
Ellipsis and empty tuple
~~~~~~~~~~~~~~~~~~~~~~~~
Alexander started a `Jan 2006 discussion`_ on scipy-dev
with the following proposal:
... it may be reasonable to allow ``a[...]``. This way
ellipsis can be interpereted as any number of ``:`` s including zero.
Another subscript operation that makes sense for scalars would be
``a[...,newaxis]`` or even ``a[{newaxis, }* ..., {newaxis,}*]``, where
``{newaxis,}*`` stands for any number of comma-separated newaxis tokens.
This will allow one to use ellipsis in generic code that would work on
any numpy type.
Francesc Altet supported the idea of ``[...]`` on zero-rank arrays and
`suggested`_ that ``[()]`` be supported as well.
Francesc's proposal was::
In [65]: type(numpy.array(0)[...])
Out[65]: <type 'numpy.ndarray'>
In [66]: type(numpy.array(0)[()]) # Indexing a la numarray
Out[66]: <type 'int32_arrtype'>
In [67]: type(numpy.array(0).item()) # already works
Out[67]: <type 'int'>
There is a consensus that for a zero-rank array ``x``, both ``x[...]`` and ``x[()]`` should be valid, but the question
remains on what should be the type of the result - zero rank ndarray or ``x.dtype``?
(Alexander)
First, whatever choice is made for ``x[...]`` and ``x[()]`` they should be
the same because ``...`` is just syntactic sugar for "as many `:` as
necessary", which in the case of zero rank leads to ``... = (:,)*0 = ()``.
Second, rank zero arrays and numpy scalar types are interchangeable within
numpy, but numpy scalars can be use in some python constructs where ndarrays
can't. For example::
>>> (1,)[array(0)]
Traceback (most recent call last):
File "<stdin>", line 1, in ?
TypeError: tuple indices must be integers
>>> (1,)[int32(0)]
1
Since most if not all numpy function automatically convert zero-rank arrays to scalars on return, there is no reason for
``[...]`` and ``[()]`` operations to be different.
See SVN changeset 1864 (which became git commit `9024ff0`_) for
implementation of ``x[...]`` and ``x[()]`` returning numpy scalars.
See SVN changeset 1866 (which became git commit `743d922`_) for
implementation of ``x[...] = v`` and ``x[()] = v``
Increasing rank with newaxis
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Everyone who commented liked this feature, so as of SVN changeset 1871 (which became git commit `b32744e`_) any number of ellipses and
newaxis tokens can be placed as a subscript argument for a zero-rank array. For
example::
>>> x = array(1)
>>> x[newaxis,...,newaxis,...]
array([[1]])
It is not clear why more than one ellipsis should be allowed, but this is the
behavior of higher rank arrays that we are trying to preserve.
Refactoring
~~~~~~~~~~~
Currently all indexing on zero-rank arrays is implemented in a special ``if (nd
== 0)`` branch of code that used to always raise an index error. This ensures
that the changes do not affect any existing usage (except, the usage that
relies on exceptions). On the other hand part of motivation for these changes
was to make behavior of ndarrays more uniform and this should allow to
eliminate ``if (nd == 0)`` checks altogether.
Copyright
---------
The original document appeared on the scipy.org wiki, with no Copyright notice, and its `history`_ attributes it to sasha.
.. _`2006 wiki entry`: https://web.archive.org/web/20100503065506/http://projects.scipy.org:80/numpy/wiki/ZeroRankArray
.. _`history`: https://web.archive.org/web/20100503065506/http://projects.scipy.org:80/numpy/wiki/ZeroRankArray?action=history
.. _`2005 mailing list thread`: https://sourceforge.net/p/numpy/mailman/message/11299166
.. _`suggested`: https://mail.python.org/pipermail/numpy-discussion/2006-January/005572.html
.. _`Jan 2006 discussion`: https://mail.python.org/pipermail/numpy-discussion/2006-January/005579.html
.. _`A case for rank-0 arrays`: https://mail.python.org/pipermail/numpy-discussion/2006-February/006384.html
.. _`rank-0 arrays`: https://mail.python.org/pipermail/numpy-discussion/2002-September/001600.html
.. _`9024ff0`: https://github.com/numpy/numpy/commit/9024ff0dc052888b5922dde0f3e615607a9e99d7
.. _`743d922`: https://github.com/numpy/numpy/commit/743d922bf5893acf00ac92e823fe12f460726f90
.. _`b32744e`: https://github.com/numpy/numpy/commit/b32744e3fc5b40bdfbd626dcc1f72907d77c01c4
.. _`a lively discussion`: https://github.com/numpy/numpy/pull/12166