CofeehousePy/deps/numpy/doc/neps/nep-0030-duck-array-protoco...

186 lines
7.9 KiB
ReStructuredText

======================================================
NEP 30 — Duck Typing for NumPy Arrays - Implementation
======================================================
:Author: Peter Andreas Entschev <pentschev@nvidia.com>
:Author: Stephan Hoyer <shoyer@google.com>
:Status: Draft
:Type: Standards Track
:Created: 2019-07-31
:Updated: 2019-07-31
:Resolution:
Abstract
--------
We propose the ``__duckarray__`` protocol, following the high-level overview
described in NEP 22, allowing downstream libraries to return arrays of their
defined types, in contrast to ``np.asarray``, that coerces those ``array_like``
objects to NumPy arrays.
Detailed description
--------------------
NumPy's API, including array definitions, is implemented and mimicked in
countless other projects. By definition, many of those arrays are fairly
similar in how they operate to the NumPy standard. The introduction of
``__array_function__`` allowed dispatching of functions implemented by several
of these projects directly via NumPy's API. This introduces a new requirement,
returning the NumPy-like array itself, rather than forcing a coercion into a
pure NumPy array.
For the purpose above, NEP 22 introduced the concept of duck typing to NumPy
arrays. The suggested solution described in the NEP allows libraries to avoid
coercion of a NumPy-like array to a pure NumPy array where necessary, while
still allowing that NumPy-like array libraries that do not wish to implement
the protocol to coerce arrays to a pure NumPy array via ``np.asarray``.
Usage Guidance
~~~~~~~~~~~~~~
Code that uses ``np.duckarray`` is meant for supporting other ndarray-like objects
that "follow the NumPy API". That is an ill-defined concept at the moment --
every known library implements the NumPy API only partly, and many deviate
intentionally in at least some minor ways. This cannot be easily remedied, so
for users of ``np.duckarray`` we recommend the following strategy: check if the
NumPy functionality used by the code that follows your use of ``np.duckarray``
is present in Dask, CuPy and Sparse. If so, it's reasonable to expect any duck
array to work here. If not, we suggest you indicate in your docstring what kinds
of duck arrays are accepted, or what properties they need to have.
To exemplify the usage of duck arrays, suppose one wants to take the ``mean()``
of an array-like object ``arr``. Using NumPy to achieve that, one could write
``np.asarray(arr).mean()`` to achieve the intended result. If ``arr`` is not
a NumPy array, this would create an actual NumPy array in order to call
``.mean()``. However, if the array is an object that is compliant with the NumPy
API (either in full or partially) such as a CuPy, Sparse or a Dask array, then
that copy would have been unnecessary. On the other hand, if one were to use the new
``__duckarray__`` protocol: ``np.duckarray(arr).mean()``, and ``arr`` is an object
compliant with the NumPy API, it would simply be returned rather than coerced
into a pure NumPy array, avoiding unnecessary copies and potential loss of
performance.
Implementation
--------------
The implementation idea is fairly straightforward, requiring a new function
``duckarray`` to be introduced in NumPy, and a new method ``__duckarray__`` in
NumPy-like array classes. The new ``__duckarray__`` method shall return the
downstream array-like object itself, such as the ``self`` object, while the
``__array__`` method raises ``TypeError``. Alternatively, the ``__array__``
method could create an actual NumPy array and return that.
The new NumPy ``duckarray`` function can be implemented as follows:
.. code:: python
def duckarray(array_like):
if hasattr(array_like, '__duckarray__'):
return array_like.__duckarray__()
return np.asarray(array_like)
Example for a project implementing NumPy-like arrays
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Now consider a library that implements a NumPy-compatible array class called
``NumPyLikeArray``, this class shall implement the methods described above, and
a complete implementation would look like the following:
.. code:: python
class NumPyLikeArray:
def __duckarray__(self):
return self
def __array__(self):
raise TypeError("NumPyLikeArray can not be converted to a NumPy "
"array. You may want to use np.duckarray() instead.")
The implementation above exemplifies the simplest case, but the overall idea
is that libraries will implement a ``__duckarray__`` method that returns the
original object, and an ``__array__`` method that either creates and returns an
appropriate NumPy array, or raises a``TypeError`` to prevent unintentional use
as an object in a NumPy array (if ``np.asarray`` is called on an arbitrary
object that does not implement ``__array__``, it will create a NumPy array
scalar).
In case of existing libraries that don't already implement ``__array__`` but
would like to use duck array typing, it is advised that they introduce
both ``__array__`` and``__duckarray__`` methods.
Usage
-----
An example of how the ``__duckarray__`` protocol could be used to write a
``stack`` function based on ``concatenate``, and its produced outcome, can be
seen below. The example here was chosen not only to demonstrate the usage of
the ``duckarray`` function, but also to demonstrate its dependency on the NumPy
API, demonstrated by checks on the array's ``shape`` attribute. Note that the
example is merely a simplified version of NumPy's actual implementation of
``stack`` working on the first axis, and it is assumed that Dask has implemented
the ``__duckarray__`` method.
.. code:: python
def duckarray_stack(arrays):
arrays = [np.duckarray(arr) for arr in arrays]
shapes = {arr.shape for arr in arrays}
if len(shapes) != 1:
raise ValueError('all input arrays must have the same shape')
expanded_arrays = [arr[np.newaxis, ...] for arr in arrays]
return np.concatenate(expanded_arrays, axis=0)
dask_arr = dask.array.arange(10)
np_arr = np.arange(10)
np_like = list(range(10))
duckarray_stack((dask_arr, dask_arr)) # Returns dask.array
duckarray_stack((dask_arr, np_arr)) # Returns dask.array
duckarray_stack((dask_arr, np_like)) # Returns dask.array
In contrast, using only ``np.asarray`` (at the time of writing of this NEP, this
is the usual method employed by library developers to ensure arrays are
NumPy-like) has a different outcome:
.. code:: python
def asarray_stack(arrays):
arrays = [np.asanyarray(arr) for arr in arrays]
# The remaining implementation is the same as that of
# ``duckarray_stack`` above
asarray_stack((dask_arr, dask_arr)) # Returns np.ndarray
asarray_stack((dask_arr, np_arr)) # Returns np.ndarray
asarray_stack((dask_arr, np_like)) # Returns np.ndarray
Backward compatibility
----------------------
This proposal does not raise any backward compatibility issues within NumPy,
given that it only introduces a new function. However, downstream libraries
that opt to introduce the ``__duckarray__`` protocol may choose to remove the
ability of coercing arrays back to a NumPy array via ``np.array`` or
``np.asarray`` functions, preventing unintended effects of coercion of such
arrays back to a pure NumPy array (as some libraries already do, such as CuPy
and Sparse), but still leaving libraries not implementing the protocol with the
choice of utilizing ``np.duckarray`` to promote ``array_like`` objects to pure
NumPy arrays.
Previous proposals and discussion
---------------------------------
The duck typing protocol proposed here was described in a high level in
`NEP 22 <https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html>`_.
Additionally, longer discussions about the protocol and related proposals
took place in
`numpy/numpy #13831 <https://github.com/numpy/numpy/issues/13831>`_
Copyright
---------
This document has been placed in the public domain.