=====================================
NEP 15 — Merging multiarray and umath
=====================================

:Author: Nathaniel J. Smith <njs@pobox.com>
:Status: Final
:Type: Standards Track
:Created: 2018-02-22
:Resolution: https://mail.python.org/pipermail/numpy-discussion/2018-June/078345.html

Abstract
--------

Let's merge ``numpy.core.multiarray`` and ``numpy.core.umath`` into a
single extension module, and deprecate ``np.set_numeric_ops``.


Background
----------

Currently, numpy's core C code is split between two separate extension
modules.

``numpy.core.multiarray`` is built from
``numpy/core/src/multiarray/*.c``, and contains the core array
functionality (in particular, the ``ndarray`` object).

``numpy.core.umath`` is built from ``numpy/core/src/umath/*.c``, and
contains the ufunc machinery.

These two modules each expose their own separate C API, accessed via
``import_multiarray()`` and ``import_umath()`` respectively. The idea
is that they're supposed to be independent modules, with
``multiarray`` as a lower-level layer with ``umath`` built on top. In
practice this has turned out to be problematic.

First, the layering isn't perfect: when you write ``ndarray +
ndarray``, this invokes ``ndarray.__add__``, which then calls the
ufunc ``np.add``. This means that ``ndarray`` needs to know about
ufuncs – so instead of a clean layering, we have a circular
dependency. To solve this, ``multiarray`` exports a somewhat
terrifying function called ``set_numeric_ops``. The bootstrap
procedure each time you ``import numpy`` is:

1. ``multiarray`` and its ``ndarray`` object are loaded, but
   arithmetic operations on ndarrays are broken.

2. ``umath`` is loaded.

3. ``set_numeric_ops`` is used to monkeypatch all the methods like
   ``ndarray.__add__`` with objects from ``umath``.

In addition, ``set_numeric_ops`` is exposed as a public API,
``np.set_numeric_ops``.

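As a rough illustration, the three bootstrap steps can be modelled in
pure Python. This is a toy sketch, not numpy's actual code:
``ToyArray``, ``toy_add``, and the module-level ``_numeric_ops`` table
are made-up names standing in for ``ndarray``, ``np.add``, and
``multiarray``'s internal operator table.

.. code-block:: python

   # Toy model only: none of these names are numpy's real internals.

   _numeric_ops = {}  # filled in late, like multiarray's operator table


   def set_numeric_ops(**ops):
       """Install operator implementations and return the previous ones."""
       previous = dict(_numeric_ops)
       _numeric_ops.update(ops)
       return previous


   class ToyArray:
       """Stand-in for ``ndarray``: it holds data but owns no arithmetic."""

       def __init__(self, values):
           self.values = list(values)

       def __add__(self, other):
           # The operator looks up its implementation at call time.
           try:
               add = _numeric_ops["add"]
           except KeyError:
               raise TypeError("arithmetic is not set up yet") from None
           return add(self, other)


   def toy_add(a, b):
       """Stand-in for the ufunc ``np.add``, which lives in the other layer."""
       return ToyArray(x + y for x, y in zip(a.values, b.values))


   # Step 1: the array type exists, but ``+`` is broken.
   a, b = ToyArray([1, 2]), ToyArray([10, 20])
   try:
       a + b
       broken_before_bootstrap = False
   except TypeError:
       broken_before_bootstrap = True

   # Steps 2 and 3: the ufunc layer loads and patches the operator table.
   set_numeric_ops(add=toy_add)
   result = (a + b).values

The point of the sketch is the ordering problem: between steps 1 and 3
the array type is importable but unusable, and anyone holding
``set_numeric_ops`` can rewire the operators again later.
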
Furthermore, even when this layering does work, it ends up distorting
the shape of our public ABI. In recent years, the most common reason
for adding new functions to ``multiarray``\'s "public" ABI is not that
they really need to be public or that we expect other projects to use
them, but rather just that we need to call them from ``umath``. This
is extremely unfortunate, because it makes our public ABI
unnecessarily large, and since we can never remove things from it,
this creates an ongoing maintenance burden. The way C works, you can
have an internal API that's visible to everything inside the same
extension module, or you can have a public API that everyone can use;
you can't (easily) have an API that's visible to multiple extension
modules inside numpy, but not to external users.

We've also increasingly been putting utility code into
``numpy/core/src/private/``, which now contains a bunch of files which
are ``#include``\d twice, once into ``multiarray`` and once into
``umath``. This is pretty gross, and is purely a workaround for these
being separate C extensions. The ``npymath`` library is also
included in both extension modules.


Proposed changes
----------------

This NEP proposes three changes:

1. We should start building ``numpy/core/src/multiarray/*.c`` and
   ``numpy/core/src/umath/*.c`` together into a single extension
   module.

2. Instead of ``set_numeric_ops``, we should use some new, private API
   to set up ``ndarray.__add__`` and friends.

3. We should deprecate, and eventually remove, ``np.set_numeric_ops``.

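One way changes 2 and 3 could fit together is sketched below in pure
Python. This is only an assumption about the general shape, not the
actual implementation: ``_set_numeric_ops_private`` is a hypothetical
name for the new private hook, and the warning text is invented.

.. code-block:: python

   import warnings

   _ops = {}  # toy stand-in for the internal operator table


   def _set_numeric_ops_private(**ops):
       """Hypothetical private hook used while the package is imported."""
       previous = dict(_ops)
       _ops.update(ops)
       return previous


   def set_numeric_ops(**ops):
       """Public wrapper kept only for compatibility during deprecation."""
       warnings.warn(
           "set_numeric_ops is deprecated",
           DeprecationWarning,
           stacklevel=2,  # attribute the warning to the caller
       )
       return _set_numeric_ops_private(**ops)


   # A caller of the public function now sees a DeprecationWarning,
   # while the private hook keeps working silently.
   with warnings.catch_warnings(record=True) as caught:
       warnings.simplefilter("always")
       set_numeric_ops(add=lambda a, b: a + b)

   categories = [w.category for w in caught]

Keeping the public name as a thin wrapper means existing callers keep
working for the length of the deprecation period, while internal setup
no longer depends on a public function.

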
Non-proposed changes
--------------------

We don't necessarily propose to throw away the distinction between
multiarray/ and umath/ in terms of our source code organization:
internal organization is useful! We just want to build them together
into a single extension module. Of course, this does open the door for
potential future refactorings, which we can then evaluate based on
their merits as they come up.

This NEP also doesn't propose that we break the public C ABI. We should
continue to provide ``import_multiarray()`` and ``import_umath()``
functions – it's just that now both ABIs will ultimately be loaded
from the same C library. Due to how ``import_multiarray()`` and
``import_umath()`` are written, we'll also still need to have modules
called ``numpy.core.multiarray`` and ``numpy.core.umath``, and they'll
need to continue to export ``_ARRAY_API`` and ``_UFUNC_API`` objects –
but we can make one or both of these modules be tiny shims that simply
re-export the magic API object from wherever it's actually defined.
(See ``numpy/core/code_generators/generate_{numpy,ufunc}_api.py`` for
details of how these imports work.)

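The shim idea can be sketched with plain Python module objects. This
is a toy model under stated assumptions: ``toy_merged_core``,
``toy_multiarray``, and ``toy_umath`` are hypothetical module names,
and plain ``object()`` instances stand in for the real API objects.
The last two lines mimic, roughly, what the C import helpers rely on:
importing a module by name and reading one attribute from it.

.. code-block:: python

   import importlib
   import sys
   import types

   # Hypothetical merged extension module holding both API objects.
   merged = types.ModuleType("toy_merged_core")
   merged._ARRAY_API = object()  # stand-in for the array C API object
   merged._UFUNC_API = object()  # stand-in for the ufunc C API object
   sys.modules[merged.__name__] = merged


   def make_shim(name, api_attribute):
       """Create a tiny module that re-exports one API object from ``merged``."""
       shim = types.ModuleType(name)
       setattr(shim, api_attribute, getattr(merged, api_attribute))
       sys.modules[name] = shim
       return shim


   make_shim("toy_multiarray", "_ARRAY_API")
   make_shim("toy_umath", "_UFUNC_API")

   # Lookup by the old module names still finds the same objects.
   array_api = importlib.import_module("toy_multiarray")._ARRAY_API
   ufunc_api = importlib.import_module("toy_umath")._UFUNC_API

Because the shims hand out the very same objects that the merged
module defines, code that looks them up under the old names cannot
tell that the two modules are no longer separate.

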
Backward compatibility
----------------------

The only compatibility break is the deprecation of ``np.set_numeric_ops``.


Rejected alternatives
---------------------

Preserve ``set_numeric_ops`` for monkeypatching
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

In discussing this NEP, one additional use case was raised for
``set_numeric_ops``: if you have an optimized vector math library
(e.g. Intel's MKL VML, Sleef, or Yeppp), then ``set_numeric_ops`` can
be used to monkeypatch numpy to use these operations instead of
numpy's built-in vector operations. But, even if we grant that this is
a great idea, using ``set_numeric_ops`` isn't actually the best way to
do it. All ``set_numeric_ops`` allows you to do is take over Python's
syntactic operators (``+``, ``*``, etc.) on ndarrays; it doesn't let
you affect operations called via other APIs (e.g., ``np.add``), or
operations that don't have built-in syntax (e.g., ``np.exp``). Also,
you have to reimplement the whole ufunc machinery, instead of just the
core loop. On the other hand, the `PyUFunc_ReplaceLoopBySignature
<https://docs.scipy.org/doc/numpy/reference/c-api.ufunc.html#c.PyUFunc_ReplaceLoopBySignature>`__
API – which was added in 2006 – allows replacement of the inner loops
of arbitrary ufuncs. This is both simpler and more powerful – e.g.
replacing the inner loop of ``np.add`` means your code will
automatically be used for both ``ndarray + ndarray`` and direct
calls to ``np.add``. So this doesn't seem like a good reason not to
deprecate ``set_numeric_ops``.

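The difference can be sketched with a toy model in pure Python.
Nothing here is numpy's real API: ``ToyUfunc``, ``ToyArray``, and the
``replace_loop`` method are invented for illustration, with
``replace_loop`` playing the role that
``PyUFunc_ReplaceLoopBySignature`` plays at the C level.

.. code-block:: python

   class ToyUfunc:
       """Stand-in for a ufunc: shared machinery plus a replaceable inner loop."""

       def __init__(self, name, loop):
           self.name = name
           self._loop = loop

       def replace_loop(self, new_loop):
           """Swap the inner loop and return the old one (toy analogue only)."""
           old, self._loop = self._loop, new_loop
           return old

       def __call__(self, a, b):
           # The "machinery" (here, just pairing up elements) stays untouched.
           return ToyArray(self._loop(x, y) for x, y in zip(a.values, b.values))


   class ToyArray:
       """Stand-in for ``ndarray`` whose ``+`` delegates to the ufunc."""

       def __init__(self, values):
           self.values = list(values)

       def __add__(self, other):
           return toy_add(self, other)


   calls = []  # records every use of the replacement loop


   def fast_add(x, y):
       """Pretend this is an optimized vendor routine."""
       calls.append((x, y))
       return x + y


   toy_add = ToyUfunc("add", lambda x, y: x + y)
   toy_add.replace_loop(fast_add)

   a, b = ToyArray([1, 2]), ToyArray([10, 20])
   via_operator = (a + b).values        # operator syntax
   via_function = toy_add(a, b).values  # direct call

Both code paths end up in ``fast_add``, because the operator and the
direct call share the same ufunc object. Patching only ``__add__``
would have covered the first path and missed the second.

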
Discussion
----------

* https://mail.python.org/pipermail/numpy-discussion/2018-March/077764.html
* https://mail.python.org/pipermail/numpy-discussion/2018-June/078345.html

Copyright
---------

This document has been placed in the public domain.