904 lines
28 KiB
ReStructuredText
904 lines
28 KiB
ReStructuredText
.. -*-rst-*-
|
|
|
|
*********************************************
|
|
Developer notes on the transition to Python 3
|
|
*********************************************
|
|
|
|
:date: 2010-07-11
|
|
:author: Charles R. Harris
|
|
:author: Pauli Virtanen
|
|
|
|
General
|
|
=======
|
|
|
|
NumPy has now been ported to Python 3.
|
|
|
|
Some glitches may still be present; however, we are not aware of any
|
|
significant ones, the test suite passes.
|
|
|
|
|
|
Resources
|
|
---------
|
|
|
|
Information on porting to 3K:
|
|
|
|
- https://wiki.python.org/moin/cporting
|
|
- https://wiki.python.org/moin/PortingExtensionModulesToPy3k
|
|
|
|
|
|
Prerequisites
|
|
-------------
|
|
|
|
The Nose test framework has currently (Nov 2009) no released Python 3
|
|
compatible version. Its 3K SVN branch, however, works quite well:
|
|
|
|
- http://python-nose.googlecode.com/svn/branches/py3k
|
|
|
|
|
|
Known semantic changes on Py2
|
|
=============================
|
|
|
|
As a side effect, the Py3 adaptation has caused the following semantic
|
|
changes that are visible on Py2.
|
|
|
|
* Objects (except bytes and str) that implement the PEP 3118 array interface
|
|
will behave as ndarrays in `array(...)` and `asarray(...)`; the same way
|
|
as if they had ``__array_interface__`` defined.
|
|
|
|
* Otherwise, there are no known semantic changes.
|
|
|
|
|
|
Known semantic changes on Py3
|
|
=============================
|
|
|
|
The following semantic changes have been made on Py3:
|
|
|
|
* Division: integer division is by default true_divide, also for arrays.
|
|
|
|
* Dtype field names are Unicode.
|
|
|
|
* Only unicode dtype field titles are included in fields dict.
|
|
|
|
* :pep:`3118` buffer objects will behave differently from Py2 buffer objects
|
|
when used as an argument to `array(...)`, `asarray(...)`.
|
|
|
|
In Py2, they would cast to an object array.
|
|
|
|
In Py3, they cast similarly as objects having an
|
|
``__array_interface__`` attribute, ie., they behave as if they were
|
|
an ndarray view on the data.
|
|
|
|
|
|
|
|
Python code
|
|
===========
|
|
|
|
|
|
2to3 in setup.py
|
|
----------------
|
|
|
|
Currently, setup.py calls 2to3 automatically to convert Python sources
|
|
to Python 3 ones, and stores the results under::
|
|
|
|
build/py3k
|
|
|
|
Only changed files will be re-converted when setup.py is called a second
|
|
time, making development much faster.
|
|
|
|
Currently, this seems to handle all of the necessary Python code
|
|
conversion.
|
|
|
|
Not all of the 2to3 transformations are appropriate for all files.
|
|
Especially, 2to3 seems to be quite trigger-happy in replacing e.g.
|
|
``unicode`` by ``str`` which causes problems in ``defchararray.py``.
|
|
For files that need special handling, add entries to
|
|
``tools/py3tool.py``.
|
|
|
|
|
|
|
|
numpy.compat.py3k
|
|
-----------------
|
|
|
|
There are some utility functions needed for 3K compatibility in
|
|
``numpy.compat.py3k`` -- they can be imported from ``numpy.compat``:
|
|
|
|
- bytes, unicode: bytes and unicode constructors
|
|
- asbytes: convert string to bytes (no-op on Py2)
|
|
- asbytes_nested: convert strings in lists to Bytes
|
|
- asunicode: convert string to unicode
|
|
- asunicode_nested: convert strings in lists to Unicode
|
|
- asstr: convert item to the str type
|
|
- getexception: get current exception (see below)
|
|
- isfileobj: detect Python file objects
|
|
- strchar: character for Unicode (Py3) or Strings (Py2)
|
|
- open_latin1: open file in the latin1 text mode
|
|
|
|
More can be added as needed.
|
|
|
|
|
|
numpy.f2py
|
|
----------
|
|
|
|
F2py is ported to Py3.
|
|
|
|
|
|
Bytes vs. strings
|
|
-----------------
|
|
|
|
At many points in NumPy, bytes literals are needed. These can be created via
|
|
numpy.compat.asbytes and asbytes_nested.
|
|
|
|
|
|
Exception syntax
|
|
----------------
|
|
|
|
Syntax change: "except FooException, bar:" -> "except FooException as bar:"
|
|
|
|
This is taken care by 2to3, however.
|
|
|
|
|
|
Relative imports
|
|
----------------
|
|
|
|
The new relative import syntax,
|
|
|
|
from . import foo
|
|
|
|
is not available on Py2.4, so we can't simply use it.
|
|
|
|
Using absolute imports everywhere is probably OK, if they just happen
|
|
to work.
|
|
|
|
2to3, however, converts the old syntax to new syntax, so as long as we
|
|
use the converter, it takes care of most parts.
|
|
|
|
|
|
Print
|
|
-----
|
|
|
|
The Print statement changed to a builtin function in Py3.
|
|
|
|
Also this is taken care of by 2to3.
|
|
|
|
types module
|
|
------------
|
|
|
|
The following items were removed from `types` module in Py3:
|
|
|
|
- StringType (Py3: `bytes` is equivalent, to some degree)
|
|
- InstanceType (Py3: ???)
|
|
- IntType (Py3: no equivalent)
|
|
- LongType (Py3: equivalent `long`)
|
|
- FloatType (Py3: equivalent `float`)
|
|
- BooleanType (Py3: equivalent `bool`)
|
|
- ComplexType (Py3: equivalent `complex`)
|
|
- UnicodeType (Py3: equivalent `str`)
|
|
- BufferType (Py3: more-or-less equivalent `memoryview`)
|
|
|
|
In ``numerictypes.py``, the "common" types were replaced by their
|
|
plain equivalents, and `IntType` was dropped.
|
|
|
|
|
|
numpy.core.numerictypes
|
|
-----------------------
|
|
|
|
In numerictypes, types on Python 3 were changed so that:
|
|
|
|
=========== ============
|
|
Scalar type Value
|
|
=========== ============
|
|
str_ This is the basic Unicode string type on Py3
|
|
bytes_ This is the basic Byte-string type on Py3
|
|
string_ bytes_ alias
|
|
unicode_ str_ alias
|
|
=========== ============
|
|
|
|
|
|
numpy.loadtxt et al
|
|
-------------------
|
|
|
|
These routines are difficult to duck-type to read both Unicode and
|
|
Bytes input.
|
|
|
|
I assumed they are meant for reading Bytes streams -- this is probably
|
|
the far more common use case with scientific data.
|
|
|
|
|
|
Cyclic imports
|
|
--------------
|
|
|
|
Python 3 is less forgiving about cyclic imports than Python 2. Cycles
|
|
need to be broken to have the same code work both on Python 2 and 3.
|
|
|
|
|
|
C Code
|
|
======
|
|
|
|
|
|
NPY_PY3K
|
|
--------
|
|
|
|
A #define in config.h, defined when building for Py3.
|
|
|
|
.. todo::
|
|
|
|
Currently, this is generated as a part of the config.
|
|
Is this sensible (we could also use Py_VERSION_HEX)?
|
|
|
|
|
|
private/npy_3kcompat.h
|
|
----------------------
|
|
|
|
Convenience macros for Python 3 support:
|
|
|
|
- PyInt -> PyLong on Py3
|
|
- PyString -> PyBytes on Py3
|
|
- PyUString -> PyUnicode on Py3 and PyString on Py2
|
|
- PyBytes on Py2
|
|
- PyUnicode_ConcatAndDel, PyUnicode_Concat2
|
|
- Py_SIZE et al., for older Python versions
|
|
- npy_PyFile_Dup, etc. to get FILE* from Py3 file objects
|
|
- PyObject_Cmp, convenience comparison function on Py3
|
|
- NpyCapsule_* helpers: PyCObject
|
|
|
|
Any new ones that need to be added should be added in this file.
|
|
|
|
.. todo::
|
|
|
|
Remove PyString_* eventually -- having a call to one of these in NumPy
|
|
sources is a sign of an error...
|
|
|
|
|
|
ob_type, ob_size
|
|
----------------
|
|
|
|
These use Py_SIZE, etc. macros now. The macros are also defined in
|
|
npy_3kcompat.h for the Python versions that don't have them natively.
|
|
|
|
|
|
Py_TPFLAGS_CHECKTYPES
|
|
---------------------
|
|
|
|
Python 3 no longer supports type coercion in arithmetic.
|
|
|
|
Py_TPFLAGS_CHECKTYPES is now on by default, and so the C-level
|
|
interface, ``nb_*`` methods, still unconditionally receive whatever
|
|
types as their two arguments.
|
|
|
|
However, this will affect Python-level code: previously if you
|
|
inherited from a Py_TPFLAGS_CHECKTYPES enabled class that implemented
|
|
a ``__mul__`` method, the same ``__mul__`` method would still be
|
|
called also as when a ``__rmul__`` was required, but with swapped
|
|
arguments (see Python/Objects/typeobject.c:wrap_binaryfunc_r).
|
|
However, on Python 3, arguments are swapped only if both are of same
|
|
(sub-)type, and otherwise things fail.
|
|
|
|
This means that ``ndarray``-derived subclasses must now implement all
|
|
relevant ``__r*__`` methods, since they cannot any more automatically
|
|
fall back to ndarray code.
|
|
|
|
|
|
PyNumberMethods
|
|
---------------
|
|
|
|
The structures have been converted to the new format:
|
|
|
|
- number.c
|
|
- scalartypes.c.src
|
|
- scalarmathmodule.c.src
|
|
|
|
The slots np_divide, np_long, np_oct, np_hex, and np_inplace_divide
|
|
have gone away. The slot np_int is what np_long used to be, tp_divide
|
|
is now tp_floor_divide, and np_inplace_divide is now
|
|
np_inplace_floor_divide.
|
|
|
|
These have simply been #ifdef'd out on Py3.
|
|
|
|
The Py2/Py3 compatible structure definition looks like::
|
|
|
|
static PyNumberMethods @name@_as_number = {
|
|
(binaryfunc)0, /*nb_add*/
|
|
(binaryfunc)0, /*nb_subtract*/
|
|
(binaryfunc)0, /*nb_multiply*/
|
|
#if defined(NPY_PY3K)
|
|
#else
|
|
(binaryfunc)0, /*nb_divide*/
|
|
#endif
|
|
(binaryfunc)0, /*nb_remainder*/
|
|
(binaryfunc)0, /*nb_divmod*/
|
|
(ternaryfunc)0, /*nb_power*/
|
|
(unaryfunc)0,
|
|
(unaryfunc)0, /*nb_pos*/
|
|
(unaryfunc)0, /*nb_abs*/
|
|
#if defined(NPY_PY3K)
|
|
(inquiry)0, /*nb_bool*/
|
|
#else
|
|
(inquiry)0, /*nb_nonzero*/
|
|
#endif
|
|
(unaryfunc)0, /*nb_invert*/
|
|
(binaryfunc)0, /*nb_lshift*/
|
|
(binaryfunc)0, /*nb_rshift*/
|
|
(binaryfunc)0, /*nb_and*/
|
|
(binaryfunc)0, /*nb_xor*/
|
|
(binaryfunc)0, /*nb_or*/
|
|
#if defined(NPY_PY3K)
|
|
#else
|
|
0, /*nb_coerce*/
|
|
#endif
|
|
(unaryfunc)0, /*nb_int*/
|
|
#if defined(NPY_PY3K)
|
|
(unaryfunc)0, /*nb_reserved*/
|
|
#else
|
|
(unaryfunc)0, /*nb_long*/
|
|
#endif
|
|
(unaryfunc)0, /*nb_float*/
|
|
#if defined(NPY_PY3K)
|
|
#else
|
|
(unaryfunc)0, /*nb_oct*/
|
|
(unaryfunc)0, /*nb_hex*/
|
|
#endif
|
|
0, /*inplace_add*/
|
|
0, /*inplace_subtract*/
|
|
0, /*inplace_multiply*/
|
|
#if defined(NPY_PY3K)
|
|
#else
|
|
0, /*inplace_divide*/
|
|
#endif
|
|
0, /*inplace_remainder*/
|
|
0, /*inplace_power*/
|
|
0, /*inplace_lshift*/
|
|
0, /*inplace_rshift*/
|
|
0, /*inplace_and*/
|
|
0, /*inplace_xor*/
|
|
0, /*inplace_or*/
|
|
(binaryfunc)0, /*nb_floor_divide*/
|
|
(binaryfunc)0, /*nb_true_divide*/
|
|
0, /*nb_inplace_floor_divide*/
|
|
0, /*nb_inplace_true_divide*/
|
|
(unaryfunc)NULL, /*nb_index*/
|
|
};
|
|
|
|
|
|
|
|
PyBuffer (provider)
|
|
-------------------
|
|
|
|
PyBuffer usage is widely spread in multiarray:
|
|
|
|
1) The void scalar makes use of buffers
|
|
2) Multiarray has methods for creating buffers etc. explicitly
|
|
3) Arrays can be created from buffers etc.
|
|
4) The .data attribute of an array is a buffer
|
|
|
|
Py3 introduces the PEP 3118 buffer protocol as the *only* protocol,
|
|
so we must implement it.
|
|
|
|
The exporter parts of the PEP 3118 buffer protocol are currently
|
|
implemented in ``buffer.c`` for arrays, and in ``scalartypes.c.src``
|
|
for generic array scalars. The generic array scalar exporter, however,
|
|
doesn't currently produce format strings, which needs to be fixed.
|
|
|
|
Also some code also stops working when ``bf_releasebuffer`` is
|
|
defined. Most importantly, ``PyArg_ParseTuple("s#", ...)`` refuses to
|
|
return a buffer if ``bf_releasebuffer`` is present. For this reason,
|
|
the buffer interface for arrays is implemented currently *without*
|
|
defining ``bf_releasebuffer`` at all. This forces us to go through
|
|
some additional work.
|
|
|
|
There are a couple of places that need further attention:
|
|
|
|
- VOID_getitem
|
|
|
|
In some cases, this returns a buffer object on Python 2. On Python 3,
|
|
there is no stand-alone buffer object, so we return a byte array instead.
|
|
|
|
The Py2/Py3 compatible PyBufferMethods definition looks like::
|
|
|
|
NPY_NO_EXPORT PyBufferProcs array_as_buffer = {
|
|
#if !defined(NPY_PY3K)
|
|
#if PY_VERSION_HEX >= 0x02050000
|
|
(readbufferproc)array_getreadbuf, /*bf_getreadbuffer*/
|
|
(writebufferproc)array_getwritebuf, /*bf_getwritebuffer*/
|
|
(segcountproc)array_getsegcount, /*bf_getsegcount*/
|
|
(charbufferproc)array_getcharbuf, /*bf_getcharbuffer*/
|
|
#else
|
|
(getreadbufferproc)array_getreadbuf, /*bf_getreadbuffer*/
|
|
(getwritebufferproc)array_getwritebuf, /*bf_getwritebuffer*/
|
|
(getsegcountproc)array_getsegcount, /*bf_getsegcount*/
|
|
(getcharbufferproc)array_getcharbuf, /*bf_getcharbuffer*/
|
|
#endif
|
|
#endif
|
|
#if PY_VERSION_HEX >= 0x02060000
|
|
(getbufferproc)array_getbuffer, /*bf_getbuffer*/
|
|
(releasebufferproc)array_releasebuffer, /*bf_releasebuffer*/
|
|
#endif
|
|
};
|
|
|
|
.. todo::
|
|
|
|
Produce PEP 3118 format strings for array scalar objects.
|
|
|
|
.. todo::
|
|
|
|
There's stuff to clean up in numarray/_capi.c
|
|
|
|
|
|
PyBuffer (consumer)
|
|
-------------------
|
|
|
|
There are two places in which we may want to be able to consume buffer
|
|
objects and cast them to ndarrays:
|
|
|
|
1) `multiarray.frombuffer`, ie., ``PyArray_FromAny``
|
|
|
|
The frombuffer returns only arrays of a fixed dtype. It does not
|
|
make sense to support PEP 3118 at this location, since not much
|
|
would be gained from that -- the backward compatibility functions
|
|
using the old array interface still work.
|
|
|
|
So no changes needed here.
|
|
|
|
2) `multiarray.array`, ie., ``PyArray_FromAny``
|
|
|
|
In general, we would like to handle :pep:`3118` buffers in the same way
|
|
as ``__array_interface__`` objects. Hence, we want to be able to cast
|
|
them to arrays already in ``PyArray_FromAny``.
|
|
|
|
Hence, ``PyArray_FromAny`` needs additions.
|
|
|
|
There are a few caveats in allowing :pep:`3118` buffers in
|
|
``PyArray_FromAny``:
|
|
|
|
a) `bytes` (and `str` on Py2) objects offer a buffer interface that
|
|
specifies them as 1-D array of bytes.
|
|
|
|
Previously ``PyArray_FromAny`` has cast these to 'S#' dtypes. We
|
|
don't want to change this, since will cause problems in many places.
|
|
|
|
We do, however, want to allow other objects that provide 1-D byte arrays
|
|
to be cast to 1-D ndarrays and not 'S#' arrays -- for instance, 'S#'
|
|
arrays tend to strip trailing NUL characters.
|
|
|
|
So what is done in ``PyArray_FromAny`` currently is that:
|
|
|
|
- Presence of :pep:`3118` buffer interface is checked before checking
|
|
for array interface. If it is present *and* the object is not
|
|
`bytes` object, then it is used for creating a view on the buffer.
|
|
|
|
- We also check in ``discover_depth`` and ``_array_find_type`` for the
|
|
3118 buffers, so that::
|
|
|
|
array([some_3118_object])
|
|
|
|
will treat the object similarly as it would handle an `ndarray`.
|
|
|
|
However, again, bytes (and unicode) have priority and will not be
|
|
handled as buffer objects.
|
|
|
|
This amounts to possible semantic changes:
|
|
|
|
- ``array(buffer)`` will no longer create an object array
|
|
``array([buffer], dtype='O')``, but will instead expand to a view
|
|
on the buffer.
|
|
|
|
.. todo::
|
|
|
|
Take a second look at places that used PyBuffer_FromMemory and
|
|
PyBuffer_FromReadWriteMemory -- what can be done with these?
|
|
|
|
.. todo::
|
|
|
|
There's some buffer code in numarray/_capi.c that needs to be addressed.
|
|
|
|
|
|
PyBuffer (object)
|
|
-----------------
|
|
|
|
Since there is a native buffer object in Py3, the `memoryview`, the
|
|
`newbuffer` and `getbuffer` functions are removed from `multiarray` in
|
|
Py3: their functionality is taken over by the new `memoryview` object.
|
|
|
|
|
|
PyString
|
|
--------
|
|
|
|
There is no PyString in Py3, everything is either Bytes or Unicode.
|
|
Unicode is also preferred in many places, e.g., in __dict__.
|
|
|
|
There are two issues related to the str/bytes change:
|
|
|
|
1) Return values etc. should prefer unicode
|
|
2) The 'S' dtype
|
|
|
|
This entry discusses return values etc. only, the 'S' dtype is a
|
|
separate topic.
|
|
|
|
All uses of PyString in NumPy should be changed to one of
|
|
|
|
- PyBytes: one-byte character strings in Py2 and Py3
|
|
- PyUString (defined in npy_3kconfig.h): PyString in Py2, PyUnicode in Py3
|
|
- PyUnicode: UCS in Py2 and Py3
|
|
|
|
In many cases the conversion only entails replacing PyString with
|
|
PyUString.
|
|
|
|
PyString is currently defined to PyBytes in npy_3kcompat.h, for making
|
|
things to build. This definition will be removed when Py3 support is
|
|
finished.
|
|
|
|
Where ``*_AsStringAndSize`` is used, more care needs to be taken, as
|
|
encoding Unicode to Bytes may needed. If this cannot be avoided, the
|
|
encoding should be ASCII, unless there is a very strong reason to do
|
|
otherwise. Especially, I don't believe we should silently fall back to
|
|
UTF-8 -- raising an exception may be a better choice.
|
|
|
|
Exceptions should use PyUnicode_AsUnicodeEscape -- this should result
|
|
to an ASCII-clean string that is appropriate for the exception
|
|
message.
|
|
|
|
Some specific decisions that have been made so far:
|
|
|
|
* descriptor.c: dtype field names are UString
|
|
|
|
At some places in NumPy code, there are some guards for Unicode field
|
|
names. However, the dtype constructor accepts only strings as field names,
|
|
so we should assume field names are *always* UString.
|
|
|
|
* descriptor.c: field titles can be arbitrary objects.
|
|
If they are UString (or, on Py2, Bytes or Unicode), insert to fields dict.
|
|
|
|
* descriptor.c: dtype strings are Unicode.
|
|
|
|
* descriptor.c: datetime tuple contains Bytes only.
|
|
|
|
* repr() and str() should return UString
|
|
|
|
* comparison between Unicode and Bytes is not defined in Py3
|
|
|
|
* Type codes in numerictypes.typeInfo dict are Unicode
|
|
|
|
* Func name in errobj is Bytes (should be forced to ASCII)
|
|
|
|
.. todo::
|
|
|
|
tp_doc -- it's a char* pointer, but what is the encoding?
|
|
Check esp. lib/src/_compiled_base
|
|
|
|
Currently, UTF-8 is assumed.
|
|
|
|
.. todo::
|
|
|
|
ufunc names -- again, what's the encoding?
|
|
|
|
.. todo::
|
|
|
|
Cleanup to do later on: Replace all occurrences of PyString by
|
|
PyBytes, PyUnicode, or PyUString.
|
|
|
|
.. todo::
|
|
|
|
Revise errobj decision?
|
|
|
|
.. todo::
|
|
|
|
Check that non-UString field names are not accepted anywhere.
|
|
|
|
|
|
PyUnicode
|
|
---------
|
|
|
|
PyUnicode in Py3 is pretty much as it was in Py2, except that it is
|
|
now the only "real" string type.
|
|
|
|
In Py3, Unicode and Bytes are not comparable, ie., 'a' != b'a'. NumPy
|
|
comparison routines were handled to act in the same way, leaving
|
|
comparison between Unicode and Bytes undefined.
|
|
|
|
.. todo::
|
|
|
|
Check that indeed all comparison routines were changed.
|
|
|
|
|
|
Fate of the 'S' dtype
|
|
---------------------
|
|
|
|
On Python 3, the 'S' dtype will still be Bytes.
|
|
|
|
However,::
|
|
|
|
str, str_ == unicode_
|
|
|
|
|
|
PyInt
|
|
-----
|
|
|
|
There is no limited-range integer type any more in Py3. It makes no
|
|
sense to inherit NumPy ints from Py3 ints.
|
|
|
|
Currently, the following is done:
|
|
|
|
1) NumPy's integer types no longer inherit from Python integer.
|
|
2) int is taken dtype-equivalent to NPY_LONG
|
|
3) ints are converted to NPY_LONG
|
|
|
|
PyInt methods are currently replaced by PyLong, via macros in npy_3kcompat.h.
|
|
|
|
Dtype decision rules were changed accordingly, so that NumPy understands
|
|
Py3 int translate to NPY_LONG as far as dtypes are concerned.
|
|
|
|
array([1]).dtype will be the default NPY_LONG integer.
|
|
|
|
.. todo::
|
|
|
|
Not inheriting from `int` on Python 3 makes the following not work:
|
|
``np.intp("0xff", 16)`` -- because the NumPy type does not take
|
|
the second argument. This could perhaps be fixed...
|
|
|
|
|
|
Divide
|
|
------
|
|
|
|
The Divide operation is no more.
|
|
|
|
Calls to PyNumber_Divide were replaced by FloorDivide or TrueDivide,
|
|
as appropriate.
|
|
|
|
The PyNumberMethods entry is #ifdef'd out on Py3, see above.
|
|
|
|
|
|
tp_compare, PyObject_Compare
|
|
----------------------------
|
|
|
|
The compare method has vanished, and is replaced with richcompare.
|
|
We just #ifdef the compare methods out on Py3.
|
|
|
|
New richcompare methods were implemented for:
|
|
|
|
* flagsobject.c
|
|
|
|
On the consumer side, we have a convenience wrapper in npy_3kcompat.h
|
|
providing PyObject_Cmp also on Py3.
|
|
|
|
|
|
Pickling
|
|
--------
|
|
|
|
The ndarray and dtype __setstate__ were modified to be
|
|
backward-compatible with Py3: they need to accept a Unicode endian
|
|
character, and Unicode data since that's what Py2 str is unpickled to
|
|
in Py3.
|
|
|
|
An encoding assumption is required for backward compatibility: the user
|
|
must do
|
|
|
|
loads(f, encoding='latin1')
|
|
|
|
to successfully read pickles created by Py2.
|
|
|
|
.. todo::
|
|
|
|
Forward compatibility? Is it even possible?
|
|
For sure, we are not knowingly going to store data in PyUnicode,
|
|
so probably the only way for forward compatibility is to implement
|
|
a custom Unpickler for Py2?
|
|
|
|
.. todo::
|
|
|
|
If forward compatibility is not possible, aim to store also the endian
|
|
character as Bytes...
|
|
|
|
|
|
Module initialization
|
|
---------------------
|
|
|
|
The module initialization API changed in Python 3.1.
|
|
|
|
Most NumPy modules are now converted.
|
|
|
|
|
|
PyTypeObject
|
|
------------
|
|
|
|
The PyTypeObject of py3k is binary compatible with the py2k version and the
|
|
old initializers should work. However, there are several considerations to
|
|
keep in mind.
|
|
|
|
1) Because the first three slots are now part of a struct some compilers issue
|
|
warnings if they are initialized in the old way.
|
|
|
|
2) The compare slot has been made reserved in order to preserve binary
|
|
compatibility while the tp_compare function went away. The tp_richcompare
|
|
function has replaced it and we need to use that slot instead. This will
|
|
likely require modifications in the searchsorted functions and generic sorts
|
|
that currently use the compare function.
|
|
|
|
3) The previous numpy practice of initializing the COUNT_ALLOCS slots was
|
|
bogus. They are not supposed to be explicitly initialized and were out of
|
|
place in any case because an extra base slot was added in python 2.6.
|
|
|
|
Because of these facts it is better to use #ifdefs to bring the old
|
|
initializers up to py3k snuff rather than just fill the tp_richcompare
|
|
slot. They also serve to mark the places where changes have been
|
|
made. Note that explicit initialization can stop once none of the
|
|
remaining entries are non-zero, because zero is the default value that
|
|
variables with non-local linkage receive.
|
|
|
|
The Py2/Py3 compatible TypeObject definition looks like::
|
|
|
|
NPY_NO_EXPORT PyTypeObject Foo_Type = {
|
|
#if defined(NPY_PY3K)
|
|
PyVarObject_HEAD_INIT(0,0)
|
|
#else
|
|
PyObject_HEAD_INIT(0)
|
|
0, /* ob_size */
|
|
#endif
|
|
"numpy.foo" /* tp_name */
|
|
0, /* tp_basicsize */
|
|
0, /* tp_itemsize */
|
|
/* methods */
|
|
0, /* tp_dealloc */
|
|
0, /* tp_print */
|
|
0, /* tp_getattr */
|
|
0, /* tp_setattr */
|
|
#if defined(NPY_PY3K)
|
|
(void *)0, /* tp_reserved */
|
|
#else
|
|
0, /* tp_compare */
|
|
#endif
|
|
0, /* tp_repr */
|
|
0, /* tp_as_number */
|
|
0, /* tp_as_sequence */
|
|
0, /* tp_as_mapping */
|
|
0, /* tp_hash */
|
|
0, /* tp_call */
|
|
0, /* tp_str */
|
|
0, /* tp_getattro */
|
|
0, /* tp_setattro */
|
|
0, /* tp_as_buffer */
|
|
0, /* tp_flags */
|
|
0, /* tp_doc */
|
|
0, /* tp_traverse */
|
|
0, /* tp_clear */
|
|
0, /* tp_richcompare */
|
|
0, /* tp_weaklistoffset */
|
|
0, /* tp_iter */
|
|
0, /* tp_iternext */
|
|
0, /* tp_methods */
|
|
0, /* tp_members */
|
|
0, /* tp_getset */
|
|
0, /* tp_base */
|
|
0, /* tp_dict */
|
|
0, /* tp_descr_get */
|
|
0, /* tp_descr_set */
|
|
0, /* tp_dictoffset */
|
|
0, /* tp_init */
|
|
0, /* tp_alloc */
|
|
0, /* tp_new */
|
|
0, /* tp_free */
|
|
0, /* tp_is_gc */
|
|
0, /* tp_bases */
|
|
0, /* tp_mro */
|
|
0, /* tp_cache */
|
|
0, /* tp_subclasses */
|
|
0, /* tp_weaklist */
|
|
0, /* tp_del */
|
|
0 /* tp_version_tag (2.6) */
|
|
};
|
|
|
|
|
|
|
|
PySequenceMethods
|
|
-----------------
|
|
|
|
Types with tp_as_sequence defined
|
|
|
|
* multiarray/descriptor.c
|
|
* multiarray/scalartypes.c.src
|
|
* multiarray/arrayobject.c
|
|
|
|
PySequenceMethods in py3k are binary compatible with py2k, but some of the
|
|
slots have gone away. I suspect this means some functions need redefining so
|
|
the semantics of the slots needs to be checked::
|
|
|
|
PySequenceMethods foo_sequence_methods = {
|
|
(lenfunc)0, /* sq_length */
|
|
(binaryfunc)0, /* sq_concat */
|
|
(ssizeargfunc)0, /* sq_repeat */
|
|
(ssizeargfunc)0, /* sq_item */
|
|
(void *)0, /* nee sq_slice */
|
|
(ssizeobjargproc)0, /* sq_ass_item */
|
|
(void *)0, /* nee sq_ass_slice */
|
|
(objobjproc)0, /* sq_contains */
|
|
(binaryfunc)0, /* sq_inplace_concat */
|
|
(ssizeargfunc)0 /* sq_inplace_repeat */
|
|
};
|
|
|
|
|
|
PyMappingMethods
|
|
----------------
|
|
|
|
Types with tp_as_mapping defined
|
|
|
|
* multiarray/descriptor.c
|
|
* multiarray/iterators.c
|
|
* multiarray/scalartypes.c.src
|
|
* multiarray/flagsobject.c
|
|
* multiarray/arrayobject.c
|
|
|
|
PyMappingMethods in py3k look to be the same as in py2k. The semantics
|
|
of the slots needs to be checked::
|
|
|
|
PyMappingMethods foo_mapping_methods = {
|
|
(lenfunc)0, /* mp_length */
|
|
(binaryfunc)0, /* mp_subscript */
|
|
(objobjargproc)0 /* mp_ass_subscript */
|
|
};
|
|
|
|
|
|
PyFile
|
|
------
|
|
|
|
Many of the PyFile items have disappeared:
|
|
|
|
1) PyFile_Type
|
|
2) PyFile_AsFile
|
|
3) PyFile_FromString
|
|
|
|
Most importantly, in Py3 there is no way to extract a FILE* pointer
|
|
from the Python file object. There are, however, new PyFile_* functions
|
|
for writing and reading data from the file.
|
|
|
|
Compatibility wrappers that return a dup-ed `fdopen` file pointer are
|
|
in private/npy_3kcompat.h. This causes more flushing to be necessary,
|
|
but it appears there is no alternative solution. The FILE pointer so
|
|
obtained must be closed with fclose after use.
|
|
|
|
.. todo::
|
|
|
|
Should probably be done much later on...
|
|
|
|
Adapt all NumPy I/O to use the PyFile_* methods or the low-level
|
|
IO routines. In any case, it's unlikely that C stdio can be used any more.
|
|
|
|
Perhaps using PyFile_* makes numpy.tofile e.g. to a gzip to work?
|
|
|
|
|
|
READONLY
|
|
--------
|
|
|
|
The RO alias for READONLY is no more.
|
|
|
|
These were replaced, as READONLY is present also on Py2.
|
|
|
|
|
|
PyOS
|
|
----
|
|
|
|
Deprecations:
|
|
|
|
1) PyOS_ascii_strtod -> PyOS_double_from_string;
|
|
curiously enough, PyOS_ascii_strtod is not only deprecated but also
|
|
causes segfaults
|
|
|
|
|
|
PyInstance
|
|
----------
|
|
|
|
There are some checks for PyInstance in ``common.c`` and ``ctors.c``.
|
|
|
|
Currently, ``PyInstance_Check`` is just #ifdef'd out for Py3. This is,
|
|
possibly, not the correct thing to do.
|
|
|
|
.. todo::
|
|
|
|
Do the right thing for PyInstance checks.
|
|
|
|
|
|
PyCObject / PyCapsule
|
|
---------------------
|
|
|
|
The PyCObject API is removed in Python 3.2, so we need to rewrite it
|
|
using PyCapsule.
|
|
|
|
NumPy was changed to use the Capsule API, using NpyCapsule* wrappers.
|