array_split documentation¶
Release: 0.5.2
Version: 0.5.2
Date: Sep 11, 2017
Introduction¶
The array_split python package is an enhancement to existing numpy.ndarray functions, such as numpy.array_split, skimage.util.view_as_blocks and skimage.util.view_as_windows, which sub-divide a multi-dimensional array into a number of multi-dimensional sub-arrays (slices). Example application areas include:
- Parallel Processing
- A large (dense) array is partitioned into smaller sub-arrays which can be processed concurrently by multiple processes (multiprocessing or mpi4py) or other memory-limited hardware (e.g. GPGPU using pyopencl, pycuda, etc.). For GPGPU, it is necessary for a sub-array not to exceed the GPU memory and desirable for the sub-array shape to be a multiple of the work-group (OpenCL) or thread-block (CUDA) size.
- File I/O
- A large (dense) array is partitioned into smaller sub-arrays which can be written to individual files (as, for example, an HDF5 Virtual Dataset). It is often desirable for the individual files not to exceed a specified number of (giga)bytes and, for HDF5, it is desirable for the sub-array shape of each file to be a multiple of the chunk shape. Similarly, out-of-core algorithms for large dense arrays often involve processing the entire data-set as a series of in-core sub-arrays. Again, it is desirable for the individual sub-array shape to be a multiple of the chunk shape.
The array_split package provides the means to partition an array (or array shape) using any of the following criteria:
Per-axis indices indicating the cut positions.
Per-axis number of sub-arrays.
Total number of sub-arrays (with optional per-axis number of sections constraints).
Specific sub-array shape.
Specification of halo (ghost) elements for sub-arrays.
Arbitrary start index for the shape to be partitioned.
Maximum number of bytes for a sub-array with constraints:
- sub-arrays are an exact (integer) multiple of a specified sub-tile shape
- upper limit on the per-axis sub-array shape
Quick Start Example¶
>>> from array_split import array_split, shape_split
>>> import numpy as np
>>>
>>> ary = np.arange(0, 4*9)
>>>
>>> array_split(ary, 4) # 1D split into 4 sections (like numpy.array_split)
[array([0, 1, 2, 3, 4, 5, 6, 7, 8]),
array([ 9, 10, 11, 12, 13, 14, 15, 16, 17]),
array([18, 19, 20, 21, 22, 23, 24, 25, 26]),
array([27, 28, 29, 30, 31, 32, 33, 34, 35])]
>>>
>>> shape_split(ary.shape, 4) # 1D split into 4 parts, returns slice objects
array([(slice(0, 9, None),), (slice(9, 18, None),), (slice(18, 27, None),), (slice(27, 36, None),)],
dtype=[('0', 'O')])
>>>
>>> ary = ary.reshape(4, 9) # Make ary 2D
>>> split = shape_split(ary.shape, axis=(2, 3)) # 2D split into 2*3=6 sections
>>> split.shape
(2, 3)
>>> split
array([[(slice(0, 2, None), slice(0, 3, None)),
(slice(0, 2, None), slice(3, 6, None)),
(slice(0, 2, None), slice(6, 9, None))],
[(slice(2, 4, None), slice(0, 3, None)),
(slice(2, 4, None), slice(3, 6, None)),
(slice(2, 4, None), slice(6, 9, None))]],
dtype=[('0', 'O'), ('1', 'O')])
>>> sub_arys = [ary[tup] for tup in split.flatten()] # Create sub-array views from slice tuples.
>>> sub_arys
[array([[ 0, 1, 2], [ 9, 10, 11]]),
array([[ 3, 4, 5], [12, 13, 14]]),
array([[ 6, 7, 8], [15, 16, 17]]),
array([[18, 19, 20], [27, 28, 29]]),
array([[21, 22, 23], [30, 31, 32]]),
array([[24, 25, 26], [33, 34, 35]])]
Latest sphinx documentation (including more examples) at http://array-split.readthedocs.io/en/latest/.
Installation¶
Using pip
(root access required):
pip install array_split
or local user install (no root access required):
pip install --user array_split
or local user install from latest github source:
pip install --user git+https://github.com/array-split/array_split.git#egg=array_split
Testing¶
Run tests (unit-tests and doctest module docstring tests) using:
python -m array_split.tests
or, from the source tree, run:
python setup.py test
Continuous integration runs on Travis CI and on AppVeyor.
Documentation¶
The latest Sphinx-generated documentation is hosted at http://array-split.readthedocs.io/en/latest/ and on GitHub gh-pages.
Sphinx documentation can be built from the source:
python setup.py build_sphinx
with the HTML generated in docs/_build/html.
Contributing¶
Check out the CONTRIBUTING doc.
License information¶
See the file LICENSE.txt for terms & conditions, for usage and a DISCLAIMER OF ALL WARRANTIES.
Terminology¶
Definitions:
- tile
- A multi-dimensional sub-array of a numpy.ndarray.
- slice
- A tuple of slice elements defining the extents of a tile/sub-array.
- cut
- A division along an axis to form tiles or slices.
- split
- The sub-division (tiling) of an array (or an array shape) resulting from cuts.
- halo
- Per-axis number of elements which specifies the expansion of a tile (in the negative and positive axis directions) to form an overlap of elements with neighbouring tiles. The overlaps are often referred to as ghost cells or ghost elements.
- sub-tile
- A sub-array of a tile.
Parameter Categories¶
There are four categories of parameters for specifying a split:
- Number of tiles
- The total number of tiles and/or the number of slices per axis. The indices_or_sections parameter can specify the number of tiles in the resulting split (as an int).
- Per-axis split indices
- The per-axis indices specifying where the array (shape) is to be cut. The indices_or_sections parameter doubles up to indicate the indices at which cuts are to occur.
- Tile shape
- Explicitly specify the shape of the tile in a split. The tile_shape parameter (typically as a lone keyword argument) indicates the tile shape.
- Tile maximum number of bytes
- Given the number of bytes per array element, a tile shape is calculated such that all tiles (including halo extension) of the resulting split do not exceed a specified (maximum) number of bytes. The array_itemsize parameter gives the number of bytes per array element and the max_tile_bytes parameter constrains the maximum number of bytes per tile.
The subsequent sections provide examples from each of these categories.
Import statements for the examples¶
The examples in the following sections assume that the following statements have been issued to import the relevant functions:
>>> import numpy
>>> from array_split import array_split, shape_split, ShapeSplitter
array_split, shape_split and ShapeSplitter¶
The array_split.array_split() function is analogous to the numpy.array_split() function. It takes a numpy.ndarray object as an argument and returns a list of tiles (numpy.ndarray sub-array objects):
>>> numpy.array_split(numpy.arange(0, 10), 3)
[array([0, 1, 2, 3]), array([4, 5, 6]), array([7, 8, 9])]
>>> array_split(numpy.arange(0, 10), 3) # array_split.array_split
[array([0, 1, 2, 3]), array([4, 5, 6]), array([7, 8, 9])]
The array_split.shape_split() function takes an array shape as an argument instead of an actual numpy.ndarray object, and returns a numpy structured array of tuple elements. The tuple elements can then be used to generate the tiles from a numpy.ndarray of an equivalent shape:
>>> ary = numpy.arange(0, 10)
>>> split = shape_split(ary.shape, 3) # returns array of tuples
>>> split
array([(slice(0, 4, None),), (slice(4, 7, None),), (slice(7, 10, None),)],
dtype=[('0', 'O')])
>>> [ary[slyce] for slyce in split.flatten()] # generates tile views of ary
[array([0, 1, 2, 3]), array([4, 5, 6]), array([7, 8, 9])]
Each tuple element of the returned split has length equal to the dimension of the multi-dimensional shape, i.e. N = len(array_shape). Each tuple indicates the indexing extent of a tile.
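The extents shown above follow the same near-equal sectioning as numpy.array_split: when the length does not divide evenly, the leading sections receive one extra element. The following pure-Python sketch reproduces those extents for one axis; even_sections is a hypothetical helper written for illustration and is not part of the array_split API.

```python
def even_sections(length, num_sections):
    """Return (start, stop) pairs splitting range(length) into num_sections parts."""
    base, extra = divmod(length, num_sections)
    bounds = []
    start = 0
    for i in range(num_sections):
        # The first `extra` sections receive one additional element.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


data = list(range(10))
tiles = [data[beg:end] for beg, end in even_sections(len(data), 3)]
print(tiles)  # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Each (start, stop) pair plays the role of one slice in the tuples returned by shape_split.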
The array_split.ShapeSplitter class contains the bulk of the split implementation for the array_split.shape_split() function. The array_split.ShapeSplitter.__init__() constructor takes the same arguments as the array_split.shape_split() function and the array_split.ShapeSplitter.calculate_split() method computes the split. After the split computation, some state information is preserved in the array_split.ShapeSplitter data attributes:
>>> ary = numpy.arange(0, 10)
>>> splitter = ShapeSplitter(ary.shape, 3)
>>> split = splitter.calculate_split()
>>> split.shape
(3,)
>>> split
array([(slice(0, 4, None),), (slice(4, 7, None),), (slice(7, 10, None),)],
dtype=[('0', 'O')])
>>> [ary[slyce] for slyce in split.flatten()]
[array([0, 1, 2, 3]), array([4, 5, 6]), array([7, 8, 9])]
>>>
>>> splitter.split_shape # equivalent to split.shape above
array([3])
>>> splitter.split_begs # start indices for tile extents
[array([0, 4, 7])]
>>> splitter.split_ends # stop indices for tile extents
[array([ 4, 7, 10])]
Methods of the array_split.ShapeSplitter class can be overridden in sub-classes in order to customise the splitting behaviour.
The examples of the following sections explicitly illustrate the behaviour of the array_split.shape_split() function but, with minor modifications, the examples are also relevant for the array_split.array_split() function and for instances of the array_split.ShapeSplitter class.
Splitting by number of tiles¶
An array can be split by specifying the total number of tiles in the final split and/or the per-axis number of slices.
Single axis number of tiles¶
When the indices_or_sections parameter is specified as an integer (scalar), it specifies the number of tiles in the returned split:
>>> split = shape_split([20,], 4) # 1D, array_shape=[20,], number of tiles=4, default axis=0
>>> split.shape
(4,)
>>> split
array([(slice(0, 5, None),), (slice(5, 10, None),), (slice(10, 15, None),),
(slice(15, 20, None),)],
dtype=[('0', 'O')])
By default, cuts are made along axis = 0. In the multi-dimensional case, one can override the axis using the axis parameter, e.g. for a 2D shape:
>>> split = shape_split([20,10], 4, axis=1) # Split along axis=1
>>> split.shape
(1, 4)
>>> split
array([[(slice(0, 20, None), slice(0, 3, None)),
(slice(0, 20, None), slice(3, 6, None)),
(slice(0, 20, None), slice(6, 8, None)),
(slice(0, 20, None), slice(8, 10, None))]],
dtype=[('0', 'O'), ('1', 'O')])
Multiple axes number of tiles¶
The axis parameter can also be used to specify the number of slices (sections) per axis:
>>> split = shape_split([20, 10], axis=[3, 2]) # Cut into 3*2=6 tiles
>>> split.shape
(3, 2)
>>> split
array([[(slice(0, 7, None), slice(0, 5, None)),
(slice(0, 7, None), slice(5, 10, None))],
[(slice(7, 14, None), slice(0, 5, None)),
(slice(7, 14, None), slice(5, 10, None))],
[(slice(14, 20, None), slice(0, 5, None)),
(slice(14, 20, None), slice(5, 10, None))]],
dtype=[('0', 'O'), ('1', 'O')])
The array axis 0 has been cut into three sections and axis 1 has been cut into two sections for a total of 3*2 = 6 tiles. In general, if axis is an integer (scalar), it indicates the single axis which is to be cut to form slices. When axis is a sequence, axis[i] indicates the number of sections into which axis i is to be cut.
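As a rough illustration of how per-axis section counts combine into a grid of tiles, the sketch below sections each axis independently and takes the Cartesian product of the per-axis slices. The names even_sections and grid_split are hypothetical helpers, not array_split functions, and the sketch only assumes the near-equal sectioning visible in the output above.

```python
import itertools


def even_sections(length, num_sections):
    """Split range(length) into num_sections near-equal slice objects."""
    base, extra = divmod(length, num_sections)
    slices, start = [], 0
    for i in range(num_sections):
        stop = start + base + (1 if i < extra else 0)
        slices.append(slice(start, stop))
        start = stop
    return slices


def grid_split(shape, sections_per_axis):
    """Return one slice tuple per tile, with the last axis varying fastest."""
    per_axis = [even_sections(n, s) for n, s in zip(shape, sections_per_axis)]
    return list(itertools.product(*per_axis))


tiles = grid_split((20, 10), (3, 2))
print(len(tiles))  # 6
print(tiles[0])    # (slice(0, 7, None), slice(0, 5, None))
```

The flat list corresponds to split.flatten() in the example above; the library additionally arranges the tuples in an array of shape (3, 2).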
In addition, one can specify a total number of tiles and use the axis parameter to fix the number of sections for some axes (positive elements of the axis sequence) while leaving the number of sections for the remaining axes to be calculated (non-positive elements). For example, in 3D, cut into 8 tiles, but only cut the axis=1 and axis=2 axes:
>>> split = shape_split([20, 10, 15], 8, axis=[1, 0, 0]) # Cut into 1*?*?=8 tiles
>>> split.shape
(1, 4, 2)
>>> split
array([[[(slice(0, 20, None), slice(0, 3, None), slice(0, 8, None)),
(slice(0, 20, None), slice(0, 3, None), slice(8, 15, None))],
[(slice(0, 20, None), slice(3, 6, None), slice(0, 8, None)),
(slice(0, 20, None), slice(3, 6, None), slice(8, 15, None))],
[(slice(0, 20, None), slice(6, 8, None), slice(0, 8, None)),
(slice(0, 20, None), slice(6, 8, None), slice(8, 15, None))],
[(slice(0, 20, None), slice(8, 10, None), slice(0, 8, None)),
(slice(0, 20, None), slice(8, 10, None), slice(8, 15, None))]]],
dtype=[('0', 'O'), ('1', 'O'), ('2', 'O')])
In the above, non-positive elements of axis are replaced with positive values such that numpy.product(axis) equals the number of requested tiles (= 8 above).
A ValueError is raised if the impossible is attempted:
>>> try:
... split = shape_split([20, 10, 15], 8, axis=[1, 3, 0]) # Impossible to cut into 1*3*?=8 tiles
... except (ValueError,) as e:
... e
...
ValueError('Unable to construct grid of num_slices=8 elements from num_slices_per_axis=[1, 3, 0] (with max_slices_per_axis=[20 10 15])',)
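The feasibility condition behind this error can be sketched in a few lines: the product of the fixed (positive) per-axis counts must divide the requested number of tiles, and whatever remains has to be distributed over the free (non-positive) axes. The function remaining_sections below is a hypothetical illustration of that check only; how the library factors the remainder across the free axes is not shown here.

```python
def remaining_sections(num_tiles, sections_per_axis):
    """Return the number of sections still to be spread over the free axes.

    Positive entries are fixed section counts; non-positive entries are free.
    """
    fixed = 1
    num_free = 0
    for count in sections_per_axis:
        if count > 0:
            fixed *= count
        else:
            num_free += 1
    remainder, leftover = divmod(num_tiles, fixed)
    # Infeasible if the fixed counts do not divide the total, or if nothing
    # is free to absorb a remaining factor.
    if leftover != 0 or (num_free == 0 and remainder != 1):
        raise ValueError(
            "cannot form %d tiles from per-axis counts %r"
            % (num_tiles, list(sections_per_axis))
        )
    return remainder


print(remaining_sections(8, [1, 0, 0]))  # 8, e.g. 4*2 over the two free axes
try:
    remaining_sections(8, [1, 3, 0])
except ValueError as err:
    print(err)
```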
Splitting by per-axis cut indices¶
Array splitting is performed by explicitly specifying the indices at which cuts are performed.
Single axis cut indices¶
The indices_or_sections parameter can also be used to specify the location (index values) of cuts:
>>> split = shape_split([20,], [5, 7, 9]) # 1D, split into 4 tiles, default cut axis=0
>>> split.shape
(4,)
>>> split
array([(slice(0, 5, None),), (slice(5, 7, None),), (slice(7, 9, None),),
(slice(9, 20, None),)],
dtype=[('0', 'O')])
Here, three cuts have been made to form 4 slices: cuts at index 5, index 7 and index 9.
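The relationship between cut indices and the resulting slices can be sketched directly: each cut index is the stop of one slice and the start of the next. The helper slices_from_cuts is hypothetical and handles a single axis only.

```python
def slices_from_cuts(length, cut_indices):
    """Convert cut positions along one axis into consecutive slice objects."""
    cuts = list(cut_indices)
    begs = [0] + cuts
    ends = cuts + [length]
    return [slice(beg, end) for beg, end in zip(begs, ends)]


print(slices_from_cuts(20, [5, 7, 9]))
# [slice(0, 5, None), slice(5, 7, None), slice(7, 9, None), slice(9, 20, None)]
print(slices_from_cuts(20, []))  # no cuts: a single slice covering the axis
```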
Similarly, in 2D, the indices_or_sections cuts can be made along axis = 1 only:
>>> split = shape_split([20, 13], [5, 7, 9], axis=1) # 2D, cut into 4 tiles, cut axis=1
>>> split.shape
(1, 4)
>>> split
array([[(slice(0, 20, None), slice(0, 5, None)),
(slice(0, 20, None), slice(5, 7, None)),
(slice(0, 20, None), slice(7, 9, None)),
(slice(0, 20, None), slice(9, 13, None))]],
dtype=[('0', 'O'), ('1', 'O')])
Multiple axes cut indices¶
The indices_or_sections parameter can also be used to cut along multiple axes. In this case, the indices_or_sections parameter is specified as a sequence of sequences, so that indices_or_sections[i] specifies the cut indices along axis i.
For example, in 3D, cut along axis=1 and axis=2 only:
>>> split = shape_split([20, 13, 64], [[], [7], [15, 30, 45]]) # 3D, split into 8 tiles, no cuts on axis=0
>>> split.shape
(1, 2, 4)
>>> split
array([[[(slice(0, 20, None), slice(0, 7, None), slice(0, 15, None)),
(slice(0, 20, None), slice(0, 7, None), slice(15, 30, None)),
(slice(0, 20, None), slice(0, 7, None), slice(30, 45, None)),
(slice(0, 20, None), slice(0, 7, None), slice(45, 64, None))],
[(slice(0, 20, None), slice(7, 13, None), slice(0, 15, None)),
(slice(0, 20, None), slice(7, 13, None), slice(15, 30, None)),
(slice(0, 20, None), slice(7, 13, None), slice(30, 45, None)),
(slice(0, 20, None), slice(7, 13, None), slice(45, 64, None))]]],
dtype=[('0', 'O'), ('1', 'O'), ('2', 'O')])
The indices_or_sections=[[], [7], [15, 30, 45]] parameter indicates that the cut indices for axis=0 are [] (i.e. no cuts), the cut indices for axis=1 are [7] (a single cut at index 7) and the cut indices for axis=2 are [15, 30, 45] (three cuts).
Splitting by tile shape¶
The tile shape can be explicitly set with the tile_shape parameter, e.g. in 1D:
>>> split = shape_split([20,], tile_shape=[6,]) # Cut into (6,) shaped tiles
>>> split.shape
(4,)
>>> split
array([(slice(0, 6, None),), (slice(6, 12, None),), (slice(12, 18, None),),
(slice(18, 20, None),)],
dtype=[('0', 'O')])
and 2D:
>>> split = shape_split([20, 32], tile_shape=[6, 16]) # Cut into (6, 16) shaped tiles
>>> split.shape
(4, 2)
>>> split
array([[(slice(0, 6, None), slice(0, 16, None)),
(slice(0, 6, None), slice(16, 32, None))],
[(slice(6, 12, None), slice(0, 16, None)),
(slice(6, 12, None), slice(16, 32, None))],
[(slice(12, 18, None), slice(0, 16, None)),
(slice(12, 18, None), slice(16, 32, None))],
[(slice(18, 20, None), slice(0, 16, None)),
(slice(18, 20, None), slice(16, 32, None))]],
dtype=[('0', 'O'), ('1', 'O')])
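In both examples the tiles are laid out from index 0 in steps of the tile length, and the final tile along an axis is truncated at the array bound. A minimal per-axis sketch of that layout follows; slices_for_tile_length is a hypothetical name used for illustration.

```python
def slices_for_tile_length(length, tile_length):
    """Tile one axis with fixed-length slices; the last slice may be shorter."""
    return [
        slice(beg, min(beg + tile_length, length))
        for beg in range(0, length, tile_length)
    ]


print(slices_for_tile_length(20, 6))
# [slice(0, 6, None), slice(6, 12, None), slice(12, 18, None), slice(18, 20, None)]
print(len(slices_for_tile_length(32, 16)))  # 2
```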
Splitting by maximum bytes per tile¶
The tile shape can be constrained to a maximum number of bytes per tile by specifying the array_itemsize and max_tile_bytes parameters. In 1D:
>>> split = shape_split(
... array_shape=[512,],
... array_itemsize=1, # Default value
... max_tile_bytes=512 # Equals number of array bytes
... )
...
>>> split.shape
(1,)
>>> split
array([(slice(0, 512, None),)],
dtype=[('0', 'O')])
Double the array per-element number of bytes:
>>> split = shape_split(
... array_shape=[512,],
... array_itemsize=2,
... max_tile_bytes=512 # Equals half the number of array bytes
... )
...
>>> split.shape
(2,)
>>> split
array([(slice(0, 256, None),), (slice(256, 512, None),)],
dtype=[('0', 'O')])
Decrement max_tile_bytes to 511 to split into 3 tiles:
>>> split = shape_split(
... array_shape=[512,],
... array_itemsize=2,
... max_tile_bytes=511 # Less than half the number of array bytes
... )
...
>>> split.shape
(3,)
>>> split
array([(slice(0, 171, None),), (slice(171, 342, None),),
(slice(342, 512, None),)],
dtype=[('0', 'O')])
Note that the split is calculated so that tiles are approximately equal in size.
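The 1D results above are consistent with a simple two-step calculation: find the smallest number of tiles for which no tile exceeds the byte limit, then divide the axis as evenly as possible among that many tiles. The sketch below is an assumption that reproduces the three 1D outputs shown; it is not the library's implementation and ignores halos and multi-dimensional shapes. The name split_1d_for_max_bytes is hypothetical.

```python
def split_1d_for_max_bytes(length, itemsize, max_tile_bytes):
    """Return (start, stop) pairs for near-equal tiles within a byte limit."""
    max_elems = max_tile_bytes // itemsize
    if max_elems < 1:
        raise ValueError("max_tile_bytes is smaller than a single element")
    # Ceiling division: fewest tiles such that each holds <= max_elems elements.
    num_tiles = -(-length // max_elems)
    base, extra = divmod(length, num_tiles)
    bounds, start = [], 0
    for i in range(num_tiles):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


print(split_1d_for_max_bytes(512, 2, 512))  # [(0, 256), (256, 512)]
print(split_1d_for_max_bytes(512, 2, 511))  # [(0, 171), (171, 342), (342, 512)]
```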
In 2D:
>>> split = shape_split(
... array_shape=[512, 1024],
... array_itemsize=1,
... max_tile_bytes=512*512
... )
...
>>> split.shape
(2, 1)
>>> split
array([[(slice(0, 256, None), slice(0, 1024, None))],
[(slice(256, 512, None), slice(0, 1024, None))]],
dtype=[('0', 'O'), ('1', 'O')])
and increasing array_itemsize to 4:
>>> split = shape_split(
... array_shape=[512, 1024],
... array_itemsize=4,
... max_tile_bytes=512*512
... )
...
>>> split.shape
(8, 1)
>>> split
array([[(slice(0, 64, None), slice(0, 1024, None))],
[(slice(64, 128, None), slice(0, 1024, None))],
[(slice(128, 192, None), slice(0, 1024, None))],
[(slice(192, 256, None), slice(0, 1024, None))],
[(slice(256, 320, None), slice(0, 1024, None))],
[(slice(320, 384, None), slice(0, 1024, None))],
[(slice(384, 448, None), slice(0, 1024, None))],
[(slice(448, 512, None), slice(0, 1024, None))]],
dtype=[('0', 'O'), ('1', 'O')])
The preference is to cut into ('C' order) contiguous memory tiles.
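One way to sanity-check a computed split is to recompute the byte size of each tile from its slice tuple and compare it with the limit. The helper tile_num_bytes below is hypothetical; the slice tuples are copied from the two 2D outputs above.

```python
def tile_num_bytes(tile, itemsize):
    """Number of bytes covered by a tuple of slice objects."""
    num_bytes = itemsize
    for slc in tile:
        num_bytes *= slc.stop - slc.start
    return num_bytes


max_tile_bytes = 512 * 512

# First tile of the array_itemsize=1 split and of the array_itemsize=4 split.
print(tile_num_bytes((slice(0, 256), slice(0, 1024)), 1))  # 262144
print(tile_num_bytes((slice(0, 64), slice(0, 1024)), 4))   # 262144
```

In both cases the tiles span the full extent of axis 1, so each tile covers whole rows of a 'C' ordered array.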
Tile shape upper bound constraint¶
The split can be influenced by specifying the max_tile_shape parameter. For the previous 2D example, cuts can be forced along axis=1 by constraining the tile shape:
>>> split = shape_split(
... array_shape=[512, 1024],
... array_itemsize=4,
... max_tile_bytes=512*512,
... max_tile_shape=[numpy.inf, 256]
... )
...
>>> split.shape
(2, 4)
>>> split
array([[(slice(0, 256, None), slice(0, 256, None)),
(slice(0, 256, None), slice(256, 512, None)),
(slice(0, 256, None), slice(512, 768, None)),
(slice(0, 256, None), slice(768, 1024, None))],
[(slice(256, 512, None), slice(0, 256, None)),
(slice(256, 512, None), slice(256, 512, None)),
(slice(256, 512, None), slice(512, 768, None)),
(slice(256, 512, None), slice(768, 1024, None))]],
dtype=[('0', 'O'), ('1', 'O')])
Sub-tile shape constraint¶
The split can also be influenced by specifying the sub_tile_shape parameter, which forces the tile shape to be an exact (integer) multiple of the sub_tile_shape:
>>> split = shape_split(
... array_shape=[512, 1024],
... array_itemsize=4,
... max_tile_bytes=512*512,
... max_tile_shape=[numpy.inf, 256],
... sub_tile_shape=(15, 10)
... )
...
>>> split.shape
(3, 5)
>>> split
array([[(slice(0, 180, None), slice(0, 210, None)),
(slice(0, 180, None), slice(210, 420, None)),
(slice(0, 180, None), slice(420, 630, None)),
(slice(0, 180, None), slice(630, 840, None)),
(slice(0, 180, None), slice(840, 1024, None))],
[(slice(180, 360, None), slice(0, 210, None)),
(slice(180, 360, None), slice(210, 420, None)),
(slice(180, 360, None), slice(420, 630, None)),
(slice(180, 360, None), slice(630, 840, None)),
(slice(180, 360, None), slice(840, 1024, None))],
[(slice(360, 512, None), slice(0, 210, None)),
(slice(360, 512, None), slice(210, 420, None)),
(slice(360, 512, None), slice(420, 630, None)),
(slice(360, 512, None), slice(630, 840, None)),
(slice(360, 512, None), slice(840, 1024, None))]],
dtype=[('0', 'O'), ('1', 'O')])
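The constraint can be checked against the output above: the leading tiles have shape (180, 210), which is (12*15, 21*10), while the trailing tile along each axis holds whatever remains of the array. The helper is_multiple_of_sub_tile is a hypothetical check written for illustration.

```python
def is_multiple_of_sub_tile(tile, sub_tile_shape):
    """True if every axis extent of the tile is a whole multiple of the sub-tile."""
    return all(
        (slc.stop - slc.start) % sub == 0
        for slc, sub in zip(tile, sub_tile_shape)
    )


sub_tile_shape = (15, 10)
first_tile = (slice(0, 180), slice(0, 210))      # copied from the split above
last_tile = (slice(360, 512), slice(840, 1024))  # remainder tile at the array edge

print(is_multiple_of_sub_tile(first_tile, sub_tile_shape))  # True
print(is_multiple_of_sub_tile(last_tile, sub_tile_shape))   # False
```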
The array_start parameter¶
The array_start argument to the array_split.shape_split() function and the array_split.ShapeSplitter.__init__() constructor specifies an index offset for the slices in the returned tuple of slice objects:
>>> split = shape_split((15,), 3)
>>> split
array([(slice(0, 5, None),), (slice(5, 10, None),), (slice(10, 15, None),)],
dtype=[('0', 'O')])
>>> split = shape_split((15,), 3, array_start=(20,))
>>> split
array([(slice(20, 25, None),), (slice(25, 30, None),),
(slice(30, 35, None),)],
dtype=[('0', 'O')])
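The effect of array_start in this example amounts to shifting every slice by a constant offset. A small sketch of that shift for one axis follows; offset_slices is a hypothetical helper, not a library function.

```python
def offset_slices(slices, start):
    """Shift the start and stop of each slice by a constant index offset."""
    return [slice(slc.start + start, slc.stop + start) for slc in slices]


zero_based = [slice(0, 5), slice(5, 10), slice(10, 15)]
print(offset_slices(zero_based, 20))
# [slice(20, 25, None), slice(25, 30, None), slice(30, 35, None)]
```

The shifted slices describe positions in a larger (global) index space, so they would need the offset subtracted again before indexing a local array of shape (15,).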
The halo parameter¶
The halo parameter can be used to generate tiles which overlap with neighbouring tiles by a specified number of array elements (in each axis direction):
>>> from array_split import ARRAY_BOUNDS, NO_BOUNDS
>>> split = shape_split([16,], 4) # No halo
>>> split.shape
(4,)
>>> split
array([(slice(0, 4, None),), (slice(4, 8, None),), (slice(8, 12, None),),
(slice(12, 16, None),)],
dtype=[('0', 'O')])
>>> split = shape_split([16,], 4, halo=2, tile_bounds_policy=ARRAY_BOUNDS) # halo width = 2
>>> split.shape
(4,)
>>> split
array([(slice(0, 6, None),), (slice(2, 10, None),), (slice(6, 14, None),),
(slice(10, 16, None),)],
dtype=[('0', 'O')])
>>> split = shape_split(
... [16,],
... 4,
... halo=2,
... tile_bounds_policy=NO_BOUNDS # halo width = 2 and tile halos extend outside array_shape bounds
... )
>>> split.shape
(4,)
>>> split
array([(slice(-2, 6, None),), (slice(2, 10, None),), (slice(6, 14, None),),
(slice(10, 18, None),)],
dtype=[('0', 'O')])
The tile_bounds_policy parameter specifies whether the halo extended tiles can extend beyond the bounding box defined by the start index array_start and the stop index array_start + array_shape.
Asymmetric halo extensions can also be specified:
>>> split = shape_split(
... [16,],
... 4,
... halo=((1,2),),
... tile_bounds_policy=NO_BOUNDS
... )
>>> split.shape
(4,)
>>> split
array([(slice(-1, 6, None),), (slice(3, 10, None),), (slice(7, 14, None),),
(slice(11, 18, None),)],
dtype=[('0', 'O')])
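The halo outputs above can be reproduced with a short per-axis sketch: each tile is widened by the negative-direction and positive-direction halo and, when the tiles must stay within the array, the result is clipped to the range 0 to length. The function extend_with_halo and its clip_length argument are hypothetical; passing clip_length mimics the ARRAY_BOUNDS behaviour shown above and omitting it mimics NO_BOUNDS.

```python
def extend_with_halo(bounds, halo, clip_length=None):
    """Widen (start, stop) pairs by halo=(negative, positive) elements.

    If clip_length is given, extents are truncated to [0, clip_length].
    """
    lo, hi = halo
    extended = []
    for beg, end in bounds:
        beg, end = beg - lo, end + hi
        if clip_length is not None:
            beg, end = max(beg, 0), min(end, clip_length)
        extended.append((beg, end))
    return extended


tiles = [(0, 4), (4, 8), (8, 12), (12, 16)]
print(extend_with_halo(tiles, (2, 2), clip_length=16))
# [(0, 6), (2, 10), (6, 14), (10, 16)]
print(extend_with_halo(tiles, (1, 2)))
# [(-1, 6), (3, 10), (7, 14), (11, 18)]
```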
For an N dimensional split (i.e. N = len(array_shape)), the halo parameter can be one of:
- scalar
- Tiles are extended by halo elements in the negative and positive directions for all axes.
- 1D sequence
- Tiles are extended by halo[a] elements in the negative and positive directions for axis a.
- 2D sequence
- Tiles are extended by halo[a][0] elements in the negative direction and halo[a][1] in the positive direction for axis a.
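The three accepted forms can all be reduced to one (negative, positive) pair per axis. The sketch below shows one way to do that normalisation in plain Python; halo_to_pairs is a hypothetical name, and the sketch is not the implementation of the package's own convert_halo_to_array_form function listed in the API reference.

```python
def halo_to_pairs(halo, ndim):
    """Normalise a scalar, 1D or 2D halo into ndim (negative, positive) pairs."""
    if isinstance(halo, int):
        # Scalar: same halo in both directions on every axis.
        return [(halo, halo)] * ndim
    pairs = []
    for axis_halo in halo:
        if isinstance(axis_halo, int):
            # 1D sequence entry: same halo in both directions for this axis.
            pairs.append((axis_halo, axis_halo))
        else:
            # 2D sequence entry: explicit (negative, positive) halo.
            lo, hi = axis_halo
            pairs.append((lo, hi))
    if len(pairs) != ndim:
        raise ValueError("expected %d per-axis halos, got %d" % (ndim, len(pairs)))
    return pairs


print(halo_to_pairs(1, 3))                         # [(1, 1), (1, 1), (1, 1)]
print(halo_to_pairs((1, 2, 3), 3))                 # [(1, 1), (2, 2), (3, 3)]
print(halo_to_pairs(((1, 2), (3, 4), (5, 6)), 3))  # [(1, 2), (3, 4), (5, 6)]
```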
For example, in 3D:
>>> split = shape_split(
... [16, 8, 8],
... 2,
... halo=1, # halo=1 in +ve and -ve directions for all axes
... tile_bounds_policy=NO_BOUNDS
... )
>>> split.shape
(2, 1, 1)
>>> split
array([[[(slice(-1, 9, None), slice(-1, 9, None), slice(-1, 9, None))]],
[[(slice(7, 17, None), slice(-1, 9, None), slice(-1, 9, None))]]],
dtype=[('0', 'O'), ('1', 'O'), ('2', 'O')])
>>> split = shape_split(
... [16, 8, 8],
... 2,
... halo=(1, 2, 3), # halo=1 for axis 0, halo=2 for axis 1, halo=3 for axis=2
... tile_bounds_policy=NO_BOUNDS
... )
>>> split.shape
(2, 1, 1)
>>> split
array([[[(slice(-1, 9, None), slice(-2, 10, None), slice(-3, 11, None))]],
[[(slice(7, 17, None), slice(-2, 10, None), slice(-3, 11, None))]]],
dtype=[('0', 'O'), ('1', 'O'), ('2', 'O')])
>>> split = shape_split(
... [16, 8, 8],
... 2,
... halo=((1, 2), (3, 4), (5, 6)), # halo=1 for -ve axis 0, halo=2 for +ve axis 0
... # halo=3 for -ve axis 1, halo=4 for +ve axis 1
... # halo=5 for -ve axis 2, halo=6 for +ve axis 2
... tile_bounds_policy=NO_BOUNDS
... )
>>> split.shape
(2, 1, 1)
>>> split
array([[[(slice(-1, 10, None), slice(-3, 12, None), slice(-5, 14, None))]],
[[(slice(7, 18, None), slice(-3, 12, None), slice(-5, 14, None))]]],
dtype=[('0', 'O'), ('1', 'O'), ('2', 'O')])
The array_split Package¶
Python package for splitting a numpy.ndarray (or just an array shape) into a number of sub-arrays.
The two main functions are:
- array_split.array_split()
- Similar to numpy.array_split(), returns a list of sub-array views of the input numpy.ndarray. Can split along multiple axes and has more splitting criteria (parameters) than numpy.array_split().
- array_split.shape_split()
- Instead of taking a numpy.ndarray as an argument, it takes the array shape and returns tuples of slice objects which indicate the extents of the sub-arrays.
These two functions use an instance of the array_split.ShapeSplitter class which contains the bulk of the split implementation. Instances of array_split.ShapeSplitter also maintain state related to the computed split.
Splitting of multi-dimensional arrays can be performed according to several criteria:
Per-axis indices indicating the cut positions.
Per-axis number of sub-arrays.
Total number of sub-arrays (with optional per-axis number of sections constraints).
Specific sub-array shape.
Specification of halo (ghost) elements for sub-arrays.
Arbitrary start index for the shape to be partitioned.
Maximum number of bytes for a sub-array with constraints:
- sub-arrays are an exact (integer) multiple of a specified sub-tile shape
- upper limit on the per-axis sub-array shape
The usage documentation is given in the Examples section.
Classes and Functions¶
shape_split (array_shape, *args, **kwargs) | Splits specified array_shape in tiles, returns array of slice tuples. |
array_split (ary[, indices_or_sections, ...]) | Splits the specified array ary into sub-arrays, returns list of numpy.ndarray. |
ShapeSplitter (array_shape[, ...]) | Implements array shape splitting. |
The array_split.split Module¶
Defines array splitting functions and classes.
Classes and Functions¶
shape_factors (n[, dim]) | Returns a numpy.ndarray of factors f such that (len(f) == dim) and (numpy.product(f) == n). |
calculate_num_slices_per_axis (...[, ...]) | Returns a numpy.ndarray (return_array say) where non-positive elements of |
calculate_tile_shape_for_max_bytes (...[, ...]) | Returns a tile shape tile_shape such that numpy.product(tile_shape)*numpy.sum(array_itemsize) <= max_tile_bytes. |
convert_halo_to_array_form (halo, ndim) | Converts the halo argument to a (ndim, 2) shaped array. |
ShapeSplitter (array_shape[, ...]) | Implements array shape splitting. |
shape_split (array_shape, *args, **kwargs) | Splits specified array_shape in tiles, returns array of slice tuples. |
array_split (ary[, indices_or_sections, ...]) | Splits the specified array ary into sub-arrays, returns list of numpy.ndarray. |
Attributes¶
array_split.split.ARRAY_BOUNDS = <property object>¶
Indicates that tiles are always within the array bounds, resulting in tiles which have truncated halos. See The halo parameter examples.
array_split.split.NO_BOUNDS = <property object>¶
Indicates that tiles may have halos which extend beyond the array bounds. See The halo parameter examples.
Utilities¶
is_scalar (obj) | Returns True if argument obj is a numeric type. |
is_sequence (obj) | Returns True if argument obj is a sequence (e.g. |
is_indices (indices_or_sections) | Test for the indices_or_sections argument of ShapeSplitter.__init__() to determine whether it is specifying total number of tiles or sequence of cut indices. |
pad_with_object (sequence, new_length[, obj]) | Returns sequence list end-padded with obj elements so that the length of the returned list equals new_length. |
pad_with_none (sequence, new_length) | Returns sequence list end-padded with None elements so that the length of the returned list equals new_length. |
The array_split.split_test Module¶
Module defining array_split.split unit-tests.
Execute as:
python -m array_split.split_test
Classes¶
SplitTest ([methodName]) | Tests for array_split.split module. |
The array_split.tests Module¶
Module for running all array_split unit-tests, including unittest test-cases and doctest tests for module doc-strings and sphinx (RST) documentation.
Execute as:
python -m array_split.tests
Classes and Functions¶
MultiPlatformAnd23Checker | Overrides the doctest.OutputChecker.check_output() method |
DocTestTestSuite () | Adds array_split doctests as unittest.TestCase objects. |
load_tests (loader, tests, pattern) | Loads array_split.split_test tests and DocTestTestSuite tests. |
The array_split.logging Module¶
Default initialisation of python logging.
Some simple wrappers of python built-in logging module for array_split logging.
Classes and Functions¶
SplitStreamHandler ([outstr, errstr, splitlevel]) | A python logging.handlers Handler class for splitting logging messages to different streams depending on the logging-level. |
initialise_loggers (names[, log_level, ...]) | Initialises specified loggers to generate output at the specified logging level. |
get_formatter ([prefix_string]) | Returns logging.Formatter object which produces messages with time and prefix_string prefix. |
The array_split.unittest Module¶
Some simple wrappers of python built-in unittest module for array_split unit-tests.
Classes and Functions¶
main (module_name[, log_level, init_logger_names]) | Small wrapper for unittest.main() which initialises logging.Logger objects. |
TestCase ([methodName]) | Extends unittest.TestCase with the assertArraySplitEqual(). |
The array_split.license Module¶
License and copyright info.
License¶
Copyright (C) 2017 The Australian National University.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Copyright¶
Copyright (C) 2017 The Australian National University.
Functions¶
license () | Returns the array_split license string. |
copyright () | Returns the array_split copyright string. |
version () | Returns array_split version string. |
Welcome! We appreciate your interest in contributing to array_split.
If you haven’t done so already, check out the README.
How to contribute¶
Workflow¶
The preferred workflow for contributing to array_split is to fork the array_split repository on GitHub, clone, and develop on a branch. Steps:
Fork the array_split repository by clicking on the ‘Fork’ button near the top right of the page. This creates a copy of the code under your GitHub user account. For more details on how to fork a repository see this guide.
Clone your fork of the array_split repo from your GitHub account to your local disk:
$ git clone git@github.com:YourLogin/array_split.git
$ cd array_split
Create a feature branch to hold your development changes:
$ git checkout -b my-feature
Always use a feature branch. It’s good practice to never work on the master branch!
Develop the feature on your feature branch. Add changed files using git add and then git commit files:
$ git add modified_files
$ git commit
to record your changes in Git, then push the changes to your GitHub account with:
$ git push -u origin my-feature
Follow these instructions to create a pull request from your fork. This will send an email to the committers.
(If any of the above seems like magic to you, please look up the Git documentation online.)
Coding Guidelines¶
Unit test new code using python unittest framework.
Ensure unittest coverage is good (>90%) by using the coverage tool:
$ coverage run --source=array_split --omit='*logging*,*unittest*,*rtd*' -m array_split.tests
$ coverage report -m
Ensure style compliance by using autopep8 and flake8:
$ autopep8 -r -i -a --max-line-length=100 array_split
$ flake8 array_split
Use docstrings for API documentation and ensure that it builds with sphinx (without warnings) and renders correctly:
$ python setup.py build_sphinx
produces top level html file docs/_build/html/index.html.
Code of Conduct¶
array_split adheres to the Python Code Quality Authority’s Code of Conduct.