Tutorial

MArray is a package for adding mask capabilities to your favorite Python array library that is compatible with the Python array API standard. The motivation for masked arrays is discussed in “What is a masked array?”.

MArray is easy to install with pip, and it has no required dependencies. (However, array_api_compat is needed when using backends that are not yet array API compatible, such as PyTorch and CuPy.)

# !pip install marray

The rest of the tutorial will assume that we want to add masks to NumPy arrays. Note that this is different from using NumPy’s built-in masked arrays from the numpy.ma namespace because numpy.ma is not compatible with the array API standard. Even the base NumPy namespace is not Array API compatible in versions of NumPy prior to 2.0, so we will install a recent version of NumPy to work with.

# !pip install --upgrade numpy

To create a version of the NumPy namespace with mask support, use Python’s from...import...as syntax.

from marray import numpy as mxp

For cases in which this syntax would fail, use marray’s only public function: marray.masked_namespace.

# from array_api_compat import numpy as np
# import marray
# mxp = marray.masked_namespace(np)

mxp exposes all the features of NumPy that are specified in the Array API standard, but adds mask support to them. For example:

x = mxp.arange(3)
x

MArray(array([0, 1, 2]), array([False, False, False]))

Just as numpy.arange(3) would create a regular NumPy array with elements [0, 1, 2], mxp.arange(3) creates an MArray object with these elements. The elements are accessible via the data attribute.

x.data

array([0, 1, 2])

The difference is that the MArray also has a mask, available via the mask attribute.

x.mask

array([False, False, False])

Because all of the elements of the mask are False, this MArray will behave just like a regular NumPy array. That’s boring. Let’s create an array with a nontrivial mask. To do that, we’ll use mxp.asarray.

x = mxp.asarray([1, 2, 3, 4], mask=[False, True, False, True])
x

MArray(array([1, _, 3, _]), array([False,  True, False,  True]))

marray is intended to be a very thin wrapper around the underlying array library. Just as it has only one public function (masked_namespace), it makes only one modification to the signature of a wrapped library function, which we’ve used above: it adds a keyword-only mask argument to the asarray function.

Let’s see how the mask changes the behavior of common functions.

Statistical functions

For reducing functions, masked elements are ignored; the result is the same as if the masked elements were not in the array.

mxp.max(x)  # 4 was masked

MArray(array(3), array(False))

mxp.sum(x)  # 2 and 4 were masked

MArray(array(4), array(False))
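To see why these are the expected results, the same reductions can be written in plain NumPy by dropping the masked elements first. This is a sketch of the semantics only. It makes no assumption about how MArray implements reductions internally, and the plain arrays data and mask stand in for x.data and x.mask.

```python
import numpy as np

data = np.asarray([1, 2, 3, 4])
mask = np.asarray([False, True, False, True])

# Boolean indexing with ~mask keeps only the unmasked elements.
kept = data[~mask]  # array([1, 3])

# Reducing over the kept elements matches mxp.max(x) and mxp.sum(x) above.
max_value = np.max(kept)  # 3
sum_value = np.sum(kept)  # 4
```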

For the only non-reducing statistical function, cumulative_sum, masked elements do not contribute to the cumulative sum.

mxp.cumulative_sum(x)

MArray(array([1, _, 4, _]), array([False,  True, False,  True]))

Note that the elements at indices where the original array was masked remain masked. Because of the limitations of the underlying array library, data always holds some value at each masked position, but these values should be considered meaningless.
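A plain-NumPy way to express "masked elements do not contribute" for cumulative_sum is to replace them with 0, the additive identity, before accumulating, and then carry the input mask over to the result. This is an illustrative sketch. The placeholder values it leaves at masked positions need not match the ones MArray actually stores in data.

```python
import numpy as np

data = np.asarray([1, 2, 3, 4])
mask = np.asarray([False, True, False, True])

# Zero out masked elements so they add nothing to the running total.
running = np.cumsum(np.where(mask, 0, data))  # array([1, 1, 4, 4])

# The result keeps the input's mask; only the unmasked entries are meaningful.
result_mask = mask.copy()
meaningful = running[~result_mask]  # array([1, 4]), as in MArray([1, _, 4, _])
```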

Utility functions

all and any work like the reducing statistical functions: masked elements are ignored.

x = mxp.asarray([False, False, False, True], mask=[False, True, False, True])
mxp.all(x)

MArray(array(False), array(False))

mxp.any(x)

MArray(array(False), array(False))

Is that last result surprising? Although there is one True in x.data, it is ignored when computing any because it is masked.
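The same answer falls out of applying NumPy's any and all to the unmasked elements alone. The sketch below uses plain NumPy arrays as stand-ins for x.data and x.mask.

```python
import numpy as np

data = np.asarray([False, False, False, True])
mask = np.asarray([False, True, False, True])

# The only True sits at a masked position, so it is dropped before reducing.
kept = data[~mask]  # array([False, False])
any_result = bool(np.any(kept))  # False
all_result = bool(np.all(kept))  # False
```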

You may have noticed that the mask of the result has been False in every one of these examples of reducing functions. This is always the case unless all elements of the array are masked. In that case, the function must still return a 0D array for a 1D input, but there is no universally accepted result when every element is masked (what is the maximum of an empty set?), so the result itself is masked.

x = mxp.asarray(x.data, mask=True)
mxp.any(x).mask

array(True)
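Plain NumPy shows why an all-masked reduction has no single right answer. Once every element is dropped, any and all fall back on their identity values (False and True), while max has no identity and raises an error. Returning a masked result avoids having to choose between these conventions. This illustrates the problem with NumPy and is not a description of MArray's internals.

```python
import numpy as np

data = np.asarray([False, False, False, True])
mask = np.ones(4, dtype=bool)  # every element is masked
kept = data[~mask]  # an empty array

# Reductions with an identity element return that identity for empty input.
empty_any = bool(np.any(kept))  # False
empty_all = bool(np.all(kept))  # True

# max has no identity, so NumPy refuses to reduce an empty array.
try:
    np.max(kept)
    max_raised = False
except ValueError:
    max_raised = True
```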

Sorting functions

The sorting functions treat masked values as undefined and, by convention, place them at the end of the returned array.

data = [8, 3, 4, 1, 9, 9, 5, 5]
mask = [0, 0, 1, 0, 1, 1, 0, 0]
x = mxp.asarray(data, mask=mask)
mxp.sort(x)

MArray(
    array([1, 3, 5, 5, 8, _, _, _]),
    array([False, False, False, False, False,  True,  True,  True])
)

If you inspect the data attribute of this result, you may find unexpectedly large numbers at the masked positions. Where did they come from? We emphasize again: the data corresponding to masked elements should be considered meaningless. These values are just placeholders that allow us to respect the mask while doing array operations efficiently.
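One way to get this behavior is to overwrite masked elements with the largest value their dtype can hold so that they sort to the end, and then mark the last positions as masked. The plain-NumPy sketch below follows that approach. It is an assumption about the general technique, not MArray's exact implementation. It does suggest where large placeholder values can come from, and why an unmasked element equal to the dtype's maximum would be hard to tell apart from a placeholder.

```python
import numpy as np

data = np.asarray([8, 3, 4, 1, 9, 9, 5, 5])
mask = np.asarray([0, 0, 1, 0, 1, 1, 0, 0], dtype=bool)

# Fill masked positions with the dtype's maximum so they sort last.
sentinel = np.iinfo(data.dtype).max
sorted_data = np.sort(np.where(mask, sentinel, data))

# Every masked element ends up in the last mask.sum() positions.
n_masked = int(mask.sum())
sorted_mask = np.arange(data.size) >= data.size - n_masked
unmasked_sorted = sorted_data[~sorted_mask]  # array([1, 3, 5, 5, 8])
```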

i = mxp.argsort(x)
i

MArray(
    array([3, 1, 6, 7, 0, 2, 4, 5]),
    array([False, False, False, False, False, False, False, False])
)

Is it surprising that the mask of the array returned by argsort is all False? These are the indices that transform the original array into the sorted result, and every index is well defined, even the ones that point at masked elements. We can confirm that, when used as a regular unmasked index (i.data), these indices sort the array and keep the right elements masked.

y = x[i.data]
y

MArray(
    array([1, 3, 5, 5, 8, _, _, _]),
    array([False, False, False, False, False,  True,  True,  True])
)
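The argsort result can be reproduced with a placeholder approach, assuming (for illustration only) that masked elements are replaced by the dtype's maximum before a stable argsort. The indices come out as ordinary integers, which is consistent with the all-False mask above. Applying the same permutation to both the data and the mask gives the sorted array with the masked elements at the end.

```python
import numpy as np

data = np.asarray([8, 3, 4, 1, 9, 9, 5, 5])
mask = np.asarray([0, 0, 1, 0, 1, 1, 0, 0], dtype=bool)

# Assumed placeholder: the dtype's maximum, so masked elements sort last.
filled = np.where(mask, np.iinfo(data.dtype).max, data)

# A stable sort keeps ties (including the placeholders) in their original order.
order = np.argsort(filled, kind="stable")  # array([3, 1, 6, 7, 0, 2, 4, 5])

# The same permutation applied to data and mask reproduces the sorted MArray.
sorted_data, sorted_mask = data[order], mask[order]
```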

Gotcha: Sorting is not supported when the non-masked data includes the maximum value of the data’s dtype (the minimum when descending=True).

z = mxp.asarray(x, mask=mask, dtype=mxp.uint8)
z[0] = 2**8 - 1
# mxp.sort(z)
# NotImplementedError: The maximum value of the data's dtype is included in the non-masked data; this complicates sorting when masked values are present.
# Consider promoting to another dtype to use `sort`.

It is often possible to sidestep this limitation by using a different dtype for the sorting, then converting back to the original type.

z = mxp.astype(z, mxp.uint16)
z_sorted = mxp.sort(z)
z_sorted = mxp.astype(z_sorted, mxp.uint8)
z_sorted

MArray(
    array([  1,   3,   5,   5, 255,   _,   _,   _], dtype=uint8),
    array([False, False, False, False, False,  True,  True,  True])
)
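The promotion trick works because the problem is tied to the dtype's maximum. The plain-NumPy sketch below assumes, for illustration, that masked elements are replaced by the dtype's maximum before sorting. In uint8 the unmasked 255 equals np.iinfo(np.uint8).max. In uint16 the maximum is 65535, which no value converted from uint8 can reach. Casting back afterward is lossless for the unmasked values because they all fit in uint8.

```python
import numpy as np

z = np.asarray([255, 3, 4, 1, 9, 9, 5, 5], dtype=np.uint8)
mask = np.asarray([0, 0, 1, 0, 1, 1, 0, 0], dtype=bool)

# In uint8, the unmasked 255 collides with the dtype's maximum.
collides = bool(np.any(z[~mask] == np.iinfo(np.uint8).max))  # True

# In uint16, the maximum (65535) is out of reach of any former uint8 value.
z16 = z.astype(np.uint16)
sentinel = np.iinfo(np.uint16).max
sorted16 = np.sort(np.where(mask, sentinel, z16))

# Keep the unmasked prefix and convert it back to the original dtype.
n_unmasked = int((~mask).sum())
result = sorted16[:n_unmasked].astype(np.uint8)  # [1, 3, 5, 5, 255] as uint8
```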

Set functions

Masked elements are treated as distinct from all non-masked elements but equivalent to all other masked elements.

res = mxp.unique_counts(x)

res.values

MArray(array([1, 3, 5, 8, _]), array([False, False, False, False,  True]))

res.counts

MArray(array([1, 1, 2, 1, 3]), array([False, False, False, False, False]))
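These counts can be reproduced in plain NumPy by counting the unmasked values and then treating all masked elements as one more distinct value, which matches the rule above. This sketch only illustrates the semantics. It is not how MArray is implemented.

```python
import numpy as np

data = np.asarray([8, 3, 4, 1, 9, 9, 5, 5])
mask = np.asarray([0, 0, 1, 0, 1, 1, 0, 0], dtype=bool)

# Unique values and their counts among the unmasked elements only.
values, counts = np.unique(data[~mask], return_counts=True)

# All masked elements count as a single extra value, listed last.
n_masked = int(mask.sum())
if n_masked:
    counts = np.append(counts, n_masked)
# values: [1, 3, 5, 8]; counts: [1, 1, 2, 1, 3]
```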

Gotcha: set functions have the same limitation as the sorting functions: the non-masked data may not include the maximum value of the data’s dtype.

Manipulation functions

Manipulation functions perform the same operation on the data and the mask.

mxp.flip(y)

MArray(
    array([ _, _, _, 8, 5, 5, 3, 1]),
    array([ True,  True,  True, False, False, False, False, False])
)

mxp.stack([y, y])

MArray(
    array([[1, 3, 5, 5, 8, _, _, _],
           [1, 3, 5, 5, 8, _, _, _]]),
    array([[False, False, False, False, False,  True,  True,  True],
           [False, False, False, False, False,  True,  True,  True]])
)
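Because the rule is "do the same thing to both arrays", a manipulation can be sketched with plain NumPy by applying it to the data and the mask separately. The zeros at the masked positions below are arbitrary stand-ins for the meaningless placeholder data.

```python
import numpy as np

data = np.asarray([1, 3, 5, 5, 8, 0, 0, 0])  # last three entries are placeholders
mask = np.asarray([False] * 5 + [True] * 3)

# Flip: apply the same reversal to data and mask so they stay aligned.
flipped_data, flipped_mask = np.flip(data), np.flip(mask)

# Stack: stack the data arrays and the mask arrays in the same way.
stacked_data = np.stack([data, data])
stacked_mask = np.stack([mask, mask])
```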

Creation functions

Most creation functions create arrays with an all-False mask.

mxp.eye(3)

MArray(
    array([[1., 0., 0.],
           [0., 1., 0.],
           [0., 0., 1.]]),
    array([[False, False, False],
           [False, False, False],
           [False, False, False]])
)

Exceptions include the _like functions, which preserve the mask of the array argument.

mxp.zeros_like(y)

MArray(
    array([0, 0, 0, 0, 0, _, _, _]),
    array([False, False, False, False, False,  True,  True,  True])
)

tril and triu also preserve the mask within the retained triangle of the argument; elements outside that triangle are set to zero and are not masked.

import numpy as xp

data = xp.ones((3, 3))
mask = xp.zeros_like(data, dtype=bool)  # boolean mask, initially all unmasked
mask[0, -1] = True
mask[-1, 0] = True
A = mxp.asarray(data, mask=mask)
A

MArray(
    array([[1., 1.,  _],
           [1., 1., 1.],
           [ _, 1., 1.]]),
    array([[False, False,  True],
           [False, False, False],
           [ True, False, False]])
)

mxp.tril(A)

MArray(
    array([[1., 0., 0.],
           [1., 1., 0.],
           [ _, 1., 1.]]),
    array([[False, False, False],
           [False, False, False],
           [ True, False, False]])
)
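In plain NumPy, the same result follows from applying np.tril to the data and to the boolean mask separately. On the data it zeroes the entries above the diagonal. On the mask it sets those entries to False. That is why the masked element at the top right becomes an unmasked 0 while the one at the bottom left stays masked.

```python
import numpy as np

data = np.ones((3, 3))
mask = np.zeros((3, 3), dtype=bool)
mask[0, -1] = True
mask[-1, 0] = True

# tril keeps the lower triangle and fills the rest with zero (False for a bool mask).
tril_data = np.tril(data)
tril_mask = np.tril(mask)
```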

Searching functions

As with the statistical functions, masked elements are treated as if they did not exist.

x[[1, -1]] = 0  # add some zeros
x  # let's remember what `x` looks like

MArray(
    array([8, 0, _, 1, _, _, 5, 0]),
    array([False, False,  True, False,  True,  True, False, False])
)

mxp.argmax(x)  # 9 is masked, so 8 (at index 0) is the largest element

MArray(array(0), array(False))

i = mxp.nonzero(x)  # Only elements at these indices are nonzero *and* not masked
i

(MArray(array([0, 3, 6]), array([False, False, False])),)

The correct behavior of indexing with a masked array is ambiguous, so use only regular, unmasked arrays for indexing.

indices = i[0].data
x[indices]  # nonzero, not masked

MArray(array([8, 1, 5]), array([False, False, False]))
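Both results above can be sketched with plain NumPy. For argmax, masked elements are given the smallest value the dtype can hold so that they are never selected. This placeholder choice is an assumption for illustration, and the sketch presumes at least one unmasked element. For nonzero, an element must be both nonzero and unmasked.

```python
import numpy as np

data = np.asarray([8, 0, 4, 1, 9, 9, 5, 0])
mask = np.asarray([0, 0, 1, 0, 1, 1, 0, 0], dtype=bool)

# argmax: masked elements get the dtype's minimum, so an unmasked element wins.
i_max = int(np.argmax(np.where(mask, np.iinfo(data.dtype).min, data)))  # 0

# nonzero: keep indices that are nonzero *and* unmasked.
(i_nonzero,) = np.nonzero((data != 0) & ~mask)  # array([0, 3, 6])
picked = data[i_nonzero]  # array([8, 1, 5])
```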

Elementwise functions

Elementwise functions (and operators) simply perform the requested operation on the data.

For unary functions, the mask of the result is the mask of the argument.

x = xp.linspace(0, 2*xp.pi, 5)
x = mxp.asarray(x, mask=(x > xp.pi))
x

MArray(
    array([0.        , 1.57079633, 3.14159265,          _,          _]),
    array([False, False, False,  True,  True])
)

-x

MArray(
    array([-0.        , -1.57079633, -3.14159265,           _,           _]),
    array([False, False, False,  True,  True])
)

mxp.round(mxp.sin(x))

MArray(
    array([0., 1., 0.,  _,  _]),
    array([False, False, False,  True,  True])
)

For binary functions and operators, the mask of the result is the logical OR of the masks of the arguments: an element of the result is masked if the corresponding element of either argument is masked.

x = mxp.asarray([1, 2, 3, 4], mask=[1, 0, 1, 0])
y = mxp.asarray([5, 6, 7, 8], mask=[1, 1, 0, 0])
x + y

MArray(array([ _,  _,  _, 12]), array([ True,  True,  True, False]))

mxp.pow(y, x)

MArray(array([   _,    _,    _, 4096]), array([ True,  True,  True, False]))
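In plain NumPy terms, a binary operation combines the data as usual and combines the masks with logical OR. The sketch below uses separate data and mask arrays as stand-ins for the two MArrays above.

```python
import numpy as np

x_data, x_mask = np.asarray([1, 2, 3, 4]), np.asarray([1, 0, 1, 0], dtype=bool)
y_data, y_mask = np.asarray([5, 6, 7, 8]), np.asarray([1, 1, 0, 0], dtype=bool)

# The result is masked wherever either argument is masked.
out_mask = np.logical_or(x_mask, y_mask)  # [True, True, True, False]

# The operation itself is applied to the data elementwise.
sum_data = x_data + y_data    # only index 3 (4 + 8 = 12) is meaningful
pow_data = y_data ** x_data   # only index 3 (8 ** 4 = 4096) is meaningful
```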

Note that numpy.ma automatically masks non-finite elements (such as inf and nan) produced during calculations.

import numpy

x = numpy.ma.masked_array(0, mask=False)
with numpy.errstate(divide='ignore', invalid='ignore'):
    y = [1, 0] / x
y

masked_array(data=[--, --],
             mask=[ True,  True],
       fill_value=1e+20,
            dtype=float64)

MArray does not follow this convention.

x = mxp.asarray(0, mask=False)
with numpy.errstate(divide='ignore', invalid='ignore'):
    y = [1, 0] / x
y

MArray(array([inf, nan]), array([False, False]))

This is because masked elements are often used to represent missing data, and the results of these operations are not missing. If this does not suit your needs, mask out data according to your requirements after performing the operation.

x = mxp.asarray(0, mask=False)
with numpy.errstate(divide='ignore', invalid='ignore'):
    y = [1, 0] / x
mxp.asarray(y.data, mask=xp.isnan(y.data))

MArray(array([inf,   _]), array([False,  True]))
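To mirror numpy.ma's convention more closely by masking infinities as well as NaNs, you can build the mask with ~isfinite instead of isnan. A plain-NumPy comparison of the two choices:

```python
import numpy as np

with np.errstate(divide="ignore", invalid="ignore"):
    y = np.asarray([1.0, 0.0]) / 0.0  # array([inf, nan])

# Mask only NaN, as in the example above.
mask_nan = np.isnan(y)  # [False, True]

# Mask every non-finite value (inf, -inf and nan), as numpy.ma does.
mask_nonfinite = ~np.isfinite(y)  # [True, True]
```

Either boolean array can then be passed as the mask argument of mxp.asarray.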

Linear algebra functions

As usual, linear algebra functions and operators treat masked elements as though they don’t exist.

x = mxp.asarray([1, 2, 3, 4], mask=[1, 0, 1, 0])
y = mxp.asarray([5, 6, 7, 8], mask=[1, 1, 0, 0])
x @ y  # the last elements of the arrays, 4 and 8, are the only non-masked elements

MArray(array(32), array(False))
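For a dot product, treating masked elements "as though they don't exist" amounts to letting them contribute nothing to the sum of products. The plain-NumPy sketch below illustrates this equivalence by replacing masked entries with 0 before multiplying. It does not show how MArray computes the product internally.

```python
import numpy as np

x_data, x_mask = np.asarray([1, 2, 3, 4]), np.asarray([1, 0, 1, 0], dtype=bool)
y_data, y_mask = np.asarray([5, 6, 7, 8]), np.asarray([1, 1, 0, 0], dtype=bool)

# A product term with either factor masked becomes 0 and drops out of the sum.
x_filled = np.where(x_mask, 0, x_data)  # [0, 2, 0, 4]
y_filled = np.where(y_mask, 0, y_data)  # [0, 0, 7, 8]
dot = int(x_filled @ y_filled)  # 4 * 8 = 32
```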

The exception is matrix_transpose, which transposes the data and the mask.

x = mxp.asarray([[1, 2], [3, 4]], mask=[[1, 1], [0, 0]])
x

MArray(
    array([[ _, _],
           [3, 4]]),
    array([[ True,  True],
           [False, False]])
)

mxp.matrix_transpose(x)

MArray(
    array([[ _, 3],
           [ _, 4]]),
    array([[ True, False],
           [ True, False]])
)

Conclusion

While this tutorial is not exhaustive, we hope it is sufficient to allow you to predict the results of operations with MArrays and use them to suit your needs. If you’d like to see this tutorial extended in a particular way, please open an issue!