Tensors

Note

All operations in augpy are asynchronous with respect to the CPU, i.e., function calls initiate work on the GPU and return immediately. For example, CudaTensor.numpy() initiates copying data from device to host memory and returns the array immediately, even though the data has not yet been fully copied.

Use CudaStream.synchronize(), or CudaEvent.record() and CudaEvent.synchronize() to synchronize CPU code with the respective stream or event on the GPU.

However, all work on the GPU is executed sequentially within a CudaStream. You can use augpy functions to “queue up” operations on tensors, so synchronization is only required when interacting with the CPU or another GPU framework.

Data Types

class augpy.DLDataType(code: int, bits: int, lanes: int = 1)

Bases: augpy._augpy.pybind11_object

DLPack data type for CudaTensors.

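Parameters
  • code (int) – type code, see DLDataTypeCode

  • bits (int) – number of bits per element

  • lanes (int) – number of elements for vector types; must be 1 to use with CudaTensor

__init__(self: augpy._augpy.DLDataType, code: int, bits: int, lanes: int = 1) → None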
Return type

None

property bits

Number of bits.

property code

See DLDataTypeCode.

property itemsize

Number of bytes per element with this data type.

property lanes

Number of elements for vector types. Must be 1 to use with CudaTensor.
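For scalar types (lanes = 1), itemsize follows directly from bits. The sketch below assumes the usual DLPack convention of rounding the total bit count of one element up to whole bytes; the helper dl_itemsize is hypothetical and not part of augpy.

```python
def dl_itemsize(bits: int, lanes: int = 1) -> int:
    """Hypothetical helper: bytes per element for a DLPack type."""
    # Total bits of one (possibly vector) element, rounded up to whole bytes.
    return (bits * lanes + 7) // 8


# Scalar types as used by CudaTensor (lanes must be 1).
print(dl_itemsize(8))    # 1 (uint8/int8)
print(dl_itemsize(32))   # 4 (float32)
print(dl_itemsize(64))   # 8 (float64)
```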

class augpy.DLDataTypeCode(arg0: int)

Bases: object

DLPack type code enum.

Members:

kDLInt :

Signed integer.

kDLUInt :

Unsigned integer.

kDLFloat :

Floating point number.

__init__(self: augpy._augpy.DLDataTypeCode, arg0: int) → None
Return type

None

property kDLFloat

Floating point number.

property kDLInt

Signed integer.

property kDLUInt

Unsigned integer.

augpy.to_augpy_dtype(numpy_dtype: Union[type, numpy.dtype]) → augpy._augpy.DLDataType

Translate numpy to augpy data types.

Example:

to_augpy_dtype(numpy.uint8) == augpy.uint8

Parameters

numpy_dtype (Union[type, numpy.dtype]) – numpy type to translate

Return type

DLDataType

augpy.to_numpy_dtype(augpy_dtype: augpy._augpy.DLDataType) → type

Translate augpy to numpy data types.

Example:

to_numpy_dtype(augpy.uint8) == numpy.uint8

Parameters

augpy_dtype (DLDataType) – augpy type to translate

Return type

type

augpy.swap_dtype(dtype: Union[augpy._augpy.DLDataType, type, numpy.dtype]) → Union[augpy._augpy.DLDataType, type]

Translate to and from numpy and augpy data types.

Examples:

swap_dtype(augpy.uint8) == numpy.uint8
swap_dtype(numpy.uint8) == augpy.uint8

Parameters

dtype (Union[DLDataType, type, numpy.dtype]) – type to translate to augpy or numpy

Return type

Union[DLDataType, type]
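The translation functions above can be pictured as a lookup table between NumPy scalar types and DLPack (type code, bits) pairs. The following is a minimal sketch of that idea, not augpy's implementation: it represents augpy types as hypothetical (code name, bits) tuples so it runs without a GPU, and swap_dtype_sketch is an illustrative name. float16 is omitted because it is not yet supported.

```python
import numpy as np

# Hypothetical table: numpy scalar type -> (DLPack type code name, bits).
# It mirrors the dtype pairs documented here; it is not augpy's own table.
_NUMPY_TO_DL = {
    np.int8: ("kDLInt", 8),
    np.int16: ("kDLInt", 16),
    np.int32: ("kDLInt", 32),
    np.int64: ("kDLInt", 64),
    np.uint8: ("kDLUInt", 8),
    np.uint16: ("kDLUInt", 16),
    np.uint32: ("kDLUInt", 32),
    np.uint64: ("kDLUInt", 64),
    np.float32: ("kDLFloat", 32),
    np.float64: ("kDLFloat", 64),
}
_DL_TO_NUMPY = {v: k for k, v in _NUMPY_TO_DL.items()}


def swap_dtype_sketch(dtype):
    """Translate a numpy type to a (code, bits) pair, or back again."""
    if isinstance(dtype, tuple):
        return _DL_TO_NUMPY[dtype]
    # np.dtype(...).type accepts both np.uint8 and np.dtype("uint8").
    return _NUMPY_TO_DL[np.dtype(dtype).type]


print(swap_dtype_sketch(np.uint8))          # ('kDLUInt', 8)
print(swap_dtype_sketch(("kDLFloat", 32)))  # <class 'numpy.float32'>
```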

augpy.to_temp_dtype(dtype: Union[augpy._augpy.DLDataType, type, numpy.dtype]) → Union[augpy._augpy.DLDataType, type]

augpy defines a temp type for each tensor data type. The temp type is used internally for intermediate computations and is sometimes used as the data type of returned values.

This function returns the temp type for the given augpy or numpy data type.

Parameters

dtype (Union[DLDataType, type, numpy.dtype]) – augpy or numpy dtype

Returns

temp type

Return type

Union[DLDataType, type]

augpy.int8

<DLDataType int8>

augpy.int16

<DLDataType int16>

augpy.int32

<DLDataType int32>

augpy.int64

<DLDataType int64>

augpy.uint8

<DLDataType uint8>

augpy.uint16

<DLDataType uint16>

augpy.uint32

<DLDataType uint32>

augpy.uint64

<DLDataType uint64>

augpy.float16

<DLDataType float16>

Warning

Not yet supported.

augpy.float32

<DLDataType float32>

augpy.float64

<DLDataType float64>

CudaTensor

augpy’s tensor class. It is a backwards compatible extension to the DLPack specification.

It supports all the usual operations you would expect from a full-featured tensor class, like complex indexing and slicing:

>>> t = CudaTensor((2, 2, 4), uint8)
>>> t
<CudaTensor shape=(2, 2, 4), device=0, dtype=uint8>
>>> t[1, 1, 3]
<CudaTensor shape=(), device=0, dtype=uint8>
>>> t[-1]
<CudaTensor shape=(2, 4), device=0, dtype=uint8>
>>> t[:,0]
<CudaTensor shape=(2, 4), device=0, dtype=uint8>
>>> t[:, 1:, 1:-1]
<CudaTensor shape=(2, 1, 2), device=0, dtype=uint8>
>>> t[:, :, ::2]
<CudaTensor shape=(2, 2, 2), device=0, dtype=uint8>

Math and comparison operations familiar from NumPy or PyTorch work as well:

>>> t[:] = 0
>>> t.numpy()
array([[[0, 0, 0, 0],
        [0, 0, 0, 0]],

       [[0, 0, 0, 0],
        [0, 0, 0, 0]]], dtype=uint8)
>>> t += 3
>>> t.numpy()
array([[[3, 3, 3, 3],
        [3, 3, 3, 3]],

       [[3, 3, 3, 3],
        [3, 3, 3, 3]]], dtype=uint8)
>>> (5 - t).numpy()
array([[[2, 2, 2, 2],
        [2, 2, 2, 2]],

       [[2, 2, 2, 2],
        [2, 2, 2, 2]]], dtype=uint8)
>>> (t > (t - 1)).numpy()
array([[[1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=uint8)

Note

All operations in augpy are asynchronous, so calling CudaTensor.numpy() will initiate copying data from the device to the host memory. You need to use CudaStream.synchronize(), or CudaEvent.record() and CudaEvent.synchronize() to ensure that data is fully copied before the array is accessed.

Math is saturating in augpy: integer results are clamped to the range of the data type, so integer tensors never overflow or underflow:

>>> t[:] = 40
>>> t.numpy()
array([[[40, 40, 40, 40],
        [40, 40, 40, 40]],

       [[40, 40, 40, 40],
        [40, 40, 40, 40]]], dtype=uint8)
>>> (0 - t).numpy()
array([[[0, 0, 0, 0],
        [0, 0, 0, 0]],

       [[0, 0, 0, 0],
        [0, 0, 0, 0]]], dtype=uint8)
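To predict results on the CPU, saturating arithmetic can be emulated with NumPy by computing in a wider integer type and clamping to the range of the target type. This is a hypothetical reference helper (saturating_op), assuming augpy's results equal the exact result clamped to the data type's range.

```python
import numpy as np


def saturating_op(a, b, op, dtype=np.uint8):
    """Hypothetical CPU reference: apply op exactly, then clamp to dtype."""
    info = np.iinfo(dtype)
    # Compute in int64 so the intermediate result cannot wrap around.
    wide = op(np.asarray(a, dtype=np.int64), np.asarray(b, dtype=np.int64))
    return np.clip(wide, info.min, info.max).astype(dtype)


t = np.full((2, 2, 4), 40, dtype=np.uint8)
print(saturating_op(0, t, np.subtract).max())  # 0 instead of wrapping to 216
print(saturating_op(t, 250, np.add).min())     # 255 instead of wrapping to 34
```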

Broadcasting is also supported:

>>> t1 = CudaTensor((3, 1), uint8)
>>> t1[:] = 0
>>> t2 = CudaTensor((1, 3), uint8)
>>> t2[:] = 0
>>> ((t1 + 3) * (t2 + 4)).numpy()
array([[12, 12, 12],
       [12, 12, 12],
       [12, 12, 12]], dtype=uint8)
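Assuming augpy's broadcasting follows the same rules as NumPy's, the example above has this NumPy counterpart. The arrays are zero-initialized explicitly, since new CudaTensors are uninitialized.

```python
import numpy as np

# Explicit zeros stand in for the zero-initialized CudaTensors.
t1 = np.zeros((3, 1), dtype=np.uint8)
t2 = np.zeros((1, 3), dtype=np.uint8)

# Shapes (3, 1) and (1, 3) broadcast to (3, 3).
print(np.broadcast_shapes(t1.shape, t2.shape))  # (3, 3)

result = (t1 + 3) * (t2 + 4)
print(result.tolist())  # [[12, 12, 12], [12, 12, 12], [12, 12, 12]]
```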

Note

Newly created tensors may appear to be initialized with zeros. However, they may reuse memory from previously deleted tensors, so treat them as uninitialized and zero or otherwise initialize them before use.

class augpy.CudaTensor(shape: List[int], dtype: augpy._augpy.DLDataType = DLDataType(code=kDLUInt, bits=8), device_id: int = 0)

Bases: augpy._augpy.pybind11_object

Create a new, empty tensor on a GPU device.

Parameters
  • shape (List[int]) – shape of the tensor

  • dtype (DLDataType) – data type

  • device_id (int) – CUDA device ID

__init__(self: augpy._augpy.CudaTensor, shape: List[int], dtype: augpy._augpy.DLDataType = DLDataType(code=kDLUInt, bits=8), device_id: int = 0) → None
Return type

None

property byte_offset

Starting offset in bytes for the data pointer.

property dtype

Tensor data type.

fill(*args, **kwargs)
fill(self: CudaTensor, scalar: float) → CudaTensor

Fill the tensor with the given scalar value.

Returns

this tensor

Return type

CudaTensor

fill(self: CudaTensor, other: CudaTensor) → CudaTensor

Copy the given tensor into this tensor.

Returns

this tensor

Return type

CudaTensor

property is_contiguous

True if the tensor is contiguous, i.e., elements are located next to each other in memory.

property itemsize

Size of one element in bytes.

property ndim

Number of dimensions.

numpy(*args, **kwargs)
numpy(self: CudaTensor) → array

Create a new numpy array and start copying data from the device to host memory.

Return type

array

numpy(self: CudaTensor, array: buffer = None) → array

Create a new numpy array from the given buffer and start copying data from the device to host memory.

Parameters

array (buffer) – buffer to create new array from

Return type

array

property ptr

Data pointer.

reshape(self: augpy._augpy.CudaTensor, shape: List[int]) → augpy._augpy.CudaTensor

Return a new tensor that uses the same backing memory with a different shape. The new shape must have the same number of elements. Only contiguous tensors can be reshaped.

Parameters

shape (List[int]) – new shape

Return type

CudaTensor

property shape

Tensor shape.

property size

Number of elements in the tensor.

property strides

Tensor strides, i.e., for each dimension, the number of elements to skip in the flat underlying memory to move to the next element along that dimension.
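For a compact row-major tensor, each stride is the product of the sizes of all following dimensions. The sketch below computes such strides and checks contiguity; the helpers are hypothetical, assume row-major layout with strides counted in elements (as in DLPack), and augpy's own is_contiguous may treat edge cases differently.

```python
def contiguous_strides(shape):
    """Hypothetical helper: row-major strides (in elements) for shape."""
    strides = []
    step = 1
    # Walk dimensions from last to first; each stride is the product
    # of the sizes of all following dimensions.
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def is_contiguous(shape, strides):
    """Hypothetical helper: True if strides match the compact layout."""
    expected = contiguous_strides(shape)
    # Assumption: dimensions of size 1 may have any stride.
    return all(s == e or n == 1 for n, s, e in zip(shape, strides, expected))


print(contiguous_strides((2, 2, 4)))        # (8, 4, 1)
# t[:, :, ::2] reuses the parent's memory but doubles the last stride:
print(is_contiguous((2, 2, 2), (8, 4, 2)))  # False
```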

sum(*args, **kwargs)
sum(self: CudaTensor, upcast: bool = False) → CudaTensor

Sum all values in the tensor.

Parameters

upcast (bool) – if True, the output scalar tensor will be promoted to a more expressive data type to avoid saturation

Returns

sum as scalar tensor

Return type

CudaTensor

sum(self: CudaTensor, axis: int, keepdim: bool = False, upcast: bool = False, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 0) → CudaTensor

Sum all values in the tensor along an axis.

Parameters
  • axis (int) – which axis to sum along

  • keepdim (bool) – keep the summed dimension with size 1

  • upcast (bool) – if True, the output tensor will be promoted to a more expressive data type to avoid saturation

  • out (CudaTensor) – use this tensor as output; it must have the correct shape, and the same data type if upcast is False, otherwise the promoted type

Returns

tensor summed along axis

Return type

CudaTensor
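The effect of upcast and keepdim can be previewed with NumPy. This sketch assumes that a saturating sum of non-negative values equals the exact total clamped to the data type's range (for signed types with mixed signs, step-by-step saturation could give a different result), and it uses float64 to stand in for the promoted type.

```python
import numpy as np

x = np.full((2, 3), 200, dtype=np.uint8)

# Saturating sum (upcast=False): the exact total is clamped to uint8's range.
total = np.clip(x.sum(dtype=np.int64), 0, np.iinfo(np.uint8).max).astype(np.uint8)
print(total)  # 255

# Upcast sum: a wider floating type holds the exact total.
print(x.sum(dtype=np.float64))  # 1200.0

# keepdim=True corresponds to NumPy's keepdims=True.
print(x.sum(axis=1, keepdims=True, dtype=np.int64).shape)  # (2, 1)
print(x.sum(axis=1, dtype=np.int64).shape)                 # (2,)
```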

Creation & Conversion

augpy.cast(*args, **kwargs)
augpy.cast(tensor: CudaTensor, out: CudaTensor, blocks_per_sm: int = 8, threads: int = 0) → CudaTensor

Read values from tensor, cast them to the data type of out and store them there. tensor and out must have the same shape.

Return type

CudaTensor

augpy.cast(tensor: CudaTensor, dtype: DLDataType, blocks_per_sm: int = 8, threads: int = 0) → CudaTensor

Create a new tensor with values from tensor cast to the given data type dtype.

Returns

new tensor with given data type

Return type

CudaTensor

augpy.copy(src: augpy._augpy.CudaTensor, dst: augpy._augpy.CudaTensor, blocks_per_sm: int = 8, threads: int = 0) → augpy._augpy.CudaTensor

Copy src into dst. Supports broadcasting.

Return type

CudaTensor

augpy.empty_like(tensor: augpy._augpy.CudaTensor) → augpy._augpy.CudaTensor

Create a new tensor with the same shape, dtype and on the same device as tensor.

Return type

CudaTensor

augpy.array_to_tensor(*args, **kwargs)
augpy.array_to_tensor(array: buffer, device_id: int = 0) → CudaTensor

Copy a Python buffer into a new tensor on the specified GPU device. This initiates an asynchronous copy from host to device memory.

Return type

CudaTensor

augpy.array_to_tensor(array: buffer, tensor: CudaTensor) → CudaTensor

Copy a Python buffer to a tensor created from the given buffer tensor. This initiates an asynchronous copy from host to device memory.

Return type

CudaTensor

augpy.tensor_to_array(*args, **kwargs)
augpy.tensor_to_array(tensor: CudaTensor) → array

Copy a given tensor to a new numpy array. This initiates an asynchronous copy from device to host memory.

Return type

array

augpy.tensor_to_array(tensor: CudaTensor, array: buffer) → array

Copy a given tensor to a numpy array created from the given buffer array. This initiates an asynchronous copy from device to host memory.

Return type

array

augpy.import_dltensor(tensor_capsule: capsule, name: str) → augpy._augpy.CudaTensor

Import a GPU tensor from another library into augpy.

Parameters
  • tensor_capsule (capsule) – a Python capsule object that contains a DLManagedTensor

  • name (str) – name under which the tensor is stored in the capsule, e.g., “dltensor” for PyTorch

Returns

other tensor wrapped in a CudaTensor

Return type

CudaTensor

augpy.export_dltensor(tensor: object, name: str = 'dltensor', destruct: bool = True) → capsule

Export a GPU tensor to be used by another library.

Parameters
  • tensor (object) – Python-wrapped CudaTensor

  • name (str) – name under which the tensor is stored in the returned capsule, e.g., “dltensor” for PyTorch

  • destruct (bool) – if True, add a destructor to the capsule which will delete the tensor when the capsule is deleted; only set to False if you know what you’re doing

Returns

capsule with exported CudaTensor

Return type

capsule

Functions

augpy.add(*args, **kwargs)
augpy.add(tensor: CudaTensor, scalar: float, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor

Add a scalar value to a tensor.

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.add(tensor1: CudaTensor, tensor2: CudaTensor, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor

Add tensor2 to tensor1.

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.sub(*args, **kwargs)
augpy.sub(tensor: CudaTensor, scalar: float, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor

Subtract a scalar value from a tensor.

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.sub(tensor1: CudaTensor, tensor2: CudaTensor, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor

Subtract tensor2 from tensor1.

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.rsub(tensor: augpy._augpy.CudaTensor, scalar: float, out: augpy._augpy.CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → augpy._augpy.CudaTensor

Subtract a tensor from a scalar value.

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.mul(*args, **kwargs)
augpy.mul(tensor: CudaTensor, scalar: float, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor

Multiply a tensor by a scalar value.

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.mul(tensor1: CudaTensor, tensor2: CudaTensor, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor

Multiply tensor1 by tensor2.

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.div(*args, **kwargs)
augpy.div(tensor: CudaTensor, scalar: float, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor

Divide a tensor by a scalar value.

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.div(tensor1: CudaTensor, tensor2: CudaTensor, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor

Divide tensor1 by tensor2.

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.rdiv(tensor: augpy._augpy.CudaTensor, scalar: float, out: augpy._augpy.CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → augpy._augpy.CudaTensor

Divide a scalar value by a tensor.

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.lt(*args, **kwargs)
augpy.lt(tensor: CudaTensor, scalar: float, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor

Compute tensor < scalar as uint8 tensor, where 1 means the condition is met and 0 otherwise.

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.lt(tensor1: CudaTensor, tensor2: CudaTensor, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor

Compute tensor1 < tensor2 as uint8 tensor, where 1 means the condition is met and 0 otherwise.

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.le(*args, **kwargs)
augpy.le(tensor: CudaTensor, scalar: float, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor

Compute tensor <= scalar as uint8 tensor, where 1 means the condition is met and 0 otherwise.

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.le(tensor1: CudaTensor, tensor2: CudaTensor, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor

Compute tensor1 <= tensor2 as uint8 tensor, where 1 means the condition is met and 0 otherwise.

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.gt(tensor: augpy._augpy.CudaTensor, scalar: float, out: augpy._augpy.CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → augpy._augpy.CudaTensor

Compute tensor > scalar as uint8 tensor, where 1 means the condition is met and 0 otherwise.

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.ge(tensor: augpy._augpy.CudaTensor, scalar: float, out: augpy._augpy.CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → augpy._augpy.CudaTensor

Compute tensor >= scalar as uint8 tensor, where 1 means the condition is met and 0 otherwise.

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.eq(*args, **kwargs)
augpy.eq(tensor: CudaTensor, scalar: float, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor

Compute tensor == scalar as uint8 tensor, where 1 means the condition is met and 0 otherwise.

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.eq(tensor1: CudaTensor, tensor2: CudaTensor, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor

Compute tensor1 == tensor2 as uint8 tensor, where 1 means the condition is met and 0 otherwise.

Returns

new tensor if out is None, else out

Return type

CudaTensor
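The comparison functions return uint8 masks rather than booleans. A NumPy equivalent of the element-wise results, with the boolean arrays converted explicitly, looks like this (a CPU sketch, not augpy code):

```python
import numpy as np

a = np.array([1, 5, 9], dtype=np.float32)
b = np.array([5, 5, 5], dtype=np.float32)

# NumPy comparisons yield booleans; convert them to the documented
# uint8 convention of 1 for "condition met" and 0 otherwise.
lt_mask = (a < b).astype(np.uint8)
le_mask = (a <= b).astype(np.uint8)
eq_mask = (a == 5).astype(np.uint8)

print(lt_mask.tolist())  # [1, 0, 0]
print(le_mask.tolist())  # [1, 1, 0]
print(eq_mask.tolist())  # [0, 1, 0]
```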

augpy.fma(scalar: float, tensor1: augpy._augpy.CudaTensor, tensor2: augpy._augpy.CudaTensor, out: augpy._augpy.CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → augpy._augpy.CudaTensor

Compute a fused multiply-add on a scalar and two tensors, i.e.,

\[r = s \cdot t_1 + t_2\]

If tensor1 has an unsigned integer data type, then tensor2 must have the signed version of the same type, e.g., a uint8 tensor must be paired with an int8 tensor.

Returns

new tensor if out is None, else out

Return type

CudaTensor
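A CPU reference for fma can help when checking results. The sketch below assumes the operation is r = s · t1 + t2, as the name fused multiply-add suggests, that the result takes tensor1's data type, and that values are rounded and saturated to that type's range; the rounding mode in particular is an assumption. fma_reference is a hypothetical name and only handles an integer tensor1.

```python
import numpy as np


def fma_reference(scalar, tensor1, tensor2):
    """Hypothetical CPU reference: scalar * tensor1 + tensor2, saturated.

    Assumes the result takes tensor1's (integer) dtype and is clamped
    to its range after rounding to the nearest integer.
    """
    info = np.iinfo(tensor1.dtype)
    # Compute in float64 so neither the product nor the sum can wrap.
    exact = scalar * tensor1.astype(np.float64) + tensor2.astype(np.float64)
    return np.clip(np.rint(exact), info.min, info.max).astype(tensor1.dtype)


image = np.array([10, 128, 250], dtype=np.uint8)
delta = np.array([-20, 0, 20], dtype=np.int8)  # signed partner of uint8
print(fma_reference(1.0, image, delta).tolist())  # [0, 128, 255]
```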

augpy.gemm(A: augpy._augpy.CudaTensor, B: augpy._augpy.CudaTensor, C: augpy._augpy.CudaTensor = None, alpha: float = 1.0, beta: float = 0.0) → augpy._augpy.CudaTensor

Calculate the matrix multiplication of two 2D tensors. More specifically, it calculates

\[C = A \times (\alpha \cdot B) + \beta \cdot C\]

Only float32 and float64 tensors are supported.

All tensors must have the same data type.

All tensors must be contiguous.

Returns

new output tensor if C is None, otherwise C

Return type

CudaTensor
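The gemm formula can be checked against a NumPy reference. gemm_reference is a hypothetical helper; when C is None it assumes a fresh zero-filled output, so beta has no effect, which may differ from what augpy does with a newly allocated tensor.

```python
import numpy as np


def gemm_reference(A, B, C=None, alpha=1.0, beta=0.0):
    """Hypothetical NumPy reference for C = A x (alpha * B) + beta * C."""
    if C is None:
        # Assumption: a new output starts zero-filled.
        C = np.zeros((A.shape[0], B.shape[1]), dtype=A.dtype)
    # The right-hand side is evaluated first, then written into C in place.
    C[...] = A @ (alpha * B) + beta * C
    return C


A = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
B = np.eye(2, dtype=np.float32)
C = np.ones((2, 2), dtype=np.float32)
gemm_reference(A, B, C, alpha=2.0, beta=1.0)
print(C.tolist())  # [[3.0, 5.0], [7.0, 9.0]]
```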

augpy.fill(scalar: float, dst: augpy._augpy.CudaTensor, blocks_per_sm: int = 8, threads: int = 0) → augpy._augpy.CudaTensor

Fill dst with the given scalar value.

Return type

CudaTensor

augpy.sum(*args, **kwargs)
augpy.sum(tensor: CudaTensor, upcast: bool = False) → CudaTensor

Sum all elements in a tensor with saturation.

Parameters
  • tensor (CudaTensor) – tensor to sum, must be contiguous

  • upcast (bool) – if True, returns tensor with float or double type

Returns

sum value as scalar tensor

Return type

CudaTensor

augpy.sum(tensor: CudaTensor, axis: int, keepdim: bool = False, upcast: bool = False, out: CudaTensor = None, blocks_per_sm: int = 8, num_threads: int = 0) → CudaTensor

Sum all elements along an axis of a tensor with saturation.

Parameters
  • tensor (CudaTensor) – tensor to sum, may be strided

  • axis (int) – axis index to sum along

  • keepdim (bool) – if True, keep sum axis dimension with length 1

  • upcast (bool) – if True, returns tensor with float or double type

  • out (CudaTensor) – output tensor (may be None)

Returns

tensor with values summed along axis, or None if out is tensor

Return type

CudaTensor