Tensors

Data Types
CudaTensor
Creation & Conversion
Functions

Note

All operations in augpy are asynchronous with respect to the CPU, i.e., function calls initiate work on the GPU and return immediately. For example, CudaTensor.numpy() initiates copying data from device to host memory and returns the array immediately, even though the data has not yet been fully copied.

Use CudaStream.synchronize(), or CudaEvent.record() and CudaEvent.synchronize() to synchronize CPU code with the respective stream or event on the GPU.

However, all work on the GPU is executed sequentially within a CudaStream. You can use augpy functions to “queue up” operations on tensors, so synchronization is only required when interacting with the CPU or another GPU framework.
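The stream semantics described above can be modelled on the host for intuition. The sketch below is a pure-Python analogy, not augpy code. A single-worker `concurrent.futures.ThreadPoolExecutor` stands in for a CudaStream (an assumption made only for illustration): queued tasks run in submission order, `submit()` returns immediately, and `Future.result()` plays the role of `CudaStream.synchronize()`.

```python
from concurrent.futures import ThreadPoolExecutor

# Host-side stand-in for one device buffer (illustration only, not augpy).
data = [0, 0, 0, 0]


def add_inplace(value):
    # Simulates a kernel that adds `value` to every element.
    for i in range(len(data)):
        data[i] += value


# A single worker thread plays the role of a stream: submitted tasks run
# one after another in submission order, and submit() returns immediately.
with ThreadPoolExecutor(max_workers=1) as stream:
    stream.submit(add_inplace, 3)              # queued, caller does not wait
    stream.submit(add_inplace, 2)              # runs only after the first add
    copy = stream.submit(lambda: list(data))   # "device-to-host copy", also queued
    host_copy = copy.result()                  # like synchronize(): block until done

print(host_copy)  # [5, 5, 5, 5]
```

Reading `data` directly right after the `submit()` calls, without waiting, could observe a partially updated buffer. That is the host-side counterpart of accessing a numpy array returned by CudaTensor.numpy() before synchronizing.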

Data Types

class augpy.DLDataType(code: int, bits: int, lanes: int = 1)

Bases: augpy._augpy.pybind11_object

DLPack data type for CudaTensors.

Parameters

code (int) – See DLDataTypeCode
bits (int) – Number of bits
lanes (int) – Number of elements for vector types; must be 1 to use with CudaTensor

__init__(self: augpy._augpy.DLDataType, code: int, bits: int, lanes: int = 1) → None

Return type: None

property bits: Number of bits.

property code: See DLDataTypeCode.

property itemsize: Number of bytes per element with this data type.

property lanes: Number of elements for vector types. Must be 1 to use with CudaTensor.

class augpy.DLDataTypeCode(arg0: int)

Bases: object

DLPack type code enum.

Members:

kDLInt :
Signed integer.

kDLUInt :
Unsigned integer.

kDLFloat :
Floating point number.

__init__(self: augpy._augpy.DLDataTypeCode, arg0: int) → None

Return type: None

property kDLFloat: Floating point number.

property kDLInt: Signed integer.

property kDLUInt: Unsigned integer.
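For intuition, DLDataType can be mirrored in plain Python. This is a minimal sketch, not augpy's implementation. The `TypeCode` and `DType` names are hypothetical, the numeric codes are those defined for DLDataTypeCode in DLPack's `dlpack.h`, and `itemsize` assumes DLPack's usual convention of rounding `bits * lanes` up to whole bytes.

```python
from dataclasses import dataclass
from enum import IntEnum


class TypeCode(IntEnum):
    # Numeric values as defined for DLDataTypeCode in DLPack's dlpack.h
    kDLInt = 0
    kDLUInt = 1
    kDLFloat = 2


@dataclass(frozen=True)
class DType:
    """Hypothetical pure-Python mirror of augpy.DLDataType."""
    code: TypeCode
    bits: int
    lanes: int = 1  # must be 1 for use with CudaTensor

    @property
    def itemsize(self) -> int:
        # Bytes per element, rounding partial bytes up (DLPack convention)
        return (self.bits * self.lanes + 7) // 8


uint8 = DType(TypeCode.kDLUInt, 8)
int16 = DType(TypeCode.kDLInt, 16)
float32 = DType(TypeCode.kDLFloat, 32)
print(uint8.itemsize, int16.itemsize, float32.itemsize)  # 1 2 4
```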

augpy.to_augpy_dtype(numpy_dtype: Union[type, numpy.dtype]) → augpy._augpy.DLDataType

Translate numpy to augpy data types.

Example:

to_augpy_dtype(numpy.uint8) == augpy.uint8

Parameters: numpy_dtype (Union[type, numpy.dtype]) – numpy type to translate
Return type: DLDataType

augpy.to_numpy_dtype(augpy_dtype: augpy._augpy.DLDataType) → type

Translate augpy to numpy data types.

Example:

to_numpy_dtype(augpy.uint8) == numpy.uint8

Parameters: augpy_dtype (DLDataType) – augpy type to translate
Return type: type

augpy.swap_dtype(dtype: Union[augpy._augpy.DLDataType, type, numpy.dtype]) → Union[augpy._augpy.DLDataType, type]

Translate to and from numpy and augpy data types.

Examples:

swap_dtype(augpy.uint8) == numpy.uint8
swap_dtype(numpy.uint8) == augpy.uint8

Parameters: dtype (Union[DLDataType, type, numpy.dtype]) – type to translate to augpy or numpy
Return type: Union[DLDataType, type]
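The translation performed by these functions can be approximated with numpy's `dtype.kind` and `dtype.itemsize`. The helpers `numpy_to_dlpack` and `dlpack_to_numpy` below are hypothetical. They work on plain `(code, bits, lanes)` tuples with DLPack's type codes instead of augpy's DLDataType objects, so this is only a sketch of the mapping, not augpy's implementation.

```python
import numpy as np

# DLPack type codes (dlpack.h): 0 = signed int, 1 = unsigned int, 2 = float
_KIND_TO_CODE = {"i": 0, "u": 1, "f": 2}
_CODE_TO_KIND = {code: kind for kind, code in _KIND_TO_CODE.items()}


def numpy_to_dlpack(dtype):
    """Return (code, bits, lanes) for a numpy dtype or scalar type."""
    dt = np.dtype(dtype)
    if dt.kind not in _KIND_TO_CODE:
        raise TypeError(f"unsupported dtype: {dt}")
    return _KIND_TO_CODE[dt.kind], dt.itemsize * 8, 1


def dlpack_to_numpy(code, bits, lanes=1):
    """Return the numpy scalar type for a DLPack (code, bits, lanes) triple."""
    if lanes != 1:
        raise TypeError("vector types (lanes != 1) are not supported")
    # e.g. code 1, 8 bits -> "u1" -> numpy.uint8
    return np.dtype(f"{_CODE_TO_KIND[code]}{bits // 8}").type


print(numpy_to_dlpack(np.uint8))   # (1, 8, 1)
print(dlpack_to_numpy(2, 32))      # <class 'numpy.float32'>
```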

augpy.to_temp_dtype(dtype: Union[augpy._augpy.DLDataType, type, numpy.dtype]) → Union[augpy._augpy.DLDataType, type]

augpy defines a temp type for each tensor data type. The temp type is used internally for processing and is sometimes the data type of returned values.

This function returns the temp type for the given augpy or numpy data type.

Parameters: dtype (Union[DLDataType, type, numpy.dtype]) – augpy or numpy dtype
Returns: temp type
Return type: Union[DLDataType, type]

augpy.int8: <DLDataType int8>

augpy.int16: <DLDataType int16>

augpy.int32: <DLDataType int32>

augpy.int64: <DLDataType int64>

augpy.uint8: <DLDataType uint8>

augpy.uint16: <DLDataType uint16>

augpy.uint32: <DLDataType uint32>

augpy.uint64: <DLDataType uint64>

augpy.float16: <DLDataType float16>

Warning

Not yet supported.

augpy.float32: <DLDataType float32>

augpy.float64: <DLDataType float64>

CudaTensor

augpy’s tensor class. It is a backwards-compatible extension of the DLPack specification.

It supports all the usual operations you would expect from a full-featured tensor class, like complex indexing and slicing:

>>> t = CudaTensor((2, 2, 4), uint8)
>>> t
<CudaTensor shape=(2, 2, 4), device=0, dtype=uint8>
>>> t[1, 1, 3]
<CudaTensor shape=(), device=0, dtype=uint8>
>>> t[-1]
<CudaTensor shape=(2, 4), device=0, dtype=uint8>
>>> t[:,0]
<CudaTensor shape=(2, 4), device=0, dtype=uint8>
>>> t[:, 1:, 1:-1]
<CudaTensor shape=(2, 1, 2), device=0, dtype=uint8>
>>> t[:, :, ::2]
<CudaTensor shape=(2, 2, 2), device=0, dtype=uint8>
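These index expressions follow numpy's semantics. As a quick cross-check (a sketch that uses numpy only, not augpy), the same expressions on a numpy array yield the shapes listed above:

```python
import numpy as np

t = np.zeros((2, 2, 4), dtype=np.uint8)

# Same index expressions as in the CudaTensor session above
print(np.shape(t[1, 1, 3]))     # ()
print(t[-1].shape)              # (2, 4)
print(t[:, 0].shape)            # (2, 4)
print(t[:, 1:, 1:-1].shape)     # (2, 1, 2)
print(t[:, :, ::2].shape)       # (2, 2, 2)
```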

Math and comparison operations you are used to from numpy or PyTorch also work just fine:

>>> t.numpy()
array([[[0, 0, 0, 0],
        [0, 0, 0, 0]],

       [[0, 0, 0, 0],
        [0, 0, 0, 0]]], dtype=uint8)
>>> t += 3
>>> t.numpy()
array([[[3, 3, 3, 3],
        [3, 3, 3, 3]],

       [[3, 3, 3, 3],
        [3, 3, 3, 3]]], dtype=uint8)
>>> (5 - t).numpy()
array([[[2, 2, 2, 2],
        [2, 2, 2, 2]],

       [[2, 2, 2, 2],
        [2, 2, 2, 2]]], dtype=uint8)
>>> (t > (t - 1)).numpy()
array([[[1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 1],
        [1, 1, 1, 1]]], dtype=uint8)

Note

All operations in augpy are asynchronous, so calling CudaTensor.numpy() will initiate copying data from the device to the host memory. You need to use CudaStream.synchronize(), or CudaEvent.record() and CudaEvent.synchronize() to ensure that data is fully copied before the array is accessed.

Math in augpy is saturating: results are clamped to the range of the data type, so integer tensors never overflow or underflow:

>>> t[:] = 40
>>> t.numpy()
array([[[40, 40, 40, 40],
        [40, 40, 40, 40]],

       [[40, 40, 40, 40],
        [40, 40, 40, 40]]], dtype=uint8)
>>> (0 - t).numpy()
array([[[0, 0, 0, 0],
        [0, 0, 0, 0]],

       [[0, 0, 0, 0],
        [0, 0, 0, 0]]], dtype=uint8)
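Plain numpy integer arithmetic wraps around instead of saturating. One way to reproduce saturating behaviour in numpy is to compute the exact result in a wider type and clamp it to the range of the target type. This is a minimal sketch that assumes inputs of at most 32 bits, so that computing in int64 is exact. The helper `saturate` is hypothetical and not part of augpy:

```python
import numpy as np


def saturate(values, dtype):
    """Clamp exact integer results to the range of `dtype`, then cast."""
    info = np.iinfo(dtype)
    return np.clip(values, info.min, info.max).astype(dtype)


t = np.full((2, 2, 4), 40, dtype=np.uint8)
wide = t.astype(np.int64)  # exact for 8/16/32-bit inputs

print((0 - t)[0, 0, 0])                        # 216: plain numpy uint8 wraps around
print(saturate(0 - wide, np.uint8)[0, 0, 0])   # 0: clamped at the lower bound
print(saturate(wide * 10, np.uint8)[0, 0, 0])  # 255: 400 clamped at the upper bound
```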

Broadcasting is also supported:

>>> t1 = CudaTensor((3, 1), uint8)
>>> t1[:] = 0
>>> t2 = CudaTensor((1, 3), uint8)
>>> t2[:] = 0
>>> ((t1 + 3) * (t2 + 4)).numpy()
array([[12, 12, 12],
       [12, 12, 12],
       [12, 12, 12]], dtype=uint8)
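Broadcasting follows numpy's rules, where size-1 dimensions are stretched to match. The equivalent numpy computation, with the inputs explicitly zeroed as recommended in the note below, gives the same result (a numpy-only sketch for comparison):

```python
import numpy as np

t1 = np.zeros((3, 1), dtype=np.uint8)  # explicitly initialized
t2 = np.zeros((1, 3), dtype=np.uint8)
result = (t1 + 3) * (t2 + 4)           # (3, 1) and (1, 3) broadcast to (3, 3)
print(result.shape, result.dtype)      # (3, 3) uint8
print(result)
```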

Note

Tensors may appear to be initialized with zeros. However, they may reuse memory from previously deleted tensors, so treat them as uninitialized and zero or otherwise initialize them before use.

class augpy.CudaTensor(shape: List[int], dtype: augpy._augpy.DLDataType = DLDataType(code=kDLUInt, bits=8), device_id: int = 0)

Bases: augpy._augpy.pybind11_object

Create a new, empty tensor on a GPU device.

Parameters

shape (List[int]) – shape of the tensor
dtype (DLDataType) – data type
device_id (int) – CUDA device ID

__init__(self: augpy._augpy.CudaTensor, shape: List[int], dtype: augpy._augpy.DLDataType = DLDataType(code=kDLUInt, bits=8), device_id: int = 0) → None

Return type: None

property byte_offset: Starting offset in bytes for the data pointer.

property dtype: Tensor data type.

fill(*args, **kwargs)

fill(self: CudaTensor, scalar: float) → CudaTensor

Fill the tensor with the given scalar value.

Returns: this tensor
Return type: CudaTensor

fill(self: CudaTensor, other: CudaTensor) → CudaTensor

Copy the given tensor into this tensor.

Returns: this tensor
Return type: CudaTensor

property is_contiguous: True if the tensor is contiguous, i.e., its elements are located next to each other in memory.

property itemsize: Size of one element in bytes.

property ndim: Number of dimensions.

numpy(*args, **kwargs)

numpy(self: CudaTensor) → array

Create a new numpy array and start copying data from the device to host memory.

Return type: array

numpy(self: CudaTensor, array: buffer = None) → array

Create a new numpy array from the given buffer and start copying data from the device to host memory.

Parameters: array (buffer) – buffer to create new array from
Return type: array

property ptr: Data pointer.

reshape(self: augpy._augpy.CudaTensor, shape: List[int]) → augpy._augpy.CudaTensor

Return a new tensor that uses the same backing memory with a different shape. The new shape must have the same number of elements. Only contiguous tensors can be reshaped.

Parameters: shape (List[int]) – new shape
Return type: CudaTensor

property shape: Tensor shape.

property size: Number of elements in the tensor.

property strides: Tensor strides, i.e., for each dimension, the number of elements to skip in the flat backing memory to move to the next index along that dimension.
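Shape, strides, contiguity and reshaping are related by simple arithmetic. The sketch below assumes row-major layout and strides counted in elements, as the strides property describes. The helpers are hypothetical and only illustrate the documented rule that reshape needs a contiguous tensor with the same number of elements. The sketch also assumes that strides of size-1 dimensions do not affect contiguity, a common convention that augpy may or may not follow.

```python
import math


def contiguous_strides(shape):
    """Row-major strides, in elements, for a tensor of the given shape."""
    strides = []
    step = 1
    for dim in reversed(shape):
        strides.append(step)
        step *= dim
    return tuple(reversed(strides))


def element_offset(index, strides):
    """Flat position (in elements) of a multi-dimensional index."""
    return sum(i * s for i, s in zip(index, strides))


def is_contiguous(shape, strides):
    # Assumption: a size-1 dimension is always indexed at 0, so its stride is ignored.
    expected = contiguous_strides(shape)
    return all(d == 1 or s == e for d, s, e in zip(shape, strides, expected))


def can_reshape(shape, strides, new_shape):
    # Documented rule: contiguous source and the same number of elements.
    return is_contiguous(shape, strides) and math.prod(shape) == math.prod(new_shape)


shape, strides = (2, 2, 4), contiguous_strides((2, 2, 4))
print(strides)                                      # (8, 4, 1)
print(element_offset((1, 1, 3), strides))           # 15
# t[:, :, ::2] keeps the memory but doubles the last stride
view_shape, view_strides = (2, 2, 2), (8, 4, 2)
print(is_contiguous(view_shape, view_strides))      # False
print(can_reshape(shape, strides, (4, 4)))          # True
print(can_reshape(view_shape, view_strides, (8,)))  # False
```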

sum(*args, **kwargs)

sum(self: CudaTensor, upcast: bool = False) → CudaTensor

Sum all values in the tensor.

Parameters: upcast (bool) – if True, the output scalar tensor will be promoted to a more expressive data type to avoid saturation
Returns: sum as scalar tensor
Return type: CudaTensor

sum(self: CudaTensor, axis: int, keepdim: bool = False, upcast: bool = False, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 0) → CudaTensor

Sum values in the tensor along an axis.

Parameters

axis (int) – which axis to sum along
keepdim (bool) – keep the summed dimension with size 1
upcast (bool) – if True, the output tensor will be promoted to a more expressive data type to avoid saturation
out (CudaTensor) – use this tensor as output; it must have the correct shape, and the same data type if upcast is False, otherwise the promoted type

Returns

tensor summed along axis

Return type

CudaTensor
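The effect of `upcast` can be modelled in numpy for unsigned integer input, where adding non-negative values with saturation gives the same result as clamping the exact total. The helper `saturating_sum` is hypothetical, and int64 stands in for augpy's promoted type, which this page does not specify. For signed input the accumulation order matters, so this model does not apply there.

```python
import numpy as np


def saturating_sum(arr, axis=None, keepdim=False, upcast=False):
    """Hypothetical model of sum() semantics for unsigned integer input."""
    # Accumulate exactly in a wide type first.
    exact = arr.sum(axis=axis, dtype=np.int64, keepdims=keepdim)
    if upcast:
        return exact  # assumption: int64 stands in for augpy's promoted type
    # Without upcast, clamp to the input dtype, as saturating math would.
    info = np.iinfo(arr.dtype)
    return np.clip(exact, info.min, info.max).astype(arr.dtype)


a = np.full((3, 100), 2, dtype=np.uint8)              # true total is 600
print(saturating_sum(a))                              # 255
print(saturating_sum(a, upcast=True))                 # 600
print(saturating_sum(a, axis=1).shape)                # (3,), each row sums to 200
print(saturating_sum(a, axis=1, keepdim=True).shape)  # (3, 1)
```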

Creation & Conversion

augpy.cast(*args, **kwargs)

augpy.cast(tensor: CudaTensor, out: CudaTensor, blocks_per_sm: int = 8, threads: int = 0) → CudaTensor

Read values from tensor, cast them to the data type of out, and store them in out. tensor and out must have the same shape.

Parameters

tensor (CudaTensor) – source tensor
out (CudaTensor) – output tensor

Return type

CudaTensor

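cast writes into a caller-provided output tensor rather than allocating a new one. A hypothetical numpy analogue of that pattern uses `np.copyto` with `casting="unsafe"`. How augpy rounds or clamps out-of-range values is not described here, so the example only uses values that are exactly representable in the target type.

```python
import numpy as np


def cast_into(tensor, out):
    """Hypothetical numpy analogue of augpy.cast(tensor, out)."""
    if tensor.shape != out.shape:
        raise ValueError(f"shape mismatch: {tensor.shape} vs {out.shape}")
    # Convert element-wise into the preallocated output's dtype.
    np.copyto(out, tensor, casting="unsafe")
    return out


src = np.array([[0.0, 1.0], [2.0, 255.0]], dtype=np.float32)
dst = np.empty((2, 2), dtype=np.uint8)  # contents irrelevant, fully overwritten
result = cast_into(src, dst)
print(result.dtype, result.tolist())    # uint8 [[0, 1], [2, 255]]
print(result is dst)                    # True
```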