Tensors
Note
All operations in augpy are asynchronous with respect to the
CPU, i.e., function calls initiate work on the GPU and return
immediately.
For example, CudaTensor.numpy() will initiate copying
data from the device to the host memory and return the array
immediately, even though the data has not yet been fully copied
over.
Use CudaStream.synchronize(), or CudaEvent.record()
and CudaEvent.synchronize(), to synchronize CPU code with the
respective stream or event on the GPU.
However, all work done on the GPU is sequential within a
CudaStream.
You can use augpy functions to “queue up” operations on tensors,
so synchronization is only required when interacting
with the CPU or another GPU framework.
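The queue-then-synchronize pattern described in the note can be illustrated without a GPU. The following is only an analogy in plain Python, not augpy code: a single-worker thread pool stands in for a CudaStream, and the names stream, gpu_work and completed are invented for this sketch.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a CudaStream: one worker runs queued tasks in order.
stream = ThreadPoolExecutor(max_workers=1)
completed = []

def gpu_work(name, seconds):
    time.sleep(seconds)      # pretend this is work running on the GPU
    completed.append(name)
    return name

# submit() returns a future immediately, like an asynchronous augpy call.
first = stream.submit(gpu_work, "add", 0.05)
second = stream.submit(gpu_work, "copy_to_host", 0.01)

# Waiting on the last queued task plays the role of a synchronize call.
second.result()
stream.shutdown()
print(completed)  # ['add', 'copy_to_host']
```

Because the queue is sequential, waiting for the last task is enough to know that everything queued before it has finished as well.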
Data Types
-
class
augpy.
DLDataType
(code: int, bits: int, lanes: int = 1)[source]¶ Bases:
augpy._augpy.pybind11_object
DLPack data type for CudaTensors.
- Parameters
code (int) – See DLDataTypeCode
bits (int) – Number of bits
lanes (int) – Number of elements for vector types; must be 1 to use with CudaTensor
-
__init__
(self: augpy._augpy.DLDataType, code: int, bits: int, lanes: int = 1) → None[source]¶ - Return type
-
property
bits
¶ Number of bits.
-
property
code
¶ See DLDataTypeCode.
-
property
itemsize
¶ Number of bytes per element with this data type.
-
property
lanes
¶ Number of elements for vector types. Must be 1 to use with CudaTensor.
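The relationship between code, bits, lanes and itemsize can be modelled in a few lines. This is a hypothetical plain-Python sketch, not augpy's class: the names TypeCode and DataTypeSketch are invented here, the enum values are taken from the DLPack header as far as we know, and rounding up to whole bytes is an assumption.

```python
from dataclasses import dataclass
from enum import IntEnum

class TypeCode(IntEnum):
    # Assumed to mirror DLPack's kDLInt, kDLUInt and kDLFloat.
    INT = 0
    UINT = 1
    FLOAT = 2

@dataclass(frozen=True)
class DataTypeSketch:
    """Hypothetical model of a DLPack-style data type, not augpy.DLDataType."""
    code: TypeCode
    bits: int
    lanes: int = 1

    @property
    def itemsize(self) -> int:
        # Bytes per element, rounded up to whole bytes (assumption).
        return (self.bits * self.lanes + 7) // 8

uint8 = DataTypeSketch(TypeCode.UINT, 8)
float32 = DataTypeSketch(TypeCode.FLOAT, 32)
print(uint8.itemsize, float32.itemsize)  # 1 4
```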
-
class
augpy.
DLDataTypeCode
(arg0: int)[source]¶ Bases:
object
DLPack type code enum.
Members:
- kDLInt :
Signed integer.
- kDLUInt :
Unsigned integer.
- kDLFloat :
Floating point number.
-
property
kDLFloat
¶ DLPack type code enum.
-
augpy.
to_augpy_dtype
(numpy_dtype: Union[type, numpy.dtype]) → augpy._augpy.DLDataType[source]¶ Translate numpy to augpy data types.
Example:
to_augpy_dtype(numpy.uint8) == augpy.uint8
- Parameters
numpy_dtype (Union[type, numpy.dtype]) – numpy type to translate
- Return type
-
augpy.
to_numpy_dtype
(augpy_dtype: augpy._augpy.DLDataType) → type[source]¶ Translate augpy to numpy data types.
Example:
to_numpy_dtype(augpy.uint8) == numpy.uint8
- Parameters
augpy_dtype (DLDataType) – augpy type to translate
- Return type
-
augpy.
swap_dtype
(dtype: Union[augpy._augpy.DLDataType, type, numpy.dtype]) → Union[augpy._augpy.DLDataType, type][source]¶ Translate to and from numpy and augpy data types.
Examples:
swap_dtype(augpy.uint8) == numpy.uint8
swap_dtype(numpy.uint8) == augpy.uint8
- Parameters
dtype (Union[DLDataType, type, numpy.dtype]) – type to translate to augpy or numpy
- Return type
Union[DLDataType, type]
-
augpy.
to_temp_dtype
(dtype: Union[augpy._augpy.DLDataType, type, numpy.dtype]) → Union[augpy._augpy.DLDataType, type][source]¶ augpy defines a temp type for each tensor data type. This temp type is used internally for processing and is sometimes returned as a result. The mapping covers both augpy and numpy dtypes and their temp types.
This function returns the temp type for the given data type.
- Parameters
dtype (Union[DLDataType, type, numpy.dtype]) – augpy or numpy dtype
- Returns
temp type
- Return type
Union[DLDataType, type]
-
augpy.
int8
¶ <DLDataType int8>
-
augpy.
int16
¶ <DLDataType int16>
-
augpy.
int32
¶ <DLDataType int32>
-
augpy.
int64
¶ <DLDataType int64>
-
augpy.
uint8
¶ <DLDataType uint8>
-
augpy.
uint16
¶ <DLDataType uint16>
-
augpy.
uint32
¶ <DLDataType uint32>
-
augpy.
uint64
¶ <DLDataType uint64>
-
augpy.
float16
¶ <DLDataType float16>
Warning
Not yet supported.
-
augpy.
float32
¶ <DLDataType float32>
-
augpy.
float64
¶ <DLDataType float64>
CudaTensor
augpy’s tensor class. It is a backwards-compatible extension to the DLPack specification.
It supports all the usual operations you would expect from a full-featured tensor class, like complex indexing and slicing:
>>> t = CudaTensor((2, 2, 4), uint8)
>>> t
<CudaTensor shape=(2, 2, 4), device=0, dtype=uint8>
>>> t[1, 1, 3]
<CudaTensor shape=(), device=0, dtype=uint8>
>>> t[-1]
<CudaTensor shape=(2, 4), device=0, dtype=uint8>
>>> t[:,0]
<CudaTensor shape=(2, 4), device=0, dtype=uint8>
>>> t[:, 1:, 1:-1]
<CudaTensor shape=(2, 1, 2), device=0, dtype=uint8>
>>> t[:, :, ::2]
<CudaTensor shape=(2, 2, 2), device=0, dtype=uint8>
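The result shapes in the session above follow the usual Python indexing rules: an integer index removes a dimension, a slice keeps it with a new length. The helper below is a plain-Python sketch of that shape arithmetic only; indexed_shape is a hypothetical name and not part of augpy.

```python
def indexed_shape(shape, index):
    """Shape that results from basic indexing with ints and slices."""
    if not isinstance(index, tuple):
        index = (index,)
    result = []
    for dim, size in enumerate(shape):
        if dim >= len(index):
            result.append(size)  # untouched trailing dimension
        elif isinstance(index[dim], slice):
            # slice.indices() resolves negative and missing bounds for this size
            result.append(len(range(*index[dim].indices(size))))
        # an integer index removes the dimension entirely
    return tuple(result)

shape = (2, 2, 4)
print(indexed_shape(shape, (1, 1, 3)))                                  # ()
print(indexed_shape(shape, -1))                                         # (2, 4)
print(indexed_shape(shape, (slice(None), slice(1, None), slice(1, -1))))  # (2, 1, 2)
```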
Math and comparison operations you are used to from numpy or PyTorch also work just fine:
>>> t.numpy()
array([[[0, 0, 0, 0],
[0, 0, 0, 0]],
[[0, 0, 0, 0],
[0, 0, 0, 0]]], dtype=uint8)
>>> t += 3
>>> t.numpy()
array([[[3, 3, 3, 3],
[3, 3, 3, 3]],
[[3, 3, 3, 3],
[3, 3, 3, 3]]], dtype=uint8)
>>> (5 - t).numpy()
array([[[2, 2, 2, 2],
[2, 2, 2, 2]],
[[2, 2, 2, 2],
[2, 2, 2, 2]]], dtype=uint8)
>>> (t > (t - 1)).numpy()
array([[[1, 1, 1, 1],
[1, 1, 1, 1]],
[[1, 1, 1, 1],
[1, 1, 1, 1]]], dtype=uint8)
Note
All operations in augpy are asynchronous, so calling
CudaTensor.numpy() will initiate copying data
from the device to the host memory.
You need to use CudaStream.synchronize(), or CudaEvent.record()
and CudaEvent.synchronize(), to ensure that data is fully copied
before the array is accessed.
Math is saturating in augpy. Integer tensors will never overflow or underflow:
>>> t[:] = 40
>>> t.numpy()
array([[[40, 40, 40, 40],
[40, 40, 40, 40]],
[[40, 40, 40, 40],
[40, 40, 40, 40]]], dtype=uint8)
>>> (0 - t).numpy()
array([[[0, 0, 0, 0],
[0, 0, 0, 0]],
[[0, 0, 0, 0],
[0, 0, 0, 0]]], dtype=uint8)
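Saturation means a result is clamped to the representable range of the data type instead of wrapping around. A minimal plain-Python model for uint8, where saturate is a hypothetical helper and not an augpy function:

```python
def saturate(value, lo=0, hi=255):
    """Clamp a result to the representable range (uint8 by default)."""
    return max(lo, min(hi, value))

print(saturate(0 - 40))    # 0, matching the (0 - t) example above
print(saturate(250 + 10))  # 255 instead of wrapping around to 4
print(saturate(5 - 3))     # 2, in-range results are unchanged
```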
Broadcasting is also supported:
>>> t1 = CudaTensor((3, 1), uint8)
>>> t2 = CudaTensor((1, 3), uint8)
>>> ((t1 + 3) * (t2 + 4)).numpy()
array([[12, 12, 12],
[12, 12, 12],
[12, 12, 12]], dtype=uint8)
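The (3, 1) and (1, 3) operands above combine into a (3, 3) result. The sketch below computes the broadcast shape under the assumption that augpy follows numpy-style rules (shapes are aligned from the right, and a dimension of size 1 stretches to match). broadcast_shape is a hypothetical helper, not part of augpy.

```python
from itertools import zip_longest

def broadcast_shape(a, b):
    """Result shape of broadcasting two shapes, numpy-style (assumed)."""
    result = []
    # Walk both shapes from the last dimension; missing dimensions count as 1.
    for x, y in zip_longest(reversed(a), reversed(b), fillvalue=1):
        if x != y and 1 not in (x, y):
            raise ValueError(f"shapes {a} and {b} cannot be broadcast")
        result.append(y if x == 1 else x)
    return tuple(reversed(result))

print(broadcast_shape((3, 1), (1, 3)))   # (3, 3)
print(broadcast_shape((2, 2, 4), (4,)))  # (2, 2, 4)
```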
Note
Tensors may appear to be initialized with zeros. They may, however, reuse memory from previously deleted tensors, so they should be treated as uninitialized and need to be zeroed or otherwise initialized.
-
class
augpy.
CudaTensor
(shape: List[int], dtype: augpy._augpy.DLDataType = <augpy._augpy.DLDataType object>, device_id: int = 0)[source]¶ Bases:
augpy._augpy.pybind11_object
Create a new, empty tensor on a GPU device.
- Parameters
shape (List[int]) – shape of the tensor
dtype (DLDataType) – data type
device_id (int) – Cuda device id
-
__init__
(self: augpy._augpy.CudaTensor, shape: List[int], dtype: augpy._augpy.DLDataType = DLDataType(code=kDLUInt, bits=8), device_id: int = 0) → None[source]¶ - Return type
-
property
byte_offset
¶ Starting offset in bytes for the data pointer.
-
property
dtype
¶ Tensor data type.
-
fill
(*args, **kwargs)[source]¶ -
fill
(self: CudaTensor, scalar: float) → CudaTensor[source] Fill the tensor with the given scalar value.
- Returns
this tensor
- Return type
CudaTensor
-
fill
(self: CudaTensor, other: CudaTensor) → CudaTensor[source] Copy the given tensor into this tensor.
- Returns
this tensor
- Return type
CudaTensor
-
-
property
is_contiguous
¶ True if the tensor is contiguous, i.e., elements are located next to each other in memory.
-
property
itemsize
¶ Size of one element in bytes.
-
property
ndim
¶ Number of dimensions.
-
numpy
(*args, **kwargs)[source]¶ -
numpy
(self: CudaTensor) → array[source] Create a new numpy array and start copying data from the device to host memory.
- Return type
array
-
numpy
(self: CudaTensor, array: buffer = None) → array[source] Create a new numpy array from the given buffer and start copying data from the device to host memory.
- Parameters
array (buffer) – buffer to create new array from
- Return type
array
-
-
property
ptr
¶ Data pointer.
-
reshape
(self: augpy._augpy.CudaTensor, shape: List[int]) → augpy._augpy.CudaTensor[source]¶ Return a new tensor that uses the same backing memory with a different shape. The new shape must have the same number of elements. Only contiguous tensors can be reshaped.
- Parameters
shape (List[int]) – new shape
- Return type
CudaTensor
-
property
shape
¶ Tensor shape.
-
property
size
¶ Number of elements in the tensor.
-
property
strides
¶ Tensor strides, i.e., for each dimension, the number of elements to skip in the flat backing memory to reach the next element along that dimension.
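To make the strides definition concrete, the sketch below derives element strides for a contiguous tensor and uses them to locate an element in flat memory. It assumes a row-major (C-order) layout, which the source does not state explicitly; contiguous_strides and flat_offset are hypothetical helpers, not augpy functions.

```python
def contiguous_strides(shape):
    """Element strides of a contiguous row-major tensor (assumed layout)."""
    strides = []
    step = 1
    for size in reversed(shape):
        strides.append(step)  # the last dimension moves one element at a time
        step *= size
    return tuple(reversed(strides))

def flat_offset(index, strides):
    """Position of a multi-dimensional index in flat memory, in elements."""
    return sum(i * s for i, s in zip(index, strides))

strides = contiguous_strides((2, 2, 4))
print(strides)                          # (8, 4, 1)
print(flat_offset((1, 1, 3), strides))  # 15, the last of 16 elements
```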
-
sum
(*args, **kwargs)[source]¶ -
sum
(self: CudaTensor, upcast: bool = False) → CudaTensor[source] Sum all values in the tensor.
- Parameters
upcast (bool) – if True, the output scalar tensor will be promoted to a more expressive data type to avoid saturation
- Returns
sum as scalar tensor
- Return type
CudaTensor
-
sum
(self: CudaTensor, axis: int, keepdim: bool = False, upcast: bool = False, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 0) → CudaTensor[source] Sum all values in the tensor along an axis.
- Parameters
axis (int) – which axis to sum along
keepdim (bool) – keep the summed dimension with size 1
upcast (bool) – if True, the output tensor will be promoted to a more expressive data type to avoid saturation
out (CudaTensor) – use this tensor as output; it must have the correct shape, and the same data type if upcast is False, otherwise the promoted type is required
- Returns
tensor summed along axis
- Return type
CudaTensor
-
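The effect of the upcast flag on CudaTensor.sum can be modelled in plain Python. This is a sketch for uint8 values only, with a float result standing in for the promoted type; sum_sketch is a hypothetical name and the exact promoted type in augpy is not modelled here.

```python
def sum_sketch(values, upcast=False, hi=255):
    """Model of a saturating sum over non-negative uint8 values."""
    total = sum(values)
    if upcast:
        return float(total)  # wider type, so the true sum is kept
    return min(hi, total)    # same type as the input, so the sum saturates

print(sum_sketch([200, 100]))               # 255
print(sum_sketch([200, 100], upcast=True))  # 300.0
```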
Creation & Conversion
-
augpy.
cast
(*args, **kwargs)[source]¶ -
augpy.
cast
(tensor: CudaTensor, out: CudaTensor, blocks_per_sm: int = 8, threads: int = 0) → CudaTensor[source] Read values from
tensor, cast them to the data type of out, and store them there. Both tensor and out must have the same shape.
- Parameters
tensor (CudaTensor) – source tensor
out (CudaTensor) – output tensor
- Return type
-
augpy.
cast
(tensor: CudaTensor, dtype: DLDataType, blocks_per_sm: int = 8, threads: int = 0) → CudaTensor[source] Create a new tensor with values from
tensor cast to the given data type dtype.
- Parameters
tensor (CudaTensor) – source tensor
dtype (DLDataType) – target data type
- Returns
new tensor with given data type
- Return type
-
-
augpy.
copy
(src: augpy._augpy.CudaTensor, dst: augpy._augpy.CudaTensor, blocks_per_sm: int = 8, threads: int = 0) → augpy._augpy.CudaTensor[source]¶ Copy
src into dst. Supports broadcasting.
- Return type
-
augpy.
empty_like
(tensor: augpy._augpy.CudaTensor) → augpy._augpy.CudaTensor[source]¶ Create a new tensor with the same shape, dtype and on the same device as
tensor.
- Return type
-
augpy.
array_to_tensor
(*args, **kwargs)[source]¶ -
augpy.
array_to_tensor
(array: buffer, device_id: int = 0) → CudaTensor[source] Copy a Python buffer into a new tensor on the specified GPU device. This initiates an asynchronous copy from host to device memory.
- Return type
-
augpy.
array_to_tensor
(array: buffer, tensor: CudaTensor) → CudaTensor[source] Copy a Python buffer into the given tensor. This initiates an asynchronous copy from host to device memory.
- Return type
-
-
augpy.
tensor_to_array
(*args, **kwargs)[source]¶ -
augpy.
tensor_to_array
(tensor: CudaTensor) → array[source] Copy a given tensor to a new numpy array. This initiates an asynchronous copy from device to host memory.
- Return type
array
-
augpy.
tensor_to_array
(tensor: CudaTensor, array: buffer) → array[source] Copy a given tensor to a numpy array created from the given buffer array. This initiates an asynchronous copy from device to host memory.
- Return type
array
-
-
augpy.
import_dltensor
(tensor_capsule: capsule, name: str) → augpy._augpy.CudaTensor[source]¶ Import a GPU tensor from another library into augpy.
- Parameters
tensor_capsule (capsule) – a Python capsule object that contains a DLManagedTensor
name (str) – name under which the tensor is stored in the capsule, e.g., "dltensor" for PyTorch
- Returns
other tensor wrapped in a CudaTensor
- Return type
-
augpy.
export_dltensor
(tensor: object, name: str = 'dltensor', destruct: bool = True) → capsule[source]¶ Export a GPU tensor to be used by another library.
- Parameters
tensor (object) – Python-wrapped CudaTensor
name (str) – name under which the tensor is stored in the returned capsule, e.g., "dltensor" for PyTorch
destruct (bool) – if True, add a destructor to the capsule which will delete the tensor when the capsule is deleted; only set to False if you know what you’re doing
- Returns
capsule with exported CudaTensor
- Return type
capsule
Functions
-
augpy.
add
(*args, **kwargs)[source]¶ -
augpy.
add
(tensor: CudaTensor, scalar: float, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor[source] Add a
scalar value to a tensor.
- Parameters
tensor (CudaTensor) – tensor
scalar (float) – scalar value
out (CudaTensor) – optional output tensor
- Returns
new tensor if out is None, else out
- Return type
-
augpy.
add
(tensor1: CudaTensor, tensor2: CudaTensor, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor[source] Add
tensor2 to tensor1.
- Parameters
tensor1 (CudaTensor) – first tensor
tensor2 (CudaTensor) – second tensor
out (CudaTensor) – optional output tensor
- Returns
new tensor if out is None, else out
- Return type
-
-
augpy.
sub
(*args, **kwargs)[source]¶ -
augpy.
sub
(tensor: CudaTensor, scalar: float, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor[source] Subtract a
scalar value from a tensor.
- Parameters
tensor (CudaTensor) – tensor
scalar (float) – scalar value
out (CudaTensor) – optional output tensor
- Returns
new tensor if out is None, else out
- Return type
-
augpy.
sub
(tensor1: CudaTensor, tensor2: CudaTensor, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor[source] Subtract
tensor2 from tensor1.
- Parameters
tensor1 (CudaTensor) – first tensor
tensor2 (CudaTensor) – second tensor
out (CudaTensor) – optional output tensor
- Returns
new tensor if out is None, else out
- Return type
-
-
augpy.
rsub
(tensor: augpy._augpy.CudaTensor, scalar: float, out: augpy._augpy.CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → augpy._augpy.CudaTensor[source]¶ Subtract a
tensor from a scalar value.
- Parameters
tensor (CudaTensor) – tensor
scalar (float) – scalar value
out (CudaTensor) – optional output tensor
- Returns
new tensor if out is None, else out
- Return type
-
augpy.
mul
(*args, **kwargs)[source]¶ -
augpy.
mul
(tensor: CudaTensor, scalar: float, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor[source] Multiply a
tensor by a scalar value.
- Parameters
tensor (CudaTensor) – tensor
scalar (float) – scalar value
out (CudaTensor) – optional output tensor
- Returns
new tensor if out is None, else out
- Return type
-
augpy.
mul
(tensor1: CudaTensor, tensor2: CudaTensor, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor[source] Multiply
tensor1 by tensor2.
- Parameters
tensor1 (CudaTensor) – first tensor
tensor2 (CudaTensor) – second tensor
out (CudaTensor) – optional output tensor
- Returns
new tensor if out is None, else out
- Return type
-
-
augpy.
div
(*args, **kwargs)[source]¶ -
augpy.
div
(tensor: CudaTensor, scalar: float, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor[source] Divide a
tensor by a scalar value.
- Parameters
tensor (CudaTensor) – tensor
scalar (float) – scalar value
out (CudaTensor) – optional output tensor
- Returns
new tensor if out is None, else out
- Return type
-
augpy.
div
(tensor1: CudaTensor, tensor2: CudaTensor, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor[source] Divide tensor1 by tensor2.
- Parameters
tensor1 (CudaTensor) – first tensor
tensor2 (CudaTensor) – second tensor
out (CudaTensor) – optional output tensor
- Returns
new tensor if out is None, else out
- Return type
-
-
augpy.
rdiv
(tensor: augpy._augpy.CudaTensor, scalar: float, out: augpy._augpy.CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → augpy._augpy.CudaTensor[source]¶ Divide a
scalar value by a tensor.
- Parameters
tensor (CudaTensor) – tensor
scalar (float) – scalar value
out (CudaTensor) – optional output tensor
- Returns
new tensor if out is None, else out
- Return type
-
augpy.
lt
(*args, **kwargs)[source]¶ -
augpy.
lt
(tensor: CudaTensor, scalar: float, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor[source] Compute
tensor < scalar as uint8 tensor, where 1 means the condition is met and 0 otherwise.
- Parameters
tensor (CudaTensor) – tensor
scalar (float) – scalar value
out (CudaTensor) – optional output tensor
- Returns
new tensor if out is None, else out
- Return type
-
augpy.
lt
(tensor1: CudaTensor, tensor2: CudaTensor, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor[source] Compute
tensor1 < tensor2 as uint8 tensor, where 1 means the condition is met and 0 otherwise.
- Parameters
tensor1 (CudaTensor) – first tensor
tensor2 (CudaTensor) – second tensor
out (CudaTensor) – optional output tensor
- Returns
new tensor if out is None, else out
- Return type
-
-
augpy.
le
(*args, **kwargs)[source]¶ -
augpy.
le
(tensor: CudaTensor, scalar: float, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor[source] Compute
tensor <= scalar as uint8 tensor, where 1 means the condition is met and 0 otherwise.
- Parameters
tensor (CudaTensor) – tensor
scalar (float) – scalar value
out (CudaTensor) – optional output tensor
- Returns
new tensor if out is None, else out
- Return type
-
augpy.
le
(tensor1: CudaTensor, tensor2: CudaTensor, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor[source] Compute
tensor1 <= tensor2 as uint8 tensor, where 1 means the condition is met and 0 otherwise.
- Parameters
tensor1 (CudaTensor) – first tensor
tensor2 (CudaTensor) – second tensor
out (CudaTensor) – optional output tensor
- Returns
new tensor if out is None, else out
- Return type
-
-
augpy.
gt
(tensor: augpy._augpy.CudaTensor, scalar: float, out: augpy._augpy.CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → augpy._augpy.CudaTensor[source]¶ Compute
tensor > scalar as uint8 tensor, where 1 means the condition is met and 0 otherwise.
- Parameters
tensor (CudaTensor) – tensor
scalar (float) – scalar value
out (CudaTensor) – optional output tensor
- Returns
new tensor if out is None, else out
- Return type
-
augpy.
ge
(tensor: augpy._augpy.CudaTensor, scalar: float, out: augpy._augpy.CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → augpy._augpy.CudaTensor[source]¶ Compute
tensor >= scalar as uint8 tensor, where 1 means the condition is met and 0 otherwise.
- Parameters
tensor (CudaTensor) – tensor
scalar (float) – scalar value
out (CudaTensor) – optional output tensor
- Returns
new tensor if out is None, else out
- Return type
-
augpy.
eq
(*args, **kwargs)[source]¶ -
augpy.
eq
(tensor: CudaTensor, scalar: float, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor[source] Compute
tensor == scalar as uint8 tensor, where 1 means the condition is met and 0 otherwise.
- Parameters
tensor (CudaTensor) – tensor
scalar (float) – scalar value
out (CudaTensor) – optional output tensor
- Returns
new tensor if out is None, else out
- Return type
-
augpy.
eq
(tensor1: CudaTensor, tensor2: CudaTensor, out: CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → CudaTensor[source] Compute
tensor1 == tensor2 as uint8 tensor, where 1 means the condition is met and 0 otherwise.
- Parameters
tensor1 (CudaTensor) – first tensor
tensor2 (CudaTensor) – second tensor
out (CudaTensor) – optional output tensor
- Returns
new tensor if out is None, else out
- Return type
-
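The comparison functions return 1 or 0 per element rather than Python booleans. The following plain-Python model mirrors that convention for both the scalar and the tensor overloads, using lists in place of tensors; compare is a hypothetical helper, not an augpy function.

```python
import operator

def compare(op, left, right):
    """Elementwise comparison: 1 where the condition holds, 0 otherwise."""
    if not isinstance(right, list):
        right = [right] * len(left)  # scalar overload: compare against one value
    return [int(op(a, b)) for a, b in zip(left, right)]

t = [1, 5, 3]
print(compare(operator.lt, t, 3))          # [1, 0, 0]
print(compare(operator.le, t, [1, 4, 3]))  # [1, 0, 1]
print(compare(operator.eq, t, 5))          # [0, 1, 0]
```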
-
augpy.
fma
(scalar: float, tensor1: augpy._augpy.CudaTensor, tensor2: augpy._augpy.CudaTensor, out: augpy._augpy.CudaTensor = None, blocks_per_sm: int = 8, threads: int = 512) → augpy._augpy.CudaTensor[source]¶ Compute a fused multiply-add on a scalar and two tensors, i.e.,
\[r = s \cdot t_1 + t_2\]
If tensor1 has an unsigned integer data type, then tensor2 must have the signed version of the same type, e.g., a uint8 tensor must be paired with an int8 tensor.
- Parameters
scalar (float) – scalar factor
tensor1 (CudaTensor) – tensor \(t_1\)
tensor2 (CudaTensor) – tensor \(t_2\)
out (CudaTensor) – optional output tensor \(r\)
- Returns
new tensor if out is None, else out
- Return type
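As a rough model of fma, the sketch below reads "fused multiply-add" in its usual sense, multiplying the first tensor by the scalar and then adding the second tensor elementwise. Several things here are assumptions rather than documented behavior: that the result saturates to the uint8 range of tensor1, and that no rounding is applied. fma_sketch is a hypothetical name, and lists stand in for tensors.

```python
def fma_sketch(scalar, tensor1, tensor2, lo=0, hi=255):
    """Elementwise scalar * t1 + t2, clamped to an assumed uint8 output range."""
    return [max(lo, min(hi, scalar * a + b)) for a, b in zip(tensor1, tensor2)]

t1 = [10, 100, 200]    # stands in for a uint8 tensor
t2 = [-5, 20, -100]    # stands in for the paired int8 tensor
result = fma_sketch(2.0, t1, t2)
# 2*10 - 5 and 2*100 + 20 fit into uint8; 2*200 - 100 = 300 is clamped to 255.
```

Pairing an unsigned tensor with a signed one lets the added term lower values as well as raise them.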
-
augpy.
gemm
(A: augpy._augpy.CudaTensor, B: augpy._augpy.CudaTensor, C: augpy._augpy.CudaTensor = None, alpha: float = 1.0, beta: float = 0.0) → augpy._augpy.CudaTensor[source]¶ Calculate the matrix multiplication of two 2D tensors. More specifically calculates
\[C = A \times (\alpha \cdot B) + \beta \cdot C\]
Only float and double are supported. All tensors must have the same data type.
All tensors must be contiguous.
- Returns
new output tensor if C is None, otherwise C
- Return type
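The gemm formula, C = A × (alpha · B) + beta · C, can be spelled out with nested lists. This is a plain-Python sketch for checking small cases by hand, not augpy's implementation; gemm_sketch is a hypothetical name, and unlike the real function it always builds a new result instead of writing into C.

```python
def gemm_sketch(A, B, C=None, alpha=1.0, beta=0.0):
    """Compute A x (alpha * B) + beta * C for 2D lists of numbers."""
    rows, inner, cols = len(A), len(B), len(B[0])
    if C is None:
        C = [[0.0] * cols for _ in range(rows)]  # treated as zeros when absent
    return [
        [alpha * sum(A[i][k] * B[k][j] for k in range(inner)) + beta * C[i][j]
         for j in range(cols)]
        for i in range(rows)
    ]

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
print(gemm_sketch(A, B))  # [[19.0, 22.0], [43.0, 50.0]]
```

With alpha = 2 and beta = 3 and a C filled with ones, each entry becomes twice the plain product plus 3, for example 2 · 19 + 3 = 41 in the top-left corner.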
-
augpy.
fill
(scalar: float, dst: augpy._augpy.CudaTensor, blocks_per_sm: int = 8, threads: int = 0) → augpy._augpy.CudaTensor[source]¶ Fill dst with the given scalar value.
- Return type
-
augpy.
sum
(*args, **kwargs)[source]¶ -
augpy.
sum
(tensor: CudaTensor, upcast: bool = False) → CudaTensor[source] Sum all elements in a tensor with saturation.
- Parameters
tensor (CudaTensor) – tensor to sum, must be contiguous
upcast (bool) – if True, returns tensor with float or double type
- Returns
sum value as scalar tensor
- Return type
-
augpy.
sum
(tensor: CudaTensor, axis: int, keepdim: bool = False, upcast: bool = False, out: CudaTensor = None, blocks_per_sm: int = 8, num_threads: int = 0) → CudaTensor[source] Sum of all elements along an axis in a tensor with saturation.
- Parameters
tensor (CudaTensor) – tensor to sum, may be strided
axis (int) – axis index to sum along
keepdim (bool) – if True, keep sum axis dimension with length 1
upcast (bool) – if True, returns tensor with float or double type
out (CudaTensor) – output tensor (may be None)
- Returns
tensor with values summed along axis, or None if out is a tensor
- Return type
-