Elementwise Iteration
template<int n_tensors, unsigned int values_per_thread, typename param_t, typename F>
void augpy::elementwise_kernel(array<tensor_param, n_tensors> tensors, const param_t constants, const ndim_array contiguous_strides, const int ndim, const size_t count, size_t shape0)
CUDA kernel for elementwise functions on arbitrary tensors.
values_per_thread defines how many elements in the tensors each thread in each block will calculate. This is done by striding in the first dimension of the tensors. Effectively, values_per_thread is never larger than shape0.
- Parameters
  tensors: an array of n_tensors tensor_params; first tensor is output, remainder are inputs; strides must be given in bytes
  constants: constant value given to function
  contiguous_strides: strides in bytes a contiguous tensor of the same shape would have
  ndim: number of dimensions in tensors
  count: number of values in tensors
  values_per_thread: number of values in tensor each thread in each block computes
  shape0: number of elements in first dimension of tensors
- Template Parameters
  n_tensors: number of tensors given to function
  param_t: type of constant value given to kernel function alongside tensors
  F: function of type void F(const array<tensor_param, n_tensors>&, const param_t&) applied to tensors
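The role of byte strides and contiguous_strides can be illustrated with a small host-side sketch. It assumes, without confirmation from the augpy source, that a kernel of this kind decomposes a position in the contiguous layout into per-dimension coordinates and then recombines them with each tensor's own byte strides. The helpers contiguous_byte_strides and strided_offset are hypothetical names used only for illustration; they are not part of augpy.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of augpy): byte strides that a contiguous,
// row-major tensor of the given shape would have.
template <std::size_t N>
std::array<std::size_t, N> contiguous_byte_strides(
        const std::array<std::size_t, N>& shape, std::size_t itemsize) {
    std::array<std::size_t, N> strides{};
    std::size_t stride = itemsize;
    // Walk from the last (fastest-varying) dimension to the first.
    for (std::size_t d = N; d-- > 0;) {
        strides[d] = stride;
        stride *= shape[d];
    }
    return strides;
}

// Hypothetical helper (not part of augpy): map a byte position in the
// contiguous layout to the byte offset inside a tensor with arbitrary
// byte strides. This is an assumed scheme, not augpy's confirmed one.
template <std::size_t N>
std::size_t strided_offset(std::size_t contiguous_pos,
                           const std::array<std::size_t, N>& contiguous_strides,
                           const std::array<std::size_t, N>& strides) {
    std::size_t offset = 0;
    for (std::size_t d = 0; d < N; ++d) {
        assert(contiguous_strides[d] > 0);  // empty tensors are not handled here
        // Coordinate along dimension d, then the remainder for later dimensions.
        const std::size_t coord = contiguous_pos / contiguous_strides[d];
        contiguous_pos -= coord * contiguous_strides[d];
        offset += coord * strides[d];
    }
    return offset;
}
```

For a 2x3 tensor of 4-byte values, the contiguous byte strides are {12, 4}. If the tensor is a transposed view with byte strides {4, 8}, element (0, 1) sits at contiguous position 4 but at byte offset 8 in the view.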
template<int n_tensors, typename param_t, typename F>
CudaTensor *augpy::elementwise_function(array<CudaTensor*, n_tensors> tensors, const param_t constant, unsigned int blocks_per_sm, unsigned int num_threads, bool enforce_same_dtype = true)
Apply a function elementwise to some tensors.
- Parameters
  tensors: an array of n_tensors tensors; first tensor is output, remainder are inputs
  constant: constant value given to function
  blocks_per_sm: number of blocks to create per SM on the GPU; at least 1
  num_threads: number of threads in each block; 0 means auto-select
  enforce_same_dtype: if true, throws std::invalid_argument if tensors have different dtypes
- Template Parameters
  n_tensors: number of tensors given to function
  param_t: type of constant value given to kernel function alongside tensors
  F: function of type void (*F)(array<tensor_param, n_tensors>, param_t) applied to tensors
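The calling convention described above (output tensor first, one shared constant, optional dtype check) can be mimicked on the CPU. The following is only an analogue for illustration, not augpy's implementation: HostTensor, DType, elementwise_function_cpu, and scaled_add are hypothetical names, storage is simplified to float, the function object is passed as an argument rather than as a template parameter, and the GPU launch parameters blocks_per_sm and num_threads are omitted.

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-ins for augpy's dtype and CudaTensor (host memory only).
enum class DType { float32, int32 };

struct HostTensor {
    DType dtype;
    std::vector<float> data;  // storage simplified to float for illustration
};

// CPU analogue of the documented contract: tensors[0] is the output, the
// remaining tensors are inputs, and `constant` is handed to every call of f.
template <std::size_t n_tensors, typename param_t, typename F>
HostTensor* elementwise_function_cpu(std::array<HostTensor*, n_tensors> tensors,
                                     const param_t constant, F f,
                                     bool enforce_same_dtype = true) {
    static_assert(n_tensors >= 1, "at least an output tensor is required");
    for (std::size_t t = 1; t < n_tensors; ++t) {
        if (enforce_same_dtype && tensors[t]->dtype != tensors[0]->dtype)
            throw std::invalid_argument("tensors have different dtypes");
        if (tensors[t]->data.size() != tensors[0]->data.size())
            throw std::invalid_argument("tensors have different sizes");
    }
    const std::size_t count = tensors[0]->data.size();
    for (std::size_t i = 0; i < count; ++i) {
        // Collect a pointer to element i of every tensor, output first.
        std::array<float*, n_tensors> element{};
        for (std::size_t t = 0; t < n_tensors; ++t)
            element[t] = &tensors[t]->data[i];
        f(element, constant);
    }
    return tensors[0];
}

// Example function object: out = a + b * constant.
struct scaled_add {
    void operator()(const std::array<float*, 3>& p, const float& c) const {
        *p[0] = *p[1] + *p[2] * c;
    }
};
```

Called with an output tensor of three elements, inputs {1, 2, 3} and {10, 20, 30}, and constant 0.5f, this sketch writes {6, 12, 18} to the output and returns a pointer to it. Passing an input whose dtype differs from the output's raises std::invalid_argument unless enforce_same_dtype is false.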