Elementwise Iteration

template<int n_tensors, unsigned int values_per_thread, typename param_t, typename F> void augpy::elementwise_kernel(array<tensor_param, n_tensors> tensors, const param_t constants, const ndim_array contiguous_strides, const int ndim, const size_t count, size_t shape0)

CUDA kernel for elementwise functions on arbitrary tensors. values_per_thread defines how many tensor elements each thread in each block computes. A thread covers its elements by striding along the first dimension of the tensors, so the effective number of values per thread is never larger than shape0.
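The relationship between values_per_thread and shape0 can be illustrated with a small host-side C++ sketch. The helper names `effective_values_per_thread` and `threads_needed` are hypothetical and not part of augpy. The sketch only assumes that the requested value is clamped to shape0 and that the work is split evenly across threads, rounding up; the actual kernel may partition the work differently.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical helper (not part of augpy): a thread strides along the
// first dimension, so it cannot cover more elements than that dimension has.
inline std::size_t effective_values_per_thread(unsigned int values_per_thread,
                                               std::size_t shape0) {
    return std::min<std::size_t>(values_per_thread, shape0);
}

// Hypothetical helper: number of threads needed to cover `count` elements
// when each thread handles `per_thread` elements (assumed even split,
// rounded up). Returns 0 when there is nothing to distribute.
inline std::size_t threads_needed(std::size_t count, std::size_t per_thread) {
    if (per_thread == 0) {
        return 0;
    }
    return (count + per_thread - 1) / per_thread;
}
```

Under these assumptions, requesting 4 values per thread on a tensor whose first dimension has only 2 entries yields an effective value of 2.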

Parameters

tensors: an array of n_tensors tensor_params; first tensor is output, remainder are inputs; strides must be given in bytes
constants: constant value given to function
contiguous_strides: strides in bytes a contiguous tensor of the same shape would have
ndim: number of dimensions in tensors
count: number of values in tensors
shape0: number of elements in first dimension of tensors

Template Parameters

n_tensors: number of tensors given to function
values_per_thread: number of values in tensor each thread in each block computes
param_t: type of constant value given to kernel function alongside tensors
F: function of type void F(const array<tensor_param, n_tensors>&, const param_t&) applied to tensors
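One common way to use contiguous_strides together with per-tensor byte strides is to decompose a position in a contiguous layout into coordinates and then re-project those coordinates onto the actual strides of each tensor. The following host-side C++ sketch shows that technique. It is not taken from the augpy source: `Strides`, `kMaxDims`, and `strided_offset` are hypothetical names, and `std::array` stands in for `ndim_array`, whose definition is not shown here.

```cpp
#include <array>
#include <cstddef>

// Assumption for this sketch only: a fixed upper bound on dimensions.
constexpr int kMaxDims = 4;
using Strides = std::array<std::size_t, kMaxDims>;

// Hypothetical helper (not part of augpy): map a byte offset in a
// contiguous tensor to the byte offset of the same element in a strided
// tensor of identical shape. All strides are given in bytes.
inline std::size_t strided_offset(std::size_t contiguous_offset,
                                  const Strides& contiguous_strides,
                                  const Strides& tensor_strides,
                                  int ndim) {
    std::size_t offset = 0;
    for (int d = 0; d < ndim; ++d) {
        // Coordinate along dimension d in the contiguous layout.
        const std::size_t coord = contiguous_offset / contiguous_strides[d];
        contiguous_offset -= coord * contiguous_strides[d];
        // Same coordinate, projected onto the real stride of the tensor.
        offset += coord * tensor_strides[d];
    }
    return offset;
}
```

For example, a float32 tensor of shape (2, 3) has contiguous strides (12, 4). If the tensor is a transposed view with strides (4, 8), the element at coordinates (1, 1) sits at contiguous offset 16 and strided offset 12.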

template<int n_tensors, typename param_t, typename F> CudaTensor *augpy::elementwise_function(array<CudaTensor*, n_tensors> tensors, const param_t constant, unsigned int blocks_per_sm, unsigned int num_threads, bool enforce_same_dtype = true)

Apply a function elementwise to some tensors.

Parameters

tensors: an array of n_tensors tensors; first tensor is output, remainder are inputs
constant: constant value given to function
blocks_per_sm: number of blocks to create per SM on the GPU; at least 1
num_threads: number of threads in each block; 0 means auto-select
enforce_same_dtype: if true, throws std::invalid_argument if tensors have different dtypes

Template Parameters

n_tensors: number of tensors given to function
param_t: type of constant value given to kernel function alongside tensors
F: function of type void (*F)(array<tensor_param, n_tensors>, param_t) applied to tensors
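The parameter rules above can be sketched as a host-side C++ planning step. This is an illustration under stated assumptions, not the augpy implementation: `DType`, `LaunchPlan`, and `plan_launch` are hypothetical names, the SM count is passed in rather than queried from the device, the fallback of 256 threads for "auto-select" is an arbitrary placeholder, and rejecting `blocks_per_sm < 1` with an exception is an assumption about how "at least 1" might be enforced.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the dtype of a tensor.
enum class DType { uint8, float32, float64 };

// Hypothetical result type: grid and block size for a kernel launch.
struct LaunchPlan {
    unsigned int blocks;
    unsigned int threads;
};

// Hypothetical helper (not part of augpy) that mirrors the documented
// parameter rules of elementwise_function.
inline LaunchPlan plan_launch(const std::vector<DType>& dtypes,
                              unsigned int sm_count,
                              unsigned int blocks_per_sm,
                              unsigned int num_threads,
                              bool enforce_same_dtype = true,
                              unsigned int default_threads = 256) {
    // Assumption: values below 1 are rejected rather than clamped.
    if (blocks_per_sm < 1) {
        throw std::invalid_argument("blocks_per_sm must be at least 1");
    }
    // Documented behavior: differing dtypes raise std::invalid_argument.
    if (enforce_same_dtype) {
        for (const DType d : dtypes) {
            if (d != dtypes.front()) {
                throw std::invalid_argument("tensors have different dtypes");
            }
        }
    }
    // 0 means auto-select; the fallback value here is a placeholder.
    const unsigned int threads = (num_threads == 0) ? default_threads : num_threads;
    return LaunchPlan{sm_count * blocks_per_sm, threads};
}
```

With these assumptions, a device with 80 SMs and `blocks_per_sm = 2` yields a grid of 160 blocks, and passing `enforce_same_dtype = false` skips the dtype comparison entirely.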