Elementwise Iteration

template<int n_tensors, unsigned int values_per_thread, typename param_t, typename F>
void augpy::elementwise_kernel(array<tensor_param, n_tensors> tensors, const param_t constants, const ndim_array contiguous_strides, const int ndim, const size_t count, size_t shape0)

CUDA kernel for elementwise functions on arbitrary tensors. values_per_thread sets how many elements of the tensors each thread in each block computes. Threads do this by striding in the first dimension of the tensors, so the effective number of values per thread is never larger than shape0.
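The limit on values_per_thread can be illustrated with a small host-side sketch. Both helpers below are hypothetical and not part of augpy. The thread count formula is only one plausible reading of "striding in the first dimension" (each thread covers a run of up to values_per_thread positions along the first dimension for one index in the remaining dimensions), and the real launch logic may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of values a thread can actually handle.
// A thread strides along the first dimension, so it cannot take more
// steps than that dimension has elements.
inline std::size_t effective_values_per_thread(unsigned int values_per_thread,
                                               std::size_t shape0) {
    assert(values_per_thread > 0);
    return std::min<std::size_t>(values_per_thread, shape0);
}

// Hypothetical helper: threads needed to cover `count` values, assuming
// each thread handles one run along the first dimension for a single
// index in the remaining dimensions.
inline std::size_t threads_needed(std::size_t count,
                                  unsigned int values_per_thread,
                                  std::size_t shape0) {
    if (count == 0 || shape0 == 0) return 0;
    const std::size_t vpt = effective_values_per_thread(values_per_thread, shape0);
    const std::size_t steps = (shape0 + vpt - 1) / vpt;  // runs per first-dimension column
    const std::size_t rest = count / shape0;             // elements in the remaining dimensions
    return steps * rest;
}
```

For a tensor of shape (2, 12) and values_per_thread = 4, the sketch clamps the per-thread work to 2 values, because the first dimension has only 2 elements.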

Parameters
  • tensors: an array of n_tensors tensor_params; first tensor is output, the remainder are inputs; strides must be given in bytes

  • constants: constant value given to function

  • contiguous_strides: strides in bytes a contiguous tensor of the same shape would have

  • ndim: number of dimensions in tensors

  • count: number of values in tensors

  • shape0: number of elements in first dimension of tensors

Template Parameters
  • n_tensors: number of tensors given to function

  • values_per_thread: number of values in tensor each thread in each block computes

  • param_t: type of constant value given to kernel function alongside tensors

  • F: function of type void F(const array<tensor_param, n_tensors>&, const param_t&) applied to tensors
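The contiguous_strides parameter suggests that the kernel maps a position in a contiguous layout to the matching position in each, possibly non-contiguous, tensor. A minimal host-side sketch of such a mapping follows. The names ndim_array_sketch, MAX_DIMS, and strided_offset are hypothetical stand-ins, unsigned strides are assumed for simplicity, and the actual augpy indexing code may work differently.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical capacity; the real size of ndim_array is not stated here.
constexpr int MAX_DIMS = 4;
using ndim_array_sketch = std::array<std::size_t, MAX_DIMS>;

// Convert a byte offset in a contiguous tensor into the byte offset of the
// same element in a tensor with arbitrary strides (all strides in bytes).
inline std::size_t strided_offset(std::size_t contiguous_offset,
                                  const ndim_array_sketch& contiguous_strides,
                                  const ndim_array_sketch& tensor_strides,
                                  int ndim) {
    assert(ndim > 0 && ndim <= MAX_DIMS);
    std::size_t offset = 0;
    for (int d = 0; d < ndim; ++d) {
        // Recover the index along dimension d from the contiguous layout.
        const std::size_t index = contiguous_offset / contiguous_strides[d];
        contiguous_offset -= index * contiguous_strides[d];
        // Re-apply the index using the tensor's own stride.
        offset += index * tensor_strides[d];
    }
    return offset;
}
```

For a float tensor of shape (2, 3), the contiguous strides are {12, 4} bytes. If the tensor is a transposed view with strides {4, 8}, element (1, 0) sits at contiguous offset 12 but at byte offset 4 in the view.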

template<int n_tensors, typename param_t, typename F>
CudaTensor *augpy::elementwise_function(array<CudaTensor*, n_tensors> tensors, const param_t constant, unsigned int blocks_per_sm, unsigned int num_threads, bool enforce_same_dtype = true)

Apply a function elementwise to some tensors.

Parameters
  • tensors: an array of n_tensors tensors; first tensor is output, remainder are inputs

  • constant: constant value given to function

  • blocks_per_sm: number of blocks to create per SM on the GPU; at least 1

  • num_threads: number of threads in each block; 0 means auto-select

  • enforce_same_dtype: if true, throws std::invalid_argument when the tensors have different dtypes

Template Parameters
  • n_tensors: number of tensors given to function

  • param_t: type of constant value given to kernel function alongside tensors

  • F: function of type void (*F)(array<tensor_param, n_tensors>, param_t) applied to tensors
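The enforce_same_dtype behavior described above can be sketched on the host as follows. TensorSketch, DTypeSketch, and check_same_dtype are hypothetical stand-ins introduced only for illustration; the real CudaTensor type and the check inside elementwise_function are not shown on this page and may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-ins for CudaTensor and its dtype.
enum class DTypeSketch { uint8, float32 };
struct TensorSketch { DTypeSketch dtype; };

// Compare every tensor's dtype against the first (output) tensor and
// throw std::invalid_argument on a mismatch, unless the check is disabled.
template <std::size_t n_tensors>
void check_same_dtype(const std::array<const TensorSketch*, n_tensors>& tensors,
                      bool enforce_same_dtype = true) {
    if (!enforce_same_dtype) return;
    for (std::size_t i = 1; i < n_tensors; ++i) {
        assert(tensors[0] != nullptr && tensors[i] != nullptr);
        if (tensors[i]->dtype != tensors[0]->dtype) {
            throw std::invalid_argument("tensors have different dtypes");
        }
    }
}
```

Passing enforce_same_dtype = false skips the check entirely, which matches the description of the flag for functions that mix dtypes on purpose.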