Index Translation

template<typename scalar_t>
bool augpy::translate_idx_strided(scalar_t *&t, const ndim_array strides, const ndim_array contiguous_strides, const int ndim, const size_t count, const unsigned int values_per_thread, size_t &shape0)

Translate a contiguous index to a single strided tensor.

Parameters
  • t: pointer to tensor data

  • strides: strides in number of elements of the strided tensor

  • contiguous_strides: strides of the contiguous tensor in bytes

  • ndim: number of dimensions

  • count: number of elements in the tensor excluding the first dimension, i.e., \(\frac{\mathrm{numel}(t)}{s_0}\), where \(s_0\) is the size of the first dimension

  • values_per_thread: number of values calculated by each thread

  • shape0: size of the first dimension; looped over at most values_per_thread times
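The core idea behind this kind of translation can be sketched in plain C++. This is a minimal illustration, not augpy's implementation: `strided_offset`, `Strides`, and `kMaxDims` are hypothetical names, and the sketch assumes both stride arrays are given in elements with the outermost dimension first (the parameter list above states bytes for contiguous_strides, so the real function may scale differently). The flat index is decomposed into per-dimension coordinates using the contiguous strides, and each coordinate is then multiplied by the matching stride of the strided tensor.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of augpy: maps a flat index into a
// contiguous layout onto an element offset in a strided layout.
// Assumption: both stride arrays are in elements, outermost dimension first.
constexpr int kMaxDims = 4;
using Strides = std::array<std::ptrdiff_t, kMaxDims>;

std::ptrdiff_t strided_offset(std::size_t flat_index,
                              const Strides &contiguous_strides,
                              const Strides &strides,
                              int ndim) {
    assert(ndim >= 1 && ndim <= kMaxDims);
    std::ptrdiff_t offset = 0;
    for (int d = 0; d < ndim; ++d) {
        // Coordinate along dimension d, recovered from the flat index.
        const std::size_t cs = static_cast<std::size_t>(contiguous_strides[d]);
        const std::size_t coord = flat_index / cs;
        flat_index -= coord * cs;
        // Re-apply the coordinate using the stride of the strided tensor.
        offset += static_cast<std::ptrdiff_t>(coord) * strides[d];
    }
    return offset;
}
```

For a tensor of shape (2, 3) with contiguous strides (3, 1) that is a transposed view with strides (1, 2), flat index 4 corresponds to coordinates (1, 1) and therefore to element offset 1·1 + 1·2 = 3.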

template<typename scalar1_t, typename scalar2_t>
bool augpy::translate_idx_contiguous_strided(scalar1_t *&t1, const ndim_array t1_strides, scalar2_t *&t2, const ndim_array t2_strides, const int ndim, const size_t count, const unsigned int values_per_thread, size_t &shape0)

Translate a contiguous index to one contiguous and one strided tensor. t1 must be contiguous; t2 may be strided.

Parameters
  • t1: pointer to contiguous tensor data

  • t1_strides: strides in number of elements of t1

  • t2: pointer to strided tensor data

  • t2_strides: strides in number of elements of t2

  • ndim: number of dimensions

  • count: number of elements in the tensor excluding the first dimension, i.e., \(\frac{\mathrm{numel}(t)}{s_0}\), where \(s_0\) is the size of the first dimension

  • values_per_thread: number of values calculated by each thread

  • shape0: size of the first dimension; looped over at most values_per_thread times
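A typical situation with one contiguous and one strided operand is copying a strided view into a freshly allocated contiguous buffer. The sketch below illustrates that pattern on the host. It is an assumption-laden example rather than augpy code: `copy_strided_to_contiguous` is a hypothetical function, strides are in elements, and the contiguous strides are derived from the shape instead of being passed in. Because the destination is contiguous, its strides serve both to address it and to decompose the flat index, which is presumably why this variant takes no separate contiguous_strides argument.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration, not augpy code: copy a strided source into a
// contiguous destination. Strides are in elements, outermost dimension first.
std::vector<int> copy_strided_to_contiguous(const std::vector<int> &src,
                                            const std::vector<std::size_t> &shape,
                                            const std::vector<std::size_t> &src_strides) {
    assert(shape.size() == src_strides.size());
    const std::size_t ndim = shape.size();

    // Row-major (contiguous) strides derived from the shape.
    std::vector<std::size_t> dst_strides(ndim, 1);
    for (std::size_t d = ndim; d-- > 1;) {
        dst_strides[d - 1] = dst_strides[d] * shape[d];
    }

    std::size_t numel = 1;
    for (std::size_t s : shape) numel *= s;

    std::vector<int> dst(numel);
    for (std::size_t i = 0; i < numel; ++i) {
        // The destination is addressed by the flat index directly; the
        // source offset is rebuilt coordinate by coordinate.
        std::size_t rest = i;
        std::size_t src_offset = 0;
        for (std::size_t d = 0; d < ndim; ++d) {
            const std::size_t coord = rest / dst_strides[d];
            rest -= coord * dst_strides[d];
            src_offset += coord * src_strides[d];
        }
        dst[i] = src[src_offset];
    }
    return dst;
}
```

Passing a buffer holding 0 through 5, shape (2, 3), and source strides (1, 2) treats the buffer as the transpose of a 3×2 row-major matrix and returns its rows laid out contiguously.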

template<typename scalar1_t, typename scalar2_t>
bool augpy::translate_idx_strided_strided(scalar1_t *&t1, const ndim_array t1_strides, scalar2_t *&t2, const ndim_array t2_strides, const ndim_array contiguous_strides, const int ndim, const size_t count, const unsigned int values_per_thread, size_t &shape0)

Translate a contiguous index to two strided tensors. Both t1 and t2 may be strided.

Parameters
  • t1: pointer to first tensor data

  • t1_strides: strides in number of elements of t1

  • t2: pointer to second tensor data

  • t2_strides: strides in number of elements of t2

  • contiguous_strides: strides of the contiguous tensor in bytes

  • ndim: number of dimensions

  • count: number of elements in the tensor excluding the first dimension, i.e., \(\frac{\mathrm{numel}(t)}{s_0}\), where \(s_0\) is the size of the first dimension

  • values_per_thread: number of values calculated by each thread

  • shape0: size of the first dimension; looped over at most values_per_thread times

template<typename scalar1_t, typename scalar2_t, typename scalar3_t>
bool augpy::translate_idx_strided_strided_strided(scalar1_t *&t1, const ndim_array t1_strides, scalar2_t *&t2, const ndim_array t2_strides, scalar3_t *&t3, const ndim_array t3_strides, const ndim_array contiguous_strides, const int ndim, const size_t count, const unsigned int values_per_thread, size_t &shape0)

Translate a contiguous index to three strided tensors. Any of t1, t2, and t3 may be strided.

Parameters
  • t1: pointer to first tensor data

  • t1_strides: strides in number of elements of t1

  • t2: pointer to second tensor data

  • t2_strides: strides in number of elements of t2

  • t3: pointer to third tensor data

  • t3_strides: strides in number of elements of t3

  • contiguous_strides: strides of the contiguous tensor in bytes

  • ndim: number of dimensions

  • count: number of elements in the tensor excluding the first dimension, i.e., \(\frac{\mathrm{numel}(t)}{s_0}\), where \(s_0\) is the size of the first dimension

  • values_per_thread: number of values calculated by each thread

  • shape0: size of the first dimension; looped over at most values_per_thread times

THREAD_LOOP_1(FUN, COUNTER, P1, STRIDE1)

Loop over the first dimension of one strided tensor.

Parameters
  • FUN: code to execute

  • COUNTER: counter variable, initially set to number of iterations

  • P1: pointer variable

  • STRIDE1: stride in number of elements in the first dimension
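The parameter list suggests a loop that runs the given code, then advances the pointer by the first-dimension stride until the counter is exhausted. The sketch below shows one way such a macro could look. `SKETCH_LOOP_1` and `scale_first_column` are hypothetical names introduced for illustration, and the actual expansion of THREAD_LOOP_1 in augpy may differ, for example in how it counts down or where it advances the pointer.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a THREAD_LOOP_1-style macro, not augpy's
// definition. Runs FUN once per iteration, then advances P1 by STRIDE1
// elements; COUNTER is counted down to zero.
#define SKETCH_LOOP_1(FUN, COUNTER, P1, STRIDE1) \
    for (; (COUNTER) > 0; --(COUNTER)) { \
        FUN; \
        (P1) += (STRIDE1); \
    }

// Multiplies the first element of every row of a row-major matrix.
void scale_first_column(float *data, std::size_t rows,
                        std::size_t row_stride, float factor) {
    assert(data != nullptr);
    float *p = data;               // pointer variable advanced by the loop
    std::size_t remaining = rows;  // counter, initially the iteration count
    SKETCH_LOOP_1(*p *= factor, remaining, p, row_stride)
}
```

Passing the loop body as a macro argument lets the same stepping logic wrap arbitrary per-element code without a function call, at the cost of the usual macro caveats: the body must not contain unparenthesized commas, and the counter and pointer are modified in place.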

THREAD_LOOP_2(FUN, COUNTER, P1, STRIDE1, P2, STRIDE2)

Loop over the first dimension of two strided tensors.

Parameters
  • FUN: code to execute

  • COUNTER: counter variable, initially set to number of iterations

  • P1: first pointer variable

  • STRIDE1: first tensor stride in number of elements in the first dimension

  • P2: second pointer variable

  • STRIDE2: second tensor stride in number of elements in the first dimension

THREAD_LOOP_3(FUN, COUNTER, P1, STRIDE1, P2, STRIDE2, P3, STRIDE3)

Loop over the first dimension of three strided tensors.

Parameters
  • FUN: code to execute

  • COUNTER: counter variable, initially set to number of iterations

  • P1: first pointer variable

  • STRIDE1: first tensor stride in number of elements in the first dimension

  • P2: second pointer variable

  • STRIDE2: second tensor stride in number of elements in the first dimension

  • P3: third pointer variable

  • STRIDE3: third tensor stride in number of elements in the first dimension