Type Casting

The saturate_cast family of function templates provides safe, saturating type casting on both GPU and CPU. Any combination of augpy dtypes can be cast.

Always instantiate the template with both types to ensure that the correct implementation is used.

CUDA type cast intrinsics are used where possible; otherwise, min and max math functions are used. Some type combinations use if-then-else branches on CPU.
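As a rough illustration of the min/max approach, the following host-only C++ sketch clamps a 32-bit integer into the range of an 8-bit unsigned integer before converting. The name clamp_to_u8 is hypothetical and not part of augpy; the library's actual implementation may differ, for example by using CUDA intrinsics on the device.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>

// Hypothetical helper, not part of augpy: a saturating int32 -> uint8 cast
// built from min and max.
inline std::uint8_t clamp_to_u8(std::int32_t v) {
    const std::int32_t lo = std::numeric_limits<std::uint8_t>::min();  // 0
    const std::int32_t hi = std::numeric_limits<std::uint8_t>::max();  // 255
    // Clamp into [lo, hi] first so the narrowing conversion cannot wrap around.
    return static_cast<std::uint8_t>(std::min(std::max(v, lo), hi));
}
```

In this sketch, -5 maps to 0 and 300 maps to 255, while in-range values such as 128 pass through unchanged.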

template<typename input_t, typename output_t>
__device__ __host__ __forceinline__ void saturate_cast(input_t vin, output_t *vout)

Cast an input_t value to output_t and write the result to the vout pointer. The cast is done with saturation, meaning that no underflow or overflow will occur.

The cast preserves ordering: for any cast value pairs \((v_{in}, v_{out})\) and \((v_{in}', v_{out}')\), if \(v_{in} \le v_{in}'\), then \(v_{out} \le v_{out}'\). Similarly, if \(v_{in} \ge v_{in}'\), then \(v_{out} \ge v_{out}'\).

When casting from integral to float types, the result may differ from the input (\(v_{in} \neq v_{out}\)) if the float type lacks the precision to represent the value exactly.
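The precision limit can be seen with plain C++ conversions, independent of augpy. A 32-bit float has a 24-bit significand, so not every integer above 2^24 is representable. The helper name survives_float_round_trip below is hypothetical and exists only for this illustration.

```cpp
#include <cstdint>

// Plain C++ conversion, used only to illustrate float precision;
// this is not augpy code.
inline bool survives_float_round_trip(std::int32_t v) {
    const float f = static_cast<float>(v);
    // Compare in double, which represents every int32 value exactly.
    return static_cast<double>(f) == static_cast<double>(v);
}
```

Here 16777216 (2^24) converts exactly, while 16777217 does not and lands on a neighboring representable value.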

When casting from float to integral types, the input is rounded to the nearest integer, with ties rounded to the even value.
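The rounding and saturation behavior described above can be sketched on the host as follows. This is a minimal stand-in that assumes the same two-type template shape and output-pointer convention as saturate_cast. The names saturate_cast_sketch and to_i8 are hypothetical, the sketch covers only float-to-integer casts up to 32 bits, and it does not handle NaN. It relies on std::nearbyint, which follows the current rounding mode (round to nearest, ties to even, by default).

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical stand-in, not augpy's implementation.
template <typename input_t, typename output_t>
inline void saturate_cast_sketch(input_t vin, output_t *vout) {
    static_assert(std::is_floating_point<input_t>::value, "float input only");
    static_assert(std::is_integral<output_t>::value && sizeof(output_t) <= 4,
                  "integer output of at most 32 bits only");
    // Round first: ties go to the even integer under the default rounding mode.
    const double rounded = std::nearbyint(static_cast<double>(vin));
    // Then clamp into the output range so the final conversion cannot overflow.
    const double lo = static_cast<double>(std::numeric_limits<output_t>::min());
    const double hi = static_cast<double>(std::numeric_limits<output_t>::max());
    *vout = static_cast<output_t>(std::min(std::max(rounded, lo), hi));
}

// Convenience wrapper that names both template types explicitly.
inline std::int8_t to_i8(float v) {
    std::int8_t out = 0;
    saturate_cast_sketch<float, std::int8_t>(v, &out);
    return out;
}
```

With this sketch, 2.5 and 3.5 round to 2 and 4, and 1000.0 and -1000.0 saturate to 127 and -128.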

Template Parameters
  • input_t – input dtype

  • output_t – output dtype

Parameters
  • vin – input value to cast

  • vout – pointer to output value