Image Functions

JPEG Decoding

Hybrid JPEG decoding on CPU and GPU using Nvjpeg.

class augpy.Decoder(device_padding: int = 16777216, host_padding: int = 8388608, gpu_huffman: bool = False)[source]

Bases: augpy._augpy.pybind11_object

Wrapper for Nvjpeg-based JPEG decoding, created on the current_device.

See:

Nvjpeg docs

Parameters
  • device_padding (int) – memory padding on the device

  • host_padding (int) – memory padding on the host

  • gpu_huffman (bool) – enable Huffman decoding on the GPU; not recommended unless you really need to offload from CPU

__init__(self: augpy._augpy.Decoder, device_padding: int = 16777216, host_padding: int = 8388608, gpu_huffman: bool = False) → None[source]
Return type

None

decode(self: augpy._augpy.Decoder, data: str, buffer: augpy._augpy.CudaTensor = None) → augpy._augpy.CudaTensor[source]

Decode a JPEG image using Nvjpeg. Output is in \((H,W,C)\) format and resides on the GPU device.

Parameters
  • data (str) – compressed JPEG image as a JFIF string, i.e., the full file contents

  • buffer (CudaTensor) – optional buffer to use; may be None; if not None must be big enough to contain the decoded image

Returns

new tensor with decoded image on GPU in \((H,W,C)\) format

Return type

CudaTensor
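A minimal usage sketch (the file path is hypothetical; the Decoder is created on, and decodes to, the current device):

import augpy

decoder = augpy.Decoder()
with open("example.jpg", "rb") as f:  # hypothetical input file
    data = f.read()  # the docs describe data as the full JFIF file contents
image = decoder.decode(data)  # new (H, W, C) tensor on the GPU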

Affine Warp

Functions to apply affine transformations on 2D images.

augpy.make_transform(source_size: Tuple[int, int], target_size: Tuple[int, int], angle: float = 0, scale: float = 1, aspect: float = 1, shift: Optional[Tuple[float, float]] = None, shear: Optional[Tuple[float, float]] = None, hmirror: bool = False, vmirror: bool = False, scale_mode: Union[str, augpy._augpy.WarpScaleMode] = <augpy._augpy.WarpScaleMode object>, max_supersampling: int = 3, out: Optional[numpy.ndarray] = None, __template__=array([[0., 0., 0.], [0., 0., 0.]], dtype=float32), **__) → Tuple[numpy.ndarray, int][source]

Convenience wrapper for make_affine_matrix().

See:

make_affine_matrix() for a slightly faster, less convenient version of this function.

Parameters
  • source_size (Tuple[int, int]) – source height (\(h_s\)) and width (\(w_s\))

  • target_size (Tuple[int, int]) – target height (\(h_t\)) and width (\(w_t\))

  • angle (float) – clockwise angle in degrees with image center as rotation axis

  • scale (float) – scale factor relative to the output size; 1 means the image fills the target height- or width-wise, depending on scale_mode and whichever side is shortest/longest; larger values crop, smaller values leave empty space on the output canvas

  • aspect (float) – controls the aspect ratio; 1 means same as input, values greater than 1 increase the width and reduce the height

  • shift (Optional[Tuple[float, float]]) – (shifty, shiftx) or None for (0, 0); shift the image in y (vertical) and x (horizontal) direction; 0 centers the image on the output canvas; -1 means shift up/left as much as possible; 1 means shift down/right as much as possible; the maximum distance to shift is \(\max(scale \cdot h_s - h_t, h_t - scale \cdot h_s)\)

  • shear (Optional[Tuple[float, float]]) – (sheary, shearx) or None for (0, 0); controls up/down and left/right shear; for every pixel in the x direction, move sheary pixels in the y direction; shearx behaves the same with x and y swapped

  • hmirror (bool) – if True flip image horizontally

  • vmirror (bool) – if True flip image vertically

  • scale_mode (Union[str, WarpScaleMode]) – if WarpScaleMode.WARP_SCALE_SHORTEST, scale is relative to the shortest side; this fills the output canvas, cropping the image if necessary; if WarpScaleMode.WARP_SCALE_LONGEST, scale is relative to the longest side; this ensures the image is contained inside the output canvas, but leaves empty space

  • max_supersampling (int) – upper limit for recommended supersampling

  • out (Optional[numpy.ndarray]) – optional \(2 \times 3\) float output array

Returns

transformation matrix and suggested supersampling factor

Return type

Tuple[numpy.ndarray, int]
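A short sketch of building a transform (parameter values are illustrative):

import augpy

# Map a 480x640 source onto a 224x224 canvas: rotate 15 degrees
# clockwise, zoom in 10%, and mirror horizontally.
matrix, supersampling = augpy.make_transform(
    source_size=(480, 640),
    target_size=(224, 224),
    angle=15,
    scale=1.1,
    hmirror=True,
)
# matrix is a 2x3 float32 numpy array; supersampling is the suggested
# factor, capped at max_supersampling.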

augpy.make_affine_matrix(out: buffer, source_height: int, source_width: int, target_height: int, target_width: int, angle: float = 0.0, scale: float = 1.0, aspect: float = 1.0, shifty: float = 0.0, shiftx: float = 0.0, sheary: float = 0.0, shearx: float = 0.0, hmirror: bool = False, vmirror: bool = False, scale_mode: augpy._augpy.WarpScaleMode = WarpScaleMode.WARP_SCALE_SHORTEST, max_supersampling: int = 3) → int[source]

Create a \(2 \times 3\) matrix for a set of affine transformations. This matrix is compatible with the warpAffine function of OpenCV with the WARP_INVERSE_MAP flag set.

Transforms are applied in the following order:

  1. shear

  2. scale & aspect ratio

  3. horizontal & vertical mirror

  4. rotation

  5. horizontal & vertical shift

See:

make_transform() for a more convenient version of this function.

Parameters
  • out (buffer) – output buffer that the matrix is written to; must be a writeable \(2 \times 3\) float buffer

  • source_height (int) – \(h_s\) height of the image in pixels

  • source_width (int) – \(w_s\) width of the image in pixels

  • target_height (int) – \(h_t\) height of the output canvas in pixels

  • target_width (int) – \(w_t\) width of the output canvas in pixels

  • angle (float) – clockwise angle in degrees with image center as rotation axis

  • scale (float) – scale factor relative to the output size; 1 means the image fills the target height- or width-wise, depending on scale_mode and whichever side is shortest/longest; larger values crop, smaller values leave empty space on the output canvas

  • aspect (float) – controls the aspect ratio; 1 means same as input, values greater than 1 increase the width and reduce the height

  • shifty (float) – shift the image in y direction (vertical); 0 centers the image on the output canvas; -1 means shift up as much as possible; 1 means shift down as much as possible; the maximum distance to shift is \(\max(scale \cdot h_s - h_t, h_t - scale \cdot h_s)\)

  • shiftx (float) – same as shifty, but in x direction (horizontal)

  • sheary (float) – controls up/down shear; for every pixel in the x direction move sheary pixels in y direction

  • shearx (float) – same as sheary but controls left/right shear

  • hmirror (bool) – if True flip image horizontally

  • vmirror (bool) – if True flip image vertically

  • scale_mode (WarpScaleMode) – if WarpScaleMode.WARP_SCALE_SHORTEST, scale is relative to the shortest side; this fills the output canvas, cropping the image if necessary; if WarpScaleMode.WARP_SCALE_LONGEST, scale is relative to the longest side; this ensures the image is contained inside the output canvas, but leaves empty space

  • max_supersampling (int) – upper limit for recommended supersampling

Returns

recommended supersampling factor for the warp

Return type

int

augpy.warp_affine(src: augpy._augpy.CudaTensor, dst: augpy._augpy.CudaTensor, matrix: buffer, background: augpy._augpy.CudaTensor, supersampling: int) → None[source]

Takes an image in channels-last format \((H, W, C)\) and affine warps it into a given output tensor in channels-first format \((C, H, W)\). Any blank canvas is filled with a background color. The warp is performed with bilinear interpolation and supersampling.

Parameters
  • src (CudaTensor) – image tensor

  • dst (CudaTensor) – target tensor

  • matrix (buffer) – \(2 \times 3\) float transformation matrix, see make_affine_matrix() for details

  • background (CudaTensor) – background color to fill empty canvas

  • supersampling (int) – supersampling factor, e.g., 3 means 9 samples are taken in a \(3 \times 3\) grid

Return type

None
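A sketch combining make_affine_matrix() and warp_affine(). How the dst and background tensors are allocated depends on your augpy version, so they are taken as arguments here, and the CudaTensor shape attribute is assumed:

import numpy as np
import augpy

def decode_and_warp(decoder, jpeg_bytes, dst, background):
    # dst: preallocated (C, H, W) CudaTensor; background: (C,) color tensor
    src = decoder.decode(jpeg_bytes)             # (H, W, C) on the GPU
    matrix = np.zeros((2, 3), dtype=np.float32)  # writeable 2x3 float buffer
    supersampling = augpy.make_affine_matrix(
        matrix, src.shape[0], src.shape[1],      # source height, width
        dst.shape[1], dst.shape[2],              # target height, width
        angle=30.0,
    )
    augpy.warp_affine(src, dst, matrix, background, supersampling)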

class augpy.WarpScaleMode(arg0: int)[source]

Bases: object

Enum whether to scale relative to the shortest or longest side of the image.

Members:

WARP_SCALE_SHORTEST :

Scaling is relative to the shortest side of the image.

WARP_SCALE_LONGEST :

Scaling is relative to the longest side of the image.

property WARP_SCALE_LONGEST

Scaling is relative to the longest side of the image.

property WARP_SCALE_SHORTEST

Scaling is relative to the shortest side of the image.

__init__(self: augpy._augpy.WarpScaleMode, arg0: int) → None[source]
Return type

None

Lighting

The following functions change the lighting of 2D images. Both input and output must be contiguous and in channel-first format \((C,H,W)\) (channel, height, width). All dtypes and an arbitrary number of channels are supported.

The output tensor out may be None, in which case a new tensor of the same shape and dtype as the input is allocated and returned. If out is given, it must have the same shape and dtype as the input, and out itself is returned.

augpy.lighting(imtensor: augpy._augpy.CudaTensor, gammagrays: augpy._augpy.CudaTensor, gammacolors: augpy._augpy.CudaTensor, contrasts: augpy._augpy.CudaTensor, vmin: float, vmax: float, out: augpy._augpy.CudaTensor = None) → augpy._augpy.CudaTensor[source]

Apply lighting augmentation to a batch of images. This is a four-step process:

  1. Normalize values \(v_{norm} = \frac{v - v_{min}}{v_{max}-v_{min}}\), with \(v_{min}\) the minimum and \(v_{max}\) the maximum lightness value

  2. Apply contrast change

  3. Apply gamma correction

  4. Denormalize values \(v' = v_{norm} \cdot (v_{max}-v_{min}) + v_{min}\)

To change contrast two reference functions are used. With contrast \(\mathcal{c} \ge 0\), i.e., increased contrast, the following function is used:

\[f_{pos}(v) = \frac{1.0037575963899724}{1 + \exp(6.279 - v \cdot 12.558)} - 0.0018787981949862\]

With contrast \(\mathcal{c} < 0\), i.e., decreased contrast, the following function is used:

\[f_{neg}(v) = 0.1755606108304832 \cdot atanh(v \cdot 1.986608 - 0.993304) + 0.5\]

The final value is \(v' = (1-\mathcal{c}) \cdot v + \mathcal{c} \cdot f(v)\).

Brightness and color changes are done via gamma correction.

\[v' = v^{\gamma_{gray} \cdot \gamma_c}\]

with \(\gamma_{gray}\) the gamma for overall lightness and \(\gamma_{c}\) the per-channel gamma.

Parameters
  • imtensor (CudaTensor) – image tensor in \((N,C,H,W)\) format

  • gammagrays (CudaTensor) – tensor of \(N\) gamma gray values

  • gammacolors (CudaTensor) – tensor of \(C \cdot N\) gamma values in the format \(\gamma_{1,1}, \gamma_{1,2}, ..., \gamma_{1,C}, \gamma_{2,1}, \gamma_{2,2}, ..., \gamma_{N,C-1}, \gamma_{N,C}\)

  • contrasts (CudaTensor) – tensor of \(N\) contrast values in \([-1, 1]\)

  • vmin (float) – minimum lightness value in images

  • vmax (float) – maximum lightness value in images

  • out (CudaTensor) – output tensor (may be None)

Returns

new tensor if out is None, else out

Return type

CudaTensor
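The per-pixel math above can be sketched in pure NumPy as a reference (illustrative only; the actual kernel runs on the GPU, and the sign inside the exponential follows the reconstruction of \(f_{pos}\) above):

import numpy as np

def lighting_reference(v, contrast, gamma_gray, gamma_color, vmin=0.0, vmax=255.0):
    v = (v - vmin) / (vmax - vmin)                      # 1. normalize
    if contrast >= 0:
        # f_pos: increased contrast
        f = 1.0037575963899724 / (1 + np.exp(6.279 - v * 12.558)) - 0.0018787981949862
    else:
        # f_neg: decreased contrast
        f = 0.1755606108304832 * np.arctanh(v * 1.986608 - 0.993304) + 0.5
    v = (1 - contrast) * v + contrast * f               # 2. contrast blend, v' = (1-c)v + c f(v)
    v = v ** (gamma_gray * gamma_color)                 # 3. gamma correction
    return v * (vmax - vmin) + vmin                     # 4. denormalize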

Blur

The following functions apply different types of blur on 2D images. Both input and output must be contiguous and in channel-first format (channel, height, width). All dtypes are supported. Edge values are repeated for locations that fall outside the input image.

The output tensor out may be None, in which case a new tensor of the same shape and dtype as the input is allocated and returned. If out is given, it must have the same shape and dtype as the input, and out itself is returned.

augpy.box_blur_single(input: augpy._augpy.CudaTensor, ksize: int, out: augpy._augpy.CudaTensor = None) → augpy._augpy.CudaTensor[source]

Apply box blur to a single image.

Kernel size describes both width and height in pixels of the area in the input that is averaged for each output pixel. Odd values are recommended for best results. For even values, the center of the kernel is below and to the right of the true center. This means the output is shifted up and left by half a pixel.

Parameters
  • input (CudaTensor) – image tensor in channel-first format

  • ksize (int) – kernel size in pixels

  • out (CudaTensor) – output tensor (may be None)

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.gaussian_blur_single(input: augpy._augpy.CudaTensor, sigma: float, out: augpy._augpy.CudaTensor = None) → augpy._augpy.CudaTensor[source]

Apply Gaussian blur to a single image.

Kernel size is calculated like this:

ksize = max(3, int(sigma * 6.6 - 2.3) | 1)

I.e., ksize is at least 3 and always odd.
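A quick check of the formula for a few sigmas:

for sigma in (0.5, 1.0, 2.0, 3.0):
    ksize = max(3, int(sigma * 6.6 - 2.3) | 1)
    print(sigma, ksize)  # 0.5 -> 3, 1.0 -> 5, 2.0 -> 11, 3.0 -> 17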

Parameters
  • input (CudaTensor) – image tensor in channel-first format

  • sigma (float) – standard deviation of the kernel

  • out (CudaTensor) – output tensor (may be None)

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.gaussian_blur(input: augpy._augpy.CudaTensor, sigmas: augpy._augpy.CudaTensor, max_ksize: int, out: augpy._augpy.CudaTensor = None) → augpy._augpy.CudaTensor[source]

Apply Gaussian blur to a batch of images.

Maximum kernel size can be calculated like this:

ksize = max(3, int(max(sigmas) * 6.6 - 2.3) | 1)

I.e., ksize is at least 3 and always odd.

The given kernel size defines the upper limit. The actual kernel size is calculated with the formula above and clipped at the given maximum.

Smaller values can be given to trade speed vs quality. Bigger values typically do not visibly improve quality.

Odd values are strongly recommended for best results. For even kernel sizes, the center of the kernel is below and to the right of the true center, so the output is shifted up and left by half a pixel. This can lead to inconsistencies between images in the batch: images with large sigmas whose kernel size is clipped to an even max_ksize may be shifted, while images with smaller sigmas (whose computed kernel size is odd) are not. The example after this paragraph shows how to derive a matching max_ksize.
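A matching max_ksize can be derived from the host-side sigma values before uploading them to the sigmas tensor (the values shown are illustrative):

import numpy as np

sigmas_host = np.array([0.5, 1.2, 3.0], dtype=np.float32)   # one sigma per image
max_ksize = max(3, int(sigmas_host.max() * 6.6 - 2.3) | 1)  # -> 17 for sigma = 3.0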

Parameters
  • input (CudaTensor) – batch tensor with images in first dimension

  • sigmas (CudaTensor) – float tensor with one sigma value per image in the batch

  • max_ksize (int) – maximum kernel size in pixels

  • out (CudaTensor) – output tensor (may be None)

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.image

class augpy.image.DecodeWarp(batch_size: int, shape: Tuple[int, int, int], background: augpy._augpy.CudaTensor = None, dtype: augpy._augpy.DLDataType = <augpy._augpy.DLDataType object>, cpu_threads: int = 1, num_buffers: int = 2, decode_buffer_size: Optional[int] = None)[source]

Bases: object

Use Decoder.decode() to decode JPEG images in memory and apply warp_affine() into batch tensor buffers.

DecodeWarp instances allocate buffers and decoders on the current_device.

Parameters
  • batch_size (int) – number of samples in a batch

  • shape (Tuple[int, int, int]) – shape of one image in the batch \((C,H,W)\)

  • background (CudaTensor) – tensor with \(C\) background color values

  • dtype (DLDataType) – type used for buffer tensors

  • cpu_threads (int) – number of parallel decoders

  • num_buffers (int) – number of buffer tensors

  • decode_buffer_size (Optional[int]) – size of pre-allocated buffer for decoding; must be larger than the number of subpixels; if None a new buffer is allocated every time

__call__(batch: dict) → dict[source]

Decode a list of JPEG images under the 'image' key and warp them into a batch tensor with the parameters defined by a list of augmentation dicts under key 'augmentation'.

Each set of augmentation parameters is a dict that contains values for the parameters of the warp_affine() function. Additional parameters are ignored.

Parameters

batch (dict) – dict {'image': [JPEG, JPEG, ...], 'augmentation': [params, params, ...]}

Returns

batch where 'image' is replaced by a batch tensor of transformed images.

Return type

dict
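A sketch of the expected batch format. The file paths are hypothetical, and the augmentation keys shown (angle, scale, hmirror) follow make_transform(), which the warp matrix is presumably built from; the exact keys are an assumption:

from augpy.image import DecodeWarp

decode_warp = DecodeWarp(batch_size=2, shape=(3, 224, 224))
jpegs = [open(p, "rb").read() for p in ("a.jpg", "b.jpg")]  # hypothetical files
batch = {
    "image": jpegs,
    "augmentation": [
        {"angle": 10, "scale": 1.0},     # per-sample warp parameters
        {"angle": -5, "hmirror": True},
    ],
}
batch = decode_warp(batch)  # "image" is now a (N, C, H, W) batch tensor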

finalize_batch(buffer)[source]

class augpy.image.Lighting(batch_size: int, channels: int = 3, min_value: Union[int, float] = 0, max_value: Union[int, float] = 255)[source]

Bases: object

Apply the lighting() function to a batch of images. The batch tensor must have format \((N,C,H,W)\), with batch size \(N\), number of channels \(C\), height \(H\), and width \(W\).

Lighting instances allocate buffers on the current_device.

Parameters
  • batch_size (int) – number of samples in a batch

  • channels (int) – number of channels \(C\) per image

  • min_value (Union[int, float]) – minimum brightness value, typically 0 for 8 bit images

  • max_value (Union[int, float]) – maximum brightness value, typically 255 for 8 bit images

__call__(batch)[source]

Apply lighting augmentation to a batch of images in a tensor under the 'image' key, with parameters defined by a list of augmentation dicts under the key 'augmentation'. Modifies the image tensor in-place.

Each set of augmentation parameters is a dict that contains values for the parameters of the lighting() function. Additional parameters are ignored.

Parameters

batch – dict {'image': batch tensor, 'augmentation': [params, params, ...]}

Returns

batch where 'image' has been modified in-place according to given augmentation parameters.
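A sketch chaining DecodeWarp and Lighting. Since additional parameters in each augmentation dict are ignored, warp and lighting values can share one dict; the lighting key names here are hypothetical:

from augpy.image import DecodeWarp, Lighting

decode_warp = DecodeWarp(batch_size=1, shape=(3, 224, 224))
lighting = Lighting(batch_size=1, channels=3, min_value=0, max_value=255)

batch = {
    "image": [open("a.jpg", "rb").read()],  # hypothetical file
    "augmentation": [{
        "angle": 10,                        # warp parameters
        "gammagray": 1.1,                   # lighting parameters
        "gammacolor": (1.0, 0.9, 1.1),      # (key names illustrative)
        "contrast": 0.2,
    }],
}
batch = decode_warp(batch)  # "image" becomes a (N, C, H, W) tensor
batch = lighting(batch)     # modifies batch["image"] in place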