Image Functions

JPEG Decoding

Hybrid JPEG decoding on CPU and GPU using Nvjpeg.

class augpy.Decoder(device_padding: int = 16777216, host_padding: int = 8388608, gpu_huffman: bool = False)[source]

Bases: augpy._augpy.pybind11_object

Wrapper for Nvjpeg-based JPEG decoding, created on the current_device.

See:

Nvjpeg docs

Parameters
  • device_padding (int) – memory padding on the device

  • host_padding (int) – memory padding on the host

  • gpu_huffman (bool) – enable Huffman decoding on the GPU; not recommended unless you really need to offload from CPU

__init__(self: augpy._augpy.Decoder, device_padding: int = 16777216, host_padding: int = 8388608, gpu_huffman: bool = False) → None[source]
Return type

None

decode(self: augpy._augpy.Decoder, data: str, buffer: augpy._augpy.CudaTensor = None) → augpy._augpy.CudaTensor[source]

Decode a JPEG image using Nvjpeg. Output is in \((H,W,C)\) format and resides on the GPU device.

Parameters
  • data (str) – compressed JPEG image as a JFIF string, i.e., the full file contents

  • buffer (CudaTensor) – optional buffer to use; may be None; if not None must be big enough to contain the decoded image

Returns

new tensor with decoded image on GPU in \((H,W,C)\) format

Return type

CudaTensor
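A minimal usage sketch (the file path is hypothetical; the Decoder is created on, and decodes to, the current device):

import augpy

decoder = augpy.Decoder()
with open("example.jpg", "rb") as f:  # hypothetical input file
    data = f.read()  # the docs describe data as the full JFIF file contents
image = decoder.decode(data)  # new (H, W, C) tensor on the GPU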

Affine Warp

Functions to apply affine transformations on 2D images.

augpy.make_transform(source_size: Tuple[int, int], target_size: Tuple[int, int], angle: float = 0, scale: float = 1, aspect: float = 1, shift: Optional[Tuple[float, float]] = None, shear: Optional[Tuple[float, float]] = None, hmirror: bool = False, vmirror: bool = False, scale_mode: Union[str, augpy._augpy.WarpScaleMode] = <augpy._augpy.WarpScaleMode object>, max_supersampling: int = 3, out: Optional[numpy.ndarray] = None, __template__=array([[0., 0., 0.], [0., 0., 0.]], dtype=float32), **__) → Tuple[numpy.ndarray, int][source]

Convenience wrapper for make_affine_matrix().

See:

make_affine_matrix() for a slightly faster, less convenient version of this function.

Parameters
  • source_size (Tuple[int, int]) – source height (\(h_s\)) and width (\(w_s\))

  • target_size (Tuple[int, int]) – target height (\(h_t\)) and width (\(w_t\))

  • angle (float) – clockwise angle in degrees with image center as rotation axis

  • scale (float) – scale factor relative to the output size; 1 means the image fills the target height- or width-wise, depending on scale_mode and whichever side is shortest/longest; larger values crop, smaller values leave empty space on the output canvas

  • aspect (float) – controls the aspect ratio; 1 means same as input, values greater than 1 increase the width and reduce the height

  • shift (Optional[Tuple[float, float]]) – (shifty, shiftx) or None for (0, 0); shift the image in y (vertical) and x (horizontal) direction; 0 centers the image on the output canvas; -1 means shift up/left as much as possible; 1 means shift down/right as much as possible; the maximum distance to shift is \(\max(scale \cdot h_s - h_t, h_t - scale \cdot h_s)\)

  • shear (Optional[Tuple[float, float]]) – (sheary, shearx) or None for (0, 0); controls up/down and left/right shear; for every pixel in the x direction, move sheary pixels in the y direction; shearx behaves the same with x and y swapped

  • hmirror (bool) – if True flip image horizontally

  • vmirror (bool) – if True flip image vertically

  • scale_mode (Union[str, WarpScaleMode]) – if WarpScaleMode.WARP_SCALE_SHORTEST, scale is relative to the shortest side; this fills the output canvas, cropping the image if necessary; if WarpScaleMode.WARP_SCALE_LONGEST, scale is relative to the longest side; this ensures the image is contained inside the output canvas, but leaves empty space

  • max_supersampling (int) – upper limit for recommended supersampling

  • out (Optional[numpy.ndarray]) – optional \(2 \times 3\) float output array

Returns

transformation matrix and suggested supersampling factor

Return type

Tuple[numpy.ndarray, int]
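A short sketch of building a transform (parameter values are illustrative):

import augpy

# Map a 480x640 source onto a 224x224 canvas: rotate 15 degrees
# clockwise, zoom in 10%, and mirror horizontally.
matrix, supersampling = augpy.make_transform(
    source_size=(480, 640),
    target_size=(224, 224),
    angle=15,
    scale=1.1,
    hmirror=True,
)
# matrix is a 2x3 float32 numpy array; supersampling is the suggested
# factor, capped at max_supersampling.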

augpy.make_affine_matrix(out: buffer, source_height: int, source_width: int, target_height: int, target_width: int, angle: float = 0.0, scale: float = 1.0, aspect: float = 1.0, shifty: float = 0.0, shiftx: float = 0.0, sheary: float = 0.0, shearx: float = 0.0, hmirror: bool = False, vmirror: bool = False, scale_mode: augpy._augpy.WarpScaleMode = WarpScaleMode.WARP_SCALE_SHORTEST, max_supersampling: int = 3) → int[source]

Create a \(2 \times 3\) matrix for a set of affine transformations. This matrix is compatible with the warpAffine function of OpenCV with the WARP_INVERSE_MAP flag set.

Transforms are applied in the following order:

  1. shear

  2. scale & aspect ratio

  3. horizontal & vertical mirror

  4. rotation

  5. horizontal & vertical shift

See:

make_transform() for a more convenient version of this function.

Parameters
  • out (buffer) – output buffer that the matrix is written to; must be a writeable \(2 \times 3\) float buffer

  • source_height (int) – \(h_s\) height of the image in pixels

  • source_width (int) – \(w_s\) width of the image in pixels

  • target_height (int) – \(h_t\) height of the output canvas in pixels

  • target_width (int) – \(w_t\) width of the output canvas in pixels

  • angle (float) – clockwise angle in degrees with image center as rotation axis

  • scale (float) – scale factor relative to the output size; 1 means the image fills the target height- or width-wise, depending on scale_mode and whichever side is shortest/longest; larger values crop, smaller values leave empty space on the output canvas

  • aspect (float) – controls the aspect ratio; 1 means same as input, values greater than 1 increase the width and reduce the height

  • shifty (float) – shift the image in y direction (vertical); 0 centers the image on the output canvas; -1 means shift up as much as possible; 1 means shift down as much as possible; the maximum distance to shift is \(\max(scale \cdot h_s - h_t, h_t - scale \cdot h_s)\)

  • shiftx (float) – same as shifty, but in x direction (horizontal)

  • sheary (float) – controls up/down shear; for every pixel in the x direction move sheary pixels in y direction

  • shearx (float) – same as sheary but controls left/right shear

  • hmirror (bool) – if True flip image horizontally

  • vmirror (bool) – if True flip image vertically

  • scale_mode (WarpScaleMode) – if WarpScaleMode.WARP_SCALE_SHORTEST, scale is relative to the shortest side; this fills the output canvas, cropping the image if necessary; if WarpScaleMode.WARP_SCALE_LONGEST, scale is relative to the longest side; this ensures the image is contained inside the output canvas, but leaves empty space

  • max_supersampling (int) – upper limit for recommended supersampling

Returns

recommended supersampling factor for the warp

Return type

int

augpy.warp_affine(src: augpy._augpy.CudaTensor, dst: augpy._augpy.CudaTensor, matrix: buffer, background: augpy._augpy.CudaTensor, supersampling: int) → None[source]

Takes an image in channels-last format \((H, W, C)\) and affine warps it into a given output tensor in channels-first format \((C, H, W)\). Any blank canvas is filled with a background color. The warp is performed with bilinear interpolation and supersampling.

Parameters
  • src (CudaTensor) – image tensor

  • dst (CudaTensor) – target tensor

  • matrix (buffer) – \(2 \times 3\) float transformation matrix, see make_affine_matrix() for details

  • background (CudaTensor) – background color to fill empty canvas

  • supersampling (int) – supersampling factor, e.g., 3 means 9 samples are taken in a \(3 \times 3\) grid

Return type

None
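A sketch combining make_affine_matrix() and warp_affine(). How the dst and background tensors are allocated depends on your augpy version, so they are taken as arguments here, and the CudaTensor shape attribute is assumed:

import numpy as np
import augpy

def decode_and_warp(decoder, jpeg_bytes, dst, background):
    # dst: preallocated (C, H, W) CudaTensor; background: (C,) color tensor
    src = decoder.decode(jpeg_bytes)             # (H, W, C) on the GPU
    matrix = np.zeros((2, 3), dtype=np.float32)  # writeable 2x3 float buffer
    supersampling = augpy.make_affine_matrix(
        matrix, src.shape[0], src.shape[1],      # source height, width
        dst.shape[1], dst.shape[2],              # target height, width
        angle=30.0,
    )
    augpy.warp_affine(src, dst, matrix, background, supersampling)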

class augpy.WarpScaleMode(arg0: int)[source]

Bases: object

Enum whether to scale relative to the shortest or longest side of the image.

Members:

WARP_SCALE_SHORTEST :

Scaling is relative to the shortest side of the image.

WARP_SCALE_LONGEST :

Scaling is relative to the longest side of the image.

property WARP_SCALE_LONGEST

Scaling is relative to the longest side of the image.

property WARP_SCALE_SHORTEST

Scaling is relative to the shortest side of the image.

__init__(self: augpy._augpy.WarpScaleMode, arg0: int) → None[source]
Return type

None

Lighting

The following functions change the lighting of 2D images. Both input and output must be contiguous and in channel-first format \((C,H,W)\) (channel, height, width). All dtypes and an arbitrary number of channels are supported.

The output tensor out may be None, in which case a new tensor of the same shape and dtype as the input is allocated and returned. If out is given, it must have the same shape and dtype as the input, and out itself is returned.

augpy.lighting(imtensor: augpy._augpy.CudaTensor, gammagrays: augpy._augpy.CudaTensor, gammacolors: augpy._augpy.CudaTensor, contrasts: augpy._augpy.CudaTensor, vmin: float, vmax: float, out: augpy._augpy.CudaTensor = None) → augpy._augpy.CudaTensor[source]

Apply lighting augmentation to a batch of images. This is a four-step process:

  1. Normalize values \(v_{norm} = \frac{v - v_{min}}{v_{max}-v_{min}}\), with \(v_{min}\) the minimum and \(v_{max}\) the maximum lightness value

  2. Apply contrast change

  3. Apply gamma correction

  4. Denormalize values \(v' = v_{norm} \cdot (v_{max}-v_{min}) + v_{min}\)

To change contrast two reference functions are used. With contrast \(\mathcal{c} \ge 0\), i.e., increased contrast, the following function is used:

\[f_{pos}(v) = \frac{1.0037575963899724}{1 + \exp(6.279 - v \cdot 12.558)} - 0.0018787981949862\]

With contrast \(\mathcal{c} < 0\), i.e., decreased contrast, the following function is used:

\[f_{neg}(v) = 0.1755606108304832 \cdot atanh(v \cdot 1.986608 - 0.993304) + 0.5\]

The final value is \(v' = (1-\mathcal{c}) \cdot v + \mathcal{c} \cdot f(v)\).

Brightness and color changes are done via gamma correction.

\[v' = v^{\gamma_{gray} \cdot \gamma_c}\]

with \(\gamma_{gray}\) the gamma for overall lightness and \(\gamma_{c}\) the per-channel gamma.

Parameters
  • imtensor (CudaTensor) – image tensor in \((N,C,H,W)\) format

  • gammagrays (CudaTensor) – tensor of \(N\) gamma gray values

  • gammacolors (CudaTensor) – tensor of \(C \cdot N\) gamma values in the format \(\gamma_{1,1}, \gamma_{1,2}, ..., \gamma_{1,C}, \gamma_{2,1}, \gamma_{2,2}, ..., \gamma_{N,C-1}, \gamma_{N,C}\)

  • contrasts (CudaTensor) – tensor of \(N\) contrast values in \([-1, 1]\)

  • vmin (float) – minimum lightness value in images

  • vmax (float) – maximum lightness value in images

  • out (CudaTensor) – output tensor (may be None)

Returns

new tensor if out is None, else out

Return type

CudaTensor
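The per-pixel math above can be sketched in pure NumPy as a reference (illustrative only; the actual kernel runs on the GPU, and the sign inside the exponential follows the reconstruction of \(f_{pos}\) above):

import numpy as np

def lighting_reference(v, contrast, gamma_gray, gamma_color, vmin=0.0, vmax=255.0):
    v = (v - vmin) / (vmax - vmin)                      # 1. normalize
    if contrast >= 0:
        # f_pos: increased contrast
        f = 1.0037575963899724 / (1 + np.exp(6.279 - v * 12.558)) - 0.0018787981949862
    else:
        # f_neg: decreased contrast
        f = 0.1755606108304832 * np.arctanh(v * 1.986608 - 0.993304) + 0.5
    v = (1 - contrast) * v + contrast * f               # 2. contrast blend, v' = (1-c)v + c f(v)
    v = v ** (gamma_gray * gamma_color)                 # 3. gamma correction
    return v * (vmax - vmin) + vmin                     # 4. denormalize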

Blur

The following functions apply different types of blur on 2D images. Both input and output must be contiguous and in channel-first format (channel, height, width). All dtypes are supported. Edge values are repeated for locations that fall outside the input image.

The output tensor out may be None, in which case a new tensor of the same shape and dtype as the input is allocated and returned. If out is given, it must have the same shape and dtype as the input, and out itself is returned.

augpy.box_blur_single(input: augpy._augpy.CudaTensor, ksize: int, out: augpy._augpy.CudaTensor = None) → augpy._augpy.CudaTensor[source]

Apply box blur to a single image.

Kernel size describes both width and height in pixels of the area in the input that is averaged for each output pixel. Odd values are recommended for best results. For even values, the center of the kernel is below and to the right of the true center. This means the output is shifted up and left by half a pixel.

Parameters
  • input (CudaTensor) – image tensor in channel-first format

  • ksize (int) – kernel size in pixels

  • out (CudaTensor) – output tensor (may be None)

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.gaussian_blur_single(input: augpy._augpy.CudaTensor, sigma: float, out: augpy._augpy.CudaTensor = None) → augpy._augpy.CudaTensor[source]

Apply Gaussian blur to a single image.

Kernel size is calculated like this:

ksize = max(3, int(sigma * 6.6 - 2.3) | 1)

I.e., ksize is at least 3 and always odd.
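A quick check of the formula for a few sigmas:

for sigma in (0.5, 1.0, 2.0, 3.0):
    ksize = max(3, int(sigma * 6.6 - 2.3) | 1)
    print(sigma, ksize)  # 0.5 -> 3, 1.0 -> 5, 2.0 -> 11, 3.0 -> 17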

Parameters
  • input (CudaTensor) – image tensor in channel-first format

  • sigma (float) – standard deviation of the kernel

  • out (CudaTensor) – output tensor (may be None)

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.gaussian_blur(input: augpy._augpy.CudaTensor, sigmas: augpy._augpy.CudaTensor, max_ksize: int, out: augpy._augpy.CudaTensor = None) → augpy._augpy.CudaTensor[source]

Apply Gaussian blur to a batch of images.

Maximum kernel size can be calculated like this:

ksize = max(3, int(max(sigmas) * 6.6 - 2.3) | 1)

I.e., ksize is at least 3 and always odd.

The given kernel size defines the upper limit. The actual kernel size is calculated with the formula above and clipped at the given maximum.

Smaller values can be given to trade speed vs quality. Bigger values typically do not visibly improve quality.

Odd values are strongly recommended for best results. For even kernel sizes, the center of the kernel is below and to the right of the true center, so the output is shifted up and left by half a pixel. This can lead to inconsistencies between images in the batch: images with large sigmas whose kernel size is clipped to an even max_ksize may be shifted, while images with smaller sigmas (whose computed kernel size is odd) are not. The example after this paragraph shows how to derive a matching max_ksize.
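A matching max_ksize can be derived from the host-side sigma values before uploading them to the sigmas tensor (the values shown are illustrative):

import numpy as np

sigmas_host = np.array([0.5, 1.2, 3.0], dtype=np.float32)   # one sigma per image
max_ksize = max(3, int(sigmas_host.max() * 6.6 - 2.3) | 1)  # -> 17 for sigma = 3.0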

Parameters
  • input (CudaTensor) – batch tensor with images in first dimension

  • sigmas (CudaTensor) – float tensor with one sigma value per image in the batch

  • max_ksize (int) – maximum kernel size in pixels

  • out (CudaTensor) – output tensor (may be None)

Returns

new tensor if out is None, else out

Return type

CudaTensor

augpy.image

class augpy.image.DecodeWarp(batch_size: int, shape: Tuple[int, int, int], background: augpy._augpy.CudaTensor = None, dtype: augpy._augpy.DLDataType = <augpy._augpy.DLDataType object>, cpu_threads: int = 1, num_buffers: int = 2, decode_buffer_size: Optional[int] = None)[source]

Bases: object

Use Decoder.decode() to decode JPEG images in memory and apply warp_affine() into batch tensor buffers.

DecodeWarp instances allocate buffers and decoders on the current_device.

Parameters
  • batch_size (int) – number of samples in a batch

  • shape (Tuple[int, int, int]) – shape of one image in the batch \((C,H,W)\)

  • background (CudaTensor) – tensor with \(C\) background color values

  • dtype (DLDataType) – type used for buffer tensors

  • cpu_threads (int) – number of parallel decoders

  • num_buffers (int) – number of buffer tensors

  • decode_buffer_size (Optional[int]) – size of pre-allocated buffer for decoding; must be larger than the number of subpixels; if None a new buffer is allocated every time

__call__(batch: dict) → dict[source]

Decode a list of JPEG images under the 'image' key and warp them into a batch tensor with the parameters defined by a list of augmentation dicts under key 'augmentation'.

Each set of augmentation parameters is a dict that contains values for the parameters of the warp_affine() function. Additional parameters are ignored.

Parameters

batch (dict) – dict {'image': [JPEG, JPEG, ...], 'augmentation': [params, params, ...]}

Returns

batch where 'image' is replaced by a batch tensor of transformed images.

Return type

dict
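A sketch of the expected batch format. The file paths are hypothetical, and the augmentation keys shown (angle, scale, hmirror) follow make_transform(), which the warp matrix is presumably built from; the exact keys are an assumption:

from augpy.image import DecodeWarp

decode_warp = DecodeWarp(batch_size=2, shape=(3, 224, 224))
jpegs = [open(p, "rb").read() for p in ("a.jpg", "b.jpg")]  # hypothetical files
batch = {
    "image": jpegs,
    "augmentation": [
        {"angle": 10, "scale": 1.0},     # per-sample warp parameters
        {"angle": -5, "hmirror": True},
    ],
}
batch = decode_warp(batch)  # "image" is now a (N, C, H, W) batch tensor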

finalize_batch(buffer)[source]

class augpy.image.Lighting(batch_size: int, channels: int = 3, min_value: Union[int, float] = 0, max_value: Union[int, float] = 255)[source]

Bases: object

Apply the lighting() function to a batch of images. The batch tensor must have format \((N,C,H,W)\), with batch size \(N\), number of channels \(C\), height \(H\), and width \(W\).

Lighting instances allocate buffers on the current_device.

Parameters
  • batch_size (int) – number of samples in a batch

  • channels (int) – number of channels \(C\) per image

  • min_value (Union[int, float]) – minimum brightness value, typically 0 for 8 bit images

  • max_value (Union[int, float]) – maximum brightness value, typically 255 for 8 bit images

__call__(batch)[source]

Apply lighting augmentation to a batch of images in a tensor under the 'image' key, with parameters defined by a list of augmentation dicts under the key 'augmentation'. Modifies the image tensor in-place.

Each set of augmentation parameters is a dict that contains values for the parameters of the lighting() function. Additional parameters are ignored.

Parameters

batch – dict {'image': batch tensor, 'augmentation': [params, params, ...]}

Returns

batch where 'image' has been modified in-place according to given augmentation parameters.
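A sketch chaining DecodeWarp and Lighting. Since additional parameters in each augmentation dict are ignored, warp and lighting values can share one dict; the lighting key names here are hypothetical:

from augpy.image import DecodeWarp, Lighting

decode_warp = DecodeWarp(batch_size=1, shape=(3, 224, 224))
lighting = Lighting(batch_size=1, channels=3, min_value=0, max_value=255)

batch = {
    "image": [open("a.jpg", "rb").read()],  # hypothetical file
    "augmentation": [{
        "angle": 10,                        # warp parameters
        "gammagray": 1.1,                   # lighting parameters
        "gammacolor": (1.0, 0.9, 1.1),      # (key names illustrative)
        "contrast": 0.2,
    }],
}
batch = decode_warp(batch)  # "image" becomes a (N, C, H, W) tensor
batch = lighting(batch)     # modifies batch["image"] in place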