Transforming and augmenting images¶

Transforms are common image transformations available in the torchvision.transforms module. They can be chained together using Compose. Most transform classes have a function equivalent: functional transforms give fine-grained control over the transformations. This is useful if you have to build a more complex transformation pipeline (e.g. in the case of segmentation tasks).

Most transformations accept both PIL images and tensor images, although some transformations are PIL-only and some are tensor-only. The Conversion Transforms may be used to convert to and from PIL images.

The transformations that accept tensor images also accept batches of tensor images. A Tensor Image is a tensor with (C, H, W) shape, where C is a number of channels, H and W are image height and width. A batch of Tensor Images is a tensor of (B, C, H, W) shape, where B is a number of images in the batch.

The expected range of the values of a tensor image is implicitly defined by the tensor dtype. Tensor images with a float dtype are expected to have values in [0, 1). Tensor images with an integer dtype are expected to have values in [0, MAX_DTYPE] where MAX_DTYPE is the largest value that can be represented in that dtype.

Randomized transformations will apply the same transformation to all the images of a given batch, but they will produce different transformations across calls. For reproducible transformations across calls, you may use functional transforms.

The following examples illustrate the use of the available transforms:

  • Illustration of transforms


  • Tensor transforms and JIT


Warning

Since v0.8.0 all random transformations use the torch default random generator to sample random parameters. This is a backward-compatibility-breaking change, and users should set the random state as follows:

    # Previous versions
    # import random
    # random.seed(12)

    # Now
    import torch
    torch.manual_seed(17)

Please keep in mind that the same seed for the torch random generator and the Python random generator will not produce the same results.

Scriptable transforms¶

In order to script the transformations, please use torch.nn.Sequential instead of Compose.

    import torch
    from torchvision import transforms

    transforms = torch.nn.Sequential(
        transforms.CenterCrop(10),
        transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
    )
    scripted_transforms = torch.jit.script(transforms)

Make sure to use only scriptable transformations, i.e. transformations that work with torch.Tensor and do not require lambda functions or PIL.Image.

For any custom transformations to be used with torch.jit.script, they should be derived from torch.nn.Module.

Compositions of transforms¶

Compose (transforms)

Composes several transforms together.

Transforms on PIL Image and torch.*Tensor¶

CenterCrop (size)

Crops the given image at the center.

ColorJitter ([brightness, contrast, …])

Randomly change the brightness, contrast, saturation and hue of an image.

FiveCrop (size)

Crop the given image into four corners and the central crop.

Grayscale ([num_output_channels])

Convert image to grayscale.

Pad (padding[, fill, padding_mode])

Pad the given image on all sides with the given "pad" value.

RandomAffine (degrees[, translate, scale, …])

Random affine transformation of the image keeping center invariant.

RandomApply (transforms[, p])

Apply randomly a list of transformations with a given probability.

RandomCrop (size[, padding, pad_if_needed, …])

Crop the given image at a random location.

RandomGrayscale ([p])

Randomly convert image to grayscale with a probability of p (default 0.1).

RandomHorizontalFlip ([p])

Horizontally flip the given image randomly with a given probability.

RandomPerspective ([distortion_scale, p, …])

Performs a random perspective transformation of the given image with a given probability.

RandomResizedCrop (size[, scale, ratio, …])

Crop a random portion of image and resize it to a given size.

RandomRotation (degrees[, interpolation, …])

Rotate the image by angle.

RandomVerticalFlip ([p])

Vertically flip the given image randomly with a given probability.

Resize (size[, interpolation, max_size, …])

Resize the input image to the given size.

TenCrop (size[, vertical_flip])

Crop the given image into four corners and the central crop plus the flipped version of these (horizontal flipping is used by default).

GaussianBlur (kernel_size[, sigma])

Blurs image with randomly chosen Gaussian blur.

RandomInvert ([p])

Inverts the colors of the given image randomly with a given probability.

RandomPosterize (bits[, p])

Posterize the image randomly with a given probability by reducing the number of bits for each color channel.

RandomSolarize (threshold[, p])

Solarize the image randomly with a given probability by inverting all pixel values above a threshold.

RandomAdjustSharpness (sharpness_factor[, p])

Adjust the sharpness of the image randomly with a given probability.

RandomAutocontrast ([p])

Autocontrast the pixels of the given image randomly with a given probability.

RandomEqualize ([p])

Equalize the histogram of the given image randomly with a given probability.

Transforms on PIL Image only¶

RandomChoice (transforms[, p])

Apply single transformation randomly picked from a list.

RandomOrder (transforms)

Apply a list of transformations in a random order.

Transforms on torch.*Tensor only¶

LinearTransformation (transformation_matrix, …)

Transform a tensor image with a square transformation matrix and a mean_vector computed offline.

Normalize (mean, std[, inplace])

Normalize a tensor image with mean and standard deviation.

RandomErasing ([p, scale, ratio, value, inplace])

Randomly selects a rectangle region in a torch Tensor image and erases its pixels.

ConvertImageDtype (dtype)

Convert a tensor image to the given dtype and scale the values accordingly. This function does not support PIL Image.

Conversion Transforms¶

ToPILImage ([mode])

Convert a tensor or an ndarray to PIL Image.

ToTensor ()

Convert a PIL Image or numpy.ndarray to tensor.

PILToTensor ()

Convert a PIL Image to a tensor of the same type.

Generic Transforms¶

Lambda (lambd)

Apply a user-defined lambda as a transform.

Automatic Augmentation Transforms¶

AutoAugment is a common Data Augmentation technique that can improve the accuracy of Image Classification models. Though the data augmentation policies are directly linked to their trained dataset, empirical studies show that ImageNet policies provide significant improvements when applied to other datasets. In TorchVision we implemented 3 policies learned on the following datasets: ImageNet, CIFAR10 and SVHN. The new transform can be used standalone or mixed-and-matched with existing transforms.

Functional Transforms¶

Functional transforms give you fine-grained control of the transformation pipeline. As opposed to the transformations above, functional transforms don't contain a random number generator for their parameters. That means you have to specify/generate all parameters, but the functional transform will give you reproducible results across calls.

Example: you can apply a functional transform with the same parameters to multiple images like this:

    import torchvision.transforms.functional as TF
    import random

    def my_segmentation_transforms(image, segmentation):
        # Draw the random parameters once, then apply them to both inputs.
        if random.random() > 0.5:
            angle = random.randint(-30, 30)
            image = TF.rotate(image, angle)
            segmentation = TF.rotate(segmentation, angle)
        # more transforms ...
        return image, segmentation

Example: you can use a functional transform to build transform classes with custom behavior:

    import torchvision.transforms.functional as TF
    import random

    class MyRotationTransform:
        """Rotate by one of the given angles."""

        def __init__(self, angles):
            self.angles = angles

        def __call__(self, x):
            angle = random.choice(self.angles)
            return TF.rotate(x, angle)

    rotation_transform = MyRotationTransform(angles=[-30, -15, 0, 15, 30])

adjust_brightness (img, brightness_factor)

Adjust brightness of an image.

adjust_contrast (img, contrast_factor)

Adjust contrast of an image.

adjust_gamma (img, gamma[, gain])

Perform gamma correction on an image.

adjust_hue (img, hue_factor)

Adjust hue of an image.

adjust_saturation (img, saturation_factor)

Adjust color saturation of an image.

adjust_sharpness (img, sharpness_factor)

Adjust the sharpness of an image.

affine (img, angle, translate, scale, shear)

Apply affine transformation on the image keeping image center invariant.

autocontrast (img)

Maximize contrast of an image by remapping its pixels per channel so that the lowest becomes black and the lightest becomes white.

center_crop (img, output_size)

Crops the given image at the center.

convert_image_dtype (image[, dtype])

Convert a tensor image to the given dtype and scale the values accordingly. This function does not support PIL Image.

crop (img, top, left, height, width)

Crop the given image at specified location and output size.

equalize (img)

Equalize the histogram of an image by applying a non-linear mapping to the input in order to create a uniform distribution of grayscale values in the output.

erase (img, i, j, h, w, v[, inplace])

Erase the input Tensor Image with given value.

five_crop (img, size)

Crop the given image into four corners and the central crop.

gaussian_blur (img, kernel_size[, sigma])

Performs Gaussian blurring on the image by given kernel.

get_image_num_channels (img)

Returns the number of channels of an image.

get_image_size (img)

Returns the size of an image as [width, height].

hflip (img)

Horizontally flip the given image.

invert (img)

Invert the colors of an RGB/grayscale image.

normalize (tensor, mean, std[, inplace])

Normalize a float tensor image with mean and standard deviation.

pad (img, padding[, fill, padding_mode])

Pad the given image on all sides with the given "pad" value.

perspective (img, startpoints, endpoints[, …])

Perform perspective transform of the given image.

pil_to_tensor (pic)

Convert a PIL Image to a tensor of the same type.

posterize (img, bits)

Posterize an image by reducing the number of bits for each color channel.

resize (img, size[, interpolation, max_size, …])

Resize the input image to the given size.

resized_crop (img, top, left, height, width, size)

Crop the given image and resize it to desired size.

rgb_to_grayscale (img[, num_output_channels])

Convert RGB image to grayscale version of image.

rotate (img, angle[, interpolation, expand, …])

Rotate the image by angle.

solarize (img, threshold)

Solarize an RGB/grayscale image by inverting all pixel values above a threshold.

ten_crop (img, size[, vertical_flip])

Generate ten cropped images from the given image.

to_grayscale (img[, num_output_channels])

Convert PIL image of any mode (RGB, HSV, LAB, etc) to grayscale version of image.

to_pil_image (pic[, mode])

Convert a tensor or an ndarray to PIL Image.

to_tensor (pic)

Convert a PIL Image or numpy.ndarray to tensor.

vflip (img)

Vertically flip the given image.