Basic: Basic usage

This guide covers basic usage of bioimageloader. The examples below load a dataset from bioimageloader.collections and transform images with albumentations, an image augmentation library. They also show how to load data with multi-processing, which helps when the set of augmentations is computationally heavy.

Load a dataset from collections

Let’s load the bioimageloader.collections.DSB2018 (2018 Data Science Bowl) dataset as an example. Iterating over it yields a dictionary with strings as keys and NumPy arrays as values. You can control the output type through the output parameter.

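import numpy
from bioimageloader.collections import DSB2018

# './data/DSB2018' is the location of the dataset on disk
dsb2018 = DSB2018('./data/DSB2018')

# iterate; each item maps a name to an array
data: dict[str, numpy.ndarray]
for data in dsb2018:
    image = data['image']
    mask = data['mask']
    do_something(image, mask)  # placeholder for your own processing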
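To make the output format concrete without downloading any data, the sketch below mimics the iteration protocol with a hypothetical stand-in class, `ToyDataset`, which is not part of bioimageloader. It assumes only what is stated above: each item is a dictionary with `'image'` and `'mask'` keys holding NumPy arrays. The array shapes, dtypes, and the way the fake mask is built are illustrative assumptions.

```python
import numpy as np


class ToyDataset:
    """Hypothetical stand-in for a collection such as DSB2018."""

    def __init__(self, n_items: int = 3, size: int = 64, seed: int = 0):
        self.n_items = n_items
        self.size = size
        self.rng = np.random.default_rng(seed)

    def __len__(self) -> int:
        return self.n_items

    def __getitem__(self, idx: int) -> dict[str, np.ndarray]:
        if not 0 <= idx < self.n_items:
            raise IndexError(idx)  # lets a plain for-loop terminate
        # random RGB image (assumed layout: height, width, channels)
        image = self.rng.integers(
            0, 256, (self.size, self.size, 3), dtype=np.uint8
        )
        # fake binary mask derived from the first channel
        mask = (image[..., 0] > 127).astype(np.uint8)
        return {'image': image, 'mask': mask}


toy = ToyDataset()
for data in toy:
    image, mask = data['image'], data['mask']
    print(image.shape, mask.shape)
```

Any object that follows this pattern can be consumed by the same `for data in ...` loop, which is why downstream code only needs to know the dictionary keys.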

Data augmentation with albumentations

Use the transforms keyword argument. The example below defines a set of random crop, horizontal flip, and random brightness and contrast augmentations. See the list of transforms and their supported targets in the albumentations documentation.

Applying transformations often goes together with random shuffling, and you may want to draw more samples than the dataset contains. When you pass num_samples, the dataset shuffles automatically and sets the number of samples to num_samples.

import albumentations as A
import numpy
from bioimageloader.collections import DSB2018

transforms = A.Compose([
    A.RandomCrop(width=256, height=256),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])

num_samples = 2000  # DSB2018 training set has 670 images

dsb2018 = DSB2018('./data/DSB2018', transforms=transforms, num_samples=num_samples)

# iterate transformed images
data: dict[str, numpy.ndarray]
for data in dsb2018:
    image = data['image']
    mask = data['mask']
    # these assertions will not throw AssertionError
    assert image.shape[0] == 256 and image.shape[1] == 256
    assert mask.shape[0] == 256 and mask.shape[1] == 256
    do_something(image, mask)
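The effect of `num_samples` can be pictured as drawing shuffled indices until the requested count is reached. The sketch below is an assumption about how such sampling could work, not bioimageloader's actual implementation; `sample_indices` is a hypothetical helper introduced only for illustration.

```python
import random


def sample_indices(dataset_len: int, num_samples: int, seed=None) -> list[int]:
    """Return `num_samples` shuffled indices into `dataset_len` items.

    Indices are drawn in shuffled passes over the whole dataset, so every
    item is used once per pass before any item repeats.
    """
    if dataset_len <= 0:
        raise ValueError("dataset_len must be positive")
    rng = random.Random(seed)
    indices: list[int] = []
    while len(indices) < num_samples:
        one_pass = list(range(dataset_len))
        rng.shuffle(one_pass)
        indices.extend(one_pass)
    # cut the last pass short so exactly num_samples remain
    return indices[:num_samples]


# 670 training images, 2000 requested samples, as in the example above
indices = sample_indices(670, 2000, seed=0)
```

Under this scheme each image would be visited two or three times (2000 = 2 × 670 + 660). Because the augmentations are random, repeated visits to the same image would still tend to produce different crops.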

Batch loading with multi-processing

Batch loading is essential when your augmentations require heavy computation, or when you run deep neural network models that benefit from a GPU.

Wrap a dataset with bioimageloader.BatchDataloader and specify a batch size and the number of worker processes.

import albumentations as A
import numpy
from bioimageloader.collections import DSB2018
from bioimageloader import BatchDataloader

heavy_transforms = A.Compose([
    A.RandomCrop(width=256, height=256),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])

# construct dset with transforms
dsb2018 = DSB2018('./data/DSB2018', transforms=heavy_transforms)
batch_loader = BatchDataloader(dsb2018,
                               batch_size=16,
                               drop_last=True,
                               num_workers=8)

# iterate transformed images in batches (loop over the loader, not the dataset)
data: dict[str, numpy.ndarray]
for data in batch_loader:
    image = data['image']
    mask = data['mask']
    # these assertions will not throw AssertionError
    assert image.shape[0] == 16 and mask.shape[0] == 16
    assert image.shape[1] == 256 and image.shape[2] == 256
    assert mask.shape[1] == 256 and mask.shape[2] == 256
    do_something(image, mask)
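What a batch looks like can be sketched with plain NumPy. The hypothetical `iter_batches` helper below stacks per-sample dictionaries along a new leading axis and shows the effect of `drop_last`. It runs in a single process and is not how `BatchDataloader` is implemented; it only illustrates the batch shapes asserted above, using fake all-zero samples whose shapes are assumptions.

```python
import numpy as np


def iter_batches(samples, batch_size: int, drop_last: bool = False):
    """Yield dicts whose values are arrays stacked along a new axis 0."""
    batch = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield {key: np.stack([s[key] for s in batch]) for key in batch[0]}
            batch = []
    if batch and not drop_last:
        # the final, smaller batch is kept only when drop_last is False
        yield {key: np.stack([s[key] for s in batch]) for key in batch[0]}


# 40 fake samples: 256x256 RGB images with single-channel masks
samples = [
    {'image': np.zeros((256, 256, 3), dtype=np.uint8),
     'mask': np.zeros((256, 256), dtype=np.uint8)}
    for _ in range(40)
]
batches = list(iter_batches(samples, batch_size=16, drop_last=True))
```

With 40 samples and a batch size of 16, `drop_last=True` yields two full batches and discards the remaining 8 samples; with `drop_last=False` a third batch of 8 would follow. Dropping the last batch keeps every batch the same shape, which is why the first-axis assertion in the example above can hold for every iteration.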