Basic: Load multiple datasets#

Load multiple datasets either one by one, or use bioimageloader.Config to read them all at once from a .yml file. If you are not familiar with the YAML format, check out https://quickref.me/yaml or visit the official YAML website for more detail.

Concatenate multiple datasets#

Grouping multiple datasets allows you to load them as if they were a single dataset. You can define transforms for each dataset and group them together. Concatenation is done by wrapping the datasets in a list with bioimageloader.ConcatDataset.

Note that every dataset concatenated within the same group must have the same output keys. For example, you cannot mix a dataset that only returns ‘image’ with another one that returns both ‘image’ and ‘mask’. Be aware that some datasets do not offer annotation, meaning that those do not have an output parameter.
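As an illustration, the compatibility requirement boils down to every dataset yielding the same set of output keys. Below is a minimal sketch of such a check on plain dicts (the sample outputs and the helper name are hypothetical, not part of the library):

```python
def same_output_keys(samples: list) -> bool:
    """Return True when every sample dict exposes the same keys."""
    key_sets = [set(s) for s in samples]
    return all(k == key_sets[0] for k in key_sets)

# an 'image'+'mask' dataset cannot be grouped with an 'image'-only one
assert not same_output_keys([{'image': 0, 'mask': 1}, {'image': 0}])
# two 'image'-only datasets are fine
assert same_output_keys([{'image': 0}, {'image': 1}])
```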

import albumentations as A
from bioimageloader.collections import DSB2018, TNBC, BBBC016
from bioimageloader import ConcatDataset, BatchDataloader

# transforms for each dset
transforms_dsb2018 = A.Compose([
    A.RandomCrop(width=256, height=256),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])
transforms_tnbc = A.Compose([
    A.RandomCrop(width=256, height=256),
    A.VerticalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.4),
])
transforms_bbbc016 = A.Compose([
    A.RandomSizedCrop(min_max_height=[300, 500], width=256, height=256),
    A.Blur(p=0.5),
    A.RandomBrightnessContrast(p=0.6),
])
# construct dsets
dsb2018 = DSB2018('./data/DSB2018', output='image', transforms=transforms_dsb2018)
tnbc = TNBC('./data/TNBC_NucleiSegmentation', output='image', transforms=transforms_tnbc)
bbbc016 = BBBC016('./data/BBBC016_v1', transforms=transforms_bbbc016)  # does not have annotation
# concatenate
cat = ConcatDataset([dsb2018, tnbc, bbbc016])
# load them in batches
call_cat = BatchDataloader(cat,
                           batch_size=16,
                           drop_last=True,
                           num_workers=8)
# iterate transformed images
for meow in call_cat:
    image = meow['image']
    # these assertions will not raise AssertionError
    assert image.shape[0] == 16
    assert image.shape[1] == 256 and image.shape[2] == 256
    do_something(image)
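Conceptually, concatenation keeps a running total of member dataset lengths and routes a global index to the right dataset. The class below is a minimal sketch of that idea, not the library's actual implementation (MiniConcat is a made-up name; plain lists stand in for datasets):

```python
from bisect import bisect_right
from itertools import accumulate

class MiniConcat:
    """Minimal sketch of dataset concatenation via cumulative lengths."""

    def __init__(self, datasets):
        self.datasets = datasets
        # cumulative sizes, e.g. lengths [2, 3] -> [2, 5]
        self.cum = list(accumulate(len(d) for d in datasets))

    def __len__(self):
        return self.cum[-1]

    def __getitem__(self, i):
        # find which member dataset the global index i falls into
        which = bisect_right(self.cum, i)
        offset = self.cum[which - 1] if which else 0
        return self.datasets[which][i - offset]

cat = MiniConcat([[10, 11], [20, 21, 22]])
assert len(cat) == 5
assert cat[1] == 11   # index 1 stays in the first dataset
assert cat[3] == 21   # index 3 maps to index 1 of the second dataset
```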

Load datasets using config.yml#

You can make a yaml file to manage parameters for multiple datasets in one place.

./config/config.yml

DSB2018:
  root_dir: ./data/DSB2018/
  output: image
TNBC:
  root_dir: ./data/TNBC_NucleiSegmentation/
  output: image
BBBC016:
  root_dir: ./data/BBBC016_v1/
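For reference, the YAML above parses to a nested dictionary mapping each dataset acronym to its constructor keyword arguments; presumably Config forwards these as kwargs to each dataset class. The dict below is a sketch of the parsed structure, not the library's internals:

```python
# what the config.yml above parses to: {acronym: constructor kwargs}
config_dict = {
    'DSB2018': {'root_dir': './data/DSB2018/', 'output': 'image'},
    'TNBC': {'root_dir': './data/TNBC_NucleiSegmentation/', 'output': 'image'},
    'BBBC016': {'root_dir': './data/BBBC016_v1/'},  # no annotation, so no 'output'
}

for acronym, kwargs in config_dict.items():
    print(acronym, kwargs)
```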

./main.py

import albumentations as A
from bioimageloader import Config

# parse config
config = Config('./config/config.yml')

# 1. load datasets without transforms
datasets = config.load_datasets()
print([dset.acronym for dset in datasets])
# ['DSB2018', 'TNBC', 'BBBC016']

# 2. load datasets with the same transforms for all datasets
transforms = A.Compose([
    A.RandomCrop(width=256, height=256),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])
datasets_transformed = config.load_datasets(transforms)

# 3. load datasets with a set of transforms for each dataset
transforms_dsb2018 = A.Compose([
    A.RandomCrop(width=256, height=256),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])
transforms_tnbc = A.Compose([
    A.RandomCrop(width=256, height=256),
    A.VerticalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.4),
])
transforms_bbbc016 = A.Compose([
    A.RandomSizedCrop(min_max_height=[300, 500], width=256, height=256),
    A.Blur(p=0.5),
    A.RandomBrightnessContrast(p=0.6),
])
# organize all in a dictionary
transforms_indv: dict = {
    'DSB2018': transforms_dsb2018,
    'TNBC': transforms_tnbc,
    'BBBC016': transforms_bbbc016,
}
datasets_transformed_indv = config.load_datasets(transforms_indv)
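The three calling patterns above suggest that load_datasets resolves the transforms for each dataset roughly like the helper below. This is only a hedged sketch of that dispatch logic (pick_transforms is a hypothetical name, not part of the library's API):

```python
def pick_transforms(acronym, transforms):
    """Resolve transforms for one dataset: None, one shared object,
    or a dict keyed by dataset acronym."""
    if transforms is None:
        return None                       # pattern 1: no transforms
    if isinstance(transforms, dict):
        return transforms.get(acronym)    # pattern 3: per-dataset transforms
    return transforms                     # pattern 2: one shared transforms

# shared transforms go to every dataset
assert pick_transforms('TNBC', 'shared_crop') == 'shared_crop'
# a dict selects by acronym
assert pick_transforms('TNBC', {'TNBC': 'flip', 'DSB2018': 'crop'}) == 'flip'
# no transforms at all
assert pick_transforms('BBBC016', None) is None
```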