Basic: Load multiple datasets
Load multiple datasets either one by one, or use bioimageloader.Config to read them all at once from a .yml file. If you are not familiar with the YAML format, check out https://quickref.me/yaml or the official YAML website for more details.
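For reference, loading datasets one by one simply means constructing each collection class with its root directory. A minimal sketch (the root directory paths follow the examples later on this page):

from bioimageloader.collections import DSB2018, TNBC

# load datasets one by one by pointing each to its root directory
dsb2018 = DSB2018('./data/DSB2018', output='image')
tnbc = TNBC('./data/TNBC_NucleiSegmentation', output='image')
datasets = [dsb2018, tnbc]
print([dset.acronym for dset in datasets])
# ['DSB2018', 'TNBC']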
Concatenate multiple datasets
Grouping multiple datasets allows you to load them as if they were a single dataset. You can define transforms for each dataset and group the datasets together. Concatenation is done by wrapping them in a list with bioimageloader.ConcatDataset. Note that every dataset concatenated within the same group should have the same output keys. For example, you cannot mix a dataset that only returns ‘image’ with another one that returns both ‘image’ and ‘mask’. Be aware that some datasets do not offer annotations, meaning that they do not have an output parameter.
import albumentations as A
from bioimageloader.collections import DSB2018, TNBC, BBBC016
from bioimageloader import ConcatDataset, BatchDataloader

# transforms for each dataset
transforms_dsb2018 = A.Compose([
    A.RandomCrop(width=256, height=256),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])
transforms_tnbc = A.Compose([
    A.RandomCrop(width=256, height=256),
    A.VerticalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.4),
])
transforms_bbbc016 = A.Compose([
    A.RandomSizedCrop(min_max_height=[300, 500], width=256, height=256),
    A.Blur(p=0.5),
    A.RandomBrightnessContrast(p=0.6),
])
# construct datasets
dsb2018 = DSB2018('./data/DSB2018', output='image', transforms=transforms_dsb2018)
tnbc = TNBC('./data/TNBC_NucleiSegmentation', output='image', transforms=transforms_tnbc)
bbbc016 = BBBC016('./data/BBBC016_v1', transforms=transforms_bbbc016)  # does not have annotations
# concatenate
cat = ConcatDataset([dsb2018, tnbc, bbbc016])
# load them in batches
call_cat = BatchDataloader(cat,
                           batch_size=16,
                           drop_last=True,
                           num_workers=8)
# iterate transformed images
for meow in call_cat:
    image = meow['image']
    # these assertions will not throw AssertionError
    assert image.shape[0] == 16
    assert image.shape[1] == 256 and image.shape[2] == 256
    do_something(image)
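Datasets that return masks can be grouped in exactly the same way, as long as every member of the group shares the same output keys. A minimal sketch, assuming that DSB2018 and TNBC return both ‘image’ and ‘mask’ when output is left at its default:

# DSB2018 and TNBC both ship annotations; leaving `output` at its default is
# assumed to yield both 'image' and 'mask' keys. BBBC016 is left out because
# it has no annotations and therefore no 'mask' key.
dsb2018_full = DSB2018('./data/DSB2018', transforms=transforms_dsb2018)
tnbc_full = TNBC('./data/TNBC_NucleiSegmentation', transforms=transforms_tnbc)
cat_full = ConcatDataset([dsb2018_full, tnbc_full])
call_cat_full = BatchDataloader(cat_full, batch_size=16, drop_last=True, num_workers=8)
for batch in call_cat_full:
    do_something(batch['image'], batch['mask'])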
Load datasets using config.yml
You can make a yaml file to manage parameters for multiple datasets in one place.
./config/config.yml
DSB2018:
  root_dir: ./data/DSB2018/
  output: image
TNBC:
  root_dir: ./data/TNBC_NucleiSegmentation/
  output: image
BBBC016:
  root_dir: ./data/BBBC016_v1/
./main.py
import albumentations as A
from bioimageloader import Config

# parse config
config = Config('./config/config.yml')

# 1. load datasets without transforms
datasets = config.load_datasets()  # a list of dataset instances
print([dset.acronym for dset in datasets])
# ['DSB2018', 'TNBC', 'BBBC016']

# 2. load datasets with the same transforms for all datasets
transforms = A.Compose([
    A.RandomCrop(width=256, height=256),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])
datasets_transformed = config.load_datasets(transforms)

# 3. load datasets with a set of transforms for each dataset
transforms_dsb2018 = A.Compose([
    A.RandomCrop(width=256, height=256),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])
transforms_tnbc = A.Compose([
    A.RandomCrop(width=256, height=256),
    A.VerticalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.4),
])
transforms_bbbc016 = A.Compose([
    A.RandomSizedCrop(min_max_height=[300, 500], width=256, height=256),
    A.Blur(p=0.5),
    A.RandomBrightnessContrast(p=0.6),
])
# organize all in a dictionary keyed by dataset acronym
transforms_indv: dict = {
    'DSB2018': transforms_dsb2018,
    'TNBC': transforms_tnbc,
    'BBBC016': transforms_bbbc016,
}
datasets_transformed_indv = config.load_datasets(transforms_indv)
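The list returned by config.load_datasets() plugs directly into the concatenation workflow from the previous section. A short sketch reusing the per-dataset transforms defined above:

from bioimageloader import ConcatDataset, BatchDataloader

# group the config-loaded datasets and iterate them in batches
cat = ConcatDataset(datasets_transformed_indv)
call_cat = BatchDataloader(cat, batch_size=16, drop_last=True, num_workers=8)
for batch in call_cat:
    do_something(batch['image'])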