More: Modifying existing collections

All collections are Python classes, so you can subclass any of them to modify default behavior or to implement new methods and attributes.
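As a minimal sketch of that subclass-and-override pattern, the toy classes below stand in for a collection. `ToyDataset`, `ToyDatasetFloat`, and the `scale` argument are hypothetical names made up for illustration; they are not part of bioimageloader.

```python
import numpy as np


class ToyDataset:
    """Hypothetical stand-in for a collection class (not part of bioimageloader)."""

    def __init__(self, size: int = 4):
        self.size = size

    def get_mask(self, index: int) -> np.ndarray:
        # Default behavior: return an integer mask filled with the index value
        return np.full((self.size, self.size), index, dtype=np.uint8)


class ToyDatasetFloat(ToyDataset):
    """Subclass that overrides one method and adds one attribute."""

    def __init__(self, size: int = 4, scale: float = 1.0):
        # Pass existing arguments to the super class
        super().__init__(size=size)
        # New attribute that only the subclass knows about
        self.scale = scale

    # override
    def get_mask(self, index: int) -> np.ndarray:
        mask = super().get_mask(index)
        # Modified behavior: cast to float32 and rescale
        return mask.astype(np.float32) * self.scale


toy = ToyDatasetFloat(size=2, scale=0.5)
mask = toy.get_mask(3)
print(mask.dtype, mask.shape)
```

Everything the subclass does not override, here `size` and the base `__init__` logic, is inherited unchanged.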

Suppose you would like to use the bioimageloader.collections.BBBC007 dataset to train a U-Net. BBBC007 does not provide mask annotation, which U-Net typically requires, but it comes with outline annotation that can easily be converted into mask annotation.
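To illustrate the conversion on its own, here is a small sketch that fills a synthetic outline with scipy.ndimage.binary_fill_holes(). The 7 x 7 array is made-up toy data, not an actual BBBC007 annotation, and the 0/255 encoding is an assumption.

```python
import numpy as np
import scipy.ndimage as ndi

# Toy outline annotation: a one-pixel-wide square ring on a 7 x 7 canvas
outline = np.zeros((7, 7), dtype=np.uint8)
outline[1:6, 1:6] = 255  # solid 5 x 5 square
outline[2:5, 2:5] = 0    # carve out the interior, leaving only the outline

# Nonzero pixels are treated as foreground; enclosed background gets filled
filled = ndi.binary_fill_holes(outline)

# The result is a bool array; cast it to float32 for downstream use
mask = filled.astype(np.float32)
print(int(outline.astype(bool).sum()), int(mask.sum()))
```

The outline covers 16 pixels and the filled mask covers the full 25-pixel square.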

All you have to do is copy the class definition from the BBBC007 source and override the get_mask() method. Go to the source link and compare it with the code below. In summary:

  1. Copy the class source

  2. Change the class name to something else

  3. Override the get_mask() method

  4. (optional) Add new argument(s) and document them

The code below uses scipy.ndimage.binary_fill_holes() from the SciPy package.

from pathlib import Path
from typing import List, Optional, Sequence, Union

import albumentations
import numpy as np
import tifffile

from bioimageloader.collections import BBBC007
from bioimageloader.utils import stack_channels

# ndi.binary_fill_holes() will fill outline annotation
import scipy.ndimage as ndi


class BBBC007variant(BBBC007):
    """Drosophila Kc167 cells (VARIANT)

    NU-Net needs outlines to be filled and only needs DNA annotation.

    Outline annotation

    Images were acquired using a motorized Zeiss Axioplan 2 and an Axiocam MRm
    camera, and are provided courtesy of the laboratory of David Sabatini at the
    Whitehead Institute for Biomedical Research. Each image is roughly 512 x 512
    pixels, with cells roughly 25 pixels in diameter, and 80 cells per image on
    average. The two channels (DNA and actin) of each image are stored in
    separate gray-scale 8-bit TIFF files.

    Notes
    -----
    - [4, 5, 11, 14, 15] have 3 channels but they are all just gray-scale
        images. Extra work is required in get_image().

    .. [1] Jones et al., in the Proceedings of the ICCV Workshop on Computer
       Vision for Biomedical Image Applications (CVBIA), 2005.
    .. [2] [BBBC007](https://bbbc.broadinstitute.org/BBBC007)
    """
    # Dataset's acronym
    acronym = 'BBBC007'

    def __init__(
        self,
        root_dir: str,
        *,
        output: str = 'both',
        transforms: Optional[albumentations.Compose] = None,
        num_samples: Optional[int] = None,
        # specific to this dataset
        image_ch: Sequence[str] = ('DNA', 'actin',),
        anno_ch: Sequence[str] = ('DNA',),
        # arguments for VARIANT
        fill_holes: bool = True,
        **kwargs
    ):
        """
        Parameters
        ----------
        root_dir : str
            Path to root directory
        output : {'image', 'mask', 'both'} (default: 'both')
            Change outputs. 'both' returns {'image': image, 'mask': mask}.
        transforms : albumentations.Compose, optional
            An instance of Compose (albumentations pkg) that defines
            augmentation in sequence.
        num_samples : int, optional
            Useful when ``transforms`` is set. Define the total length of the
            dataset. If it is set, it overwrites ``__len__``.
        image_ch : {'DNA', 'actin'} (default: ('DNA', 'actin'))
            Which channel(s) to load as image. Make sure to give it as a
            Sequence when choosing a single channel.
        anno_ch : {'DNA', 'actin'} (default: ('DNA',))
            Which channel(s) to load as annotation. Make sure to give it as a
            Sequence when choosing a single channel.
        fill_holes : bool (default: True)
            Fill outline annotation using `scipy.ndimage.binary_fill_holes()`

        See Also
        --------
        BBBC007 : Super class
        MaskDataset : Super class
        DatasetInterface : Interface
        """
        # Pass existing arguments to its super class
        super().__init__(
            root_dir=root_dir,
            output=output,
            transforms=transforms,
            num_samples=num_samples,
            image_ch=image_ch,
            anno_ch=anno_ch,
            **kwargs
        )
        # arguments for VARIANT
        self.fill_holes = fill_holes

    # override
    def get_mask(self, p: Union[Path, List[Path]]) -> np.ndarray:
        # A single path is one annotation file; a list is one file per channel
        if isinstance(p, Path):
            mask = tifffile.imread(p)
        else:
            mask = stack_channels(tifffile.imread, p)
        # VARIANT behavior
        if self.fill_holes:
            mask = ndi.binary_fill_holes(mask)
        # binary_fill_holes() returns a bool array, which albumentations does
        # not handle well, so cast to float32
        return mask.astype(np.float32)
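One caveat worth checking when more than one annotation channel is loaded: binary_fill_holes() treats its input as a single N-dimensional object, so a stacked (H, W, C) mask is filled as a 3D volume rather than channel by channel. The sketch below assumes channels are stacked along the last axis, which this page does not confirm for stack_channels(), and uses a hypothetical helper, `fill_holes_per_channel`, that is not part of bioimageloader.

```python
import numpy as np
import scipy.ndimage as ndi


def fill_holes_per_channel(mask: np.ndarray) -> np.ndarray:
    """Fill outlines in a (H, W) mask, or in each channel of a (H, W, C) mask.

    Hypothetical helper for illustration; assumes channels are on the last axis.
    """
    if mask.ndim == 2:
        return ndi.binary_fill_holes(mask)
    channels = [
        ndi.binary_fill_holes(mask[..., c]) for c in range(mask.shape[-1])
    ]
    return np.stack(channels, axis=-1)


# Toy data: the same one-pixel-wide square ring in two channels
ring = np.zeros((7, 7), dtype=np.uint8)
ring[1:6, 1:6] = 255
ring[2:5, 2:5] = 0
stacked = np.stack([ring, ring], axis=-1)  # shape (7, 7, 2)

# Filling the stack as one 3D volume vs. filling each 2D channel separately
naive = ndi.binary_fill_holes(stacked)
per_channel = fill_holes_per_channel(stacked)
print(int(naive.sum()), int(per_channel.sum()))
```

In this toy case the 3D call leaves the rings unfilled, because every interior pixel touches the array border along the channel axis, while the per-channel version fills both. With the default anno_ch=('DNA',) this only matters if the annotation arrives as a stacked array rather than a single 2D image.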