bioimageloader.collections#
Collection of public bioimage datasets
- class bioimageloader.collections.BBBC002(root_dir: str, *, transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, **kwargs)[source]#
Drosophila Kc167 cells
There are 10 fields of view of each sample, for a total of 50 fields of view. The images were acquired on a Zeiss Axiovert 200M microscope. The images provided here are a single channel, DNA. The image size is 512 x 512 pixels. The images are provided as 8-bit TIFF files.
- Parameters
- root_dir : str
Path to root directory
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
See also
Dataset
Base class
DatasetInterface
Interface
Notes
Cell count available
- ImageJ RoI available for 3 tiles
CPvalid1_48_40x_Tiles_p0151DAPI_ROIs.zip
CPvalid1_340_40x_Tiles_p1175DAPI_ROIs.zip
CPvalid1_nodsRNA_40x_Tiles_p0219DAPI_ROIs.zip
References
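A minimal usage sketch, assuming the archive has been extracted under a placeholder path and that each item is a dict exposing at least ‘image’:
>>> import albumentations as A
>>> from bioimageloader.collections import BBBC002
>>> transforms = A.Compose([
...     A.RandomCrop(256, 256),
...     A.HorizontalFlip(p=0.5),
... ])
>>> dset = BBBC002('./Data/bbbc/002', transforms=transforms, num_samples=200)
>>> image = dset[0]['image']   # augmented 256x256 crop
>>> len(dset)                  # overridden by num_samples
200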
- property file_list: List[pathlib.Path]#
A list of paths to image files
- class bioimageloader.collections.BBBC004(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, **kwargs)[source]#
Synthetic cells
Biological application
One of the principal challenges in counting or segmenting nuclei is dealing with clustered nuclei. To help assess algorithms’ performance in this regard, this synthetic image set consists of five subsets with increasing degree of clustering.
Images
Five subsets of 20 images each are provided. Each image contains 300 objects, but the objects overlap and cluster with different probabilities in the five subsets. The images were generated with the SIMCEP simulating platform for fluorescent cell population images (Lehmussola et al., IEEE T. Med. Imaging, 2007 and Lehmussola et al., P. IEEE, 2008).
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
References
- 1
https://bbbc.broadinstitute.org/BBBC004
See also
MaskDataset
Super class
DatasetInterface
Interface
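A hedged sketch of how the output argument changes what each item contains; the path is a placeholder:
>>> from bioimageloader.collections import BBBC004
>>> both = BBBC004('./Data/bbbc/004', output='both')
>>> sorted(both[0].keys())
['image', 'mask']
>>> masks_only = BBBC004('./Data/bbbc/004', output='mask')
>>> list(masks_only[0].keys())   # expected to contain only 'mask'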
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.BBBC006(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'equal', image_ch: Sequence[str] = ('hoechst', 'phalloidin'), z_ind: int = 16, **kwargs)[source]#
Human U2OS cells (out of focus)
Images were acquired from one 384-well microplate containing U2OS cells stained with Hoechst 33342 markers (to label nuclei). Images were acquired with exposures of 15 and 1000 ms for Hoechst and phalloidin respectively, at 20x magnification, 2x binning, and 2 sites per well. For each site, the optimal focus was found using laser auto-focusing to find the well bottom. The automated microscope was then programmed to collect a z-stack of 32 image sets (z = 16 at the optimal focal plane, 15 images above the focal plane, 16 below) with 2 μm between slices. Each image is 696 x 520 pixels in 16-bit TIF format, LZW compression. Each image filename includes either ‘w1’ to denote Hoechst images or ‘w2’ to denote phalloidin images.
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘equal’, ‘cv2’, Sequence[float]}, default: ‘equal’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- image_ch : {‘hoechst’, ‘phalloidin’}, default: (‘hoechst’, ‘phalloidin’)
Which channel(s) to load as image. Make sure to give it as a Sequence when choosing a single channel.
- z_ind : int, default: 16
Select one z stack. Default is 16, because 16 is the most in-focus.
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
z-stack, z=16 is in-focus ones, sites (s1, s2)
Instance segmented
384 wells, 2 sites per well; 384 * 2 = 768 images
2 channels, w1=Hoechst, w2=phalloidin
The two channels usually overlap, and when they do it is hard to distinguish them anymore.
Saved in UINT16, but UINT12 practically. Max value caps at 4095.
References
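A sketch of selecting a single channel and another focal plane, assuming a placeholder path; note the advice above to pass image_ch as a sequence even for one channel:
>>> from bioimageloader.collections import BBBC006
>>> dset = BBBC006(
...     './Data/bbbc/006',
...     image_ch=('hoechst',),   # a sequence, even for a single channel
...     z_ind=14,                # a plane away from the most in-focus one (16)
... )
>>> image = dset[0]['image']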
- property file_list: Union[List[pathlib.Path], List[List[pathlib.Path]]]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.BBBC007(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'equal', image_ch: Sequence[str] = ('DNA', 'actin'), anno_ch: Sequence[str] = ('DNA',), **kwargs)[source]#
Drosophila Kc167 cells
Outline annotation
Images were acquired using a motorized Zeiss Axioplan 2 and an Axiocam MRm camera, and are provided courtesy of the laboratory of David Sabatini at the Whitehead Institute for Biomedical Research. Each image is roughly 512 x 512 pixels, with cells roughly 25 pixels in diameter, and 80 cells per image on average. The two channels (DNA and actin) of each image are stored in separate gray-scale 8-bit TIFF files.
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘equal’, ‘cv2’, Sequence[float]}, default: ‘equal’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- image_ch : {‘DNA’, ‘actin’}, default: (‘DNA’, ‘actin’)
Which channel(s) to load as image. Make sure to give it as a Sequence when choosing a single channel. Names match anno_ch.
- anno_ch : {‘DNA’, ‘actin’}, default: (‘DNA’,)
Which channel(s) to load as annotation. Make sure to give it as a Sequence when choosing a single channel.
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
Images [4, 5, 11, 14, 15] have 3 channels, but they are all just grayscale; extra work is required in get_image().
References
- 1
Jones et al., in the Proceedings of the ICCV Workshop on Computer Vision for Biomedical Image Applications (CVBIA), 2005.
- 2
- property file_list: Union[List[pathlib.Path], List[List[pathlib.Path]]]#
A list of paths to image files
- property anno_dict: Union[Dict[int, pathlib.Path], Dict[int, List[pathlib.Path]]]#
Dictionary of paths to annotation files
- class bioimageloader.collections.BBBC008(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'equal', image_ch: Sequence[str] = ('DNA', 'actin'), anno_ch: Sequence[str] = ('DNA',), **kwargs)[source]#
Human HT29 colon-cancer cells [1]
F/B semantic segmentation
The image set consists of 12 images. The samples were stained with Hoechst (channel 1), pH3 (channel 2), and phalloidin (channel 3). Hoechst labels DNA, which is present in the nucleus. Phalloidin labels actin, which is present in the cytoplasm. The last stain, pH3, indicates cells in mitosis; whereas this was important for Moffat et al.’s screen, it is irrelevant for segmentation and counting, so this channel is left out.
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘cv2’, ‘equal’, Sequence[float]}, default: ‘equal’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- image_ch : {‘DNA’, ‘actin’}, default: (‘DNA’, ‘actin’)
Which channel(s) to load as image. Make sure to give it as a Sequence when choosing a single channel.
- anno_ch : {‘DNA’, ‘actin’}, default: (‘DNA’,)
Which channel(s) to load as annotation. Make sure to give it as a Sequence when choosing a single channel.
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
Annotation F/B: BG=1, FG=0; very annoying…
References
- 1
- 2
Carpenter et al., Genome Biology, 2006
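Given the inverted F/B convention noted above (BG=1, FG=0), a hedged post-processing sketch that flips it, assuming the loaded mask keeps that convention and comes back as a numpy array:
>>> import numpy as np
>>> from bioimageloader.collections import BBBC008
>>> dset = BBBC008('./Data/bbbc/008')          # placeholder path
>>> mask = dset[0]['mask']
>>> mask_fg1 = (mask == 0).astype(np.uint8)    # foreground becomes 1, background 0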
- property file_list: Union[List[pathlib.Path], List[List[pathlib.Path]]]#
A list of paths to image files
- property anno_dict: Union[Dict[int, pathlib.Path], Dict[int, List[pathlib.Path]]]#
Dictionary of paths to annotation files
- class bioimageloader.collections.BBBC009(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, **kwargs)[source]#
Human red blood cells
This image set consists of five differential interference contrast (DIC) images of red blood cells.
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
See also
MaskDataset
Super class
DatasetInterface
Interface
References
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.BBBC013(root_dir: str, *, transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'equal', image_ch: Sequence[str] = ('GFP', 'DNA'), **kwargs)[source]#
Human U2OS cells cytoplasm–nucleus translocation
The images were acquired at BioImage on the IN Cell Analyzer 3000 using the Trafficking Data Analysis Module, with one image per channel (Channel 1 = FKHR-GFP; Channel 2 = DNA). Image size is 640 x 640 pixels. Images are available in native FRM format or 8-bit BMP.
- Parameters
- root_dir : str
Path to root directory
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘equal’, ‘cv2’, Sequence[float]}, default: ‘equal’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- image_ch : {‘GFP’, ‘DNA’}, default: (‘GFP’, ‘DNA’)
Which channel(s) to load as image. Make sure to give it as a Sequence when choosing a single channel.
See also
Dataset
Base class
DatasetInterface
Interface
Notes
Two formats are available: FRM and BMP
References
- property file_list: Union[List[pathlib.Path], List[List[pathlib.Path]]]#
A list of paths to image files
- class bioimageloader.collections.BBBC014(root_dir: str, *, transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'equal', image_ch: Sequence[str] = ('DAPI', 'FITC'), **kwargs)[source]#
Human U2OS cells cytoplasm–nucleus translocation
This 96-well plate has images of cytoplasm to nucleus translocation of the transcription factor NFκB in MCF7 (human breast adenocarcinoma cell line) and A549 (human alveolar basal epithelial) cells in response to TNFα concentration.
Images are at 10x objective magnification. The plate was acquired at Vitra Bioscience on the CellCard reader. For each well there is one field with two images: a nuclear counterstain (DAPI) image and a signal stain (FITC) image. Image size is 1360 x 1024 pixels. Images are in 8-bit BMP format.
- Parameters
- root_dir : str
Path to root directory
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘equal’, ‘cv2’, Sequence[float]}, default: ‘equal’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- image_ch : {‘DAPI’, ‘FITC’}, default: (‘DAPI’, ‘FITC’)
Which channel(s) to load as image. Make sure to give it as a Sequence when choosing a single channel.
See also
Dataset
Base class
DatasetInterface
Interface
Notes
Second channel is usually very clear with a few artifacts
Biological annotation
CellProfiler’s LoadText module format annotation also available (not implemented)
Zoom in?
References
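grayscale_mode also accepts a sequence of per-channel weights. A hedged sketch that weights DAPI more heavily than FITC, assuming the weights follow the image_ch order; the weight values and path are placeholders:
>>> from bioimageloader.collections import BBBC014
>>> dset = BBBC014(
...     './Data/bbbc/014',
...     grayscale=True,
...     grayscale_mode=(0.7, 0.3),   # hypothetical weights for ('DAPI', 'FITC')
... )
>>> image = dset[0]['image']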
- property file_list: Union[List[pathlib.Path], List[List[pathlib.Path]]]#
A list of paths to image files
- class bioimageloader.collections.BBBC015(root_dir: str, *, transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'equal', image_ch: Sequence[str] = ('b2AR', 'arrestin'), **kwargs)[source]#
Human U2OS cells transfluor
The images are of a human osteosarcoma cell line (U2OS) co-expressing beta2 adrenergic receptor (b2AR) and arrestin-GFP protein molecules. The receptor was a modified type that generates “vesicle-type” spots upon ligand stimulation.
The plate was acquired on iCyte imaging cytometer with iCyte software version 2.5.1. Image file format is JPEG with one image for green channel and one image for crimson channel. Image size is 1000 x 768 pixels.
This image set has a portion of a 96-well plate containing 3 replica rows and 12 concentration points of isoproterenol. In each well four fields were acquired. File name structure: <well-number>_<field>_<channel>.JPG
- Parameters
- root_dir : str
Path to root directory
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘cv2’, ‘equal’, Sequence[float]}, default: ‘equal’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- image_ch : {‘b2AR’, ‘arrestin’}, default: (‘b2AR’, ‘arrestin’)
Which channel(s) to load as image. Make sure to give it as a Sequence when choosing a single channel.
See also
Dataset
Base class
DatasetInterface
Interface
Notes
2 channels (Green, Crimson?), texture in green channel
Crimson channel…?
RGB channel is all the same in each image file
References
- property file_list: Union[List[pathlib.Path], List[List[pathlib.Path]]]#
A list of paths to image files
- class bioimageloader.collections.BBBC016(root_dir: str, *, transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'equal', image_ch: Sequence[str] = ('GFP', 'DNA'), **kwargs)[source]#
Human U2OS cells transfluor
This image set is of a Transfluor assay where an orphan GPCR is stably integrated into the b-arrestin GFP expressing U2OS cell line. After one hour of incubation with a compound, the cells were fixed with formaldehyde.
The plate was read on Cellomics ArrayScan HCS Reader using the GPCR Bioapplication. File format is 8-bit TIFF with one image for green channel (GFP) and one image for blue channel (DNA). Image size is 512 x 512 pixels.
- Parameters
- root_dir : str
Path to root directory
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘equal’, ‘cv2’, Sequence[float]}, default: ‘equal’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- image_ch : {‘GFP’, ‘DNA’}, default: (‘GFP’, ‘DNA’)
Which channel(s) to load as image. Make sure to give it as a Sequence when choosing a single channel.
See also
Dataset
Base class
DatasetInterface
Interface
Notes
2 channels (G,B), nuclei are Blue
References
- property file_list: Union[List[pathlib.Path], List[List[pathlib.Path]]]#
A list of paths to image files
- class bioimageloader.collections.BBBC018(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'equal', image_ch: Sequence[str] = ('DNA', 'actin', 'pH3'), anno_ch: Sequence[str] = ('DNA',), drop_missing_pairs: bool = True, **kwargs)[source]#
Human HT29 colon-cancer cells (diverse phenotypes)
The image set consists of 56 fields of view (4 from each of 14 samples). Because there are three channels, there are 168 image files. (The samples were stained with Hoechst 33342, pH3, and phalloidin. Hoechst 33342 is a DNA stain that labels the nucleus. Phospho-histone H3 indicates mitosis. Phalloidin labels actin, which is present in the cytoplasm.) The samples are the top-scoring sample from each of Jones et al.’s classifiers, as listed in the file SamplesScores.zip in their supplement. The files are in DIB format, as produced by the Cellomics ArrayScan instrument at the Whitehead–MIT Bioimaging Center. We recommend using Bio-Formats to read the DIB files. Each image is 512 x 512 pixels.
The filenames are of the form wellidx-channel.DIB, where wellidx is the five-digit well index (from Jones et al.’s supplement) and channel is either DNA, actin, or pH3, depending on the channel.
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘equal’, ‘cv2’, Sequence[float]}, default: ‘equal’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- image_ch : {‘DNA’, ‘actin’, ‘pH3’}, default: (‘DNA’, ‘actin’, ‘pH3’)
Which channel(s) to load as image. Make sure to give it as a Sequence when choosing a single channel.
- anno_ch : {‘DNA’, ‘actin’}, default: (‘DNA’,)
Which channel(s) to load as annotation. Make sure to give it as a Sequence when choosing a single channel.
- drop_missing_pairs : bool, default: True
Valid only if output=’both’. It will drop images that do not have mask pairs.
Warning
BBBC018_v1_images/10779 annotation is missing. len(anno_dict) = len(file_list) - 1; ind={26}
PosixPath(‘BBBC018_v1_images/10779-DNA.DIB’)
PosixPath(‘BBBC018_v1_images/10779-actin.DIB’)
PosixPath(‘BBBC018_v1_images/10779-pH3.DIB’)
This one was not saved properly after annotation; it has the annotation overlaid on top of the image. You need to filter mask==255.
‘BBBC018_v1_outlines/17675-nuclei.png’
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
Every DIB has 3 channels (order = (DNA, actin, pH3)). The second one is the object.
DNA -> Nuclei
Actin -> Cell
Annotations are outlines, but every outline is closed, so binary_fill_holes works fine
For some reason the annotation is y-inverted
References
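A hedged sketch of the drop_missing_pairs behavior described in the warning, with a placeholder path; after dropping, file_list and anno_dict should line up:
>>> from bioimageloader.collections import BBBC018
>>> dset = BBBC018('./Data/bbbc/018')            # output='both', drop_missing_pairs=True
>>> len(dset.file_list), len(dset.anno_dict)     # expected to match after dropping
>>> imgs = BBBC018('./Data/bbbc/018', output='image')   # iterate all fields, no masks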
- property file_list: Union[List[pathlib.Path], List[List[pathlib.Path]]]#
A list of paths to image files
- property anno_dict: Union[Dict[int, pathlib.Path], Dict[int, List[pathlib.Path]]]#
Dictionary of paths to annotation files
- class bioimageloader.collections.BBBC020(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'equal', image_ch: Sequence[str] = ('nuclei', 'cells'), anno_ch: Sequence[str] = ('nuclei',), drop_missing_pairs: bool = True, **kwargs)[source]#
Murine bone-marrow derived macrophages
The image set consists of 25 images, each consisting of three channels. The samples were stained with DAPI and CD11b/APC. In addition to this, a merged image is provided. DAPI labels the nuclei and CD11b/APC the cell surface.
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘equal’, ‘cv2’, Sequence[float]}, default: ‘equal’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- image_ch : {‘nuclei’, ‘cells’}, default: (‘nuclei’, ‘cells’)
Which channel(s) to load as image. Make sure to give it as a Sequence when choosing a single channel.
- anno_ch : {‘nuclei’, ‘cells’}, default: (‘nuclei’,)
Which channel(s) to load as annotation. Make sure to give it as a Sequence when choosing a single channel.
- drop_missing_pairs : bool, default: True
Valid only if output=’both’. It will drop images that do not have mask pairs.
Warning
5 annotations are missing: ind={17,18,19,20,21} [jw-30min 1, jw-30min 2, jw-30min 3, jw-30min 4, jw-30min 5]
./BBBC020_v1_images/jw-30min 1/jw-30min 1_(c1+c5).TIF
./BBBC020_v1_images/jw-30min 2/jw-30min 2_(c1+c5).TIF
./BBBC020_v1_images/jw-30min 3/jw-30min 3_(c1+c5).TIF
./BBBC020_v1_images/jw-30min 4/jw-30min 4_(c1+c5).TIF
./BBBC020_v1_images/jw-30min 5/jw-30min 5_(c1+c5).TIF
BBC020_v1_outlines_nuclei/jw-15min 5_c5_43.TIF exists but is corrupted
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
Annotations are instance-segmented, and each instance is saved as a separate image file. get_mask() loads and aggregates them into a single array; a label loaded later overrides one loaded earlier. If you do not want this behavior, subclass this class and override the get_mask() method accordingly (see the sketch below).
- 2 channels; R channel is the same as G, R==G!=B
Assign 0 to red channel
BBBC has received a complaint that “BBB020_v1_outlines_nuclei” appears incomplete and we have been unable to obtain the missing images from the original contributor.
Nuclei anno looks good
Should separate nuclei and cells annotation; if anno_ch=None, anno_dict becomes a mess.
References
- get_mask(lst_p: Union[List[pathlib.Path], List[List[pathlib.Path]]]) numpy.ndarray [source]#
Get a mask
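A minimal, hypothetical subclass that keeps earlier labels instead of letting later ones win, as suggested in the notes. It assumes anno_ch is the default single channel (so lst_p is a flat list of paths) and reads each instance file with tifffile rather than the library's internal reader:
>>> import numpy as np
>>> import tifffile
>>> from bioimageloader.collections import BBBC020
>>> class BBBC020KeepFirst(BBBC020):
...     def get_mask(self, lst_p):
...         mask = None
...         for label, p in enumerate(lst_p, start=1):
...             instance = tifffile.imread(p) > 0       # binary mask of one instance
...             if mask is None:
...                 mask = np.zeros(instance.shape, dtype=np.uint8)
...             mask[instance & (mask == 0)] = label    # keep labels loaded earlier
...         return mask
>>> dset = BBBC020KeepFirst('./Data/bbbc/020')          # placeholder path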
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, List[pathlib.Path]]#
Dictionary of paths to annotation files
- class bioimageloader.collections.BBBC021(root_dir: str, *, transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'equal', image_ch: Sequence[str] = ('DNA', 'actin', 'tublin'), **kwargs)[source]#
Human MCF7 cells – compound-profiling experiment [1]
The images are of MCF-7 breast cancer cells treated for 24 h with a collection of 113 small molecules at eight concentrations. The cells were fixed, labeled for DNA, F-actin, and β-tubulin, and imaged by fluorescence microscopy as described in [Caie et al., Molecular Cancer Therapeutics, 2010].
There are 39,600 image files (13,200 fields of view imaged in three channels) in TIFF format. We provide the images in 55 ZIP archives, one for each microtiter plate. The archives are ~750 MB each.
- Parameters
- root_dir : str
Path to root directory
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘equal’, ‘cv2’, Sequence[float]}, default: ‘equal’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- image_ch : {‘DNA’, ‘actin’, ‘tublin’}, default: (‘DNA’, ‘actin’, ‘tublin’)
Which channel(s) to load as image. Make sure to give it as a Sequence when choosing a single channel.
See also
Dataset
Base class
DatasetInterface
Interface
Notes
HUGE dataset
- 3 channels
w1 (DNA) -> Blue
w2 (actin?) -> Green
w4 (tublin??)-> Red
UINT16
References
- property file_list: Union[List[pathlib.Path], List[List[pathlib.Path]]]#
A list of paths to image files
- class bioimageloader.collections.BBBC026(root_dir: str, *, transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, **kwargs)[source]#
Human Hepatocyte and Murine Fibroblast cells – Co-culture experiment
This 384-well plate has images of co-cultured hepatocytes and fibroblasts. Every other well is populated (A01, A03, …, C01, C03, …) such that 96 wells comprise the data. Each well has 9 sites and thus 9 images associated, totaling 864 images.
For each well there is one field with a single nuclear image (Hoechst). Images are in 8-bit PNG format.
- Parameters
- root_dir : str
Path to root directory
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
See also
Dataset
Base class
DatasetInterface
Interface
Notes
Only centers are annotated, for 5 images (not implemented)
References
- property file_list: List[pathlib.Path]#
A list of paths to image files
- class bioimageloader.collections.BBBC030(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, **kwargs)[source]#
Chinese Hamster Ovary Cells
The image set consists of 60 Differential Interference Contrast (DIC) images of Chinese Hamster Ovary (CHO) cells. The images were taken on an Olympus Cell-R microscope with a 20x lens at the time when the cells initiated their attachment to the bottom of the dish.
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
See also
MaskDataset
Super class
DatasetInterface
Interface
References
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.BBBC039(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, training: bool = True, **kwargs)[source]#
Nuclei of U2OS cells in a chemical screen [1]
This data set has a total of 200 fields of view of nuclei captured with fluorescence microscopy using the Hoechst stain. These images are a sample of the larger BBBC022 chemical screen. The images are stored as TIFF files with 520x696 pixels at 16 bits.
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- training : bool, default: True
Load training set if True, else load testing one
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
- Split (training/validation/test)
training=True combines ‘training’ with ‘validation’
Objects not touching each other are annotated with 1, and 2, 3, … are used for the touching ones. It is clever, but it does not follow the form of other instance-segmented masks.
get_mask() will build an instance-labeled mask (each object gets a unique label). After labeling, the max label is 231 for training and 202 for test, so masks of dtype UINT8 are fine. Max label is 3 (in the original annotation).
A sample of the larger BBBC022 screen with manual segmentation
Possibly overlaps with DSB2018
Masks are PNG, but the (instance) values are stored only in the RED channel
Maximum value is 2**12
References
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.BBBC041(root_dir: str, *, transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'cv2', training: bool = True, **kwargs)[source]#
P. vivax (malaria) infected human blood smears [1]
Images are in .png or .jpg format. There are 3 sets of images consisting of 1364 images (~80,000 cells) with different researchers having prepared each one: from Brazil (Stefanie Lopes), from Southeast Asia (Benoit Malleret), and time course (Gabriel Rangel). Blood smears were stained with Giemsa reagent.
These images were contributed by Jane Hung of MIT and the Broad Institute in Cambridge, MA. [1]
There is also a GitHub repository that lists malaria parasite imaging datasets (blood smears) [2].
- Parameters
- root_dir : str
Path to root directory
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘cv2’, ‘equal’, Sequence[float]}, default: ‘cv2’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- training : bool, default: True
Load training set if True, else load testing one
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
Label categories: all 7 cats, [‘difficult’, ‘gametocyte’, ‘leukocyte’, ‘red blood cell’, ‘ring’, ‘schizont’, ‘trophozoite’]
1208/120 training/test split, which does not add up to the image count given in the description.
PNG and JPG extensions: training images are RGB in PNG format, while test images are YUV in JPEG format.
YUV will be automatically detected when read and cast to RGB
Two resolutions depending on training/test: (1200, 1600) for training, (1383, 1944) for test
References
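A hedged sketch of the training/test split and grayscale conversion described above; paths are placeholders:
>>> from bioimageloader.collections import BBBC041
>>> train = BBBC041('./Data/bbbc/041', training=True)
>>> test = BBBC041('./Data/bbbc/041', training=False)
>>> len(train), len(test)     # roughly the 1208/120 split noted above
>>> gray = BBBC041('./Data/bbbc/041', grayscale=True)   # uses the 'cv2' weights by default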
- property file_list: List[pathlib.Path]#
A list of paths to image files
- class bioimageloader.collections.Cellpose(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'equal', training: bool = True, gray_is_not_green: bool = True, specialized_data: bool = False, **kwargs)[source]#
Cellpose: a generalist algorithm for cellular segmentation
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘cv2’, ‘equal’, Sequence[float]}, default: ‘equal’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- training : bool, default: True
Load training set if True, else load testing one.
- gray_is_not_green : bool, default: True
Proper grayscale. The green channel value will be broadcast to all channels.
- specialized_data : bool, default: False
Load the “specialized data” mentioned in the paper [1].
See also
MaskDataset
Super class
DatasetInterface
Interface
Notes
Download link is hard to find [3]
It is a complete dataset by itself, meaning that it is not intended to be mixed or concatenated with others. It consists of various sources of images, not only bioimages but also images of fruits, rocks and etc.
All images have 3 channels, but technically they are not RGB. Every image has values in the second channel, and if there is more signal, it goes into the first channel. No image has values in the last channel. As a result, when visualized in RGB, they all look green and red. In particular, for this reason, grayscale images have signal only in the second channel and look green. The gray_is_not_green argument addresses that.
Built-in grayscale conversion methods are not correct for this dataset. The conversion should be channel-agnostic.
Currently, gray_is_not_green=False and grayscale=True will reduce the values of single-channel images to a third.
References
- 1
C. Stringer, M. Michaelos, and M. Pachitariu, “Cellpose: a generalist algorithm for cellular segmentation,” bioRxiv, p. 2020.02.02.931238, Feb. 2020, doi: 10.1101/2020.02.02.931238.
- 2
- 3
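A hedged usage sketch following the notes above: keep gray_is_not_green=True (the default) so grayscale images are broadcast across channels instead of staying in the green channel; the path is a placeholder:
>>> from bioimageloader.collections import Cellpose
>>> dset = Cellpose('./Data/cellpose', training=True, gray_is_not_green=True)
>>> image = dset[0]['image']     # 3-channel array; gray images broadcast to all channels
>>> spec = Cellpose('./Data/cellpose', specialized_data=True)   # the paper's "specialized data"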
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.ComputationalPathology(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'cv2', mask_tif: bool = False, **kwargs)[source]#
A Dataset and a Technique for Generalized Nuclear Segmentation for Computational Pathology [1]
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘cv2’, ‘equal’, Sequence[float]}, default: ‘cv2’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- mask_tif : bool, default: False
Instead of parsing every XML file to reconstruct mask image arrays, use pre-drawn mask .tif files, which should reside in the same folder as the annotation XML files.
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
Resolution of all images is (1000,1000)
gt is converted from annotation recorded in XML format
gt has dtype torch.float64, converted from numpy.uint16, and its values are ‘num_objects’ * 255 because it is base-transformed
The original dataset provides annotation in XML format, which takes a long time to parse and to reconstruct mask images dynamically during training. Drawing masks beforehand makes training much faster; use mask_tif in that case.
When augmenters is provided, set the num_samples argument: 30x1000x1000 -> 16x30=480 patches. Thus, the default num_samples=720 (x1.5).
dtype of ‘gt’ is int16. However, to make batching easier, it will be cast to float32
Be careful about types of augmenters; avoid interpolation
References
- 1
N. Kumar, R. Verma, S. Sharma, S. Bhargava, A. Vahadane, and A. Sethi, “A Dataset and a Technique for Generalized Nuclear Segmentation for Computational Pathology,” IEEE Transactions on Medical Imaging, vol. 36, no. 7, pp. 1550–1560, Jul. 2017, doi: 10.1109/TMI.2017.2677499.
- save_xml_to_tif()[source]#
Parse .xml to mask and write it as tiff file
Having masks as images is much faster than parsing .xml for each call. This function iterates through anno_dict, parses each annotation, and saves it in .tif format in the same annotation directory. Re-initiate an instance with the mask_tif argument to load them.
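A sketch of the two-step workflow described above, with a placeholder path: draw the masks once, then re-instantiate with mask_tif=True so masks are read from the pre-drawn .tif files:
>>> from bioimageloader.collections import ComputationalPathology
>>> dset = ComputationalPathology('./Data/ComputationalPathology')
>>> dset.save_xml_to_tif()       # one-off: parse every .xml and write .tif masks
>>> dset = ComputationalPathology('./Data/ComputationalPathology', mask_tif=True)
>>> mask = dset[0]['mask']       # now loaded from the pre-drawn .tif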
- property file_list: list#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.DSB2018(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'cv2', training: bool = True, **kwargs)[source]#
Data Science Bowl 2018 [1] also known as BBBC038 [2]
Find the nuclei in divergent images to advance medical discovery
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘cv2’, ‘equal’, Sequence[float]}, default: ‘cv2’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- training : bool, default: True
Load training set if True, else load testing one
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
References
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Union[Dict[int, List[pathlib.Path]], Dict[int, dict]]#
Dictionary of paths to annotation files
- class bioimageloader.collections.DigitalPathology(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'cv2', **kwargs)[source]#
Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases [1]
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘cv2’, ‘equal’, Sequence[float]}, default: ‘cv2’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
Annotation is partial
Boolean mask to UINT8 mask (0, 255)
References
- 1
A. Janowczyk and A. Madabhushi, “Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases,” J Pathol Inform, vol. 7, Jul. 2016, doi: 10.4103/2153-3539.186902.
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.FRUNet(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, normalize: bool = True, **kwargs)[source]#
FRU-Net: Robust Segmentation of Small Extracellular Vesicles [1]
TEM images
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- normalize : bool, default: True
Normalize each image by its maximum value and cast it to UINT8.
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
Originally, dtype is UINT16
Max value is 20444, but contrast varies a lot. For example, some images have values less than 0.05 of 2^16, which makes them barely visible. Normalization may be needed; the init param normalize is set to True by default for this reason. For each image, it finds the maximum value and divides by it.
References
- 1
E. Gómez-de-Mariscal, M. Maška, A. Kotrbová, V. Pospíchalová, P. Matula, and A. Muñoz-Barrutia, “Deep-Learning-Based Segmentation of Small Extracellular Vesicles in Transmission Electron Microscopy Images,” Scientific Reports, vol. 9, no. 1, Art. no. 1, Sep. 2019, doi: 10.1038/s41598-019-49431-3.
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.LIVECell(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, training: bool = True, mask_tif: bool = False, **kwargs)[source]#
LIVECell: A large-scale dataset for label-free live cell segmentation [1]
“LIVECell - A large-scale dataset for label-free live cell segmentation” by Edlund et al. 2021 [2]
Light microscopy is a cheap, accessible, non-invasive modality that when combined with well-established protocols of two-dimensional cell culture facilitates high-throughput quantitative imaging to study biological phenomena. Accurate segmentation of individual cells enables exploration of complex biological questions, but this requires sophisticated imaging processing pipelines due to the low contrast and high object density. Deep learning-based methods are considered state-of-the-art for most computer vision problems but require vast amounts of annotated data, for which there is no suitable resource available in the field of label-free cellular imaging. To address this gap we present LIVECell, a high-quality, manually annotated and expert-validated dataset that is the largest of its kind to date, consisting of over 1.6 million cells from a diverse set of cell morphologies and culture densities. To further demonstrate its utility, we provide convolutional neural network-based models trained and evaluated on LIVECell.
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- training : bool, default: True
Load training set if True, else load testing one
- mask_tif : bool, default: False
Use COCO annotations saved as .tif mask images in a new ./root_dir/masks directory. It greatly improves loading speed. Available after calling save_coco_to_tif().
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
Annotation is in MS COCO format [3]. Parsing it takes time.
Dynamically parsing the COCO annotation is currently not supported due to slow speed. Pre-parse masks in .tif format by calling save_coco_to_tif().
The validation set is originally separated from the training set. Currently they are combined when training=True.
Single-cell subsets are not covered
References
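A sketch of the analogous pre-parsing workflow for the COCO annotation, assuming save_coco_to_tif() is called on a dataset instance and the path is a placeholder:
>>> from bioimageloader.collections import LIVECell
>>> dset = LIVECell('./Data/LIVECell', training=True)
>>> dset.save_coco_to_tif()      # one-off: write .tif masks into ./root_dir/masks
>>> dset = LIVECell('./Data/LIVECell', training=True, mask_tif=True)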
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.MurphyLab(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, drop_missing_pairs: bool = True, drop_broken_files: bool = True, filled_mask: bool = False, **kwargs)[source]#
Nuclei Segmentation In Microscope Cell Images: A Hand-Segmented Dataset And Comparison Of Algorithms [1]
- Parameters
- root_dir : str or pathlib.Path
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- drop_missing_pairs : bool, default: True
Valid only if output=’both’. It will drop images that do not have mask pairs.
- drop_broken_files : bool, default: True
Drop broken files that cannot be read
- filled_mask : bool, default: False
Use filled masks saved by the fill_save_mask() method instead of the default boundary masks. If you want to use manually modified masks, the annotation files should have the same names as the ‘*.xcf’ files, with the suffix changed to ‘.png’.
Warning
This dataset has many issues, whose details can be found below. The simplest way to deal with them is to drop the entries that cause issues. It is recommended not to opt out of drop_missing_pairs() and drop_broken_files(); otherwise, exceptions will be raised.
If you want filled holes, the fill_save_mask() function will fill holes with some tricks to handle edge cases and save the results in .png format. Then set the filled_mask argument to True to load them.
Read more in the Notes section
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
Annotation masks are 4-channel PNG even though the mask is binary. It is not grayscale binary; the value 255 is stored only in the red channel.
Two annotation formats: Photoshop and GIMP. It seems that two annotators worked separately. segmented-ashariff will be ignored. In total, 97 segmented images (out of 100)
- 3 missing segmentations: ind={31, 43, 75}
./data/images/dna-images/gnf/dna-31.png
./data/images/dna-images/gnf/dna-43.png
./data/images/dna-images/ic100/dna-25.png
Manually filled annotation to make masks using GIMP
2009_ISBI_2DNuclei_code_data/data/images/segmented-lpc/ic100/dna-15.xcf does not have ‘borders’ layer like the others. This one alone has ‘border’ layer.
References
- 1
L. P. Coelho, A. Shariff, and R. F. Murphy, “Nuclear segmentation in microscope cell images: A hand-segmented dataset and comparison of algorithms,” in 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Jun. 2009, pp. 518–521, doi: 10.1109/ISBI.2009.5193098.
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- fill_save_mask()[source]#
Fill holes from boundary mask with some tricks
Requires scipy and scikit-image. Install the dependencies with the pip extra: pip install bioimageloader[process].
Note that this does not produce perfectly filled masks; some are not entirely closed by this algorithm (36, 40, 63).
Other issues: ind=63 has a ‘border’ layer instead of ‘borders’; ind=93 cannot be read by GimpDocument…
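A sketch of the fill-and-reload workflow, with a placeholder path:
>>> from bioimageloader.collections import MurphyLab
>>> dset = MurphyLab('./Data/murphylab')
>>> dset.fill_save_mask()        # one-off: fill boundary masks and save them as .png
>>> dset = MurphyLab('./Data/murphylab', filled_mask=True)
>>> mask = dset[0]['mask']       # filled masks instead of boundary outlines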
- class bioimageloader.collections.S_BSST265(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, **kwargs)[source]#
An annotated fluorescence image dataset for training nuclear segmentation methods [1]
Immunofluorescence (IF) images, designed for ML
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
All images are grayscale, BUT some have 3 channels
rawimages: Raw nuclear images in TIFF format
groundtruth: Annotated masks in TIFF format
groundtruth_svgs: SVG files for each annotated mask and the corresponding raw image in JPEG format
singlecell_groundtruth: Groundtruth for randomly selected nuclei of the testset (25 nuclei per testset class, a subset of all nuclei of the testset classes; human experts can compete with this low number of nuclei per subset by calculating Dice coefficients between their annotations and the groundtruth annotations)
visualized_groundtruth: Visualization of groundtruth masks in PNG format
visualized_singlecell_groundtruth: Visualization of groundtruth for randomly selected nuclei in PNG format
Find more info in README.txt inside the root directory
References
- 1
F. Kromp et al., “An annotated fluorescence image dataset for training nuclear segmentation methods,” Scientific Data, vol. 7, no. 1, Art. no. 1, Aug. 2020, doi: 10.1038/s41597-020-00608-w.
- property file_list: list#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.StarDist(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, training: bool = True, **kwargs)[source]#
Cell Detection with Star-convex Polygons
StarDist data is a subset of Data Science Bowl 2018 [3]
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- training : bool, default: True
Load training set if True, else load testing one
See also
DSB2018
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
StarDist data is a subset of Data Science Bowl 2018 [3]. Choose only one, do not mix them.
root_dir is not ‘dsb2018’ even though the archive name is ‘dsb2018’, because it conflicts with the original DSB2018. Make a new directory.
All images are grayscale
References
- 1
U. Schmidt, M. Weigert, C. Broaddus, and G. Myers, “Cell Detection with Star-convex Polygons,” arXiv:1806.03535 [cs], vol. 11071, pp. 265–273, 2018, doi: 10.1007/978-3-030-00934-2_30.
- 2
https://github.com/stardist/stardist/releases/download/0.1.0/dsb2018.zip
- 3
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.TNBC(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'cv2', **kwargs)[source]#
TNBC Nuclei Segmentation Dataset [1]
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘cv2’, ‘equal’, Sequence[float]}, default: ‘cv2’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
References
- 1
Segmentation of Nuclei in Histopathology Images by Deep Regression of the Distance Map, https://ieeexplore.ieee.org/document/8438559
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
anno_dict[ind] = <file>
- class bioimageloader.collections.UCSB(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'cv2', category: Sequence[str] = ('malignant',), **kwargs)[source]#
A biosegmentation benchmark for evaluation of bioimage analysis methods
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘cv2’, ‘equal’, Sequence[float]}, default: ‘cv2’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- category : {‘benign’, ‘malignant’}, default: (‘malignant’,)
Select which category of output you want
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
32 ‘benign’, 26 ‘malignant’ images (58 images in total)
58x768x896 -> ~600 patches. Thus, the default num_samples=900 (x1.5).
Images are not fully annotated
References
- 1
E. Drelie Gelasca, B. Obara, D. Fedorov, K. Kvilekval, and B. Manjunath, “A biosegmentation benchmark for evaluation of bioimage analysis methods,” BMC Bioinformatics, vol. 10, p. 368, Nov. 2009, doi: 10.1186/1471-2105-10-368.
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files