bioimageloader.collections#
Collection of public bioimage datasets
- class bioimageloader.collections.BBBC002(root_dir: str, *, transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, **kwargs)[source]#
Drosophila Kc167 cells
There are 10 fields of view of each sample, for a total of 50 fields of view. The images were acquired on a Zeiss Axiovert 200M microscope. The images provided here are a single channel, DNA. The image size is 512 x 512 pixels. The images are provided as 8-bit TIFF files.
- Parameters
- root_dir : str
Path to root directory
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
See also
Dataset
Base class
DatasetInterface
Interface
Notes
Cell count available
- ImageJ RoI available for 3 tiles
CPvalid1_48_40x_Tiles_p0151DAPI_ROIs.zip
CPvalid1_340_40x_Tiles_p1175DAPI_ROIs.zip
CPvalid1_nodsRNA_40x_Tiles_p0219DAPI_ROIs.zip
References
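A minimal usage sketch, assuming the archive has been extracted under a placeholder path and that each item is a dict exposing at least ‘image’:
>>> import albumentations as A
>>> from bioimageloader.collections import BBBC002
>>> transforms = A.Compose([
...     A.RandomCrop(256, 256),
...     A.HorizontalFlip(p=0.5),
... ])
>>> dset = BBBC002('./Data/bbbc/002', transforms=transforms, num_samples=200)
>>> image = dset[0]['image']   # augmented 256x256 crop
>>> len(dset)                  # overridden by num_samples
200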
- property file_list: List[pathlib.Path]#
A list of paths to image files
- class bioimageloader.collections.BBBC004(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, **kwargs)[source]#
Synthetic cells
Biological application
One of the principal challenges in counting or segmenting nuclei is dealing with clustered nuclei. To help assess algorithms’ performance in this regard, this synthetic image set consists of five subsets with increasing degree of clustering.
Images
Five subsets of 20 images each are provided. Each image contains 300 objects, but the objects overlap and cluster with different probabilities in the five subsets. The images were generated with the SIMCEP simulating platform for fluorescent cell population images (Lehmussola et al., IEEE T. Med. Imaging, 2007 and Lehmussola et al., P. IEEE, 2008).
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
References
- 1
https://bbbc.broadinstitute.org/BBBC004
See also
MaskDataset
Super class
DatasetInterface
Interface
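A hedged sketch of how the output argument changes what each item contains; the path is a placeholder:
>>> from bioimageloader.collections import BBBC004
>>> both = BBBC004('./Data/bbbc/004', output='both')
>>> sorted(both[0].keys())
['image', 'mask']
>>> masks_only = BBBC004('./Data/bbbc/004', output='mask')
>>> list(masks_only[0].keys())   # expected to contain only 'mask'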
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.BBBC006(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'equal', image_ch: Sequence[str] = ('hoechst', 'phalloidin'), z_ind: int = 16, **kwargs)[source]#
Human U2OS cells (out of focus)
Images were acquired from one 384-well microplate containing U2OS cells stained with Hoechst 33342 markers (to label nuclei). Images were acquired with exposures of 15 and 1000 ms for Hoechst and phalloidin respectively, at 20x magnification, 2x binning, and 2 sites per well. For each site, the optimal focus was found using laser auto-focusing to find the well bottom. The automated microscope was then programmed to collect a z-stack of 32 image sets (z = 16 at the optimal focal plane, 15 images above the focal plane, 16 below) with 2 μm between slices. Each image is 696 x 520 pixels in 16-bit TIF format, LZW compression. Each image filename includes either ‘w1’ to denote Hoechst images or ‘w2’ to denote phalloidin images.
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘equal’, ‘cv2’, Sequence[float]}, default: ‘equal’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- image_ch : {‘hoechst’, ‘phalloidin’}, default: (‘hoechst’, ‘phalloidin’)
Which channel(s) to load as image. Make sure to give it as a Sequence when choosing a single channel.
- z_ind : int, default: 16
Select one z stack. Default is 16, because 16 is the most in-focus.
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
z-stack, z=16 is in-focus ones, sites (s1, s2)
Instance segmented
384 wells, 2 sites per well; 384 * 2 = 768 images
2 channels, w1=Hoechst, w2=phalloidin
The two channels usually overlap, and when they do it is hard to distinguish them anymore.
Saved in UINT16, but UINT12 practically. Max value caps at 4095.
References
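A sketch of selecting a single channel and another focal plane, assuming a placeholder path; note the advice above to pass image_ch as a sequence even for one channel:
>>> from bioimageloader.collections import BBBC006
>>> dset = BBBC006(
...     './Data/bbbc/006',
...     image_ch=('hoechst',),   # a sequence, even for a single channel
...     z_ind=14,                # a plane away from the most in-focus one (16)
... )
>>> image = dset[0]['image']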
- property file_list: Union[List[pathlib.Path], List[List[pathlib.Path]]]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.BBBC007(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'equal', image_ch: Sequence[str] = ('DNA', 'actin'), anno_ch: Sequence[str] = ('DNA',), **kwargs)[source]#
Drosophila Kc167 cells
Outline annotation
Images were acquired using a motorized Zeiss Axioplan 2 and an Axiocam MRm camera, and are provided courtesy of the laboratory of David Sabatini at the Whitehead Institute for Biomedical Research. Each image is roughly 512 x 512 pixels, with cells roughly 25 pixels in diameter, and 80 cells per image on average. The two channels (DNA and actin) of each image are stored in separate gray-scale 8-bit TIFF files.
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘equal’, ‘cv2’, Sequence[float]}, default: ‘equal’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- image_ch : {‘DNA’, ‘actin’}, default: (‘DNA’, ‘actin’)
Which channel(s) to load as image. Make sure to give it as a Sequence when choosing a single channel. Names match anno_ch.
- anno_ch : {‘DNA’, ‘actin’}, default: (‘DNA’,)
Which channel(s) to load as annotation. Make sure to give it as a Sequence when choosing a single channel.
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
Images [4, 5, 11, 14, 15] have 3 channels, but they are all just grayscale; extra work is required in get_image().
References
- 1
Jones et al., in the Proceedings of the ICCV Workshop on Computer Vision for Biomedical Image Applications (CVBIA), 2005.
- 2
- property file_list: Union[List[pathlib.Path], List[List[pathlib.Path]]]#
A list of paths to image files
- property anno_dict: Union[Dict[int, pathlib.Path], Dict[int, List[pathlib.Path]]]#
Dictionary of paths to annotation files
- class bioimageloader.collections.BBBC008(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'equal', image_ch: Sequence[str] = ('DNA', 'actin'), anno_ch: Sequence[str] = ('DNA',), **kwargs)[source]#
Human HT29 colon-cancer cells [1]
F/B semantic segmentation
The image set consists of 12 images. The samples were stained with Hoechst (channel 1), pH3 (channel 2), and phalloidin (channel 3). Hoechst labels DNA, which is present in the nucleus. Phalloidin labels actin, which is present in the cytoplasm. The last stain, pH3, indicates cells in mitosis; whereas this was important for Moffat et al.’s screen, it is irrelevant for segmentation and counting, so this channel is left out.
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘cv2’, ‘equal’, Sequence[float]}, default: ‘equal’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- image_ch : {‘DNA’, ‘actin’}, default: (‘DNA’, ‘actin’)
Which channel(s) to load as image. Make sure to give it as a Sequence when choosing a single channel.
- anno_ch : {‘DNA’, ‘actin’}, default: (‘DNA’,)
Which channel(s) to load as annotation. Make sure to give it as a Sequence when choosing a single channel.
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
Annotation F/B: BG=1, FG=0; very annoying…
References
- 1
- 2
Carpenter et al., Genome Biology, 2006
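Given the inverted F/B convention noted above (BG=1, FG=0), a hedged post-processing sketch that flips it, assuming the loaded mask keeps that convention and comes back as a numpy array:
>>> import numpy as np
>>> from bioimageloader.collections import BBBC008
>>> dset = BBBC008('./Data/bbbc/008')          # placeholder path
>>> mask = dset[0]['mask']
>>> mask_fg1 = (mask == 0).astype(np.uint8)    # foreground becomes 1, background 0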
- property file_list: Union[List[pathlib.Path], List[List[pathlib.Path]]]#
A list of paths to image files
- property anno_dict: Union[Dict[int, pathlib.Path], Dict[int, List[pathlib.Path]]]#
Dictionary of paths to annotation files
- class bioimageloader.collections.BBBC009(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, **kwargs)[source]#
Human red blood cells
This image set consists of five differential interference contrast (DIC) images of red blood cells.
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
See also
MaskDataset
Super class
DatasetInterface
Interface
References
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.BBBC013(root_dir: str, *, transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'equal', image_ch: Sequence[str] = ('GFP', 'DNA'), **kwargs)[source]#
Human U2OS cells cytoplasm–nucleus translocation
The images were acquired at BioImage on the IN Cell Analyzer 3000 using the Trafficking Data Analysis Module, with one image per channel (Channel 1 = FKHR-GFP; Channel 2 = DNA). Image size is 640 x 640 pixels. Images are available in native FRM format or 8-bit BMP.
- Parameters
- root_dir : str
Path to root directory
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘equal’, ‘cv2’, Sequence[float]}, default: ‘equal’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- image_ch : {‘GFP’, ‘DNA’}, default: (‘GFP’, ‘DNA’)
Which channel(s) to load as image. Make sure to give it as a Sequence when choosing a single channel.
See also
Dataset
Base class
DatasetInterface
Interface
Notes
Two formats are available: FRM and BMP
References
- property file_list: Union[List[pathlib.Path], List[List[pathlib.Path]]]#
A list of paths to image files
- class bioimageloader.collections.BBBC014(root_dir: str, *, transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'equal', image_ch: Sequence[str] = ('DAPI', 'FITC'), **kwargs)[source]#
Human U2OS cells cytoplasm–nucleus translocation
This 96-well plate has images of cytoplasm to nucleus translocation of the transcription factor NFκB in MCF7 (human breast adenocarcinoma cell line) and A549 (human alveolar basal epithelial) cells in response to TNFα concentration.
Images are at 10x objective magnification. The plate was acquired at Vitra Bioscience on the CellCard reader. For each well there is one field with two images: a nuclear counterstain (DAPI) image and a signal stain (FITC) image. Image size is 1360 x 1024 pixels. Images are in 8-bit BMP format.
- Parameters
- root_dir : str
Path to root directory
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘equal’, ‘cv2’, Sequence[float]}, default: ‘equal’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- image_ch : {‘DAPI’, ‘FITC’}, default: (‘DAPI’, ‘FITC’)
Which channel(s) to load as image. Make sure to give it as a Sequence when choosing a single channel.
See also
Dataset
Base class
DatasetInterface
Interface
Notes
Second channel is usually very clear with a few artifacts
Biological annotation
CellProfiler’s LoadText module format annotation also available (not implemented)
Zoom in?
References
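grayscale_mode also accepts a sequence of per-channel weights. A hedged sketch that weights DAPI more heavily than FITC, assuming the weights follow the image_ch order; the weight values and path are placeholders:
>>> from bioimageloader.collections import BBBC014
>>> dset = BBBC014(
...     './Data/bbbc/014',
...     grayscale=True,
...     grayscale_mode=(0.7, 0.3),   # hypothetical weights for ('DAPI', 'FITC')
... )
>>> image = dset[0]['image']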
- property file_list: Union[List[pathlib.Path], List[List[pathlib.Path]]]#
A list of paths to image files
- class bioimageloader.collections.BBBC015(root_dir: str, *, transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'equal', image_ch: Sequence[str] = ('b2AR', 'arrestin'), **kwargs)[source]#
Human U2OS cells transfluor
The images are of a human osteosarcoma cell line (U2OS) co-expressing beta2 adrenergic receptor (b2AR) and arrestin-GFP protein molecules. The receptor was a modified type that generates “vesicle-type” spots upon ligand stimulation.
The plate was acquired on iCyte imaging cytometer with iCyte software version 2.5.1. Image file format is JPEG with one image for green channel and one image for crimson channel. Image size is 1000 x 768 pixels.
This image set has a portion of a 96-well plate containing 3 replica rows and 12 concentration points of isoproterenol. In each well four fields were acquired. File name structure: <well-number>_<field>_<channel>.JPG
- Parameters
- root_dir : str
Path to root directory
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘cv2’, ‘equal’, Sequence[float]}, default: ‘equal’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- image_ch : {‘b2AR’, ‘arrestin’}, default: (‘b2AR’, ‘arrestin’)
Which channel(s) to load as image. Make sure to give it as a Sequence when choosing a single channel.
See also
Dataset
Base class
DatasetInterface
Interface
Notes
2 channels (Green, Crimson?), texture in green channel
Crimson channel…?
RGB channel is all the same in each image file
References
- property file_list: Union[List[pathlib.Path], List[List[pathlib.Path]]]#
A list of paths to image files
- class bioimageloader.collections.BBBC016(root_dir: str, *, transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'equal', image_ch: Sequence[str] = ('GFP', 'DNA'), **kwargs)[source]#
Human U2OS cells transfluor
This image set is of a Transfluor assay where an orphan GPCR is stably integrated into the b-arrestin GFP expressing U2OS cell line. After one hour of incubation with a compound, the cells were fixed with formaldehyde.
The plate was read on Cellomics ArrayScan HCS Reader using the GPCR Bioapplication. File format is 8-bit TIFF with one image for green channel (GFP) and one image for blue channel (DNA). Image size is 512 x 512 pixels.
- Parameters
- root_dir : str
Path to root directory
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘equal’, ‘cv2’, Sequence[float]}, default: ‘equal’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- image_ch : {‘GFP’, ‘DNA’}, default: (‘GFP’, ‘DNA’)
Which channel(s) to load as image. Make sure to give it as a Sequence when choosing a single channel.
See also
Dataset
Base class
DatasetInterface
Interface
Notes
2 channels (G,B), nuclei are Blue
References
- property file_list: Union[List[pathlib.Path], List[List[pathlib.Path]]]#
A list of paths to image files
- class bioimageloader.collections.BBBC018(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'equal', image_ch: Sequence[str] = ('DNA', 'actin', 'pH3'), anno_ch: Sequence[str] = ('DNA',), drop_missing_pairs: bool = True, **kwargs)[source]#
Human HT29 colon-cancer cells (diverse phenotypes)
The image set consists of 56 fields of view (4 from each of 14 samples). Because there are three channels, there are 168 image files. (The samples were stained with Hoechst 33342, pH3, and phalloidin. Hoechst 33342 is a DNA stain that labels the nucleus. Phospho-histone H3 indicates mitosis. Phalloidin labels actin, which is present in the cytoplasm.) The samples are the top-scoring sample from each of Jones et al.’s classifiers, as listed in the file SamplesScores.zip in their supplement. The files are in DIB format, as produced by the Cellomics ArrayScan instrument at the Whitehead–MIT Bioimaging Center. We recommend using Bio-Formats to read the DIB files. Each image is 512 x 512 pixels.
The filenames are of the form wellidx-channel.DIB, where wellidx is the five-digit well index (from Jones et al.’s supplement) and channel is either DNA, actin, or pH3, depending on the channel.
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘equal’, ‘cv2’, Sequence[float]}, default: ‘equal’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- image_ch : {‘DNA’, ‘actin’, ‘pH3’}, default: (‘DNA’, ‘actin’, ‘pH3’)
Which channel(s) to load as image. Make sure to give it as a Sequence when choosing a single channel.
- anno_ch : {‘DNA’, ‘actin’}, default: (‘DNA’,)
Which channel(s) to load as annotation. Make sure to give it as a Sequence when choosing a single channel.
- drop_missing_pairs : bool, default: True
Valid only if output=’both’. It will drop images that do not have mask pairs.
Warning
BBBC018_v1_images/10779 annotation is missing. len(anno_dict) = len(file_list) - 1; ind={26}
PosixPath(‘BBBC018_v1_images/10779-DNA.DIB’)
PosixPath(‘BBBC018_v1_images/10779-actin.DIB’)
PosixPath(‘BBBC018_v1_images/10779-pH3.DIB’)
This one was not saved properly after annotation; it has the annotation overlaid on top of the image. You need to filter mask==255.
‘BBBC018_v1_outlines/17675-nuclei.png’
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
Every DIB has 3 channels (order = (DNA, actin, pH3)). The second one is the object.
DNA -> Nuclei
Actin -> Cell
Annotations are outlines, but every outline is closed, so binary_fill_holes works fine
For some reason the annotation is y-inverted
References
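A hedged sketch of the drop_missing_pairs behavior described in the warning, with a placeholder path; after dropping, file_list and anno_dict should line up:
>>> from bioimageloader.collections import BBBC018
>>> dset = BBBC018('./Data/bbbc/018')            # output='both', drop_missing_pairs=True
>>> len(dset.file_list), len(dset.anno_dict)     # expected to match after dropping
>>> imgs = BBBC018('./Data/bbbc/018', output='image')   # iterate all fields, no masks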
- property file_list: Union[List[pathlib.Path], List[List[pathlib.Path]]]#
A list of paths to image files
- property anno_dict: Union[Dict[int, pathlib.Path], Dict[int, List[pathlib.Path]]]#
Dictionary of paths to annotation files
- class bioimageloader.collections.BBBC020(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'equal', image_ch: Sequence[str] = ('nuclei', 'cells'), anno_ch: Sequence[str] = ('nuclei',), drop_missing_pairs: bool = True, **kwargs)[source]#
Murine bone-marrow derived macrophages
The image set consists of 25 images, each consisting of three channels. The samples were stained with DAPI and CD11b/APC. In addition to this, a merged image is provided. DAPI labels the nuclei and CD11b/APC the cell surface.
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘equal’, ‘cv2’, Sequence[float]}, default: ‘equal’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- image_ch : {‘nuclei’, ‘cells’}, default: (‘nuclei’, ‘cells’)
Which channel(s) to load as image. Make sure to give it as a Sequence when choosing a single channel.
- anno_ch : {‘nuclei’, ‘cells’}, default: (‘nuclei’,)
Which channel(s) to load as annotation. Make sure to give it as a Sequence when choosing a single channel.
- drop_missing_pairs : bool, default: True
Valid only if output=’both’. It will drop images that do not have mask pairs.
Warning
5 annotations are missing: ind={17,18,19,20,21} [jw-30min 1, jw-30min 2, jw-30min 3, jw-30min 4, jw-30min 5]
./BBBC020_v1_images/jw-30min 1/jw-30min 1_(c1+c5).TIF
./BBBC020_v1_images/jw-30min 2/jw-30min 2_(c1+c5).TIF
./BBBC020_v1_images/jw-30min 3/jw-30min 3_(c1+c5).TIF
./BBBC020_v1_images/jw-30min 4/jw-30min 4_(c1+c5).TIF
./BBBC020_v1_images/jw-30min 5/jw-30min 5_(c1+c5).TIF
BBC020_v1_outlines_nuclei/jw-15min 5_c5_43.TIF exists but is corrupted
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
Annotations are instance-segmented, and each instance is saved as a separate image file. get_mask() loads and aggregates them into a single array; a label loaded later overrides one loaded earlier. If you do not want this behavior, subclass this class and override the get_mask() method accordingly (see the sketch below).
- 2 channels; R channel is the same as G, R==G!=B
Assign 0 to red channel
BBBC has received a complaint that “BBB020_v1_outlines_nuclei” appears incomplete and we have been unable to obtain the missing images from the original contributor.
Nuclei anno looks good
Should separate nuclei and cells annotation; if anno_ch=None, anno_dict becomes a mess.
References
- get_mask(lst_p: Union[List[pathlib.Path], List[List[pathlib.Path]]]) numpy.ndarray [source]#
Get a mask
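A minimal, hypothetical subclass that keeps earlier labels instead of letting later ones win, as suggested in the notes. It assumes anno_ch is the default single channel (so lst_p is a flat list of paths) and reads each instance file with tifffile rather than the library's internal reader:
>>> import numpy as np
>>> import tifffile
>>> from bioimageloader.collections import BBBC020
>>> class BBBC020KeepFirst(BBBC020):
...     def get_mask(self, lst_p):
...         mask = None
...         for label, p in enumerate(lst_p, start=1):
...             instance = tifffile.imread(p) > 0       # binary mask of one instance
...             if mask is None:
...                 mask = np.zeros(instance.shape, dtype=np.uint8)
...             mask[instance & (mask == 0)] = label    # keep labels loaded earlier
...         return mask
>>> dset = BBBC020KeepFirst('./Data/bbbc/020')          # placeholder path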
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, List[pathlib.Path]]#
Dictionary of paths to annotation files
- class bioimageloader.collections.BBBC021(root_dir: str, *, transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'equal', image_ch: Sequence[str] = ('DNA', 'actin', 'tublin'), **kwargs)[source]#
Human MCF7 cells – compound-profiling experiment [1]
The images are of MCF-7 breast cancer cells treated for 24 h with a collection of 113 small molecules at eight concentrations. The cells were fixed, labeled for DNA, F-actin, and β-tubulin, and imaged by fluorescence microscopy as described in [Caie et al., Molecular Cancer Therapeutics, 2010].
There are 39,600 image files (13,200 fields of view imaged in three channels) in TIFF format. We provide the images in 55 ZIP archives, one for each microtiter plate. The archives are ~750 MB each.
- Parameters
- root_dir : str
Path to root directory
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘equal’, ‘cv2’, Sequence[float]}, default: ‘equal’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- image_ch : {‘DNA’, ‘actin’, ‘tublin’}, default: (‘DNA’, ‘actin’, ‘tublin’)
Which channel(s) to load as image. Make sure to give it as a Sequence when choosing a single channel.
See also
Dataset
Base class
DatasetInterface
Interface
Notes
HUGE dataset
- 3 channels
w1 (DNA) -> Blue
w2 (actin?) -> Green
w4 (tublin??)-> Red
UINT16
References
- property file_list: Union[List[pathlib.Path], List[List[pathlib.Path]]]#
A list of paths to image files
- class bioimageloader.collections.BBBC026(root_dir: str, *, transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, **kwargs)[source]#
Human Hepatocyte and Murine Fibroblast cells – Co-culture experiment
This 384-well plate has images of co-cultured hepatocytes and fibroblasts. Every other well is populated (A01, A03, …, C01, C03, …) such that 96 wells comprise the data. Each well has 9 sites and thus 9 images associated, totaling 864 images.
For each well there is one field with a single nuclear image (Hoechst). Images are in 8-bit PNG format.
- Parameters
- root_dir : str
Path to root directory
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
See also
Dataset
Base class
DatasetInterface
Interface
Notes
Only centers are annotated, for 5 images (not implemented)
References
- property file_list: List[pathlib.Path]#
A list of paths to image files
- class bioimageloader.collections.BBBC030(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, **kwargs)[source]#
Chinese Hamster Ovary Cells
The image set consists of 60 Differential Interference Contrast (DIC) images of Chinese Hamster Ovary (CHO) cells. The images were taken on an Olympus Cell-R microscope with a 20x lens at the time when the cells initiated their attachment to the bottom of the dish.
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
See also
MaskDataset
Super class
DatasetInterface
Interface
References
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.BBBC039(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, training: bool = True, **kwargs)[source]#
Nuclei of U2OS cells in a chemical screen [1]
This data set has a total of 200 fields of view of nuclei captured with fluorescence microscopy using the Hoechst stain. These images are a sample of the larger BBBC022 chemical screen. The images are stored as TIFF files with 520x696 pixels at 16 bits.
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- training : bool, default: True
Load training set if True, else load testing one
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
- Split (training/validation/test)
training=True combines ‘training’ with ‘validation’
Objects not touching each other are annotated with 1, and 2, 3, … are used for the touching ones. It is clever, but it does not follow the form of other instance-segmented masks.
get_mask() will build an instance-labeled mask (each object gets a unique label). After labeling, the max label is 231 for training and 202 for test, so masks of dtype UINT8 are fine. Max label is 3 (in the original annotation).
A sample of the larger BBBC022 screen with manual segmentation
Possibly overlaps with DSB2018
Masks are PNG, but the (instance) values are stored only in the RED channel
Maximum value is 2**12
References
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.BBBC041(root_dir: str, *, transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'cv2', training: bool = True, **kwargs)[source]#
P. vivax (malaria) infected human blood smears [1]
Images are in .png or .jpg format. There are 3 sets of images consisting of 1364 images (~80,000 cells) with different researchers having prepared each one: from Brazil (Stefanie Lopes), from Southeast Asia (Benoit Malleret), and time course (Gabriel Rangel). Blood smears were stained with Giemsa reagent.
These images were contributed by Jane Hung of MIT and the Broad Institute in Cambridge, MA. [1]
There is also a GitHub repository that lists malaria parasite imaging datasets (blood smears) [2].
- Parameters
- root_dir : str
Path to root directory
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘cv2’, ‘equal’, Sequence[float]}, default: ‘cv2’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- training : bool, default: True
Load training set if True, else load testing one
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
Label categories: all 7 cats, [‘difficult’, ‘gametocyte’, ‘leukocyte’, ‘red blood cell’, ‘ring’, ‘schizont’, ‘trophozoite’]
1208/120 training/test split, which does not add up to the image count given in the description.
PNG and JPG extensions: training images are RGB in PNG format, while test images are YUV in JPEG format.
YUV will be automatically detected when read and cast to RGB
Two resolutions depending on training/test: (1200, 1600) for training, (1383, 1944) for test
References
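A hedged sketch of the training/test split and grayscale conversion described above; paths are placeholders:
>>> from bioimageloader.collections import BBBC041
>>> train = BBBC041('./Data/bbbc/041', training=True)
>>> test = BBBC041('./Data/bbbc/041', training=False)
>>> len(train), len(test)     # roughly the 1208/120 split noted above
>>> gray = BBBC041('./Data/bbbc/041', grayscale=True)   # uses the 'cv2' weights by default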
- property file_list: List[pathlib.Path]#
A list of paths to image files
- class bioimageloader.collections.Cellpose(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'equal', training: bool = True, gray_is_not_green: bool = True, specialized_data: bool = False, **kwargs)[source]#
Cellpose: a generalist algorithm for cellular segmentation
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘cv2’, ‘equal’, Sequence[float]}, default: ‘equal’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- training : bool, default: True
Load training set if True, else load testing one.
- gray_is_not_green : bool, default: True
Proper grayscale. The green channel value will be broadcast to all channels.
- specialized_data : bool, default: False
Load the “specialized data” mentioned in the paper [1].
See also
MaskDataset
Super class
DatasetInterface
Interface
Notes
Download link is hard to find [3]
It is a complete dataset by itself, meaning that it is not intended to be mixed or concatenated with others. It consists of various sources of images, not only bioimages but also images of fruits, rocks and etc.
All images have 3 channels, but technically they are not RGB. Every image has values in the second channel, and if there is more signal, it goes into the first channel. No image has values in the last channel. As a result, when visualized in RGB, they all look green and red. In particular, for this reason, grayscale images have signal only in the second channel and look green. The gray_is_not_green argument addresses that.
Built-in grayscale conversion methods are not correct for this dataset. The conversion should be channel-agnostic.
Currently, gray_is_not_green=False and grayscale=True will reduce the values of single-channel images to a third.
References
- 1
C. Stringer, M. Michaelos, and M. Pachitariu, “Cellpose: a generalist algorithm for cellular segmentation,” bioRxiv, p. 2020.02.02.931238, Feb. 2020, doi: 10.1101/2020.02.02.931238.
- 2
- 3
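A hedged usage sketch following the notes above: keep gray_is_not_green=True (the default) so grayscale images are broadcast across channels instead of staying in the green channel; the path is a placeholder:
>>> from bioimageloader.collections import Cellpose
>>> dset = Cellpose('./Data/cellpose', training=True, gray_is_not_green=True)
>>> image = dset[0]['image']     # 3-channel array; gray images broadcast to all channels
>>> spec = Cellpose('./Data/cellpose', specialized_data=True)   # the paper's "specialized data"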
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.ComputationalPathology(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'cv2', mask_tif: bool = False, **kwargs)[source]#
A Dataset and a Technique for Generalized Nuclear Segmentation for Computational Pathology [1]
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘cv2’, ‘equal’, Sequence[float]}, default: ‘cv2’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- mask_tif : bool, default: False
Instead of parsing every XML file to reconstruct mask image arrays, use pre-drawn mask .tif files, which should reside in the same folder as the annotation XML files.
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
Resolution of all images is (1000,1000)
gt is converted from annotation recorded in XML format
gt has dtype torch.float64, converted from numpy.uint16, and its values are ‘num_objects’ * 255 because it is base-transformed
The original dataset provides annotation in XML format, which takes a long time to parse and to reconstruct mask images dynamically during training. Drawing masks beforehand makes training much faster; use mask_tif in that case.
When augmenters is provided, set the num_samples argument: 30x1000x1000 -> 16x30=480 patches. Thus, the default num_samples=720 (x1.5).
dtype of ‘gt’ is int16. However, to make batching easier, it will be cast to float32
Be careful about types of augmenters; avoid interpolation
References
- 1
N. Kumar, R. Verma, S. Sharma, S. Bhargava, A. Vahadane, and A. Sethi, “A Dataset and a Technique for Generalized Nuclear Segmentation for Computational Pathology,” IEEE Transactions on Medical Imaging, vol. 36, no. 7, pp. 1550–1560, Jul. 2017, doi: 10.1109/TMI.2017.2677499.
- save_xml_to_tif()[source]#
Parse .xml to mask and write it as tiff file
Having masks as images is much faster than parsing .xml for each call. This function iterates through anno_dict, parses each annotation, and saves it in .tif format in the same annotation directory. Re-initiate an instance with the mask_tif argument to load them.
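A sketch of the two-step workflow described above, with a placeholder path: draw the masks once, then re-instantiate with mask_tif=True so masks are read from the pre-drawn .tif files:
>>> from bioimageloader.collections import ComputationalPathology
>>> dset = ComputationalPathology('./Data/ComputationalPathology')
>>> dset.save_xml_to_tif()       # one-off: parse every .xml and write .tif masks
>>> dset = ComputationalPathology('./Data/ComputationalPathology', mask_tif=True)
>>> mask = dset[0]['mask']       # now loaded from the pre-drawn .tif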
- property file_list: list#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.DSB2018(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'cv2', training: bool = True, **kwargs)[source]#
Data Science Bowl 2018 [1] also known as BBBC038 [2]
Find the nuclei in divergent images to advance medical discovery
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘cv2’, ‘equal’, Sequence[float]}, default: ‘cv2’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- training : bool, default: True
Load training set if True, else load testing one
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
References
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Union[Dict[int, List[pathlib.Path]], Dict[int, dict]]#
Dictionary of paths to annotation files
- class bioimageloader.collections.DigitalPathology(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'cv2', **kwargs)[source]#
Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases [1]
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘cv2’, ‘equal’, Sequence[float]}, default: ‘cv2’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
Annotation is partial
Boolean mask to UINT8 mask (0, 255)
References
- 1
A. Janowczyk and A. Madabhushi, “Deep learning for digital pathology image analysis: A comprehensive tutorial with selected use cases,” J Pathol Inform, vol. 7, Jul. 2016, doi: 10.4103/2153-3539.186902.
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.FRUNet(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, normalize: bool = True, **kwargs)[source]#
FRU-Net: Robust Segmentation of Small Extracellular Vesicles [1]
TEM images
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- normalize : bool, default: True
Normalize each image by its maximum value and cast it to UINT8.
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
Originally, dtype is UINT16
Max value is 20444, but contrast varies a lot. For example, some images have values less than 0.05 of 2^16, which makes them barely visible. Normalization may be needed; the init param normalize is set to True by default for this reason. For each image, it finds the maximum value and divides by it.
References
- 1
E. Gómez-de-Mariscal, M. Maška, A. Kotrbová, V. Pospíchalová, P. Matula, and A. Muñoz-Barrutia, “Deep-Learning-Based Segmentation of Small Extracellular Vesicles in Transmission Electron Microscopy Images,” Scientific Reports, vol. 9, no. 1, Art. no. 1, Sep. 2019, doi: 10.1038/s41598-019-49431-3.
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.LIVECell(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, training: bool = True, mask_tif: bool = False, **kwargs)[source]#
LIVECell: A large-scale dataset for label-free live cell segmentation [1]
“LIVECell - A large-scale dataset for label-free live cell segmentation” by Edlund et al. 2021 [2]
Light microscopy is a cheap, accessible, non-invasive modality that when combined with well-established protocols of two-dimensional cell culture facilitates high-throughput quantitative imaging to study biological phenomena. Accurate segmentation of individual cells enables exploration of complex biological questions, but this requires sophisticated imaging processing pipelines due to the low contrast and high object density. Deep learning-based methods are considered state-of-the-art for most computer vision problems but require vast amounts of annotated data, for which there is no suitable resource available in the field of label-free cellular imaging. To address this gap we present LIVECell, a high-quality, manually annotated and expert-validated dataset that is the largest of its kind to date, consisting of over 1.6 million cells from a diverse set of cell morphologies and culture densities. To further demonstrate its utility, we provide convolutional neural network-based models trained and evaluated on LIVECell.
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- training : bool, default: True
Load training set if True, else load testing one
- mask_tif : bool, default: False
Use COCO annotations saved as .tif mask images in a new ./root_dir/masks directory. It greatly improves loading speed. Available after calling save_coco_to_tif().
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
Annotation is in MS COCO format [3]. Parsing it takes time.
Dynamically parsing the COCO annotation is currently not supported due to slow speed. Pre-parse masks in .tif format by calling save_coco_to_tif().
The validation set is originally separated from the training set. Currently they are combined when training=True.
Single-cell subsets are not covered
References
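A sketch of the analogous pre-parsing workflow for the COCO annotation, assuming save_coco_to_tif() is called on a dataset instance and the path is a placeholder:
>>> from bioimageloader.collections import LIVECell
>>> dset = LIVECell('./Data/LIVECell', training=True)
>>> dset.save_coco_to_tif()      # one-off: write .tif masks into ./root_dir/masks
>>> dset = LIVECell('./Data/LIVECell', training=True, mask_tif=True)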
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.MurphyLab(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, drop_missing_pairs: bool = True, drop_broken_files: bool = True, filled_mask: bool = False, **kwargs)[source]#
Nuclei Segmentation In Microscope Cell Images: A Hand-Segmented Dataset And Comparison Of Algorithms [1]
- Parameters
- root_dir : str or pathlib.Path
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- drop_missing_pairs : bool, default: True
Valid only if output=’both’. It will drop images that do not have mask pairs.
- drop_broken_files : bool, default: True
Drop broken files that cannot be read
- filled_mask : bool, default: False
Use filled masks saved by the fill_save_mask() method instead of the default boundary masks. If you want to use manually modified masks, the annotation files should have the same names as the ‘*.xcf’ files, with the suffix changed to ‘.png’.
Warning
This dataset has many issues, whose details can be found below. The simplest way to deal with them is to drop the entries that cause issues. It is recommended not to opt out of drop_missing_pairs() and drop_broken_files(); otherwise, exceptions will be raised.
If you want filled holes, the fill_save_mask() function will fill holes with some tricks to handle edge cases and save the results in .png format. Then set the filled_mask argument to True to load them.
Read more in the Notes section
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
Annotation masks are 4-channel PNG even though the mask is binary. It is not grayscale binary; the value 255 is stored only in the red channel.
Two annotation formats: Photoshop and GIMP. It seems that two annotators worked separately. segmented-ashariff will be ignored. In total, 97 segmented images (out of 100)
- 3 missing segmentations: ind={31, 43, 75}
./data/images/dna-images/gnf/dna-31.png
./data/images/dna-images/gnf/dna-43.png
./data/images/dna-images/ic100/dna-25.png
Manually filled annotation to make masks using GIMP
2009_ISBI_2DNuclei_code_data/data/images/segmented-lpc/ic100/dna-15.xcf does not have ‘borders’ layer like the others. This one alone has ‘border’ layer.
References
- 1
L. P. Coelho, A. Shariff, and R. F. Murphy, “Nuclear segmentation in microscope cell images: A hand-segmented dataset and comparison of algorithms,” in 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Jun. 2009, pp. 518–521, doi: 10.1109/ISBI.2009.5193098.
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- fill_save_mask()[source]#
Fill holes from boundary mask with some tricks
Requires scipy and scikit-image. Install the dependencies with the pip extra: pip install bioimageloader[process].
Note that this does not produce perfectly filled masks; some are not entirely closed by this algorithm (36, 40, 63).
Other issues: ind=63 has a ‘border’ layer instead of ‘borders’; ind=93 cannot be read by GimpDocument…
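A sketch of the fill-and-reload workflow, with a placeholder path:
>>> from bioimageloader.collections import MurphyLab
>>> dset = MurphyLab('./Data/murphylab')
>>> dset.fill_save_mask()        # one-off: fill boundary masks and save them as .png
>>> dset = MurphyLab('./Data/murphylab', filled_mask=True)
>>> mask = dset[0]['mask']       # filled masks instead of boundary outlines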
- class bioimageloader.collections.S_BSST265(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, **kwargs)[source]#
An annotated fluorescence image dataset for training nuclear segmentation methods [1]
Immunofluorescence (IF) images, designed for ML
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
All images are grayscale, BUT some have 3 channels
rawimages: Raw nuclear images in TIFF format
groundtruth: Annotated masks in TIFF format
groundtruth_svgs: SVG files for each annotated mask and the corresponding raw image in JPEG format
singlecell_groundtruth: Groundtruth for randomly selected nuclei of the testset (25 nuclei per testset class, a subset of all nuclei of the testset classes; human experts can compete with this low number of nuclei per subset by calculating Dice coefficients between their annotations and the groundtruth annotations)
visualized_groundtruth: Visualization of groundtruth masks in PNG format
visualized_singlecell_groundtruth: Visualization of groundtruth for randomly selected nuclei in PNG format
Find more info in README.txt inside the root directory
References
- 1
F. Kromp et al., “An annotated fluorescence image dataset for training nuclear segmentation methods,” Scientific Data, vol. 7, no. 1, Art. no. 1, Aug. 2020, doi: 10.1038/s41597-020-00608-w.
- property file_list: list#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.StarDist(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, training: bool = True, **kwargs)[source]#
Cell Detection with Star-convex Polygons
StarDist data is a subset of Data Science Bowl 2018 [3]
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- training : bool, default: True
Load training set if True, else load testing one
See also
DSB2018
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
StarDist data is a subset of Data Science Bowl 2018 [3]. Choose only one, do not mix them.
root_dir is not ‘dsb2018’ even though the archive name is ‘dsb2018’, because it conflicts with the original DSB2018. Make a new directory.
All images are grayscale
References
- 1
U. Schmidt, M. Weigert, C. Broaddus, and G. Myers, “Cell Detection with Star-convex Polygons,” arXiv:1806.03535 [cs], vol. 11071, pp. 265–273, 2018, doi: 10.1007/978-3-030-00934-2_30.
- 2
https://github.com/stardist/stardist/releases/download/0.1.0/dsb2018.zip
- 3
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files
- class bioimageloader.collections.TNBC(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'cv2', **kwargs)[source]#
TNBC Nuclei Segmentation Dataset [1]
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘cv2’, ‘equal’, Sequence[float]}, default: ‘cv2’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
References
- 1
Segmentation of Nuclei in Histopathology Images by Deep Regression of the Distance Map, https://ieeexplore.ieee.org/document/8438559
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
anno_dict[ind] = <file>
- class bioimageloader.collections.UCSB(root_dir: str, *, output: str = 'both', transforms: Optional[albumentations.core.composition.Compose] = None, num_samples: Optional[int] = None, grayscale: bool = False, grayscale_mode: Union[str, Sequence[float]] = 'cv2', category: Sequence[str] = ('malignant',), **kwargs)[source]#
A biosegmentation benchmark for evaluation of bioimage analysis methods
- Parameters
- root_dir : str
Path to root directory
- output : {‘both’, ‘image’, ‘mask’}, default: ‘both’
Change outputs. ‘both’ returns {‘image’: image, ‘mask’: mask}.
- transforms : albumentations.Compose, optional
An instance of Compose (albumentations pkg) that defines augmentation in sequence.
- num_samples : int, optional
Useful when transforms is set. Defines the total length of the dataset. If set, it overrides __len__.
- grayscale : bool, default: False
Convert images to grayscale
- grayscale_mode : {‘cv2’, ‘equal’, Sequence[float]}, default: ‘cv2’
How to convert to grayscale. If set to ‘cv2’, it follows the OpenCV implementation. Else if set to ‘equal’, it sums values along the channel axis, then divides by the number of expected channels.
- category : {‘benign’, ‘malignant’}, default: (‘malignant’,)
Select which category of output you want
See also
MaskDataset
Super class
Dataset
Base class
DatasetInterface
Interface
Notes
32 ‘benign’, 26 ‘malignant’ images (58 images in total)
58x768x896 -> ~600 patches. Thus, the default num_samples=900 (x1.5).
Images are not fully annotated
References
- 1
E. Drelie Gelasca, B. Obara, D. Fedorov, K. Kvilekval, and B. Manjunath, “A biosegmentation benchmark for evaluation of bioimage analysis methods,” BMC Bioinformatics, vol. 10, p. 368, Nov. 2009, doi: 10.1186/1471-2105-10-368.
- property file_list: List[pathlib.Path]#
A list of paths to image files
- property anno_dict: Dict[int, pathlib.Path]#
Dictionary of paths to annotation files