Jie Wang, Nazia Tabassum, Tanjin T Toma, Yibo Wang, Andreas Gahlmann, Scott T Acton, 3D GAN image synthesis and dataset quality assessment for bacterial biofilm, Bioinformatics, Volume 38, Issue 19, October 2022, Pages 4598–4604, https://doi.org/10.1093/bioinformatics/btac529
Abstract
Data-driven deep learning techniques usually require a large quantity of labeled training data to achieve reliable solutions in bioimage analysis. However, noisy image conditions and high cell density in bacterial biofilm images make 3D cell annotations difficult to obtain. Alternatively, data augmentation via synthetic data generation is attempted, but current methods fail to produce realistic images.
This article presents a bioimage synthesis and assessment workflow with application to augment bacterial biofilm images. 3D cyclic generative adversarial networks (GAN) with unbalanced cycle consistency loss functions are exploited in order to synthesize 3D biofilm images from binary cell labels. Then, a stochastic synthetic dataset quality assessment (SSQA) measure that compares statistical appearance similarity between random patches from random images in two datasets is proposed. Both SSQA scores and other existing image quality measures indicate that the proposed 3D Cyclic GAN, along with the unbalanced loss function, provides a reliably realistic (as measured by mean opinion score) 3D synthetic biofilm image. In 3D cell segmentation experiments, a GAN-augmented training model also presents more realistic signal-to-background intensity ratio and improved cell counting accuracy.
Supplementary data are available at Bioinformatics online.
1 Introduction
Analyzing single-cell behaviors in densely packed bacterial biofilms is key to providing insight for effective treatment of bacterial infectious diseases, such as cystic fibrosis pneumonia (Zhang et al., 2020). However, single-cell analysis in 3D biofilm images remains an open challenge due to densely packed communities, inhomogeneous intensity and noise (Wang et al., 2017). The advent of deep learning represents an opportunity to solve challenging problems in single-cell analysis, but such a data-driven approach requires a large quantity of labeled training data to achieve accurate and reliable solutions (Liu et al., 2021b). The current gold standard for labeling biological images is manual annotation, which can be extremely time-consuming and inaccurate, especially for 3D data. The annotation process is also inconsistent among annotators, as fluorescence intensity is often not uniform within cells and cell boundaries are not distinct. It is even more difficult, or nearly impossible, to annotate individual non-spherical cells, such as bacterial cells, when they are densely packed in 3D (Fig. 3, first column). Therefore, an automatic ground-truth generation method is important for obtaining a sufficient training dataset for deep learning tasks.
Data augmentation for microscopy images: For single-cell analysis, the major strategies to augment the training dataset before applying deep learning include: (i) performing classic transformations (e.g. scaling, translation, rotation) on limited manually annotated data (Abdollahi et al., 2020; Bloice et al., 2019), which is sensitive to the quality of the annotated data; (ii) simulating the volumes using optical and biological model-based knowledge (Lindén et al., 2016; Zhang et al., 2020), which cannot take into account uncalibrated image aberrations and illumination/emission heterogeneity; and (iii) generating synthetic datasets using generative adversarial networks (GAN) (Chen et al., 2021; Dunn et al., 2019). Among these, CycleGAN-based approaches generate images by mapping the distribution between the input and output image domains in a bidirectional manner, which can reproduce realistic scenarios in biomedical datasets (Dimitrakopoulos et al., 2020; Liu et al., 2021a; Sandfort et al., 2019). Most importantly, such a model does not need the input and output images to be structurally aligned to each other, a setting denoted as unpaired image-to-image translation (Zhu et al., 2017). A successful extension of CycleGAN was tested for a cell label-to-image translation task in Fu et al. (2018), which added a spatial consistency loss to reduce cell location drift in images generated from input binary labels. While works like Fu et al. (2018) and Sandfort et al. (2019) train on 2D slices of the 3D images to generate synthetic 3D stacks, they cannot provide sufficient z-axial signal continuity in our bacterial biofilm images. Some works extend the 2D GAN to a 3D GAN (Abramian and Eklund, 2019; Zhang et al., 2018), but they cannot map unmatched labels to images in the training dataset.
Synthetic image quality assessment: The image quality assessment of GAN outputs is another critical step needed to complete the image synthesis work in this project, especially when the output dataset is not structurally aligned with the reference dataset. Image quality assessment assists in selecting a high-quality synthetic dataset for further deep learning tasks. Regular fully-referenced image quality assessment metrics [e.g. SSIM (Wang et al., 2004)] or comparison-based blind quality assessment methods [e.g. C-IQA (Liang and Weller, 2016)] cannot accommodate our application, because they rely on the overall matching structure and distortion between the ground truth and generated images. The inception score (IS) and the Fréchet inception distance (FID) are more commonly used in evaluating GAN outputs (Heusel et al., 2017; Zhu et al., 2017). However, IS and FID both utilize an inception v3 model pre-trained on 2D images, which cannot directly accommodate 3D dataset evaluation. To automatically assess and learn the conditions in 3D GAN outputs, this article presents a comparison-based stochastic synthetic dataset quality assessment (SSQA) measure that evaluates the relative intensity-wise quality of a 2D/3D synthetic dataset compared to the real dataset.
What to expect in this article? This article explores the idea of unpaired label-to-image translation with CycleGANs (Fu et al., 2018; Zhu et al., 2017) and extends the networks to 3D in order to generate bacterial biofilm image data. Additionally, an unbalanced cycle-consistency loss is presented (Section 2.2) to achieve optimal synthesized biofilm data in 3D when evaluated with multiple image quality assessment metrics. Since ground truth annotation is not available in most unpaired image-to-image translation tasks, a stochastic synthetic dataset quality assessment (SSQA) scheme is also proposed (Section 2.3). Taking advantage of SSQA and realistic 3D GAN outputs, the biofilm image analysis pipeline can be further improved in terms of higher single-cell counting accuracy and a better signal-to-background intensity measure (Section 3.4).
2 Materials and methods
2.1 Training dataset
The 3D biofilm dataset is obtained by way of lattice light-sheet microscopy (LLSM) (Zhang et al., 2020). LLSM is able to look into dense aggregations of cells in vivo because of its low photo-toxicity and high spatial resolution (Gahlmann and Moerner, 2014). Escherichia coli bacterial cells with cytosolic expression of green fluorescent protein are used in this article. There are 300 3D images used for training. These training images span three time points in biofilm development (Fig. 3, first column), and each image has a voxel size of 100 nm × 100 nm × 100 nm. All real images are pre-processed with normalization and contrast enhancement that saturates the bottom 1% and the top 1% of all voxel values. The label set contains 300 binary images generated by CellModeller (Rudge et al., 2012) with local densities that approximately match the real image conditions (Fig. 3, second column). For testing, another 300 unseen label images were used to generate synthetic images.
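As a concrete illustration of the pre-processing step, a minimal sketch is given below; it assumes the raw volume is a 3D NumPy array, and the function and parameter names are ours rather than taken from the released code.

```python
import numpy as np

def enhance_contrast(volume, low_pct=1.0, high_pct=99.0):
    """Saturate the darkest/brightest 1% of voxels, then rescale to [0, 1].

    `volume` is a 3D NumPy array of raw LLSM intensities; the percentile
    bounds mirror the bottom-1%/top-1% saturation described in the text.
    """
    lo, hi = np.percentile(volume, [low_pct, high_pct])
    clipped = np.clip(volume, lo, hi)
    return (clipped - lo) / (hi - lo + 1e-12)
```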
2.2 Biofilm image synthesis with 3D cyclic GAN
2.2.1 3D GAN architecture
The basic framework of the learning module follows the benchmark image synthesis model CycleGAN (Zhu et al., 2017), which consists of four parts: two generators and two discriminators (Fig. 1). The generators (GLR, GRL) attempt to generate synthetic LLSM images from a binary label set, while the discriminators (DL, DR) try to distinguish generated synthetic data from the real data. We adopt the downsampling/upsampling-style generator with six residual blocks (He et al., 2016) at the bottleneck and the patchGAN discriminator with three convolutional layers as used in Fu et al. (2018), which presented promising results in 2D cases. In both networks, we change all the 2D layers to 3D ones. In addition, the original 70 × 70 2D patchGAN discriminator is changed to a 3D patch (depth × height × width) for biofilm image input. Replication padding, instance normalization and the ReLU activation function are used in the generator, with the hyperbolic tangent function in the last layer. The discriminator uses LeakyReLU(0.2) and batch normalization. Details about the model parameters of each layer, implemented with PyTorch, can be found in Supplementary Figure S1.
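For illustration, a minimal PyTorch sketch of a 3D patch discriminator in this style follows; the filter counts, kernel sizes and strides are placeholders, not the exact configuration of Supplementary Figure S1.

```python
import torch.nn as nn

class PatchGAN3D(nn.Module):
    """Sketch of a 3D patch discriminator: three strided Conv3d blocks with
    BatchNorm3d and LeakyReLU(0.2), followed by a 1-channel patch map."""

    def __init__(self, in_channels=1, base_filters=64):
        super().__init__()
        layers = [nn.Conv3d(in_channels, base_filters, 4, stride=2, padding=1),
                  nn.LeakyReLU(0.2, inplace=True)]
        channels = base_filters
        for _ in range(2):  # two further conv blocks with normalization
            layers += [nn.Conv3d(channels, channels * 2, 4, stride=2, padding=1),
                       nn.BatchNorm3d(channels * 2),
                       nn.LeakyReLU(0.2, inplace=True)]
            channels *= 2
        layers += [nn.Conv3d(channels, 1, 4, stride=1, padding=1)]  # patch-wise real/fake map
        self.model = nn.Sequential(*layers)

    def forward(self, x):
        return self.model(x)
```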

Pipeline for image synthesis and assessment for LLSM 3D bacterial biofilm images. All the images shown are 3D rendered in ImageJ. Step 1: learning the label-to-image translation using 3D cyclic GAN, which contains two generators (GLR, GRL), and two discriminators (DL, DR). Step 2: the learned translation from labels to real images is used to generate synthetic data. Step 3: evaluating the quality of synthetic data with SSQA
2.2.2 Loss function
As the lack of paired data is particularly challenging in our case, and the noise levels and cell sizes differ significantly from those of the original 2D data used in Fu et al. (2018), we investigate different loss functions in the backbone network (CycleGAN) to determine which form provides both the best distribution and spatial consistency for biofilm data. The choice between the L1-norm and the L2-norm in loss functions is always of critical interest: the L1-norm is robust to outliers, whereas the L2-norm provides unique solutions. Thus, different combinations of p1- and p2-norms for the two cycle terms are compared below. We propose an unbalanced cycle-consistency loss that increases the weighting of one cycle mapping direction by a factor of α = 2 to preserve spatial consistency. In this case, the need for training another network H in 3D is reduced. Details about the comparison of different loss functions are given in the next section.
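A minimal sketch of such an unbalanced cycle-consistency term is shown below, assuming GLR maps labels to images and GRL maps images back to labels. The default p-norm exponents and the direction that receives the α weighting are illustrative choices; the exact form follows Equation (3) of the paper.

```python
def p_norm(x, p):
    """Mean p-norm of a tensor, normalized by the number of elements."""
    return x.abs().pow(p).mean().pow(1.0 / p)

def unbalanced_cycle_loss(G_LR, G_RL, labels, reals, p1=1, p2=1, alpha=2.0):
    """Cycle-consistency loss with different p-norms per cycle and an
    unbalanced weighting (alpha = 2 in the text) on one direction.
    Which direction is up-weighted is an assumption of this sketch."""
    rec_labels = G_RL(G_LR(labels))   # label -> image -> label cycle
    rec_reals = G_LR(G_RL(reals))     # image -> label -> image cycle
    return alpha * p_norm(rec_labels - labels, p1) + p_norm(rec_reals - reals, p2)
```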
2.3 Synthetic bioimage quality assessment with SSQA
To evaluate the image appearance quality of a generated synthetic dataset compared to the real LLSM biofilm dataset, a Stochastic patch-based Synthetic dataset Quality Assessment (SSQA) method is proposed.

Finally, the results of the K cross-dataset (real vs. synthetic) SSQA scores are compared graphically in Figure 4. To achieve statistically meaningful SSQA comparisons, an adequate number of stochastic samples is needed in each evaluation, based on the image size and the number of images in each dataset. In this article, we provide results with 10 000 samples. Statistical observations (e.g. mean, standard deviation) and the intersection-over-union (IoU) of two SSQA histograms are analyzed as listed in Table 1. In terms of a single SSQA, a value closer to zero means better relative image quality. When we evaluate an unpaired dataset containing multiple image conditions, the comparison between intra-dataset (real versus real) and cross-dataset (real versus target) SSQA measures is needed. In this case, a smaller difference between the statistical observations compared to the reference (intra-real-dataset SSQA0) denotes more similar dataset conditions. The mean values of SSQA, along with standard deviations, are shown in Table 1. The IoU values of the SSQAs give a quantitative comparison of different synthetic datasets, where a value closer to 1 means the distributions of intensity-wise image qualities in the two datasets are more similar.
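The sketch below illustrates the stochastic patch sampling and the histogram IoU described above. The per-patch appearance statistic used here (absolute difference of patch mean and standard deviation) is only a placeholder for the SSQA similarity measure, which is not reproduced in this excerpt; the patch size is likewise an assumption.

```python
import numpy as np

def _random_patch(img, patch, rng):
    starts = [rng.integers(0, s - p + 1) for s, p in zip(img.shape, patch)]
    return img[tuple(slice(s, s + p) for s, p in zip(starts, patch))]

def ssqa_scores(dataset_a, dataset_b, num_samples=10000, patch=(8, 32, 32), rng=None):
    """For each draw, take a random patch from a random image in each dataset
    and record an appearance-difference statistic (placeholder shown here).
    Smaller scores mean more similar appearance."""
    rng = rng or np.random.default_rng()
    scores = np.empty(num_samples)
    for k in range(num_samples):
        pa = _random_patch(dataset_a[rng.integers(len(dataset_a))], patch, rng)
        pb = _random_patch(dataset_b[rng.integers(len(dataset_b))], patch, rng)
        scores[k] = abs(pa.mean() - pb.mean()) + abs(pa.std() - pb.std())
    return scores

def histogram_iou(scores_a, scores_b, bins=50):
    """IoU of two normalized SSQA histograms over shared bins: sum of
    bin-wise minima over sum of bin-wise maxima (1 = identical)."""
    edges = np.histogram_bin_edges(np.concatenate([scores_a, scores_b]), bins=bins)
    ha, _ = np.histogram(scores_a, bins=edges)
    hb, _ = np.histogram(scores_b, bins=edges)
    ha, hb = ha / ha.sum(), hb / hb.sum()
    return np.minimum(ha, hb).sum() / np.maximum(ha, hb).sum()
```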
Table 1. Comparison of dataset quality assessment measures (columns) evaluated on different datasets (rows)

| Dataset | MOS | FID | BRISQUE | SBR | SSQA | IoU(SSQA, SSQA0) |
|---|---|---|---|---|---|---|
| Reference | 3.981 ± 1.184 | 1.170e−10 | 36.252 ± 5.997 | 1.767 | 0.026 ± 0.029 | 1 |
| Simulation | – | 122.147 | 41.096 ± 3.269 | 1.567 | 0.019 ± 0.023 | 0.4670 |
| SpCycleGAN | 2.967 ± 1.460 | 45.188 | 34.124 ± 6.796 | 0.926 | 0.030 ± 0.027 | 0.6173 |
| SpCycleGAN 3D | 3.083 ± 1.198 | 103.074 | 38.478 ± 2.187 | 1.234 | 0.018 ± 0.018 | 0.3652 |
| CycleGAN 3D | 3.008 ± 1.345 | 106.938 | 38.092 ± 5.439 | 1.364 | 0.021 ± 0.025 | 0.4371 |
| Ours: 3D Cyclic GAN () | 3.381 ± 1.241 | 80.798 | 34.238 ± 5.155 | 1.418 | 0.028 ± 0.029 | 0.6575 |
Note: FID (Heusel et al., 2017) and BRISQUE are performed on 2D slices in 3D stacks, while the others directly evaluate 3D images. MOS, BRISQUE and SSQA are mean ± standard deviation values. The SBR scores are also averaged over all images in the dataset. Reference denotes the intra-real dataset quality statistics. The closer to the Reference scores, the better relative dataset image quality. The values highlighted in bold are the best results, and the underlined ones in italics are second best.
3 Implementations and discussions
3.1 GAN training and testing
The training of the 3D cyclic GAN in this article follows the basic setup in Zhu et al. (2017), using the Adam optimizer with a batch size of one; the learning rate is held constant for the first half of the total epochs and then decays linearly. Four loss functions are compared: the original SpCycleGAN (Fu et al., 2018), the SpCycleGAN loss function applied in the 3D GAN, the original CycleGAN extended to 3D and the modified unbalanced cycle-consistency loss. Other combinations of p1 and p2 values were also tested, but they did not provide reasonable outputs. SpCycleGAN was trained with all the 2D z-sliced images in the 3D stacks, and the other networks were trained directly as 3D cyclic GANs with the 300 training volumes. Each model is trained separately on one NVIDIA Titan V GPU, which takes about a day for training and a few minutes for testing. The training times of the networks rank as follows: SpCycleGAN < 3D Cyclic GAN () < CycleGAN 3D < SpCycleGAN 3D. Testing, or image generation, is performed on another 3D dataset containing 300 images that never appeared in the training set. Additionally, the model-based simulated dataset and reference real images are shown in Figure 3.
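A minimal sketch of this optimizer and learning-rate schedule is shown below; the initial learning rate (the CycleGAN default of 2e−4) and the epoch count are assumptions, since the exact values are not reproduced in this section.

```python
import torch

def make_optimizer(model_params, lr=2e-4, total_epochs=200):
    """Adam optimizer plus a schedule that keeps the learning rate constant
    for the first half of training, then decays it linearly to zero."""
    optimizer = torch.optim.Adam(model_params, lr=lr, betas=(0.5, 0.999))
    half = total_epochs // 2

    def linear_decay(epoch):
        return 1.0 if epoch < half else max(0.0, 1.0 - (epoch - half) / half)

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=linear_decay)
    return optimizer, scheduler
```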

Qualitative comparison of different synthesized results. The images in each column are one slice view along the z-axis in the 3D volume. Real: real enhanced LLSM experimental 3D images, which contain three average biofilm local densities (D), 54.8%, 59.0% and 64.6%, and three signal-to-background ratios (SBR), 2.2, 1.8 and 1.3 (Zhang et al., 2020). Label: label images obtained by CellModeller with local density conditions comparable to the first column. The synthetic outputs for comparison are from model-based simulation (Zhang et al., 2020), the original SpCycleGAN (Fu et al., 2018), the SpCycleGAN loss in 3D, the CycleGAN loss (Zhu et al., 2017) in 3D and our modified 3D cyclic GAN with the unbalanced loss in Equation (3). More visual comparisons can be found in Supplementary Figure S2
3.2 Other evaluation metrics
Four other metrics in the literature are evaluated for data quality comparison: human subjective mean opinion score (MOS) for GAN outputs, deep feature-based dataset similarity analysis with FID (Heusel et al., 2017), distortion-based image quality evaluation with BRISQUE (Mittal et al., 2012), and location correspondence evaluation with signal-to-background ratio (SBR) analysis.
Mean opinion score (MOS): The MOS is a reference score based on human subjective opinion that assesses how realistic observers perceive the GAN output images to be. Five categories from 1 to 5 are used in this article, where each level denotes one quality category: ‘1—fake’, ‘2—likely to be fake’, ‘3—not sure’, ‘4—likely to be real’ and ‘5—real’. Eighteen participants were each given 100 randomly chosen images drawn in equal numbers from five different datasets. Each image includes a 2D slice view of the 3D data for evaluation of spatial appearance and a side view for examining axial signal continuity. The final score is averaged over participants and over the images from each dataset. A more realistic dataset yields an MOS closer to 5.
Fréchet inception distance (FID): FID score evaluates the quality of GAN outputs based on the statistics from the original real training dataset as well as the statistics in the target outputs (Heusel et al., 2017). It uses a pre-trained inception v3 model to extract deep features that discriminate between real and generated images in terms of mean and covariance. Then, FID calculates the difference of two feature distributions using Fréchet distance. The lower the FID, the better the dataset quality. FID was found to be well-correlated with human judgment, but the inception v3 model was trained on ImageNet 2D images with no 3D interface as of yet. Therefore, only 2D slices in the 3D stacks were evaluated as shown in Table 1.
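For reference, a short sketch of the Fréchet distance between the two Gaussian feature fits is given below; it assumes inception v3 features have already been extracted into two arrays (one row per image).

```python
import numpy as np
from scipy import linalg

def frechet_distance(feats_real, feats_fake):
    """Fréchet distance between Gaussian fits of deep features:
    ||mu_r - mu_f||^2 + Tr(C_r + C_f - 2 (C_r C_f)^{1/2})."""
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    cov_r = np.cov(feats_real, rowvar=False)
    cov_f = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(cov_r @ cov_f)
    if np.iscomplexobj(covmean):      # discard tiny imaginary parts from sqrtm
        covmean = covmean.real
    diff = mu_r - mu_f
    return diff @ diff + np.trace(cov_r + cov_f - 2.0 * covmean)
```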
Blind/referenceless image spatial quality evaluator (BRISQUE): BRISQUE evaluates distortions in images without the need for corresponding reference images. It measures how far the normalized intensity distributions of a distorted image deviate from those of natural scene images, which follow a Gaussian distribution (Mittal et al., 2012). Intensity and pixel-neighborhood features are extracted and further analyzed by a support vector regressor to obtain a quality score. This article uses the original version of BRISQUE, which predicts distortions in 2D images. For the comparison of dataset quality, a smaller difference in BRISQUE value between the reference and the target synthetic dataset is better.
Location correspondence with SBR: The evaluation of location correspondence is performed by overlaying the ground truth labels on the generated synthetic data. The mean intensity and standard deviation in ‘cell regions’ and ‘background regions’ are extracted. To compare different datasets, SBR is calculated as described in Zhang et al. (2020), which takes the ratio of foreground to background mean intensities. When the location correspondence is lower, the mean intensity in the foreground drops because foreground signals are missing, and the mean intensity in the background rises because of more misplaced signals. Thus, a higher SBR, closer to the reference value in Table 1, is better.
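A minimal sketch of this SBR computation, assuming a 3D intensity volume and an aligned binary label volume, is given below.

```python
def signal_to_background_ratio(image, label):
    """SBR as described above: mean intensity inside the binary cell label
    divided by mean intensity of the background voxels."""
    cells = image[label > 0]
    background = image[label == 0]
    return cells.mean() / background.mean()
```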
3.3 Image quality comparisons
With regard to Table 1, 3D Cyclic GAN () presented the overall optimal dataset quality with the best distortion-based measure (BRISQUE) and best intensity-wise comparisons [SSQA and IoU(SSQA, SSQA0)], corroborated by human subjective MOS. For the other two scores, FID and SBR, 3D Cyclic GAN () still achieves the second-best performance. In visual inspection of different datasets as shown in Figure 3, 3D Cyclic GAN () demonstrates reliable image details that mimic the image conditions in the real dataset. Especially in the background of the images, fewer regions of artifacts or over-smoothness are seen compared to some other synthesized datasets. There are cell drifting problems for all the GAN-generated datasets, but the results of 3D Cyclic GAN () are relatively better.
SpCycleGAN achieved the best FID score and comparable BRISQUE and SSQA scores to 3D Cyclic GAN (), which indicates that the outputs from 2D GAN with the original spatial consistency loss provide decent 2D image intensity-wise and distortion-wise quality conditions. The visual results in Figure 3 also validate the intensity and distortion similarity compared to the real images. SpCycleGAN produces the lowest SBR and MOS, because it cannot preserve the axial signal correspondence along z-axis (Fig. 7) and has more regions of cell signals missing as shown in Figure 3. When this spatial consistency loss is applied in 3D Cyclic GAN, better visual location correspondence, as quantified by SBR, is found in part of the SpCycleGAN 3D outputs, such as the first and last rows in Figure 3. The outputs, however, cannot consistently yield realistic images, where the resultant images are over-smoothed and distorted (see Fig. 3, last row). SpCycleGAN 3D has the lowest mean of SSQA score over all the K image comparisons, but its SSQA frequency does not reflect the similar spread of different image conditions in the real LLSM microscopy image dataset (Fig. 4). Model-based simulation outputs exhibit the best location correspondence in terms of SBR as shown in Figure 7, because the cell signals in the datasets are produced by incorporating the exact locations of cells in labels with theoretical fluorescent emission models, point spread functions, and noise conditions. These simulated datasets are suboptimal, because they cannot mimic the actual intensity and distortion statistics in real datasets with regard to FID, BRISQUE and SSQA.

SSQA frequency comparison of different datasets. Histograms of SSQA for all the different outputs are normalized by the total number of K image comparisons to get the frequency of SSQA score in each dataset. The Reference plot indicates the stochastic comparison within the data pool of real images. The more similar the distribution to the Reference, the better relative image quality of the synthetic dataset
In summary, 3D Cyclic GAN () generates the overall most realistic dataset with qualities better than other GAN outputs in terms of intensity, distortion and location correspondence. Additionally, the proposed 3D Cyclic GAN () generated a diverse dataset with a similar mixture of different image conditions as compared to the reference dataset. This diversity in image conditions is observed by the mean and standard deviation values in BRISQUE and SSQA.
3.4 3D Cell segmentation with u-net on real images
After image synthesis, cell segmentation is a key analysis step that can accommodate meaningful cell-level bioinformatics in bacterial biofilm image analysis. In this section, 3D u-net (Çiçek et al., 2016) based experiments are conducted to evaluate the cell segmentation performance involving GAN outputs during the training step.
Training data: The training dataset contains 300 image-and-label sets that include both 3D Cyclic GAN () generated outputs and model-based simulations; the labels are binary (1 for cell foreground). We argue that combining the two different datasets reinforces the segmentation model to learn the more realistic cell foreground in GAN outputs without losing the image-to-label location correspondence provided by simulation. The SSQA and SBR metrics are utilized to pick out good-quality GAN outputs as substitutions for the corresponding simulated images, as sketched after this paragraph. In the following segmentation experiments, the GAN outputs with the top 50 SSQA scores and SBR > 1.5 are selected. Other subsets of GAN outputs can be tested depending on the generated dataset quality.
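A small sketch of this filtering step follows; interpreting the "top SSQA scores" as the scores closest to zero is our assumption.

```python
def select_gan_outputs(ssqa_scores, sbr_scores, top_k=50, sbr_min=1.5):
    """Indices of GAN outputs to substitute into the training set:
    keep images with SBR above `sbr_min`, then take the `top_k` with the
    best (here, smallest) SSQA scores."""
    candidates = [i for i, sbr in enumerate(sbr_scores) if sbr > sbr_min]
    candidates.sort(key=lambda i: ssqa_scores[i])
    return candidates[:top_k]
```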
Training parameters: A general 3D u-net structure is adopted with three stages of downsampling and upsampling pairs. The kernel size is 3 × 3 × 3 for both convolution and deconvolution, padded with 1 voxel in all three spatial directions. The stride is 2 voxels for convolutions and 1 voxel for deconvolution operations. Max-pooling with size 2 × 2 × 2 is used for the downsamplers. A residual function of the input at each stage is also learned along the downsampling path for faster convergence (Milletari et al., 2016). Batch normalization and the ReLU function are used after convolutions, with a sigmoid function in the final layer. The number of features at each stage is 64, 256, 256 and 512. A binary cross-entropy loss function is chosen for our segmentation task. The batch size is 1 and the number of epochs is 200.
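As an illustration of one downsampling stage with the residual shortcut mentioned above, a hedged PyTorch sketch follows; the feature counts and the exact placement of the residual connection are illustrative rather than the paper's configuration.

```python
import torch.nn as nn

class DownBlock3D(nn.Module):
    """Sketch of one 3D u-net downsampling stage: two 3x3x3 convolutions with
    BatchNorm3d + ReLU, a residual connection over the stage (Milletari et
    al., 2016) and 2x2x2 max-pooling."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
        )
        self.skip = nn.Conv3d(in_ch, out_ch, kernel_size=1)  # match channels for the residual
        self.pool = nn.MaxPool3d(kernel_size=2)

    def forward(self, x):
        features = self.conv(x) + self.skip(x)   # residual over the stage
        return self.pool(features), features      # pooled output + skip tensor for the decoder
```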
Segmentation performance: For testing, 10 unseen real biofilm images are manually annotated to validate the semantic segmentation accuracy of the 3D u-net. We also applied m-LCuts (Wang et al., 2021) to post-process under-segmented u-net outputs to achieve single-cell segmentation. Two single-cell-annotated images from Wang et al. (2021) are used for quantitative comparison. By visual inspection, the segmentation outputs from the GAN-involved training model depict more contour details than those of the simulation-only trained model (Fig. 5).

Qualitative comparison of semantic segmentation results using 3D u-net. The images correspond to one slice in the 3D volume. The last column is the segmentation outputs using a combined training dataset involving 3D cyclic GAN () outputs
Quantitatively (see Table 2), although the Dice scores comparing binary segmentation results with manual annotations do not indicate an improvement with the modified training dataset, the margin between the two is not large. One important factor in the Dice evaluation is the difficulty of accurate manual annotation; cell boundaries can be extremely hard to distinguish in images like row 3 of Figure 5. From a different perspective, with no need for manual annotations, training with GAN outputs produces SBR scores closer to the reference value. In addition, the preliminary single-cell segmentation results show that the GAN-involved training dataset provides better single-cell counting accuracy after m-LCuts (Wang et al., 2021). Cell counting accuracy measures how many cells match the ground truth cells when the intersection-over-union (IoU) of the two cell volumes is larger than a threshold. For the two images shown in Figure 6, the GAN-involved single-cell segmentation outperforms the simulation-only model for all IoU thresholds from 0.1 to 1. The cell counting accuracy at a fixed IoU threshold is reported in Table 2.
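A sketch of this counting-accuracy computation is given below, assuming instance-labeled 3D volumes for prediction and ground truth; the greedy one-to-one matching is an assumption of the sketch.

```python
import numpy as np

def cell_counting_accuracy(pred_labels, gt_labels, iou_threshold=0.5):
    """Fraction of ground-truth cells matched by a predicted cell with
    volume IoU above `iou_threshold` (0 = background, each cell a unique
    positive id in both label volumes)."""
    gt_ids = [i for i in np.unique(gt_labels) if i != 0]
    pred_ids = [i for i in np.unique(pred_labels) if i != 0]
    used, matched = set(), 0
    for g in gt_ids:
        g_mask = gt_labels == g
        best_iou, best_p = 0.0, None
        for p in pred_ids:
            if p in used:
                continue
            p_mask = pred_labels == p
            inter = np.logical_and(g_mask, p_mask).sum()
            if inter == 0:
                continue
            iou = inter / np.logical_or(g_mask, p_mask).sum()
            if iou > best_iou:
                best_iou, best_p = iou, p
        if best_iou > iou_threshold:
            matched += 1
            used.add(best_p)
    return matched / max(len(gt_ids), 1)
```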

Comparison of segmentation performance with single-cell manual annotation from Wang et al. (2021). The images shown are one slice in the 3D volume
Table 2. Quantitative comparison of 3D u-net outputs on 10 manually annotated images

| Training data | Dice | SBR | Cell counting accuracy |
|---|---|---|---|
| Simulation only | **0.759** | 1.918 | 56.73%/71.26% |
| Simulation + GAN outputs | 0.750 | **1.779** | **61.73%/76.19%** |
Note: Two images that have instance-based manual annotations are evaluated with cell counting accuracy. Bolded values indicate better scores.
Overall, the above 3D cell segmentation performance indicates that the synthetic biofilm dataset generated by 3D Cyclic GAN () can offer an advanced single-cell analysis pipeline for bacterial biofilm images.
3.5 Discussion
The following discussion analyzes the limitations of the current GAN-based networks for the biofilm application, additional model-tuning experiments and future directions to improve the GAN outputs for biofilm image studies. Although the 3D Cyclic GAN successfully reproduces the real LLSM distributions to generate 3D synthetic datasets, the spatial drifting of each bacterial cell region within the generated images is a limitation of the current GAN-based unpaired biofilm augmentation workflow, as shown in Figure 7. It is worth mentioning that generating a synthetic dataset from unpaired images is inherently challenging, as only about 50% pixel-wise accuracy was observed in the original unpaired image translation results (Zhu et al., 2017). Additional model tuning experiments aiming to improve dataset quality in intensity, distortion and spatial consistency were carried out with the presented 3D Cyclic GAN. These efforts include changing the number of layers and parameters in both the generator and discriminator, expanding the training dataset by flipping and cropping, and tuning basic training parameters. For example, we swept the learning rate over several orders of magnitude, added and removed up to two layers, and tuned the parameters in the convolutional layers. We also modified the activation layers with a sigmoid function and varied batch sizes from 1 to 5. None of these trials provided a better synthetic microscopy biofilm output than the current set-up (see Section 2).

3D comparison of GAN outputs rendered in ImageJ, which shows the preservation of axial (along z-axis) signal continuity with 3D GANs, as well as problems in location drift and missing cells associated with SpCycleGAN (Fu et al., 2018), SpCycleGAN in 3D, CycleGAN (Zhu et al., 2017) in 3D and the proposed 3D cyclic GAN (). The ground truth label volume and model-based simulation result are shown for reference
Taking advantage of the proposed SSQA, improvements in 3D cell segmentation performance are observed when extending the u-net training dataset with a combination of model-based simulation and filtered 3D Cyclic GAN () outputs. The combined training set strategy aims to maintain high label-to-image structural correspondence as well as fidelity to realistic bioimage appearance. By ranking SSQA scores with the assistance of SBR, low-quality GAN outputs can be filtered out. Thus, the spatial drifting problem observed in the GAN pipeline is relieved in the application of cell segmentation for bacterial biofilm images. Future experiments could add a small amount of manual annotation of real images to improve the overall GAN location correspondence for the biofilm learning task. In addition, the current biofilm data do not contain thick enough clusters of cells along the z-direction to follow the density of the CellModeller outputs, so we only selected eight slices with a number of cells comparable to the label-set images. Collecting more LLSM biofilm data with thicker cluster sizes may be helpful as well.
4 Conclusion
This article explores bioimage synthesis options using GANs to learn and generate densely packed 3D LLSM biofilm images. A 3D Cyclic GAN with an unbalanced cycle-consistency loss is presented that provides a preferable synthetic dataset mimicking the image conditions of the real LLSM dataset, as judged by quality assessment of appearance and distortion conditions. The 3D GAN models also ensure axial continuity of the signals in cell regions along the z-direction when compared to 2D GAN outputs. The proposed stochastic SSQA scheme fills an existing gap in evaluating 3D GAN outputs when corresponding ground truth images are unavailable. With adequate comparisons in terms of the number of patches and images (e.g. 10 000 × 600 in this article), SSQA reveals meaningful trends of intensity-wise cross-dataset quality. Taking advantage of the presented learning and evaluation pipeline, a GAN-augmented training set can further assist bacterial biofilm image analysis tasks, e.g. 3D cell segmentation. When mixing SSQA-filtered GAN outputs and model-based simulation in training, better single-cell informatics are obtained than with simulation-only training, in terms of a more realistic signal-to-background mean intensity ratio and higher cell counting accuracy.
Acknowledgements
We thank Prof. Edward J. Delp and his laboratory at Purdue University for providing the code for DeepSynth (SpCycleGAN).
Funding
This work was supported (in part) by the US National Institute of General Medical Sciences [1R01GM139002 to A.G. and S.T.A.].
Conflict of Interest: none declared.
Data Availability
Software and sample data underlying this article are available at https://github.com/jwang-c/DeepBiofilm.