A state-of-the-art technique to perform cloud-based semantic segmentation using deep learning 3D U-Net architecture – BMC Bioinformatics

By Zeeshan Shaukat, Qurat ul Ain Farooq, Shanshan Tu, Chuangbai Xiao and Saqib Ali

Jun 24, 2022

In image segmentation, a digital MRI image is partitioned into multiple segments, each with a distinct property. Traditionally, image segmentation helps locate objects and boundaries in an image. In brain tumor segmentation, not only is the location of the tumor identified, but the extent of the tumor region, including active tumorous tissue, necrotic or dead tissue, and edema (swelling near the tumor), is also detected [22]. Brain tumor segmentation identifies abnormal areas in the brain by comparing them with normal tissue. Glioblastomas, the most malignant form of tumor, infiltrate the neighboring tissues, which leads to unclear boundaries. They are therefore hard to differentiate from normal tissue, and as a result multiple image modalities are used to identify them. Based on the degree of human involvement, brain tumor segmentation falls into three subcategories: manual segmentation, semiautomatic segmentation, and fully automatic segmentation [35].

Manual segmentation involves a human expert using specialized tools to draw and paint the tumor regions and boundaries. Its accuracy depends on the skills and knowledge of the operator performing it. Although manual segmentation is a laborious and time-consuming process, it is still considered the gold standard against which semi-automatic and fully automatic segmentation are evaluated. Figure 3 shows the workflow of brain tumor segmentation. In semi-automatic segmentation, human expertise and computer programs are combined: an operator is required to initialize the segmentation process and to evaluate the results. Fully automatic segmentation does not require any human interaction; it uses artificial intelligence in combination with prior knowledge and datasets to solve the segmentation problem [6].

Fully automatic brain tumor segmentation methods are classified into discriminative and generative methods. Discriminative methods usually depend on supervised learning, in which the relationships between an image and a set of manually annotated labels are learned from a large dataset. In fully automatic image segmentation, machine learning algorithms have gained popularity due to their unmatched performance. Over the past few years, classical machine learning algorithms have been used extensively; however, due to the complexity of the data, classical machine learning techniques are not suitable for most applications [36]. Deep learning methods are becoming more popular due to their ability to learn and improve on complex computer vision tasks. In contrast to discriminative methods, generative methods use prior knowledge, such as the location and size of healthy tissues, to build probabilistic models [22].

Experimental setup

Available datasets

Automatic brain tumor segmentation has gained immense popularity in the past few years, with increased interest in performing it using publicly available datasets. The benchmark Multi-modal Brain Tumor Image Segmentation (BRATS) dataset [37], introduced in 2012, is currently the most widely used publicly accessible dataset and has emerged as the standard for performance evaluation in brain tumor segmentation. Previously, the Internet Brain Segmentation Repository (IBSR) [38] and the BrainWeb datasets [39] were used by several researchers in their image processing algorithms. The Reference Image Database to Evaluate Therapy Response (RIDER) [40] is another targeted data collection repository; RIDER Neuro MRI contains imaging data of 19 patients with recurrent high-grade glioma and has been used by researchers in automatic brain tumor segmentation experiments.

The BRATS challenge contains datasets of four modalities, T1, T1c, T2 and FLAIR, for both high-grade and low-grade gliomas. Initially, the BRATS dataset contained only 30 MRI scans of glioma patients, but the number has grown substantially over the years. The Medical Segmentation Decathlon (MSD) [41] is another challenge that provides a relatively larger dataset for brain tumor segmentation covering a wide range of modalities. It is in fact a subset of the data from the BRATS 2016 and 2017 challenges and offers 750 multiparametric magnetic resonance images (mp-MRI) of both high- and low-grade gliomas. The Decathlon challenge contains ten publicly available datasets that belong to different regions of the human body, including brain, heart, hippocampus, liver, lung, pancreas, prostate, colon, hepatic vessel and spleen.

Dataset parameters for this study

We used the BraTS brain tumor dataset for training and validation. The dataset is approximately 7 GB in size and contains 750 MRI scans of brain tumors (gliomas): 484 training volumes with voxel labels and 266 test volumes without labels, as defined in Table 2. Each scan is a 4-D volume representing a stack of 3-D images, with dimensions 240 (height) × 240 (width) × 155 (depth) × 4 (scan modalities). The 484 labeled training volumes were further divided into three independent sets used for training, testing and validation. Figure 4 shows a volumetric image from the dataset, with the ground truth on the left and the labeled pixels on the right, while Fig. 5 shows four different labeled training volumes.

Experiment environment

We ran our experiment on a Microsoft Azure Cloud virtual machine, as it provides a low-latency, high-throughput network interface optimized for tightly coupled parallel computing workloads. A CUDA-capable GPU is required for performing semantic segmentation of the image volumes, so we chose an N-series virtual machine, which is ideal for compute- and graphics-intensive workloads such as high-end remote visualization, deep learning, and predictive analytics; detailed experimental specifications are given in Table 3. NC-series virtual machines feature the NVIDIA Tesla K80 GPU, which lowers data center costs by delivering high performance with fewer, more powerful cloud servers. It is engineered to boost throughput in real-world applications by 5-10x while saving customers up to 50% for an accelerated data center compared to a CPU-only system.

Training and validation

Preprocessing

To train the 3-D U-Net network efficiently, we preprocess the MRI dataset by cropping each scan to a region primarily containing the brain and tumor. Cropping reduces the size of the data, retaining only the critical part of each MRI volume and its corresponding labels. Each volume modality is independently normalized by subtracting the mean and dividing by the standard deviation of the cropped region. The training volumes were then split into 400 training sets, 29 validation sets, and 55 test sets.
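The cropping and per-modality normalization described above can be sketched as follows. This is a minimal NumPy illustration under our own assumptions (a hypothetical helper, not the authors' code), where the brain region is taken to be the bounding box of nonzero voxels:

```python
import numpy as np

def crop_and_normalize(volume):
    """volume: H x W x D x C array (e.g. 240 x 240 x 155 x 4 modalities)."""
    # Bounding box of the region containing brain tissue (nonzero voxels).
    mask = volume.sum(axis=-1) > 0
    coords = np.argwhere(mask)
    lo, hi = coords.min(axis=0), coords.max(axis=0) + 1
    cropped = volume[lo[0]:hi[0], lo[1]:hi[1], lo[2]:hi[2], :]
    # Normalize each modality independently: subtract the mean and divide
    # by the standard deviation of the cropped region.
    out = np.empty_like(cropped, dtype=np.float64)
    for c in range(cropped.shape[-1]):
        channel = cropped[..., c]
        out[..., c] = (channel - channel.mean()) / (channel.std() + 1e-8)
    return out
```

After this step each modality of the cropped volume has approximately zero mean and unit standard deviation, which stabilizes training.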

Random patch extraction

Extracting random patches is a common technique to avoid running out of memory when training with large volumes, as shown in Fig. 6. We used a random patch extraction datastore (specifications in Table 4) to feed the training data to the network and to validate the training progress. This datastore extracts random patches from the ground truth images and the corresponding pixel label data.
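The behavior of such a datastore can be sketched in a few lines. This is an assumed minimal implementation (not the datastore used in the study): it draws one random cubic sub-volume and the matching label patch from the same location.

```python
import numpy as np

def random_patch(volume, labels, patch=64, rng=None):
    """volume: H x W x D x C image; labels: H x W x D ground truth."""
    if rng is None:
        rng = np.random.default_rng()
    h, w, d = labels.shape
    # Random corner such that the full patch fits inside the volume.
    x = int(rng.integers(0, h - patch + 1))
    y = int(rng.integers(0, w - patch + 1))
    z = int(rng.integers(0, d - patch + 1))
    # Image patch and label patch are cut from the same coordinates.
    return (volume[x:x+patch, y:y+patch, z:z+patch, :],
            labels[x:x+patch, y:y+patch, z:z+patch])
```

Calling this repeatedly during training yields a stream of 64 × 64 × 64 patches, so only a small fraction of each volume is held in GPU memory at a time.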

To make the training more robust, we used a function that augments the 3-D patches by randomly reflecting and rotating the training data. To evaluate over time whether the network is continuously learning, underfitting, or overfitting, we used the validation data.
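A minimal sketch of such an augmentation function, under our own assumptions (random reflection along the first axis and a random multiple-of-90-degree rotation in the axial plane, applied identically to image and labels):

```python
import numpy as np

def augment_patch(image, label, rng=None):
    """image: X x Y x Z x C patch; label: X x Y x Z ground truth patch."""
    if rng is None:
        rng = np.random.default_rng()
    if rng.random() < 0.5:
        # Random reflection: flip image and label along the same axis.
        image, label = image[::-1], label[::-1]
    # Random rotation by k * 90 degrees in the axial (first two axes) plane.
    k = int(rng.integers(0, 4))
    image = np.rot90(image, k, axes=(0, 1))
    label = np.rot90(label, k, axes=(0, 1))
    return image, label
```

Because the patches are cubic, these transforms preserve the patch shape while exposing the network to more spatial variation per epoch.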

3-D U-Net layers set up

This study uses a variant of the 3-D U-Net network. As in U-Net, an initial sequence of convolutional layers (CL) is interleaved with max pooling layers, successively reducing the resolution of the input image. These layers are followed by a sequence of convolutional layers interleaved with upsampling operators, successively increasing the resolution of the input image. The zero-padded convolutions return an output of the same size as the input.

The deep learning 3-D U-Net uses the following layers:

• 3-D image input layer

• 3-D convolution layer for convolutional neural networks

• Batch normalization layer

• Leaky rectified linear unit layer

• 3-D max pooling layer

• Transposed 3-D convolution layer

• Softmax output layer

• Concatenation layer

The first layer, a 3-D image input layer, operates on image patches of size 64 × 64 × 64 voxels. The image input layer in 3-D U-Net is followed by the contracting path, which consists of three encoder modules. Each encoder contains two convolution layers with 3 × 3 × 3 filters that double the number of feature maps, each followed by a nonlinear activation using a ReLU layer. The first convolution is also followed by a batch normalization layer. Each encoder ends with a max pooling layer that halves the image resolution in each dimension. Unique names are assigned to all the layers in the network.

For example, “en1” denotes the first encoder module and “de4” denotes the fourth decoder module, where “en” stands for encoder, “de” stands for decoder, and 1 and 4 are the indices of the corresponding modules.

The expanding path of the 3-D U-Net consists of four decoder modules, as shown in Fig. 7, while Fig. 8 shows the diagram of the 3-D U-Net deep network we used to train the system. The result analysis is given in Table 5 below. Each decoder comprises two convolution layers, with the same filter size as the encoders, that halve the number of feature maps, each followed by a nonlinear activation using a ReLU layer. The first three decoders conclude with a transposed convolution layer that upsamples the image by a factor of 2. The final decoder includes a convolution layer that maps the feature vector of each voxel to the classes.
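The shape bookkeeping of this encoder/decoder structure (feature maps doubling while spatial resolution halves on the way down, then mirrored on the way up) can be traced with a small calculation. This sketch is purely illustrative; the number of base feature maps (32 here) is our own assumption, not taken from the paper:

```python
def unet3d_shapes(patch=64, in_channels=4, base=32, encoders=3):
    """Trace (spatial size, feature maps) through a 3-D U-Net skeleton."""
    shapes = [(patch, in_channels)]   # input layer: 64^3 patch, 4 modalities
    size, feats = patch, base
    for _ in range(encoders):
        shapes.append((size, feats))  # after the encoder's two convolutions
        size, feats = size // 2, feats * 2   # max pooling halves resolution
    for _ in range(encoders):
        size, feats = size * 2, feats // 2   # transposed conv upsamples by 2
        shapes.append((size, feats))         # decoder restores resolution
    return shapes
```

The trace ends back at the input resolution, which is why the skip connections between encoders and decoders of equal size line up.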

The concluding decoder consists of a convolution layer that maps the feature vector of each voxel to each of the two classes (background and tumor region). A custom Dice pixel classification layer weights the loss function to increase the effect of the small tumor regions on the Dice score.

Dice loss is calculated using the Sørensen-Dice similarity coefficient, which measures the overlap between two segmented volumes. The generalized Dice loss L between an image Y and the corresponding ground truth T is given by

$$L = 1 - \frac{2\sum\nolimits_{k = 1}^{K} w_{k} \sum\nolimits_{m = 1}^{M} Y_{km} T_{km}}{\sum\nolimits_{k = 1}^{K} w_{k} \sum\nolimits_{m = 1}^{M} Y_{km}^{2} + T_{km}^{2}}$$

(1)

where M is the number of elements along the first two dimensions of the image Y, K is the number of classes, and wk is a class-specific weighting factor that controls the influence each class makes on the loss. wk is typically the inverse of the squared area of the expected region:

$$w_{k} = \frac{1}{\left( \sum\nolimits_{m = 1}^{M} T_{km} \right)^{2}}$$

(2)

This weighting is used to reduce the influence of larger regions on the Dice score, making it easier for the network to learn how to segment smaller regions. Concatenation was performed between the input layer and encoder modules and the fourth decoder module, while the other decoder modules were added as separate branches to the layer graph. Concatenation layers were used to connect the second ReLU layer of each encoder module with a transposed convolution layer of equal size from a decoder module. The output of each concatenation layer was connected to the first convolution layer of the decoder module.
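The weighted generalized Dice loss of Eqs. (1) and (2) can be implemented directly. This is a minimal NumPy sketch (our own assumption, not the authors' custom layer), where Y and T are K × M arrays holding per-class predictions and one-hot ground truth:

```python
import numpy as np

def generalized_dice_loss(Y, T, eps=1e-8):
    """Y, T: K x M arrays (K classes, M voxels); T is one-hot ground truth."""
    # Class weights, Eq. (2): inverse of the squared expected region area.
    # eps guards against division by zero for classes absent from T.
    w = 1.0 / (T.sum(axis=1) ** 2 + eps)
    # Eq. (1): weighted overlap over weighted total, subtracted from 1.
    numer = 2.0 * np.sum(w * np.sum(Y * T, axis=1))
    denom = np.sum(w * np.sum(Y ** 2 + T ** 2, axis=1))
    return 1.0 - numer / (denom + eps)
```

For a perfect prediction (Y equal to the one-hot T) the loss is essentially zero, and the 1/(area)² weights ensure a small tumor class contributes as strongly as the large background class.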

To effectively train the system, the “Adam” optimization solver was used with the hyperparameters shown in Table 6.

The mathematical expression of the algorithm used to effectively train the system can be defined as

$$m_{t} = \beta_{1} m_{t - 1} + (1 - \beta_{1})\left[ \frac{\partial L}{\partial w_{t}} \right] \quad v_{t} = \beta_{2} v_{t - 1} + (1 - \beta_{2})\left[ \frac{\partial L}{\partial w_{t}} \right]^{2}$$

(3)

In Eq. 3, mt denotes the aggregate of gradients at time t and vt denotes the sum of the squares of past gradients, while wt is the weights at time t, ∂L is the derivative of the loss function, ∂wt is the derivative with respect to the weights at time t, β denotes the moving average parameter, and ϵ is a small positive constant.
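One Adam update following Eq. (3) can be sketched as below. The bias-correction terms and the default β values are the standard Adam formulation, added here as an assumption since the paper states only the moment updates:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update at step t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad        # aggregate of gradients, m_t
    v = beta2 * v + (1 - beta2) * grad ** 2   # sum of squared gradients, v_t
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)  # parameter update
    return w, m, v
```

Applied repeatedly to a simple quadratic loss, the iterate drifts toward the minimum, which is the behavior the moving-average moments are designed to produce.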