# Clinical and phantom validation of a deep learning based denoising algorithm for F-18-FDG PET images from lower detection counting in comparison with the standard acquisition – EJNMMI Physics

May 11, 2022

### Phantom

Three phantom experiments were performed using a NEMA IEC phantom under 3 different acquisition conditions, each acquired on our 3 GE Healthcare PET/CT systems (Discovery MI 4 rings [DMI4], Discovery 710 [D710] and Discovery IQ 4 rings [DIQ4]).

For the first and second experiments (E1 & E2), the phantom was equipped with a set of 6 fillable spheres (inner diameters/volumes of 10 mm/0.52 mL, 13 mm/1.15 mL, 17 mm/2.57 mL, 22 mm/5.58 mL, 28 mm/11.5 mL, 37 mm/26.5 mL). For E1, 2 syringes of approximately 20 MBq and 10 MBq of F-18-FDG (calibrated at the acquisition time) were prepared. The first syringe was injected into the phantom tank filled with water; the second was diluted in 1 L of water and then used to fill the spheres. This yielded a contrast ratio between spheres and background of 5:1 and a background activity of 2 MBq/kg. At the end of E1, approximately 20 MBq of F-18-FDG were added to the phantom background to obtain a contrast ratio of 2:1 (E2).
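As a quick consistency check, the nominal sphere volumes follow from the inner diameters via V = πd³/6, and the 5:1 contrast follows from the quoted activities. The sketch below is illustrative only: the function name is hypothetical, activities are treated as exact although the text gives them as approximate, and water is assumed to weigh 1 kg per litre.

```python
import math

def sphere_volume_ml(diameter_mm: float) -> float:
    """Volume of a sphere in mL (1 mL = 1000 mm^3) from its inner diameter in mm."""
    return math.pi * diameter_mm ** 3 / 6.0 / 1000.0

# Inner diameters of the six standard spheres quoted in the text
diameters_mm = [10, 13, 17, 22, 28, 37]
volumes_ml = [round(sphere_volume_ml(d), 2) for d in diameters_mm]
print(volumes_ml)

# E1 contrast: 10 MBq diluted in 1 L for the spheres, 2 MBq/kg in the background
# (assumption: 1 kg of water occupies 1 L, so MBq/kg and MBq/L are interchangeable)
sphere_conc_mbq_per_l = 10.0 / 1.0
background_conc_mbq_per_l = 2.0
contrast = sphere_conc_mbq_per_l / background_conc_mbq_per_l
print(f"sphere-to-background contrast = {contrast:.1f}:1")
```

The computed volumes agree with the listed values to within rounding.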

For the third experiment (E3), the phantom was equipped with a set of 4 micro fillable spheres (inner diameters/volumes of 5.94 mm/31 µL, 6.95 mm/63 µL, 8.23 mm/125 µL, 9.86 mm/250 µL) and the 2 smallest spheres of the standard set mentioned above. The same procedure was applied to reach a final contrast of 5:1.

For each experiment, the phantom was centered in the field-of-view and a list-mode acquisition over one bed position was performed, allowing the reconstruction of different acquisition durations: the regular clinical duration (PET100), one-half (PET50) and one-third (PET33) of the regular clinical duration (cf. Table 1). This method makes it possible to simulate a PET tracer dose reduction retrospectively, the resulting simulated low-dose images having characteristics equivalent to those of PET images actually acquired at lower doses [25].
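One simple way to picture this retrospective count reduction is to keep only the list-mode events recorded during the first fraction of the scan. The following is a minimal sketch on synthetic timestamps; the function name, the 120 s duration and the uniform toy data are assumptions for illustration, and the text does not state which part of the acquisition was retained. The study itself used the clinical reconstruction protocols, not this code.

```python
import random

def truncate_list_mode(event_times_s, full_duration_s, fraction):
    """Keep events recorded during the first `fraction` of the acquisition."""
    cutoff = full_duration_s * fraction
    return [t for t in event_times_s if t < cutoff]

random.seed(0)
full_duration_s = 120.0  # hypothetical bed-position duration
# Toy data: events uniformly distributed over the scan (decay is ignored)
events = [random.uniform(0.0, full_duration_s) for _ in range(100_000)]

pet50 = truncate_list_mode(events, full_duration_s, 1 / 2)
pet33 = truncate_list_mode(events, full_duration_s, 1 / 3)
print(len(pet50) / len(events), len(pet33) / len(events))
```

With a constant count rate, the retained fraction of events tracks the retained fraction of time, which is why a shorter duration can stand in for a lower injected dose.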

The raw data were reconstructed according to the routinely used OSEM or BPL protocols (cf. Table 1) and, for the 1/2 and 1/3 acquisition durations, post-processed with SubtlePET (named PET50 + SP and PET33 + SP, respectively).

### Patients

We used PET/CT images from the first 110 patients who agreed to participate in this study. These patients were referred for a PET examination for various pathologies (oncology or internal medicine, representative of the clinical activity) during October 2020. All patients were informed that their data were fully anonymized for research purposes and gave their approval (IRB approval was obtained for this study). PET/CT scans were acquired 60 min after the injection of 3 MBq/kg of F-18-FDG, with an acquisition time varying from 1.5 min/bed position for the DMI4 to 2 min/bed position for the D710 and DIQ4 (cf. Table 1). All the PET raw data were natively acquired in list-mode format, allowing the retrospective reconstruction of lower time/dose-equivalent sinograms. Given the loss of information observed with PET33 + SP reconstructions in the phantom experiments, only a subpopulation of 30 patients was reconstructed with a 66% reduction in acquisition time (PET33) and post-processed with SubtlePET (PET33 + SP) to evaluate the qualitative improvement achieved by SubtlePET. For the whole population, a 50% reduction in acquisition time was studied (PET50) and enhanced with SubtlePET (PET50 + SP). For the 20 patients with a body mass index (BMI) > 30 kg/m², the SubtlePET algorithm was also applied to the full-duration acquisition (PET100 + SP) to evaluate the benefit of SubtlePET on noisier images. As for the phantom experiments, the reconstruction set-up depended on the PET system used (cf. Table 1).

Except for disease-free patients, one hypermetabolic lesion per patient was delineated by an experienced nuclear medicine physician. To be representative, lesions of different types (primary or metastatic, sub-centimetric or larger, homogeneous or heterogeneous, low or high uptake, etc.) and from different organs were selected among 60 patients.

### SubtlePET algorithm

SubtlePET uses a 2.5D encoder–decoder U-Net deep convolutional neural network to perform denoising. The software takes a low-count PET image (from a shorter scan or a lower dose) as input and generates a high-quality PET image (close to the full-dose image) as output. It employs a convolutional neural network (CNN)-based method in a pixel’s neighborhood to reduce noise and increase image quality. Using a residual learning approach and optimized for quantitative accuracy (L1 norm) as well as structural similarity (SSIM), the software learns to separate and suppress the noise components while preserving and enhancing the structural components.
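To illustrate what "2.5D" commonly means, the sketch below builds, for each axial slice, a small stack of neighbouring slices that a 2D network could receive as input channels. This is a generic illustration, not SubtlePET's actual preprocessing: the function name, the number of neighbouring slices and the edge handling are all assumptions, since the text does not specify them.

```python
def make_25d_slabs(volume, n_neighbors=2):
    """For each slice index z, collect slices z-n..z+n (indices clamped at the edges).

    `volume` is any sequence of 2D slices; each returned slab would be fed
    to a 2D network as a multi-channel input centred on slice z.
    """
    depth = len(volume)
    slabs = []
    for z in range(depth):
        indices = [min(max(z + dz, 0), depth - 1)
                   for dz in range(-n_neighbors, n_neighbors + 1)]
        slabs.append([volume[i] for i in indices])
    return slabs

# Toy volume: slices are represented by labels instead of pixel arrays
toy_volume = ["s0", "s1", "s2", "s3", "s4", "s5"]
slabs = make_25d_slabs(toy_volume, n_neighbors=2)
print(slabs[0])
print(slabs[3])
```

Each slab keeps some through-plane context while the convolutions themselves remain two-dimensional, which is the usual motivation for a 2.5D design.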

The networks were trained with paired low- and high-count PET series coming from a wide range of clinical indications and patient BMI and from a large variety of PET/CT and PET/MR devices (10 General Electric, 5 Siemens and 2 Philips models). The training data included millions of paired image patches derived from hundreds of patient scans with multi-slice PET data and data augmentation. All the training PET data were acquired in the USA or Canada with an average injected FDG dose of approximately 6 MBq/kg and an acquisition time of 2–3 min/bed. For the training regime, low-count data were either retrospectively reconstructed or prospectively acquired at one quarter of the acquisition time or dose (i.e., 1.5 MBq/kg at 2–3 min/bed or 6 MBq/kg at 30–45 s/bed).

### Image analysis

#### Quantification

For the phantom experiments, spherical volumes of interest (VOIs) were manually drawn to enclose each visible sphere and on the background (10 cm³ spherical VOI located in the central part of the phantom) to measure quantitative parameters. For the patient analysis, the lesions were quantified using the automatic segmentation tools available on the AWServer workstation (GE Healthcare, Milwaukee, USA). The background region was defined proximally to each lesion. In addition, a VOI of approximately 6 cm³ was defined in a healthy hepatic region when applicable. For both the phantom and the patients, each VOI was copied identically to every series (all reconstructions and all acquisition statistics) to obtain the measurements at the exact same location and prevent any intra-operator variability.

For each VOI, the SUVmax, SUVmean, SUVpeak and standard deviation (SD) were recorded to derive: the sphere contrast recovery coefficient (CRC) for the phantom data only, the contrast to noise ratio (CNR) and the background variability (BV) by using:

$$\text{CRC} = \frac{\frac{\text{SUV}_{\text{max}}\text{ in sphere}}{\text{SUV}_{\text{mean}}\text{ in background}}}{\frac{\text{Activity concentration in sphere}}{\text{Activity concentration in background}}};\quad \text{CNR} = \frac{\frac{\text{SUV}_{\text{max}}\text{ in sphere or lesion}}{\text{SUV}_{\text{mean}}\text{ in background}}}{\text{SUV}_{\text{SD}}\text{ in background}};\quad \text{and}\quad \text{BV} = \frac{\text{SUV}_{\text{SD}}\text{ in background}}{\text{SUV}_{\text{mean}}\text{ in background}} \times 100$$
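The three definitions translate directly into code. The sketch below is a plain transcription of the formulas as stated; the function names and the numerical values are hypothetical and do not come from the study.

```python
def crc(suv_max_sphere, suv_mean_bkg, activity_sphere, activity_bkg):
    """Contrast recovery coefficient: measured contrast over true contrast."""
    return (suv_max_sphere / suv_mean_bkg) / (activity_sphere / activity_bkg)

def cnr(suv_max_target, suv_mean_bkg, suv_sd_bkg):
    """Contrast to noise ratio: measured contrast over background SD."""
    return (suv_max_target / suv_mean_bkg) / suv_sd_bkg

def bv(suv_sd_bkg, suv_mean_bkg):
    """Background variability, in percent."""
    return suv_sd_bkg / suv_mean_bkg * 100.0

# Hypothetical measurements for one sphere at a true 5:1 contrast
sphere_crc = crc(suv_max_sphere=4.0, suv_mean_bkg=1.0,
                 activity_sphere=5.0, activity_bkg=1.0)
sphere_cnr = cnr(suv_max_target=4.0, suv_mean_bkg=1.0, suv_sd_bkg=0.1)
background_bv = bv(suv_sd_bkg=0.1, suv_mean_bkg=1.0)
print(sphere_crc, sphere_cnr, background_bv)
```

A CRC of 1 would mean the measured sphere-to-background ratio equals the true activity ratio; values below 1 indicate loss of contrast, for example through partial volume effects on small spheres.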

We also calculated the percentage variation of SUVmax, SUVpeak, CRC, CNR and BV (ΔSUVmax, ΔSUVpeak, ΔCRC, ΔCNR and ΔBV, respectively), parameters regularly used in clinical practice, between the SubtlePET-enhanced images (PET50 + SP, PET33 + SP, PET100 + SP) and the standard PET100 images.
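The text does not spell out the formula for the percentage variation. A natural reading, assumed here, is the relative difference with respect to the standard PET100 value; the function name and the example values are hypothetical.

```python
def percent_variation(enhanced_value, reference_value):
    """Relative change of an enhanced-image metric versus the PET100 reference, in %.

    Assumption: the variation is normalised by the PET100 value.
    """
    return (enhanced_value - reference_value) / reference_value * 100.0

# Hypothetical SUVmax of one lesion on PET100 and on PET50 + SP
delta_suv_max = percent_variation(enhanced_value=7.6, reference_value=8.0)
print(f"delta SUVmax = {delta_suv_max:.1f}%")
```

Under this convention a negative value means the enhanced image reports a lower value than the standard acquisition.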

We studied the correlation between BV and SUV variations (ΔBV and ΔSUV) as a function of patient BMI to evaluate the efficiency of SubtlePET in patients with noisier images.

Additionally, quantitative image quality metrics such as peak signal to noise ratio (PSNR) and structural similarity index (SSIM) were calculated between the regular duration PET scan (PET100) and the faster PET series, both processed and unprocessed, to check for the presence or absence of absolute errors (data loss, corruption, alteration, or exaggeration).
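For readers unfamiliar with these two metrics, the sketch below computes them on flattened voxel lists. It makes several simplifying assumptions not taken from the study: the peak value in PSNR is the maximum of the reference image, SSIM is evaluated once over the whole image rather than over sliding local windows as is usual in practice, and the constants 0.01 and 0.03 are the conventional defaults.

```python
import math

def psnr(reference, test):
    """Peak signal to noise ratio in dB, using max(reference) as the peak."""
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max(reference) ** 2 / mse)

def global_ssim(x, y, data_range):
    """Single-window SSIM over the whole image (simplified, no local windows)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    return ((2 * mean_x * mean_y + c1) * (2 * cov + c2)) / (
        (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2))

# Hypothetical voxel values: a reference image and a perturbed version of it
pet100 = [1.0, 1.2, 0.9, 5.0, 4.8, 1.1, 1.0, 0.95]
pet50 = [1.1, 1.0, 1.0, 4.6, 5.1, 0.9, 1.2, 1.0]
print(psnr(pet100, pet50), global_ssim(pet100, pet50, data_range=max(pet100)))
```

Both metrics reach their best value (infinite PSNR, SSIM of 1) when the two images are identical, so lower values flag a departure from the PET100 reference.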

In addition to the quantitative analysis, 2 senior nuclear medicine physicians independently performed a qualitative evaluation of the overall image quality on a 3-point scale: (1) insufficient quality for image interpretation; (2) insufficient quality, with noise or heterogeneity but acceptable for interpretation; and (3) image of good quality for optimal interpretation. In case of disagreement on the image quality rating at the end of their evaluation, a joint analysis was performed. Finally, for the PET50 + SP versus PET100 images, the evaluation of quality was summarized by the question: “Would my report have changed considering the PET50 + SP instead of PET100 images?”. To that end, PET100 and PET50 + SP series were presented side by side to each physician independently. All the images were evaluated in one session, with no waiting period between images.

### Statistical analysis

We compared the phantom and patient quantitative data (SUVmax, SUVmean, ΔSUVmax, ΔSUVmean, ΔCRC and ΔCNR) using a paired Student's t test, with p values lower than 0.05 considered statistically significant.
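The paired test works on the per-VOI differences between two series. The sketch below computes the t statistic and its degrees of freedom with the standard library only; the function name and the data are hypothetical. The p value would then be read from a Student t distribution with n − 1 degrees of freedom, for instance with `scipy.stats.ttest_rel` (the study itself used MedCalc).

```python
import math
import statistics

def paired_t_statistic(series_a, series_b):
    """Return (t, degrees of freedom) for a paired Student's t test."""
    diffs = [a - b for a, b in zip(series_a, series_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1

# Hypothetical SUVmax values for the same 4 lesions on two series
suv_pet100 = [5.0, 6.0, 7.0, 8.0]
suv_pet50_sp = [4.0, 6.0, 6.0, 7.0]
t_value, dof = paired_t_statistic(suv_pet100, suv_pet50_sp)
print(t_value, dof)
```

Pairing matters here because each VOI is measured at the same location on every series, so the between-lesion spread cancels out in the differences.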

The comparison of lesion detectability and quality between PET100 and SubtlePET-enhanced images was evaluated by calculating the kappa coefficients for each observer.
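Cohen's kappa corrects the raw agreement between two sets of ratings for the agreement expected by chance. The sketch below is a minimal unweighted version; the function name and the ratings are hypothetical, and whether the study used a weighted or unweighted kappa is not stated.

```python
from collections import Counter

def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa between two equally long lists of categorical ratings."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if expected == 1.0:
        return 1.0  # both series use a single identical category
    return (observed - expected) / (1.0 - expected)

# Hypothetical 3-point quality scores given by one observer to 8 patients
scores_pet100 = [3, 3, 2, 3, 2, 1, 3, 2]
scores_pet50_sp = [3, 3, 2, 2, 2, 1, 3, 3]
kappa = cohen_kappa(scores_pet100, scores_pet50_sp)
print(round(kappa, 3))
```

A kappa of 1 indicates perfect agreement between the two series and a kappa of 0 indicates agreement no better than chance.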

All statistical tests were performed with MedCalc 13.1.2.0, and graphs and plots were produced with Excel 2016.