Purpose: To demonstrate the application of generative deep learning in visually representing what neural networks learn. The specific example explored is the prediction of B-type natriuretic peptide (BNP) levels as a marker of heart failure from chest radiographs.
Methods and Materials: 106,290 chest radiographs from 48,532 unique patients were extracted from a PACS database for this ethics-approved study. 7,390 radiographs had a corresponding BNP result within 36 hours; the remaining 98,900 did not and formed the unlabelled dataset. The unlabelled dataset was used to create a generative model that converts radiographs to and from a low-dimensional latent space. Simulated reconstructions of chest radiographs with high BNPs were then performed to visualise how the same radiograph would appear if the BNP were normal. These were used to obtain a visual rationale that represents the features the model identified as important in this specific predictive task.
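The latent-space manipulation described above can be sketched with a stand-in linear encoder/decoder. The actual model is a deep generative network, so everything below (dimensions, weights, the BNP-correlated direction) is an illustrative assumption, not the study's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d_img, d_lat = 64, 16  # toy image and latent dimensions (assumed)

# Stand-in linear "generative model": decode maps latent -> image,
# encode is its pseudo-inverse.
W = rng.normal(size=(d_img, d_lat))
W_pinv = np.linalg.pinv(W)

def encode(x):
    """Project a (flattened) radiograph into the latent space."""
    return x @ W_pinv.T

def decode(z):
    """Reconstruct an image from its latent code."""
    return z @ W.T

# Assumed latent direction correlated with log(BNP) by a linear fit
# (here just a random unit vector for illustration).
bnp_dir = rng.normal(size=d_lat)
bnp_dir /= np.linalg.norm(bnp_dir)

def counterfactual(x, alpha=3.0):
    """Shift the latent code against the BNP direction and decode,
    simulating the same radiograph with a normal BNP."""
    z = encode(x)
    return decode(z - alpha * bnp_dir)

x = decode(rng.normal(size=d_lat))  # a synthetic "radiograph"
x_normal = counterfactual(x)
rationale = x - x_normal            # the visual rationale: what changed
```

The difference image `rationale` plays the role of the visual rationale: the pixels the model had to change to make the radiograph look "normal".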
Results: A linear regression shows that the latent representations are logarithmically correlated with the accompanying BNP result. At a cut-off BNP value of 100 ng/L, the linear classifier obtained an AUC of 0.82. Qualitative assessment of the generated visual rationales confirms that the algorithm learns radiographic findings compatible with traditional human teaching. It also allows identification of unexpected features.
Conclusion: Generative learning can be applied to create visual representations of features learned by an algorithm. These visual representations could improve radiologists’ understanding of deep learning algorithms and help enable the safe and appropriate application of deep learning in medicine.
Purpose: Chest x-rays are widely used to identify pulmonary consolidation because they are highly accessible, cheap and sensitive. Automating diagnosis from chest x-rays can reduce diagnostic delay, especially in resource-limited settings.
Methods and Materials: An anonymised dataset of 423,218 chest x-rays with corresponding reports (collected from 166 centres across India, spanning 22 x-ray machine variants from 9 manufacturers) was used for training and validation. X-rays with consolidation were identified from their reports using natural language processing techniques. Images were preprocessed to a standard size and normalised to remove source dependency. Deep residual neural networks were trained on these images: multiple models were trained on selective subsets of the dataset, along with one model trained on the entire dataset. The scores yielded by these models were passed through a 2-layer neural network to generate the final probability of consolidation being present in an x-ray.
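The two-stage set-up, in which subset-specific models each emit a consolidation score that a small 2-layer network fuses, can be sketched as follows; the number of models, weights and layer sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
n_models = 5  # number of subset-specific models (assumed)

# Randomly initialised 2-layer fusion network; in practice these
# weights would be learned on held-out scores.
W1, b1 = rng.normal(size=(8, n_models)), np.zeros(8)
W2, b2 = rng.normal(size=8), 0.0

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fuse(scores):
    """Combine per-model scores into one consolidation probability."""
    h = np.tanh(W1 @ scores + b1)       # hidden layer
    return float(sigmoid(W2 @ h + b2))  # final probability in (0, 1)

p = fuse(np.array([0.9, 0.7, 0.85, 0.6, 0.95]))
```

With trained weights, the fusion layer lets models specialised on different subsets (e.g. particular manufacturers) contribute unequally to the final call.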
Results: The model was validated and tested on a test dataset uniformly sampled from the parent dataset without any exclusion criteria. Sensitivity and specificity were 0.81 and 0.80, respectively. The area under the receiver operating characteristic curve (AUC-ROC) was 0.88.
Conclusion: Deep learning can be used to diagnose pulmonary consolidation in chest x-rays with models trained on a generalised dataset containing samples from multiple demographics. This model performs better than one trained on a controlled dataset and is suited for real-world settings where x-ray quality may not be consistent.
Purpose: Chest x-rays (CXR), being highly sensitive, serve as a screening tool in TB diagnosis. Though there are no classical features diagnostic of TB on CXR, a few patterns can be used as supportive evidence. In resource-limited settings, deep learning algorithms for CXR-based TB screening could reduce diagnostic delay. Our algorithm screens for 8 abnormal patterns (TB tags): pleural effusion, blunted CP, atelectasis, fibrosis, opacity, nodules, calcification and cavity. It reports 'No Abnormality Detected' if none of these patterns are present on the CXR.
Methods and Materials: An anonymized dataset of 423,218 CXRs with matched radiologist reports (22 x-ray machine models from 9 manufacturers across 166 centres in India) was used to generate training data for the deep learning models. Natural language processing techniques were used to extract TB tags from these reports. Deep learning systems were trained to predict the probability of the presence or absence of each TB tag, along with heat-maps highlighting the abnormal regions in the CXR for each positive result.
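Rule-based extraction of tags from report text, of the kind such NLP pipelines often start from, might look as follows; the tag vocabulary, regex patterns and crude negation handling are all assumptions, not the abstract's actual system:

```python
import re

# Hypothetical tag vocabulary mapping tag names to regex patterns.
TB_TAGS = {
    "pleural effusion": r"pleural effusion",
    "cavity": r"cavit(y|ies|ation)",
    "fibrosis": r"fibrosis|fibrotic",
    "calcification": r"calcifi",
}
# Very crude negation cue: a negating word earlier in the same sentence.
NEGATION = r"\b(no|without|absent)\b[^.]*"

def extract_tags(report):
    """Return the set of TB tags mentioned affirmatively in a report."""
    text = report.lower()
    tags = set()
    for tag, pat in TB_TAGS.items():
        for m in re.finditer(pat, text):
            # restrict the negation check to the current sentence
            start = text.rfind(".", 0, m.start()) + 1
            if not re.search(f"{NEGATION}(?:{pat})", text[start:m.end()]):
                tags.add(tag)
    return tags
```

Real report-mining systems need far richer negation and uncertainty handling (e.g. "cannot exclude"), but the labelling idea is the same: reports become weak image-level labels.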
Results: We validated the screening algorithm on 3 datasets external to our training set: two public datasets maintained by the NIH (from Montgomery and Shenzhen) and a third from NIRT, India. The area under the receiver operating characteristic curve (AUC-ROC) for TB prediction was 0.91, 0.87 and 0.83, respectively.
Conclusion: Training on a diversified dataset enabled good performance on samples from completely different demographics. After further validation of its robustness against variation, the system can be deployed at scale to significantly improve current systems for TB screening.
Purpose: In spine pathologies, proper detection and identification of vertebrae is necessary for diagnosis. This manual task delays radiologists’ workflow. The main goal of this work is to assist radiologists by automatically detecting and identifying vertebrae in spine CT scans.
Methods and Materials: To locate the spine centerline, the spinal canal was detected using thresholding and dilation with a cylindrical structuring element. Once the spine was located, the position of each vertebral body was detected and identified in a procedure divided into 4 steps. First, each axial 2D image was classified into one of 4 regions: upper-thoracic, lower-thoracic, lumbar or sacrum. Thereafter, for each region, axial 2D images were classified as vertebra or non-vertebra. All of these classifiers used a pre-trained Convolutional Neural Network (CNN) to extract features from each image, followed by a Support Vector Machine (SVM) classifier to assign the images to the different groups. Finally, the centroid of each vertebra was calculated and identified.
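The final centroid step can be sketched as grouping contiguous vertebra-positive axial slices and averaging their in-plane centres; the input arrays and slice thickness below are toy assumptions:

```python
import numpy as np

def vertebra_centroids(is_vertebra, xy_centers, slice_thickness=1.0):
    """is_vertebra: per-axial-slice bool flags from the classifier.
    xy_centers: (n_slices, 2) in-plane vertebral-body centres.
    Returns one (x, y, z) centroid per contiguous vertebra run."""
    centroids, run = [], []
    for z, flag in enumerate(is_vertebra):
        if flag:
            run.append(z)
        elif run:  # run of vertebra slices just ended: emit its centroid
            xy = xy_centers[run].mean(axis=0)
            centroids.append((xy[0], xy[1], np.mean(run) * slice_thickness))
            run = []
    if run:  # flush a run that reaches the last slice
        xy = xy_centers[run].mean(axis=0)
        centroids.append((xy[0], xy[1], np.mean(run) * slice_thickness))
    return centroids
```

In the described pipeline these centroids would then be labelled (T1, T2, ..., L5) using the region classification from the earlier steps.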
Results: The mean localization error (distance between the estimated and real vertebra position) was 7.37±8.69 mm in the thoracic region and 9.21±8.70 mm in the lumbar region; the identification rate was 89.20% in the thoracic region and 92.90% in the lumbar region.
Conclusion: By combining deep learning with image processing techniques, it is possible to automatically locate and identify vertebrae in spine CT scans. CNNs are good candidates for extracting the complex characteristics needed to classify radiological images.
Purpose: The standard diagnostic protocol for lung cancer involves an FDG-PET/CT to stage the patient. Both over- and understaging lead to drastic consequences for the patient’s health and comfort. The full examination of these images takes up to 90 minutes per patient and requires the expertise of both a radiologist and a nuclear medicine physician. We aim to support physicians by automatically and quantitatively evaluating this reading and interpretation process.
Methods and Materials: In a retrospective study of 179 patients, we collected the first PET-CT scan, manually annotated the suspicious lesions, and recorded the radiological TNM stage. We trained a deep neural network using data augmentation, including transformed versions of the original images, to avoid overfitting.
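The augmentation strategy (training on transformed versions of the original images) might be sketched as follows; the particular transforms and parameters are assumptions:

```python
import numpy as np

def augment(img, rng):
    """Return a randomly transformed copy of a 2D image slice."""
    out = img
    if rng.random() < 0.5:
        out = np.fliplr(out)                   # horizontal flip
    out = np.rot90(out, k=rng.integers(0, 4))  # random 90-degree rotation
    shift = rng.integers(-2, 3)                # small translation
    out = np.roll(out, shift, axis=0)
    return out

rng = np.random.default_rng(42)
img = np.arange(16.0).reshape(4, 4)            # toy image slice
batch = [augment(img, rng) for _ in range(8)]  # 8 augmented copies
```

Medical imaging pipelines typically also add elastic deformations and intensity jitter; the point is simply to multiply the effective size of a small annotated cohort.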
Results: The network was evaluated on a validation group and achieved high accuracy: over 90% of the lesion area overlapped with the manual annotations, with a misclassification rate of 20%. Once the network was trained, detecting the suspicious lesions and computing the stage for a new patient took only a few seconds.
Conclusion: Staging and reading PET-CT images is a challenging task with large degrees of heterogeneity in both the patients and images. Given this heterogeneity, this initial study shows great promise in terms of speed and accuracy of the proposed set-up and could make this a valuable asset in the clinical environment and clinical trials.
Purpose: To develop a deep learning method for assisting radiologists in the discrimination between distal ureteral stones and pelvic phleboliths in thin slice CT images, and to evaluate whether this differentiation is possible using only local features.
Methods and Materials: A limited field-of-view image data bank was retrospectively created, consisting of 5x5x5 cm selections from 1-mm-thick unenhanced CT images centred on 218 pelvic phleboliths and 267 distal ureteral stones in 336 patients. 50 stones and 50 phleboliths formed a validation cohort and the remainder a training cohort. Ground truth was established by a radiologist using the complete CT examination during inclusion. The limited field-of-view CT stacks were independently reviewed and classified as containing a distal ureteral stone or a phlebolith by seven radiologists. Each cropped stack consisted of 50 slices (5x5 cm field-of-view) and was displayed in a standard PACS reading environment. A convolutional neural network using three perpendicular images (2.5D-CNN) from the limited field-of-view CT stacks was trained for the classification.
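The 2.5D input construction, three perpendicular planes through a candidate's centre voxel stacked as channels, can be sketched as follows (a cubic crop is assumed so the three planes share a shape; the volume is synthetic):

```python
import numpy as np

def planes_2p5d(volume, center):
    """Extract axial, coronal and sagittal planes through `center`
    and stack them as a (3, H, W) array for a 2D CNN."""
    z, y, x = center
    axial = volume[z, :, :]
    coronal = volume[:, y, :]
    sagittal = volume[:, :, x]
    return np.stack([axial, coronal, sagittal])

# Synthetic 50-slice cubic crop standing in for a 5x5x5 cm selection.
vol = np.random.default_rng(0).normal(size=(50, 50, 50))
inp = planes_2p5d(vol, (25, 25, 25))
```

The appeal of the 2.5D approach is that it captures some 3D context around the calcification while keeping the cheap machinery of a 2D CNN.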
Results: The 2.5D-CNN obtained 89% accuracy (95% confidence interval 81%-94%) for the classification in the unseen validation cohort while the accuracy of radiologists reviewing the same cohort was 86% (range 76%-91%). There was no statistically significant difference between 2.5D-CNN and radiologists.
Conclusion: The 2.5D-CNN achieved radiologist-level accuracy in classifying distal ureteral stones versus pelvic phleboliths using only local features. The mean accuracy of 86% for radiologists using the limited field of view indicates that distant anatomical information, which helps identify the ureter’s course, is needed for further improvement.
Purpose: Radioembolisation in the liver requires the computation of organ and tumor volumes for dosimetry planning. The purpose of this study was to investigate the accuracy of a new, fully-automated deep learning approach for liver segmentation in MRI data.
Methods and Materials: Images of 70 patients who underwent MRI to evaluate liver tumors were retrospectively analyzed. Data were acquired on two different scanners (Signa HDxt or Discovery 750, GE) using a volume sequence with Gd-EOB-DTPA as the contrast agent. Segmentation was performed on late-phase images with a slice thickness of 2-3 mm.
Ground truth liver contours were accurately defined by experienced technicians using contouring and interpolation software and checked by a radiologist. Additionally, one radiologist and two residents defined routine segmentations for 28 patients scheduled for radioembolisation. A deep learning network was trained on the remaining 42 data sets and automatic organ boundaries were computed for the radioembolisation cases.
Results: Compared to the ground truth, the relative volume error was 5.37%±4.38%, 3.63%±4.57% and 4.33%±4.62% for the three clinical users, and 6.95%±5.73% for the deep learning approach. The Dice coefficient was 0.94±0.02, 0.95±0.02 and 0.95±0.02 for the three clinical users, and 0.92±0.04 for the automatic method. Interactive ground truth segmentation took 24±8 minutes, compared to approximately 10±4 minutes for the clinicians and 20±7 seconds for the deep learning method.
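The two reported metrics can be computed from binary segmentation masks as follows; the toy masks are purely illustrative:

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

def relative_volume_error(seg, gt):
    """Unsigned volume error of `seg` relative to ground truth, in %."""
    return 100.0 * abs(seg.sum() - gt.sum()) / gt.sum()

# Toy 2D masks: ground truth (36 voxels) vs. a slightly smaller
# segmentation (30 voxels) fully contained within it.
gt = np.zeros((10, 10), bool); gt[2:8, 2:8] = True
seg = np.zeros((10, 10), bool); seg[3:8, 2:8] = True
```

Note that the two metrics capture different failures: a segmentation can have the correct volume (low volume error) while being misplaced (low Dice), which is why both are reported.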
Conclusion: Automated deep learning based liver segmentation in MRI data of radioembolisation patients was much faster with sufficient accuracy compared to interactive contouring and interpolation. The new method will support automated tumor load computation based on MRI in the future.
Purpose: MRI is commonly used to non-invasively evaluate location, size, spread, oedema and the biological status of glioblastoma (GBM). Automatic, reliable and reproducible tumour segmentation can enable volumetric response assessment and effective integration of volumetric multimodal tumour characterization. The objective of this study was to apply a state-of-the-art deep learning algorithm for fully automatic GBM-compartment segmentation on clinical routine data from multiple centres, and to compare the results to ground truth manual expert segmentation.
Methods and Materials: 64 patients from 15 institutions with newly diagnosed, primary GBM were included. T1, T2, FLAIR and contrast-enhanced (CE) T1 sequences (Philips and Siemens scanners) underwent preprocessing and were fed to a deep-learning model based on DeepMedic. This model was trained on 220 cases of the BRATS database. Acquired segmentations were compared to manual segmentations of whole tumour (WT), necrosis (NC), and contrast-enhanced tumour (CET) performed by experienced radiologists as part of this study.
Results: The CET and WT were automatically detected in all patients. The automatic/manual segmentation dice similarity coefficients (DSC) were 0.86±0.09 (WT), 0.78±0.15 (CET), 0.62±0.30 (NC). For NC we found a correlation (R=-0.622, p<0.01) between surface-to-volume ratio and DSC. No correlation was found between resolution and DSC.
Conclusion: The proposed approach is robust on routine clinical data and has high accuracy comparable to interrater variability. This makes it a suitable building block for automatic tumour segmentation reviewed by the radiologist in pre-operative characterisation of GBM. Training the network on data with small/complex shape tumours might further increase the NC segmentation accuracy.
Purpose: To develop and validate a deep neural network-based algorithm for automated, rapid and accurate detection from head CT for the following haemorrhages: intracerebral (ICH), subdural (SDH), extradural (EDH) and subarachnoid (SAH).
Methods and Materials: An anonymised database of head CTs was searched for non-contrast scans reported with any of ICH, SDH, EDH or SAH, and for scans reported with none of these. Each slice of these scans was manually tagged with the haemorrhages visible in that slice. In all, 3040 scans (116,227 slices) were annotated; the numbers of scans (slices) with ICH, SDH, EDH, SAH and none of these were 781 (6957), 493 (6593), 742 (6880), 561 (5609) and 944 (92,999), respectively. Our deep learning model is a modified ResNet18 with 4 parallel final fully connected layers, one for each haemorrhage type. This model was trained on the slices from the annotated dataset to make slice-level decisions. Random forests were then trained, using the ResNet's softmax outputs for all slices in a scan as features, to make scan-level decisions.
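The slice-to-scan aggregation, turning per-slice probabilities into a fixed-length feature vector for the random forest, might look like this; the particular summary statistics chosen are assumptions:

```python
import numpy as np

def scan_features(slice_probs):
    """slice_probs: (n_slices, 4) per-slice probabilities for
    ICH, SDH, EDH and SAH. Returns one fixed-length feature
    vector per scan, regardless of slice count."""
    p = np.asarray(slice_probs)
    top3 = np.sort(p, axis=0)[-3:].mean(axis=0)  # mean of 3 highest slices
    return np.concatenate([p.max(axis=0), p.mean(axis=0), top3])

# Synthetic example: 30 slices, 4 haemorrhage types.
probs = np.random.default_rng(7).random((30, 4))
feats = scan_features(probs)
```

A fixed-length summary of this kind is what lets a scan with any number of slices feed a classifier such as a random forest.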
Results: A separate set of 2993 scans, uniformly sampled from the database without any exclusion criterion, was used for testing the scan-level decisions. The numbers of scans with ICH, SDH, EDH and SAH in this set were 123, 58, 41 and 62, respectively. The area under the receiver operating characteristic curve (AUC) for scan-level decisions was 0.91, 0.90, 0.90 and 0.90 for ICH, SDH, EDH and SAH, respectively. The algorithm takes <1 s to produce the decision for a scan.
Conclusion: Deep learning can accurately detect intra- and extra-axial haemorrhages from head CTs.
Purpose: Generalised cerebral atrophy on brain CT images is a marker of neurodegenerative diseases of the brain. Our study aims to automate the diagnosis of generalised cerebral atrophy on brain CT images using deep neural networks, thereby offering an objective early diagnosis.
Methods and Materials: An anonymised dataset containing 78 head CT scans (1608 slices) was used to train and validate a skull-stripping algorithm. The intracranial region was marked out slice by slice in each scan. A U-Net-based deep neural network was then trained on these annotations to strip the skull from each slice. A second anonymised dataset containing 2189 CT scans (231 scans with atrophy) was used to train and validate an atrophy detection algorithm. First, an image registration technique was applied to the predicted intracranial region to align all scans to a standard head CT scan. The parenchymal and CSF volumes were calculated by thresholding Hounsfield units within the intracranial region. The ratio of CSF volume to parenchymal volume from each slice of the aligned CT scan and the age of the patient were used as features to train a random forest algorithm that decides whether the scan shows generalised cerebral atrophy.
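The volume-ratio feature can be sketched as Hounsfield-unit thresholding inside the skull-stripped region; the HU ranges below are typical textbook values for CSF and parenchyma, assumed rather than taken from the abstract:

```python
import numpy as np

def csf_parenchyma_ratio(hu_slice, brain_mask):
    """Ratio of CSF to parenchymal voxels in one intracranial slice,
    using assumed HU windows: ~0-15 HU for CSF, ~15-45 HU for
    parenchyma."""
    roi = hu_slice[brain_mask]
    csf = np.count_nonzero((roi >= 0) & (roi < 15))
    parenchyma = np.count_nonzero((roi >= 15) & (roi < 45))
    return csf / max(parenchyma, 1)  # guard against empty parenchyma

# Synthetic HU slice with a full intracranial mask.
rng = np.random.default_rng(3)
hu = rng.uniform(-10, 60, size=(64, 64))
mask = np.ones((64, 64), bool)
ratio = csf_parenchyma_ratio(hu, mask)
```

A per-slice vector of these ratios, plus patient age, would then form the feature input to the random forest described above.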
Results: An independent set of 3000 head CT scans (347 scans with atrophy) was used to test the algorithm. The area under the receiver operating characteristic curve (AUC) for scan-level decisions was 0.86. Prediction for each patient takes under 45 s.
Conclusion: Deep convolutional networks can accurately detect generalised cerebral atrophy given a CT scan.