Le Lu, Ph.D.
Joint work with Holger R. Roth, Hoo-chang Shin, Ari Seff, Xiaosong Wang,
Mingchen Gao, Isabella Nogues, Ronald M. Summers
Radiology and Imaging Sciences, National Institutes of Health Clinical Center
le.lu@nih.gov
Application Focus: Cancer Imaging
American Cancer Society: Cancer Facts and Figures 2016. Atlanta, Ga: American Cancer
Society, 2016. Last accessed February 1, 2016.
http://www.cancer.gov/types/common-cancers
Overview: Three Key Problems (I)
Lymph node, colon polyp, bone lesion detection using Deep CNN + Random View
Aggregation (http://arxiv.org/abs/1505.03046, TMI 2016a; MICCAI 2014a)
Empirical analysis on Lymph node detection and interstitial lung disease (ILD)
classification using CNN (http://arxiv.org/abs/1602.03409, TMI 2016b)
Deep segmentation of the pancreas and lymph node clusters with HED (holistically-
nested neural networks, Xie & Tu, 2015) as building blocks to learn unary
(segmentation mask) and pairwise (segmentation boundary labeling) CRF terms,
plus spatial aggregation or structured optimization (the focus of our MICCAI 2016
submissions, since this is a much-needed task: datasets are small, and
(de-)compositional representation is still the key.)
CRF: conditional random fields
Previous work mostly uses direct 3D image feature information from the CT volume.
The state-of-the-art approaches [4,5] employ a large set of boosted 3D Haar
features to build a holistic detector in a scanning-window manner.
The curse of dimensionality leads to relatively poor performance [Lu, Barbu, et al.,
2008].
(Figure: axial, coronal, and sagittal views.)
Note that a single unified, compact HOG model is trained regardless of view (axial,
coronal, or sagittal), without unifying view orientations.
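The view-agnostic descriptor idea can be sketched as follows. This is a simplified HOG (per-cell orientation histograms with a single global normalization, no block normalization), with a toy patch size; it is not the exact descriptor used in the experiments, only an illustration of applying one compact model to slices from any view.

```python
import numpy as np

def hog_descriptor(image, cell_size=9, n_bins=9):
    """Simplified HOG: per-cell histograms of gradient orientation,
    weighted by gradient magnitude, with global L2 normalization."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)            # unsigned orientation in [0, pi)
    h, w = image.shape
    ch, cw = h // cell_size, w // cell_size
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)
    hist = np.zeros((ch, cw, n_bins))
    for i in range(ch):
        for j in range(cw):
            sl = (slice(i * cell_size, (i + 1) * cell_size),
                  slice(j * cell_size, (j + 1) * cell_size))
            for b in range(n_bins):
                hist[i, j, b] = mag[sl][bins[sl] == b].sum()
    v = hist.ravel()
    return v / (np.linalg.norm(v) + 1e-6)

# The same descriptor applies to an axial, coronal, or sagittal slice alike,
# so one compact model serves all three views.
slice_2d = np.random.rand(45, 45)        # toy 45x45 CT patch
feat = hog_descriptor(slice_2d)          # 5x5 cells x 9 bins = 225-dim
```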
Lymph Node Detection FROC Performance
Enriching the HOG descriptor with other image feature channels, e.g., mid-level semantic
contours/gradients, can further lift sensitivity by 8-10%!
About 1/3 of the false positives are found to be smaller lymph nodes (short axis < 10 mm).
Make Shallow to Go Deeper via Mid-level Cues?
[Seff et al. MICCAI 2015]
Six-fold cross-validation FROC curves are shown for the two target regions
Classification
A linear SVM is trained using the new feature set; a HOG cell size of 9x9
pixels gives optimal performance.
Separate models are trained for specific LN size ranges to form a mixture-of-
templates approach (see later slide)
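As a sketch of this classification stage, here is a minimal linear SVM trained by subgradient descent on the hinge loss: a stand-in for an off-the-shelf linear SVM package, run on toy separable data rather than real HOG features of LN candidates.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimal linear SVM via subgradient descent on the
    regularized hinge loss; labels y must be in {-1, +1}."""
    rng = np.random.default_rng(0)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            if y[i] * (X[i] @ w + b) < 1:          # hinge-active sample
                w = (1 - lr * lam) * w + lr * y[i] * X[i]
                b += lr * y[i]
            else:                                  # only regularization shrinks w
                w = (1 - lr * lam) * w
    return w, b

# Toy separable clusters standing in for HOG features of candidates
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 0.3, (50, 4)), rng.normal(1, 0.3, (50, 4))])
y = np.array([-1] * 50 + [1] * 50)
w, b = train_linear_svm(X, y)
acc = np.mean(np.sign(X @ w + b) == y)
```

A mixture-of-templates approach would simply train one such (w, b) pair per LN size range and dispatch candidates by estimated size.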
Table reproduced from Table 3 of Feulner et al., "Lymph node detection and segmentation in chest CT data
using discriminative learning and a spatial prior," Medical Image Analysis, 17(2): 254-270 (2013). Note that
Barbu et al. (2010) is not directly comparable to the other papers, since axillary lymph nodes are easier to detect.
Generalizable? Colon CADe Results using a deeper CNN on
1186 patients (or 2372 CTC volumes) [Roth et al., TMI 2016]
Particularly, we present
Evaluation of different CNN architectures, ranging from 5 thousand to
160 million parameters, with various depths of CNN layers;
Impacts on performance given datasets of different scales and spatial
image contexts;
When transfer learning from pre-trained ImageNet CNN models via
fine-tuning can be helpful, and why.
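The transfer-learning question above can be illustrated with a minimal NumPy sketch: freeze a "pretrained" lower layer and retrain only the task-specific classifier head, analogous to fine-tuning only the last layers of an ImageNet-pretrained CNN on a small medical dataset. All weights and data here are synthetic placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(16, 32))        # stands in for pretrained (frozen) lower layers

def features(X):
    return np.maximum(X @ W1, 0.0)    # frozen ReLU feature extractor

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

X = rng.normal(size=(200, 16))        # toy "CT patches"
F = features(X)
w_true = rng.normal(size=32)
y = (F @ w_true > 0).astype(float)    # toy labels, separable in feature space

# Fine-tune only the classifier head: logistic regression on frozen features
w, b = np.zeros(32), 0.0
for _ in range(500):
    p = sigmoid(F @ w + b)
    g = p - y                         # gradient of the cross-entropy loss
    w -= 0.1 * F.T @ g / len(y)
    b -= 0.1 * g.mean()
acc = np.mean((sigmoid(F @ w + b) > 0.5) == (y > 0.5))
```

Full fine-tuning would additionally take gradient steps on W1 with a smaller learning rate; the frozen-feature variant shown here is the cheapest form of transfer.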
Problem 1: Lymph node detection in CT using three
orthogonal views + random sampling + multi-scale
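A minimal sketch of the three-orthogonal-view sampling follows. The function name is illustrative, and random translation jitter of the crop center stands in for the full random rotations/scales of the actual method (which would need interpolation); at test time, per-view CNN probabilities are then averaged (random view aggregation).

```python
import numpy as np

def sample_25d_views(volume, center, n_views=5, size=17, rng=None):
    """Sample 2.5D patches (axial/coronal/sagittal planes stacked as
    3 channels) around a candidate, with small random center jitter."""
    rng = rng or np.random.default_rng()
    r = size // 2
    views = []
    for _ in range(n_views):
        z, y, x = np.array(center) + rng.integers(-2, 3, size=3)
        axial    = volume[z, y - r:y + r + 1, x - r:x + r + 1]
        coronal  = volume[z - r:z + r + 1, y, x - r:x + r + 1]
        sagittal = volume[z - r:z + r + 1, y - r:y + r + 1, x]
        views.append(np.stack([axial, coronal, sagittal]))
    return np.stack(views)            # shape: (n_views, 3, size, size)

vol = np.random.rand(64, 64, 64)      # toy CT volume
patches = sample_25d_views(vol, center=(32, 32, 32), n_views=4)
```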
Problem 2.a: Slice-based ILD classification in CT, thick
slices, no lung segmentation
Problem 2.b: Patch-based (32x32) ILD classification in CT;
all previous work uses this protocol, manual ROI required
Observations & Directions
2. The tradeoff between better learning models and more training data [29]
should be considered carefully to find an optimal solution for any CADe
problem (e.g., mediastinal and abdominal LN detection).
3. Datasets can be the bottleneck to further advancing the field of CADe.
Building progressively growing (in scale), well-annotated datasets is at
least as important as developing new algorithms.
As an analogy in computer vision, scene recognition has made
tremendous progress thanks to the steady, continuous development of
the Scene-15, MIT Indoor-67, SUN-397, and Places datasets [36].
4. Transfer learning from large-scale annotated natural image datasets
(ImageNet) to CADe problems is validated to be consistently beneficial in
our experiments. This sheds some light on cross-dataset CNN learning in
the medical image domain, e.g., the union of the ILD [20] and LTRC [38]
datasets, as suggested in this paper.
[Farag et al., arXiv-1407.8497, 2014; Roth et al., arXiv-1504.03967; Roth et al., MICCAI 2015]
(II) Candidate Region Generation (Hand-crafted
Image Features + RF) [Farag et al., arXiv-1407.8497]
Zoom-out
P-ConvNet: Deep Patch Classification
holger.roth@nih.gov
(Figure: ground truth vs. Random Forest vs. 2.5D patch ConvNet probability maps;
Dice scores ~27% and ~57%, respectively.)
R2-ConvNet: Regional ConvNet, ~68% Dice score
Training & Testing Performance (4-fold Cross-
Validation)
Probability maps thresholded at p0 = 0.2, p1 = 0.5, and p2 = 0.6, calibrated in training and applied
in testing.
Dice coefficients: 84.2% (+/- 3.6%) in training and 75.8% (+/- 5.4%) in testing (more stable by
std values)
4-fold CV Performance
DSC=82.7%.
Mean 0.936 mm
Std 0.586 mm
Min 0.297 mm
Max 2.204 mm
(III) Interleaved Text/Image Deep Mining on a Large-Scale Radiology
Database (780K/60K patients) for Automated Image Interpretation
Hoo-Chang Shin, Le Lu, Lauren Kim, Ari Seff, Jianhua Yao, Ronald M. Summers, IEEE
Conf. CVPR 2015 (to appear); JMLR special issue on large-scale health informatics (in submission)
Example words embedded in the vector space using the open-source, RNN-based Google
word2vec model (visualized in 2D), trained on 1B words from 780K radiology reports and
0.2B words from OpenI, an open-access biomedical image search engine (http://openi.nlm.nih.gov).
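As a toy stand-in for this embedding step (classical co-occurrence counts plus SVD rather than the actual word2vec training on the report corpus; the miniature corpus and window size are illustrative), words that share contexts, such as "right" and "left", end up with nearby vectors:

```python
import numpy as np

corpus = [
    "right lower lobe nodule", "left lower lobe nodule",
    "right pleural effusion", "left pleural effusion",
]
tokens = [s.split() for s in corpus]
vocab = sorted({w for s in tokens for w in s})
idx = {w: i for i, w in enumerate(vocab)}

# Symmetric co-occurrence counts within a window of 2
C = np.zeros((len(vocab), len(vocab)))
for s in tokens:
    for i, w in enumerate(s):
        for j in range(max(0, i - 2), min(len(s), i + 3)):
            if i != j:
                C[idx[w], idx[s[j]]] += 1

# Truncated SVD of the count matrix gives low-dimensional word vectors
U, S, _ = np.linalg.svd(C, full_matrices=False)
emb = U[:, :2] * S[:2]                 # 2-D embeddings, one row per word

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# "right" and "left" occur in identical contexts here, so their vectors align
sim_rl = cos(emb[idx["right"]], emb[idx["left"]])
```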
Pipeline (repeated each iteration): randomly shuffled images -> image clusters with
semantic text labels -> NLP on the text reports for each cluster -> train 70% /
val 10% / test 20%.
CNN Models and Feature Encoding
4 7
1 5 1 2 5
6
4 0 5 6 5
22 25 60 64 141 174 40 129 195 26 72 200 205 230 253 23 75 233 41 104 166 246 81 84 179 224 259
With a radiologist-in-the-loop protocol to build an annotated, large-scale radiology
image database (a medical counterpart to Flickr 30K or MS COCO?)
Take Home Messages
1. High-performance CADe systems can be built using stratified, heterogeneous
cascades or stacking, progressively pruning from large-dimensional model state
spaces to handle the unbalanced negative-learning challenge (negatives need to be
approximately sampled).
2. Full 3D approaches may capture more holistic patterns but can be very challenging to
train effectively/compactly, even with modern learning systems, which are not always
optimal by default. The issues: complexity and composability, the curse of
dimensionality, trainability versus generality, and the proper balance of representation
granularity/scale and size.
3. Proper image representations (e.g., random 2D/2.5D view sampling and aggregation,
mid-level cues, 20-questions hypothesis testing) can be critical alternatives.
4. A multi-staged algorithmic flow is not end-to-end trainable, but it offers great flexibility
in leveraging heterogeneous components, shallow or deep, as long as the performance
goal of each step/stage is clearly defined and the stages can compensate for each other.
le.lu@nih.gov; rms@nih.gov
Thanks to the NIH Intramural Research Program for support and to NVIDIA for
donating Tesla K40 GPUs! All code and data (except full radiology
reports) discussed are in the process of being made publicly available, or are
already shared at the NCI Cancer Imaging Archive or GitHub (upon approval).
CVPR 2015 and 2016 Workshops on Medical Computer Vision: How Big Data is Possible for Medical
Image Analysis (invited talks only), Boston, MA, June 11, 2015; Las Vegas, NV, July 1, 2016