PHD Stavros Tsantis

UNIVERSITY OF PATRAS SCHOOL OF HEALTH SCIENCES FACULTY OF MEDICINE SCHOOL OF NATURAL SCIENCES DEPARTMENT OF PHYSICS
INTERDEPARTMENTAL POSTGRADUATE PROGRAM IN MEDICAL PHYSICS
PhD Thesis Image Processing and Analysis Methods in Thyroid Ultrasound Imaging
Tsantis Stavros
Patras, 2007, Hellas
We thank the Operational Program for Educational and Vocational Training II (EPEAEK II) for funding this PhD thesis.
SUPERVISING COMMITTEE
1. George Nikiforidis, Professor, Department of Medical Physics, University of Patras (Supervisor), Greece
2. Dionisis Cavouras, Professor, Department of Medical Instruments Technology, Technological Institute of Athens, Greece
3. Vassilis Anastasopoulos, Professor, Department of Physics, University of Patras, Greece
EXAMINING COMMITTEE
1. George Nikiforidis, Professor, Department of Medical Physics, University of Patras, Greece.
2. Vassilis Anastasopoulos, Professor, Department of Physics, University of Patras, Greece.
3. George Panayiotakis, Professor, Department of Medical Physics, University of Patras, Greece.
4. Anastasios Bezerianos, Professor, Department of Physics, University of Patras, Greece.
5. Dimitrios Siamplis, Professor, School of Medicine, University of Patras, Greece.
6. Kostas Berberidis, Associate Professor, Department of Computer Engineering and Informatics, University of Patras, Greece.
7. George Oikonomou, Assistant Professor, Department of Physics, University of Patras, Greece.
Inspiration denies perception. Sofoklis (496-406 B.C.)
ACKNOWLEDGEMENTS
I wish to express my gratitude to my supervisor, Professor G. Nikiforidis, for assigning me this project and for his suggestions and guidance throughout this thesis. I am also grateful to Professor D. Cavouras for his faith and confidence in me and for his contribution to the completion of this thesis. I would also like to thank him for his valuable guidance in writing scientific articles. I would like to thank Dr N. Dimitropoulos for his selfless support throughout this thesis, for proposing its topic and for his valuable medical guidance regarding thyroid imaging. I would also like to thank N. Arikidis, mainly for his friendship and for the long conversations regarding wavelet theory, and Dr I. Kalantzis and Dr N. Piliouras for their important guidance in pattern recognition theory and algorithms. Finally, I wish to express my gratitude to my parents for their constant support and encouragement during the years of this work.
TABLE OF CONTENTS
SUMMARY IN GREEK
CHAPTER 1 Introduction
1.1 Power of Ultrasound
1.2 Need for Image Processing and Analysis Methods in Thyroid Ultrasound Images
1.3 Aims and Novelties of Thesis
  1.3.1 Wavelet-Based Image Processing
  1.3.2 Image Analysis
1.4 Publications
1.5 Dissertation Layout
CHAPTER 2 Thyroid Gland
2.1 Introduction
2.2 Thyroid Disorders
2.3 Management of Solitary Nodules
2.4 Grading
CHAPTER 3 Physics & Instrumentation of Ultrasound
3.1 Nature of Ultrasound
3.2 Propagation in Tissue
3.3 Pulse Echo Imaging
3.4 Instrumentation
3.5 Quality control of the ultrasound system
3.6 Data Acquisition and Storage
CHAPTER 4 The Wavelet Transform
Summary
4.1 Wavelet Theory
4.2 Continuous Wavelet Transform
4.3 Redundant Dyadic Wavelet Transform (1-D)
4.4 Redundant Dyadic Wavelet Transform (2-D)
4.5 Multiscale edge representation
CHAPTER 5 Singularity Detection
Summary
5.1 Singularity and mathematical description
5.2 Wavelet transform and singularity
5.3 Singularity Detection (1-D)
5.4 Singularity Detection (2-D)
CHAPTER 6 Pattern Recognition
Summary
6.1 Pattern Recognition Theory
6.2 Object Isolation
6.3 Feature Generation
6.4 Textural Features
  6.4.1 First order statistical features
  6.4.2 Second order statistical features
    6.4.2.1 Co-Occurrence matrix features
    6.4.2.2 Run-Length matrix features
  6.4.3 Shape and Geometrical features
  6.4.4 Local maxima features
6.5 Data Normalization
6.6 Classification task
  6.6.1 Minimum distance classifiers
  6.6.2 Bayesian classifier
  6.6.3 Neural networks classifiers
  6.6.4 The Support Vector Machines Classifier
CHAPTER 7 Wavelet-based speckle suppression in ultrasound images
Summary
7.1 Review of the Literature
7.2 Materials and methods
  7.2.1 Overview and Implementation of the Algorithm
  7.2.2 Speckle Model
  7.2.3 Inter-scale Wavelet Analysis
    7.2.3.1 Dyadic Wavelet Transform
    7.2.3.2 Gradient Vector
    7.2.3.3 Modulus Maxima
    7.2.3.4 Lipschitz Regularity
    7.2.3.5 Detection of singularities
7.3 Experimental Results and Evaluation
  7.3.1 Tissue mimicking Phantom Validation
  7.3.2 US image Case Study
  7.3.3 Observer evaluation study
7.4 Discussion and Conclusions
CHAPTER 8 Thyroid Nodule Boundary Detection in ultrasound images
Summary
8.1 Review of the Literature
8.2 Materials and Methods
  8.2.1. Overview and implementation of the algorithm
  8.2.2. US data Acquisition
  8.2.3. Edge Detection Procedure
    8.2.3.1. Multiscale Edge Representation
    8.2.3.2. Coarse to Fine Analysis
  8.2.4. Multi-scale Structure Model
    8.2.4.1. Maxima Linking
    8.2.4.2. Structure Identification
  8.2.5. Nodules Boundary Extraction
    8.3.1. Constrained Hough Transform
    8.3.2. Accumulator Local Maxima Detection
8.3 Results
8.4 Discussion and Conclusion
CHAPTER 9 Development of a Support Vector Machine Based Image Analysis System for Assessing the Thyroid Nodule Malignancy Risk on Ultrasound
Summary
9.1 Review of the Literature
9.2 Materials and Methods
  9.2.1 US image data acquisition
  9.2.2 Data pre-processing
  9.2.3 Classification
  9.2.4 Support vector machine classifier
  9.2.5 Multilayer perceptron (MLP) classifier
  9.2.6 Quadratic least squares minimum distance classifier
  9.2.7 Quadratic Bayesian classifier
  9.2.8 Support Vector Machines Wavelet Kernels
    9.2.8.1 Wavelet Kernels implementation
  9.2.9 System performance evaluation
9.3 Results and discussion
  9.3.1 SVM Classification Outcome
  9.3.2 MLP Classification Outcome
  9.3.3 QLSMD & QB Classification Outcome
  9.3.4 SVM with Wavelet Kernels Classification Outcome
CHAPTER 10 Pattern Recognition Methods Employing Morphological and Wavelet Local Maxima Features towards Evaluation of Thyroid Nodules Malignancy Risk in Ultrasonography
Summary
10.1 Materials and Methods
  10.1.1 Patients
  10.1.2 Feature extraction
  10.1.3 Feature selection and classification
10.2 Results and discussion
  10.2.1 SVM & PNN model Evaluation without the presence of Speckle
  10.2.2 SVM & PNN model Evaluation with the presence of Speckle
10.3 Conclusion
CHAPTER 11 Conclusion and Future Work
11.1 Conclusion
11.2 Future Work
REFERENCES
APPENDIX I List of Figures
APPENDIX II List of Tables
APPENDIX III Abbreviations
APPENDIX IV Index of Terms
, . , , , ., , . . , , . . , , . , B-Mode Doppler, . (1mm) [14-28]. , (Fine Needle Aspiration - FNA) [2933]. H , .
, , , [185-190]. , , . - . , : 1. (Wavelet Transform) . 2. .
1. Wavelet Transform (WT)

1.1
speckle, . , , [47,49]. , . . , FNAB [29-33]. . , . . (wavelet transform) , [55]. , . . . - [57,58]. Mallat Zong [62] , . . ( )
-. , . (WT) . Lipschitz [61]. (singularities ) . (singularities) Lipschitz . (singularities) . speckle speckle , (singular) Lipschitz. , (singularities) Lipschitz.
1.2
1.2.1
, speckle . . speckle . speckle [92,93,94]. 1990 / speckle.
[95-100]. , 90 [101-103].
1.2.2
The algorithm comprises the following steps:
1. Dyadic Wavelet Transform: multiscale edge representation [62], computed with the `algorithme à trous` [63].
2. Modulus maxima representation [62,66].
3. Coarse to fine grouping of local maxima: back-propagation tracking of the maxima across adjacent dyadic scales [68,69].
4. Lipschitz regularity calculation [68-71].
5. Inverse Dyadic Wavelet Transform.
1.2.3

The inter-scale wavelet speckle suppression method was compared with: (a) the Adaptive Speckle Suppression Filter (ASSF) [97], (b) soft thresholding [104] and (c) hard thresholding [104], in terms of the speckle index (SI) and the signal-to-mean-square-error ratio (S/mse) (Table 1.1).

Table 1.1 Speckle suppression indices for the compared methods

Method                                     SI           S/mse (dB)           (dB)
ASSF                                       14% / 12%    10.5664 / 12.7728    0.3592 / 0.1458
Soft Thresholding                          19% / 18%    14.5472 / 16.7709    0.7063 / 0.7453
Hard Thresholding                          16% / 15%    11.4083 / 15.1853    0.7836 / 0.8013
Inter-scale wavelet speckle suppression    23% / 21%    16.2937 / 18.3241    0.8490 / 0.8485
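The two named indices of Table 1.1 can be illustrated with a minimal sketch. The definitions used here are common textbook forms and are assumptions, not necessarily the exact definitions adopted in Chapter 7: the speckle index is taken as the mean ratio of local standard deviation to local mean, and S/mse as ten times the base-10 logarithm of signal energy over error energy. The function names, the 3x3 window and the synthetic test image are illustrative only.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def speckle_index(image, window=3):
    """Mean of (local std / local mean) over all window x window patches (assumed definition)."""
    img = np.asarray(image, dtype=float)
    patches = sliding_window_view(img, (window, window))
    local_mean = patches.mean(axis=(-1, -2))
    local_std = patches.std(axis=(-1, -2))
    valid = local_mean > 0  # skip patches where the ratio is undefined
    return float(np.mean(local_std[valid] / local_mean[valid]))


def signal_to_mse_db(reference, processed):
    """10*log10(sum(reference^2) / sum((reference - processed)^2)) (assumed definition)."""
    ref = np.asarray(reference, dtype=float)
    err = ref - np.asarray(processed, dtype=float)
    return float(10.0 * np.log10(np.sum(ref ** 2) / np.sum(err ** 2)))


# Hypothetical data: a uniform region corrupted by multiplicative noise
rng = np.random.default_rng(0)
uniform = np.full((32, 32), 100.0)
speckled = uniform * (1.0 + 0.2 * rng.standard_normal(uniform.shape))

si_uniform = speckle_index(uniform)    # no local variation in a constant image
si_speckled = speckle_index(speckled)  # larger, because of the noise
smse_db = signal_to_mse_db(uniform, speckled)
```

A lower speckle index after filtering indicates smoother homogeneous regions, while a higher S/mse indicates that the processed image stays closer to the reference.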
1.2.4
1.3
1.3.1
- , , , , . (ROI). : () [118,123], () [124130], () (deformable and active models) [131,141], () [142145], () [146159].
1.3.2
The method comprises three steps, the last of which employs the Hough transform:
1. Edge detection procedure
2. Multi-scale structure model
3. Nodules boundary extraction
1.3.3
, , 40 40 40 65 . . ( ground truth) (OB1 OB2) , (roundness), (concavity) (Mean Absolute Distance).
, 90,14 89,33%. (91,83%) ( 2.1). 1.2 (AU) (OB1, OB2) , , MAD% .

AUOB1 88,83 AUOB2 87,58 AUOB1 91,77 AUOB2 91,12 AUOB1 89,21 AUOB2 89,08 MAD% AUOB1 90,77 AUOB2 89,53
inter-observer kappa 0,83. , . .
1.3.4
2.
2.1
(). 95% . . : (Papillary carcinoma 75%), (follicular carcinoma 15%, Doppler ), (Medullary carcinoma 7%), (Anaplastic carcinoma 3%) . , 95%. . , , , [1-12]. , ( ) . , () . , , ( , -), [1]. () Fine Needle Aspiration , . FNA . . 85%.
: () , () ( ), . . . [164,165], [166] (discriminant analysis) [164-166]. , 83,9% [166] 85% [164]. FNA EUROMEDICA. : 1. . 2. - . . 3. (, ) ( , [170] [73,74] .. [70, 83, 170]. 4. ( ) , (SVM) (PNN).
2.2
2.2.1
120 , 30 75 2003 2004 EUROMEDICA, 4, , . HDI 3000 ATLPHILPS (PHILIPS, USA). . . 256 , (Linear Array) 7 MHz (5-9 z - Broadband). , . Video Video, Miro PCTV (Pinnacle Systems), /. . . , 26 co-occurrence [ 73 ] 10 run-length [ 74 ].
2.2.2
leave-one-out. leave-one-out . . , , . : (minimum distance - MD), (least square minimum distance LSMD), Bayesian, (artificial neural networks MLP),[72,83,84,170] . XIII
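Support Vector Machines (SVMs) [87,88]. SVMs rely on two ideas: (1) mapping the input patterns into a feature space and (2) maximizing the margin of the separating boundary. The discriminant function of the SVMs is:

$$g(x) = \mathrm{sign}\left( \sum_{i=1}^{N} \alpha_i \, y_i \, K(x, x_i) + b \right)$$

where $x$ is the input pattern vector, $x_i$ is the $i$-th support vector, $y_i \in \{-1,+1\}$ is its class label, $\alpha_i$ are the weight coefficients, $b$ is the bias and $K(x, x_i)$ is the kernel function (satisfying Mercer's conditions [177]). The polynomial kernel of degree $d$ is:

$$K(x, x_i) = \left( x^{T} x_i + 1 \right)^{d}$$

and the Gaussian Radial Basis kernel is:

$$K(x, x_i) = \exp\left( -\frac{\lVert x - x_i \rVert^{2}}{2\sigma^{2}} \right)$$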
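The SVM discriminant function with the polynomial and Gaussian radial basis kernels can be sketched as follows. This is only an illustration of how the formula is evaluated: the support vectors, labels, weights and bias below are made-up values, not a trained model from this thesis, and the function names are hypothetical.

```python
import numpy as np


def polynomial_kernel(x, xi, d=3):
    """Polynomial kernel (x^T xi + 1)^d."""
    return (np.dot(x, xi) + 1.0) ** d


def rbf_kernel(x, xi, sigma=1.0):
    """Gaussian radial basis kernel exp(-||x - xi||^2 / (2 sigma^2))."""
    diff = np.asarray(x, dtype=float) - np.asarray(xi, dtype=float)
    return np.exp(-np.dot(diff, diff) / (2.0 * sigma ** 2))


def svm_decision(x, support_vectors, labels, alphas, b, kernel):
    """Evaluate g(x) = sign(sum_i alpha_i * y_i * K(x, x_i) + b)."""
    total = sum(a * y * kernel(x, sv)
                for a, y, sv in zip(alphas, labels, support_vectors))
    # np.sign returns 0 when the pattern lies exactly on the decision boundary
    return int(np.sign(total + b))


# Hypothetical two-feature model with one support vector per class
support_vectors = [np.array([1.0, 1.0]), np.array([-1.0, -1.0])]
labels = [+1, -1]
alphas = [0.5, 0.5]
bias = 0.0

class_a = svm_decision(np.array([0.8, 1.2]), support_vectors, labels, alphas, bias, rbf_kernel)
class_b = svm_decision(np.array([-1.0, -0.5]), support_vectors, labels, alphas, bias, polynomial_kernel)
```

Only the support vectors enter the sum, which is why the number of support vectors is reported alongside the accuracy of each SVM model.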
2.2.3
SVM- 3 , (mean value) sum variance co-occurrence 96.7%. wavelet 3 rd ( sum variance).
MLP Run Length Non Uniformity run-length 95.0%. (QLSMD, QB) , sum variance co-occurrence Run Length Non Uniformity run-length 92.5% ( 3.1). 2.1 leave-one-out re-substitution.
LOO+ (%) Resub.* (%) 89.2 93.3 91.7 96.7 94.2 94.2 95.8 97.5 95.0 95.0 92.5 96.7 98.3 99.2 97.5 98.3 100.0 99.2 96.6 96.7 95.8 NSV** 17 13 12 15 15 9 10 10
SVM 1 SVM 2 SVM 3 SVM 4 SVM RBF SVM Daubechies Wavelet SVM Coiflet Wavelet SVM Symmlet Wavelet MLP QLSMD

Bayesian 92.5 + Leave-one-out * Re-substitution ** re-substitution
2.2.4 :
( sum variance) (, ) [17,18,25,27]. , , (sum variance) . co-occurrence [164] , 85%. (83,9%) (discriminant function) [166]. XV
SVM . , , SVM, . .
2.3
2.3.1
86 , 2005 2006 EUROMEDICA, 4, , . , . 12 . . 8 . .. , speckle, . speckle .
2.3.2
ROC leave-one-out. , , . ROC
(AUC) : (Support Vector Machines-SVMs) . (Probabilistic Neural Networks-P): PNN Bayesian , . Bayesian , PNN (Parzen). PNN feed-forward . ) : , ) : , . . ) : ) : PNN . PNN j [84,85,86]:
$$g_j(x) = \frac{1}{(2\pi)^{p/2} \, \sigma^{p} \, N_j} \sum_{i=1}^{N_j} \exp\left( -\frac{\lVert x - x_i \rVert^{2}}{2\sigma^{2}} \right)$$
x , xi i , Nj j, , p . . 2.3.2.1 speckle SVM, (AUC 0,96) :
(Smoothness) (Symmetry) , 2 . PNN (AUC 0.91) "fractal dimension, (Concavity) " ( 2.2). 2.2 ROC SVM PNN
Model      AUC (lower - upper 95.0% confidence limit)   Sensitivity (SN)   Specificity (SP)   Likelihood ratio SN/(1-SP)   Number of support vectors
SVM 1      0.88 (0.69 - 0.96)                           0.93               0.90               9.3                          13
SVM 2      0.96 (0.84 - 0.99)                           0.93               0.98               46.5                         -
SVM 3      0.92 (0.78 - 0.97)                           0.87               0.93               12.4                         10
SVM 4      0.89 (0.69 - 0.97)                           0.93               0.93               13.3                         12
SVM RBF    0.91 (0.79 - 0.96)                           0.93               0.96               23.2                         17
PNN        0.91 (0.85 - 0.95)                           0.96               0.94               16                           -
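The likelihood ratio column of the table follows directly from the sensitivity and specificity columns through SN/(1-SP). A minimal sketch of that arithmetic is given below; the function name is hypothetical, and the two input pairs are the SVM 1 and PNN rows of Table 2.2.

```python
def positive_likelihood_ratio(sensitivity, specificity):
    """Likelihood ratio SN / (1 - SP), as used in the table header."""
    if not 0.0 <= sensitivity <= 1.0 or not 0.0 <= specificity < 1.0:
        raise ValueError("sensitivity must be in [0, 1] and specificity in [0, 1)")
    return sensitivity / (1.0 - specificity)


# (SN, SP) pairs copied from Table 2.2
rows = {"SVM 1": (0.93, 0.90), "PNN": (0.96, 0.94)}
ratios = {name: round(positive_likelihood_ratio(sn, sp), 1)
          for name, (sn, sp) in rows.items()}
```

For these two rows the rounded results are 9.3 and 16.0, matching the tabulated values. Since the ratio grows quickly as the specificity approaches 1, small differences in specificity dominate the comparison between models.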
2.3.2.2 speckle SVM 3 (AUC 0.88) leave-one-out " (Symmetry)". PNN (AUC 0.86) : : (concavity) ( 2.3).
2.3 ROC SVM PNN

Model      AUC (lower - upper 95.0% confidence limit)   Sensitivity (SN)   Specificity (SP)   Likelihood ratio SN/(1-SP)   Number of support vectors
SVM 1      0.83 (0.63 - 0.93)                           0.74               0.90               7.4                          11
SVM 2      0.86 (0.68 - 0.94)                           0.74               0.91               8.3                          11
SVM 3      0.88 (0.68 - 0.97)                           0.93               0.96               23.2                         10
SVM 4      0.78 (0.52 - 0.86)                           0.70               0.85               4.6                          13
SVM RBF    0.79 (0.66 - 0.91)                           0.74               0.87               5.6                          13
PNN        0.86 (0.74 - 0.90)                           0.84               0.88               7                            -
2.3.3
speckle, SVM PNN . (concativity) () () (MCs) . , , , . speckle, . (AUCSVM 0.88, AUCPNN 0.86), .
SVM PNN. , ( , , .....) . , , , , . , .
CHAPTER 1
Introduction
1.1 Power of Ultrasound
The establishment of ultrasonography (US) as a leading tool in the majority of medical applications worldwide is directly associated with the evolution of the imaging technology employed in medicine and biology. The design and implementation of novel, state-of-the-art ultrasound systems has allowed US to enter medical fields such as orthopedics, intensive care (I.C.U.) and diabetology, in which US examinations were not possible a few years ago. This has radically changed the practice of prognostic medicine. In fact, ultrasonography is recognized as a fundamental technique in the prevention, diagnosis and therapy of a constantly broadening spectrum of diseases. US is available to every physician wherever medicine is practiced: from a small private clinic with a portable unit to a general hospital with an expensive four-dimensional system, ultrasound proves its efficiency and accuracy on a daily basis.
1.2 Need for Image Processing and Analysis Methods in Thyroid Ultrasound Images
Despite its non-invasive nature, low cost and easy-to-use real-time application, US imaging suffers from the presence of a granular pattern termed speckle. Speckle is the result of constructive and destructive interference phenomena, which occur when the distances between tissue scatterers are smaller than the axial resolution limit of the system. It distorts anatomic structures and introduces random fluctuations in the intensity profile of the image. If an image is corrupted by speckle, there are no regions of approximately constant intensity, even if the reflecting tissue is entirely uniform. In addition, several US properties can produce misleading effects in the ultrasound image: reverberation, shadowing, refraction, and side and grating lobes deteriorate the resolution of the US image and thus degrade its overall quality.
The aforementioned problems, arising from the complex nature of US imaging, make speckle suppression and accurate boundary detection important steps towards improving image quality and the diagnostic procedure in medical ultrasound imaging. The sonographic evaluation of malignancy risk in thyroid nodules is a typical example of how ultrasonography has gained the confidence of the medical community over the past years. The medical interest in deciding the necessity of biopsy on the basis of sonographic criteria is extremely high. New and more detailed features derived from US examinations of thyroid nodules are being investigated in order to decide whether or not to proceed to Fine Needle Aspiration Biopsy (FNAB). Features such as the nodule's echo-structure and echogenicity (solid or colloid; hyper-, hypo- or iso-echogenic), its shape (round, egg-shaped, wide or tall), the degree of irregularity of its boundary (from normal to highly irregular borderline) and its calcification pattern (massive, snow-storm, etc.) are employed towards an improved prognosis. The increasing amount of information provided by high-resolution US systems makes the clinical decision procedure rather difficult; therefore, the quantification of sonographic findings and the implementation of computer-based algorithms could be of assistance as a second-opinion tool.
1.3 Aims, Contributions and Novelties of Thesis
The aim of the present thesis was the design and implementation of new image processing and analysis methods for thyroid ultrasound images. The research comprised two main directions towards the optimization of thyroid ultrasonography, namely the design and implementation of:
1. Wavelet-based image processing methods for speckle suppression and thyroid nodule segmentation.
2. Image analysis methods for evaluating the malignancy risk factor of thyroid nodules.
1.3.1 Wavelet-Based Image Processing
Most contemporary vision algorithms cannot perform efficiently on image intensity values taken directly from the initial gray-level representation. These intensity values are highly redundant, while the amount of important information within the image may be small. Viewing the ultrasound intensity values in a different representation can reveal several significant features that are not easily distinguishable in the original image. The wavelet-based transformation from the initial ultrasound image representation into a feature representation explicitly reveals the useful image features without loss of essential image information, reduces the redundancy of the image data and eliminates irrelevant information [55]. When an image contains meaningful structures of various sizes, the scale parameter should vary. Edges at different scales correspond to different physical entities: large objects are well represented at large scales, whereas small structures are localized at small scales. Multiresolution wavelet analysis captures the information content of an image by detecting sharp variations (edges) at different scales, that is, by examining the neighborhood of each edge with a varying neighborhood size [57,58].
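The idea of observing edges at several scales can be illustrated on a one-dimensional signal. The sketch below is not the dyadic wavelet transform used in this thesis: as a simplifying assumption it uses a derivative-of-Gaussian filter at dyadic scales as a stand-in for the wavelet, and it marks edges as local maxima of the modulus of the filter response. The function names, the threshold of 10% of the maximum response and the test signal are illustrative only.

```python
import numpy as np


def gaussian_derivative_kernel(scale):
    """First derivative of a Gaussian whose standard deviation is `scale` samples."""
    half = int(4 * scale)
    t = np.arange(-half, half + 1, dtype=float)
    g = np.exp(-t ** 2 / (2.0 * scale ** 2))
    g /= g.sum()
    return -t / scale ** 2 * g


def multiscale_edges(signal, n_scales=3):
    """Return {scale: indices of local maxima of |response|} for scales 2, 4, 8, ..."""
    edges = {}
    for j in range(1, n_scales + 1):
        scale = 2 ** j
        response = np.convolve(signal, gaussian_derivative_kernel(scale), mode="same")
        modulus = np.abs(response)
        # interior local maxima that exceed 10% of the strongest response
        is_max = ((modulus[1:-1] > modulus[:-2])
                  & (modulus[1:-1] >= modulus[2:])
                  & (modulus[1:-1] > 0.1 * modulus.max()))
        edges[scale] = np.flatnonzero(is_max) + 1
    return edges


# Hypothetical test signal: a box pulse with edges near samples 96 and 160
signal = np.zeros(256)
signal[96:160] = 1.0
edges = multiscale_edges(signal)
```

In this example the same two edges are found at every scale; in a noisy signal, fine scales would additionally contain many maxima produced by noise, which is what makes the behavior of maxima across scales informative.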
Since edges are efficient descriptors of images, the multiresolution formalism can detect and record them for edge detection and segmentation purposes. Wavelet theory offers a mathematical framework for multiscale processing and relates the behavior of edges across scales to local image properties. Mallat and Zhong [62] have shown that a multiscale edge representation can provide a complete and stable representation of a signal, which in turn means that the whole signal information is carried by those multiscale edges. Consequently, image processing methods can operate on the edge representation rather than directly on the intensity values. Isolated edges or contours (groups of edges with similar properties) correspond to sharp contrasts and can be detected from the local maxima of the wavelet transform. The multiscale local maxima representation is a reorganization of the image information that provides a higher-level description of structures. A remarkable property of the wavelet transform is its ability to characterize the local regularity of image features such as discontinuities and sharp cusps. In mathematics, this local regularity is often measured with Lipschitz exponents [61]. The multiscale edge representation makes it possible to detect all the singularities of an image (points of sudden localized change, which often indicate its most important features) and to measure their Lipschitz regularity. In this thesis, the local behavior of image singularities in thyroid ultrasound images has been investigated in terms of Lipschitz regularity. The wavelet transform modulus maxima created by noise singularities behave differently from those that are mainly affected by image singularities.
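How a Lipschitz exponent can be read from the evolution of modulus maxima across scales may be sketched as follows. The sketch assumes the normalization in which the modulus along one maxima chain behaves as $|Wf(2^{j}, x)| \approx K (2^{j})^{\alpha}$, so that $\alpha$ is the slope of $\log_2 |Wf|$ against $j$; the function name and the amplitude values are hypothetical and serve only to illustrate the sign of the exponent.

```python
import numpy as np


def lipschitz_exponent(maxima_amplitudes):
    """Slope of log2(amplitude) versus scale index j for scales 2^1, 2^2, ...

    Assumes the decay law |Wf(2^j, x)| ~ K * (2^j)^alpha along one maxima chain.
    """
    amplitudes = np.asarray(maxima_amplitudes, dtype=float)
    j = np.arange(1, amplitudes.size + 1)
    slope, _intercept = np.polyfit(j, np.log2(amplitudes), 1)
    return float(slope)


# Made-up maxima chains, finest scale first
step_like = [1.0, 1.0, 1.0, 1.0]       # amplitude constant across scales
noise_like = [1.0, 0.5, 0.25, 0.125]   # amplitude halves at each coarser scale
smooth_like = [1.0, 2.0, 4.0, 8.0]     # amplitude doubles at each coarser scale

alpha_step = lipschitz_exponent(step_like)
alpha_noise = lipschitz_exponent(noise_like)
alpha_smooth = lipschitz_exponent(smooth_like)
```

Under this convention, maxima whose amplitude grows towards coarser scales yield a positive exponent, while maxima that decay towards coarser scales yield a negative one, which is the distinction exploited to separate structure from noise.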
The following observation has been made: the additive, signal-dependent noise random field obtained from the speckle model implemented in this thesis is a distribution that is almost everywhere singular or discontinuous, with non-positive Lipschitz exponents. In contrast, the meaningful singularities that arise from non-irregular texture are sharp cusps with positive Lipschitz exponents. As an application, an algorithm has been developed that removes speckle noise from ultrasound images by analyzing the evolution of the wavelet transform modulus maxima across scales. This inter-scale search is implemented by zooming into edges, beginning at low resolution (large scales) and adaptively increasing the resolution (small scales) to acquire the necessary details. In the resulting de-speckled ultrasound image, the contrast of various structures against their surroundings was enhanced without blurring. In addition, according to two independent observers, structures that are not easily distinguishable by the human eye were revealed. The two-microlocalization properties [69] of these edges, which characterize the singularities, have also been utilized in the subsequent segmentation algorithm. The multiscale information acquired by the aforementioned study has been integrated into the nodule boundary detection. A multiscale hybrid model has been introduced that employs wavelet local maxima, after regularity estimation, for object identification in order to extract the boundary of the thyroid nodule. The proposed model converts the multiscale local maxima representation into a multiscale object representation. Each object that occupies a physical region is detected by means of local maxima adjacency across all available scales. The multiscale structure representation associates an anatomical object in the image with a volume in the multiscale edge transform. This structure representation serves as input to a constrained Hough transform for nodule detection. The segmentation method offers the physician an additional tool for shape-based thyroid nodule categorization and improves accuracy during the fine needle aspiration procedure.
1.3.2 Image Analysis
Finally, various pattern recognition methods for the automatic discrimination of thyroid nodules into high and low risk of malignancy have been investigated. Pattern recognition algorithms such as Support Vector Machines (SVMs), the Probabilistic Neural Network (PNN), the classical quadratic least squares minimum distance (QLSMD), the quadratic Bayesian (QB) and the multilayer perceptron (MLP) classifiers have been implemented throughout this thesis. This research comprised two independent studies: the first employed textural features, and the second employed morphological and wavelet-based features derived from the segmentation procedure. The texture-based classification scheme quantified several textural parameters that physicians evaluate visually when assessing the risk factor of thyroid nodules, and it achieved high classification rates. These parameters mainly involved echogenicity relative to the surrounding tissue, the presence of calcifications within the nodule and increased vascularity. The second study employed quantified morphological and wavelet-based features in order to evaluate the malignancy risk factor of thyroid nodules in ultrasound. In this research, a novel approach utilized the image singularities to evaluate the effect of speckle on the classification procedure. In a parallel study (with and without speckle), the pattern recognition algorithms employed various wavelet features so as to evaluate the effect of speckle on discrimination. The conclusion is that, although the effect of speckle noise cannot be easily evaluated in the original US image, its presence had a negative influence at the wavelet feature level.
The quantification of various sonographic observations (such as echogenicity, the degree of boundary irregularity, non-circular boundary shape and the presence of micro-calcifications) led to a more objective evaluation of the necessity of biopsy and could be of assistance in the decision-making procedure.
1.4 Publications
The research work of this thesis has resulted in, or contributed to, publications and presentations in international journals and conferences.
1.4.1. Publications in peer reviewed international journals
Journal Published Papers:
1. S. Tsantis, D. Cavouras, I. Kalatzis, N. Piliouras, N. Dimitropoulos, and G. Nikiforidis: Development of a support vector machine-based image analysis system for assessing the thyroid nodule malignancy risk on ultrasound, Ultrasound in Medicine and Biology, Vol. 31, No. 11, pp. 1451-1459, 2005.
2. S. Tsantis, N. Dimitropoulos, D. Cavouras and G. Nikiforidis: A Hybrid Multi-Scale Model for Thyroid Nodule boundary detection on Ultrasound Images, Computer Methods and Programs in Biomedicine, Volume 84, Issues 2-3, Pages 86-98, 2006.
3. S. Tsantis, N. Dimitropoulos, M. Ioannidou, D. Cavouras and G. Nikiforidis: Inter-Scale Wavelet Analysis for Speckle Reduction in Thyroid Ultrasound Images, Computerized Medical Imaging and Graphics, Volume 31, Issue 3, Pages 117-127, 2007.
Journal Submitted Papers:
4. S. Tsantis, N. Dimitropoulos, D. Cavouras, and G. Nikiforidis: Pattern Recognition Methods Employing Morphological and Wavelet Local Maxima Features towards Evaluation of Thyroid Nodules Malignancy Risk in Ultrasonography, submitted to Ultrasound in Medicine and Biology, April 2007.
1.4.2. Publications in International Conference Proceedings
1. S. Tsantis, N. Dimitropoulos, D. Cavouras and G. Nikiforidis: Morphological Features towards Ultrasound Thyroid Nodules Malignancy Evaluation, 2nd IC-EpsMsO, Athens, 4-7 July, 2007.
2. S. Tsantis, N. Dimitropoulos, D. Cavouras and G. Nikiforidis: 1st Order vs. 2nd Order Derivatives towards Wavelet-Based Speckle Suppression in Ultrasound Images, 2nd IC-EpsMsO, Athens, 4-7 July, 2007.
3. S. Tsantis, D. Gklotsos, I. Kalantzis, N. Piliouras, P. Spyridonos, N. Dimitropoulos, G. Nikiforidis and D. Cavouras: Computer Assisted Diagnosis of Thyroid Nodules Malignancy Risk, European Congress of Radiology, 2005.
4. S. Tsantis, I. Kalantzis, N. Piliouras, D. Cavouras, N. Dimitropoulos, G. Nikiforidis: Computer-aided characterization of thyroid nodules by image analysis methods, Proceedings of the International Conference of Computational Methods in Sciences and Engineering 2003 (ICCMSE 2003), pp. 639-642, September 2003.
5. S. Tsantis, D. Cavouras, N. Dimitropoulos, G. Nikiforidis: Denoising sonographic images of thyroid nodules via singularity detection employing the wavelet transform modulus maxima, Proceedings of the International Conference of Computational Methods in Sciences and Engineering 2003 (ICCMSE 2003), pp. 643-646, September 2003.
6. S. Tsantis, N. Piliouras, N. Dimitropoulos, D. Cavouras and G. Nikiforidis: Evaluation of Support Vector Machines Wavelet kernels for the automatic categorization of thyroid nodules, 4th European Symposium on Biomedical Engineering, Patra, 25-27 June 2004.
7. Stavros Tsantis, Dimitris Glotsos, Giannis Kalatzis, Nikos Dimitropoulos, George Nikiforidis, Dionisis Cavouras: Automatic contour delineation of thyroid nodules in ultrasound images employing the wavelet transform modulus-maxima chains, 1st International Conference From Scientific Computing to Computational Engineering, IC-SCCE, Athens, 8-10 September, 2004.
8. Stavros Tsantis, Dimitris Glotsos, Panagiota Spyridonos, Giannis Kalatzis, Nikos Dimitropoulos, George Nikiforidis, Dionisis Cavouras: Improving Diagnostic Accuracy in the classification of thyroid cancer by combining quantitative information extracted from both ultrasound and cytological images, 1st International Conference From Scientific Computing to Computational Engineering, IC-SCCE, Athens, 8-10 September, 2004.
1.4.3. Contributions in Publications in International Conference Proceedings
1. Glotsos D., Spyridonos P., Tsantis S., Kalatzis I., Dimitropoulos N., Nikiforidis G., Cavouras D.: Unsupervised Segmentation of Fine Needle Aspiration Nuclei Images of Thyroid Cancer using a Support Vector Machine Clustering Methodology, 1st International Conference From Scientific Computing to Computational Engineering, IC-SCCE, Athens, 8-10 September, 2004.
2. D. Glotsos, S. Tsantis, J. Kybic, I. Kalatzis, P. Ravazoula, N. Dimitropoulos, G. Nikiforidis, D. Cavouras: Pattern recognition based segmentation versus wavelet maxima chain edge representation for nuclei detection in microscopy images of thyroid nodules, 3rd European Medical and Biological Engineering Conference, Prague, Czech Republic, 20-25 November, 2005.
3. Glotsos D., Spyridonos P., Ravazoula I., Kalatzis I., Tsantis S., Nikiforidis G., Cavouras D.: Evaluating the Generalization Performance of a Support Vector Machine based Classification Methodology in Brain Tumor Astrocytomas Grading, 1st International Conference From Scientific Computing to Computational Engineering, IC-SCCE, Athens, 8-10 September, 2004.
4. Contribution in Dimitris Glotsos and Jan Kybic: Development of a wavelet-assisted edge-detection algorithm for boundary detection of fine needle aspiration images of thyroid nodules. Research Report CTU-CMP-2005-17, Center for Machine Perception, K13133 FEE Czech Technical University, Prague, Czech Republic, March 2005.
1.5 Dissertation Layout
In Chapter 2, a background on thyroid physiology and anatomy is provided, together with the grading categories of solitary thyroid nodules. In Chapter 3, a theoretical background on the physics and instrumentation of ultrasound is presented, followed by the quality control procedure and the data acquisition system employed in this thesis. Chapter 4 provides an overview of classic wavelet theory and wavelet transforms. Emphasis is given to the redundant dyadic wavelet transform, since it is the basis of the wavelet-based techniques used in this thesis. In Chapter 5, regularity theory and its relation to the wavelet transform modulus maxima are described. Chapter 6 describes the fundamentals of pattern recognition theory and various classification algorithms, with a thorough treatment of feature generation and selection methods.
In Chapter 7, a survey of speckle suppression methods in ultrasonography is presented first. Then, a new wavelet-based method for speckle reduction in thyroid ultrasound imaging is explained in detail. Chapter 8 begins with an extensive review of segmentation algorithms in ultrasound imaging and then presents a novel hybrid model for the boundary extraction of thyroid nodules in ultrasound. In Chapter 9, an SVM model that employs several textural characteristics of the sonographic image is designed and implemented in order to assess the thyroid nodule malignancy risk factor. Chapter 10 presents a pattern recognition study based on two well-known classification algorithms (SVMs and PNN) that employ morphological features in conjunction with various wavelet local maxima features derived directly from the segmentation procedure. In Chapter 11, a general conclusion and some future perspectives of the present thesis are provided. Appendices I, II, III and IV contain the List of Figures, the List of Tables, the Abbreviations and the Index of Terms, respectively.
1.6 Research Funding

The present research was funded by the Operational Program for Educational and Vocational Training II (EPEAEK II).
CHAPTER 2
Thyroid Gland
2.1 Introduction
The thyroid gland is a brownish-red, highly vascular organ located in the front of the lower neck, between the lower part of the larynx and the upper part of the trachea. The gland varies from an H to a U shape and is formed by two elongated lateral lobes. Each lobe is about 4 cm long and 1-2 cm wide, and the two lobes are linked by a median isthmus [1] (Figure 2.1).
Figure 2.1 The thyroid gland. The thyroid gland produces, stores and secrets thyroid hormones, which are peptides containing iodine. The two most important hormones are tetraiodothyronine (thyroxine or T4) and triiodothyronine (T3). They are essential for humans and have many effects on body metabolism, growth, and development. The thyroid glands function is influenced by hormones produced by two organs: 1. The pituitary gland, located at the base of the brain which produces thyroid stimulating hormone (TSH) and, 2. The hypothalamus, a small part of the brain above the pituitary that produces thyrotropin releasing hormone (TRH) (Figure 2.2). Low levels of thyroid hormones in the blood are detected by the hypothalamus and the pituitary. TRH is released, stimulating the pituitary to release TSH. Increased levels of TSH, in turn, stimulate the thyroid to produce more thyroid hormone, thereby returning the level of thyroid hormone in the blood back to normal. The three glands and the hormones produce the "Hypothalamic - Pituitary - Thyroid axis". Once thyroid hormone levels are restored, TSH secretion stabilizes at a high level. [2-4].
Figure 2.2 Schematic representation of the Hypothalamic - Pituitary - Thyroid axis.
2.2 Thyroid Disorders
The enlargement of the thyroid gland is called goitre. Goitre does not always indicate a disease, since thyroid enlargement can also be caused by physiological conditions such as puberty and pregnancy [5,6]. The main functional disorders of the thyroid are:
1. Excessive thyroid hormone production (hyperthyroidism).
2. Decreased thyroid hormone production (hypothyroidism).
The state of normal thyroid function is called euthyroidism. All thyroid disorders are much more common in women than in men. Autoimmune disorders of the thyroid gland are also common. These are caused by abnormal proteins (called antibodies) and white blood cells, which act together to stimulate or damage the thyroid gland. Graves' disease (hyperthyroidism) and Hashimoto's thyroiditis are diseases of this type [7-12].
Graves' Disease: Graves' disease (thyrotoxicosis) is due to a unique antibody called "thyroid stimulating antibody", which stimulates the thyroid cells to grow larger and to produce excessive amounts of thyroid hormones. In this disease, the goitre is due not to TSH but to this antibody.
Hashimoto's Thyroiditis: In Hashimoto's thyroiditis, the goitre is caused by an accumulation of white blood cells and fluid (inflammation) in the thyroid gland. This leads to destruction of the thyroid cells and, eventually, thyroid failure (hypothyroidism). As the gland is destroyed, thyroid hormone production decreases; as a result, TSH increases, making the goitre even larger.
Hyperthyroidism is treated mostly by medical means, but occasionally it may require the surgical removal of the thyroid gland. Sometimes, thyroid enlargement is restricted to one part of the gland. The most common cause of this is a cyst or nodule, which may be benign or malignant. Occasionally there are many nodules.
This so-called "multinodular goitre" is probably caused by mutations of follicular cells. Thyroid nodules are not the expression of a single disease but constitute the
clinical indication of a wide range of different diseases. An initial differentiation among thyroid nodules is based on quantitative criteria. A single nodule is called a solitary nodule, whereas the presence of multiple nodules is often called multinodular goitre. Although as many as 50% of the population will have a nodule somewhere in their thyroid, the vast majority of these are benign. Occasionally, solitary thyroid nodules can take on characteristics of malignancy and require either a needle biopsy or surgical excision.
2.3 Management of solitary nodules
The parameters that must be considered in a clinical decision regarding solitary nodules include the history of the lesion, the age, sex, and family history of the patient, the physical characteristics of the gland, local symptoms, and laboratory evaluation. The age of the patient is an important consideration, since the ratio of malignant to benign nodules is higher in youth and lower in older age. Male sex carries a similar importance. The basic steps towards an efficient management of solitary nodules include [13-16]:
1. Clinical examination
2. Thyroid function tests: TSH, antibodies
3. Ultrasound
4. Thyroid scan
5. Cytology of fine needle aspirate (FNA)
Clinical examination: The physician inspects the neck and feels the thyroid gland (palpation). The size and consistency of the thyroid gland, how painful it is, and the extent to which it may have moved out of position relative to surrounding structures are also assessed by the physician.
Thyroid function tests: The thyroid gland's functionality can be evaluated through a blood test. Measurements of T3, T4 and of the hormones that control thyroid gland activity (the TSH test) are taken and compared with the norm.
Ultrasound: High-resolution ultrasonography (US) can be used to determine the size and presence of nonpalpable nodules as small as 1 mm within the thyroid tissue (Figure 2.3). Furthermore, any solid or cystic components within a thyroid nodule can be detected with high precision [17-27]. In this examination technique, different types of body tissue conduct and reflect sound in different ways. The reflected echo is recorded and displayed as an image of the part of the body from which it returned. Ultrasound has now established itself as a standard means of examination of thyroid gland morphology. In cases where a nodule presents some suspicious US characteristics, a fine needle aspiration biopsy (FNAB) is performed. In a review of published studies, the use of conventional thyroid ultrasonography did not allow accurate discrimination between malignant and benign cases of solitary thyroid nodules. Its main
indications are accurate measurement of size and as a guide for FNAB. However, certain US features such as irregular borders of the nodule, lack of a "halo", echogenicity, evidence of calcium flakes, marginal nodules in a cyst, increased blood flow, and growth on consecutive ultrasounds, are suggestive signs of malignancy.
Figure 2.3 Ultrasonographic examination in the transverse plane of the thyroid containing a solid nodule in the right lobe and a homogeneous appearance on the left lobe.
Thyroid scan: By using thyroid gland scintigraphy, a morphological and functional image of the thyroid gland is produced simultaneously. This means that the way different areas of the thyroid gland are depicted relates to how they are functioning (normally, hyperactively or hypoactively). Such examinations used to be obligatory before radio-iodine treatment of thyroid gland disorders [28,29]. A thyroid scan can differentiate a solitary nodule as a cold (hypo-functioning) nodule or a hot (hyper-functioning) nodule (Figure 2.4).
Figure 2.4 Scintiscans of the thyroid. (a) The scan on the left is normal. (b) A typical scan of a "cold" thyroid nodule failing to accumulate iodide isotope is shown on the right.
The thyroid scan can also provide evidence for a diagnosis in multinodular goitre, in Hashimoto's thyroiditis, and rarely in thyroid cancer when functioning cervical metastases are seen. Malignant tumors usually fail to accumulate iodide to a degree equal to that of the normal gland.
FNAB: Fine needle aspiration biopsy has become the diagnostic tool of choice for the initial evaluation of a solitary thyroid nodule because of its accuracy, safety, and cost effectiveness. In most but not all cases, FNAB is the only non-surgical method which can differentiate malignant from benign nodules [30,31]. Fewer patients have undergone thyroidectomy for benign disease as a result of FNAB, with a resultant decrease in health care costs. Although needle biopsy can be performed easily, consistently obtaining adequate tissue and processing the specimens to achieve accurate cytopathological interpretation requires expertise and experience. The needle is placed into the nodule several times and cells are aspirated into a syringe. The cells are placed on a microscope slide, stained, and examined by a pathologist. A small percentage of FNAs are termed nondiagnostic, which indicates that there is an insufficient number of thyroid cells in the aspirate and no diagnosis is possible. A nondiagnostic aspirate should be repeated.
2.4 Grading
Various alternative classifications of thyroid nodules have been proposed. A summarized classification approach based on cytology findings is illustrated in Figure 2.5 [32,33].
[Figure 2.5 diagram: "Thyroid Nodules" branches into "Benign Nodules" and "Malignant Nodules". Node labels: Simple Cyst, Inflammatory, Focal Hemorrhage, Indeterminate, Colloid Cyst, Follicular Cells (hyperplasia), Non-functioning follicular adenoma, Follicular carcinoma, Papillary Carcinoma, Medullary Carcinoma, Lymphoma, Anaplastic Carcinoma.]
Figure 2.5 Classification of thyroid solitary nodules.
Benign nodules: The great majority of solitary thyroid nodules are benign (>90%). Common types of benign thyroid nodules are simple thyroid cysts, inflammatory cysts and cysts with focal haemorrhage [34].
Adenomatous Hyperplasia: An interesting fact regarding benign nodules is that a certain category (epithelial hyperplastic nodules) can, under particular circumstances, be transformed into malignant ones. The thyroid cells on these aspirates are neither clearly benign nor malignant [45,46]. Twenty-five percent of such suspicious lesions are found to be malignant when these patients undergo thyroid surgery. These are usually follicular cell cancers (Figure 2.6). Therefore, surgery is recommended for the treatment of thyroid nodules from which a suspicious aspiration has been obtained.
Figure 2.6 (a) Thyroid nodule with epithelial hyperplasia, (b) Colloid nodule.
Malignant nodules: Most thyroid cancers are very curable. In fact, the most common types of thyroid cancer (papillary and follicular) are the most curable. In younger patients, both papillary and follicular cancers can be expected to have a better than 97% cure rate if treated appropriately. Both papillary and follicular cancers are typically treated with complete removal of the lobe of the thyroid which harbors the cancer [35-38]. Medullary cancer of the thyroid is significantly less common, but has a worse prognosis. Medullary cancers tend to spread to large numbers of lymph nodes very early on, therefore requiring a much more aggressive operation than the more localized cancers such as papillary and follicular. The least common type of thyroid cancer is anaplastic, which has a very poor prognosis [39,40]. Most primary thyroid lymphomas occur in middle-aged or elderly patients, with a ratio of women to men ranging from 2:1 to 8:1. Patients present with a relatively rapid thyroid enlargement accompanied by hoarseness, dysphagia and/or dyspnea in approximately 25% of cases and cord paralysis in about 17% [41-43]. Anaplastic thyroid cancer tends to be found after it has spread and is not cured in most cases. Often an operation cannot remove the entire tumor [44].
One of the objectives of this thesis is to explore all available ultrasonic characteristics by means of image analysis methods in order to predict the malignancy risk factor, discriminating between high risk nodules (adenomatous or epithelial hyperplasia) and low risk nodules (simple or colloid cysts).
CHAPTER 3
Physics & Instrumentation of ultrasound
3.1 Nature of Ultrasound
A sound or ultrasound (US) wave consists of a mechanical disturbance of a medium (gas, liquid or solid) which passes through the medium at a fixed speed. The rate at which particles in the medium vibrate is the frequency of the sound and is measured in hertz (cycles/second). In medical ultrasound, the disturbance, which is characterized by the local pressure change as the particles of the medium move from their resting positions, originates at a piezoelectric transducer in a probe placed on the skin surface. The transducer (operating as a transmitter) transforms electrical signals into mechanical movement. The same transducer can transform the reflected mechanical vibrations into electrical signals (operating as a receiver). The ultrasound frequencies used in contemporary US systems range from 1 to 20 MHz [47,48].
3.2 Propagation in Tissue
Ultrasound is altered by the tissue through which it passes. At the boundaries between different tissue types, the US beam can be partially reflected, refracted, scattered by small tissue structures or subjected to energy loss by absorption [49,50].
Reflection: When ultrasound is incident on a smooth boundary (interface) between two media, some ultrasound is transmitted through the interface and some is reflected. If the interface is perpendicular to the direction of propagation, the intensity of the reflected ultrasound beam is determined by the mismatch between the acoustic impedances of the two media.
Refraction: The transmitted beam at an interface between media having different speeds of ultrasound deviates from the path of the incident beam, provided the angle of incidence is nonzero. The beam deviation depends on the difference in ultrasound speed (not in impedance), and the refracted beam bends away from the perpendicular if the speed in the second medium is higher than in the first, and vice versa.
Scattering: If ultrasound is incident on a rough surface or on particles whose size is small or comparable with the beam's wavelength, then the ultrasound is scattered in all directions. If the scattering particles are small compared with the wavelength, then the scattered power is proportional to the fourth power of the ultrasound frequency.
Absorption: Ultrasound power is also subjected to absorption, in which the energy of the ultrasound is converted into heat. The loss due to absorption increases with frequency.
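The reflection, refraction and scattering relations above can be illustrated with a short numerical sketch. It assumes the standard normal-incidence intensity reflection coefficient R = ((Z2 - Z1) / (Z2 + Z1))^2 and Snell's law, neither of which is written out in the text, and it uses approximate textbook-order impedance values chosen only for illustration. All function names are hypothetical.

```python
import math

# Approximate characteristic acoustic impedances in MRayl (illustrative
# textbook-order values, not measurements from this thesis).
Z = {"air": 0.0004, "fat": 1.38, "muscle": 1.70, "bone": 7.8}

def intensity_reflection(z1, z2):
    """Fraction of incident intensity reflected at normal incidence."""
    return ((z2 - z1) / (z2 + z1)) ** 2

def refraction_angle_deg(theta_i_deg, c1, c2):
    """Snell's law: sin(theta_t) / sin(theta_i) = c2 / c1."""
    s = math.sin(math.radians(theta_i_deg)) * c2 / c1
    if abs(s) > 1.0:
        return None  # beyond the critical angle: no transmitted beam
    return math.degrees(math.asin(s))

def rayleigh_power_ratio(f1_mhz, f2_mhz):
    """Scattered-power ratio for small scatterers (power ~ frequency**4)."""
    return (f2_mhz / f1_mhz) ** 4

r_soft = intensity_reflection(Z["muscle"], Z["fat"])   # soft tissue / soft tissue
r_bone = intensity_reflection(Z["muscle"], Z["bone"])  # soft tissue / bone
r_air = intensity_reflection(Z["muscle"], Z["air"])    # soft tissue / air
```

With these assumed values the muscle/fat interface reflects only about 1% of the incident intensity, whereas the muscle/air interface reflects almost all of it, which is consistent with the qualitative statements in the text.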
3.3 Pulse Echo Imaging
At a boundary between two tissues, a proportion of the ultrasound passes on and the rest is reflected. The degree of reflection depends on the acoustic impedances of the two tissues, which in turn depend on density and compressibility. A large difference in acoustic impedance (e.g. soft tissue/bone or soft tissue/air interfaces) leads to a high degree of reflection. At the boundary between two different types of soft tissue (e.g. muscle/fat) the degree of reflection is small [51]. In ultrasonic imaging the transducer is periodically driven by an electrical pulse, leading to the transmission of an ultrasound pulse which is received back after reflection or scattering at tissue interfaces. The time of arrival of the echo from a given interface depends on its depth, and the US system employs the time of the echo's arrival after the transmission as an indication of the depth of the interface. Since the amplitude of an echo is determined by the structure and physical composition of the reflector or scatterer, it is used to determine the brightness of the echo in a display.
B Mode: In this imaging technique the ultrasound beam is scanned through the tissue. The echo signals received at each beam position are displayed as spots on the monitor screen, in which the brightness indicates the echo amplitude (grayscale display). The positions of the spots are determined by the orientation of the beam and by the time of arrival of the echoes.
M Mode: The movement of echo-generating tissues can be displayed as a function of time by means of the M-mode display.
Doppler Mode: Movement of reflectors or scatterers also changes the frequency of the received signal. This change from the transmitted signal frequency is known as the Doppler Effect, and the magnitude of the change (Doppler shift) is proportional to the reflector or scatterer velocity. By measuring the Doppler shift, the cyclical variation of blood velocity can be monitored.
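As a rough illustration of the two measurements described above, the sketch below converts an echo arrival time into a reflector depth and a scatterer velocity into a Doppler shift. It assumes a soft-tissue speed of sound of 1540 m/s and the usual Doppler relation shift = 2 f0 v cos(angle) / c; neither value nor formula is given in the text, and the function names are hypothetical.

```python
import math

C_TISSUE = 1540.0  # m/s, conventional soft-tissue average (assumed)

def echo_depth_mm(arrival_time_us, c=C_TISSUE):
    """Reflector depth from the round-trip arrival time of its echo."""
    return c * (arrival_time_us * 1e-6) / 2.0 * 1e3

def doppler_shift_hz(f0_hz, velocity_m_s, angle_deg=0.0, c=C_TISSUE):
    """Doppler shift for a scatterer moving at velocity_m_s.

    angle_deg is the angle between the beam and the direction of motion.
    """
    return 2.0 * f0_hz * velocity_m_s * math.cos(math.radians(angle_deg)) / c

depth = echo_depth_mm(26.0)             # echo received 26 microseconds after transmission
shift = doppler_shift_hz(7e6, 0.5, 60)  # 7 MHz beam, 0.5 m/s flow at 60 degrees
```

The factor of two in both functions reflects the round trip: the pulse travels to the reflector and back, and the moving scatterer shifts the frequency both on reception and on re-emission.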
The integration of real time Doppler instruments produces a color-flow image which is superimposed on the grayscale display.
3.4 Instrumentation
All US examinations throughout this thesis were performed on an HDI-3000 ATL digital ultrasound system (Philips Ultrasound, P.O. Box 3003, Bothell, WA 98041-3003, USA) (Figure 3.1) with a wide band (5-12 MHz) linear probe (L7-4), using various scanning methods such as longitudinal and transverse cross sections of the thyroid gland. The system is located in the Medical Imaging Department, EUROMEDICA Medical Center, 2 Mesogeion Avenue, Athens, Greece.
Figure 3.1 HDI-3000-ATL digital ultrasound system.
The wide-band linear ultrasound transducer has the capacity to resonate at multiple frequencies, which in turn gives the system the ability to acquire a wide range of frequencies (5-9 MHz), in contrast to conventional transducers that detect only the nominal frequency (7 MHz), thus producing US images of high quality (Figure 3.2).
Figure 3.2 US image of the thyroid gland with a cystic nodule.
3.5 Quality Control of the Ultrasound System
Before the acquisition and storage of US images in the computer, a thorough quality control is performed on the system. The quality control procedure utilized a tissue mimicking phantom (RMI 403 LE, GAMMEX RMI, P.O. Box 620327, Middleton, WI 53562-0327, USA) with the same attenuation and speed of sound as human soft tissue and with a uniform scatter distribution
that yields a smooth image texture. The image quality indicators employed in the procedure are presented below:
Depth of Penetration: The point at which usable tissue information disappears, or the maximum depth of penetration, can be defined simply as how far one can see into the phantom. Equipment sensitivity and noise determine the deepest echo signal which can be detected and clearly displayed.
Image Uniformity: Ultrasound systems can produce various image artefacts and non-uniformities which in some cases mask variations in tissue texture. Common non-uniformities are horizontal bands in the image caused by inadequate handling of transitions between focal zones, or vertical bands indicating inactive or damaged transducer elements.
Axial Resolution: Axial resolution describes the scanner's ability to detect and clearly display closely spaced objects that lie on the beam's axis. Using pin targets of decreasing vertical spacing, the system's axial resolution is determined by locating the two resolvable pins with the smallest separation.
Distance Accuracy: Vertical and horizontal distance measurement errors can easily go unnoticed on clinical images. Distance accuracy as a quality indicator is determined by comparing the measured distance between selected pin targets in the phantom with the known distance.
Lateral Resolution: Lateral resolution is the ability to distinguish small adjacent structures perpendicular to the beam's major axis. It is measured indirectly by measuring the width of pin targets at depths corresponding to the near, mid, and far field ranges of the transducer.
Dead Zone: The dead or ring down zone is the portion of the image directly under the transducer where image detail is missing or distorted. The depth of an instrument's dead zone is determined by identifying the shallowest pin target that can be clearly visualized.
All steps carried out during the quality control procedure followed the guidelines given in the user manuals of both the ultrasound system and the phantom manufacturer, and the performance of the system was in accordance with the standard specifications.
3.6 Data Acquisition and Storage
An image processing system for acquisition and storage of US images consists of the US system, a personal computer and an interface between the system and the computer (Figure 3.3). The interface converts analog information into digital data which the computer can process. This takes place in a special piece of hardware, the frame grabber, which also stores the image. Usually the frame grabber package contains a library of often-used routines which can be linked to the user's program.
The frame grabber utilized in the present study is the Miro PCTV (Pinnacle Systems Inc. 280 N.Bernardo Avenue Mountain View, CA 94043) with the BT 848 chipset integrated. The video output of the ultrasound system (HDI 3000) is connected to the frame grabber of the image processing computer (Microsoft PC, PII at 600 MHz with 64 MB of RAM).
Figure 3.3 Image processing system for acquisition and storage of US images.
The BT848 chipset integrates an NTSC/PAL/SECAM composite and S-Video decoder, a scaler, a DMA controller, and a PCI bus master on a single device. It can place video data directly into host memory for video capture applications and into a target video display frame buffer for video overlay applications. The BT848 is designed to efficiently utilize the available 132 MB/s PCI bus. The video stream consumes bus bandwidth at an average data rate of 44 MB/s for full size 768x576 PAL RGB32. Consecutive video frames can be written into the video image buffer (continuous capture mode). The external trigger (FREEZE command) signal temporarily stops this process, thus freezing the video. When the external trigger is disabled, every consecutive frame is again written to the video buffer until the radiologist issues another FREEZE command through the software. The selected US images are captured at frame rate (40 ms, 25 Hz - PAL standard video signal) and converted into JPEG format images with a resolution of 768x576
(full PAL resolution) pixels. The software for acquisition and storage of US images is Icon-Print 2000, which is written in Visual C++ and uses VFW (Video for Windows) technology.
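The 44 MB/s figure quoted for the capture stream can be reproduced from the frame geometry given in the text. The following is a small arithmetic sketch; the variable names are illustrative, and decimal megabytes (10^6 bytes) are assumed.

```python
WIDTH, HEIGHT = 768, 576      # full PAL frame, as stated in the text
BYTES_PER_PIXEL = 4           # RGB32
FRAME_RATE_HZ = 25            # PAL, one frame every 40 ms
PCI_PEAK_MB_S = 132           # PCI bandwidth quoted in the text

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
stream_mb_s = frame_bytes * FRAME_RATE_HZ / 1e6   # decimal megabytes per second
bus_share = stream_mb_s / PCI_PEAK_MB_S           # fraction of the PCI bus used
frame_period_ms = 1000 / FRAME_RATE_HZ
```

Under these assumptions an uncompressed full-frame stream occupies roughly one third of the quoted PCI bandwidth.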
CHAPTER 4
The Wavelet Transform
Summary
This chapter reviews the theory behind wavelets and wavelet transforms. First, a short review of the continuous wavelet transform is given, followed by an extensive study of the dyadic wavelet transform both for one dimensional (1-D) signals and two dimensional (2-D) images. The 2-D redundant dyadic wavelet transform is implemented and utilized throughout this thesis. Special reference is also made to the spline wavelets employed in this thesis. Moreover, the advantages of the redundant dyadic wavelet transform are summarized. The multi-scale edge representation is analyzed in the last part of this chapter.
4.1 Wavelet Theory
Wavelets are an extension of the windowed Fourier analysis of Gabor [52], in which, through a fixed window, a large number of oscillations is used for detecting high frequencies, whereas a small number is used to detect low frequencies. However, in the first case the window is blind to smooth events and in the second case the window will probably miss a brief change. Instead of a fixed window and a variable number of oscillations, Morlet and Grossman [53] employed a mother wavelet which is stretched or compressed to change the size of the window, thus providing a decomposition of the signal at different scales (frequency bands). The dilation of the function called the mother wavelet produces a family of functions. The wavelet transform of a signal is a sequence of signals obtained by the convolution of the signal with the wavelet family. The variation of the wavelets' size due to dilation permits them to adapt automatically to the different components of the signal. A small window (high frequency band) detects rapid high-frequency components and a large window (low frequency band) traces slow low-frequency components. The wavelet is required to satisfy a so-called admissibility condition so that the transform can form a complete and numerically stable representation.
The wavelet transform gives a representation that has good localization in both frequency and space [54-56]. The localization in frequency implies a correspondence between a scale of the wavelet transform and a frequency band. The overall study across all available frequency bands is called multiresolution analysis [57-59]. The wavelet transform is divided into two main categories: the continuous wavelet transform (CWT) in which all values of the
parameters are employed and the discrete wavelet transform (DWT) in which only a discrete set of parameters is considered.
4.2 Continuous Wavelet Transform
The continuous wavelet transform is shift invariant and thus suitable for feature extraction and image analysis methods [62]. The CWT decomposes a signal by means of dilated and translated wavelets. Let the wavelet $\psi(x) \in L^2(\mathbb{R})$ be a function of zero average:
$$\int_{-\infty}^{+\infty} \psi(x)\,dx = 0 \tag{4.1}$$
It is normalized, $\|\psi\| = 1$, and centered in the neighborhood of $x = 0$. The function $\psi(x)$ is used to create a wavelet family by dilating with $s$:
$$\psi_s(x) = \frac{1}{s}\,\psi\!\left(\frac{x}{s}\right) \tag{4.2}$$
All the functions in the wavelet family have the same shape as the wavelet. The continuous wavelet transform of a signal $f \in L^2(\mathbb{R})$ is a family of functions $[W_s f(x)]_{s \in \mathbb{R}^+}$ defined by:
$$W_s f(x) = \psi_s * f(x), \quad s \in \mathbb{R}^+ \tag{4.3}$$
If $\hat{\psi}(\omega)$ is the Fourier transform of $\psi(x)$, then:
$$\hat{W}_s f(\omega) = \hat{\psi}(s\omega)\,\hat{f}(\omega) \tag{4.4}$$
In order for the transform to be invertible, the wavelet $\psi(x)$ must satisfy the admissibility condition [53]:
$$C_\psi = \int_0^{+\infty} \frac{|\hat{\psi}(\omega)|^2}{\omega}\,d\omega < +\infty \tag{4.5}$$
The function $f(x)$ can be reconstructed from its wavelet transform [53]:
$$W^{-1}: \quad f(x) = \frac{1}{C_\psi} \int_0^{+\infty} \bar{\psi}_s * W_s f(x)\,\frac{ds}{s} \tag{4.6}$$
where $\bar{\psi}_s(x) = \psi_s(-x)$. The admissibility condition also ensures that the wavelet transform is an isometry:
$$\|f(x)\|^2 = \frac{1}{C_\psi} \int_0^{+\infty} \|W_s f(x)\|^2\,\frac{ds}{s} \tag{4.7}$$
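The dilated wavelet family and the convolution that define the continuous wavelet transform can be sketched numerically as follows. The sketch assumes a first derivative of a Gaussian as the example wavelet, which is an illustrative choice and not the spline wavelet used later in this thesis, and it samples only three scales. The names `dgauss` and `cwt_row` are hypothetical.

```python
import numpy as np

def dgauss(x):
    """Example wavelet: first derivative of a unit Gaussian (zero average)."""
    return -x * np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)

def cwt_row(f, s, dx=1.0):
    """Samples of W_s f = f * psi_s with psi_s(x) = psi(x / s) / s."""
    half = int(np.ceil(8 * s / dx))            # truncate psi_s to +/- 8 s
    x = np.arange(-half, half + 1) * dx
    psi_s = dgauss(x / s) / s
    return np.convolve(f, psi_s, mode="same") * dx

# Test signal: a unit step edge at sample 256.
f = np.zeros(512)
f[256:] = 1.0
scales = [2.0, 4.0, 8.0]
W = np.array([cwt_row(f, s) for s in scales])

# Position of the largest modulus at each scale, ignoring the 64 samples
# near each border that are affected by the zero padding of np.convolve.
edge_positions = [int(np.argmax(np.abs(row[64:-64]))) + 64 for row in W]
```

At every scale the modulus of the transform peaks at the step, while a constant signal gives a zero response away from the borders because the wavelet has zero average.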
Equation 4.7 implies that the continuous wavelet transform is a complete and numerically stable representation.
4.3 Redundant Dyadic Wavelet Transform (1-D)
In the translation-invariant dyadic wavelet transform the scale parameter $s$ is discretized dyadically ($[2^j]_{j \in \mathbb{Z}}$) to simplify the numerical calculations, while the spatial parameter is continuous [60,61]. Let $\psi(x) \in L^2(\mathbb{R})$ be a wavelet whose average is zero. The wavelet family obtained by dilating with $s = 2^j$ is:
$$\psi_{2^j}(x) = \frac{1}{2^j}\,\psi\!\left(\frac{x}{2^j}\right) \tag{4.8}$$
The dyadic wavelet transform of a function $f(x) \in L^2(\mathbb{R})$, at a given scale $2^j$ and at the position $x$, is obtained by the convolution of $f(x)$ with the wavelet family:
$$W_{2^j} f(x) = f * \psi_{2^j}(x) \tag{4.9}$$
We refer to the dyadic wavelet transform as the sequence of functions:
$$Wf = \left(W_{2^j} f(x)\right)_{j \in \mathbb{Z}} \tag{4.10}$$
where $W$ is the dyadic wavelet transform operator.
In order to study the completeness and stability of the dyadic wavelet transform, we denote the Fourier transform of $W_{2^j} f(x)$ as:
$$\hat{W}_{2^j} f(\omega) = \hat{f}(\omega)\,\hat{\psi}(2^j \omega) \tag{4.11}$$
Given that there are two strictly positive constants $A$ and $B$ such that:
$$\forall \omega \in \mathbb{R}, \quad A \le \sum_{j=-\infty}^{+\infty} |\hat{\psi}(2^j \omega)|^2 \le B \tag{4.12}$$
it is ensured that the whole frequency axis is covered by dilations of $\hat{\psi}(\omega)$ by $(2^j)_{j \in \mathbb{Z}}$, so that $\hat{f}(\omega)$ and consequently $f(x)$ can be recovered from its dyadic wavelet transform. The reconstructing wavelet $\chi(x)$ is any function whose Fourier transform satisfies:
$$\sum_{j=-\infty}^{+\infty} \hat{\psi}(2^j \omega)\,\hat{\chi}(2^j \omega) = 1 \tag{4.13}$$
If equation (4.12) is valid, an infinite number of functions $\chi(x)$ exist that satisfy equation (4.13). The inverse dyadic wavelet transform that recovers $f(x)$ is given by the summation:
$$f(x) = \sum_{j=-\infty}^{+\infty} W_{2^j} f * \chi_{2^j}(x) \tag{4.14}$$
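The stability condition (4.12) and the reconstruction condition (4.13) can be checked numerically for a concrete wavelet. The sketch below assumes the example wavelet $\hat{\psi}(\omega) = i\omega e^{-\omega^2/2}$ (a derivative of a Gaussian, chosen only for illustration) and one standard choice of reconstructing wavelet, $\hat{\chi}(\omega) = \overline{\hat{\psi}(\omega)} / \sum_j |\hat{\psi}(2^j\omega)|^2$; the infinite sums are truncated, and the function names are hypothetical.

```python
import numpy as np

def psi_hat_sq(w):
    """|psi_hat(w)|**2 for the example wavelet psi_hat(w) = i w exp(-w**2 / 2)."""
    return w**2 * np.exp(-w**2)

def dyadic_energy(w, j_min=-40, j_max=40):
    """Truncated version of sum_j |psi_hat(2**j w)|**2 from equation 4.12."""
    return sum(psi_hat_sq(2.0**j * w) for j in range(j_min, j_max + 1))

# One octave suffices: the sum is invariant under w -> 2 w.
w = np.linspace(1.0, 2.0, 201)
S = dyadic_energy(w)
A, B = S.min(), S.max()              # numerical frame bounds on this octave

# With chi_hat = conj(psi_hat) / S, each term of equation 4.13 becomes
# |psi_hat(2**j w)|**2 / S(2**j w); the terms should add up to 1.
recon_sum = sum(psi_hat_sq(2.0**j * w) / dyadic_energy(2.0**j * w)
                for j in range(-20, 21))
```

For this wavelet the bounds A and B are close to each other, which is the situation in which the dyadic transform is numerically well conditioned.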
In practice, we can compute a wavelet transform only over finitely many scales. This is because the observed data is limited between a non-zero small (fine) scale and a finite large (coarse) scale. According to Mallat [57], one can normalize the observable finest scale to $1$ ($2^0$) and the coarsest scale to $2^J$, where $J$ depends on the sample size of the data. In order to model this scale limitation, a real function $\varphi(x)$ is introduced, whose Fourier transform is an aggregation of $\hat{\psi}(2^j \omega)$ and $\hat{\chi}(2^j \omega)$ at scales $2^j$ larger than 1:
$$|\hat{\varphi}(\omega)|^2 = \sum_{j=1}^{+\infty} \hat{\psi}(2^j \omega)\,\hat{\chi}(2^j \omega) \tag{4.15}$$
The reconstructing wavelet $\chi(x)$ is chosen such that $\hat{\psi}(\omega)\,\hat{\chi}(\omega)$ is a positive, real and even function. Equation (4.13) implies that the integral of $\varphi(x)$ is equal to 1 and hence that it is a smoothing function. Let $S_{2^j}$ be the smoothing operator defined by:
$$S_{2^j} f(x) = f * \varphi_{2^j}(x), \qquad \varphi_{2^j}(x) = \frac{1}{2^j}\,\varphi\!\left(\frac{x}{2^j}\right) \tag{4.16}$$
The larger the scale $2^j$, the more details of $f(x)$ are removed by $S_{2^j}$. For any scale $2^J > 1$ equation (4.15) yields:
$$|\hat{\varphi}(\omega)|^2 - |\hat{\varphi}(2^J \omega)|^2 = \sum_{j=1}^{J} \hat{\psi}(2^j \omega)\,\hat{\chi}(2^j \omega) \tag{4.17}$$
From this equation it is derived that the higher frequencies of $S_1 f(x)$, which have disappeared in $S_{2^J} f(x)$, can be recovered from the dyadic wavelet transform $[W_{2^j} f]_{1 \le j \le J}$ between the scales $2^1$ and $2^J$. In numerical applications, the input signal is measured at a finite resolution and thus the wavelet transform cannot be computed at any arbitrary scale. The original signal can be
considered as a discrete sequence $D = [d_n]_{n \in \mathbb{Z}}$ of finite energy. If two constants $C_1 > 0$ and $C_2 > 0$ exist, such that $\hat{\varphi}(\omega)$ satisfies:
$$\forall \omega \in \mathbb{R}, \quad C_1 \le \sum_{n=-\infty}^{+\infty} |\hat{\varphi}(\omega + 2n\pi)|^2 \le C_2 \tag{4.18}$$
then, from equation 4.18, the discrete signal $D$ can be considered as the sampling of a smoothed version of $f(x) \in L^2(\mathbb{R})$ at the finest scale 1:
$$\forall n \in \mathbb{Z}, \quad S_1 f(n) = d_n \tag{4.19}$$
The input signal can thus be rewritten as $D = [S_1 f(n)]_{n \in \mathbb{Z}}$. Mallat and Zhong [62] have proposed the redundant discrete wavelet transform (RDWT), utilizing a particular class of wavelets, to compute a uniform sampling of the wavelet transform of $f(x)$ at any scale larger than 1.
Let us denote $S^d_{2^J} f = [S_{2^J} f(n + w)]_{n \in \mathbb{Z}}$ and $W^d_{2^j} f = [W_{2^j} f(n + w)]_{n \in \mathbb{Z}}$, where $w$ is a sampling shift that depends on $\psi(x)$. For any coarse scale $2^J$ the sequence of discrete signals:
$$\left\{ S^d_{2^J} f,\ [W^d_{2^j} f]_{1 \le j \le J} \right\} \tag{4.20}$$
is called the discrete dyadic wavelet transform of $D = [S_1 f(n)]_{n \in \mathbb{Z}}$. The coefficient signals $[W^d_{2^j} f]$ provide the details of the input signal at scales $2^j$, $1 \le j \le J$, and the coarse signal $S^d_{2^J} f$ provides the approximation of the input signal at the coarse scale $2^J$. The filter bank algorithm for computing the 1-D RDWT is presented in Figure 4.1. The left side shows the decomposition into wavelet coefficients and the right side the reconstruction from wavelet coefficients.
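[Figure 4.1 diagram: three-level filter bank. Decomposition (left): starting from f(x) = S1 f(x), the filters G(ω), G(2ω), G(4ω) yield the wavelet coefficients W1 f(x), W2 f(x), W3 f(x), and the smoothing filters at ω, 2ω yield S2 f(x), S3 f(x). Reconstruction (right): the coefficients are filtered at ω, 2ω, 4ω and summed to recover f(x).]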
Figure 4.1 One-dimensional three level redundant discrete dyadic wavelet transform.
The algorithm does not involve sub-sampling and is similar to the algorithme à trous (algorithm with holes) [63], which also does not involve sub-sampling. The filters $H(\omega)$, $G(\omega)$ and $K(\omega)$ are $2\pi$ periodic and satisfy the perfect reconstruction condition:
$$|H(\omega)|^2 + G(\omega)\,K(\omega) = 1 \tag{4.21}$$
At dyadic scale $2^j$, the discrete filters $H_j$, $G_j$, $K_j$ are obtained by inserting $2^{j-1} - 1$ zeros between each of the coefficients of the corresponding filters at scale $2^1$. The scaling (smoothing) function $\varphi(x)$ defined in equation (4.15) can be derived from $H(\omega)$ using the equation:
$$\hat{\varphi}(\omega) = e^{-iw\omega} \prod_{p=1}^{+\infty} H(2^{-p}\omega) \tag{4.22}$$
where the sampling shift parameter $w$ is adjusted so that $\varphi(x)$ is symmetrical with respect to 0. Equation (4.22) implies that
$$\hat{\varphi}(2\omega) = e^{-iw\omega}\,H(\omega)\,\hat{\varphi}(\omega) \tag{4.23}$$
A wavelet $\psi(x)$ is defined, whose Fourier transform $\hat{\psi}(\omega)$ is given by the equation:
$$\hat{\psi}(2\omega) = e^{-iw\omega}\,G(\omega)\,\hat{\varphi}(\omega) \tag{4.24}$$
The reconstruction wavelet $\chi(x)$ is derived from the equation:
$$\hat{\chi}(2\omega) = e^{iw\omega}\,K(\omega)\,\hat{\varphi}(\omega) \tag{4.25}$$
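The link between the perfect reconstruction condition (4.21) and the scale aggregation (4.17) can be made explicit with a short worked step. It relies on the stated symmetry of $\varphi(x)$ about 0, which makes $\hat{\varphi}(\omega)$ real so that $\hat{\varphi}(\omega)^2 = |\hat{\varphi}(\omega)|^2$; this intermediate derivation is a sketch added for clarity.

```latex
\begin{align*}
|\hat{\varphi}(2\omega)|^2 &= |H(\omega)|^2\,|\hat{\varphi}(\omega)|^2
  && \text{from (4.23)} \\
\hat{\psi}(2\omega)\,\hat{\chi}(2\omega)
  &= e^{-iw\omega} e^{iw\omega}\,G(\omega)K(\omega)\,\hat{\varphi}(\omega)^2
   = G(\omega)K(\omega)\,|\hat{\varphi}(\omega)|^2
  && \text{from (4.24), (4.25)} \\
|\hat{\varphi}(2\omega)|^2 + \hat{\psi}(2\omega)\,\hat{\chi}(2\omega)
  &= \bigl(|H(\omega)|^2 + G(\omega)K(\omega)\bigr)\,|\hat{\varphi}(\omega)|^2
   = |\hat{\varphi}(\omega)|^2
  && \text{from (4.21)}
\end{align*}
```

Replacing $\omega$ by $2^{j-1}\omega$ and summing over $j = 1, \dots, J$ telescopes to equation (4.17).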
A class of filters that satisfy equation (4.21) has been provided by Mallat and Zhong [62]. $H(\omega)$ was chosen to obtain a wavelet $\psi(x)$ which is anti-symmetrical, as regular as possible and has a compact support. The wavelet $\psi(x)$ is also equal to the first order derivative (gradient) of a smoothing function $\theta(x)$:
$$\psi(x) = \frac{d\theta(x)}{dx} \tag{4.26}$$
Filters $H(\omega)$, $G(\omega)$ and $K(\omega)$ are given by:
$$H(\omega) = e^{i\omega/2} \left(\cos(\omega/2)\right)^{2n+1} \tag{4.27}$$
$$G(\omega) = 4i\,e^{i\omega/2} \sin(\omega/2) \tag{4.28}$$
$$K(\omega) = \frac{1 - |H(\omega)|^2}{G(\omega)} \tag{4.29}$$
All filters have compact support and are either symmetrical or anti-symmetrical. From equations (4.22 & 4.24) the corresponding scaling and wavelet functions can be derived as:
$$\hat{\varphi}(\omega) = \left(\frac{\sin(\omega/2)}{\omega/2}\right)^{2n+1} \tag{4.30}$$
$$\hat{\psi}(\omega) = i\omega \left(\frac{\sin(\omega/4)}{\omega/4}\right)^{2n+2} \tag{4.31}$$
The Fourier transform of the smoothing function $\theta(x)$ is therefore:
$$\hat{\theta}(\omega) = \left(\frac{\sin(\omega/4)}{\omega/4}\right)^{2n+2} \tag{4.32}$$
We have chosen $2n+1 = 3$. In order to have a wavelet anti-symmetrical with respect to 0 and $\varphi(x)$ symmetrical with respect to 0, the shifting constant $w$ is equal to $1/2$. From equations (4.31 & 4.32) it can be proven that $\psi(x)$ is a quadratic spline wavelet with compact support, while $\theta(x)$ is a Gaussian-like cubic spline whose integral is equal to 1. These functions are depicted in Figure 4.2.
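The decomposition and reconstruction steps can be sketched with the filters of equations 4.27 to 4.29 for 2n + 1 = 3. This is a frequency-domain equivalent with periodic (circular) boundary handling, not the spatial-domain filter bank of Figure 4.1, and it is a minimal illustration rather than the implementation used in this thesis. The level index `p` starts at 0 and evaluates the filters at 2^p ω, which corresponds to inserting 2^p - 1 zeros between filter coefficients; the function names `rdwt` and `irdwt` are hypothetical.

```python
import numpy as np

def H(w):
    """Low-pass filter of equation 4.27 with 2n + 1 = 3."""
    return np.exp(1j * w / 2) * np.cos(w / 2) ** 3

def G(w):
    """High-pass filter of equation 4.28."""
    return 4j * np.exp(1j * w / 2) * np.sin(w / 2)

def K(w):
    """Equation 4.29 with the removable 0/0 at G = 0 cancelled analytically,
    using 1 - cos^6 = sin^2 * (1 + cos^2 + cos^4)."""
    c2 = np.cos(w / 2) ** 2
    return -0.25j * np.exp(-1j * w / 2) * np.sin(w / 2) * (1 + c2 + c2**2)

def rdwt(signal, levels):
    """Return ([W_1, ..., W_J], S_J); no sub-sampling, circular boundaries."""
    w = 2 * np.pi * np.fft.fftfreq(len(signal))
    s_hat = np.fft.fft(signal)
    details = []
    for p in range(levels):                  # level p uses the filters at 2**p * w
        details.append(np.fft.ifft(G(2**p * w) * s_hat).real)
        s_hat = H(2**p * w) * s_hat
    return details, np.fft.ifft(s_hat).real

def irdwt(details, coarse):
    """Invert rdwt: at each level S_p = conj(H) * S_{p+1} + K * W_{p+1}."""
    w = 2 * np.pi * np.fft.fftfreq(len(coarse))
    s_hat = np.fft.fft(coarse)
    for p in reversed(range(len(details))):
        s_hat = np.conj(H(2**p * w)) * s_hat + K(2**p * w) * np.fft.fft(details[p])
    return np.fft.ifft(s_hat).real

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
details, coarse = rdwt(x, levels=3)
x_rec = irdwt(details, coarse)               # should match x up to rounding
```

Each reconstruction step applies $\overline{H}H + KG = |H|^2 + GK = 1$ to the spectrum of the finer approximation, which is exactly the perfect reconstruction condition (4.21). Every detail signal has the same length as the input, which is the redundancy referred to in the name of the transform.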
Figure 4.2 (a) A cubic spline function and (b) a wavelet that is a quadratic spline of compact support.
Mallat [64] has extended this class of filters and derived wavelet functions $\psi(x)$ that are equal to the second order derivative (Laplacian) of a smoothing function:
$$\psi(x) = \frac{d^2\theta(x)}{dx^2} \tag{4.33}$$
- 27 -
4.4 Redundant Dyadic Wavelet Transform (2-D)

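Let $\psi^{1}_{2^j}(x,y) = \frac{1}{2^{j}}\,\psi^{1}\!\left(\frac{x}{2^{j}},\frac{y}{2^{j}}\right)$ and $\psi^{2}_{2^j}(x,y) = \frac{1}{2^{j}}\,\psi^{2}\!\left(\frac{x}{2^{j}},\frac{y}{2^{j}}\right)$. The dyadic wavelet transform of a 2-D function $f(x,y)\in L^{2}(\mathbb{R}^{2})$ has two components defined by:

$$Wf = \left\{ W^{1}_{2^j}f(x,y),\; W^{2}_{2^j}f(x,y) \right\}_{j\in\mathbb{Z}} \tag{4.34}$$

where the Fourier transforms of $W^{1}_{2^j}f(x,y)$ and $W^{2}_{2^j}f(x,y)$ are given respectively by:

$$\hat{W}^{1}_{2^j}f(\omega_x,\omega_y) = \hat{f}(\omega_x,\omega_y)\,\hat{\psi}^{1}(2^{j}\omega_x,2^{j}\omega_y), \qquad \hat{W}^{2}_{2^j}f(\omega_x,\omega_y) = \hat{f}(\omega_x,\omega_y)\,\hat{\psi}^{2}(2^{j}\omega_x,2^{j}\omega_y) \tag{4.35}$$

and $\hat{\psi}^{1}(2^{j}\omega_x,2^{j}\omega_y)$, $\hat{\psi}^{2}(2^{j}\omega_x,2^{j}\omega_y)$ are the Fourier transforms of the partial wavelet functions $\psi^{1}(x,y)$, $\psi^{2}(x,y)$ respectively. The function $f(x,y)$ can be reconstructed from its dyadic wavelet transform with:

$$f(x,y) = \sum_{j=-\infty}^{+\infty}\left( W^{1}_{2^j}f * \chi^{1}_{2^j}(x,y) + W^{2}_{2^j}f * \chi^{2}_{2^j}(x,y) \right) \tag{4.36}$$

where the partial reconstruction wavelets $\chi^{1}_{2^j}(x,y)$ and $\chi^{2}_{2^j}(x,y)$ satisfy the equation:

$$\sum_{j=-\infty}^{+\infty}\left( \hat{\psi}^{1}(2^{j}\omega_x,2^{j}\omega_y)\,\hat{\chi}^{1}(2^{j}\omega_x,2^{j}\omega_y) + \hat{\psi}^{2}(2^{j}\omega_x,2^{j}\omega_y)\,\hat{\chi}^{2}(2^{j}\omega_x,2^{j}\omega_y) \right) = 1 \tag{4.37}$$

The scaling function is an aggregation of $\hat{\psi}(2^{j}\omega)$ and $\hat{\chi}(2^{j}\omega)$ at scales $2^{j}$ greater than 1:

$$\left|\hat{\phi}(\omega_x,\omega_y)\right|^{2} = \sum_{j=1}^{+\infty}\left( \hat{\psi}^{1}(2^{j}\omega_x,2^{j}\omega_y)\,\hat{\chi}^{1}(2^{j}\omega_x,2^{j}\omega_y) + \hat{\psi}^{2}(2^{j}\omega_x,2^{j}\omega_y)\,\hat{\chi}^{2}(2^{j}\omega_x,2^{j}\omega_y) \right) \tag{4.38}$$

The approximation of $f(x,y)$ at scale $2^{j}$ is defined as a convolution with a dilated scaling function $\phi_{2^j}(x,y)$:

$$S_{2^j}f(x,y) = f * \phi_{2^j}(x,y) \tag{4.39}$$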
In practice, the input image is measured at a finite resolution and thus the wavelet transform cannot be computed at arbitrarily fine scales. Similarly to the 1-D case, if the discrete signal has a finite number of $N_x$ pixels, it is symmetrically extended to $2N_x-2$ pixels. The discrete periodic image $D$ can then be considered as the sampling of a smoothed version of a function $f(x,y)$ at the finest scale 1:

$$\forall (n,m)\in\mathbb{Z}^{2},\quad S_{1}f(n,m) = d_{n,m} \tag{4.40}$$
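The symmetric extension mentioned above can be sketched in one dimension as follows. This is a minimal illustration that assumes a whole-sample reflection (the end points are not repeated), so that $N$ samples give one period of $2N-2$ samples; the thesis does not spell out which reflection variant was used, and `mirror_extend` is a hypothetical helper name.

```python
import numpy as np


def mirror_extend(signal):
    """Whole-sample symmetric extension: N samples -> one period of 2N - 2 samples."""
    signal = np.asarray(signal)
    # Reflect the interior samples only, so the two end points are not duplicated.
    return np.concatenate([signal, signal[-2:0:-1]])
```

For an image, the same extension would be applied along the rows and then along the columns before the periodic convolutions of the filter bank are computed.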
The 2-D RDWT of Mallat and Zhong [62] computes the uniform sampling of the wavelet transform of $f(x,y)$ at any scale larger than 1, using a particular class of wavelets. For any coarse scale $2^{J}$, the RDWT is defined as a sequence of discrete coefficients:

$$\left\{ S^{d}_{2^J}f,\; \left(W^{1,d}_{2^j}f\right)_{1\le j\le J},\; \left(W^{2,d}_{2^j}f\right)_{1\le j\le J} \right\} \tag{4.41}$$

where

$$W^{1,d}_{2^j}f = \left(W^{1}_{2^j}f(n+w,m+w)\right)_{(n,m)\in\mathbb{Z}^{2}},\quad W^{2,d}_{2^j}f = \left(W^{2}_{2^j}f(n+w,m+w)\right)_{(n,m)\in\mathbb{Z}^{2}},\quad S^{d}_{2^J}f = \left(S_{2^J}f(n+w,m+w)\right)_{(n,m)\in\mathbb{Z}^{2}} \tag{4.42}$$

and $w$ is a sampling shift that depends on the choice of wavelets. The coefficient images $W^{1,d}_{2^j}f$ and $W^{2,d}_{2^j}f$ provide the details of the input image at scales $2^{j}$, $1\le j\le J$, and the coarse image $S^{d}_{2^J}f$ provides the approximation of the input image at the coarse scale $2^{J}$. The filter bank algorithm for computing the 2-D RDWT is depicted in Figure 4.3.
Figure 4.3 Two dimensional three level redundant dyadic wavelet transform.
The left side depicts the decomposition into wavelet coefficients, and the right side the reconstruction from wavelet coefficients. Filters $H(\omega)$, $G(\omega)$, $K(\omega)$ and $L(\omega)$ are $2\pi$ periodic and satisfy the perfect reconstruction condition:

$$|H(\omega)|^{2} + G(\omega)K(\omega) = 1, \qquad L(\omega) = \frac{1+|H(\omega)|^{2}}{2} \tag{4.43}$$
At dyadic scale $2^j$, the discrete filters $H_j$, $G_j$, $K_j$ and $L_j$ are obtained by inserting $2^{j-1}-1$ zeros between each of the coefficients of the corresponding filters at scale $2^1$. The 2-D wavelets are given by expressions analogous to those of the 1-D case. The class of spline functions described in the previous section is used. The wavelets are partial derivatives of 2-D smoothing functions:

$$\psi^{1}(x,y) = \frac{\partial\theta^{1}(x,y)}{\partial x} \quad\text{and}\quad \psi^{2}(x,y) = \frac{\partial\theta^{2}(x,y)}{\partial y} \tag{4.44}$$
where the functions $\theta^{1}(x,y)$ and $\theta^{2}(x,y)$ are numerically close to a single smoothing function $\theta(x,y)$.

The redundant wavelet representation presented in this section has several advantages over an orthogonal wavelet representation. The sub-band images are shift invariant [65], do not present aliasing and have the same number of pixels as the original image, which makes the representation highly redundant. Moreover, smooth symmetrical or anti-symmetrical wavelet functions can be used, which alleviates boundary effects via mirror extension of the signal. Due to these advantages it has been extensively used in denoising, segmentation and pattern recognition applications. An example of the RDWT applied to the Circle image is depicted in Figure 4.4.
Figure 4.4 Redundant dyadic wavelet transform of the Circle image. Panels: $S_{1}f$, $S_{2^J}f$, and $W^{1}_{2^j}f$, $W^{2}_{2^j}f$ for $j=1,\dots,4$.
4.5 Multiscale Edge Representation (MER)

Gradient Vector: Let $\theta(x,y)$ be a symmetrical smoothing function approximating the Gaussian. As already explained in detail in the previous section, the RDWT of a function $f(x,y)\in L^{2}(\mathbb{R}^{2})$ is the set of functions $\left(W^{1}_{2^j}f(x,y), W^{2}_{2^j}f(x,y)\right)$, which are respectively the partial derivatives along the horizontal and vertical orientations of the convolution of $f(x,y)$ with the smoothing function $\theta(x,y)$ dilated along the dyadic sequence $(2^{j})_{j\in\mathbb{Z}}$, and is given by [62]:

$$\begin{pmatrix} W^{1}_{2^j}f(x,y) \\ W^{2}_{2^j}f(x,y) \end{pmatrix} = \begin{pmatrix} f*\psi^{1}_{2^j}(x,y) \\ f*\psi^{2}_{2^j}(x,y) \end{pmatrix} = 2^{j}\begin{pmatrix} \dfrac{\partial}{\partial x}(f*\theta_{2^j})(x,y) \\ \dfrac{\partial}{\partial y}(f*\theta_{2^j})(x,y) \end{pmatrix} \tag{4.45}$$

where $\psi^{1}(x,y)$ and $\psi^{2}(x,y)$ are the analyzing wavelets and $j$ the dyadic scale. Equation (4.45) indicates that this set of functions can be viewed as the two components of the gradient vector of $f(x,y)$ smoothed by $\theta(x,y)$ at each scale $2^{j}$:

$$\begin{pmatrix} W^{1}_{2^j}f(x,y) \\ W^{2}_{2^j}f(x,y) \end{pmatrix} = 2^{j}\,\vec{\nabla}(f*\theta_{2^j})(x,y) \tag{4.46}$$

The modulus-angle representation of the gradient vector is given by:

$$M_{2^j}f(x,y) = \sqrt{\left|W^{1}_{2^j}f(x,y)\right|^{2} + \left|W^{2}_{2^j}f(x,y)\right|^{2}} \tag{4.47}$$

and

$$A_{2^j}f(x,y) = \arctan\left(\frac{W^{2}_{2^j}f(x,y)}{W^{1}_{2^j}f(x,y)}\right) \tag{4.48}$$
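The modulus-angle representation can be sketched as below. This is only an illustration, assuming that the angle is measured from the horizontal axis as the argument of the vector formed by the horizontal and vertical coefficients; a four-quadrant arctangent is used so that all directions around the circle can be distinguished. The function name `modulus_and_angle` is hypothetical.

```python
import numpy as np


def modulus_and_angle(w1, w2):
    """Return (M, A) from the horizontal (w1) and vertical (w2) coefficient images."""
    w1 = np.asarray(w1, dtype=float)
    w2 = np.asarray(w2, dtype=float)
    modulus = np.hypot(w1, w2)      # sqrt(w1**2 + w2**2), equation (4.47)
    angle = np.arctan2(w2, w1)      # four-quadrant arctangent, in [-pi, pi]
    return modulus, angle
```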
Modulus Maxima: The sharper variation points of $f*\theta_{2^j}(x,y)$ at a scale $2^{j}$ correspond to edges and are obtained from the local maxima of $M_{2^j}f(x,y)$ along the gradient direction given by $A_{2^j}f(x,y)$. The gradient direction values $A_{2^j}f(x,y)$ were restricted to the following values [66]:

$$\left\{0,\ \frac{\pi}{4},\ \frac{\pi}{2},\ \frac{3\pi}{4},\ \pi,\ \frac{5\pi}{4},\ \frac{3\pi}{2},\ \frac{7\pi}{4}\right\}$$

At each scale $2^{j}$ of the dyadic wavelet transform, a point $(x,y)$ where the modulus of the gradient vector $M_{2^j}f(x,y)$ is maximum compared with its neighbors located in the direction specified by $A_{2^j}f(x,y)$ is called a modulus maximum (Figure 4.5). Each time such a point is detected, its position is recorded together with the values of the modulus $M_{2^j}f(x,y)$ and angle $A_{2^j}f(x,y)$ at that location [67].
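A minimal sketch of this detection step is given below. It assumes that the row index grows with $y$ and the column index with $x$, skips the one-pixel image border, and requires the modulus to be no smaller than both neighbours and strictly larger than at least one of them; these choices, and the name `modulus_maxima`, are assumptions made for the example rather than details taken from the thesis.

```python
import numpy as np


def modulus_maxima(modulus, angle):
    """Boolean mask of interior pixels that are local maxima of `modulus`
    along the direction `angle`, quantized to multiples of pi/4."""
    modulus = np.asarray(modulus, dtype=float)
    angle = np.asarray(angle, dtype=float)
    # Quantize to k * pi/4 with k in 0..7, then turn k into a one-pixel step.
    k = np.round(angle / (np.pi / 4)).astype(int) % 8
    step_col = np.round(np.cos(k * np.pi / 4)).astype(int)   # step along x
    step_row = np.round(np.sin(k * np.pi / 4)).astype(int)   # step along y
    mask = np.zeros(modulus.shape, dtype=bool)
    rows, cols = modulus.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            dr, dc = step_row[r, c], step_col[r, c]
            here = modulus[r, c]
            ahead = modulus[r + dr, c + dc]
            behind = modulus[r - dr, c - dc]
            # Not smaller than either neighbour and strictly larger than one.
            mask[r, c] = (here >= ahead) and (here >= behind) and (here > ahead or here > behind)
    return mask
```

The positions where the mask is true, together with the modulus and angle values at those positions, form the multiscale edge representation at the given scale.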
Figure 4.5 The gradient magnitude $M_{2^j}f(x,y)$, the gradient directions $A_{2^j}f(x,y)$ and the local maxima of the Circle image at scales $2^{1}$, $2^{2}$, $2^{3}$ and $2^{4}$.
In order to construct curves in the image plane, individual wavelet modulus maxima at a given scale are connected if they are neighbors and the vector that joins them is perpendicular to the angle direction at these points.
CHAPTER 5
Singularity Detection
Summary

This chapter first discusses the theory of singularities and their classification according to Lipschitz criteria. The second part of the chapter reviews the relation between singularity detection and wavelet transform modulus maxima, accompanied by suitable depictions.

5.1 Singularity and Mathematical Description

Singularities are points of sharp variation, which often indicate the most important features of the estimated functions. A specific singularity called a change-point, which describes a sudden localized change, is of particular interest in statistics and has been studied for years. Wavelet analysis is an ideal tool to study localized changes such as discontinuities and sharp cusps in a noisy function. The magnitudes and positions of singularities can be observed from the empirical wavelet coefficients. The modulus maxima of the discrete wavelet transform are directly related to the Lipschitz regularity, a mathematical measure of singularity [68].

The local regularity of a signal is measured with the Lipschitz criteria. Lipschitz exponents (also called Hölder exponents) provide uniform regularity measurements over time intervals, but also at any point $\nu$. If $f$ has a singularity at $\nu$, then the Lipschitz exponent at $\nu$ characterizes this singular behavior.

Pointwise Lipschitz Regularity: A function $f$ is pointwise Lipschitz $\alpha\ge 0$ at $\nu$ if there exist $K>0$ and a Taylor polynomial $p_{\nu}$ of degree $m=\lfloor\alpha\rfloor$ such that:

$$\forall t\in\mathbb{R},\quad |f(t)-p_{\nu}(t)| \le K|t-\nu|^{\alpha} \tag{5.1}$$

A function $f$ is uniformly Lipschitz $\alpha$ over $[a,b]$ if it satisfies condition (5.1) for all $\nu\in[a,b]$, with a constant $K$ that is independent of $\nu$. If $f$ is uniformly Lipschitz $\alpha>m$ in the neighborhood of $\nu$, then one can verify that $f$ is necessarily $m$ times continuously differentiable in this neighborhood. If $0\le\alpha<1$ then $p_{\nu}(t)=f(\nu)$ and condition (5.1) becomes

$$\forall t\in\mathbb{R},\quad |f(t)-f(\nu)| \le K|t-\nu|^{\alpha} \tag{5.2}$$
A function that is bounded but discontinuous at $\nu$ is Lipschitz 0 at $\nu$. If the Lipschitz regularity is $\alpha<1$ at $\nu$, then $f$ is not differentiable at $\nu$ and $\alpha$ characterizes the singularity type.

5.2 Wavelet Transform and Singularity

A remarkable property of the wavelet transform is its ability to characterize the local regularity of functions. If the wavelet has $n$ vanishing moments, the wavelet transform can be interpreted as a multiscale differential operator of order $n$. This yields a first relation between the differentiability of $f$ and the decay of its wavelet transform at fine scales. The following theorem states that a wavelet with $n$ vanishing moments can be written as the $n$th order derivative of a function $\theta$ [69]; the resulting wavelet transform is a multiscale differential operator. We suppose that $\psi$ has a fast decay, which means that for any decay exponent $m\in\mathbb{N}$ there exists $C_{m}$ such that:

$$\forall t\in\mathbb{R},\quad |\psi(t)| \le \frac{C_{m}}{1+|t|^{m}} \tag{5.3}$$
Theorem 1: A wavelet $\psi$ with a fast decay has $n$ vanishing moments if and only if there exists a function $\theta$ with a fast decay such that:

$$\psi(t) = (-1)^{n}\frac{d^{n}\theta(t)}{dt^{n}} \tag{5.4}$$

As a consequence the following equation is obtained:

$$Wf(u,s) = s^{n}\frac{d^{n}}{du^{n}}\left(f*\bar{\theta}_{s}\right)(u) \tag{5.5}$$

with $\bar{\theta}_{s}(t) = \frac{1}{\sqrt{s}}\,\theta\!\left(\frac{-t}{s}\right)$. Moreover, $\psi$ has no more than $n$ vanishing moments if and only if $\int_{-\infty}^{+\infty}\theta(t)\,dt\neq 0$.

The decay of the wavelet transform amplitude across scales is related to the uniform and pointwise Lipschitz regularity of the signal. Measuring this asymptotic decay is equivalent to zooming into signal structures with a scale that goes to zero. Mallat [68,70] also proved that the uniform Lipschitz regularity of $f$ on an interval is related to the amplitude of its wavelet transform at fine scales.

Theorem 2: If $f\in L^{2}(\mathbb{R})$ is uniformly Lipschitz $\alpha\le n$ over $[a,b]$ then there exists $A>0$ such that:

$$\forall (u,s)\in[a,b]\times\mathbb{R}^{+},\quad |Wf(u,s)| \le A\,s^{\alpha+1/2} \tag{5.6}$$
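Conversely, if $Wf(u,s)$ satisfies condition (5.6) and if $\alpha<n$ is not an integer, then $f$ is uniformly Lipschitz $\alpha$ on $[a+\varepsilon, b-\varepsilon]$ for any $\varepsilon>0$. Condition (5.6) signifies that the wavelet transform $|Wf(u,s)|$ decays like $s^{\alpha+1/2}$ over intervals where $f$ is uniformly Lipschitz $\alpha$, when the scale $s$ goes to 0.

Theorem 3: A necessary and a sufficient condition on the wavelet transform for estimating the Lipschitz regularity of $f$ at a point $\nu$ are given in [69]. If $f\in L^{2}(\mathbb{R})$ is Lipschitz $\alpha\le n$ at $\nu$, then there exists $A$ such that:

$$\forall (u,s)\in\mathbb{R}\times\mathbb{R}^{+},\quad |Wf(u,s)| \le A\,s^{\alpha+1/2}\left(1+\left|\frac{u-\nu}{s}\right|^{\alpha}\right) \tag{5.7}$$

Conversely, if $\alpha<n$ is not an integer and there exist a constant $A$ and $\alpha'<\alpha$ such that:

$$\forall (u,s)\in\mathbb{R}\times\mathbb{R}^{+},\quad |Wf(u,s)| \le A\,s^{\alpha+1/2}\left(1+\left|\frac{u-\nu}{s}\right|^{\alpha'}\right) \tag{5.8}$$

then $f$ is Lipschitz $\alpha$ at $\nu$.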
To interpret the necessary and the sufficient conditions (5.7) and (5.8) more easily, it is supposed that $\psi$ has compact support equal to $[-C, C]$. The cone of influence of $\nu$ in the scale-space plane is the set of points $(u,s)$ such that $\nu$ is included in the support of $\psi_{u,s}(t) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{t-u}{s}\right)$. Since the support of $\psi\!\left(\frac{t-u}{s}\right)$ is equal to $[u-Cs, u+Cs]$, the cone of influence of $\nu$ is defined by:

$$|u-\nu| \le Cs \tag{5.9}$$
Theorems 2 and 3 prove that the local Lipschitz regularity of $f$ at $\nu$ depends on the decay at fine scales of $|Wf(u,s)|$ in the neighborhood of $\nu$. The decay of $|Wf(u,s)|$ can be controlled from its local maxima values. We use the term modulus maximum to describe any point $(u_{0},s_{0})$ such that $|Wf(u,s_{0})|$ is locally maximum at $u=u_{0}$. This implies that:

$$\frac{\partial Wf(u_{0},s_{0})}{\partial u} = 0 \tag{5.10}$$

This local maximum should be a strict local maximum in either the right or the left neighborhood of $u_{0}$, to avoid having any local maxima when $|Wf(u,s_{0})|$ is constant. We call maxima line any connected curve $s(u)$ in the scale-space plane $(u,s)$ along which all points are modulus maxima. In Figure 5.1 the high amplitude wavelet coefficients lie in the cone of influence of each singularity.
Figure 5.1 (a), (b), (c), (d) Wavelet transform of $f(t)$ calculated with the quadratic spline wavelet $\psi=-\theta'$, where $\theta$ is the cubic spline smoothing function approximating the Gaussian. The red stars are the local maxima of the wavelet coefficients along each scale. The scale increases from top to bottom. (e) Maxima line in the scale-space plane inside the cone of influence.

5.3 Singularity Detection (1-D)

The singularities are detected by finding the abscissa where the wavelet modulus maxima converge at fine scales. If the wavelet has only one vanishing moment, the wavelet modulus maxima are the maxima of the first order derivative of $f$ smoothed by $\bar{\theta}_{s}$. Hwang and Mallat [71] proved that if $Wf(u,s)$ has no modulus maxima at fine scales then $f$ is locally regular.
Theorem 4: Suppose that $\psi$ is $C^{n}$ with a compact support, and $\psi=(-1)^{n}\theta^{(n)}$ with $\int_{-\infty}^{+\infty}\theta(t)\,dt\neq 0$. Let $f\in L^{1}[a,b]$. If there exists $s_{0}>0$ such that $|Wf(u,s)|$ has no local maximum for $u\in[a,b]$ and $s<s_{0}$, then $f$ is uniformly Lipschitz $n$ on $[a+\varepsilon, b-\varepsilon]$ for any $\varepsilon>0$.

This theorem implies that $f$ can be singular (not Lipschitz 1) at a point $\nu$ only if there is a sequence of wavelet maxima points $(u_{n},s_{n})_{n\in\mathbb{N}}$ that converges towards $\nu$ at fine scales:

$$\lim_{n\to+\infty} u_{n} = \nu \quad\text{and}\quad \lim_{n\to+\infty} s_{n} = 0 \tag{5.11}$$
These modulus maxima may or may not be along the same maxima line. The result guarantees that all singularities are detected by following the wavelet transform modulus maxima at fine scales. Figure 5.2 gives an example where all singularities are located by following the maxima lines. For $\psi=(-1)^{n}\theta^{(n)}$ where $\theta$ is a Gaussian, the modulus maxima of $Wf(u,s)$ of any $f\in L^{2}(\mathbb{R})$ belong to connected curves that are never interrupted when the scale decreases.

The decay of $|Wf(u,s)|$ in the neighborhood of $\nu$ is controlled by the decay of the modulus maxima in the cone $|u-\nu|\le Cs$ [70,71]. According to Theorem 2, a function $f$ is uniformly Lipschitz $\alpha$ in the neighborhood of $\nu$ if and only if there exists $A>0$ such that each modulus maximum $(u,s)$ in the cone $|u-\nu|\le Cs$ satisfies:

$$|Wf(u,s)| \le A\,s^{\alpha+1/2} \tag{5.12}$$

which is equivalent to:

$$\log_{2}|Wf(u,s)| \le \log_{2}A + \left(\alpha+\frac{1}{2}\right)\log_{2}s \tag{5.13}$$

Therefore, the Lipschitz regularity at $\nu$ is obtained from the maximum slope of $\log_{2}|Wf(u,s)|$ as a function of $\log_{2}s$ along the maxima lines converging to $\nu$ (Figure 5.3); with the normalization of (5.13) this slope equals $\alpha+1/2$.
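The slope measurement can be sketched as follows. This is a simplified illustration that replaces the maximum slope by a least-squares line through the points of a single maxima line, and that subtracts the offset of $1/2$ implied by equation (5.13); a differently normalized transform would need a different offset. The function name and its inputs are hypothetical.

```python
import numpy as np


def lipschitz_from_maxima_line(scales, moduli):
    """Estimate alpha from modulus maxima |Wf(u, s)| sampled along one maxima line."""
    log_s = np.log2(np.asarray(scales, dtype=float))
    log_w = np.log2(np.asarray(moduli, dtype=float))
    slope, _intercept = np.polyfit(log_s, log_w, 1)   # least-squares straight line
    return slope - 0.5                                # slope = alpha + 1/2 in (5.13)
```

Applied to the two maxima lines of Figure 5.3, such a fit would give one regularity estimate per singularity.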
Figure 5.2 Wavelet transform of $f(t)$ calculated with the quadratic spline wavelet $\psi=-\theta'$, where $\theta$ is the cubic spline smoothing function approximating the Gaussian. The red stars are the local maxima of the wavelet coefficients along each scale. The scale increases from top to bottom.

Figure 5.3 The full line gives the decay of $\log_{2}|Wf(u,s)|$ from Figure 5.2 as a function of $\log_{2}s$ along the maxima line that converges to the abscissa $t=11$. The dashed line gives $\log_{2}|Wf(u,s)|$ along the maxima line that converges at $t=168$.

The last graph (Figure 5.3) follows the maxima lines in the scale-space plane as the scale $s$ tends to zero, for singularity detection.

5.4 Singularity Detection (2-D)

Theorems 2 and 3 show that the wavelet transform is particularly well adapted to estimating the local regularity of functions. The decay of the two-dimensional wavelet transform likewise depends on the regularity of $f$. We restrict the analysis to Lipschitz exponents $0\le\alpha<1$. A function $f$ is said to be Lipschitz $\alpha$ at $(x_{0},y_{0})$ if there exists $K>0$ such that for all $(x,y)\in\mathbb{R}^{2}$:

$$|f(x,y)-f(x_{0},y_{0})| \le K\left(|x-x_{0}|^{2}+|y-y_{0}|^{2}\right)^{\alpha/2} \tag{5.14}$$

If there exists $K>0$ such that condition (5.14) is satisfied for any $(x_{0},y_{0})\in\Omega$, then $f$ is uniformly Lipschitz $\alpha$ over $\Omega$. The Lipschitz regularity of a function $f(x,y)$ is related to the asymptotic decay at fine scales of the wavelet transform along the horizontal and vertical directions, $|W^{1}f(u,v,2^{j})|$ and $|W^{2}f(u,v,2^{j})|$, in the corresponding neighborhood. This decay is controlled by the local maximum value of the modulus $Mf(u,v,2^{j})$. As in Theorem 2, one can prove that $f$ is uniformly Lipschitz $\alpha$ inside a bounded domain of $\mathbb{R}^{2}$ if and only if there exists $A>0$ such that for all $(u,v)$ inside this domain and all scales $2^{j}$:

$$|Mf(u,v,2^{j})| \le A\,2^{j(\alpha+1)} \tag{5.15}$$
Suppose that the image has an isolated edge curve along which $f$ has Lipschitz regularity $\alpha$. The value of $|Mf(u,v,2^{j})|$ in a two-dimensional neighborhood of the edge curve can be bounded by the wavelet modulus values along the edge curve. The Lipschitz regularity $\alpha$ of the edge is measured with condition (5.15) by estimating the decay exponent of the modulus amplitude across scales.
CHAPTER 6
Pattern Recognition
Summary

This chapter discusses pattern recognition theory along with the implementation of the various classifiers employed in this thesis. The first part of the chapter explains in detail the generation of the features used as input to the classification system. The second part presents the classification algorithms implemented in this thesis.

6.1 Pattern Recognition Theory

Pattern recognition classifies objects into a number of categories or classes. This classification procedure is a two-fold process, which first generates a description of the object (i.e., the pattern) and then classifies it based on that description (i.e., the recognition). The object description involves feature generation techniques that produce certain attributes, whereas the classification task associates a predefined label with the object based on those attributes. The main goal of any pattern recognition system is to determine the most accurate label for each object analyzed.

The pattern recognition procedure starts with a training phase that configures the algorithms used in both the description and classification tasks, based on a set of objects whose labels are known, called the training set. During the training phase, the training set is analyzed to determine the attributes that label the objects with the highest possible accuracy. Following the training phase, an unlabeled object is classified based on its attributes. High agreement between the known labels and those assigned by the pattern recognition system denotes high classification accuracy. The methodology for description and classification with a labelled training set is called supervised learning. In cases where a training set is not available, the procedure employed is termed unsupervised learning.

A common step in the pattern recognition procedure, which usually precedes the stages presented above, is the isolation of the object to be recognized from its surrounding environment. This step is a prerequisite for feature extraction.
6.2 Object Isolation

Thyroid nodules play a key role in the US thyroid imaging diagnostic procedure. It is therefore very important to extract them from the noisy background for further processing. This isolation procedure is termed segmentation, and a detailed review of the segmentation approach employed in this thesis is presented in Chapter 8.

6.3 Feature Generation

The feature generation stage is the process of computing features from an image, or from a region within the image, to be used in the classification task. The generated features must encode the information that discriminates between classes in order to enhance the classification accuracy. In US thyroid nodule image analysis, the computed features should exhibit high separability between high and low risk cases. In the current project three categories of features were employed to assess the malignancy risk factor of thyroid nodules: (a) textural features, (b) shape and geometrical features and (c) wavelet local maxima features.

6.4 Textural Features

The textural information extracted from the thyroid nodule can be employed as a criterion in assessing the risk factor of malignancy and can be of value in patient management, i.e. whether or not to recommend surgical operation. Textural features are divided into two main categories: first and second order statistical features.

6.4.1 First Order Statistical Features
The 1st order statistical features describe the distribution of grey level values within the thyroid nodule [72]. The most important features are:

1. Mean value (m)

$$m = \frac{1}{N}\sum_{i}\sum_{j} g(i,j) \tag{6.1}$$

where $g(i,j)$ is the grey level value at position $(i,j)$ and $N$ the number of pixels.

2. Standard Deviation (std)

$$std = \sqrt{\frac{1}{N}\sum_{i}\sum_{j}\left(g(i,j)-m\right)^{2}} \tag{6.2}$$

The standard deviation represents the variation of the grey level values around the mean value $m$.

3. Skewness (sk)

$$sk = \frac{1}{N}\,\frac{\sum_{i}\sum_{j}\left(g(i,j)-m\right)^{3}}{std^{3}} \tag{6.3}$$

The skewness describes the degree of histogram asymmetry around the mean.

4. Kurtosis (k)

$$k = \frac{1}{N}\,\frac{\sum_{i}\sum_{j}\left(g(i,j)-m\right)^{4}}{std^{4}} \tag{6.4}$$
Kurtosis describes the sharpness of the grey level histogram.

6.4.2 Second Order Statistical Features
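Before turning to the second order features, the four first order features of equations (6.1) to (6.4) can be sketched as follows. The sketch assumes the $1/N$ normalization throughout and a region that is not perfectly uniform (so that the standard deviation is non-zero); `first_order_features` is a hypothetical helper name.

```python
import numpy as np


def first_order_features(region):
    """Mean, standard deviation, skewness and kurtosis of the grey levels in `region`."""
    g = np.asarray(region, dtype=float).ravel()
    m = g.mean()                               # equation (6.1)
    std = np.sqrt(np.mean((g - m) ** 2))       # equation (6.2), 1/N normalization
    sk = np.mean((g - m) ** 3) / std ** 3      # equation (6.3)
    k = np.mean((g - m) ** 4) / std ** 4       # equation (6.4)
    return m, std, sk, k
```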
Features derived from 2nd order statistics provide information about the spatial relationship between grey level values within the thyroid nodule. These textural features were derived from the co-occurrence and run-length matrices [73,74].

6.4.2.1 Co-Occurrence Matrix Features

In the co-occurrence matrices, grey level pixels are considered in pairs with a relative distance ($d$) and orientation between them [73]. The orientation is quantized in four directions (0°, 45°, 90°, 135°). An example of co-occurrence matrix computation is depicted in Figure 6.1.
(a) Image array

0 0 2 2
1 1 0 0
3 2 3 3
3 2 2 2

(b) General form of the co-occurrence matrix (rows and columns indexed by grey tones 0 to 3; entry (i,j) counts the occurrences of the grey tone pair i, j)

(0,0) (0,1) (0,2) (0,3)
(1,0) (1,1) (1,2) (1,3)
(2,0) (2,1) (2,2) (2,3)
(3,0) (3,1) (3,2) (3,3)

(c) Co-occurrence matrix, 0°

4 1 1 0
1 2 0 0
1 0 6 3
0 0 3 2

(d) Co-occurrence matrix, 45°

0 1 2 1
1 0 1 1
2 1 0 3
1 1 3 0

(e) Co-occurrence matrix, 90°

0 2 2 2
2 0 1 1
2 1 2 2
2 1 2 2

(f) Co-occurrence matrix, 135°

2 1 1 1
1 0 1 1
1 1 2 2
1 1 2 0

Figure 6.1 (a) Image array with four grey levels. (b) General form of any grey-tone co-occurrence matrix. (c)-(f) Computation of all four co-occurrence matrices with distance d=1.
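The counting behind Figure 6.1 can be sketched as below. The example assumes symmetric counting (each neighbouring pair is read in both directions) and the usual pixel offsets for the four orientations at distance $d=1$; the names `IMAGE`, `OFFSETS` and `cooccurrence` are introduced only for this illustration.

```python
import numpy as np

# The 4x4 image array of Figure 6.1(a), grey levels 0..3.
IMAGE = np.array([[0, 0, 2, 2],
                  [1, 1, 0, 0],
                  [3, 2, 3, 3],
                  [3, 2, 2, 2]])

# (row step, column step) for each quantized orientation at unit distance.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}


def cooccurrence(image, angle, levels, d=1):
    """Symmetric grey-tone co-occurrence counts for one orientation."""
    dr, dc = OFFSETS[angle]
    dr, dc = dr * d, dc * d
    counts = np.zeros((levels, levels), dtype=int)
    rows, cols = image.shape
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = image[r, c], image[r2, c2]
                counts[a, b] += 1      # pair read in one direction
                counts[b, a] += 1      # and in the opposite direction
    return counts
```

Calling `cooccurrence(IMAGE, 0, 4)` counts horizontally adjacent pairs, and the other three angles follow by changing the offset only.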
Consider an image array $I(m,n)$ with four grey levels ($N_{g}=4$) ranging from 0 to 3. Figure 6.1(b) depicts the general form of any grey tone co-occurrence matrix. For example, for distance $d=1$ the element in the (0,0) position is the total number of times two grey tones of value 0 and 0 occur adjacent to each other along the given quantized direction. Figures 6.1(c) to 6.1(f) show the resulting counts for all grey tone combinations, with distance set to 1, along the four directions. The textural features that can be extracted from the co-occurrence matrices are presented below:

1. Angular Second Moment (ASM)
$$ASM = \sum_{i=0}^{N_g-1}\sum_{j=0}^{N_g-1}\left(p(i,j)\right)^{2} \tag{6.5}$$

where $N_{g}$ is the number of grey levels in the image, $i,j=0,\dots,N_{g}-1$, and $p(i,j)$ is the co-occurrence matrix. The ASM feature describes the degree of homogeneity within the thyroid nodule and takes large values in regions with little variability.

2. Contrast (CON)

$$CON = \sum_{n=0}^{N_g-1} n^{2}\left\{\sum_{i=0}^{N_g-1}\sum_{j=0}^{N_g-1} p(i,j)\right\},\quad |i-j|=n \tag{6.6}$$

The CON feature describes the amount of local variation present within the nodule and takes high values in regions of great variability. The factor $n^{2}$ emphasizes any local variations that are present.

3. Inverse Difference Moment (IDM)

$$IDM = \sum_{i=0}^{N_g-1}\sum_{j=0}^{N_g-1}\frac{p(i,j)}{1+(i-j)^{2}} \tag{6.7}$$

The IDM feature takes high values for low-contrast images due to the inverse $(i-j)^{2}$ dependence.

4. Entropy (ENT)

$$ENT = -\sum_{i=0}^{N_g-1}\sum_{j=0}^{N_g-1} p(i,j)\log\left(p(i,j)\right) \tag{6.8}$$

The ENT feature describes the degree of randomness and takes low values for smooth images.
5. Correlation (COR)

$$COR = \frac{\sum_{i=0}^{N_g-1}\sum_{j=0}^{N_g-1}(ij)\,p(i,j) - m_{x}m_{y}}{\sigma_{x}\sigma_{y}} \tag{6.9}$$

where $m_{x}$, $m_{y}$, $\sigma_{x}$, $\sigma_{y}$ are the mean values and standard deviations of $p_{x}$ and $p_{y}$ (equations 6.10 and 6.11) respectively. The COR feature describes the spatial dependencies of the grey tones within the thyroid nodule.

$$p_{x}(i) = \sum_{j=0}^{N_g-1} p(i,j) \tag{6.10}$$

$$p_{y}(j) = \sum_{i=0}^{N_g-1} p(i,j) \tag{6.11}$$
Other features derived from the co-occurrence matrices are listed below. In the sum and difference distributions the grey levels are indexed from 1 to $N_{g}$.

6. Sum of Squares (SSQ)

$$SSQ = \sum_{i=0}^{N_g-1}\sum_{j=0}^{N_g-1}(i-m)^{2}\,p(i,j) \tag{6.12}$$

7. Sum Average (SAV)

$$SAV = \sum_{i=2}^{2N_g} i\,p_{x+y}(i) \tag{6.13}$$

where $p_{x+y}$ is

$$p_{x+y}(k) = \sum_{i=1}^{N_g}\sum_{j=1}^{N_g} p(i,j),\quad i+j=k,\quad k=2,3,\dots,2N_{g} \tag{6.14}$$

8. Sum Entropy (SENT)

$$SENT = -\sum_{i=2}^{2N_g} p_{x+y}(i)\log p_{x+y}(i) \tag{6.15}$$

9. Sum Variance (SVAR)

$$SVAR = \sum_{i=2}^{2N_g}(i-SENT)^{2}\,p_{x+y}(i) \tag{6.16}$$

10. Difference Variance (DVAR)

$$DVAR = \sum_{i=0}^{N_g-1}(i-SAV)^{2}\,p_{x-y}(i) \tag{6.17}$$

11. Difference Entropy (DENT)

$$DENT = -\sum_{i=0}^{N_g-1} p_{x-y}(i)\log p_{x-y}(i) \tag{6.18}$$

where $p_{x-y}$ is

$$p_{x-y}(k) = \sum_{i=1}^{N_g}\sum_{j=1}^{N_g} p(i,j),\quad |i-j|=k,\quad k=0,1,\dots,N_{g}-1 \tag{6.19}$$

12. Information Measure of Correlation (ICM1)

$$ICM1 = \frac{HXY-HXY1}{\max\{HX,HY\}} \tag{6.20}$$

13. Information Measure of Correlation (ICM2)

$$ICM2 = \left(1-\exp\left[-2.0\,(HXY2-HXY)\right]\right)^{1/2} \tag{6.21}$$

where $HX$ and $HY$ are the entropies of $p_{x}$ and $p_{y}$, and

$$HXY = -\sum_{i=0}^{N_g-1}\sum_{j=0}^{N_g-1} p(i,j)\log\left(p(i,j)\right) \tag{6.22}$$

$$HXY1 = -\sum_{i=0}^{N_g-1}\sum_{j=0}^{N_g-1} p(i,j)\log\left(p_{x}(i)\,p_{y}(j)\right) \tag{6.23}$$

$$HXY2 = -\sum_{i=0}^{N_g-1}\sum_{j=0}^{N_g-1} p_{x}(i)\,p_{y}(j)\log\left(p_{x}(i)\,p_{y}(j)\right) \tag{6.24}$$
6.4.2.2 Run-Length Matrix Features

The run length matrix encodes textural information based on the number of runs, i.e. sequences of consecutive pixels along a given direction that share the same grey level [74]. Consider an image array $I(m,n)$ with four grey levels ($N_{g}=4$) ranging from 0 to 3 (Figure 6.2(a)). For each direction (0°, 45°, 90°, 135°) the corresponding run length matrix is computed (Figure 6.2(b) to Figure 6.2(e)).
(a) Image array

0 0 2 2
1 1 0 0
3 2 3 3
3 2 2 2

(b) Run length matrix, 0° (rows: grey levels 0 to 3; columns: run lengths 1 to 4)

0 2 0 0
0 1 0 0
1 1 1 0
2 1 0 0

(c) Run length matrix, 45°

4 0 0 0
2 0 0 0
6 0 0 0
4 0 0 0

(d) Run length matrix, 90°

4 0 0 0
2 0 0 0
4 1 0 0
2 1 0 0

(e) Run length matrix, 135°

2 1 0 0
2 0 0 0
4 1 0 0
4 0 0 0

Figure 6.2 (a) Image array with four grey levels. (b)-(e) Computation of all four run length matrices for texture analysis.

Each matrix element specifies the number of times that the picture contains a run of a given length, for a given grey level (0 to 3), in the given direction. The first element of the first row of the matrix is the number of times grey level 0 appears by itself, the second element is the number of times it appears in pairs, and so on. The textural features that can be extracted from the run length matrices are presented below:
1. Short Run Emphasis (SRE)

$$SRE = \frac{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r}\dfrac{r(i,j)}{j^{2}}}{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} r(i,j)} \tag{6.25}$$

where $r(i,j)$ is the run length matrix, $N_{g}$ is the number of grey values in the image, $N_{r}$ is the largest possible run, $i=1,\dots,N_{g}$ and $j=1,\dots,N_{r}$. SRE tends to emphasize short runs due to the division by $j^{2}$. It takes large values for nodules with high variability.

2. Long Run Emphasis (LRE)

$$LRE = \frac{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} j^{2}\,r(i,j)}{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} r(i,j)} \tag{6.26}$$

LRE tends to emphasize long runs. It takes large values for nodules with low variability.

3. Grey Level Non Uniformity (GLNU)

$$GLNU = \frac{\sum_{i=1}^{N_g}\left(\sum_{j=1}^{N_r} r(i,j)\right)^{2}}{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} r(i,j)} \tag{6.27}$$

GLNU is proportional to large run length values that are uniformly distributed. It takes large values for nodules with high variability.

4. Run Length Non Uniformity (RLNU)

$$RLNU = \frac{\sum_{j=1}^{N_r}\left(\sum_{i=1}^{N_g} r(i,j)\right)^{2}}{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} r(i,j)} \tag{6.28}$$

RLNU encodes long runs that are non-uniformly distributed. It takes small values for nodules with high variability.

5. Run Percentage (RP)

$$RP = \frac{\sum_{i=1}^{N_g}\sum_{j=1}^{N_r} r(i,j)}{P} \tag{6.29}$$

where $P$ is the total possible number of runs in the nodule image. This feature takes its lowest value in nodules with low variability.
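The run length features above can be sketched for the 0° direction as follows. The example assumes that runs are counted along the image rows, that the matrix has one column per possible run length up to the image width, and that $P$ is taken as the number of pixels (the number of runs there would be if every run had length 1). Both function names are hypothetical.

```python
from itertools import groupby

import numpy as np


def run_length_matrix_0deg(image, levels):
    """Run length matrix along the rows: r[g, l-1] counts runs of grey level g with length l."""
    image = np.asarray(image)
    r = np.zeros((levels, image.shape[1]), dtype=int)
    for row in image:
        for grey, run in groupby(row.tolist()):
            r[grey, len(list(run)) - 1] += 1
    return r


def run_length_features(r, n_pixels):
    """SRE, LRE, GLNU, RLNU and RP from a run length matrix r."""
    r = np.asarray(r, dtype=float)
    j = np.arange(1, r.shape[1] + 1)             # run lengths 1..Nr
    total = r.sum()                              # total number of runs
    sre = np.sum(r / j ** 2) / total             # equation (6.25)
    lre = np.sum(r * j ** 2) / total             # equation (6.26)
    glnu = np.sum(r.sum(axis=1) ** 2) / total    # equation (6.27)
    rlnu = np.sum(r.sum(axis=0) ** 2) / total    # equation (6.28)
    rp = total / n_pixels                        # equation (6.29), P = pixel count
    return sre, lre, glnu, rlnu, rp
```

The other three directions would follow the same pattern, with the runs collected along the columns or the diagonals instead of the rows.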
6.4.3 Shape and Geometrical Features

Besides the textural sonographic criteria of the thyroid nodule, various shape and geometrical characteristics, such as irregular margins and circular boundaries, are employed in the decision making procedure. In the present thesis several geometrical and shape based features were computed in order to quantify the observations made by physicians throughout the thyroid nodule literature [72,75,76]. These features were:

1. Average Radius

$$R_{avg} = \frac{1}{N}\sum_{i=1}^{N}\sqrt{\left(x(i)-X_{o}\right)^{2}+\left(y(i)-Y_{o}\right)^{2}} \tag{6.30}$$

$R_{avg}$ is computed by averaging the Euclidean distance from the nodule's centroid $(X_{o},Y_{o})$ to each of the $N$ boundary points $(x,y)$ (Figure 6.3).
Figure 6.3 Line segments used to compute radius.
2. Radius Standard Deviation

$$RST = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(R(i)-R_{avg}\right)^{2}} \tag{6.31}$$

RST encodes information regarding the irregularity of the nodule's borderline. It takes high values in cases where the boundary is not circular.

3. Perimeter

$$P = N \tag{6.32}$$

$P$ is measured by counting the number of pixels on the border of the nodule.

4. Area

Area is computed by counting the number of pixels in the interior of the nodule's boundary.

5. Radial Entropy

$$RE = -\sum_{k=1}^{100} p_{k}\log(p_{k}) \tag{6.33}$$
where $p_{k}$ is the probability that the radius distance lies between $d(i)$ and $d(i)+0.01d(i)$. The parameter $p_{k}$ was computed from the radial histogram: the amplitude range between the minimum and maximum values of the radius distance was divided into 100 bins and the number of times the radius distance plot passed through each bin was counted. The counts were then divided by the total number of samples.

6. Circularity or Roundness

$$Circ = \frac{P^{2}}{Area} \tag{6.34}$$

Circ is minimized for a circle and increases with the irregularity of the nodule's shape.

7. Smoothness

$$SM = \frac{\sum_{points}\left|R_{i}-\dfrac{R_{i-1}+R_{i+1}}{2}\right|}{P} \tag{6.35}$$

where $R_{i}$, $R_{i-1}$ and $R_{i+1}$ are depicted in Figure 6.4. SM is computed by measuring the difference between the length of a radius and the mean length of the two radii surrounding it. It takes small values for nodules with regular borders.
Figure 6.4 Line segments used to compute Smoothness.
8. Concavity

$$Concavity = \frac{\dfrac{1}{N}\sum_{i=1}^{N}\sqrt{\left(Y_{Centroid}-Y_{CV_i}\right)^{2}+\left(X_{Centroid}-X_{CV_i}\right)^{2}}}{\dfrac{1}{M}\sum_{i=1}^{M}\sqrt{\left(Y_{Centroid}-Y_{AB_i}\right)^{2}+\left(X_{Centroid}-X_{AB_i}\right)^{2}}} \tag{6.36}$$

Concavity measures the size of any indentations in the thyroid nodule. It compares the average distance of the convex hull (CV) points from the center (Centroid) of the nodule with the average distance of the actual nodule boundary (AB) points from the center (Figure 6.5). It takes minimum values for circular or elliptical nodules.

9. Concave Points

Concave points are the points of the actual nodule boundary that lie in a concave region (Figure 6.5). The greater the number of concave points, the more irregular the nodule's border.

Figure 6.5 Convex hull is used to compute concavity and concave points.
10. Average Convex Hull Radius

$$RCH_{avg} = \frac{1}{N}\sum_{i=1}^{N}\sqrt{\left(CHx(i)-X_{o}\right)^{2}+\left(CHy(i)-Y_{o}\right)^{2}} \tag{6.37}$$

$RCH_{avg}$ is computed by averaging the Euclidean distance from the nodule's centroid $(X_{o},Y_{o})$ to each of the convex hull boundary points $(CHx,CHy)$.

11. Symmetry

$$SYM = \frac{\sum_{i}\left|left_{i}-right_{i}\right|}{\sum_{i}\left(left_{i}+right_{i}\right)} \tag{6.38}$$

SYM is computed by measuring the relative difference in length between pairs of line segments perpendicular to the major axis of the nodule. The major axis is determined by finding the diameter of the nodule, and the line segments are drawn at regular intervals (Figure 6.6). It encodes geometrical information regarding the shape variability of the nodule [77,78].
Figure 6.6 Line segments used to compute Symmetry. The lengths of perpendicular segments on the right of the major axis are compared to those on the left.
12. Fractal Dimension

The fractal dimension of the nodule's boundary is approximated using the box-counting method [72,79,80]. The perimeter of the nodule is measured using progressively smaller rulers, i.e. boxes that cover the nodule boundary. As the ruler size decreases, increasing the precision of the measurement, the observed perimeter increases. Plotting these values on a log scale and measuring the slope gives an approximation of the fractal dimension (Figure 6.7). The fractal dimension is a measure of how complicated the boundary of the nodule is.
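As an illustration of the radius-based features of this list (average radius, radius standard deviation and smoothness), a minimal sketch is given below. It assumes an ordered, closed list of boundary points, the $1/N$ normalization for the standard deviation, the perimeter taken as the number of boundary points, and, when no centroid is supplied, the mean of the boundary points as the centre; `radial_shape_features` is a hypothetical name.

```python
import numpy as np


def radial_shape_features(boundary, centroid=None):
    """Average radius, radius standard deviation and smoothness of a closed boundary.

    boundary: (N, 2) array of ordered (x, y) boundary points.
    centroid: (Xo, Yo); if omitted, the mean of the boundary points is used.
    """
    pts = np.asarray(boundary, dtype=float)
    centre = pts.mean(axis=0) if centroid is None else np.asarray(centroid, dtype=float)
    radii = np.hypot(pts[:, 0] - centre[0], pts[:, 1] - centre[1])
    r_avg = radii.mean()                                    # equation (6.30)
    r_std = np.sqrt(np.mean((radii - r_avg) ** 2))          # equation (6.31)
    # Mean of the two neighbouring radii, wrapping around the closed contour.
    neighbours = (np.roll(radii, 1) + np.roll(radii, -1)) / 2.0
    perimeter = len(pts)                                    # equation (6.32): P = N
    smoothness = np.sum(np.abs(radii - neighbours)) / perimeter   # equation (6.35)
    return r_avg, r_std, smoothness
```

For a perfectly circular boundary all radii are equal, so the standard deviation and the smoothness both vanish, which matches the behaviour described for regular borders.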
Figure 6.7 Fractal dimension estimation. N is the number of covering boxes and s is the number of rulers or the size (perimeter) of each box.
6.4.4 Local Maxima Features
A comprehensive study of the local maxima features employed in the present thesis is presented in Chapter 10.
6.5 Data Normalization
In most cases the feature values have different dynamic ranges. To overcome this problem, all feature values are normalized so that they lie within similar dynamic ranges. The normalization used in this thesis is based on the mean and variance of the feature values [72,81].
$$\hat{x}_{ik} = \frac{x_{ik} - \bar{x}_k}{\sigma_k} \tag{6.39}$$
where $x_{ik}$ and $\hat{x}_{ik}$ are the kth feature values before and after normalization, and $\bar{x}_k$ and $\sigma_k$ are the mean value and standard deviation of the kth feature.
6.6 Classification Task
Given a specific classifier, classification performance is tested employing the leave-one-out method for all possible combinations (2s, 3s, etc.) of the features computed during the feature generation stage. The aim is to determine the optimum combination of features, that is, the one that achieves the highest classification accuracy with the minimum number of features. According to the leave-one-out method, the classifier is designed using all but one of the training feature vectors. The left-out feature vector is treated as being of unknown class and is classified by the system. The whole procedure is repeated until all feature vectors have been tested. Results are then presented in a two-way truth table or confusion matrix [82,83]. The classifiers designed throughout this thesis are presented below.
6.6.1 Minimum Distance Classifiers
Minimum distance classifier: In the minimum distance classifier the pattern classes in the feature space are assumed to cluster around their respective means. The decision boundary that separates the two classes is the perpendicular bisector of the line joining the two means. Each feature vector is classified according to whether it lies to the left or to the right of the bisector [83]. The discriminant function of the minimum distance classifier using the Euclidean metric is given by the following equation:
$$d_j(x) = x^T m_j - \frac{1}{2} m_j^T m_j \tag{6.40}$$
where x is the input feature vector and $m_j$ is the mean vector of class j.
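The sketch below combines the discriminant of equation 6.40 with the leave-one-out procedure described in Section 6.6. It is a minimal illustration under stated assumptions: the helper names, the class labels and the toy feature vectors are hypothetical and are not data from this thesis.

```python
def class_means(samples, labels):
    """Mean feature vector of each class."""
    sums, counts = {}, {}
    for x, c in zip(samples, labels):
        acc = sums.setdefault(c, [0.0] * len(x))
        for k, value in enumerate(x):
            acc[k] += value
        counts[c] = counts.get(c, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}

def discriminant(x, m):
    """d_j(x) = x^T m_j - 0.5 * m_j^T m_j (eq. 6.40)."""
    return sum(a * b for a, b in zip(x, m)) - 0.5 * sum(b * b for b in m)

def classify(x, means):
    """Assign x to the class with the largest discriminant value."""
    return max(means, key=lambda c: discriminant(x, means[c]))

def leave_one_out_accuracy(samples, labels):
    """Design the classifier on all but one vector, test on the left-out one."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if classify(samples[i], class_means(train_x, train_y)) == labels[i]:
            correct += 1
    return correct / len(samples)

# Toy data (illustrative only): two well-separated clusters.
samples = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
labels = ["a", "a", "a", "b", "b", "b"]
print(leave_one_out_accuracy(samples, labels))
```

Maximizing $d_j(x)$ is equivalent to minimizing the Euclidean distance to the class mean, since the term $x^T x$ is common to all classes.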
Least square minimum distance (LSMD) classifier: The LSMD classifier maps the input data set, via a non-linear transformation, into a decision space where each class is clustered around a pre-selected point [83]. The classification of a given test point is based on its minimum distance from each pre-selected point. For the LSMD, the discriminant function is given by:
$$g_i(x) = \sum_{j=1}^{d} \omega_{ij} x_j - \omega_{i(d+1)} \tag{6.41}$$
where d is the number of features, $\omega_{ij}$ are weight elements and $x_j$ are the input vector feature elements.
6.6.2 Bayesian Classifier
Bayes decision theory develops a probabilistic approach to pattern recognition, based on the statistical nature of the generated features. The Bayes discriminant function [72,83] for class i and for pattern vector x is given by:
$$g_i(x) = \ln P_i - \frac{1}{2}\ln|C_i| - \frac{1}{2}(x - m_i)^T C_i^{-1}(x - m_i) \tag{6.42}$$
where $P_i$ is the probability of occurrence of class i, $m_i$ is the mean feature vector of class i, and $C_i$ is the covariance matrix of class i.
6.6.3 Neural Networks Classifiers
Artificial neural networks are basic input-output models, with the neurones organised into layers. Simple perceptrons consist of a layer of input neurones, coupled with a layer of output neurones, and a single layer of weights between them. The learning process consists of finding the correct values for the weights between the input and output layers. The principal weakness of the simple perceptron is that it can only solve problems that are linearly separable. To obtain a non-linear solution, more layers of weights are added to the simple perceptron model, giving the multilayer perceptron network [83,84].
Multilayer Perceptron (MLP) Classifier: In the MLP classifier [83,84] (Figure 6.8), the output y(j) of each node j of a hidden layer or of the output layer is related to its input by:
$$y(j) = \frac{1}{1 + e^{-S(j)}} \tag{6.43}$$
where $S(j) = \sum_{i=1}^{N} y(i)\,w(i,j)$ and w(i,j) are the connection weights between the previous node i and the current node j; y(i)w(i,j) is the weighted output of the previous node i, which is used as input to node j; N is the number of inputs to node j; and S(j) is the sum of all weighted inputs y(i)w(i,j) of the previous layer to node j.
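The node relation of equation 6.43 can be illustrated with a short forward-pass sketch. The function names and the storage convention for the weights (one row of incoming weights per node) are assumptions chosen for the example.

```python
import math

def sigmoid(s):
    """Logistic activation of eq. 6.43."""
    return 1.0 / (1.0 + math.exp(-s))

def node_output(inputs, weights):
    """y(j) for one node: weights[i] plays the role of w(i, j)."""
    s = sum(y * w for y, w in zip(inputs, weights))  # S(j)
    return sigmoid(s)

def layer_output(inputs, weight_rows):
    """Outputs of a whole layer; weight_rows[j] holds the weights into node j."""
    return [node_output(inputs, row) for row in weight_rows]

# Two inputs feeding one node: S(j) = 1*0.5 + 2*(-0.25) = 0, so y(j) = 0.5.
print(node_output([1.0, 2.0], [0.5, -0.25]))
```

Feeding the output of one `layer_output` call into the next reproduces the layer-by-layer propagation of a multilayer perceptron.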
Figure 6.8 Schematic diagram of the multilayer perceptron neural network employed, with two input features, two classes, two hidden layers and four nodes in each hidden layer.
The connection weights w(i,j) between different layer nodes of the MLP are calculated iteratively until they stabilize, by the following equation:
$$w(i,j)_{n+1} = w(i,j)_n + \eta\, d(j)\, y(i) + z\,\big(w(i,j)_n - w(i,j)_{n-1}\big) \tag{6.44}$$
where (n+1), n, (n-1) correspond to the next, present, and previous iterations respectively, $\eta$ and z are constants, and d(j) is the error term between the desired output t(j) and the actual output y(j), which for an output layer node is given by:
$$d(j) = \big(t(j) - y(j)\big)\, y(j)\,\big(1 - y(j)\big) \tag{6.45}$$
and for a hidden layer node by:
$$d(j) = y(j)\,\big(1 - y(j)\big) \sum_{k} d(k)\, w(j,k) \tag{6.46}$$
where k runs over all nodes of the layer to the right of the current node j.
Probabilistic Neural Network (PNN) classifier: The PNN is implemented by a feed-forward, one-pass structure (see Figure 6.9) and encapsulates Bayes's decision rule together with the use of Parzen estimators of the data's probability distribution function. The discriminant function of a PNN for class j is given by the following equation, as described in [84,85,86]:
$$g_j(x) = \frac{1}{(2\pi)^{p/2}\,\sigma^{p}\,N_j} \sum_{i=1}^{N_j} \exp\left(-\frac{(x - x_{ij})^T (x - x_{ij})}{2\sigma^2}\right) \tag{6.47}$$
where x is the test pattern vector to be classified, $x_{ij}$ is the i-th training pattern vector of the j-th class, $N_j$ is the number of patterns in class j, $\sigma$ is a smoothing parameter, and p is the number of features employed in the feature vector. The PNN architecture comprises 4 layers (Figure 6.9): the input layer, which has a node for each feature of the input data; the pattern layer, in which one pattern node corresponds to each training pattern; the summation layer, which receives the outputs from the pattern nodes associated with a given class; and the output layer, which has as many nodes as there are classes. The test pattern x is assigned to the class with the largest discriminant function value.
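A minimal sketch of the PNN discriminant of equation 6.47 follows. The dictionary layout of the training patterns, the class labels and the value of the smoothing parameter are assumptions made for illustration.

```python
import math

def pnn_discriminant(x, class_patterns, sigma):
    """g_j(x) of eq. 6.47 for the training patterns of one class."""
    p = len(x)
    n_j = len(class_patterns)
    norm = (2.0 * math.pi) ** (p / 2.0) * sigma ** p * n_j
    total = 0.0
    for pattern in class_patterns:  # one pattern node per training vector
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, pattern))
        total += math.exp(-sq_dist / (2.0 * sigma ** 2))
    return total / norm  # summation node output for this class

def pnn_classify(x, training, sigma):
    """Output layer: pick the class with the largest discriminant value."""
    return max(training, key=lambda c: pnn_discriminant(x, training[c], sigma))

# Toy training set (illustrative only), two features and two classes.
training = {"a": [[0.0, 0.0], [0.0, 1.0]], "b": [[4.0, 4.0], [5.0, 4.0]]}
print(pnn_classify([0.2, 0.3], training, sigma=1.0))
```

No iterative training is involved: the training patterns themselves act as the pattern layer, which is why the structure is described as one-pass.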
Figure 6.9 Schematic diagram of the probabilistic neural network employed, with two input features and two classes.
6.6.4 Support Vector Machines Classifier
A classifier based on support vector machines (SVM) [84,87,88] is a general classifier that can be applied to linearly as well as to non-linearly separable data, with or without overlap between the classes. In the most general case of overlapped and non-linearly separable data, the problem is (a) to transform the training patterns from the input space to a feature space of higher dimensionality ($x \in R^d \rightarrow \Phi(x) \in R^h$) where the classes become linearly separable, and (b) to find two parallel hyperplanes with maximum distance between them and at the same time with the minimum number of training points in the area between them (also called the margin). The separating hyperplanes in the transformed feature space are defined by the following equation:
$$w \cdot \Phi(x) + b = \pm 1 \tag{6.48}$$
where +1 refers to class 1, -1 refers to class 2, x is the pattern vector, w is the normal vector to the hyperplanes, and b is the bias or threshold, which describes the distance of the decision hyperplane from the origin (equal to $b/\|w\|$). The discriminant function is given by:
$$g(x) = \mathrm{sign}\big(w \cdot \Phi(x) + b\big) \tag{6.49}$$
The parameters w and b are calculated as follows: Let there be N training pattern vectors $x_i \in R^d$, i = 1...N (where d is the number of features), belonging to two classes identified by the label $y_i \in \{-1, +1\}$. The conditions for the hyperplanes may take the following mathematical formulation: (i) minimize the number of training pattern vectors that lie between the two hyperplanes, so:
$$y_i\big(w \cdot \Phi(x_i) + b\big) + \xi_i \ge 1 \tag{6.50}$$
where $\xi_i \ge 0$, i = 1...N, are real non-negative slack variables; (ii) the distance between the two hyperplanes (which is equal to $2/\|w\|$) must be maximized, so $\frac{1}{2}\|w\|^2$ must be minimized. The above conditions lead to minimizing $\frac{1}{2}\|w\|^2 + C\sum_i \xi_i$ subject to (6.50), where C is a positive constant that reflects a trade-off between the classification errors and the size of the margin. Introducing Lagrangian multipliers $\alpha_i$, $\beta_i$, i = 1...N, the Lagrangian is given by:
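$$L_P = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i - \sum_{i=1}^{N}\alpha_i\Big(y_i\big(w \cdot \Phi(x_i) + b\big) + \xi_i - 1\Big) - \sum_{i=1}^{N}\beta_i \xi_i \tag{6.51}$$
The problem is now to maximize $L_P$ subject to $\partial L_P/\partial b = 0$, $\partial L_P/\partial w = 0$ and $\partial L_P/\partial \xi_i = 0$ (with $\alpha_i, \beta_i \ge 0$). These constraints give respectively:
$$\sum_{i=1}^{N} \alpha_i y_i = 0 \tag{6.52}$$
$$w = \sum_{i=1}^{N} \alpha_i y_i \Phi(x_i) \tag{6.53}$$
and
$$C = \alpha_i + \beta_i \tag{6.54}$$
Equation (6.54), in combination with $\alpha_i, \beta_i \ge 0$, implies that $0 \le \alpha_i \le C$. Substituting equations (6.52), (6.53) and (6.54) into (6.51), we obtain the dual-variable Lagrangian $L_D$: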
$$L_D = \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{N}\alpha_i \alpha_j y_i y_j\, \Phi(x_i) \cdot \Phi(x_j) \tag{6.55}$$
By use of a kernel function, which can replace the inner product $\Phi(x_i) \cdot \Phi(x_j)$ in the higher-dimensional feature space, the dual Lagrangian $L_D$ takes the form:
$$L_D = \sum_{i=1}^{N}\alpha_i - \frac{1}{2}\sum_{i,j=1}^{N}\alpha_i \alpha_j y_i y_j\, k(x_i, x_j) \tag{6.56}$$
A function can be used as a kernel function if it satisfies Mercer's condition: any symmetric function k(x,y) in the input space is equivalent to an inner product in the feature space if
$$\iint k(x,y)\, g(x)\, g(y)\, dx\, dy \ge 0$$
for any function g(x) for which $\int g^2(x)\, dx < \infty$.
Using equations (6.49), (6.53), and (6.56), it may be seen that $\xi_i$ and $\beta_i$ have vanished, so the discriminant function of the SVM classifier may be written as:
$$g(x) = \mathrm{sign}\left(\sum_{i=1}^{N_S} \alpha_i y_i\, k(x_i, x) + b\right) \tag{6.57}$$
where $N_S$ is the number of pattern vectors with non-zero $\alpha_i$ (also called the support vectors). Combining equations (6.48), (6.53), and (6.56), the threshold b may be found as:
$$b = \frac{1}{N_S}\sum_{j=1}^{N_S}\left(y_j - \sum_{i=1}^{N_S}\alpha_i y_i\, k(x_i, x_j)\right) \tag{6.58}$$
and the coefficients $\alpha_i$ are obtained by solving the dual problem, which is the maximization of $L_D$ (equation (6.56)) subject to equation (6.52), with $0 \le \alpha_i \le C$. Functions that are commonly used as kernels are:
1. The linear kernel
$$k(x_i, x_j) = x_i \cdot x_j \tag{6.59}$$
2. The polynomial kernel
$$k(x_i, x_j) = (x_i \cdot x_j + \theta)^d \tag{6.60}$$
where d is the degree of the polynomial and $\theta$ is an offset parameter.
3. The Gaussian radial basis kernel
$$k(x_i, x_j) = \exp\left(-\frac{(x_i - x_j)^T (x_i - x_j)}{2\sigma^2}\right) \tag{6.61}$$
where $\sigma$ is the standard deviation.
4. The sigmoidal kernel
$$k(x_i, x_j) = \tanh\big(\kappa\,(x_i \cdot x_j) + \theta\big) \tag{6.62}$$
where $\kappa$ is the gain and $\theta$ is the offset.
5. The inverse multiquadric kernel
$$k(x_i, x_j) = \big((x_i - x_j)^T (x_i - x_j) + c^2\big)^{-1/2} \tag{6.63}$$
where c is a non-negative real number.
6. The wavelet kernel
$$K(x, y) = \sum_{j,k} \psi_{j,k}(x)\, \psi_{j,k}(y) \tag{6.64}$$
where $\psi_{j,k}$ is a translated wavelet of resolution j.
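The first five kernels listed above, together with the discriminant function of equation 6.57, can be sketched in a few lines. The function and parameter names are hypothetical, and the multipliers in the usage example are chosen by hand purely for illustration; in practice they would come from solving the dual problem.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def linear_kernel(xi, xj):
    return dot(xi, xj)                                         # eq. 6.59

def polynomial_kernel(xi, xj, degree, offset):
    return (dot(xi, xj) + offset) ** degree                    # eq. 6.60

def gaussian_kernel(xi, xj, sigma):
    return math.exp(-squared_distance(xi, xj) / (2.0 * sigma ** 2))  # eq. 6.61

def sigmoid_kernel(xi, xj, gain, offset):
    return math.tanh(gain * dot(xi, xj) + offset)              # eq. 6.62

def inverse_multiquadric_kernel(xi, xj, c):
    return 1.0 / math.sqrt(squared_distance(xi, xj) + c ** 2)  # eq. 6.63

def svm_decision(x, support_vectors, labels, alphas, b, kernel):
    """g(x) = sign(sum_i alpha_i y_i k(x_i, x) + b), eq. 6.57."""
    s = sum(a * y * kernel(sv, x)
            for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if s >= 0 else -1

# Hand-picked multipliers (illustrative only) with a linear kernel.
print(svm_decision([2.0], [[1.0], [-1.0]], [1, -1], [0.5, 0.5], 0.0, linear_kernel))
```

Because the kernel is passed as a function, the same decision routine works unchanged for any kernel that satisfies Mercer's condition.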
CHAPTER 7
Wavelet-Based Speckle Suppression in Ultrasound Images
Summary
In this chapter a wavelet-based method for speckle suppression in ultrasound images of the thyroid gland is introduced. The chapter is organized as follows: First, an extensive review of the literature regarding noise reduction in US images is presented. Afterwards, the speckle model adopted in this study is presented, followed by the proposed strategy based on inter-scale wavelet analysis. In the results section the speckle removal efficiency and edge preservation are compared to those of current speckle suppression methods. Moreover, 63 US images of the thyroid gland are reviewed by 2 experienced observers via questionnaire for a qualitative evaluation of the proposed despeckling process. In the last section an extensive discussion of the proposed algorithm is given.
7.1 Review of the Literature
A variety of speckle reduction techniques have been implemented in the past two decades. Some of them reduce speckle by acquiring the radio frequency (RF) pulse echo signals from the US devices directly after log compression and time gain compensation (TGC) and before scan conversion [89,90]. However, access to raw RF data is somewhat complex and sometimes impossible, especially in modern US scanners, which in turn makes such methods difficult to apply for research purposes. In the early years of computer image processing, speckle removal in US images was achieved via simple averaging, median filtering and Wiener filtering [91]. Simple averaging not only failed to eliminate speckle but also introduced blurring and edge loss in areas where anatomic boundaries prevail. Median filtering enhances edges and speckle indiscriminately, while the Wiener filter removes considerable amounts of speckle but also tends to oversmooth the boundaries of important image features.
Various adaptive filters based on local statistics, such as mean and variance, have been implemented for noise reduction, not solely in medical imaging but for image denoising in general [92,93,94]. As computing technology advanced during the 90s, along with processor speed and power, new and more complicated filters were introduced. They were employed mainly in the time domain, such as the adaptive-weighted median filter by Loupas et al. [95], the segmentation-based L-filtering by Kofidis et al. [96], the adaptive speckle suppression by Karaman et al. [97], the aggressive region growing method by Chen et al. [98], the symmetrical speckle reduction filter by Huang et al. [99] and the diffusion stick model by Xiao et al. [100].
Loupas introduced a new class of non-linear adaptive filters employing local statistics, such as the ratio $\sigma^2/m$, where $\sigma^2$ and m are the local variance and mean inside a moving window of pre-specified dimensions. Through these local statistical parameters the adaptive filter is turned into a general low-pass filter in homogeneous areas and into an enhanced median filter in areas with small structures or boundaries. Karaman's method depended on the same statistical principles employed by Loupas regarding the moving window and its local statistics. The filter was transformed into a mean or a median filter depending on an estimated homogeneity criterion. Kofidis segmented the US image into various stationary regions employing a combination of the Learning Vector Quantizer (LVQ) and the maximum likelihood estimator of the original noiseless signal (L2 mean). Subsequently, these subimages are filtered by a set of L-filters. The L-filter design process is based on order statistics (i.e. the autocorrelation matrix) derived from the previous stage. Chen presented a new region growing filtering method based on a trade-off between a trimmed mean filter and a median filter. In order to overcome some of the limitations of the above methods he added an adaptive homogeneity criterion. Through this criterion a homogeneous area is differentiated from a heterogeneous area, thus switching the filter applied to that area to mean or median respectively. Huang divided his filtering strategy, based on the slope facet model, into two stages. In the first stage he introduced two criteria in the region growing process to approximate the largest despeckling window within an 11x11 matrix: the widely used variance-to-mean criterion and the gradient criterion. In the second stage, after the major removal of speckle, he used only the gradient criterion for the final noise elimination. In both stages the filter generally acts as a common mean filter.
Xiao presents an interesting oriented filter with 24 asymmetrical diffusion sticks inside a symmetric moving matrix. Through a variation function applied to every stick, the algorithm smooths the sticks with high homogeneity and penalizes smoothing within heterogeneous regions. The smoothing function comprises the weighted sum of averages along each stick. To optimize the results the whole filtering process is carried out iteratively. The empirical choice of some parameters of the above-mentioned methods, such as window size, weight calculation, homogeneity criterion or various thresholds, degrades their generalization ability and thus makes them dependent on the US machine and the anatomical region. In the past decade a new approach to US image denoising emerged, based on the wavelet transform. Some of the proposed wavelet-based methods for US image despeckling are the homomorphic wavelet shrinkage by Zong et al. [101], the multiscale nonlinear processing method by Hao et al. [102], and the Bayesian wavelet method by Achim et al. [103]. Zong applied Mallat's dyadic wavelet transform [62] to a US image that is logarithmically transformed due to the multiplicative nature of speckle. In the resulting decompositions he applied a combination of soft and hard thresholding, introduced by Donoho [104], at fine and
middle scales respectively in order to eliminate the presence of speckle. Besides wavelet coefficient shrinkage, Zong achieved boundary enhancement by means of an adaptive gain operator and some predefined thresholds. Hao presented a combination of Loupas's method and wavelet transform shrinkage. Initially, he divided the image via the adaptive-weighted median filter into two parts that approximate signal and noise. These two parts are decomposed through the wavelet transform and a modification of Donoho's soft thresholding is used to remove speckle. The final denoised image is the sum of the two reconstructed image parts into which the original image was split in the first stage. Both methods mainly adopt the denoising thresholding procedure presented by Donoho. In this procedure thresholds are calculated empirically and in an ad hoc manner, without taking into account the special statistical properties of speckle. Achim, in his attempt to overcome the limitations arising from empirical thresholding of wavelet coefficients, employed a Bayesian approach for signal extraction and speckle suppression. The log-transformed US image was decomposed into different frequency scales via the wavelet transform. In each scale a Bayesian estimator, based on a symmetric alpha-stable distribution of the wavelet decomposition, is used to differentiate the signal coefficients from the noise coefficients.
Besides US images, speckle dominates Synthetic Aperture Radar (SAR) images as well, making their correct interpretation difficult. Various attempts have been made in the wavelet domain to efficiently reduce the resulting granular pattern. Sveinsson et al [105] and Pantaleoni et al [106], via the orthogonal wavelet transform and the Daubechies wavelet family, applied soft thresholding and the enhanced Lee filter respectively to the wavelet coefficients to reduce the presence of speckle.
The aforementioned wavelet-based approaches use the logarithmic transform to convert the multiplicative model of speckle into an additive model with signal-independent noise before performing the speckle reduction. After that, an exponential transform is applied to convert the denoised image back to its original format. The mean of the log-transformed speckle noise is not zero, whereas the above methods assume additive white Gaussian noise (AWGN) with zero mean. This led to the need for a correction step for the mean bias in the processing stages, to avoid distortion in the despeckled image [107]. Several recent studies avoid the log transform and apply the wavelet transform directly to the SAR images. Foucher et al [108], in order to discriminate reflectivity coefficients from speckle coefficients, implemented a Bayesian analysis based on the Pearson system for approximating the probability density function (pdf) of the wavelet coefficients. Argenti et al [109] applied minimum mean-square error (MMSE) filtering in the undecimated wavelet domain to suppress speckle noise coefficients, and Dai et al [110] combined a Bayesian shrinkage factor and a ratio edge detector applied to the wavelet coefficients for speckle
reduction with edge preservation in homogeneous areas. An efficient discrimination between speckle noise and reflected signal in US or SAR images, either in the time or in the wavelet domain, is still under discussion in the scientific community. As already mentioned, an accurate despeckling algorithm is very important in the decision-making process, especially in US images of the thyroid gland. Often, thyroid nodules, which play the most important role in estimating the malignancy risk factor, are of low contrast in a noisy background. Likewise, the presence of various structures inside the nodules is a critical factor for a proper interpretation of the US image. The dominating speckle noise in all US images can lead to misleading analysis, thus obstructing the physician's diagnosis.
7.2 Materials and Methods
7.2.1 Overview and Implementation of the Algorithm
A wavelet-based method is introduced in this thesis for efficient speckle suppression in sonographic images of the thyroid gland while important edges and boundaries are preserved. The proposed wavelet approach avoids both the log and the exponential transform, considering fully developed speckle as additive signal-dependent noise with zero mean. Through the wavelet transform, the proposed method can combine the information at different frequency bands and accurately measure the local regularity of image features. The inter-scale information is acquired by means of a coarse-to-fine connectivity of the wavelet transform modulus maxima (WTMM). Two structures, represented by modulus maxima in two consecutive scales, belong to the same anatomic area if the position and angle pair of the maximum wavelet coefficient value at the upper scale is also approximately present at the lower scale. The decay across scales of the wavelet transform maxima is related to the local regularity of these structures and is assessed by the Lipschitz exponent $\alpha$ [70]. The purpose of the present study is to employ the knowledge given by the evolution of the wavelet transform maxima across scales to discriminate image singularities from speckle singularities. All steps of the proposed method (i.e. redundant wavelet transform, multiscale edge representation, coarse-to-fine analysis, together with the wavelet coefficient and maxima display) were implemented in Matlab 6.5. The methods with which the proposed algorithm is compared were also implemented and integrated in the same software package. The computer used for processing has an AMD Athlon XP+ processor running at 1.8 GHz and 512 MB of RAM. The ultrasound system used for this study was the HDI-3000 ATL digital ultrasound system with a broadband linear array with 7 MHz central frequency.
The output video signal of the ultrasound system was digitized via the Miro PCTV video card (Pinnacle Systems), which is installed in a PC. US images are stored in JPEG format and their size is 768 x 576 x 8. The primary steps of the proposed method are illustrated in Figure 7.1.
[Figure 7.1: block diagram. Block labels: Speckle Image; 'À trous' DWT; Gradient Vector Computation; Modulus Maxima; Multiscale Edge Representation; Coarse to Fine Interscale Analysis; Function Regularity Estimation; Singularity Detection; IDWT; De-Speckled Image.]
Figure 7.1 Block diagram of the proposed wavelet-based algorithm for speckle suppression.
7.2.2 Speckle Model
Despite the profound advantages of ultrasonography, US images carry a granular pattern, the so-called speckle, which constitutes a major image quality degradation factor. The speckle pattern is created when an ultrasonic wave of uniform intensity is incident either on a rough surface or on tissue particles that are spaced at less than the axial resolving distance of the US system. In that case, the reflected beam profile will not have a uniform intensity. Instead it will be composed of many regions with strong and weak intensities. This complex intensity profile arises because sound is reflected in many different directions from the rough surface or from the small scatterers, so that US waves that have travelled along different scan lines interfere constructively and destructively at the ultrasonic transducer. The intensity fluctuations within a uniform anatomic area caused by this phenomenon constitute speckle [111]. The resulting speckle-degraded US image does not correspond to the actual tissue microstructure. In fact, speckle noise deteriorates image quality, fine details and edge definition. Speckle also tends to mask the presence of low-contrast lesions, thereby reducing the physician's ability to interpret the image accurately. Moreover, it is a limiting factor in the performance of quantitative procedures such as segmentation and pattern recognition algorithms. Hence, effective speckle suppression is considered of value for improving US image quality and possibly the diagnostic potential of medical ultrasound imaging, except in rare cases of abdominal and breast US images where the presence of speckle may assist in assessing liver cirrhosis or breast cancer [112,113]. The speckle model employed in the time-domain despeckling strategies [95,97,98,99,100,102] considers the envelope-detected RF signal as having a Rayleigh distribution, so that speckle can be considered as multiplicative noise.
However, because US devices apply several signal processing stages, speckle in the finally formatted US image is no longer multiplicative and can be thought of as Gaussian additive noise independent of the noise-free signal. Most wavelet-based methods [101,103,105,106,70], adapted for additive Gaussian white noise, applied a logarithmic transform to the speckled image and approximated speckle as additive noise. The proposed method adopts the approaches of Foucher et al [108], Argenti et al [109] and Dai et al [110], omitting the log transform to avoid the mean bias correction problem and decomposing the multiplicative speckle model into an additive signal-dependent noise model. The multiplicative speckle model at pixel position [x,y] is expressed in the following form:
$$I(x,y) = f(x,y)\, r(x,y) \tag{7.1}$$
where f(x,y) is an unknown 2-D function, namely the original noise-free image to be recovered, I(x,y) is the formatted US image corrupted by noise, and r(x,y) is a random variable that represents speckle. We consider speckle as fully developed (a large number of small scatterers in each resolution cell), so that its magnitude follows the Rayleigh pdf [114]:
$$p_r(r) = \frac{\pi r}{2} \exp\left(-\frac{\pi r^2}{4}\right), \quad r \ge 0 \tag{7.2}$$
Its mean and variance are [114]:
$$E(r) = 1, \qquad \mathrm{var}(r) = \frac{4}{\pi} - 1 \tag{7.3}$$
We convert the multiplicative model into an additive model:
$$I(x,y) = f(x,y) + f(x,y)\,[r(x,y) - 1] = f(x,y) + N(x,y) \tag{7.4}$$
where $[r(x,y) - 1]$ is a random variable with zero mean and variance $\sigma_r^2 = \mathrm{var}(r)$, and $N(x,y)$ represents an additive signal-dependent noise term, which is proportional to the signal to be estimated.
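The decomposition of equation 7.4 can be checked numerically with a short simulation. The sketch below draws unit-mean Rayleigh samples by inverting the cumulative distribution of the pdf in equation 7.2; the function names, the constant test signal and the random seed are assumptions made for the example.

```python
import math
import random

def rayleigh_unit_mean(rng):
    """Sample r with pdf (pi*r/2)*exp(-pi*r^2/4) via the inverse CDF."""
    u = rng.random()  # uniform in [0, 1)
    return math.sqrt(-4.0 / math.pi * math.log(1.0 - u))

def speckle(f, rng):
    """Return (I, N) with I = f*r and N = f*(r - 1) for each sample of f."""
    observed, noise = [], []
    for value in f:
        r = rayleigh_unit_mean(rng)
        observed.append(value * r)       # multiplicative model, eq. 7.1
        noise.append(value * (r - 1.0))  # signal-dependent additive term, eq. 7.4
    return observed, noise

rng = random.Random(0)
f = [100.0] * 20000                      # constant noise-free signal
observed, noise = speckle(f, rng)
ratios = [n / v for n, v in zip(noise, f)]
mean = sum(ratios) / len(ratios)
variance = sum((q - mean) ** 2 for q in ratios) / len(ratios)
print(mean, variance, 4.0 / math.pi - 1.0)
```

The sample mean of r - 1 should be close to zero and its sample variance close to 4/pi - 1, while the noise amplitude scales with the underlying signal value.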
7.2.3 Inter-Scale Wavelet Analysis
7.2.3.1 Dyadic Wavelet Transform
Employing wavelet theory, the correlation of the inter-scale edge information may be used to characterize different types of edges. Wavelet analysis was performed by means of the Dyadic Wavelet Transform (DWT) introduced by Mallat and Zhong [62] for the characterization of signals from multiscale edges. The DWT is based on a wavelet function $\psi(x)$ with compact support, which is the first-order derivative of a cubic spline function. The wavelet decomposition of the original image across scales was implemented with a filter bank algorithm, the so-called algorithme à trous (algorithm with holes). The proposed transform is in fact a fast bi-orthogonal discrete wavelet transform, in which the size of the decomposed sub-band images is the same as that of the original image, thus making the transform highly redundant. Let $\theta(x,y)$ be a symmetrical smoothing function approximating the Gaussian. The two-dimensional Dyadic Wavelet Transform of a function $f(x,y) \in L^2(R^2)$ is the set of functions $(W^1_{2^j} f(x,y), W^2_{2^j} f(x,y))$, which are respectively the partial derivatives along the horizontal and vertical orientations of the convolution of $f(x,y)$ with the smoothing function $\theta(x,y)$ dilated along a dyadic sequence $(2^j)_{j \in Z}$. The DWT is given by:
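$$\begin{pmatrix} W^1_{2^j} f(x,y) \\ W^2_{2^j} f(x,y) \end{pmatrix} = \begin{pmatrix} (f * \psi^1_{2^j})(x,y) \\ (f * \psi^2_{2^j})(x,y) \end{pmatrix} = 2^j \begin{pmatrix} \frac{\partial}{\partial x}(f * \theta_{2^j})(x,y) \\ \frac{\partial}{\partial y}(f * \theta_{2^j})(x,y) \end{pmatrix} \tag{7.5}$$
where $\psi^1(x,y)$ and $\psi^2(x,y)$ are the analyzing wavelets and j is the dyadic scale.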
We performed the dyadic wavelet transform using Mallat's filters. These filters are suitable for fast implementation of discrete algorithms and they offer exact reconstruction. At a dyadic scale j the dilation of the discrete filters is obtained by inserting $(2^j - 1)$ zeros (holes) between each of the coefficients of the corresponding filters. In Figure 7.2 the wavelet decomposition with the à trous algorithm in 3 dyadic scales of a US image of the thyroid gland is presented.
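The insertion of holes into the filters can be illustrated with a few lines of Python. This is a sketch of the dilation step only, with an arbitrary example filter; the function name and the filter taps are assumptions and are not Mallat's actual coefficients.

```python
def dilate_filter(taps, j):
    """Insert 2**j - 1 zeros between consecutive filter coefficients."""
    holes = 2 ** j - 1
    dilated = []
    for k, tap in enumerate(taps):
        dilated.append(tap)
        if k < len(taps) - 1:
            dilated.extend([0.0] * holes)
    return dilated

example = [1.0, 2.0, 1.0]  # illustrative taps only
for j in range(3):
    print(j, dilate_filter(example, j))
```

Because the filter is stretched instead of the image being subsampled, every sub-band keeps the size of the original image, which is the source of the redundancy mentioned above.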
Figure 7.2 At the top is the original US image. The two columns show respectively the horizontal and vertical wavelet transforms $W^1_{2^j} f(x,y)$, $W^2_{2^j} f(x,y)$, $1 \le j \le 3$, along three dyadic scales. The scale increases from top to bottom.
The redundant wavelet transform presented is in fact shift-invariant and is widely used for pattern recognition, feature extraction and edge detection purposes. The wavelet coefficients comprise the intensity profile of an image's local variations for a given scale. They can be considered as a classification map in which any kind of change (abrupt or smooth) that exists in an image can be localized at a particular scale. This indicates the importance of an accurate selection of the dyadic scale j at which the image will be decomposed. The choice of that scale is in fact a trade-off between the suppression of wavelet coefficients characterizing the image's irregularities and the blurring effect caused by the dilation of the smoothing function. At small scales the wavelet coefficients mostly characterize high-frequency events, mainly caused by noise. At bigger scales low-frequency events, such as smooth image variations, are detected.
7.2.3.2 Gradient Vector
Equation (7.5) indicates that the above set of functions can be viewed as the two components of the gradient vector of $f(x,y)$ smoothed by $\theta(x,y)$ at each scale $2^j$:
$$\begin{pmatrix} W^1_{2^j} f(x,y) \\ W^2_{2^j} f(x,y) \end{pmatrix} = 2^j\, \nabla (f * \theta_{2^j})(x,y) \tag{7.6}$$
The modulus-angle representation of the gradient vector is given by:
$$M_{2^j} f(x,y) = \sqrt{\left|W^1_{2^j} f(x,y)\right|^2 + \left|W^2_{2^j} f(x,y)\right|^2} \tag{7.7}$$
and
$$A_{2^j} f(x,y) = \arctan\left(\frac{W^2_{2^j} f(x,y)}{W^1_{2^j} f(x,y)}\right) \tag{7.8}$$
7.2.3.3 Modulus Maxima
The sharper variation points of $(f * \theta_{2^j})(x,y)$ at a scale $2^j$, which correspond to edges, are obtained from the local maxima of $M_{2^j} f(x,y)$ along the gradient direction given by $A_{2^j} f(x,y)$. At each scale $2^j$ of the Dyadic Wavelet Transform, a point (x,y) where the modulus of the gradient vector $M_{2^j} f(x,y)$ is maximum compared with its neighbors located in the direction specified by $A_{2^j} f(x,y)$ is called a modulus maximum (Figure 7.3).
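The modulus-angle representation and the detection of modulus maxima can be sketched as follows. The example assumes that the two wavelet components are available as nested lists, uses the four-quadrant arctangent, and quantizes the gradient direction to the nearest of the 8 neighbors; these choices and the function names are assumptions for illustration, not the Matlab implementation used in this thesis.

```python
import math

def modulus_angle(w1, w2):
    """Modulus and angle images from horizontal (w1) and vertical (w2) components."""
    rows, cols = len(w1), len(w1[0])
    mod = [[math.hypot(w1[y][x], w2[y][x]) for x in range(cols)] for y in range(rows)]
    ang = [[math.atan2(w2[y][x], w1[y][x]) for x in range(cols)] for y in range(rows)]
    return mod, ang

def modulus_maxima(mod, ang):
    """Interior points whose modulus is a local maximum along the gradient direction."""
    rows, cols = len(mod), len(mod[0])
    maxima = []
    for y in range(1, rows - 1):
        for x in range(1, cols - 1):
            m = mod[y][x]
            if m == 0.0:
                continue
            # Quantize the gradient direction to one of the 8 neighbors.
            dx = int(round(math.cos(ang[y][x])))
            dy = int(round(math.sin(ang[y][x])))
            ahead, behind = mod[y + dy][x + dx], mod[y - dy][x - dx]
            if m >= ahead and m >= behind and (m > ahead or m > behind):
                maxima.append((x, y, m, ang[y][x]))
    return maxima

# Vertical edge: the horizontal component peaks in the middle column.
w1 = [[0.0, 1.0, 3.0, 1.0, 0.0] for _ in range(5)]
w2 = [[0.0] * 5 for _ in range(5)]
mod, ang = modulus_angle(w1, w2)
print(modulus_maxima(mod, ang))
```

Each detected maximum is stored with its position, modulus and angle, which are the quantities needed later for the inter-scale chaining.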
Figure 7.3 At the top is the original US image. The first column displays the modulus images $M_{2^j} f(x,y)$. High intensity values correspond to black pixels and low intensity values to white pixels, for optimized visual interpretation of the results. The second column shows the angle images $A_{2^j} f(x,y)$. The angle value turns from 0 (white) to $2\pi$ (black) along the circle contour. In the last column, the image points where $M_{2^j} f(x,y)$ has local maxima in the direction indicated by $A_{2^j} f(x,y)$ are presented (black pixels).
Each time such a point is detected, the position of the resulting local maximum is recorded together with the values of the modulus $M_{2^j} f(x,y)$ and angle $A_{2^j} f(x,y)$ at the corresponding location.
7.2.3.4 Lipschitz Regularity
The aim of the present study is to efficiently characterize the image's singularities, via an inter-scale wavelet analysis, in order to discriminate speckle noise from signal. The classification of singularities depends upon their local regularity. This regularity is quantified by Lipschitz exponents [68]. A function f(x,y) is said to be Lipschitz $\alpha$, $0 \le \alpha \le 1$, at $(x_0, y_0)$ if there exists K > 0 such that for all points $(x,y) \in R^2$:
$$\left| f(x,y) - f(x_0,y_0) \right| \le K \left( |x - x_0|^{2} + |y - y_0|^{2} \right)^{\alpha/2} \qquad (7.9)$$
If there exists a constant K > 0 such that equation (7.9) is satisfied for any $(x_0, y_0) \in \Omega$, then f is uniformly Lipschitz α over Ω. The larger the α, the more regular the function. The Lipschitz regularity of a function f(x, y) is related to the asymptotic decay, from coarse to fine scales, of its wavelet transform along the horizontal and vertical directions, $|W^1 f(u,v,s)|$ and $|W^2 f(u,v,s)|$, in the corresponding neighborhood. This decay is controlled by the wavelet transform local maximum value Mf(u,v,s) [70]. A function f(x, y) is uniformly Lipschitz α, 0 ≤ α ≤ 1, inside a bounded domain of $R^2$ if and only if there exists a constant A > 0 such that for all (u, v) inside this domain and for any dyadic scale s relation (7.10) holds:
$$\left| Mf(u,v,s) \right| \le A\, s^{\alpha + 1} \qquad (7.10)$$
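As a worked step (added here for illustration, not part of the original derivation), taking base-2 logarithms of (7.10) at a dyadic scale $s = 2^j$ gives a bound that is linear in the scale index j:

$$\log_2 \left| Mf(u,v,2^j) \right| \le \log_2 A + (\alpha + 1)\, j$$

Under this bound, the slope of $\log_2 |Mf|$ along a maxima chain, measured per dyadic scale step, is governed by $\alpha + 1$, which is why the decay of the modulus maxima across scales carries information about the local regularity.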
By measuring the Lipschitz exponent α from (7.10), through the computation of the decay slope of $\log_2 |Mf(u,v,s)|$, we derive an estimate of the Lipschitz regularity along the edge.
7.2.3.5 Detection of Singularities
All singularities of f(x, y) can be located by following the wavelet transform modulus maxima up to the finest scale. The terms coarse and fine are relative. Conventionally, coarse scales refer to larger dyadic scales ($2^3$, $2^4$), whereas fine scales refer to smaller dyadic scales ($2^1$, $2^2$). The main objective of this inter-scale analysis is to isolate the different structures that exist in the image, beginning at a coarse scale and adaptively decreasing the scale to gather the necessary details. If an edge appears at a coarser level $2^j$, it should also appear at the finer level $2^{j-1}$. Put differently, any wavelet transform modulus maximum at a coarse scale belongs to a connected inter-scale chain that is never interrupted when the scale decreases [70], which in turn means that any structure represented by the corresponding maxima at a coarse scale can also be found at a finer scale with approximately the same position and angle value. Mallat [62,68] implements the maxima chaining procedure starting from a scale $2^j$ and considers that it propagates to the coarser scale $2^{j+1}$ with similar position and angle values. In this study the maxima chains in the scale-space domain are formed with a back-propagation approach, which starts the inter-scale connectivity at the coarsest scale computed ($2^3$) and completes it at the finest scale available ($2^1$). With this approach the computational complexity of the implemented algorithm is reduced further, since the exhaustive inter-scale search starts from the smallest possible number of local maxima (as the scale increases, the number of local maxima decreases). Before this back-propagation tracking of modulus maxima in wavelet space is applied, most of the false maxima at the coarsest scale ($2^3$), which either are not suppressed by the smoothing function or are created by numerical errors in regions where the wavelet transform is close to zero, are removed through a simple thresholding (all maxima whose values fall below 70% of the maximum modulus value are discarded) [115]. After the thresholding procedure, the chaining of modulus maxima across scales employs a two-fold inter-scale investigation based on the parameter pair position-angle. Two modulus maxima at two successive scales, $(X_k, Y_k, M_k, A_k)$ at $2^j$ and $(X_k, Y_k, M_k, A_k)$ at $2^{j-1}$, are chained if they have a close position in the image plane $(X_k, Y_k)$ and a similar angle value $(A_k)$. If a single coarse local maximum back-propagates to more than one finer local maximum, only the one with the largest modulus $(M_k)$ is considered to belong to the maxima chain. The maxima matching between different scales was not an easy task.
The position-angle investigation was implemented at each scale within a different neighbourhood, taking into account the different size of Mallat's decomposition filter at each scale (a different number of zeros, or holes, between the coefficients). The coarse information is traced within large neighbourhoods, whereas the fine information is traced within small neighbourhoods. The adaptively decreasing investigation window from coarse to fine scales avoids matching errors created either by small maxima groups that might not constitute an exact match or by large maxima groups that may produce inaccurate inter-scale linking if the two successive structures are locally distorted. A square window of width K at a coarse resolution $2^j$ corresponds to square windows of approximate size K/2 and K/4 at the finer resolutions $2^{j-1}$ and $2^{j-2}$ (Figure 7.4).
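The back-propagation chaining described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: each maximum is represented as a tuple (x, y, modulus, angle), the angle tolerance `max_angle_diff` and the coarse window width `coarse_window` are hypothetical values, and the window used for a link is assumed to halve at every step towards the finer scales.

```python
import numpy as np

def threshold_coarse(maxima, fraction=0.7):
    """Keep coarse-scale maxima whose modulus is at least `fraction` of the largest modulus."""
    peak = max(m for (_, _, m, _) in maxima)
    return [p for p in maxima if p[2] >= fraction * peak]

def angle_difference(a, b):
    """Smallest absolute difference between two angles, in radians."""
    return abs((a - b + np.pi) % (2 * np.pi) - np.pi)

def back_propagate(coarse, fine, window, max_angle_diff=np.pi / 8):
    """Link each coarse maximum to at most one fine maximum (position-angle criterion)."""
    half = window / 2.0
    links = []
    for cp in coarse:
        candidates = [fp for fp in fine
                      if abs(fp[0] - cp[0]) <= half and abs(fp[1] - cp[1]) <= half
                      and angle_difference(fp[3], cp[3]) <= max_angle_diff]
        if candidates:
            # Several finer candidates: keep only the one with the largest modulus.
            links.append((cp, max(candidates, key=lambda p: p[2])))
    return links

def build_chains(maxima_by_scale, coarse_window=8.0):
    """maxima_by_scale is ordered coarse to fine, e.g. [scale 2^3, scale 2^2, scale 2^1]."""
    chains = [[p] for p in threshold_coarse(maxima_by_scale[0])]
    window = coarse_window
    for finer in maxima_by_scale[1:]:
        extended = []
        for chain in chains:
            link = back_propagate([chain[-1]], finer, window)
            if link:  # chains that do not propagate are dropped
                extended.append(chain + [link[0][1]])
        chains = extended
        window /= 2.0  # assumed K, K/2, ... schedule from coarse to fine
    return chains
```

Chains returned by `build_chains` span all scales; maxima that never enter such a chain correspond to the non-propagating maxima of the text.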
Figure 7.4 Inter-scale back-propagation maxima connectivity in wavelet space.
For each of the maxima chains acquired from back-propagation tracking, the decay of the modulus maxima amplitude across scales is calculated in order to discriminate speckle singularities from image singularities. In maxima chains where the amplitude of the wavelet transform modulus maxima decreases as the scale decreases, the Lipschitz regularity is positive (positive Lipschitz exponents). Conversely, when the maxima amplitude increases as the scale decreases, the Lipschitz regularity is negative (negative Lipschitz exponents). The Lipschitz regularity was calculated between the pair of scales for which the decay slope was greatest; in our case this was between scales $2^3$ and $2^2$. The different decay behaviour of the modulus maxima is the main criterion for an accurate discrimination of image and noise singularities. Image singularities belong to regular curves with positive Lipschitz regularity that varies smoothly along these curves. Speckle singularities give rise to negative Lipschitz regularity and appear as irregular variations of the position, angle and modulus values of the maxima. The propagating maxima chains with positive Lipschitz regularity can be considered an edge map that corresponds to important image structures. The despeckling procedure implemented in this study removes, at all scales, all wavelet coefficients that correspond to maxima whose amplitude increases when the scale decreases or that do not belong to a back-propagating maxima chain. The maxima recognized by the algorithm as speckle and as edges at all scales are presented in Figure 7.5.
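The decay criterion can be made concrete with a short sketch. It assumes the modulus values of one chain at two successive dyadic scales (for instance $2^3$ and $2^2$) are given; the function names are hypothetical, and treating a zero slope as speckle is an assumption of this sketch, since the text does not cover that case.

```python
import math

def decay_slope(modulus_coarse, modulus_fine):
    """Slope of log2|M| over one dyadic scale step, measured from fine to coarse."""
    return math.log2(modulus_coarse) - math.log2(modulus_fine)

def classify_chain(modulus_coarse, modulus_fine):
    """'edge' if the amplitude decreases as the scale decreases, otherwise 'speckle'."""
    # Assumption: a chain with zero slope is treated as speckle.
    return "edge" if decay_slope(modulus_coarse, modulus_fine) > 0 else "speckle"
```

For example, a chain with modulus 8.0 at the coarse scale and 4.0 at the finer scale has a slope of 1 and would be kept as an edge, whereas a chain whose modulus grows towards the finer scale would be removed.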
Figure 7.5 The first column shows the wavelet transform modulus maxima (non-propagating maxima and propagating maxima with negative Lipschitz exponents) classified as speckle. The second column shows the propagating maxima with positive Lipschitz exponents, classified as important edges.
The remaining coefficients at all scales, together with the coarse image for completeness, are used in the inverse Dyadic Wavelet Transform to obtain the speckle-suppressed US image. The ability to isolate all important structures at all scales (Figure 7.5, right column) via the proposed wavelet inter-scale analysis made it possible to apply the despeckling strategy (maxima removal) at all computed scales, even at the finest one ($2^1$), in contrast to Mallat [29], who excludes that scale from his denoising procedure because the signal is dominated by noise. In our method, as can be seen in Figure 7.5 (left column), the human eye cannot discriminate the contours of the anatomical structures at the finest scale available ($2^1$); however, after the back-propagation tracking and singularity detection, these contours become prominent at the same scale, with approximate positions and angles.
7.3 Experimental Results and Evaluation
The effectiveness of the introduced wavelet-based de-speckling approach was tested using a tissue-mimicking digital phantom and a US image of the thyroid gland. An observer evaluation study involving 63 US thyroid images of 63 patients was also undertaken, via a questionnaire regarding the performance of the proposed algorithm. The proposed inter-scale wavelet analysis method was compared with three representative denoising methods: (a) Karaman's adaptive speckle suppression filter (ASSF) [97], (b) Donoho's soft thresholding and (c) Donoho's hard thresholding [104]. Wavelet shrinkage was implemented with the Daubechies 8 mother wavelet at three decomposition scales, using the Wavelet Toolbox in MATLAB.
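For reference, the two shrinkage rules used as comparison methods can be written in a few lines. This NumPy sketch shows only the soft and hard thresholding operators applied to an array of wavelet coefficients; the threshold value `t` is left as a free parameter because the text does not state how it was selected, and the wavelet decomposition itself (performed in MATLAB in the study) is not reproduced here.

```python
import numpy as np

def soft_threshold(coeffs, t):
    """Soft thresholding: shrink every coefficient towards zero by t."""
    return np.sign(coeffs) * np.maximum(np.abs(coeffs) - t, 0.0)

def hard_threshold(coeffs, t):
    """Hard thresholding: zero the coefficients with magnitude <= t, keep the rest unchanged."""
    return np.where(np.abs(coeffs) > t, coeffs, 0.0)
```

Soft thresholding reduces the amplitude of every retained coefficient, whereas hard thresholding leaves retained coefficients untouched and introduces a discontinuity at the threshold.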
The speckle suppression performance of all methods (on both the phantom and the thyroid US image) was quantified by means of the speckle index (SI: mean to standard deviation), computed on a homogeneous area with uniformly distributed echoes, and the signal-to-mean-square-error ratio (S/mse), introduced by Cagnon [116], computed on the same homogeneous area (local region of interest) and on the entire image (total). S/mse is defined as:
$$S/mse = 10 \log_{10}\!\left( \frac{\sum_{i=1}^{K} I_i^{2}}{\sum_{i=1}^{K} \left( \hat{I}_i - I_i \right)^{2}} \right) \qquad (7.11)$$
where $I_i$ are the intensity values of the speckle image, $\hat{I}_i$ are the intensity values of the de-speckled image and K is the image size. The S/mse index can be considered an index of signal-to-noise within an image. High S/mse values indicate efficient speckle suppression, while low values indicate inadequate performance. The S/mse index is expressed in dB. The edge preservation capacity of all methods, both locally (in an area where boundaries prevail) and totally (over the entire image), was evaluated by means of the parameter β, which was introduced by Hao [102], as shown in relation (7.12):
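$$\beta = \frac{\Gamma\left( \Delta I - \overline{\Delta I},\ \Delta \hat{I} - \overline{\Delta \hat{I}} \right)}{\sqrt{\Gamma\left( \Delta I - \overline{\Delta I},\ \Delta I - \overline{\Delta I} \right)\, \Gamma\left( \Delta \hat{I} - \overline{\Delta \hat{I}},\ \Delta \hat{I} - \overline{\Delta \hat{I}} \right)}} \qquad (7.12)$$
where $\Delta I$ and $\Delta \hat{I}$ are the speckle and de-speckled images respectively, filtered by a 3x3 pixel standard approximation of the Laplacian operator, the overbar denotes the mean value, and Γ is given by:
$$\Gamma(I_1, I_2) = \sum_{i=1}^{K} I_{1i}\, I_{2i} \qquad (7.13)$$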
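The measures of relations (7.11) to (7.13), together with the speckle index, can be sketched in NumPy as follows. The function names are hypothetical, and two details are assumptions of this sketch rather than statements of the thesis: the 3x3 Laplacian is taken to be the 4-neighbour kernel, and image borders are handled by edge replication.

```python
import numpy as np

def speckle_index(region):
    """Speckle index as described in the text: mean to standard deviation of a region."""
    region = np.asarray(region, dtype=float)
    return region.mean() / region.std()

def s_mse(original, despeckled):
    """Signal-to-mean-square-error ratio in dB, following relation (7.11)."""
    original = np.asarray(original, dtype=float)
    despeckled = np.asarray(despeckled, dtype=float)
    return 10.0 * np.log10(np.sum(original ** 2) / np.sum((despeckled - original) ** 2))

def laplacian_3x3(image):
    """4-neighbour 3x3 Laplacian with edge replication (assumed kernel and border rule)."""
    p = np.pad(image, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * p[1:-1, 1:-1]

def gamma(a, b):
    """Inner product of two images, following relation (7.13)."""
    return float(np.sum(a * b))

def beta(original, despeckled):
    """Edge preservation parameter, following relation (7.12)."""
    d_orig = laplacian_3x3(np.asarray(original, dtype=float))
    d_desp = laplacian_3x3(np.asarray(despeckled, dtype=float))
    d_orig = d_orig - d_orig.mean()
    d_desp = d_desp - d_desp.mean()
    return gamma(d_orig, d_desp) / np.sqrt(gamma(d_orig, d_orig) * gamma(d_desp, d_desp))
```

Local values would be obtained by passing a cropped region of interest to the same functions, and total values by passing the entire image.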
In the case of optimum edge preservation, β approaches 1. The closer β is to 1, the better the edge preservation properties of the algorithm.
7.3.1 Tissue Mimicking Phantom Validation
The phantom used in this study was the 403 LE model manufactured by GAMMEX. It comprises three groups of three anechoic cystic targets (approximating thyroid nodules) with 2, 4 and 6 mm diameter, positioned at 3, 8 and 14 cm respectively. The attenuation coefficient of the tissue-mimicking material is 0.5 dB/cm/MHz, whereas that of the anechoic cysts is 0.05 dB/cm/MHz. Regarding the phantom image, the value of SI prior to despeckling was 7.18, and the values of SI, S/mse and β after despeckling are shown in Table 7.1.
Table 7.1 Image quality measures obtained by four denoising methods tested on a digital ultrasound phantom image
Method                                   SI (Percentage Improvement)   S/mse (Local / Total) (dB)   β (Local / Total) (dB)
ASSF                                     6.98 (12%)                    13.4912 / 12.7728            0.2148 / 0.1458
Soft Thresholding                        7.33 (18%)                    15.6425 / 16.7709            0.7020 / 0.7453
Hard Thresholding                        7.11 (15%)                    13.7175 / 15.1853            0.7764 / 0.8013
Wavelet inter-scale analysis Denoising   7.50 (21%)                    17.9677 / 18.3241            0.8395 / 0.8485
7.3.2 US Image Case Study
All despeckling methods were also applied to a US image of the thyroid gland, and their results are shown in Figure 7.6.
Figure 7.6 (a) US image of the thyroid gland, (b) ASSF method, (c) wavelet shrinkage with soft thresholding, (d) wavelet shrinkage with hard thresholding, (e) wavelet inter-scale analysis denoising.
Both parameters (S/mse and β) are calculated locally and totally. The selected regions are presented in Figure 7.7. The values of S/mse and β for all methods applied to the US image are given in Table 7.2. The SI value prior to despeckling was 3.49.
Figure 7.7 Box A: locally selected area containing the thyroid nodule, used for the β calculation. Box B: locally selected area corresponding to homogeneous tissue, used for the S/mse calculation.
Table 7.2 Image quality measures obtained by four denoising methods tested on an ultrasound image of the thyroid gland
Method                                   SI (Percentage Improvement)   S/mse (Local / Total) (dB)   β (Local / Total) (dB)
ASSF                                     3.98 (14%)                    11.2871 / 10.5664            0.6020 / 0.3592
Soft Thresholding                        4.15 (19%)                    14.0577 / 14.5472            0.6705 / 0.7063
Hard Thresholding                        4.05 (16%)                    10.4593 / 11.4083            0.7602 / 0.7836
Wavelet inter-scale analysis Denoising   4.30 (23%)                    15.4256 / 16.2937            0.8225 / 0.8490
A more detailed picture of despeckling and edge preservation may also be obtained from the profile signals depicted in Figure 7.9, which are derived from Figure 7.6. The corresponding scan line is presented in Figure 7.8.
Figure 7.8 The scan line including the borders (A&B) of the thyroid nodule.
Figure 7.9 Scan profiles of the US thyroid image. The high intensity line corresponds to the denoised scan line and the low intensity line to the original image. (a) ASSF method, (b) soft thresholding, (c) hard thresholding, (d) wavelet inter-scale analysis denoising.
The performance of the proposed method on various US images is depicted in Figures 7.10, 7.11 and 7.12.
Figure 7.10 (a) Original US image, (b) De-speckled US image
7.3.3 Observer Evaluation Study
Sixty-three US images of the thyroid gland were included in a questionnaire (Figure 7.13) that comprised seven queries concerning various visual observations regarding the proposed algorithm's effectiveness.
Figure 7.13 Microsoft Access interface presenting the questionnaire for the evaluation of US image denoising.
The image dataset was acquired following the same parameter protocol in the time interval from October 2003 to September 2004. The questionnaire was implemented in Microsoft Access. The seven queries were:
1) Removal of speckle granular pattern.
2) Improvement of micro-calcification detection inside the thyroid nodule.
3) Creation during denoising of artefacts and ghost images.
4) Preservation of nodule boundaries, resolvable details and anatomical structures.
5) Contrast enhancement between nodule and surrounding environment.
6) Revealing of small structures invisible in the original image.
7) Improvement of diagnostic evaluation procedure.
The study was performed by two experienced, qualified radiologists specialized in ultrasonography. All cases were reviewed independently on a high-resolution monitor. The ranking for each query ranged from 0 to 100 in 25-point steps, corresponding to fail (0), poor, good, very good and excellent performance (100), except for query number 3, where a low score indicates high effectiveness (Tables 7.3 and 7.4). The performance of the proposed method, based on both observers' evaluations, was assessed by means of the percentage of cases in which the algorithm was effective (score >= 50) in each query (Table 7.5). Inter-observer agreement was determined using the weighted kappa (κ) coefficient, calculated for all qualitative parameters [117]. A kappa statistic above 0.75 was arbitrarily chosen to denote excellent agreement, between 0.40 and 0.75 moderate agreement and below 0.40 poor agreement (Table 7.6).
Table 7.3 1st observer's evaluation of algorithm performance
Number of                  QUERIES
US image      1     2     3     4     5     6     7
 1           50   100    75    25   100     0    75
 2           25   100    75    50   100     0   100
 3           25    50    50    25    50     0    75
 4           75    75    75    25   100     0   100
 5           25   100    50    25   100     0    75
 6           50    50    50    25    25     0    50
 7           75    50    50    75   100     0   100
 8            0   100   100   100   100     0   100
 9            0   100   100     0   100     0   100
10           50   100    75    50   100     0   100
11            0    75    50    25    50     0    75
12          100    50    75     0    50     0    75
13           75   100   100    50   100     0   100
14           50    75    50     0   100     0    50
15            0    25    25    75    50     0    50
16           25    75    75     0    75     0    75
17           75   100   100     0   100     0   100
18           50    75    50     0    75     0    50
19            0   100   100     0   100     0    75
20            0    25    25     0    25     0    50
21            0    75    75     0   100     0   100
22            0    50    50     0    75     0    75
23           75    25    50     0    25     0    75
24            0    75    75     0    75     0    75
25            0    75    75     0    75     0    75
26           75   100   100     0   100     0    75
27            0    50    50     0    75     0    75
28            0    25    25     0    25     0    50
29            0    50    50     0    50     0    50
30            0    25    25     0    25     0    25
31          100    75   100     0    75     0   100
32           75    75    50    75    75     0    75
33            0    75    75    50    75     0    75
34            0    50    50     0    25     0    50
35            0    25    25     0    25     0    50
36           75    75    75     0    75     0    75
37            0    25    25     0    25     0    25
38           50    50    50     0    50     0    50
39            0    25    25     0    25     0    25
40           75    75    75     0    75     0    75
41           75    75    75     0    50     0    75
42            0    75    75     0    75     0    75
43            0    25    25     0    25     0    50
44           50    25    50     0    25     0    50
45            0    75    50   100    75     0   100
46            0    75    75     0    75     0    75
47            0    25    25     0    25     0    50
48           50    75    75     0    75     0    75
49            0    50    50     0    50     0    50
50            0    50    50     0    50     0    50
51           75    75    75     0    75     0    75
52            0    50    25     0    50     0    50
53           75    25    50     0    50     0    75
54           50    50    50     0    50     0    50
55            0    50    50     0    75     0    75
56            0    75    75     0    75     0    75
57            0    25    25     0    50     0    50
58            0    50    25     0    25     0    50
59            0    50    50     0    50     0    50
60           75   100   100     0   100     0   100
61            0    50    50     0    50     0    50
Table 7.4 2nd observer's evaluation of algorithm performance

Number of                  QUERIES
US image      1     2     3     4     5     6     7
 1           75    75    75     0   100     0   100
 2            0   100   100     0   100     0   100
 3            0    25    25     0    25     0    50
 4           25    75    75     0    75     0    75
 5           50    75    75     0   100     0   100
 6          100    50    75     0    25     0    75
 7          100    75    75     0    75     0    75
 8            0    75    75   100    75     0    75
 9            0    75    75     0    75     0   100
10            0    75    75     0   100     0    75
11            0    75    75     0    75     0    75
12          100    50    75   100    25     0    75
13          100    75   100   100    75     0   100
14           75   100    75     0   100     0    75
15            0    50    50    75    50     0    25
16           50    75    75     0   100     0    75
17           75   100   100     0   100     0   100
18            0    25    25     0    25     0    75
19            0   100   100   100   100     0   100
20            0    25    50     0    50     0    50
21            0   100   100     0   100     0   100
22            0   100   100     0   100     0   100
23          100    50   100   100    50     0   100
24            0   100   100     0   100     0   100
25            0   100   100     0   100     0   100
26          100   100   100     0   100     0   100
27            0    50    25     0    50     0    50
28            0    25    25     0    25     0    25
29            0    25    50     0    50     0    50
30            0    25    25     0    25     0    50
31          100    75   100    50    75     0   100
32           75    75   100   100    75     0    75
33            0   100    75     0    50     0    75
34            0    75    75     0    25     0   100
35            0    50    25     0    50     0    75
36          100   100   100    75   100     0   100
37            0    25    25     0    25     0    50
38           75   100   100    75   100     0   100
39            0    25    25     0    25     0    50
40           50   100   100    75   100     0   100
41           50    75    75    75    75     0   100
42            0    50    50     0    50     0    75
43           50    50    50     0    25     0    50
44           75    50    75     0    25     0    75
45            0   100    50   100   100     0   100
46           75    75    50     0    75     0    75
47            0    25    25     0    25     0    25
48            0    75   100     0    75     0   100
49            0    25    25     0    25     0    50
50            0    50    50     0    50     0    50
51           50    75    75     0    50     0    75
52            0    50    50     0    50     0    75
53           50    50    50     0    50     0    50
54            0    25    25     0    25     0    50
55            0    50    50     0    50     0    75
56           25    50    50     0    50     0    75
57            0    25    25     0    25     0    50
58           50    25    25     0    25     0    50
59            0    25    25     0    25     0    50
60           75   100   100     0   100     0   100
61            0    25    25     0    25     0    25
Table 7.5 Observers' evaluation of algorithm performance

            Observer A                             Observer B
Ranking     >=50   <50   Efficiency Percentage     >=50   <50   Efficiency Percentage
Query 1     55     8     87 %                      55     8     87 %
Query 2     28     35    44 %                      27     36    42 %
Query 4     46     17    73 %                      50     13    79 %
Query 5     48     15    76 %                      49     14    78 %
Query 6     50     13    79 %                      51     12    80 %
Query 7     14     49    22 %                      14     49    22 %
Table 7.6 Agreement (kappa coefficient) between the two observers

Query       Kappa Coefficient
Query 1     0.86
Query 2     0.71
Query 4     0.56
Query 5     0.54
Query 6     0.55
Query 7     0.67
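The two summary statistics of Tables 7.5 and 7.6 can be illustrated with a short NumPy sketch. The function names are hypothetical, and the linear weighting scheme is an assumption of this sketch: the text states only that a weighted kappa was used, not which weights were chosen.

```python
import numpy as np

SCORES = (0, 25, 50, 75, 100)  # the five ranking levels of the questionnaire

def efficiency_percentage(scores, cutoff=50):
    """Percentage of cases scored at or above the cutoff."""
    scores = np.asarray(scores)
    return 100.0 * np.count_nonzero(scores >= cutoff) / scores.size

def weighted_kappa(rater_a, rater_b, categories=SCORES):
    """Linearly weighted Cohen's kappa between two raters (weights are an assumption)."""
    index = {c: i for i, c in enumerate(categories)}
    k = len(categories)
    observed = np.zeros((k, k))
    for a, b in zip(rater_a, rater_b):
        observed[index[a], index[b]] += 1
    observed /= observed.sum()
    # Agreement expected by chance from the marginal distributions.
    expected = np.outer(observed.sum(axis=1), observed.sum(axis=0))
    i, j = np.indices((k, k))
    disagreement = np.abs(i - j) / (k - 1)
    return 1.0 - np.sum(disagreement * observed) / np.sum(disagreement * expected)
```

For instance, 55 of 63 cases scored at or above 50 give an efficiency percentage of about 87 %, as in the first row of Table 7.5.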
7.4 Discussion and Conclusions
Multiscale wavelet analysis is one of the most promising approaches for speckle suppression in ultrasound imaging. The proposed method attempts an optimal speckle reduction in US images of the thyroid gland, employing singularity detection on local maxima chains within a coarse-to-fine framework. Significant image features are represented across scales by maxima chains of positive Lipschitz regularity, while speckle noise gives rise to maxima chains of negative Lipschitz regularity. The combination of the proposed back-propagation maxima tracking with the singularity detection achieved high speckle reduction performance while preserving edges (β above 0.82 locally and totally on both test images, Tables 7.1 and 7.2). An important task of any multi-resolution algorithm is to optimize the detection of high-resolution information. The back-propagation approach, through the position-angle pair and occasionally the amplitude, together with the adaptive neighbourhood size, closely approximates the coarse information even at the $2^1$ scale (see Figure 7.5). Nevertheless, the implementation of an optimized coarse-to-fine connection is still under investigation. Any prior information regarding potential patterns within the US image can be utilized in this research. The method is generic and can also be applied to US medical images of different anatomical structures, as well as to other imaging modalities that suffer from speckle, such as synthetic aperture radar images.
The comparative study of the effectiveness of all methods was based on their speckle reduction and edge preservation properties. Regarding speckle reduction on the phantom image, both SI and S/mse obtained with the inter-scale analysis method were greater than those of the other three methods, locally and totally (Table 7.1). Likewise, regarding parameter β, the proposed algorithm exhibited the best performance.
The results of the proposed method on the thyroid US image (SI, S/mse and β) were higher than those of the other three methods (Table 7.2). Additionally, a closer look at Figure 7.9(d) shows that the inter-scale wavelet denoising method retained the boundaries of the thyroid nodule, similar to those of the original image, while at the same time achieving a high degree of speckle suppression in speckle regions. An interesting observation from the same figure is the tendency of Karaman's and Donoho's soft thresholding methods to remove the signal fluctuation in homogeneous regions completely, thus eliminating possible small low-contrast structures while failing to follow the initial trend closely. In the same figure the spurious oscillations resulting from Donoho's hard thresholding method are also apparent.
The results of the questionnaire given to the two specialist physicians lead to some useful conclusions regarding the adaptability and power of our method. The scores in queries 1 and 4 (>70% evaluation efficiency percentage for both observers) confirmed the results of the algorithm in speckle reduction and edge preservation. The high score in query number 5 (>75% evaluation efficiency percentage for both observers) indicates that successful noise suppression without blurring actually increases the contrast of various structures relative to the surrounding environment. An important subject introduced in this study through query number 6 (>75% evaluation efficiency percentage for both observers) is the potential of the algorithm to reveal structures that are not easily distinguishable by the human eye. The low score in query number 7 is to be expected, given the great expertise of both observers regarding the final diagnosis. However, for less experienced physicians it could turn out to be a very useful tool. The agreement between the two observers is indicative of the algorithm's ability.
Particular attention should be given to the high agreement (κ = 0.86) between the two observers regarding the speckle suppression efficiency of our method, which constitutes an additional advantage of the proposed algorithm. In conclusion, an efficient wavelet-based speckle reduction method is presented in this study. The method is based on inter-scale wavelet analysis, in which the primary aim is to isolate edges that exist across scales and check their regularity. The successful speckle suppression achieved by the proposed algorithm can be employed as an additional step in the improvement of the overall diagnostic procedure.
CHAPTER 8
Thyroid Nodule Boundary Detection in Ultrasound Images
Summary
In this chapter a multiscale hybrid model is introduced for unsupervised thyroid nodule boundary extraction from US images. The chapter is organized as follows. First, an extensive review of the literature regarding segmentation in US images is presented. The Materials & Method section presents the proposed speckle reduction-edge detection strategy, the introduced multi-scale structure model and the constrained Hough transform. The Results section demonstrates the performance of the hybrid method applied to real US images. In addition, an inter-observer study was carried out between the two physicians to estimate the degree of variance of the manually delineated boundaries. The last section gives an extensive review and discussion of the methodology, performance and potential of the algorithm.
8.1 Review of the Literature
Numerous computerized segmentation methods have been employed in US imaging of the prostate, kidney, cardiac anatomy, ovaries, fetal head and breast lesions. These algorithms can be categorized into five types, depending on the strategy chosen for segmenting the ROI: methods based on (a) edge detection [118-123], (b) texture or feature analysis [124-130], (c) deformable and active models [131-141], (d) multiscale algorithms that visualise US images at different levels of resolution [142-145] and (e) combinations of the above algorithms for optimization of the results [146-159]. An overview of segmentation algorithms applied in US imaging is presented below.
An edge-based segmentation algorithm detects abrupt changes in gray level values within the US image. For the final contour extraction an additional process is performed to select and link edge pixels. Various edge detection methods for segmentation of US images have been developed throughout the past decade.
Kwoh et al [118], in order to avoid false edges and broken segments created by the radial bas-relief (RBR) method, applied the Fourier-transform-related harmonic method directly to the skeletonized edge image to extract the final smooth prostate contour. Aarnink et al [119] implemented a pre-processing algorithm for US image segmentation in which edges were detected with a non-linear Laplacian operator based on the detection of zero crossings. Aarnink et al [120] also proposed a post-processing algorithm, applied to the edge map obtained from zero-crossing detection, in which edge maps were integrated to outline the region of interest. Sarty et al [121] introduced a complex semi-automatic algorithm for segmentation of ovarian follicular US images. It was based on the edge strength and direction of a user-defined ROI for approximate detection of the inner wall border. Subsequently, prior knowledge of the intensity profile of the follicle's inner and outer wall borders was employed for outer wall contour delineation. Pathak et al [122] applied a speckle-reducing, edge-enhancing technique called sticks, as well as anisotropic diffusion filtering, combined with prior shape knowledge, to differentiate prostate edges from false edges. Yu et al [123] used the instantaneous coefficient of variation detector, which combines the gray level intensity profile with first and second derivative operators, for edge detection in ultrasound images.
In texture analysis, rather than trying to locate edges in a US image, several texture features are employed for region characterization. These features usually serve as input to a classification or clustering algorithm that labels groups of pixels as correct or false regions. Richard and Keen (1996) [124] used a pixel classifier based on four feature images resulting from energy measures associated with each pixel. A clustering algorithm is then employed to assign each pixel to its most probable class and thus achieve prostate segmentation. Zimmer et al [125] implemented a minimum cross-entropy thresholding algorithm, employing a linear combination of gray level and local entropy values, for segmenting US images of ovarian cysts containing fluid and surrounded by soft tissue. Potocnik et al [126] combined the detection of homogeneous regions within the ovarian US image, derived from the gray-scale histogram, with a region growing algorithm that fills from the centre towards the contour until the borders of the growing object meet the ovary's contour.
Docour and Olmez [127] used a hybrid neural network, with input textural feature data obtained from the discrete cosine transform of pixel intensities in ROIs, for segmentation of US images of several anatomical structures. Huang and Chen [128] utilised textural analysis as input to a self-organised neural network classifier. After the classification of the textural features, a watershed segmentation algorithm determined the final contour of breast tumours. Archip et al [129] introduced a US image segmentation method that uses unsupervised spectral clustering subsequent to anisotropic diffusion filtering. Strzelecki et al [130] employed texture features as input to an oscillating neural network, based on the temporary correlation theory, to segment intra-cardiac masses from cardiac tumour echocardiograms.
Model-based segmentation approaches use either a priori knowledge, with the well-known active contours and deformable models, or statistical models without any prior information regarding the ROI. Lorenz et al [131] applied a two-fold probabilistic method based on the assumption that the contour sequence is a two-dimensional first-order Markov random process and on prior knowledge about the shape of the prostate contour. The contour estimation was performed by iteratively applying the maximum a posteriori principle.
Pathak et al [132] introduced an automatic algorithm for detecting the inner and outer skull boundaries of the foetal head in US images. The algorithm takes as input a user-defined centre of the foetal head and generates an initial contour. That contour is fed into an active contour model, which assumes that the desired boundaries lie along high-gradient points, for detection of the inner skull boundary. This boundary, complemented by a priori shape knowledge (an ellipse), is then passed to the active contour model to outline the external skull boundary. Obadia et al [133] developed a semi-automatic snake-based approach incorporating local statistical boundary models (gradient, 1st and 2nd order statistics), along with on-the-fly training of the boundary models, for US image segmentation. Ladak et al [134] developed a semi-automatic method based on a deformable contour model named the discrete dynamic contour. An initial contour was outlined via cubic interpolation functions derived from four manually selected input points. The deformable model then uses gradient direction information to estimate the final contour. Chen et al [135] combined a distance map computed by the early vision model with an investigation of snake elements derived from a discrete-snake model for semi-automatic US image segmentation. Chen et al [136] also proposed a snake model for segmentation of sonographic images that combines the modified trimmed mean filter for speckle suppression, ramp integration, which enhances weak edges for edge detection, and adaptive weighting parameters for fine tuning of the deformation process. Wu et al [137] developed a feature-model-based boundary recognition method combining prior information about the prostate, such as shape and size, with a genetic algorithm for object boundary detection under model constraints. Chen et al [138] also proposed a dual-snake deformable model for boundary extraction from US images.
It is a combinative study comprising a new external force called discrete gradient flow, a new edge strength weighting scheme and a new stability index for the two underlying snakes. Sandra et al [139] applied a region-based model to derive the likelihood estimation function via low-order parameterization of the contour shapes, in order to extract anatomical structures from foetal ultrasound images. Cvancarova et al [140] segmented liver tumors from US images using several snake algorithms and a modification of the well-known gradient vector flow model. Rabhi et al [141] proposed a geodesic active region model, based on boundary and region information, for segmentation of thrombosis in in vivo venous ultrasound images.
Multiscale methods decompose the input US image into several levels of resolution in order to acquire all available information for an efficient segmentation. Lain and Zong [142] utilised wavelet multiscale analysis combined with shape-matched filtering to estimate the center point of the LV. Subsequently, a boundary contour reconstruction process is employed to link the broken cardiac segments, and finally the closed boundary is filtered to extract the ROI. Liu et al [143] used the RBR method for edge enhancement and a multi-resolution analysis to obtain a skeletonized image, which was superimposed on the original US image of the prostate as an edge map. Lin et al [144] employed a combinative multiscale framework for echocardiographic image segmentation. At the coarse scale a temporary boundary was computed based on region homogeneity and edge features. A coarse-to-fine boundary evolution was then employed, based on shape similarity constraints, for contour refinement. Davignon et al [145] performed a multi-parametric approach for the segmentation of ultrasonic data. Various maps of local features, derived from statistical measurements using a Bayesian multi-resolution Markov random field, were used for pixel-based tissue differentiation.
Most contemporary algorithms use combinative strategies in order to overcome the difficulties arising from the nature of US imaging. Boukerroui et al [146] employed various textural features (uniform or slowly varying intensities versus sharp transitions), obtained from a wavelet-based multi-resolution pyramid, in a K-means clustering algorithm for automatic extraction of breast tumors. Chen et al [147] proposed a novel texture edge detection method for US imaging in which edges are the local maxima of a weighted distance map based on an early vision model. Prior to building the texture edge map, the US image undergoes a texture enhancement procedure employing multi-scale wavelet analysis. Mignotte et al [148] implemented an optimization approach for segmentation of the endocardial contour and the inner wall of arteries in US imaging. An initial snake position is derived from an active contour model and gray level statistical information obtained on both sides of the object's boundary. The final contour is outlined within a multi-scale framework by minimizing the global energy function of the initial snake. Boukerroui et al [149] segmented breast and echocardiographic-sequence ultrasound data via a wavelet multiresolution analysis and a weighting function applied to both local and global statistics.
Kotropoulos and Pittas [150] applied Support Vector Machines as a classification tool to differentiate lesions from background in a US image. The authors used a running window over the US image and classified each region as a positive or negative pattern according to its gray level histogram. Chin et al [151] introduced a semi-automatic segmentation in US images. It is based on initial contour tracking derived from wavelet coefficients calculated from the Dyadic Wavelet Transform and on the Discrete Dynamic Contour deformation model for the final contour estimation. Xie et al [152] proposed a supervised segmentation method in which a texture model and a shape prior model were combined to partition the US image in two regions, inside and outside the contour curve of kidneys. Flores et al [153] introduced a segmentation algorithm for breast tumours. The segmentation approach first estimated an initial snake through a region growing method based on gradient information. The snake then served as input to an active contour model for the final extraction of the tumour. Betrouni et al [154], in their attempt to segment the prostate from trans-abdominal US images, reduced speckle with an
adaptive noise filter as a pre-processing step. Afterwards, they applied a statistical prostate model, based on heuristic searches, to obtain the final contour. Dydenco et al [155] used a level set representation that incorporates both shape and motion prior knowledge for segmentation and region tracking in echocardiographic sequences. Liu et al [156] combined a multiresolution approach for active contour initialization and the Gradient Vector Flow (GVF) model. The GVF model uses prior shape knowledge for segmenting thermal lesions in elastographic US images. Fernadez and Lopez [157] applied a delineation algorithm on a sequence of kidney US images within a Bayesian framework with two components: a Markov Random Field with an active contour model based on star-like shape prior knowledge for initial detection, and a likelihood model having as input both intensity and gradient information for the final contour estimation. Eslami et al [158] combined a Gibbs joint probability function for image transformation, a wavelet transform for initial contour estimation and finally an active contour model for lesion detection in kidney ultrasonic images. Gong et al [159] described a deformation-based method incorporating prior shape knowledge (the ROI is modelled as a super-ellipse) and parametric deformations for automatic segmentation of prostate ultrasound images.
The poor US image quality in general, in conjunction with the drawbacks arising from the nature of ultrasound, limits the performance of the various segmentation methods proposed throughout the past decade. Most edge-based techniques generally detect changes in the grey level profile and usually are unable to isolate and extract the ROI. Segmentation approaches based on textural characteristics are optimised for particular US images and suffer from the presence of adjacent tissues that exhibit similar acoustic properties.
More complicated algorithms, based on active models or multi-scale analysis, are usually semi-automatic. Moreover, the lack of detailed information regarding the way authors handle the presence of adjacent structures is worth noting. Many researchers, in their attempt to overcome some of these limitations, convert their methods into semi-automatic ones, thus transforming them into user-dependent approaches. Others incorporated prior knowledge regarding shape or texture features in order to optimize their results. Besides the difficulties most segmentation algorithms encounter, US imaging of thyroid nodules carries additional disadvantages. Thyroid nodules lack uniform echogenicity (i.e. nodules can be hypo-, iso- or hyper-echogenic with regard to the surrounding tissue), which in turn means that no prior information regarding texture characteristics can be employed to enhance the algorithm's efficiency for all cases. Most proposed methods attempt to calculate a closed contour as a final segmentation outcome. In thyroid US imaging, however, the presence of diffusion between the nodule and the surrounding tissue is important clinical information for the physician, thus any algorithm that approximates a closed contour might mislead the physician.
8.2 Materials and Methods
8.2.1 Overview and Implementation of the Algorithm
In the proposed method a multiscale hybrid model is introduced for unsupervised boundary extraction of thyroid nodules from US images. In order to overcome the texture limitations in US thyroid imaging, our approach integrates a wavelet-based multiscale edge detection into an across-scale structure detection model for contour map estimation. The final contour map, derived from the introduced model, serves as input to a constrained Hough transform for nodule detection. The Hough transform is invariant to any open contours which may be derived from the multiscale structure model. A schematic representation of the algorithm's steps is depicted in Figure 8.1. At first, a speckle reduction edge detection procedure is implemented based on a multi-fold wavelet-based analysis. A wavelet decomposition at four dyadic scales with the "à trous" algorithm is employed, in which speckle is removed via a coarse to fine analysis of Wavelet Transform Modulus Maxima (WTMM). The back-propagating local maxima with positive Lipschitz exponents α are considered of value and utilised in the subsequent multiscale structure model. On the other hand, back-propagating local maxima with negative Lipschitz exponents are classified as speckle and discarded. A multi-scale pixel representation is derived from the speckle removal edge detection procedure, in which pixels of the input US image are associated with WTMM. A further step of this method is to consider a multi-scale structure representation which associates an anatomical object in the image with a volume in the multi-scale edge transform. A multi-scale structure model has been developed for boundary detection in US images.
The principal components of the introduced model are the WTMM; the Maxima Chains, which are groups of local maxima with similar properties at the same scale; the Structures, which are sets of connected maxima chains; the Interscale relation, which determines the criteria employed by the algorithm to relate maxima chains across scales into a significant structure; and the Structure Operator, which indicates to which structure a given maxima chain belongs. All mentioned components are integrated to form a multi-scale contour representation. In order to extract the nodule's boundary, the last scale of the contour decomposition is employed as input to the constrained Hough transform, which is based on a priori shape knowledge for partial circular object recognition. The proposed method and the user-friendly software for manual delineation by the two observers were implemented in Matlab 6.5. The computer used for processing has an AMD Athlon XP+ processor running at 1.8 GHz and 512 MB of RAM.
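The speckle rejection rule outlined in this overview can be illustrated with a short sketch. It is written in Python for illustration only and is not the Matlab implementation used in this work. It assumes that a maxima line is available as a list of strictly positive modulus values ordered from the finest scale 2^1 to the coarsest, and it uses the least-squares slope of log2(modulus) against log2(scale) as a stand-in for the sign of the Lipschitz exponent. The function names `lipschitz_slope` and `classify_maxima_line` are hypothetical.

```python
import math

def lipschitz_slope(moduli):
    """Least-squares slope of log2(modulus) against log2(scale).

    moduli[0] belongs to scale 2^1, moduli[1] to 2^2, and so on, so
    log2(scale) is simply 1, 2, ..., len(moduli). Needs at least two values.
    """
    xs = [float(j) for j in range(1, len(moduli) + 1)]
    ys = [math.log2(m) for m in moduli]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

def classify_maxima_line(moduli):
    """Label a maxima line 'edge' if the modulus grows with scale, else 'speckle'."""
    return "edge" if lipschitz_slope(moduli) > 0 else "speckle"

edge_line = [2.0, 4.0, 8.0, 16.0]      # modulus decreases towards fine scales
speckle_line = [16.0, 8.0, 4.0, 2.0]   # modulus increases towards fine scales
print(classify_maxima_line(edge_line), classify_maxima_line(speckle_line))
```

The slope only approximates the Lipschitz exponent up to a constant that depends on the wavelet normalisation, so only its sign is used here.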
Figure 8.1 Schematic representation of the segmentation algorithm.
8.2.2 US Data Acquisition
All US images used throughout this study were obtained from an HDI-3000 ATL digital ultrasound system (Philips Ultrasound, P.O. Box 3003, Bothell, WA 98041-3003, USA) with a broadband (5-12 MHz frequency band) linear array. The sonographic scans were taken in both the transverse and longitudinal planes and instrument settings were set according to the built-in SmallPartTest Philips protocol. The selected static US frames were digitized by a video card frame grabber (Miro PCTV, Pinnacle Systems) installed in a PC, capable of acquiring and displaying US images with a 768 x 576 resolution at 8 bits.
8.2.3 Edge Detection Procedure
8.2.3.1 Multiscale Edge Representation
Multiscale Edge Representation (MER) utilizes the local maxima of the Dyadic Wavelet Transform (DWT) for characterization of signals from multi-scale edges. MER can be considered as a transformation from the initial US image representation into a feature representation based on the image's sharp intensity variations, defined as edges [62]. This transformation is considered as an intermediate step for analysis towards thyroid nodule segmentation. The proposed method decomposes the multiplicative speckle model into an additive signal dependent noise model, which in turn means that it omits the log-transform to avoid the mean bias correction problem [108, 109]. The two-dimensional DWT is the set of functions $(W^1_{2^j} f(x,y), W^2_{2^j} f(x,y))$ and is given by:
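$$\begin{pmatrix} W^1_{2^j} f(x,y) \\ W^2_{2^j} f(x,y) \end{pmatrix} = \begin{pmatrix} (f * \psi^1_{2^j})(x,y) \\ (f * \psi^2_{2^j})(x,y) \end{pmatrix} = 2^j \begin{pmatrix} \frac{\partial}{\partial x}(f * \theta_{2^j})(x,y) \\ \frac{\partial}{\partial y}(f * \theta_{2^j})(x,y) \end{pmatrix} \tag{8.1}$$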
where $\psi^1(x,y)$ and $\psi^2(x,y)$ are the analyzing wavelets, $\theta(x,y)$ is a symmetrical smoothing function approximating the Gaussian, $f$ is the image function $f(x,y) \in L^2(\mathbb{R}^2)$ and $j$ the dyadic scale. The two-dimensional wavelet transform of an image can be viewed as a gradient vector (Equation 8.2) whose magnitude and phase are given by Equations 8.3 & 8.4.
$$\begin{pmatrix} W^1_{2^j} f(x,y) \\ W^2_{2^j} f(x,y) \end{pmatrix} = 2^j \, \vec{\nabla} (f * \theta_{2^j})(x,y) \tag{8.2}$$
$$M_{2^j} f(x,y) = \sqrt{ \left| W^1_{2^j} f(x,y) \right|^2 + \left| W^2_{2^j} f(x,y) \right|^2 } \tag{8.3}$$
$$A_{2^j} f(x,y) = \tan^{-1} \left( \frac{W^2_{2^j} f(x,y)}{W^1_{2^j} f(x,y)} \right) \tag{8.4}$$
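As a small illustration of Equations 8.3 & 8.4, the sketch below computes the modulus and angle images from the two wavelet components at a single dyadic scale. It assumes the components are given as equally sized nested lists; the function name `modulus_and_angle` is hypothetical, and the use of `atan2` in place of the plain inverse tangent is an assumption made here to keep the quadrant and to avoid division by zero.

```python
import math

def modulus_and_angle(w1, w2):
    """Return (modulus, angle) images for two equally sized 2-D lists.

    w1, w2: horizontal and vertical wavelet components at one dyadic scale.
    """
    modulus = [[math.hypot(a, b) for a, b in zip(r1, r2)]
               for r1, r2 in zip(w1, w2)]
    # Assumption: atan2 replaces tan^-1(w2 / w1) so that w1 == 0 is handled.
    angle = [[math.atan2(b, a) for a, b in zip(r1, r2)]
             for r1, r2 in zip(w1, w2)]
    return modulus, angle

# Toy 2 x 2 components (illustrative values only).
w1 = [[3.0, 0.0], [1.0, -1.0]]
w2 = [[4.0, 2.0], [1.0, 0.0]]
modulus, angle = modulus_and_angle(w1, w2)
print(modulus[0][0], angle[1][0])
```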
8.2.3.2 Coarse to Fine Analysis
The next step after MER is a coarse to fine procedure that employs all available information of the WTMM at different frequency bands for pointwise singularity detection. Lipschitz regularity is the main criterion for classifying edges as speckle or as important sharp variations. The outcome of this procedure is a multi-scale edge map that relates all significant information of the US image with the WTMM. The discrimination between edges that correspond to structures and those arising from speckle or artifacts is a two-fold process. At first, groups of maxima that back-propagate from coarse to fine scales are detected and form vectors in the scale-space plane; afterwards, these vectors are utilized for singularity detection via the Lipschitz exponents α [68,70]. The inter-scale information is acquired by means of a back-propagation connectivity of the wavelet transform modulus maxima. The grouping of local maxima across scales is made on the basis that if an edge exists at a coarser scale, it can also be located at all available finer scales [70]. Two local maxima from two successive scales are grouped together if they possess a close position in the image plane and a similar angle value. This back-propagation tracking, from the coarsest scale $2^j$ to the finest scale $2^1$, produces curves of maxima in the scale-space plane (termed maxima lines, ML), which take the following form:
$$ML_k = \left\{ \left( M_{2^j}, A_{2^j}, P_{2^j} \right), \left( M_{2^{j-1}}, A_{2^{j-1}}, P_{2^{j-1}} \right), \ldots, \left( M_{2^1}, A_{2^1}, P_{2^1} \right) \right\} \tag{8.5}$$
where M, A and P are the magnitude, angle and position of each local maximum at a given scale and k is the total number of local maxima found at the coarsest scale $2^j$, where the back-propagation tracking starts. The magnitude parameter is exploited in cases where a coarse local maximum is found to back-propagate to more than one finer local maximum. In such cases the maximum with the greater magnitude is chosen to form the maxima line. The coarse to fine local maxima detection is made with a correspondingly large to small investigation window, depending on the length of the dilated filter for each scale. At coarse scales the window has roughly the same size as the corresponding dilated filter with $2^j - 1$ inserted zeros, while at the finest available scale the window has the same size as the filter without dilation. The decay of $\log_2 |Mf(2^j, x, y)|$ as a function of $\log_2 s$ is estimated along all maxima lines that correspond to singularities with varying Lipschitz regularity. When the maxima amplitude within the maxima line decreases when the scale decreases its Lipschitz regularity is positive
(positive Lipschitz exponents). On the contrary, in a maxima line with maxima amplitudes that increase when the scale decreases the Lipschitz regularity is negative (negative Lipschitz exponents). The local maxima that are part of maxima lines with positive Lipschitz exponents correspond to important edges, whereas local maxima inside maxima lines with negative Lipschitz exponents correspond to speckle. After the coarse to fine analysis the back-propagating maxima with positive Lipschitz regularity are utilized as input to the subsequent multi-scale structure model. The implementation of the MER requires the computation of the wavelet transform for scales $2^1$ to $2^N$. The choice of the parameter N depends on the application. For feature extraction and segmentation methods the choice of N is crucial. The Lipschitz regularity calculation of two adjacent singularities requires that the two sharp variations are isolated. As the dilation of the wavelets and the smoothing function increases, the resulting singularities begin to overlap. In order to estimate the optimum dyadic level, the location of each sharp variation is associated with the WTMM. In this study, it is experimentally shown that the localization of the WTMM increases from N=1 to N=4. Beyond this scale, the evolution of the WTMM across scales produces localization and numerical errors. In fact, the majority of local maxima tracked at scale N=4 converge into a single maximum at N=5, whereas a small fraction demonstrates irregular behavior, which in turn produces negative Lipschitz exponents between these two scales.
8.2.4 Multi-Scale Structure Model
The proposed structure model considers a significant structure as a hierarchical set of connected maxima chains. A structure representation is obtained, in which structures are bridged maxima chains with similar properties across all available scales. The fundamental parameters that the introduced model employs to generate the multi-scale structure representation are presented below:
- Local maxima: the back-propagating WTMM derived from the speckle reduction edge detection step.
- Maxima Chains: WTMM with similar properties are grouped together at each available scale, forming one-dimensional curves termed maxima chains (Figure 8.2). A chain (C) is a set of linked local maxima at a given scale $2^j$:
$$C_{2^j,k} = \left\{ M_{2^j} f(x_1, y_1), M_{2^j} f(x_2, y_2), \ldots, M_{2^j} f(x_p, y_p) \right\} \tag{8.6}$$
where M is the amplitude, k and p the number of chains and local maxima included in the chain respectively.
Figure 8.2 Local maxima linking procedure. Adjacent local maxima form a maxima chain $C_{2^j,k}$ due to positional proximity combined with amplitude and angle values similarity.
- Structures: a structure is a set of connected maxima chains at the same scale. A significant structure derived from adjacent chains is of the form:
$$S_{2^j} = \left\{ C_{2^j,1}, C_{2^j,2}, \ldots, C_{2^j,i} \right\} \tag{8.7}$$
where i is the number of adjacent maxima chains bridged to form a structure and $2^j$ the coarsest scale.
- Inter-scale relation: the criteria employed by the algorithm to relate maxima chains across scales into a single structure.
- Structure operator: indicates to which structure a given maxima chain belongs.
8.2.4.1 Maxima Linking
The previous edge detection procedure generated a speckle-free multi-scale pixel representation, where each pixel corresponds to WTMM emanating from various structures located within the US image. Nevertheless, individual wavelet maxima are mostly not independent features; they are part of certain lines or curves localized in multiple scales. The model is initialized by a global search, at each scale, for WTMM with matching properties. The maxima chain linking procedure creates curves consisting of local maxima groups at each scale. The connectivity procedure is a complicated operation dependent on directional compatibility, spatial adjacency and amplitude similarity. In other words, two edge points at a given scale are linked to form a maxima chain on condition that they are close to each other and have similar phase and amplitude values.
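The linking rule described above can be sketched as a simple region-growing pass over the local maxima of one scale. This is a minimal illustration under stated assumptions, not the procedure implemented in this work: each maximum is assumed to be a tuple (x, y, modulus, angle) with a strictly positive modulus, and the function name `link_maxima` together with the thresholds `max_dist`, `max_angle` and `max_rel_amp` are hypothetical values chosen for the example.

```python
import math

def link_maxima(maxima, max_dist=1.5, max_angle=math.pi / 8, max_rel_amp=0.3):
    """Group local maxima of one scale into chains.

    maxima: list of (x, y, modulus, angle) tuples.
    Returns a list of chains, each a list of indices into `maxima`.
    """
    def compatible(a, b):
        close = math.hypot(a[0] - b[0], a[1] - b[1]) <= max_dist
        # Wrap the angle difference into [0, pi] before comparing.
        d_angle = abs((a[3] - b[3] + math.pi) % (2 * math.pi) - math.pi)
        rel_amp = abs(a[2] - b[2]) / max(a[2], b[2])
        return close and d_angle <= max_angle and rel_amp <= max_rel_amp

    unvisited = set(range(len(maxima)))
    chains = []
    while unvisited:
        seed = min(unvisited)
        unvisited.remove(seed)
        chain, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            linked = [i for i in sorted(unvisited)
                      if compatible(maxima[current], maxima[i])]
            for i in linked:
                unvisited.remove(i)
            chain.extend(linked)
            frontier.extend(linked)
        chains.append(chain)
    return chains

# Three maxima along one edge, one distant maximum, one weak neighbour.
maxima = [(0, 0, 10.0, 0.0), (1, 0, 11.0, 0.05), (2, 1, 10.5, 0.1),
          (10, 10, 10.0, 0.0), (3, 1, 2.0, 0.1)]
print(link_maxima(maxima))
```

Chains of length one correspond to individual maxima that were not linked and could be filtered out afterwards.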
8.2.4.2 Structure Identification
Despite the chaining procedure, small gaps between adjacent maxima chains are created, resulting in a broken outline. The main reasons for this inadequacy are: possible transducer displacement by the physician during the US examination, various acoustic phenomena such as refraction, shadowing and reverberation, and numerical errors made during the calculation of the coarse to fine evolution of the WTMM. These numerical errors are caused by the fact that the wavelets used (quadratic spline) are not the derivative of a Gaussian but only an approximation. An efficient structure representation requires continuity of chains. The rule applied to connect two maxima chains into a single structure is termed the inter-scale relation. All maxima chains located in the multi-scale edge map correspond to significant segments of a broken borderline. Each of the maxima chains is approximated by its mean amplitude and its end positions. This approximation of a maxima chain is expressed by the following equation:
$$C_{2^j} = \left\{ \frac{1}{p} \sum_{n=1}^{p} M_{2^j,n}, \; P_{2^j,1}, \; P_{2^j,p} \right\} \tag{8.8}$$
where $2^j$ is the dyadic scale, $M_{2^j,1}, \ldots, M_{2^j,p}$ are the amplitude values of the chain, p is the number of local maxima and $P_{2^j,1}$, $P_{2^j,p}$ are the starting and ending positions of the chain's maxima. Two maxima chains at two successive scales, $C_{2^j}$ and $C_{2^{j+1}}$, are said to belong to a single structure because of the proximity of their maxima positions. This means that the majority of the local maxima composing a maxima chain $C_{2^j}$ must also be contained in the maxima chain $C_{2^{j+1}}$. Moreover, two adjacent maxima chains at a given scale comprise a single structure due to position and mean-amplitude similarity. All maxima chains appearing in successive scales that satisfy the above two rules are connected to form a structure. To each structure, defined as a set of maxima chains, an operator L is assigned which indicates the arithmetic label given to each detected group of connected maxima chains: $L(C_{2^j,i}) = l$, where i is the number of maxima chains and l is the arithmetic label of each structure. In the resultant structure representation any individual maximum that is not linked during the chaining is removed. Once the structure representation has been implemented, the structure image from the coarsest scale is transformed into a Boolean one (i.e. pixel intensity equals 1 when a maximum is detected and zero otherwise). An additional step is required to separate and detect the important structure that corresponds to the thyroid nodule from other various structures that correspond to
different anatomical regions located in the US image. A pseudo code of the structure detection procedure that constructs the multiscale structure representation is depicted below.

PSEUDO CODE

function STRUCTURE DETECTION returns multiscale structure representation

    inputs: N = 4, the maximum decomposition scale
            Multi-scale Edge Map with M_{2^j} f(x,y), 1 <= j <= 4, local maxima classified as important edges

    maxima chain construction
    for each scale 2^j (j = 1..N) do
        find M_{2^j} f(x,y) with similar position, amplitude and phase values
        construct maxima chain: C_{2^j,k} = {M_{2^j,1}, M_{2^j,2}, ..., M_{2^j,p}}

    structure detection
    for each C_{2^N} do
        for all finer scales 2^1 ... 2^3 do
            if there are maxima chains C_{2^3}, C_{2^2}, C_{2^1} with similar position values
                construct unified maxima chain across scales: union of C_{2^j} for j = 1..N
            and if two adjacent maxima chains (C_{2^j,1}), (C_{2^j,2}) with position and amplitude proximity
                assign structure operator: L = L(C_{2^j,i}), i = number of connected maxima chains in each scale
                construct structure S_{2^j} = {C_{2^j,1}, C_{2^j,2}, ..., C_{2^j,i}}
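The two ingredients of the structure identification step, the chain summary of Equation 8.8 and the inter-scale relation, might be sketched as follows. The sketch assumes that a chain is a list of (x, y, modulus) tuples ordered along the curve. The function names, the pixel `tolerance` and the `majority` fraction are hypothetical choices for illustration and are not taken from the implementation described here.

```python
def chain_summary(chain):
    """Summarise a chain by its mean modulus and its first and last positions."""
    mean_modulus = sum(m for _, _, m in chain) / len(chain)
    return mean_modulus, chain[0][:2], chain[-1][:2]

def same_structure_across_scales(fine_chain, coarse_chain,
                                 tolerance=1, majority=0.5):
    """True if most maxima of the finer chain lie near a maximum of the coarser one."""
    def near(p, q):
        return abs(p[0] - q[0]) <= tolerance and abs(p[1] - q[1]) <= tolerance

    hits = sum(1 for p in fine_chain if any(near(p, q) for q in coarse_chain))
    return hits / len(fine_chain) > majority

# Toy chains: `fine` and `coarse` follow the same edge, `other` lies elsewhere.
fine = [(0, 0, 5.0), (1, 0, 6.0), (2, 0, 7.0), (3, 0, 6.0)]
coarse = [(0, 1, 9.0), (1, 1, 9.5), (2, 1, 10.0)]
other = [(20, 20, 9.0), (21, 20, 9.0)]
print(chain_summary(fine))
print(same_structure_across_scales(fine, coarse),
      same_structure_across_scales(fine, other))
```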
8.2.5 Nodules Boundary Extraction
The boundaries of several anatomical parts investigated in medical imaging such as kidney, prostate, parts of the heart and of course thyroid nodules can be approximated by regular curves. As already stated in the introduction section, despite the irregularity degree of its contour, every thyroid nodule retains a partial circular shape [1-46]. The edge detection methods do not extract the ROI. In optimal algorithm performance, the candidate anatomical structure is revealed noise-free in the xy-plane surrounded by other anatomical structures present in the medical image. An efficient technique to isolate features of a particular shape within an image is the Hough transform. The classical Hough transform requires that the desired features must be specified in some parametric forms, by regular curves such as lines,
circles, ellipses, etc. [160] whereas the generalized Hough transform can be employed in applications where a simple analytic description of a feature is not possible [161]. Due to the computational complexity of the generalized Hough algorithm, we have restricted our boundary extraction attempt within the framework of the classical Hough transform.
8.2.6 Constrained Hough Transform
The main idea of the Hough transform can be considered as a point to curve transformation from the Cartesian image space to the Hough parameter space. When viewed in the Hough parameter space, points that are collinear in the Cartesian image space become readily apparent as they yield curves which intersect at a common point. From the synthetic image of Figure 8.3 (a), a Boolean map is derived with labeled structures corresponding to several anatomical regions. For every structure a reference point $(x_{ref}, y_{ref})$ is computed based on the x and y coordinates of the maxima constituting each structure. Each reference point is considered as a candidate center-point for a circular object with a varying radius $r_i$:
$$r_i^2 = (x_i - x_{ref})^2 + (y_i - y_{ref})^2, \quad i = 1, \ldots, p \tag{8.9}$$
where $x_i$, $y_i$ are the coordinates and p the number of the connected maxima for each structure. Circles are projected around the candidate center-point, and for each maximum on the structure boundary, the corresponding cell in the accumulator array is incremented by one. The accumulator matrix representing the Hough parameter space has the same dimensions as the original image, and the method involves projecting circles using Cartesian coordinates (Figure 8.3 (b)).
8.2.7 Accumulator Local Maxima Detection
As a result, in the parameter space those reference points located within circular objects (i.e. thyroid nodule) are related with relatively large values in the accumulator array, in contrast to other structures (i.e. veins, arteries etc). In Figure 8.3 (c) we can observe that the cell approximately corresponding to the simulated nodule's center appears as the maximum value in the parameter space. A local maxima detection procedure is applied to the accumulator array to find the approximate center. Subsequently, its corresponding structure is superimposed on the image as the final segmentation outcome (Figure 8.3 (d)).
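The voting and peak detection steps can be illustrated with a classical circular Hough accumulator, shown here as a stand-in: the constrained variant described above restricts the candidate centres to the reference points of the labelled structures, which this sketch does not do. The sketch assumes the boundary points are (x, y) pixel coordinates and that the candidate radii are supplied by the caller; the function names and the angular sampling `n_angles` are hypothetical. The example uses an open three-quarter arc to show that a partial contour still produces a clear peak at the centre.

```python
import math

def hough_circle_accumulator(points, shape, radii, n_angles=72):
    """Vote for circle centres: each point projects a circle for every radius."""
    height, width = shape
    acc = [[0] * width for _ in range(height)]
    for x, y in points:
        for r in radii:
            cells = set()
            for k in range(n_angles):
                t = 2 * math.pi * k / n_angles
                cx = int(round(x + r * math.cos(t)))
                cy = int(round(y + r * math.sin(t)))
                if 0 <= cx < width and 0 <= cy < height:
                    cells.add((cx, cy))
            for cx, cy in cells:  # one vote per cell for each point and radius
                acc[cy][cx] += 1
    return acc

def accumulator_peak(acc):
    """Return ((x, y), votes) of the accumulator cell with the most votes."""
    best = max((v, x, y) for y, row in enumerate(acc) for x, v in enumerate(row))
    return (best[1], best[2]), best[0]

# Open contour: nine pixels of a radius-5 circle centred at (20, 20),
# with the lower-right quadrant missing.
offsets = [(5, 0), (4, 3), (3, 4), (0, 5), (-3, 4),
           (-4, 3), (-5, 0), (-4, -3), (-3, -4)]
arc = [(20 + dx, 20 + dy) for dx, dy in offsets]
acc = hough_circle_accumulator(arc, (40, 40), [5])
centre, votes = accumulator_peak(acc)
print(centre, votes)
```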
Figure 8.3 (a) Synthetic Image, (b) Hough parameter space via projecting circles with variable radius. High amplitude values (candidate center points) correspond to white pixels, (c) Accumulator array, (d) Segmented image.
8.3 Results
To evaluate the performance of the proposed segmentation method, a comparative study was conducted, comprising 40 US thyroid images from 40 female patients between 40 and 65 years old. All images were randomly chosen from a larger US image database acquired by an experienced radiologist (N.D). The segmentation results of the proposed method were compared with the delineated boundaries (used as ground truth) drawn by two experienced observers (OB1 & OB2) in terms of nodule area, roundness, concavity and Mean Absolute Distance (MAD). The area of each nodule is calculated by counting the pixels inside the nodule's borders. Roundness characterizes the circularity of the nodule and takes low values for circular nodules and high values for irregular boundaries [75]. The nodule's roundness is defined as:
$$\text{Roundness} = \frac{\text{Perimeter}^2}{\text{Area}} \tag{8.10}$$
The perimeter is measured by summing the number of pixels on the border of the nodule. Concavity is a shape feature that indicates the presence of concave regions [75]. It is measured by dividing the mean value of the Euclidean distances between the centroid and the
Convex Hull (CV) pixels by the mean value of the Euclidean distances between the centroid and the Actual Boundary (AB) pixels:
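$$\text{Concavity} = \frac{ \frac{1}{N} \sum_{i=1}^{N} \sqrt{ (Y_{Centroid} - Y_{CV_i})^2 + (X_{Centroid} - X_{CV_i})^2 } }{ \frac{1}{M} \sum_{i=1}^{M} \sqrt{ (Y_{Centroid} - Y_{AB_i})^2 + (X_{Centroid} - X_{AB_i})^2 } } \tag{8.11}$$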
The MAD characterizes the shape difference evaluation between two contours [162]. If two given curves are represented as point sets $A = \{a_1, a_2, \ldots, a_n\}$ and $B = \{b_1, b_2, \ldots, b_m\}$, the MAD describes the average pixel distance between these two curves:
$$e(A, B) = \frac{1}{2} \left[ \frac{1}{n} \sum_{i=1}^{n} d(a_i, B) + \frac{1}{m} \sum_{i=1}^{m} d(b_i, A) \right] \tag{8.12}$$
with:
$$d(a_i, B) = \min_j \left\| b_j - a_i \right\| \tag{8.13}$$
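Equations 8.12 & 8.13 translate directly into a few lines of code. The sketch below assumes that each contour is given as a list of (x, y) pixel coordinates; the function names are hypothetical and the brute-force nearest-point search is chosen for clarity rather than speed.

```python
import math

def point_to_curve(p, curve):
    """Eq. 8.13: distance from point p to the closest point of `curve`."""
    return min(math.hypot(p[0] - q[0], p[1] - q[1]) for q in curve)

def mean_absolute_distance(a, b):
    """Eq. 8.12: symmetric average distance between two point sets."""
    term_a = sum(point_to_curve(p, b) for p in a) / len(a)
    term_b = sum(point_to_curve(q, a) for q in b) / len(b)
    return 0.5 * (term_a + term_b)

# Two parallel three-pixel segments, two pixels apart (illustrative data).
contour_a = [(0, 0), (1, 0), (2, 0)]
contour_b = [(0, 2), (1, 2), (2, 2)]
print(mean_absolute_distance(contour_a, contour_b))
```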
Regarding the MAD parameter, in order for the results to be numerically comparable with the other three parameters, the percentage parameter MAD% is introduced. It is considered as the percentage pixel area difference between the automatic and manual boundaries. The sum of the distances for each pair of automatic (AU) and manual (OB) boundaries is subtracted from the total number of pixels within the manual boundary (OB). The result of this operation is expressed as a percentage. High MAD% values suggest that the great majority of pixels within the manual boundary area (used as ground truth) are also present in the automatic boundary area (see Equation 8.14).
$$\text{MAD\%} = \left( OB_{pixels} - \sum d(OB, AU)_{pixels} \right) \% \tag{8.14}$$
The comparison of the manually (OB1 & OB2) and the automatically (AU) segmented nodules regarding area, roundness, concavity and MAD% gave average agreement rates of 88.95%, 91.77%, 91.09% and 90.64% respectively for the first observer and 87.58%, 91.12%, 89.08% and 89.53% for the second observer. The actual MAD values, calculated as the average of the distances, were 2.54 for the set of pairs AU-OB1 and 2.16 for the set of pairs AU-OB2, with standard deviations of 0.88 and 0.83 respectively. Results are presented in Tables 8.1 & 8.2.
Table 8.1 Percentage agreement between automatic (AU) and manual segmentations (OB1, OB2) in terms of Area, Roundness, Concavity and MAD%.
US image (A/A): 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 Average
Area (AU-OB1): 84.55 85.31 84.82 86.92 87.99 86.13 95.00 87.22 94.52 82.89 87.40 94.20 87.27 90.68 85.90 82.30 88.86 88.81 93.80 87.30 91.04 90.86 84.95 88.37 88.16 93.82 94.64 92.84 93.44 96.86 95.26 81.52 93.49 77.87 84.91 80.55 83.19 92.47 94.83 92.30 88.83
Area (AU-OB2): 90.65 78.94 90.40 91.21 85.71 82.49 88.96 94.22 88.51 82.64 80.97 91.15 84.31 84.19 78.96 91.24 95.09 91.25 88.08 86.83 94.50 87.05 76.00 84.23 86.08 97.93 90.71 88.71 95.12 91.13 94.02 81.61 89.03 78.68 84.08 79.31 84.63 96.88 91.16 86.59 87.58
Roundness (AU-OB1): 91.63 92.78 92.79 91.97 93.65 90.93 89.42 93.78 90.29 93.03 91.88 93.64 86.90 96.73 93.27 93.93 75.51 95.27 92.48 94.09 93.45 89.19 94.10 96.21 96.70 95.00 93.37 94.39 90.43 87.07 95.59 87.19 95.02 88.81 91.92 93.94 92.44 92.91 79.91 89.16 91.77
Roundness (AU-OB2): 94.44 97.08 92.03 94.68 90.36 94.22 89.51 84.48 95.51 96.77 93.12 86.59 88.07 93.53 89.12 90.23 81.76 93.43 93.27 95.67 95.48 85.80 89.86 93.48 92.31 91.21 87.35 96.12 92.31 89.95 96.78 91.78 84.46 84.56 89.34 92.68 94.55 93.51 84.12 85.19 91.12
Concavity (AU-OB1): 90.19 93.11 86.61 90.91 95.87 94.94 83.80 84.24 89.19 88.75 82.64 80.63 83.65 87.44 97.86 98.01 89.26 91.66 85.31 93.16 89.18 76.27 92.20 92.01 92.98 95.12 81.38 88.93 87.72 93.30 86.59 90.96 96.07 82.93 82.60 93.14 93.63 89.97 86.60 89.46 89.21
Concavity (AU-OB2): 92.53 96.33 86.71 95.19 81.94 87.61 89.62 85.49 92.12 92.90 90.54 88.93 89.72 81.52 90.32 95.20 84.14 87.57 85.58 95.32 85.49 82.45 84.10 86.11 88.93 96.24 90.76 93.70 83.43 85.21 90.03 89.51 92.27 91.36 88.44 91.24 86.12 96.71 84.97 86.91 89.08
MAD% (AU-OB1): 90.70 93.44 85.93 93.72 92.01 87.87 86.81 87.47 89.23 93.55 91.94 94.94 90.10 89.98 92.05 85.21 88.62 93.16 87.08 95.71 92.09 90.01 89.33 85.12 92.53 91.53 91.14 93.37 90.13 87.99 96.96 93.38 92.52 92.19 93.53 94.44 91.96 84.65 91.73 86.53 90.77
MAD% (AU-OB2): 90.19 93.11 86.61 90.91 95.87 94.94 83.80 84.24 85.12 88.75 82.64 88.12 83.65 87.44 97.86 93.19 89.26 91.66 85.31 93.16 89.18 84.11 92.20 92.01 92.98 95.12 81.38 88.93 87.72 93.30 90.12 90.96 96.07 83.98 84.56 93.14 93.63 89.97 86.60 89.46 89.53
Average accuracy (AU-OB1): 89.27 91.16 87.54 90.88 92.38 89.97 88.76 88.18 90.81 89.56 88.47 90.85 86.98 91.21 92.27 89.86 85.56 92.22 89.67 92.57 91.44 86.58 90.14 90.42 92.59 93.87 90.13 92.38 90.43 91.30 93.60 88.26 94.27 85.45 88.24 90.52 90.31 90.00 88.27 89.36 90.14
Average accuracy (AU-OB2): 91.95 91.37 88.94 93.00 88.47 89.81 87.97 87.11 90.31 90.27 86.82 88.70 86.44 86.67 89.07 92.47 87.56 90.98 88.06 92.75 91.16 84.85 85.54 88.96 90.07 95.12 87.55 91.87 89.65 89.90 92.74 88.47 90.46 84.65 86.61 89.09 89.73 94.27 86.71 87.04 89.33
Table 8.2 Mean values and standard deviation of the computed MADs for the pairs AU-OB1 and AU-OB2.
Pair                                 MAD (pixels)   Standard Deviation
Automatic - Observer 1 (AU-OB1)      2.54           0.88
Automatic - Observer 2 (AU-OB2)      2.16           0.83
The overall segmentation procedure (edge detection, multiscale structure representation, nodule boundary extraction), applied to two US images with an iso-echoic and a hypo-echoic thyroid nodule, is depicted in Figures 8.4 & 8.5 respectively.
Figure 8.4 (a) US image with an iso-echoic thyroid nodule, (b) Contour representation, (c) Constrained Hough transform, (d) Accumulator array, (e) Outcome of the hybrid model, (f) Manually delineated boundary.
Figure 8.5 (a) US image with a hypo-echoic thyroid nodule, (b) Contour representation, (c) Constrained Hough transform, (d) Accumulator array, (e) Outcome of the hybrid model, (f) Manually delineated boundary.
Inter-observer variability: The regular US diagnostic procedure is highly subjective, thus ideal boundaries for the thyroid nodules are difficult to acquire. In order to assess any potential variation in boundary recognition, an inter-observer study was also performed for the two expert radiologists specialized in ultrasonography. The manual delineation in this study was done independently for all US images. Inter-observer agreement was determined using the weighted κ (kappa) coefficient [117], calculated for all parameters employed in the previous comparative study. The ideal boundary was selected as the vector space union of each manual borderline pair. The threshold chosen for each parameter, in order for a manual boundary to coincide with the ideal one, was set to 90% agreement. A kappa statistic above 0.75 is taken arbitrarily to show excellent agreement, between 0.40 and 0.75 substantial agreement and below 0.40 poor agreement (Table 8.3).
Table 8.3 Inter-observer agreement (kappa coefficient) between the two observers
Parameter     Kappa Coefficient
Area          0.89
Roundness     0.77
Concavity     0.75
MAD           0.92
Average       0.83
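For orientation, the sketch below computes the plain (unweighted) Cohen's kappa for two observers who each give a binary rating per image, for example whether a manual boundary reaches the 90% agreement threshold. This is an assumption about how the ratings could be encoded; the weighting scheme of the weighted kappa used in this study [117] is not reproduced, although for two categories the weighted and unweighted statistics coincide. The ratings in the example are made-up toy values, not study data, and the function name `cohen_kappa` is hypothetical.

```python
def cohen_kappa(ratings_1, ratings_2):
    """Unweighted Cohen's kappa for two equally long lists of category labels."""
    n = len(ratings_1)
    categories = set(ratings_1) | set(ratings_2)
    # Observed proportion of images on which the two raters agree.
    observed = sum(1 for a, b in zip(ratings_1, ratings_2) if a == b) / n
    # Agreement expected by chance from the marginal frequencies.
    expected = sum((ratings_1.count(c) / n) * (ratings_2.count(c) / n)
                   for c in categories)
    return (observed - expected) / (1 - expected)

# Toy ratings for ten images (1 = threshold reached, 0 = not reached).
observer_1 = [1, 1, 1, 1, 0, 0, 1, 1, 0, 1]
observer_2 = [1, 1, 1, 0, 0, 0, 1, 1, 1, 1]
print(round(cohen_kappa(observer_1, observer_2), 3))
```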
The comparison of the two manually segmented nodules regarding area, roundness, concavity and MAD gave average agreement rates of approximately 90.77%, 91.39%, 92.25% and 92.91% respectively, with an overall percentage agreement of 91.83%.
8.4 Discussion and Conclusion
A new segmentation technique for automatic boundary extraction of thyroid nodules in US imaging has been designed. The contribution of this approach is the integration of wavelet-based coarse to fine singularity detection, a multiscale structure model, and the constrained Hough transform. In the comparative study, thyroid nodule segmentation accuracy reached approximately 90.14% and 89.33% with respect to the two observers. The percentage agreements between the derived and the manually delineated boundaries are within the inter-observer range (91.83%). The MAD values between the derived and the manually delineated boundaries (2.16 to 2.54) also coincide with the values other studies have presented (1.37 to 4.55) [122,135,136]. The inter-observer study demonstrated a high kappa coefficient agreement of approximately 0.83. Although the algorithm is unsupervised, the evaluation results may be regarded as most encouraging considering the reduced US image quality. The proposed
algorithm may be of value for computer-assisted systems aiming to support the standard diagnostic procedure as an objective second-opinion tool. The local maxima representation employed to construct the speckle-free multiscale edge map not only identifies edges but also characterizes them. The integration of local maxima via a coarse-to-fine method led to the classification of edges based on their regularity. As a result, strong edges arising from significant structures are utilized in the subsequent structure model, whereas edges corresponding to speckle are discarded. Regarding the maxima chaining procedure, similar maxima linking attempts [70,163] have adopted a thresholding procedure based on the number of adjacent local maxima, only at the last available scale, to isolate the ROI. The complex nature of US imaging often produces maxima chains with a relatively small number of linked local maxima (two or three contiguous maxima) that might be part of a larger structure contour. In contrast to these studies, the proposed multiscale structure model utilizes all local maxima existing in the edge map across scales in order to reconstruct the best possible contour map. Besides the presence of speckle, another major difficulty that all segmentation approaches encounter is the existence of various anatomical structures within the image. In very noisy structural environments, ROI isolation is an extremely difficult task. Most proposed methods also attempt to calculate a closed contour as the final segmentation outcome. The constrained Hough transform employed in this study is not only relatively unaffected by structure noise, but also manages to isolate all thyroid nodules even when the nodule contour is not closed. In cases of diffusion between the nodule and the surrounding tissue, the algorithm does not guess the nodule boundary in that region, and thus leaves this important clinical information to the physician.
In addition, because nodules are not perfect circles, the radius of the contour is variable and the detection of local maxima in the accumulator array is made in a small area rather than at a single point. The different echogenicity behavior of thyroid nodules (hypo-, iso- or hyper-echoic) limits the accuracy of thyroid nodule segmentation algorithms based on texture characteristics [164]. The edge-based technique presented here is echogenicity invariant, as can be observed in Figures 8.4 and 8.5, in which an iso-echoic and a hypo-echoic nodule are successfully segmented. The hybrid algorithm can apparently be employed to segment various anatomical structures viewed via US imaging with only one constraint: the selection of a-priori shape knowledge regarding the structure of interest. The outcome of the edge detection stage is highly dependent on the radiologist's competency in performing the US examination. Any misplacement of the transducer, or the occurrence of acoustic phenomena such as reverberation or shadowing, may produce a false contour map. The proposed algorithm identifies sharp variations with great accuracy wherever they occur but does not approximate edges that are not visible in the image. In US images where the thyroid nodule is partially visible or greatly distorted due to the aforementioned acoustic phenomena, the proposed algorithm faced problems during the edge detection procedure. All cases with these drawbacks were omitted from the comparative study. The incorporation of the generalized form of the Hough transform for arbitrary shape detection into the maxima chaining procedure across scales may avoid most of the problems encountered by the majority of edge detection approaches. Nevertheless, the fine-tuning between the inter-scale wavelet maxima chaining and the Hough transform algorithms, along with the extensive computation time of the Hough transform, are the main drawbacks of this segmentation approach. The aforesaid potential problems of any segmentation approach necessitate continued investigation towards an optimal segmentation technique. A parallel wavelet-based study is under research and development by the same team, based on zero-crossing detection, in which the wavelet transform is computed with the second derivative of a Gaussian smoothing function.
In conclusion, a new, efficient technique for thyroid nodule segmentation in sonography is introduced, with promising results. The proposed algorithm is able to outline thyroid nodules with high accuracy regardless of their texture, possible discontinuities in the boundary line and the presence of extensive structure noise. The method was evaluated in comparison with two experienced observers and demonstrated high agreement. The utilization of the hybrid method may assist a shape-based thyroid nodule categorization by the physician. It might also enhance the accuracy of the fine needle aspiration procedure and thus offer several advantages in the decision-making procedure. Moreover, it can be used as an educational tool for inexperienced radiologists.
CHAPTER 9
Development of a Support Vector Machine Based Image Analysis System for Assessing the Thyroid Nodule Malignancy Risk on Ultrasound
Summary
In this chapter an image analysis system based on the support vector machine (SVM) classifier is introduced for the automatic characterization of thyroid nodules in sonographic images using textural features. The chapter is organised as follows. First, a review of the literature regarding clinical and computer-based attempts to categorize thyroid nodules is presented. In the materials and methods section the SVM classifier is presented along with the classical quadratic least squares minimum distance (QLSMD), the quadratic Bayesian (QB) and the multilayer perceptron (MLP) classifiers, which are used for comparison. In the results section the performance of the SVM classifier is compared to that of the other three classifiers. Moreover, a study regarding the design and implementation of wavelet kernels built within the framework of Reproducing Kernel Hilbert Spaces (RKHS) is introduced. In the discussion section a detailed analysis of the performance and clinical aspects raised by the algorithms' results is given.
9.1 Review of the Literature
Thyroid nodules are swellings that appear in the thyroid gland and can be due to growth of thyroid cells or to a collection of fluid known as a cyst. They can become large enough to press on nearby structures in the neck, they can overproduce thyroid hormone (hyperthyroidism), or they may be indicative of thyroid cancer [1]. Various techniques have been introduced for the detection and evaluation of thyroid nodules, such as physical examination, cytological examination, scintigraphy, ultrasonography, magnetic resonance, and computed tomography [14]. High-resolution thyroid ultrasonography (US) is exceptionally sensitive in determining the size and number of thyroid nodules [19]. The sonographic findings of the thyroid nodule are often employed as criteria in assessing the risk factor of malignancy and are crucial in patient management, i.e. whether or not to recommend surgical operation [19].
Such criteria include echogenicity, absence of halo, calcifications, irregular margins, and intra-nodular vascular patterns or spots [20,22]. However, estimation of the risk factor involves the subjective evaluation of US images by the physician and thus depends upon the experience of the examiner. Previous US studies [5,17,18,21,25,27] on thyroid nodules have reported different diagnostic accuracies in predicting malignancy based on the visual analysis of US images. It is evident that a quantitative assessment of the thyroid nodule's risk factor may be of value in avoiding unnecessary invasive interventions. Previous studies on quantitative methods for estimating the risk associated with thyroid gland disease mainly concern the evaluation of parameters from the gray-level histogram of the thyroid gland US image [165,166], several textural features from gray-tone spatial-dependence matrices [167] and the application of discriminant analysis [165,167]. In those studies, discrimination accuracies between benign and malignant thyroid nodular lesions were 83.9% [167] and 85% [165]. The fact that echogenicity and the existence of different structures inside the thyroid nodule have been indicated as important factors leading to thyroid malignancy [17,18,25,27], combined with the lack of recent quantitative studies assessing the nature of thyroid nodules, necessitates further research employing: (a) US thyroid images from modern high-resolution scanners and (b) robust computer-based pattern recognition methods using state-of-the-art classification algorithms, in order to increase the classification accuracy of objective methods and thus assist physicians in the pre-operative management of patients.
9.2 Materials and Methods
9.2.1 US Image Data Acquisition
The study comprised 120 ultrasonic images displaying thyroid nodules of 120 patients. All US examinations were performed on an HDI-3000 ATL digital ultrasound system (Philips Ultrasound, P.O. Box 3003, Bothell, WA 98041-3003, USA) with a wide-band (5-12 MHz) linear probe, using various scanning methods such as longitudinal, transversal and sagittal cross sections of the thyroid gland. The dataset was acquired in the time interval from October 2003 to September 2004. During ultrasound examinations the SmallPartTest Philips protocol was used. All protocol settings remained constant throughout that period. The time gain compensation setting had a gain that increased linearly with depth. The magnification setting remained 1:1 except in rare cases of very small nodules (<1.5 cm), which were not included in the present study. The dynamic range setting was relatively high (60 dB) in order to exploit the capability of contemporary US systems to visualise US images with the maximum number of gray tones. Thermal and mechanical indexes were set at 0.2 and 0.9 respectively. Multiple focal lengths were set at 1, 2 and 3 cm simultaneously.
Each US image was digitized by connecting the video output of the ultrasound scanner to a Screen Machine II frame grabber, using 768 x 576 x 8 image resolution. Under real-time ultrasound guidance, all nodules with size above 1.5 cm underwent fine needle (23-gauge) aspiration biopsy. From various sites of the nodules, 6 to 10 specimens were taken and smears were placed on slides. Those slides were evaluated by two experienced observers. Excluding the rare cases of typical neoplasm of the thyroid, both observers graded the cases in two major categories: epithelial hyperplasia, which can be characterised as high risk (42 cases) due to potential malignant growth, thus leading to repeated ultrasound and cytology examinations of the patient, and benign lesions (colloid nodules, 78 cases), for which the follow-up examinations can be performed at long time intervals. Retrospectively, the physician (N.D.) analysed the US characteristics of the corresponding ultrasound images of the two classes given by the cytologists, in accordance with Tomimori's grading [20]. The low-risk class mostly comprised iso-echoic or hyper-echoic solid nodules with or without cystic change and coarse calcification, while the majority of the high-risk class contained hypo-echoic solid nodules with regular borders or cystic nodules with solid components.
9.2.2 Data Pre-processing
The boundary of each nodule was delineated by the physician employing an easy-to-use interactive software program implemented for the purposes of the present study. Data processing was performed on an AMD Athlon XP+ processor running at 1.8 GHz with 512 MB of RAM. A number of textural features were automatically calculated from the segmented ROI of each thyroid nodule. Textural features are related to the gray-tone structure of the thyroid nodule as depicted on the ultrasound unit and carry information relevant to the risk factor of malignancy. Four features were computed from the nodule's gray-tone histogram, 26 from the co-occurrence matrix [73] and 10 from the run-length matrix [74].
9.2.3 Classification
System evaluation was performed by means of the leave-one-out (LOO) method [83], and the highest classification accuracies were determined by means of the exhaustive search method [83]. Besides the LOO method, the re-substitution method was also employed (all data were involved in both the design and the evaluation of the classifier), so as to find the upper and lower bounds of the classification error [83]. For the purposes of the present study four classifiers were designed: the Support Vector Machine classifier and, for comparison, the QLSMD, the QB and the MLP classifiers. In the design of a classifier, the features employed strongly influence its final discriminatory performance. Ideally, all features at hand (40) should be employed, but since a number of them may be redundant due to mutual correlations [83], an optimum number of them had to be selected to achieve the highest classification accuracy. Choosing the feature combination that maximizes the performance of the classifier is a necessary but time-consuming and computationally demanding procedure. The method followed (exhaustive search) involved designing the classifier by means of every possible feature combination (i.e. combinations of 2, 3 and 4 features) and all thyroid data available, each time testing the classifier's performance in correctly classifying the thyroid data, and finally selecting the feature combination that demonstrated the highest classification accuracy with the smallest number of textural features. The exhaustive search method was chosen instead of statistical methods such as F-statistics because the latter could result in unreliable error probability estimation [83] due to the small size of the dataset. For the SVM classifier [87, 88, 168] employing the polynomial kernel of 3rd degree, the best feature combination comprised the mean gray-level value of the thyroid ROI's histogram (Chapter 6 Equation 6.1) and the sum variance from the co-occurrence matrix (Chapter 6 Equation 6.16) [73]. For the MLP classifier, the highest classification accuracy was achieved by the mean gray-level value (Chapter 6 Equation 6.1) and the run length non-uniformity from the run length matrix (Chapter 6 Equation 6.26) [74]. For both the QLSMD and QB classifiers, the highest classification accuracies were achieved by the feature combination of the mean gray-level value, the sum variance, and the run length non-uniformity.
9.2.4 Support Vector Machine Classifier
An SVM-based classifier is designed to work for two-class problems and can be applied to linearly or non-linearly separable data, with or without class data overlap [87]. In the most difficult case of non-linearly separable and overlapping data, which is often the case, data are first transformed from the input space to a higher-dimensional feature space, where the classes are linearly separable. Then two parallel hyperplanes are determined with maximum distance between them and, at the same time, with the minimum number of training points in the area between them (also called the margin). Finally, a third hyperplane through the middle of the margin is defined, which is the decision boundary between the two classes. The discriminant equation of the SVM classifier may thus be defined as in (9.1):
$$ g(\mathbf{x}) = \operatorname{sign}\left( \sum_{i=1}^{N_S} \alpha_i y_i\, k(\mathbf{x}_i, \mathbf{x}) + b \right) \tag{9.1} $$
where $\alpha_i$ are weight parameters, $k(\mathbf{x}_i, \mathbf{x})$ is the kernel function employed for the data transformation into the linearly separable feature space, $\mathbf{x}_i$ are the support vectors (i.e. the training pattern vectors that have nonzero corresponding weights $\alpha_i$), $N_S$ is the number of support vectors, $\mathbf{x}$ is the input pattern vector, $b$ is the bias or threshold, and $y_i \in \{-1, +1\}$, depending on the class. In the present work, the SVM classifier was designed employing various polynomial kernels up to the 4th degree (Chapter 6 Equation 6.61) and the radial basis function kernel (Chapter 6 Equation 6.62).
9.2.5 Multilayer Perceptron (MLP) Classifier
The MLP classifier employed in this study is a feed-forward back-propagation neural network that has two input features, two classes, two hidden layers and four nodes in each hidden layer (for more details see Chapter 6, paragraph 6.6.3).
9.2.6 Quadratic Least Squares Minimum Distance Classifier
The QLSMD classifier [169] maps via a non-linear transformation the input data set into a decision space where each class is clustered around a pre-selected point. The classification of a given test point is based on its minimum distance from each pre-selected point. For the QLSMD, the discriminant function for class i and for pattern vector x is given by:
$$ g_i(\mathbf{x}) = \sum_{j=1}^{d} w_{ij} x_j^2 + \sum_{j=1}^{d-1} \sum_{k=j+1}^{d} w_{ij} x_j x_k + \sum_{j=1}^{d} w_{ij} x_j - b \tag{9.2} $$
where $d$ is the number of features, $w_{ij}$ are weight elements, $b$ is a threshold parameter, and $x_j$ are the input vector feature elements.
9.2.7 Quadratic Bayesian Classifier
The Bayes decision theory develops a probabilistic approach to pattern recognition, based on the statistical nature of the generated features. The Bayes discriminant function [170] for class $i$ and for pattern vector $\mathbf{x}$ is given by:
$$ g_i(\mathbf{x}) = \ln P_i - \frac{1}{2} \ln |\mathbf{C}_i| - \frac{1}{2} \left[ (\mathbf{x} - \mathbf{m}_i)^T \mathbf{C}_i^{-1} (\mathbf{x} - \mathbf{m}_i) \right] \tag{9.3} $$
where $P_i$ is the probability of occurrence of class $i$, $\mathbf{m}_i$ is the mean feature vector of class $i$, and $\mathbf{C}_i$ is the covariance matrix of class $i$.
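The quadratic Bayesian discriminant above can be illustrated with a minimal Python sketch for the two-feature case, where the determinant and inverse of the covariance matrix have simple closed forms. This is only a sketch under stated assumptions and not the software used in the study: the function names, the class labels and the feature values are hypothetical, and class priors are estimated from the class sizes.

```python
import math


def mean_vector(samples):
    """Mean of a list of 2-D feature vectors."""
    n = len(samples)
    return [sum(s[0] for s in samples) / n, sum(s[1] for s in samples) / n]


def covariance_matrix(samples, mean):
    """Unbiased 2x2 sample covariance matrix."""
    n = len(samples)
    c00 = c01 = c11 = 0.0
    for s in samples:
        d0, d1 = s[0] - mean[0], s[1] - mean[1]
        c00 += d0 * d0
        c01 += d0 * d1
        c11 += d1 * d1
    return [[c00 / (n - 1), c01 / (n - 1)], [c01 / (n - 1), c11 / (n - 1)]]


def qb_discriminant(x, prior, mean, cov):
    """g_i(x) = ln P_i - 0.5 ln|C_i| - 0.5 (x - m_i)^T C_i^-1 (x - m_i)."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    if det <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    d0, d1 = x[0] - mean[0], x[1] - mean[1]
    # Quadratic (Mahalanobis) term using the closed-form inverse of a 2x2 matrix
    maha = (cov[1][1] * d0 * d0 - 2.0 * cov[0][1] * d0 * d1 + cov[0][0] * d1 * d1) / det
    return math.log(prior) - 0.5 * math.log(det) - 0.5 * maha


def train(class_samples):
    """Estimate (prior, mean, covariance) for every class label."""
    total = sum(len(samples) for samples in class_samples.values())
    model = {}
    for label, samples in class_samples.items():
        mean = mean_vector(samples)
        model[label] = (len(samples) / total, mean, covariance_matrix(samples, mean))
    return model


def classify(x, model):
    """Assign x to the class with the largest discriminant value."""
    return max(model, key=lambda label: qb_discriminant(x, *model[label]))


# Hypothetical two-feature training vectors, for illustration only
training = {
    "low_risk": [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0), (2.5, 2.5)],
    "high_risk": [(6.0, 6.0), (7.0, 6.5), (6.5, 7.5), (7.5, 7.0)],
}
model = train(training)
label = classify((2.0, 2.0), model)
```

For more than two features the same expression applies, but the determinant and inverse would normally be obtained from a linear algebra library rather than written out by hand.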
9.2.8 Support Vector Machines Wavelet Kernels
In this section the efficiency of wavelet kernels within the SVM classifier model is evaluated for the characterization of the malignancy risk factor of thyroid nodules in sonographic images. The SVM classifier designed in this study was based on wavelet kernels built within the framework of Reproducing Kernel Hilbert Spaces (RKHS) [171,172]. Several periodized orthogonal wavelets (Daubechies, Coiflet, Symmlet) were used for the construction of the corresponding reproducing kernels, and their performance was compared to that of the standard polynomial and radial basis function kernels.
9.2.8.1 Wavelet Kernels Implementation
A reproducing kernel Hilbert space is a Hilbert space of functions with special properties [173] and, along with its associated kernels, can be constructed by means of frame theory [174,56,175]. Frame theory analyzes the completeness, stability and redundancy of linear discrete signal representations. A frame is a family of vectors $\{\phi_n\}_n$ that characterizes any signal $f$ from its inner products $\{\langle f, \phi_n \rangle\}_n$. Frame theory allows the representation of any vector in the space by a linear combination of the frame elements. The discrete wavelet transform is studied and developed through the frame formalism.
Building an RKHS from a Hilbert space: The conditions under which a frameable Hilbert space is also a reproducing kernel Hilbert space are presented below. Let $B$ be a functional Hilbert space endowed with the inner product $\langle \cdot, \cdot \rangle_B$ so that:
$$ \forall f \in B, \quad \|f\|_B < \infty \tag{9.4} $$
In order to build an RKHS from that Hilbert space, an operator $T$ is defined to map the functions of $B$ onto the set of the pointwise valued functions $\mathbb{R}^X = \{ f : X \to \mathbb{R} \}$. A general way of constructing such a linear mapping is based on the scalar product. We define a set of functions $\Gamma_x(\cdot)$ [176], indexed by $x \in X$, and the mapping operator $T$:
$$ T : B \to \mathbb{R}^X, \quad f \mapsto g(\cdot) \quad \text{so that} \quad g(x) = Tf(x) = \langle \Gamma_x(\cdot), f(\cdot) \rangle_B \tag{9.5} $$
$B$ is decomposed as $B = \operatorname{Ker}(T) \oplus M$ and the bijective restriction of $T$ to $M$ is called $S$:
$$ S : M \to H = \operatorname{Im}(T), \quad f \mapsto g(\cdot) = Sf = Tf \tag{9.6} $$
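If $H$ is endowed with the following inner product
$$ \forall g_1, g_2 \in H, \quad \langle g_1, g_2 \rangle_H = \langle S^{-1} g_1, S^{-1} g_2 \rangle_B = \langle f_1, f_2 \rangle_B \tag{9.7} $$
then $H$ is an RKHS [173] with a reproducing kernel $K$ in $H$:
$$ K(x, y) = \langle \Gamma_x(\cdot), \Gamma_y(\cdot) \rangle_B \tag{9.8} $$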
According to Aronszajn's theorem [173], for any RKHS of functions there exists a reproducing kernel, and vice versa. This reproducing kernel constitutes a Mercer kernel [177].
Construction of a wavelet kernel in $L^2(X)$: The dilated and translated wavelet family $\{\psi_i\}$ constitutes an orthonormal basis of the $L^2(\mathbb{R})$ space, which underlies a Hilbert space of wavelet functions [171, 172]. We restrict ourselves to cases where $B = L^2$ and $H \subset L^2$. Consider the wavelet family $\{\psi_i\}$ as an orthonormal basis of $L^2$. Since $\Gamma_x(\cdot)$ is a set of functions of $L^2$ and $H \subset L^2$, we denote:
$$ \forall x \in X, \quad \Gamma_x(\cdot) = \sum_{i,j} a_{i,j}\, \psi_j(x)\, \psi_i(\cdot) \tag{9.9} $$
with
$$ \sum_i a_i^2(x) < \infty \tag{9.10} $$
where $\{a_i(\cdot)\}$ is a set of coefficients depending on the evaluation point $x$. This equation supposes that $\psi_i(x)$ exists and is well defined for $x \in X$. In other words, the considered orthonormal basis must be defined pointwise. The reproducing kernel of $H$ becomes:
$$ K(x, y) = \langle \Gamma_x(\cdot), \Gamma_y(\cdot) \rangle = \sum_{i,j,n} a_{i,j}\, a_{i,n}\, \psi_j(x)\, \psi_n(y) \tag{9.11} $$
For each classification problem in SVMs, a hypothesis space associated with a kernel $K$ built from an operator $\Gamma_x(\cdot)$ is expressed as:
$$ K(x, y) = \langle \Gamma_x(\cdot), \Gamma_y(\cdot) \rangle = \sum_{j,k} a_{j,k}^2\, \psi_{j,k}(x)\, \psi_{j,k}(y) \tag{9.12} $$
where $\psi_{j,k}$ is a translated 1-dimensional wavelet of resolution $j$ [171, 172]:
$$ \psi_{j,k}(x) = \frac{1}{\sqrt{a^j}}\, \psi\!\left( \frac{x - k u_0 a^j}{a^j} \right) \tag{9.13} $$
9.2.9 System Performance Evaluation
System evaluation was performed by means of the leave-one-out method. Accordingly, each classifier was designed employing its best feature combination, determined in Section 9.2.3, and all but one thyroid-ROI feature vector. The latter was presented to the input of the system to be classified as either high-risk or low-risk. The process was repeated, each time leaving a different thyroid-ROI out, until all data had been processed. In this way, each classifier was evaluated by data that were not involved in its design. It is evident that the classifiers had to be re-designed each time a thyroid-ROI was left out. This required a few hours of computer processing time.
9.3 Results and Discussion
We have developed a quantitative method by means of an SVM-based software classification system that employed a large number of textural features from US thyroid images for assessing the malignancy risk factor of thyroid nodules.
9.3.1 SVM Classification Outcome
Table 9.1 shows the results obtained by the SVM classifier for different kernel functions. Results were obtained by the leave-one-out method and by the re-substitution method. Maximum classification accuracy in distinguishing low-risk from high-risk thyroid nodules by the LOO method was 96.7%, using the polynomial kernel of 3rd degree.
Table 9.1 Classification accuracies for various SVM kernels using the leave-one-out and re-substitution methods, for the best feature combination (mean gray value, sum variance)
                            Classification accuracy
SVM kernel                  LOO+ (%)    Resub.* (%)    NSV**
Polynomial of 1st degree    89.2        93.3           17
Polynomial of 2nd degree    91.7        96.7           13
Polynomial of 3rd degree    96.7        98.3           12
Polynomial of 4th degree    94.2        99.2           15
RBF                         94.2        97.5           15
+ Leave-one-out method
* Re-substitution method
** Number of support vectors employed using the re-substitution method
It is worth noting that the maximum classification accuracy corresponds to the minimum number of support vectors involved in the design of the SVM classifier. The number of support vectors among different kernel functions for the best feature combination ranged between 10% and 14.2% of the number of training points. The small number of support vectors is indicative of manageable class separability for the SVM. Table 9.2 gives a detailed account of the classification accuracies of the SVM with the 3rd degree polynomial kernel, obtained by the LOO and re-substitution methods and employing the mean gray-level value and sum variance feature combination.
Table 9.2 Truth table of the SVM classifier employing the 3rd degree polynomial kernel and the best feature combination (mean gray value, sum variance)
                                   SVM classification (3rd degree polynomial kernel)
Verified thyroid nodule classes    Low risk    High risk    LOO+ (resub.*) accuracy
Low risk                           76 (78)     2 (0)        97.4 (100) %
High risk                          2 (2)       40 (40)      95.2 (95.2) %
Overall accuracy                                            96.7 (98.3) %
+ Leave-one-out method
* Re-substitution method
Seventy-six of the low-risk thyroid nodules were correctly classified while two nodules were incorrectly assigned to the high-risk class, giving a classification accuracy of 97.4% by the LOO method. In the case of the high-risk thyroid nodules, forty were assigned to the correct class while only two were wrongly classified to the low-risk class, scoring a 95.2% class discrimination accuracy. Overall, the SVM achieved 96.7% precision in distinguishing correctly low-risk from high-risk thyroid nodules. Figure 9.1 shows a scatter diagram of the mean gray-level value against sum variance, the class margins and the decision boundary drawn by the SVM-3rd degree polynomial kernel classifier.
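The class and overall accuracies quoted above follow directly from the counts of Table 9.2. The short Python sketch below recomputes them for the LOO results; the function name `truth_table_accuracies` and the nested dictionary layout are illustrative choices and not part of the original software.

```python
def truth_table_accuracies(table):
    """Per-class and overall accuracy (%) from table[verified][assigned] counts."""
    per_class = {}
    correct = 0
    total = 0
    for verified, row in table.items():
        class_size = sum(row.values())
        # Correct decisions lie on the diagonal of the truth table
        per_class[verified] = 100.0 * row[verified] / class_size
        correct += row[verified]
        total += class_size
    return per_class, 100.0 * correct / total


# LOO counts of Table 9.2 (SVM, 3rd degree polynomial kernel)
loo_table = {
    "low": {"low": 76, "high": 2},
    "high": {"low": 2, "high": 40},
}
per_class, overall = truth_table_accuracies(loo_table)
# Rounded to one decimal: low 97.4, high 95.2, overall 96.7
```

The same function applies unchanged to the re-substitution counts or to the truth tables of the other classifiers.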
Figure 9.1 Sum variance versus mean gray value scatter diagram, displaying the low-risk and high-risk thyroid nodule data points, the SVM classifier margins, and the decision boundary employing the polynomial kernel of 3rd degree.
The features of the best combination (mean gray-level value and sum variance) are related to the textural parameters visually evaluated by physicians in assessing the thyroid nodule's risk factor [17, 18, 25, 27]. The mean gray value is closely associated with the echogenicity of the nodule, and the sum variance feature expresses useful spatial information linked to the existence of various structures within the nodule. Features of a related nature (upper 10% of the gray-level histogram distribution and entropy) have also been indicated in previous quantitative studies [164] to play an important role in thyroid nodule malignancy assessment, scoring an overall classification accuracy of 85%. Similar accuracies (83.9%) were also obtained in another quantitative study employing discriminant analysis of thyroid nodules [166]. The higher discriminatory precision achieved in the present study was most probably due to the improved resolution of the US images and to the non-linear nature of the sophisticated SVM algorithm employed.
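To make the two features of the best combination concrete, the following Python sketch computes the mean gray level and a sum variance from a gray-tone co-occurrence matrix. It rests on several assumptions that are not taken from the thesis: the ROI is a small rectangular array of integer gray levels (real nodule ROIs are irregular), a single horizontal offset is used, the matrix is made symmetric, and the sum variance is taken as the variance of the distribution of gray-level sums around the sum average. The definitions of Chapter 6 (Equations 6.1 and 6.16) may differ in such details, and all function names are hypothetical.

```python
def mean_gray_level(roi):
    """First-order feature: average gray level of all ROI pixels."""
    pixels = [value for row in roi for value in row]
    return sum(pixels) / len(pixels)


def cooccurrence(roi, levels, dx=1, dy=0):
    """Symmetric, normalised co-occurrence matrix for the offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(roi), len(roi[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                a, b = roi[r][c], roi[r2][c2]
                counts[a][b] += 1
                counts[b][a] += 1  # count the pair in both directions
    total = sum(sum(row) for row in counts)
    return [[value / total for value in row] for row in counts]


def sum_variance(p):
    """Variance of the gray-level sum distribution p_{x+y} around its mean."""
    levels = len(p)
    p_sum = [0.0] * (2 * levels - 1)
    for i in range(levels):
        for j in range(levels):
            p_sum[i + j] += p[i][j]
    sum_average = sum(k * value for k, value in enumerate(p_sum))
    return sum((k - sum_average) ** 2 * value for k, value in enumerate(p_sum))


# Hypothetical 4-level ROI made of four homogeneous blocks
roi = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
]
glcm = cooccurrence(roi, levels=4)
features = (mean_gray_level(roi), sum_variance(glcm))
```

A perfectly homogeneous ROI gives a sum variance of zero, whereas ROIs containing structures of clearly different gray levels, as in this toy example, give larger values.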
9.3.2 MLP Classification Outcome
Table 9.3 gives a detailed account of the MLP classification accuracies obtained by the LOO and re-substitution methods, employing the mean gray-level value and run length non-uniformity feature combination. Figure 9.2 shows a scatter diagram of the mean gray-level value against run length non-uniformity and the decision boundary drawn by the MLP classifier.
Table 9.3 Truth table of the MLP classifier employing the best feature combination (mean gray value, run length non-uniformity)
                                   MLP classification
Verified thyroid nodule classes    Low risk    High risk    LOO+ (resub.*) accuracy
Low risk                           73 (75)     5 (3)        93.6 (96.2) %
High risk                          1 (1)       41 (41)      97.6 (97.6) %
Overall accuracy                                            95.0 (96.6) %
+ Leave-one-out method
* Re-substitution method
Figure 9.2 Mean gray value versus run length non-uniformity scatter diagram, displaying the low-risk and high-risk thyroid nodule data points and the MLP classifier decision boundary.
The two features employed by the MLP classifier represent the echogenicity of the thyroid nodule (mean gray value) and the possible existence of micro-calcifications or other structures within the nodule (run length non-uniformity, RLNU).
9.3.3 QLSMD & QB Classification Outcome
Regarding the classical QLSMD and QB classifiers, their classification accuracy was lower, as shown in Tables 9.4 and 9.5 respectively.
Table 9.4 Truth table of the QLSMD classifier employing the best feature combination (mean gray value, sum variance, run length non-uniformity)
                                   QLSMD classification
Verified thyroid nodule classes    Low risk    High risk    LOO+ (resub.*) accuracy
Low risk                           74 (76)     4 (2)        94.9 (97.4) %
High risk                          5 (2)       37 (40)      88.1 (95.2) %
Overall accuracy                                            92.5 (96.7) %
+ Leave-one-out method
* Re-substitution method
Table 9.5 Truth table of the QB classifier employing the best feature combination (mean gray value, sum variance, run length non-uniformity)
                                   QB classification
Verified thyroid nodule classes    Low risk    High risk    LOO+ (resub.*) accuracy
Low risk                           73 (73)     5 (5)        93.6 (93.6) %
High risk                          4 (0)       38 (42)      90.5 (100.0) %
Overall accuracy                                            92.5 (95.8) %
+ Leave-one-out method
* Re-substitution method
Table 9.4 is the truth table giving the classification performance of the QLSMD classifier using the best feature combination (mean gray-level value, sum variance, and run length non-uniformity). Seventy-four of the low-risk and 37 of the high-risk thyroid nodules were correctly classified using the LOO method, resulting in group classification accuracies of 94.9% and 88.1% respectively and an overall precision of 92.5%. Similarly, Table 9.5 presents the results obtained by the QB classifier. Although the QB overall accuracy (92.5%) was the same as that obtained by the QLSMD classifier, the corresponding group accuracies differed, being 93.6% and 90.5% for the low-risk and high-risk thyroid nodules respectively. These differences are insignificant and may be attributed to differences in the nature of the algorithms. The third feature employed (run length non-uniformity) by the MLP, QB and QLSMD classifiers signifies the existence of structures of
different sizes within the thyroid nodule, which is related to the visual criteria employed by physicians in assessing the nodule's risk factor for malignancy [17, 18, 25, 27]. Figures 9.3 and 9.4 show the 3-dimensional scatter diagrams of the mean gray-level value, sum variance, and run length non-uniformity, as well as the decision boundaries drawn by the QB and QLSMD classifiers respectively.
Figure 9.3 Sum variance, mean gray value, and run length non-uniformity scatter diagram, displaying the low-risk and high-risk thyroid nodule data points and the QB classifier decision boundary.
Figure 9.4 Sum variance, mean gray value, and run length non-uniformity scatter diagram, displaying the low-risk and high-risk thyroid nodule data points and the QLSMD classifier decision boundary.
Comparing the SVM with these two classical classifiers, it is evident that the latter had to employ an extra feature to enhance their performance, without however reaching the SVM's precision. This is indicative of the effectiveness of the SVM. The penalty that had to be paid for employing the SVM algorithm was, however, a much higher processing time during classifier design (training). We tackled this problem by suitably distributing computer processing to different workstations and by using the re-substitution method to find well-behaved feature combinations with high classification accuracies and small numbers of support vectors (i.e. leading to separable classes) prior to system evaluation by the LOO method.
9.3.4 SVM with Wavelet Kernels Classification Outcome
The wavelet kernel construction procedure is a multi-parametric approach comprising the selection of the wavelet family, the number of vanishing moments for each wavelet and its corresponding scaling function, as well as the number of dyadic decomposition scales. In order to avoid the prohibitive processing time of the leave-one-out method for all possible
combinations, which would preclude any fine tuning of the above-mentioned parameters, the wavelet kernel evaluation was restricted to the best fifty feature combinations obtained by the SVM classifier with the 3rd degree polynomial kernel, which in our study exhibited the highest classification accuracy. The wavelet families employed in the present study were the periodized orthogonal Daubechies, Coiflet and Symmlet. The aforementioned wavelets had their support in the interval [2, 10], whereas the dyadic decomposition ranged from $2^0$ to $2^9$. The fine tuning of the wavelet kernel parameters led to a significant number of different combinations. The best feature combination for all wavelet kernels was the same as that of the 3rd degree polynomial kernel (mean gray value and sum variance). Maximum classification accuracy was achieved for the Daubechies, Coiflet and Symmlet wavelet kernels with 3, 8 and 6 vanishing moments and $2^7$, $2^7$ and $2^9$ decomposition scales respectively (Table 9.6). Tables 9.7, 9.8 and 9.9 give a detailed account of the SVM wavelet kernel classification accuracies obtained by the LOO and re-substitution methods, employing the mean gray-level value and sum variance feature combination.
Table 9.6 Classification accuracies for various SVM wavelet kernels using the leave-one-out and re-substitution methods, for the best feature combination (mean gray value, sum variance)
                             Classification accuracy
SVM kernel                   LOO+ (%)    Resub.* (%)    NSV**
Daubechies Wavelet Kernel    95.8        98.3           9
Coiflet Wavelet Kernel       97.5        100.0          10
Symmlet Wavelet Kernel       95.0        99.2           10
+ Leave-one-out method
* Re-substitution method
** Number of support vectors employed using the re-substitution method
Table 9.7 Truth table of the SVM classifier with the Daubechies wavelet kernel employing the mean gray value - sum variance best feature combination

Verified thyroid        SVM classification          LOO+ (resub.*)
nodule classes          Low risk      High risk     accuracy
Low risk                77 (77)       1 (1)         98.7 (98.7) %
High risk               4 (1)         38 (41)       90.5 (97.6) %
Overall accuracy                                    95.8 (98.3) %
+ Leave-one-out method
* Re-substitution method
Table 9.8 Truth table of the SVM classifier with the Coiflet wavelet kernel employing the mean gray value - sum variance best feature combination

Verified thyroid        SVM classification          LOO+ (resub.*)
nodule classes          Low risk      High risk     accuracy
Low risk                76 (78)       2 (0)         97.4 (100.0) %
High risk               1 (0)         41 (42)       97.6 (100.0) %
Overall accuracy                                    97.5 (100.0) %
+ Leave-one-out method
* Re-substitution method
Table 9.9 Truth table of the SVM classifier with the Symmlet wavelet kernel employing the mean gray value - sum variance best feature combination

Verified thyroid        SVM classification          LOO+ (resub.*)
nodule classes          Low risk      High risk     accuracy
Low risk                77 (77)       1 (1)         98.7 (98.7) %
High risk               5 (0)         37 (42)       88.1 (100.0) %
Overall accuracy                                    95.0 (99.2) %
+ Leave-one-out method
* Re-substitution method
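The per-class and overall accuracies reported in the truth tables follow directly from the counts. The short Python sketch below is an illustration only (the function name is hypothetical and not part of the original Matlab implementation); it reproduces the LOO figures of Table 9.7 from its four cell counts.

```python
def truth_table_accuracies(table):
    """table[i][j] = number of cases of verified class i assigned to class j."""
    # Per-class accuracy: correctly classified cases over all cases of that class.
    per_class = [100.0 * row[i] / sum(row) for i, row in enumerate(table)]
    # Overall accuracy: sum of the diagonal over the total number of cases.
    correct = sum(table[i][i] for i in range(len(table)))
    total = sum(sum(row) for row in table)
    return per_class, 100.0 * correct / total


# LOO counts of Table 9.7 (rows: verified low risk, verified high risk).
daubechies_loo = [[77, 1], [4, 38]]
per_class, overall = truth_table_accuracies(daubechies_loo)
print([round(a, 1) for a in per_class], round(overall, 1))  # [98.7, 90.5] 95.8
```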
Figures 9.5, 9.6 and 9.7 show scatter diagrams of the mean gray-level value against sum variance, the class margins and the decision boundary drawn by the SVM classifier employing the Daubechies, Coiflet and Symmlet wavelet kernels respectively.
Figure 9.5 Sum variance versus mean gray value scatter diagram, displaying the low-risk and high-risk thyroid nodule data points, the SVM classifier margins, and the decision boundary employing the Daubechies wavelet kernel.
Figure 9.6 Sum variance versus mean gray value scatter diagram, displaying the low-risk and high risk thyroid nodule data points, the SVM classifier margins, and the decision boundary employing the Coiflet wavelet kernel.
Figure 9.7 Sum variance versus mean gray value scatter diagram, displaying the low-risk and high-risk thyroid nodule data points, the SVM classifier margins, and the decision boundary employing the Symmlet wavelet kernel.

Wavelet kernels achieved performance comparable to that of the standard kernels with a substantially smaller number of support vectors (the number of support vectors among the different wavelet kernel functions for the best feature combination ranged between 7.5% and 8.3% of the number of training points), thus improving their generalization ability. The processing time is even higher than that of the classical kernels due to the wavelet construction procedure. The reproducing kernel Hilbert space (RKHS) framework is widely used in regularization theory, regression and function approximation [178-184]. The fact that every Hilbert space of functions with bounded evaluation functionals possesses a reproducing kernel function suggests some of the power and insight that the RKHS affords. The classification power of SVMs comes directly from the complexity of the underlying kernel. The parameterization of the wavelet kernels can be interpreted as flexibility to adapt to a particular dataset. Both the Mercer condition and the frameable RKHS allow a positive definite function to be obtained. However, the conditions for having a frameable RKHS are easier to verify than the Mercer condition.
CHAPTER 10
Pattern Recognition Methods Employing Morphological and Wavelet Local Maxima Features towards Evaluation of Thyroid Nodules Malignancy Risk in Ultrasonography
Summary
In this chapter, a new approach is presented for the automatic classification of thyroid nodules in terms of their malignancy risk. It employs morphological and wavelet-based features derived from each nodule's ROI, which was automatically extracted via a multi-scale hybrid model [195]. The chapter is organized as follows. The materials and methods section gives an extensive description of the patient set and of the feature extraction procedure based on each ROI's local maxima edge map. In the results and discussion section, a parallel study is carried out with two well-known pattern recognition algorithms (SVMs and PNNs). The design and implementation of both classifiers involved the local maxima directly derived from the multiscale edge representation [62] and those obtained from inter-scale regularity estimation for speckle suppression [196]. In all cases a thorough discussion is presented regarding the clinical importance of the morphological and wavelet-based features.

10.1 Materials and Methods
10.1.1 Patients
The study comprised 85 patients, each of whom underwent high-resolution US examination of the thyroid gland with an HDI-3000 ATL ultrasound system (Philips Ultrasound, Bothell, WA, USA) equipped with a wide-band 5-12 MHz linear transducer. The data set was acquired between February 2005 and August 2006 in the Medical Imaging Department of the Medical Center EUROMEDICA, Athens, Greece. All US examinations were performed with the SmallPartTest built-in protocol from Philips. The system settings (Time Gain Compensation (TGC), magnification, dynamic range, focal lengths etc.) remained constant throughout the study period [194]. Each US image was digitized (768 x 576 x 8 bit) using a Screen Machine II frame grabber directly connected to the video output of the US system. In each patient with thyroid nodules larger than 1.0 cm, real-time sonographically guided FNA (23-gauge) biopsy was performed. Five or more passes were made through the nodule, and smears were withdrawn by capillary action and placed on slides. Of the 85 cases evaluated by two observers, 54 were diagnosed as benign lesions (Low Risk, 54 cases) and 31 as lesions with epithelial hyperplasia (High Risk, 31 cases). Low-risk thyroid patients with possible colloid nodules require long time intervals between consecutive examinations, whereas in high-risk patients frequent US and cytology examinations are necessary (Figure 10.1).
Figure 10.1 US images representing various morphologic types of Low-Risk and High-Risk thyroid nodules.

From each US image, a set of morphological and wavelet local maxima features was extracted to encode the malignancy risk factor of thyroid nodules, and classification was performed using the SVM and PNN classifiers.

10.1.2 Feature Extraction
The boundary of each thyroid nodule was extracted through a hybrid multi-scale model that cascaded three stages: first, a speckle reduction and edge detection procedure employing the dyadic wavelet transform and local maxima regularity estimation; second, a multi-scale structure model for boundary detection; and finally, the Hough transform for the extraction of the nodule boundary [195]. Regularity detection was applied to maxima chains, which consist of local maxima that propagate back towards the finest available scale. These maxima possess similar position and angle values across scales. In maxima chains where the amplitude of the wavelet transform modulus maxima decreases as the scale decreases, the Lipschitz regularity is positive. When the maxima amplitude increases as the scale decreases, the Lipschitz regularity is negative. The maxima with positive Lipschitz exponents were considered as edges that correspond to important image features, whereas the maxima with negative Lipschitz exponents were classified as speckle [196]. The edge map utilized in the local maxima feature extraction was derived both before and after the regularity estimation, so as to evaluate the effect of speckle on the classification accuracy. A set of 20 morphological and local maxima features describing each individual patient case was extracted from the segmented nodules (Table 10.1). Morphological features describe the shape and size of each nodule (12 features) and comprise area, roundness, concavity, fractal dimension etc. [185,190,75]. Local maxima features, derived from the speckle and speckle-free edge maps, encode information regarding the presence of micro-calcifications (MCs) and the variability of the echogenicity inside the nodule's boundary (8 features). The classification procedure was performed independently on the speckle and speckle-free feature sets for both classifiers.

Table 10.1 Morphological and Local Maxima Features of Thyroid Nodules

a/a   Morphological Features        a/a   Local Maxima (LM) Features
1     Radius                        1     First Order Histogram
2     Radius Entropy                2     Mean Value
3     Radius Standard Deviation     3     Entropy
4     Perimeter                     4     Central Moment (3rd Degree)
5     Area                          5     Kurtosis
6     Circularity                   6     Skewness
7     Smoothness                    7     Variance
8     Convex Hull Radius            8     Standard Deviation
9     Concavity
10    Number of Concave Points
11    Symmetry
12    Fractal Dimension
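The sign rule used above to separate edges from speckle can be illustrated with a small Python sketch. It assumes that the modulus maxima of one chain are available at consecutive dyadic scales 2^1, 2^2, ... and estimates the trend of log2(amplitude) against the scale index by least squares; a positive slope is read as a positive Lipschitz exponent (edge), a negative slope as speckle. The function names are hypothetical, and the exact relation between the slope and the Lipschitz exponent depends on the wavelet normalization, which is not reproduced here.

```python
import math


def lipschitz_slope(amplitudes):
    """Least-squares slope of log2(amplitude) against the dyadic scale index.

    amplitudes[k] is the modulus maximum of one maxima chain at scale 2**(k + 1);
    at least two strictly positive amplitudes are required.
    """
    js = list(range(1, len(amplitudes) + 1))
    logs = [math.log2(a) for a in amplitudes]
    j_mean = sum(js) / len(js)
    l_mean = sum(logs) / len(logs)
    num = sum((j - j_mean) * (l - l_mean) for j, l in zip(js, logs))
    den = sum((j - j_mean) ** 2 for j in js)
    return num / den


def label_chain(amplitudes):
    """Positive slope: amplitude decays towards fine scales, treated as an edge."""
    return "edge" if lipschitz_slope(amplitudes) > 0 else "speckle"


print(label_chain([2.0, 4.0, 8.0]))  # amplitude decreases as the scale decreases
print(label_chain([8.0, 4.0, 2.0]))  # amplitude increases as the scale decreases
```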
10.1.3 Feature Selection and Classification
The ultimate goal in the design of a classifier with a given feature set is to select the smallest number of features that retains the discriminatory information of the set. The feature selection procedure employed in the present study, aiming at the highest likelihood of correct prediction, exhaustively combined all pre-selected features in every possible combination (2, 3, 4, ... feature combinations). The training of the classifiers was performed by means of the leave-one-out (LOO) method. In the LOO method the training is performed on all but one feature vector, which is then used for testing, and the procedure is repeated for every feature vector. The scatter diagram of each best feature combination and the decision boundary of both classifiers were drawn with the Re-substitution (Re-Sub) method. The Re-Sub method uses the same feature vectors first for training and then for testing [83]. System evaluation was performed by means of Receiver Operating Characteristic (ROC) curve analysis. The ROC curve is a plot of the true positive rate (sensitivity) versus the false positive rate (1 - specificity) for different thresholds over the entire range of each classifier's output values. In contrast with the classification accuracies obtained from truth tables, ROC analysis is independent of class distribution or error costs. The best feature combination was considered to be the subset of features that led to the highest area under the ROC curve (AUC) with the LOO method. The AUC can be statistically interpreted as the probability that the classifier correctly distinguishes High-Risk from Low-Risk cases. In this study, the AUC was approximated by the binormal parametric method, which computes the AUC by fitting two normal distributions to the data [197]. The hybrid multi-scale model and feature generation, along with the design, training and testing of both classifiers, were all implemented in Matlab 6.5. The ROC analysis was made with the NCSS, PASS and GESS software package. The computer used for processing had an AMD 64 Athlon processor running at 3.6 GHz and 1.00 GB of RAM.
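The selection procedure described in this section (exhaustive feature combinations, leave-one-out scoring, ranking by AUC) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Matlab implementation of the thesis: a nearest-centroid rule stands in for the SVM and PNN classifiers, the AUC is obtained from a moment-based fit of two normal distributions rather than from the NCSS routine, and the toy data and all function names are invented for the example.

```python
from itertools import combinations
from statistics import NormalDist, mean, stdev


def binormal_auc(low_scores, high_scores):
    """AUC when the scores of each class are modelled by a normal distribution."""
    shift = mean(high_scores) - mean(low_scores)
    spread = (stdev(low_scores) ** 2 + stdev(high_scores) ** 2) ** 0.5
    return NormalDist().cdf(shift / spread)


def loo_scores(X, y, cols):
    """Leave-one-out scores from a nearest-centroid stand-in classifier.

    Each case is scored by a model built from all the other cases; a larger
    score means the case lies closer to the high-risk centroid.
    """
    scores = []
    for i, x in enumerate(X):
        dist = {}
        for label in (0, 1):
            rows = [r for k, r in enumerate(X) if k != i and y[k] == label]
            centroid = [mean(r[c] for r in rows) for c in cols]
            dist[label] = sum((x[c] - m) ** 2 for c, m in zip(cols, centroid))
        scores.append(dist[0] - dist[1])
    return scores


def best_combination(X, y, max_size=3):
    """Try every combination of 2..max_size features; keep the highest LOO AUC."""
    best_cols, best_auc = None, -1.0
    for size in range(2, max_size + 1):
        for cols in combinations(range(len(X[0])), size):
            s = loo_scores(X, y, cols)
            low = [v for v, label in zip(s, y) if label == 0]
            high = [v for v, label in zip(s, y) if label == 1]
            auc = binormal_auc(low, high)
            if auc > best_auc:
                best_cols, best_auc = cols, auc
    return best_cols, best_auc


# Made-up toy data: four features per case, label 0 = low risk, 1 = high risk.
X = [
    [1.0, 5.0, 1.2, 0.3], [1.2, 2.0, 0.8, 0.9],
    [0.8, 4.0, 1.0, 0.1], [1.1, 1.0, 1.1, 0.7],
    [3.0, 4.5, 3.1, 0.2], [3.2, 1.5, 2.9, 0.8],
    [2.8, 5.5, 3.0, 0.4], [3.1, 2.5, 3.2, 0.6],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]
cols, auc = best_combination(X, y)
print(cols, round(auc, 3))
```

The cost of the exhaustive search grows combinatorially with the number of features, which is why the sketch caps the combination size with `max_size`.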
10.2 Results and Discussion
Two pattern recognition models (SVM and PNN) were developed, utilizing several morphological and wavelet local maxima features for assessing the malignancy risk factor of thyroid nodules. The classification experiments employed the SVM (with polynomial kernels up to the 4th degree and the RBF kernel) and PNN classifiers on two independent feature sets. At first, the models were evaluated on the feature set extracted from the wavelet local maxima that corresponded only to significant structures. Afterwards, the classifiers were also tested on the feature set that incorporates the local maxima classified as speckle.

10.2.1 SVM & PNN Model Evaluation Without the Presence of Speckle
Table 10.2 reports the results of ROC analysis for both classifiers. AUC represents the probability that a random pair of High-Risk and Low-Risk thyroid nodules will be correctly classified. Sensitivity (SN) depends only on measurements of High-Risk nodules and specificity (SP) only on Low-Risk nodules. The likelihood ratio measures the power of each classifier for increasing certainty about a positive diagnosis.

Table 10.2 ROC analysis results of the Smoothness - Symmetry - Standard Deviation of Local Maxima feature combination for the SVM classifier and of the Concavity - Fractal Dimension - Standard Deviation of Local Maxima feature combination for the PNN classifier in the speckle-free thyroid nodules

Model                                       AUC (lower - upper 95.0%   Sensitivity   Specificity   Likelihood ratio   Number of
                                            confidence limit)          (SN)          (SP)          SN/(1-SP)          support vectors
SVM with polynomial kernel of 1st degree    0.88 (0.69 - 0.96)         0.93          0.90          9.3                13
SVM with polynomial kernel of 2nd degree    0.96 (0.84 - 0.99)         0.93          0.98          46.5               7
SVM with polynomial kernel of 3rd degree    0.92 (0.78 - 0.97)         0.87          0.93          12.4               10
SVM with polynomial kernel of 4th degree    0.89 (0.69 - 0.97)         0.93          0.93          13.3               12
SVM with RBF kernel                         0.91 (0.79 - 0.96)         0.93          0.96          23.2               17
PNN                                         0.91 (0.85 - 0.95)         0.96          0.94          16                 -
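As a quick check of the likelihood ratio column, the ratio SN/(1-SP) can be recomputed from the reported sensitivity and specificity values. The Python snippet below is only an illustration (the function name is hypothetical); it reproduces the entries of Table 10.2 for the SVM with the 2nd degree polynomial kernel and for the PNN.

```python
def likelihood_ratio(sensitivity, specificity):
    """Positive likelihood ratio SN / (1 - SP)."""
    return sensitivity / (1.0 - specificity)


# SVM with the 2nd degree polynomial kernel: SN = 0.93, SP = 0.98.
print(round(likelihood_ratio(0.93, 0.98), 1))  # 46.5
# PNN: SN = 0.96, SP = 0.94.
print(round(likelihood_ratio(0.96, 0.94), 1))  # 16.0
```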
In the SVM model, the highest classification performance with the minimum number of features (AUC 0.96) by the LOO method was achieved by the feature combination Smoothness - Symmetry - Standard Deviation of Local Maxima, employing the polynomial kernel of 2nd degree. This combination gave sensitivity and specificity values of 0.93 and 0.98 respectively and a likelihood ratio of 46.5. Figure 10.2(a) depicts the binormal ROC curves of the various kernels for the SVM model with the best feature combination. The maximum classification performance coincides with the minimum number of support vectors involved in the design of the SVM classifier.
Figure 10.2 Receiver Operating Characteristic (ROC) curves by the binormal method of (a) the SVM classifier with polynomial and RBF kernels employing the best feature combination and (b) the SVM with the 2nd degree polynomial kernel against the PNN classifier for their corresponding best feature combinations in the speckle-free thyroid nodules.

The number of support vectors for the best feature combination ranged from 8% to 20% of the number of training points. Since the decision hyperplane of the SVM classifier is determined only by the support vectors, their small number is indicative of the low complexity and the differentiation capability of the SVM. Figure 10.3 presents the scatter diagram of the best feature combination along with the class margins and the decision boundary of the SVM (2nd degree polynomial kernel).
Figure 10.3 Smoothness, Symmetry and Standard Deviation of Local Maxima scatter diagram, displaying the low-risk, high-risk and support vector thyroid nodule data points along with the SVM classifier (2nd degree polynomial kernel) decision boundary.

Regarding the PNN model, the highest classification performance (AUC 0.91) was accomplished with the combination of the Concavity - Fractal Dimension - Standard Deviation of Local Maxima features. The PNN classifier exhibited relatively high performance in distinguishing Low-Risk from High-Risk thyroid nodules, with sensitivity and specificity values of 0.96 and 0.94 and a likelihood ratio of 16. Figure 10.2(b) illustrates the ROC curves of the SVM (2nd degree polynomial kernel) and of the PNN by the binormal approach. Although its AUC is no higher than that of the SVM with the 3rd degree polynomial (0.92) and RBF (0.91) kernels, the PNN classifier provides a tighter confidence bound, which in turn increases its power of separation. Figure 10.4 shows the scatter diagram of the High-Risk and Low-Risk thyroid nodule points with the PNN decision boundary superimposed.
Figure 10.4 Concavity, Fractal Dimension and Standard Deviation of Local Maxima scatter diagram, displaying the low and high-risk thyroid nodule data points and the PNN decision boundary
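For readers unfamiliar with the PNN, its decision rule can be sketched in a few lines of Python: each class receives a Parzen-window density estimate built from Gaussian kernels centred on its training patterns, and the class with the largest estimate is selected. This is a generic textbook formulation assuming equal class priors; the smoothing parameter `sigma` and the toy patterns are arbitrary assumptions and do not reflect the settings or data of this study.

```python
import math


def pnn_classify(x, train, sigma=0.5):
    """Return the class whose Parzen-window density estimate at x is largest.

    train maps a class label to a list of feature vectors; sigma is the
    Gaussian smoothing parameter (the value used here is arbitrary).
    """
    def density(patterns):
        total = 0.0
        for p in patterns:
            sq_dist = sum((a - b) ** 2 for a, b in zip(x, p))
            total += math.exp(-sq_dist / (2.0 * sigma ** 2))
        return total / len(patterns)

    return max(train, key=lambda label: density(train[label]))


# Made-up two-feature training patterns for illustration only.
train = {
    "low": [(0.1, 0.2), (0.2, 0.1)],
    "high": [(0.9, 0.8), (0.8, 0.9)],
}
print(pnn_classify((0.15, 0.15), train))  # low
```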
A comparison between the two areas under the ROC curves (AUC_SVM and AUC_PNN) was also made so as to estimate whether the two classifiers differ significantly. A significance level of 5% (0.05) was pre-selected as the criterion for a statistical difference. The result for the AUC difference between the two curves is equal to the criterion, hence the null hypothesis H0 of no difference is not rejected, which suggests statistical equivalence between the two classifiers. Because the equivalence is marginal, this decision is subject to an error probability. Taking into account that the two classifiers are not statistically different, we can assume that all five features that comprise the two best feature combinations exhibit high class separability power. The degree of irregularity, the non-circular boundary and the presence of micro-calcifications have already been reported with relatively high precision as suggestive of thyroid malignancy [4,6,10,11]. However, these observations lack definite and explicit patterns that could guide an objective evaluation procedure. In this study the quantification of shape and texture characteristics that were until now only observable has led to an extensive feature generation that encoded any available information from each thyroid nodule. Both the SVM and PNN classifiers captured the complicated relationship between the shape and local maxima features, leading to an accurate differentiation between low-risk and high-risk thyroid nodules. The presence of small concave regions (concavity) and the increased irregularity (smoothness) of the nodule's boundary proved to be significant characteristics that suggest potential thyroid malignancy. Similarly, a noticeable difference between the nodule's width and length (symmetry) and an increased variability of MCs (LM Standard Deviation) suggest potential malignancy. On the contrary, the lack of concave points, a regular borderline, a round-like nodule shape and an undifferentiated aggregation of MCs coincide with benignancy.
It is worth noting that features such as perimeter and area, which are associated with the size of the nodule, exhibited poor differentiation performance. The wavelet transform can detect discontinuities in the gray-level map with high precision. Moreover, the computation and localization of local maxima from the wavelet coefficients discloses not only the presence of a sudden abnormality but also, through its amplitude, the extent of this alteration. The LM-based features introduced in this research can be considered as equivalent to textural features. The echogenicity degree of a nodule can be related to the number of LM inside the nodule. A small number of LM suggests the absence of structures within the nodule, which can be fairly interpreted as hypo-echogenicity and not as iso- or hyper-echogenicity with almost zero gray-level variability. The ratio (first order histogram) between the number of local maxima and the total number of points within the nodule encodes such echogenicity information.
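The first order histogram ratio and the LM variability discussed above can be computed as in the following Python sketch. It assumes that the amplitudes of the local maxima lying inside the nodule boundary and the total number of pixels inside the boundary are already available; the function name, the toy values and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def lm_features(amplitudes_inside, n_points_inside):
    """Simple local maxima (LM) features of one nodule.

    amplitudes_inside: amplitudes of the local maxima inside the nodule boundary.
    n_points_inside:   total number of pixels inside the nodule boundary.
    """
    return {
        # Ratio of local maxima to all points inside the nodule.
        "first_order_histogram": len(amplitudes_inside) / n_points_inside,
        "mean": mean(amplitudes_inside),
        # Variability of the LM amplitudes (population standard deviation).
        "std": pstdev(amplitudes_inside),
    }


# Made-up values: four local maxima inside a nodule of 400 pixels.
features = lm_features([2.0, 4.0, 4.0, 6.0], 400)
print(features["first_order_histogram"], features["mean"])  # 0.01 4.0
```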
Furthermore, the degree of differentiation of the LM amplitudes can be regarded as analogous to the massive or scattered presence of MCs within the nodule. High variability stands for the presence of MCs scattered in a rather hypoechogenic environment, whereas low variability represents relatively aggregated MCs in the same environment. Besides the concentration of MCs inferred from the variability of the LM amplitudes, additional information can be obtained in the case of high values of standard deviation. In such a case, the nodule's environment cannot be presumed to be hypo-, iso- or hyper-echogenic as a whole; the co-existence of various echogenicity types within the nodule may also be concluded. An additional advantage of the LM features is that neither classifier requires additional processing time (for the computation of textural features), since the local maxima edge map is derived directly from the hybrid multi-scale model that segments the nodules. The only information related to the LM that has not been quantified in this study is the degree of scattering, in terms of position proximity, with respect to the nodule's boundary. The integration of the orientation of each local maximum in a future study might provide more information regarding the positions of the MCs.

10.2.2 SVM & PNN Model Evaluation With the Presence of Speckle
Table 10.3 provides the results of ROC analysis on the feature set with speckle for both classifiers. For the SVM classifier, the best feature combination (AUC 0.88) comprised the symmetry and the standard deviation of radius of the thyroid nodule boundaries, employing the polynomial kernel of third degree.

Table 10.3 ROC analysis results of the Symmetry - Standard Deviation of Radius feature combination for the SVM classifier and of the Concavity - Entropy of Radius feature combination for the PNN classifier in the thyroid nodules with speckle.
Model                                       AUC (lower - upper 95.0%   Sensitivity   Specificity   Likelihood ratio   Number of
                                            confidence limit)          (SN)          (SP)          SN/(1-SP)          support vectors
SVM with polynomial kernel of 1st degree    0.83 (0.63 - 0.93)         0.74          0.90          7.4                11
SVM with polynomial kernel of 2nd degree    0.86 (0.68 - 0.94)         0.74          0.91          8.3                11
SVM with polynomial kernel of 3rd degree    0.88 (0.68 - 0.97)         0.93          0.96          23.2               10
SVM with polynomial kernel of 4th degree    0.78 (0.52 - 0.86)         0.70          0.85          4.6                13
SVM with RBF kernel                         0.79 (0.66 - 0.91)         0.74          0.87          5.6                13
PNN                                         0.86 (0.74 - 0.90)         0.84          0.88          7                  -
The correspondence between a higher AUC and a smaller number of support vectors is also present in this particular feature set. The binormal ROC curves of the SVM model for the various kernels are depicted in Figure 10.5(a). The sensitivity and specificity values, although reduced compared to the speckle-free feature set, remained high (0.93 and 0.96 respectively). However, the wide confidence bound indicates less discriminative power.
Figure 10.5 Receiver Operating Characteristic (ROC) curves by the binormal method of (a) the SVM classifier with polynomial and RBF kernels employing the best feature combination and (b) the SVM with the 3rd degree polynomial kernel against the PNN classifier for their corresponding best feature combinations in the feature set with speckle.

Figure 10.6 shows the scatter diagram of the symmetry and standard deviation of radius for the SVM (3rd degree) classifier along with its decision boundary, obtained with the Re-Sub method.
Figure 10.6 Symmetry and Standard Deviation of Radius scatter diagram, displaying the low-risk, high-risk and support vectors thyroid nodule data points along with the SVM classifier (3rd degree polynomial kernel) decision boundary
The PNN classifier employed as best feature combination (AUC 0.86) the concavity and radius entropy of each nodule's boundary. Its performance is degraded in the speckled feature set, achieving sensitivity and specificity values of 0.84 and 0.88. Figure 10.5(b) shows the binormal ROC curves of the SVM (3rd degree) against the PNN classifier. It is worth noting that the PNN algorithm retains a narrower confidence interval than the SVM. Figure 10.7 shows the scatter diagram of concavity versus radius entropy and the PNN decision boundary obtained with the Re-Sub method.
Figure 10.7 Concavity and Entropy of Radius scatter diagram, displaying the low-risk and high-risk thyroid nodule data points and the PNN decision boundary.

The impact of speckle on the feature selection procedure towards the optimum combination with the highest discriminatory score was greater than expected at the beginning of this study. The classification power of both classifiers is reduced, whereas the utilization of features derived only from the morphological group is an issue that requires further investigation. The reduction in the number of features without a great loss in overall precision can be considered an important asset with respect to the complexity and power of the SVM and PNN models. Nevertheless, it cannot compensate for the removal of the important information that the LM features provide (presence or absence of MCs, aggregation or not, etc.) in the evaluation of the nodule's malignancy risk factor. The literature is full of reports that acknowledge the deteriorating effect of speckle on US image quality. On the contrary, two reports [112,113] have integrated the speckle phenomenon as a feature to improve the classification of liver cirrhosis and breast cancer. All the aforementioned reports considered speckle as an overall natural phenomenon and did not attempt to quantify and evaluate its influence. In this study, speckle has been identified and localized at a microscopic, pixel-by-pixel level (local maxima with negative Lipschitz exponents), and for the first time a parallel study has been carried out (with and without speckle) to evaluate its importance in more detail. In the original US image the effect of speckle cannot be easily evaluated; however, according to our study, at the LM feature level it had a negative influence.

10.3 Conclusion
A comprehensive study has been made that aimed at the generation of several morphological and wavelet local maxima features and the design of two powerful classifiers (SVMs and PNN), in order to evaluate the malignancy risk factor of thyroid nodules in ultrasound. Moreover, a study of the effect of speckle on the classification procedure has been made. Various shape-based and LM features, such as concave regions, borderline irregularity, shape asymmetry and the presence of MCs, proved able to differentiate suspicious from benign nodules. These features were well defined and could be integrated into the overall diagnostic procedure. Another important conclusion of this study was that speckle limited the performance of the classifiers owing to the loss of features associated with MCs during the classification procedure. The continuously growing amount of information derived from the ultrasound image has turned the decision of whether or not the patient must undergo FNA into a rather complex procedure. This makes pattern recognition algorithms an essential auxiliary tool for parameterizing and quantifying all available information.
CHAPTER 11
Conclusions and Future Perspectives
11.1 Conclusion
In conclusion, an extensive study of image processing and analysis methods in thyroid ultrasonography has been carried out. The study ranged from denoising and segmentation algorithms to pattern recognition approaches such as SVMs, PNN and other classical classifiers. Regarding the image processing techniques employed in this study, special emphasis was placed on wavelet transform theory, whereas in the image analysis methods the main classification models designed and implemented were the SVMs and the PNN. In more detail, the speckle phenomenon that dominates ultrasound imaging has been suppressed by means of a wavelet-based speckle reduction algorithm. An inter-scale wavelet analysis was carried out for the detection and isolation of edges across scales. Subsequently, singularity detection was applied to these edges in order to discriminate speckle from important image features within the ultrasonic image. The success of the proposed method has been demonstrated with various indices in comparison with several well-known speckle reduction algorithms. In addition, a clinical study has shown that the proposed algorithm can enhance the overall diagnostic procedure. Besides the speckle-suppression algorithm, a new hybrid segmentation algorithm for thyroid boundary detection has been introduced within the same wavelet framework. The proposed model combined the wavelet transform, an inter-scale model and the constrained Hough transform to extract round-like objects from a rather noisy environment. The segmentation method may assist the physician in categorizing thyroid nodules on the basis of morphological characteristics. The classification between high-risk and low-risk thyroid nodules has been made with various pattern recognition algorithms that employed several textural, morphological and wavelet-based features.
The primary model throughout this study was the SVM, which was accompanied by other models for comparison, such as the QB, LSMD, MLP and PNN classifiers. Various textural features proved able to discriminate the thyroid nodules with relatively high accuracy, such as the Sum Variance from the co-occurrence matrix, the gray-level mean value and the Run Length Non-Uniformity from the run length matrices, along with various shape features (e.g. concavity, roundness, fractal dimension etc.). Moreover, a study of the effect of speckle on the classification procedure has been made. Two independent studies were carried out that employed wavelet-based features with and without the presence of speckle. The important conclusion of this study was that speckle reduced the performance of the classifiers owing to the loss of wavelet-based features associated with micro-calcifications during the classification procedure.

11.2 Future Work
The advent of new methods and algorithms creates the need for future work. The study presented in this thesis raises a number of points that require closer examination. The multiscale edge representation (MER) procedure employed in both the despeckling and segmentation procedures requires further investigation. The employment of second derivative spline wavelets would alter not only the way local maxima are located but possibly also the performance of the algorithms. Moreover, new and more efficient methods for local maxima detection in the already implemented wavelet framework could enhance the performance of the proposed methods. An important perspective is the implementation of new approaches that could correlate the wavelet coefficients with various image structures besides the local maxima representation. The proposed hybrid model is highly dependent on the local maxima detection and chaining in the first stage of the algorithm. The employment of additional characteristics besides the abrupt changes in the gray level, such as the echogenicity difference between the inside and the outside of the nodule, might improve the performance of the algorithm even in cases that it proved unable to process. The time performance of both algorithms is a critical issue. On the MATLAB platform, the processing requires several seconds either to denoise or to segment an ultrasound image. The computational time could be decreased to below one second by implementing the proposed algorithms in C++.
Regarding the pattern recognition methods employed in this study, emphasis must be given to increasing the patient dataset in order to improve generalization. Moreover, new acquisition techniques such as DICOM archiving must be implemented so as to offer our clinically confirmed database to other research groups and vice versa. The utilization of new and more efficient classifiers could improve the accuracy of the assessment of the thyroid nodule malignancy risk factor. The features that served as input to all classifiers in this study have proven to possess high discriminatory power. However, the generation of more features, especially from the wavelet framework, may enhance the accuracy of the evaluation procedure. Future work could also involve the combination of texture, shape and wavelet-based characterization methods for segmentation and classification purposes.
REFERENCES
1. 2. van Herle A.J, Pick P. Ljung B. M. E, Ashcraft M. W, Solomona D. H. and Keeler E.B. The thyroid nodule. Ann. Intern. Med. 1982; 96: 221-232. Daniels G.H. Physical examination of the thyroid gland, In: Braverman LE, Utiger RD, eds. Werner and Ingbar's The Thyroid: A Fundamental and Clinical Text. 6th ed. Philadelphia: JB Lippincott; 1991:572-7. 3. 4. 5. 6. 7. 8. 9. Perry Charles W, Phillips Bradley J. Quick Review: Thyroid Nodules, The Internet Journal of Surgery. 2003; 3 (1). Woeber K.A. The Year in Review: The Thyroid, Ann Intern Med, 1999; 131(12): 959 - 962. McCaffrey T, Evaluation of the Thyroid Nodule, Cancer Control 2000; 7: 223-228. Kirsten D, The thyroid gland: physiology and pathophysiology, Neonatal Netw. 2000 Dec;19(8):11-26 Mazzaferri EL, de los Santos ET, Rofagha-Keyhani S. Solitary thyroid nodule: diagnosis and management, Med Clin North Am. 1988; 72:1177-211 Loy T.J, Sundram F.X, Diagnostic management of solitary thyroid nodules. Ann Acad Med Singapore 1989; 6:658664 Ridgway E.C, Clinical review: Clinicians evaluation of a solitary thyroid nodule. J Clin Endocrinol Metab. 1992; 74:231-235. 10. Mazzaferri EL. Management of a solitary thyroid nodule. N Engl J Med. 1993; 328:553-559. 11. Ferrari C, Reschini E, Paracchi A, Treatment of the autonomous thyroid nodule: a review. Eur J Endocrinol 1996;135:383390 12. Kaplan M.M, Clinical Perspectives in the Diagnosis of Thyroid Disease Clin. Chem. 1999; 45(8): 1377 - 1383. 13. Ashcraft M.W, Van Herle AJ, Management of thyroid nodules. I. History and physical examination, blood tests, x-ray tests, and ultrasonography, Head Neck Surg 1981;3:216230 14. Blum M, Evaluation of Thyroid Function; Sonography, Computed Tomography and Magnetic Resonance Imaging, in Principles and Practice of Endocrinology and metabolism, 1990;289 - 293. 15. 
Christensen S.B, Bondeson L, Ericsson U.B, Lindholm K, Prediction of malignancy in the solitary thyroid nodule by physical examination, thyroid scan, fine-needle biopsy and serum thyroglobulin. A prospective study of 100 surgically treated patients, Acta Chir Scand 1984;150:433439
143
References
16. Cox M.R, Marshall S.G, Spence R.A, Solitary thyroid nodule: a prospective evaluation of nuclear scanning and ultrasonography, Br J Surg 1991;78:9093 17. Watters D.A, Ahuja A.T, Evans R.M, Chick W, King W.W, Metreweli C. and Li A.K, Role of ultrasound in the management of thyroid nodules, American Journal of Surgery 1992; 164; 654-657. 18. Gimondo P, Mirk P, Messina G, Pizzi G. and Tomei A, The role of ultrasonography in thyroid disease, Minerva Medica 1993; 84; 671-680. 19. Stanley F, AACE Clinical Practice Guidelines for the Diagnosis and Management of Thyroid Nodules, Endocrine Practice 1996; 2; 78-84. 20. Tomimori E.K, Camargo R.Y.A, Bisi H. and Medeiros-Neto G, Combined ultrasonographic and cytological studies in the diagnosis of thyroid nodules, Biochimie 1999; 81: 447-452. 21. Marquee E, Benson C.B, Frates M.C, Doubilet P.M, Larsen P.R, Cibas E.S. and Mandel S.J, Usefulness of Ultrasonogrpahy in the Management of Nodular Thyroid Disease, Annals of Internal Medicine 2000; 133: 696-700. 22. Koike E, Noguchi S, Yamashita H, Murakami T, Ohshima A, Kawamato H. and Yamashita H, Ultrasonographic characteristics of thyroid nodules: Prediction of Malignancy, Archives of Surgery 2001; 136; 334-337. 23. Gross J.L, Ultrasonography in Management of Nodular Thyroid Disease Ann Intern Med 2001; 135(5): 383 - 384. 24. Leinung M.C, Gianoukakis A, and Lee D.W, Ultrasonography in Management of Nodular Thyroid Disease Ann Intern Med 2001; 135(5): 383 - 383. 25. Papini E, Guglielmi R, Bianchini A, Crescenzi A, Taccogna S, Nardi F, Panunzi C, Rinaldi R, Toscano V. and Pacella C, Risk of malignancy in nonpalpable thyroid predictive value of ultrasound Endocrinology and Metabolism 2002; 87; 1941-1946. 26. Kim E.K, Park C.S, Chung W.Y, Oh K.K, Kim D.I, Lee J.T. and Yoo H.S, New Sonographic Criteria for Recommending Fine-Needle Aspiration Biopsy of Nonpalpable Solid Nodules of the Thyroid Am. J. Roentgenol. 2002; 178(3): 687 - 691. 27. Peccin S, de Castsro J.A. 
S, Furlanetto T.W, Furtado A.P.A, Brasil B.A and Czepielewski M.A, Ultrasonography: is it useful in the diagnosis of cancer in thyroid nodules? Journal of Endocrinological Investigation 2002; 25; 39-43. 28. Oommen R, Walter N.M, Tulasi N.R. Scintigraphic diagnosis of thyroid cancer. Correlation of thyroid scintigraphy and histopathology. Acta Radiol 1994;35:222225 nodules: and color-Doppler features, The Journal of Clinical
144
References
29. Okumura Y, Takeda Y, Sato S, Komatsu M, Nakagawa T, Akaki S, Kuroda M, Joja I, Hiraki Y, Comparison of differential diagnostic capabilities of scintigraphy and fine-needle aspiration of thyroid nodules. J Nucl Med 1999;40:19711977 30. Gharib H, Fine-needle aspiration biopsy of thyroid nodules: advantages, limitations, and effect, Mayo Clin Proc 1994; 69(1): 44-9 31. Verde G, Papini E, Pacella C.M, Gallotti C, Delpiano S, Strada S, Fabbrini R, Bizzarri G, Rinaldi R, Panunzi C, Ultrasound guided percutaneous ethanol injection in the treatment of cystic thyroid nodules. Clin Endocrinol 1994; 41:719724 32. Chen H, Zeiger M.A, Clark D.P. et al., Papillary carcinoma of the thyroid: can operative management be based solely on fine-needle aspiration? J Am Coll Surg. 1997; 184:605-610. 33. Shemen L.J, Chess Q. Fine-needle aspiration biopsy diagnosis of follicular variant of papillary thyroid cancer: therapeutic implications. Otolaryngol Head Neck Surg. 1998; 119:600-602. 34. Singer P.A, Evaluation and management of the solitary thyroid nodule, Otolaryngol Clin North Am 1996; 29(4): 577-91. 35. Yano, K, Morita, S, Furukawa, Y, et al., Diagnosis of malignant neoplasms with chloride. Jpn J Nucl Med 197815:989 36. Harness J.K, Thompson N.W, McLeod M.K, Eckhauser F.E, Lloyd R.V. Follicular carcinoma of the thyroid gland: trends and treatment. Surgery 1984; 96:972980. 37. Pasieka J.L, Zedenius J, Azuer G. et al., Addition of nuclear content to the AMES risk-group classification for papillary thyroid cancer, Surgery 1992; 112:1154. 38. Hay I.D, Bergstralh E.J, Goellner J.R. et al., Predicting outcome in papillary thyroid carcinoma. Surgery 1993; 114: 1050. 39. Hazard J, Hawk W, Crile G.J, Medullary (solid) carcinoma of the thyroid. A clinicopathologic entity, J Clin Endocrinol Metab 1959;19:152 40. Gagel, R, Robinson, M, Donovan, D, Alford, B. Medullary thyroid carcinoma. Recent progress. J Clin Endocrinol Metab 1993;76:.809-814 41. 
Souhami, L, Simpson,W, Carrothers, J., Malignant lymphoma of the thyroid gland. Int J Radiat Oncol Biol Phys 1980;6:1143 42. Leedman P, Sheridan W, Downey W, Fox R, Martin F., Combination chemotherapy as single modality therapy for stage IE and IIE thyroid lymphoma, Med J Australia 1990;152:40 43. Butler J, Brady L, Amendola B., Lymphoma of the thyroid. Report of five cases and review. Amer J Clin Oncol (CCT) 1990; 13:64
145
References
44. Harada T, Ito K, Shimaoka K, Hosoda Y, Yakumara K., Fatal thyroid carcinoma. Anaplastic transformation of adenocarcinoma, Cancer 1977;39:2588 45. de los Santos E.T, Keyhani-Rofagha S, Cunningham J.J, Mazzaferri E.L, Cystic thyroid nodules. The dilemma of malignant lesions. Arch Intern Med 1990; 150(7): 1422-1427 46. Tyler D.S, Winchester D.J, Caraway N.P, Hickey R.C, Evans D.B, Indeterminate fine-needle aspiration biopsy of the thyroid: identification of subgroups at high risk for invasive carcinoma, Surgery. 1994; 116(6):1054-60. 47. Fullerton G. and Zagzebski J., Medical Physics of CT and Ultrasound. AAPM monograph. (1989). 48. Krestel E. (Ed), Imaging Systems for Medical Diagnostics. Siemens Aktiengesellsschaft (1990). 49. Fish P. Physics and Instrumentation of Diagnostic Medical Ultrasound. J. Willey & Sons, (1990). 50. Wells P.N.T., Ultrasonic Colour Flow Imaging. Phys. Med. Biol. (1994);39 51. Beutel J, Kundel H.L, Van Metter R. (Eds), Handbook of Medical Imaging. SPIE Press (2001). 52. Gabor D., Theory of communication J.Inst. Elect. Eng (London), 1946; 93(III):429-457. 53. Grossmann and Morlet J., Decomposition of Hardy functions into square integrable wavelets of constant shape, SIAM J. Math., 1984; 15:723-736. 54. Daubechies I., Orthogonal bases of compactly supported wavelets, Comm. Pure Appl. Math., 1988; 41:909-996. 55. Daubechies I., The wavelet transform, time-frequency localization and signal analysis, IEEE Trans. Inform. Theory, 1990; IT-36:961-1005. 56. Daubechies I., Ten Lectures on Wavelets, Philadelphia: SIAM, (1992). 57. Mallat S.G., A theory for multiresolution signal decomposition: the wavelet representation, IEEE Trans. Pattern Anal. and Machine Intell., 1989;PAMI-11:674-693. 58. Mallat S.G., Multiresolution approximations and wavelet orthogonal bases of L2(R), Trans. Amer. Math. Soc., 1989; 315:69-87. 59. Mallat S.G., Multifrequency channel decomposition of images and wavelet models, IEEE Trans Acoust. 
Speech and Signal Proc., 1989; ASSP-37:2091-2110. 60. Mallat S., A wavelet tour of signal processing. New York: Academic Press, 1998:76. 61. Mallat S., A wavelet tour of signal processing. New York: Academic Press, 1998:151-158.
146
References
62. Mallat S.G. and Zhong S., Characterization of signals from multiscale edges, NYU Technical Report 1991; 592. Also in IEEE Trans. Pattern Anal. And Machine Intell., 1992;PAMI14:710-732. 63. Shensa M., The discrete wavelet transform: Wedding the `a trous and Mallat algorithms, IEEE Trans. Signal Proc., 1992; 40:246482. 64. Mallat S. G, Zero-crossings of a wavelet transform, IEEE Trans. Inform. Theory, 1991; 37:1019-1033. 65. Simoncelli E.P, Freeman W.T, Anderson E.H, and Heeger D.J., Shiftable multiscale transforms, IEEE Trans. Inform. Theory 1992; 38(2):587-607. 66. Zhong S., Edge representation from wavelet transform maxima, Ph.D. Dissertation, New York University, (1990). 67. Canny J., A computational approach to edge detection, IEEE Trans. Pattern Anal and Machine Intell., 1986; PAMI-8:679-698. 68. Mallat G. and Hwang W.L., Singularity detection and processing with wavelets, IEEE Trans. Inform. Theory, 1992; 38(2):617-643. 69. Jaffard S. and Meyer Y., Wavelet methods for pointwise regularity and Local Oscillations of Functions, Vol 123 American Methematical Society Providence RI (1996) 70. Mallat S., A wavelet tour of signal processing. New York: Academic Press, 1998:165-197. 71. Hwang W.L and Mallat S., Characterization of self-similar multifractals with wavelet maxima. J. of Appl. And Comput. Harmonic Analysis, 1994;1:316-328. 72. Theodoridis S. and Koutroubas K., Pattern Recognition: Academic Press, 1999. 73. Haralick R., Shanmugam K., and Dinstein I., Textural features for image classification, IEEE Transactions on Systems, Man and Cybernetics, 1973; 3:610-621. 74. Galloway M.M., Texture Analysis Using Grey Level Run Lengths, Computer Graphics and Image Processing, 1975; 4:172-179. 75. Street N., Cancer diagnosis and prognosis via linear programming based machine learning, P.hD, Madison: University of Wisconsin, 1994. 76. Keserci B, Yoshida H. 
Computerized detection of pulmonary nodules in chest radiographs based on morphological features and wavelet snake model. Med Image Anal. 2002; 6(4):43147. 77. Atallah J.R., On symmetry detection, IEEE Transaction on computers, 1985:663-666. 78. Keller Y., An algebraic approach to symmetry detection, Proc. of ICPR, Cambridge, 2004. 79. Mandelbrot B.B, The Fractal Geometry of Nature, Freeman New York, 1982. 80. Kenkel, N.C and D.J Walker, Fractals in the biological sciences Coenoses 1996; 11:77-100.
147
References
81. Fellow J, Duin R, and Mao J., Statistical pattern recognition: A review, IEEE Transactions on pattern analysis and machine intelligence, 2000; 22:4-37. 82. McLachlan G.J, Discriminant analysis and Statistical Pattern Recognition. New York: Wiley, 1992. 83. Theodoridis S. and Koutroubas K., Pattern Recognition, 2nd ed. Amsterdam; Boston: Academic Press, 2003. 84. Voijislav Kecman Learning Soft Computing, MIT Press London England, 2001. 85. Parzen E, On the estimation of a probability density function and the mode, Annals of Mathematical Statistics, 1962; 33:1962. 86. Specht D, Probabilistic neural networks, Neural Networks, 1990; 3:109-118. 87. Burges C, A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, 1998; 2:121-167. 88. Kechman V, Support Vector Machines, in Learning and Soft Computing: MIT, 2001, 89. Abd-Elmoniem Z, Youssef A.M, Kadah Y.M, Real-Time speckle reduction and coherence enhancement in ultrasound imaging via nonlinear anisotropic diffusion, IEEE Trans. Biomedical Eng., 2002; 49:9971014. 90. Thijssen J.M, Ultrasonic speckle formation, analysis and processing applied to tissue characterization, Patt. Recog. Lett. 2003; 24:659675. 91. Jain K, Fundamental of Digital Image Processing. Englewood Cliffs, NJ: Prentice- all, 1989. 92. Lee J, Digital image enhancement and noise filtering by use of local statistics, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1980;2(2):165-168. 93. Frost V, Stiles J, Shanmugan K, and Holtzaman J., A model for radar images and its application to adaptive digital filtering and multiplicative noise, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1982; 4(2):157-166. 94. Kuan D.T, Sawchuk A.A, Strand T.C, and Chavel P., Adaptive noise smoothing filter for images with signal-dependent noise, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1987; PAMI-7(2):165-177. 95. 
Loupas T, Mcdicken W.N, and Allan P.L, An adaptive weighted median filter for speckle suppression in medical ultrasonic images, IEEE Trans. Circuits Syst. 1989; 36:129135. 96. Kofidis E, Theodoridis S, Kotropoulos C, Pitas I., Nonlinear adaptive filters for speckle suppression in ultrasonic image analysis, Signal Processing 1996; 52:367372. 97. Karaman M, Kutay M.A, Bozdagi G., An adaptive speckle suppression filter for medical ultrasonic imaging, IEEE Trans. Med. Imag. 1995; 14:283292.
148
References
98. Chen Y, Yin R, Flynn P, Broschat S., Aggressive region growing for speckle reduction in ultrasound images, Pattern Recognition Letters 2003; 24:677691. 99. Huang H, Chen J, Wang S and Chen C., Adaptive ultrasonic speckle reduction based on the slope facet model, Ultrasound in Med. & Biol., 2003; 29(8):1161-1175. 100. Xiao C.Y, Su Z, Chen Y., A diffusion stick method for speckle suppression in ultrasonic images, Pattern recognition letters, 2004; 25:1867-1877. 101. Zong X, Laine A.F, and Geiser E.A., Speckle reduction and contrast enhancement of echocardiograms via multiscale nonlinear processing, IEEE Trans. Med. Imag., 1998; 17:532540. 102. Hao X, Gao S, and Gao X., A novel multiscale nonlinear thresholding method for ultrasonic speckle suppressing, IEEE Trans. Med. Imag. 1999; 18:787794. 103. Achim A, Bezerianos A, and Tsakalides P., Novel Bayesian multiscale method for speckle removal in medical ultrasound images, IEEE Trans. Med. Imag. 2001; 20:772783. 104. Donoho D.L., Denoising by soft-thresholding, IEEE Trans. Inform. Theory, 1995; 41:613 627. 105. Sveinsson J.R. and Benediktsson J.A., Speckle reduction and enhancement of SAR images in the wavelet domain, in Proc. IGARSS, 1996. 106. Vidal-Pantaleoni, Marti D, and Ferrando M., An adaptive multiresolution method for speckle noise reduction in synthetic aperture radar images, in Proc. IGARSS, 1999. 107. Xie H, Pierce L.E, and Ulaby F.T, Statistical properties of logarithmically transformed speckle, IEEE Trans. Geosc. Remote Sensing, 2002; 40:721-727. 108. Foucher S, Bni G.B, and Boucher J.M, Multiscale MAP filtering of SAR images, IEEE Trans. Image Processing, 2001; 10:4960. 109. Argenti F. and Alparone L, Speckle removal from SAR images in the undecimated wavelet domain, IEEE Trans. Geosci. Remote Sensing, 2002; 40:23632374. 110. 
Dai M, Peng C, Chan K, and Loginov D., Bayesian Wavelet Shrinkage With Edge Detection for SAR Image Despeckling, IEEE Transactions On Geoscience and Remote Sensing, 2004; 42(8):16421648. 111. Burckhardt C.B., Speckle in Ultrasound B-Mode Scans, IEEE Trans. Sonic Ultrason., 1978; SU-25(1):16. 112. Yang P.M, Huang G.T, Lin J.T, et al. Ultrasonography in the diagnosis of benign diffuse parenchymal liver diseases: A prospective study. J Formosan Med Assoc 1988; 87:966 977.
149
References
113. Chang R.F, Wu W.J, Moon W.K, and Chen D.R., Improvement in breast tumor discrimination by support vector machines and speckle-emphasis texture analysis Ultrasound in Med. & Biol., 2003; 29(5):679686. 114. Ulaby F, and Dobson C., Handbook of Radar Scattering Statistics for Terrain. Norwood, MA: Artech House, 1989. 115. Hwang W, Chang F, Character extraction from documents using wavelet maxima, Image and Vision Computing, 1998; 16:307 315. 116. Gagnon L. and Jouan A., Speckle filtering of SAR imagesA comparative study between complex-wavelet based and standard filters, in SPIE Proc., 1997; 3169:8091. 117. Rigby A.S. Statistical methods in epidemiology. Towards an understanding of the kappa coefficient. Disability & Rehabilitation, 2000; 22(8):339344. 118. Kwoh C.K, Teo M.Y, Ng W.S, Tan S.N, Jones L.M., Outlining the prostate boundary using the harmonics method, Med Biol Eng Comput. 1998; 36(6); 768-71. 119. Aarnink R.G, de la Rosette J.M.C.H., Wouter F.J. Feitz, Frans M.J. Debruyne, Hessel Wijkstra, A preprocessing algorithm for edge detection with multiple scales of resolution, European Journal of Ultrasound 1997; 5:113:126. 120. Aarnink R.G, Pathak S.D, de la Rosette J.M.C.H., Frans M.J. Debruyne, Kim Y, Hessel Wijkstra, Edge detection in prostatic ultrasound images using integrated edge maps, Ultrasonics 1998; 36:635-642. 121. Sarty G.E, Liang W, Sonka M, and Pierson R.A., Semiautomated segmentation of ovarian follicular ultrasound images using a knowledge based algorithm Ultrasound in Med. & Biol. 1998; 24(1): 27-42. 122. Pathak S.D, Chalana V, Haynor D.R, Kim Y., Edge-Guided Boundary Delineation in Prostate Ultrasound Images. IEEE Transactions on Medical Imaging 2000; 19(12):12111219. 123. Yu Y, and Acton S.T., Edge detection in ultrasound imagery using the instantaneous coefficient of variation, IEEE Transactions on image processing, 2004; 13(12):1640-1655. 124. 
Richard W.D, and Keen-f C.G., Automated texture-based segmentation of ultrasound imaged of the prostate Computerized Medical Imaging and Graphics 1996; 20(3):131-140. 125. Zimmer U, Tepper R, and Akselrod A., A two-dimensional extension of minimum cross entropy thresholding for the segmentation of ultrasound images, Ultrasound in Medicine and Biology 1996; 22(9):1183-1190. 126. Potocnik B, and Zazula D, Automated analysis of a sequence of ovarian ultrasound images. Part I: segmentation of single 2D images, Image and vision computing 2002; 20:217-225.
150
References
127. Docur Z and OlmezT., Segmentation of ultrasound images by using a hybrid neaural network, Pattern recognition Letters 2002; 23:1825-1836. 128. Huang Y.L and Chen D.R., Watershed segmentation for breast tumor in 2-d sonogrpahy, Ultrasound in Medicine & Biology 2004; 30(5):625632. 129. Archip N, Rohling R, Cooperberg P, and Tahmasebpour H., Ultrasound image segmentation using spectral clustering, Ultrasound in Medicine and Biology, 2005; 30(11):14851497. 130. Strzelecki M, Materka A, Drozdz J, Krzeminska-Pakula M, Kasprzak J.D., Classification and segmentation of intracardiac masses in cardiac tumor echocardiograms, Computerized Medical Imaging and Graphics, 2006; 30:95107. 131. Lorenz A, Haas C, Ermert H., Segmentation of Ultrasonic Prostate Images using a probabilistic model based on Markov Random Processes Ultrasonic Imaging 1997; 19:44 45. 132. Pathak S.D, Chalana V, and Kim Y., Interactive automatic fetal head measurements from ultrasound images using multimedia computer technology, Ultrasound in Medicine and Biology 1997; 23(5):665-673. 133. Levienaise-Obadia B, and Gee A., Adaptive segmentation of ultrasound images, Image and Vision Computing 1999; 17:583588. 134. Ladak H.M, Mao F, Wang Y, Downey D.B, Steinman D.A, and Fenster A., Prostate segmentation from 2D ultrasound images, Med. Phys. 2000; 27:17771788. 135. Chen C.M, Lu H.H-S, and Lin Y-C., An early vision-based snake model for ultrasound image segmentation Ultrasound in Med. & Biol. 2000; 26(2):273285. 136. Chen C.M, Lu H.H, An adaptive snake model for ultrasound image segmentation: modified trimmed mean filter, ramp integration and adaptive weighting parameters. Ultrason Imaging. 2000; 22(4):214-36. 137. Wu R.Y, Ling K.V, and Ng W.S., Automatic prostate boundary recognition in sonographic images using feature model and genetic algorithm Journal of Ultrasound in Medicine 2000; 19(11):771-782. 138. 
Chen C-M, Lu H.H-S, and Hsiao A-T, A dual snake model of high penetrability for ultrasound image boundary extraction Ultrasound in Med. & Biol. 2001; 27(12):16511665. 139. Sandra M.G, Jardim V.B, and Figueiredo M.A.T., Segmentation of fetal ultrasound images, Ultrasound in Med. & Biol. 2005; 31(2):243250. 140. Cvancarova M., Albregtsen T. F, Brabrand K, Samset E, Segmentation of ultrasound images of liver tumors applying snake algorithms and GVF, International Congress Series, 2005; 1281:218223.
151
References
141. Rabhi A, Adel M, Bourennane S, Segmentation of ultrasound images using geodesic active contours, ITBM-RBM 2006; 27:818. 142. Laine A, and Zong X, Border identification of echocardiograms via multiscale edge detection and shape modelling, IEEE Int Conf Image Proc,(1996) 143. Liu Y.J, Ng W.S, Teo M.Y, Lim H.C., Computerised prostate boundary estimation of ultrasound images using radial bas-relief method, Med Biol Eng Comput; 1997; 35(5): 445454. 144. Lin N, Yu W, Duncan J.S., Combinative multi-scale level set framework for echocardiographic image segmentation, Medical Image Analysis, 2003; 7:529537. 145. Davignon F, Deprez J-F, Basset O., A parametric imaging approach for the segmentation of ultrasound data, Ultrasonics, 2005; 43(10):789-801. 146. Boukerroui D, Basset O, Guerin N, Baskurt A, Multiresolution texture based adaptive clustering algorithm for breast lesion segmentation European Journal of Ultrasound; 1998; 7:135144. 147. Chen C.M, Lu H.H.-S, Han K.C., A textural approach based on Gabor functions for texture edge detection in ultrasound images. Ultrasound in Medicine and Biology 2001; 27:515 534. 148. Mignotte M, and Meunier J, A multiscale optimization approach for the dynamic contourbased boundary detection issue, Computerized Medical Imaging and Graphics, 2001; 25: 265-275. 149. Boukerroui D, Baskurt A, Noble J.A, Basset O, Segmentation of ultrasound images multiresolution 2D and 3D algorithm based on global and local statistics, Pattern recognition letters 2003; 24:779-790. 150. Kotropoulos C, and Pitas I., Segmentation of ultrasonic images using support vector machines, Pattern recognition letters 2003; 24:715-727. 151. Chin B, Freeman G.H, Salama M.M.A and Fenster A., Prostate segmentation algorithm using dyadic wavelet transform and discrete dynamic contour Phys. Med. Biol. 2004; 49:49434960 152. 
Xie J, Jiang Y, and Tsui H-T., Segmentation of Kidney From Ultrasound Images Based on Texture and Shape Priors, IEEE Transactions on Medical Imaging, 2005; 24(1):45-57. 153. Aleman-Flores M, Aleman-Flores T.P, Alvarez-Leon L, Esteban-Sanchez M.B, FuentesPavon R, Santana-Montesdeoca J.M, Computerized ultrasound characterization of breast tumors, International Congress Series, 2005; 1281:1063 1068.
152
References
154. Betrouni N, Vermandel M, Pasquier D, Maouche S, Rousseau J., Segmentation of abdominal ultrasound images of the prostate using a priori information and an adapted noise filter, Computerized Medical Imaging and Graphics, 2005; 29:4351. 155. Dydenko I, Jamal F, Bernard O, Dhooge J., Magnin I. E, Friboulet D., A level set framework with a shape and motion prior for segmentation and region tracking in echocardiography, Medical Image Analysis, 2006; 10:162177. 156. Liu W, Zagzebski J.A, Varghese T, Dyer C. R, Techavipoo U, and Hall T. J., Segmentation of elastographic images using a coarse-to-fine active contour model, Ultrasound in Medicine & Biology, 2006; 32(3):397-408. 157. Martin-Fernandez M, Alberola-Lopez C., An approach for contour detection of human kidneys from ultrasound images using Markov random Fields and active contours, Medical Image Analysis 2005; 9:123. 158. Eslami A., Jahed M, and Naroienejad M., Fully Automated Cyst Segmentation in Ultrasound Images of Kidney, Proceedings in Biomedical Engineering 2005; (458). 159. Gong L, Pathak S.D, Haynor D.R, Cho P.S, Yongmin K., Parametric shape modelling using deformable superellipse segmentation, IEEE trans Medical Imaging 2004; 23(3):340-349.
160. Duda R.O. and Hart P.E., Use of the Hough Transformation To Detect Lines and Curves in
Pictures, Communications on the ACM, 1972; 15(1):11-15.

161. Ballard D. H., Generalizing the Hough transform to detect arbitrary shapes, Pattern
Recognition, 1981; 13(2):111-122. 162. V. Chalana, D.T. Linker, D.R. Haynor, Y. Kim, A multiple active contour model for cardiac boundary detection on 713 echocardiographic sequences, IEEE Trans. Med. Imag. 1996; 15 (3): 290298. 163. Rodenas J.A and Garello R., Internal wave detection and location in SAR images using Wavelet Transform, IEEE transactions on Geoscience and Remote sensing, 1998; 36(5):1494-1507. 164. Maroulis D.E, Savelonas M.A, Karkanis S.A, Iakovidis D.K, Dimitropoulos N., ComputerAided Thyroid Nodule Detection in Ultrasound Images, 18th IEEE Symposium on Computer-Based Medical Systems (CBMS'05) 2005:271-276. 165. Hirning T, Zuna I, Schlaps D, et al. Quantification and classification ofechographics findings in the thyroid gland by computerized Bmode texture analysis. Eur J Radiology 1989;9:244 247. 166. Mailloux G, Bertrand M, Stampfler R, Ethier S, Computer analysis of echographic textures in Hashimoto disease of the thyroid, J Clin Ultrasound 1986;14:521527.
153
References
167. Mller MJ, Lorenz D, Zuna I, Lorenz WJ and van Kaick G., The value of computer-assisted sonographic tissue characterization in focal lesions of the thyroid. Der Radiologe 1989;29:132136. 168. Mller K-R, Mika S, Ratsch G, Tsuda K and Schlkopf B., An Introduction to Kernel-Based Learning Algorithms, IEEE Transactions on Neural Networks 2001; 12:181-202. 169. Ahmed N, and Rao K.R., Orthogonal transforms for digital signal processing, SpringerVerlag, 1975, pp. 225-258. 170. Gonzalez R. C and Woods R. E, Digital Image Processing, Addison-Wesley, New York, 1992:588-590. 171. Rakotomamonjy A, Mary X and Canu S., Wavelet Kernel and RKHS, Proc. of Statistical Learning :Theory and Applications, Paris, 2002. 172. Canu S, Mary X and Rakotomamonjy A., Functional learning through kernel, Advances in Learning Theory: Methods, Models and Applications, NATO Science Series III: Computer and Systems Sciences, 2003; 190:89-110. 173. Aronszajn, N., Theory of reproducing kernels, Trans. Am. Math. Soc. 1950; 68: 337. 174. Duffin, R. & Schaeffer, A. A class of nonharmonic fourier series, Trans. Amer. Math. Soc. 1952; 72:341:366. 175. Mallat S., A wavelet tour of signal processing. New York: Academic Press, 1998:127. 176. Gyorgy I. Targonski. On Carleman integral operators. Proceedings of the American Mathematical Society, 1967; 18(3):450456. 177. Mercer J., Functions of positive and negative type and their connection with the theory of integral equations. Transactions of the London Philosophical Society A, 1909; 209:415446. 178. Girosi, F., Jones, M. & Poggio, T., Regularization theory and neural networks architectures, Neural Computation 1995; 7(2): 219. 179. Vapnik, V., Golowich, S. & Smola, A.. Support Vector Method for function estimation, Regression estimation and Signal processing, MIT Press, Cambridge, MA, (1997). 180. Girosi, F., An equivalence between sparse approximation and support vector machines, Neural Computation 1998; 10(6):1455:1480. 181. Evgeniou, T., Pontil, M. 
& Poggio, T., Regularization networks and support vector machines, Advances in Computational Mathematics 2000; 13(1): 1:50. 182. Smola, A., Learning with Kernels, PhD thesis, Published by: GMD, Birlinghoven, (1998). 183. Smola, A., Scholkopf, B. & Muller, K., The connection between regularization operators and support vector kernels, Neural Networks 1998; 11: 637:649.
154
References
184. Jaakkola, T. & Haussler, D., Probabilistic kernel regression models, Proceedings of the 1999 Conference on AI and Statistics, (1999). 185. Bruce LM. and Adhami RR, Classifying Mammographic Mass Shapes Using the Wavelet Transform Method IEEE Transactions on Medical Imaging 1999; 18(12): 1170-1177. 186. Frates MC, Benson CB, Charboneau JW, Cibas ES, Clark OH. et al. Management of Thyroid Nodules Detected at US: Society of Radiologists in Ultrasound Consensus Conference Statement. Radiology 2005; 237(3): 794 - 800. 187. Frates MC, Benson CB, Doubilet PM, Kunreuther E, Contreras M, Cibas ES, Orcutt J, Moore Jr FD, Larsen PR, Marqusee E. and Alexander EK, Prevalence and Distribution of Carcinoma in Patients with Solitary and Multiple Thyroid Nodules on Sonography, J. Clin. Endocrinol. Metab 2006; 91(9): 3411 - 3417. 188. Iannuccilli JD, Cronan JJ, Monchik JM, Risk for malignancy of thyroid nodules as assessed by sonographic criteria: the need for biopsy, J Ultrasound Med 2004;23 : 1455-1464 189. Kuntz KM. and Youngs DJ, Papillary Thyroid Cancer: A Classic Example, Journal of Diagnostic Medical Sonography 2005; 21(3): 262 - 266. 190. Liu JZ, Zhang LD, Yue GH, Fractal Dimension in Human Cerebellum Measured by Magnetic Resonance Imaging, Biophysical J 2003; 85:4041-4046 191. Masters T. Advanced Algorithms for Neural Networks. John Wiley & Sons, Academic Press, (1995). 192. Mihmanli I. and Kantarci F., Concurrent Routine Breast and Thyroid Sonography for Detection of Thyroid Tumors, Am. J. Roentgenol., October 1, 2006; 187(4): W448 - W448. 193. Shetty SK, Maher MM, Hahn PF, Halpern EF, and Aquino SL, Significance of incidental thyroid lesions detected on CT: correlation among CT, sonography, and pathology, Am. J. Roentgenol 2006; 187(5): 1349 - 1356. 194. 
Tsantis S, Cavouras D, Kalatzis I, Piliouras N., Dimitropoulos N, and Nikiforidis G., Development of a support vector machine-based image analysis system for assessing the thyroid nodule malignancy risk on ultrasound, Ultrasound in Medicine and Biology 2005; 31(11):14511459 195. Tsantis S, Dimitropoulos N, Cavouras D, and Nikiforidis G., A Hybrid Multi-Scale Model for Thyroid Nodule boundary detection on Ultrasound Images, Computer Methods and Programs in Biomedicine, 2006; 84(2-3):86-98. 196. Tsantis S, Dimitropoulos N, Ioannidou M, Cavouras D, and Nikiforidis G., Inter-Scale Wavelet Analysis for Speckle Reduction in Thyroid Ultrasound Images. Computerized Medical Imaging and Graphics 2007; 31(3):117-127.
155
References
197. Lasko T.A., Bhagwat J.G., Zou K.H., Ohno-Machado L The use of receiver operating characteristic curves in biomedical informatics Journal of Biomedical Informatics 2005; 38: 404415.
156
APPENDIX I
List of Figures
Figure 2.1 The thyroid gland.
Figure 2.2 Schematic representation of the Hypothalamic-Pituitary-Thyroid axis.
Figure 2.3 Ultrasonographic examination in the transverse plane of the thyroid, containing a solid nodule in the right lobe and a homogeneous appearance in the left lobe.
Figure 2.4 Scintiscans of the thyroid. (a) The scan on the left is normal. (b) A typical scan of a "cold" thyroid nodule failing to accumulate iodide isotope is shown on the right.
Figure 2.5 Classification of thyroid solitary nodules.
Figure 2.6 (a) Thyroid nodule with epithelial hyperplasia, (b) Colloid nodule.
Figure 3.1 HDI-3000-ATL digital ultrasound system.
Figure 3.2 US image of the thyroid gland with a cystic nodule.
Figure 3.3 Image processing system for acquisition and storage of US images.
Figure 4.1 One-dimensional three-level redundant discrete dyadic wavelet transform.
Figure 4.2 (a) A cubic spline function and (b) a wavelet that is a quadratic spline of compact support.
Figure 4.3 Two-dimensional three-level redundant dyadic wavelet transform.
Figure 4.4 Redundant dyadic wavelet transform of the Circle image.
Figure 4.5 The gradient magnitude, the gradient directions and the local maxima of the Circle image.
Figure 5.1 (a),(b),(c),(d) Wavelet transform of f(t) calculated with the quadratic spline wavelet $\psi = -\theta'$, where $\theta$ is the cubic spline smoothing function approximating the Gaussian. The red stars are the local maxima of the wavelet coefficients along each scale. The scale increases from top to bottom. (e) Maxima line in the scale-space plane inside the cone of influence.
Figure 5.2 Wavelet transform of f(t) calculated with the quadratic spline wavelet $\psi = -\theta'$, where $\theta$ is the cubic spline smoothing function approximating the Gaussian. The red stars are the local maxima of the wavelet coefficients along each scale. The scale increases from top to bottom.
Figure 5.3 The full line gives the decay of $\log_2 |Wf(u,s)|$ from Figure 5.2 as a function of $\log_2 s$ along the maxima line that converges to the abscissa t=11. The dashed line gives $\log_2 |Wf(u,s)|$ along the maxima line that converges at t=168.
Figure 6.1 (a) Image array with four grey levels. (b) General form of any grey-tone co-occurrence matrix. (c)-(f) Computation of all four co-occurrence matrices with distance d=1.
Figure 6.2 (a) Image array with four grey levels. (b)-(e) Computation of all four run length matrices for texture analysis.
Figure 6.3 Line segments used to compute radius.
Figure 6.4 Line segments used to compute Smoothness.
Figure 6.5 Convex hull is used to compute concavity and concave points.
Figure 6.6 Line segments used to compute Symmetry. The lengths of perpendicular segments on the right of the major axis are compared to those on the left.
Figure 6.7 Fractal dimension estimation. N is the number of covering boxes and s is the number of rules or the size (perimeter) of each box.
Figure 6.8 Schematic diagram of the multilayer perceptron neural network employed, with two input features, two classes, two hidden layers and four nodes in each hidden layer.
Figure 6.9 Schematic diagram of the probabilistic neural network employed, with two input features and two classes.
Figure 7.1 Block diagram of the proposed wavelet-based algorithm for speckle suppression.
Figure 7.2 At the top is the original US image. The two columns show respectively the horizontal and vertical wavelet transforms $W^1_{2^j} f(x,y)$ and $W^2_{2^j} f(x,y)$, $1 \le j \le 3$, along three dyadic scales. The scale increases from top to bottom.
Figure 7.3 At the top is the original US image. The first column displays the modulus images $M_{2^j} f(x,y)$. High intensity values correspond to black pixels whereas low intensity values correspond to white pixels, for optimized visual interpretation of the results. In the second column the angle images $A_{2^j} f(x,y)$ are shown. The angle value turns from 0 (white) to $2\pi$ (black) along the circle contour. In the last column, the image points where $M_{2^j} f(x,y)$ has local maxima in the direction indicated by $A_{2^j} f(x,y)$ are presented (black pixels). Each time such a point is detected, the position of the resultant local maximum is recorded together with the values of the modulus $M_{2^j} f(x,y)$ and angle $A_{2^j} f(x,y)$ at the corresponding location.
Figure 7.4 Inter-scale back-propagation maxima connectivity in wavelet space.
Figure 7.5 In the first column are the wavelet transform modulus maxima (non-propagating maxima and propagating maxima with negative Lipschitz exponents) classified as speckle. In the second column are the propagating maxima with positive Lipschitz exponents classified as important edges.
Figure 7.6 (a) US image of the thyroid gland, (b) ASSF method, (c) wavelet shrinkage with soft thresholding, (d) wavelet shrinkage with hard thresholding, (e) wavelet inter-scale analysis denoising.
Figure 7.7 Locally selected area containing the thyroid nodule (Box A) for calculation. Locally selected area corresponding to homogeneous tissue (Box B) for S/mse calculation.
Figure 7.8 The scan line including the borders (A & B) of the thyroid nodule.
Figure 7.9 Scan profiles of the US thyroid image. The high intensity line corresponds to the denoised scan line whereas the low intensity line corresponds to the original image. (a) ASSF method, (b) soft thresholding, (c) hard thresholding, (d) wavelet inter-scale analysis denoising.
Figure 7.10 (a) Original US image, (b) De-speckled US image.
Figure 7.11 (a) Original US image, (b) De-speckled US image.
Figure 7.12 (a) Original US image, (b) De-speckled US image.
Figure 7.13 Microsoft Access interface employing the questionnaire for US image denoising evaluation.
Figure 8.1 Schematic representation of the segmentation algorithm.
Figure 8.2 Local maxima linking procedure. Adjacent local maxima form a maxima chain $C_{2^j,k}$ due to positional proximity combined with similarity of amplitude and angle values.
Figure 8.3 (a) Synthetic image, (b) Hough parameter space via projecting circles with variable radius. High amplitude values (candidate center points) correspond to white pixels, (c) Accumulator array, (d) Segmented image.
Figure 8.4 (a) US image with an iso-echoic thyroid nodule, (b) Contour representation, (c) Constrained Hough transform, (d) Accumulator array, (e) Outcome of the hybrid model, (f) Manually delineated boundary.
Figure 8.5 (a) US image with a hypo-echoic thyroid nodule, (b) Contour representation, (c) Constrained Hough transform, (d) Accumulator array, (e) Outcome of the hybrid model, (f) Manually delineated boundary.
Figure 9.1 Sum variance versus mean gray value scatter diagram, displaying the low-risk and high-risk thyroid nodule data points, the SVM classifier margins, and the decision boundary employing the polynomial kernel of 3rd degree.
Figure 9.2 Mean gray value and run length non-uniformity scatter diagram, displaying the low-risk and high-risk thyroid nodule data points and the MLP classifier decision boundary.
Figure 9.3 Sum variance, mean gray value, and run length non-uniformity scatter diagram, displaying the low-risk and high-risk thyroid nodule data points and the QB classifier decision boundary.
Figure 9.4 Sum variance, mean gray value, and run length non-uniformity scatter diagram, displaying the low-risk and high-risk thyroid nodule data points and the QLSMD classifier decision boundary.
Figure 9.5 Sum variance versus mean gray value scatter diagram, displaying the low-risk and high-risk thyroid nodule data points, the SVM classifier margins, and the decision boundary employing the Daubechies wavelet kernel.
Figure 9.6 Sum variance versus mean gray value scatter diagram, displaying the low-risk and high-risk thyroid nodule data points, the SVM classifier margins, and the decision boundary employing the Coiflet wavelet kernel.
Figure 9.7 Sum variance versus mean gray value scatter diagram, displaying the low-risk and high-risk thyroid nodule data points, the SVM classifier margins, and the decision boundary employing the Symmlet wavelet kernel.
Figure 10.1 US images representing various morphologic types of low-risk and high-risk thyroid nodules.
Figure 10.2 Receiver Operating Characteristic (ROC) curves by the binomial method of (a) the SVM classifier with polynomial and RBF kernels employing the best feature combination and (b) the SVM with the 2nd degree polynomial kernel against the PNN classifier, for their corresponding best feature combination in the speckle-free thyroid nodules.
Figure 10.3 Smoothness, Symmetry and Standard Deviation of Local Maxima scatter diagram, displaying the low-risk, high-risk and support vector thyroid nodule data points along with the SVM classifier (2nd degree polynomial kernel) decision boundary.
Figure 10.4 Concavity, Fractal Dimension and Standard Deviation of Local Maxima scatter diagram, displaying the low-risk and high-risk thyroid nodule data points and the PNN decision boundary.
Figure 10.5 Receiver Operating Characteristic (ROC) curves by the binomial method of (a) the SVM classifier with polynomial and RBF kernels employing the best feature combination and (b) the SVM with the 3rd degree polynomial kernel against the PNN classifier, for their corresponding best feature combination in the feature set with speckle.
Figure 10.6 Symmetry and Standard Deviation of Radius scatter diagram, displaying the low-risk, high-risk and support vector thyroid nodule data points along with the SVM classifier (2nd degree polynomial kernel) decision boundary.
Figure 10.7 Concavity and Entropy of Radius scatter diagram, displaying the low-risk and high-risk thyroid nodule data points and the PNN decision boundary.
APPENDIX II
List of Tables
Table 7.1 Image quality measures obtained by four denoising methods tested on a digital ultrasound phantom image.
Table 7.2 Image quality measures obtained by four denoising methods tested on an ultrasound image of the thyroid gland.
Table 7.3 1st Observer's evaluation of algorithm performance.
Table 7.4 2nd Observer's evaluation of algorithm performance.
Table 7.5 Observers' evaluation of algorithm performance.
Table 7.6 Agreement (kappa coefficient) between the two observers.
Table 8.1 Percentage agreement between automatic (AU) and manual segmentations (OB1, OB2) in terms of Area, Roundness, Concavity and MAD%.
Table 8.2 Mean values and standard deviation of the computed MADs for the pairs AU-OB1 and AU-OB2.
Table 8.3 Inter-observer agreement (kappa coefficient) between the two observers.
Table 9.1 Classification accuracies for various SVM kernels using the leave-one-out and re-substitution methods, for the mean gray value - sum variance best feature combination.
Table 9.2 Truth table of the SVM classifier employing the 3rd degree polynomial kernel and the mean gray value - sum variance best feature combination.
Table 9.3 Truth table of the MLP classifier employing the mean gray value - run length non-uniformity best feature combination.
Table 9.4 Truth table of the QLSMD classifier employing the mean gray value - sum variance - run length non-uniformity best feature combination.
Table 9.5 Truth table of the QB classifier employing the mean gray value - sum variance - run length non-uniformity best feature combination.
Table 9.6 Classification accuracies for various SVM wavelet kernels using the leave-one-out and re-substitution methods, for the mean gray value - sum variance best feature combination.
Table 9.7 Truth table of the SVM classifier with the Daubechies wavelet kernel employing the mean gray value - sum variance best feature combination.
Table 9.8 Truth table of the SVM classifier with the Coiflet wavelet kernel employing the mean gray value - sum variance best feature combination.
Table 9.9 Truth table of the SVM classifier with the Symmlet wavelet kernel employing the mean gray value - sum variance best feature combination.
Table 10.1 Morphological and Local Maxima Features of Thyroid Nodules.
Table 10.2 ROC analysis results of the Smoothness - Symmetry - Standard Deviation of Local Maxima feature combination for the SVM classifier and Concavity - Fractal Dimension - Standard Deviation of Local Maxima for the PNN classifier in the speckle-free thyroid nodules.
Table 10.3 ROC analysis results of the Symmetry - Standard Deviation of Radius feature combination for the SVM classifier and Concavity - Entropy of Radius for the PNN classifier in the thyroid nodules with speckle.
APPENDIX III
List of Abbreviations
ASM: Angular Second Moment
ASSF: Adaptive Speckle Suppression Filter
B Mode: Brightness Modulation
CV: Convex Hull
CWT: Continuous Wavelet Transform
D Mode: Doppler Modulation
DWT: Dyadic Wavelet Transform
FD: Fractal Dimension
FNA: Fine Needle Aspiration
GLNU: Gray Level Non Uniformity
GLRL: Gray Level Run Length
HT: Hough Transform
LM: Local Maxima
LOO: Leave One Out
LSMD: Least Square Minimum Distance
LVQ: Learning Vector Quantizer
M Mode: Motion Modulation
MAD: Mean Absolute Distance
MD: Minimum Distance
MER: Multiscale Edge Representation
MLP: Multilayer Perceptron Network
MM: Modulus Maxima
NN: Neural Network
PDF: Probability Density Function
PNN: Probabilistic Neural Network
QB: Quadratic Bayesian
QP: Quadratic Programming
RBF: Radial Basis Function Network
RBR: Radial Bas-Relief
RDWT: Redundant Dyadic Wavelet Transform
Re-Sub: Re-Substitution
RF: Radio Frequency
RKHS: Reproducing Kernel Hilbert Space
RLNU: Run Length Non Uniformity
ROI: Region Of Interest
RP: Run Percentage
S/mse: Signal-to-mean-square-error ratio
SAR: Synthetic Aperture Radar
SI: Speckle Index
SRE: Short Run Emphasis
SRM: Structural Risk Minimization
SV: Support Vector
SVAR: Sum Variance
SVM: Support Vector Machines
TGC: Time Gain Compensation
TRH: Thyrotropin Releasing Hormone
TSH: Thyroid Stimulating Hormone
US: Ultrasonography
WT: Wavelet Transform
WTMM: Wavelet Transform Modulus Maxima
APPENDIX IV
Index of Terms
à trous 26, 65, 67, 92
Absolute distance 101
Absorption 15
Adaptive Speckle Suppression Filter 61, 75, 76
Anaplastic 13, 14
Angular second moment 46
Back Propagation 72, 73, 75, 85, 92, 96, 113
Bayes 55, 56, 113
Bayes - Decision Theory 55, 113
Bayes Rule 56
Bayesian classifier 55, 109, 113
Biorthogonal wavelet 67
Boundary detection 1, 4, 87, 89, 92, 103
Brightness Modulation 16
Center of mass 51, 100, 101, 132
Central moment 129
Chaining 72, 98, 107, 108, 142
Classification 13, 43, 58, 69, 95, 110, 116, 117, 123
Classification error 58, 111, 129
Coherent 65
Coiflet 114, 123, 124, 125
Colloid nodule 2, 13, 14, 111, 128
Cone of influence 37, 38, 39
Continuous Wavelet Transform 21, 22, 23
Contrast 3, 46, 64, 81
Convex Hull 51, 52, 102, 129, 132
Convolution 21, 23, 28, 32, 67
Co-Occurrence 45, 46, 47, 111, 112, 141
Covariance matrix 55, 113
Cubic spline 27, 38, 40, 67
Daubechies 63, 75, 114, 123, 124, 125
Decision boundary 117, 119, 126, 131, 133, 137, 138
Decision rule 56
Derivative 26, 27, 30, 32, 36, 38, 67, 88, 98, 142
Diameter 52, 76, 130
Discriminant function, Bayesian 55
Discriminant function, MLP 56
Discriminant function, PNN 56
Discriminant function, QLSMD 54
Doppler Modulation 16
DWT 7, 21, 23, 24, 25, 28, 31, 32, 62, 67, 69, 75, 90, 94
Entropy 46, 118, 129
Epithelial hyperplasia 14, 111, 128
Euclidean distance 50, 52, 54, 101
Feature extraction 22, 43, 68, 96, 127, 128
Feature selection 7, 129, 139
Feature vector 54, 55, 57, 113, 116, 129
Fine needle aspiration 2, 11, 13, 128, 139
First order features 44
Fractal dimension 53, 128, 132, 133
Frame grabber 18, 19, 94, 111
Gaussian 27, 32, 38, 39, 40, 65, 67, 98
Gland 9, 10, 11, 16, 61, 64, 67, 76, 78, 81, 85, 109
Gradient Vector 32, 33, 65, 69, 91, 94
Gray Level Non Uniformity 49
Hilbert space 109, 114, 115, 126
Hough Transform 4, 87, 92, 99, 100, 104, 106, 107
Hyperthyroidism 10, 109
Hypothyroidism 10
Kernel Function 59, 113, 114, 116, 126
Kurtosis 45, 129
Lagrangian 58
Lagrangian Dual 59
Lagrangian Primal 59
Learning Vector Quantizer 62
Least Square Minimum Distance 109, 112, 120, 141
Leave One Out 54, 111, 116, 119, 130, 136
Linear transducer 16, 17, 64, 94
Lipschitz exponent 3, 35, 41, 64, 71, 73, 92
Local Maxima 3, 32, 37, 44, 69, 95, 100, 107, 128, 142
Long run emphasis 49
Mallat 3, 24, 25, 26, 27, 29, 36, 62, 72
Mean Absolute Distance 101
Medullary 13, 14
Mercer 59, 115, 126
Mercer Kernel 115
Minimum Distance 4, 54, 109, 113
Modulus Maxima 3, 32, 35, 39, 64, 69, 73, 92, 95, 128
Moments 36, 122, 123
Morphological features 4, 12, 127, 128, 129, 139
Multilayer Perceptron (MLP) network 4, 55, 56, 109, 111, 112, 113, 119
Multiresolution 2, 21, 90
Multiscale edge representation 32, 94, 95, 96, 142
Neural Network 4, 55, 56, 57, 88, 113
Normalization 53, 54
PAL (resolution) 19
Parzen 56
Pattern recognition 43, 55, 65, 110, 113, 127, 140
PNN 4, 7, 56, 57, 127, 130, 132, 135, 137, 141
Probability Density Function 63
Quadratic Bayesian 4, 109, 113
Quadratic spline 4, 27, 38, 40, 98
Radial Basis Function 113, 114
Re-Substitution 116, 117, 119, 129, 123, 129, 136, 138
Reflection 15, 16, 65
Refraction 15
Region Of Interest 87, 88, 89, 91, 99, 107, 111, 127
Regularity 42, 64, 71, 73, 85, 95, 107, 127
Reproducing kernel 114, 115
Reproducing Kernel Hilbert Space 109, 114, 126
Roundness 51, 101, 102, 103, 106, 128, 142
Run length 48, 111, 112
Run Length Non Uniformity 49, 120
Run Percentage 49
Scattering 15, 16, 135
Shift invariant 22, 30, 68
Signal-to-mean-square-error ratio 75
Singularity 35, 36, 38, 41, 65, 75, 85, 95, 106, 141
Skewness 45, 129
Short Run Emphasis 49
Speckle 1, 3, 4, 61, 62, 65, 71, 81, 85, 107, 128, 136
Speckle Index 75
Sum Variance 47, 112, 116, 118, 124
Support Vector 59, 113, 117, 126
Symmlet 114, 123, 124, 126
Synthetic Aperture Radar 63, 85
Thresholding 62, 63, 72, 75, 76, 86
Thresholding, Hard 62, 75, 76
Thresholding, Soft 62, 75, 76, 86
Thyroid Scan 12
Thyroid Stimulating Hormone 9
Thyrotropin Releasing Hormone 9
Thyroxine 9
Time Gain Compensation 6, 127
Triiodothyronine 9


CHAPTER 2 Thyroid Gland

CHAPTER 3 Physics & Instrumentation of Ultrasound

CHAPTER 4 The Wavelet Transform

CHAPTER 5 Singularity Detection

CHAPTER 6 Pattern Recognition

CHAPTER 7 Wavelet-based speckle suppression in ultrasound images

CHAPTER 8 Thyroid Nodule Boundary Detection in ultrasound images


CHAPTER 11 Conclusion and Future Work


1.6 Research Funding

Figure 2.5 Classification of thyroid solitary nodules. Surviving diagram labels: Simple Cyst, Inflammatory, Focal Hemorrhage, Indeterminate; Papillary Carcinoma, Medullary Carcinoma, Lymphoma, Anaplastic Carcinoma; Follicular Cells (hyperplasia); non-functioning follicular adenoma.

Physics & Instrumentation of Ultrasound


The Wavelet Transform

it is ensured that the whole frequency axis is covered by dilations of $\hat{\psi}(\omega)$ by $(2^j)_{j \in \mathbb{Z}}$, so that


aggregation of $\hat{\psi}(2^j\omega)$ and $\hat{\chi}(2^j\omega)$ at scales $2^j$ larger than 1:

The reconstructive wavelet $\chi(x)$ is such a function that $\hat{\psi}(\omega)\hat{\chi}(\omega)$ is a positive, real and

$C_2 > 0$ exist, such that $\hat{\psi}(\omega)$ satisfies:

Figure 4.1 One-dimensional three level redundant discrete dyadic wavelet transform.


$H(\omega) = e^{i\omega/2}\,(\cos(\omega/2))^{2n+1}$, $\quad G(\omega) = 4i\,e^{i\omega/2}\sin(\omega/2)$
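The closed forms of the two filters can be checked numerically against finite filter taps. The following is a minimal sketch, not the implementation used in this thesis. It assumes the quadratic spline case n = 1, a factor i in G(ω) as in Mallat and Zhong's formulation (which makes the taps real), and the convention $H(\omega) = \sum_k h[k]\,e^{-ik\omega}$; the tap tables `H_TAPS` and `G_TAPS` and all function names are illustrative.

```python
import cmath
import math


def H(omega, n=1):
    """Closed-form low-pass response e^{i w/2} cos(w/2)^(2n+1); n = 1 is assumed."""
    return cmath.exp(1j * omega / 2) * math.cos(omega / 2) ** (2 * n + 1)


def G(omega):
    """Closed-form high-pass response 4i e^{i w/2} sin(w/2) (factor i assumed)."""
    return 4j * cmath.exp(1j * omega / 2) * math.sin(omega / 2)


# Hypothetical tap tables {index k: coefficient}, derived by expanding the
# closed forms above under the convention sum_k h[k] exp(-i k w).
H_TAPS = {-2: 0.125, -1: 0.375, 0: 0.375, 1: 0.125}
G_TAPS = {-1: 2.0, 0: -2.0}


def frequency_response(taps, omega):
    """Discrete-time Fourier transform of a finite set of filter taps."""
    return sum(value * cmath.exp(-1j * k * omega) for k, value in taps.items())


def max_mismatch(closed_form, taps, samples=64):
    """Largest absolute difference between closed form and tap response on [-pi, pi)."""
    worst = 0.0
    for m in range(samples):
        omega = -math.pi + 2 * math.pi * m / samples
        worst = max(worst, abs(closed_form(omega) - frequency_response(taps, omega)))
    return worst


if __name__ == "__main__":
    print("H mismatch:", max_mismatch(H, H_TAPS))
    print("G mismatch:", max_mismatch(G, G_TAPS))
```

Under these assumptions both filters have short real taps, which is what allows the redundant transform to be computed with a few additions and multiplications per sample at each scale.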

The Fourier transform of the smoothing function $\theta(x)$ is therefore:


4.4 Redundant Dyadic Wavelet Transform (2-D)

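$\psi^1_{2^j}(x,y) = \frac{1}{2^j}\,\psi^1\!\left(\frac{x}{2^j}, \frac{y}{2^j}\right)$ and $\psi^2_{2^j}(x,y) = \frac{1}{2^j}\,\psi^2\!\left(\frac{x}{2^j}, \frac{y}{2^j}\right)$. The dyadic wavelet transform of a 2-D function $f(x,y) \in L^2(\mathbb{R}^2)$ has two components defined by: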

$Wf = \{W^1_{2^j} f(x,y),\; W^2_{2^j} f(x,y)\}_{j \in \mathbb{Z}}$
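The two components can be read as the horizontal and vertical parts of a gradient vector, from which the modulus and angle images mentioned in the caption of Figure 7.3 follow. The sketch below illustrates only that last step on a toy image. It is an assumption-laden stand-in: plain central differences replace the spline wavelet filters of the thesis, a single scale is used, and the function names `gradient_components` and `modulus_angle` are hypothetical.

```python
import math


def gradient_components(img):
    """Central-difference stand-ins for the horizontal (w1) and vertical (w2)
    components; the thesis uses spline wavelet filters instead."""
    rows, cols = len(img), len(img[0])
    w1 = [[0.0] * cols for _ in range(rows)]
    w2 = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Borders are handled by replicating the nearest pixel.
            left = img[r][max(c - 1, 0)]
            right = img[r][min(c + 1, cols - 1)]
            up = img[max(r - 1, 0)][c]
            down = img[min(r + 1, rows - 1)][c]
            w1[r][c] = (right - left) / 2.0
            w2[r][c] = (down - up) / 2.0
    return w1, w2


def modulus_angle(w1, w2):
    """Modulus sqrt(w1^2 + w2^2) and angle atan2(w2, w1) at every pixel."""
    modulus = [[math.hypot(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(w1, w2)]
    angle = [[math.atan2(b, a) for a, b in zip(ra, rb)] for ra, rb in zip(w1, w2)]
    return modulus, angle


# Toy image with a vertical step edge between columns 1 and 2.
image = [[0, 0, 1, 1] for _ in range(4)]
w1, w2 = gradient_components(image)
modulus, angle = modulus_angle(w1, w2)

if __name__ == "__main__":
    for row in modulus:
        print(row)
```

For this step edge the modulus is non-zero only in the two columns adjacent to the edge and the vertical component vanishes everywhere, so the angle points along the horizontal axis.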

and $\hat{\psi}^1(2^j\omega_x, 2^j\omega_y)$, $\hat{\psi}^2(2^j\omega_x, 2^j\omega_y)$ are the Fourier transforms of the partial wavelet

where the partial reconstruction wavelets $\chi^1_{2^j}(x,y)$ and $\chi^2_{2^j}(x,y)$ satisfy the equation:

The scaling function is an aggregation of $\hat{\psi}(2^j\omega)$ and $\hat{\chi}(2^j\omega)$ at scales $2^j$ greater than 1:

Figure 4.3 Two dimensional three level redundant dyadic wavelet transform.


Figure 4.4 Redundant dyadic wavelet transform of the Circle image.


$f(x,y) \in L^2(\mathbb{R}^2)$ is the set of functions $(W^1_{2^j} f(x,y),\, W^2_{2^j} f(x,y))$, which are respectively


As a consequence the following equation is obtained:

Moreover, $\psi$ has no more than $n$ vanishing moments if and only if $\int_{-\infty}^{+\infty} \theta(t)\,dt \neq 0$. The decay of the