
MEDICAL IMAGE COMPRESSION USING PRINCIPAL COMPONENT ANALYSIS

J. S. Taur
Department of Electrical Engineering
National Chung-Hsing University, Taichung, Taiwan
jstaur@dragon.nchu.edu.tw

C. W. Tao
Department of Electrical Engineering
National I-Lan Institute of Agriculture and Technology, I-Lan, Taiwan

ABSTRACT

In this paper, we describe a coding scheme based on principal component analysis to compress medical images. The region of interest (the tissue region) is first located. The background area can then be coded with simple models, for which the compression ratio can be quite high. For the region of interest, a more sophisticated algorithm is required to achieve a high compression ratio while preserving the information necessary for diagnosis. We assume that medical images from the same modality exhibit similar statistics. This suggests that principal component analysis is a good candidate for block transform coding, and that it is unnecessary to store the principal eigenvectors for each image, since this information can be calculated and stored in advance. An adaptive scheme is then used to select the proper basis for transform coding. Using this scheme, the peak signal-to-noise ratio reaches 49.81 dB at a compression ratio of 58.5.

1. INTRODUCTION

Digital radiography is becoming more and more popular as the technology advances. Several imaging modalities produce purely digital output, for example computerized tomography (CT) and magnetic resonance (MR). Digital images have several advantages over conventional analog images. They can be preserved over a significantly longer period of time. With digital images, many tools can be designed to increase diagnostic power, e.g., image enhancement, computer-aided instruction, 3D views of the body structure, and preplanning tools. Although the performance of storage devices and the channel capacity for communication have increased considerably in recent years, the size of a medical image can still increase the complexity of the algorithms. A medical image may have a size of 4k x 3k x 12 bits, or 18 Mbytes per image, which degrades the performance of certain applications such as remote diagnosis. In this context, medical image compression is still an important topic [1, 2].

Lossy compression results in a loss of information, and this process cannot be reversed. However, lossy compression is still an interesting research area because of its relatively high compression ratios [3, 4]. For many applications, for example video conferencing, the small amount of error caused by lossy compression is acceptable. We have to be very careful when lossy compression is applied to medical images, since coding errors might reduce the diagnostic value or even lead to incorrect diagnoses. Meanwhile, lossy compression provides an efficient way to support remote diagnosis and storage of medical images. It is important that the compressed images preserve the details required for diagnosis. Based on this argument, we must use as much a priori information as possible, for example, what kinds of features should be preserved and which parts of the images are important for diagnosis. We can then use a lower bit rate for the irrelevant regions and a higher bit rate for the regions of interest. A good coding scheme takes advantage of the characteristics of a given imaging modality to achieve high compression ratios.

In this paper we describe a coding scheme based on principal component analysis to compress medical images. Usually only part of the image is important to the diagnosis. Using the chest X-ray image as an example, only the lung and heart areas are important, and other regions can be coded at a much lower bit rate without decreasing the diagnostic value of the image. The background area can be coded with simple models. For the region of interest, a more sophisticated algorithm is required to achieve a high compression ratio while preserving the information necessary for diagnosis. We assume that images from the same modality exhibit similar statistics. This suggests that principal component analysis is a good candidate for block transform coding, and that it is unnecessary to store the principal eigenvectors for each block in the images, since this information can be calculated and stored in advance. Other transform techniques, e.g. the discrete cosine transform (DCT), will also be studied. An adaptive scheme is then used to select the proper basis for transform coding. The coefficients can be further encoded through a predictive coding scheme. In the following sections, the proposed coding algorithms are introduced.

2. IMAGE COMPRESSION

We would like to study a coding scheme as shown in Figure 1. In the experiments, mammograms are used to test the coding scheme. For mammogram images, a simple thresholding technique suffices to segment the tissue and background regions. For more complex images, for example chest X-ray images, artificial neural networks can be applied.
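
As a rough illustration of the thresholding step, the following Python sketch labels each 16 x 16 block as tissue or background by comparing its mean grey level with a threshold. The function names, the synthetic test image, and the threshold value of 50 are assumptions made for this example; the paper does not state the threshold it used.

```python
import numpy as np

BLOCK = 16  # block size, matching the 16 x 16 blocks of the coding procedure


def block_stats(image, block=BLOCK):
    """Return per-block mean and variance.

    Assumes both image sides are multiples of `block`.
    """
    h, w = image.shape
    # Split into a grid of tiles with shape (rows, cols, block, block).
    tiles = image.reshape(h // block, block, w // block, block).swapaxes(1, 2)
    return tiles.mean(axis=(2, 3)), tiles.var(axis=(2, 3))


def tissue_mask(image, threshold, block=BLOCK):
    """Mark a block as tissue when its mean grey level exceeds `threshold`.

    `threshold` is a hypothetical, image-dependent value.
    """
    means, _ = block_stats(image.astype(float), block)
    return means > threshold


# Synthetic 64 x 64 image: dark background with a bright tissue-like rectangle.
image = np.zeros((64, 64))
image[16:48, 16:64] = 200.0

mask = tissue_mask(image, threshold=50.0)
print(mask.astype(int))
```

The per-block variance returned by `block_stats` is unused here, but it is the quantity the coding procedure later uses to split tissue blocks into classes.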

0-7803-3258-X/96/$5.00 © 1996 IEEE


Figure 1: The flow chart of the proposed image coding scheme.

Principal Component Analysis (PCA). PCA identifies the most important features (subspaces) of a high-dimensional statistical distribution in the sense that the projection error onto those feature subspaces is minimal. Alternatively, the PCA subspaces can be interpreted as the maximizers of the projection variance of the stochastic signal. The optimal solution under both criteria is the subspace spanned by the eigenvectors of the signal autocorrelation matrix associated with the largest eigenvalues (henceforth called principal eigenvectors). In the experiments, a neural network model (APEX) [5] is used to extract the multiple principal components and the innovative components. The APEX algorithm is parallelizable, allowing the concurrent extraction of multiple principal components.

The problem of image compression relies on finding an efficient representation through a mapping from a higher dimensional (input) space to a lower dimensional (representation) space. The essential information of the input data is captured by minimizing (in some sense) the error of reconstructing the image data from their representation. It has been recognized that the comparison between the PCA and discrete cosine transform (DCT) data compression techniques is a trade-off between performance and compression rate. If the same number of components is used, the PCA approach yields better performance, but it also requires higher channel capacity, because the data-dependent eigenvectors must be sent along with the coefficients. Note that medical images from the same modality usually have similar statistics. Therefore the subspaces represented by the principal eigenvectors of these images are similar, and it is unnecessary to store the principal eigenvectors for each image. Thus PCA can achieve a much higher performance than DCT at the same compression ratio.

Sometimes a physician would like to select an image to examine from a set of images taken at different view angles. It is then efficient to transmit approximate images first. Once the desired image is selected, the details of that image are sent. Progressive transmission of this kind saves communication bandwidth and transmission time, and it makes the application friendlier to use. In the proposed coding scheme, we can transmit the principal components of each block to achieve progressive transmission.

Coding Procedure. The images are partitioned into blocks of size 16 x 16. For each block, the mean and the variance are computed. The tissue part of the mammogram usually has a higher grey level. In the experiments, we use a simple thresholding scheme to segment the background and the tissue part. Each of the coarsely segmented regions obtained from the previous step is then fine-tuned: small clusters of background within the surrounding regions are removed, and the boundaries of adjacent regions are smoothed by morphological operations [6]. The tissue regions are dilated to provide a safeguard. The boundaries of the regions can be coded using a chain code. To obtain a better representation of the blocks using PCA, the tissue region is further divided into three types of regions according to the variance of the blocks. For each type of region, the principal components of the blocks are precomputed and stored using some of the mammograms in the database. The blocks are therefore classified into four classes: background, and tissue with large, medium, and small variance. One example of the mask for the classes is shown in Figure 2.

The background can be coded as zero or as the mean of each block, which requires zero or one byte per block, respectively. For the tissue part, each block has a prespecified maximum error according to the class of the block. The blocks are then projected onto the space spanned by the principal vectors. The number of components used is the smallest number that satisfies the error requirement. Note that the sets of principal components are different for different classes.

To further improve the accuracy, new components are calculated from the coding error of the tissue region (the difference between the original image and the reconstructed image in the tissue part). In this case, the new principal vectors are the same for all the blocks in the tissue region. The new component vectors have to be made orthogonal to the subspace spanned by the original vectors used for that block. The new component vectors and all the projection coefficients are quantized and stored. The compressed images contain the following information: the mask describing the class of each block, the number of components needed to achieve the error requirement, the new component vectors from the coding error, and all projection coefficients.

Scheme   SNR     SNR (tissue)   Peak SNR   CR
SVD      20.98   36.29          48.63      69.60
SVD2     21.01   37.48          49.81      58.50
DCT      20.69   31.02          43.35      63.46

Table 1: The SNRs are measured in dB, and CR stands for compression ratio. In this simulation, the two-dimensional DCT uses almost the same number of components as the SVD algorithm for each block. When computing the compression ratio for the DCT algorithm, each coefficient is counted as one byte although the coefficients are not quantized. Therefore the performance of the DCT is over-estimated. In the SVD2 scheme, eight new principal components are added.

3. RESULTS

In the experiments, we use three mammogram images to compute the eigenvectors. The performance of a typical image under the different coding schemes is shown in Table 1. In the SVD2 scheme, eight new principal components are computed. The original image and the image reconstructed using SVD2 are shown in Figure 3 and Figure 4, respectively. These two images show that the tissue part has very small distortion, while the background region does not keep the high-frequency information (artificially created by the digitizer). Better results can be obtained with careful quantization of coefficients and eigenvectors.
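
The quality measures reported in Table 1 can be computed along the following lines. This is a minimal sketch: the paper does not give its exact definitions, so the SNR formula (signal energy over error energy), the default peak value of 255 (which assumes 8-bit grey levels), and the function names are assumptions.

```python
import numpy as np


def snr_db(original, reconstructed):
    """SNR in dB, taken here as signal energy over error energy (assumed definition)."""
    original = np.asarray(original, dtype=float)
    error = original - np.asarray(reconstructed, dtype=float)
    return 10.0 * np.log10(np.sum(original ** 2) / np.sum(error ** 2))


def psnr_db(original, reconstructed, peak=255.0):
    """Peak SNR in dB; `peak` is the maximum grey level (255 assumes 8-bit data)."""
    error = np.asarray(original, dtype=float) - np.asarray(reconstructed, dtype=float)
    return 10.0 * np.log10(peak ** 2 / np.mean(error ** 2))


def compression_ratio(original_bytes, compressed_bytes):
    """Ratio of the original size to the compressed size."""
    return original_bytes / compressed_bytes


# Toy check: a flat block reconstructed with a uniform error of one grey level.
original = np.full((16, 16), 100.0)
reconstructed = original + 1.0

print(snr_db(original, reconstructed))   # 10*log10(100**2 / 1), i.e. about 40 dB
print(psnr_db(original, reconstructed))  # 20*log10(255), i.e. about 48.13 dB
```

Restricting both arrays to the blocks marked as tissue before calling `snr_db` would give a region-limited figure of the kind listed in the "SNR (tissue)" column.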

4. REFERENCES
[1] K. K. Chan, S. Lou, and H. K. Huang. Full-frame transform compression of CT and MR images. Radiology, 171:847-851, 1992.
[2] E. A. Riskin, T. Lookabaugh, P. A. Chou, and R. M. Gray. Variable rate vector quantization for medical image compression. IEEE Trans. Med. Imag., 9:290-298, Sept. 1990.
[3] M. Antonini, M. Barlaud, P. Mathieu, and I. Daubechies. Image coding using wavelet transform. IEEE Transactions on Image Processing, (2), April 1992.
[4] T. Senoo and B. Girod. Vector quantization for entropy coding of image subbands. IEEE Transactions on Image Processing, 1(4):526-533, Oct. 1992.
[5] S. Y. Kung, K. I. Diamantaras, and J. S. Taur. Adaptive principal component extraction (APEX) and applications. IEEE Transactions on Signal Processing, 42(5):1202-1217, 1994.
[6] R. C. Gonzalez and R. E. Woods. Digital Image Processing. Addison Wesley, 1992.

Figure 2: The mask for the four block classes. The brightest region represents the background. In the remaining area, the regions with large, medium, and small variance are denoted with different brightness; regions with lower variance are shown with higher brightness.

Figure 3: The original mammogram image.

Figure 4: The reconstructed mammogram image.
