
IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 37, NO. 2, FEBRUARY 1989


Encoding of Images Based on a Lapped Orthogonal Transform


PHILIPPE M. CASSEREAU, DAVID H. STAELIN, AND GERHARD DE JAGER


Abstract: Unitary transform image coding has been successfully applied to image data compression. However, traditional block transform image coding systems generate artifacts near block boundaries which degrade low bit rate coded images. To reduce these artifacts, a new class of unitary transformations, defined here as lapped orthogonal transforms (LOT), has been investigated. The basis functions upon which the signal is projected are overlapped for adjacent blocks. An example of a LOT optimized in terms of energy compaction was numerically derived using an augmented Lagrangian optimization algorithm. Using this LOT, intraframe coding experiments for 256 x 240 pixel images were performed at bit rates between 0.1 and 0.35 bits/pixel. The LOT improved the subjective quality of the coded images over other transforms such as the discrete cosine transform (DCT) and the short-space Fourier transform (SSFT). The LOT was also used in interframe full-motion video coding experiments for head-and-shoulders sequences at 28 and 56 kbits/s. Subjective quality assessment experiments showed that significant improvement resulted at low data rates and when no motion compensation was used. However, the improvement was no longer significant at 56 kbits/s with full motion compensation.


I. INTRODUCTION

Transform coding is recognized as one of the most successful methods for digital image data compression. In transform coding systems the digital video signal is typically divided into blocks, perhaps containing 8 x 8 pixels, which are then subjected to an energy-preserving unitary transformation. The aim of the transformation is to convert statistically dependent picture elements (pixels) into a set of essentially independent transform coefficients, preferably packing most of the signal energy (or information) into a minimum number of coefficients. The resulting transform coefficients are quantized, coded, and transmitted. At the receiver the video signal is recovered by computing the inverse transformation after decoding the transmitted data [1]-[3]. The input signal F represents the digitized image, which can be viewed as a matrix of size R x R, where R is the resolution of the image. The representation of the video signal in the transform domain is the matrix F_t comprising R x R real transform coefficients. With a separable two-dimensional transformation, the matrix F_t is derived as follows:


$$F_t = T F T' \qquad (1)$$


where T' indicates the transposed matrix. The R x R matrix T is unitary and represents the one-dimensional transform kernel. The rows of the transform matrix T are defined as the transform basis functions.

Paper approved by the Editor for Image Processing of the IEEE Communications Society. Manuscript received March 13, 1987; revised March 21, 1988. This work was supported in part by the Defense Advanced Research Projects Agency under Contract MDA-903-84-K-0297, in part by Rhodes University, the CSIR, and the Ernest Oppenheimer Memorial Trust, and in part by the Center for Advanced Television Studies. P. M. Cassereau is with CSP Inc., Billerica, MA 02181. D. H. Staelin is with the Research Laboratory of Electronics, Massachusetts Institute of Technology, Cambridge, MA 02139. G. de Jager is with the Department of Electrical and Electronic Engineering, University of Cape Town, Rondebosch, 7700, South Africa. IEEE Log Number 8825339.

0090-6778/89/0200-0189$01.00 © 1989 IEEE

In most transform coding systems, prior to transformation the video signal is subjected to a block segmentation in which the digitized image F of size R x R is divided into subimages or data blocks of size N x N pixels, where R = mN, so that m x m subimages result. The transformation is applied to each block independently. The block segmentation can be represented by the following block-diagonal structure of the one-dimensional transform kernel T:

$$T = \begin{bmatrix} A & & & 0 \\ & A & & \\ & & \ddots & \\ 0 & & & A \end{bmatrix} \qquad (2)$$

where A, defined as the one-dimensional block transform kernel, is an N x N matrix. The matrix A is constrained to ensure that the matrix T is unitary. The block structure of T given by (2) produces the condition that A itself must be unitary. In other words, A is a smaller transform matrix, and taking the transform of the image F is equivalent to taking a transform of size N x N for each block of F. The rows of A are the transform basis functions a_i, i = 1, ..., N.
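The equivalence between (1) with the block-diagonal kernel (2) and independent block transforms can be checked numerically. The NumPy sketch below is illustrative only: it uses the orthonormal DCT as an example N x N kernel A (any unitary A would do), and the helper names dct_matrix, block_diagonal_kernel, and transform_image are hypothetical.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix; row k is the k-th basis function."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    a = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    a[0, :] /= np.sqrt(2.0)  # DC row gets the 1/sqrt(n) scale
    return a

def block_diagonal_kernel(a: np.ndarray, m: int) -> np.ndarray:
    """R x R kernel T of (2): m copies of the N x N kernel A on the diagonal."""
    return np.kron(np.eye(m), a)

def transform_image(f: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Separable two-dimensional transform of (1): F_t = T F T'."""
    return t @ f @ t.T

N, m = 8, 4                      # block size and blocks per side, so R = mN = 32
A = dct_matrix(N)
T = block_diagonal_kernel(A, m)
rng = np.random.default_rng(0)
F = rng.standard_normal((N * m, N * m))
Ft = transform_image(F, T)

# Blockwise route: transform each N x N block on its own as A B A'.
Ft_blocks = np.zeros_like(F)
for r in range(m):
    for c in range(m):
        rows, cols = slice(r * N, (r + 1) * N), slice(c * N, (c + 1) * N)
        Ft_blocks[rows, cols] = A @ F[rows, cols] @ A.T
```

Both routes give the same coefficients, which is why block transform coders never form the full R x R matrix T in practice.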

Fig. 1. Four lowest order LOT basis functions for the case N = 8, L = 8.

Transform coding takes advantage of the correlation between adjacent pixels by reducing the redundancy. Because of block segmentation, statistical dependencies beyond the block boundaries are not taken into account. Additionally, at data rates below 1 bit per pixel, the block boundaries may become visible. The visibility of boundaries can be reduced by overlapping the blocks before transform coding [4]. The main disadvantage of this method lies in the redundancy of the image representation due to the overlapping blocks, which significantly reduces the data compression ability of the system. Based on a similar idea, first presented by Cassereau [5], this paper introduces a new type of unitary transform which reduces the visibility of the blocks, but without any extraneous information or any penalty for redundancy.

II. DEFINITION AND EXAMPLE OF A LAPPED ORTHOGONAL TRANSFORM

Suppose f is a column vector of size R representing a column of pixels extracted from the image F. The one-dimensional transformation of f yields f', composed of coefficients derived as follows:

$$f' = T f. \qquad (4)$$

For either lapped or nonlapped transform operators T, the transform signal f' can be divided into m = R/N nonoverlapping and independent blocks of N samples each. The block segmentation can be explained by the following property of T: the basis functions of T, upon which the projection of f yields two adjacent transform data blocks of f', are identical after an N-sample linear shift, ignoring edge effects. This property is clearly illustrated by the block-diagonal structure of T given by (2) for the nonoverlapped case. We define a lapped orthogonal transform (LOT) as a separable unitary transformation for which the basis functions corresponding to adjacent data blocks overlap in the image domain. Fig. 1 illustrates four such basis functions for the case N = 8. The overlap L of a LOT is defined as the number of overlapping samples between any basis function and adjacent blocks, as characterized above. Thus, the total number of nonzero coefficients of the basis functions is at most N + L.

In the example given in Fig. 1, the block size N = 8 and the overlap L = 8. Consequently, a lapped orthogonal transform is defined by the following overlapping block structure of its one-dimensional transform kernel T:

$$T = \begin{bmatrix} A & & & \\ & A & & \\ & & \ddots & \\ & & & A \end{bmatrix} \qquad (5)$$

Here the one-dimensional block transform kernel A is an N x (N + L) matrix; each block row of T holds one copy of A, shifted N columns to the right of the copy above it, so that copies in adjacent block rows overlap in L columns, and all other entries are zero. With such a transformation, the transform process is no longer independent from block to block. Edge effects at the image boundary can be handled in various ad hoc ways that are not of interest here; let T be square. The matrix T given by (5) still has a block structure that may produce discontinuities. The value of L would generally be chosen so that N + L is a multiple of N and the number of new boundary locations is minimized, e.g., so that the new boundaries for alternate blocks coincide. Thus, L must itself be a multiple of N. Although any multiple is feasible, the LOT presented here has an overlap L equal to the block size N. Clearly, the block transform kernel A completely specifies the LOT. The matrix A must be derived so that the matrix T satisfies the orthogonality condition and is unitary:

$$T'T = TT' = I. \qquad (6)$$

This specifies all the constraints on the basis functions of A. With this condition satisfied, the transform process yields a nonredundant representation of the digitized image. The matrix A is composed of N row vectors of basis functions a_i. The basis functions of A for L = N can be written

$$a_i = (x_i, y_i), \qquad i = 1, \ldots, N \qquad (7)$$

where x_i and y_i are two row vectors of size N, respectively representing the first and the last N elements of the basis function a_i. To ensure that T is unitary, the row vectors of T must form an orthogonal set and be normalized. Given the block structure of T, the orthogonality condition yields the following set of constraints on the basis functions:

$$x_i y_j' = 0, \qquad i, j = 1, \ldots, N \qquad (8)$$

$$x_i x_j' + y_i y_j' = 0, \qquad i, j = 1, \ldots, N, \quad i \neq j \qquad (9)$$


Normalization for each i requires

$$a_i a_i' = 1. \qquad (10)$$
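Constraints (8)-(10) are not hard to satisfy. As a hedged illustration, the sketch below uses the modulated lapped transform (MLT), a known lapped orthogonal transform with a sine window and overlap L = N. It is our choice of example, not the energy-compaction-optimized LOT derived in this paper. The helpers mlt_kernel and lapped_kernel are hypothetical names, and the circular wrap-around at the image edge is an assumption, since the text leaves edge handling open. The code splits A into the blocks X and Y of (7) and checks (8)-(10) and the unitarity condition (6) numerically.

```python
import numpy as np

def mlt_kernel(n: int) -> np.ndarray:
    """N x 2N kernel A of the modulated lapped transform (sine window).

    Example kernel with overlap L = N only; not the optimized LOT of Section III.
    """
    i = np.arange(n)[:, None]          # basis function index
    k = np.arange(2 * n)[None, :]      # sample index within the 2N support
    window = np.sin((k + 0.5) * np.pi / (2 * n))
    return window * np.sqrt(2.0 / n) * np.cos((k + (n + 1) / 2) * (i + 0.5) * np.pi / n)

def lapped_kernel(a: np.ndarray, m: int) -> np.ndarray:
    """R x R kernel T of (5) with R = mN; the last block row wraps around circularly."""
    n = a.shape[0]
    r = m * n
    t = np.zeros((r, r))
    for j in range(m):
        cols = (j * n + np.arange(a.shape[1])) % r   # copy j starts N columns after copy j - 1
        t[j * n:(j + 1) * n, cols] = a
    return t

N = 8
A = mlt_kernel(N)
X, Y = A[:, :N], A[:, N:]              # rows of X are the x_i, rows of Y the y_i, as in (7)
T = lapped_kernel(A, m=4)
```

The condition X Y' = 0 is exactly (8), and X X' + Y Y' = I combines (9) and (10); together they make T T' = I even though adjacent copies of A overlap.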

Examples of transforms satisfying these constraints can easily be found.

III. DERIVATION OF A LOT OPTIMIZED IN TERMS OF ENERGY COMPACTION

A transform coefficient c_i of f' is obtained by projecting f onto one basis function a_i of T. Since these basis functions have at most 2N nonzero coefficients, only 2N samples of f have to be considered. That is,

$$c_i = a_i f. \qquad (11)$$

Energy compaction results when each normalized basis function a_i is chosen sequentially such that the variance (or expected energy) of c_i is maximized, for i = 1, ..., N. The variance of c_i is given by

$$\sigma_{c_i}^2 = a_i \Lambda_f a_i' \qquad (12)$$

where Λ_f represents the 2N x 2N covariance matrix of f. To derive a LOT optimized in terms of energy compaction for the L = N case, a model for the 2N x 2N covariance matrix Λ_f needs to be specified. Natural images are commonly approximated as first-order Markov processes [6]. The Markov model implies a covariance matrix of the form

$$\Lambda_f = \sigma_f^2 R \qquad (13)$$

where the variance σ_f² of each pixel of f is constant because of the assumed image stationarity. The stationarity assumption is valid for natural images only because of the block segmentation [6]. The 2N x 2N matrix R is defined by (14).

In the formulation of the LOT, the problems are nonlinear because of the constraints (8)-(10). The augmented Lagrangian method that has been used to derive the LOT basis functions is a sequential multiplier method. For an optimization problem formulated as in (17), the augmented Lagrangian function φ(x, λ, σ) is defined as

$$\phi(x, \lambda, \sigma) = L(x, \lambda) + \frac{1}{2}\sigma\, c(x)' c(x) \qquad (18)$$

where L(x, λ) is the Lagrangian function

$$L(x, \lambda) = f(x) - \lambda' c(x) \qquad (19)$$

λ is a vector of Lagrange multipliers, and σ is a fixed parameter. The optimal solution (x*, λ*) of (18) must be a stationary point of the Lagrangian. It can be shown that the value of λ for which x* minimizes φ(x, λ, σ) is λ*. The exact description of the algorithm is given by Powell [7] and Bertsekas [8]. The augmented Lagrangian method was implemented on a digital computer to derive the LOT basis functions for a correlation factor ρ equal to 0.9 and for the two cases N = L = 16 and N = L = 8. This numerically derived LOT has been used for computer-simulated intraframe and interframe coding experiments. Fig. 1 illustrates the four lowest order functions for N = 8 and L = 8. A more robust method for computing LOT basis functions has recently been presented by Malvar [9]; it is less likely to converge to local minima than is the present method.
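The multiplier method behind (18) and (19) can be sketched in a few lines. This is a generic textbook variant, not the authors' implementation: the inner minimization of φ uses plain gradient descent with a fixed step (an assumption; Powell [7] and Bertsekas [8] describe more refined solvers), and the multiplier update λ ← λ - σ c(x) follows the sign convention of (19). The toy problem is ours: maximize a R a' for the 2 x 2 Markov matrix of (14) subject to a a' = 1, whose solution is known in closed form.

```python
import numpy as np

def augmented_lagrangian(grad_f, c, jac_c, x0, sigma=10.0, outer=20, inner=500, step=0.02):
    """Method of multipliers for min f(x) subject to c(x) = 0, as in (17)-(19).

    phi(x, lam, sigma) = f(x) - lam'c(x) + (sigma / 2) c(x)'c(x)
    """
    x = np.asarray(x0, dtype=float)
    lam = np.zeros(len(c(x)))
    for _ in range(outer):
        for _ in range(inner):                  # approximately minimize phi over x
            cx, jx = c(x), jac_c(x)
            grad_phi = grad_f(x) - jx.T @ lam + sigma * jx.T @ cx
            x = x - step * grad_phi
        lam = lam - sigma * c(x)                # first-order multiplier update
    return x, lam

# Toy problem: minimize f(a) = -a R a' subject to c(a) = a a' - 1 = 0.
rho = 0.9
R = np.array([[1.0, rho], [rho, 1.0]])

def grad_f(a):
    return -2.0 * R @ a

def c(a):
    return np.array([a @ a - 1.0])

def jac_c(a):
    return 2.0 * a[None, :]

a, lam = augmented_lagrangian(grad_f, c, jac_c, x0=[1.0, 0.0])
```

The iteration should settle on a = (1, 1)/√2 with a R a' = 1 + ρ = 1.9 and λ = -1.9, the largest eigenvalue of R with the sign fixed by (19). In the LOT derivation, f would be -E_k and c the constraint vector of (16).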

IV. INTRAFRAME CODING EXPERIMENTS


To test the degree to which a LOT could improve coded images, intraframe coding experiments were performed using a monochrome digital image with resolution 256 x 256 pixels and 8 bits per pixel. This image has been transform coded with a block size equal to 16 using three different transforms: the discrete cosine transform (DCT), the lapped orthogonal transform (LOT), and the short-space Fourier transform (SSFT) [10]. The DCT is a very popular transform in image coding and is described by Ahmed, Natarajan, and Rao [11]. The LOT had an overlap equal to the block size 16 (the basis functions are therefore 32 samples long). The SSFT was introduced by Hinman, Bernstein, and Staelin [10] as an alternative to the DCT to avoid blocking effects. It is a multidimensional extension of the short-time Fourier transform, which was developed to provide local sliding-block spectral information for one-dimensional infinite-length signals, such as those occurring in speech. In the SSFT, a given finite image is first reflectively extended periodically to yield an infinite-length signal to which an infinite-length window is then applied. The SSFT is then obtained by taking the two-dimensional discrete Fourier transform of the resulting windowed signal. The applied window is located at the center of each block and extends over the entire signal. The SSFT is computed using all the image data, but still provides local spectral characteristics; the SSFT basis functions completely overlap.

Adaptive block transform coding of the image was performed using the DCT, the LOT, and the SSFT. The quantizer used to code a coefficient of a block depends on the spectral energy distribution within the block. Depending on this distribution, the block is assigned to one of four categories before coding. These four categories correspond to blocks having structure that is mostly vertical, horizontal, of low spatial frequency, or of isotropic high spatial frequency.
Since each block in the transform domain reflects the image local structure, adaptive block transform coding can take into account the nonstationarity of the image. The variances of each transform coefficient were deter-

$$R(k, l) = \rho^{|k-l|}, \qquad k, l = 1, \ldots, 2N \qquad (14)$$

where ρ is the correlation factor between adjacent pixels in both the horizontal and vertical directions. For typical natural images, each pixel is strongly correlated with its neighbors (0.9 < ρ < 1) [6]. Using the Markov model, the energy compaction E_i of the basis function a_i is the Rayleigh quotient of the matrix R, and equals the ratio of transform coefficient variance to pel variance:

$$E_i = \frac{a_i R a_i'}{a_i a_i'} \qquad (15)$$
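Equations (14) and (15) reduce to a few lines of NumPy. In this sketch (helper names are ours) the size of R is taken from the row length of the kernel, so the same function applies to an N x N block transform and to an N x 2N LOT kernel with L = N; the orthonormal DCT with N = 8 and ρ = 0.9 serves only as a familiar test case. For any orthonormal square kernel the E_i add up to the trace of R, which is N, so energy compaction redistributes variance among the coefficients rather than creating it.

```python
import numpy as np

def markov_covariance(size: int, rho: float) -> np.ndarray:
    """Normalized first-order Markov covariance of (14): R(k, l) = rho**|k - l|."""
    idx = np.arange(size)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def energy_compaction(kernel: np.ndarray, rho: float) -> np.ndarray:
    """E_i of (15) for every row a_i of kernel: a_i R a_i' / (a_i a_i')."""
    r = markov_covariance(kernel.shape[1], rho)
    numerator = np.einsum("ij,jk,ik->i", kernel, r, kernel)
    denominator = np.einsum("ij,ij->i", kernel, kernel)
    return numerator / denominator

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix, used here only as an example kernel."""
    k = np.arange(n)[:, None]
    j = np.arange(n)[None, :]
    a = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * j + 1) * k / (2 * n))
    a[0, :] /= np.sqrt(2.0)
    return a

E = energy_compaction(dct_matrix(8), rho=0.9)
print(np.round(E, 3))   # the DC basis function gives E_0 of about 6.19
```

Maximizing E_1, then E_2, and so on under (8)-(10) is the sequential criterion described next.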

This completely characterizes the LOT basis functions studied here. We have not proved that this sequential optimization yields the global optimum, or shown whether or not the resulting a_i are eigenvectors. The algorithm we used to derive the LOT basis functions is a recursive procedure in which the a_i vectors are computed sequentially. For each vector, E_i is maximized subject to (8)-(10), where these constraints can only be imposed at each iteration with respect to the k < 2N basis functions first considered. Suppose the first k - 1 basis functions have been determined previously. The constraints that must be satisfied by the kth basis function a_k are given by (8)-(11). They can be written into a vector of constraints c(a_k). The feasibility condition for a_k is

$$c(a_k) = 0. \qquad (16)$$

The objective function f(a_k) to be minimized by a_k is -E_k. Consequently, the problem of finding the LOT is formulated as a sequence of N nonlinear optimization problems of the general standard form

$$\min_x f(x) \quad \text{subject to} \quad c(x) = 0. \qquad (17)$$
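The vector c(a_k) in (16) can be assembled directly from (8)-(10). In the hedged sketch below (the helper name and the ordering of entries are ours), the kth basis function is split into x_k and y_k as in (7), and every constraint involving a_k and the k - 1 functions already found is stacked into one vector. A multiplier method built on (18) would drive this vector to zero while minimizing -E_k. The trivial kernel [I 0], which has no overlap at all, serves as a feasible test case.

```python
import numpy as np

def lot_constraints(a_k: np.ndarray, previous: np.ndarray) -> np.ndarray:
    """Constraint vector c(a_k) of (16) for the k-th LOT basis function (L = N).

    a_k has length 2N; previous holds the k - 1 functions already found, with
    shape (k - 1, 2N). Every entry is zero exactly when a_k satisfies the parts
    of (8)-(10) that involve a_k, giving 3k - 1 entries in total.
    """
    n = a_k.size // 2
    x_k, y_k = a_k[:n], a_k[n:]
    xs, ys = previous[:, :n], previous[:, n:]
    return np.concatenate([
        [x_k @ y_k],            # (8) with i = j = k
        xs @ y_k,               # (8): x_j y_k' = 0 for j < k
        ys @ x_k,               # (8): x_k y_j' = 0 for j < k
        xs @ x_k + ys @ y_k,    # (9): x_k x_j' + y_k y_j' = 0 for j < k
        [a_k @ a_k - 1.0],      # (10): a_k a_k' = 1
    ])

N = 4
A = np.hstack([np.eye(N), np.zeros((N, N))])   # feasible but non-overlapping kernel
residuals = [lot_constraints(A[k], A[:k]) for k in range(N)]
```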


as illustrated in Fig. 1. By reducing boundary mismatches, the LOT image looks smoother than the DCT-coded image and exhibits less block noise. The LOT exhibits slightly more ringing than does the DCT. The ringing generated by the LOT spreads over about half a block only, whereas ringing by the SSFT spreads over the entire image.
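The intraframe experiments quantize the DC term uniformly and the remaining coefficients with Gaussian Max quantizers [12], with bit patterns set by a log-variance allocation [13]. The sketch below shows textbook forms of both pieces under stated assumptions. The allocation b_k = b + (1/2) log2(σ_k² / geometric mean of the variances) is the classical log-variance formula and may differ in detail from [13]; rounding to integers and clipping negative allocations are omitted. The quantizer is designed by Lloyd's iteration for a unit-variance Gaussian, which converges to Max's minimum mean-square-error levels. Function names are ours.

```python
import math

def log_variance_bits(variances, avg_bits):
    """Classical log-variance rule: b_k = b + 0.5 * log2(var_k / geometric mean)."""
    logs = [math.log2(v) for v in variances]
    mean_log = sum(logs) / len(logs)
    return [avg_bits + 0.5 * (lg - mean_log) for lg in logs]

def _pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gaussian_max_quantizer(levels, iterations=500):
    """Lloyd iteration for a unit-variance Gaussian; returns (thresholds, levels)."""
    y = [-2.0 + 4.0 * (i + 0.5) / levels for i in range(levels)]   # uniform start
    for _ in range(iterations):
        # Decision thresholds lie midway between neighbouring output levels.
        t = [-math.inf] + [(y[i] + y[i + 1]) / 2 for i in range(levels - 1)] + [math.inf]
        # Each output level moves to the Gaussian centroid of its cell.
        y = [(_pdf(t[i]) - _pdf(t[i + 1])) / (_cdf(t[i + 1]) - _cdf(t[i]))
             for i in range(levels)]
    thresholds = [(y[i] + y[i + 1]) / 2 for i in range(levels - 1)]
    return thresholds, y

bits = log_variance_bits([16.0, 4.0, 1.0, 1.0], avg_bits=2.0)   # [3.25, 2.25, 1.25, 1.25]
t4, y4 = gaussian_max_quantizer(4)
```

For four levels the iteration should approach Max's tabulated values, with thresholds near 0 and ±0.9816 and outputs near ±0.4528 and ±1.510. Scaling these by a coefficient's standard deviation gives its quantizer.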
V. INTERFRAME CODING EXPERIMENTS

In order to evaluate the perceived quality of LOT-coded video images, interframe coding experiments were also performed. An original 10 s head-and-shoulders sequence of 128 x 120 pixel images was transform coded using both the DCT and the LOT, with and without motion-compensated interpolation, at data rates of 56 and 28 kbits/s [5]. The original sequence of 15 frames/s was subsampled to 7.5 frames/s prior to coding. The decoded frame sequence was then noncausally interpolated in time to 15 frames/s, with and without motion compensation. The technique used for motion estimation was that due to Hinman [13]. For purposes of the present experiment, the details of the coding are less important than the perceived difference in quality between images coded using the DCT and those coded using the LOT. In these experiments, the data rate was kept constant at either 28 or 56 kbits/s. As a result, the number of blocks being updated and the number of bits used per block varied, depending on the motion in the image.

These image sequences were used in subjective quality assessment tests laid out as three separate two-factor experiments A, B, and C, in which the two factors occur at two levels [14]. In each of the three experiments, the transform (DCT or LOT) was taken as one factor. In experiment A, the data rate was used as the other factor (motion compensation being applied at both data rates). In experiment B, a data rate of 28 kbits/s with motion compensation was tested against a data rate of 56 kbits/s without motion compensation. In experiment C, the data rate was fixed at 56 kbits/s and coding with and without motion compensation was tested. A variation of the double stimulus technique [15] was used in which two sequences are presented in close succession, but each one is graded independently on a continuous scale that varies from zero to one.
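Each of experiments A, B, and C is a two-factor, two-level design with the transform as one factor. A minimal fixed-effects analysis of variance for such a 2 x 2 layout with n scores per cell, together with the standard deviation of the mean used as the uncertainty in Table I, is sketched below with the Python standard library. The formulas are the standard textbook ones, assumed here rather than taken from [14]. Turning the F ratios into significance levels requires the F(1, 4(n - 1)) distribution, which is left out, and the scores in the example are synthetic, not data from these experiments.

```python
import math
from statistics import mean, stdev

def standard_error(scores):
    """Standard deviation of the mean of one cell's scores."""
    return stdev(scores) / math.sqrt(len(scores))

def two_factor_anova(cells):
    """Balanced two-factor, two-level ANOVA with n replicates per cell.

    cells[i][j] lists the scores at level i of the first factor and level j of
    the second. Returns the F ratios (first factor, second factor, interaction);
    each effect has 1 degree of freedom and the error term has 4(n - 1).
    """
    n = len(cells[0][0])
    grand = mean(s for row in cells for cell in row for s in cell)
    a_means = [mean(s for cell in row for s in cell) for row in cells]
    b_means = [mean(s for row in cells for s in row[j]) for j in range(2)]
    cell_means = [[mean(cell) for cell in row] for row in cells]
    ss_a = 2 * n * sum((m - grand) ** 2 for m in a_means)
    ss_b = 2 * n * sum((m - grand) ** 2 for m in b_means)
    ss_ab = n * sum((cell_means[i][j] - a_means[i] - b_means[j] + grand) ** 2
                    for i in range(2) for j in range(2))
    ss_err = sum((s - cell_means[i][j]) ** 2
                 for i in range(2) for j in range(2) for s in cells[i][j])
    ms_err = ss_err / (4 * (n - 1))
    return ss_a / ms_err, ss_b / ms_err, ss_ab / ms_err

# Synthetic scores for illustration only (rows: transform, columns: second factor).
cells = [[[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]],
         [[2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]]
f_a, f_b, f_ab = two_factor_anova(cells)   # (3.0, 12.0, 0.0) for these scores
```

A negligible interaction ratio, as reported for all three experiments, is what allows each main effect to be tested on its own.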
The double stimulus technique is described by Allnatt 1151 and was also used by Redstall and White [16]. The grading is done by putting a mark on a continuous vertical line of unit length with markers at 0.2, 0.4, 0.6, and 0.8 to indicate boundaries between bad, poor, fair, good, and excellent. Each experiment was done as a four-way comparison in such a way that each cell of each experiment was graded twice by each experimenter, allowing each video sequence to be viewed immediately before or after the two others which differed from it in one factor only. Twelve subjects with good vision, mostly students, graded the sequences for subjective quality, yielding 24 samples in each cell. Each session started with experiment C followed by experiments A , B, and C again, thus allowing for any adaptation in the initial stages. The results of the assessment tests are presented in Table I. The entry in each cell represents the mean score with the standard deviation of the mean quoted as the uncertainty. Note that when a particular combination, such as DCT, 28 kbits/s with motion compensation, occurs twice (experiments A and B), the later one shows a higher quality rating. This gradual adaptation to poor quality pictures is a well-known phenomenon and was allowed for in the analysis. An analysis of variance of the results showed that the interaction in all cases was insignificant and consistent with the significance levels quoted below. Experiment A tested data rate against transform type. The improvement in quality at 56 kbits/s over the 28 kbit/s rate was at the 0.1 percent significance level, while the LOT improve-

Fig. 2. Test images, 256 x 240 pixels. (a) Original, 8 bits/pixel. (b) Adaptive DCT coding at 0.1 bits/pixel and 23.8 dB SNR. (c) Adaptive SSFT coding at 0.1 bits/pixel and 23.4 dB SNR. (d) Adaptive LOT coding at 0.1 bits/pixel and 24.3 dB SNR.

mined separately for each category by evaluating many transform blocks. Using these spectral variances, a bitallocation table was established for each category; these tables define the number of quantization levels for each coefficient. The DC term was uniformly quantized, and the other coefficients, which are approximately zero mean, were quantized using Gaussian-Max quantizers [ 121. The log-variance bit allocation algorithm was used to establish the bit patterns [13] because this had been shown to be appropriate for Max quantizers [6], [2]. Max quantizers minimize the mean-square reconstruction error for a fixed number of bits per coefficient. In Fig. 2 an original and three coded images are represented. With all three transforms, the image was adaptively coded at a bit rate of 0.1 bit per pixel following an identical categorization and quantization strategy. The further details of these strategies [5] are not critical here for they have little effect on the differences in performance of the various transforms. The relative qualities of the coded images FEwere measured using the normalized signal-to-noise ratio

These SNRs for the full head-and-shoulders images were 23.8, 23.4, and 24.3 dB for the DCT, SSFT, and LOT, respectively. Although other SNR definitions could be used, only the relative values are of interest here. With the DCT, mismatches between adjacent blocks are visible, especially in highly structured regions of the image. The SSFT-coded image shows considerable ringing effects, but is free of blocking effects because of the infinite overlap of the SSFT basis functions. Since the SSFT window covers the entire image, quantization noise generated in one part of the image spreads everywhere. Quantization noise is especially high around sharp edges because of the low-pass effect of transform coding. This results in ringing effects around the edges and an increased background noise level for the coded image. The LOT produces less significant mismatches between adjacent blocks than does the DCT, although block boundaries are still visible. These remaining effects are dominated by the boundary discontinuities of the low-order LOT basis functions

TABLE I
RESULTS OF SUBJECTIVE ASSESSMENT TESTS ON A CONTINUOUS SCALE BETWEEN ZERO AND ONE. THE UNCERTAINTIES QUOTED ARE THE ESTIMATED STANDARD DEVIATIONS OF THE MEANS BASED ON 12 SUBJECTS AND 24 SAMPLES

                       Experiment A          Experiment B          Experiment C
Bit Rate (kbps)        28         56         28         56         56         56
Motion Comp.           Yes        Yes        Yes        No         No         Yes
Subjective Assessment
  DCT                  0.21±0.02  0.44±0.02  0.30±0.02  0.27±0.02  0.33±0.02  0.57±0.03
  LOT                  0.28±0.02  0.49±0.02  0.38±0.03  0.36±0.03  0.40±0.03  0.59±0.03

ment in quality over the DCT was at the 10 percent level, assuming insignificant interaction. Experiment B tested transform type against 28 kbits/s with motion-compensated interpolation, on the one hand, and 56 kbits/s without motion-compensated interpolation, on the other. The LOT improvement over the DCT was significant at the 1 percent level while the results for the two data rates were not significantly different. Experiment C tested motion compensation against transform type. At the 56 kbits/s bit rate the improvement due to motion-compensated interpolation was significant (0.1 percent) but there was no significant difference due to the transform type.

VI. CONCLUSIONS

The intraframe coding experiments show that the LOT simultaneously reduces blocking noise below the levels exhibited by the DCT and reduces ringing effects below the levels exhibited by the SSFT. The interframe subjective assessments of the transform-coded sequences show that quality improvements are most noticeable at low data rates and when no motion-compensated interpolation is applied. When the data rate is increased and motion-compensated interpolation is applied, i.e., when block noise is not an issue, the quality of the resulting image is no longer significantly improved by the LOT as compared to the DCT.

VII. ACKNOWLEDGMENTS

We would like to thank H. Malvar for discussion and assistance.

REFERENCES

[1] A. K. Jain, "Image data compression: A review," Proc. IEEE, vol. 69, pp. 349-389, Mar. 1981.
[2] P. A. Wintz, "Transform picture coding," Proc. IEEE, vol. 60, pp. 802-820, July 1972.
[3] A. N. Netravali and J. O. Limb, "Picture coding: A review," Proc. IEEE, vol. 68, pp. 336-406, Mar. 1980.
[4] D. E. Pearson and M. W. Whybray, "Transform coding of images using interleaved blocks," IEE Proc. Part F, vol. 131, pp. 466-472, Aug. 1984.
[5] P. M. Cassereau, "A new class of optimal unitary transforms for image processing," S.M. thesis, Dep. Elec. Eng. and Comp. Sci., Mass. Inst. of Tech., May 1985.
[6] W. K. Pratt, Digital Image Processing. New York: Wiley, 1978.
[7] M. J. D. Powell, Nonlinear Optimization. New York: Academic, 1981; and NATO Scientific Affairs Division, 1982.
[8] D. P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods (Computer Science and Applied Mathematics). New York: Academic, 1982.
[9] H. S. Malvar, "Optimal pre- and post-filtering in noisy sampled data systems," Ph.D. dissertation, Dep. Elec. Eng. and Comp. Sci., Mass. Inst. of Tech., Aug. 1986.
[10] B. L. Hinman, J. G. Bernstein, and D. H. Staelin, "Short-space Fourier transform image processing," in Proc. IEEE ICASSP, San Diego, CA, 1984, pp. 4.8.1-4.8.4.
[11] N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine transform," IEEE Trans. Comput., vol. C-23, pp. 90-93, 1974.
[12] J. Max, "Quantizing for minimum distortion," IRE Trans. Inform. Theory, vol. IT-6, pp. 7-12, 1960.
[13] W. C. Wong and R. Steele, "Adaptive discrete cosine transformation of pictures using an energy distribution logarithmic model," Radio Electron. Eng., vol. 51, pp. 571-578, Nov. 1981.
[14] K. D. C. Stoodley, T. Lewis, and C. L. S. Stainton, Applied Statistical Techniques. Chichester: Ellis Horwood Limited, 1980.
[15] J. Allnatt, Transmitted-Picture Assessment. New York: Wiley-Interscience, 1983.
[16] M. W. Redstall and T. A. White, "Subjective quality of a 70 Mbit/s digital codec for colour television," IEE Proc. Part F, vol. 130, pp. 477-483, Oct. 1983.