
JPEG2000 Image Adaptation for MPEG-21

Digital Items

Yiqun Hu, Liang-Tien Chia, and Deepu Rajan

Center for Multimedia and Network Technology


School of Computer Engineering
Nanyang Technological University, Singapore 639798
{p030070, asltchia, asdrajan}@ntu.edu.sg

Abstract. MPEG-21 use cases describe a scenario of Universal Multimedia
Access that is becoming reality: people use different devices such as
desktop PCs, personal digital assistants and smartphones to access
multimedia information. Viewing images on mobile devices is more popular
than ever. However, due to screen-size limitations, the experience of
viewing a large image on a small-screen device is awkward. In this paper,
an enhanced JPEG2000 image adaptation system is proposed for MPEG-21
digital item adaptation. The image is adapted considering both the visually
attentive region(s) of the image and the terminal screen size. Subjective
testing has shown the system to be an efficient solution for displaying
large images on different devices.

Keywords: MPEG-21 Digital Item Adaptation, JPEG2000, Image Adaptation

1 Introduction
The MPEG-21 multimedia framework aims to provide universal multimedia access
and a consistent experience for users with different devices. The most critical
limitation of a terminal device is its screen size. Viewing images, especially
large ones, on a small device is awkward: many inconvenient scrolling operations
are required to view a large image at its original size on a small screen.
Conversely, if the image is directly down-scaled to the screen size, users
cannot see its details clearly. An ideal solution is to make the best use of the
screen by cropping only the region that attracts human visual attention and
fitting it to the screen size. The new JPEG2000 image compression standard
provides flexible scalability for transmission as well as adaptation; with this
scalability, image-related applications such as scalable coding and progressive
transmission become efficient. MPEG-21 Part 7, Digital Item Adaptation [1],
describes a standardized framework to adapt format-dependent and
format-independent multimedia resources according to terminal capability. These
two factors motivate our work on JPEG2000 image adaptation using standard
MPEG-21 digital item adaptation. In this paper, visual attention analysis is
integrated into the

K. Aizawa, Y. Nakamura, and S. Satoh (Eds.): PCM 2004, LNCS 3331, pp. 470–477, 2004.

c Springer-Verlag Berlin Heidelberg 2004

MPEG-21 digital item adaptation framework for JPEG2000 image adaptation.


Using our image adaptation system, different devices display different views
of the same JPEG2000 image while preserving the most attentive information. The
whole adaptation is performed at the JPEG2000 bitstream level and is transparent
to users.
The rest of this paper is organized as follows. In Section 2 we briefly
review related work on visual attention and introduce the attentive-region
extraction procedure. In Section 3, the JPEG2000 adaptation engine using the
enhanced visual attention model and MPEG-21 digital item adaptation is
introduced. Experimental evaluation is given in Section 4 and we conclude the
paper in Section 5.

2 Visual Attention Model


Generally speaking, when humans view an image, some particular regions
attract visual attention more than others. This visual attention mechanism
is useful for displaying large images on devices with different screen
sizes. Through cropping and downsizing operations, given the attentive-region
information, a solution can be achieved that both reduces the number of awkward
scrolling operations and preserves the important information. In this section,
we discuss techniques for attentive image-region detection and our
attentive-region extraction method.

2.1 Review of Visual Attention Model


The selective attention mechanism of the human visual system has been studied
and applied in the active vision literature. Designing a computational model of
visual attention is the key challenge in simulating human vision. Several visual
attention models have been proposed, based on different assumptions about
saliency. Itti and Koch [2] [3] proposed a computational model to compute a
saliency map for images, using contrast as the cue for visual attention. In
their method, pyramid techniques are used to compute feature maps for three
low-level features: color, intensity and orientation. For each feature, saliency
is measured
by the cross-scale contrast between a center scale and a surround scale, as
in (1):

C_{i,j} = F_c(i, j) ⊖ F_s(i, j)    (1)

where C_{i,j} is the saliency at location (i, j), F_x(i, j) is the low-level
feature (color, intensity or orientation) at location (i, j) of scale x, and ⊖
denotes the across-scale difference between the center scale c and the surround
scale s. Finally, the saliency map is generated by combining all feature maps.
To shift among different salient points, an "inhibition of return" mechanism is
applied iteratively.
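As an illustration, the cross-scale contrast of (1) can be sketched with a simple block-average pyramid. This is a simplified sketch, not the reference implementation of [2]: the pyramid filter, the nearest-neighbour upsampling and the absolute-difference combination are all our assumptions, and image dimensions are assumed divisible by 2^levels.

```python
import numpy as np

def downsample(img: np.ndarray) -> np.ndarray:
    """Halve the resolution by 2x2 block averaging (stand-in for pyramid filtering)."""
    h, w = img.shape
    return img[:h // 2 * 2, :w // 2 * 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def cross_scale_contrast(feature: np.ndarray, levels: int = 3) -> np.ndarray:
    """Saliency in the spirit of (1): accumulate |F_c - F_s| over several
    surround scales, each upsampled back to the center-scale resolution."""
    base = feature.astype(float)
    saliency = np.zeros_like(base)
    surround = base
    for _ in range(levels):
        surround = downsample(surround)
        factor = base.shape[0] // surround.shape[0]
        # nearest-neighbour upsampling of the coarse (surround) map
        up = np.kron(surround, np.ones((factor, factor)))[:base.shape[0], :base.shape[1]]
        saliency += np.abs(base - up)
    return saliency
```

A bright patch on a dark background then accumulates high cross-scale contrast, while uniform areas accumulate little.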
Using the same contrast cue, Ma et al. [4] proposed another attention model
that considers only color contrast. They divide the image into small perception
units and calculate the color contrast of each unit using (2):

C_{i,j} = Σ_{q ∈ Θ} d(p_{i,j}, q)    (2)

where Θ is the neighborhood of (i, j), whose size controls the sensitivity of
the perceptive field, p_{i,j} and q denote the color features of the center unit
and of a neighboring unit, respectively, and d is the Gaussian distance between
colors in LUV space.
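A minimal sketch of (2), assuming pixel-sized perception units and substituting a plain Euclidean distance in LUV space for the Gaussian distance of [4]; the function name and the neighbourhood radius are our assumptions.

```python
import numpy as np

def color_contrast_map(luv: np.ndarray, radius: int = 1) -> np.ndarray:
    """Saliency as in (2): for each unit (i, j), sum the distances
    d(p_ij, q) over all units q in its neighbourhood Theta.
    `luv` is an H x W x 3 array of (approximate) LUV colour features."""
    h, w, _ = luv.shape
    contrast = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            i0, i1 = max(0, i - radius), min(h, i + radius + 1)
            j0, j1 = max(0, j - radius), min(w, j + radius + 1)
            neigh = luv[i0:i1, j0:j1].reshape(-1, 3)
            # Euclidean stand-in for the Gaussian colour distance of [4]
            contrast[i, j] = np.linalg.norm(neigh - luv[i, j], axis=1).sum()
    return contrast
```

Units whose colour differs strongly from their neighbourhood receive a high contrast value.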
Another saliency measure, proposed by Ferraro et al. [5], is based on the
information loss along a fine-to-coarse scale space. They measure saliency
using the density of entropy production (3), i.e. the loss of information at a
given pixel per unit of scale:
σ = ( ‖∇f(x, y, t)‖ / f(x, y, t) )²    (3)

where ∇ is the gradient operator, (x, y) are the spatial coordinates and t is
the scale parameter. Chen et al. [6] also proposed a semantic attention model
combining visual
attention, face attention and text attention; applications using this attention
model, such as image navigation [7] and thumbnail cropping [8], have also been
proposed.
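For reference, the entropy-production density of (3) can be approximated numerically at a single scale; the finite-difference gradient and the fixed scale t are simplifying assumptions of this sketch, and f must be strictly positive.

```python
import numpy as np

def entropy_production_density(f: np.ndarray) -> np.ndarray:
    """Saliency as in (3): sigma = (|grad f| / f)^2 at one fixed scale t.
    Finite differences approximate the gradient; f must be positive."""
    gy, gx = np.gradient(f.astype(float))   # derivatives along rows and columns
    grad_mag = np.sqrt(gx**2 + gy**2)
    return (grad_mag / f)**2
```

Flat regions yield zero entropy production, while edges yield positive values.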
In our system, the objective is to provide a general solution for efficiently
displaying images on devices with different screen sizes. Hence we currently use
Itti's model [2] because of its generality.

2.2 Attentive Region Extraction


It is assumed that small objects at the edges of an image are unlikely to be
the main attention region, and that attention regions closer to the center of
the image are perceptually more important in human vision. We therefore assign
a weight to each pixel of the image. Without additional restrictions, we assume
that the surface of the image weights follows a Gaussian distribution along both
the horizontal and vertical directions ((4), (5)), and that the total weight is
the arithmetic mean of the two directions:
N(µ_x, σ_x²) = (1 / (√(2π) σ_x)) exp[ −(1/2) ((x − µ_x) / σ_x)² ]    (4)

N(µ_y, σ_y²) = (1 / (√(2π) σ_y)) exp[ −(1/2) ((y − µ_y) / σ_y)² ]    (5)
Both Gaussian curves are centered at the center point of the image by setting
µ_x to half the width (Width / 2) and µ_y to half the height (Height / 2).
σ_x and σ_y are fixed at 10 so that the Gaussian curves are smooth, avoiding a
sharp peak that would consider only a small central region of the image. These
weights are used to modify the saliency map as in (6):

S̄_{x,y} = S_{x,y} · ( N(µ_x, σ_x²) + N(µ_y, σ_y²) ) / 2    (6)
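The weighting of (4)-(6) can be sketched as follows; the vectorized form and the function name are our assumptions, while the parameters follow the text (means at the image center, σ fixed at 10):

```python
import numpy as np

def center_weighted_saliency(smap: np.ndarray, sigma: float = 10.0) -> np.ndarray:
    """Apply (4)-(6): weight the saliency map by the arithmetic mean of two
    1-D Gaussians centred at Width/2 and Height/2, suppressing edge noise."""
    h, w = smap.shape
    x = np.arange(w)
    y = np.arange(h)
    nx = np.exp(-0.5 * ((x - w / 2) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    ny = np.exp(-0.5 * ((y - h / 2) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)
    weight = (nx[None, :] + ny[:, None]) / 2.0   # arithmetic mean, as in (6)
    return smap * weight
```

On a uniform saliency map, the weighted value at the center exceeds that at the corners, which is exactly the edge-noise suppression the text describes.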
S̄_{x,y} is the weighted value of the saliency map at location (x, y). By
weighting the saliency map according to position in the image, tiny attention
points at the edges are skipped and the focus is kept on the most important
attention region. Our experimental results show that this simple factor has a
good noise-reduction effect. The modified saliency map thus assigns each point
a value that reflects both its saliency and its position.

Fig. 1. Same image displayed on Desktop PC and PDA

In
our image adaptation model, a simple region-growing algorithm, with a similarity
threshold defined as 30% of the gray-level range of the saliency map, is used to
generate the smallest bounding rectangle that includes the identified attention
area(s). First, we take the pixel(s) with the maximum value as seeds and execute
the region-growing algorithm. In each growing step, the 4-neighbour points are
examined; if the difference between a point and the current seed is smaller than
the threshold, the point is added to the seed queue and grown later. The
algorithm continues until the seed queue is empty. Finally, the output is one or
several separate regions, and we generate the smallest rectangle that includes
them.
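The extraction step described above can be sketched as a breadth-first region growing; the function name and bookkeeping are ours, while the maximum-valued seeds, the 4-neighbour test, the 30% threshold and the smallest bounding rectangle follow the text:

```python
import numpy as np
from collections import deque

def attentive_bounding_box(smap: np.ndarray, ratio: float = 0.30):
    """Grow regions from the maximum-valued pixel(s) of the saliency map,
    adding 4-neighbours whose value differs from the current seed by less
    than `ratio` of the gray-level range, then return the smallest bounding
    rectangle (top, left, bottom, right) of the grown region(s)."""
    h, w = smap.shape
    thresh = ratio * (smap.max() - smap.min())
    seeds = deque(map(tuple, np.argwhere(smap == smap.max())))
    grown = set(seeds)
    while seeds:
        i, j = seeds.popleft()
        for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)):
            if (0 <= ni < h and 0 <= nj < w and (ni, nj) not in grown
                    and abs(smap[ni, nj] - smap[i, j]) < thresh):
                grown.add((ni, nj))
                seeds.append((ni, nj))
    rows = [p[0] for p in grown]
    cols = [p[1] for p in grown]
    return min(rows), min(cols), max(rows), max(cols)
```

On a saliency map with one bright blob, the returned rectangle tightly encloses the blob.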

3 JPEG2000 Image Adaptation Engine

Unlike existing image adaptation engines, the proposed image adaptation server
provides transparent resolution adaptation for JPEG2000 images in a standard
way: MPEG-21 Digital Item Adaptation [9]. The JPEG2000 image bitstream and its
Bitstream Syntax Description (BSD) [10] compose the digital item. The BSD
describes the high-level structure of the JPEG2000 bitstream, and adaptation is
performed on the bitstream according to the BSD. The adapted image is generated
directly from the JPEG2000 bitstream according to both the attentive-region
information and the terminal screen size. In our system, accessing an image
through different devices yields different views of the original image, each of
which delivers the best experience within the limited screen space. Figure 1
shows the view when accessing an image through a desktop PC as well as through
a PDA. Only the most attentive information is displayed on the small screen,
avoiding excessive down-scaling of the image or additional scrolling operations.
Our standard image adaptation engine automatically detects the visually
attentive region(s) and adapts the JPEG2000 image at the bitstream level using
the standard digital item adaptation mechanism, which distinguishes it from
other similar work. The advantage of

our intelligent resolution adaptation engine is to preserve, as much as possible,


the most attentive (important) information of the original image while satisfying
terminal screen constraints.
The engine utilizes the Structured Scalable Meta-formats (SSM) for Fully
Content Agnostic Adaptation [11], proposed as an MPEG-21 reference software
module by HP Research Labs. The SSM module adapts the resolution of JPEG2000
images according to their ROIs and the terminal screen constraints of the
viewers. The BSD description of the JPEG2000 image is generated by the BSDL
module [10]. The attentive region is automatically detected using our enhanced
visual attention model, and the adaptation operation is decided dynamically by
considering both the attentive region and the terminal screen-size constraint.
We change the resolution of a JPEG2000 image by directly adapting the JPEG2000
bitstream in the compressed domain. The whole adaptation procedure is as
follows. The BSD description and the attentive-region information are combined
with the image itself into a digital item. When the user requests the image, the
terminal constraint is sent to the server as a context description (XDI). Then,
combining the XDI, the BSD description and the attentive-region information, the
Adaptation Decision-Taking Engine decides on the adaptation process for the
image [11]. Finally, the newly adapted image and its corresponding BSD
description are generated by the BSD Resource Adaptation Engine [10]. The
description can be updated to support multi-step adaptation. A snapshot of BSD
digital item adaptation is shown in Figure 2.


Fig. 2. Example of Digital Item BSD Adaptation; (a) Adaptation Decision Description;
(b) JPEG2000 BSD Adaptation (Green - Original BSD, Blue - Adapted BSD).

The intelligent attentive-region adaptation is decided according to the
relationship between the image size (Isize), the attentive-region size (ARsize)
and the terminal screen size (Csize).
– If Csize > Isize : No adaptation, the original image is sent to the user directly.
– If ARsize < Csize < Isize : Crop the attentive region according to the result
of visual attention analysis, removing non-attention areas.

Fig. 3. Example of good intelligent adaptation; (a) Original Image; (b) Saliency Map;
(c) Adapted Image on PDA; (d) Directly down-scaling Image on PDA

– If Csize < ARsize : Crop the attentive region first, then reduce the region's
resolution to the terminal screen size (a further adaptation step can be
performed by the adaptation engine).
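The three rules above can be sketched as a small decision function; treating "smaller than" as fitting in both width and height is our assumption, since the text compares sizes only abstractly:

```python
from typing import Tuple

Size = Tuple[int, int]  # (width, height)

def decide_adaptation(image: Size, region: Size, screen: Size) -> str:
    """Decision rules of Section 3, comparing image size, attentive-region
    size and terminal screen size dimension by dimension."""
    def fits(inner: Size, outer: Size) -> bool:
        return inner[0] <= outer[0] and inner[1] <= outer[1]

    if fits(image, screen):
        return "none"          # Csize >= Isize: send the original image
    if fits(region, screen):
        return "crop"          # ARsize <= Csize < Isize: crop to the region
    return "crop+scale"        # Csize < ARsize: crop, then scale to screen
```

For example, a 1600x1200 image with a 300x200 attentive region requested by a 320x240 PDA would be cropped, while the same image with a 640x480 region would be cropped and then down-scaled.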

4 Experimental Evaluation
600 images selected from different categories of the standard Corel Photo
Library form the data set. The system was implemented as a web server
application; users can view the images through a desktop PC as well as a PDA.
Several output examples of our intelligent visual-attention-based adaptation
are shown in Figure 3 and Figure 4. Notice that the most interesting
information is preserved on the small screen to achieve a better user
experience. Compared with directly downsizing the image, this provides a better
way of viewing images on small devices. Due to the subjectivity of visual
attention, we applied the


Fig. 4. Example of bad and failed intelligent adaptation; (a) Original Image; (b)
Saliency Map; (c) Cropped Image.

Table 1. User Study Evaluation - percentage of images in each category

Category  Failed  Bad    Acceptable  Medium  Good
Animal    0.02    0.09   0.22        0.33    0.34
People    0.01    0.11   0.22        0.30    0.36
Scenery   0.03    0.13   0.22        0.40    0.22
Others    0.01    0.10   0.26        0.41    0.22
Average   0.017   0.108  0.23        0.38    0.29

user study experiment of [4] to test the effectiveness of the proposed
algorithm. Eight human subjects were invited to assign a score to each output
of our adaptation across the four categories, grading the adapted images from
1 (failed) to 5 (good).
From the evaluation results shown in Table 1, we found that across the
different categories of images an average of close to 87% of cases are at least
acceptable, including 67% that are better than acceptable. Only about 10% are
bad and under 2% failed. Results are bad mainly because the whole visual object
is not included in the cropped image (e.g. the legs of an animal), and the
failures are due either to the wrong visual object being identified as the
attention region, or to images such as scenery shots where there may be no
specific visual object. The framework works reasonably well for a general set
of natural images. All 8 testers agreed that visual-attention-based adaptation
improves the experience of viewing images on small devices.

5 Conclusion
In this paper, we designed a JPEG2000 image adaptation engine for efficiently
displaying images on different devices. The engine intelligently analyzes the
visually attentive region(s) of an image and provides different views of the
image for different devices, making the best use of the terminal screen to
present the most interesting information. The advantages of this engine over
others are its automatic detection of attentive regions and, because it uses
the standard MPEG-21 digital item adaptation mechanism as well as the JPEG2000
format, its interoperability and extensibility. A larger image test set and
more extensive subjective tests will be conducted in the future to validate its
efficiency.

References
1. Vetro, A., Timmerer, C.: ISO/IEC 21000-7 FCD - Part 7: Digital Item Adaptation.
In: ISO/IEC JTC 1/SC 29/WG 11/N5845 (2003)
2. Itti, L., Koch, C., Niebur, E.: A model of saliency-based visual attention for rapid
scene analysis. IEEE Trans. on Pattern Analysis and Machine Intelligence 20 (1998)
3. Itti, L., Koch, C.: A comparison of feature combination strategies for saliency-based
visual attention systems. In: Proc. SPIE Human Vision and Electronic Imaging
IV (HVEI’99), San Jose, CA. Volume 3644. (1999) 473–482
4. Ma, Y., Zhang, H.: Contrast-based image attention analysis by using fuzzy growing.
In: Proc. ACM Multimedia, Berkeley, CA USA (2003)
5. Ferraro, M., Boccignone, G., Caelli, T.: On the representation of image structures
via scale space entropy conditions. IEEE Trans. on Pattern Analysis and Machine
Intelligence 21 (1999)
6. Chen, L., Xie, X., Fan, X., Ma, W., Zhang, H., Zhou, H.: A visual attention model
for adapting images on small displays. ACM Multimedia Systems Journal (2003)
7. Liu, H., Xie, X., Ma, W.Y., Zhang, H.J.: Automatic browsing of large pictures on
mobile devices. In: Proceedings of the eleventh ACM international conference on
Multimedia. (2003) 148–155
8. Suh, B., Ling, H., Bederson, B.B., Jacobs, D.W.: Automatic thumbnail cropping
and its effectiveness. In: Proceedings of ACM symposium on user interface software
and technology, Vancouver, Canada (2003)
9. Bormans, J., Hill, K.: MPEG-21 Overview v.5. In: ISO/IEC JTC1/SC29/WG11/
N5231 (2002)
10. Panis, G., Hutter, A., Heuer, J., Hellwagner, H., Kosch, H., Timmerer, C., Dev-
illers, S., Amielh, M.: Bitstream syntax description: a tool for multimedia resource
adaptation within MPEG-21. Signal Processing: Image Communication, EURASIP
18 (2003)
11. Mukherjee, D., Kuo, G., Liu, S., Beretta, G.: Motivation and use cases for
decision-wise BSDLink, and a proposal for usage environment descriptor -
AdaptationQoS linking. In: ISO/IEC JTC 1/SC 29/WG 11, Hewlett Packard
Laboratories (2003)
