Digital Items
1 Introduction
The MPEG-21 multimedia framework aims to provide universal multimedia access
and experience for users with different devices. The most significant limitation
of a terminal device is its screen size. Viewing images, especially large ones,
on a small device is awkward. Viewing a large image at its original size on a
small screen requires many inconvenient scrolling operations. Conversely, if the
image is directly down-scaled to the screen size, users cannot see its content
clearly. The ideal solution is to make the best use of the screen by cropping
only the region that attracts human visual attention and fitting it to the
screen size. The new JPEG2000 image compression standard provides flexible
scalability for transmission as well as adaptation. This scalability makes
image-related applications such as scalable coding and progressive transmission
more efficient. MPEG-21 Part 7, Digital Item Adaptation [1], describes a
standardized framework for adapting format-dependent and format-independent
multimedia resources according to terminal capability. These two factors
motivate our work on JPEG2000 image adaptation using standard MPEG-21 digital
item adaptation. In this paper, visual attention analysis is integrated into the
K. Aizawa, Y. Nakamura, and S. Satoh (Eds.): PCM 2004, LNCS 3331, pp. 470–477, 2004.
© Springer-Verlag Berlin Heidelberg 2004
JPEG2000 Image Adaptation for MPEG-21 Digital Items 471
where Θ is the neighborhood of (i, j), whose size controls the sensitivity of
the perceive field. p_{i,j} and q denote the color features of the neighboring
pixel and the center pixel. d is the Gaussian distance of color in LUV space.
Another saliency measure, proposed by Ferraro et al. [5], is based on the
information loss along the fine-to-coarse scale space. They measure saliency
using the density of entropy production (3), which is the loss of information
at a given pixel per unit scale.
\[ \sigma = \left( \frac{\nabla f(x, y, t)}{f(x, y, t)} \right)^{2} \tag{3} \]
where ∇ is the gradient operator, (x, y) are the spatial coordinates and t is
the scale parameter. Chen et al. [6] also proposed a semantic attention model
combining visual attention, face attention and text attention. Different
applications using this attention model have also been proposed, such as image
navigation [7] and thumbnail cropping [8].
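As an illustration of measure (3), the following minimal sketch evaluates the density of entropy production on one slice of the scale space. It assumes that the input grid already holds f(x, y, t) for a fixed scale t, that all values are strictly positive, and that the square in (3) is the squared gradient magnitude. The function name and the finite-difference scheme are illustrative choices, not taken from [5].

```python
def entropy_production_density(f):
    """Per-pixel sigma = |grad f|^2 / f^2 for one scale-space slice.

    `f` is a list of rows holding strictly positive values (assumption).
    Gradients use central differences inside the grid and one-sided
    differences on the border.
    """
    rows, cols = len(f), len(f[0])
    sigma = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            x0, x1 = max(x - 1, 0), min(x + 1, cols - 1)
            y0, y1 = max(y - 1, 0), min(y + 1, rows - 1)
            # Horizontal and vertical derivatives (zero on a 1-pixel axis).
            fx = (f[y][x1] - f[y][x0]) / (x1 - x0) if x1 > x0 else 0.0
            fy = (f[y1][x] - f[y0][x]) / (y1 - y0) if y1 > y0 else 0.0
            sigma[y][x] = (fx * fx + fy * fy) / (f[y][x] ** 2)
    return sigma


flat = [[5.0, 5.0, 5.0] for _ in range(3)]
ramp = [[1.0, 2.0, 3.0] for _ in range(3)]
flat_sigma = entropy_production_density(flat)
ramp_sigma = entropy_production_density(ramp)
```

A constant image produces zero everywhere, while a horizontal ramp produces larger values where the intensity is low relative to its gradient.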
In our system, the objective is to provide a general solution for efficiently
displaying images on devices with different screen sizes. Hence we currently use
Itti’s model [2] because of its generality.
now assign different value for each point according to their topology attention.
In our image adaptation model, a simple region growing algorithm, whose
similarity threshold is defined as 30% of the gray-level range in the saliency
map, is used to generate the smallest bounding rectangle that includes the
identified attention area(s). First, we take the pixels with the maximum value
(one or multiple) as the seeds and execute the region growing algorithm. In each
growing step, the 4-neighbour points are examined; if the difference between a
point and the current seed is smaller than the threshold, the point is added to
the seed queue and grown later. The algorithm continues until the seed queue is
empty. Finally, the output is one or several separate regions, and we generate
the smallest rectangle that includes these regions.
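The region growing step described above can be sketched as follows. This is a minimal illustration, assuming the saliency map is a non-empty 2D list of gray-level values. The names grow_attention_regions and bounding_rectangle are hypothetical and do not come from the original system.

```python
from collections import deque


def grow_attention_regions(saliency, ratio=0.30):
    """Grow regions from the maximum-valued pixels of a saliency map.

    A 4-neighbour is accepted when its value differs from the pixel
    currently being grown by less than `ratio` times the gray-level range.
    Returns the set of (row, col) pixels collected.
    """
    rows, cols = len(saliency), len(saliency[0])
    values = [v for row in saliency for v in row]
    lo, hi = min(values), max(values)
    threshold = ratio * (hi - lo)
    # Seeds: every pixel that holds the maximum saliency value.
    queue = deque((r, c) for r in range(rows) for c in range(cols)
                  if saliency[r][c] == hi)
    region = set(queue)
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if abs(saliency[nr][nc] - saliency[r][c]) < threshold:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region


def bounding_rectangle(region):
    """Smallest rectangle (top, left, bottom, right) covering all pixels."""
    rs = [r for r, _ in region]
    cs = [c for _, c in region]
    return min(rs), min(cs), max(rs), max(cs)


saliency_map = [
    [0, 0, 0, 0, 0],
    [0, 200, 255, 0, 0],
    [0, 210, 240, 0, 0],
    [0, 0, 0, 0, 0],
]
attention = grow_attention_regions(saliency_map)
rect = bounding_rectangle(attention)
```

Because each candidate is compared with the pixel currently being grown rather than with the original seed, a region can drift along smooth gradients. This follows the description in the text.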
Unlike current image engines, the proposed image adaptation server provides
transparent resolution adaptation for JPEG2000 images in a standard way:
MPEG-21 Digital Item Adaptation [9]. The JPEG2000 image bitstream and its
Bitstream Syntax Description (BSD) [10] compose the digital item. The BSD
describes the high-level structure of the JPEG2000 bitstream, and adaptation is
performed on the bitstream according to the BSD. The adapted image is generated
directly from the JPEG2000 bitstream according to both the attentive region
information and the terminal screen size. In our system, accessing an image
through different devices yields different views of the original image, each of
which delivers the best experience using the limited screen space. Figure 1
shows the view of an image accessed through a desktop PC as well as the view
through a PDA. We can see that only the most attentive information is displayed
on the small screen, which avoids over down-scaling the image or additional
scrolling operations. Our standard image adaptation engine automatically
detects the visual attentive region(s) and adapts the JPEG2000 image at the
bitstream level using the standard digital item adaptation mechanism, which
differentiates it from other similar work. The advantage of
474 Y. Hu, L.-T. Chia, and D. Rajan
Fig. 2. Example of Digital Item BSD Adaptation; (a) Adaptation Decision Description;
(b) JPEG2000 BSD Adaptation (Green - Original BSD, Blue - Adapted BSD).
Fig. 3. Example of good intelligent adaptation; (a) Original Image; (b) Saliency Map;
(c) Adapted Image on PDA; (d) Directly Down-scaled Image on PDA
– If Csize < ARsize: Crop the attentive region first, then reduce the region
resolution to the terminal screen size. (Another adaptation can be performed by
the adaptation engine.)
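The case above can be sketched as follows. The sketch assumes that Csize denotes the terminal screen size and ARsize the size of the attentive region, that "smaller" means the region does not fit in width or height, and that resolution is reduced by discarding JPEG2000 resolution levels, each of which halves both dimensions (rounded up). The function name and return format are hypothetical, and the number of levels actually available in a given codestream is not checked.

```python
def plan_resolution_reduction(screen, region):
    """Count the resolution levels to discard so the cropped region fits.

    `screen` and `region` are (width, height) tuples in pixels.
    Returns (levels_to_discard, (reduced_width, reduced_height)).
    """
    sw, sh = screen
    rw, rh = region
    if min(sw, sh, rw, rh) < 1:
        raise ValueError("all dimensions must be at least 1 pixel")
    levels = 0
    w, h = rw, rh
    while w > sw or h > sh:
        levels += 1
        # Each discarded level halves the dimensions, rounding up.
        w = -(-rw // (1 << levels))
        h = -(-rh // (1 << levels))
    return levels, (w, h)


pda_screen = (240, 320)
large_region = plan_resolution_reduction(pda_screen, (800, 600))
small_region = plan_resolution_reduction(pda_screen, (200, 150))
```

When the attentive region already fits on the screen, no level is discarded and the cropped region is delivered at full resolution.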
4 Experiment Evaluations
We selected 600 images from different categories of the standard Corel Photo
Library as the data set. The system was implemented as a web server
application. Users can view the images through a desktop PC as well as a PDA.
Several output examples of our visual attention based adaptation are shown in
Figure 3 and Figure 4. Notice that the most interesting information is preserved
on the small screen to give a better user experience. Compared with directly
downsizing the image, this provides a better solution for viewing images on
small devices. Due to the subjectivity of visual attention, we applied the
Fig. 4. Example of bad and failed intelligent adaptation; (a) Original Image; (b)
Saliency Map; (c) Cropped Image.
user study experiment in [4] to test the effectiveness of the proposed algorithm.
Eight human subjects were invited to assign a score to each output of our
adaptation for 4 different topics. The users were asked to grade the adapted
images from 1 (failed) to 5 (good).
From the evaluation results shown in Table 1, we found that across the different
categories of images, on average close to 87% of cases are acceptable, including
67% that are better than acceptable. Only 10% are bad and 1% failed. Results
are bad mainly because the whole visual object is not included in the cropped
image (e.g., the legs of an animal). The 1% failure rate is due either to the
wrong visual object being identified as the attention region or to images such
as scenery shots, where there may be no specific visual object. The framework
works reasonably well for a general set of natural images. All 8 testers agreed
that visual attention based adaptation improves the experience of viewing images
on small devices.
5 Conclusion
In this paper, we design a JPEG2000 image adaptation engine for efficiently
displaying images on different devices. The engine intelligently analyzes the
visual attentive region of an image and provides different views of the image
for different devices, making the best use of the terminal screen to present the
most interesting information. The advantages of this engine over others are its
capability of automatically detecting attentive regions and its use of standard
MPEG-21 digital item
References
1. Vetro, A., Timmerer, C.: ISO/IEC 21000-7 FCD - Part 7: Digital Item Adaptation. In:
ISO/IEC JTC 1/SC 29/WG 11/N5845. (2003)
2. Itti, L., Koch, C., Niebur, E.: A model of saliency based visual attention for rapid
scene analysis. IEEE Trans. on Pattern Analysis and Machine Intelligence 20 (1998)
3. Itti, L., Koch, C.: A comparison of feature combination strategies for saliency-based
visual attention systems. In: Proc. SPIE Human Vision and Electronic Imaging
IV (HVEI’99), San Jose, CA. Volume 3644. (1999) 473–482
4. Ma, Y., Zhang, H.: Contrast-based image attention analysis by using fuzzy growing.
In: Proc. ACM Multimedia, Berkeley, CA USA (2003)
5. Ferraro, M., Boccignone, G., Caelli, T.: On the representation of image structures
via scale space entropy conditions. IEEE Trans. on Pattern Analysis and Machine
Intelligence 21 (1999)
6. Chen, L., Xie, X., Fan, X., Ma, W., Zhang, H., Zhou, H.: A visual attention model
for adapting images on small displays. ACM Multimedia Systems Journal (2003)
7. Liu, H., Xie, X., Ma, W.Y., Zhang, H.J.: Automatic browsing of large pictures on
mobile devices. In: Proceedings of the eleventh ACM international conference on
Multimedia. (2003) 148–155
8. Suh, B., Ling, H., Bederson, B.B., Jacobs, D.W.: Automatic thumbnail cropping
and its effectiveness. In: Proceedings of ACM symposium on user interface software
and technology, Vancouver, Canada (2003)
9. Bormans, J., Hill, K.: MPEG-21 overview v.5. In: ISO/IEC JTC1/SC29/WG11/
N5231. (2002)
10. Panis, G., Hutter, A., Heuer, J., Hellwagner, H., Kosch, H., Timmerer, C.,
Devillers, S., Amielh, M.: Bitstream syntax description: a tool for multimedia
resource adaptation within MPEG-21. Signal Processing: Image Communication,
EURASIP 18 (2003)
11. Mukherjee, D., Kuo, G., Liu, S., Beretta, G.: Motivation and use cases
for decision-wise BSDLink, and a proposal for usage environment
descriptor-AdaptationQoS linking. In: ISO/IEC JTC 1/SC 29/WG 11, Hewlett Packard
Laboratories. (2003)