
Robust Subspace Analysis for Detecting Visual Attention Regions in Images

Yiqun Hu, Deepu Rajan and Liang-Tien Chia

Center for Multimedia and Network Technology
School of Computer Engineering
Nanyang Technological University, Singapore 639798
{y030070, asdrajan, asltchia}

ABSTRACT
Detecting visually attentive regions of an image is a challenging but useful problem in many multimedia applications. In this paper, we describe a method to extract visually attentive regions in images using subspace estimation and analysis techniques. The image is represented in a 2D space using a polar transformation of its features so that each region in the image lies in a 1D linear subspace. A new subspace estimation algorithm based on Generalized Principal Component Analysis (GPCA) is proposed. The robustness of subspace estimation is improved by using weighted least square approximation, where the weights are calculated from the distribution of K nearest neighbors to reduce the sensitivity to outliers. A new region attention measure is then defined to calculate the visual attention of each region by considering both feature contrast and geometric properties of the regions. Experiments show that the method is effective and overcomes the scale dependency of other methods. Compared with existing visual attention detection methods, it directly measures the global visual contrast at the region level, as opposed to pixel level contrast, and can correctly extract the attentive region.

Categories and Subject Descriptors
I.2.10 [Artificial Intelligence]: Vision and Scene Understanding - Perceptual reasoning; I.5.2 [Pattern Recognition]: Design Methodology - Pattern analysis

General Terms
Algorithms

Keywords
Subspace Analysis, GPCA, Visual Attention

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MM’05, November 6–11, 2005, Singapore.
Copyright 2005 ACM 1-59593-044-2/05/0011 ...$5.00.

1. INTRODUCTION
The bottom line for “efficient” transmission of multimedia content lies in its fast and perceptually pleasing delivery. The visual components of multimedia data in the form of images and videos are more prone to user dissatisfaction and to network induced distortions than text, speech and audio. One possible remedy is to detect regions of interest in images and to process them to suit specific operational constraints. For instance, the JPEG2000 image compression standard encodes the region of interest in more detail than the background. In image browsing applications, a user could be provided with coarser versions initially and a feedback mechanism could enable parts of the image (i.e., regions of interest) to be presented at a higher resolution. A variety of display devices with limited screen sizes would require appropriate regions of interest of images to be displayed.

Visual attention is a mechanism of the human visual system to focus on certain parts of a scene first, before attention is drawn to the other parts. Such areas that capture primary attention are called visual attention regions (VARs). For multimedia applications, the VARs should then indeed be the regions of interest, and an automatic process of extracting the VARs becomes necessary. Identification of VARs has been shown to be useful for object recognition [1, 2, 3] and region based image retrieval [4, 5]. Similarly, images can be adapted for different users with different device capabilities based on the VARs extracted from the image, thus enhancing viewing pleasure. Examples of such adaptation include automatic browsing for large images [6], image resolution adaptation [7, 8] and automatic thumbnail generation [9]. Models for capturing VARs have been proposed in [10] and [11]. In [10], Itti et al. constructed pyramids of image features (intensity, colour and orientation) and used center-surround differences to calculate contrast. Various combination strategies were proposed to combine the features in order to identify the VAR [12]. Another model proposed by Ma and Zhang [11] relies on the HSV color space, and the contrast is calculated as the difference of features between the center block and the spatial neighborhood blocks. Compared with [10], this model is computationally efficient if color is the right feature to detect the visual attention region. Another visual attention model used in [4, 13] measures competitive novelty to estimate the attention of each pixel. The model estimates the mismatch of the neighboring configurations with other randomly selected neighbourhoods to increase or decrease the attention value iteratively.

Although the above methods have yielded interesting results, they do not address the following issues:

• Global attention: The mechanism of capturing visual attention should be perceived as a global process necessitating a global approach rather than considering local contrast calculated as spatial difference in features among pixels or blocks.

• Scalability: The visual attention extraction algorithm should be scale invariant. In the existing methods [10, 11], a priori knowledge of the levels in a pyramid or the size of the blocks implies that the scale is fixed.

• Region extraction: The distribution of contrast at pixel level or block level, as represented in a saliency map, does not indicate similar difference in feature values (e.g. intensity) and hence does not contain information about regions.

In [10] and [11], attentive points are foveae distributed at different locations. However, it is not trivial to relate such a distribution of attentive points to image regions. As for scalability, [10] uses a set of predefined scales that may not be optimal for other images with attentive objects of different size, while in [11], blocks of size 8 × 8 are chosen empirically. Hence, we see that the use of spatial neighborhood configurations to extract attention regions could fail (as shown later in the experiments) since (a) it is hard to establish correspondence between fovea locations and image regions and (b) segmentation of the saliency map only groups together pixels with similar contrasts without considering region information.

In this paper, we try to overcome the above problems by proposing a visual attention region detection algorithm based on subspace estimation and analysis. First, the image is mapped onto a 2-D space through a polar transformation so that possible visual attention regions are forced onto linear subspaces. Generalized Principal Component Analysis (GPCA) [14, 15] is then used to estimate the linear subspaces corresponding to VARs without actually segmenting the data. In order to handle noise in the transformed space, we embed a constraint on the distribution of K nearest neighbors within the GPCA framework. This extension improves the robustness of the proposed method to noisy data and results in a more accurate estimation of the subspaces. We call the new subspace estimation method NN-GPCA. The attentive region is then determined by a new attention measure that considers feature contrasts as well as geometric properties of regions. We show that the proposed visual attention detection method can solve the issue of scalability and determine attentive regions from a global rather than local perspective. Not only are inter-region contrasts examined, but intra-region similarities are also involved to explore visual attention at the region level. Notice that detection of VARs is different from conventional image segmentation in that the former results in only those regions which are visually significant, while the latter simply partitions an image into numerous homogeneous regions (e.g. in intensity).

The rest of the paper is organized as follows. In section 2, the simple polar transformation of features is described and illustrated. The proposed NN-GPCA linear subspace estimation algorithm is described in section 3. The application of NN-GPCA to extract VARs in images using a proposed visual attention measure, Cumulative Projection (CP), is described in section 4. In section 5, four experiments are designed to evaluate the proposed method. Finally, conclusions and discussions related to future work are presented in section 6.

2. POLAR TRANSFORMATION OF
In order to apply subspace analysis for visual attention region detection, we need a transformation that maps image regions onto linear subspaces. We consider a simple polar transformation of a feature value at one location into a point denoted by $(\theta, r)$. The angle $\theta$ is given by

$$\theta(f_i) = \frac{f_i - \min_j(f_j)}{\max_j(f_j) - \min_j(f_j)} \times \frac{\pi}{2} \tag{1}$$

where $f_i$ is the feature value at pixel location $i$ and $\theta$ is restricted to $[0, \pi/2]$. The radius $r$ is the Euclidean distance of a pixel from the center pixel of the image. It is interesting to note that this simple transformation satisfies two conditions that ensure the correspondence of an image region to a subspace [16]. They are

• Subspace Constraint: We require that each homogeneous region in an image corresponds to one and only one subspace (linear or nonlinear). The angle $\theta$ in the polar transformation ensures the mapping of the feature values of one region onto a line in 2D space.

• Cluster Constraint: In addition to the features lying on a subspace, they should also reside in clusters within the subspace. Thus, data not belonging to a cluster are considered as outliers. The radius $r$ in the polar transformation forces clusters to be formed within the subspace.

Figure 1: (a) Synthetic monochrome image and (b) polar transformation of its intensity

Figure 1 shows a synthetic monochrome image and the polar transformation of its intensity feature. The two linear subspaces corresponding to the two regions are evident. Figure 2 shows a noisy synthetic color image and the polar transformation of its color (hue+saturation). The four distinct regions are mapped into four linear subspaces, although other subspaces are also formed due to noise. However, note that the sizes of the clusters formed by data and noise are significantly different, so they can be distinguished easily. Hence, the true regions can be detected while noise is filtered out by imposing the subspace and cluster constraints.
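The polar transformation of equation (1) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `polar_transform`, the choice of the geometric grid center as the image center, and the handling of a constant feature map are all assumptions made here.

```python
import numpy as np

def polar_transform(feature_map):
    """Map a 2D feature map to (theta, r) and to points in the 2D plane."""
    f = np.asarray(feature_map, dtype=float)
    span = f.max() - f.min()
    # Equation (1): rescale feature values to angles in [0, pi/2].
    # Assumption: a constant map (span == 0) is sent to theta = 0.
    if span == 0:
        theta = np.zeros_like(f)
    else:
        theta = (f - f.min()) / span * (np.pi / 2)
    rows, cols = f.shape
    yy, xx = np.mgrid[0:rows, 0:cols]
    # Assumption: the image center is the geometric center of the pixel grid.
    cy, cx = (rows - 1) / 2.0, (cols - 1) / 2.0
    r = np.hypot(yy - cy, xx - cx)
    # Cartesian coordinates of the transformed points, one row per location.
    points = np.stack([(r * np.cos(theta)).ravel(),
                       (r * np.sin(theta)).ravel()], axis=1)
    return theta, r, points

# Two homogeneous regions map onto two lines through the origin.
image = np.zeros((4, 4))
image[:, 2:] = 1.0
theta, r, points = polar_transform(image)
```

In this toy image the left half maps onto the horizontal axis and the right half onto the vertical axis, which is the subspace constraint in its simplest form; the radius spreads the points of each region along its line.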

Note also that the sizes of the clusters indicate the sizes of the corresponding regions.

Figure 2: (a) Synthetic color image and (b) polar transformation of its hue

3. NN-GPCA SUBSPACE ESTIMATION
Having transformed the image regions into a subspace representation, the objective is to estimate the subspaces. This involves determining the number of subspaces and their orientations in the presence of noise. In doing so, we do not rely on any segmentation algorithms. Vidal et al. [14, 15] have proposed an algebraic geometric approach called Generalized Principal Component Analysis (GPCA) for subspace modeling. As we will show later, and for reasons elaborated in the following sub-section, the performance of GPCA degrades with noise. The GPCA algorithm is made more robust to outliers by combining it with the distribution of K nearest neighbors $g_{knn}$, which yields a weighted estimation of the subspaces so that outliers are weighted less than the inliers. We design a new NN-GPCA method which extends GPCA by using $g_{knn}$ as weight coefficients for weighted least square estimation of subspaces to fit both the subspace constraint and the cluster constraint. This enables us to distinguish those linear subspaces that contain one or more big clusters from a set of noisy data. In this section, we first give a brief review of the GPCA algorithm, including its limitations, followed by the description of the proposed NN-GPCA algorithm.

3.1 Review of GPCA
GPCA is an algebraic geometric approach to the problem of estimating a mixture of linear subspaces from sample data points. It utilizes the fact that each data point $x \in \mathbb{R}^K$ satisfies $b_i^T x = \sum_{j=1}^{K} b_{ij} x_j = 0$, where $b_i$ is the normal vector of the subspace it belongs to. Since every sample point lies on one of the subspaces, the following equation about a homogeneous polynomial of degree $n$ on $x$ with real coefficients holds

$$p_n(x) = \prod_{i=1}^{n} (b_i^T x) = 0 \tag{2}$$

where $n$ is the number of subspaces and $\{b_i\}_{i=1}^{n}$ are the normal vectors of the subspaces. The problem of estimating subspaces is to solve this nonlinear system for all $\{b_i\}_{i=1}^{n}$. It can be converted to a linear expression by expanding the product of all $b_i^T x$ and viewing all monomials $x^{\mathbf{n}} = x_1^{n_1} x_2^{n_2} \cdots x_K^{n_K}$ of degree $n$ as system unknowns. Using the definition of the Veronese map [14, 15] $v_n : [x_1, \ldots, x_K]^T \rightarrow [\ldots, x^{\mathbf{n}}, \ldots]^T$, equation (2) becomes the following linear expression:

$$p_n(x) = v_n(x)^T c = \sum c_{n_1 \ldots n_K}\, x_1^{n_1} \cdots x_K^{n_K} = 0, \tag{3}$$

where $c_{\mathbf{n}} \in \mathbb{R}$ represents the coefficient of the monomial $x^{\mathbf{n}}$. Given a collection of $N$ sample points $\{x^j\}_{j=1}^{N}$, a linear system can be generated as

$$L_n c \doteq \begin{bmatrix} v_n(x^1)^T \\ v_n(x^2)^T \\ \vdots \\ v_n(x^N)^T \end{bmatrix} c = 0 \tag{4}$$

to solve for all normal vectors $\{b_i\}_{i=1}^{n}$ to the subspaces.

In the absence of noise, the number of subspaces can be estimated from the requirement that there is a unique solution for $c$. The normal vectors $\{b_i\}_{i=1}^{n}$ can be solved for once $c$ is determined. However, in the presence of noise, the subspace estimation problem is cast as a constrained nonlinear optimization problem which is initialized using the normal vectors obtained as above. Further details of the GPCA algorithm are available in [14, 15].

While the GPCA algorithm provides an elegant solution to the problem of linear subspace estimation without segmenting data, there are some inherent limitations that we discuss next.

1. Subspace number estimation: GPCA relies on a predefined threshold to combine the subspaces that are very close to each other. This threshold does not have a meaningful interpretation and there is no easy way to decide its value.

2. Effect of outliers: Each data point is either true data or noise that appears as an outlier in the subspace representation. GPCA applies least square error approximation on the entire data including outliers. This could lead to an erroneous estimation of the subspaces.

3. Approximation bias: Since the objective function in the optimization problem consists of the sum of approximation errors, for a fixed number of subspaces, the estimation will be biased towards those subspaces that are more populated.

We illustrate the failure of the GPCA algorithm using synthetic data containing outliers in Figure 3, where the red lines represent the estimated subspaces. The data lie on four linear subspaces shown in Figure 3 (a), of which two subspaces contain true data and the other two contain outliers. In the absence of outliers, as shown in Figure 3 (b), the GPCA estimation performs very well. However, the initial estimate of the subspaces shown in Figure 3 (c) and the final estimate using nonlinear optimization shown in Figure 3 (d) are erroneous when noisy data is also taken into account. We propose to overcome these drawbacks by weighting the data points using a K nearest neighbor distribution.

3.2 Assigning weights to data points
A subspace clustering method using the K-th nearest neighbor distance (kNND) metric is shown to detect and remove outliers and small data clusters in [16]. The kNND metric uses the fact that in a cluster larger than K points, the kNND for a data point will be small; otherwise it will be large. According to the polar transformation of features, the true data lie not in just any cluster but in a cluster inside their subspace. Instead of using kNND, we utilize the distribution of

all K nearest neighbors, denoted as $g_{knn}$, to differentiate inliers and outliers. In this paper, we assign a weight to each data point $x^i$ calculated from $g_{knn}(x^i)$. This provides a simple method to reduce the sensitivity to outliers without segmentation of data. The weight is related to the probability of a data point lying on a subspace corresponding to an image region. Given a sample data point $x^i$, its K nearest neighbors are detected, and the variance $svar(g_{knn}(x^i))$ along the direction of the subspace of $x^i$ (from the origin to the current point) and the variance $nvar(g_{knn}(x^i))$ along the orthogonal direction are calculated using Principal Component Analysis (PCA). The sum $S(g_{knn}(x^i)) = svar(g_{knn}(x^i)) + nvar(g_{knn}(x^i))$ corresponds to the characteristic variance of the K nearest neighbors of the current point. It will be small if these K neighbors form a cluster; otherwise it will be large. Since only the clusters inside the subspace are true data, we use the ratio $R(g_{knn}(x^i)) = nvar(g_{knn}(x^i)) / svar(g_{knn}(x^i))$ as the factor to bias the weights to those clusters in the subspace that correspond to true data. Hence, the weight for $x^i$ is calculated as

$$W(x^i) = \frac{1}{1 + S(g_{knn}(x^i)) \times R(g_{knn}(x^i))} \tag{5}$$

When the data point $x^i$ lies in a cluster larger than K inside a subspace, $nvar(g_{knn}(x^i))$ is 0 in the absence of noise and the ratio $R(g_{knn}(x^i))$ is very small in the presence of noise. The sum $S(g_{knn}(x^i))$ is also small because these K data points form a cluster. So $W(x^i)$ is equal to or close to 1. Otherwise, $R(g_{knn}(x^i))$ and/or $S(g_{knn}(x^i))$ are large and $W(x^i)$ is small or even close to zero. Additionally, the parameter K decides the minimum size of a cluster that is considered as true data. Since the outliers are always far away from the true data, any small value of K can differentiate outliers from true data, and this value can be fixed for all cases. Hence the weight of each data point relates to the probability that it is inside the cluster of a specific subspace corresponding to one image region. Thus, the analysis of $g_{knn}(x^i)$ assigns small weights to both outliers and small clusters to reduce their effect on subspace estimation.

Figure 3: Effect of outliers on subspace estimation by GPCA (a) Synthetic data (b) estimation using true data only (without outliers); (c) initial estimate used to determine (d) final estimate after optimization

3.3 The NN-GPCA algorithm
The weights obtained from the analysis of the distribution $g_{knn}$ of K nearest neighbors are used in the GPCA algorithm to improve the robustness and accuracy of subspace estimation. By taking the weight of each data point $x^i$ into account, the linear system of equations (4) is modified as

$$W L_n c \doteq \begin{bmatrix} W(x^1) & & & \\ & W(x^2) & & \\ & & \ddots & \\ & & & W(x^N) \end{bmatrix} \begin{bmatrix} v_n(x^1)^T \\ v_n(x^2)^T \\ \vdots \\ v_n(x^N)^T \end{bmatrix} c = 0 \tag{6}$$

where $W(x^i)$ is the weight of $x^i$.

In order to estimate the number of subspaces, we first modulate $L_n$ using the weights $W(x^i)$ for each $x^i$ as

$$\tilde{v}_n(x^i) = \bar{v}_n + W(x^i)\,(v_n(x^i) - \bar{v}_n) \tag{7}$$

where $\bar{v}_n$ is the mean of the data. If $x$ is an outlier, its small weight causes it to be pulled closer to the mean of the data. Next, we do a Singular Value Decomposition (SVD) on $W L_n$ and eliminate the outliers using a very weak threshold. We emphasize that the choice of threshold in this case is not crucial since the weights allow less dependency on the threshold. This is unlike the case for GPCA, where the presence of outliers may cause the number of dominant singular values to be large.

The subspace estimation problem is formulated as an approximation problem using the weighted least square technique for the purpose of calculating the coefficient vector $c$. Since $c$ can be obtained only up to a scale factor, we normalize it by the first component $c_1$. Thus the left side of equation (6) becomes

$$W \begin{bmatrix} v_n(x^1)(2..M)^T \\ v_n(x^2)(2..M)^T \\ \vdots \\ v_n(x^N)(2..M)^T \end{bmatrix} \begin{bmatrix} c_2 \\ c_3 \\ \vdots \\ c_M \end{bmatrix} + \begin{bmatrix} W(x^1)\, v_n(x^1)(1)\, c_1 \\ W(x^2)\, v_n(x^2)(1)\, c_1 \\ \vdots \\ W(x^N)\, v_n(x^N)(1)\, c_1 \end{bmatrix} \tag{8}$$

where $v_n(x^i)(2..M)$ represents a vector containing all components of $v_n(x^i)$ except for the first component $v_n(x^i)(1)$. With $c_1 = 1$, equation (6) can now be rewritten as

$$W \begin{bmatrix} v_n(x^1)(2..M)^T \\ v_n(x^2)(2..M)^T \\ \vdots \\ v_n(x^N)(2..M)^T \end{bmatrix} \begin{bmatrix} c_2 \\ c_3 \\ \vdots \\ c_M \end{bmatrix} = \begin{bmatrix} -W(x^1)\, v_n(x^1)(1) \\ -W(x^2)\, v_n(x^2)(1) \\ \vdots \\ -W(x^N)\, v_n(x^N)(1) \end{bmatrix} \tag{9}$$

The above equation can be succinctly written as $W A c = d$, where $A$ is the matrix whose rows are $v_n(x^i)(2..M)^T$, $i = 1, 2, \ldots, N$, and $d$ is the right side of equation (9). By minimizing the objective function $\|d - Ac\|_W$, we can obtain the weighted least square approximation of $c_i$, $i = 1, 2, \ldots, M$, as

$$c_1 = 1 \quad \text{and} \quad [c_2, \ldots, c_M]^T = (A^T W^T W A)^{-1} (A^T W^T W d) \tag{10}$$

The estimation error of coefficient vector $c$ is reduced by

the diagonal matrix of weight coefficients $W$. Through $W$, the contribution of the outliers to the system is reduced by small weights. The normal vectors $\{b_i\}_{i=1}^{n}$ are calculated from $c$ using the same method as in GPCA [14, 15]. These vectors serve to initialize the following constrained nonlinear optimization, which differs from [14, 15] in the weight matrix:

$$\min \sum_{j=1}^{N} W(x^j)\, \|\tilde{x}^j - x^j\|^2 \quad \text{subject to} \quad \prod_{i=1}^{n} (b_i^T \tilde{x}^j) = 0, \quad j = 1, \ldots, N. \tag{11}$$

By using Lagrange multipliers $\lambda^j$ for each constraint, the above optimization problem is equivalent to minimizing the function

$$\sum_{j=1}^{N} \Big( W(x^j)\, \|\tilde{x}^j - x^j\|^2 + \lambda^j \prod_{i=1}^{n} (b_i^T \tilde{x}^j) \Big). \tag{12}$$

Taking partial derivatives w.r.t. $\tilde{x}^j$ and equating them to 0, we can solve for $\lambda^j/2$ and $W(x^j)\|\tilde{x}^j - x^j\|^2$. By substituting them into the objective function (11), the simplified objective function on the normal vectors can be derived as

$$E_n(b_1, \ldots, b_n) = \sum_{j=1}^{N} \frac{W(x^j)\,\big(n \prod_{i=1}^{n} (b_i^T x^j)\big)^2}{\big\| \sum_{i=1}^{n} b_i \prod_{l \neq i} (b_l^T x^j) \big\|^2}. \tag{13}$$

We found the convergence of equation (13) to be very slow. Hence, a weighted k-means iteration method was used to determine the optimized $\{b_i\}_{i=1}^{n}$. The weighted data points are assigned to the nearest subspace and the updated subspaces are estimated. This process is continued till there is no change in the subspaces. This method of optimizing the $b_i$'s achieves the same performance as equation (13) but at a faster convergence rate. We illustrate the improvement in subspace estimation using NN-GPCA on the synthetic data used in Figure 3. The initial estimation calculated from the weighted linear system and the final optimized estimation are shown as green lines in Figure 4 (a) and (b), respectively. Comparing with GPCA, we note that the effect of the two subspaces due to outliers on the initial estimation is reduced. Subsequently, the optimization process results in the correct estimation of the two subspaces that satisfy the cluster constraint while ignoring the outlier subspaces. Figure 5 shows the result of subspace estimation on real data. Here, we transform hue information into the polar 2-D space, resulting in a noisy representation in the subspaces. However, the proposed NN-GPCA method can correctly estimate the three main subspaces (three red lines) corresponding to three image regions with different hue. All outliers not lying in a cluster are ignored because of their small weights. Notice that multiple objects with similar visual appearance at different locations will be mapped to a single subspace. Hence, the proposed approach can detect similar regions simultaneously and assign the same amount of attention to them using the attention measure introduced later.

Figure 4: Subspace estimation using NN-GPCA (a) initial estimate used to determine (b) final estimate after optimization (Compare with Figure 3).

Figure 5: NN-GPCA on Natural Image (a) original image; (b) estimation result of three subspaces

3.4 Computational Complexity
We derive the computational complexity of the NN-GPCA algorithm for an image that contains $n$ subspaces and which is divided into $N$ blocks of 8 × 8 pixels each, where $N$ is much smaller than the total number of pixels.

Computation of weights
Estimation of the K nearest neighbors requires complexity of $O(N^2)$ without any optimization. Since K is much smaller than $N$, the subsequent process of calculating weights using SVD on the K nearest neighbors is of much lower order and hence can be ignored. Therefore, the complexity of this step is $O(N^2)$.

Subspace estimation
This process consists of (i) estimating the number of subspaces ($n$), for which the complexity is $O(N)$ since $n$ is a small number; (ii) solving for the $n + 1$ coefficients $c_i$ from a linear system of $N$ linear equations in $n + 1$ unknowns. Without any optimization, the complexity is $O(n^2 \cdot N)$ without noise and $O(n \cdot N^2)$ in the presence of noise; (iii) solving for $\{b_i\}_{i=1}^{n}$: in the case of 2D, it is only required to find the $n$ roots of a polynomial equation of degree $n$ in one unknown, and its complexity can be ignored; (iv) optimization using weighted k-means, which has complexity $O(N \cdot \log(N))$, lower than $O(N^2)$. So the total complexity is $O(N + n \cdot N^2 + N \cdot \log(N)) = O(n \cdot N^2)$, which is the same as the complexity of GPCA.

From the above analysis, we can see that the computational complexity of NN-GPCA is $O(N^2 + n \cdot N^2) = O(n \cdot N^2)$ which, again, is the same as the complexity of GPCA. We conclude that the NN-GPCA extension improves the robustness of subspace estimation in the presence of outliers without increasing the computational complexity. Since dividing the image into small blocks makes the number of data points $N$ much smaller than the number of pixels, and the number of subspaces $n$ is also small, the algorithm with complexity $O(n \cdot N^2)$ is quite efficient. When applying it to multiple features, the complexity of NN-GPCA only increases linearly with the number of features.
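The weighting of Section 3.2 and the weighted linear solve of Section 3.3 can be sketched as follows for 2D data. This is an illustrative sketch under stated assumptions, not the authors' code. The function names are hypothetical. The neighborhood of a point is taken to be the point itself plus its K nearest neighbors, and the two variances are obtained by direct projection onto the radial direction and its orthogonal rather than by PCA. A small `eps` guards the division in the ratio. The coefficient solve uses `numpy.linalg.lstsq` on the system of equation (9) instead of the closed form of equation (10). The brute-force neighbor search is the unoptimized $O(N^2)$ step discussed in Section 3.4.

```python
import numpy as np

def knn_weights(points, k=15, eps=1e-12):
    """Weights in the spirit of equation (5), one per 2D data point."""
    pts = np.asarray(points, dtype=float)
    # Brute-force pairwise squared distances, O(N^2) time and memory.
    d2 = ((pts[:, None, :] - pts[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)              # a point is not its own neighbor
    nn_idx = np.argsort(d2, axis=1)[:, :k]
    weights = np.empty(len(pts))
    for i, p in enumerate(pts):
        norm = np.linalg.norm(p)
        # Direction of the subspace of p: from the origin to the point.
        u = p / norm if norm > eps else np.array([1.0, 0.0])
        u_perp = np.array([-u[1], u[0]])
        # Assumption: the neighborhood includes the point itself.
        hood = np.vstack([p, pts[nn_idx[i]]])
        svar = np.var(hood @ u)               # variance along the subspace
        nvar = np.var(hood @ u_perp)          # variance orthogonal to it
        s = svar + nvar
        ratio = nvar / (svar + eps)
        weights[i] = 1.0 / (1.0 + s * ratio)
    return weights

def veronese_2d(points, n):
    """Degree-n Veronese embedding [x1^n, x1^(n-1) x2, ..., x2^n]."""
    x1, x2 = points[:, 0], points[:, 1]
    return np.stack([x1 ** (n - p) * x2 ** p for p in range(n + 1)], axis=1)

def weighted_coefficients(points, weights, n):
    """Least-squares solution of equation (9) with c_1 fixed to 1."""
    v = veronese_2d(np.asarray(points, dtype=float), n)
    w = np.asarray(weights, dtype=float)
    wa = w[:, None] * v[:, 1:]                # W A
    d = -w * v[:, 0]                          # right side of equation (9)
    c_rest, *_ = np.linalg.lstsq(wa, d, rcond=None)
    return np.concatenate([[1.0], c_rest])
```

Under these assumptions, a point inside a tight cluster on a line through the origin receives a weight close to 1, while an isolated point receives a clearly smaller weight, so its row contributes less to the linear system.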

Having described the NN-GPCA algorithm to estimate the subspaces of a transformed image, we now present a new attention measure to detect visual attention regions within the subspaces. Due to the polar transformation of image features, each region lies on a linear subspace and hence a measure of feature contrast between the regions is simply $|\theta_1 - \theta_2|$, where $\theta_1$ and $\theta_2$ are the angles made by the subspaces. The projection of data points onto the normal to a subspace is an indication of how visually important the corresponding region is to the human visual system and serves as a good measure for region attention because it considers both feature contrast and the geometric properties of the region. We call this measure the Cumulative Projection (CP) and define it as

$$CP(\tilde{b}_j) = \sum_{i=1}^{N} \big|(x^i)^T \tilde{b}_j\big| \Big/ \sum_{l=1}^{N} \|x^l\| \tag{14}$$

i.e., the sum of the projections of all normalized data onto the normal of a subspace.

Besides feature contrast, the CP attention measure inherits two important properties about the size and location of the region that make it consistent with the human visual system (HVS) in correctly detecting VARs. Firstly, an image consisting of a small object within a large background draws attention to the object first. Even if the differences in the feature values between object and background are not large, the CP attention value biases the attention to the smaller object in the foreground. This is because the projection of all data onto the normal of the subspace representing the small object will be larger than that on the normal of the subspace representing the background. Secondly, CP also captures the variability of attention with the location of regions within an image. Most often, attention is drawn to the center of an image first. The closer a region is to the center of the image, the higher its CP value.

We illustrate these properties through a synthetic example in Figure 6, which shows three images with (a) a small object on a large background, (b) a large object in the foreground and (c) two objects in the foreground but at different distances from the center of the image. As shown in Table 1, the smaller brighter region in Figure 6 (a) has a larger CP attention value than the background. Similarly, the attention is drawn to the edges in Figure 6 (b) first, which is reflected by the CP in Table 1. In Figure 6 (c), because the sizes of the subspaces representing the white region and the black region are the same and the contrasts between these regions and the background are identical, both would possibly capture visual attention to the same extent. However, a higher CP attention value is obtained for the white region by virtue of it being closer to the center of the image. So the white region will be focused on before the black region. Coming back to Figure 6 (b), one could argue that since the location of the darker edge region is farther from the center of the image, it should garner a lower CP value than the lighter region. But in this case, the effect of the region size overrides the location property to generate a higher CP value, which is once again consistent with the HVS. Furthermore, if we were to imagine the lighter square region to shrink so that its size is the same as the size of the background region, attention would now be drawn to this region since it is closer to the center.

Figure 6: Images to illustrate effect of region size and location on Cumulative Projection

Table 1: Cumulative Projection values
Image        Light Region    Dark Region
Image (a)    0.5772          0.0005
Image (b)    0.1549          0.4223
Image (c)    0.7071          0.5528

The Cumulative Projection measures visual attention from a global perspective by considering inter-region contrasts as well as the size and location of the regions. In the case of multiple features, we do a similar analysis on each feature so that whichever feature yields the highest CP value is automatically selected as the one which is finally used to extract VARs. In Figure 7 (a), the hue and intensity features of an image are shown. Their corresponding subspaces are shown in Figure 7 (b), in which the largest CP for the hue feature is 0.8139 and for the intensity feature is 0.5816 (both shown in green). The system automatically selects hue to detect visual attention. From the extracted attentive region, we can see that the detected region corresponds to the attentive object detected by the HVS.

Figure 7: Multiple feature analysis: (a) feature map, (b) subspace estimation and (c) extracted attention region using Top row: Hue feature and Bottom row: Intensity feature

5. EXPERIMENTAL RESULTS
We conduct several experiments to demonstrate various properties of the proposed VAR extraction mechanism. We illustrate specifically the robustness and scale invariance of the algorithm, and also compare the performance of the proposed algorithm with Itti’s model [10]. We also show the utility of the algorithm to automatically detect VARs when multiple features are used. Each data point consists of the

Intensity Hue Intensity Hue

Subspace Mask Subspace Mask Subspace Mask Subspace Mask

Intensity Win Hue Win

(a) (b)

Figure 8: Two examples of automatic feature selection

mean of the feature over an 8 × 8 block in the image. Un- 0.15 GPCA
like the partitioning of the image into blocks in [11] for the
purpose of measuring contrast among blocks, here this pro-
cess is done only to accelerate the analysis and to reduce the
effect of noise. Hence, the size of the block is not crucial. 0.1
The value of K in the subspace estimation is chosen as 15
for fixed image size of (256 × 384). The features used are
hue,intensity and the R,G,B channels. 0.05

Experiment 1
In the first experiment, we compare the proposed NN-GPCA
subspace estimation method with GPCA on synthetic data.
We randomly pick n = 2 subspaces, each containing N = 400
points, on 1-dimensional subspaces of $\mathbb{R}^2$. Zero-mean
Gaussian noise with a standard deviation of 20 is added to the
sample points. We consider six cases of outliers that are
randomly added to the data such that the number of outliers
is 0%, 2.5%, 5.0%, 7.5%, 10%, and 12.5% of the original
number of data points. For each of the six cases, the NN-GPCA
and GPCA algorithms are run 500 times. The error
between the true unit normals $\{b_i\}_{i=1}^{n}$ and their estimates
$\{\tilde{b}_i\}_{i=1}^{n}$ is computed for each run as [14, 15]

$$\text{error} = \frac{1}{n} \sum_{i=1}^{n} \cos^{-1}\left(b_i^T \tilde{b}_i\right) \qquad (15)$$

Figure 9 plots the mean error calculated over all the 500
trials as a function of the number of outliers. We can see
that when there are no outliers, the estimation errors
corresponding to NN-GPCA and GPCA are the same. As the number
of outliers increases, NN-GPCA outperforms GPCA.

Figure 9: Subspace estimation error v/s number of
outliers for NN-GPCA (Blue) and GPCA (Red) for
synthetic data.

Experiment 2
The second experiment is designed to evaluate the automatic
feature selection capability of the proposed method.
The largest CP on the subspaces formed from each feature
is the cue to select the optimal feature for attention region
extraction. In the examples shown in Figure 8, we use the
five features mentioned at the beginning of this section,
although only the cases for intensity and hue are shown for the
sake of brevity. In Figure 8 (a) the intensity feature is more
suitable for detecting the visual attention region compared
to hue, while in Figure 8 (b), it is vice versa. The automatic
feature selection capability is similar to the method of using
the convex hull of salient points in a contrast map [17].
However, instead of using heuristic knowledge as in that case, we
rely on the computational measure of cumulative projection
to choose the feature which is more robust.

Experiment 3
The third experiment compares the attention detection results
of the proposed method and the widely used visual
attention model of [10]. Figure 10 (a) shows the saliency
maps generated from [10] and (b) shows the bounding boxes
indicating visual attention regions extracted by segmenting
the saliency map. Here, the saliency map only indicates the
foci of attention. We used the methods of segmenting the
saliency map described in [2, 3], but the results were
unsatisfactory. Figure 10 (c) and (d) show, respectively, the
candidate visual attention regions and the smallest bounding
boxes that include these regions, extracted using the proposed
method. Notice that the proposed method detected the
visual attention region correctly.

Figure 10: Comparison of proposed subspace
method with [7]: (a) saliency map; (b) VAR from (a);
(c) attentive region using our proposed method; (d)
VAR region from (c)

More VAR detection results are shown in Figure 12, in
which the first row contains the original images, the second
row shows the candidate attention regions (white regions
indicate the region corresponding to the subspace with the
largest CP) and the last row shows the VAR represented
by the smallest bounding box that includes all the elements
belonging to the selected subspace. While these results are
encouraging, we do realize that the choice of features is
important in that they should reflect the homogeneity of the
region. Otherwise, even if the subspaces are correctly estimated,
the VAR extraction could fail since there would be no
correspondence between the subspace and a region. Choosing a
suitable feature may not be trivial for some images, as shown
for example in Figure 13. We show the intensity, hue and green
channel feature maps of two images as well as their
corresponding transformation graphs. In these cases, neither
intensity nor color can represent the interesting region (object)
as a subspace, so the attention region cannot be detected
whatever the performance of the subspace estimation.
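The smallest bounding box that encloses the elements assigned to a selected subspace can be computed directly from a binary mask. The following is a minimal sketch, assuming NumPy and a boolean mask in which True marks elements belonging to the selected subspace; the function name `bounding_box`, the inclusive coordinate convention, and the toy mask are hypothetical and are not taken from the paper.

```python
import numpy as np

def bounding_box(mask):
    """Return (top, left, bottom, right), inclusive, enclosing all True cells."""
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing a selected element
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing a selected element
    if rows.size == 0:
        return None  # nothing was assigned to the selected subspace
    return int(rows[0]), int(cols[0]), int(rows[-1]), int(cols[-1])

# Toy mask on a 32 x 48 grid: one compact region plus one stray element.
mask = np.zeros((32, 48), dtype=bool)
mask[5:12, 10:30] = True
mask[20, 40] = True
box = bounding_box(mask)
```

Because the box must include every selected element, a single stray element enlarges it considerably (here the box grows to cover the isolated cell at row 20, column 40). If the mask is defined on a grid of blocks rather than pixels, the coordinates would need to be scaled by the block size to obtain pixel positions.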

Experiment 4
Models for VARs proposed in the literature calculate contrast
among regions using a priori knowledge about a block
size or pyramid level. This limits the ability of such models
to detecting regions only within the prescribed scale. The
proposed method achieves scale invariance since we use the CP
attention value, which can identify regions of any size by
considering both inter-region contrast and intra-region
homogeneity. Figure 11 shows the visual attention regions
extracted from images which show flowers at different
scales. We compare our method to that proposed in [11] in
Figure 11 (ii) and observe that the results are the same for
the small scale but, as the scale increases, the performance
of [11] degrades rapidly.

Figure 11: VAR extraction across different scales:
(left) small, (center) medium and (right) large using
(i) our proposed method and (ii) method of [11]

Figure 12: More examples of visual attention region detection using the proposed method

Figure 13: Failed examples: (a, c) feature maps
of original images and (b, d) their transformation graphs

6. CONCLUSION
In this paper, we propose to solve the image attentive region
detection problem using linear subspace estimation and
analysis techniques. Through a simple polar transformation,
each image feature value is transformed to a data point
denoted by (θ, r) in a 2-D space. The value of θ ensures that all
pixels within a homogeneous region are mapped to a linear
subspace and r encodes the location of a region within the
image. We propose a new linear subspace estimation method
called the NN-GPCA method, in which the distribution of K
nearest neighbors is used to weight the data points according
to their probability of belonging to a big cluster in some
subspace corresponding to an image region. A weighted least
squares technique is used to estimate the subspaces. A new
attention measure for image regions is then proposed that
defines the cumulative projection of all data onto the normal
of the corresponding subspace. The subspace with the
largest CP is extracted to detect the attentive region. Several
experiments are performed to evaluate the performance of
the proposed method. The experimental results are promising
as they show that the proposed method can detect VARs
correctly. Future work will involve (i) choosing optimal
features to be transformed into the subspace estimation
domain, (ii) selecting a suitable transformation for the weights
as the data is embedded from low dimensional space into
the Veronese map and (iii) developing an algorithm for
estimating the optimal features (in the multiple feature case)
simultaneously rather than as a sequential process.

7. REFERENCES
[1] U. Rutishauser, D. Walther, C. Koch, and P. Perona.
Is bottom-up attention useful for object recognition?
In Proceedings of the 2004 IEEE Computer Society
Conference on Computer Vision and Pattern
Recognition, volume 2, pages 37–44, Washington, DC,
USA, July 2004.
[2] U. Rutishauser, D. Walther, C. Koch, and P. Perona.
On the usefulness of attention for object recognition.
In 2nd International Workshop on Attention and
Performance in Computational Vision 2004, pages
96–103, Prague, Czech Republic, May 2004.
[3] D. Walther, U. Rutishauser, C. Koch, and P. Perona.
Selective visual attention enables learning and
recognition of multiple objects in cluttered scenes.
Computer Vision and Image Understanding, pages
745–770, to be published 2005.
[4] A. Bamidele, F. W. Stentiford, and J. Morphett. An
attention-based approach to content based image
retrieval. British Telecommunications Advanced
Research Technology Journal on Intelligent Spaces
(Pervasive Computing), 22(3), July 2004.
[5] X.-J. Wang, W.-Y. Ma, and X. Li. Data-driven
approach for bridging the cognitive gap in image
retrieval. In Proceedings of the 2004 IEEE
International Conference on Multimedia and Expo,
volume 3, pages 2231–2234, Taipei, Taiwan, June
2004.
[6] H. Liu, X. Xie, W.-Y. Ma, and H.-J. Zhang.
Automatic browsing of large pictures on mobile
devices. In Proceedings of the eleventh ACM
international conference on Multimedia, pages
148–155, Berkeley, CA, USA, 2003. ACM Press.
[7] L. Chen, X. Xie, X. Fan, W.-Y. Ma, H.-J. Zhang, and
H. Zhou. A visual attention model for adapting
images on small displays. ACM Multimedia Systems
Journal, 9(4):353–364, November 2003.
[8] Y. Hu, L.-T. Chia, and D. Rajan. Region-of-interest
based image resolution adaptation for MPEG-21 digital
item. In Proceedings of the 12th annual ACM
international conference on Multimedia, pages
340–343, New York, NY, USA, 2004. ACM Press.
[9] B. Suh, H. Ling, B. B. Bederson, and D. W. Jacobs.
Automatic thumbnail cropping and its effectiveness.
In Proceedings of the 16th annual ACM symposium on
User interface software and technology, pages 95–104,
New York, NY, USA, November 2003. ACM Press.
[10] L. Itti, C. Koch, and E. Niebur. A model of
saliency-based visual attention for rapid scene
analysis. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 20(11):1254–1259, November
1998.
[11] Y.-F. Ma and H.-J. Zhang. Contrast-based image
attention analysis by using fuzzy growing. In
Proceedings of the eleventh ACM international
conference on Multimedia, pages 374–381, New York,
NY, USA, November 2003. ACM Press.
[12] L. Itti and C. Koch. A comparison of feature
combination strategies for saliency-based visual
attention systems. In Proceedings of SPIE Human
Vision and Electronic Imaging IV (HVEI'99), volume
3644, pages 473–482, San Jose, CA, January 1999.
[13] A. P. Bradley and F. W. Stentiford. Visual attention
for region of interest coding in JPEG2000. Journal of
Visual Communication and Image Representation,
14(3):232–250, September 2003.
[14] R. Vidal, Y. Ma, and S. Sastry. Generalized principal
component analysis (GPCA). In Proceedings of the 2003
IEEE Computer Society Conference on Computer
Vision and Pattern Recognition, volume 1, pages
621–628, Madison, Wisconsin, USA, June 2003.
[15] R. Vidal. Generalized Principal Component Analysis
(GPCA): an Algebraic Geometric Approach to
Subspace Clustering and Motion Segmentation. PhD
thesis, School of Electrical Engineering and Computer
Sciences, University of California at Berkeley, August
2003.
[16] Q. Ke and T. Kanade. Robust subspace clustering by
combined use of kNND metric and SVD algorithm. In
Proceedings of the 2004 IEEE Computer Society
Conference on Computer Vision and Pattern
Recognition, volume 2, pages 592–599, Washington,
DC, USA, July 2004.
[17] Y. Hu, X. Xie, W.-Y. Ma, L.-T. Chia, and D. Rajan.
Salient region detection using weighted feature maps
based on the human visual attention model. In
Proceedings of the Fifth IEEE Pacific-Rim Conference
on Multimedia, volume 2, pages 993–1000, Tokyo
Waterfront City, Japan, November 2004.