
Regions in Images

Center for Multimedia and Network Technology

School of Computer Engineering

Nanyang Technological University, Singapore 639798

{y030070, asdrajan, asltchia}@ntu.edu.sg

ABSTRACT

Detecting visually attentive regions of an image is a challenging but useful issue in many multimedia applications. In this paper, we describe a method to extract visual attentive regions in images using subspace estimation and analysis techniques. The image is represented in a 2D space using a polar transformation of its features so that each region in the image lies in a 1D linear subspace. A new subspace estimation algorithm based on Generalized Principal Component Analysis (GPCA) is proposed. The robustness of subspace estimation is improved by using weighted least square approximation, where weights are calculated from the distribution of K nearest neighbors to reduce the sensitivity to outliers. Then a new region attention measure is defined to calculate the visual attention of each region by considering both the feature contrast and the geometric properties of the regions. Experiments show that the method is effective and overcomes the scale dependency of other methods. Compared with existing visual attention detection methods, it directly measures global visual contrast at the region level, as opposed to pixel-level contrast, and can correctly extract the attentive region.

Categories and Subject Descriptors

I.2.10 [Artificial Intelligence]: Vision and Scene Understanding—Perceptual reasoning; I.5.2 [Pattern Recognition]: Design Methodology—Pattern analysis

General Terms

Algorithms

Keywords

Subspace Analysis, GPCA, Visual Attention

1. INTRODUCTION

The bottom line for "efficient" transmission of multimedia content lies in its fast and perceptually pleasing delivery. The visual components of multimedia data in the form of images and videos are more prone to user dissatisfaction and to network induced distortions than text, speech and audio. One possible remedy is to detect regions of interest in images and to process them to suit specific operational constraints. For instance, the JPEG2000 image compression standard encodes the region of interest in more detail than the background. In image browsing applications, a user could be provided with coarser versions initially, and a feedback mechanism could enable parts of the image (i.e., regions of interest) to be presented at a higher resolution. A variety of display devices with limited screen sizes would require appropriate regions of interest of images to be displayed.

Visual attention is a mechanism of the human visual system to focus on certain parts of a scene first, before attention is drawn to the other parts. Such areas that capture primary attention are called visual attention regions (VARs). For multimedia applications, the VARs should then indeed be the regions of interest, and an automatic process of extracting the VARs becomes necessary. Identification of VARs has been shown to be useful for object recognition [1, 2, 3] and region-based image retrieval [4, 5]. Similarly, images can be adapted for different users with different device capabilities based on the VARs extracted from the image, thus enhancing viewing pleasure. Examples of such adaptation include automatic browsing for large images [6], image resolution adaptation [7, 8] and automatic thumbnail generation [9]. Models for capturing VARs have been proposed in [10] and [11]. In [10], Itti et al. constructed pyramids of image features (intensity, colour and orientation) and used center-surround differences to calculate contrast. Various combination strategies were proposed to combine the features in order to identify the VAR [12]. Another model proposed by Ma and Zhang [11] relies on the HSV color space, and the contrast is calculated as the difference of features between the center block and the spatial neighborhood blocks. Compared with [10], this model is computationally efficient if color is the right feature to detect the visual attention region. Another visual attention model used in [4, 13] measures competitive novelty to estimate the attention of each pixel. The model estimates the mismatch of the neighboring configurations with other randomly selected neighbourhoods to increase or decrease the attention value iteratively.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
MM'05, November 6–11, 2005, Singapore.
Copyright 2005 ACM 1-59593-044-2/05/0011 ...$5.00.
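For intuition, the center-surround contrast that [10] computes on feature pyramids can be illustrated with a minimal single-scale sketch: the contrast at a pixel is the absolute difference between the feature averaged over a small (center) window and a larger (surround) window. This is a hedged illustration under our own conventions (box filters instead of Gaussian pyramids; the function names and window sizes are ours), not Itti et al.'s implementation:

```python
import numpy as np

def box_mean(img, k):
    """Mean over a k x k box at every pixel (k odd), computed via a
    padded 2D cumulative sum; borders use edge replication."""
    pad = k // 2
    p = np.pad(img.astype(float), pad, mode="edge")
    c = np.pad(p.cumsum(0).cumsum(1), ((1, 0), (1, 0)))
    h, w = img.shape
    return (c[k:k + h, k:k + w] - c[:h, k:k + w]
            - c[k:k + h, :w] + c[:h, :w]) / (k * k)

def center_surround(img, center=3, surround=9):
    """Crude contrast map: |center-scale mean - surround-scale mean|."""
    return np.abs(box_mean(img, center) - box_mean(img, surround))
```

On an image containing a small bright square, the map peaks around the square and vanishes in flat areas; the full model of [10] repeats this across scales and features and then combines the resulting maps.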

Although the above methods have yielded interesting results, they do not address the following issues:

• Global attention: The mechanism of capturing visual attention should be perceived as a global process, necessitating a global approach rather than considering local contrast calculated as spatial differences in features among pixels or blocks.

• Scalability: The visual attention extraction algorithm should be scale invariant. In the existing methods [10, 11], a priori knowledge of the levels in a pyramid or the size of the blocks implies that the scale is fixed.

• Region extraction: The distribution of contrast at pixel level or block level, as represented in a saliency map, does not indicate similar differences in feature values (e.g. intensity) and hence does not contain information about regions.

In [10] and [11], attentive points are foveas distributed at different locations. However, it is not trivial to relate such a distribution of attentive points to image regions. As for scalability, [10] uses a set of predefined scales that may not be optimal for other images with attentive objects of different sizes, while in [11], blocks of size 8 × 8 are chosen empirically. Hence, we see that the use of spatial neighborhood configurations to extract attention regions could fail (as shown later in the experiments) since (a) it is hard to establish correspondence between fovea locations and image regions and (b) segmentation of the saliency map only groups together pixels with similar contrasts without considering region information.

In this paper, we try to overcome the above problems by proposing a visual attention region detection algorithm based on subspace estimation and analysis. First, the image is mapped onto a 2-D space through a polar transformation so that possible visual attention regions are forced onto linear subspaces. Generalized Principal Component Analysis (GPCA) [14, 15] is then used to estimate the linear subspaces without segmenting these data. In order to handle noise in the transformed space, we embed the distribution of K nearest neighbors as a constraint within the GPCA framework. This extension improves the robustness of the proposed method to noisy data and results in a more accurate estimation of the subspaces. We call the new subspace estimation method the NN-GPCA method. The attentive region is then determined by a new attention measure that considers feature contrasts as well as geometric properties of regions. We show that the proposed visual attention detection method can solve the issue of scalability and determine attentive regions from a global rather than a local perspective. Not only are inter-region contrasts examined, but intra-region similarities are also involved to explore visual attention at the region level. Notice that detection of VARs is different from conventional image segmentation in that the former results in only those regions which are visually significant, while the latter simply partitions an image into numerous homogeneous regions (e.g. in intensity).

The rest of the paper is organized as follows. In section 2, the simple polar transformation of features is described and illustrated. The proposed NN-GPCA linear subspace estimation algorithm is described in section 3. The application of NN-GPCA to extract VARs in images using a proposed visual attention measure, the Cumulative Projection (CP), is described in section 4. In section 5, four experiments are designed to evaluate the proposed method. Finally, conclusions and discussions related to future work are presented in section 6.

2. POLAR TRANSFORMATION OF FEATURES

In order to apply subspace analysis for visual attention region detection, we need a transformation that maps image regions onto linear subspaces. We consider a simple polar transformation of a feature value at one location into a point denoted by (θ, r). The angle θ is given by

    θ(fi) = (fi − minj(fj)) / (maxj(fj) − minj(fj)) × π/2    (1)

where fi is the feature value at pixel location i and θ is restricted to [0, π/2]. The radius r is the Euclidean distance of a pixel from the center pixel of the image. It is interesting to note that this simple transformation satisfies two conditions that ensure the correspondence of an image region to a subspace [16]. They are:

• Subspace Constraint: We require that each homogeneous region in an image corresponds to one and only one subspace (linear or nonlinear). The angle θ in the polar transformation ensures the mapping of the feature values of one region onto a line in 2D space.

• Cluster Constraint: In addition to the features lying on a subspace, they should also reside in clusters within the subspace. Thus, data not belonging to a cluster are considered as outliers. The radius r in the polar transformation forces clusters to be formed within the subspace.

Figure 1: (a) Synthetic monochrome image and (b) polar transformation of its intensity

Figure 1 shows a synthetic monochrome image and the polar transformation of its intensity feature. The two linear subspaces corresponding to the two regions are evident. Figure 2 shows a noisy synthetic color image and the polar transformation of its color (hue + saturation). The four distinct regions are mapped into four linear subspaces, although other subspaces are also formed due to noise. However, note that the sizes of the clusters formed by data and noise are significantly different, causing them to be distinguished easily. Hence, the true regions can be detected while noise is filtered out by imposing the subspace and cluster constraints.
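Equation (1) and the radius definition can be sketched in a few lines of NumPy. This is a minimal sketch under our own conventions (the function name, the pixel-grid centering, and the returned Cartesian coordinates r·(cos θ, sin θ) are ours; a constant-feature image would need a guard against division by zero):

```python
import numpy as np

def polar_transform(feat):
    """Map each pixel to a 2D point: theta encodes the normalized
    feature value (eq. 1), so one homogeneous region falls on one
    line through the origin; r is the distance to the image center,
    which makes each region form a cluster along its line."""
    f = feat.astype(float)
    theta = (f - f.min()) / (f.max() - f.min()) * (np.pi / 2)
    h, w = f.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - (h - 1) / 2, xx - (w - 1) / 2)
    return np.stack([r * np.cos(theta), r * np.sin(theta)], axis=-1)
```

For a two-region image, the transformed points fall on the two coordinate axes, i.e. on two 1D linear subspaces, as in Figure 1 (b).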

Note also that the sizes of the clusters indicate the sizes of the corresponding regions.

Figure 2: (a) Synthetic color image and (b) polar transformation of its hue

3. NN-GPCA SUBSPACE ESTIMATION

Having transformed the image regions into a subspace representation, the objective is to estimate the subspaces. This involves determining the number of subspaces and their orientations in the presence of noise. In doing so, we do not rely on any segmentation algorithms. Vidal et al. [14, 15] have proposed an algebraic geometric approach called Generalized Principal Component Analysis (GPCA) for subspace modeling. As we will show later, and for reasons elaborated in the following sub-section, the performance of GPCA degrades with noise. The GPCA algorithm is made more robust to outliers by combining it with the distribution of K nearest neighbors, gknn, which yields a weighted estimation of the subspaces so that outliers are weighted less than the inliers. We design a new NN-GPCA method which extends GPCA by using gknn as weight coefficients for weighted least square estimation of subspaces to fit both the subspace constraint and the cluster constraint. This enables us to distinguish those linear subspaces that contain one or more big clusters from a set of noisy data. In this section, we first give a brief review of the GPCA algorithm including its limitations, followed by the description of the proposed NN-GPCA algorithm.

3.1 Review of GPCA

GPCA is an algebraic geometric approach to the problem of estimating a mixture of linear subspaces from sample data points. It utilizes the fact that each data point x ∈ R^K satisfies bi^T x = Σ_{j=1}^{K} bij xj = 0, where bi is the normal vector of the subspace it belongs to. Since every sample point lies on one of the subspaces, the following equation, a homogeneous polynomial of degree n on x with real coefficients, holds:

    pn(x) = ∏_{i=1}^{n} (bi^T x) = 0    (2)

where n is the number of subspaces and {bi}_{i=1}^{n} are the normal vectors of the subspaces. The problem of estimating the subspaces is to solve this nonlinear system for all {bi}_{i=1}^{n}. It can be converted to a linear expression by expanding the product of all bi^T x and viewing all monomials x^n = x1^{n1} x2^{n2} ... xK^{nK} of degree n as system unknowns. Using the definition of the Veronese map [14, 15], vn : [x1, ..., xK]^T → [..., x^n, ...]^T, equation (2) becomes the following linear expression:

    pn(x) = vn(x)^T c = Σ c_{n1...nK} x1^{n1} ... xK^{nK} = 0,    (3)

where cn ∈ R represents the coefficient of the monomial x^n. Given a collection of N sample points {x^j} (j = 1, ..., N), a linear system can be generated as

    Ln c = [vn(x^1)^T ; vn(x^2)^T ; ... ; vn(x^N)^T] c = 0    (4)

where c is the vector of unknown coefficients, from which the normal vectors {bi}_{i=1}^{n} to the subspaces can be recovered. In the absence of noise, the number of subspaces can be estimated from the requirement that there is a unique solution for c. The normal vectors {bi}_{i=1}^{n} can be solved once c is determined. However, in the presence of noise, the subspace estimation problem is cast as a constrained nonlinear optimization problem which is initialized using the normal vectors obtained as above. Further details of the GPCA algorithm are available in [14, 15].

While the GPCA algorithm provides an elegant solution to the problem of linear subspace estimation without segmenting data, there are some inherent limitations that we discuss next.

1. Subspace number estimation: GPCA relies on a predefined threshold to combine the subspaces that are very close to each other. This threshold does not have a meaningful interpretation and there is no easy way to decide its value.

2. Effect of outliers: Each data point is either true data or noise that appears as outliers in the subspace representation. GPCA applies least square error approximation on the entire data including outliers. This could lead to an erroneous estimation of the subspaces.

3. Approximation bias: Since the objective function in the optimization problem consists of the sum of approximation errors, for a fixed number of subspaces, the estimation will be biased towards those subspaces that are populated more.

We illustrate the failure of the GPCA algorithm using synthetic data containing outliers in Figure 3, where the red lines represent the estimated subspaces. The data lie on four linear subspaces shown in Figure 3 (a), of which two subspaces contain true data and the other two contain outliers. In the absence of outliers, as shown in Figure 3 (b), the GPCA estimation performs very well. However, the initial estimate of the subspaces shown in Figure 3 (c) and the final estimate using nonlinear optimization shown in Figure 3 (d) are erroneous when noisy data is also taken into account. We propose to overcome these drawbacks by weighting the data points using a K nearest neighbor distribution.

3.2 Assigning weights to data points

A subspace clustering method using the Kth nearest neighbor distance (kNND) metric is shown to detect and remove outliers and small data clusters in [16]. The kNND metric uses the fact that in a cluster larger than K points, the kNND for a data point will be small; otherwise it will be large. According to the polar transformation of features, the true data lies not in any cluster but in the cluster inside its subspace. Instead of using kNND, we utilize the distribution of

all K nearest neighbors, denoted as gknn, to differentiate inliers and outliers. In this paper, we assign a weight to each data point xi calculated from gknn(xi). This provides a simple method to reduce the sensitivity to outliers without segmentation of data. The weight is related to the probability of a data point lying on a subspace corresponding to an image region. Given a sample data point xi, its K nearest neighbors are detected, and the variance svar(gknn(xi)) along the direction of the subspace of xi (from the origin to the current point) and the variance nvar(gknn(xi)) along the orthogonal direction are calculated using Principal Component Analysis (PCA). The sum S(gknn(xi)) = svar(gknn(xi)) + nvar(gknn(xi)) corresponds to the characteristic variance of the K nearest neighbors of the current point. It will be small if these K neighbors form a cluster; otherwise it will be large. Since only the clusters inside the subspace are true data, we use the ratio R(gknn(xi)) = nvar(gknn(xi))/svar(gknn(xi)) as the factor to bias the weights to those clusters in the subspace that correspond to true data. Hence, the weight for xi is calculated as

    W(xi) = 1 / (1 + S(gknn(xi)) × R(gknn(xi)))    (5)

When the data point xi lies in a cluster larger than K inside a subspace, nvar(gknn(xi)) is 0 in the absence of noise, and the ratio R(gknn(xi)) is very small in the presence of noise. The sum S(gknn(xi)) is also small because these K data form a cluster. So W(xi) is equal to or close to 1. Otherwise, R(gknn(xi)) and/or S(gknn(xi)) are large and W(xi) is small or even close to zero. Additionally, the parameter K decides the minimum size of the cluster that is considered as true data. Since the outliers are always far away from the true data, any small value of K can differentiate outliers from true data, and the selection of this value can be fixed for all cases. Hence, the weight of each data point relates to the probability that it is inside the cluster of a specific subspace corresponding to one image region. Thus, the analysis of gknn(xi) assigns small weights to both outliers and small clusters to reduce their effect on subspace estimation.

Figure 3: Effect of outliers on subspace estimation by GPCA: (a) synthetic data; (b) estimation using true data only (without outliers); (c) initial estimate used to determine (d) final estimate after optimization

3.3 The NN-GPCA algorithm

The weights obtained from the analysis of the distribution gknn of K nearest neighbors are used in the GPCA algorithm to improve the robustness and accuracy of subspace estimation. By taking the weight of each data point xi into account, the linear system of equations (4) is modified as

    W Ln c = diag(W(x1), W(x2), ..., W(xN)) [vn(x1)^T ; vn(x2)^T ; ... ; vn(xN)^T] c = 0    (6)

where W(xi) is the weight of xi.

In order to estimate the number of subspaces, we first modulate Ln using the weights W(xi) for each xi as

    ṽn(xi) = v̄n + W(xi)(vn(xi) − v̄n)    (7)

where v̄n is the mean of the data. If xi is an outlier, its small weight causes it to be pulled closer to the mean of the data. Next, we do a Singular Value Decomposition (SVD) on W Ln and eliminate the outliers using a very weak threshold. We emphasize that the choice of threshold in this case is not crucial, since the weights allow less dependency on the threshold. This is unlike the case for GPCA, where the presence of outliers may cause the number of dominant singular values to be large.

The subspace estimation problem is formulated as an approximation problem using the weighted least square technique for the purpose of calculating the coefficient vector c. Since c can be obtained only up to a scale factor, we normalize it by the first component c1. Thus the left side of equation (6) becomes

    W [vn(x1)(2..M)^T ; vn(x2)(2..M)^T ; ... ; vn(xN)(2..M)^T] [c2 ; c3 ; ... ; cM] + [W(x1)vn(x1)(1)c1 ; W(x2)vn(x2)(1)c1 ; ... ; W(xN)vn(xN)(1)c1]    (8)

where vn(xi)(2..M) represents a vector containing all components of vn(xi) except for the first component vn(xi)(1). With c1 = 1, equation (6) can now be rewritten as

    W [vn(x1)(2..M)^T ; vn(x2)(2..M)^T ; ... ; vn(xN)(2..M)^T] [c2 ; c3 ; ... ; cM] = [−W(x1)vn(x1)(1) ; −W(x2)vn(x2)(1) ; ... ; −W(xN)vn(xN)(1)]    (9)

The above equation can be succinctly written as W A c = d, where A is the matrix whose rows are vn(xi)(2..M)^T, i = 1, 2, ..., N, and d is the right side of equation (9). By minimizing the objective function ||d − Ac||W, we can obtain the weighted least square approximation of the coefficients as

    c1 = 1 and [c2, ..., cM]^T = (A^T W^T W A)^{−1} (A^T W^T W d)    (10)

The estimation error of the coefficient vector c is reduced by

the diagonal matrix of weight coefficients W. Through W, the influence of outliers on the estimation is suppressed by their small weights. The normal vectors {bi}_{i=1}^{n} are calculated from the estimated coefficient vector c as in GPCA; these normal vectors serve to initialize the following constrained nonlinear optimization problem:

    min Σ_{j=1}^{N} W(x^j) ||x̃^j − x^j||^2
    subject to ∏_{i=1}^{n} (bi^T x̃^j) = 0, j = 1, ..., N.    (11)

Using a Lagrange multiplier λj for each constraint, the above optimization problem is equivalent to minimizing the function

    Σ_{j=1}^{N} ( W(x^j) ||x̃^j − x^j||^2 + λj ∏_{i=1}^{n} (bi^T x̃^j) ).    (12)

Taking partial derivatives w.r.t. x̃^j and equating them to 0, we can solve for λj/2 and W(x^j)||x̃^j − x^j||^2. By substituting them into the objective function (11), the simplified objective function on the normal vectors can be derived as

    En(b1, ..., bn) = Σ_{j=1}^{N} W(x^j) (∏_{i=1}^{n} (bi^T x^j))^2 / || Σ_{i=1}^{n} bi ∏_{l≠i} (bl^T x^j) ||^2.    (13)

We found the convergence of equation (13) to be very slow. Hence, a weighted k-means iteration method was used to determine the optimized {bi}_{i=1}^{n}. The weighted data points are assigned to the nearest subspace and the updated subspaces are estimated. This process is continued till there is no change in the subspaces. This method of optimizing the bi's achieves the same performance as equation (13) but at a faster convergence rate. We illustrate the improvement in subspace estimation using NN-GPCA on the synthetic data used in Figure 3. The initial estimation calculated from the weighted linear system and the final optimized estimation are shown as green lines in Figure 4 (a) and (b), respectively. Comparing with GPCA, we note that the effect of the two subspaces due to outliers on the initial estimation is reduced. Subsequently, the optimization process results in the correct estimation of the two subspaces that satisfy the cluster constraint while ignoring the outlier subspaces.

Figure 4: Subspace estimation using NN-GPCA: (a) initial estimate used to determine (b) final estimate after optimization (compare with Figure 3)

Figure 5 shows the result of subspace estimation on real data. Here, we transform hue information into the polar 2-D space, resulting in a noisy representation in the subspaces. However, the proposed NN-GPCA method can correctly estimate the three main subspaces (three red lines) corresponding to three image regions with different hue. All outliers not lying in a cluster are ignored because of their small weights. Notice that multiple objects with similar visual appearance at different locations will be mapped to a single subspace. Hence, the proposed approach can detect similar regions simultaneously and assign the same amount of attention to them using the attention measure introduced later.

Figure 5: NN-GPCA on a natural image: (a) original image; (b) estimation result of three subspaces

3.4 Computational Complexity

We derive the computational complexity of the NN-GPCA algorithm for an image that contains n subspaces and which is divided into N blocks of 8 × 8 pixels each, where N is much smaller than the total number of pixels.

Computation of weights: Estimation of the K nearest neighbors requires complexity of O(N^2) without any optimization. Since K is much smaller than N, the subsequent process of calculating weights using SVD on the K nearest neighbors is of much lower order and hence can be ignored. Therefore, the complexity of this step is O(N^2).

Subspace estimation: This process consists of (i) estimating the number of subspaces (n), for which the complexity is O(N) since n is a small number; (ii) solving for the coefficients ci from a linear system of N linear equations in n + 1 unknowns; without any optimization, the complexity is O(n^2 · N) without noise and O(n · N^2) in the presence of noise; (iii) solving for {bi}_{i=1}^{n}: in the 2D case, it is only required to find the n roots of a polynomial equation of degree n in one unknown, so its complexity can be ignored; (iv) optimization using weighted k-means, which has complexity O(N · log(N)), lower than O(N^2). So the total complexity of this step is O(N + n · N^2 + N · log(N)) = O(n · N^2), which is the same as the complexity of GPCA.

From the above analysis, we can see that the overall computational complexity of NN-GPCA is O(N^2 + n · N^2) = O(n · N^2) which, again, is the same as the complexity of GPCA. We conclude that our NN-GPCA extension improves the robustness of subspace estimation in the presence of outliers without increasing the computational complexity. Since dividing the image into small blocks makes the number of data points N much smaller than the number of pixels, and the number of subspaces n is also small, the algorithm with complexity O(n · N^2) is quite efficient. When applying it to multiple features, the complexity of NN-GPCA only increases linearly with the number of features.
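The two ingredients of NN-GPCA described in this section — the kNN-distribution weights of equation (5) and the weighted k-means refinement of the subspaces — can be sketched for the 2D case as follows. This is a simplified illustration under our own conventions (the number of subspaces n is assumed known, lines are represented by unit direction vectors and refitted via the dominant eigenvector of each cluster's weighted scatter, and the small regularization constants are ours), not the authors' implementation:

```python
import numpy as np

def knn_weights(X, K=8):
    """Eq. (5): for each 2D point, take its K nearest neighbors,
    measure their variance along the point's subspace direction
    (origin -> point, svar) and orthogonal to it (nvar), and set
    W = 1 / (1 + S * R) with S = svar + nvar, R = nvar / svar."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = np.empty(len(X))
    for i, x in enumerate(X):
        nbrs = X[np.argsort(D[i])[1:K + 1]]      # skip the point itself
        u = x / (np.linalg.norm(x) + 1e-12)      # subspace direction
        v = np.array([-u[1], u[0]])              # orthogonal direction
        svar, nvar = np.var(nbrs @ u), np.var(nbrs @ v)
        W[i] = 1.0 / (1.0 + (svar + nvar) * (nvar / (svar + 1e-12)))
    return W

def weighted_kmeans_lines(X, W, n, iters=50):
    """Weighted k-means over lines through the origin: assign each
    point to the nearest line, then refit every line as the dominant
    eigenvector of its cluster's weighted scatter matrix.  (The
    normal vector b_i is simply the orthogonal of each direction.)"""
    ang = (np.arange(n) + 0.5) * (np.pi / 2) / n    # spread initial lines
    B = np.stack([np.cos(ang), np.sin(ang)], axis=1)
    for _ in range(iters):
        proj = np.abs(X @ B.T)                      # |projection| per line
        resid = (np.sum(X**2, axis=1)[:, None] - proj**2).clip(0)
        lab = resid.argmin(axis=1)                  # nearest line
        for j in range(n):
            m = lab == j
            if m.any():
                scatter = (X[m] * W[m, None]).T @ X[m]
                B[j] = np.linalg.eigh(scatter)[1][:, -1]
    return B
```

On data drawn from two lines plus scattered outliers, the outliers receive weights well below 1 and barely deflect the recovered directions, mirroring the behavior illustrated in Figure 4.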

4. THE CUMULATIVE PROJECTION

Having described the NN-GPCA algorithm to estimate

the subspaces of a transformed image, we now present a new

attention measure to detect visual attention regions within

the subspaces. Due to the polar transformation of image

features, each region lies on a linear subspace and hence a

measure of feature contrast between the regions is simply

|θ1 − θ2 |, where θ1 and θ2 are the angles made by the sub- (a) (b) (c)

spaces. The projection of data points onto the normal to a

Figure 6: Images to illustrate eﬀect of region size

subspace is an indication of how visually important the cor-

and location on Cumulative Projection

responding region is to the human visual system and serves

as a good measure for region attention because it considers

both feature contrast as well as the geometry properties of

the region. We call this measure as the Cumulative Projec- Table 1: Cumulative Projection values

tion (CP) and deﬁne it as Image Light Region Dark Region

Image (a) 0.5772 0.0005

N

N

CP (b̃j ) = (|(xi )T b̃j |)/ ||xl || (14) Image (b) 0.1549 0.4223

i=1 l=1 Image (c) 0.7071 0.5528

Besides feature contrast, the CP attention measure inherits two important properties relating to the size and location of a region that make it consistent with the human visual system (HVS) and allow it to detect VARs correctly. Firstly, an image consisting of a small object within a large background draws attention to the object first. Even if the differences in feature values between the object and the background are not large, the CP attention value biases the attention toward the smaller object in the foreground. This is because the projection of all the data onto the normal of the subspace representing the small object is larger than the projection onto the normal of the subspace representing the background. Secondly, CP also captures the variability of attention with the location of regions within an image. Most often, attention is drawn to the center of an image first: the closer a region is to the center of the image, the higher its CP value.

We illustrate these properties through the synthetic images in Figure 6, whose regions differ in size, contrast and distance from the center of the image. As shown in Table 1, the small light region in Figure 6 (a) obtains a much higher attention value than the background. Similarly, the attention drawn by the large dark region in Figure 6 (b) is captured by the CP in Table 1. In Figure 6 (c), because the subspaces representing the white region and the black region are of the same size, and the contrasts between these regions and the background are identical, both could capture visual attention to the same extent. However, a higher CP attention value is obtained for the white region by virtue of it being closer to the center of the image, so the white region will be focused on before the black region. Coming back to Figure 6 (b), one could argue that since the darker edge region is farther from the center of the image, it should garner a lower CP value than the lighter region. In this case, however, the effect of region size overrides the location property to generate a higher CP value, which is again consistent with the HVS. Furthermore, if we imagine the lighter square region shrinking until its size equals that of the background region, attention would now be drawn to this region, since it is closer to the center.

The Cumulative Projection measures visual attention from a global perspective by considering inter-region contrasts as well as the size and location of the regions. In the case of multiple features, we perform a similar analysis on each feature, and whichever feature yields the highest CP value is automatically selected as the one finally used to extract VARs. In Figure 7 (a), the hue and intensity features of an image are shown. Their corresponding subspaces are shown in Figure 7 (b), in which the largest CP for the hue feature is 0.8139 and for the intensity feature is 0.5816 (both shown in green). The system automatically selects hue to detect visual attention. From the extracted attentive region, we can see that the detected region corresponds to the attentive object detected by the HVS.

Figure 7: Multiple feature analysis: (a) feature map, (b) subspace estimation and (c) extracted attention region, using the hue feature (top row) and the intensity feature (bottom row).

5. EXPERIMENTAL RESULTS
We conduct several experiments to demonstrate various properties of the proposed VAR extraction mechanism. We specifically illustrate the robustness and scale invariance of the algorithm, and compare its performance with Itti's model [10]. We also show the utility of the algorithm in automatically detecting VARs when multiple features are used.
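The selection rule for multiple features — keep the feature whose best subspace attains the largest CP — can be sketched as follows. The dictionary-based interface and the function names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def cumulative_projection(points, normal):
    """Eq. (14): normalized sum of projections onto a unit normal."""
    b = normal / np.linalg.norm(normal)
    return np.sum(np.abs(points @ b)) / np.sum(np.linalg.norm(points, axis=1))

def select_feature(feature_points, feature_normals):
    """Return (feature, normal, CP) maximizing CP over every feature
    and every estimated subspace normal of that feature.

    feature_points:  dict feature name -> (N, 2) transformed data
    feature_normals: dict feature name -> list of estimated normals
    """
    best = None
    for name, pts in feature_points.items():
        for b in feature_normals[name]:
            cp = cumulative_projection(pts, b)
            if best is None or cp > best[2]:
                best = (name, b, cp)
    return best
```

With the CP values reported for Figure 7 (0.8139 for hue versus 0.5816 for intensity), this rule would return the hue feature.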

Figure 8: Intensity and hue feature maps of two example images (a) and (b).

Each data point consists of the mean of the feature over an 8 × 8 block in the image. Unlike the partitioning of the image into blocks in [11] for the purpose of measuring contrast among blocks, here this process is done only to accelerate the analysis and to reduce the effect of noise; hence, the size of the block is not crucial. The value of K in the subspace estimation is chosen as 15 for a fixed image size of 256 × 384. The features used are hue, intensity and the R, G and B channels.
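The 8 × 8 block averaging used to build the data points might look like the following minimal sketch; the function name and the crop-to-multiple behavior are our assumptions:

```python
import numpy as np

def block_mean_features(feature_map, block=8):
    """Average a per-pixel feature map over non-overlapping
    block x block tiles, yielding one data point per tile."""
    h, w = feature_map.shape
    h8, w8 = h - h % block, w - w % block  # crop to a multiple of the block size
    tiles = feature_map[:h8, :w8].reshape(h8 // block, block, w8 // block, block)
    return tiles.mean(axis=(1, 3))         # mean over each tile

# A 256 x 384 map (the fixed image size used in the experiments)
# reduces to a 32 x 48 grid of block means.
img = np.arange(256 * 384, dtype=float).reshape(256, 384)
print(block_mean_features(img).shape)  # (32, 48)
```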

Experiment 1
In the first experiment, we compare the proposed NN-GPCA subspace estimation method with GPCA on synthetic data. We randomly pick n = 2 one-dimensional subspaces of R², each containing N = 400 points. Zero-mean Gaussian noise with a standard deviation of 20 is added to the sample points. We consider six cases in which outliers are randomly added to the data such that the number of outliers is 0%, 2.5%, 5.0%, 7.5%, 10% or 12.5% of the original number of data points. For each of the six cases, the NN-GPCA and GPCA algorithms are run 500 times. The error between the true unit normals {b_i}, i = 1, …, n, and their estimates {b̃_i}, i = 1, …, n, is computed for each run as [14, 15]

    error = (1/n) Σ_{i=1}^{n} cos⁻¹(b_iᵀ b̃_i)        (15)

Figure 9 plots the mean error over all 500 trials as a function of the number of outliers. When there are no outliers, the estimation errors of NN-GPCA and GPCA are the same; as the number of outliers increases, NN-GPCA outperforms GPCA.

Figure 9: Subspace estimation error versus the fraction of outliers for NN-GPCA (blue) and GPCA (red) on synthetic data.

Experiment 2
The second experiment is designed to evaluate the automatic feature selection capability of the proposed method. The largest CP over the subspaces formed from each feature is the cue to select the optimal feature for attention region extraction. In the examples shown in Figure 8, we use the five features mentioned at the beginning of this section, although only the cases for intensity and hue are shown for the sake of brevity. In Figure 8 (a) the intensity feature is more suitable than hue for detecting the visual attention region, while in Figure 8 (b) it is vice versa. The automatic feature selection capability is similar to the method of using the convex hull of salient points in a contrast map [17]. However, instead of relying on heuristic knowledge as in that case, we rely on the computational measure of cumulative projection to choose the feature which is more robust.

Experiment 3
The third experiment compares the attention detection results of the proposed method and the widely used visual attention model of [10]. Figure 10 (a) shows the saliency maps generated from [10], and (b) shows the bounding boxes indicating the visual attention regions extracted by segmenting the saliency map. Here, the saliency map only indicates the foci of attention.

We used the methods of segmenting the saliency map described in [2, 3], but the results were unsatisfactory. Figure 10 (c) and (d) show, respectively, the candidate visual attention regions and the smallest bounding boxes that include these regions, extracted using the proposed method. Notice that the proposed method detects the visual attention region correctly.

Figure 10: Comparison of the proposed subspace method with [10]: (a) saliency map; (b) VAR from (a); (c) attentive regions using our proposed method; (d) VAR from (c).

Experiment 4
The fourth experiment illustrates the scale invariance of the proposed method. A method that prescribes a scale is able to detect regions only within that scale; the proposed method avoids this dependency by considering both inter-region contrast and intra-region homogeneity. Figure 11 (i) shows the attentive regions extracted from images which show flowers at different scales. We compare our method with that proposed in [11] in Figure 11 (ii), and observe that the results are the same at the small scale, but as the scale increases the performance of [11] degrades rapidly.

Figure 11: VAR extraction across different scales: (left) small, (center) medium and (right) large, using (i) our proposed method and (ii) the method of [11].

More VAR detection results are shown in Figure 12, in which the first row contains the original images, the second row shows the candidate attention regions (the white regions indicate the region corresponding to the subspace with the largest CP), and the last row shows the VAR represented by the smallest bounding box that includes all the elements belonging to the selected subspace. While these results are encouraging, we realize that the choice of features is important in that they should reflect the homogeneity of the region. Even if the subspaces are correctly estimated, the VAR extraction can fail if there is no correspondence between a subspace and a region. The choice of a suitable feature may not be trivial for some images, such as those shown in Figure 13, where we show the intensity, hue and green channel feature maps of two images together with their corresponding transformation graphs. In these cases, neither intensity nor color indicates the interesting region (object) as a subspace, so the attention region cannot be detected regardless of the performance of the subspace estimation.

Figure 13: Feature maps of original images and their corresponding transformation graphs.

6. CONCLUSIONS
In this paper, we propose to solve the image attentive region detection problem using linear subspace estimation and analysis techniques. Through a simple polar transformation, each image feature value is transformed into a data point denoted by (θ, r) in a 2-D space: the value of θ ensures that all pixels within a homogeneous region are mapped to a linear subspace, and r encodes the location of a region within the image. We propose a new linear subspace estimation method, called NN-GPCA, in which the distribution of the K nearest neighbors is used to weight the data points according to their probability of belonging to a large cluster in a subspace corresponding to an image region; a weighted least squares technique is then used to estimate the subspaces. A new attention measure for image regions is proposed that defines the cumulative projection of all the data onto the normal of the corresponding subspace, and the subspace with the largest CP is extracted to detect the attentive region. Several experiments are performed to evaluate the performance of the proposed method.
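The pipeline summarized above — polar transformation, K-nearest-neighbor weighting, weighted least squares — might be sketched as below. The feature-to-angle scaling, the inverse-mean-distance weight formula and all function names are illustrative assumptions; in particular, the paper derives its weights from the distribution of the K nearest neighbors, not from this simple heuristic:

```python
import numpy as np

def polar_transform(feature_map, scale=np.pi / 2):
    """Map each pixel to (theta, r) — theta from the normalized feature
    value, r the pixel's distance from the image center — and then to
    Cartesian coordinates, so homogeneous regions fall on 1-D subspaces."""
    h, w = feature_map.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h / 2.0, xx - w / 2.0).ravel()
    f = feature_map.ravel().astype(float)
    theta = scale * (f - f.min()) / (np.ptp(f) + 1e-12)  # feature -> angle
    return np.c_[r * np.cos(theta), r * np.sin(theta)]

def knn_weights(points, k=15):
    """Weight each point by the inverse of its mean distance to its K
    nearest neighbours: points in dense clusters (real regions) get high
    weight, isolated outliers get low weight. O(N^2); fine for block data."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    mean_d = np.sort(d, axis=1)[:, 1:k + 1].mean(axis=1)  # skip self-distance
    return 1.0 / (1.0 + mean_d)

def weighted_normal(points, w):
    """Weighted total least squares for a subspace through the origin:
    the normal is the eigenvector of the weighted scatter matrix that
    has the smallest eigenvalue."""
    scatter = (points * w[:, None]).T @ points
    vals, vecs = np.linalg.eigh(scatter)   # eigenvalues in ascending order
    return vecs[:, 0]
```

On the 8 × 8 block features this yields one (θ, r) point per block; outliers with few close neighbors receive small weights and so barely influence the fitted normals.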

Figure 12: More examples of visual attention region detection using the proposed method

The experimental results are promising as they show that the proposed method can detect VARs correctly. Future work will involve (i) choosing optimal features to be transformed into the subspace estimation domain, (ii) selecting a suitable transformation for the weights as the data is embedded from the low-dimensional space into the Veronese map, and (iii) developing an algorithm for estimating the optimal features (in the multiple feature case) simultaneously rather than as a sequential process.

7. REFERENCES
[1] U. Rutishauser, D. Walther, C. Koch, and P. Perona. Is bottom-up attention useful for object recognition? In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages 37–44, Washington, DC, USA, July 2004.
[2] U. Rutishauser, D. Walther, C. Koch, and P. Perona. On the usefulness of attention for object recognition. In 2nd International Workshop on Attention and Performance in Computational Vision 2004, pages 96–103, Prague, Czech Republic, May 2004.
[3] D. Walther, U. Rutishauser, C. Koch, and P. Perona. Selective visual attention enables learning and recognition of multiple objects in cluttered scenes. Computer Vision and Image Understanding, pages 745–770, to be published 2005.
[4] A. Bamidele, F. W. Stentiford, and J. Morphett. An attention-based approach to content based image retrieval. British Telecommunications Advanced Research Technology Journal on Intelligent Spaces (Pervasive Computing), 22(3), July 2004.
[5] X.-J. Wang, W.-Y. Ma, and X. Li. Data-driven approach for bridging the cognitive gap in image retrieval. In Proceedings of the 2004 IEEE International Conference on Multimedia and Expo, volume 3, pages 2231–2234, Taipei, Taiwan, June 2004.
[6] H. Liu, X. Xie, W.-Y. Ma, and H.-J. Zhang. Automatic browsing of large pictures on mobile devices. In Proceedings of the eleventh ACM international conference on Multimedia, pages 148–155, Berkeley, CA, USA, 2003. ACM Press.
[7] L. Chen, X. Xie, X. Fan, W.-Y. Ma, H.-J. Zhang, and H. Zhou. A visual attention model for adapting images on small displays. ACM Multimedia Systems Journal, 9(4):353–364, November 2003.
[8] Y. Hu, L.-T. Chia, and D. Rajan. Region-of-interest based image resolution adaptation for MPEG-21 digital item. In Proceedings of the 12th annual ACM international conference on Multimedia, pages 340–343, New York, NY, USA, 2004. ACM Press.
[9] B. Suh, H. Ling, B. B. Bederson, and D. W. Jacobs. Automatic thumbnail cropping and its effectiveness. In Proceedings of the 16th annual ACM symposium on User interface software and technology, pages 95–104, New York, NY, USA, November 2003. ACM Press.
[10] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1254–1259, November 1998.
[11] Y.-F. Ma and H.-J. Zhang. Contrast-based image attention analysis by using fuzzy growing. In Proceedings of the eleventh ACM international conference on Multimedia, pages 374–381, New York, NY, USA, November 2003. ACM Press.
[12] L. Itti and C. Koch. A comparison of feature combination strategies for saliency-based visual attention systems. In Proceedings of SPIE Human Vision and Electronic Imaging IV (HVEI'99), volume 3644, pages 473–482, San Jose, CA, January 1999.
[13] A. P. Bradley and F. W. Stentiford. Visual attention for region of interest coding in JPEG2000. Journal of Visual Communication and Image Representation, 14(3):232–250, September 2003.
[14] R. Vidal, Y. Ma, and S. Sastry. Generalized principal component analysis (GPCA). In Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 1, pages 621–628, Madison, Wisconsin, USA, June 2003.
[15] R. Vidal. Generalized Principal Component Analysis (GPCA): an Algebraic Geometric Approach to Subspace Clustering and Motion Segmentation. PhD thesis, School of Electrical Engineering and Computer Sciences, University of California at Berkeley, August 2003.
[16] Q. Ke and T. Kanade. Robust subspace clustering by combined use of kNND metric and SVD algorithm. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2, pages 592–599, Washington, DC, USA, July 2004.
[17] Y. Hu, X. Xie, W.-Y. Ma, L.-T. Chia, and D. Rajan. Salient region detection using weighted feature maps based on the human visual attention model. In Proceedings of the Fifth IEEE Pacific-Rim Conference on Multimedia, volume 2, pages 993–1000, Tokyo Waterfront City, Japan, November 2004.