Pattern Recognition 45 (2012) 3114–3124


A saliency map based on sampling an image into random rectangular regions of interest
Tadmeri Narayan Vikram a,b,*, Marko Tscherepanow a, Britta Wrede a,b

a Applied Informatics Group, Bielefeld University, Bielefeld 33615, Germany
b Research Institute for Cognition and Robotics (CoR-Lab), Bielefeld University, Bielefeld 33615, Germany

a r t i c l e  i n f o

Available online 22 February 2012

Keywords: Visual attention; Saliency map; Salient region detection; Eye-gaze prediction; Bottom-up attention

a b s t r a c t

In this article we propose a novel approach to computing an image saliency map, based on computing local saliencies over random rectangular regions of interest. Unlike many of the existing methods, the proposed approach does not require any training bases, operates on the image at its original scale and has only a single parameter which requires tuning. It has been tested on the two distinct tasks of salient region detection (using the MSRA dataset) and eye-gaze prediction (using the York University and MIT datasets). The proposed method achieves state-of-the-art performance on the eye-gaze prediction task as compared with nine other state-of-the-art methods.

© 2012 Elsevier Ltd. All rights reserved.

1. Introduction

Visual attention and its associated cognitive neurobiology have been a subject of intense research over the last five decades. The attention mechanism in humans helps them focus their limited cognitive resources on context-relevant stimuli while suppressing those which are not important. An artificial system which could simulate this attention mechanism could positively impact the way in which various cognitive and interactive systems are designed. In view of this, the computer science community has invested considerable time and effort in realizing a computational model of visual attention which at least partially exhibits the characteristics of the human visual attention system.

Several computational approaches to visual attention have been proposed in the literature since the seminal work of Koch and Ullman in 1985 [1]. The same authors proposed the concept of a saliency map, which constitutes a surrogate representation of the visual attention span over an input image for a free viewing task. Saliency maps are now employed for several applications in robotics, computer vision and data transmission. Some of the successful applications are object detection [2,3], object recognition [4], image summarization [5], image segmentation [6,7], image compression [8] and attention guidance for human-robot interaction [9,10].

Most of the existing visual attention approaches process pixels or image windows sequentially, which is in contrast to the way the human visual system operates. It has been shown in the literature that the receptive fields inside the retina operate randomly at both positional and scale spaces while processing visual stimuli [11]. Motivated by this, we propose a simple and efficient method which is based on randomly sampling the image into rectangular regions of interest and computing local saliency values. The proposed method is shown to have comparable performance with existing methods for the task of salient region detection. Furthermore, the proposed method also achieves state-of-the-art performance for the task of predicting eye-gaze. The method has been tested on the MSRA [12], York University [13] and MIT [14] datasets and benchmarked against nine other existing state-of-the-art methods to corroborate its performance.

The paper is organized as follows. We provide a review of the existing computational approaches to visual attention, with a brief description of their strengths and shortcomings, in Section 2. The motivation leading to the current work is described in Section 3. The proposed approach is described by means of pseudo-code and a graphical illustration in Section 4. In Section 5, we describe the experiments carried out to validate the performance of the proposed method for the tasks of eye-gaze prediction and salient region detection. Finally, we conclude this work by highlighting its current shortcomings, with a brief discussion of future directions, in Section 6.

* Corresponding author at: Applied Informatics Group, Bielefeld University, Bielefeld 33615, Germany. Tel.: +49 521 106 12229. E-mail addresses: nvikram@techfak.uni-bielefeld.de, tnvikram@gmail.com (T.N. Vikram), marko@techfak.uni-bielefeld.de (M. Tscherepanow), bwrede@techfak.uni-bielefeld.de (B. Wrede).

2. Literature review

The theoretical framework for the computation of saliency maps was first proposed in [1] and later realized in [15].

Subsequently, it has led to the development of several other saliency approaches based on different mathematical and computational paradigms. The existing approaches can be classified into seven distinct categories based on the computational scheme they employ:

• Hierarchical approaches: They perform multi-scale image processing and aggregate the inputs across different scales to compute the final saliency map.
• Spectral approaches: They operate by decomposing the input image into Fourier or Gabor spectrum channels and obtain the saliency maps by selecting the prominent spectral coefficients.
• Power law based approaches: These approaches compute saliency maps by removing redundant patterns based on their frequency of occurrence. Rarely occurring patterns are considered salient, while frequently occurring patterns are labeled redundant.
• Image contrast approaches: The mean pixel intensity value of the entire image or of a specified sub-window is utilized to compute the contrast of each pixel in the image. The contrast is treated as the image saliency.
• Entropy-based approaches: The mutual information between patterns is employed to optimize the entropy value, where a larger entropy value indicates that a given pattern is salient.
• Center-surround approaches: These approaches compute the saliency of a pixel by contrasting the image features within a window centered on it.
• Hybrid approaches: Models of this paradigm employ a classifier in combination with one or more approaches to compute saliency.

We briefly explore the existing saliency approaches based on the aforementioned categories in the sub-sections to follow. Interested readers are pointed to [16,17] for a more detailed and exhaustive review of saliency approaches.

2.1. Hierarchical approaches

The most popular approach in this category is the one proposed by Itti et al. [15]. This approach computes 41 different feature maps for a given input image based on color, texture, gradient and orientation information. The resulting feature maps are fused into a single map using a winner-takes-all (WTA) network and an inhibition-of-return (IOR) mechanism. This approach has served as a classical benchmark system. Despite being biologically inspired and efficient, its performance is constrained by the fusion of multiple maps and the requirement to process multiple features. The fusion method employed to compute the master saliency map from the various feature maps plays a vital role in its accuracy. Arriving at a generalized fusion rule for the various maps is complicated and requires intensive cross-validation to fine-tune the fusion parameters. In general, hierarchical methods tend to ignore locally occurring but visually significant patterns, as they are primarily driven by the global statistics of an image [15].

2.2. Spectral approaches

Several approaches to compute a saliency map which operate in the spectral domain have been proposed in the literature. A Gabor wavelet analysis and multi-scale image processing based approach was proposed in [18]. The Fourier transform has also been extensively utilized to generate saliency maps, as seen in [8,19]. In the approach of [8], the input image is decomposed into four different channels based on opponent colors and intensity. A quaternion Fourier analysis of the decomposed channels is carried out to compute the master saliency map. Two-dimensional Gabor wavelets of the Fourier spectrum have also been utilized for computing saliency maps in [20], where the final saliency map is reweighted using a center-bias matrix. The spectral residual approaches proposed in [21,22] are another class of approaches, which highlight saliency by removing redundancy from the Fourier spectrum. The saliency map is thus constructed from a residual frequency spectrum, obtained as the difference between an image frequency spectrum and a mean frequency spectrum. Though straightforward, the method is compounded by the fact that the pattern of the mean frequency spectrum is subjective. These methods are found to be successful in detecting proto-objects in images. As can be seen from the illustrations in [8], Fourier-based methods are affected by the number of coefficients selected for image reconstruction and the scale at which the input image is processed. Like subspace analysis, the method results in a loss of information during image reconstruction and is compromised by illumination, noise and other image artifacts.

2.3. Power law based approaches

Power laws are used to model human visual perception, and several approaches to saliency which employ power laws have been proposed. A Zipf's law based saliency map was proposed in [23]. This work is similar to those of [21,22], except that it operates directly on the pixels rather than in the spectral space. An integrated Weibull distribution based saliency map was proposed in [24], and its performance on eye-gaze data is shown to be better than that of other power law based approaches. The method employs maximum likelihood estimates (MLE) to set the parameters of the various distributions. Despite its theoretical appeal, the Weibull distribution based saliency map has a large parameter set, and the heuristics needed to fix and optimize them constitute a major drawback.

2.4. Image contrast approaches

A simple but efficient approach based on computing the absolute difference between a pixel and the image mean was proposed in [25]. Similar approaches proposed by the same authors in [26,7] employ maximal symmetric regions and novel dissimilarity metrics, respectively, to compute the final saliency map. Another symmetry based saliency approach was proposed in [27], which employs color, orientation and contrast symmetry features to generate the final saliency map. It also reasonably simulates eye-gaze for a free viewing task in the case of natural sceneries with no specific salient objects in them. A distance transform based approach was proposed in [28], which computes an edge map for each gray-scale threshold and fuses them to generate a final saliency map. These methods have been successfully applied to proto-object detection and outperform many state-of-the-art methods without having many of their drawbacks. Poor global contrast of an image affects the performance of these methods, and local-statistics based approaches for saliency computation have been proposed in the literature to address this problem.

2.5. Entropy-based approaches

The popular approach of [29] is based on local contrasts and maximizes the mutual information between features by employing independent component analysis (ICA) bases. A set of ICA bases is pre-computed using a patch size of 7 × 7 pixels. Subsequently, it is used to compute the conditional and joint distributions of features for information maximization. Experiments conducted on the York University eye-gaze dataset [13] have proven its efficiency. However, this method is constrained by its emphasis on edges and neglects salient regions [30]. It also adds a spurious border effect to the resultant image and requires re-scaling of the original image to a lower scale in order to make the computational process more tractable.

Another ICA based approach was proposed in [31], where image self-information is utilized to estimate the probability of a target at each pixel position. It is further fused with top-down features derived from ICA bases to build the final saliency map. The method proposed in [32] employs sparse bases to extract sub-band features from an image. The mutual information between the sub-band features is calculated by realizing a random walk on them and initializing the site entropy rate as the weight of the edges in the graph. An extension of this paradigm can be seen in the recent approach proposed in [33], where the entropy of a center versus a surround region is computed as the saliency value of a pixel. Other entropy-based approaches like [34,35] employ incremental coding length to compute the final saliency map. These information theoretic methods are in general constrained by the requirement of training bases, the patch size parameters and the size of the training bases.

2.6. Center-surround approaches

A saliency map based on discriminant center-surround entropy contrast, which does not require training bases, was proposed in [36]. It correlates well with human eye-gaze data but is constrained by the subjectivity involved in computing the weights for fusing the different maps into the master saliency map. A recent center-surround contrast method [37] identifies pre-attentive segments and computes the mutual saliency between them; an innovative application of this saliency map has been found useful for pedestrian detection. A multi-scale center-surround saliency map was proposed in the work of [38], where the saliency of a pixel is determined by the dissimilarity of the center to its surround over multiple scales. Local analysis of gradients along with the center-surround paradigm is advocated as an alternative to multi-scale image analysis, as can be seen in [39,40]. In [39], local steering kernels are employed to compute the saliency of a pixel by contrasting the gradient covariance of a surrounding region. The method is shown to be robust to noise and drastic illumination changes, but it is computationally expensive, as a set of compound features needs to be computed for each pixel in the image. In order to achieve a tractable runtime, the image is down-scaled. The approach proposed in [40] uses color and edge orientation information to compute the saliency map, using a framework similar to the one presented in [39]. Selecting an appropriate window size for the center and surround patches plays an important role in obtaining a higher quality saliency map.

2.7. Hybrid approaches

To overcome the issue of patch size parameters, many machine learning based saliency approaches have been proposed. A graph-based visual saliency approach is proposed in [41], which implements a Markovian representation of feature maps and utilizes a psycho-visual contrast measure to compute the dissimilarities between features. A saliency map based on modeling eye movements using support vector machines (SVMs) is proposed in [42]. However, this approach requires enormous amounts of data to learn the saliency weights reasonably. Another successful approach, based on kernel density estimation (KDE), is proposed in [43]. The authors recommend constructing a set of KDE models based on region segmentation using the mean shift algorithm. Based on color distinctiveness and spatial distribution, the color saliency and spatial saliency of each KDE model are evaluated. The method is found to be successful in salient object detection, but has an unwieldy run-time as it relies on image segmentation.

A salient object selection mechanism based on an oscillatory correlation approach was proposed in [44]. The method outputs an object saliency map, which resembles an image segmentation. The method requires fine-tuning of neural network weights for an inhibitory mechanism, which is heuristic, and hence it cannot be used for real-time visual attention. A Hebbian neural network based saliency mechanism which simulates lateral surround inhibition is proposed in [45]. Unlike [44], the method in [45] is computationally efficient, as it employs the pulsed cosine transform to produce the final saliency map. A color saliency boosting function is introduced in [46], which is obtained from an isosalient image surface. The final saliency map is based on the statistics of color image derivatives and employs an information theoretic approach to boost the color information content of an image. An extension of this work can be found in [47]. It performs reasonably well on both eye-gaze fixation and salient object detection datasets. In [48], the probability distributions of the colors and orientations of an image are computed, and one of these features is selected to compute the final saliency map. This method avoids multi-scale image analysis but fails when the image has low color or gradient contrast.

A few hybrid approaches attempt to combine the positive aspects of multi-scale analysis, sub-band decomposition, color boosting and center-surround paradigms. A fusion of the center-surround hypothesis and oriented sub-band decomposition to develop a saliency map has been proposed in [49]. A radically different approach, which tries to detect salient regions by estimating the probability of detecting an object in a given sliding window, is proposed in [50]. The authors employ the concept of superpixel straddling, coupled with edge density histograms, color contrasts and the saliency map of [15]. A linear classifier is trained on an image dataset to build a bag-of-features prior for an object in an image. The method is theoretically very attractive, but is subject to high variation in performance, as too many features, maps and parameters are involved which require fine-tuning.

3. Motivation

As can be observed from the preceding review, many of the existing computational approaches to visual attention are constrained by one or more of the following issues: the requirement of training bases, a large set of tunable parameters, the integration of complex classifiers, downscaling of the original image in order to obtain a tractable computational run-time, high programming complexity, and a lack of biological motivation. Most of the existing approaches to compute saliency maps process input image pixels sequentially with fixed sliding windows. But salient regions or objects can occur at arbitrary positions, shapes and scales in an image. Research in the vision sciences has shown that the human visual system has its receptive fields scattered randomly and does not process input stimuli in a sequential manner [11,51]. Furthermore, it has been shown in [52] that each visual stimulus is biased by every other stimulus present in the attention space.

In view of the aforementioned discussion, we propose a random center-surround method which operates by computing local saliencies over random regions of an image. This captures local contrasts, unlike the global methods for computing saliency maps. Furthermore, it does not require any training priors and has only a single parameter which needs tuning. The proposed method also avoids competition between multiple saliency maps, in contrast to the approaches proposed in [15,41].

4. The proposed saliency map

We consider a scenario where the input I is a color image of dimension r × c × 3, where r and c are the number of rows and columns, respectively. The input image I is initially subjected to a Gaussian filter in order to remove noise and abrupt onsets. It is further converted into the CIE 1976 L*a*b* space¹ and decomposed into the respective L*, a* and b* channels, each of dimension r × c. The L*a*b* space is preferred over other color spaces because of its similarity to the human psycho-visual space [26], and it is also recommended as a standard for computing saliency maps in [30,25,26,7]. The saliency maps due to the L*, a* and b* channels are referred to as SL, Sa and Sb, respectively, and their dimensions are also r × c. Furthermore, n random sub-windows are generated over each of the L*, a* and b* channels. The upper-left and lower-right coordinates of the i-th random sub-window are denoted by (x1_i, y1_i) and (x2_i, y2_i), respectively. The saliency value at a particular coordinate position in a given channel is defined as the sum of the absolute differences between the pixel intensity value and the mean intensity values of the random sub-windows which contain it. The final saliency map S is also of dimension r × c. Computing the final saliency value for a given position as the Euclidean norm of the saliencies over the different channels of the L*a*b* space is recommended as an ideal fusion rule in [25,26,30].

A discrete uniform probability distribution is used to generate the random sub-window coordinates, as this places windows without any bias towards a specific window size or spatial region of the image. This property is important because salient regions or objects in an image can occur at arbitrary positions and scales. The resulting saliency map S is normalized to the interval [0,255] and subsequently subjected to median filtering. The median filter is chosen because of its ability to preserve edges while eliminating noise. To further enhance the contrast of S, we subject it to histogram equalization, as recommended in [36,53]. This image enhancement is in consonance with the fact that the human visual system enhances the perceptual contrast of the salient stimulus in a visual scene [54,55].

Note that our approach does not downscale the input image to a lower resolution like other approaches [41,15,39]. In addition, our method does not require prior training bases, in contrast to [31,13]. Also, n is the only parameter which requires tuning, the details of which are presented in the next section. The algorithm for the proposed approach is as follows.

Algorithm: Random center-surround saliency
Input: (1) IMG (RGB image) of size r × c × 3; (2) n, the number of random windows
Output: S, saliency map of size r × c
Method:
  Step 1: Apply a Gaussian filter to IMG
  Step 2: Convert IMG to the L*a*b* space
  Step 3: Generate random window coordinates: (x1, y1, x2, y2) = Generate_Windows(n, r, c)
  Step 4: Generate saliencies for each channel:
            SL = CompSal(L*, n, x1, y1, x2, y2)
            Sa = CompSal(a*, n, x1, y1, x2, y2)
            Sb = CompSal(b*, n, x1, y1, x2, y2)
  Step 5: Compute S by the pixel-wise Euclidean norm: S = Fusion(SL, Sa, Sb)
  Step 6: Apply a median filter to S
  Step 7: Normalize S to [0,255]
  Step 8: Apply histogram equalization to S

Algorithm: CompSal
Input: (1) I (gray-scale image) of size r × c; (2) n, the number of random windows; (3) coordinate vectors x1, y1, x2, y2, each of size n
Output: ISM, interim saliency map of size r × c
Method:
  Step 1: Set all elements of ISM to 0
  Step 2: Update ISM:
    for i = 1 to n
      Area_i = (x2_i − x1_i + 1)(y2_i − y1_i + 1)
      Sum_i = 0
      for j = x1_i to x2_i
        for k = y1_i to y2_i
          Sum_i = Sum_i + I(j,k)
        end-k
      end-j
      Mean_i = Sum_i / Area_i
      for j = x1_i to x2_i
        for k = y1_i to y2_i
          ISM(j,k) = ISM(j,k) + |I(j,k) − Mean_i|
        end-k
      end-j
    end-i

Algorithm: Generate_Windows
Input: (1) n, the number of random windows; (2) r, the number of rows; (3) c, the number of columns
Output: coordinate vectors x1, y1, x2, y2, each of size n
Method:
  Step 1: Set all elements of x1, y1, x2, y2 to 0, then generate the random window coordinates:
    for i = 1 to n
      x1_i = random number in [1, r − 1]
      y1_i = random number in [1, c − 1]
      x2_i = random number in [x1_i + 1, r]
      y2_i = random number in [y1_i + 1, c]
    end-i

Algorithm: Fusion
Input: (1) matrices A, B and C of size r × c
Output: fused matrix FM of size r × c
Method:
  Step 1: Apply the pixel-wise Euclidean norm:
    Set all elements of FM to 0
    for i = 1 to r
      for j = 1 to c
        FM(i,j) = sqrt(A(i,j)² + B(i,j)² + C(i,j)²)
      end-j
    end-i

An illustration of the above paradigm is given in Fig. 1.

¹ The original CIE document: http://www.electropedia.org/iev/iev.nsf/display?openform&ievref=845-03-56.
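To make the pseudo-code above concrete, the following is a minimal Python/NumPy sketch of the full pipeline. It is an illustrative reimplementation, not the authors' original Matlab code; the function name, the use of scipy/scikit-image for filtering, color conversion and equalization, and the default n = 0.02 × r × c are our assumptions based on the text.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, median_filter
from skimage import color, exposure

def random_center_surround_saliency(img_rgb, n=None, rng=None):
    """Sketch of the random center-surround saliency map (hypothetical helper).

    img_rgb: float RGB image in [0, 1] of shape (r, c, 3).
    n: number of random windows; the paper's standard setting is 0.02 * r * c.
    """
    rng = np.random.default_rng() if rng is None else rng
    r, c, _ = img_rgb.shape
    n = int(0.02 * r * c) if n is None else n

    # Step 1: Gaussian pre-filtering (the paper uses a 3x3 kernel, sigma = 0.5).
    smoothed = gaussian_filter(img_rgb, sigma=(0.5, 0.5, 0.0))

    # Step 2: convert to CIE L*a*b*.
    lab = color.rgb2lab(smoothed)

    # Step 3 (Generate_Windows): discrete-uniform window corners, 0-indexed;
    # the lower-right corner lies strictly below and right of the upper-left.
    x1 = rng.integers(0, r - 1, size=n)
    y1 = rng.integers(0, c - 1, size=n)
    x2 = rng.integers(x1 + 1, r)
    y2 = rng.integers(y1 + 1, c)

    # Steps 4-5 (CompSal + Fusion): per-channel interim maps, fused by the
    # pixel-wise Euclidean norm.
    S = np.zeros((r, c))
    for ch in range(3):
        I = lab[:, :, ch]
        ism = np.zeros((r, c))
        for x1i, y1i, x2i, y2i in zip(x1, y1, x2, y2):
            win = I[x1i:x2i + 1, y1i:y2i + 1]
            ism[x1i:x2i + 1, y1i:y2i + 1] += np.abs(win - win.mean())
        S += ism ** 2
    S = np.sqrt(S)

    # Steps 6-8: median filtering, normalization to [0, 255], equalization.
    S = median_filter(S, size=11)
    S = 255.0 * (S - S.min()) / (S.max() - S.min() + 1e-12)
    S = 255.0 * exposure.equalize_hist(S)
    return S
```

For example, calling S = random_center_surround_saliency(img) on a float RGB image in [0, 1] returns a contrast-enhanced saliency map in [0, 255].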

Fig. 1. An illustration of the proposed method. The input image is subjected to a Gaussian filter in the first stage. Subsequently, it is converted into the L*a*b* space and the individual L*, a* and b* channels are obtained. For the sake of simplicity we have considered three random regions of interest (ROIs) on the respective L*, a* and b* channels. Local saliencies are computed over each of these ROIs and the channel-specific saliency maps (SL, Sa and Sb) are updated. The final saliency map is then computed by fusing the channel-specific saliency maps via a pixel-wise Euclidean norm.

5. Experimental results

Several experiments were conducted to validate the performance of the proposed saliency map for the two distinct tasks of eye-gaze prediction and salient region detection under free viewing conditions.

Salient region detection and eye-gaze prediction are the two most significant applications of saliency maps. Salient region detection is relevant in the context of computer vision tasks like object detection, object localization and object tracking in videos [25]. Automatic prediction of eye-gaze is important in the context of image aesthetics, image quality assessment, human-robot interaction and other tasks which involve detecting image regions that are semantically interesting [14]. Contemporary saliency maps are either employed to detect salient regions, as in the case of [30,25,28], or are used to predict eye-gaze patterns, as in [31,39]. Only a few of the existing saliency approaches, like [41,15], have consistent performance on both of these tasks. Although these two tasks appear similar, there are subtle differences between them. Salient regions of an image are those which are visually interesting. But the human eye-gaze, which focuses mainly on salient regions, is also distracted by semantically relevant regions [56].

The performance on the eye-gaze prediction task was validated on two different datasets, from York University [13] and MIT [14]. Subsequent experiments to corroborate the performance on the salient object detection task were conducted on the popular MSRA dataset [12]. The following parameter settings were used as a standard for all the experiments carried out. A rotationally symmetric Gaussian low-pass filter (size 3 × 3 with σ = 0.5; the default Matlab configuration) was used as a pre-processor on the images for noise removal, as recommended in [25,26]. The number of distinct random sub-windows n was set to 0.02 × r × c, as this led to a more stable performance; for example, for a 681 × 511 image from the York University dataset this yields n ≈ 6960 windows. Details about fine-tuning n are given in Section 5.1. A median filter of size 11 × 11 was employed to smooth the resultant saliency map before it was enhanced by histogram equalization. The employed Gaussian filter, median filter and histogram equalization are the usual straightforward methods. All experiments were conducted using Matlab v7.10.0 (R2010a) on an Intel Core 2 Duo processor with Ubuntu 10.04.1 LTS (Lucid Lynx) as the operating system.

The inbuilt srgb2lab Matlab routine was used to convert the input image from the RGB color space to the L*a*b* color space. This results in L*, a* and b* images whose pixel intensity values are normalized to the range [0, 255].

We selected nine state-of-the-art methods of computing saliency maps to compare and contrast with the proposed method. These are the global contrast² [25], entropy³ [29], graphical approach⁴ [41], multi-scale⁵ [15], local steering kernel⁶ [39], distance transform⁷ [28], self-information⁸ [31], weighted contrast⁹ [30] and symmetric contrast¹⁰ [26] based image saliency approaches. The experiments conducted to corroborate the performance of the proposed method for eye-gaze correlation in a free viewing task are described in the following sub-section.

² http://ivrg.epfl.ch/supplementary_material/RK_CVPR09/SourceCode/Saliency_CVPR2009.m
³ http://www-sop.inria.fr/members/Neil.Bruce/AIM.zip
⁴ http://www.klab.caltech.edu/~harel/share/gbvs.zip
⁵ http://www.klab.caltech.edu/~harel/share/simpsal.zip
⁶ http://users.soe.ucsc.edu/~rokaf/download.php
⁷ http://users.cs.cf.ac.uk/Paul.Rosin/resources/salience/salience.zip
⁸ http://cseweb.ucsd.edu/~l6zhang/code/imagesaliency.zip
⁹ Available from our previous work [30].
¹⁰ http://ivrg.epfl.ch/supplementary_material/RK_ICIP2010/code/Saliency_MSSS_ICIP2010.m

5.1. Experiments on the eye-gaze prediction task

In order to empirically evaluate the performance on the eye-gaze correlation task, we have employed the receiver operating characteristic (ROC) area under the curve (AUC) as a benchmarking metric. Several popular and recent works like [41,42,49,57,20,30] employ the ROC-AUC metric to evaluate eye-gaze fixation correlation. An ROC graph is a general technique for visualizing, ranking and selecting classifiers based on their performance [58]. ROC graphs are two-dimensional graphs in which the true positive rate (TPR) is plotted on the Y axis and the false positive rate (FPR) is plotted on the X axis.

The TPR (also called hit rate and recall) and FPR (also called false alarm rate) metrics are computed as in [30,32]:

TPR = tp / (tp + fn)    (1)

FPR = fp / (fp + tn)    (2)

where tp is the number of true positives, tn is the number of true negatives, fp is the number of false positives and fn is the number of false negatives. An ROC graph depicts the relative trade-off between benefits (TPR) and costs (FPR). Since the AUC is a portion of the area of the unit square, its value will always be between 0 and 1.0. An ideal classifier would give an AUC of 1.0, while random guessing produces an AUC of about 0.5 [59]. The saliency map and the corresponding ground truth fixation density map are binarized at each discrete threshold in [0, 255]. This results in a predicted binary mask (from the saliency map) and a ground truth binary mask (from the fixation density map) for each binarizing threshold. The TPR and FPR for each threshold are subsequently computed. The ROC curve is generated by plotting the obtained FPRs versus TPRs, and the AUC is calculated. In our case, the AUC indicates how well the saliency map predicts actual human eye fixations. This measurement also has the desired characteristic of transformation invariance, in that the ROC-AUC does not change when applying any monotonically increasing function (such as the logarithm) to the saliency measure [31].

We have considered the two popular datasets of York University [13] and MIT [14] for the eye-gaze prediction task. The York University dataset [13] contains eye fixation records from 20 subjects for a total of 120 images of size 681 × 511 pixels. This dataset does not contain images with human faces, animals or activities like sports or music concerts, which have semantic meanings attributed to them. The MIT dataset [14] consists of eye fixation records from 15 subjects for 1003 images of size 1024 × 768 pixels. Unlike the York University dataset [13], the MIT dataset [14] contains images which may include faces, expressions and sporting activities, which attract attention and are semantically relevant. The variation between these two eye-gaze datasets in image dimensions, the objects contained, semantics and the topology of object arrangement offers a test-bed to evaluate the robustness of a saliency map. Also note that the age groups of the viewers, the eye-gaze recording equipment and the viewing conditions were not identical when these two eye-gaze datasets were created.

The performance in terms of ROC-AUC¹¹ was measured, and the resulting plots for the York University dataset [13] and the MIT dataset [14] are shown in Figs. 2 and 3, respectively. It can be observed from Fig. 2 that the proposed method has an ROC plot similar to that of the graph-based [41] saliency approach, and clearly outperforms all the other methods considered. The same holds true for its performance on the MIT dataset [14], as can be seen in Fig. 3. To quantitatively evaluate the performance we computed the ROC-AUC, which is shown in Table 1. Note that the proposed method has the highest ROC-AUC performance on the York dataset [13] and attains a performance equivalent to the graph-based [41] method on the MIT dataset [14], as can be observed in Table 1. This is despite the fact that the proposed method is simple and devoid of the sophistication associated with the graph-based [41], multi-scale [15] or self-information [31] based saliency methods.
¹¹ Source code used for ROC-AUC computation in this article: http://mark.goadrich.com/programs/AUC/auc.jar.
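As a sketch of this evaluation protocol (our illustrative Python reconstruction, not the cited AUC tool), the TPR and FPR of Eqs. (1) and (2) can be computed at every discrete threshold and the AUC obtained by trapezoidal integration; the function name and the assumption of a pre-binarized ground truth mask are ours:

```python
import numpy as np

def roc_auc(saliency, fixation_mask):
    """ROC-AUC of a saliency map against a binary ground-truth fixation mask.

    saliency: array normalized to [0, 255]; fixation_mask: boolean array of
    the same shape, obtained by binarizing the fixation density map.
    """
    pos = fixation_mask.sum()          # total positives
    neg = (~fixation_mask).sum()       # total negatives
    tpr, fpr = [], []
    for t in range(256):               # binarize at each discrete threshold
        pred = saliency >= t
        tpr.append((pred & fixation_mask).sum() / pos)   # Eq. (1): hit rate
        fpr.append((pred & ~fixation_mask).sum() / neg)  # Eq. (2): false alarms
    # Thresholds ascend, so FPR descends; reverse both before integrating.
    return np.trapz(tpr[::-1], fpr[::-1])
```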

Fig. 2. ROC performance plots for the York University dataset [13]. Note that the proposed method performs similarly to the graph-based [41] saliency approach and outperforms all the other methods under consideration. Observe that the symmetric contrast [26] and global contrast [25] based methods have near-linear slopes in their ROC plots, which indicates their ineffectiveness for the task of eye-gaze prediction.

Fig. 3. ROC performance plots for the MIT dataset [14]. The proposed method attains a performance similar to the graph-based [41] saliency approach while outperforming the others. Note that the symmetric contrast [26] and global contrast [25] based methods have near-linear slopes, similar to their performance on the York dataset [13] shown in Fig. 2. Observe that the performance of the proposed method and the graph-based [41] method does not suffer despite the change of eye fixation dataset.

Table 1
The performance of the methods under consideration in terms of ROC-AUC on the York University [13] and MIT [14] datasets. It can be observed that the proposed method has state-of-the-art performance on both eye fixation datasets.

Saliency map                 | York University [13] | MIT [14]
Global contrast [25]         | 0.54                 | 0.53
Symmetric contrast [26]      | 0.64                 | 0.63
Graph-based [41]             | 0.84                 | 0.81
Multi-scale [15]             | 0.81                 | 0.76
Entropy-based [29]           | 0.83                 | 0.77
Self information [31]        | 0.67                 | NA
Local steering kernel [39]   | 0.75                 | 0.72
Weighted contrast [30]       | 0.75                 | NA
Distance transform [28]      | 0.77                 | 0.75
Proposed                     | 0.85                 | 0.81

For the MIT dataset [14] we were not able to obtain the ROC plots and the area under the curve for the self-information [31] and random center-surround [30] approaches, since the original source codes of these methods do not work on images as large as 1024 × 768 pixels. We preferred not to rescale the input images, as this might introduce a bias into the evaluation.

The experimental parameter n is correlated with the scale of the image, where the standard setting is 0.02 × r × c. We therefore evaluated the performance of the proposed method on the York dataset [13] by varying n and recording the ROC-AUC values. It can be observed from Fig. 4 that for 10 random samplings the proposed method already achieves an ROC-AUC of 0.76, and that it saturates at an ROC-AUC of 0.85 for 60 random samplings and above. We varied n (n = 10, 60, 110, 260 and the standard configuration) and obtained the ROC plots on the York dataset [13] given in Fig. 5. We deliberately included a minimal number of ROC plots, as all the ROC plots looked very similar for n > 60. Despite this, we recommend using the standard configuration for n: as the size of the image increases, so does the possibility that it contains more objects, and in this context, sampling the image in tandem with its size helps in computing a better saliency map.

Fig. 6. ROC-AUC performance of the proposed method over different trials on the York University dataset [13]. Observe that the ROC-AUC values do not change significantly over 10 different trials. The mean ROC-AUC is 0.8554 and the standard deviation (σ) is 9.3743e−04. The low standard deviation indicates the stability of the proposed method.

Fig. 4. Variations in the ROC-AUC of the proposed method due to changes in n on the York University dataset [13]. Observe that the proposed method has an ROC-AUC value of 0.76 for n = 10, which is better than the optimal performance of the local steering kernel [39] (ROC-AUC = 0.75) and self-information [31] (ROC-AUC = 0.67) based approaches, as seen in Table 1. The method saturates in terms of ROC-AUC for a small value of n, and hence rigorous testing to fix this parameter can be avoided. The lack of oscillation in the ROC-AUC values versus n shows that the proposed method is stable despite changes in the value of n.

Our method employs random sampling, which means that the selected windows in each trial may be different. Hence we decided to study the stability of the proposed method in terms of ROC-AUC for the standard configuration of n on the York University dataset [13]. The results shown in Fig. 6 reveal that the variations in the ROC-AUC values over 10 different trials are not significant, adding to the stability of the proposed method.

5.2. Experiments on the salient region detection task

The MSRA dataset [12] consists of 5000 images annotated by nine users. The annotators were asked to enclose what they thought was the most salient part of each image with a rectangle. Fig. 7 shows an example of this labeling. Naturally occurring objects do not necessarily have a regular contour which can be enclosed accurately inside a rectangle. As can be seen from Fig. 7, unnecessary background is enclosed by this kind of annotation. It has recently been shown in [25,26,60] that a more precise-to-contour ground truth leads to a more accurate evaluation. Motivated by this, a precise-to-contour ground truth was released in [25] for a subset of 1000 images from the MSRA dataset [12]. We therefore consider this subset of 1000 images, which have accurate ground truth annotations, for our experiments.

In general, the MSRA dataset [12] differs significantly from the eye-gaze datasets in terms of test protocol, content and image size. Fundamentally, eye movements were not recorded, and annotators were required to enclose the most visually interesting region of the image. Such an annotation task involves high-level cognitive mechanisms and is not stimulus driven. Thus there is a weak association between the exact segmentation masks and the saliency of the image. Despite involving a high-level visual task, the MSRA dataset [12] is still relevant for testing whether there is an association between the predicted saliency map and the ground truth masks of this dataset. Previous studies [61,57] have shown that the positions of the principal maxima in a saliency map are significantly correlated with the positions of areas that people would choose to label as a region of interest.

In order to quantitatively evaluate the performance for the task of detecting salient regions, we followed the method recommended in [25,28,26,30], where the saliency map is binarized and compared with the ground truth mask.

Fig. 5. Variations in the ROC plots of the proposed method due to changes in n on the York University dataset [13]. Observe that for n = 10 the proposed method obtains an ROC plot comparable to the local steering kernel [39] and distance transform [28] approaches shown in Fig. 2. With n = 60 the performance of the proposed method starts to saturate, which helps in processing large images without compromising on run-time.

Fig. 7. Rectangular and exact segmentation masks. On the left is a sample image from the MSRA dataset [12], the image in the center shows the original rectangular annotation, and on the right is the accurate-to-contour annotation. Observe the reduction in background information between the rectangular and exact annotations.

Table 2
The recall, precision and F-measure performance of the saliency maps from the various methods under consideration. Observe that the proposed method has an equivalent performance to the graph-based [41] method.

Saliency map                 | Recall | Precision | F-measure
Global contrast [25]         | 0.49   | 0.63      | 0.51
Symmetric contrast [26]      | 0.40   | 0.66      | 0.47
Graph-based [41]             | 0.73   | 0.62      | 0.64
Multi-scale [15]             | 0.67   | 0.56      | 0.58
Entropy-based [29]           | 0.98   | 0.29      | 0.43
Self information [31]        | 0.55   | 0.44      | 0.46
Local steering kernel [39]   | 0.48   | 0.50      | 0.46
Weighted contrast [30]       | 0.84   | 0.66      | 0.71
Distance transform [28]      | 0.85   | 0.33      | 0.45
Proposed                     | 0.71   | 0.63      | 0.64

Fig. 8. Recall-precision plot on the MSRA dataset [12]. Observe that the proposed method has a superior recall-precision plot compared to all the methods except the graph-based [41] and weighted contrast [30] based approaches.

The saliency map is thresholded within [0, 255] to obtain a binary mask, which is then compared with the ground truth. The thresholds are varied from 0 to 255, and the recall and precision metrics are computed at each binarizing threshold. The recall-precision metric is found to be more suitable for evaluating salient region detection performance than the ROC-AUC metric [25,30]. The recall (also called TPR or hit rate) and precision metrics are computed as in [30]:

recall = tp / (tp + fn)    (3)

precision = tp / (tp + fp)    (4)

The resulting recall versus precision curve is shown in Fig. 8. This curve provides a reliable comparison of how well various saliency maps highlight salient regions in images. It can be observed from Fig. 8 that the proposed method has comparable performance to the graph-based [41] method and outperforms all the other approaches under consideration, with the exception of the weighted contrast [30] based method. However, as noted in our previous experiments on the eye fixation datasets, the weighted contrast [30] based method fails to work when the input is large and performs far below the graph-based [41] approach and the proposed method in terms of ROC-AUC. Despite being simple, the proposed method fares well in both the tasks of eye-gaze correlation and salient region detection, and has an equivalent performance to the graph-based [41] approach.

A common threshold to binarize different saliency maps might not be suitable, as the binarization threshold depends on image-specific statistics. Thus we employ Tsai's moment-preserving algorithm¹² [62], which has also been recommended in [28], to select the map-specific binarization threshold. We further evaluate the accuracy of the obtained binary masks against the ground truth masks by employing the F-measure, as suggested in [49,25,48,26,28,30]. The formula for the F-measure is given in Eq. (5):

F-measure = (2 × precision × recall) / (precision + recall)    (5)

¹² http://www.cs.tut.fi/~ant/histthresh/HistThresh.zip
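The per-threshold metrics of Eqs. (3)-(5) reduce to a few lines of code. The sketch below is our illustration (hypothetical helper name; the Tsai threshold itself is supplied externally, e.g. by the toolbox cited in footnote 12):

```python
import numpy as np

def precision_recall_f(saliency, gt_mask, threshold):
    """Recall, precision and F-measure (Eqs. 3-5) at one binarization threshold.

    saliency: array normalized to [0, 255]; gt_mask: boolean ground truth mask;
    threshold: scalar, e.g. chosen by Tsai's moment-preserving algorithm [62].
    """
    pred = saliency >= threshold
    tp = (pred & gt_mask).sum()
    fp = (pred & ~gt_mask).sum()
    fn = (~pred & gt_mask).sum()
    recall = tp / (tp + fn)                             # Eq. (3)
    precision = tp / (tp + fp)                          # Eq. (4)
    f = 2 * precision * recall / (precision + recall)   # Eq. (5)
    return recall, precision, f

# Sweeping the threshold over [0, 255] yields the recall-precision curve of Fig. 8.
```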

It can be observed from Table 2 that the proposed method and the graph-based method have an identical F-measure performance of 0.64. The weighted contrast based [30] saliency attains the highest F-measure of 0.71, but does not perform well in the eye-gaze correlation task. The entropy-based [29] method has the lowest F-measure performance of 0.43, despite achieving a good performance on the eye-gaze task. Apart from the proposed method and the graph-based [41] saliency, only the multi-scale approach [15], with an F-measure of 0.58, is seen to perform equally well on the eye-gaze prediction task.

We demonstrate the performance of the various approaches on the task of salient region detection by considering three sample images from the MSRA dataset [12], as shown in Fig. 9. For the purpose of illustration we overlaid the original image with the ground truth masks and with the masks obtained after binarizing the respective saliency maps by Tsai's moment-preserving algorithm [62]. It can be observed from Fig. 9 that the entropy-based [29] saliency approach fails to effectively localize the salient object in the image. In the case of the image containing the yellow flower, the entire image is shown as salient, while the salient region of the glowing bulb is treated as irrelevant. Despite being efficient, the global contrast [25] and symmetric contrast [26] based methods ignore the finer details of the salient region, as both methods operate on the mean pixel intensity value of the image. Conversely, the multi-scale [15] and local steering kernel [39] based saliencies highlight the minute details but suppress salient regions, as they have a bias towards strong gradients. The self-information [31] based approach highlights rare segments of the image as salient, and consequently discards regions of uniform appearance, as can be observed from column nine of Fig. 9. Only the weighted contrast [30] based method has a near-perfect performance in detecting the salient regions. But the previous experiments on eye-gaze prediction have shown that the weighted contrast [30] based approach is not highly effective there. The proposed method highlights the salient regions along with their finer details effectively, which shows that it is not biased towards edges like the other methods.

Fig. 9. Qualitative analysis of results for the MSRA dataset [12]. The first column contains the original image. The remaining columns contain the images overlaid with thresholded (using Tsai's algorithm [62]) saliency maps obtained from the respective methods mentioned in the figure header. Observe that the proposed method highlights the salient regions along with their finer details effectively, which most of the other methods under consideration fail to achieve.

5.3. Computational run-time

We evaluated the run-time of the proposed saliency approach with reference to the other methods under consideration. The run-times of the various methods were benchmarked on three different scales of a color image, as shown in Table 3. The original plugins of the global contrast [25], symmetric contrast [26], weighted contrast [30], entropy-based [29] and self-information based [31] saliency approaches are pure Matlab code, while the code pertaining to the local steering kernel [39], multi-scale [15] and graph-based [41] methods is quasi-Matlab code which calls C functions for run-time optimization. The original plugin for the distance transform [28] is a binary executable whose original coding language is unknown. The proposed method is programmed in Matlab. An absolute comparison on the basis of run-time might penalize the methods which are coded in Matlab script, as they are relatively slower than their C counterparts. Nevertheless, it gives a relative overview of the run-time performance of all the methods under consideration.

Table 3
The computational run-time (in s) of the various saliency methods under consideration, for three image sizes. Run-times were computed using Matlab v7.10.0 (R2010a) on an Intel Core 2 Duo processor with Ubuntu 10.04.1 LTS (Lucid Lynx) as the operating system.

Saliency map                 | Code type      | 205 × 103 | 308 × 195 | 410 × 259
Global contrast [25]         | Matlab         | 0.0652    | 0.0898    | 0.1539
Symmetric contrast [26]      | Matlab         | 0.0976    | 0.1172    | 0.2050
Graph-based [41]             | Matlab with C  | 0.6256    | 0.4788    | 0.5577
Multi-scale [15]             | Matlab with C  | 0.4388    | 0.3820    | 0.3661
Entropy-based [29]           | Matlab         | 2.1448    | 5.0761    | 10.3915
Self information [31]        | Matlab         | 1.6466    | 4.0714    | 7.6242
Local steering kernel [39]   | Matlab         | 3.0590    | 3.1187    | 3.1133
Weighted contrast [30]       | Matlab         | 0.2701    | 0.5939    | 1.2038
Distance transform [28]      | Binary         | 0.1266    | 0.2806    | 0.5308
Proposed                     | Matlab         | 0.3430    | 0.5422    | 1.0766

It can be observed from Table 3 that the run-times of the local steering kernel [39], graph-based [41] and multi-scale based [15] saliency approaches do not change significantly with the size of the input image. This is because the input image is rescaled to a pre-specified dimension in their source codes. The rest of the methods process the input image at its original scale, and hence their run-time changes with the input image dimension.

6. Discussion and conclusion

We have compared and contrasted nine existing state-of-the-art approaches to computing a saliency map with the proposed method on the two different tasks of eye-gaze correlation and salient region detection. Experiments were carried out on the large, publicly available datasets of York University [13] and MIT [14] for eye-gaze correlation and on the MSRA dataset [12] for salient region detection. Our results have shown that the proposed method attains state-of-the-art performance on the task of eye-gaze correlation and also has comparable performance with the other methods on the task of salient region detection. Most of the existing approaches to computing a saliency map fail to perform equally well on both of the aforementioned tasks. But our method, along with the graph-based [41] approach, performs reliably on both of these diverse tasks. It should be observed that the proposed method achieves this performance despite being simple compared to the graph-based [41], local steering kernel based [39] and other sophisticated existing methods. It has also been shown in the experiments that a very small number of random rectangles (n = 60) is adequate to attain good results. Despite this, we recommend tuning the sampling rate in direct proportion to the size of the image, as generating a large number of arbitrarily sized image windows leads to a higher probability of completely enclosing a salient region or object.

In spite of its simplicity and elegance, the proposed method fails to capture saliency when color contrast is either low or not the principal influence. In the future, we plan to address this bottleneck and further improve our method to capture higher-level semantics, such as detecting pop-outs in terms of orientation, shape and proximity. Exploiting the ability of the proposed method to detect salient regions as a pre-processor for object detection is also envisaged.

Acknowledgments

Tadmeri Narayan Vikram would like to thank his colleagues Miranda Grahl, Torben Töniges, Christian Schnier and Subhashree Vikram, who proofread this article at various stages. This work has been supported by the RobotDoC Marie Curie Initial Training Network funded by the European Commission under the 7th Framework Programme (Contract no. 235065).

References
[1] C. Koch, S. Ullman, Shifts in selective visual attention: towards the underlying neural circuitry, Human Neurobiology 4 (1985) 219–227.
[2] F. Moosmann, D. Larlus, F. Jurie, Learning saliency maps for object categorization, in: Proceedings of the ECCV International Workshop on the Representation and Use of Prior Knowledge in Vision, 2006.
[3] P. Khuwuthyakorn, A. Robles-Kelly, J. Zhou, Object of interest detection by saliency learning, in: Proceedings of the European Conference on Computer Vision, 2010, pp. 636–649.
[4] J. Bonaiuto, L. Itti, Combining attention and recognition for rapid scene analysis, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2005, pp. 90–97.
[5] L. Shi, J. Wang, L. Xu, H. Lu, C. Xu, Context saliency based image summarization, in: Proceedings of the IEEE International Conference on Multimedia and Expo, 2009, pp. 270–273.

[6] M. Donoser, M. Urschler, M. Hirzer, H. Bischof, Saliency driven total variation segmentation, in: Proceedings of the IEEE International Conference on Computer Vision, 2009, pp. 817–824.
[7] R. Achanta, F. Estrada, P. Wils, S. Süsstrunk, Salient region detection and segmentation, in: Proceedings of the International Conference on Computer Vision Systems, 2008, pp. 66–75.
[8] C. Guo, L. Zhang, A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression, IEEE Transactions on Image Processing 19 (2010) 185–198.
[9] A. Borji, M.N. Ahmadabadi, B.N. Araabi, M. Hamidi, Online learning of task-driven object-based visual attention control, Image and Vision Computing 28 (2010) 1130–1145.
[10] Y. Nagai, From bottom-up visual attention to robot action learning, in: Proceedings of the IEEE International Conference on Development and Learning, 2009, pp. 1–6.
[11] C.L. Colby, M.E. Goldberg, Space and attention in parietal cortex, Annual Review of Neuroscience 22 (1999) 319–349.
[12] T. Liu, J. Sun, N.-N. Zheng, X. Tang, H.-Y. Shum, Learning to detect a salient object, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8.
[13] N.D.B. Bruce, Features that draw visual attention: an information theoretic perspective, Neurocomputing 65–66 (2005) 125–133.
[14] T. Judd, K. Ehinger, F. Durand, A. Torralba, Learning to predict where humans look, in: Proceedings of the IEEE International Conference on Computer Vision, 2009, pp. 2106–2113.
[15] L. Itti, C. Koch, E. Niebur, A model of saliency-based visual attention for rapid scene analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence 20 (1998) 1254–1259.
[16] M. Begum, F. Karray, Visual attention for robotic cognition: a survey, IEEE Transactions on Autonomous Mental Development 3 (2011) 92–105.
[17] S. Frintrop, R. Erich, H.I. Christensen, Computational visual attention systems and their cognitive foundations: a survey, ACM Transactions on Applied Perception 7 (2010) 6:1–6:39.
[18] D.B. Walther, C. Koch, Attention in hierarchical models of object recognition, Progress in Brain Research 165 (2007) 57–78.
[19] C. Guo, Q. Ma, L. Zhang, Spatio-temporal saliency detection using phase spectrum of quaternion Fourier transform, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008, pp. 1–8.
[20] M. Wang, J. Li, T. Huang, Y. Tian, L. Duan, G. Jia, Saliency detection based on 2D log-Gabor wavelets and center bias, in: Proceedings of the International Conference on Multimedia, 2010, pp. 979–982.
[21] X. Hou, L. Zhang, Saliency detection: a spectral residual approach, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–8.
[22] X. Cui, Q. Liu, D. Metaxas, Temporal spectral residual: fast motion saliency detection, in: Proceedings of the ACM International Conference on Multimedia, 2009, pp. 617–620.
[23] Y. Caron, P. Makris, N. Vincent, Use of power law models in detecting region of interest, Pattern Recognition 40 (2007) 2521–2529.
[24] V. Yanulevskaya, J.M. Geusebroek, Significance of the Weibull distribution and its sub-models in natural image statistics, in: Proceedings of the International Conference on Computer Vision Theory and Applications, 2009, pp. 355–362.
[25] R. Achanta, S. Hemami, F. Estrada, S. Süsstrunk, Frequency-tuned salient region detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 1597–1604.
[26] R. Achanta, S. Süsstrunk, Saliency detection using maximum symmetric surround, in: Proceedings of the IEEE International Conference on Image Processing, 2010, pp. 2653–2656.
[27] G. Kootstra, B. de Boer, L. Schomaker, Predicting eye fixations on complex visual stimuli using local symmetry, Cognitive Computation 3 (2011) 223–240.
[28] P.L. Rosin, A simple method for detecting salient regions, Pattern Recognition 42 (2009) 2363–2371.
[29] N.D.B. Bruce, J.K. Tsotsos, Saliency based on information maximization, Advances in Neural Information Processing Systems (2005) 155–162.
[30] T.N. Vikram, M. Tscherepanow, B. Wrede, A random center surround bottom up visual attention model useful for salient region detection, in: Proceedings of the IEEE Workshop on Applications of Computer Vision, 2011, pp. 166–173.
[31] L. Zhang, M.H. Tong, T.K. Marks, H. Shan, G.W. Cottrell, SUN: a Bayesian framework for saliency using natural statistics, Journal of Vision 8 (2008) 1–20.
[32] W. Wang, Y. Wang, Q. Huang, W. Gao, Measuring visual saliency by site entropy rate, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 2368–2375.
[33] Y. Lin, B. Fang, Y. Tang, A computational model for saliency maps by using local entropy, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2010, pp. 967–973.
[34] X. Hou, L. Zhang, Dynamic visual attention: searching for coding length increments, Advances in Neural Information Processing Systems (2008) 681–688.
[35] Y. Li, Y. Zhou, L. Xu, X. Yang, J. Yang, Incremental sparse saliency detection, in: Proceedings of the IEEE International Conference on Image Processing, 2009, pp. 3093–3096.

[36] D. Gao, V. Mahadevan, N. Vasconcelos, The discriminant center-surround hypothesis for bottom-up saliency, Advances in Neural Information Processing Systems (2008) 497–504.
[37] T. Avraham, M. Lindenbaum, Esaliency (extended saliency): meaningful attention using stochastic image modeling, IEEE Transactions on Pattern Analysis and Machine Intelligence 32 (2010) 693–708.
[38] R. Huang, N. Sang, L. Liu, Q. Tang, Saliency based on multi-scale ratio of dissimilarity, in: Proceedings of the International Conference on Pattern Recognition, 2010, pp. 13–16.
[39] H.J. Seo, P. Milanfar, Static and space-time visual saliency detection by self-resemblance, Journal of Vision 9 (12) (2009) 1–27.
[40] W. Kim, C. Jung, C. Kim, Saliency detection: a self-ordinal resemblance approach, in: Proceedings of the IEEE International Conference on Multimedia and Expo, 2010, pp. 1260–1265.
[41] J. Harel, C. Koch, P. Perona, Graph-based visual saliency, Advances in Neural Information Processing Systems (2007) 545–552.
[42] W. Kienzle, F.A. Wichmann, B. Schölkopf, M.O. Franz, A nonparametric approach to bottom-up visual saliency, Advances in Neural Information Processing Systems (2007) 689–696.
[43] Z. Liu, Y. Xue, L. Shen, Z. Zhang, Nonparametric saliency detection using kernel density estimation, in: Proceedings of the IEEE International Conference on Image Processing, 2010, pp. 253–256.
[44] Y. Yu, B. Wang, L. Zhang, Hebbian-based neural networks for bottom-up visual attention systems, in: Proceedings of the International Conference on Neural Information Processing, 2009, pp. 1–9.
[45] M.G. Quiles, D. Wang, L. Zhao, R.A.F. Romero, D.-S. Huang, Selecting salient objects in real scenes: an oscillatory correlation model, Neural Networks 24 (2011) 54–64.
[46] R. Valenti, N. Sebe, T. Gevers, Image saliency by isocentric curvedness and color, in: Proceedings of the IEEE International Conference on Computer Vision, 2009, pp. 2185–2192.
[47] G. Cao, F. Cheikh, Salient region detection with opponent color boosting, in: Proceedings of the European Workshop on Visual Information Processing, 2010, pp. 13–18.
[48] V. Gopalakrishnan, Y. Hu, D. Rajan, Salient region detection by modeling distributions of color and orientation, IEEE Transactions on Multimedia 11 (2009) 892–905.
[49] O. Le Meur, P. Le Callet, D. Barba, D. Thoreau, A coherent computational approach to model bottom-up visual attention, IEEE Transactions on Pattern Analysis and Machine Intelligence 28 (2006) 802–817.
[50] B. Alexe, T. Deselaers, V. Ferrari, What is an object? in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2010, pp. 73–80.
[51] A.J. Ahumada, Learning receptor positions, in: Computational Models of Visual Processing, MIT Press, 1991, pp. 23–34.
[52] R.M. Nosofsky, Stimulus bias, asymmetric similarity, and classification, Cognitive Psychology 23 (1991) 94–140.
[53] M.H. Wilder, M.C. Mozer, C.D. Wickens, An integrative, experience-based theory of attentional control, Journal of Vision 11 (2) (2011) 1–30.
[54] T. Liu, J. Abrams, M. Carrasco, Voluntary attention enhances contrast appearance, Psychological Science 20 (3) (2009) 354–362.
[55] S. Treue, Perceptual enhancement of contrast by attention, Trends in Cognitive Sciences 8 (10) (2004) 435–437.
[56] A.L. Rothenstein, J.K. Tsotsos, Attention links sensing to recognition, Image and Vision Computing 26 (2008) 114–126.
[57] O. Le Meur, J.-C. Chevet, Relevance of a feed-forward model of visual attention for goal-oriented and free-viewing tasks, IEEE Transactions on Image Processing 19 (2010) 2801–2813.
[58] P. Sharma, F.A. Cheikh, J.Y. Hardeberg, Face saliency in various human visual saliency models, in: Proceedings of the International Symposium on Image and Signal Processing and Analysis, 2009, pp. 327–332.
[59] J. Davis, M. Goadrich, The relationship between precision-recall and ROC curves, in: Proceedings of the International Conference on Machine Learning, 2006, pp. 233–240.
[60] Z. Wang, B. Li, A two-stage approach to saliency detection in images, in: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, 2008, pp. 965–968.
[61] L. Elazary, L. Itti, Interesting objects are visually salient, Journal of Vision 8 (2008) 1–15.
[62] W.H. Tsai, Moment-preserving thresholding: a new approach, Computer Vision, Graphics, and Image Processing 29 (1985) 377–393.

Tadmeri Narayan Vikram received a Master of Computer Applications from the University of Mysore, India, in 2006. He worked as a research engineer at GREYC, University of Caen, France, between 2008 and 2010. In 2010 he was awarded the prestigious Marie Curie fellowship to pursue a Ph.D. at the Applied Informatics Group at Bielefeld University, Germany. His research interests include computational modeling of visual attention and its applications in developmental robotics.

Marko Tscherepanow received his diploma degree in computer science from Ilmenau Technical University, Germany, in 2003. Afterwards, he joined the Applied Informatics Group at Bielefeld University, Germany. In December 2007, he finished his Ph.D. thesis entitled "Image Analysis Methods for Location Proteomics". His current research interests include incremental on-line learning, neural networks, evolutionary optimization, feature selection, and the application of such methods in cognitive robotics.

Britta Wrede received her master's degree in computational linguistics and her Ph.D. degree (Dr.-Ing.) in computer science from Bielefeld University in 1999 and 2002, respectively. From 1999 to 2002 she pursued a Ph.D. program at Bielefeld University in joint affiliation with the Applied Informatics Group and the graduate program "Task-oriented communication". After her Ph.D. she received a DAAD PostDoc fellowship at the speech group of the International Computer Science Institute (ICSI) in Berkeley, USA. Currently she heads the Applied Informatics Group at Bielefeld University.
