Abstract—In this paper, we provide an insight into visual saliency modeling from the perspective of simulating the visual information manipulation process in the human vision system. For humans, visual stimuli are converted into spike trains by the retina, and the spike signals are then transferred to brain areas for further analysis. Therefore, we propose to estimate image saliency based on a retina neural network, which is built with realistic morphology and electrophysics data. Then, we analyze the correlation between the spike trains generated by the retina and image saliency. The experimental results show the effectiveness of retinal spike trains in image saliency analysis.

Index Terms—Visual Saliency, Retina Simulation, Neural Spike Trains

I. INTRODUCTION

Visual attention is a filter that selects the most important information from the large amount of visual stimuli in a scene, so that only a small subset receives further analysis. By detecting the most important visual subsets, called salient targets, in images or videos, the performance of computer applications [1], [25], [26], [29], [30] can be improved.

To detect salient subsets (pixels, macroblocks or regions), a common solution is to represent each subset with various perception features and measure its saliency as the rarity of those features. The most frequently used features are color opponencies, orientations, luminance, semantic detection results, etc. For example, Itti et al. [17] propose to estimate saliency by fusing multi-scale local center-surround contrasts over multiple perception features. In [12], saliency is calculated as global rarity, derived from a random-walk process on a fully-connected graph whose nodes are image patches and whose edges are weighted by the mutual similarities of multiple perception features. Bruce et al. [6] represent image patches by projecting their RGB data onto learned independent components and compute saliency as self-information. Cerf et al. [7] incorporate face detection results into the bottom-up saliency map of [12] to achieve better saliency prediction. Beyond these spatial saliency models, some approaches [11], [14], [20], [21] estimate visual saliency in the transform domain. For example, [15] and [14] extract spectral residuals over the image intensity channel and adopt the sign values after the Discrete Cosine Transform as visual saliency, respectively. A problem is that the heuristic combination of results from various features can fail when multiple features yield different saliency maps. Consequently, some works [3], [18], [22], [23], [32] extract as many features as possible and learn the optimal fusion strategy from training images. For example, Judd et al. [18] train a linear model with fusion weights for predefined low-level (e.g., the features used in [16], [28]), mid-level (e.g., the horizon line) and high-level (e.g., faces and persons) features, as well as the center prior. Bruce et al. [5] present a deep learning model for visual saliency prediction based on fully convolutional networks. Compared to approaches that combine features heuristically, data-driven approaches often achieve much better prediction performance. However, the same features can contribute differently to saliency estimation in different scenes.

For human beings, visual stimuli are processed into spike trains by the retina and transferred to the LGN, V1 and higher-level brain areas [4]. Along the visual pathway, visual subsets compete with each other through the neural excitations, inhibitions and modulations of different brain regions. In this manner, the visual subsets that win the competition become salient [24]. We consider a novel solution for visual saliency estimation by simulating this process of visual information manipulation. Specifically, subsequent brain areas process visual information based on the output of the retina, which converts light into spike trains. Thus, reproducing the spike trains of the retina is critical for solving the visual saliency estimation problem.

Inspired by this idea, we propose to analyze image saliency based on the neural spike trains generated by a retina neural network. The retina neural network is built with realistic data, and it can reproduce the electrophysiological characteristics of retina cells. Then, the relationship between image saliency and the spike trains generated by the retina neural network is analyzed. The experimental results show the effectiveness of retinal spike trains in image saliency estimation.

The rest of this paper is organized as follows: the details of the retina simulation are introduced in Section II. Section III describes how to analyze image saliency via retina simulations. Experimental results are presented in Section IV, and the paper is concluded in Section V.

∗Kai Du, Yonghong Tian and Tiejun Huang are corresponding authors.
Fig. 2. The neural circuit built at the retinal fovea and the resulting neural network. The left part shows the structure of the fovea neural circuit, along with the spike trains produced by each type of cell. Based on this neural circuit, we construct a neural network, as shown in the right part, by repeating the same circuit along two dimensions. In this manner, each neural circuit processes the pixel at the corresponding spatial location.
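The replication scheme of Fig. 2 can be illustrated with a deliberately simplified stand-in: a grid of identical leaky integrate-and-fire units, one per pixel, in place of the detailed conductance-based circuit. All parameters below (time constant, gain, threshold) are arbitrary toy values, not the fitted retinal parameters, and the first-spike-latency readout only echoes the general idea of latency coding (cf. [8], [10]).

```python
import numpy as np

def simulate_lif_grid(image, t_steps=200, dt=1.0, tau=20.0,
                      v_thresh=1.0, v_reset=0.0, gain=0.02):
    """One identical LIF unit per pixel; intensity acts as input current."""
    h, w = image.shape
    v = np.zeros((h, w))
    spikes = np.zeros((h, w, t_steps), dtype=bool)
    current = gain * image.astype(np.float64)
    for t in range(t_steps):
        v += dt * (-v / tau + current)   # leaky integration
        fired = v >= v_thresh
        spikes[:, :, t] = fired
        v[fired] = v_reset               # reset after each spike
    return spikes

def first_spike_latency(spikes):
    """First-spike time per pixel; pixels that never fire get t_steps."""
    n_t = spikes.shape[2]
    any_spike = spikes.any(axis=2)
    return np.where(any_spike, spikes.argmax(axis=2), n_t)
```

Under this toy model, brighter pixels fire more spikes and fire earlier, so either spike counts or first-spike latencies yield a per-pixel map aligned with the input, mirroring the pixel-to-circuit correspondence shown in the figure.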
Fig. 5. The distribution of metric scores of the STSs that perform the best for each image in the Toronto dataset.

V. CONCLUSIONS

In this paper, we propose a new perspective for estimating image saliency based on the spike trains generated by a detailed retina neural network. This neural network is built with realistic retinal data, and it can produce neural spike trains close to those of the real retina. The experimental results show that the spike trains are not only well suited for efficient signal transmission, but can also separate salient targets from backgrounds effectively.

ACKNOWLEDGMENT

This work is partially supported by the National Basic Research Program of China under grant 2015CB351806, the National Natural Science Foundation of China under contracts No. 61390515 and No. 61425025, and the Beijing Municipal Commission of Science and Technology under contract No. Z151100000915070.

REFERENCES

[1] S. Avidan and A. Shamir. Seam carving for content-aware image resizing. In ACM SIGGRAPH, New York, NY, USA, 2007.
[2] S. Barnes and B. Hille. Ionic channels of the inner segment of tiger salamander cone photoreceptors. The Journal of General Physiology, 94(4):719–743, 1989.
[3] A. Borji. Boosting bottom-up and top-down visual features for saliency estimation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 438–445, 2012.
[4] G. S. Brindley. Physiology of the retina and the visual pathway. 1960.
[5] N. D. Bruce, C. Catton, and S. Janjic. A deeper look at saliency: Feature contrast, semantics, and beyond. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 516–524, 2016.
[6] N. D. Bruce and J. K. Tsotsos. Saliency based on information maximization. In Advances in Neural Information Processing Systems (NIPS), pages 155–162, Vancouver, BC, Canada, 2005.
[7] M. Cerf, J. Harel, W. Einhauser, and C. Koch. Predicting human gaze using low-level saliency combined with face detection. In Advances in Neural Information Processing Systems (NIPS), Vancouver, BC, Canada, 2009.
[8] S. M. Chase and E. D. Young. First-spike latency information in single neurons increases when referenced to population onset. Proceedings of the National Academy of Sciences, 104(12):5175–5180, 2007.
[9] D. Dacey, O. S. Packer, L. Diller, D. Brainard, B. Peterson, and B. Lee. Center surround receptive field structure of cone bipolar cells in primate retina. Vision Research, 40(14):1801–1811, 2000.
[10] T. Gollisch and M. Meister. Rapid neural coding in the retina with relative spike latencies. Science, 319(5866):1108–1111, 2008.
[11] C. Guo, Q. Ma, and L. Zhang. Spatio-temporal saliency detection using phase spectrum of quaternion Fourier transform. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–8, 2008.
[12] J. Harel, C. Koch, and P. Perona. Graph-based visual saliency. In Advances in Neural Information Processing Systems (NIPS), pages 545–552, 2007.
[13] A. L. Hodgkin and A. F. Huxley. A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of Physiology, 117(4):500, 1952.
[14] X. Hou, J. Harel, and C. Koch. Image signature: Highlighting sparse salient regions. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(1):194–201, 2012.
[15] X. Hou and L. Zhang. Saliency detection: A spectral residual approach. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1–8, 2007.
[16] L. Itti and C. Koch. A saliency-based search mechanism for overt and covert shifts of visual attention. Vision Research, 40(10–12):1489–1506, 2000.
[17] L. Itti, C. Koch, and E. Niebur. A model of saliency-based visual attention for rapid scene analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(11):1254–1259, 1998.
[18] T. Judd, K. Ehinger, F. Durand, and A. Torralba. Learning to predict where humans look. In IEEE International Conference on Computer Vision (ICCV), pages 2106–2113, 2009.
[19] H. Kolb and D. Marshak. The midget pathways of the primate retina. Documenta Ophthalmologica, 106(1):67–81, 2003.
[20] J. Li, L.-Y. Duan, X. Chen, T. Huang, and Y. Tian. Finding the secret of image saliency in the frequency domain. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(12):2428–2440, 2015.
[21] J. Li, M. Levine, X. An, X. Xu, and H. He. Visual saliency based on scale-space analysis in the frequency domain. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(4):996–1010, 2013.
[22] J. Li, Y. Tian, T. Huang, and W. Gao. Cost-sensitive rank learning from positive and unlabeled data for visual saliency estimation. IEEE Signal Processing Letters, 17(6):591–594, 2010.
[23] J. Li, Y. Tian, T. Huang, and W. Gao. Multi-task rank learning for visual saliency estimation. IEEE Transactions on Circuits and Systems for Video Technology, 21(5):623–636, 2011.
[24] Z. Li. A saliency map in primary visual cortex. Trends in Cognitive Sciences, 6(1):9–16, 2002.
[25] Z. Li, S. Qin, and L. Itti. Visual attention guided bit allocation in video compression. Image and Vision Computing, 29(1):1–14, Jan. 2011.
[26] Z. Ma, L. Qing, J. Miao, and X. Chen. Advertisement evaluation using visual saliency based on foveated image. In IEEE International Conference on Multimedia and Expo (ICME), pages 914–917, 2009.
[27] I. C. Mann. The development of the human eye. 1928.
[28] A. Oliva and A. Torralba. Modeling the shape of the scene: A holistic representation of the spatial envelope. International Journal of Computer Vision, 42(3):145–175, 2001.
[29] S. Wei, D. Xu, X. Li, and Y. Zhao. Joint optimization toward effective and efficient image search. IEEE Transactions on Cybernetics, 43(6):2216–2227, 2013.
[30] S. Wei, Y. Zhao, C. Zhu, C. Xu, and Z. Zhu. Frame fusion for video copy detection. IEEE Transactions on Circuits and Systems for Video Technology, 21(1):15–28, 2011.
[31] J. Zhang and S. Sclaroff. Saliency detection: A boolean map approach. In IEEE International Conference on Computer Vision (ICCV), pages 153–160, 2013.
[32] Q. Zhao and C. Koch. Learning visual saliency by combining feature maps in a nonlinear manner using AdaBoost. Journal of Vision, 12(6):22, 1–15, 2012.