ICACSIS 2015
Evolutionary Segment Selection for Higher-order Conditional Random Fields in Semantic Image Segmentation
Novian Habibie*1, Vektor Dewanto*2, Jogie Chandra*1, Fariz Ikhwantri1, Harry Budi Santoso1, Wisnu Jatmiko1
1 Faculty of Computer Science, Universitas Indonesia; 2 Department of Computer Science, Bogor Agricultural University
Email: novian.habibie@ui.ac.id, vektor.dewanto@gmail.com
*These authors contributed equally.
249 978-1-5090-0363-1/15/$31.00 2015 IEEE

Abstract-One promising approach for pixel-wise semantic segmentation is based on higher-order Conditional Random Fields (CRFs). We aim to selectively choose segments for the higher-order CRFs in semantic segmentation. To this end, we formulate the selection as an optimization problem. We propose three optimization criteria in relation to the selected segments, namely: a) averaged goodness, b) coverage area and c) non-overlapped area. Essentially, we desire to have the best segments with maximum coverage area and maximum non-overlapped area. We apply two evolutionary optimization algorithms, namely the genetic algorithm (GA) and particle swarm optimization (PSO). The goodness of segments is estimated using the Latent Dirichlet Allocation approach. Experiment results show that semantic segmentation with GA- or PSO-selected segments yields competitive semantic segmentation accuracy in comparison to that of naively using all segments. Moreover, the smaller number of segments used in semantic segmentation speeds up its computation time by up to six times.

I. INTRODUCTION
Semantic segmentation aims to label every pixel in an image with a semantic object-class label from some predefined set. In contrast, (standard) segmentation produces segments without object-class labels. One promising approach for pixel-wise semantic segmentation is based on the Maximum A Posteriori (MAP) and higher-order Conditional Random Fields (CRFs) frameworks, as in [1], [2], [3]. In particular, higher-order CRFs for semantic segmentation rely on segments to compute the optimal pixel labeling. The more segments are used, the longer the computation time of semantic segmentation. A larger number of segments, however, does not necessarily correspond to higher semantic segmentation accuracy.

We observe that there is no perfect segmentation for all objects in an image. Some segmentations produce under-segmentation, i.e. a segment contains more than one object, while others yield over-segmentation, i.e. one object is spread over several segments. Additionally, different segment generators give different segments. This affects higher-order CRF-based semantic segmentation in terms of both its accuracy and its computation time. As a result, we are motivated to perform multiple segmentations on an image in order to obtain a large number of segments. We expect that some of them are near-perfect segments, which contain mostly one object-class label.

We aim to selectively choose segments from a bag of segments yielded by multiple segmentations. The selected segments are then used for higher-order CRFs in semantic image segmentation. To this end, we formulate the segment selection as an optimization problem. We propose three optimization criteria in relation to the selected segments, namely: a) averaged goodness, b) coverage area, and c) non-overlapped area. Essentially, we desire to have the best segments with maximum coverage area and maximum non-overlapped area. The goodness of segments is estimated using the Latent Dirichlet Allocation approach; more in section IV. We employ two evolutionary optimization algorithms to perform segment selection, namely the genetic algorithm (GA) and particle swarm optimization (PSO). Fig. 1 depicts the pipeline of CRF-based semantic segmentation with our proposed segment selection.

Experiment results show that semantic segmentation with GA- or PSO-selected segments yields competitive semantic segmentation accuracy in comparison to that of naively using all segments. In particular, we achieve semantic segmentation accuracies of 82.130%, 82.412% and 82.490% using GA-selected, PSO-selected and all segments, respectively. Moreover, the smaller number of segments used in semantic segmentation (approximately one third of all segments) speeds up its computation time by up to six times. The results also indicate that the goodness of segments can be well estimated via the Latent Dirichlet Allocation approach.

Fig. 1: The pipeline of CRF-based semantic segmentation with our proposed evolutionary segment selection. [Figure: multiple segmentation produces a bag of segments; the evolutionary segment selector picks the selected segments; CRF-based semantic segmentation (RobustPn) produces the annotation, e.g. Grass and Cow.]

II. RELATED WORKS
Gould [4] uses segments as nodes in Markov Random Field models. The segments used in that method come from a non-overlapping segmentation. Pantofaru et al. [5] also use multiple segmentations for semantic segmentation. They found that segment size introduces a trade-off in finding an object's border. Bigger segments contain more features and information about an object, but make it harder to determine the object's border. Conversely, small segments define the object's border easily but contain only a small amount of information. To overcome this, Pantofaru et al. use the Intersection of Regions (IoR) method to obtain the border of an object.

Kumar and Koller [6] also use segments for semantic segmentation. They argue that the results of unsupervised segment generators are not very good. They propose a method for generating segments from pixels using an energy function. This method formulates segment generation as an integer program. Moreover, it lets the generated segments adapt to the energy function of the labeling process.

Another approach that uses segments for semantic segmentation was proposed by Kohli et al. [7]. They use segments as one of the potentials in a CRF-based energy function. Compared to the previously mentioned methods, this method can use multiple overlapping segments to enhance its accuracy. Because this approach provides more information through segment variety, we use it as our baseline. However, this method uses all generated segments instead of only good-quality segments. For this reason, our proposed method focuses on the segment selection process.
III. CRF-BASED SEMANTIC SEGMENTATION
The goal of semantic segmentation is to label every pixel in an image with a semantic object-class label from some predefined set, i.e. $y_i \in \mathcal{L}$. One promising approach for pixel-wise semantic image segmentation is based on the Maximum A Posteriori (MAP) and higher-order Conditional Random Fields (CRFs) frameworks, as in [1], [2], [3]. In particular, a vector of optimal pixel labels, i.e. $\mathbf{y}^*$, for an image is formulated as:

$$\mathbf{y}^* = \arg\max_{\mathbf{y} \in \mathcal{L}} P(\mathbf{y} \mid \mathbf{x}) = \arg\min_{\mathbf{y} \in \mathcal{L}} E(\mathbf{y}) \tag{1}$$

where $P(\mathbf{y} \mid \mathbf{x})$ is the probability of unobserved pixel labels $\mathbf{y}$ given observations $\mathbf{x}$, and $E(\mathbf{y})$ is an energy formulation. Furthermore,

$$E(\mathbf{y}) = \underbrace{\sum_{i \in \mathcal{V}} \psi_i(y_i)}_{\text{Unary Term}} + \underbrace{\sum_{i \in \mathcal{V},\, j \in \mathcal{N}_i} \psi_{i,j}(y_i, y_j)}_{\text{Pairwise Term}} + \underbrace{\sum_{c \in \mathcal{C}} \psi_c(\mathbf{y}_c)}_{\text{Higher-order Term}} \tag{2}$$

where $\mathcal{V}$ is a set of random variables associated with pixel labels, $\mathcal{N}_i$ returns a set of neighbour pixels of $i$, and $\mathcal{C}$ is a set of higher-order cliques, i.e. cliques whose size is more than two. A clique is essentially a collection of random variables. We remark that this work aims to selectively choose segments to become member cliques of $\mathcal{C}$. For the derivation of equations 1 and 2, we refer the reader to [8]. Fig. 2 illustrates the graphical model of the aforementioned CRFs, in which there are two higher-order cliques (segments).

Fig. 2: The graphical model of CRFs in a 3-by-3 pixel image, where $x_i$ is an observed variable, while $y_i$ is a random variable (an object class label for the corresponding $x_i$). There are two segments (colored green and blue) as higher-order cliques.

IV. ESTIMATING SEGMENT GOODNESS
We hypothesize that a good segment should contain one single object class with high confidence. Let a segment $s_1$ contain 50%, 40%, and 10% of dog, sofa and unknown pixels, respectively, while another segment $s_2$ contains 100% dog pixels. Then, it can be inferred that segment $s_2$ is "better" than $s_1$.

To compute the probability that a segment contains some pixel labels, we draw ideas from the Natural Language Processing (NLP) domain. Essentially, in NLP, Latent Dirichlet Allocation (LDA) [9] is used to discover hidden topics in a document. For instance, the sentence "I like to eat bananas and carrots" contains 100% topic 1, while "Bunnies and kittens are cute" contains 100% topic 2. On the other hand, "This hamster munching on a piece of carrot" contains 50% topic 1 and 50% topic 2.

We use LDA to measure such segment goodness via the following analogies.
Hidden Topics ↔ Object-Classes
Documents ↔ Segments
Words ↔ (Affine Regions + SIFT Descriptor).
Hence, a "good" segment has a high probability for a hidden topic. In other words, the goodness level is proportional to the maximum probability of a hidden topic given a document.
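The goodness rule above (the maximum topic probability of a segment) can be illustrated with a minimal sketch. It assumes the topic-document distribution of a segment is already available as a plain list; the function name `segment_goodness` and the toy distributions, which simply reuse the 50/40/10 and 100% figures of the $s_1$/$s_2$ example as if they were topic probabilities, are hypothetical and not taken from the authors' implementation.

```python
def segment_goodness(topic_given_segment):
    """Return (goodness, best_topic) for one segment.

    topic_given_segment: p(topic | document) for a single segment, as a list.
    Goodness is taken to be the largest topic probability (an assumption that
    follows the "maximum probability of a hidden topic" wording in the text).
    """
    total = sum(topic_given_segment)
    probs = [p / total for p in topic_given_segment]  # renormalise defensively
    best_topic = max(range(len(probs)), key=probs.__getitem__)
    return probs[best_topic], best_topic


# Toy distributions mirroring the s1/s2 example: s1 is spread over three
# topics, s2 is concentrated on a single topic.
s1 = [0.5, 0.4, 0.1]
s2 = [1.0, 0.0, 0.0]
g1, t1 = segment_goodness(s1)
g2, t2 = segment_goodness(s2)
print("s1 goodness:", g1, "s2 goodness:", g2)
```

Under this reading, a segment dominated by one topic scores higher than a mixed one, which matches the intuition that $s_2$ is "better" than $s_1$.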
A similar method was proposed by Russell et al. in [10].

Fig. 3 shows the pipeline for predicting segment goodness via LDA. In the training phase, we first extract and cluster the visual words of a segment. In particular, Harris Affine regions [11] define those visual words in the form of SIFT representations [12]. This makes the visual words invariant to scale and viewing angle. Secondly, we generate a training corpus (a collection of documents). It is then used for constructing the LDA model of the training data via Gibbs sampling estimation [13]. The LDA model gives us the word-topic distributions, $p(word_w \mid topic_t)$, and the topic-document distributions, $p(topic_t \mid document_d)$. In the testing phase, we perform the same steps, from word extraction up to generating a testing corpus. Afterwards, Gibbs sampling inference is carried out for the previously unseen testing data. The inference outputs are almost the same as those of the trained LDA model, except that the information is about the testing data.

V. EVOLUTIONARY SEGMENT SELECTION
We formulate segment selection as an optimization problem. Because the number of (to be selected) segments can be very large, we argue that stochastic optimization methods are more suitable than deterministic ones, at the cost of obtaining merely approximately optimal solutions. Particularly, we use two evolutionary optimization algorithms, namely the genetic algorithm (GA) and particle swarm optimization (PSO) [14]. Fig. 4 shows the pipeline of our proposed evolutionary segment selection for higher-order CRFs in semantic segmentation.

A. The representation of an individual
We use a binary string to represent an individual. One bit of the string represents a segment. If a segment is selected, then the corresponding bit has a value of 1. Otherwise, it is set to zero. The number of bits in a string is equal to the number of segments from multiple segmentations on an image. Figure 4 (left part) illustrates the encoding of this representation.

B. The fitness function
To obtain optimal segments for higher-order CRFs, we propose three optimization criteria, which are related to the selected segments, namely: a) averaged goodness $o_{ag}$, b) coverage area $o_{ca}$, and c) non-overlapped area $o_{na}$. In essence, we desire to have the best segments with maximum coverage area and maximum non-overlapped area.

As a result, for an individual $i$ (represented as a binary string), we formulate the fitness function $f$ as follows.

$$f(i) = o_{ag}(i) + o_{ca}(i) + o_{na}(i) \tag{3}$$

with $o_{ag}(i) = \frac{g(i)}{one(i)}$, $o_{ca}(i) = \frac{|set(i)|}{n_p}$, and $o_{na}(i) = \frac{|set(i)|}{|list(i)|}$, where:
$g(i) = \sum_{i_k \in i} \max_{j \in O} P(j \mid i_k = 1)$, where $i_k$ is a bit of $i$ and $O$ is a set of object classes
$one(i)$ returns the number of selected segments in $i$, i.e. bits that have a value of 1
$n_p$ is the number of pixels in an image
$|set(i)|$ returns the number of unique pixels of the selected segments
$|list(i)|$ returns the number of pixels of the selected segments, where pixel duplication may occur

C. Evolutionary algorithms
1) Genetic Algorithm (GA): GA is a metaheuristic algorithm that solves optimization problems through evolutionary processes inspired by nature. GA represents the search space as chromosomes of individuals. A chromosome contains a solution configuration.

Chromosomes evolve in every generation and are evaluated by a fitness function that measures their quality. The evolution process ends when the fitness value no longer changes, which means that it has reached convergence. The chromosome obtained last becomes the solution configuration.

In essence, GA has three main processes: crossover, mutation, and selection. Crossover combines one chromosome with another chromosome to generate a new configuration. Mutation randomly changes some parts of a chromosome to increase the variation among chromosomes. Selection chooses some of the best chromosomes to use in the next evolution process.

2) Particle Swarm Optimization (PSO): PSO is an optimization algorithm inspired by the movement of a group of animals searching for food by swarming or flocking [15]. Essentially, PSO works by spreading particles in the search space to find the optimum solution. It evaluates solutions using a fitness function and moves all particles based on their velocities. The velocity of a particle is calculated based on a) the distance between the particle and the best solution it has found (the local best) and b) the distance between the particle and the best solution of all particles (the global best). This process repeats until the fitness value converges or the iteration limit is reached.

The PSO algorithm requires two main functions as follows.

$$x_i^{t+1} = x_i^t + v_i^{t+1} \tag{4}$$

$$v_i^{t+1} = \theta v_i^t + \alpha\,\mathrm{rand}(0,1)\,(x_i^{*(t)} - x_i^t) + \beta\,\mathrm{rand}(0,1)\,(g_i^{*(t)} - x_i^t) \tag{5}$$

Equation 4 updates the particle's new position in the search space, $x_i^{t+1}$, using the particle's current position $x_i^t$ and the particle's velocity $v_i^{t+1}$. Meanwhile, equation 5 updates the particle's new velocity $v_i^{t+1}$ based on its previous velocity $v_i^t$ and the best positions (local best $x_i^{*(t)}$ and global best $g_i^{*(t)}$). Here $\theta$ is the inertia value, $\alpha$ and $\beta$ are the particle acceleration constants for the local and global best, and $\mathrm{rand}(0,1)$ is a random function which produces a decimal value between 0 and 1.

Kennedy and Eberhart introduced an alternative version of PSO for discrete search spaces, called Binary PSO (BPSO) [16], in which a particle is represented as a binary string of 0s and 1s. Each element of the binary string is the occurrence status of a solution element in a specific solution (1: exists, 0: does not exist). Meanwhile, the velocity is represented as a set of decimal probability values. It represents the probability that each element changes from 0 to 1 and vice versa. The new state is obtained by thresholding a logistic transformation $S(v_{ij})$ of the velocity: $x_{ij} = 1$ if $\mathrm{rand}() < S(v_{ij})$, and $x_{ij} = 0$ otherwise.
Fig. 3: The pipeline for predicting segment goodness via LDA. [Figure: in the training phase, segments with ground-truth pixel labels pass through visual-word extraction and training-corpus generation to build the LDA model, P(topic | word) and P(topic | document). In the testing phase, visual words are extracted from the test segments and the distribution of topics over documents is inferred using Gibbs sampling, which yields segments with goodness, e.g. p(t | s[1]) = 0.8, p(t | s[2]) = 0.6, ..., p(t | s[n]) = 0.7.]

Fig. 4: The pipeline of our proposed evolutionary segment selection for higher-order CRFs in semantic segmentation. [Figure: a bag of n segments is mapped into a binary string bs[0], bs[1], ..., bs[n-1]; a population of m individuals undergoes evolutionary segment selection; the resulting bag of selected segments is passed to semantic segmentation.]
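The binary-string encoding of Fig. 4 and the fitness function of equation 3 can be sketched as follows. This is a minimal illustration under stated assumptions: each segment is represented as a set of pixel indices, the per-segment goodness values are given, every segment is non-empty, and an empty selection is scored as zero (the text does not say how that case is handled). The name `fitness` and the toy data are hypothetical.

```python
def fitness(bits, segments, goodness, n_pixels):
    """Fitness of one individual: averaged goodness + coverage + non-overlap.

    bits:     list of 0/1, one bit per segment (1 = selected)
    segments: list of sets of pixel indices, one set per segment (assumed non-empty)
    goodness: list of per-segment goodness values
    n_pixels: number of pixels in the image
    """
    selected = [k for k, b in enumerate(bits) if b == 1]
    if not selected:
        return 0.0  # assumption: an empty selection gets the lowest score
    unique_pixels = set().union(*(segments[k] for k in selected))  # |set(i)|
    all_pixels = sum(len(segments[k]) for k in selected)           # |list(i)|
    o_ag = sum(goodness[k] for k in selected) / len(selected)      # g(i) / one(i)
    o_ca = len(unique_pixels) / n_pixels
    o_na = len(unique_pixels) / all_pixels
    return o_ag + o_ca + o_na


# Toy image with 6 pixels and a bag of 3 segments; segment 1 overlaps both others.
segments = [{0, 1, 2}, {2, 3}, {3, 4, 5}]
goodness = [0.8, 0.6, 0.9]
without_overlap = fitness([1, 0, 1], segments, goodness, n_pixels=6)
with_overlap = fitness([1, 1, 1], segments, goodness, n_pixels=6)
print(without_overlap, with_overlap)
```

In this toy case, dropping the overlapping, lower-goodness segment raises the fitness, which is the behaviour the three criteria are meant to encourage.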
Here, $S(\cdot)$ is the sigmoid function, $S(t) = \frac{1}{1 + e^{-t}}$, and $\mathrm{rand}()$ is a random function which produces a decimal value in the range 0 to 1.

Changing the domain of PSO from a continuous to a discrete search space has several drawbacks which affect its performance. Some of them are the different definition of velocity and the memoryless state changes [17]. To overcome those problems, there is an alternative version of BPSO based on proportion probability, proposed by Chen et al. [17].

This version of BPSO reformulates equations 4 and 5 using an elimination process that substitutes the velocity and learning-rate parameters with similarity-probability parameters. After the elimination process, the functions for generating the new state $x^{t+1}$ based on the current state $x^t$ of the particles are defined as:

$$p'\{x_i^{t+1} = 0\} = \mathrm{rand}(0,1)(1 - x_i^t) + c_0\,\mathrm{rand}(0,1)\,x_i^{t-1} + c_1\,\mathrm{rand}(0,1)(1 - x_i^{*(t)}) + c_2\,\mathrm{rand}(0,1)(1 - g_i^{*(t)}) \tag{6}$$

$$p'\{x_i^{t+1} = 1\} = \mathrm{rand}(0,1)\,x_i^t + c_0\,\mathrm{rand}(0,1)(1 - x_i^{t-1}) + c_1\,\mathrm{rand}(0,1)\,x_i^{*(t)} + c_2\,\mathrm{rand}(0,1)\,g_i^{*(t)} \tag{7}$$

where $c_0$ is the dissimilarity rate of the current state with the previous state $x^{t-1}$, while $c_1$ and $c_2$ are the similarity rates of the current state with the local best $x_i^{*(t)}$ and the global best $g_i^{*(t)}$, respectively. After the normalization process, those functions become:

$$P\{x_i^{t+1} = 0\} = \frac{p'\{x_i^{t+1} = 0\}}{p'\{x_i^{t+1} = 0\} + p'\{x_i^{t+1} = 1\}} \tag{8}$$

$$P\{x_i^{t+1} = 1\} = \frac{p'\{x_i^{t+1} = 1\}}{p'\{x_i^{t+1} = 0\} + p'\{x_i^{t+1} = 1\}} \tag{9}$$

VI. EXPERIMENT RESULTS
A. Setups
We use the MSRC-21 dataset, which has 591 images with 21 object classes [1]. All images are utilized for the experiments on estimating segment goodness. For evaluating our proposed segment selection for semantic segmentation, only images that have accurate ground-truth annotations are used, i.e. 93 images from [18]. We utilize a standard PC with an Intel i7 3.4 GHz processor and 16 GB RAM that runs Linux Ubuntu 14.04.
To build a bag of segments from an image, we perform multiple segmentations, employing scikit-image [19]. Concretely, on each image, we run several segmentation methods and parameters as follows.
SLIC [20] (a clustering method based on modified K-means), with the number of segments in {3,4,6,...,30},
Quickshift [21] (an approximation of kernelized mean-shift), with the max distance in {10,15,20,...,80},
Felzenszwalb's method [22] (a graph-based method), with the scale in {100,226,352,...,2500}.
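The multiple-segmentation step listed above could be sketched with scikit-image roughly as follows. The calls `slic`, `quickshift` and `felzenszwalb` from `skimage.segmentation` and their `n_segments`, `max_dist` and `scale` parameters exist in scikit-image; everything else is an assumption for illustration: the helper names, the use of default values for all other parameters, the RGB input image, and the representation of each segment as a boolean mask. Only the quickshift grid is spelled out, since its values are unambiguous in the text; the other grids are left to the caller.

```python
def quickshift_distances(start=10, stop=80, step=5):
    """Max-distance grid {10, 15, 20, ..., 80} quoted in the setup."""
    return list(range(start, stop + 1, step))


def bag_of_segments(image, slic_counts, quickshift_dists, felz_scales):
    """Run three segmenters over parameter grids and pool all segments.

    image: RGB image array (assumed). Returns a list of boolean masks,
    one per segment, collected over every segmentation run.
    """
    # Imported here so the grid helper above stays usable without scikit-image.
    import numpy as np
    from skimage.segmentation import felzenszwalb, quickshift, slic

    label_maps = []
    label_maps += [slic(image, n_segments=n) for n in slic_counts]
    label_maps += [quickshift(image, max_dist=d) for d in quickshift_dists]
    label_maps += [felzenszwalb(image, scale=s) for s in felz_scales]

    bag = []
    for labels in label_maps:
        # Each distinct label in a segmentation becomes one segment mask.
        bag.extend(labels == k for k in np.unique(labels))
    return bag


dists = quickshift_distances()
print(len(dists), "quickshift settings")
```

A call such as `bag_of_segments(image, [5, 10], dists[:2], [100, 500])` would then pool the segments of six segmentation runs; those particular SLIC and Felzenszwalb values are placeholders, not the grids used in the experiments.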
For estimating segment goodness via LDA, we utilize GibbsLDA++ [23] and the extractor of Affine-Region plus SIFT Descriptor [24]. Furthermore, we set the LDA hyperparameters to α = 0.5 and β = 0.5 as in [10], and use 2000 visual words, chosen via the Elbow method.

Fig. 5(a): With respect to topic 6 (interpreted as Flower).
In the GA-based segment selection, we fix the number of individuals in a population to 80 per generation, with a maximum of 100 generations. We use one-point crossover with a rate of 90% and inversion mutation with a rate of 2%. On the other hand, for PSO-based segment selection, we obtain the optimal hyperparameter values as follows: $c_0 = 0.5$, $c_1 = 0.5$, $c_2 = 1.0$, 5 particles, and 100 iterations. To minimize the bias of random numbers, we repeat the evolutionary segment selection several times. We benefit from the evolutionary-algorithm implementations in the DEAP library [25] and PyEvolve [26].
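To make the GA settings above concrete, here is a minimal pure-Python sketch of a GA over binary strings with a population of 80, 100 generations, one-point crossover at a 90% rate and a 2% mutation rate. It is not the DEAP or PyEvolve code used in the experiments. Several choices are assumptions: "inversion mutation" is read as inverting (flipping) individual bits, tournament selection of size 3 is used because the selection scheme is not stated, and the toy fitness (the number of selected bits) merely stands in for the segment-selection fitness.

```python
import random


def one_point_crossover(a, b, rng):
    """Swap the tails of two equal-length bit lists at one random cut point."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def flip_mutation(bits, rate, rng):
    """Invert each bit independently with probability `rate` (assumed reading)."""
    return [1 - b if rng.random() < rate else b for b in bits]


def tournament(population, scores, rng, k=3):
    """Return the fittest of k randomly drawn individuals (assumed scheme)."""
    picks = rng.sample(range(len(population)), k)
    return population[max(picks, key=lambda idx: scores[idx])]


def run_ga(fitness, n_bits, pop_size=80, generations=100,
           cx_rate=0.9, mut_rate=0.02, seed=0):
    """Evolve binary strings and return the best individual seen."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        scores = [fitness(ind) for ind in population]
        children = []
        while len(children) < pop_size:
            a = tournament(population, scores, rng)
            b = tournament(population, scores, rng)
            if rng.random() < cx_rate:
                a, b = one_point_crossover(a, b, rng)
            children.append(flip_mutation(a, mut_rate, rng))
            children.append(flip_mutation(b, mut_rate, rng))
        population = children[:pop_size]
        best = max(population + [best], key=fitness)  # keep the best ever seen
    return best


# Toy usage: maximise the number of 1-bits in a 20-bit string.
best = run_ga(fitness=sum, n_bits=20, generations=30)
print(best, sum(best))
```

Passing a segment-selection fitness in place of `sum` would turn this into a selector over a bag of `n_bits` segments; keeping the best individual across generations is a design choice of this sketch, not something the text prescribes.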
Meanwhile, for CRF-based semantic segmentation, we employ the RobustPn implementation of Ladicky et al. [27], which relies on alpha-beta expansion for inference. The CRF parameters are the same as in [27]. Besides, unary potentials are generated using the TextonBoost generator of [18]. We follow the evaluation metrics of the VOC competition [28], namely global accuracy (GAcc) and class accuracy (CAcc). They are formulated as follows.

$$GAcc = \frac{n_{tp}}{n_p} \tag{10}$$

where $n_p$ denotes the number of pixels, while $n_{tp}$, $n_{fn}$ and $n_{fp}$ denote the number of true positives, false negatives and false positives, respectively.

B. Results
In addition to semantic segmentation performance, we also present results on estimating segment goodness.

1) Estimating segment goodness: Fig. 5 shows the goodness level of segments from multiple segmentations on two images. As can be seen in subfigure 5(a), most segments that lie on the Flower area have higher levels of goodness (brighter color). The same goes for most segments in the Cow area, as in subfigure 5(b).

In principle, the segment goodness can be well estimated using the LDA approach. We observe consistent findings that good segments are those with a high probability on some topic. Additionally, the probability distributions of visual words on an object class are distinguishable among all 21 object classes.

Fig. 5(b): With respect to topic 12 (interpreted as Cow).
Fig. 5: Goodness level of segments from 15 segmentation processes. Yellow lines mark segment borders. Brighter areas indicate higher goodness levels. The top-left figure is the original image.

2) Semantic segmentation: Table I shows the quantitative comparison among several CRF-based semantic segmentation methods. In comparison with RobustPn with all segments (no segment selection), the evolutionary segment selection speeds up semantic segmentation up to a factor of six: 56.07 s and 76.27 s for segments selected with GA and PSO, respectively, compared with 443.01 s. The main reason for this is the reduction in the number of segments that are involved in the higher-order CRFs. Recall that those selected segments become the higher-order cliques of the CRFs.

TABLE I: The quantitative comparison of semantic segmentation performance among several methods. LEGEND: GAcc and CAcc denote the global accuracy and the class accuracy, respectively; while Tsel, Tann, Ttot denote the selection time, annotation (labeling) time, and total time, respectively.

Selection Method           GAcc (%)  CAcc (%)  Tsel (s)  Tann (s)  Ttot (s)
Unary CRF                  79.846    58.612    0         1.14      1.14
Pairwise CRF               80.076    59.032    0         3.52      3.52
RobustPn (No Selection)    82.490    62.883    0         443.01    443.01
RobustPn (50-best)         81.513    60.922    0.00      18.31     18.31
RobustPn (100-best)        81.761    61.627    0.00      40.29     40.29
RobustPn (150-best)        81.887    61.511    0.00      115.45    115.45
RobustPn (LDA+GA)          82.130    61.790    34.87     21.20     56.07
RobustPn (LDA+BPSO)        82.412    62.744    25.38     50.89     76.27

We found that, on average, there are 300 segments per image. Interestingly, our proposed evolutionary segment selection selects approximately one third of those. The GA gives fewer segments than the PSO, but at the cost of a longer selection time. The smaller number of GA-selected segments results in a shorter annotation time, compared with that using PSO-selected segments.

In terms of GAcc and CAcc (equation 10), however, RobustPn with evolutionary segment selection yields slightly worse performance. Particularly, the best performance is obtained from RobustPn with all segments, whose GAcc is 82.490%, while the GAcc values of RobustPn with GA-selected and PSO-selected segments are 82.130% and 82.412%, respectively. We believe that this is a reasonable trade-off between speed and performance, where we gain up to 6 times speed-up at the price of slightly reduced semantic segmentation quality. Fig. 6 depicts several qualitative results of semantic segmentation.

Fig. 6: Qualitative results of semantic segmentation using several methods, namely (from left to right): Pairwise CRFs (the 2nd column), RobustPn with all segments, RobustPn with GA-selected segments, and RobustPn with PSO-selected segments (the right-most column).

VII. CONCLUSIONS
We aim to selectively choose segments for higher-order Conditional Random Fields in semantic image segmentation. To this end, we formulate the selection as an optimization problem. We propose three optimization criteria in relation to the selected segments, namely: a) averaged goodness, b) coverage area and c) non-overlapped area. We desire to have the best segments with maximum coverage area and maximum non-overlapped area. The goodness of segments is estimated via Latent Dirichlet Allocation (LDA). We investigate the performance of two evolutionary optimization methods for segment selection, namely the Genetic Algorithm and Particle Swarm Optimization. Experiment results show that semantic segmentation with GA- or PSO-selected segments yields competitive semantic segmentation accuracy in comparison to that of naively using all segments. In particular, we achieve semantic segmentation accuracies of 82.130%, 82.412% and 82.490% using GA-selected, PSO-selected and all segments, respectively. Moreover, the smaller number of segments used in semantic segmentation (approximately one third of all segments) speeds up its computation time by up to six times. The results also indicate that the goodness of segments can be well estimated via LDA.
REFERENCES
[1] J. Shotton, J. Winn, C. Rother, and A. Criminisi, "Textonboost for image understanding: Multi-class object recognition and segmentation by jointly modeling texture, layout, and context," Int. J. Comput. Vision, vol. 81, no. 1, pp. 2-23, Jan. 2009. [Online]. Available: http://dx.doi.org/10.1007/s11263-007-0109-1
[2] P. Kohli, L. Ladicky, and P. H. Torr, "Robust higher order potentials for enforcing label consistency," Int. J. Comput. Vision, vol. 82, no. 3, pp. 302-324, May 2009.
[3] P. Krähenbühl and V. Koltun, "Efficient inference in fully connected crfs with gaussian edge potentials," in Advances in Neural Information Processing Systems 24, J. Shawe-Taylor, R. Zemel, P. Bartlett, F. Pereira, and K. Weinberger, Eds. Curran Associates, Inc., 2011, pp. 109-117. [Online]. Available: http://papers.nips.cc/paper/4296-efficient-inference-in-fully-connected-crfs-with-gaussian-edge-potentials.pdf
[4] S. Gould, "Probabilistic models for region-based scene understanding," Ph.D. dissertation, Stanford University, June 2010.
[5] C. Pantofaru, C. Schmid, and M. Hebert, "Object recognition by integrating multiple image segmentations," in Proceedings of the 10th European Conference on Computer Vision: Part III, ser. ECCV '08. Berlin, Heidelberg: Springer-Verlag, 2008, pp. 481-494.
[6] M. P. Kumar and D. Koller, "Efficiently selecting regions for scene understanding," in Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on. IEEE, 2010, pp. 3217-3224.
[7] P. Kohli, L. Ladicky, and P. H. S. Torr, "Robust higher order potentials for enforcing label consistency," in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, June 2008, pp. 1-8.
[8] S. Z. Li, Markov Random Field Modeling in Image Analysis, 3rd ed. Springer Publishing Company, Incorporated, 2009.
[9] D. M. Blei, A. Y. Ng, and M. I. Jordan, "Latent dirichlet allocation," J. Mach. Learn. Res., vol. 3, pp. 993-1022, Mar. 2003. [Online]. Available: http://dl.acm.org/citation.cfm?id=944919.944937
[10] B. Russell, W. Freeman, A. Efros, J. Sivic, and A. Zisserman, "Using multiple segmentations to discover objects and their extent in image collections," in Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, vol. 2, 2006, pp. 1605-1614.
[11] K. Mikolajczyk and C. Schmid, "Scale & affine invariant interest point detectors," Int. J. Comput. Vision, vol. 60, no. 1, pp. 63-86, Oct. 2004. [Online]. Available: http://dx.doi.org/10.1023/B:VISI.0000027790.02288.f2
[12] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," Int. J. Comput. Vision, vol. 60, no. 2, pp. 91-110, Nov. 2004. [Online]. Available: http://dx.doi.org/10.1023/B:VISI.0000029664.99615.94
[13] T. Griffiths, "Gibbs sampling in the generative model of latent dirichlet allocation," Tech. Rep., 2002.
[14] Y. Xin-She, Nature-Inspired Optimization Algorithms, 1st ed., ser. Elsevier Insights. Elsevier, 2014.
[15] J. Kennedy and R. C. Eberhart, "Particle swarm optimization," in Proceedings of the 1995 IEEE International Conference on Neural Networks, vol. 4, Perth, Australia, IEEE Service Center, Piscataway, NJ, 1995, pp. 1942-1948.
[16] J. Kennedy and R. Eberhart, "A discrete binary version of the particle swarm algorithm," in Systems, Man, and Cybernetics, 1997. Computational Cybernetics and Simulation., 1997 IEEE International Conference on, vol. 5, Oct 1997, pp. 4104-4108 vol.5.
[17] E. Chen, Z. Pan, Y. Sun, and X. Liu, "A binary particle swarm optimization based on proportion probability," in Business Intelligence and Financial Engineering (BIFE), 2010 Third International Conference on, Aug 2010, pp. 15-19.
[18] P. Krähenbühl and V. Koltun, "Parameter learning and convergent inference for dense random fields," in Proceedings of the 30th International Conference on Machine Learning, ICML 2013, Atlanta, GA, USA, 16-21 June 2013, 2013, pp. 513-521.
[19] S. van der Walt, J. L. Schönberger, J. Nunez-Iglesias, F. Boulogne, J. D. Warner, N. Yager, E. Gouillart, T. Yu, and the scikit-image contributors, "scikit-image: image processing in Python," PeerJ, vol. 2, p. e453, 6 2014.
[20] R. Achanta, S. Hemami, F. Estrada, and S. Susstrunk, "Frequency-tuned salient region detection," in Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on, June 2009, pp. 1597-1604.
[21] A. Vedaldi and S. Soatto, "Quick shift and kernel methods for mode seeking," in In European Conference on Computer Vision, volume IV, 2008, pp. 705-718.
[22] P. F. Felzenszwalb and D. P. Huttenlocher, "Efficient graph-based image segmentation," Int. J. Comput. Vision, vol. 59, no. 2, pp. 167-181, Sep. 2004.
[23] X.-H. Phan and C.-T. Nguyen, "Gibbslda++: A c/c++ implementation of latent dirichlet allocation (lda)," Tech. Rep., 2007. [Online]. Available: http://gibbslda.sourceforge.net/
[24] (2007) Affine covariant features. [Online]. Available: http://www.robots.ox.ac.uk/~vgg/research/affine/
[25] F.-A. Fortin, F.-M. De Rainville, M.-A. Gardner, M. Parizeau, and C. Gagne, "DEAP: Evolutionary algorithms made easy," Journal of Machine Learning Research, vol. 13, pp. 2171-2175, jul 2012.
[26] C. S. Perone, "Pyevolve: A python open-source framework for genetic algorithms," SIGEVOlution, vol. 4, no. 1, pp. 12-20, Nov. 2009. [Online]. Available: http://doi.acm.org/10.1145/1656395.1656397
[27] P. Kohli, L. Ladicky, and P. H. S. Torr, "Robust higher order potentials for enforcing label consistency," in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on, June 2008, pp. 1-8.
[28] M. Everingham, L. Van Gool, C. K. I. Williams, J. Winn, and A. Zisserman, "The PASCAL Visual Object Classes Challenge 2012 (VOC2012) Results," http://www.pascal-network.org/challenges/VOC/voc2012/workshop/index.html.

ACKNOWLEDGMENTS
This research was supported by Universitas Indonesia and Directorate General of Higher Education, under Grant Research Collaboration and Scientific Publication No: 0403/UN2.R12/HKP.05.00/2015.