Novian Habibie*1, Vektor Dewanto*2, Jogie Chandra*1, Fariz Ikhwantri1, Harry Budi Santoso1, Wisnu Jatmiko1
1Faculty of Computer Science, Universitas Indonesia; 2Department of Computer Science, Bogor Agricultural University
Email: novian.habibie@ui.ac.id, vektor.dewanto@gmail.com
*These authors contributed equally.

Abstract: One promising approach for pixel-wise semantic segmentation is based on higher-order Conditional Random Fields (CRFs). We aim to selectively choose segments for the higher-order CRFs in semantic segmentation. To this end, we formulate the selection as an optimization problem. We propose three optimization criteria in relation to the selected segments, namely: a) averaged goodness, b) coverage area and c) non-overlapped area. Essentially, we desire to have the best segments with maximum coverage area and maximum non-overlapped area. We apply two evolutionary optimization algorithms, namely the genetic algorithm (GA) and particle swarm optimization (PSO). The goodness of segments is estimated using the Latent Dirichlet Allocation approach. Experiment results show that semantic segmentation with GA-or-PSO-selected segments yields competitive semantic segmentation accuracy in comparison to that of naively using all segments. Moreover, the smaller number of segments used in semantic segmentation speeds up its computation by up to six times.

I. INTRODUCTION

Semantic segmentation aims to label every pixel in an image with a semantic object-class label from some predefined set. In contrast, (standard) segmentation produces segments without object-class labels. One promising approach for pixel-wise semantic segmentation is based on the Maximum A Posteriori (MAP) and higher-order Conditional Random Fields (CRFs) frameworks, as in [1], [2], [3]. In particular, higher-order CRFs for semantic segmentation rely on segments to compute the optimal pixel labeling. The more segments are used, the longer the computation time of semantic segmentation. A larger number of segments, however, does not necessarily correspond to higher semantic segmentation accuracy.

We observe that there is no perfect segmentation for all objects in an image. Some segmentations produce under-segmentation, i.e., a segment contains more than one object, while others yield over-segmentation, i.e., one object is spread over several segments. Additionally, different segment generators give different segments. This affects higher-order CRF-based semantic segmentation in terms of both its accuracy and its computation time. As a result, we are motivated to perform multiple segmentations on an image in order to obtain a large number of segments. We expect that some of them are near-perfect segments, which contain mostly one object-class label.

We aim to selectively choose segments from a bag of segments yielded by multiple segmentations. The selected segments are then used for higher-order CRFs in semantic image segmentation. To this end, we formulate the segment selection as an optimization problem. We propose three optimization criteria in relation to the selected segments, namely: a) averaged goodness, b) coverage area, and c) non-overlapped area. Essentially, we desire to have the best segments with maximum coverage area and maximum non-overlapped area. The goodness of segments is estimated using the Latent Dirichlet Allocation approach; see Section IV. We employ two evolutionary optimization algorithms to perform segment selection, namely the genetic algorithm (GA) and particle swarm optimization (PSO). Fig. 1 depicts the pipeline of CRF-based semantic segmentation with our proposed segment selection.

Experiment results show that semantic segmentation with GA-or-PSO-selected segments yields competitive semantic segmentation accuracy in comparison to that of naively using all segments. In particular, we achieve semantic segmentation accuracies of 82.130%, 82.412% and 82.490% using GA-selected, PSO-selected and all segments, respectively. Moreover, the smaller number of segments used in semantic segmentation (approximately one third of all segments) speeds up its computation by up to six times. The results also show that the goodness of segments can be well estimated via the Latent Dirichlet Allocation approach.

II. RELATED WORKS

Gould [4] uses segments as nodes in Markov Random Field models. The segments used in that method come from a non-overlapping segmentation. Pantofaru et al. [5] also use multiple segmentations for semantic segmentation. They found that segment size involves a trade-off in locating an object's border. Bigger segments contain more features and information about an object, but make its border harder to determine. Conversely, small segments can define an object's border easily but contain only a small amount of information. To overcome this, Pantofaru et al. use the Intersection of Regions (IoR) method to obtain the border of an object.

Kumar and Koller [6] also use segments for semantic segmentation. They report that the results of unsupervised segment generators are not very good. They propose a method for generating segments from pixels by means of an energy function, in which segment generation is formulated as an integer program. Moreover, this method lets the generated segments adapt to the energy formulation of the labeling process.

Another approach that uses segments for semantic segmentation is that of Kohli et al. [7]. They use segments [...]
Fig. 1: The pipeline of CRF-based semantic segmentation with our proposed evolutionary segment selection.
Furthermore,

$$E(y) = \underbrace{\sum_{i \in V} \psi_i(y_i)}_{\text{Unary Term}} + \underbrace{\sum_{i \in V,\, j \in N_i} \psi_{i,j}(y_i, y_j)}_{\text{Pairwise Term}} + \underbrace{\sum_{c \in C} \psi_c(y_c)}_{\text{Higher-order Term}} \tag{2}$$

where $V$ is a set of random variables associated with pixel labels, $N_i$ returns the set of neighbour pixels of $i$, and $C$ is a set of higher-order cliques, i.e. cliques whose size is more than two. A clique is essentially a collection of random variables. We remark that this work aims to selectively choose segments to become member cliques of $C$. For the derivation of equations 1 and 2, we refer the reader to [8]. Fig. 2 illustrates the graphical model of the aforementioned CRFs, in which there are two higher-order cliques (segments).

IV. ESTIMATING SEGMENT GOODNESS

We hypothesize that a good segment should contain one single object-class with high confidence. Let a segment $s_1$ contain 50%, 40%, and 10% of dog, sofa and unknown pixels, respectively, while another segment $s_2$ contains 100% dog pixels. Then it can be inferred that segment $s_2$ is "better" than $s_1$.

To compute the probability that a segment contains some pixel labels, we draw ideas from the Natural Language Processing (NLP) domain. In NLP, Latent Dirichlet Allocation (LDA) [9] is used to discover hidden topics in a document. For instance, the sentence "I like to eat bananas and carrots" contains 100% topic 1, while "Bunnies and kittens are cute" contains 100% topic 2. On the other hand, "This hamster munching on a piece of carrot" contains 50% topic 1 and 50% topic 2.

We use LDA to measure such segment goodness via the following analogies.
Hidden Topics ↔ Object-Classes
Documents ↔ Segments
Words ↔ (Affine Regions + SIFT Descriptor)
Hence, a "good" segment has a high probability for a hidden topic. In other words, the goodness level is proportional to the maximum probability of a hidden topic given a document.
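The goodness measure described above, the maximum probability of a hidden topic given a document, can be sketched in a few lines. This is a minimal illustration only: the function name `segment_goodness` is hypothetical, and the per-segment topic distribution is assumed to be already available (LDA inference itself is not shown). The two example distributions simply mirror the dog/sofa/unknown example for $s_1$ and $s_2$.

```python
import math


def segment_goodness(topic_probs):
    """Return max_t p(t | s) for one segment's topic distribution.

    topic_probs is assumed to be the LDA topic distribution of a single
    segment (document); how it is inferred is outside this sketch.
    """
    probs = list(topic_probs)
    if not probs:
        raise ValueError("empty topic distribution")
    if any(p < 0.0 for p in probs) or not math.isclose(sum(probs), 1.0, abs_tol=1e-9):
        raise ValueError("topic_probs must be a probability distribution")
    return max(probs)


# Hypothetical distributions mirroring the example in the text:
# s1 is 50% dog, 40% sofa, 10% unknown; s2 is 100% dog.
goodness_s1 = segment_goodness([0.5, 0.4, 0.1])
goodness_s2 = segment_goodness([1.0, 0.0, 0.0])
```

Under this measure $s_2$ scores higher than $s_1$, which matches the intuition that a segment dominated by a single object class is the better one.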
[...] individual. One bit of a string represents a segment. If a segment is selected, then the corresponding bit has a value of 1; otherwise, it is set to 0. The number of bits in a string is equal to the number of segments obtained from multiple segmentations of an image. Figure 4 (left part) illustrates the encoding of this representation.

B. The fitness function

To obtain optimal segments for higher-order CRFs, we propose three optimization criteria, which are related to the selected segments, namely: a) averaged goodness $O_{ag}$, b) coverage area $O_{ca}$, and c) non-overlapped area $O_{na}$. In essence, we [...]

[...] value converges or it reaches its iteration limit.

The PSO algorithm requires two main functions, as follows.

$$x_i^{t+1} = x_i^{t} + v_i^{t+1} \tag{4}$$

$$v_i^{t+1} = \theta v_i^{t} + \alpha\,\mathrm{rand}(0,1)\,(x_i^{*(t)} - x_i^{t}) + \beta\,\mathrm{rand}(0,1)\,(g^{*(t)} - x_i^{t}) \tag{5}$$

Equation 4 updates the particles' new positions in the search space, $x_i^{t+1}$, using the particles' current positions $x_i^{t}$ and velocities $v_i^{t+1}$. Meanwhile, equation 5 updates a particle's new velocity $v_i^{t+1}$ based on its previous velocity $v_i^{t}$ and best position (local best $x_i^{*(t)}$ and [...]
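The two PSO update rules (equations 4 and 5) can be sketched as a single step function. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `pso_step`, the default coefficient values, and the choice of drawing one random scalar per term per particle are all assumptions, and positions are treated as plain real-valued vectors. How particle positions relate to the bit-string selection of segments is not covered by this excerpt and is not modelled here.

```python
import random


def pso_step(positions, velocities, local_best, global_best,
             theta=0.7, alpha=1.5, beta=1.5, rng=random):
    """Apply one PSO update to every particle.

    positions, velocities, local_best -- lists of equal-length float lists,
                                         one entry per particle
    global_best                       -- float list shared by all particles
    theta, alpha, beta                -- coefficients of equation 5
                                         (default values are assumptions)
    """
    new_positions, new_velocities = [], []
    for x, v, x_best in zip(positions, velocities, local_best):
        r1 = rng.random()  # rand(0, 1) for the local-best term
        r2 = rng.random()  # rand(0, 1) for the global-best term
        # Equation 5: velocity update.
        v_new = [theta * vd + alpha * r1 * (bd - xd) + beta * r2 * (gd - xd)
                 for xd, vd, bd, gd in zip(x, v, x_best, global_best)]
        # Equation 4: position update using the new velocity.
        x_new = [xd + vd for xd, vd in zip(x, v_new)]
        new_positions.append(x_new)
        new_velocities.append(v_new)
    return new_positions, new_velocities


# A particle that already sits on both best positions with zero velocity
# has nothing pulling it anywhere, so it stays where it is.
demo_x, demo_v = pso_step([[1.0, 2.0]], [[0.0, 0.0]], [[1.0, 2.0]], [1.0, 2.0])
```

In a full optimizer this step would be repeated, with the local and global bests refreshed from the fitness values after each iteration, until the fitness converges or the iteration limit is reached.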
Fig. 4: The pipeline of our proposed evolutionary segment selection for higher-order CRFs in semantic segmentation. (Figure labels: testing data, extract segments, visual words, inference of the distribution of topics over documents using Gibbs sampling, population of m individuals.)
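The bit-string encoding and the three selection criteria can be made concrete with a small sketch. The exact definitions of the criteria and the way they are combined into a fitness value are not given in this excerpt, so everything below is an assumption: the function name `selection_criteria` is hypothetical, segments are modelled as sets of pixel indices, coverage is taken as the fraction of image pixels covered by at least one selected segment, and non-overlapped area as the fraction covered by exactly one selected segment.

```python
from collections import Counter


def selection_criteria(bits, segments, goodness, num_pixels):
    """Score one bit-string individual (all definitions are assumptions).

    bits       -- list of 0/1 flags, one per candidate segment
    segments   -- list of sets of pixel indices, one per candidate segment
    goodness   -- list of per-segment goodness values
    num_pixels -- total number of pixels in the image
    Returns (averaged goodness, coverage fraction, non-overlapped fraction).
    """
    if not (len(bits) == len(segments) == len(goodness)):
        raise ValueError("bits, segments and goodness must have equal length")
    if num_pixels <= 0:
        raise ValueError("num_pixels must be positive")
    chosen = [i for i, bit in enumerate(bits) if bit == 1]
    if not chosen:
        return 0.0, 0.0, 0.0
    avg_goodness = sum(goodness[i] for i in chosen) / len(chosen)
    # Count how many selected segments cover each pixel.
    cover_count = Counter(p for i in chosen for p in segments[i])
    coverage = len(cover_count) / num_pixels
    non_overlapped = sum(1 for c in cover_count.values() if c == 1) / num_pixels
    return avg_goodness, coverage, non_overlapped


# Toy 8-pixel image with three candidate segments (hypothetical data).
toy_segments = [{0, 1, 2, 3}, {2, 3, 4, 5}, {6, 7}]
toy_goodness = [0.9, 0.5, 0.7]
disjoint = selection_criteria([1, 0, 1], toy_segments, toy_goodness, 8)
overlapping = selection_criteria([1, 1, 0], toy_segments, toy_goodness, 8)
```

Both toy selections cover six of the eight pixels, but the second one selects two segments that share pixels 2 and 3, so its non-overlapped fraction is lower. A fitness function would then combine the three values, for example as a weighted sum, although the combination actually used is not stated here.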
Fig. 6: Qualitative results of semantic segmentation using several methods, namely (from left to right): Pairwise CRFs (the 2nd column),
RobustPn with all segments, RobustPn with GA-selected segments, and RobustPn with PSO-selected segments (the right-most column).
[...] because of the reduction in the number of segments that are involved in the higher-order CRFs. Recall that the selected segments become the higher-order cliques of the CRFs.

We found that, on average, there are 300 segments per image. Interestingly, our proposed evolutionary segment selection selects approximately one third of those. The GA gives fewer segments than the PSO, but at the cost of longer selection time. The smaller number of GA-selected segments results in shorter annotation time, compared with that using PSO-selected segments.

In terms of GAcc and CAcc (equation 10), however, RobustPn with evolutionary segment selection yields slightly worse performance. In particular, the best performance is obtained from RobustPn with all segments, whose GAcc is 82.490%, while the GAccs of RobustPn with GA-selected and PSO-selected segments are 82.130% and 82.412%, respectively. We believe that this is a reasonable trade-off between speed and performance, where we gain up to a 6 times speed-up at the price of slightly reduced semantic segmentation quality. Fig. 6 depicts several qualitative results of semantic segmentation.

VII. CONCLUSIONS

We aim to selectively choose segments for higher-order Conditional Random Fields in semantic image segmentation. To this end, we formulate the selection as an optimization problem. We propose three optimization criteria in relation to the selected segments, namely: a) averaged goodness, b) coverage area and c) non-overlapped area. We desire to have the best segments with maximum coverage area and maximum non-overlapped area. The goodness of segments is estimated via Latent Dirichlet Allocation (LDA). We investigate the performance of two evolutionary optimization methods for segment selection, namely the Genetic Algorithm and Particle Swarm Optimization. Experiment results show that semantic segmentation with GA-or-PSO-selected segments yields competitive semantic segmentation accuracy in comparison to that of naively using all segments. In particular, we achieve semantic segmentation accuracies of 82.130%, 82.412% and 82.490% using GA-selected, PSO-selected and all segments, respectively. Moreover, the smaller number of segments used in semantic segmentation (approximately one third of all segments) speeds up its computation by up to six times. The results also show that the goodness of segments can be well estimated via LDA.
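The speed and accuracy trade-off reported above can be made explicit with a little arithmetic. The sketch below uses only the figures stated in the text (GAcc values, about 300 segments per image, roughly one third selected); the variable names are illustrative.

```python
# Global accuracy (GAcc, in percent) as reported in the text.
gacc = {"all": 82.490, "GA": 82.130, "PSO": 82.412}

# Accuracy lost, in percentage points, relative to using all segments.
gacc_drop = {name: round(gacc["all"] - value, 3)
             for name, value in gacc.items() if name != "all"}

# Roughly how many segments remain after selection, given the reported
# average of 300 segments per image and a selection of about one third.
segments_per_image = 300
approx_selected = round(segments_per_image / 3)
```

The drop is 0.360 percentage points for GA-selected segments and 0.078 for PSO-selected segments, while the number of higher-order cliques falls from about 300 to about 100 per image.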