
Dominant Color Region Based Indexing for CBIR

K.C.Ravishankar
Department of Computer Science & Engg.
Malnad College of Engg., Hassan, INDIA
kcr@mcehas.kar.nic.in

B.G.Prasad, S.K.Gupta, K.K.Biswas
Department of Computer Science & Engg.
Indian Institute of Technology, New Delhi, INDIA
{bgprasad,skg,kkbg}@cse.iitd.ernet.in

Abstract
Our world is dominated by visual information and a
tremendous amount of such information is being added day
by day. It would be impossible to cope with this explosion
of visual data, unless they are organized such that we can
retrieve them efficiently and effectively. The main problem
in organizing and managing such visual data is indexing,
the assignment of a synthetic descriptor which facilitates its
retrieval. It involves extracting relevant entities or characteristics from images as index keys. Then a representation
is chosen for the keys and specific meaning is assigned to
it. Color is an important cue for Content Based Image Retrieval (CBIR) systems. We propose a technique to index
and store images based on dominant color regions. Features like region size and location of the region are extracted
and used as similarity measures. Images with similar indices are stored as an image cluster in a hash table. A
prototype of the retrieval system has been developed in the
Java language.

1. Introduction
Our world is dominated by visual information and a
tremendous amount of such information is being added day
by day. It would be impossible to cope with this explosion
of visual data, unless they are organized such that we can
retrieve them efficiently and effectively.
The main problem in organizing and managing such visual data is indexing, the assignment of a synthetic descriptor which facilitates its retrieval. It involves extracting relevant entities or characteristics from images as index keys.
Then a representation is chosen for the keys and specific
meaning is assigned to it.
Visual database systems require efficient indexing to facilitate fast access to the images in the database. Recent
Content-Based Image Retrieval (CBIR) techniques cited in
the literature [11] [8] [5] [12] [15] [1] are based on features
such as color, texture, shape, spatial relationships, object
motion, etc. As the number of digital images grows, there
is a need for automatic image retrieval based on their content.
Current CBIR systems such as IBM's QBIC [9] [11] [15]
allow automatic retrieval based on simple characteristics
and the distribution of color, shape and texture. But they do not
consider structural and spatial relationships and fail to capture meaningful contents of the image in general. Also, the
object identification is semi-automatic. The Chabot project
[14] integrates a relational database with retrieval by color
analysis. Textual meta-data along with color histograms
form the main features used. VisualSEEk [6] allows query
by color and spatial layout of color regions. Text-based tools
for annotating and searching images are provided. A new
image representation which uses the concept of localized
coherent regions in color and texture space is presented by
Carson et al. [3]. Segmentation based on the above features,
called Blobworld, is used and queries are based on these features.
Ideally, a combination of these methods is required to
form an automated CBIR system. As discussed in [8] [5],
many of the CBIR systems proposed in the literature
have been shown to be effective in terms of recall and precision. However, the issue of speedy retrieval is not well addressed in
almost all of these proposed methods. To make them useful
and practical, there is a need for designing efficient access
methods.
Color is an important cue for image retrieval. Many
CBIR systems have been designed with color as the main
feature for retrieval [10] [2]. Though a global property, its
distribution is independent of view and resolution and does
not require knowledge of component objects of the image.
But, since color alone is not enough to characterize an image, we use the spatial locations of the extracted dominant
color regions in the image as the second feature in our proposed system [7]. A combined index based on the color
and spatial location of dominant regions segmented out of
the image is constructed and used as a retrieval key for the
querying process. Both example-based query (EBQ) and
feature-based query (FBQ) are supported.

The paper is organized as follows: Section 2 describes
the procedure used to segment out dominant color regions
in the image and find their boundaries. The proposed indexing method is presented in Section 3. In Section 4, we
describe the Java-based GUI for image querying. Experimental results with a set of sample images are shown in
Section 5.

2 Dominant region segmentation


Efficient database indexing involves extracting visual
features off-line and storing them as metadata in the image
database. The variety and dimension of visual features are
very high and this poses the greatest challenge in building
efficient index structures. The semantics of images is much
richer or complex and thus requires more levels of interpretation. We are interested in understanding certain objects
in the image and relationships between them. Since object
recognition itself is not easy to accomplish, it is difficult to
distinguish between image and object.
We need the ability to examine images along different attributes
to judge their relevance with respect to what the
user has in mind. Also, search engines now organize indexes based
on a general taxonomy. So issues such as the organization of data
and the techniques used to access it are quite
important for the construction of image databases and for retrieval.
The strong motivation for using color to perform selection comes from the fact that it provides region information
and that, when specified appropriately, it can be relatively
insensitive to variations in normal illumination conditions
and the appearance/viewpoint of objects. A dominant color
region in an image can be represented as a connected fragment
of homogeneous-color pixels as perceived by human vision [13].
Our technique to index images [7] is based on this concept of dominant color regions present in the image. The
segmented dominant regions, along with their features,
are used as an aid in the retrieval of similar images from the
image database.

2.1 Color space categorization


The entire RGB color space is described using a small set
of color categories. This is summarized into a color look-up
table as depicted in table 1. A smaller set is more useful
since it gives a coarser description of the color of a region,
thus allowing it to remain the same under some variations in
imaging conditions. We have taken a table of 25 colors chosen
from a 256-color palette.

Table 1. Color look-up table

Color            R    G    B      Color            R    G    B
Black            0    0    0      Plum           146  109    0
Sea Green        0  182    0      Teal           146  182  170
Light Green      0  255  170      Brown          182    0    0
Olive Green     36   73    0      Magenta        182   73  170
Aqua            36  146  170      Yellow Green   182  182    0
Bright Green    36  255    0      Flouro Green   182  255  170
Blue            73   36  170      Red            219   73    0
Green           73  146    0      Rose           219  146  170
Turquoise       73  219  170      Yellow         219  255    0
Dark Red       109   36    0      Pink           255   36  170
Blue Gray      109  109  170      Orange         255  146    0
Lime           109  219    0      White          255  255  255
Lavender       146    0  170
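For illustration, such a look-up table can be held as a simple Java constant. The sketch below is not taken from the original implementation; it merely transcribes a few representative rows of Table 1, and the class and field names are ours.

// Sketch only: a partial transcription of Table 1 as a Java constant.
// Entries beyond the first few are elided; the full system uses all 25 colors.
public final class ColorTable {
    // {R, G, B} triples taken from Table 1.
    public static final int[][] COLORS = {
        {  0,   0,   0},   // Black
        {  0, 182,   0},   // Sea Green
        {  0, 255, 170},   // Light Green
        { 36,  73,   0},   // Olive Green
        {219,  73,   0},   // Red
        {255, 255, 255},   // White
        // ... remaining entries of the 25-color table
    };

    public static final String[] NAMES = {
        "Black", "Sea Green", "Light Green", "Olive Green", "Red", "White"
        // ... remaining names
    };

    private ColorTable() { }
}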

2.2 Color matching and region selection


The method relies on the fact that boundaries where perceptual color changes occur must be found before any cluster in color space can be interpreted as corresponding to a
region in image space. The RGB color space is partitioned
into subspaces called color categories. The perceptual color
of a pixel can be specified by the color category into which
it maps. For details refer to [7].
The procedure below segments the image into regions
according to their perceived color. It involves mapping all
pixels to their categories in color space, and grouping pixels
belonging to the same category. The color nearest to the
image pixel color is selected from the 25 pre-defined colors
and stored as the new color of that pixel in the image.
With p the image pixel value and C the corresponding
color-table entry, the color distance Cd is calculated using
the Euclidean distance formula given below:

    C_d = \min_{i=1}^{25} \sqrt{(p_r - C_{iR})^2 + (p_g - C_{iG})^2 + (p_b - C_{iB})^2}
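As a rough sketch of how this minimum-distance assignment might look in Java (assuming the hypothetical ColorTable constant shown in Section 2.1; the method and class names are illustrative, not from the original system):

// Sketch: map one pixel to its nearest color-table entry using the
// Euclidean distance Cd defined above.
public final class ColorMatcher {

    /** Returns the 0-based index of the table color minimizing Cd. */
    public static int nearestColorIndex(int r, int g, int b) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int i = 0; i < ColorTable.COLORS.length; i++) {
            int[] c = ColorTable.COLORS[i];
            double dr = r - c[0], dg = g - c[1], db = b - c[2];
            double dist = Math.sqrt(dr * dr + dg * dg + db * db);
            if (dist < bestDist) {
                bestDist = dist;
                best = i;
            }
        }
        return best;
    }
}

In the system described here, this mapping is applied to every pixel before the color frequency table is built.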

Region marking is done on the updated image. A boundary
rectangle is drawn around each dominant region selected.
The area of the boundary rectangle is used in determining the
normalized area of the dominant region. Then the location of
the region is determined. The image path, the number of regions
found, and per-region information such as color, normalized
area and location are stored in a meta-file for further
processing. This file information is used for constructing the
image index tree. When the search engine is initiated, the
index tree is constructed. For a given query, the tree is
searched for similar images based on the color, area and
location of the regions present in the image.
Steps involved in segmentation and boundary detection:

1. Read the image and create an image array which contains the RGB components of each pixel in the image.

2. For each pixel in the image:
Search the color look-up table for the nearest color
by finding the distance between the pixel color (I)
and each color in the color look-up table (C) using
the distance formula

    D = \sqrt{(I_r - C_r)^2 + (I_g - C_g)^2 + (I_b - C_b)^2}

Assign the RGB components of the color look-up table
entry for which D is minimum to the pixel. Determine
the color response of each color in the modified image
and store the counts in a frequency table. Sort the
frequency table in descending order.

3. Determine the first occurrence of a pixel which has the
same RGB value as the current entry of the sorted frequency
table.

4. Assign the pixel location to the horizontal and vertical
seeds, viz. iseed and jseed.

5. Following iseed and jseed, mark the entire region using
the 8-connected region-growing method.

6. Obtain the (x,y) co-ordinates of the boundary of the
marked region.

7. Determine the normalized size r(R) of the bounding
rectangle using

    r(R) = \frac{|x_1 - x_2| \cdot |y_1 - y_2|}{\text{Image size}}

where x_1, x_2, y_1, y_2 are the x and y coordinates of the
bounding rectangle. Only if the normalized size r(R) > T
is the region considered a dominant region, where T is a
threshold for dominance.

8. Repeat steps 3 to 7 to find up to three dominant regions.
Illustrations of segmentation and boundary detection
are shown in figure 1 and figure 2.

Figure 1. Illustration of assign-color.

Figure 2. Image with boundaries marked.
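The sketch below illustrates steps 4 to 7 under some assumptions: the image has already been re-colored with color-table indices, an explicit stack replaces recursion for the 8-connected growth, and all class, method and parameter names are ours rather than the original code's.

import java.awt.Rectangle;
import java.util.ArrayDeque;
import java.util.Deque;

// Sketch of steps 4-7: grow an 8-connected region of one color from a
// seed pixel, record its bounding rectangle and test for dominance.
public final class RegionGrower {

    /** Returns the bounding rectangle of the region, or null if not dominant. */
    public static Rectangle growRegion(int[][] colorIdx, boolean[][] marked,
                                       int iseed, int jseed, double threshold) {
        int h = colorIdx.length, w = colorIdx[0].length;
        int target = colorIdx[iseed][jseed];
        int minX = jseed, maxX = jseed, minY = iseed, maxY = iseed;

        Deque<int[]> stack = new ArrayDeque<>();
        stack.push(new int[] { iseed, jseed });
        marked[iseed][jseed] = true;

        while (!stack.isEmpty()) {
            int[] p = stack.pop();
            int y = p[0], x = p[1];
            minX = Math.min(minX, x); maxX = Math.max(maxX, x);
            minY = Math.min(minY, y); maxY = Math.max(maxY, y);
            // Visit the 8-connected neighbours of (y, x).
            for (int dy = -1; dy <= 1; dy++) {
                for (int dx = -1; dx <= 1; dx++) {
                    int ny = y + dy, nx = x + dx;
                    if (ny < 0 || ny >= h || nx < 0 || nx >= w) continue;
                    if (!marked[ny][nx] && colorIdx[ny][nx] == target) {
                        marked[ny][nx] = true;
                        stack.push(new int[] { ny, nx });
                    }
                }
            }
        }
        // Step 7: normalized size of the bounding rectangle vs. image size.
        double area = (double) (maxX - minX + 1) * (maxY - minY + 1);
        double normalized = area / ((double) w * h);
        return normalized > threshold
                ? new Rectangle(minX, minY, maxX - minX + 1, maxY - minY + 1)
                : null;
    }
}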

2.3 Finding location of the region

The image space is divided into 9 sub-locations and the
approximate position of each region is determined. The user
can specify the location of a region in the query to retrieve
images from the database. The classification is according to
the location map shown in figure 3, and an illustration of
find-location is shown in figure 4.

Figure 3. Location map: a 3x3 grid of sub-locations (left,
middle and right by top, center and bottom), with the center
cell bounded by the corners (x1,y1), (x2,y2), (x3,y3) and
(x4,y4).

Figure 4. Illustration of Find location.

The steps involved in determining the locations of the
regions in the image are as follows:

1. Determine the four corners of the location named "center"
in the location map using

    X1 = imgwidth/3,     Y1 = imgheight/3
    X2 = 2*imgwidth/3,   Y2 = imgheight/3
    X3 = imgwidth/3,     Y3 = 2*imgheight/3
    X4 = 2*imgwidth/3,   Y4 = 2*imgheight/3
2. Determine the approximate position of the region by
comparing coordinates of the bounding rectangle with
the above coordinates.
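A possible Java rendering of these two location-finding steps (illustrative only; placing a region by the centre of its bounding rectangle is our assumption, since the text only says the bounding-rectangle coordinates are compared with the corners above):

import java.awt.Rectangle;

// Sketch: classify a region's bounding rectangle into one of the nine
// sub-locations of the location map (figure 3).
public final class LocationFinder {

    public static String findLocation(Rectangle r, int imgWidth, int imgHeight) {
        // Assumption: the region is placed by the centre of its bounding box.
        int cx = r.x + r.width / 2;
        int cy = r.y + r.height / 2;

        // Corners of the central cell, as defined in step 1.
        int x1 = imgWidth / 3, x2 = 2 * imgWidth / 3;
        int y1 = imgHeight / 3, y2 = 2 * imgHeight / 3;

        String horiz = (cx < x1) ? "Left" : (cx < x2) ? "Middle" : "Right";
        String vert  = (cy < y1) ? "top"  : (cy < y2) ? "center" : "bottom";

        // The location map names the central cell simply "Center".
        if (horiz.equals("Middle") && vert.equals("center")) return "Center";
        return horiz + " " + vert;
    }
}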
In the above procedures, the assignment of a color to each
pixel (matching pixels against the color look-up table) takes
O(L) time, where L is the size of the color look-up table.
The boundary detection algorithm takes O(m^2) time per
region, since it searches the entire image of size m x m for
similar-color pixels. The extracted dominant region features,
viz. color, area and location, are stored in a sequential file.
The image database is constructed by processing all images
off-line, as this saves query processing time. When a query is
made based on an example image, only the example image
is processed for extracting region features.

3 Indexing technique

To provide easy and fast access, the generated meta-data
have to be stored in suitable index structures. Logical
structures are used to store the normalized area and spatial
location of the regions/objects in the image. Bertino et al. [4]
have suggested a general image indexing taxonomy for image
databases based on various image features.

3.1 Multi-level multidimensional index

A multi-level multidimensional index structure based on
partitioning and an R-tree [4] is used to store the images. The
structure employs pruning to speed up retrieval. Figure 5
shows the complete two-level structure.

Figure 5. Two-level region and location index structure. The
first level partitions the image space by the number of dominant
regions (1, 2 or 3 regions); the second level holds the color
bins for each partition (single bins such as 1, ..., 10 for one
region, pairs such as (0,1), (0,2), ... for two regions, and
triples such as (0,1,2), (0,2,1), ... for three regions), each of
which points to an image index and its image cluster.

The first level is the dominant region classification. Images with the same dominant regions are assigned to a particular partition. This allows us to prune away images belonging to classes that would never satisfy the query, narrowing
the search space down to a few classes. The total number
of classes/categories is given by

    {}^{n}C_k = \frac{n!}{(n-k)! \, k!}

where n is
the number of different regions/objects and k is the number
of dominant regions/objects chosen.
The second level is a multidimensional R-tree structure
used to prune away regions within images in a partition
based on spatial relations. Each node contains the features
of a particular region along with the bounding rectangle for that
region. The leaf node points to the address in the database
that contains the image data. The image data contains the
ID of the object and other parameters required for queries.

3.2 Determination of the index


A composite index is formed based on the indices in the
color look-up table. We have chosen a color table consisting
of 25 colors. A unique index is formed using a base of 25,
as given by the equation below:

    Index = \sum_{i=1}^{NoRGN} C_i \cdot 25^{NoRGN - i}

where NoRGN is the number of dominant regions found and
C_i is the color-table index of the i-th dominant region.
Suppose (C1, C2, C3) are the color indices of three dominant
regions found within an image, where C1 represents the index
of the first dominant region, C2 the index of the second
dominant region and C3 the index of the third dominant
region. Then the index is given by

    Index = C1 \cdot 25^2 + C2 \cdot 25 + C3
Features like the color (RGB) components, the normalized area
of the region and the relative location of the region are
stored with the image index. The images with a similar
index are grouped together and stored as a bunch of images.
The similarity or dissimilarity depends upon the difference
in the saliency measures (features) stored along with the
image. For details refer to [7].
An example construction of the hash table is shown in
figure 6. Suppose the index is 28; then all images with
regions containing the 1st and 3rd colors of the color look-up
table, respectively, are mapped and stored as an image cluster.
Each image entry is filled with the image path and the
information of the two regions.

Figure 6. An instance of the Hash table. Buckets such as 1(1),
2(2), 28(1,3) and 278(1,2,3) map to image clusters; each entry
holds the image path and its region (RGN) information.
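To make the base-25 construction concrete, here is a small hypothetical helper (not the original code); for two regions with colors 1 and 3 it yields 1 x 25 + 3 = 28, matching the example above.

// Sketch: composite index from the color-table indices of up to three
// dominant regions, using base 25 as described in Section 3.2.
public final class IndexBuilder {

    /** colorIndices are ordered by region dominance, e.g. {1, 3} -> 28. */
    public static int compositeIndex(int[] colorIndices) {
        int index = 0;
        for (int c : colorIndices) {
            index = index * 25 + c;   // Horner form of sum of C_i * 25^(NoRGN - i)
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(compositeIndex(new int[] { 1, 3 }));   // prints 28
    }
}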

4 Image querying
A prototype CBIR system has been implemented in the Java
language for retrieving images stored in an image database
based on dominant color regions [7]. Image processing,
region detection and hash-based storage of images have been
implemented and tested on the Windows-NT platform. The image
database consists of three classes of images: flags,
flowers and simulated images.
A graphical user interface has been developed in Java, since
it provides a powerful and flexible interface. For a
query, the Java GUI displays thumbnails of images. Example-Based
Query takes an image as input, while Feature-Based
Query reads the color and location of regions for the retrieval
of images.
The images are processed and the extracted features are
then stored in a hash table. The object-oriented approach
provided by Java is used to store the image information.
Images with a similar index are stored in Vectors. Each
Vector contains a set of image references and the region
features. When a search is made, the hash table is searched
for the index, and the matching image cluster is then
searched for similar images.
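A minimal sketch of this storage scheme, assuming a simple ImageEntry record; Hashtable and Vector are the standard Java collections the text refers to, while the class and field names are illustrative.

import java.util.Hashtable;
import java.util.Vector;

// Sketch: images sharing a composite index are clustered in a Vector
// stored under that index in a Hashtable, as described in Section 4.
public final class ImageStore {

    /** Minimal per-image record: path plus extracted region features. */
    public static final class ImageEntry {
        public final String path;
        public final int[] regionColors;       // color-table indices
        public final double[] regionAreas;     // normalized areas
        public final String[] regionLocations; // e.g. "Left top", "Center"

        public ImageEntry(String path, int[] colors, double[] areas, String[] locs) {
            this.path = path;
            this.regionColors = colors;
            this.regionAreas = areas;
            this.regionLocations = locs;
        }
    }

    private final Hashtable<Integer, Vector<ImageEntry>> table = new Hashtable<>();

    public void add(int index, ImageEntry entry) {
        table.computeIfAbsent(index, k -> new Vector<>()).add(entry);
    }

    /** Returns the cluster of images sharing this index (possibly empty). */
    public Vector<ImageEntry> lookup(int index) {
        return table.getOrDefault(index, new Vector<>());
    }
}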

4.1 Retrieval engine


The retrieval engine is designed with two applets. The first
is the Example-Based Query applet. In this applet, example
images are browsed in a window; any image can be viewed by
clicking the Prev and Next buttons provided. A typical
example-based query would be "find images similar to the
given image". If the user clicks the Select button, the image
in the window is selected for processing. The color matching
and region selection programs are executed on the selected
image to obtain the region features, from which an image index
is formed. The hash table is searched for a similar index.
Similarity measures are applied to get the most likely images
from the image clusters. In our prototype, the first four
images matched by index and then by features are displayed
in the thumbnail windows provided. Sample outputs are shown
in figures 7, 9 and 10.

Figure 7. Example-Based Query.


The second applet is the Feature-Based Query applet. In
this module the user can select a region color and its location
from the pull-down lists provided. The color selected by
the user is shown as a thumbnail. The image index is formed
and searched for in the hash table. The location provided by
the user is compared with the location stored in the feature
vectors. Location selection is optional; if a location is not
selected, all images with a similar index are displayed,
suppressing the location constraint. Images are displayed in
the window provided. Sample outputs are shown in figures
8, 11 and 12.

5 Experiments and results


The image database consists of image classes such as
flowers, flags and simulated pictures. We have tested 190
flag images, 100 flower images and 50 simulated images.
The images were transformed such that each pixel is assigned
the nearest color in the color look-up table.

Figure 8. Feature-Based Query.

In table 2, the
percentages of proper color matching and proper region selection are shown for different categories of images. Out of
190 flag images tested for color matching, 149 flags were
found to match on color properly. 60 of 100 flower images
showed proper color matching. The percentage of color
matching is lower for flowers than for flags because
the colors in flowers are combinations of various colors.
The percentages of success for the two types of queries are
shown in table 2. Out of 50 queries made on the flag and
flower databases, 35 queries returned correct results in
Example-Based Query and 32 in Feature-Based Query, respectively.
The success in relevant image retrieval is greater
for example-based queries than for feature-based queries.
Some of the images did not match properly because of the
distribution of colors; this also lowered the success of
feature-based queries. Wrong color matching does not cause
much of a problem in example-based query, since the example
image is also processed for feature extraction.
Sample outputs of querying and segmentation of color
regions are shown in figures 9 through 15.

Table 2. Matching and query results

                        Flags   Flowers   Simulated Images
Total Images Tested       190       100                 50
Color Matching (%)       79.4        60                 86
Region Selection (%)       68        50                 80
No. of Queries Made        50        50                 25
Success in EBQ             35        32                 18
Success in FBQ             30        27                 18

Figure 9. Example-Based Query.

Figure 10. Example-Based Query.

Figure 11. Feature-Based Query.

Figure 12. Feature-Based Query.

Figure 13. Segmentation of one region.

Figure 14. Segmentation of two regions.

Figure 15. Segmentation of three regions.

6 Conclusions
A technique for image indexing based on color has been
described. Dominant regions present in the image are found,
and region features like color, size and approximate location
within the image are determined and stored in the database.
A multi-level multidimensional indexing scheme is used to
store the images. A Java-based search engine has been developed on the Windows-NT platform. Flags, flowers and simulated images have been segmented and retrieved from the
database.
We have considered color and spatial relations of the regions detected in the image. This can be further extended
to include shape and texture properties of each region in
the image. We are currently working towards this goal. A
combined module with color, shape, texture and spatial location could be used to give a unique index to each image
to achieve efficient and effective image search and retrieval.

References

[1] A.Pentland, R.W.Picard, and S.Sclaroff. Photobook:
Content-based manipulation of image databases. International
Journal of Computer Vision, 18(3):233-254, 1996.
[2] B.M.Mehtre et al. Color matching for image retrieval.
Pattern Recognition Letters, 16:325-331, 1995.
[3] C. Carson et al. Region-based image querying. In CVPR '97
Workshop on Content-Based Access to Image and Video
Libraries (CAIVL '97), 1997.
[4] E.Bertino et al. Indexing Techniques for Advanced Database
Systems. Kluwer Academic Publishers, 1997.
[5] V. Gudivada and V.V.Raghavan. Special issue on content-based
image retrieval systems - guest eds. IEEE Computer,
28(9):18-22, 1995.
[6] J.R.Smith and S.F.Chang. VisualSEEk: A fully automated
content-based image query system. ACM Multimedia, pages
87-98, 1996.
[7] K.C.Ravishankar. Color based indexing technique for image
retrieval. M.Tech Thesis, Dept. of CSE, IIT, New Delhi,
India, 1998.
[8] M. De Marsico, L.Cinque, and S.Levialdi. Indexing pictorial
documents by their content: a survey of current techniques.
Image and Vision Computing, 15(2):119-141, 1997.
[9] M.Flickner et al. Query by image and video content: the QBIC
system. IEEE Computer, 28(9):23-32, 1995.
[10] M.S.Kankanahalli, B.M.Mehtre, and J.K.Wu. Cluster-based
color matching for image retrieval. Pattern Recognition,
29(4):701-708, 1996.
[11] P.Aigrain, H.J.Zhang, and D.Petkovic. Content-based
representation and retrieval of visual media: A state-of-the-art
review. Multimedia Tools and Applications, 3:179-202, 1996.
[12] R.S.Gray. Content-based image retrieval: color and edges.
Technical Report PCS-TR95-252, Dartmouth College, 1995.
[13] T.F.Syeda-Mahmood. Data and model-driven selection using
color regions. International Journal of Computer Vision,
21(1/2):9-36, 1997.
[14] V.E.Ogle and M.Stonebraker. Chabot: Retrieval from a
relational database of images. IEEE Computer, 28(9):40-48,
1995.
[15] W.Niblack et al. The QBIC project: Querying images by
content using color, texture and shape. In Storage and Retrieval
for Image and Video Databases (SPIE), 1908:173-187, 1993.
