1 Dept. of Information Technology, Research Scholar, Sathyabama University, Old Mahapalipuram Road, Chennai, Tamil Nadu, Pin-600119.
2 Dept. of Electronics and Communication Engg., Anna University, Chennai
Abstract
In a content-based image retrieval (CBIR) system, target images are sorted by feature similarity with respect to the query [5]. In this paper, we propose to use K-means clustering to classify the feature set obtained from the histogram. Standard histograms, because of their efficiency and insensitivity to small changes, are widely used for content-based image retrieval. Their main disadvantage is that images of very different appearance can have similar histograms, because a histogram provides only a coarse characterization of an image. The histogram refinement method therefore refines the histogram by splitting the pixels in a given bucket into several classes [1], and the refined histogram provides the feature set for the proposed CBIR scheme. We compute the similarity for 8 bins and for 16 bins.
Introduction
Color histograms are widely used for query-based image retrieval. They are computationally efficient and insensitive to small changes in camera position [1]. Their main problem is that they characterize an image only coarsely, so images with different appearances can have the same histogram. Color histograms are employed in systems such as QBIC and Chabot, which all exploit these advantages. In this paper, a modified scheme based on the color histogram is used. The modification is based on histogram refinement [1]. In histogram refinement, the pixels within a given bucket are split into classes based on some local property. The split histograms are then compared bucket by bucket, as in normal histogram matching, except that only pixels within a bucket that share the same local property are compared. The results are therefore better than those of normal histogram matching: not only are the color features of the image used, but spatial information is also incorporated to refine the histogram.
Introduction to Content Based Image Retrieval
The size of image databases has increased dramatically in recent years. Causes include the development of image capturing devices, such as digital cameras, and the internet. New techniques and tools are needed for efficiently sorting, browsing, searching and retrieving images. The text-based approach, which retrieves images by using annotations, can be traced back to the 1970s. In the 1980s, content-based image retrieval (CBIR) [4] was introduced to overcome some disadvantages of the text-based approach. CBIR has become an important practical technique for effective searching and browsing of ever larger collections of unstructured images and videos. CBIR uses the visual content (low-level features) of images, such as color, texture and shape, to represent and index images. These features are described by multi-dimensional vectors, called feature vectors, which are used to retrieve similar images. Extensive experiments on CBIR show that low-level features do not exactly represent high-level semantic concepts and can fail when used to retrieve similar images. To overcome this problem, different approaches combine low-level descriptors with other techniques. Different CBIR systems have been developed, such as SIMPLIcity, CLUE [5] and others. More specifically, the discrepancy between the limited descriptive power of low-level image features and the richness of user semantics is referred to as the semantic gap [4]. To bridge this gap, different approaches combine low-level features with other techniques, such as textual annotations, to create new descriptors that improve image retrieval results.
However, the retrieval process becomes more complex, and no method guarantees absolute accuracy of the results.
Low Level Features for Image Retrieval
Different techniques have been proposed for extracting low-level features. Color is one of the most widely used features in image retrieval because it describes image content efficiently, although it is not directly related to high-level semantics. MPEG-7 is an ISO/IEC standard developed by the Moving Picture Experts Group for standardizing the description of multimedia content data. This standard defines seven color descriptors: color space, color quantization, dominant colors, scalable color, color layout, color structure, and GoF/GoP color. The scalable color descriptor is a color histogram in HSV color space, encoded by a Haar transform. Its binary representation is scalable in terms of bin numbers and bit representation accuracy over a broad range of data rates. The color layout descriptor represents the spatial distribution of color of visual signals in a very compact form. This compactness allows visual signal matching with high retrieval efficiency at very small computational cost. The color structure descriptor captures both color content and information about the structure of this content. To represent each image for retrieval, its low-level features are calculated. An indexing and search algorithm is then applied to retrieve the most similar images.
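The feature extraction and search steps described above can be illustrated with a minimal sketch. It assumes grayscale images stored as nested lists, a normalized histogram as the feature vector, and an L1 (sum of absolute differences) distance; the function names, the toy database and the choice of distance are illustrative assumptions, not the method used in this paper.

```python
def histogram(image, bins, levels=256):
    """Normalized gray-level histogram of an image given as rows of ints."""
    width = levels // bins                      # intensity range covered by one bin
    counts = [0] * bins
    total = 0
    for row in image:
        for value in row:
            counts[min(value // width, bins - 1)] += 1
            total += 1
    return [c / total for c in counts]


def l1_distance(h1, h2):
    """Sum of absolute bin differences (assumed distance measure)."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def retrieve(query, database, bins, top=2):
    """Return the names of the `top` database images closest to the query."""
    q = histogram(query, bins)
    ranked = sorted(
        database,
        key=lambda name: l1_distance(q, histogram(database[name], bins)),
    )
    return ranked[:top]


# Hypothetical 2x2 grayscale images, for illustration only.
database = {
    "dark": [[10, 20], [15, 5]],
    "mid": [[120, 130], [125, 140]],
    "bright": [[250, 240], [245, 255]],
}
query = [[12, 18], [22, 8]]

result_8 = retrieve(query, database, bins=8)
result_16 = retrieve(query, database, bins=16)
```

With 8 bins the query and the "dark" image fall into the same bucket, so their distance is zero; with 16 bins the finer quantization separates them slightly, which is the kind of difference the 8-bin versus 16-bin comparison is meant to expose.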
Existing System
In the existing system, standard histograms are used. Standard histograms, because of their efficiency and insensitivity to small changes, are widely used for content-based image retrieval. Their main disadvantage is that images of very different appearance can have similar histograms [4], because a histogram provides only a coarse characterization of an image.
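The weakness of standard histograms can be seen in a small sketch. The two 4x4 grayscale images below are hypothetical test data: one has a dark half and a bright half, the other is a checkerboard, yet their 8-bin histograms are identical.

```python
def gray_histogram(image, bins=8, levels=256):
    """Count pixels per bin for a grayscale image given as rows of ints."""
    width = levels // bins                      # intensity range covered by one bin
    hist = [0] * bins
    for row in image:
        for value in row:
            hist[min(value // width, bins - 1)] += 1
    return hist


# Same pixel values, different spatial arrangement (illustrative data).
blocks = [[0, 0, 255, 255] for _ in range(4)]
checker = [[255 * ((r + c) % 2) for c in range(4)] for r in range(4)]

hist_blocks = gray_histogram(blocks)
hist_checker = gray_histogram(checker)
same_histogram = hist_blocks == hist_checker    # True: the histogram ignores layout
```

Because the histogram discards pixel positions, any distance computed on these two feature vectors is zero even though the images look nothing alike.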
Proposed System
In this project, we propose to use K-means clustering to classify the feature set obtained from the histogram refinement method. Histogram refinement provides the set of features for the proposed content-based image retrieval (CBIR) scheme. The method refines the histogram by splitting the pixels in a given bucket into several classes. A comparison graph is then produced for the 8-bin (bucket) and 16-bin cases.
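A minimal sketch of histogram refinement follows. It assumes that the local property is region size, in the spirit of color coherence vectors: a pixel counts as coherent if it belongs to a 4-connected region of the same bucket with at least `tau` pixels, and as incoherent otherwise. The threshold `tau`, the function name and the test images are illustrative assumptions; the paper does not specify the local property in this form.

```python
from collections import deque


def refined_histogram(image, bins=8, levels=256, tau=4):
    """Return [coherent, incoherent] pixel counts per bin for a grayscale image."""
    width = levels // bins
    rows, cols = len(image), len(image[0])
    quant = [[min(v // width, bins - 1) for v in row] for row in image]
    seen = [[False] * cols for _ in range(rows)]
    refined = [[0, 0] for _ in range(bins)]
    for r in range(rows):
        for c in range(cols):
            if seen[r][c]:
                continue
            bucket = quant[r][c]
            # Flood fill the 4-connected region that shares this bucket.
            size = 0
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and quant[ny][nx] == bucket):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            # Large regions are coherent (index 0), small ones incoherent (index 1).
            refined[bucket][0 if size >= tau else 1] += size
    return refined


# Two hypothetical images with identical plain histograms.
blocks = [[0, 0, 255, 255] for _ in range(4)]
checker = [[255 * ((r + c) % 2) for c in range(4)] for r in range(4)]

refined_blocks = refined_histogram(blocks)
refined_checker = refined_histogram(checker)
```

The plain histograms of the two images are equal, but after refinement the block image has all its pixels in the coherent class and the checkerboard has all of them in the incoherent class, so a bucket-by-bucket comparison can now tell them apart.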
conventional clustering, which measures similarity based on geometric distance. Conceptual clustering consists of two components: (1) it discovers the appropriate classes, and (2) it forms descriptions for each class, as in classification. The guideline of striving for high intraclass similarity and low interclass similarity still applies.
Such a redistribution forms new silhouettes encircled by dashed curves, as shown in Fig. 1(b). Eventually, no redistribution of the objects in any cluster occurs, and so the process terminates. The resulting clusters are returned by the clustering process.
The k-means algorithm
Algorithm: k-means. The k-means algorithm for partitioning, based on the mean value of the objects in the cluster.
Input: The number of clusters k and a database containing n objects.
Output: A set of k clusters that minimizes the squared-error criterion.
Method:
(1) arbitrarily choose k objects as the initial cluster centers;
(2) repeat
(3) (re)assign each object to the cluster to which the object is the most similar, based on the mean value of the objects in the cluster;
(4) update the cluster means, i.e., calculate the mean value of the objects for each cluster;
(5) until no change.
Block Diagram
[Figure: block diagram with the stages Query Image, Database Image, Find Histogram, Quantize to 8-Bin, Quantize to 16-Bin, Find Difference, Compute Similarity for 8-bin, Retrieve Images, Plot a Graph]
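The k-means procedure listed earlier (steps 1 to 5) can be sketched in a few lines. This is a minimal illustration under stated assumptions: objects are tuples of floats, similarity is squared Euclidean distance to the cluster mean, and the first k objects serve as the "arbitrarily chosen" initial centers so that the run is reproducible. The feature values are made up for the example.

```python
def kmeans(points, k, max_iter=100):
    """Partition points into k clusters; returns (centers, labels)."""
    # Step 1: pick k initial centers (assumption: simply the first k objects).
    centers = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):                   # Step 2: repeat
        # Step 3: (re)assign each object to the nearest cluster mean.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        if new_labels == labels:                # Step 5: until no change
            break
        labels = new_labels
        # Step 4: update each cluster mean from its current members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers, labels


# Hypothetical 2-D feature vectors forming two obvious groups.
features = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centers, labels = kmeans(features, k=2)
```

The result depends on the initial centers in general, which is why practical implementations usually run several random initializations and keep the partition with the lowest squared error.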
The purpose of K-means clustering is to classify the data. We selected K-means clustering because it is suitable for clustering large amounts of data. K-means creates a single level of clusters, unlike hierarchical clustering methods, which create a tree structure. Each observation in the data is treated as an object having a location in space, and a partition is found in which objects within each cluster are as close to each other as possible and as far from objects in other clusters as possible.
The selection of a distance measure is an important step in clustering. The distance measure determines the similarity of two elements. It greatly influences the shape of the clusters, as some elements may be close to one another according to one distance and farther apart according to another. We chose a quadratic distance measure between the features. We calculated the distance between all the row vectors of the feature set obtained in the previous section, thereby finding the similarity between every pair of objects in the data set. The result is a distance matrix. Next, we used the member objects and the centroid to define each cluster. The centroid of each cluster is the point for which the sum of distances from all objects in that cluster is minimized. The distance information generated above is used to determine the proximity of objects to each other. The objects are grouped into K clusters using the distance between the centroids of the two groups. Let $O_p$ be the number of objects in cluster $p$ and $O_q$ the number of objects in cluster $q$, and let $d_{pi}$ be the $i$th object in cluster $p$ and $d_{qj}$ the $j$th object in cluster $q$. The centroid distance between the two clusters $p$ and $q$ is given as:
$$ d(p, q) = \lVert \bar{d}_p - \bar{d}_q \rVert_2 $$
where
$$ \bar{d}_p = \frac{1}{O_p} \sum_{i=1}^{O_p} d_{pi}, \qquad \bar{d}_q = \frac{1}{O_q} \sum_{j=1}^{O_q} d_{qj}. $$
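The centroid distance between two clusters can be illustrated with a short sketch. It assumes that each object is a numeric feature vector, that a centroid is the component-wise mean of the cluster's objects, and that the distance between centroids is Euclidean; the function names and sample clusters are illustrative only.

```python
import math


def centroid(objects):
    """Component-wise mean of the objects in one cluster."""
    n = len(objects)
    return [sum(column) / n for column in zip(*objects)]


def centroid_distance(cluster_p, cluster_q):
    """Euclidean distance between the centroids of two clusters (assumed norm)."""
    mean_p, mean_q = centroid(cluster_p), centroid(cluster_q)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mean_p, mean_q)))


# Hypothetical clusters: centroids are (1, 0) and (4, 4).
cluster_p = [(0.0, 0.0), (2.0, 0.0)]
cluster_q = [(3.0, 4.0), (5.0, 4.0)]

d_pq = centroid_distance(cluster_p, cluster_q)
```

Here the centroids differ by (3, 4), so the centroid distance is 5.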
[Figure residue: Image Classification by K-means Clustering; Retrieval for 8-Bin; Retrieval for 16-Bin]
Plotting a Graph
Find the precision rate of the images retrieved using 8 bins and 16 bins:
Precision rate = number of relevant images retrieved / number of images returned
Find the recall rate of the images retrieved using 8 bins and 16 bins:
Recall rate = number of relevant images retrieved / total number of relevant images in the database
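Both rates can be computed from the list of returned images and the list of relevant images. The sketch below is a minimal illustration; the image names are hypothetical, and recall is taken relative to the relevant images in the database, which is the standard definition.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)    # relevant images that were returned
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical result for one query: 4 images returned, 5 relevant in the database.
retrieved = ["img1", "img2", "img3", "img4"]
relevant = ["img1", "img3", "img7", "img8", "img9"]

precision, recall = precision_recall(retrieved, relevant)
```

In this example two of the four returned images are relevant, giving a precision of 0.5, and two of the five relevant images were found, giving a recall of 0.4. Running the same computation on the 8-bin and 16-bin results yields the two points to plot against each other.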
Conclusion
This project proposes to use K-means clustering for the feature set obtained using the histogram refinement method, which is based on the concept of coherency and incoherency. The feature selection is based on the number, color and shape of the objects present in the image. The grayscale value differences, means and sizes of the objects are considered appropriate features for retrieval. For indexing of images, we proposed K-means clustering. We have shown that K-means clustering is quite useful for relevant image retrieval queries.
Future Enhancement
The classification of the feature set can be extended to heterogeneous features (shape, texture) to obtain more accurate results. It can also be extended by merging heterogeneous features with a neural network. The schemes proposed in this work can be further improved by introducing fuzzy logic concepts into the clustering process.
References
[1] Youngeun An, Junguk Baek et al., Classification of Feature Set Using K-means Clustering from Histogram Refinement Method, IEEE 2008.
[2] Donn Morrison, Stephane Marchand-Maillet, Eric Bruno, Semantic clustering of images using patterns of relevance feedback, IEEE 2008.
[3] Bink Wang, Xin Zhang, Xiao-Yan Zhao, Zhi-De Zhang, Hong-Xia Zhang, A semantic description for content based image retrieval, IEEE 2008.
[4] Raquel E. Patino-Escarcina and Jose Alfredo Ferreira Costa, The semantic clustering of images and its relation with low level color features, IEEE 2008.
[5] Yixin Chen, James Z. Wang, Krovetz, CLUE: cluster-based retrieval of images by unsupervised learning, IEEE Transactions on Image Processing, vol. 14, no. 8, August 2005.
[6] Xiaoxin Yin, Mingjing Li, Lei Zhang, Hongjiang Zhang, Semantic image clustering using relevance feedback, IEEE 2003.
[7] Gholamhosein Sheikholeslami et al., SemQuery: Semantic Clustering and Querying on Heterogeneous Features for Visual Data, IEEE Transactions on Knowledge and Data Engineering, vol. 14, no. 5, September/October 2002.
[8] J. Z. Wang, J. Li, and G. Wiederhold, SIMPLIcity: semantics-sensitive integrated matching for picture libraries, IEEE Trans. Pattern Anal. Mach. Intell., 23(9):947-963, 2001.