
Dimensions in Data Processing & Data Management Technology - Data Structures & Algorithms Efficiency Concerns


Vishwambhar Pathak Sr. Lecturer, Dept. of Computer Science & Engineering, BITIC-RAK (UAE)

The efficiency of a computer application is greatly affected by the features of the underlying Database Management System. Driven by fast advances in the applications of computing and information technology, database technology is also expanding at a phenomenal rate. We examine aspects of database management, namely data structures and data processing algorithms, with regard to varying data characteristics and processing environments. The present work aims to provide a comprehensive and integrated account of the different types of databases current in the literature, with comments and review, bringing out the points of similarity among them.

INTRODUCTION

The contemporary database management methodologies, viz. the relational data model, the object data model, AI-based knowledge(-base) management, information retrieval and exploration, and data configuration (XML) techniques, are largely under enhancement, as the characteristics of data and the processing environments of various applications vary widely and pose considerable challenges.

Data characteristics: multimedia data, time-series data, temporal data, XML data, multidimensional data.

Processing environments: real-time processing, parallel and distributed processing, mobile computing, P2P networks.

The focus of current research is: i) to find solutions (data representation, indexing, querying, processing algorithms) to unsolved difficulties arising from the data characteristics and the typicality of the processing environments summarized above; and ii) to find better ways of enhancing the performance of data management techniques.

Review Of Contemporary Research Related To Representation And Processing Of Multimedia Data


[ Data Representation Concerns: 3D graphics/object handling (may be studied later); event-centric representation; logic-based representation of multimedia data; representation of audio data; universal common format ]
[ Processing Concerns and Solutions: Content-based retrieval (querying) {CBIR, color-based retrieval, content-based image authentication, 3D browsing tool, reactive retrieval in distributed environments, semantic-based access in mobile networks}; Data hiding {block-based lossless, distortion-based, ERC, quantization-based data hiding}; Error concealment {ERC, error concealment, SAR image denoising}; Protecting sensitive data; Replication {multi-quality data replication, transparent replication}; Information exploration (learning from data) {data history tool, clustering using time series, feature extraction}; Indexing {content-based, graph-based, transform-based, wavelet-based, for human motion, for image data, for multi-feature music, for VLDB multidimensional data (2^n-tree)}; Geo-spatial-temporal data processing ]

For data to be gainfully and meaningfully used in various applications, it is essential to have efficient schemes for data management and manipulation, which broadly involve the acquisition, organization, storage, query, retrieval, transmission, and presentation of data. A DataBase Management System (DBMS) organizes huge amounts of data into a database and provides utilities for the efficient storage, usage, and management of those data. A multimedia database management system (MMDBMS) should have the capabilities of a traditional DBMS and much more. With multimedia data, the ability to access all the data with similar features is limited under keyword-based indexing and exact (or range) searching. This makes automatic analysis, classification, content-based querying, and similarity-based search a necessity in an MMDBMS.

Data Representation Models and Concerns

The use of multimedia data in many applications has increased significantly. Some examples of these applications are distance learning, digital libraries, video surveillance systems, and medical videos. As a consequence, there are increasing demands on modeling, indexing and retrieving these data.
[ Concerns: 2D video, graph-based video, moving object detection and tracking, human motion in compressed form ]

I. Modeling and Refinement of Scalable Video Coding

Modeling and refinement of Scalable Video Coding (SVC) is much under study [1]. The scalable extension of H.264/MPEG4-AVC is a current standardization project of the Joint Video Team (JVT) of the ITU-T Video Coding Experts Group (VCEG) and the ISO/IEC Moving Picture Experts Group (MPEG). The basic SVC design can be classified as a layered video codec.

In general, the coder structure as well as the coding efficiency depends on the scalability space that is required. An important feature of the SVC design is that scalability is provided at the bit-stream level. Bit-streams for a reduced spatial and/or temporal resolution can be obtained simply by discarding from a global SVC bit-stream the NAL units (or network packets) that are not required for decoding the target resolution. NAL units of PR (progressive refinement) slices can additionally be truncated in order to further reduce the bit-rate and the associated reconstruction quality.

Temporal Scalability: In H.264/MPEG4-AVC, any picture can be marked as a reference picture and used for motion-compensated prediction of following pictures, independent of the corresponding slice coding types. These features allow the coding of picture sequences with arbitrary temporal dependencies.

So-called key pictures are coded at regular intervals by using only previous key pictures as references. The pictures between two key pictures are hierarchically predicted as shown in Fig. 2. Obviously, the sequence of key pictures represents the coarsest supported temporal resolution, which can be refined by adding pictures of the following temporal prediction levels. Spatial scalability is achieved by an oversampled pyramid approach. The pictures of different spatial layers are independently coded with layer-specific motion parameters, as illustrated in Fig. 1. However, in order to improve the coding efficiency of the enhancement layers in comparison to simulcast, additional inter-layer prediction mechanisms have been introduced.

Inter-layer prediction techniques: The following three inter-layer prediction techniques are included in the SVC design. In the following, only the original concepts based on simple dyadic spatial scalability are described.
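The hierarchical prediction structure can be made concrete with a small sketch (our illustration, not code from [1]): with a GOP size that is a power of two, a picture's temporal level is given by the largest power of two dividing its display index, and decoding levels 0..T yields the corresponding frame rate.

```python
def temporal_level(picture_index, gop_size):
    """Return the temporal level of a picture in a hierarchical GOP.

    Key pictures (multiples of gop_size) form level 0, the coarsest
    temporal resolution; each further level doubles the frame rate.
    """
    if picture_index % gop_size == 0:
        return 0  # key picture
    level = 0
    step = gop_size
    while picture_index % step != 0:
        step //= 2
        level += 1
    return level

def pictures_for_level(target_level, gop_size):
    """Display indices (over one GOP span) needed to decode all
    temporal levels up to target_level."""
    return [i for i in range(gop_size + 1)
            if temporal_level(i, gop_size) <= target_level]
```

For a GOP of 8 pictures, level 0 is the key pictures {0, 8}, level 1 adds picture 4, level 2 adds {2, 6}, and level 3 adds the remaining odd pictures, mirroring the dyadic hierarchy described above.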

1. Inter-layer motion prediction: In order to employ base-layer motion data for spatial enhancement-layer coding, additional macroblock modes have been introduced in spatial enhancement layers. The macroblock partitioning is obtained by upsampling the partitioning of the co-located 8x8 block in the lower-resolution layer. The reference picture indices are copied from the co-located base-layer blocks, and the associated motion vectors are scaled by a factor of 2. These scaled motion vectors are either used unmodified or refined by an additional quarter-sample motion vector refinement. Additionally, a scaled motion vector of the lower resolution can be used as a motion vector predictor for the conventional macroblock modes.

2. Inter-layer residual prediction: A flag that is transmitted for all inter-coded macroblocks signals the usage of inter-layer residual prediction. When this flag is true, the base-layer signal of the co-located block is block-wise upsampled and used as the prediction for the residual signal of the current macroblock, so that only the corresponding difference signal is coded.

3. Inter-layer intra prediction: Furthermore, an additional intra macroblock mode is introduced, in which the prediction signal is generated by upsampling the co-located reconstruction signal of the lower layer. For this prediction it is generally required that the lower layer is completely decoded, including the computationally complex operations of motion-compensated prediction and deblocking. However, this problem can be circumvented when the inter-layer intra prediction is restricted to those parts of the lower-layer picture that are intra-coded. With this restriction, each supported target layer can be decoded with a single motion compensation loop.

Recently, a fast-search motion estimation algorithm for the H.264/AVC SVC (scalable video coding) base layer with a hierarchical B-frame structure for temporal decomposition has been presented [2].
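The inter-layer motion prediction of item 1 above, for dyadic scalability, amounts to doubling the co-located partitioning and motion vectors; a minimal sketch with illustrative function names (not from the SVC reference software):

```python
def predict_enhancement_mv(base_mv, refine=(0, 0)):
    """Scale a base-layer motion vector for dyadic spatial scalability.

    The co-located base-layer vector is doubled (the enhancement layer
    has twice the resolution in each dimension) and optionally refined
    by a quarter-sample offset.
    """
    mvx, mvy = base_mv
    rx, ry = refine
    return (2 * mvx + rx, 2 * mvy + ry)

def upsample_partition(base_partition_8x8):
    """Upsample the partitioning of a co-located 8x8 base-layer block
    to the 16x16 enhancement-layer macroblock (each dimension doubled)."""
    w, h = base_partition_8x8
    return (2 * w, 2 * h)
```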
The proposed technique is a block-matching based motion estimation algorithm working in two steps, called Coarse search and Fine search. The Coarse search is performed for each frame in display order, and for each 16x16 macroblock it chooses the best motion vector at half-pel accuracy. The Fine search is performed for each frame in encoding order and finds the best prediction for each block type, reference frame and direction, choosing the best motion vector at quarter-pel accuracy using R-D optimization. Both the Coarse and the Fine search test 3 spatial and 3 temporal predictors, and add a set of updates to the best one. The spatial predictors for the Fine search are the result of the Fine search already performed for the previous blocks, while the temporal predictors are the results of the Coarse search scaled by an appropriate coefficient. This scaling is performed because in the Coarse search each picture is always estimated with respect to the previous one, while in the Fine search the temporal distance between the current picture and its references depends on the temporal decomposition level. Moreover, in the Fine search the number and the value of the updates tested depend on the distance between the current picture and its references. These sets of updates are the result of a huge number of simulations on test sequences with different motion features.

II. Storage Concerns and Solutions for 2D Scalable Video (H.264/MPEG-4 SVC)

SVC provides multi-dimensional scalability: it supports multiple temporal, spatial and SNR resolutions simultaneously. Through this multi-dimensional scalability, SVC enables much more flexible adaptation to the various demands of users and to network conditions. With a scalable video, the video server has to extract from the full-resolution stream the exact sub-stream data that corresponds to the requested resolution. In this case, the extracted sub-stream data may be dispersed on the disk. Thus, an access to a scalable video stream at a given resolution might incur more disk requests, which degrades overall disk throughput severely. Alternatively, the server retrieves all streams, including extra sub-streams that are not requested but are located between the currently requested sub-streams. However, this may also cause a huge waste of disk bandwidth and memory buffer, since a large amount of disk throughput is consumed retrieving unnecessary data, which must then be retained in memory until transmission. Disk throughput is a crucial factor that must be taken into account in video server design, since it may restrict the maximum number of clients serviced simultaneously. There have been several works on the placement of scalable video streams or multi-resolution non-scalable video streams on one disk or a disk array. For multi-resolution non-scalable video streams, Shenoy [7.1] and Lim [7.2] have proposed a placement strategy that interleaves multi-resolution video streams on a disk array and enables a video server to efficiently support playback of these streams at different resolution levels. This placement algorithm ensures that each sub-stream within a stream is independently accessible at any resolution and that the seek time and rotational latency overheads are minimized. In addition, they presented an encoding technique that enables a video server to efficiently support scan operations such as fast-forward and rewind. Rangaswami [7.4] developed an interactive media proxy that transforms non-interactive broadcast or multicast streams into interactive ones. They carefully manage the disk device by considering the disk geometry for allocation and by creating several stream files according to the fast-forward levels. However, this method consumes a large amount of storage space, and they did not consider disk array management.
For scalable video data, Chang [7.3] has proposed a strategy for scalable video data placement that maximizes the total data transfer rate of a disk for an arbitrary distribution of requested data rates. The main concept of this strategy is frame grouping, which orders data-rate layers within one storage unit on a disk. It allows optimal disk operation in each service round by performing one seek and a contiguous read of the exact amount of data requested. Kang [7.9] presented a harmonic placement strategy. In this scheme, the layers are partitioned into a set of lower layers and a set of upper layers. In the lower-layer group, they interleave the data blocks of all layers within the same service round. Meanwhile, in the upper-layer group, they cluster the data blocks of a layer together. Using this scheme, they can reduce disk seek time, since the frequently accessed layers are clustered together. However, the schemes described above do not fully utilize the characteristics of scalable video in a video server that can provide multidimensional scalable video streams; they are limited to single-dimensional scalable video. In a recent work [7], an efficient data reorganization and placement scheme for two-dimensional scalable video in a disk array-based video server has been proposed, which considers both disk utilization and load balancing. According to it, sub-streams are reorganized taking into account both the decoding dependency of two-dimensional scalable video and the location where they are stored in the disk array.
The Two Dimensional SVC Rearrangement

SVC provides tools for three scalability dimensions: temporal scalability, spatial scalability and quality (SNR) scalability. We focus on two of them, spatial and temporal scalability, for the sake of simplicity. The spatial scalability technique encodes a video into several levels that have different spatial resolutions. On the other hand, temporal scalability is a technique that encodes a video sequence into several levels having different frame rates. These scalability dimensions, spatial and temporal included, can easily be combined into a general scalable coding scheme which provides a wide range of spatial and temporal scalability.

Figure 1. An Illustration of Two Dimensional Scalable Video

Figure 1(a) describes a combined scalability which supports spatial and temporal scalability simultaneously. When combined scalability is considered, the strict notion of a layer no longer needs to apply [2]. Instead, we define a combined scalability level that consists of Ls and Lt, i.e. each scalability dimension has its own level. Ls and Lt represent the spatial and temporal scalability levels, respectively. The scalability level in each dimension represents the quality of the video in the corresponding dimension. In a scalable video stream, data segments can be grouped into a minimum sub-stream that is capable of extending the scalability level. Thus, in a scalable video server, data retrievals are requested in units of this minimum sub-stream. We define this sub-stream as the unit sub-stream (US) for two-dimensional scalability. The US, Uk(l, m), is defined as a partial stream of the kth GOP, which is an essential sub-stream for the reconstruction of video at a resolution higher than spatial scalability level l and temporal scalability level m. Thus, to reconstruct the kth GOP at spatial scalability level Ls and temporal scalability level Lt, all the USs Uk(l, m) such that l <= Ls and m <= Lt should be extracted from the entire stream. GOPk(Ls, Lt), the set of sub-streams for the kth GOP at spatial scalability level Ls and temporal scalability level Lt, is represented with USs as follows.

GOPk(Ls, Lt) = { Uk(l, m) | l <= Ls, m <= Lt }    (1)
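Eq. 1 reads directly as a set comprehension; a small sketch (the function name is ours):

```python
def gop_substreams(Ls, Lt):
    """Unit sub-streams U_k(l, m) needed to reconstruct one GOP at
    spatial scalability level Ls and temporal scalability level Lt,
    following Eq. 1: all (l, m) with l <= Ls and m <= Lt."""
    return {(l, m) for l in range(Ls + 1) for m in range(Lt + 1)}
```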

In Figure 1(a), the relation between scalability levels and USs is described. It also represents how the scalability level is related to frame rate and frame size. The encoded scalable video streams are stored in units of frames, as shown in Figure 1(b). The number marked on the top of each frame represents the decoding order. Basically, the data of an encoded video stream are stored in their decoding order. To exploit the access pattern determined by scalability, the data should first be partitioned according to USs. Figure 1(c) shows this data placement partitioned according to USs. Starting from this placement, the work proposed a more efficient placement scheme.

For scalable video, the requested video streams are likely to be retrieved in a discontinuous manner within one service round, since the extracted sub-stream data are dispersed on the disk. Thus, an access to these streams at a given resolution might incur more disk requests. To reduce seek overhead, the server can retrieve the sub-streams together with extra sub-streams that are not requested but are located between the currently requested sub-streams. In view of this, the retrieval policy is that one disk request is generated per round duration for each disk, even though it may retrieve unnecessary sub-streams. We try to find the optimal placement based on this retrieval policy. Meanwhile, load balancing between the disks of a disk array is important. When video streams are stored in a disk array, disk striping is performed by dividing the video data into blocks according to their decoding order and storing these blocks on different disks. Sub-streams at a given resolution might be located on some disks but not on others. This incurs biased disk requests and load imbalance between disks, which is not efficient in a disk array-based server. Thus, the optimal data placement can be obtained by finding the placement which satisfies both of the following criteria.

Criterion 1. For each disk request, the server should retrieve the minimum of unnecessary sub-streams, to maximize disk utilization during one service round.

Criterion 2. The server should generate disk requests so as to balance the load between disks during one service round.

Let us suppose a two-dimensional scalable video that has three spatial scalability levels and five temporal scalability levels, and a disk array consisting of four disks. The scalable video stream is originally arranged and partitioned into USs, as described in the previous section. Then these are initially stored on the disks, where a stripe means the closed set of one round duration, as shown in Figure 2. The GOP data can be padded to match the striping distance using the FGS layer of quality scalability, which is denoted U(F) in the figure.

For a general approach, the optimal data placement can be obtained by finding the placement that minimizes the request size to be retrieved from each disk and distributes the requests as evenly as possible between disks during one service round. Let pij be the probability of retrieving the sub-stream at Ls = i, Lt = j, and let Sk = [s1, s2, ..., sN] denote one of the possible data placement sequences on the kth disk, where sn denotes the nth US of a GOP. Accordingly, S = <S1, ..., SK> denotes the continuous sequence of one GOP across the K disks. Let Rij(Sk) denote the request size that occurs when the sub-stream at Ls = i, Lt = j is retrieved from the stream organized as sequence Sk on the kth disk. Let the numbers of spatial and temporal scalability levels be L and M, respectively, and the number of disks be K. In the first step, following Criterion 1, we obtain the first data placement by finding the Sk for each disk that minimizes R(S), the total request size to be retrieved during one service round, from the following equation.
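The equation itself did not survive in this copy. From the surrounding definitions, a plausible reconstruction (the notation is assumed, not quoted from [7]) is that the server minimizes the expected total request size over the client distribution:

```latex
R(S) \;=\; \sum_{i=1}^{L} \sum_{j=1}^{M} p_{ij} \sum_{k=1}^{K} R_{ij}(S_k)
\qquad\qquad (2)
```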

We can obtain several candidate placements from Eq. 2. In the next step, we select the one that maximizes the disk load balancing, following Criterion 2. Let Lij(S) denote the load balancing factor for scalability level Ls = i, Lt = j. Load balancing between disks measures how evenly the disk requests are distributed, so the overall load balancing factor, L(S), can be described by the following equation, where Nij denotes the number of disks to be accessed for scalability level Ls = i and Lt = j.
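As with Eq. 2, the load-balancing equation is missing from this copy; a plausible shape consistent with the definitions of pij, Lij(S) and Nij above (again an assumed reconstruction, with Lij(S) growing as requests for level (i, j) spread over more disks) is:

```latex
L(S) \;=\; \sum_{i=1}^{L} \sum_{j=1}^{M} p_{ij} \, L_{ij}(S),
\qquad L_{ij}(S) = \frac{N_{ij}}{K}
\qquad\qquad (3)
```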

The placement policy then finds the stream sequence Sk for each disk by maximizing Eq. 3. The procedure of the local optimal placement search is as follows.

1. Reorganize a raw scalable video stream, whose data are basically placed in their decoding order, into USs, so that the data are ordered according to scalability level. Then let i = 1 and j = 1; accordingly, the initial sequence is considered as S(1).

2. Whenever i increases, the sequence of the stream S(1) is re-ordered, in which the "scraper" US is relocated to the ith location within the sequence.

3. Each sequence S(i) is split into sub-sequences, Sk(i), one per disk in the disk array. Then the total retrieval size, R(S(i)), and the load balancing factor, L(S(i)), are calculated for that sequence from Eqs. 2 and 3. If it is better than the previous one, it replaces the optimal sequence S. For the current j, this search is repeated until i reaches (L . M).

4. While j increases from 1 to (L . M), the scraper is changed. Using this new scraper US, the search of steps 2 and 3 is repeated. Finally, the local optimal sequence of the stream, S, is selected at the end of the repetition.

When we apply this search algorithm to the initial sequence of Figure 2, we obtain the placement of Figure 3. In the above placement, the client distribution probability is assumed to be a pre-defined parameter. In particular, the placement is optimal when all the scalability levels are requested with the same probability.
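A heavily simplified sketch of this relocation search (illustrative only: the cost function standing in for Eqs. 2 and 3 is an assumption, and the USs in the sequence are assumed distinct):

```python
def relocation_search(seq, cost):
    """Local placement search: repeatedly pick one unit sub-stream
    (the 'scraper'), try it at every position in the sequence, and
    keep the best ordering found.  `cost` is an assumed user-supplied
    function; lower cost means a better placement."""
    best = list(seq)
    best_cost = cost(best)
    n = len(best)
    for j in range(n):              # choose the scraper US
        scraper = best[j]
        for i in range(n):          # try every target location
            cand = [u for u in best if u != scraper]
            cand.insert(i, scraper)
            c = cost(cand)
            if c < best_cost:
                best, best_cost = cand, c
    return best
```

With a toy cost that rewards sorted order, the search recovers the sorted sequence, mirroring how the real algorithm converges to a locally optimal US ordering.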

III. Graph-based Approach for modeling and indexing Video:

In [5], a new graph-based video data structure, called the Spatio-Temporal Region Graph (STRG), was proposed, which represents spatio-temporal features and the relationships among video objects. A Region Adjacency Graph (RAG) is generated from each frame, and the STRG is constructed by connecting the RAGs. The STRG is segmented into a number of pieces corresponding to shots for efficiency. Then, each segmented STRG is decomposed into its subgraphs, called Object Graphs (OGs) and Background Graphs (BGs), in which redundant BGs are eliminated to reduce index size and search time. The proposed indexing starts with clustering OGs using the Expectation Maximization (EM) algorithm [5.1] for more accurate indexing. To cluster them, we need a distance measure between two OGs. For this distance measure, the paper proposed the Extended Graph Edit Distance (EGED), because the existing measures are not very suitable for OGs. The EGED is defined in a non-metric space for clustering OGs, and it is extended to a metric space to compute the key values for indexing. Based on the clusters of OGs and the EGED, it proposed a new indexing structure, the STRG-Index, which provides efficient retrieval.

Spatio-Temporal Region Graph

For a given video, each frame is segmented into a number of regions using a region segmentation technique. Then a Region Adjacency Graph (RAG) is obtained by converting each region into a node, and the spatial relationships among regions into edges [5.2], which is defined as follows.

Definition 1: Given the nth frame fn in a video, a Region Adjacency Graph of fn, Gr(fn), is a four-tuple Gr(fn) = {V, ES, α, β}, where V is a finite set of nodes for the segmented regions in fn; ES ⊆ V × V is a finite set of spatial edges between adjacent nodes in fn; α: V → AV is a set of functions generating node attributes; and β: ES → AES is a set of functions generating spatial edge attributes.

The node attributes (AV) represent the size (i.e., number of pixels), dominant color and location of the corresponding region; the spatial edge attributes (AES) represent the relationships between two adjacent nodes, such as spatial distance and orientation. A RAG is good for representing spatial relationships among nodes indicating the segmented regions. However, it cannot represent the temporal characteristics of video. The new graph-based data structure for video, the Spatio-Temporal Region Graph (STRG), consists of temporally connected RAGs [5.3]. The STRG can handle both the temporal and spatial characteristics of video, and is defined as follows.

Definition 2: Given a video segment S, a Spatio-Temporal Region Graph, Gst(S), is a six-tuple Gst(S) = {V, ES, ET, α, β, γ}, where V is a finite set of nodes for segmented regions from S; ES ⊆ V × V is a finite set of spatial edges between adjacent nodes in S; ET ⊆ V × V is a finite set of temporal edges between temporally consecutive nodes in S; α: V → AV is a set of functions generating node attributes; β: ES → AES is a set of functions generating spatial edge attributes; and γ: ET → AET is a set of functions generating temporal edge attributes.

In the STRG, the temporal edge attributes (AET) represent the relationships between corresponding nodes in two consecutive frames, such as velocity and moving direction. Figures 1(a) and (b) are actual frames of a sample video and their region segmentation results, respectively. Figure 1(c) shows a part of the STRG for frames #141 to #143, constructed by adding temporal edges, which are the horizontal lines between the frames.
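As an illustration of Definitions 1 and 2, a minimal data-structure sketch (the class name, method names and dictionary encoding are our own, not from [5]):

```python
from dataclasses import dataclass, field

@dataclass
class STRG:
    """Minimal Spatio-Temporal Region Graph: nodes carry region
    attributes (size, dominant color, location), spatial edges carry
    distance/orientation, temporal edges carry velocity/direction."""
    nodes: dict = field(default_factory=dict)           # node id -> attributes
    spatial_edges: dict = field(default_factory=dict)   # (u, v) -> attributes
    temporal_edges: dict = field(default_factory=dict)  # (u, v) -> attributes

    def add_rag_frame(self, frame_nodes, frame_spatial_edges):
        """Merge one frame's Region Adjacency Graph into the STRG."""
        self.nodes.update(frame_nodes)
        self.spatial_edges.update(frame_spatial_edges)

    def connect(self, u, v, attrs):
        """Add a temporal edge between corresponding regions of two
        consecutive frames."""
        self.temporal_edges[(u, v)] = attrs
```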

An STRG is an extension of RAGs obtained by adding temporal edges (ET) to them. ET represents the temporal relationships between corresponding nodes in two consecutive RAGs. The main task in building an STRG is therefore how to construct ET, which is similar to the problem of object tracking in a video sequence. To find the corresponding nodes in two consecutive RAGs, graph isomorphism and maximal common subgraphs were used. These algorithms are conceptually simple, but have a high computational complexity. To address this, a RAG is decomposed into its neighborhood graphs (GN(v)), which are subgraphs of the RAG, as follows.

Definition 3: GN(v) is the neighborhood graph of a given node v in a RAG if, for every node u adjacent to v, GN(v) contains u and the edge eS = (v, u).

Let GN^m and GN^(m+1) be the sets of neighborhood graphs in the mth and (m+1)th frames, respectively. For each node v in the mth frame, the goal is to find the corresponding target node v' in the (m+1)th frame. To decide these corresponding nodes, we use the neighborhood graphs of Definition 3. For each neighborhood graph GN(v) in GN^m, the goal becomes finding the corresponding target graph GN(v') in GN^(m+1), which is an isomorphic or the most similar graph to GN(v). First, we look for a neighborhood graph in GN^(m+1) which is isomorphic to GN(v). Second, if we cannot find any isomorphic graph in GN^(m+1), we find the most similar neighborhood graph to GN(v) using a similarity measure, SG(GN(v), GN(v')), which is defined as follows:

SG(GN(v), GN(v')) = |GC| / min(|GN(v)|, |GN(v')|)    (1)

where |G| denotes the number of nodes of G, and GC is the maximal common subgraph of GN(v) and GN(v'). GC can be computed based on maximal clique detection. For GN(v) in GN^m, GN(v') is the corresponding neighborhood graph in GN^(m+1) whose SG with GN(v) is the largest among the neighborhood graphs in GN^(m+1) and greater than a certain threshold value. In this way, we find all pairs of corresponding neighborhood graphs (and eventually corresponding nodes) from GN^m to GN^(m+1).

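The similarity measure and the matching rule above can be sketched as follows (an illustration with names of our own; |GC| is taken as a precomputed input because maximal clique detection is out of scope here):

```python
def neighborhood_similarity(g1_nodes, g2_nodes, common_subgraph_size):
    """Eq. 1 of the STRG construction:
    S_G = |G_C| / min(|G_N(v)|, |G_N(v')|)."""
    return common_subgraph_size / min(len(g1_nodes), len(g2_nodes))

def best_match(gn_v, candidates, threshold=0.5, common=None):
    """Pick the candidate neighborhood graph in the next frame whose
    S_G with gn_v is the largest and above the threshold.

    `candidates` maps candidate id -> node set; `common` maps
    candidate id -> |G_C| (assumed precomputed)."""
    best_id, best_sg = None, threshold
    for cid, nodes in candidates.items():
        sg = neighborhood_similarity(gn_v, nodes, common[cid])
        if sg > best_sg:
            best_id, best_sg = cid, sg
    return best_id
```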
Object Graph

An STRG is first decomposed into Object Region Graphs (ORGs) to model moving objects. We consider a temporal subgraph, which can be defined as a set of sequential nodes connected to each other by a set of temporal edges (ET). An ORG is the special case of a temporal subgraph of the STRG in which the spatial edge set ES is empty. However, due to the limitations of region segmentation techniques, different color regions belonging to a single object may not be detected as a single region. For instance, a person's body may consist of several regions such as the head, upper body and lower body. Figure 2(a) shows an object that is segmented into four regions over three frames. Since there are four regions in each frame, we build four ORGs, i.e. (v1, v5, v9), (v2, v6, v10), (v3, v7, v11), and (v4, v8, v12), as in Figure 2(b). Since they belong to a single object, it is better to merge those ORGs into one.


For convenience, we refer to the merged ORGs as an Object Graph (OG). In order to merge two ORGs which belong to a single object, we consider the attributes (i.e. velocity and moving direction) of the temporal edges (ET). If two ORGs have the same moving direction and the same velocity, they can be merged into one. In Figure 2(c), the four ORGs are merged into a single OG, i.e. (v2, v6, v10). After the OGs are extracted, the remainder of the STRG represents the background information of the video. We call this graph a Background Graph (BG), and it is used in indexing.

STRG Indexing

In this section, the paper proposed a graph-based video indexing method, called the Spatio-Temporal Region Graph Index (STRG-Index), which uses the metric Extended Graph Edit Distance (EGEDM) as a distance measure and clusters of OGs.
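The ORG merge rule described above (merge ORGs sharing the same velocity and moving direction) can be sketched as follows; the tuple encoding of an ORG is a hypothetical simplification of ours:

```python
def merge_orgs(orgs):
    """Group Object Region Graphs whose temporal edges share the same
    velocity and moving direction; each group becomes one Object Graph.

    Each ORG is represented as (node_sequence, velocity, direction)."""
    groups = {}
    for nodes, velocity, direction in orgs:
        groups.setdefault((velocity, direction), []).append(nodes)
    # each group of ORGs with identical motion is one merged OG
    return [sorted(g) for g in groups.values()]
```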
The Extended Graph Edit Distance (EGED) between two object graphs OGm and OGn is defined by an edit-cost formula (the equation is given in [5]).

In order to satisfy the triangle inequality, the EGED is specialized into a metric distance function (see Theorem 1) by comparing the current value with a fixed constant.

Theorem 1: If gi is a fixed constant, then EGED is a metric.

STRG-Index Tree Structure

To build an index for video data, we adapt the tree construction procedure proposed for the M-tree [5.4], since it requires a minimal number of distance computations and has good I/O performance. In the M-tree, a number of representative data items are selected for efficient indexing. There are several ways to select them, such as sampling or random selection. In the STRG-Index, we employ the clustering results to determine the representative data items. The STRG-Index tree structure consists of three levels of nodes, the shot node, the cluster nodes, and the object nodes, as seen in Figure 3.

The top-level has the shot node which contains the information of each shot in a video. Each record in the shot node represents a segmented shot whose frames share a background. The record has a shot identifier (ShotID), a key RAG (Grkey), an actual BG (BGr), and an associated pointer (ptr) which references the top of corresponding cluster node. The following figure shows an example of a record in the shot node.

The mid-level has the cluster nodes which contain the centroid OGs that represent cluster centroids. Each record indicates a representative OG among a group of similar OGs. A record contains its identifier (ClusID), a centroid OG (OGc) of each cluster, and an associated pointer (ptr), which references the top of corresponding object node. The following figure shows an example of a record in a cluster node.

The low level has the object nodes, which contain the OGs belonging to the same cluster. Each record in an object node represents an object in the video, and has the index key (which is computed as EGEDM(OGm, OGc)), an actual OG (OGm), and an associated pointer (ptr) which references the actual video clip on disk. The following figure shows an example of a record in the object node.
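The object-node keying scheme can be sketched as follows (illustrative names; `eged_m` stands in for the metric EGEDM, which is assumed to be supplied):

```python
def index_key(og, centroid_og, eged_m):
    """Index key of an OG in an object node: its metric distance
    EGED_M to the cluster's centroid OG (a valid key because
    Theorem 1 makes EGED_M a metric)."""
    return eged_m(og, centroid_og)

def insert_og(object_node, og, centroid_og, eged_m, ptr):
    """Insert a record (key, OG, pointer) into an object node,
    kept sorted by key in the style of an M-tree leaf."""
    record = (index_key(og, centroid_og, eged_m), og, ptr)
    object_node.append(record)
    object_node.sort(key=lambda r: r[0])
    return object_node
```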

STRG-Index Tree Construction

Based on the STRG decomposition described above, an input video is separated into foreground (OG) and background (BG) as subgraphs of the STRG. The extracted BGs are stored at the root node without any parent. All the OGs sharing one BG are in a same cluster node. This can reduce the size of index significantly. For example, in surveillance videos a camera is stationary so that the background is usually fixed. Therefore, only one record (BG) in the shot node is sufficient to index the background of the entire video. We synthesize a centroid OG (OGc) for each cluster which is a representative OG for the cluster. This centroid OG is inserted into an appropriate cluster node as a record. This centroid OG is updated as the member OGs are changed such as inserting, deleting, etc. Also, each record in a cluster node has a pointer to an object node. The object node has actual OGs in a cluster, which are indexed by EGEDM. To decide an indexing value for each OG, we compute EGEDM between the representative OG (OGc) in the corresponding cluster

and the OG (OGm) to be indexed. Since EGEDM is a metric distance by Theorem 1, this value can serve as the key of the OG to be indexed.

IV. Moving Object Detection

Moving object detection is very important in intelligent surveillance. The main detection algorithms include the frame difference method, background subtraction, optical flow, and statistical learning. Optical flow is the most complex of these and takes more time than the other methods, while statistical learning needs many training samples and is also computationally expensive; these two methods are therefore not suitable for real-time processing. Background subtraction is extremely sensitive to changes in lighting. The frame difference method is simple and easy to implement, but its results are not accurate enough, because changes in background brightness cause misjudgment [6.1, 6.2, 6.3, 6.4]. Since the human eye is sensitive to both movement and edges, a recent work [6] presents an efficient algorithm for moving object detection based on frame difference and edge detection. Figure 1 gives the flow chart of the frame difference method.

Figure 1 The flow chart of frame difference method
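The frame difference step above admits a very small sketch; the function name and threshold value here are illustrative, not from [6].

```python
import numpy as np

def frame_difference(prev_frame, curr_frame, threshold=25):
    """Basic frame-difference detector: pixels whose grey-level change
    exceeds `threshold` are marked as moving (1), the rest as static (0)."""
    diff = np.abs(curr_frame.astype(np.int16) - prev_frame.astype(np.int16))
    return (diff > threshold).astype(np.uint8)

# Toy 4x4 greyscale frames: a bright 2x2 object appears in the second frame.
prev = np.zeros((4, 4), dtype=np.uint8)
curr = prev.copy()
curr[1:3, 1:3] = 200
mask = frame_difference(prev, curr)
print(mask.sum())   # -> 4 moving pixels
```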

The flow chart of the detection process by the moving edge method is shown in Figure 2.

Figure 2 The flow chart of moving edge method

The flow chart of the detection process using the method based on frame difference and edge detection presented in [6] is shown in Figure 3.


Figure 3 The flow chart of the improved algorithm

Further, object segmentation is performed to divide the image into a moving area and a static area. After separating the moving objects from the background, we need to locate each object to obtain its exact position. The common approach is to compute the connected components in the binary image, delete those connected components whose area is too small, and take the circumscribing rectangle of each remaining object.
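A minimal sketch of this locating step, assuming a pure-NumPy flood fill; `locate_objects` and its (top, left, bottom, right) box format are illustrative, not from [6].

```python
import numpy as np
from collections import deque

def locate_objects(mask, min_area=3):
    """Label 4-connected components in a binary mask, discard those whose
    area is below `min_area`, and return the circumscribing rectangle
    (top, left, bottom, right) of each remaining component."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    boxes = []
    for sy in range(h):
        for sx in range(w):
            if mask[sy, sx] and not seen[sy, sx]:
                q, comp = deque([(sy, sx)]), []
                seen[sy, sx] = True
                while q:                        # breadth-first flood fill
                    y, x = q.popleft()
                    comp.append((y, x))
                    for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                if len(comp) >= min_area:       # drop tiny (noise) components
                    ys, xs = zip(*comp)
                    boxes.append((min(ys), min(xs), max(ys), max(xs)))
    return boxes

mask = np.zeros((6, 6), dtype=np.uint8)
mask[1:4, 1:4] = 1            # 9-pixel object (kept)
mask[5, 5] = 1                # 1-pixel noise (dropped)
print(locate_objects(mask))   # -> [(1, 1, 3, 3)]
```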

V. Motion Picture Storage with Compression [8]

Animation of human-like virtual characters has potential applications in the design of human-computer interfaces, computer games, and the modeling of virtual environments on power-constrained devices such as laptop computers in battery mode, pocket PCs, and PDAs. Distributed virtual human animation is used in many applications that depict human models interacting with networked virtual environments. The two major issues involved in streaming MoCap (human motion capture) animation data to mobile devices are 1) the limited bandwidth available for streaming the MoCap data, and 2) the limited power available to receive, decompress, and render the compressed MoCap data. It is desirable to have a compression method which reduces the network bandwidth enough to allow streaming to mobile devices, and which also requires less computation, and hence less power consumption, on the client side to reconstruct the motion data from the compressed stream. In order to standardize virtual human animation, MPEG-4 has proposed the H-Anim standards for the representation of virtual humans and the format of the corresponding motion capture (MoCap) data used for rendering and animating them [8.1], [8.2], [8.3]. A recent compression algorithm for MoCap data (or, equivalently, MPEG-4 Body Animation Parameters (BAP) data), termed BAP-Indexing [8.4], uses indexing techniques for compression of BAP data, resulting in a significant reduction in the power consumption required

for decompression. BAP-Indexing exploits the structural hierarchy of the virtual human to achieve efficient compression, which, though lossy, results in reconstructed motion of good quality.

Fig. 1. The standard compression and decompression pipeline for MPEG-4 Motion Capture (MoCap) data or Body Animation Parameter (BAP) data.

Matrix Representation Of MoCap Data

The MoCap data (or, equivalently, MPEG-4 BAP data) is represented by an n x m-dimensional matrix X, where n is a multiple of the video sampling rate or frame rate expressed in frames per second (fps) and m is the number of degrees of freedom of the virtual human (the maximum value of m is 296, as defined in the MPEG-4 standard). Each row of the matrix represents a pose of the virtual human for a small time step. Each column of the matrix corresponds to either the displacement of the model from a fixed origin or the Euler angle of rotation needed to achieve the desired pose. The authors use a 62-dimensional virtual human with a frame rate of 33 fps; for a 10-second motion sequence, the motion matrix X is therefore a 330 x 62 array of floating point numbers. The first three columns of X represent the absolute displacement of the virtual human with respect to a fixed origin in the 3D virtual world. The next three columns represent the absolute orientation of the virtual human with respect to the virtual world coordinates. The remaining 56 columns correspond to the angles of the degrees of freedom associated with the various joints of the skeletal virtual human model. As a first step in the compression process, the matrix X is equivalently represented by an (n-1) x m difference matrix d and the initial pose vector I, where

I_j = X_{1j},  j = 1, 2, ..., m
d_{ij} = X_{i+1,j} - X_{ij},  i = 1, ..., n-1;  j = 1, ..., m,

so that I is the first row of X and the rows of d are the differences between successive rows of X.

The difference matrix d, subsequently termed the motion matrix, can be interpreted as successive small angular increments (floating point numbers) needed by the virtual human,

for each of its degrees of freedom, in order to realize the desired animation. Without loss of generality, we assume that d has n rows.
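The difference-matrix representation can be checked in a few lines; the toy matrix X here is illustrative, and the round trip is lossless.

```python
import numpy as np

# Build the difference matrix d and initial pose vector I from a motion
# matrix X, then reconstruct X from (I, d).
X = np.array([[10.0, 0.0],
              [11.0, 0.5],
              [13.0, 1.5]])        # 3 frames x 2 degrees of freedom

I = X[0]                           # initial pose vector: first row of X
d = np.diff(X, axis=0)             # d[i] = X[i+1] - X[i]

X_rec = np.vstack([I, I + np.cumsum(d, axis=0)])
print(np.allclose(X_rec, X))       # -> True
```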

BAP-Indexing: Indexing Of The BAP Data

For approximately periodic and regular motions such as walking, jogging, and running, the collection of all n x m floating point numbers within the corresponding motion matrix d exhibits a tendency to form a finite number of well-separated clusters. Taking a cue from this observation, the n x m floating point numbers in d are assigned to a finite number of buckets. Each bucket, in turn, is associated with a representative number which best describes the collection of numbers within it. The basic idea underlying the indexing technique is to index some (perhaps all) of the numbers within the original motion matrix d and generate a corresponding lookup table for the indices.
Indexing the Motion Matrix d

Step 1: All the data in matrix d are collected into a single 1D array A of size n x m. The array A is sorted in ascending order, and all the numbers in A are multiplied by the resolution quantization term (RQT), M. The RQT depends on the number of significant digits used to represent the floating point numbers; for example, if the required accuracy is a maximum of four digits, RQT = 10,000. The numbers are then rounded off to integers in the range [Amin . M, Amax . M].

Step 2: The integers in the range [Amin . M, Amax . M] are divided into buckets numbered from 0 to 255. It is desirable to allocate each of the 256 buckets an equal share of the n x m numbers in A. The rationale behind this equal-share assignment is that BAP data clusters that contain more data points are assigned more buckets (hence, more indices); in essence, the indices are distributed among the clusters in proportion to cluster size (note that the number of indices is fixed by fixing the number of bits per index). This scheme is similar to adaptive vector quantization, which is known to reduce the overall encoding error. Thus, each bucket should have freq = (n x m)/256 numbers allocated to it. This is achieved by computing the histogram of the integers in A and dividing the histogram into 256 vertical strips such that each strip has the same area, freq. After all the numbers in A have been allocated a bucket numbered from 0 to 255, the numbers in A are divided by the RQT to recover the original values. At the end of Step 2, we have a set of 256 buckets, denoted bucket(j) for j = 0 to 255, such that each floating point entry in the motion data matrix d is contained in exactly one of them. An index matrix dindex stores the bucket number (index) for the corresponding entry in d.
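One simple way to realize the equal-population buckets of Step 2 is rank-based assignment over the sorted array, sketched below; this is equivalent in spirit to the paper's equal-area histogram strips, and the function name is illustrative.

```python
import numpy as np

def index_motion_matrix(d, n_buckets=256, rqt=10_000):
    """Equal-frequency bucketing of the motion matrix d: quantize to RQT
    precision, then give each of `n_buckets` buckets an (almost) equal
    share of the sorted values."""
    a = np.round(d.ravel() * rqt).astype(np.int64)   # quantized values
    order = np.argsort(a, kind="stable")             # ascending ranks
    idx = np.empty(a.size, dtype=np.int64)
    # value of rank r goes to bucket floor(r * n_buckets / size)
    idx[order] = (np.arange(a.size) * n_buckets) // a.size
    return idx.reshape(d.shape)

d = np.array([[0.0001, 0.0002],
              [0.5,    0.9   ]])
print(index_motion_matrix(d, n_buckets=2))   # small values -> 0, large -> 1
```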
Lookup Table for the Index Matrix dindex

The lookup table is used to map each index to a corresponding representative number such that a suitable approximation to the original motion matrix d can be recovered. The creation of an appropriate lookup table for the recovery of the original motion data matrix d from the index matrix dindex is critical, since recovery of the original data after discretization invariably results in motion distortion. A straightforward method to recover the number associated with a bucket is to compute the simple average of all the floating point numbers assigned to it; however, this invariably leads to a poor approximation of the original motion matrix d. The authors observe that intelligent exploitation of the hierarchical structure of the skeletal virtual human model leads to a better construction of the lookup table Tlookup, which in turn reduces the error in the motion reconstructed using the lookup table. The steps for creating the lookup table are as follows:

Step 1: The virtual human is represented by a hierarchical skeletal model. For each m-dimensional pose vector, each dimension, or column in the motion matrix d, is assigned a level li (Fig. 2). The level li signifies the importance of the degree of freedom associated with a particular joint in the overall displacement of the model joints. A joint i at level li = 1, when given a small angular displacement, affects the model more in terms of overall displacement than a joint j at level lj = 2, 3, 4, 5, or 6.

Step 2: After assigning level values to the various joints of the virtual human model, these joint level values are used to compute a weighted sum of the numbers belonging to a bucket. The jth lookup value in the lookup table Tlookup is given by:

where α (a constant) controls the weighting. Empirical observations have revealed that as α increases, the Tlookup values give a better approximation to the data, resulting in reduced displacement error. This is because the numbers associated with level = 1 affect the displacements of the body the most; emphasizing the numbers within a bucket with level = 1 therefore leads to a better approximation of the motion data. As α → ∞, all the weighting terms in (1) tend to zero except the terms with level = 1. Hence, when computing the weighted sum of the numbers in a bucket, only those numbers with level = 1 are considered (selective averaging), and a simple mean of these numbers is computed. If none of the entries in a bucket has level = 1, the next smallest level is used to compute the weighted sum. Empirical observations show that the BAP data values from all levels of the virtual human model form compact and well-separated clusters, and the data values with level = 1 in each bucket are fairly close to each other. This allows selective averaging (1) to be performed without introducing too much visual distortion.
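Selective averaging in the large-α limit can be sketched as follows; the bucket/level layout is a toy assumption, and the function name is illustrative.

```python
import numpy as np

def build_lookup(values, levels, n_buckets, bucket_of):
    """Selective averaging: each lookup entry is the mean of the bucket's
    values at the smallest joint level present in that bucket (level 1
    preferred, else the next smallest level)."""
    table = np.zeros(n_buckets)
    for b in range(n_buckets):
        in_b = (bucket_of == b)
        if not in_b.any():
            continue
        lmin = levels[in_b].min()                     # smallest level present
        table[b] = values[in_b & (levels == lmin)].mean()
    return table

values  = np.array([1.0, 3.0, 10.0, 20.0])   # motion matrix entries
levels  = np.array([1,   2,   2,    2])      # joint level of each entry
buckets = np.array([0,   0,   1,    1])      # bucket assignment of each entry
print(build_lookup(values, levels, 2, buckets))   # -> [ 1. 15.]
```

Bucket 0 contains a level-1 entry, so only that entry is averaged; bucket 1 has no level-1 entry, so the level-2 entries are averaged instead.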


Fig. 2. An example of the hierarchical structure of a virtual human skeletal model consisting of 31 nodes, with a total of 62 degrees of freedom of motion (rotational and translational). For convenience, the root node is drawn at the bottom.

The above-mentioned paper further provides a motion matrix decomposition for motion sequences of long duration.

VI. Other Techniques for Motion Pictures

Besides MPEG-4, there exist other ad hoc quantization methods for the efficient use and distribution of MoCap data over a network. Endo et al. [8.5] propose quantization of the motion type rather than of the motion data itself. Hijiri et al. [8.6] describe a new data packet format which allows flexible scalability of the transmission rate, and a data compression method, termed SHCM, which maximizes the efficacy of this format by exploiting the 3D scene structure. BAP-Indexing uses quantization to achieve data compression in a manner somewhat similar to the above work, but incorporates intelligent exploitation of the hierarchical structure of the human skeletal model. Giacomo et al. [8.7] present methods for adapting a virtual human's representation and the resulting animation stream, and provide practical details for the integration of these methods into MPEG-4 and MPEG-21 architectures. Aubel et al. [8.8] present a technique for using impostors to improve the display rate of animated characters by acting solely on the geometric and rendering information. Recently, Arikan [8.9] has presented a comprehensive MoCap database compression scheme which is shown to yield a significantly compressed MoCap database. The above techniques, although very efficient in terms of compression ratio, do not address the need for customized compression of BAP data for power-aware devices. To this effect, BAP-Indexing is a refined and special case of standard clustering, quantization, and lookup (CQL) based compression schemes. It not only allows low-bitrate encoding of motion data, but is also suitable for data reception and reconstruction on power-constrained devices.

Clues for future work on motion pictures: A drawback of most animation research is that there is no perfect quantitative measure of the quality of the reconstructed motion. The compression error (the displacement of a body segment from its original location) is easily perceptible when the body segment touches an environment object, whereas a relatively large error is acceptable if the body segment is moving in empty space. This observation can be exploited to enhance the compression ratio, provided that detailed models of the environment and of the interaction of the virtual human with the environment are available. Finally, the intelligent use of the hierarchical structure of the model yields good results for full-body motions of the virtual human; for small delicate motions such as finger movement, or for facial animation, the technique offers considerable scope for future improvement.

VII. Modeling and Refinement of Audio Data

The management of large collections of music data in a multimedia database has received much attention in the past few years. Owing to several inherent characteristics of audio data, there are demands for huge storage space, large bandwidth and real-time requirements for transmission, content-based queries, similarity-based search and retrieval, and synchronization of retrieval results. Of interest to the user are easy-to-use queries with fast and correct retrievals from the audio/multimedia database. To this end, (1) the derivation of good features from the data to be used as indices during search, (2) the organization of these indices in a suitable multi-dimensional data structure with efficient search, and (3) a good measure of similarity (distance measure) are important factors. An audio database supporting content-based retrieval should have its indices structured with respect to the audio features extracted from the data.
In research on content-based music retrieval, many approaches extract features such as key melodies, rhythms, and chords from the music objects and develop indices that help to retrieve the relevant music efficiently [9.5], [9.8], [9.12]. Several reports have also pointed out that these features of music can be transformed and represented as music feature strings [9.1], [9.2], [9.4], [9.6], [9.7] or numeric values [9.10], [9.11] such that indices can be created for music retrieval. These features can also be combined to support various types of queries.

Existing Multi-feature Indexing for Music Data

In research on indexing for music database retrieval, most existing work has concentrated on constructing single-feature index structures for query searching: for instance, in 1999, key melody extraction and n-note indexing by Tseng, melodic matching techniques by Uitdenbogerd et al., and an approximate string matching algorithm by

Liu et al. [9.9]; in 2000, query by music segments by Chen et al. [9.2]; and in 2002, numeric indexing by Lo et al. [9.10]. Only a couple of studies have emphasized how to create a multi-feature index for music data retrieval. The most recent works are the Multi-Feature Index Structures [9.6] and Multi-feature Numeric Indexing [9.11]. We briefly discuss these two approaches in the following subsections.

i. Grid-Twin Suffix Trees

Four multi-feature index structures for music data retrieval were proposed by Lee and Chen [9.6]: Combined Suffix Trees, Independent Suffix Trees, Twin Suffix Trees, and Grid-Twin Suffix Trees. This research claimed that the Grid-Twin Suffix Trees structure provides the most scalability among them. The Grid-Twin Suffix Trees are an improved version of the Twin Suffix Trees. An example of Twin Suffix Trees is shown in Figure 1. There can be two music features in the Twin Suffix Trees; each feature has its own index structure of an independent suffix tree, and there are links between them pointing from each node in one independent suffix tree to the corresponding feature nodes in the other.

Figure 1. Construction of the Twin Suffix Trees

Figure 2. An example of the Grid-Twin Suffix Trees

To construct the Grid-Twin Suffix Trees, a hash function first maps each suffix of the feature string into a specific bucket of a 2-dimensional grid, using the first n symbols of the suffix. After hashing all suffixes, the remaining symbols of the feature string following each suffix are used to construct the Twin Suffix Trees attached under the buckets. Figure 2 shows an overview of the structure of the Grid-Twin Suffix Trees. Considering melody and rhythm only, the hash function is as follows,


where x and y are the row and column coordinates, respectively, and P(x, y) denotes the position of the bucket. Numm and Numr are the alphabet sizes of the melody and rhythm features, and Mi and Ri are the values of the ith symbols of the melody and rhythm, respectively. The length of the suffix is denoted by n.

ii. Multi-Feature Numeric Indexing

The Multi-Feature Numeric Index for music data retrieval was proposed by Lo and Chen [9.11]. To translate music data into numeric values, they assume that the music symbols a, b, c, ..., m map to the integer values 0, 1, 2, ..., m-1, respectively. If we pick a music segment of n sequential notes from a melody feature string, denoted x1x2...xn, the integer value of each note can be represented by P(xi), 1 <= i <= n. This segment of n sequential notes can then be transformed into a numeric value by the conversion function v(n), as shown below.
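Equation (2) itself is not reproduced in this extract; one plausible reading of such a conversion function v(n) is a base-m positional encoding, sketched below. The function name and alphabet are illustrative assumptions, not taken from [9.11].

```python
def note_value(segment, alphabet="abcdefghijklm"):
    """Hypothetical conversion function: treat the n sequential notes as
    digits of a base-m number, with P('a') = 0, ..., P('m') = m - 1."""
    m = len(alphabet)
    value = 0
    for note in segment:
        value = value * m + alphabet.index(note)   # shift by one base-m digit
    return value

# 'c' 'a' 'b' -> 2*13**2 + 0*13 + 1
print(note_value("cab"))   # -> 339
```

Each distinct n-note segment maps to a distinct integer, which can then serve as one coordinate of a point to insert into a multi-dimensional index tree.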

Each music feature segment can be converted into a numeric value by equation (2), and the values for the different feature segments can be viewed as the coordinates of a point in a multi-dimensional space. The coordinate can thus be inserted into a multi-dimensional index tree, such as an R-tree [9.3], for music retrieval, and the scheme can be extended to convert three or more features into a high-dimensional index tree. Although the authors claimed that the Grid-Twin Suffix Trees provide more scalability than the other three index structures in [9.6], if there are more features, or if more symbols of the suffixes (n > 2) are used to map into the buckets, a massive amount of memory is needed for the Grid-Twin Suffix Trees to construct the buckets of the grid structure, and a sparse matrix may occur in the grid. In addition, since the numeric index is created by transforming a fixed length n (in equation (2)) of a music segment into a numeric value, the main drawback of the Multi-Feature Numeric Index is that the length of a query (Query By Example, QBE) is inflexible: it should equal the length of the music segments over which the index was created, otherwise the search time for the query increases several-fold.

iii. Hybrid Multi-Feature Indexing

In [9], a hybrid multi-feature indexing scheme has been proposed. It takes advantage of both the Multi-Feature Numeric Index and the Grid-Twin Suffix Trees to construct a new index structure that needs less memory than the Grid-Twin

Suffix Trees and, unlike the Multi-Feature Numeric Index, is free of any query-length restriction. To construct the Hybrid Multi-Feature Index, a multi-feature tree structure is used instead of the grid structure of the Grid-Twin Suffix Trees. The Twin Suffix Trees originally under each bucket are now linked under the corresponding leaf node of the multi-feature tree in the Hybrid Multi-Feature Index. The work organizes the creation of the index in the following three steps:

Step 1: Suppose that there are d features in the music data and that, in each music feature string, the first n symbols of the suffix are transformed into a coordinate. Equation (3) defines the d-feature coordinate P(x1, ..., xd) as follows,

where F1(i), ..., Fd(i) and N1, ..., Nd represent the values and the alphabet sizes, respectively, of the d music features. Note that a suffix within any music segment, such as a1 or a1b2, has exactly one corresponding coordinate.

Step 2: The coordinate derived from Step 1 is inserted into a d-feature (d-dimensional) tree. The degree of each non-leaf node in this d-feature tree is 2^d, and each non-leaf node has a center point. The coordinate (x1c, x2c, ..., xdc) of the center point is computed by averaging the coordinates inserted under the current node and its descendant nodes. Thus, if there are 2 features and the center point is (x1c, x2c), the node is partitioned into four domains: (>= x1c, >= x2c), (< x1c, >= x2c), (>= x1c, < x2c), and (< x1c, < x2c). To keep the index tree balanced like an R-tree, each non-leaf node in the d-feature tree contains at least 2^(d-1) non-null links (half full). Therefore, inserting a new coordinate into a node may cause the center point to be recomputed or the index tree to be reorganized.

Step 3: The remaining symbols after the first n symbols of the suffix are then used to construct the Twin Suffix Trees linked under the d-feature tree. Figure 3 and Figure 4 show the structures of the Hybrid 2-Feature Index and the Hybrid 3-Feature Index, respectively.
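The 2^d-way domain selection of Step 2 can be sketched as one bit per dimension; the function name is illustrative.

```python
def child_index(coord, center):
    """Select which of the 2^d subtrees a d-feature coordinate belongs to:
    bit i is set when coord[i] >= center[i], giving the four domains
    (>=,>=), (<,>=), (>=,<), (<,<) for d = 2."""
    idx = 0
    for c, ctr in zip(coord, center):
        idx = (idx << 1) | (1 if c >= ctr else 0)
    return idx

center = (5.0, 5.0)
print([child_index(p, center) for p in [(6, 6), (2, 6), (6, 2), (2, 2)]])
# -> [3, 1, 2, 0]: one distinct child per domain
```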

Figure 3. The structure of Hybrid 2-Feature Index

Figure 4. The structure of Hybrid 3-Feature Index


Use of Stochastic (Statistical) Analysis in Image Processing


I. Image Compression

A proprietary method of image compression based on this technology has been developed [10]; it intelligently stores a version of the imagery and recovers it by means of Stochastic Matrix Method (SMM) function recovery. Figure 5 illustrates this.

Figure 5. Closeup look at the face of a bird. Top: input image; Bottom: interpolated image. The input image is what is stored and the interpolated image is what is viewed by a user.

The input image is very coarse and certain features are hard to discern, but the interpolated image recovers much of this content and makes it intelligible to the human eye. Figure 6 illustrates the advantages enjoyed over the ubiquitous JPEG DCT coder. On close inspection, the JPEG DCT coder reveals its 8 x 8 pixel blocks in its characteristic artifact, which becomes a nuisance at significant compression levels. It is clear from the figure that function recovery by means of SMMs does not suffer from this problem.

Figure 6. This is a closeup look at the back of the head of a bird in order to illustrate the blockiness of JPEG compression and the lack of it with the compression scheme by means of SMM function recovery. Left: JPEG DCT image, the 8 x 8 blocks are apparent. Centre: image interpolated from Right by means of SMM function recovery. Right: image that is actually stored and which is the input image for the function recovery.


II. Moving Object Detection [11]

A common approach to detecting foreground objects is to collect the pixels in the current frame that deviate significantly from the model estimates. Such methods can generally be classified as predictive or non-predictive. Predictive methods develop dynamical time-series models to recover the current input based on past observations. The Kalman filter was first introduced by Koller et al. [11.1] for modeling the dynamic states of background pixels. The optical-flow-based method is a natural approach to modeling persistent motion behavior: Wixson [11.2] presented a method to detect salient motion by accumulating directionally consistent flow, and Tian [11.3] combined temporal difference imaging and a temporally filtered motion field to detect salient motion in complex environments. Recent methods are based on more complicated models; in [11.4], an autoregressive model was proposed to capture the properties of dynamic scenes. Non-predictive density-based methods neglect the order of observations and build a probabilistic representation (PDF) of the observations at a particular pixel. Wren [11.5] used a single Gaussian intensity distribution for each pixel. The idea was subsequently extended to the mixture of Gaussians model (MGM) proposed by Stauffer and Grimson [11.6] to address the multi-modality of the background. When density functions are so complex that they cannot be modeled parametrically, the non-parametric approaches proposed by Elgammal [11.7] are considered more suitable for handling arbitrary densities; there, kernel density functions [11.8] are used for pixel-wise background modeling. However, this is computationally costly and makes no explicit use of the spatial correlation of the pixel features. In a work by Tang, Gao, and Liu [11], a real-time moving object detection algorithm is proposed that recursively clusters salient motion points into a spatial and kinetic mixture of Gaussians model.
In each frame, temporal difference filtering first generates a set of feature points; then validation and salience evaluations are performed for every feature point, preceded by resampling operations, so as to preserve only those samples that strongly support a salient moving-object cluster in the feature space. The clusters are instantiated and updated using an online approximation algorithm, and are terminated when their component weights drop below a threshold.
Brief overview of the model:

Model Specification

A four-dimensional feature vector is taken to describe the state of each sample, i.e.,

z_i = (x, y, ẋ, ẏ)_i,  i ∈ [1, N],  z_i ∈ R^4,

where (x, y)_i represents the point's coordinates, (ẋ, ẏ)_i denotes its motion speed, and N is the number of samples. For simplicity, let s_i = (x, y)_i and v_i = (ẋ, ẏ)_i describe the spatial and motion information

respectively. Assuming we have the initial mixture distribution, feature points can be associated with one of the K clusters (Fig. 3). The likelihood of a feature point belonging to the foreground can be written as:

p(z_i) = Σ_{k=1}^{K} q_k η(z_i; μ_k, Σ_k)   (1)

where q_k is the prior of the kth Gaussian component in the mixture model, and η(z_i; μ_k, Σ_k) is the kth Gaussian component, defined as:

η(z_i; μ_k, Σ_k) = (2π)^{-d/2} |Σ_k|^{-1/2} exp( -(1/2) (z_i - μ_k)^T Σ_k^{-1} (z_i - μ_k) )   (2)

where d = 4 is the dimension of the MGM model. We further assume that the spatial and kinetic components of the MGM model are decoupled, i.e., the covariance matrix of each Gaussian component takes the block-diagonal form:

Σ_k = diag(Σ_k^s, Σ_k^v)   (3)

where s and v stand for the spatial and kinetic features, respectively. With this decomposition, each Gaussian component has the factorized form:

η(z_i; μ_k, Σ_k) = η(s_i; μ_k^s, Σ_k^s) η(v_i; μ_k^v, Σ_k^v)   (4)

Based on the above representation of moving objects, clustering analysis is performed using a Gaussian-distribution-based K-means technique over the sample data. To address the selection and estimation of features, several steps are performed: a motion map is first obtained by a temporal difference of Gaussians from each frame, from which a number of feature points are extracted using Monte Carlo importance sampling; their associated velocities in the sequence are calculated using the LK optical flow algorithm. Feature points are thus extracted in position-velocity space. Temporal difference imaging helps to detect slowly moving objects, gives better object boundaries, and speeds up the algorithm, because the temporal filter of optical flow is applied only to the regions of change detected by temporal difference imaging. Within such a region, the motion of a pixel is salient if the pixel and its neighborhood move in the same direction over a period of time.
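The factorized likelihood of equations (1), (3), and (4) can be sketched as follows, for a single-component toy mixture; all names here are illustrative.

```python
import numpy as np

def gaussian(x, mu, cov):
    """Multivariate normal density eta(x; mu, cov)."""
    d = len(mu)
    diff = x - mu
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(cov))
    return np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff) / norm

def mixture_likelihood(z, priors, means, covs_s, covs_v):
    """Likelihood of a 4D sample z = (s, v): the block-diagonal covariance
    lets each component factorize into a spatial and a kinetic Gaussian."""
    s, v = z[:2], z[2:]
    return sum(q * gaussian(s, mu[:2], cs) * gaussian(v, mu[2:], cv)
               for q, mu, cs, cv in zip(priors, means, covs_s, covs_v))

# Single component, sample exactly at the component mean: likelihood is
# 1/(2*pi)^2, the product of two 2D unit-variance Gaussian peaks.
z = np.array([0.0, 0.0, 1.0, 1.0])
lik = mixture_likelihood(z, [1.0], [np.array([0.0, 0.0, 1.0, 1.0])],
                         [np.eye(2)], [np.eye(2)])
print(round(lik, 6))   # -> 0.02533
```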


Figure 1. Foreground (light gray) and background (dark gray) pixel colour distributions.

Figure 2. Example of clustering motion vectors extracted from two reversing cars using SKMGM. (a) represents the sample points and (b) is an instance of the SKMGM distribution.

Figure 3. Example of the MGM distributions in position and velocity space, respectively, from Fig. 2. (a) specifies the spatial distribution, and (b) depicts the velocity-space distributions.

Even though many background models have been proposed in the literature, the problem of moving object detection in complex environments is still far from completely solved. The above-mentioned techniques are important for object detection and tracking in video surveillance and similar applications.

Indexing: An explicit discussion


Since the relative proportion of multimedia (video, image, and audio) data within databases is expected to increase substantially in the future, keyword-based indexing will be inadequate, and efficient content-based query and retrieval are required. The problem of devising content-based query, indexing, and retrieval for these newer data types remains an open and challenging one. Apart from the techniques discussed above, in particular for multi-feature music representation and retrieval and for the graph-based model of video data, we find the following approaches in the literature and in practice for other data types such as audio, and for multimedia in general.

I. Content-Based Indexing & Retrieval

Content-based retrieval from a multimedia database calls for content-based indexing techniques. Unlike conventional databases, where data items are represented by a set of attributes of elementary data types, multimedia objects in multimedia databases are

represented by a collection of features; the similarity of object contents depends on context and frame of reference; and the features of objects are characterized by multimodal feature measures. These properties pose great challenges for content-based indexing. There are also special requirements on content-based indexing: to support visual browsing, similarity retrieval, and fuzzy retrieval, the nodes of the index should represent meaningful categories. Indexes are crucial for large databases to speed up retrieval. On the other hand, visual, fuzzy, and similarity queries in large content-based databases cannot be implemented using conventional indexing techniques such as B-trees and inverted files, which have proved very effective in traditional databases for indexing attributes and text. This is because the feature measures of object contents are complex and usually multidimensional and multimodal, whereas conventional indexing techniques are based on individual keys, which are definite and not visual. To handle complex feature measures, there has been research extending the concept of indexing using abstraction and classification [12.8], [12.9], [12.10], [12.20]. To handle multimodal feature measures and to gain self-organization and learning capabilities in indexing, Jian-Kang Wu [12] developed the Content-based Indexing (ContIndex) method for indexing multimedia objects by the feature measures of their contents.

Content-Based Retrieval

For completeness of the discussion, let us start from the multimedia object definition in [5], as follows: a Multimedia Object (MOB) can be defined as a six-tuple Omob = {U, F, M, A, Op, S}, where:

U is the multimedia data component.

F = {F1, F2, ...} represents a set of features derived from the data. A feature Fi can be numerically characterized by feature measures in the feature spaces

either

i i i i F1 F2 F3 .... Fn

or conceptually described by a set of concepts.

Mj = { M1j, M2j, . . .} represents the interpretation of features Fi, i = 1, 2, ... A stands for a set of attributes or particulars of Omob. Op is a set of pointers or links, and is expressed as,

O p = {O p sup , O p subO p other}


are three type of pointers/links pointing/linking to superobjects, subobjects, and other objects, respectively.

S represents set of states of Omob.


The content of a multimedia object is the content of its data set U, restricted to a certain set of features Fi, i = 1, 2, ... of the object, characterized by the feature measure sets Fk^i, k = 1, 2, ..., and further described by the concept sets Mj, j = 1, 2, ... In many cases, feature measures are vectors, written as Fj^i = (x1, x2, ..., xn)^T.
For example, a facial image can be represented by focusing attention on visual features such as the chin, hair, eyes, eyebrows, nose, and mouth. To characterize the eyes, we extract measures such as area and the fitting parameters of a deformed template. These feature measures are vectors and can be considered points in feature spaces. Eyes can also be described by a set of concepts such as big eyes, medium eyes, or small eyes; the concepts big, medium, and small are interpretations of the facial feature eyes. Fig. 1 shows a representation hierarchy for images in content-based image databases. In the image archival phase, a bottom-up process derives from the original image data the feature measures of regions of interest, and interpretations if necessary. This bottom-up process consists of three steps: segmentation, feature extraction, and concept mapping. It performs information abstraction and provides keys for easy access to large image data. In the retrieval phase, the image data are accessed through their feature measures (similarity query) or interpretations (descriptive query), which are considered keys from the database point of view. Content-based retrieval usually does not access the data through the attributes A, or directly through the data component U; instead, it operates on feature measures.

Fig. 1. Image representation hierarchy. To archive images into content-based image database, images are first segmented to identify regions of interest. Feature measures are then extracted from the image data within these regions. Interpretations can be finally generated by mapping of the feature measures into a set of concepts.

Content-based retrieval finds the best matches in a large database for a given query object. The best match is defined in terms of a similarity measure. Since the contents of objects are represented by features, similarity is defined with respect to these features:

Sim(Oq, O) = Σi wi sim(Fi^q, Fi)    (1)

where wi denotes the weight for the ith feature, and sim(Fi^q, Fi) denotes the similarity between the query object and an object in the database with respect to the ith feature. Here the similarity between objects is simply expressed as a linear combination of the measures of their common and distinctive features [12.15].

Content-based indexing aims to create indexes that facilitate fast content-based retrieval of multimedia objects in large databases. The index in traditional databases is quite simple: it operates on attributes of primitive data types such as integer, float, and string. For example, to build a binary index tree on the age of people in a database, the first two branches can be created for age >= 35 and age < 35. Here the operation is simple and the meaning is definite and obvious. The situation becomes much more complex in content-based indexing, which operates on complex feature measures. The challenges for content-based indexing are:

1) The index must be created using all features of an object class, so that visual browsing of the object class is facilitated and similarity retrieval using the similarity measure in (1) can be easily implemented.
2) The context and frame of reference in similarity evaluation suggest that the nodes of the index tree be consistent with respect to that context and frame of reference. For example, if, at one level of the index tree, similarity is evaluated with respect to eye size, the nodes at that level will represent object categories with various eye sizes. This implies that the index tree has properties similar to a classification tree.
3) Multiple multimodal feature measures should be fused properly to generate the index tree so that a valid categorization is possible. Two issues must be addressed here: first, one measure alone is usually not adequate because of the complexity of objects; second, to ensure the specified context and frame of reference, care must be taken in the feature selection process.

The Content-based Index (ContIndex) developed in [12] addresses these difficulties. It shares features with classification trees, and horizontal links among nodes at the same level enhance the flexibility of the index. A special neural-network model, called Learning based on Experiences and Perspectives (LEP), has been developed to create node categories by fusing multimodal feature measures. It gives the index the capability of self-organizing nodes with respect to a certain context and frame of reference. Algorithms have been developed to support multimedia object archival and retrieval using ContIndex.

Assume Ω is a set of multimedia objects and Ω' = {Ω1, Ω2, ..., Ωm} represents a set of m classes into which Ω is to be classified, satisfying: 1) Ωi ≠ ∅ for all i = 1, 2, ..., m; 2) the union of Ω1, ..., Ωm is Ω; 3) Ωi ∩ Ωj = ∅ for i ≠ j. The indexing process consists of recursive application of a mapping Φ = (D, Ω'), where D is a set of parameters defining the mapping, and the classes in Ω' represent the categories of the multimedia object set Ω and are associated with the nodes of the index tree {N1, N2, ..., Nm}. In a ContIndex tree, the number of classes m is kept the same for all intermediate nodes for manipulation efficiency; in this case, the index tree is an m-tree. The mapping Φ is defined by D and Ω'. According to the definition, Ω' is a set of classes representing the goal of the mapping, and D is related to the set of feature measures used for the mapping; when the mapping is defined, D is represented by a set of reference feature vectors. For simplicity, only one feature is used to create each level of the index tree. Fig. 2 shows the first three levels of a ContIndex tree. The features selected for creating these three levels are Fl^0 = F^i, Fl^1 = F^j, Fl^2 = F^k. Nodes are labeled with a number of digits equal to their level number (the root is at level 0). For example, N21 is a node at the second level and the first child of node N2; N21, N22, ... are the children of node N2. They are similar with respect to feature Fl^0 = F^i, inherit the reference feature vectors of feature F^i, and represent categories (21, 22, ...) with respect to feature Fl^1 = F^j. New reference feature vectors are created for them upon the creation of these nodes.


Fig. 2. The structure of the content-based index ContIndex. As indicated in the figure, the features selected for creating these three levels of the index tree are Fl^0 = F^i, Fl^1 = F^j, Fl^2 = F^k. Nodes are labeled with a number of digits equal to their level number (the root is at level 0). For example, N21 is a node at the second level and is the first child of node N2.

A top-down algorithm for the creation of an m-tree ContIndex is summarized as follows:

1) Attach all objects to the root and start the indexing process from the root down to the leaf-node level.
2) For each node at a level: select a feature, partition the multimedia objects into m classes using a set of feature measures, create a node for each class, and generate reference feature vector(s) of the selected feature and an iconic image for each node.
3) Repeat the second step until each node has at most m descendants.
4) Starting from the second level, build horizontal links with respect to features that have already been used at the levels above.

Horizontal zooming is facilitated by horizontal links between nodes at the same level. Consider the nodes at the second level. Nodes at this level under the same parent Np, namely Np1, Np2, ..., Npm, represent categories with respect to feature Fl^1 and fall under the same category with respect to feature Fl^0. Now suppose the user finds Npq preferable with respect to feature Fl^1 and wants to look at the categories of feature Fl^0, which are represented by the nodes N1q, N2q, ..., Nmq. To achieve this, we simply create horizontal links among these nodes.
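The top-down m-tree creation above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes one scalar feature per level, and a trivial sort-and-split partitioner stands in for the LEP neural network; the names `build_contindex` and `partition` are hypothetical.

```python
# Sketch of top-down m-tree creation: one feature per level, m classes per node.
# A simple sort-based split stands in for the LEP neural-network categorizer.

def partition(objects, feature, m):
    """Split objects into m classes by ordering them on one feature measure."""
    ordered = sorted(objects, key=lambda o: o[feature])
    size = max(1, -(-len(ordered) // m))          # ceil division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]

def build_contindex(objects, features, m):
    """Recursively build an m-tree; each level uses the next feature."""
    if not features or len(objects) <= m:
        return {"objects": objects, "children": []}
    feature = features[0]
    children = []
    for cls in partition(objects, feature, m):
        node = build_contindex(cls, features[1:], m)
        # reference feature "vector": here just the class mean of the feature
        node["ref"] = sum(o[feature] for o in cls) / len(cls)
        node["feature"] = feature
        children.append(node)
    return {"objects": objects, "children": children}

objs = [{"eye_size": i, "hair_len": (i * 7) % 10} for i in range(9)]
tree = build_contindex(objs, ["eye_size", "hair_len"], m=3)
print(len(tree["children"]))   # → 3
```

Each intermediate node carries a reference feature value and the feature used at its level, which is what the retrieval descent and the horizontal links operate on.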

Fig. 3. ContIndex indexing tree and its horizontal links.

Multimedia objects in the database represent event/object cases. ContIndex performs abstraction/generalization of these cases and produces a content-based index. Intermediate nodes in the index tree represent categories of cases: they are generalizations of cases, and cases are instances of these categories. If, for example, under one category there are similar patients a doctor has cured, the category represents the experience of this doctor with this type of patient. In general, an intermediate node represents a certain concept, which is an abstraction of the cases under it. To capture the validity of the concept, a record of confidence is maintained for each intermediate node; the confidence of a concept is high if the number of cases supporting it is large.
Content-Based Retrieval Using ContIndex

The retrieval process is a top-down classification process starting from the root of the tree. At each node, the process chooses from the child nodes one or more nodes that are nearest to the query object with respect to the feature used for creating that node during index creation. How many child nodes are chosen depends on the weight of the feature: a higher weight implies that the feature is more critical, so fewer child nodes are chosen.
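The retrieval descent, combined with the weighted linear similarity of (1), can be sketched as follows. This is an illustrative sketch only: `sim` is a simple distance-based score, the tree is hand-built, and all names (`retrieve`, `best_match`) are hypothetical, not the authors' API.

```python
# Sketch of ContIndex retrieval: descend the tree keeping the child nodes
# nearest to the query on each level's feature, then rank candidate leaf
# objects by the weighted linear similarity of (1).

def sim(a, b):
    return 1.0 / (1.0 + abs(a - b))        # toy per-feature similarity

def weighted_similarity(query, obj, weights):
    # Sim(Oq, O) = sum_i w_i * sim(F_i^q, F_i), as in (1)
    return sum(w * sim(query[f], obj[f]) for f, w in weights.items())

def retrieve(node, query, beam=2):
    """Follow the most similar children; a higher feature weight would
    justify a narrower beam (fewer children explored)."""
    if not node["children"]:
        return list(node["objects"])
    ranked = sorted(node["children"],
                    key=lambda c: sim(query[c["feature"]], c["ref"]),
                    reverse=True)
    out = []
    for child in ranked[:beam]:
        out.extend(retrieve(child, query, beam))
    return out

def best_match(tree, query, weights, beam=2):
    return max(retrieve(tree, query, beam),
               key=lambda o: weighted_similarity(query, o, weights))

tree = {"objects": [], "children": [
    {"feature": "eye_size", "ref": 2.0,
     "objects": [{"eye_size": 2, "hair_len": 1}], "children": []},
    {"feature": "eye_size", "ref": 8.0,
     "objects": [{"eye_size": 8, "hair_len": 5}], "children": []},
]}
q = {"eye_size": 7, "hair_len": 4}
print(best_match(tree, q, {"eye_size": 1.0, "hair_len": 0.5}))
```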
Spatial Self-Organization

For visual browsing of databases, a spatial organization of nodes is preferable. For example, to view types of eyes by eye size, we prefer that all icon images be displayed on the screen ordered from largest to smallest. For this purpose, the Self-Organizing Map (SOM) by Kohonen [12.13] is an effective neural-network paradigm for ContIndex creation.

II. Transform based Indexing of Audio Data [13]

For the representation and indexing of audio data, various methods are available, including methods that use pitch characterizations [13.10] or several acoustical characteristics [13.9]. In the work by Subramanya et al. [13], a transform-based indexing method was developed that accrues the many useful properties of working in the frequency domain, familiar from data compression and signal processing applications: low sensitivity to additive or multiplicative scaling, low sensitivity to (high-frequency or white) noise, and low space utilization.
Basics of transforms

Transforms are applied to signals (time-domain signals such as audio, or spatial signals such as images) to move the data to the frequency domain. This offers several advantages, such as easy noise removal and compression, and facilitates several kinds of processing. Specifically, given a vector X = (x1, x2, ..., xN) representing a discrete signal, the application of a transform yields a vector Y = (y1, y2, ..., yN) of transform coefficients, and the original signal X can be recovered from Y by applying the inverse transform. The basic transform-inverse-transform pairs for the DFT and DCT are used here. In particular, the standard DFT pair is given by

yk = Σ(n=1..N) xn e^(-2πi(k-1)(n-1)/N),  xn = (1/N) Σ(k=1..N) yk e^(2πi(k-1)(n-1)/N).
One of the features of a good transform is that, after its application, only a fraction of the coefficients in the resulting vector Y are needed to reconstruct a good approximation of the original signal.

Outline of indexing scheme

Each audio file or stream is divided into small blocks of contiguous samples, and a transform such as the discrete Fourier transform or discrete cosine transform is applied to each block. This yields a set of coefficients in the frequency domain. With a suitable transform, only a few significant coefficients are adequate to reconstruct a good approximation of the original signal (this feature of transforms has also been the main basis for lossy data compression). Selecting an appropriate subset of the frequency coefficients and retaining a pointer to the original audio block creates an index entry. Thus, the index occupies less space than the data and allows faster searching. Next, a query is similarly divided into blocks, the transform is applied to each, and a subset of transform coefficients is selected; this forms the pattern. The index is then searched for an occurrence of this pattern. Two strings are considered matched if they are within a small enough distance of each other, where distance is measured by the root-mean-square difference of the real-valued components of the strings.

Specifically, suppose A = a1 a2 ... an represents the discrete samples of the original audio signal (the contents of the audio file) and Q = q1 q2 ... qm represents the samples of a given query. Both the original signal and the query are divided into blocks of size L. Without loss of generality, assume that the lengths of the data and query are integral multiples of the block size (the other case can be suitably handled). Let the blocks of the original audio and the query be A1, A2, ..., AN and Q1, Q2, ..., QM; generally M << N. Consider a block Ai of the original signal. Application of a transform (say FFT, DCT, or any similar transform) yields a new sequence of values Yi = y1 y2 ... yL, where Yi = T · Ai and T is the transform matrix, independent of the input signal. With a suitable transform, usually a few significant values of Yi (the first few values by position (zonal selection) or the largest few values by magnitude (threshold selection)) are enough to reconstruct a good approximation of the original data. Suppose k significant values of each block are retained to serve as the index for the original data; let the index for block Ai be denoted DBCi. With threshold selection, we also need to remember the locations (positions) of the coefficients, and these are saved in DBCLi. There are N such indices for A, one per block, which together form the index set for the original data. Similarly, application of the same transform to a block Qi of the query yields a sequence of values QBCi = z1 z2 ... zL, where QBCi = T · Qi. The appropriate k values of QBCi are compared against the index sets to determine a match (exact or close).
Blocking and segmentation

To derive the transform-based index, the audio data (signal) is divided into fixed-size units called blocks, a process referred to as blocking. A suitable transform is then applied to these individual blocks. The advantages of blocking are the following:

1) When transforms are applied to the whole signal, the transform coefficients capture global averages but not the finer details.
2) Blocks of appropriate sizes contain samples that are highly intercorrelated, so that when the transforms are applied, there is more energy compaction and thus fewer transform coefficients adequately describe the data.
3) The transforms on the individual blocks can be carried out in parallel.

In segmentation, on the other hand, the audio data is divided into variable-length units called segments. The data within a segment does not vary much; the positions in the audio data where very sharp changes occur define the segment boundaries.

Search algorithm and analysis

In the work presented in this paper, the audio data and the query are divided into fixed-size blocks. In the index searches, the transform coefficients of the query are compared with the corresponding coefficients of the data blocks and the distance between them is determined. If the distance is below an experimentally determined threshold, it is accepted as a match. The following notation is used in the algorithms and their analysis:

L: the length of a block (number of samples);
N: number of blocks of the data;
M: number of blocks of the query;
k: the number of significant transform coefficients per block retained as index;
QBC: Query Block Coefficients (obtained by applying the transform to query blocks);
DBC: Data Block Coefficients (obtained by applying the transform to data blocks);
DBCL: Data Block Coefficient Locations;
RBC: Reconstructed Data Block Coefficients;
RBCL: Reconstructed Data Block Coefficient Locations.

Each block of QBC contains L elements, and each block of DBC, DBCL, RBC, and RBCL contains k elements.
Robust search algorithm

It assumes that the query block boundaries are aligned with those of the data block boundaries.
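Under that alignment assumption, the scheme can be sketched end-to-end. This is a minimal illustration, not the paper's algorithm: it uses a naive O(N^2) DFT, zonal selection of the first k coefficients, and an arbitrary threshold; the names (`build_index`, `search`) and all parameter values are assumptions.

```python
import cmath, math

# Sketch of transform-based audio indexing: block the signal, DFT each block,
# keep the first k coefficients (zonal selection) plus a block pointer as the
# index entry, and match a query block by root-mean-square distance.

def dft(block):
    N = len(block)
    return [sum(block[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N)) for k in range(N)]

def build_index(signal, L, k):
    """One index entry (k coefficients, block position) per block of length L."""
    return [(dft(signal[i:i + L])[:k], i)
            for i in range(0, len(signal) - L + 1, L)]

def rms(a, b):
    return math.sqrt(sum(abs(x - y) ** 2 for x, y in zip(a, b)) / len(a))

def search(index, query_block, k, threshold):
    """Return positions of data blocks within `threshold` of the query."""
    q = dft(query_block)[:k]
    return [pos for coeffs, pos in index if rms(coeffs, q) < threshold]

signal = [math.sin(0.3 * n) for n in range(32)]
index = build_index(signal, L=8, k=3)
print(search(index, signal[8:16], k=3, threshold=1e-6))  # → [8]
```

The index stores only k coefficients per block instead of L samples, which is the space saving the scheme relies on; threshold selection would additionally store coefficient locations (DBCL).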


III. Indexing for Very Large Multidimensional data [14] As the speed of processors continues to improve, researchers are performing large-scale scientific simulations to study very complex phenomena at increasingly finer resolution scales. Such studies have produced datasets of very large size, ranging from hundreds of gigabytes to tens of terabytes, creating an imperative need for new interactive visualization capabilities. A typical way to visualize such a large multidimensional volumetric data set is to first reduce its dimension using techniques such as slicing and then to render the result using an isosurface or volume rendering technique. Slicing is a very useful tool because it removes or reduces occlusion problems and enables fast visual exploration of such a large data set. To handle this process efficiently, we need an efficient out-of-core indexing structure, because such data sets very often do not fit in main memory. A typical approach for time-varying volumetric data is to build a separate indexing structure on each time step of the data set; Sutton and Hansen's temporal branch-on-need structure (T-BON) [14.3] is the most representative. Their strategy is to build an out-of-core version of the Branch-On-Need Octree (BONO) [14.4], in which each leaf node is of disk-page size, for each time step, and to store the common infrastructure of the trees in a single file. However, building (n-1)-dimensional trees along a particular dimension, as in T-BON, results in the index size increasing linearly with the resolution of that dimension (the number of time steps in the case of T-BON), because no coherence across that dimension is exploited.

This lack of scalability becomes more problematic as we generate higher and higher resolution data in every dimension, including the time dimension. Building a series of (n-1)-dimensional indexing structures on n-dimensional data causes a scalability problem as resolution grows in every dimension, while building a single n-dimensional indexing structure can cause an indexing-effectiveness problem compared to the former approach. The information-aware 2^n-tree has been proposed [14] to maximize indexing efficiency by ensuring that the subdivision of space has as similar coherence as possible along each dimension. It is particularly useful when the data distribution along each dimension consistently shows a different degree of coherence from the other dimensions.

Information-Aware 2^n-Trees

Information-Aware 2^n-trees (IA 2^n-trees) are basically 2^n-trees (e.g., quadtrees for 2-D and octrees for 3-D [14.13]) for n-dimensional space. They differ, however, in how the extent ratios of a subvolume are decided when multiple dimensions are integrated into one hierarchical indexing structure. The coherence information along each dimension is extracted and used for this decision, so that each subvolume contains as similar coherence as possible along each dimension.

A. Dimension Integration

We present an entropy-based dimension integration technique. Entropy [14] is a numerical measure of the uncertainty of the outcome of an event x, given by

H(x) = -Σ(i=1..n) pi log2 pi,

where x is a random variable, n is the number of possible states of x, and pi is the probability of x being in state i. This measure indicates how much information is contained in observing x: the more variable x is, the more unpredictable it is, and the higher the entropy. For example, consider a series of scalar field values for a voxel v over the time dimension. The temporal entropy of v indicates the degree of variability in the series; high entropy implies high information content, and thus more resources are required to store the series. Note that the entropy is maximized when all the probabilities pi are equal.
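The per-dimension entropy estimate can be sketched directly from the formula, computing the empirical distribution of the (quantized) scalar values along one dimension; the function name and example values are illustrative.

```python
import math
from collections import Counter

# Sketch of the entropy estimate H(x) = -sum_i p_i log2 p_i over the
# empirical distribution of scalar values along one dimension.

def entropy(values):
    total = len(values)
    return sum(-(c / total) * math.log2(c / total)
               for c in Counter(values).values())

# A series that is constant along a dimension has zero entropy (the extreme
# case of the y dimension in Figure 1), while a maximally variable series
# with n equiprobable values attains the maximum, log2(n).
print(entropy([5, 5, 5, 5]))    # → 0.0
print(entropy([1, 2, 3, 4]))    # → 2.0
```

For floating-point fields the values would first be quantized, as the text notes, before counting states.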

Fig. 1. Entropy estimation in each dimension. Note that the y dimension has almost zero entropy in this example.

Fig. 2. Different supercell sizes and corresponding hierarchical indexing structures for the data of Figure 1: (a) standard supercell; (b) information-aware supercell.

Higher entropy of a dimension relative to the other dimensions implies that this dimension needs to be split at finer scales than the others. For example, if the temporal entropy is twice the spatial entropy, we design the supercell to be of size s × s × s × s/2 (x × y × z × t), where s is the size of the spatial dimensions of the supercell. Figures 1 and 2 show how this entropy-based dimension integration leads to an indexing structure in the 3-D case. Figure 1 shows an extreme case in which the values along the y dimension remain almost constant over all possible (x, z) values (that is, the entropy of y is almost zero), while each of the x and z dimensions has some degree of variability. The supercell size and the corresponding hierarchical indexing structure are then designed as shown in Figure 2(b): it has a quadtree structure, unlike the standard octree of Figure 2(a), in which the supercell has the same size in each dimension. To estimate the ratios of the entropy values among the n dimensions, we randomly select a set of n-dimensional subvolumes and, for each subvolume, obtain the ratios by computing the entropy value along each dimension. The ratios are averaged and applied globally in building the indexing structures. In computing the entropy values, if the number of possible scalar field values is large (as in the case of floating-point values), we first quantize the original values into a smaller number of levels using a non-uniform quantizer such as the Lloyd-Max quantizer. Further, we compute the spatiotemporal entropy ratio, defined as the ratio of the average spatial entropy to the temporal entropy.

B. Indexing Structures

We use the entropy ratios to guide the branching of the tree and ultimately to adjust the size of supercells, dividing dimensions of high entropy more finely and those of low entropy more coarsely. This is carried out by multiplying the original size of each dimension by its entropy value, which becomes the effective size of the dimension, and then using the effective size instead of the original size when branching the tree. In addition, we adopt the Branch-On-Need strategy [14.4], delaying the branching of each dimension until it is absolutely necessary. For efficient isosurface rendering, each tree node contains the minimum and maximum values of the scalar fields in the region represented by the node. The size of the tree can be reduced by pruning nodes in which the minimum and maximum values are the same, because they do not contribute to isosurface extraction. Further work may include evaluating the goodness of the entropy measure in comparison to other measures, finding a more adaptive way of applying the coherence difference in the subdivision, and finding a more effective way of decomposing the time series.
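The effective-size rule can be sketched as follows. This is an illustrative simplification of the branching decision only: the entropy values and the 0.5 tolerance are assumed values, and the function name is hypothetical.

```python
# Sketch of entropy-guided branching: scale each dimension's extent by its
# entropy to obtain an "effective size", then split only dimensions whose
# effective size is near the maximum -- a branch-on-need style rule.

def dims_to_split(extents, entropies, tol=0.5):
    effective = [e * h for e, h in zip(extents, entropies)]
    top = max(effective)
    return [i for i, v in enumerate(effective) if v >= tol * top]

# x, y, z, t extents all 64; y has near-zero entropy, t the highest:
print(dims_to_split([64, 64, 64, 64], [1.0, 0.05, 1.0, 2.0]))  # → [0, 2, 3]
```

With these (assumed) entropies the y dimension is not split, reproducing the quadtree-like behavior of Figure 2(b) for the data of Figure 1.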

Universal Communication Format for Multimedia Data


A common data format is desired in the expanding field of machine communication. XDR [15.1] is a hardware-level data standard, and ACL [15.2] is an agent-level logical transaction standard. The authors have been developing an application-level content representation called UDF (Universal Data Format), which is flexible and capable of representing multimedia data [15.3-15.4]. However, multimedia data transmissions tend to be large even if the receiver requires only part of the data: the receiver needs to communicate with the sender about what quantity to send and what quality is required. To meet this requirement, UCF (Universal Communication Format) was designed, which adds bi-directional communication capability as an extension of UDF.

Brief description of UDF

UDF is designed to represent any data that can be used by intelligent equipment and software. The basics of UDF are:

(1) Content indication: a data section is wrapped by tags, as in <text> TEXT DATA </text>.
(2) Tag and data flexibility: any tag can be defined and any data can be presented in the data section. What kinds of tags can be processed depends on the receiving software.
(3) Multimedia multiplexing: multimedia data sequences are switched as <audio> ... </audio><video> ... </video>.

The key features of UCF, which enable bi-directional communication, are:

(1) Target addressing: although tags were used as data-type identifiers in UDF, they can also be thought of as the names of objects, specified by the sender, that are to receive the data. Therefore, UCF tags are defined as the names of objects. Data to program A on host B is expressed as <B><A>DATA</A></B>. The wrapped tag (in this case, <A>) in the data section indicates the inner object address. In this manner, any communication object, including hosts and programs, can be expressed by tags.
(2) Source addressing: a receiving object may need an address to reply to when returning data or messages. In UCF, the <s> tag indicates the reply-to address, as in <A><s>B</s>SEND ME DATA</A>.
(3) Data interpretation: a common data representation for each data type, such as text, graphic object, image, audio, and video, was intended. However, it is difficult to define the ultimate best common data format, so a practical solution is to leave the details to each object: each named object can define its own data format and interpretations.

The hierarchically wrapped addressing scheme of UCF naturally enables cross-layer communication. The sequential nature of UCF multimedia multiplexing implies synchronization of media, e.g., audio and video. Standard schemes for message generation and handling remain to be investigated.
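The hierarchically wrapped addressing can be sketched with a few lines of string composition. The helper names (`wrap`, `ucf_message`) are illustrative assumptions; UCF itself only fixes the tag conventions shown in the text.

```python
# Sketch of composing a UCF message: target addressing wraps the payload in
# <host><program> ... </program></host>, and <s> carries the reply-to address.

def wrap(tag, body):
    return f"<{tag}>{body}</{tag}>"

def ucf_message(host, program, reply_to, payload):
    inner = wrap("s", reply_to) + payload
    return wrap(host, wrap(program, inner))

# "Data to program A on host B, reply to C":
msg = ucf_message("B", "A", "C", "SEND ME DATA")
print(msg)  # → <B><A><s>C</s>SEND ME DATA</A></B>
```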

Figure: Example of UCF control data.

Data Hiding and Error Concealment


[ to do]

Review Of Contemporary Research Related To Representation And Processing Of Time-Series Data
Data Representation Models and Concerns
[ Concerns: Data management, framework ]

The concept of time series data is relevant in the context of, for example, video, images, audio, financial data, time series of traffic flow, and so on, where we now have high expectations for exploring the data at hand. Typical manipulations involve some form of video/image/audio processing, including automatic speech recognition, which requires a fairly large amount of storage and is computationally intensive. Many approaches and techniques addressing time series data representation and manipulation have been proposed in the past decade. The most commonly used representations are the Discrete Fourier Transform (DFT), the Discrete Wavelet Transform (DWT), Singular Value Decomposition (SVD), Adaptive Piecewise Constant Approximation (APCA), and Piecewise Aggregate Approximation (PAA). Recently, a promising representation called Symbolic Aggregate Approximation (SAX) was proposed.
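Of these representations, PAA is the simplest to sketch: a series of length n is reduced to w segment means, and SAX then quantizes those means into symbols. In the following illustration the two-symbol split at 0 is an assumed simplification of the Gaussian breakpoints used by SAX proper, and the function names are hypothetical.

```python
# Sketch of Piecewise Aggregate Approximation (PAA) and a toy SAX-style
# symbolization: segment means, then quantization into an alphabet.

def paa(series, w):
    """Reduce a length-n series to w equal-segment means."""
    n = len(series)
    return [sum(series[i * n // w:(i + 1) * n // w]) * w / n for i in range(w)]

def sax(series, w):
    """Toy two-symbol SAX: 'a' below zero, 'b' at or above zero."""
    return "".join("a" if m < 0 else "b" for m in paa(series, w))

ts = [-2.0, -1.0, -1.0, -2.0, 1.0, 2.0, 2.0, 1.0]
print(paa(ts, 2))   # → [-1.5, 1.5]
print(sax(ts, 2))   # → ab
```

SAX proper first z-normalizes the series and chooses breakpoints so that the symbols are equiprobable under a Gaussian assumption; the dimensionality reduction step shown here is the same.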

Major Processing Concerns and Solutions


[Concerns: Clustering: traffic, K-means, hierarchical, for clinical data; Correlation analysis; Unsupervised-outlier; Periodic patterns; Similarity mining; Visual exploration for financial data; An improved data mining algorithm for traffic flow ]

There has been much recent work on adapting data mining algorithms to time series databases. [17.1] introduced a kernel-density-based algorithm that ensures uninteresting sequences do not affect the clustering result. [17.2] first used the k-means algorithm, then used the prototypes of the resulting clusters as time variables to develop an autoregressive model relating the expression of the prototypes to each other. Among hierarchical clustering algorithms, [17.3] developed a method called Gecko, similar to Chameleon, which divides the clustering process into three steps (segmentation; merging; determining the best clustering level) and is used for time-series anomaly detection; its disadvantage is that clustering takes too much time. [17.4] proposed a density-based hierarchical clustering method and showed that it is not sensitive to noisy data. Also, Pedro Rodrigues et al. developed an online divisive-agglomerative clustering system for time-series data streams in [17.5]. However, most of the research mentioned above concerns time series of gene expression data in biology.

I. Time Series for Gene Expressions


For estimating gene networks from time series gene expression data measured by microarrays, much attention has been focused on statistical methods, including Boolean networks [19.1, 19.11], differential equations [19.3, 19.5], dynamic Bayesian networks [19.6, 19.7, 19.8], state space models [19.2, 19.4], and so on. While these methods have provided many successful applications, a serious drawback has been their basic assumption that the network structure does not change across time points, whereas real gene networks have time-dependent structure. In a recent work [19], a solution to this problem was provided: a statistical methodology was established to estimate gene networks with time-dependent structure using dynamic linear models with Markov switching. The model is based on the linear state space model, also known as the dynamic linear model (DLM). In the DLM, the high-dimensional observation vector is compressed into a lower-dimensional hidden state variable vector. For microarray analysis, the observation vector corresponds to the gene expression value vector, and the state variables can be considered a transcriptional module, that is, a set of co-regulated genes.
Dynamic Linear Model

Let yt be a vector of d observed random variables which contains expression values of d genes at time point t. The DLM relates a collection of , to the hidden k-dimensional state vector xt in the following way:

Here, the At is a d x k measurement matrix and the wt is the Gaussian white noise as are modeled by a first-order Markov process as

Usually the

dimension of state vector is taken to be much smaller than that of data, k < d. In DLM, the time evolution of the state variables

45
Where . state transition matrix and the additive system noise follows form the Gaussian distribution as the noise covariance matrices are assumed to be diagonal,

, respectively. Notice that the model parameters depend on the time index. This implies that the underlying dynamics changes discontinuously at certain undetermined points in time. The process of the DLM starts with an initial Gaussian state DLM, the dynamics of distribution. The all composition in this representation are the that has mean and covariance matrix . In

are governed by the joint probability Gaussian density . in which

The DLM, in its canonical form, implicitly assumes an interesting causal relationship among the d variates (genes). To sum up, the time-dependent DLM describes the consecutive changes in module sets of genes, module-module interactions, and gene-gene interactions with the underlying canonical form (see Figure 1). After learning the state transition matrices and the projection matrix, we can identify the time-dependent network structure by testing whether or not these parameters lie in a region significantly far from zero. This problem amounts to classical testing methods or bootstrap confidence intervals.
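The forward (filtering) recursion for such a state space model can be sketched as follows. The snippet simulates a small, time-invariant DLM and runs a standard Kalman filter over it; the dimensions, matrices, and noise levels are illustrative choices, not those used in [19].

```python
import numpy as np

rng = np.random.default_rng(0)

d, k, T = 6, 2, 50           # genes, state dimension (k << d), time points
A = rng.normal(size=(d, k))  # measurement (projection) matrix
B = 0.9 * np.eye(k)          # state transition matrix
R = 0.1 * np.eye(d)          # observation-noise covariance (diagonal)
Q = 0.05 * np.eye(k)         # system-noise covariance (diagonal)

# Simulate x_t = B x_{t-1} + v_t and y_t = A x_t + w_t
x = np.zeros((T, k))
y = np.zeros((T, d))
x[0] = rng.normal(size=k)
y[0] = A @ x[0] + rng.multivariate_normal(np.zeros(d), R)
for t in range(1, T):
    x[t] = B @ x[t - 1] + rng.multivariate_normal(np.zeros(k), Q)
    y[t] = A @ x[t] + rng.multivariate_normal(np.zeros(d), R)

def kalman_filter(y, A, B, R, Q, mu0, S0):
    """Forward (filtering) recursion for the DLM: predict, then update."""
    k = len(mu0)
    mu, S = mu0, S0
    means = []
    for obs in y:
        mu_p = B @ mu                 # predicted state mean
        S_p = B @ S @ B.T + Q         # predicted state covariance
        K = S_p @ A.T @ np.linalg.inv(A @ S_p @ A.T + R)   # Kalman gain
        mu = mu_p + K @ (obs - A @ mu_p)
        S = (np.eye(k) - K @ A) @ S_p
        means.append(mu)
    return np.array(means)

xf = kalman_filter(y, A, B, R, Q, np.zeros(k), np.eye(k))
```

The filtered means `xf` are the compressed, module-level trajectories that the text describes; in the Markov-switching extension the matrices A and B would additionally depend on the active regime.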
DLM with Markov Switching

The problem of modeling change in an evolving time series can be handled by allowing the dynamics of the underlying model to change discontinuously at certain undetermined points in time. In a real biological system, the structural change might occur smoothly. To incorporate a reasonable switching structure, we employ the DLM-MS approach, which assumes that the observation at each time point is generated by one of G possible regimes evolving according to a Markov chain. In this context, the model parameters are assumed to take one of G possible configurations at each time point. For notational convenience, we introduce a hidden vector of G class labels to indicate the configurations. The DLM-MS, in its basic form, assumes that this discrete variable evolves according to a first-order Markov chain with a transition probability matrix M of order G x G, where the (h,g) element defines the probability of switching from regime h to regime g. Each row of M, denoted by mh for h = 1, ..., G, is restricted to be a probability distribution. Smoothness of change in regimes is controlled by the entropy of mh.

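A minimal sketch of the regime process: drawing a label path from a first-order Markov chain with a G x G transition matrix M, and measuring the entropy of each row (lower row entropy means a stickier chain and hence smoother regime changes). The matrix values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

G = 3  # number of regimes
# M[h, g] = P(next regime = g | current regime = h); each row sums to 1
M = np.array([[0.90, 0.05, 0.05],
              [0.05, 0.90, 0.05],
              [0.05, 0.05, 0.90]])

def sample_regimes(M, T, start=0, rng=rng):
    """Draw a regime path of length T from a first-order Markov chain."""
    path = [start]
    for _ in range(T - 1):
        path.append(rng.choice(len(M), p=M[path[-1]]))
    return np.array(path)

def row_entropy(M):
    """Entropy of each row of M; high self-transition probability
    gives low entropy and smooth regime changes."""
    with np.errstate(divide="ignore"):
        logM = np.where(M > 0, np.log(M), 0.0)
    return -(M * logM).sum(axis=1)

path = sample_regimes(M, 200)
```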

Bayesian Inference

For some gene expression data, each array contains some genes with fluorescence intensity measurements that were flagged by the experimenter and recorded as missing data points. In such a case, the observed vector is incomplete. To deal with the missing-data problem, we define a partition of the d-dimensional observed vector into subvectors that contain the observed and missing components, respectively. Consequently, the DLM-MS treats the observed and missing components together as a complete dataset having a joint distribution.

The parameters to be learned from the observed dataset are collected into a parameter set, together with the initial distributions that drive the dynamic system. Our attention now turns to the Bayesian learning of DLM-MS, which requires prior distributions for all model parameters and for the initial distribution of the hidden states. In this study, natural conjugate priors are employed. Let ai and bi be the i-th rows of A and B, respectively. The family of conjugate priors used for the DLM-MS consists of inverse-gamma distributions (with given shape and scale parameters) for the noise variances, Dirichlet distributions (with a given prior sample size) for the rows of the transition matrix M, and, for the rows of A, truncated Gaussian distributions whose support is restricted to the positive part. The truncation is needed because, in the DLM setting, the underlying dynamical system is invariant under certain transformations; to avoid this lack of identifiability, we use the truncated prior distribution. Once the prior distributions are given, the augmented parameters are estimated through the posterior distribution.


Within the Bayesian framework, all inferences are made based on the marginal posterior distributions, with samples drawn from the full-conditional distributions. If the iterations have proceeded long enough, the simulations are grossly representative of the target distribution. To diminish the effect of the starting point, we generally discard the first p simulated samples and focus attention on the remaining n - p. This retained set is used to summarize the posterior distribution and to compute quantiles and the other summaries of interest as needed.
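The burn-in-and-summarize step can be illustrated with a toy chain; the AR(1) trace below merely stands in for the output of an MCMC sampler, and the values of n and p are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)

n, p = 5000, 1000          # total draws, burn-in samples to discard
chain = np.empty(n)
chain[0] = 10.0            # deliberately bad starting point
for t in range(1, n):      # AR(1) chain whose stationary law is N(0, 1)
    chain[t] = 0.5 * chain[t - 1] + np.sqrt(1 - 0.25) * rng.normal()

kept = chain[p:]           # discard the first p simulated samples
posterior_mean = kept.mean()
q025, q975 = np.quantile(kept, [0.025, 0.975])  # 95% credible interval
```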

II. Time Series for Traffic Data


Using data mining technology to analyze time series of traffic flow not only can forecast short-term or long-term traffic volume, but also can judge which street of a city is a bottleneck, and thus helps greatly in analyzing the traffic situation of the city. In fact, clustering similar change trends of traffic flow time series is an interesting current issue. On one hand we can obtain some typical patterns of traffic flow; on the other hand we can group the sections of highway where the detectors are located according to different flow characteristics. The sections of highway in one group then have similar traffic flow characteristics, and the sections of highway in different groups have distinct characteristics. Combined with spatial information, some useful spatial and temporal distributed patterns in transportation could be revealed.
Linkage Difference

In [17], average linkage is used as the distance (or similarity) between clusters. Given two clusters A = {a1, a2, ..., am} and B = {b1, b2, ..., bn}, where m and n are the sizes of A and B, let W be the similarity matrix among the time series, whose entry w(ai, bj) is the similarity between series ai and bj. The average linkage between cluster A and cluster B is then:

D(A, B) = (1 / mn) * sum over all i, j of w(ai, bj)
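A small sketch of average linkage computed from a precomputed similarity matrix W (the matrix values here are made up for illustration):

```python
import numpy as np

def average_linkage(W, idx_a, idx_b):
    """Average linkage between clusters A and B: the mean of all pairwise
    similarities w(a_i, b_j) read from the similarity matrix W."""
    return W[np.ix_(idx_a, idx_b)].mean()

# Toy similarity matrix for 5 time series (symmetric, 1.0 on the diagonal)
W = np.array([
    [1.0, 0.9, 0.8, 0.2, 0.1],
    [0.9, 1.0, 0.7, 0.3, 0.2],
    [0.8, 0.7, 1.0, 0.2, 0.3],
    [0.2, 0.3, 0.2, 1.0, 0.9],
    [0.1, 0.2, 0.3, 0.9, 1.0],
])
d_ab = average_linkage(W, [0, 1, 2], [3, 4])   # cross-group linkage is low
```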


An algorithm for similarity mining in time series data on the basis of the Grey Markov SCGM(1,1) model has also been proposed in a recent work by Xiong et al. [18].
Encoded-Bitmap-Approach-Based Swap

Given two clusters A and B (that is, the number of clusters is k = 2), for an arbitrary time series u in {A, B} we can compute the linkage difference D(u,A,B). For every u in A, there are two conditions on the value of D(u,A,B):
1. D(u,A,B) = D(A,u) - D(B,u) < 0. Series u has a relatively larger linkage to cluster B even though it is located in cluster A, so we move series u to cluster B.
2. D(u,A,B) = D(A,u) - D(B,u) >= 0. Series u has a relatively larger linkage to its initial cluster A, so we do nothing in this situation.
For every v in B, there are two similar conditions:
1. D(v,B,A) = D(B,v) - D(A,v) < 0. Series v has a relatively larger linkage to cluster A even though it is located in cluster B, so we move series v to cluster A.
2. D(v,B,A) = D(B,v) - D(A,v) >= 0. Series v has a relatively larger linkage to its initial cluster B, so we do nothing in this situation.
If the number of existing clusters is k > 2 (assume them to be A, B, C, D, ...), then in the same way, for every u in A, if all D(u,A,X) are greater than zero we do nothing; otherwise we select the X giving the largest absolute value among those with D(u,A,X) < 0 and move u to that cluster. This procedure of swapping series is called the Encoded-Bitmap-Approach-Based Swap. The algorithm when k = 2 is presented below.
Algorithm: EncodedBitmap_Based_Swap(ACluster, BCluster)
// Here k = 2
// Input: original clusters ACluster and BCluster
// Output: the two new clusters
Begin
Step 1. Use the Encoded Bitmap Approach to calculate the similarity matrix W(ACluster, BCluster).
Step 2. For every time series u in ACluster, calculate D(A,u) and D(B,u). If D(u,A,B) = D(A,u) - D(B,u) < 0, then move u to BCluster.
Step 3. For every time series v in BCluster, calculate D(B,v) and D(A,v). If D(v,B,A) = D(B,v) - D(A,v) < 0, then move v to ACluster.
End.
Both grey relation and the Encoded-Bitmap-Approach-Based Swap were adopted in [17] to improve the classic hierarchical clustering algorithm.
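A minimal sketch of the k = 2 swap pass, assuming the similarity matrix W has already been computed (e.g. by the encoded bitmap approach, which is not reproduced here) and that D(X, u) is the average linkage between series u and cluster X:

```python
import numpy as np

def linkage_to(W, u, cluster):
    """Average similarity (linkage) between series u and a cluster,
    excluding u itself when it belongs to that cluster."""
    members = [v for v in cluster if v != u]
    return np.mean([W[u, v] for v in members]) if members else 0.0

def encoded_bitmap_based_swap(W, a_cluster, b_cluster):
    """One swap pass for k = 2. Linkage differences are computed against
    the original clusters, as in Steps 2-3 of the algorithm."""
    new_a, new_b = [], []
    for u in a_cluster:
        # D(u,A,B) = D(A,u) - D(B,u); negative means u is closer to B
        if linkage_to(W, u, a_cluster) - linkage_to(W, u, b_cluster) < 0:
            new_b.append(u)
        else:
            new_a.append(u)
    for v in b_cluster:
        if linkage_to(W, v, b_cluster) - linkage_to(W, v, a_cluster) < 0:
            new_a.append(v)
        else:
            new_b.append(v)
    return new_a, new_b

# Illustrative similarity matrix: series {0, 1, 2} and {3, 4, 5} form two
# natural groups, but series 5 starts in A and series 2 starts in B.
W = np.array([
    [1.00, 0.90, 0.80, 0.10, 0.20, 0.10],
    [0.90, 1.00, 0.85, 0.20, 0.10, 0.15],
    [0.80, 0.85, 1.00, 0.10, 0.20, 0.10],
    [0.10, 0.20, 0.10, 1.00, 0.90, 0.80],
    [0.20, 0.10, 0.20, 0.90, 1.00, 0.85],
    [0.10, 0.15, 0.10, 0.80, 0.85, 1.00],
])
new_a, new_b = encoded_bitmap_based_swap(W, [0, 1, 5], [2, 3, 4])
# both misplaced series end up swapped into their natural groups
```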
Algorithm: Improved Hierarchical Clustering Method
// Input: time series datasets
// Output: K clusters
Begin
1. Start by assigning each item to its own cluster, so that if there are N items, there are N clusters, each containing just one item.
2. Use grey relation as the time series similarity measurement, and let the similarities between the clusters equal the similarities between the items they contain.
3. Find the most similar pair of clusters and merge them into a single cluster, reducing the number of clusters by one.
4. Compute the average linkage as the similarity between the new cluster and each of the old clusters.
5. Repeat steps 3 and 4 until K clusters remain.
6. Adopt the encoded-bitmap-approach-based swap to refine the K clusters from step 5, obtaining the final K clusters.
End.
Experimental results show that, compared with the classic hierarchical clustering method, the above method performs better at separating time series change trends.
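Steps 1-5 of the procedure (agglomeration under average linkage over a similarity matrix) can be sketched as follows; the grey-relation similarity computation itself is assumed to be already available as the matrix W, whose values here are illustrative:

```python
import numpy as np

# Similarity matrix for six time series forming two natural groups,
# {0, 1, 2} and {3, 4, 5} (as if produced by the grey-relation measure).
W = np.array([
    [1.00, 0.90, 0.80, 0.10, 0.20, 0.10],
    [0.90, 1.00, 0.85, 0.20, 0.10, 0.15],
    [0.80, 0.85, 1.00, 0.10, 0.20, 0.10],
    [0.10, 0.20, 0.10, 1.00, 0.90, 0.80],
    [0.20, 0.10, 0.20, 0.90, 1.00, 0.85],
    [0.10, 0.15, 0.10, 0.80, 0.85, 1.00],
])

def agglomerate(W, K):
    """Start from singleton clusters and repeatedly merge the most similar
    pair (average linkage on W) until only K clusters remain."""
    clusters = [[i] for i in range(len(W))]
    while len(clusters) > K:
        best, pair = -np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = W[np.ix_(clusters[a], clusters[b])].mean()
                if sim > best:
                    best, pair = sim, (a, b)
        a, b = pair
        clusters[a] = clusters[a] + clusters[b]   # merge the best pair
        del clusters[b]
    return clusters

clusters = agglomerate(W, 2)
```

Step 6 would then run the swap pass of the previous algorithm over these K clusters.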

III. Time Series in Multimedia data


Typical multimedia manipulations require a considerable amount of storage and are computationally intensive. Generally, we can use various image processing techniques [16.8][16.12][16.14][16.24][16.26] to cluster multimedia data, by measuring similarities among the raw videos or images using certain features such as color, texture, or shape. However, recent work [16.17][16.22] has demonstrated the utility of time series representation as an efficient alternative to the raw multimedia data, whose advantages include time and space complexity reduction on clustering, classification, and other data mining tasks. In clustering multimedia time series data, the k-medoids algorithm with the Dynamic Time Warping distance measure is often used. In fact, there are many other distance measures that can be effectively used for time series data, but we will mainly focus on DTW due to its shape-based similarity measurement, which breaks the limitation of one-to-one mapping in the Euclidean distance, the most well-known distance metric. Although k-medoids with DTW gives satisfactory results, k-means clustering is conceivably much more typical in clustering tasks, where an averaging algorithm is a crucial subroutine for finding a data representation of each cluster. In general, the Euclidean distance metric (or another Minkowski metric) is used to find an average of all the data within a cluster. However, its one-to-one mapping nature is unable to capture the average shape of two time series, in which case Dynamic Time Warping is more favorable. The work by Gupta et al. [16.9] introduced the shape averaging approach using Dynamic Time Warping. Niennattrakul et al. [16] provided a generic time series shape averaging method with a proof of correctness.


Distance Measurement

Distance measures are extensively used in finding the similarity/dissimilarity between any two time series. The two well-known measures are the Euclidean distance metric and the DTW distance measure. A distance metric must satisfy the four properties of symmetry, self-identity, non-negativity, and the triangular inequality. A distance measure, however, does not need to satisfy all these properties. DTW [21] is a well-known shape-based similarity measure for time series data. Unlike the Minkowski distance function, dynamic time warping breaks the limitation of one-to-one alignment, and also supports non-equal-length time series. It uses a dynamic programming technique to explore all possible paths and selects the one that yields a minimum distance between the two time series, using a distance matrix in which each element is a cumulative distance: the local distance at that cell plus the minimum of the three surrounding neighbors. Suppose we have two time series, a sequence Q = q1, q2, ..., qi, ..., qn and a sequence C = c1, c2, ..., cj, ..., cm. First, we create an n-by-m matrix, where every (i, j) element is the cumulative distance of the distance at (i, j) and the minimum of the three neighboring elements, where 1 <= i <= n and 1 <= j <= m. We can define the (i, j) element, gamma(i, j), as:

gamma(i, j) = d(qi, cj) + min{ gamma(i-1, j-1), gamma(i-1, j), gamma(i, j-1) }    (1)

where d(qi, cj) = (qi - cj)^2 is the squared distance of qi and cj, to which the minimum cumulative distance of the three elements surrounding the (i, j) element is added. Then, to find an optimal path, we choose the path that gives the minimum cumulative distance at (n, m). The distance is defined as:

DTW(Q, C) = min over P of sqrt( sum for k = 1 to K of wk )    (2)

where P is the set of all possible warping paths, wk is the cost of the (i, j) cell at the kth element of a warping path, and K is the length of the warping path. The algorithm may generate more than one optimal warping path, though the warping distance always turns out to be the same.
Dynamic Time Warping Averaging

In some situations, we may need to find a template or a model of a collection of time series, in which case a shape averaging algorithm is desired for a more accurate/meaningful template. The DTW distance measure is exploited to find appropriate mappings for an average. More specifically, the algorithm needs to create a DTW distance matrix and find an optimal warping path. After the path is discovered, a time series average is calculated along this path by using the index (i, j) of each data point wk on the warping path, which corresponds to the data points qi and cj on the time series Q and C, respectively. Each data point in the averaged time series is simply the mean of the two values on the two time series that index (i, j) maps to. Let W = w1, w2, ..., wk, ..., wK be an optimal warping path; the kth averaged point zk is the mean value between the two time series at indices i and j:

zk = (qi + cj) / 2    (3)

In query refinement, where the two time series may have different weights, wQ for sequence Q and wC for sequence C, eq. (3) above may be simply generalized according to the desired weights:

zk = (wQ * qi + wC * cj) / (wQ + wC)    (4)
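A sketch of the averaging procedure: compute the DTW matrix, backtrack to recover one optimal warping path, and average the mapped values (with optional weights as in the query-refinement generalization above):

```python
import numpy as np

def dtw_path(q, c):
    """DTW with backtracking: returns one optimal warping path as a list
    of (i, j) index pairs into q and c."""
    n, m = len(q), len(c)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = (q[i - 1] - c[j - 1]) ** 2 + \
                min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    path, (i, j) = [], (n, m)
    while (i, j) != (1, 1):                       # walk back to the origin
        path.append((i - 1, j - 1))
        steps = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(steps, key=lambda s: D[s])
    path.append((0, 0))
    return path[::-1]

def dtw_average(q, c, wq=0.5, wc=0.5):
    """Shape averaging along the warping path: each averaged point is the
    (optionally weighted) mean of q[i] and c[j] for (i, j) on the path."""
    return [(wq * q[i] + wc * c[j]) / (wq + wc) for i, j in dtw_path(q, c)]
```

Note that the averaged series follows the path length K, which may exceed both input lengths; resampling back to a fixed length is a common follow-up step.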

What we want from a shape averaging algorithm is illustrated in Figure 1(a), where DTW is used. If the Euclidean or any other one-to-one mapping distance measure were used, we would probably end up with an undesirable result, as shown in Figure 1(b).

Figure 1. A comparison between (a) shape averaging and (b) amplitude averaging

K-means Clustering

As shown in Table 1 below, the k-means algorithm [16.4] tries to divide N data objects into k partitions or clusters, where each would have one object (mean) as its cluster center, representing all data objects within that cluster. We then assign the rest of the objects to proper clusters and recalculate new centers. We repeat this step until all cluster centers are stable. In general, after each iteration, the quality of the clusters and the means themselves will essentially be improved.

K-medoids Clustering

K-medoids clustering [16.11] differs from k-means only in the way the cluster centers are chosen and represented (step 4): it finds new cluster centers by choosing, within each cluster, an existing data member that best represents the cluster, instead of calculating the average of the cluster members.
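A compact sketch of k-medoids over a precomputed DTW distance matrix; the toy series and the naive initialisation are illustrative choices:

```python
import numpy as np

def dtw(q, c):
    """Squared-cost DTW used as the clustering distance."""
    n, m = len(q), len(c)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = (q[i - 1] - c[j - 1]) ** 2 + \
                min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def k_medoids(series, k, iters=10):
    """k-medoids with a precomputed DTW distance matrix: cluster centers
    are always existing series, so no shape-averaging subroutine is needed."""
    dist = np.array([[dtw(a, b) for b in series] for a in series])
    medoids = list(range(k))                      # naive initialisation
    labels = np.argmin(dist[:, medoids], axis=1)
    for _ in range(iters):
        new_medoids = []
        for c in range(k):
            members = np.where(labels == c)[0]
            if len(members) == 0:                 # keep old medoid if empty
                new_medoids.append(medoids[c])
                continue
            # pick the member minimising total distance within its cluster
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids.append(int(members[np.argmin(within)]))
        if new_medoids == medoids:
            break
        medoids = new_medoids
        labels = np.argmin(dist[:, medoids], axis=1)
    return labels, medoids

series = [[0, 0, 1, 2, 1, 0], [0, 1, 3, 1, 0, 0],
          [5, 5, 6, 7, 6, 5], [5, 6, 8, 6, 5, 5]]
labels, medoids = k_medoids(series, 2)
```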

[Figures omitted. Captions: Figure 3. Examples of six species in the Leaf dataset. Figure 4. Examples of six-class Leaf images. Figure 6. Examples of four different Face profiles after conversion into time series. Figure 7. Tracking hand position in each video frame.]

K-means Clustering with DTW

It has been demonstrated that k-medoids clustering for multimedia time series data runs smoothly with DTW. In contrast, it has been observed [16] that if the k-means method is instead used in clustering, there is a high probability of failure compared to the k-medoids algorithm (which is probably why Euclidean averaging is often used for k-means shape averaging despite the use of DTW in cluster membership assignment). The paper [16] pointed out some interesting problems which occur when using k-means clustering with DTW. Future study may investigate how these problems can be resolved and come up with possible remedies for accurately averaging shape-based time series data.

IV. Time Series in Financial Applications

Financial time series data has its own characteristics relative to other time series data. One special characteristic is that it is typically characterized by a few critical points, and multi-resolution consideration is always necessary for long-term and short-term analyses. A second is that financial time series data is continuous, large, and unbounded. There are many technical analytical methods for financial time series data that identify patterns of market behavior. In these financial analytical methods, critical or extreme points, which the original SAX cannot handle, are very important to discover. To reduce the loss of these important points, an Extended SAX representation, aimed especially at financial data analysis and mining tasks, was devised by Lkhagva et al. [20]. The basic idea of the method proposed in the paper is based on two previously proposed representation techniques: the PAA and SAX representations.
Piecewise Aggregate Approximation (PAA)

Yi and Faloutsos [20.7] and Keogh et al. [20.4] independently proposed PAA. In PAA, each sequence of time series data is divided into k segments with equal length and the average value of each segment is used as a coordinate of a k-dimensional feature vector.

Figure 1: A time series C is represented by PAA (by the mean values of equal segments). In the example above, the dimensionality is reduced from n = 60 to k = 6.

The advantages of this transform are that 1) it is very fast and easy to implement, and 2) the index can be built in linear time. As shown in Figure 1, in order to reduce the time series from n dimensions to k dimensions, the data is divided into k equal-sized segments. The mean value of the data falling within a segment is calculated, and a vector of these values becomes the data-reduced representation. More formally, a time series C of length n can be represented in a k-dimensional space by a vector whose ith element is calculated by the following equation [20.4]:

ci = (k/n) * sum of cj, for j = (n/k)(i-1)+1 to (n/k)i    (1)

However, since the PAA approach reduces dimensionality by taking the mean values of equal-sized frames, this mean-value-based representation may miss some important patterns in some time series data analyses.
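When n is divisible by k, PAA reduces to a reshape-and-mean, as in this sketch:

```python
import numpy as np

def paa(series, k):
    """Piecewise Aggregate Approximation: split the series into k
    equal-length segments and keep each segment's mean value."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    assert n % k == 0, "this sketch assumes n is divisible by k"
    return series.reshape(k, n // k).mean(axis=1)
```

For example, a series of length 60 is reduced to 6 segment means, matching the n = 60, k = 6 reduction of Figure 1.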
Symbolic Aggregate Approximation (SAX)

Lin, Keogh et al. [20.3] proposed a new approach called SAX. SAX is based on PAA [20.4, 20.7] and assumes normality of the resulting aggregated values. SAX is the first symbolic representation of time series with an approximate distance function that lower-bounds the Euclidean distance. In SAX, the data is first transformed into the PAA representation, and then the transformed PAA representation is symbolized into a sequence of discrete strings. There are two important advantages to doing this:
Dimensionality Reduction: The dimensionality reduction of PAA [20.4, 20.7] is automatically carried over to this representation.
Lower Bounding: A lower-bounding distance measure between two symbolic strings can be established by simply pointing to the existing proofs for the PAA representation itself [20.4].
In order to obtain the string representation after a time series is transformed into the PAA representation, the symbolization regions must be determined. By empirically testing more than 50 datasets, it was found that normalized subsequences have a highly Gaussian distribution [20.3]. From this result, the breakpoints that produce equal-sized areas under the Gaussian curve are determined. Breakpoints are defined as follows.
Definition 1 [20.3] (Breakpoints): Breakpoints are a sorted list of numbers such that the areas under the N(0,1) Gaussian curve between consecutive breakpoints are all equal (with the first and last boundaries taken as negative and positive infinity, respectively). These breakpoints can be determined by looking them up in a statistical table, e.g.

Table 1: A lookup table that contains the breakpoints dividing a Gaussian distribution into an arbitrary number of equal-probability regions (here, from 3 to 5).

Using these breakpoints, a time series is discretized as in the following example. First, a PAA of the time series is obtained. Then, all PAA coefficients that are below the smallest breakpoint are mapped to the symbol A, all coefficients greater than or equal to the smallest breakpoint and less than the second smallest breakpoint are mapped to the symbol B, and so on. Figure 2 illustrates the idea.
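A minimal SAX sketch for alphabet size a = 3, using the standard breakpoints -0.43 and 0.43 from the lookup table (z-normalise, apply PAA, then map each coefficient through the breakpoints):

```python
import numpy as np

# Breakpoints for alphabet size 3: equal-area regions under N(0, 1),
# as read from the usual SAX lookup table.
BREAKPOINTS_3 = [-0.43, 0.43]

def sax(series, k, breakpoints=BREAKPOINTS_3, alphabet="ABC"):
    """Discretize a series into a k-symbol SAX word (assumes len(series)
    is divisible by k, as in the PAA sketch above)."""
    series = np.asarray(series, dtype=float)
    z = (series - series.mean()) / series.std()   # z-normalise
    segments = z.reshape(k, -1).mean(axis=1)      # PAA coefficients
    # below the smallest breakpoint -> 'A', next region -> 'B', etc.
    return "".join(alphabet[np.searchsorted(breakpoints, v)]
                   for v in segments)

word = sax([0] * 10 + [5] * 10 + [10] * 10, 3)   # -> "ABC"
```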

Figure 2: A time series is discretized by SAX. In the example above, with n = 60, k = 6 and a = 3, the time series is mapped to the word ABCBBA.

SAX also has some disadvantages: its dimensionality reduction has the potential to miss important patterns in some datasets, as depicted in Figure 3.


Figure 3: Financial time series data represented by SAX. Some important points (shown in red) are missing. (US$ and Japanese yen exchange rate data of 2 months.) The SAX representation is CFCBFD.

Figure 4: Financial time series data represented by Extended SAX. (US$ and Japanese yen exchange rate data of 2 months.) The Extended SAX representation is ACFFDFFCAABFFFFDCA.

The further modified technique, Extended SAX, has been proposed in [20], with the result depicted in Figure 4.

Review Of Contemporary Research Related To Representation And Processing Of Spatial Data
Data Representation Models and Concerns

A pictorial database plays an important role in many applications, including geographical information systems, computer-aided design, office automation, medical image archiving, and trademark picture registration. In such fields there is a need to manage geometric, geographic, or spatial data, which means data related to space. The space of interest can be, for example, the two-dimensional abstraction of (parts of) the surface of the earth (geographic space, the most prominent example), a man-made space like the layout of a VLSI design, a volume containing a model of the human brain, or another 3D space representing the arrangement of chains of protein molecules. Representation of relative spatial relations between objects is required in many multimedia database applications. Quantitative representation of spatial relations taking into account shape, size, orientation, and distance is often required. This cannot be accomplished by assimilating an object to elementary entities such as the centroid or the minimum bounding rectangle. Thus many authors have proposed numerous representations based on the notion of histograms of angles. There are many general-purpose content-based image retrieval systems, e.g. the QBIC system [21.6] and Photobook [21.14]. They mainly use color, texture, and shape as image features. However, representing the spatial relations between objects is also an important component of image content description and access. For example, the spatial relationship between brain lesions and anatomical brain structures in medical images is critically important for early disease diagnosis and thus important for image retrieval. Typical

applications of spatial relation representations are content-based image retrieval (e.g. [21.3, 21.8, 21.15, 21.17]), video indexing and retrieval (e.g. [21.5]), computer vision, robot navigation, and Geographic Information Systems (GIS). To assess the degree of similarity of two images according to the spatial relations between objects, we first need to extract a compact representation of spatial relations from the images, and then define a (dis)similarity measure (e.g. a distance function) on such representations. The ultimate goal is to answer queries like "find MR images similar to one with a lesion inside the frontal lobes", or "find surveillance video sequences similar to one in which a man walks from the middle of a room to the east side". Significant work has been reported on spatial relation representation. Many authors have stressed the importance of qualitative spatial relationships [21.4]. Approaches have been based on Allen's interval relations [21.1] (e.g. [21.13, 21.16]), 2D strings [21.3] and their variants, Attributed Relational Graphs (ARGs) (e.g. [21.15]), or the spatial orientation graph (e.g. [21.8]). All of these approaches assimilate an object to very elementary entities such as the centroid (e.g. [21.8, 21.9]) or the minimum bounding rectangle (e.g. [21.13]). This simplification cannot give a satisfactory modelling of the spatial relations. For example, projecting two objects onto each of the dimensions and considering each dimension separately is inadequate, because the two objects may not overlap at all even when their projections onto the x and y axes overlap simultaneously. In [21.12], Miyajima and Ralescu introduced the notion of the histogram of angles to represent directional relations.

I. Histogram based representation

In [21], a new histogram representation of spatial relations, called the R-Histogram, was introduced.
Here, we assume the images are segmented and each object is assigned a unique label i.e., we deal with symbolic images, as defined formally in [21.8]. The dissimilarity between two images is then defined by the distance between the two corresponding R-Histograms.
The R-Histogram

Given a reference object R and an object of interest A, the goal is to represent, quantitatively, the spatial relations between R and A. Consider the vector originating from a pixel x on the boundary of R to a pixel y on the boundary of A. If x and y don't coincide, we compute the angle between the x-axis of the coordinate frame and the vector xy. This angle, denoted by theta(x,y), takes values in [-pi, pi]. As in the histogram of angles [21.12], the set of angles from any pixel on the boundary of R to a pixel on the boundary of A expresses the directional relations between R and A. The novel idea introduced in this paper is the labeled distance. The labeled distance from x to y, denoted by LD(x, y), is defined as a pair (d(x, y), l(x, y)), where d(x, y) is the Euclidean distance from x to y and l(x, y) is defined in Table 1.


Here, column 1 describes whether pixel x is inside object A, and column 2 describes whether pixel y is inside object R. For the set of vectors originating from any pixel on the boundary of R to any pixel on the boundary of A, we construct a histogram as follows. Let x and y be pixels on the boundary of R and A, respectively. The bin H(I, J, L) is incremented as follows:

(1)

where AI is the range of angle values spanned by bin H(I, J, L), DJ is the range of distance values spanned by bin H(I, J, L), and L in {0, 1, 2, 3} is the label associated with the distance values spanned by bin H(I, J, L).

Then the histogram is normalized as follows:

(2)

where nA is the number of angle bins and nD the number of distance bins. The normalized histogram, denoted as RH(A,R), is defined to be the R-Histogram of object A relative to object R. An R-Histogram example is illustrated in Figure 2, where the x-axis is associated with angles and the y-axis with distances.


Figure 2: RH(A,R) for the two objects in Figure 1. Each quadrant is associated with a unique label.

Time Complexity concerns: Let N be the number of pixels in an image. We assume the objects are homeomorphic to a 2-ball. In the worst case, the number of pixels on the boundary of an object is O(N). Therefore, the computation of the R-Histogram takes O(N^2) time. If the objects are convex, the number of boundary pixels will be O(N^(1/2)) and the time complexity will drop to O(N).
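A sketch of the construction, assuming boundary pixel lists and inside/outside predicates are available from segmentation; the label coding and the normalisation to unit mass are simplifications of the paper's Table 1 and eq. (2), not its exact definitions:

```python
import numpy as np

def r_histogram(bR, bA, in_A, in_R, n_angle=4, n_dist=3, max_d=None):
    """Sketch of an R-Histogram: for every pair of boundary pixels
    (x on R's boundary, y on A's boundary), bin the angle of vector xy,
    its Euclidean length, and a 4-way label built from whether x lies
    inside A and y lies inside R.
    bR, bA: lists of (row, col) boundary pixels; in_A, in_R: predicates."""
    H = np.zeros((n_angle, n_dist, 4))
    pairs = [(x, y) for x in bR for y in bA if x != y]
    if max_d is None:
        max_d = max(np.hypot(y[0] - x[0], y[1] - x[1]) for x, y in pairs)
    for x, y in pairs:
        dy, dx = y[0] - x[0], y[1] - x[1]
        theta = np.arctan2(dy, dx)                    # angle in [-pi, pi]
        d = np.hypot(dx, dy)
        ai = min(int((theta + np.pi) / (2 * np.pi) * n_angle), n_angle - 1)
        di = min(int(d / max_d * n_dist), n_dist - 1)
        label = 2 * int(in_A(x)) + int(in_R(y))       # hypothetical coding
        H[ai, di, label] += 1
    return H / H.sum()                                # normalise to unit mass

# Two small, disjoint objects; boundary lists are tiny for illustration
bR = [(0, 0), (0, 1), (1, 0), (1, 1)]
bA = [(5, 5), (5, 6), (6, 5), (6, 6)]
H = r_histogram(bR, bA, in_A=lambda p: False, in_R=lambda p: False)
```

The nested loop over boundary pixel pairs makes the worst-case O(N^2) cost noted above directly visible.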
Distance Metric

The dissimilarity between two images is defined by the distance between corresponding R-Histograms. There are many histogram distance metrics. The distance metric used here is the histogram intersection. It is shown in [21.18] that when the histograms are normalized, the histogram intersection is given by

(3)

Future work may model the spatial relations of multiple objects in an image; R-Histograms can be used as the arc attributes in ARGs. Moreover, attempts are likely to improve the time complexity of R-Histogram computation and to investigate the possibility of extracting semantic meanings from R-Histogram representations.

II. Content Based Image Retrieval

Content-based image retrieval (CBIR) is the current trend in designing image database systems, as opposed to text-based image retrieval [22.7], [22.11], [22.14], [22.18], [22.23], [22.24], [22.25], [22.27]. The features used in content-based image retrieval can be roughly divided into two categories: low-level visual features (such as color, texture, and shape) and high-level features (such as pairwise spatial relationships between objects). Some examples of content-based image retrieval systems are QBIC [22.8], Virage [22.1], RetrievalWare [22.29], VisualSEEK [22.26], WaveGuide [22.17], and Photobook [22.21]. They allow users to retrieve similar pictures from a large image database based on low-level visual features. On the other hand, there is also a large group of researchers emphasizing image retrieval based on spatial relationships between objects [22.3], [22.4], [22.5], [22.10], [22.15], [22.16], [22.20], [22.22], [22.28].

The method of representing images is one of the major concerns in designing an image database system. An ideal representation method for symbolic pictures should provide image database systems with many important functions such as similarity retrieval, visualization, browsing, spatial reasoning, and picture indexing. One way of representing an image is to construct a symbolic picture for that image, which in turn is encoded into a 2D string [22.5]. The 2D string representation method opened up a new approach to spatial reasoning, picture indexing, and similarity retrieval. There are many follow-up research works based on the concept of the 2D string, such as the 2D C-string [22.15], [22.16] and the 2D C+-string [22.9]. In [22], we find a new scheme for encoding spatial relations called the 9-Direction SPanning Area (9D-SPA) representation method.
Overview Of Spatial Knowledge Representation

Binary spatial relationships between objects have been identified as one of the most important features for describing the contents of images [22.6]. For example, a query such as "finding all the pictures containing a house to the east of a tree" relies on spatial relations to retrieve the desired pictures. Different kinds of spatial knowledge representations have been proposed so far. Chang et al. [22.5] proposed the 2D string as a spatial knowledge representation to capture the spatial information about the content of a picture. The fundamental idea of the 2D string is to project the objects of a picture along the x- and y-directions to form two strings representing the relative positions of objects on the x- and y-axis, respectively. Since a 2D string preserves the spatial relationships between any two objects in a picture, it has the advantage of facilitating spatial reasoning. Moreover, since a query picture [22.6] can also be represented as a 2D string, the problem of similarity retrieval becomes a problem of 2D string subsequence matching. Jungert [22.12], Chang et al. [22.4], and Jungert and Chang [22.13] extended the idea of 2D strings to form 2D G-strings by introducing several new spatial operators to represent more relative positional relationships among the objects of a picture. The 2D G-string representation embeds more information about spatial relationships between objects and thus facilitates spatial reasoning about sizes and relative positions of objects. Following the same concept, Lee and Hsu [22.15] proposed the 2D C-string representation based on a special cutting mechanism. Since the number of subparts generated by this new cutting mechanism is reduced significantly, the lengths of the strings representing pictures are much shorter while still preserving the spatial relationships among objects. The 2D C-string representation is more economical in terms of storage space efficiency and navigation complexity in spatial reasoning.
The 2D C+-string representation [22.9] extended the 2D C-string representation by adding relative metric information about the picture to the strings. As a consequence, reasoning about relative sizes and locations of objects, as well as the relative distance between objects in a symbolic picture becomes possible. Chang [22.3] proposed a structure called 9DLT to encode the spatial relationships between objects in terms of nine directions. Since the 9DLT method uses centroid to represent the position of an object, such a representation is too sensitive in spatial reasoning. For example, the spatial relationships between the two objects shown in Figs. 1a, 1b, and 1c are all different in 9DLT representation; however, they seem not too much different in human visual perception. The representation of spatial relations proposed by Zhou and Ang [22.28] combines the nine directional relations proposed in 9DLT with the five topological relations, namely, disjoint, meet, partly-overlap, contain, and inside. The topological relation can record the 2D relationship between any two nonzero-sized objects with irregular

shapes and, therefore, makes spatial reasoning more accurate compared to using an MBR or centroid to represent an object. However, Zhou and Ang's method is still too sensitive when reasoning about directional relations. Instead of combining the nine directional relations with the five topological relations, the 2D-PIR proposed by Nabil et al. [22.19], [22.20] combines the 13 projection interval relations with the topological relations. Although 2D-PIR seems particularly useful in similarity retrieval, it does not provide any picture reconstruction mechanism for visualization. Moreover, incorporating 2D-PIR into any indexing structure is difficult. Thus, similarity retrieval based on 2D-PIR becomes inefficient as the volume of images in the database increases.
9D-SPA Representation

The picture has to be preprocessed first. We assume that the objects in a picture can be identified by some image segmentation and object recognition procedures. Various techniques of image segmentation and object recognition can be found in [22.2]. Suppose that a picture P contains n objects O1, O2, ..., On. Then, the 9D-SPA representation of P can be encoded as a set of 4-tuples: R = {(Oij, Dij, Dji, Tij) | Oi, Oj in P, and 1 <= i < j <= n}, where Oij is the code for the object-pair (Oi, Oj), Dij is the code for the direction relation between objects Oi and Oj with Oj as the reference object, Dji is the code for the direction relation between Oi and Oj with Oi as the reference object, and Tij is the code for the topological relation between Oi and Oj. It is obvious that the number of 4-tuples in R is n(n-1)/2.

Let Oi be the ith object in the image database (1 <= i <= n). We assign the integer i to object Oi as its object number. Then, Oij is called the object-pair code for the object-pair (Oi, Oj). Given two objects Oi and Oj with i < j, the object-pair code Oij can be computed as Oij = (j-1)(j-2)/2 + i.

To obtain the two object numbers i and j from Oij (i.e., to decode Oij), let a be the largest integer such that a(a-1)/2 < Oij; then j = a + 1 and i = Oij - a(a-1)/2.
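One consistent reading of the object-pair code is a standard bijection between the n(n-1)/2 index pairs (i, j), 1 <= i < j <= n, and the integers 1..n(n-1)/2, decoded by searching for the largest a with a(a-1)/2 below the code. A sketch (function names are ours, not from the paper):

```python
def encode_pair(i, j):
    """Object-pair code for 1 <= i < j: O_ij = (j-1)(j-2)/2 + i."""
    return (j - 1) * (j - 2) // 2 + i

def decode_pair(code):
    """Recover (i, j) from O_ij: a is the largest integer with a(a-1)/2 < code."""
    a = 1
    while (a + 1) * a // 2 < code:
        a += 1
    return code - a * (a - 1) // 2, a + 1

# Round-trip over every pair for a small database:
assert all(decode_pair(encode_pair(i, j)) == (i, j)
           for j in range(2, 20) for i in range(1, j))
```

Note the codes for a fixed j occupy the contiguous range (j-1)(j-2)/2 + 1 .. j(j-1)/2, which is what makes the "largest a" decoding rule work.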

Dij represents the value assigned to the directional relationship between objects Oi and Oj with Oj as the reference object. The value of Dij is determined by the following procedure. First, we find the Minimal Bounding Rectangle (MBR) for reference object Oj. Then, we extend the four boundaries of this MBR horizontally and vertically until

they cut the whole picture into nine neighborhood areas; each area is then assigned a binary code as shown in Table 1. The value of Dij is determined by the formula Dij = sum over k of wk * bk, where wk is the binary code of neighborhood

area k and bk = 1 if object Oi overlaps area k; otherwise, bk = 0. The value of Tij indicates the topological relationship between objects Oi and Oj. The possible values assigned to topological relations are: 0 (disjoint), 1 (meet), 2 (partly_overlap), 3 (cover), and 4 (contain or inside).
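The direction code can thus be viewed as a bitmask over the nine neighborhood areas. The sketch below assumes a particular area-to-bit assignment; the actual binary codes of Table 1 may order the areas differently:

```python
# Hypothetical weight table: each of the nine neighborhood areas around the
# reference MBR gets a distinct power of two. The concrete bit assignment of
# Table 1 in the paper may differ; only the sum-of-weights idea is essential.
AREA_WEIGHTS = {area: 1 << k for k, area in enumerate(
    ["SW", "S", "SE", "W", "C", "E", "NW", "N", "NE"])}

def direction_code(overlapped_areas):
    """D_ij = sum of w_k * b_k, with b_k = 1 for each area object O_i overlaps."""
    return sum(AREA_WEIGHTS[a] for a in overlapped_areas)

# Topological codes as listed in the text:
TOPOLOGY = {"disjoint": 0, "meet": 1, "partly_overlap": 2,
            "cover": 3, "contain_or_inside": 4}

# An object lying entirely north of the reference MBR spans NW, N, NE:
print(direction_code(["NW", "N", "NE"]))  # -> 448 under this bit assignment
```

Because each area contributes a distinct bit, the spanned areas can always be read back off the code, which is exactly what the worked example with DAB = 248 and DAB = 227 below exploits.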

Fig. 2. Pictures (a) and (b) are not distinguishable in all 2D+-string representations. However, the difference can be easily determined by the 9D-SPA representation.

Let us look at the two pictures shown in Figs. 2a and 2b. Assume that object B is the reference object in both pictures. Then, in Fig. 2a, the code for DAB is (00001000 + 00010000 + 00100000 + 01000000 + 10000000)2 = 248 and the code for TAB is 0. In Fig. 2b, the code for DAB is (00000001 + 00000010 + 00100000 + 01000000 + 10000000)2 = 227 and the code for TAB is 0. In 2D+-string representations, the pictures in Figs. 2a and 2b are not distinguishable because they have the same spatial representation (i.e., A%B in both the x- and y-directions). However, we can easily tell the difference between them by using the 9D-SPA representation, because DAB in Fig. 2a is 248, while DAB in Fig. 2b is 227. Moreover, from DAB = 248 = (11111000)2, we can easily determine that object A spans five neighborhood areas of object B, namely, the northwest, the west, the southwest, the south, and the southeast neighborhood areas, as shown in Fig. 2a. Similarly, from DAB = 227 = (11100011)2, we can easily determine that object A spans a different five neighborhood areas of object B, namely, the northeast, the east, the southeast, the south, and the southwest neighborhood areas, as shown in Fig. 2b.
III. Content Based Image Retrieval and Spatial Data Mining for Medical Data
In database systems supporting contemporary advanced applications like medical image analysis and disease detection and prediction systems, the techniques of content-based image retrieval and of spatial data mining in images are of much importance. Similar techniques are applicable to applications like surveillance systems and GIS-based decision support systems. As an example, we find the work by Chung and Wang [25], in which they discuss the creation of a skin cancer image database using a three-tier system.
Database Design

An automatic segmentation method for the images of skin tumors is developed in [25.2]. This method first reduces a color image into an intensity image and then finds an approximate segmentation by intensity thresholding. Finally, it refines the segmentation using image edges. One table is designed for this skin cancer database to store

the features of the tumors. Besides the tumor features, some other attributes are added to the table. These include a record number as the primary key of the table, the patient id number, the date the image was taken, the image id number identifying the image, and the image file name. Image file names are stored in the database instead of the image files themselves. Although images can be stored in the database as a BLOB type, this approach is more flexible because image files can be stored elsewhere, such as on a multimedia server. A DBMS can easily be integrated with multimedia servers. One advantage is that it is easy to integrate multimedia files with existing databases. Another advantage is that other non-database applications can access those multimedia files without going through the database. While performing browsing or content-based retrieval, Java applets find and display the images using the file names stored in the database. The skin cancer database can be used for medical information retrieval, expert diagnosis, and medical pattern discovery.
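A minimal sketch of such a table layout, with hypothetical column names and sample values (the paper does not give the exact schema); note that only the file name is stored, not the image bytes:

```python
import sqlite3

# Hypothetical schema following the description above: image files live
# outside the database (e.g., on a multimedia server); the table keeps only
# their names, together with ids, the acquisition date, and tumor features.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE skin_tumor (
        record_no    INTEGER PRIMARY KEY,
        patient_id   TEXT,
        taken_on     TEXT,       -- date the image was taken
        image_id     TEXT,
        image_file   TEXT,       -- file name only, not a BLOB
        irregularity REAL, asymmetry REAL, entropy REAL,
        energy REAL, homogeneity REAL, inertia REAL
    )""")
conn.execute("INSERT INTO skin_tumor VALUES (1, 'P-007', '2001-03-14', "
             "'IMG-42', 'img42.png', 1.8, 0.2, 4.1, 0.07, 0.6, 2.3)")
row = conn.execute("SELECT image_file FROM skin_tumor WHERE record_no = 1").fetchone()
print(row[0])  # -> img42.png
```

The browsing applets would then resolve `image_file` against the multimedia server, so non-database applications can reach the same files directly.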
Image Feature Definitions

Irregularity is associated with skin malignancies, including malignant melanoma, but it has so far been described only in subjective terms, such as jagged, notched, not smooth, or not round. One common way to measure irregularity is

I = p^2 / (4 * pi * A), where p and A are the perimeter and area of the tumor, respectively [25.3]. Asymmetry is determined about the near-axis of symmetry by comparing absolute area differences to the total area of the tumor shape [25.4]. Entropy is a feature which measures the randomness of the gray-level distribution. It is defined as Entropy = -sum over i,j of P[i,j] log P[i,j] [25.5], where P[i,j] is the gray-level co-occurrence matrix, defined by first specifying a displacement vector d = (dx, dy) and counting all pairs of pixels separated by d having gray levels i and j. Entropy is highest when all entries in P[i,j] are equal [25.5]. Energy is defined as Energy = sum over i,j of P[i,j]^2 [25.5]. Homogeneity is defined as Homogeneity = sum over i,j of P[i,j] / (1 + |i - j|) [25.5]. Inertia is defined as Inertia = sum over i,j of (i - j)^2 * P[i,j] [25.6].
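These features can be computed directly from the normalized co-occurrence matrix. The sketch below uses the standard co-occurrence definitions; log base 2 for entropy is an assumption (some authors use the natural log), and the irregularity measure is the common compactness-style formula, which equals 1 for a circle:

```python
import math

def irregularity(perimeter, area):
    """I = p^2 / (4*pi*A): 1 for a circle, larger for jagged boundaries."""
    return perimeter ** 2 / (4 * math.pi * area)

def glcm_features(P):
    """Texture features of a normalized gray-level co-occurrence matrix P[i][j]."""
    entropy = -sum(p * math.log2(p) for row in P for p in row if p > 0)
    energy = sum(p * p for row in P for p in row)
    homogeneity = sum(p / (1 + abs(i - j))
                      for i, row in enumerate(P) for j, p in enumerate(row))
    inertia = sum((i - j) ** 2 * p
                  for i, row in enumerate(P) for j, p in enumerate(row))
    return {"entropy": entropy, "energy": energy,
            "homogeneity": homogeneity, "inertia": inertia}

P = [[0.25, 0.25], [0.25, 0.25]]       # uniform matrix: entropy is maximal
print(glcm_features(P)["entropy"])     # -> 2.0 for a uniform 2x2 matrix
```

The uniform matrix illustrates the remark in the text: entropy peaks when all entries of P[i, j] are equal.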
Database Browsing and Retrieval

The project is implemented in a three-tier architecture: the applets running in the browser form the front layer, the web server is the middle layer, and the backend database server is the third layer. JDBC-ODBC is used for the

communication between the web server and the database server. Users can retrieve images by their content, i.e., by specifying attribute values or by using a synthesized color. Data mining, also referred to as knowledge discovery in databases, is a process of nontrivial extraction of implicit, previously unknown, and potentially useful information from databases. Its goal is to extract significant patterns or interesting rules from databases. Data mining can be broadly classified into three categories: classification (clustering), finding rules that partition the database into finite, disjoint, and previously known (unknown) classes; sequences, extracting commonly occurring sequences in ordered data; and association rules (a form of summarization), finding the sets of most commonly occurring groupings of items [25.7]. In the project, mining association rules in a skin cancer database has been implemented.
IV. Image Analysis for Brain Data
Studies of schizophrenia, Parkinson's disease, Alzheimer's disease, and other illnesses caused by disruption of brain functions are often based on collections of brain images, usually obtained at different resolutions through computed tomography for human subjects, or through surgical procedures for other species. Atlases have been a common way to organize such image series. Multiple examples of such atlases with corresponding image segmentation and 2D and 3D visualization techniques have been developed [e.g., for mouse brain: 24.3, 24.4, 24.8, 24.12, 24.13]; several are available online [24.11]. A comprehensive list of available brain data sources and atlases is maintained by [24.6]. In [24], principles and techniques that enable spatial data interoperability, including spatial registration, discovery, query, and visualization across brain data sources, have been explored.
V.
XML-Based Spatial Data Management for Geo-Computation in Distributed Systems
Today, research interests in geo-computation, such as data mining and knowledge discovery, place increasing demands on the data infrastructure of geo-computation. A new data infrastructure which is distributed, extensible, and platform-independent is needed to provide more powerful and flexible data services for geo-computation research and applications. Grid computing is a research agenda which evolved from distributed computing and meta-computing. It aims to provide virtual computation resources by decoupling the power of resources from the underlying computer hardware and software. When grid computing technology is adopted in the research domain of spatial information and geo-computation, the result is the Spatial Information Grid (SIG) [26.6]. In SIG, the computing power, data, models, algorithms, and other resources are shared and assembled as abstract resources through a series of middleware, toolkits, and infrastructure. It promises to be a powerful and easy-to-use infrastructure for spatial information applications. An SIG-based four-layered data infrastructure built up from data nodes, data sources, data agencies, support libraries, and other components is proposed in [26]. It is distributed in both geography and management: the infrastructure has many service nodes distributed geographically, and the nodes are managed in a distributed manner by their owners instead of by a centralized organization. The SIG-based data infrastructure is specified in XML Schema for cooperation and implemented in the Java language, so that it can run on almost any platform and supports any type of data source.

Architecture

The SIG-based data infrastructure also adopts SOA (service-oriented architecture) as its cornerstone, and regards all components, including data sources and data agencies, as web services. Logically, it takes the four-layered architecture shown in Figure 1. By invoking a web service with a well-defined XML-based protocol, data stored in a data node can be searched and accessed. In order to keep the design of the data infrastructure simple and neat, the data agencies are required to adopt the same protocol as the data sources. It is called the eXtensible Data Accessing Language, or XDAL for short.
The eXtensible Data Accessing Language
The data infrastructure shares spatial data of different types, with different formats, and for different goals in a uniform infrastructure. Because the data are often stored on different platforms, the data sources have to be invoked through a platform-independent protocol such as SOAP. Furthermore, a well-defined, extensible data accessing language which suits any data source and any data type is needed for data source access over SOAP. To make it platform-independent, XML is adopted as its format. There are several frequently used operations on a data source or data agency: searching data, downloading data, and querying its capability description. The grammar and usage rules of the requests and responses for these operations are standardized by the well-defined XDAL in XML Schema. Users accomplish an operation by invoking the web service provided by a data source or data agency, passing the request to it, and analyzing the response for the result. In XDAL, a REQUEST must have a root element named <query>, <access>, <getCapability>, <getStatus>, or <getResult>, for the functions of searching data, downloading data, and getting the capability description of a data source, and for the commands of getting the operation status and getting the operation result.
A RESPONSE must have a root element named <response>, <status>, or <result>, for the responses of starting an operation, getting the operation status, and getting the operation result. Figure 3 is a sample of searching data from the satellite Landsat-7 with a given acquisition date.
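A request along the lines of the XDAL <query> described above might be assembled as follows. Only the root element names are given in the text, so the child elements (<source>, <condition>, <field>, <value>) are illustrative assumptions, not the actual XDAL schema:

```python
import xml.etree.ElementTree as ET

# Sketch of an XDAL-style <query> request searching Landsat-7 scenes by
# acquisition date. Child element names are our assumptions; a real XDAL
# request would follow the XML Schema published with the infrastructure.
query = ET.Element("query")
ET.SubElement(query, "source").text = "Landsat-7"
cond = ET.SubElement(query, "condition")
ET.SubElement(cond, "field").text = "acquisitionDate"
ET.SubElement(cond, "value").text = "2004-06-01"
request = ET.tostring(query, encoding="unicode")
print(request)
```

Such a document would be carried in a SOAP envelope to the data source's web service, whose <response>/<status>/<result> documents are then parsed the same way.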


Considering the extensibility of the system, XDAL is designed as an XML-based extensible language. Both the requests <query>, <access>, and <getCapability> and the response <result> can be extended considerably. By extending the requests and responses, XDAL can suit almost all kinds of data sources and geo-computation applications.

The paper further describes some useful data sources, data agencies, and support libraries. A test of the user interface of the data infrastructure shows that it can organize the distributed data nodes and data agencies dynamically; build an extensible, robust, and autonomic data infrastructure; and serve users one-stop as an organic whole.

On Management of Object Databases in Distributed, Real-Time Environments


I. High-Performance Data Management Support for Real-Time Visualization of Time-Varying Flow Fields
Aneurysm surgery remains dangerous because surgeons have limited knowledge of the 3D geometry of an aneurysm and its complex, time-dependent hemodynamic factors such as flow, shear force, and pressure. This information is essential to determine whether the aneurysm is suitable for a certain surgical procedure. The handling of large amounts of data in real-time virtual reality visualization systems is an important and complex problem of high significance. Navigation and exploration of such large datasets stress computational resources, requiring users and visualization systems to make tradeoffs between time, space, and flexibility. To make it possible for physicians to obtain such information, Liu and Karplus [26] have designed and developed a Virtual Aneurysm (VA) system that supports an interactive exploration environment suited to the particular needs of brain aneurysm specialists and directly assists them in their investigations. The VA system mainly consists of a client-server configuration that provides an immersive environment allowing a physician to move around and into an aneurysm and interactively navigate to explore its complex, computer-simulated fluid dynamics within the vascular system using virtual reality and scientific visualization techniques [26.3].
VA System Description in Brief
The VA system is based on the numerical solutions of the Navier-Stokes equations for the case of three-dimensional time-varying flows. Flow simulations are computed over time as the heart goes through its pumping cycle.
To ensure numerical stability, simulations are computed using a small time step size, such that only a very small fraction of the total data changes value from one step to the next. Adding the time dimension drastically increases the dataset size, increasing storage requirements and computational complexity. Simulations typically run for tens or hundreds of hours on high-performance computing machines and periodically generate snapshots of states. The large quantities of simulated data are subsequently stored in archives on disk. After the data are off-loaded, they are analyzed and post-processed using scientific visualization and animation techniques to explore the evolving state of the simulated fluid dynamics within the vascular system from local graphics workstations. Data of such unprecedented size often exceed the memory and performance capacity of typical desktop graphics workstations. The frame rate is the frequency with which the renderer processes new frames. The frame rate of a visualization should be kept as constant and as high as possible so that the animation is smooth. The greater the frame


rate, the more frames there are and the more work is required to produce the animation. Typical visualizations require a significant amount of I/O bandwidth for accessing data at different time steps when there is not enough memory space for the entire time sequence. The results of data access must be communicated to the graphics workstations for display, which not only causes significant data movement across slow networks, but must also support complex human-computer interactions.
Data Representation
Successful visualization systems must be designed to handle datasets of arbitrary size. The authors exploit a method for producing a hierarchy of representations of reduced data at various levels of detail, which retains as many as possible of the essential features of the original data while being small enough to be loaded one chunk at a time into main memory. This multiresolution data reduction method is further expanded to allow any number of data variables, through an algorithm that constructs octree-like data structures while measuring the error introduced in the multivariate data, ensuring that the errors do not exceed a pre-defined upper bound.
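A minimal sketch of such an error-bounded multiresolution reduction follows; the names, the scalar (single-variable) field, and the cube-shaped input are our simplifying assumptions (the actual system handles multivariate data):

```python
# Error-bounded reduction: a cubic block of samples collapses to its mean
# when the worst-case deviation from that mean stays within max_err;
# otherwise the block splits into eight octants, recursively.
def build_octree(data, x, y, z, size, max_err):
    vals = [data[x + i][y + j][z + k]
            for i in range(size) for j in range(size) for k in range(size)]
    mean = sum(vals) / len(vals)
    if size == 1 or max(abs(v - mean) for v in vals) <= max_err:
        return ("leaf", mean)                      # reduced representation
    h = size // 2
    return ("node", [build_octree(data, x + dx, y + dy, z + dz, h, max_err)
                     for dx in (0, h) for dy in (0, h) for dz in (0, h)])

# 2x2x2 field, constant except one corner: whether it collapses depends on
# the allowed error bound.
data = [[[1, 1], [1, 1]], [[1, 1], [1, 9]]]
print(build_octree(data, 0, 0, 0, 2, 0.5)[0])  # -> node (must split)
print(build_octree(data, 0, 0, 0, 2, 8.0)[0])  # -> leaf (within the bound)
```

Tightening `max_err` deepens the tree around features, which is what lets regions of interest be displayed at higher resolution than the rest.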

Figure 1: Data flows through the visualization pipeline.

Figure 2: Schematic of the flow visualization environment.

Octrees, like quadtrees, are hierarchical data structures based on decomposition of space. In quadtrees, space is recursively subdivided into four subregions [26.6]. Octrees are three-dimensional extensions of quadtrees, where space is recursively subdivided into eight subvolumes [26.4]. The octree-based approach illustrates the advantage of a regular partition of the 3D space. A hierarchical partition of the space into octants and suboctants, down to any desired level of granularity, provides a general-purpose scheme for organizing the space as a skeleton to which any kind of spatial data can be attached for systematic access. This skeleton supports multi-resolution

visualization of large three-dimensional datasets, so that the current regions of interest are always displayed at a higher resolution than the rest. In many instances, these higher-resolution regions make up a relatively small percentage of the entire data, which accelerates the visualization with only slight degradation of image quality. The notion of multi-resolution is also a key concept for controlling the traffic of network connections to guarantee quality of service (QoS) in many multimedia applications. Octree nodes are partitioned and then restructured into many page-sized blocks for efficient storage on disk. For simple partitioning, tree nodes are visited in depth-first order and accumulated into the current block as long as the number of nodes does not exceed the block size; the traversal recursively descends the tree and continues. If a node would overfill the current block, the current block is closed, leaving it slightly unfilled, and work starts on the next block. Because the page size can be controlled, data of any size can be run on a computer of any size, with good scalability as the computer system grows in memory, computational power, and data bandwidth. Furthermore, each page-sized chunk of data is loaded into main memory one time step ahead of when it is actually required, resulting in smooth streaming of data from the storage device to the graphics pipeline.
II. Data Management for Distributed Moving Object Databases
The need to manage massive volumes of continuously produced information is increasing rapidly in many emerging applications, such as Location-Based Services (LBS), stream data processing, wireless sensor networks, and RFID-enabled ubiquitous computing. To realize location-based services, it is essential to develop efficient management schemes for the location information of a heavy volume of moving objects, where a moving object can be anything that changes its position, including a human, a piece of equipment, or a vehicle.
There have been many related research efforts, but most current research activities are single-node oriented, making it difficult to handle the extreme situation of a very large volume, at least millions, of moving objects. The architecture named the Gracefully Aging Location Information System (GALIS) is a cluster-based distributed computing system architecture which consists of multiple data processors, each dedicated to keeping records relevant to a different geographical zone and a different time zone. Much further work has been done in [27, 27.7, 27.8, 27.9, 27.10].
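The zone-per-processor idea behind such architectures can be sketched as follows; the rectangular zone grid, node class, and update routine are illustrative assumptions, not the actual GALIS design:

```python
# Sketch of zone-based routing in the spirit of GALIS: each processor owns a
# rectangular geographic zone and receives only the location updates of
# objects currently inside that zone (grid layout is our assumption).
def zone_of(x, y, zone_w, zone_h, cols):
    """Index of the grid zone containing position (x, y)."""
    return int(y // zone_h) * cols + int(x // zone_w)

class ZoneNode:
    def __init__(self):
        self.positions = {}              # object id -> latest (x, y)

    def update(self, oid, x, y):
        self.positions[oid] = (x, y)

nodes = [ZoneNode() for _ in range(4)]   # 2x2 grid of 100x100-unit zones
for oid, x, y in [("car-1", 30, 40), ("bus-7", 150, 60), ("car-2", 120, 130)]:
    nodes[zone_of(x, y, 100, 100, cols=2)].update(oid, x, y)
print(sorted(nodes[1].positions))  # -> ['bus-7']
```

In a real system an object crossing a zone boundary would also trigger a handover of its record between nodes, and older records would age out to longer-term stores.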

CONCLUSION
Various data processing and computing needs, based on the different data characteristics and processing typicality posed by the requirements of different applications and computing environments, have been studied and presented above. In most cases, an approach addressing the needs of a particular application requires focusing on the design of a proper data representation (data structure), with regard to the efficiency of the various operations to be performed over that data, as expected by users of the application. As we can observe, the major operations include search over data. To provide efficient search, the core data representation may be supported with proper indexing. Further, recent applications mostly demand descriptive as well as predictive learning from the data sets. This requires techniques for such queries over data forms ranging from static text, image, audio, and raw

bit streams to time-varying data, e.g., for moving objects. To solve such problems, several techniques have been applied, devised, invented, and are being studied. Integration of pattern recognition, machine learning, and image processing techniques [30,31,32,33,34] with upcoming computational intelligence and optimization techniques holds much promise for meeting the growing needs of the industry. Among the various applications studied, improvement of spatial data mining techniques has been found to be of particular importance. It addresses a wide range of application areas including multimedia applications, geographic information systems (GIS), medical image processing, surveillance systems, and many more. Statistical methods play a vital role in all of the above solutions. For example, the recently growing techniques of swarm intelligence, an extension of evolutionary computing, blend algorithm design techniques with statistical techniques [24,36,37,38,39,40]. The Particle Swarm Optimization (PSO) technique, a population-based stochastic optimization method, is a remarkable one for solving many such problems with better efficiency. Further, the development of programming languages and environments that expose the functionality devised (by applying the underlying mathematical and computing techniques) for a particular application area to its users through user-friendly interfaces is also in demand. All such efforts may make a considerable contribution to the development of relational, object-oriented, and object-relational database management systems, adding to their applicability in the respective fields of contemporary applications.

References:
[1] Heiko Schwarz, Detlev Marpe, and Thomas Wiegand, Overview of the Scalable H.264/MPEG4-AVC Extension, 2007, Fraunhofer Institute for Telecommunications Heinrich Hertz Institute, Image Processing Department. [2] Livio Lima, Daniele Alfonso, Luca Pezzoni, Riccardo Leonardi, New Fast Search Algorithm for the Base Layer of the H.264 Scalable Video Coding Extension, 2007 Data Compression Conference (DCC'07), IEEE. [5] Jeongkyu Lee, Department of Computer Science and Engineering, University of Bridgeport, A Graph-based Approach for Modeling and Indexing Video Data, Proceedings of the Eighth IEEE International Symposium on Multimedia (ISM'06). [6] Zhan Chaohui, Duan Xiaohui, Xu Shuoyu, Song Zheng, Luo Min, An Improved Moving Object Detection Algorithm Based on Frame Difference and Edge Detection, Fourth International Conference on Image and Graphics, 2007. [7] Seung-Ho Lim, Man-Keun Seo and Kyu Ho Park, Scrap: Data Reorganization and Placement of Two-Dimensional Scalable Video in a Disk-Array-based Video Server, Computer Engineering Research Laboratory, Department of Electrical Engineering and Computer Science, KAIST, Ninth IEEE International Symposium on Multimedia 2007 - Workshops. [8] Siddhartha Chattopadhyay, Suchendra M. Bhandarkar, and Kang Li, Human Motion Capture Data Compression by Model-Based Indexing: A Power Aware Approach, IEEE Transactions on Visualization and Computer Graphics, Vol. 13, No. 1, January/February 2007. [9] Yu-lung Lo, Chun-hsiung Wang, Hybrid Multi-Feature Indexing for Music Data Retrieval, 6th IEEE/ACIS International Conference on Computer and Information Science (ICIS 2007).
[10] Daniel Howard, Joseph Kolibal, Image Analysis by Means of the Stochastic Matrix Method of Function Recovery, 2007 ECSIS Symposium on Bio-inspired, Learning, and Intelligent Systems for Security. [11] Peng Tang, Lin Gao and Zhifang Liu, Salient Moving Object Detection Using Stochastic Approach Filtering, Fourth International Conference on Image and Graphics, IEEE, 2007. [12] Jian-Kang Wu, Content-Based Indexing of Multimedia Databases, IEEE Transactions on Knowledge and Data Engineering, Vol. 9, No. 6, November/December 1997. [13] S.R. Subramanya, Rahul Simha, B. Narahari, Abdou Youssef, Transform-Based Indexing of Audio Data for Multimedia Databases, IEEE, 1997. [14] Jusub Kim, Joseph JaJa, Information-Aware 2n-Tree for Efficient Out-of-Core Indexing of Very Large Multidimensional Volumetric Data.

[15] Yukio Hiranaka, Hitoshi Sakakibara and Toshihiro Taketa, Universal Communication Format for Multimedia Data,

Proceedings of the Sixth International Conference on Computational Intelligence and Multimedia Applications (ICCIMA'05). [16] Vit Niennattrakul, Chotirat Ann Ratanamahatana, On Clustering Multimedia Time Series Data Using K-Means and Dynamic Time Warping, 2007 International Conference on Multimedia and Ubiquitous Engineering (MUE'07). [17] Jian Yin, Duanning Zhou and Qiong-Qiong Xie, A Clustering Algorithm for Time Series Data, Proceedings of the Seventh International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT'06), 2006. [18] Guoqiang Xiong, Qingjing Gao, An Algorithm of Similarity Mining in Time Series Data on the Basis of Grey Markov SCGM(1,1) Model, 2007 IFIP International Conference on Network and Parallel Computing - Workshops,

2007. [19] Ryo Yoshida, Seiya Imoto, Higuchi, Estimating Time-Dependent Gene Networks from Time Series Microarray Data by Dynamic Linear Models
with Markov Switching, Proceedings of the 2005 IEEE Computational Systems Bioinformatics Conference (CSB'05). [20] Battuguldur Lkhagva, Yu Suzuki and Kyoji Kawagoe, New Time Series Data Representation ESAX for Financial Applications, Proceedings of the 22nd International Conference on Data Engineering Workshops (ICDEW'06), 2006. [21] Yuhang Wang, Fillia Makedon, R-Histogram: Quantitative Representation of Spatial Relations for Similarity-Based Image Retrieval. [22] Po-Whei Huang and Chu-Hui Lee, Image Database Design Based on 9D-SPA Representation for Spatial Relations, IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 12, December 2004. [23] Keith Marsolo, Michael Twa, Classification of Biomedical Data Through Model-based Spatial Averaging, Proceedings of the 5th IEEE Symposium on Bioinformatics and Bioengineering (BIBE'05), 2005. [24] Ilya Zaslavsky, Haiyun He, Joshua Tran, Maryann E. Martone, Amarnath Gupta, Integrating Brain Data Spatially: Spatial Data Infrastructure and Atlas Environment for Online Federation and Analysis of Brain Images, Proceedings of the 15th International Workshop on Database and Expert Systems Applications (DEXA'04), 2004. [25] Soon M. Chung and Qing Wang, Content-based Retrieval and Data Mining of a Skin Cancer Image Database, Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC '01), 2001. [26] Damon Liu and Walter Karplus, Data Management for Exploring Complex Time-Dependent Flow Datasets, Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC '01), 2001.
[27] Ho Lee, Jaeil Hwang, Joonwoo Lee, Seungyong Park, Chungwoo Lee, Yunmook Nah, Long-term Location Data Management for Distributed Moving Object Databases, Proceedings of the Ninth IEEE International Symposium on Object and Component-Oriented Real-Time Distributed Computing, 2006. [28] Narendra Ahuja, Jack Veenstra, Generating Octrees from Object Silhouettes in Orthographic Views, IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 11, No. 2, February 1989, p. 137. [29] Qingmin Shi, Joseph JaJa, Isosurface Extraction and Spatial Filtering Using Persistent Octree (POT), IEEE Transactions on Visualization and Computer Graphics, Vol. 12, No. 5, September/October 2006. e-Books/Books available: DM/Machine Learning/Image Processing. [30] Ian H. Witten & Eibe Frank, Data Mining: Practical Machine Learning Tools and Techniques, 2/e, Morgan Kaufmann Publishers. [31] Daniel T. Larose, Discovering Knowledge in Data: An Introduction to Data Mining, Wiley Interscience. [32] Acharya and Ray, Image Processing: Principles and Applications, Wiley Interscience. [33] Image Representation, Indexing and Retrieval Based on Spatial Relationships and Properties of Objects, a dissertation presented to the faculty of the Department of Computer Science of the University of Crete in partial fulfillment of the requirements for the degree of Doctor of Philosophy by Euripides G.M. Petrakis.

[34] Image Processing in C, 2/e, Dwayne Phillips, R&D Publications /Miller-Freeman Inc./ CMP Media Inc. [35] MATLAB Recipes for Earth Sciences, 2/e, Martin H. Trauth, Springer. More research papers: Statistical Techniques
[36] Tiago Sousa, Ana Neves, Arlindo Silva, Swarm Optimisation as a New Tool for Data Mining, Proceedings of the International Parallel and Distributed Processing Symposium (IPDPS03) [37] Gianluigi Folino, Agostino Forestiero, Giandomenico Spezzano, Swarming Agents for Discovering Clusters in Spatial Data, Proceedings of the Second International Symposium on Parallel and Distributed Computing ISPDC03) [38] Data Clustering: A Review, A.K. JAIN, M.N. MURTY,P.J. FLYNN, ACM Computing Surveys, Vol. 31, No. 3, September 1999 [39] Bin Gao Tie-Yan Liu Wei-Ying Ma, Star-Structured High-Order Heterogeneous Data Co-clustering based on Consistent Information Theory, Proceedings of the Sixth International Conference on Data Mining (ICDM'06)

[40] Bijan Bihari Misra, Suresh Chandra Satapathy, P. K. Dash, Particle Swarm Optimized Polynomials for Data Classification, Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications (ISDA'06). [41] Ofer Miller, Ety Navon, Amir Averbuch, Tracking of Moving Objects Based on Graph Edges Similarity, ICME 2003. References within references: [5.1] A. Dempster, N. Laird, and D. Rubin. Maximum likelihood from incomplete data via the EM algorithm (with discussion). J. R. Statist. Soc. B, 39:1-38, 1977. [5.2] S. Lu, M. Lyu, and I. King. Video Summarization by Spatial-Temporal Graph Optimization. In Proceedings of the 2004 International Symposium on Circuits and Systems, volume 2, pages 197-200, Vancouver, Canada, May 2004. [5.3] J. Lee, J. Oh, and S. Hwang. STRG-Index: Spatio-Temporal Region Graph Indexing for Large Video Databases. In Proceedings of the 2005 ACM SIGMOD, pages 718-729, Baltimore, MD, June 2005. [5.4] H. T. Chen, H. Lin, and T. L. Liu. Multi-object tracking using dynamical graph matching. Proc. of the 2001 IEEE Conf. on CVPR, pages 210-217, 2001. [6.1] Wang Junqing, Shi Zelin, and Huang Shabai, Detection of Moving Targets in Video Sequences. Opto-Electronic Engineering, Dec 2005, pp. 5-8. [6.2] Ren Mingwu and Sun Han, A Practical Method for Moving Target Detection Under Complex Background. Computer Engineering, Oct 2005, pp. 33-34. [6.3] Milan Sonka, Vaclav Hlavac, and Roger Boyle, Image Processing, Analysis, and Machine Vision (Second Edition), Posts & Telecom Press, Beijing, Sep 2003. [6.4] Zhang Yunchu, Liang Zize, Li En, and Tan Min, A Background Reconstruction Algorithm Based on C-means Clustering for Video Surveillance, Computer Engineering and Application, 2006, pp. 45-47. [7.1] P. Shenoy and H. M. Vin. Efficient support for interactive operations in multiresolution video servers. ACM Multimedia Syst., 7(3):241-253, Nov. 1999. [7.2] S. Lim, Y. Jeong and K. Park.
Interactive media server with media synchronized RAID storage system. Proc. of International Workshop on Network and Operating Systems Support for Digital Audio and Video 2005, Jun. 2005. [7.3] E. Chang and A. Zakhor. Disk-based storage for scalable video. IEEE Trans. on circuits and systems for video technology, 7(5):758.770, Oct. 1997. [7.4] R. Rangaswami, Z. Dimitrijevic, E. Chang and S.-H. G. Chan. Fine-grained Device Management in an Interactive Media Server. IEEE Transactions on Multimedia, Vol. 5, No. 4, pages 558-569, Dec. 2003. [7.5] S. Kang, Y. Won and S. Roh. Harmonic placement: file system support for scalable streaming of layer encoded object. Proc. of International Workshop on Network and Operating Systems Support for Digital Audio and Video 2006, May 2006. [8.1] ISO/IEC 14496-1:1999, Coding of Audio-Visual Objects, Systems, Amendment 1, Dec. 1999. [8.2] ISO/IEC 14496-2:1999, Coding of Audio-Visual Objects, Visual, Amendment 1, Dec. 1999. [8.3] M. Preda, A. Salomie, F. Preteux, and G. Lafruit, Virtual Character Definition and Animation within the MPEG-4 Standard, 3D Modeling and Animation: Synthesis and Analysis Techniques for the Human Body, M. Strintzis and N. Sarris, eds., chapter 2, pp. 27-69, IRM Press, 2004. [8.4] S. Chattopadhyay, S.M. Bhandarkar, and K. Li, Efficient Compression and Delivery of Stored Motion Data for Virtual Human Animation in Resource Constrained Devices, Proc. ACM Conf. Virtual Reality Software and Technology (VRST 05), pp. 235- 243, Nov. 2005. [8.5] M. Endo, T. Yasuda, and S. Yokoi, A Distributed Multi-User Virtual Space System, IEEE Computer Graphics and Applications, vol. 23, no. 1, pp. 50-57, Jan./Feb. 2003. [8.6] T. Hijiri, K. Nishitani, T. Cornish, T. Naka, and S. Asahara, A Spatial Hierarchical Compression Method for 3D Streaming Animation, Proc. Fifth Symp. Virtual Reality Modeling Language (Web3D-VRML), pp. 95-101, 2000. [8.7] T. Giacomo, C. Joslin, S. Garchery, and N. 
Magnenat-Thalmann, Adaptation of Facial and Body Animation for MPEG-Based Architectures, Proc. Intl Conf. Cyberworlds, p. 221, 2003. [8.8] A. Aubel, R. Boulic, and D. Thalmann, Animated Impostors for Real-Time Display of Numerous Virtual Humans, Proc. First Intl Conf. Virtual Worlds (VW 98), vol. 1434, pp. 14-28, 1998. [8.9] O. Arikan, Compression of Motion Capture Database, Proc. ACM Trans. Graphics (ACM TOG), vol. 25, no. 3, pp. 890-897, 2006. [9.1] James C.C. Chen and Arbee L.P. Chen, Query by Rhythm An Approach for Song Retrieval in Music Databases, In Proc. Of Intl Workshop on Research Issues in Data Engineering, Pages 139-146, 1998. [9.2] Arbee L.P. Chen, M. Chang, J. Chen, J.L. Hsu, C.H. Hsu, and Spot Y.S. Hua, Query by Music Segments: An Efficient Approach for Song Retrieval, In Proc. Of IEEE Intl Conf. on Multimedia and Expro, 2000.

[9.4] J.L. Hsu, C.C. Liu, and Arbee L.P. Chen, Efficient Repeating Pattern Finding in Music Databases, In Proc. of ACM Int'l Conf. on Information and Knowledge Management, 1998.
[9.5] C.L. Krumhansl, Cognitive Foundations of Musical Pitch, Oxford University Press, New York, 1990.
[9.6] W. Lee and A.L.P. Chen, Efficient Multi-Feature Index Structures for Music Data Retrieval, In Proc. of SPIE Conf. on Storage and Retrieval for Image and Video Database, 2000.
[9.7] Chia-Han Lin and Arbee L.P. Chen, Indexing and Matching Multiple-Attribute Strings for Efficient Multimedia Query Processing, IEEE Transactions on Multimedia, Vol. 8, No. 2, April 2006.
[9.8] C.C. Liu, J.L. Hsu, and Arbee L.P. Chen, Efficient Theme and Non-Trivial Repeating Pattern Discovering in Music Databases, In Proc. of IEEE Data Engineering, pages 14-21, 1999.
[9.9] C.C. Liu, J.L. Hsu, and Arbee L.P. Chen, An Approximate String Matching Algorithm for Content-Based Music Data Retrieval, In Proc. of IEEE Int'l Conf. on Multimedia Computing and Systems, pages 451-456, 1999.
[9.10] Yu-lung Lo and Shiou-jiuan Chen, The Numeric Indexing for Music Data, in Proc. of the IEEE 22nd ICDCS Workshops, the 4th Int'l Workshop on Multimedia Network Systems and Applications (MNSA 2002), Vienna, Austria, pages 258-263, July 2002.
[9.11] Yu-lung Lo and Shiou-jiuan Chen, Multi-feature Indexing for Music Data, in Proc. of the IEEE 23rd ICDCS Workshops, the 5th Int'l Workshop on Multimedia Network Systems and Applications (MNSA 2003), Providence, Rhode Island, USA, pages 654-659, May 19-22, 2003.
[9.12] Yu-lung Lo, Ho-cheng Yu, and Mei-chin Fan, Efficient Non-trivial Repeating Pattern Discovering in Music Databases, Tamsui Oxford Journal of Mathematical Sciences, Vol. 17, No. 2, pages 163-187, Nov. 2001.
[11.1] D. Koller, J. Weber, and J. Malik, Robust Multiple Car Tracking with Occlusion Reasoning, in Proc. ECCV '94, Stockholm, Sweden, 1994.
[11.2] L. Wixson, Detecting salient motion by accumulating directionally-consistent flow, IEEE Trans. Pattern Analysis and Machine Intelligence, 2000, 22(8): pp. 774-780.
[11.3] Y.-L. Tian and A. Hampapur, Robust Salient Motion Detection with Complex Background for Real-time Video Surveillance, in IEEE Computer Society Workshop on Motion and Video Computing, 2005, Breckenridge, Colorado.
[11.4] A. Monnet, A. Mittal, and N. Paragios, Background modeling and subtraction of dynamic scenes, in Proc. ICCV 2003, pp. 1305-1312.
[11.5] C.R. Wren, et al., Pfinder: real-time tracking of the human body, IEEE Trans. Pattern Analysis and Machine Intelligence, 1997, 19(7): pp. 780-785.
[11.6] C. Stauffer and W.E.L. Grimson, Learning patterns of activity using real-time tracking, IEEE Trans. Pattern Analysis and Machine Intelligence, 2000, 22(8): pp. 747-757.
[11.7] A. Elgammal, D. Harwood, and L.S. Davis, Non-parametric Model for Background Subtraction, in Proc. ICCV Frame-Rate Workshop, 1999, Kerkyra, Greece.
[11.8] A. Mittal and N. Paragios, Motion-based background subtraction using adaptive kernel density estimation, in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004, vol. 2.
[12.5] J.K. Wu, A.D. Narasimhalu, B.M. Mehtre, C.P. Lam, and Y.J. Gao, CORE: A Content-Based Retrieval Engine for Multimedia Databases, ACM Multimedia Systems, vol. 3, pp. 3-25, 1995.
[12.8] C. Faloutsos, M. Flickner, W. Niblack, D. Petkovic, W. Equitz, and R. Barber, Efficient and Effective Querying by Image Content, Technical Report, IBM Research Division, Almaden Research Center, RJ 9453 (83074), Aug. 1993.
[12.9] S.K. Chang, C. Yan, D.C. Dimitroff, and T. Arndt, An Intelligent Image Database System, IEEE Trans. Software Eng., vol. 14, pp. 681-688, 1988.
[12.10] W.I. Grosky and R. Mehrota, Index-Based Object Recognition in Pictorial Data Management, CVGIP, vol. 52, pp. 416-436, 1990.
[12.13] T. Kohonen, The Self-Organizing Map, Proc. IEEE, vol. 78, pp. 1464-1480, 1990.
[12.15] A. Tversky, Features of Similarity, Psychological Rev., vol. 84, pp. 327-352, 1977.
[12.20] J.-K. Wu, F. Gao, and P. Yang, Model-Based 3D Object Recognition, Proc. Second Int'l Conf. Automation, Robotics, and Computer Vision, Singapore, Sept. 1992.
[13.9] E. Wold et al., Content-based classification, search and retrieval of audio data, IEEE Multimedia Magazine, 1996.
[13.10] A. Ghias et al., Query by humming, Proc. ACM Multimedia Conf., 1995.
[14.12] C. Silva, Y. Chiang, J. El-Sana, and P. Lindstrom, Out-of-core algorithms for scientific visualization and computer graphics, IEEE Visualization Course Notes, 2002.
[14.3] P.M. Sutton and C.D. Hansen, Accelerated isosurface extraction in time-varying fields, IEEE Transactions on Visualization and Computer Graphics, vol. 6, no. 2, pp. 98-107, Apr 2000.
[14.4] J. Wilhelms and A.V. Gelder, Octrees for faster isosurface generation, ACM Transactions on Graphics, vol. 11, no. 3, pp. 201-227, Jul 1992.

[14.5] J. Vitter, External memory algorithms and data structures: Dealing with massive data, ACM Computing Surveys, March 2000.
[14.13] H. Samet, The Design and Analysis of Spatial Data Structures, Addison-Wesley, 1990.
[14.14] T.M. Cover and J.A. Thomas, Elements of Information Theory, John Wiley, 1991.
[15.1] R. Srinivasan, XDR: External Data Representation Standard, RFC 1832, 1995.
[15.2] M.P. Singh, Agent Communication Languages: Rethinking the Principles, IEEE Computer, vol. 31, no. 12, pp. 40-47, 1998.
[15.3] Y. Hiranaka and M. Kato, Multimedia Data Representation by the Universal Data Format, Trans. IPSJ Meeting, 4V-9, 3-577/578, 1999.
[15.4] T. Obata, T. Taketa and Y. Hiranaka, Multimedia User Interface, Trans. IPSJ Tohoku Chapter Meeting, 00-4-6, 2001.
[17.1] Anne Denton, Density-based Clustering of Time Series Subsequences, In Proceedings of the Third Workshop on Mining Temporal and Sequential Data (TDM '04), in conjunction with the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, Aug. 22, 2004.
[17.2] Darvish A., Bak E., Gopalakrishhan K., Zadeh R.H., Najarian K., A New Hierarchical Method for Identification of Dynamic Regulatory Pathways from Time-Series DNA Microarray Data, Proceedings of the 3rd Annual Computational Systems Bioinformatics Conference (CSB 2004), Stanford, CA, USA, pp. 602-603, August 16-20, 2004.
[17.3] S. Salvador, P. Chan, J. Brodie, Learning States and Rules for Time Series Anomaly Detection, Proc. 17th Int'l FLAIRS Conf., pp. 300-305, 2004.
[17.4] Daxin Jiang, Jian Pei, Aidong Zhang, DHC: A Density-Based Hierarchical Clustering Method for Time Series Gene Expression Data, BIBE, pp. 393-400, 2003.
[17.5] Pedro Rodrigues, Joao Gama, Joao Pedro Pedroso, Hierarchical Time-Series Clustering for Data Streams, First International Workshop on Knowledge Discovery in Data Streams, 2004.
[19.1] T. Akutsu, S. Miyano and S. Kuhara. Identification of genetic networks from a small number of gene expression patterns under the Boolean network model. Pac. Symp. Biocomput., 4, 17-28, 1999.
[19.2] M.J. Beal, F. Falciani, Z. Ghahramani, C. Rangel and D.L. Wild. A Bayesian approach to reconstructing genetic regulatory networks with hidden factors. Bioinformatics, 21(3), 2005.
[19.3] T. Chen, H. He and G. Church. Modeling gene expression with differential equations. Pacific Symposium on Biocomputing, 1999.
[19.4] C. Rangel, J. Angus, Z. Ghahramani, M. Lioumi, E. Sotheran, A. Gaiba, D.L. Wild, and F. Falciani. Modeling T-cell activation using gene expression profiling and state-space models. Bioinformatics, 20(9), 2004.
[19.5] M.J.L. de Hoon, Imoto, Kobayashi, Ogasawara, Miyano. Inferring gene regulatory networks from time-ordered gene expression data of Bacillus subtilis using differential equations. Pac. Symp. Biocomput., 8, 2003.
[19.6] N. Friedman, K. Murphy and S. Russell. Learning the structure of dynamic probabilistic networks. Proc. Conference on Uncertainty in Artificial Intelligence, 139-147, 1998.
[19.7] S. Kim, S. Imoto and S. Miyano. Inferring gene networks from time series microarray data using dynamic Bayesian networks. Brief. Bioinform., 4(3):228-235, 2003.
[19.8] S. Kim, S. Imoto and S. Miyano. Dynamic Bayesian network and nonparametric regression for nonlinear modeling of gene networks from time series gene expression data. Biosystems, 75(1-3):57-65, 2004.
[19.11] I. Shmulevich, E.R. Dougherty, S. Kim, and W. Zhang. Probabilistic Boolean networks: a rule-based uncertainty model for gene regulatory networks. Bioinformatics, 18(2), 2002.
[20.1] Agrawal, R., Faloutsos, C., & Swami, A. Efficient similarity search in sequence databases. Proceedings of the 4th Conference on Foundations of Data Organization and Algorithms (1993).
[20.2] Chan, K. & Fu, W. Efficient time series matching by wavelets. Proceedings of the 15th IEEE International Conference on Data Engineering (1999).
[20.3] Lin, J., Keogh, E., Lonardi, S. & Chiu, B. A Symbolic Representation of Time Series, with Implications for Streaming Algorithms. In Proceedings of the 8th ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery (2003).
[20.4] Keogh, E., Chakrabarti, K., Pazzani, M. & Mehrotra, S. Dimensionality reduction for fast similarity search in large time series databases. Journal of Knowledge and Information Systems (2000).
[20.5] Eamonn J. Keogh and Michael J. Pazzani. An Indexing Scheme for Fast Similarity Search in Large Time Series Databases. 11th International Conference on Scientific and Statistical Database Management, 1999.
[20.51] Keogh, E., Chakrabarti, K., Pazzani, M. & Mehrotra, S. Locally adaptive dimensionality reduction for indexing large time series databases. In Proceedings of ACM SIGMOD Conference on Management of Data, Santa Barbara, CA, May 21-24, pp. 151-162 (2001).

[20.6] Keogh, E., Chu, S., Hart, D. & Pazzani, M. An Online Algorithm for Segmenting Time Series. In Proceedings of IEEE International Conference on Data Mining, pp. 289-296 (2001).
[20.7] Yi, B-K. and Faloutsos, C. Fast Time Sequence Indexing for Arbitrary Lp Norms. Proceedings of the VLDB, Cairo, Egypt, Sept. (2000).
[21.1] J.F. Allen. Maintaining knowledge about temporal intervals. Commun. ACM, 26(11):832-843, 1983.
[21.2] I. Bloch and A. Ralescu. Directional relative position between objects in image processing: a comparison between fuzzy approaches. Pattern Recognition, 36(7):1563-1582, 2003.
[21.3] S.-K. Chang, Q.-Y. Shi, and C.-W. Yan. Iconic indexing by 2-D strings. PAMI, 9(3):413-428, 1987.
[21.4] A.G. Cohn and S.M. Hazarika. Qualitative spatial representation and reasoning: an overview. Fundamenta Informaticae, 46(1-2):1-29, 2001.
[21.5] S. Dagtas and A. Ghafoor. Indexing and retrieval of video based on spatial relation sequences. In Proc. of ACM Multimedia '99, pages 119-122, 1999.
[21.6] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker. Query by image and video content: the QBIC system. Computer, 28(9):23-32, 1995.
[21.8] V.N. Gudivada and V.V. Raghavan. Design and evaluation of algorithms for image retrieval by spatial similarity. ACM Trans. on Information Systems, 13(2):115-144, 1995.
[21.9] J. Keller and X. Wang. Comparison of spatial relation definitions in computer vision. In Proc. of ISUMA-NAFIPS '95, pages 679-684, 1995.
[21.13] M. Nabil, A.H.H. Ngu, and J. Shepherd. Picture similarity retrieval using the 2D projection interval representation. IEEE Trans. Knowl. Data Eng., 8(4):533-539, 1996.
[21.14] A. Pentland, R.W. Picard, and S. Sclaroff. Photobook: Content-based manipulation of image databases. Int'l J. of Computer Vision, 18(3):233-254, 1996.
[21.15] E. Petrakis, C. Faloutsos, and K.-I. Lin. ImageMap: an image indexing method based on spatial similarity. IEEE Trans. on Knowl. and Data Eng., 14(5):979-987, 2002.
[21.16] J. Sharma and D.M. Flewelling. Inferences from combined knowledge about topology and directions. In Advances in Spatial Databases, volume 951 of Lecture Notes in Computer Science, pages 279-291, 1995.
[21.17] C.-R. Shyu and P. Matsakis. Spatial lesion indexing for medical image databases using force histograms. In Proc. of IEEE CVPR '01, pages 603-608, 2001.
[21.18] M. Swain and D. Ballard. Color indexing. Int'l J. of Computer Vision, 7(1):11-32, 1991.
[22.1] J.R. Bach, C. Fuller, A. Gupta, A. Hampapur, B. Horowitz, R. Humphrey, R.C. Jain, and C. Shu, Virage Image Search Engine: An Open Framework for Image Management, Proc. Symp. Electronic Imaging: Science and Technology, Storage & Retrieval for Image and Video Database IV, pp. 76-87, 1996.
[22.2] B. Bhanu and S. Lee, Genetic Learning for Adaptive Image Segmentation, Norwell: Kluwer Academic, 1994.
[22.3] C.C. Chang, Spatial Match Retrieval of Symbolic Pictures, J. Information Science and Eng., vol. 7, pp. 405-422, Dec. 1991.
[22.4] S.K. Chang, E. Jungert, and Y. Li, Representation and Retrieval of Symbolic Pictures Using Generalized 2D Strings, technical report, Univ. of Pittsburgh, 1988.
[22.5] S.K. Chang, Q.Y. Shi, and C.W. Yan, Iconic Indexing by 2-D Strings, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 9, no. 3, pp. 413-428, May 1987.
[22.6] S.K. Chang, Principles of Pictorial Information Systems Design, Englewood Cliffs, N.J.: Prentice-Hall Inc., 1989.
[22.7] Y. Chen and J.Z. Wang, A Region-Based Fuzzy Feature Matching Approach to Content-Based Image Retrieval, IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24, no. 9, pp. 1252-1267, Sept. 2002.
[22.8] M. Flickner, H. Sawhney, W. Niblack, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J. Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker, Query by Image and Video Content: The QBIC System, Computer, vol. 28, no. 9, pp. 23-32, Sept. 1995.
[22.10] P.W. Huang and Y.R. Jean, Design of Large Intelligent Image Database Systems, Int'l J. Intelligent Systems, vol. 11, pp. 347-365, 1996.
[22.11] P.W. Huang and S.K. Dai, Image Retrieval by Texture Similarity, Pattern Recognition, vol. 36, pp. 665-679, 2003.
[22.14] L.J. Latecki and R. Lakamper, Application of Planar Shape Comparison to Object Retrieval in Image Database, Pattern Recognition, vol. 35, pp. 15-29, 2002.
[22.15] S.Y. Lee and F.J. Hsu, 2D C-String: A New Spatial Knowledge Representation for Image Database Systems, Pattern Recognition, vol. 23, no. 10, pp. 1077-1087, Oct. 1990.
[22.16] S.Y. Lee and F.J. Hsu, Spatial Reasoning and Similarity Retrieval of Images Using 2D C-String Knowledge Representation, Pattern Recognition, vol. 25, no. 3, pp. 305-318, Mar. 1992.

[22.17] K.C. Liang and C.C. Jay Kuo, WaveGuide: A Joint Wavelet-Based Image Representation and Description System, IEEE Trans. Image Processing, vol. 8, no. 11, pp. 1619-1629, 1999.
[22.18] A.K. Majumdar, I. Bhattacharya, and A.K. Saha, An Object-Oriented Fuzzy Data Model for Similarity Detection in Image Databases, IEEE Trans. Knowledge and Data Eng., vol. 14, no. 5, pp. 1186-1189, Sept./Oct. 2002.
[22.21] A. Pentland, R.W. Picard, and S. Sclaroff, Photobook: Tool for Content-Based Manipulation of Image Databases, Int'l J. Computer Vision, vol. 18, no. 3, pp. 233-254, June 1996.
[22.22] G. Petraglia, M. Sebillo, M. Tucci, and G. Tortora, Virtual Images for Similarity Retrieval in Image Databases, IEEE Trans. Knowledge and Data Eng., vol. 13, no. 6, pp. 951-967, Nov./Dec. 2001.
[22.23] E. Petrakis, C. Faloutsos, and K.I. Lin, ImageMap: An Image Indexing Method Based on Spatial Similarity, IEEE Trans. Knowledge and Data Eng., vol. 14, no. 5, pp. 979-987, Sept./Oct. 2002.
[22.24] A. Rao, R.K. Srihari, L. Zhu, and A. Zhang, A Method for Measuring the Complexity of Image Databases, IEEE Trans. Multimedia, vol. 4, no. 2, pp. 160-173, Mar./Apr. 2002.
[22.25] Y. Rui and T.S. Huang, Image Retrieval: Current Techniques, Promising Directions, and Open Issues, J. Visual Comm. Image Representation, vol. 10, pp. 39-62, 1999.
[22.26] J.R. Smith and S.F. Chang, VisualSEEK: A Fully Automated Content-Based Image Query System, Proc. Fourth ACM Int'l Multimedia Conf., pp. 87-98, 1996.
[22.27] J. Vleugels and R.C. Veltkamp, Efficient Image Retrieval through Vantage Objects, Pattern Recognition, vol. 35, pp. 69-80, 2002.
[22.28] X.M. Zhou and C.H. Ang, Retrieving Similar Pictures from a Pictorial Database by an Improved Hashing Table, Pattern Recognition Letters, vol. 18, pp. 751-758, 1997.
[22.29] http://www.annapolistech.com/reseller/retrieval.htm, 2004.
[25.2] L. Xu, M. Jackowski, A. Goshtasby, C. Yu, D. Roseman, and S. Bines, "Segmentation of Skin Cancer Images," Image and Vision Computing, 17(1), 1999, pp. 65-74.
[25.3] J.E. Golston, W.V. Stoecker, R.H. Moss, and I.P.S. Dhillon, "Automatic Detection of Irregular Borders in Melanoma and Other Skin Tumors," Computerized Medical Imaging and Graphics, 16(3), 1992, pp. 188-203.
[25.4] W.V. Stoecker, W.W. Li, and R.H. Moss, "Automatic Detection of Asymmetry in Skin Tumors," Computerized Medical Imaging and Graphics, 16(3), 1992, pp. 191-197.
[25.5] R. Jain, R. Kasturi, and B.G. Schunck, Machine Vision, McGraw-Hill, 1995.
[25.6] D.H. Ballard and C.M. Brown, Computer Vision, Prentice-Hall, 1982.
[25.7] P. Adriaans and D. Zantinge, Data Mining, Addison-Wesley, 1996.

[26.3] D. Liu, M. Burgin, and W. Karplus, "Computer support system for aneurysm treatment," Proc. of the 13th IEEE Symposium on Computer-Based Medical Systems, Houston, Texas, pp. 13-18, June 2000.
[26.4] D.J. Meagher, "Geometric modeling using octree encoding," Computer Graphics and Image Processing, vol. 19, no. 2, pp. 129-147, June 1982.
[26.5] D.A. Patterson, P.M. Chen, G. Gibson, and R.H. Katz, "Introduction to Redundant Arrays of Inexpensive Disks (RAID)," Proc. IEEE COMPCON Spring '89, pp. 112-117, IEEE Computer Society Press, 1989.
[26.6] H. Samet, "The quadtree and related hierarchical data structures," Computing Surveys, vol. 16, no. 2, pp. 186-260, June 1984.
[27.7] Nah, Y., Wang, T., Kim, K.H., Kim, M.H., and Yang, Y.K., "TMO-structured Cluster-based Real-time Management of Location Data on Massive Volume of Moving Items," in Proc. STFES 2003, IEEE Press, Hakodate, Japan, May 2003, pp. 89-92.
[27.8] Nah, Y., Kim, K.H., Wang, T., Kim, M.H., Lee, J., and Yang, Y.K., "A Cluster-based TMO-structured Scalable Approach for Location Information Systems," in Proc. WORDS 2003 Fall, IEEE CS Press, Capri Island, Italy, October 2003, pp. 225-233.
[27.9] Nah, Y., Lee, J., Lee, W.J., Lee, H., Kim, M.H. and Han, K.J., "Distributed Scalable Location Data Management System based on the GALIS Architecture," in Proc. WORDS 2005, February 2005, Sedona, Arizona, pp. 397-404.
[27.10] Nah, Y., Lee, J., Park, S., Kim, S., Kim, M.H. and Han, K.J., "TMO-structured Distributed Location Information System Prototype," in Proc. ISORC 2005, Seattle, Washington, May 2005, pp. 321-328.
