
A Multi Dimensional Design Framework for Querying Spatial Data Using Concept Lattice

Animesh Tripathy
Assistant Professor, School of Computer Engineering, KIIT University, Bhubaneswar, India

Lizashree Mishra
Research Associate, School of Computer Engineering, KIIT University, Bhubaneswar, India

Prashanta Kumar Patra
Professor, Department of Computer Science & Engineering, CET, Bhubaneswar, India

Abstract- Data Warehouses (DWs) and On-Line Analytical Processing (OLAP) systems rely on a multidimensional model that includes dimensions and measures. Such a model allows users' requirements to be expressed in support of the decision-making process. Spatially related data has been used for a long time; however, spatial dimensions have not been fully exploited. To exploit the full potential of spatial and temporal data for analysis, spatial dimensions are a necessity when building a data warehouse. It has been observed that OLAP possesses a certain potential to support spatio-temporal analysis. However, without a spatial framework for viewing and manipulating the geometric component of the spatial data, the analysis remains incomplete. This paper presents a multidimensional design framework adapted for effective spatio-temporal exploration and analysis. This includes an extension of a conceptual model with spatial dimensions to enable spatial analysis. The proposed design framework addresses the problem of spatial and temporal data integration by providing information to facilitate data analysis in a Spatial Data Warehouse (SDW) that uniformly handles all types of data.

Keywords- Data warehouse (DW), online analytical processing (OLAP), spatial data warehouse (SDW), spatial data cube, spatial OLAP (SOLAP).

978-1-4244-4791-6/10/$25.00 © 2010 IEEE

I. INTRODUCTION

A Data Warehouse (DW) is defined as a collection of subject-oriented, integrated, non-volatile, and time-variant data supporting decision making [5]. On-Line Analytical Processing (OLAP) systems allow decision-making users to dynamically manipulate the data contained in a DW. Since it is estimated that about 80% of data stored in databases has a spatial or location component, location dimensions have been widely integrated in DWs and OLAP systems. The management of this kind of data is usually carried out by Spatial Databases (SDBs) or Geographic Information Systems (GISs). Spatial Data Warehouses (SDWs) combine DWs and SDBs to manage significant amounts of historical data that include spatial location. Merging these two technologies allows the capabilities of both systems to be exploited for improved data analysis, visualization, and manipulation. DWs offer efficient access methods and management of high volumes of data, while SDBs have long experience in managing spatial data. The experience gained in managing aggregated data in OLAP systems has already been extended to spatial data in spatial OLAP (SOLAP) systems. Further, since spatial data usually change over time, these changes can be represented using the time dimension provided by current DWs [3].

The technologies involved in this process, with time and space as important factors, are data warehouses with an online analytical processing interface and geographical information systems (GIS). DW/OLAP systems are responsible both for data extraction from several operational sources and for data organization according to a historical, thematic and multidimensional model.

Research on spatial data warehouses is still at an early stage. Many of the research proposals are based on either federated or integrated architectures. In a federated architecture, numerical and spatial data are linked through some common properties without affecting their original sources. Moreover, the responsibility for information capture and translation among different sources is limited to some components. This usually gives rise to a loss of transparency and to some semantic problems. The integrated architecture, on the other hand, uses a single, adapted environment in which queries involving both spatial and numerical data may be posed. This results in high flexibility and expressive power of OLAP operations.

Research work in spatial data mining includes modules for complete numerical-spatial data integration: spatial data cube construction, analytical-spatial query processing, and spatial data mining. To reduce query processing costs, some algorithms have been proposed for selective data materialization that take the object access frequency into account. Operations like roll-up and drill-down, fundamental in decision support interfaces, are not provided [1, 2].

The proposed model is based on an integrated architecture. The proposed framework validates a true SDW. This architecture is based on well-established standards: Open Geospatial Consortium (OGC, 2005) and Common Warehouse Metamodel (CWM, 2005). The use of such standards aims to achieve interoperability through the integration of heterogeneous data sources.

II. RELATED WORK

Organizations collect increasingly significant volumes of data. Once stored in data warehouses, they form the basis for data analysis processes and guide the organization's strategic decisions. However, data are not always used to their full potential and part of their richness, namely their spatial component, is simply left out [5]. Hidden in most data is a geographical component that can be tied to a place: an address, postal code, global positioning system location, region, country, etc. Spatial dimensions, just like temporal ones, should therefore be considered standard for any data warehouse implementation. In fact, to date, the spatial dimension has been widely integrated in data warehouses, but usually in a nominal, non-cartographic manner (i.e. using solely place names). On rare occasions, coordinates are stored inside a data warehouse for map display and drilling, but they are rarely used to their full potential for data exploration.

To take better advantage of the spatial and temporal dimensions in decision making, the appropriate tools must be used. Geographic Information Systems are the obvious candidates for such tasks. Although existing GISs have some spatio-temporal analytical capabilities, it is recognized that they are not adequate for decision-support applications when used alone and that alternative solutions must be used [4]. Among the possible solutions, the coupling of spatial and non-spatial technologies, GIS and OLAP for instance, may be an interesting option. OLAP is a category of decision-support tools often used to provide efficient and intuitive access to a data warehouse. The efficiency of OLAP in conducting data analysis easily and rapidly has been recognized. Such ease and speed are two essential conditions for an analyst (decision maker) to maintain a train of thought when exploring or validating hypotheses. Accordingly, the coupling of OLAP and GIS functionalities paves the way for the emergence of a new category of decision-support tools that are better adapted for spatio-temporal exploration and analysis of data [6].
Map Cube is defined by an operator that takes a base map parameter, together with data tables and some cartographic preferences, to produce an album of maps arranged through aggregation hierarchies [8, 9]. It is based on the conventional data cube, but it enables the visualization of the results as data tables and maps. Despite allowing spatial observation of summarized data, Map Cube does not support spatial roll-up/drill-down OLAP operations. Both concepts do not validate a true SDW [10, 11].

In DW and OLAP systems it is recognized that a multidimensional model is well suited for expressing the requirements of the decision-making process concerning the focus of analysis. On the other hand, the advantage of using spatial representations to empower the decision-making process is widely recognized. Thus, the challenging task is to propose a multidimensional model that allows the inclusion of spatial data [2, 3]. This work analyzed different features to be included in a spatial multidimensional model, such as spatial dimensions with spatial hierarchies, spatial fact relationships, and spatial measures. Additionally, several dimensions can share a part of a spatial hierarchy [6]. When more than one spatial dimension is represented in a multidimensional model, a topological relationship between them, based on a spatial predicate, is required [1]. The presented concepts aim at improving the analysis and design of spatial DWs and spatial OLAP systems by integrating spatial components in a multidimensional model. In this way decision-making users can represent their analysis needs in an abstract manner without considering complex implementation issues, and SOLAP tool developers can have a common vision for representing spatial data in a multidimensional model [5].

Interoperability is the ability of a system, or of components of a system, to provide information portability and inter-application cooperative process control. Interoperability is a form of systems intelligence that enhances the cooperation between component information systems. There are six levels of interoperability between two or more spatially distributed independent GISs [7]. GISs have the potential of providing decision makers with timely spatial information about earth systems using diverse sources, including field monitoring, remotely sensed imagery and environmental models. Environmental models have limited protocols for quality control and standardization. They tend to have weak or poorly defined semantics, so their output is often difficult to interpret outside the very limited range of applications for which they are designed. Many of the issues associated with weak model semantics can be resolved with the addition of self-evaluating logic and context-based tools that present the semantic weaknesses to the end-user [4].

Research on SDW is still incipient for two main reasons: first, although DW/OLAP and GIS technologies are consolidated when taken individually, the incorporation of the spatial context in a multidimensional model is peculiar, complex and challenging; second, SDW still lacks a significant market [8, 10]. More recently, some conceptual models for SDW have appeared. While conceptual models are possibly more flexible, they are not immediately operational and therefore lack a proof of concept, or validation [9, 11].
This paper proposes a novel logical multidimensional model suitable for SDW. The reason for this choice is that SDW implementation raises interesting open problems related to spatial multidimensional query performance. To the best of our knowledge there is no SDW commercially available. The paper presents a prototype which aims to validate the proposed ideas.

III. THE PROPOSED FRAMEWORK

The proposed framework implements Object Relational Spatial Snowflake Schemata in an object-relational DBMS. In this design, we have chosen the Oracle Object-Relational DBMS because of its support for spatial data. The Oracle spatial capabilities are implemented by the Oracle spatial package, with spatial measures, dimensions, topological operations, and spatial roll-up and spatial drill-down operations. The proposed architecture is shown in Fig. 1. It has a four-layer structure: Internal Layer, Designed Layer, Operational Layer and Display Layer. These four layers are described below.

A. Internal Layer

The first layer, the Internal Layer, consists of heterogeneous data sources. According to the user's request, data are fetched from the various data sources using the ETL (extract, transform and load) process.

2010 IEEE 2nd International Advance Computing Conference


Figure 1. The Proposed Architecture.

B. Designed Layer

The second layer is the Designed Layer. After the data are retrieved from the various data sources, they are kept in DWs. The spatial data transfer standard provides cartographers with a consistent set of terminology and concepts around which data structures can be developed. Spatial data structures are modeled using the transformed data. A spatial data structure is a framework of spatial data, metadata, users and tools that are interactively connected in order to use spatial data in an efficient and flexible way. This helps to acquire, process, use, maintain, and preserve spatial data. The data and metadata should not be managed centrally. They should be managed by the data originator, and the operations are connected to the various sources. Different types of data warehouse schemas are also constructed in this layer. The actual physical structure of a data warehouse is related to a multidimensional data cube, which provides a conceptual multidimensional view of data and allows pre-computation and fast access to summarized data.

C. Operational Layer

The third layer is the Operational Layer, which performs the spatial OLAP operations on data cubes by integrating both multidimensional operations and spatial operations. These operations are carried out on the data of the SDW with the help of multidimensional data views and the pre-computation of summarized data, which is well suited for analytical processing. Multidimensional operations provide multidimensional data views, while spatial operations are functions that form important components of an underlying model that takes input data related to location, performs analysis on it, and assimilates the data to produce output information. Together these processes are known as Spatial OLAP. This layer is therefore responsible for the SOLAP processing of user requests.

D. Display Layer

The fourth layer is the Display Layer. It displays the result of user requests through interfaces and defines the user interface according to the requirements. When a user issues a request, the request is first processed and the result is then displayed in the form of web pages, maps, and commands via the user interface, with the help of materialized views of the spatial data.

IV. DESIGNED SPECIFICATION

An SDW can be constructed using a spatial data cube model (spatial multidimensional database model). A data cube allows data to be modeled and viewed in multiple dimensions. It is defined by two kinds of parameters, dimensions and measures, which are linked together to build a fact table. The most popular data model for a data warehouse is the multidimensional model. Such a model can exist in the form of a star schema, a snowflake schema or a fact constellation.

A. Schema

Having analyzed the three available schemas, we conclude that the snowflake schema is the most efficient. The snowflake schema is a variant of the star schema model in which some dimension tables are normalized, thereby further splitting the data into additional tables. The dimension tables of the snowflake model may be kept in normalized form to reduce redundancies. Such a table is easy to maintain and saves storage space. A DW can be defined using two language primitives, one for cube definition and one for dimension definition.

The cube definition statement has the following syntax:

    define cube <cube_name> [<dimension_list>] : <measure_list>

The dimension definition statement has the following syntax:

    define dimension <dimension_name> as (<attribute_or_dimension_list>)

Figure 2. Snowflake Schema of Data Warehouse for persons_involved.

TABLE I. A 3-DIMENSIONAL VIEW WITH DIMENSIONS TIME, LOCATION, AND PUBLIC_SPACE. THE MEASURE IS ASSOCIATED_PEOPLE. COLUMNS M1 TO M4 ARE VALUES OF THE TIME DIMENSION.

    location   public_space     M1     M2     M3     M4
    Kolkata    Hospital        854    943   1032   1129
    Kolkata    College         882    890    924    992
    Kolkata    Temple           89     64     59     63
    Kolkata    Bank             63    698    789    870
    Chennai    Hospital       1087   1130   1034   1142
    Chennai    College         968   1024   1048   1091
    Chennai    Temple           38     41     45     54
    Chennai    Bank            872    925   1002    984
    Mumbai     Hospital        818    894    940    978
    Mumbai     College         746    769    795    864
    Mumbai     Temple           43     52     58     59
    Mumbai     Bank            591    682    728    784
    Delhi      Hospital        605    680    812    927
    Delhi      College         825    952   1023   1038
    Delhi      Temple           14     31     30     38
    Delhi      Bank            400    512    501    580

B. Snowflake Schema Definition

The snowflake schema of Fig. 2 is defined in Data Mining Query Language (DMQL) as follows:

    define cube persons_involved_snowflake [time, location, public_space_id]: associated_people = sum(number_of_people)
    define dimension time as (time, day, month, year)
    define dimension location as (location, city (city, state, country), street)
    define dimension public_space as (public_space_id, public_space_type, services_provided)
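To make the DMQL definition more concrete, the following is a minimal sketch of how such a snowflake schema might be laid out as relational tables. It uses Python's built-in sqlite3 module rather than the Oracle object-relational DBMS used in the paper, and omits the spatial types entirely. The table names, key columns and sample rows are assumptions introduced for illustration; only the dimension attributes and the measure associated_people = sum(number_of_people) come from the definition above.

```python
import sqlite3

# Hypothetical relational layout for the persons_involved snowflake schema.
# The city table is the normalized part of the location dimension.
SCHEMA = """
CREATE TABLE time_dim (time_id INTEGER PRIMARY KEY, day INTEGER, month TEXT, year INTEGER);
CREATE TABLE city (city_id INTEGER PRIMARY KEY, city TEXT, state TEXT, country TEXT);
CREATE TABLE location (location_id INTEGER PRIMARY KEY, street TEXT,
                       city_id INTEGER REFERENCES city(city_id));
CREATE TABLE public_space (public_space_id INTEGER PRIMARY KEY,
                           public_space_type TEXT, services_provided TEXT);
CREATE TABLE persons_involved (
    time_id INTEGER REFERENCES time_dim(time_id),
    location_id INTEGER REFERENCES location(location_id),
    public_space_id INTEGER REFERENCES public_space(public_space_id),
    number_of_people INTEGER);
"""


def build_demo_warehouse():
    """Create an in-memory database and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO time_dim VALUES (1, 1, 'M1', 2010)")
    conn.execute("INSERT INTO city VALUES (1, 'Delhi', 'Delhi', 'India')")
    conn.executemany("INSERT INTO location VALUES (?, ?, ?)",
                     [(1, "Street A", 1), (2, "Street B", 1)])
    conn.execute("INSERT INTO public_space VALUES (1, 'Hospital', 'health care')")
    conn.executemany("INSERT INTO persons_involved VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 300), (1, 2, 1, 305)])
    return conn


def associated_people_by_city(conn):
    """Compute associated_people = sum(number_of_people) per city and type."""
    query = """
        SELECT c.city, p.public_space_type, SUM(f.number_of_people)
        FROM persons_involved f
        JOIN location l ON l.location_id = f.location_id
        JOIN city c ON c.city_id = l.city_id
        JOIN public_space p ON p.public_space_id = f.public_space_id
        GROUP BY c.city, p.public_space_type
        ORDER BY c.city, p.public_space_type
    """
    return conn.execute(query).fetchall()
```

Calling associated_people_by_city(build_demo_warehouse()) walks from the fact table through the normalized location and city tables, which is the extra join that distinguishes a snowflake schema from a star schema.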

C. Spatial Data Cube Construction

Measures and dimensions in the spatial data cube can be either spatial or non-spatial data. Suppose that we would like to view associated_people according to time, location, and public_space for the cities Delhi, Mumbai, Chennai, and Kolkata. These 3-D spatial data are shown in Table I, and in the form of a 3-D spatial data cube in Fig. 3. Given a set of dimensions, we can generate a cuboid for each of the possible subsets of the given dimensions. The resulting lattice of cuboids is referred to as a data cube. Fig. 4 shows a lattice of cuboids for the dimensions time, location, and public_space.
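As an illustration of the lattice of cuboids, the sketch below enumerates every subset of the three dimensions and sums associated_people over the dimensions that are left out. It is not the authors' implementation. It assumes that the four blocks of Table I correspond to Kolkata, Chennai, Mumbai and Delhi in the order listed, and the function names are hypothetical.

```python
from itertools import combinations

DIMENSIONS = ("time", "location", "public_space")
MONTHS = ("M1", "M2", "M3", "M4")

# Table I values for M1..M4, keyed by (location, public_space).
TABLE_I = {
    ("Kolkata", "Hospital"): (854, 943, 1032, 1129),
    ("Kolkata", "College"): (882, 890, 924, 992),
    ("Kolkata", "Temple"): (89, 64, 59, 63),
    ("Kolkata", "Bank"): (63, 698, 789, 870),
    ("Chennai", "Hospital"): (1087, 1130, 1034, 1142),
    ("Chennai", "College"): (968, 1024, 1048, 1091),
    ("Chennai", "Temple"): (38, 41, 45, 54),
    ("Chennai", "Bank"): (872, 925, 1002, 984),
    ("Mumbai", "Hospital"): (818, 894, 940, 978),
    ("Mumbai", "College"): (746, 769, 795, 864),
    ("Mumbai", "Temple"): (43, 52, 58, 59),
    ("Mumbai", "Bank"): (591, 682, 728, 784),
    ("Delhi", "Hospital"): (605, 680, 812, 927),
    ("Delhi", "College"): (825, 952, 1023, 1038),
    ("Delhi", "Temple"): (14, 31, 30, 38),
    ("Delhi", "Bank"): (400, 512, 501, 580),
}


def base_cells():
    """Flatten Table I into {(time, location, public_space): associated_people}."""
    cells = {}
    for (location, space), values in TABLE_I.items():
        for month, value in zip(MONTHS, values):
            cells[(month, location, space)] = value
    return cells


def cuboid(cells, keep):
    """Sum the base cells onto the dimensions named in keep."""
    positions = [DIMENSIONS.index(name) for name in keep]
    result = {}
    for key, value in cells.items():
        group = tuple(key[i] for i in positions)
        result[group] = result.get(group, 0) + value
    return result


def lattice(cells):
    """Build one cuboid per subset of DIMENSIONS, from the 3-D base to the 0-D apex."""
    return {keep: cuboid(cells, keep)
            for size in range(len(DIMENSIONS), -1, -1)
            for keep in combinations(DIMENSIONS, size)}
```

With three dimensions the lattice holds 2^3 = 8 cuboids: the base cuboid keyed by all three dimensions, three 2-D cuboids, three 1-D cuboids, and the apex cuboid keyed by the empty tuple, which holds the grand total.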

D. The Algorithm

One of the important contributions of this paper is an algorithm that can be used to implement the spatial data cube efficiently through materialization. The algorithm describes the interaction between the requesting client and the SDW, which intercepts the client's query and, wherever possible, transforms base-level queries into aggregate-level queries. To realize the potential of spatial aggregates, the algorithm provides efficient solutions to the following problems:

    Spatial aggregate design: determining which spatial aggregates to materialize, including how to store them.
    Spatial aggregate maintenance: efficiently updating spatial aggregates when spatial fact tables are updated.
    Spatial aggregate exploitation: making efficient use of spatial aggregates to speed up OLAP query processing.

The part of the algorithm concerning spatial aggregate exploitation consists of the following three steps:

    I. Sort the spatial aggregates (including the base spatial aggregate) from smallest to largest based on spatial fact table cardinality. Choose the next smallest spatial aggregate.
    II. If all of the attributes in the SQL statement can be directly or indirectly found in the spatial aggregate, alter the original query by simply replacing the base fact table with the spatial aggregate fact table; otherwise, go back to Step I.
    III. Run the altered query.

Step II does not require full materialization; on the contrary, partial materialization is much more reasonable. For example, if the query demands aggregation of crop areas by region, and the spatial aggregate is by micro-region, then from the performance point of view the query on the partially materialized spatial aggregate works better than the original one on the base schema (since micro-region is closer to region than municipality is). The algorithm is guaranteed to terminate successfully because eventually one arrives at the base schema, which is always guaranteed to satisfy the query. Almost no metadata is required to support general navigation, provided the user is careful with the choice of the spatial aggregates.

Figure 3. A 3-dimensional spatial data cube representation of the spatial data according to the dimensions time, location, and public_space. The measure displayed is associated_people.

Figure 4. Lattice of cuboids making up the 3-dimensional spatial data cube. The base cuboid contains the three dimensions time, location, and public_space.

V. SPATIAL ONLINE ANALYTICAL PROCESSING

Conventional OLAP can be used for spatiotemporal analysis and exploration. However, the lack of cartographic representations leads to serious limitations (lack of spatial visualization, lack of map-based navigation, etc.). To overcome these limitations, visualization tools and map-based navigation tools have to be integrated within conventional OLAP. The result is a Spatial OLAP that can be seen as a client application on top of a spatial data warehouse. Similar to the architecture of an OLAP system, the architecture of a SOLAP system is composed of a multidimensional spatiotemporal database, a SOLAP server and a SOLAP client. Spatial OLAP is defined as "a visual platform built especially to support rapid and easy spatiotemporal analysis and exploration of data following a multidimensional approach comprised of aggregation levels available in cartographic displays as well as in tabular and diagram displays". SOLAP tools are meant to be client applications sitting on top of a multi-scale spatial data warehouse. However, the non-expert can also see them as a new type of user interface for multi-scale GIS applications and Web mapping.
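A SOLAP server answering such requests would rely on the aggregate exploitation steps of Section IV-D. The sketch below is one possible reading of Steps I and II, not the authors' code. The roll-up hierarchy, the table names and the cardinalities are hypothetical, and "indirectly found" is interpreted as "derivable by rolling up a stored attribute".

```python
from dataclasses import dataclass

# Hypothetical roll-up hierarchy: each attribute maps to the next coarser
# attribute that can be derived from it.
ROLLS_UP_TO = {"municipality": "micro_region", "micro_region": "region"}


@dataclass(frozen=True)
class Aggregate:
    table: str             # name of the (aggregate or base) fact table
    cardinality: int       # number of rows in that fact table
    attributes: frozenset  # attributes stored directly in the table


# Made-up catalogue; the base fact table must always be listed.
AGGREGATES = [
    Aggregate("crop_by_municipality", 5000, frozenset({"municipality", "crop_area"})),
    Aggregate("crop_by_micro_region", 500, frozenset({"micro_region", "crop_area"})),
    Aggregate("crop_by_crop_type", 20, frozenset({"crop_type", "crop_area"})),
]


def derivable(attribute, aggregate):
    """True if attribute is stored in the aggregate or reachable by rolling up."""
    for stored in aggregate.attributes:
        current = stored
        while current is not None:
            if current == attribute:
                return True
            current = ROLLS_UP_TO.get(current)
    return False


def choose_fact_table(query_attributes, aggregates):
    """Steps I and II: scan from smallest to largest, return the first table that fits."""
    for aggregate in sorted(aggregates, key=lambda a: a.cardinality):
        if all(derivable(attr, aggregate) for attr in query_attributes):
            return aggregate.table
    raise ValueError("no table satisfies the query; is the base table listed?")


def rewrite(sql, base_table, query_attributes, aggregates):
    """Step II: substitute the chosen fact table for the base table (naive text replace)."""
    return sql.replace(base_table, choose_fact_table(query_attributes, aggregates))
```

For a query on region and crop_area, the smallest table (crop_by_crop_type) is skipped because region cannot be derived from it, and crop_by_micro_region is chosen before the base table is reached. A real rewrite would also have to add the join or mapping from micro_region to region, which this text substitution leaves out; Step III is simply running the rewritten statement.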

A. Features of SOLAP

The most important features of SOLAP, based on theoretical and implementation work, are:

    1) A flexible interface that supports different data visualization formats using cartographic and non-cartographic displays. Several measures are represented using different formats and colors. Exploration in different formats is facilitated by legends and background maps.
    2) All navigation operations must be available in all forms of display (diagrams, maps, tables). In addition, spatial and temporal analysis functions are integrated to ensure flexibility of exploration and analysis.
    3) The capability to define new calculated measures from existing ones. The user should be able to filter out a given subset of dimension member values to restrict the analysis to defined values of dimension members.
    4) Support for spatial analysis on several different dimensions of the same cube. SOLAP should also support multiple geometric representations and multiple data sources.

B. Operations of SOLAP

Spatial or geometric aggregate functions for operating through spatial or non-spatial hierarchies are mandatory in spatial OLAP interfaces. In particular, we note the spatial roll-up and drill-down operations. The action of spatial roll-up is straightforward: it creates geometric aggregate values that roll up from the most detailed level to the least detailed level, following a spatial or non-spatial hierarchy. Spatial drill-down is the reverse operation of roll-up. Let us examine some popular OLAP operations and analyze how they are performed in a spatial data cube.

    1) Slicing and Dicing: Each selects a portion of the cube based on the constant(s) in one or a few dimensions. This can be realized by transforming the selection criteria into a query against the spatial data warehouse, which is then handled by query processing methods.
    2) Pivot: This presents the measures in different cross-tabular layouts. It can be implemented in a similar way as in nonspatial data cubes.
    3) Roll-up: It generalizes one or a few dimensions (including the removal of some dimensions when desired) and performs appropriate aggregations in the corresponding measure(s). For nonspatial measures, aggregation is implemented in the same way as in nonspatial data. For spatial measures, however, aggregation takes a collection of spatial pointers in a map or map overlay and performs a spatial aggregation operation, such as region merge or map overlay. It is challenging to implement such operations efficiently, since it is both time and space consuming to compute a spatial merge or overlay and to save the merged or overlaid spatial objects.
    4) Drill-down: It specializes one or a few dimensions and presents low-level objects, collections, or aggregations. This can be viewed as the reverse operation of roll-up and can often be implemented by saving a low-level cuboid, presenting it, or performing appropriate generalization from it when necessary.

From this analysis, one can see that a major performance challenge in implementing spatial OLAP is the efficient construction of spatial data cubes and the implementation of roll-up/drill-down operations.

Using the dataset given in Fig. 3, we can explain the operations as follows. The slice operation performs a selection on one dimension of the given cube, resulting in a subcube. An example is a slice operation in which the associated_people data are selected from the data cube for the dimension time using the following criterion, which involves one dimension:


time = M1. The dice operation defines a subcube by performing a selection on two or more dimensions. An example is a dice operation on the data cube based on the following selection criteria, which involve three dimensions: (location = Delhi or Mumbai) and (time = M1 or M2) and (public_space = Hospital or College). Pivot, or rotate, is a visualization operation that rotates the data axes in view in order to provide an alternative presentation of the data. An example of a pivot operation is rotating the public_space and location axes in a 2-D slice. When roll-up is performed, one or more dimensions are removed from the given cube. For example, consider an associated_people data cube containing only the two dimensions location and time. Roll-up may be performed by removing the time dimension, resulting in an aggregation of the total associated_people by location, rather than by location and by time. Drill-down navigates from less detailed data to more detailed data. For example, drill-down may descend the time hierarchy to a more detailed level, so that the resulting data cube details the total associated_people per day rather than summarizing them by month.

VI. BENEFITS OF THE PROPOSED FRAMEWORK

The proposed model is based on an integrated architecture which aims to achieve interoperability through the integration of heterogeneous data sources, and it supports all the SOLAP operations. This integration coins new terms: spatial data warehouse and spatial OLAP. We may display any n-D data as a series of (n-1)-D cubes. SOLAP operations are carried out on these data cubes. By using SOLAP, users enhance their capacity to explore the underlying dataset, since spatial methods are incorporated into the OLAP ones. The main contributions of our framework include a formalized data model for SDW which enables all the SOLAP operations.

VII. CONCLUSION & FUTURE WORK

Although data warehouse and spatial database system technologies are very useful in the decision-making process, they are usually used separately. Data warehousing for decision support systems may be enhanced qualitatively if it can also deal with the spatial dimensions and measures that characterize a Spatial Data Warehouse. The incorporation of spatial dimensions and measures makes it possible to locate trends in a given application domain more efficiently, by using dynamic maps with zooming, panning, aggregation and other functionalities.

We have proposed a framework which enables the implementation of a spatial data warehouse. Spatial data warehousing is still in its infancy and more research on this topic is needed. Hence, as future work we intend to further investigate the use of spatial aggregates; in particular, the issues concerning spatial aggregate design and spatial aggregate maintenance will be addressed. Furthermore, the interface can be extended to include other OLAP operations and to mine patterns from the spatial data cube. We also plan to improve usability, since users currently need to know the query language syntax and the underlying schema in order to pose their queries. We plan to develop a visual query language for SOLAP to facilitate user interaction. Finally, another interesting direction is to extend this framework to distributed SDWs.

REFERENCES
[1] F. Fonseca, C. Davis and G. Camara. Bridging ontologies and conceptual schemas in geographic information integration. Geoinformatica, 7(4), 355-378, 2003.
[2] J. Han, K. Koperski and N. Stefanovic. GeoMiner: A system prototype for spatial data mining. In Proceedings of the ACM-SIGMOD International Conference on Management of Data (SIGMOD97), Tucson, AZ (pp. 553-556), ACM Press, May 1997.
[3] E. Malinowski and E. Zimányi. Representing spatiality in a conceptual multidimensional model. In Proceedings of the ACM Workshop on Geographical Information Systems, Washington, DC (pp. 12-21), ACM Press, Nov. 2004.
[4] S. Rivest, Y. Bédard and P. Marchand. Towards better support for spatial decision-making: Defining the characteristics of spatial on-line analytical processing. Geomatica, 55(4), 539-555, 2001.
[5] H. J. Miller and J. Han (Eds.), Geographic data mining and knowledge discovery, (pp. 74-109). London: Taylor & Francis.
[6] S. Bimonte, A. Tchounikine and M. Miquel. Spatial OLAP: Open Issues and a Web Based Prototype. In Proc. of the 10th AGILE International Conference on Geographic Information Science, Aalborg University, Denmark, 2007.
[7] N. Stefanovic, J. Han and K. Koperski. Object-based Selective Materialization for Efficient Implementation of Spatial Data Cubes. IEEE Transactions on Knowledge and Data Engineering, Vol. 12, No. 6, pp. 938-957, 2000.
[8] Y. Bishr. Overcoming the semantic and other barriers to GIS interoperability. International Journal of Geographical Information Science, 12(4), 299-314, 1998.
[9] D. S. Mackay. Semantic integration of environmental models for application to global information systems and decision making. SIGMOD Record, 28(1), 13-19, 1999.
[10] S. Shekhar, C. T. Lu, X. Tan, S. Chawla and R. R. Vatsavai. Map cube: A visualization tool for spatial data warehouses. 2000.
[11] R. Kimball and M. Ross. The data warehouse toolkit: The complete guide to dimensional modeling. 2nd Ed., John Wiley & Sons, 2002.
