
eGov Technical Area 10 : Spatial Information Systems

An Effective Co-Location Pattern Mining From Geo-Spatial Data Set

Literature Survey
Spatial data is a special type of data that requires non-standard methods of database management. The reasons lie in its features: complex structure, dynamic processing, the lack of a standard spatial algebra, and the openness of spatial operators. Spatial data mining is the process of acquiring information and knowledge from large databases that store geographically referenced data. Because of the characteristics of geographic entities, relations and data, standard KDD (knowledge discovery in databases) techniques are insufficient (Shekhar and Chawla 2003). This is mainly caused by the nature of geographic space and the complexity of spatial objects and relations. The main differences between classical and spatial data mining are: (1) classical data mining works on explicit input data, whereas spatial predicates (e.g. overlapping) are often implicit; (2) classical data mining treats all input data as independent, while spatial patterns often show continuity and high autocorrelation among nearby objects.

Geographic data often show spatial dependency and spatial heterogeneity (Miller 2006). Spatial dependency is the tendency of observations located close to one another in geographic space to show a higher degree of similarity or dissimilarity (depending on the phenomenon under study). Closeness can be defined very generally through distance, direction and/or topology. Spatial heterogeneity, that is, the inconsistency of a process with respect to its location, is often visible because many geographic processes have a local character. Spatial dependency and heterogeneity can both reflect the nature of the geographic process.

Research in spatial data mining has an extensive literature. Spatial data mining tasks include spatial trend detection and spatial characterization (Ester et al. 1997, 1998, 1999, 2000, 2001), spatial clustering (Zhang et al. 1996; Ng and Han 1994; Kryszkiewicz and Skonieczny 2005; Wang et al. 1997), spatial classification (Koperski et al. 1998), and spatial association and co-location rules (Koperski and Han 1995; Morimoto 2001; Shekhar and Huang 2001; Estivill-Castro and Lee 2001).


A spatial co-location represents a subset of spatial features whose instances are frequently located together in spatial neighborhoods. Co-location mining has been applied in many areas, such as M-commerce, earth science, biology, public health, and transportation. For example, a mobile service provider may be interested in service patterns frequently requested by geographically neighboring users. The frequent neighboring request sets may be used to provide attractive location-sensitive advertisements and recommendations.

Many algorithms have been proposed for spatial co-location pattern mining. In the following, we summarize the previous work on this topic. One main type of these techniques adopts a three-step approach, which (1) first builds the neighborhood relationship graph using a distance threshold, that is, the maximal distance allowed for two events to be neighbors, (2) then collects the clique instances of candidate co-locations, and (3) finally selects the prevalent co-location patterns using a prevalence threshold, so that the algorithm can identify prevalent co-locations in a level-wise manner. This general framework was first proposed by Shekhar et al. [1]. Within this framework, different algorithms such as the partial-join and join-less algorithms [2], the synchronic sweep algorithm [3], the density-based algorithm [4], and the Neighbor Cluster Algorithm (NCA) [5] were proposed to improve the performance of the mining process, especially the efficiency of collecting clique instances. This threshold-based approach was also applied to spatiotemporal data sets by straightforwardly introducing a time factor [6,7,8]. Qian et al. modified the interest measure to be weighted by time interval [9] and extended the work to mine spread patterns of co-occurrence phenomena, which makes the investigation sensitive to different regions [10]. These works introduced an extra time prevalence threshold, which is the significant difference in the spatiotemporal setting.
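The three steps can be sketched in a few lines of Python. This is a minimal illustration restricted to size-2 candidates, where a clique instance is simply a neighboring pair; it is not any of the cited algorithms. The prevalence measure used here is the participation index commonly associated with this framework, and the function names, the sample points, and both threshold values are assumptions chosen for the example.

```python
from collections import defaultdict
from itertools import combinations
from math import dist

def neighbor_pairs(points, max_dist):
    """Step 1: edges of the neighbor graph between instances of different features."""
    edges = []
    for (i, (fa, pa)), (j, (fb, pb)) in combinations(enumerate(points), 2):
        if fa != fb and dist(pa, pb) <= max_dist:
            edges.append((i, j))
    return edges

def participation_index(points, edges, feat_a, feat_b):
    """Steps 2-3 for one size-2 candidate: collect instances, then score them."""
    total = defaultdict(int)
    for feature, _ in points:
        total[feature] += 1
    involved = {feat_a: set(), feat_b: set()}
    for i, j in edges:
        fi, fj = points[i][0], points[j][0]
        if {fi, fj} == {feat_a, feat_b}:
            involved[fi].add(i)
            involved[fj].add(j)
    # Smallest fraction of a feature's instances that take part in the pattern.
    return min(len(involved[f]) / total[f] for f in (feat_a, feat_b))

def prevalent_pairs(points, max_dist, min_prev):
    """Run all three steps and keep candidates that reach the prevalence threshold."""
    edges = neighbor_pairs(points, max_dist)
    features = sorted({f for f, _ in points})
    result = {}
    for a, b in combinations(features, 2):
        pi = participation_index(points, edges, a, b)
        if pi >= min_prev:
            result[(a, b)] = pi
    return result

# Hypothetical data: (feature label, (x, y)).
points = [
    ("A", (0.0, 0.0)), ("A", (5.0, 5.0)),
    ("B", (0.5, 0.0)), ("B", (5.0, 5.5)),
    ("C", (0.0, 0.6)), ("C", (20.0, 20.0)),
]
print(prevalent_pairs(points, max_dist=1.0, min_prev=0.6))  # {('A', 'B'): 1.0}
```

Both `max_dist` and `min_prev` have to be supplied by the user, which is the dependence on predefined thresholds that the rest of this survey keeps returning to.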

The second type is a variation of the first: it diversifies the objective of spatial co-location pattern mining while still following the above three steps with distance and prevalence thresholds. For example, the work was extended to mine complex spatial co-location patterns (positive relationship, self-co-locating relationship, self-exclusive relationship, one-to-many relationship, multi-feature exclusive relationship, and comprehensive relationship) [11] and maximal co-location patterns [12]. Huang et al. [13] also adjusted the interest measure to handle rare events.
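To make the notion of a maximal co-location pattern concrete: a prevalent pattern is maximal when no proper superset of it is also prevalent. The sketch below only filters an already mined collection of prevalent patterns; it does not reproduce the cited mining algorithm, and the function name and sample patterns are assumptions for illustration.

```python
def maximal_patterns(prevalent):
    """Keep only the patterns that are not a proper subset of another pattern."""
    sets = [frozenset(p) for p in prevalent]
    return [s for s in sets if not any(s < other for other in sets)]

# Hypothetical prevalent co-locations, each given as a set of feature labels.
prevalent = [
    {"A", "B"}, {"A", "C"}, {"B", "C"},
    {"A", "B", "C"},
    {"C", "D"},
]
for pattern in maximal_patterns(prevalent):
    print(sorted(pattern))
```

Reporting only the maximal patterns gives a compact summary, since every subset of a prevalent co-location is itself prevalent under an anti-monotone measure such as the participation index.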

The third type replaces the distance threshold in the first step or the prevalence threshold in the third step with some other interest measure. Huang et al. [14] proposed using the density ratio of different features, together with a clustering algorithm, to describe the neighborhood relationship. A buffer-based model [15] was also proposed to describe the neighborhood relationship for extended spatial objects such as lines and polygons. However, in these works, a similar neighborhood-related threshold or function still has to be given by users. Recently, Yoo et al. [16] analyzed the drawbacks of the prevalence threshold and replaced it with an N-most co-location pattern mining strategy. The intention of their method is similar to replacing the minimum support threshold with a top-k strategy in frequent pattern mining [17] and sequential pattern mining [18]. However, it still requires the distance threshold to be predefined.
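The difference between a prevalence threshold and a ranking-based selection can be shown with a short Python sketch. It is a generic top-N selection over precomputed prevalence values, not the cited N-most algorithm; the function name and the prevalence values are assumptions for illustration.

```python
import heapq

def n_most_prevalent(prevalence, n):
    """Return the n patterns with the highest prevalence, best first."""
    return heapq.nlargest(n, prevalence.items(), key=lambda kv: kv[1])

# Hypothetical prevalence values for four size-2 candidates.
prevalence = {
    ("A", "B"): 0.9,
    ("A", "C"): 0.4,
    ("B", "C"): 0.65,
    ("C", "D"): 0.2,
}
print(n_most_prevalent(prevalence, 2))
# [(('A', 'B'), 0.9), (('B', 'C'), 0.65)]
```

The user states how many patterns are wanted instead of guessing a cut-off value, but the prevalence values themselves still depend on the neighbor graph, and therefore on the distance threshold.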

In addition, in the area of spatial statistics, hypothesis testing methods are used to identify correlation patterns. In Salmenkivi's work [19], the neighborhood relationship and the prevalence measure were bound together. That work discovers only spatial co-locations with two features. Sheng et al. [20] introduced an influence function based on a Gaussian kernel to describe the neighborhood relationship. Their algorithm assumed uniform distributions of feature intersections over the global space.
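A kernel-based influence replaces the yes/no neighbor decision with a weight that decays smoothly with distance. The Python sketch below uses the standard Gaussian kernel form exp(-d^2 / (2 * sigma^2)); the exact function used in the cited work is not given here, so the formula, the function name, and the bandwidth `sigma` are assumptions for illustration.

```python
from math import dist, exp

def gaussian_influence(p, q, sigma):
    """Influence of point q on point p: 1.0 at distance 0, decaying with distance."""
    d = dist(p, q)
    return exp(-(d * d) / (2.0 * sigma * sigma))

# Influence of increasingly distant points on the origin (sigma is hypothetical).
origin = (0.0, 0.0)
for q in [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (4.0, 0.0)]:
    print(q, round(gaussian_influence(origin, q, sigma=1.0), 4))
```

Note that the bandwidth plays a role similar to a distance threshold: it still has to be chosen by the user.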

To remove the need to predefine the above thresholds, we propose an effective spatial mining approach that allows users to discover spatial co-location patterns and association rules by efficiently building the neighborhood relationship graph using Delaunay diagrams. The method lets users iteratively select informative edges to construct the neighborhood relationship graph until every significant co-location has enough confidence, and eventually discovers all spatial co-location patterns.
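As a rough illustration of how a Delaunay triangulation yields a neighbor graph without a distance threshold, the Python sketch below derives the triangulation edges by brute force: a triangle is kept when no other point lies strictly inside its circumcircle. This is an O(n^4) teaching sketch for small inputs, not the proposed method; it does not cover the iterative selection of informative edges, and the function names and sample points are assumptions.

```python
from itertools import combinations

def circumcircle(a, b, c):
    """Return (center_x, center_y, radius_squared), or None for collinear points."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return ux, uy, (ax - ux) ** 2 + (ay - uy) ** 2

def delaunay_edges(points):
    """Edges (i, j) with i < j of all triangles whose circumcircle is empty."""
    edges = set()
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        ux, uy, r2 = circle
        empty = all(
            (px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9
            for m, (px, py) in enumerate(points)
            if m not in (i, j, k)
        )
        if empty:
            edges.update({(i, j), (i, k), (j, k)})
    return edges

# Hypothetical instances: four corners of a square plus its center.
points = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0), (1.0, 1.0)]
print(sorted(delaunay_edges(points)))
```

In this example the center is linked to all four corners and the corners are linked along the sides of the square, while the two diagonals are absent because the center point lies between the opposite corners. Each remaining edge can then be treated as a candidate neighbor relation between the feature instances at its endpoints.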


References
[1]. Shekhar S, Huang Y (2001) Discovering spatial co-location patterns: A summary of results. In: Proceedings of the 7th international symposium on spatial and temporal databases, Redondo Beach, USA, July 12-15, pp 236-256
[2]. Yoo JS, Shekhar S (2006) A joinless approach for mining spatial colocation patterns. IEEE Trans Knowl Data Eng 18(10):1323-1337
[3]. Zhang X, Mamoulis N, Cheung DW, Shou Y (2004) Fast mining of spatial collocations. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining, Seattle, USA, August 22-25, pp 384-393
[4]. Xiao X, Xie X, Luo Q, Ma WY (2008) Density based co-location pattern discovery. In: Proceedings of the 16th ACM SIGSPATIAL international conference on advances in geographic information systems, Irvine, USA, November 5-7, pp 1-10
[5]. Lin Z, Lim SJ (2009) Optimal candidate generation in spatial co-location mining. In: Proceedings of the 2009 ACM symposium on applied computing, Hawaii, USA, March 9-12, pp 1441-1445
[6]. Celik M, Shekhar S, Rogers JP, Shine JA (2006) Sustained emerging spatio-temporal co-occurrence pattern mining: a summary of results. In: Proceedings of the 18th IEEE international conference on tools with artificial intelligence, Washington, USA, November 13-15, pp 106-115
[7]. Celik M, Shekhar S, Rogers JP, Shine JA, Yoo JS (2006) Mixed-drove spatio-temporal co-occurrence pattern mining: a summary of results. In: Proceedings of the 6th international conference on data mining, Hong Kong, China, December 18-22, pp 119-128
[8]. Yoo JS, Shekhar S, Kim S, Celik M (2006) Discovery of co-evolving spatial event sets. In: Proceedings of the 6th SIAM international conference on data mining, Bethesda, USA, November 20-22, pp 306-315
[9]. Qian F, Yin L, He Q, He J (2009) Mining spatio-temporal co-location patterns with weighted sliding window. In: Proceedings of IEEE international conference on intelligent computing and intelligent systems, Shanghai, China, November 20-22, pp 181-185
[10]. Qian F, He Q, He J (2009) Mining spread patterns of spatio-temporal co-occurrences over zones. In: Proceedings of the international conference on computational science and its applications, Seoul, Korea, June 29-July 2, pp 686-701
[11]. Wang L, Zhou L, Lu J, Yip J (2009) An order-clique-based approach for mining maximal co-locations. Inf Sci 179(19):3370-3382
[12]. Huang Y, Pei J, Xiong H (2006) Mining co-location patterns with rare events from spatial data sets. GeoInformatica 10(3):239-260
[13]. Huang Y, Zhang P, Zhang C (2008) On the relationships between clustering and spatial co-location pattern mining. Int J Artif Intell Tools 17(1):55-70
[14]. Xiong H, Shekhar S, Huang Y, Kumar V, Ma X, Yoo JS (2004) A framework for discovering co-location patterns in data sets with extended spatial objects. In: Proceedings of the 4th SIAM international conference on data mining, Lake Buena, USA, April 22-24, vol 89, p 78
[15]. Yoo JS, Bow M (2009) Finding N-most prevalent colocated event sets. In: Proceedings of the 11th international conference on data warehousing and knowledge discovery, Linz, Austria, August 31-September 2, pp 415-427
[16]. Salam A, Khayal M (2011) Mining top-k frequent patterns without minimum support threshold. Knowl Inf Syst, pp 1-30
[17]. Yang B, Huang H (2010) TOPSIL-miner: an efficient algorithm for mining top-k significant itemsets over data streams. Knowl Inf Syst 23(2):225-242
[18]. Tzvetkov P, Yan X, Han J (2005) TSP: mining top-k closed sequential patterns. Knowl Inf Syst 7(4):438-457
[19]. Salmenkivi M (2004) Evaluating attraction in spatial point patterns with an application in the field of cultural history. In: Proceedings of the 4th IEEE international conference on data mining, Brighton, UK, November 1-4, pp 511-514
[20]. Sheng C, Hsu W, Lee ML, Tung AKH (2008) Discovering spatial interaction patterns. In: Proceedings of the 13th international conference on database systems for advanced applications, New Delhi, India, March 19-21, pp 95-109
