
Volume 2, Issue 2, February 2012

ISSN: 2277 128X

International Journal of Advanced Research in Computer Science and Software Engineering


Research Paper Available online at: www.ijarcsse.com

Multidimensional Data Management in Unstructured P2P Networks


T Rajeswara Rao*
Dept. of IT/ UCEV-JNTUK

Prof. O Srinivasa Rao


Dept. of CSE/ UCEV-JNTUK

Dr. MHM Krishna Prasad


Dept. of CSE/ UCEV-JNTUK

Abstract: A P2P-based framework is proposed which supports the extraction of aggregates from historical multidimensional data. The framework provides robust and efficient query evaluation. When a multidimensional data population is published, the data are summarized in a synopsis. The synopsis consists of an index built on top of a set of subsynopses, each of which stores a compressed representation of a distinct portion of the data. The index and the subsynopses are distributed across the network. A suitable replication mechanism, which takes into account the query workload and network conditions, is employed to provide appropriate coverage of both the index and the subsynopses.

Keywords: P2P networks, multidimensional data, data management.

I. INTRODUCTION
Today, peer-to-peer (P2P) networks have become very popular for exchanging data among large communities of users in the file-sharing context. In this scenario, the success of P2P-based solutions is closely tied to the use of lossy data compression techniques such as the MPEG formats, which provide reasonable levels of detail when representing large amounts of information and make data exchange feasible in practice by significantly reducing transmission costs. However, the problem of extending compression-based solutions to application contexts other than file sharing has not yet been thoroughly investigated. In particular, no P2P-based solution has established itself as an effective evolution of traditional distributed databases. This is somewhat surprising, as the enormous amount of resources provided by P2P networks, in terms of storage capacity, computing power, and data transmission capability, could effectively support data management. From this point of view, one of the application contexts expected to benefit from the support of a P2P network is the analysis of multidimensional data. In this scenario, information is represented as points in a multidimensional space whose dimensions correspond to different perspectives over the data. Users explore the data and retrieve aggregates by issuing range queries, i.e., queries specifying an aggregate operator and the range of the data domain from which the aggregate information should be retrieved. In particular, we consider analytical applications dealing with historical data, which typically require large computation and storage capabilities because of the large amount of data that must be accessed to evaluate queries.
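To make the notion of a range query concrete, the following Java sketch evaluates a SUM aggregate over the points that fall inside a hyper-rectangular range. It is only an illustration on uncompressed data: the class RangeQueryDemo, the Point type, and the sample values are assumptions introduced here and do not come from the implemented system.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a SUM range query over points in a multidimensional space. */
final class RangeQueryDemo {

    /** Hypothetical data point: one integer coordinate per dimension plus a measure value. */
    static final class Point {
        final int[] coords;
        final double measure;

        Point(int[] coords, double measure) {
            this.coords = coords;
            this.measure = measure;
        }
    }

    /** Sums the measures of all points inside the range [lo, hi] (bounds inclusive in every dimension). */
    static double rangeSum(List<Point> data, int[] lo, int[] hi) {
        double sum = 0.0;
        for (Point p : data) {
            boolean inside = true;
            for (int d = 0; d < lo.length; d++) {
                if (p.coords[d] < lo[d] || p.coords[d] > hi[d]) {
                    inside = false;
                    break;
                }
            }
            if (inside) {
                sum += p.measure;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Point> data = Arrays.asList(
            new Point(new int[] {1, 1}, 10.0),
            new Point(new int[] {2, 3}, 5.0),
            new Point(new int[] {4, 4}, 7.0));
        // The range [1..3] x [1..3] contains the first two points only.
        System.out.println(rangeSum(data, new int[] {1, 1}, new int[] {3, 3})); // prints 15.0
    }
}
```

In the framework itself such a query would be answered from compressed subsynopses, so the returned value would in general be an approximation of this exact sum.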

Although the multidimensional data model is considerably more complex than the representation adopted in the file-sharing context, where data are organized as <name,file> pairs, analytical applications dealing with historical multidimensional data and file-sharing applications share a fundamental aspect: both can rely on lossy data compression. Indeed, analogously to tools for reproducing audio and video files, many applications dealing with multidimensional data can carry out their tasks effectively even when only an approximate representation of the data is available. For example, in Decision Support Systems (DSSs) or statistical databases, users are often concerned with exploring data in order to discover interesting trends rather than with extracting fine-grained information. In this scenario, high accuracy in the less significant digits of query answers is not needed, as their order of magnitude suffices to locate the regions of the database containing relevant information. At the same time, fast answers to these preliminary queries allow users to focus their explorations quickly and effectively, thus saving large amounts of system resources.

II. FRAMEWORK
Our proposal is a framework supporting the sharing and the analysis of compressed historical multidimensional data over an unstructured P2P network. From the user's point of view, two tasks are supported: data publication and data querying.

A. Data Publication
Let p be a peer willing to share a historical multidimensional data set D so that other peers can pose aggregate range queries against it. To make its data suitable for distribution across the network, p builds a synopsis of D by first partitioning D appropriately and then compressing each portion of the partition. Peer p also builds an index over these subsynopses, which, again, is fragmented so that it can be distributed. Finally, the subsynopses and the index portions are distributed across the network, along with metadata about D. The assignment of data and index portions to peers takes into account the willingness of peers to share their resources.

B. Data Querying
Exploration queries can be issued by peers to discover the shared data sets in which they may be interested. These queries specify criteria that are matched against the metadata associated with each available data set. The result of the discovery process is a set of matching data sets and, for each of them, a set of peers that should be contacted to start the evaluation of range queries, i.e., peers hosting portions of the distributed index that are therefore able to route range queries appropriately. After issuing an exploration query, a peer p can decide to pose range queries against a matching data set. To do so, p contacts one of the peers that can route the query toward the peers hosting the subsynopses required for evaluating it. Finally, p gathers the partial results returned by these peers and combines them to compute the final answer. The completeness of the answer can be checked through a suitable mechanism which, if some partial result has not been received yet, allows p to complete the answer without reissuing the range query from scratch. The choice of an unstructured P2P network preserves the autonomy of peers in deciding the amount of storage and computational resources they wish to make available to others.
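The gathering of partial results and the completeness check described above can be pictured with a small Java sketch. It assumes, purely for illustration, that each required subsynopsis is identified by an integer block id and that the aggregate is a SUM; the class PartialAnswerCollector and its methods are hypothetical names, not part of the implemented system.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative only: collects partial SUM results and reports which blocks are still unanswered. */
final class PartialAnswerCollector {
    private final Set<Integer> required;                        // ids of the subsynopses the query needs
    private final Map<Integer, Double> received = new HashMap<>();

    PartialAnswerCollector(Set<Integer> requiredBlocks) {
        this.required = new TreeSet<>(requiredBlocks);
    }

    /** Records a partial result; answers for unrelated blocks and duplicate answers are ignored. */
    void accept(int blockId, double partialSum) {
        if (required.contains(blockId)) {
            received.putIfAbsent(blockId, partialSum);
        }
    }

    /** Blocks that have not answered yet; only these need to be asked again. */
    Set<Integer> missing() {
        Set<Integer> pending = new TreeSet<>(required);
        pending.removeAll(received.keySet());
        return pending;
    }

    boolean isComplete() {
        return missing().isEmpty();
    }

    /** Combines the partial results received so far into one value. */
    double combinedSum() {
        double sum = 0.0;
        for (double value : received.values()) {
            sum += value;
        }
        return sum;
    }
}
```

Because the collector remembers which blocks have already answered, a peer that finds the answer incomplete can contact only the blocks returned by missing() instead of issuing the whole range query again.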
Moreover, participants are allowed to join and leave the system at will, as mechanisms are provided that dynamically reassign the responsibility of hosting data and index portions, reacting appropriately to peer departures. The framework adapts the distribution of the index and the data to both the interest shown by the users and the network conditions. In particular, with the aim of reducing the number of peers to be accessed for evaluating queries, the observed query workload is exploited to drive the distribution of replicas across the network: the larger the number of queries involving a certain index or data portion, the larger the number of replicas maintained for that portion. Moreover, in order to take into account the limited storage and computational capabilities of peers, specific load-balancing and fault-tolerance techniques are used.

III. PROPOSED SYSTEM
The proposed system is a peer-to-peer based framework that supports the analysis of historical multidimensional data. The system combines P2P networks and data compression to support the evaluation of range queries, possibly trading accuracy of answers for efficiency. It should enable members of an organization to cooperate by sharing their resources (both storage and computational) to host (compressed) data and perform aggregate queries on them, while preserving their autonomy. It involves three steps: partitioning, compressing, and indexing.

A. Partitioning the Data Domain
The partitioning step divides the data domain into non-overlapping blocks. These blocks are compressed separately, yielding distinct subsynopses. Each block is assigned a portion of the storage space B chosen to represent the whole synopsis. The distribution of B among blocks must satisfy two requirements: B must be fairly distributed among the blocks, and each block must be assigned only a small portion of B. The data domain partitioning algorithm runs in the server-side program of the unstructured peer-to-peer network and partitions the data population of the document to be published. For this algorithm, we adopted the one proposed by Filippo Furfaro, Giuseppe Massimiliano Mazzeo, and Andrea Pugliese in "Managing Multidimensional Historical Aggregate Data in Unstructured P2P Networks", IEEE knowledge and data engineering, 2010. The unstructured peer-to-peer network itself was developed using socket programming. The output of the algorithm is the set of partitions of the data population that will be published in the network.
The unstructured peer-to-peer network is used to manage the multidimensional data. Managing the data in this network also requires the indexing algorithm described in Section III.B. Pseudo code for the data domain partitioning algorithm is as follows.

Algorithm: Data Domain Partitioning
Input: overall data population (D); upper bound (Bmax) and lower bound (Bmin) for the storage space assigned to a single block.
Output: list of pairs (block, storage space).
begin
    vector = new Vector()
    spacemax = population size of the data
    while (spacemax > 0)
        str = next portion of spacemax whose size lies between Bmin and Bmax
        spacemax = spacemax without the portion str
        vector.add(str)
    return vector
end
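One possible reading of this pseudo code is sketched below in Java. The sketch makes several assumptions that are not in the original: the population is treated as a sequence of populationSize units, the bounds bMin and bMax are expressed in the same units, and each block is returned as a pair of offsets. It is not the partitioning algorithm of Furfaro et al., only an illustration of carving a population into consecutive blocks of bounded size.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: splits a population of a given size into consecutive blocks of bounded size. */
final class DomainPartitioner {

    /**
     * Returns blocks as {startInclusive, endExclusive} offsets.
     * Every block holds at most bMax units; blocks hold at least bMin units whenever
     * the population size makes that possible.
     */
    static List<int[]> partition(int populationSize, int bMin, int bMax) {
        if (bMin < 1 || bMax < bMin) {
            throw new IllegalArgumentException("need 1 <= bMin <= bMax");
        }
        List<int[]> blocks = new ArrayList<>();
        int start = 0;
        int remaining = populationSize;
        while (remaining > 0) {
            int take = Math.min(bMax, remaining);
            int left = remaining - take;
            // If the leftover would be too small for a block of its own,
            // shrink this block (without going below bMin) so the last one reaches bMin.
            if (left > 0 && left < bMin && take - (bMin - left) >= bMin) {
                take -= (bMin - left);
            }
            blocks.add(new int[] {start, start + take});
            start += take;
            remaining -= take;
        }
        return blocks;
    }

    public static void main(String[] args) {
        for (int[] block : partition(10, 3, 4)) {
            System.out.println(block[0] + ".." + block[1]);
        }
    }
}
```

With a population of 10 units, bMin = 3 and bMax = 4, the sketch yields three blocks of 4, 3 and 3 units, so no block falls below the lower bound.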

2012, IJARCSSE All Rights Reserved

The above pseudo code executes on the server side to partition the overall data population in the unstructured peer-to-peer network.

B. Indexing Technique
The proposed system manages multidimensional data in the unstructured peer-to-peer network through three mechanisms: partitioning, indexing, and compressing. The previous section described the data domain partitioning algorithm; this section describes the indexing mechanism applied whenever a document is published. The document is first partitioned on the server side with the data domain partitioning algorithm; to make retrieval of the document easy, the partitions are then indexed on the same server. The indexing could follow either a stack or a queue discipline; our system uses a queue, i.e., a first-come, first-served policy. Whenever a peer connects, the server program records its IP address in an ArrayList object named clients, which therefore holds the addresses of all connected peers in connection order. When indexing the data partitions, the first partition is assigned to the first connected client with index 1, the second partition to the next client, also with index 1, and so on. Once every client has received a partition, indexing restarts from the first connected client with index 2.
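The first-come, first-served assignment just described amounts to a round-robin pass over the clients list. The Java sketch below illustrates it under stated assumptions: the class RoundRobinIndexer and the Slot type are hypothetical names introduced here, and partitions are identified only by their position.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: assigns partitions to connected peers in connection order. */
final class RoundRobinIndexer {

    /** Hypothetical record of where one partition goes and which index value it gets there. */
    static final class Slot {
        final String peerAddress;
        final int index;

        Slot(String peerAddress, int index) {
            this.peerAddress = peerAddress;
            this.index = index;
        }
    }

    /**
     * clients holds peer addresses in the order in which the peers connected.
     * Partition k goes to client (k mod n); its index is the number of the pass over the
     * client list, starting at 1.
     */
    static List<Slot> assign(List<String> clients, int partitionCount) {
        if (clients.isEmpty()) {
            throw new IllegalArgumentException("at least one connected peer is required");
        }
        List<Slot> slots = new ArrayList<>();
        for (int k = 0; k < partitionCount; k++) {
            String peer = clients.get(k % clients.size());
            int index = k / clients.size() + 1;
            slots.add(new Slot(peer, index));
        }
        return slots;
    }
}
```

With two connected peers and three partitions, the first peer receives the partitions indexed 1 and 2 and the second peer receives one partition indexed 1.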
Indexing continues in this way until all partitions have been assigned. At the end, every partition carries an index value, and these index values determine where the document partitions are stored in the network.

C. Compression Technique
At the compression stage, a suitable compression algorithm is run on each of the pairs resulting from the partitioning step, each pair consisting of a block bi of the partition and its corresponding storage space. The result is a set of subsynopses, each of which is a compressed representation of bi that consumes less storage space. Compression is applied in the server program to reduce network traffic. The compression technique we use is the Zip file format, which reduces the size of the partition files: the server-side program creates a zip file and adds the partitions of the document to be published to it as a single unit. Other compression mechanisms exist, but here we use only the Zip file format.

IV. EXPERIMENTAL WORK
The framework is implemented in Java, and all experiments were carried out on the Windows operating system. To manage multidimensional data in the peer-to-peer network, the user has to run at least one server in the network. Figure 1 shows the server being started in the distributed peer-to-peer network. Suppose the user runs the server on the machine with IP address 192.168.1.7, and that this machine also acts as a client.
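As an aside on the Zip-based compression step of Section III.C, the following Java sketch shows how partition files could be added to a single zip archive with the standard java.util.zip classes. The class PartitionZipper, the in-memory archive, and the entry names used in the example are assumptions made for illustration; the actual server program may organize this differently.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Illustrative only: packs partitions into one in-memory zip archive. */
final class PartitionZipper {

    /** Creates a zip archive with one entry per partition (entry name -> partition bytes). */
    static byte[] zip(Map<String, byte[]> partitions) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zos = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> partition : partitions.entrySet()) {
                zos.putNextEntry(new ZipEntry(partition.getKey()));
                zos.write(partition.getValue());
                zos.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Lists the entry names of an archive, in the order in which they were written. */
    static List<String> entryNames(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zis = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry entry = zis.getNextEntry(); entry != null; entry = zis.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }
}
```

Zip compression is lossless, so an archive built this way reproduces the partitions exactly; how much space it saves depends on how repetitive the partition contents are.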

Fig. 1

Figure 2 shows the server running on one system, where it displays the message "Server is Ready to Serve". The server is running at IP address 192.168.1.7.

Fig. 2

The user connects the client application to the server at IP address 192.168.1.7. To do so, the user runs PortInfo.java. Whenever the user runs the PortInfo.java class, an AWT window appears with a text box in which to enter the IP address of the system where the server is running.

Fig. 3

Whenever the user selects the run file option, the P2PNetwork window below is displayed. In this window the user has to enter the IP address on which the server is running in order to connect to it.

Fig. 4

In this window the user enters the IP address where the server is running. Suppose the server is running on the system 192.168.1.7 and the user is trying to connect to it. When the user clicks the connect & run button in the P2PNetwork window, the client window appears.

Fig. 5

In figure 6, the user gets the P2PClientGUI window. In this screen the user can publish a document and process a query. Initially the publish and process query buttons are disabled, because the user cannot publish data before loading it and cannot process a query before preparing it. Clicking the load data button enables the publish data button, and clicking the prepare query button enables the process query button.

Fig. 6

Figure 7 shows the list of peers connected to the server. Here the server is running at IP address 192.168.1.7 and the clients are 192.168.1.7 and 192.168.1.3. The user can now publish the document from either peer, 192.168.1.7 or 192.168.1.3. The following screen is the output of the show peers option.

Fig. 7

To publish data, the user first loads the document file on one of the peers shown above. Suppose the document is available on the system with address 192.168.1.7. The user clicks the load data button in the P2PClientGUI window, as shown in figure 8. An open dialog box appears; the user selects the DataSet.txt file to publish and clicks the open button.

Fig. 8.1 Fig. 8.2

After loading DataSet.txt, the user publishes the document in the unstructured peer-to-peer network by clicking the publish button in the P2PClientGUI window. A message dialog then appears with the message "Document published", which informs the publisher that the document has been published in the network.

Fig. 9.1 Fig. 9.2

In figure 10, there are two folders on two different peers, each named after the peer's IP address: the folder 192.168.1.3 on the peer with IP address 192.168.1.3, and the folder 192.168.1.7 on the peer with IP address 192.168.1.7. The data of the DataSet.txt file is saved in these two folders on the two peers.

Fig. 10.1 Fig. 10.2

Figure 11 shows how to obtain the metadata of the data published in the peer-to-peer network. When the user selects the meta data button in the P2PClientGUI window, the right-hand screen shows information about the data previously published in the unstructured peer-to-peer network; that is, the user can see the history of published data. The following screen shows that the DataSet.txt file was published while two peers, 192.168.1.3 and 192.168.1.7, were connected. Here the user can see that DataSet_1.txt and DataSet_7.txt are stored on the two different peers.

Fig. 11.1 Fig. 11.2

In figure 12, the user has the Query window for creating a range query. Without a query the user cannot retrieve data, so the user prepares a range query over the published data by writing it in the Query window.

Fig. 12.1 Fig. 12.2

Figure 13 shows query generation in the unstructured peer-to-peer network. Whenever the user clicks the ok button, a message dialog appears with the message "query.txt is created in current working directory". When this dialog appears, query.txt has been created in the current working directory.

Fig. 13.1 Fig. 13.2

Figure 14 shows the selection of the process query button, which processes the query to retrieve the whole published data.

Fig. 14

Figure 15 shows the client machine 192.168.1.3 after it has processed the query and is displaying the data of the DataSet.txt file. Here the user selects the Display button on the peer with address 192.168.1.3. The user can also process the query from address 192.168.1.7, which is connected to the running server program.

Fig. 15

The data of the document DataSet.txt is displayed after publishing. To obtain this data, the partitions are fetched from all connected systems: the server program contacts the two connected peers, 192.168.1.3 and 192.168.1.7, each of which holds a folder named after its IP address.

Fig. 16

From the experimental results we obtained, the following graphs compare the Path Based Strategy and the Reactive Strategy. The graphs in figures 17 and 18 show the path length of search queries at steady state with respect to query frequency for different storage capacities. In these graphs, the X-axis represents the query frequency and the Y-axis represents the path length of search queries at steady state. The diagrams allow us to appreciate the sensitivity of the two replication strategies to storage capacity and query frequency. Interestingly, the performance of the Reactive Strategy is essentially independent of the capacity of peers: the diagrams obtained for different capacities are almost equal.

Fig. 17: Average length of explorative queries at steady state when storage capacity C = 6.

Fig. 18: Average length of explorative queries at steady state when storage capacity C = 36.
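Section II states that the more queries involve a given index or data portion, the more replicas are maintained for it. The Java sketch below illustrates that principle with a simple proportional allocation. It is neither the Path Based Strategy nor the Reactive Strategy evaluated above; the class ReplicaPlanner, the replica budget, and the portion names in the example are assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative only: gives each portion one copy plus a share of the spare budget proportional to its query count. */
final class ReplicaPlanner {

    /**
     * queryCounts maps a portion name to the number of observed queries involving it.
     * budget is the total number of copies that may be stored in the network.
     * Shares are rounded down, so part of the budget may remain unused.
     */
    static Map<String, Integer> plan(Map<String, Integer> queryCounts, int budget) {
        int portions = queryCounts.size();
        if (budget < portions) {
            throw new IllegalArgumentException("budget must allow at least one copy per portion");
        }
        long totalQueries = 0;
        for (int count : queryCounts.values()) {
            totalQueries += count;
        }
        int spare = budget - portions;
        Map<String, Integer> replicas = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : queryCounts.entrySet()) {
            int share = totalQueries == 0 ? 0 : (int) ((long) spare * entry.getValue() / totalQueries);
            replicas.put(entry.getKey(), 1 + share);
        }
        return replicas;
    }
}
```

For example, with query counts of 30, 10 and 0 for three portions and a budget of 11 copies, the sketch assigns 7, 3 and 1 copies respectively.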

V. CONCLUSIONS
We implemented a framework for sharing and performing analytical queries on historical multidimensional data in unstructured peer-to-peer networks. In our approach, resources are shared across the P2P network in return for the possibility of accessing and posing queries against the data published by others. Our solution is based on suitable data partitioning and indexing techniques, and on mechanisms for data distribution. The test results showed the effectiveness of our approach in providing fast and accurate query answers and in ensuring the robustness that is mandatory in a peer-to-peer setting. As future work, the framework can be extended to manage multidimensional picture, audio, and video data. Indexing algorithms also need to be applied when saving the metadata of a published document.

REFERENCES
[1] Filippo Furfaro, Giuseppe Massimiliano Mazzeo, and Andrea Pugliese, "Managing Multidimensional Historical Aggregate Data in Unstructured P2P Networks", IEEE knowledge and data engineering, 2010.
[2] Sabu M. Thampi, K. Chandra Sekaran, 2009, Review of Replication Schemes for Unstructured P2P Networks.
[3] R. Saravanan, Nov 2010, Processing of Query in Peer to Peer Networks.
[4] T. A. Welch, 1984, A Technique for High-Performance Data Compression.
[5] O. D. Sahin, S. Antony, D. Agrawal, and A. El Abbadi, 2005, PROBe: Multi-dimensional Range Queries in P2P Networks.
[6] Bin Liu, Wang-Chien Lee, Dik Lun Lee, Supporting Complex Multidimensional Queries in P2P Systems.
[7] Murat Demirbas, Hakan Ferhatosmanoglu, Peer-to-Peer Spatial Queries in Sensor Networks.
[8] Swarup Acharya, Viswanath Poosala, Sridhar Ramaswamy, Selectivity Estimation in Spatial Databases.
[9] P. Lalitha Kumari, P. Rama Rao, Chinnam YuvaRaju, 2011, Efficient Way of Data Managing for Range Queries in Unstructured Peer to Peer Networks.
[10] Rolando Blanco, Nabeel Ahmed, David Hadaller, L. G. Alex Sung, Herman Li, and Mohamed Ali Soliman, 2006, A Survey of Data Management in Peer-to-Peer Systems.
[11] Zhonghong Ou, 2006, Structured Peer-To-Peer Networks: Hierarchical Architecture And Performance Evaluation.
[12] E. Cohen and S. Shenker, Replication Strategies in Unstructured Peer-to-Peer Networks, Proc. ACM SIGCOMM 02, 2002.
