
International Journal of Computer Science Engineering and Information Technology Research (IJCSEITR), ISSN 2249-6831, Vol. 3, Issue 1, Mar 2013, 123-132. TJPRC Pvt. Ltd.

A WEB SEARCH APPROACH FOR BINARY STRING IN XML DATA


B. A. JADHAWAR, A. S. TAMBOLI & S. S. PALKAR Department of Information Technology, ADCET, Ashta, Maharashtra, India

ABSTRACT
In many emerging applications, such as XML publishing systems [2], electronic commerce, and intelligent Web searching, ordered XML data are used in query processing [3]. The elements of an XML tree are intrinsically ordered; this ordering is referred to as the document order (i.e., the element sequence). The relative order of two paragraphs in an XML document matters because it influences the semantics of the document [4]. It is therefore important to preserve the document order when the XML is updated, and many studies have addressed this problem [-7]. However, the update costs of these approaches are still high. This paper therefore focuses on how to update ordered XML documents efficiently without sacrificing query performance. It also measures the online and offline query response times.

KEYWORDS: String, XML Data, Order Sensitive Update, Search Query

INTRODUCTION
A number of labeling schemes have been designed to facilitate XML queries; with them, the ancestor-descendant relationship between any two nodes can be determined quickly. Another important feature of XML is that its elements are intrinsically ordered. With the present labeling schemes, however, the label update cost is high: existing nodes have to be re-labeled, or some values re-calculated, whenever an order-sensitive element is inserted. It is therefore important to design a scheme that supports order-sensitive queries and still has a low label update cost. We plan to design a binary string prefix scheme that supports order-sensitive updates without any re-labeling or re-calculation.

Focusing on an intelligent Web searching technique, we generalize the idea of an XML parser. XML can be parsed in many standardized formats, for example for searching utilities and banking utilities (ATM transactions); the aim is a single utility program that can stand in for all of these utilities.

XML has become a standard for representing and exchanging data on the web. In the definition of XML, one element is allowed to refer to another, so in theory an XML document is a graph. For simplicity, however, most research works process queries over XML data that conform to an ordered tree-structured data model. Fig. 1 shows an ordered XML tree.

A binary tree is made of nodes, where each node contains a "left" node, a "right" node, and a data element. The "root" node (book) points to the topmost node in the tree. The left (title) and right (author) nodes recursively point to smaller "subtrees" on either side. A null node represents a binary tree with no elements, i.e., the empty tree. The formal recursive definition is: a binary tree is either empty (represented by a null node), or is made of a single node whose left and right nodes each point to a binary tree.


Figure 1: An Ordered XML Tree

Literature Survey

Many studies have been done to support the document order in XML updating; however, they do not consider how to process deleted labels [-7]. A reuse algorithm based on QED has been proposed [7] that can reuse all of the deleted labels in order to prevent the label size from increasing rapidly. In the context of XML twig query processing, the extended Dewey scheme can derive the names of all the elements along the path from the root. OrdPath [8] is a labeling scheme that can process order-sensitive updates. OrdPath is similar to DeweyID, but it uses only odd numbers in the initial labeling. When the XML tree is updated, it takes the even number between two odd numbers and concatenates another odd number to it. OrdPath is tolerant to insertions; however, as pointed out in [4], it suffers from poor query performance. Wu et al. proposed the prime number labeling scheme to label XML trees. However, Prime needs to recalculate the SC values based on the new ordering of the nodes. The VLEI Code and the Sedna scheme [6] use a bit sequence code and apply it to XML labeling to reduce the cost of insertions. However, the query performance suffers.

Proposed Work

The research objectives are as follows:
To implement an intelligent Web searching technique and generalize the idea of an XML parser.
To parse XML in a standardized format: a searching utility approach using the Improved Binary String Labeling (IBSL) algorithm.
To compare the query response times in the online and offline cases.
To insert nodes (issue another query) and check the response time.
To check the node deletion condition and the memory status.

Binary String

The input keyword is treated as the root node, and the search results are the output, which can be stored as binary-format strings (labels). These binary strings are then stored at random memory locations, which depend on the operating system's allocation algorithm.

TECHNICAL PRELIMINARIES
XML Tree Labeling and Update in IBSL Scheme

This section elaborates on IBSL, which is a binary-string-based prefix scheme. The most important feature of IBSL is that it compares labels by their lexicographical order rather than their numerical order. With IBSL, a label can be inserted between any two consecutive labels while preserving their order and without relabeling the existing labels.


Definition (Lexicographical Order <): Given two binary strings Sleft and Sright, Sleft is said to be lexicographically equal to Sright if they are exactly the same. To determine whether Sleft is lexicographically smaller than Sright, i.e., Sleft < Sright, the two strings are compared bit by bit from left to right, and the comparison stops as soon as one of the following conditions holds:

Condition 1: the current bit of Sleft is 0 and the current bit of Sright is 1; then Sleft < Sright.
Condition 2: len(Sleft) < len(Sright), Sleft is a prefix string of Sright, and the remaining bits of Sright after that prefix are 1; then Sleft < Sright.
Condition 3: len(Sleft) > len(Sright), Sright is a prefix string of Sleft, and the remaining bits of Sleft after that prefix are 0; then Sleft < Sright.

For example, given the two binary strings 10 and 110, 10 < 110 lexicographically (Condition 1). Given the two binary strings 1100 and 11001, 1100 < 11001 (Condition 2), while 11000 < 1100 (Condition 3).
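The three conditions can be illustrated with a small comparator. This is only a sketch: the function name ibsl_compare is hypothetical, and it assumes that when one string is a prefix of the other, the first bit after the prefix decides the order (the definition only gives examples where a single bit remains).

```python
from functools import cmp_to_key


def ibsl_compare(s_left: str, s_right: str) -> int:
    """Return -1, 0 or 1 for s_left <, ==, > s_right in the IBSL order."""
    # Condition 1: the first differing bit decides (0 sorts before 1).
    for a, b in zip(s_left, s_right):
        if a != b:
            return -1 if a == "0" else 1
    if len(s_left) == len(s_right):
        return 0  # identical strings are lexicographically equal
    if len(s_left) < len(s_right):
        # Condition 2: s_left is a prefix; a following 1 makes s_right larger.
        # Assumption: only the first remaining bit is inspected.
        return -1 if s_right[len(s_left)] == "1" else 1
    # Condition 3: s_right is a prefix; a following 0 makes s_left smaller.
    return -1 if s_left[len(s_right)] == "0" else 1


# Sorting a handful of labels with the comparator.
labels = ["110", "10", "100", "1100", "1110"]
ordered = sorted(labels, key=cmp_to_key(ibsl_compare))
print(ordered)  # ['100', '10', '1100', '110', '1110']
```

Note that this order differs from ordinary string comparison: 100 sorts before 10 and 1100 sorts before 110, which is what leaves room for new labels on either side of an existing one.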

Figure 2: Example of IBSL

Figure 3: Example of Update Operation

Fig. 2 shows an example of IBSL. The prefix labels of the three child nodes (non-shaded and non-dotted circles: 10, 110, 1110) are all empty strings; thus, for these three child nodes the self_label is exactly the same as the complete label. The label of a node is the concatenation of its parent's label (prefix_label) and its own label (self_label). In Fig. 3, the shaded circle denotes a leaf node update and the white dotted circle denotes a subtree update. Based on the algorithm described, a binary string can be inserted between two existing labels without relabeling. For example, when inserting node a in Fig. 3 (Case 1), the self_label of a is 100 (10 followed by 0 gives 100). When inserting node b (Case 2), the left self_label of b is 10 with length 2 and the right self_label of b is 110 with length 3, so we directly concatenate one more 0 after the right self_label (110 followed by 0 gives 1100), whereupon the self_label of b is 1100. The same procedure is followed for the self_labels of c and d.
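The labeling and the two insertions above can be sketched in a few lines. All names here (initial_self_label, full_label, label_tree, insert_between) are hypothetical, and two assumptions are made: the i-th child initially receives i ones followed by a zero, as the labels 10, 110, 1110 in Fig. 2 suggest, and node names in the example tree are unique. The insertion function is a deliberately simplified variant, not the authors' algorithm: it always returns a label that sorts between its neighbours, but it does not look for the shortest such label.

```python
from collections import deque
from typing import Dict, List, Optional


def initial_self_label(i: int) -> str:
    """self_label of the i-th child (1-based): i ones followed by a zero."""
    if i < 1:
        raise ValueError("child positions are 1-based")
    return "1" * i + "0"


def full_label(prefix_label: str, self_label: str) -> str:
    """Join the parent's label and the node's own label with '.'."""
    return f"{prefix_label}.{self_label}" if prefix_label else self_label


def label_tree(tree: Dict[str, List[str]], root: str) -> Dict[str, str]:
    """Breadth-first labelling; tree maps a node to its ordered children."""
    labels = {root: ""}  # the root carries the empty label
    queue = deque([root])
    while queue:
        parent = queue.popleft()
        for i, child in enumerate(tree.get(parent, []), start=1):
            labels[child] = full_label(labels[parent], initial_self_label(i))
            queue.append(child)
    return labels


def insert_between(left: Optional[str], right: Optional[str]) -> str:
    """Return a self_label that sorts strictly between left and right.

    Simplified variant (assumption): no search for the shortest label.
    """
    if left is None and right is None:
        return initial_self_label(1)  # first child of an empty parent
    if left is None:
        return right + "0"            # before the leftmost sibling
    if right is None:
        return left + "1"             # after the rightmost sibling
    if len(left) > len(right) and left.startswith(right):
        return left + "1"             # right is a prefix of left
    return right + "0"                # every other ordered pair


print(label_tree({"book": ["title", "author"]}, "book"))
print(insert_between(None, "10"))     # node a in Fig. 3 -> 100
print(insert_between("10", "110"))    # node b in Fig. 3 -> 1100
```

Existing labels are never touched: each insertion only reads its two neighbours and appends one bit to one of them.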


PROPOSED ALGORITHMS

Improved Binary String Labeling Algorithms


Definition (Label for Node N): The root is labeled with an empty string. For a non-root element, the i-th child element of parent N is labeled label(N).1^i0, i.e., the parent's label, the delimiter ".", and then i ones followed by a single 0. The left-hand side of the delimiter is the prefix_label and the right-hand side is the self_label.

Algorithm 1: Algorithm 1 gives the details of the operation used to label nodes with IBSL. In IBSL, we carry out a breadth-first traversal of the XML tree and assign each node a label. Algorithm 1 is a direct implementation of the definition above, where the delimiter "." helps the reader see the parent-child relationship between nodes. For example, by looking at node 110.100, one can see that it is a child of node 110.

Algorithm 2: Algorithm 2 is the foundation of this paper and is what allows updates to be processed efficiently. In Fig. 3, the shaded circle denotes a leaf node update and the white dotted circle denotes a subtree update. Based on Algorithm 2, a binary string can be inserted between two existing labels without relabeling.

Algorithm 3: The main idea is to compare Nleft and Nright bit by bit to find Nnew such that Nnew is the shortest of all the labels that lie lexicographically between Nleft and Nright.

Case 1 handles the insertion of a label before the leftmost label. When inserting node a in Fig. 3, Ntemp is Nright concatenated with a single bit 0.

Case 2 handles the insertion of a label between two labels.

Case 2(a): the length of Nright is larger than that of Nleft. If Nleft (say, 100) is a prefix string of Nright (say, 1001), then Nnew is Nright concatenated with a single bit 0, i.e., 1001 followed by 0 gives 10010. If Nleft (say, 100) is not a prefix string of Nright (say, 1100) and the length of Nleft is not equal to P, then the common leading bits shared by the two labels are extracted and concatenated with a single bit 0. Otherwise, if Nleft (say, 10) is not a prefix string of Nright (say, 110) and the length of Nleft is equal to P, then Nnew is Nright concatenated with a single bit 0, i.e., 110 followed by 0 gives 1100.

Case 2(b): the length of Nleft is equal to the length of Nright. If all of the extracted common bits of Nleft (say, 1100) and Nright (say, 1110) are 1, then Nnew is Ntemp concatenated with a single bit 0, i.e., Ntemp (11) followed by 0 gives 110.

Case 2(c): the length of Nleft is larger than that of Nright. If Nright (say, 110) is a prefix string of Nleft (say, 1100), then Nnew is Nleft concatenated with a single bit 1, i.e., 1100 followed by 1 gives 11001. Otherwise, if Nright is not a prefix string of Nleft, then Nnew is Ntemp, where Ntemp = substring(Nright, 1, P-1).

Case 3 handles the insertion of a label after the rightmost label. When inserting node c in Fig. 3, Ntemp is Nleft concatenated with a single bit 1.

In addition to Algorithm 3 as described in [1], an additional provision is made to work with URLs. It is outlined below.

    getResult: url
    BinConvert: return binary url
    if NewUrl
        memAlocate: randomLocation
    else
        referPrevious: retrieve url
    if NextStringSearch
        go to step 1
    if SearchSave: memoryAlocate
    if garbageCollected: updateLocation
    end if
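One way to read the URL provision is as a small cache: a result URL is converted to a binary string and stored the first time a query is seen, and later searches for the same query are answered from the stored strings. The sketch below illustrates that reading under several assumptions: bin_convert uses 8 bits per UTF-8 byte, a plain dictionary stands in for the random memory locations, and fetch_online is a hypothetical placeholder for the online search call. None of these names or choices come from the paper.

```python
from typing import Callable, Dict, List, Tuple


def bin_convert(url: str) -> str:
    """Encode a URL as a string of 0/1 characters (8 bits per UTF-8 byte)."""
    return "".join(f"{byte:08b}" for byte in url.encode("utf-8"))


def bin_restore(bits: str) -> str:
    """Inverse of bin_convert: decode a 0/1 string back into the URL."""
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")


class ResultStore:
    """In-memory stand-in for the stored binary strings (assumption)."""

    def __init__(self) -> None:
        self._by_query: Dict[str, List[str]] = {}

    def lookup(
        self, query: str, fetch_online: Callable[[str], List[str]]
    ) -> Tuple[List[str], str]:
        """Return (urls, source), where source is 'online' or 'offline'."""
        if query in self._by_query:
            # Previously seen query: retrieve the stored binary strings.
            stored = self._by_query[query]
            return [bin_restore(bits) for bits in stored], "offline"
        # New query: fetch the results, convert them and store them.
        urls = fetch_online(query)
        self._by_query[query] = [bin_convert(url) for url in urls]
        return list(urls), "online"


def fake_fetch(query: str) -> List[str]:
    """Placeholder for a real search call; returns a fixed example URL."""
    return ["http://example.com/" + query.lower()]


store = ResultStore()
print(store.lookup("Books", fake_fetch)[1])  # online
print(store.lookup("Books", fake_fetch)[1])  # offline
```

The second lookup never calls fake_fetch, which mirrors the distinction between online and offline response time used in the experiments.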

Experimentation and Results

The experimentation is carried out to measure the response time. Results are reported for online versus offline responses using the Improved Binary String Labeling (IBSL) algorithm.

Experimentation Setup

The experiments were carried out on an Intel(R) Core(TM) i3 processor with 3 GB of RAM running Microsoft Windows XP Professional Service Pack 2. The XML data sets (and their corresponding labels) were stored in shield SQL, Yog SQL; the setup also requires database connectivity to MySQL and SQL 2000.

Keyword Datasets

The testing datasets are from the Google Web IT -gram Version 1 data set repository. We chose a variety of keyword datasets with different characteristics in terms of the number of features, the number of labels, the size of the datasets and the data distributions. Table 1 shows the parameters used in our search engine model. The search result retrieval time is measured for predefined input root nodes.

Table 1: Books and University Dataset

Dataset      Sr. No   Leaf Node             IBSL Time (sec)
Books        1        Software Engg.        0.9867
Books        2        C#                    1.0616
Books        3        Mobile Technology     1.0197
Books        4        Database Engg.        1.0384
Books        5        Computer Graphics     0.703
Books        6        Computer Networks     0.87
University   1        Com.C. in S.U.K       1.032
University   2        Agree C in pune       1.0168
University   3        Law C. in pune        0.8622
University   4        Medical C in S.U.K    1.0149
University   5        Sci.C.in pune         0.9926
University   6        WCE, Sangli           0.8717

Table 1 lists the Books and University datasets, each used as a root node, with six test leaf nodes per dataset. The search result time for each leaf node is measured in seconds.


Experimental Results

The input keyword can be predefined for the purpose of the experiment. The input keyword is treated as the root node, and the search results are the output, which can be stored as binary-format strings (labels). These binary strings are then stored at random memory locations, which depend on the operating system's allocation algorithm. IBSL improved the search time obtained with the page rank algorithm by indexing each domain name in the result set. The memory usage of the IBSL implementation is lower than that of the conventional search method, although this partially depends on the system configuration; good results are expected when it is hosted on Win NT Exchange Servers. A case study, a Books web search, considers two types of query search: one on the basis of category word and the other on the basis of category.

Figure 4: (a) Online Search Type

Fig. 4(a) shows Books as the root node and mobiletechnologybyjochenSchiller as a leaf node. In the category type, only the root node can be searched. In a binary string every space has the same binary value, so a space in a query is treated as the end of the string. Therefore, in IBSL the query sentence is entered as one continuous string.

Figure 4: (b) XML Format

Fig. 4(b) shows the XML format of the Google database for the category word search type shown in Fig. 4(a).

Figure 4: (c) IBSL Online Search


Fig. 4(c) shows the IBSL search for the same query as in Fig. 4(b); the query processing time in Fig. 4(c) is given in seconds. If you write something in the Feedback box and click the Submit Feedback button, the result is inserted into the database; otherwise it is automatically deleted. You can then directly search for another query.

Figure 4: (d) Offline Search Type

Fig. 4(d) shows the offline search type. If the same query is searched a second time, a message reports that the result was saved in the binary tree storage. After this search, the offline XML format is the same as for the online search type shown above. Fig. 4(e) shows the IBSL offline search time in seconds, which is less than the online search time. Fig. 4(f) shows the response time in seconds for the four datasets used as root nodes, with various leaf nodes, using the Improved Binary String Labeling (IBSL) algorithm.

Figure 4: (e) IBSL Offline Search

[Figure 4(f) plot: y-axis IBSL Time (sec), 0 to 2; x-axis Datasets; series Books Time (sec), University Time (sec), Hotels Time (sec), Temples Time (sec)]

Figure 4: (f) Response Time of Datasets

Comparison for IBSL Online and Offline Time

In this section, a comparison study is performed for two cases:
Online: the Google search engine is connected.


Offline: the results are retrieved from the database.

[Figure 5 plot: y-axis IBSL Time (sec), 0 to 2; x-axis University; series IBSL Online Time (sec), IBSL Offline Time (sec)]

Figure 5: IBSL Online and Offline Time of University Dataset

Because our results are stored in binary format, searching them takes less time. The IBSL offline time is therefore less than the IBSL online Google search time. Figure 5 shows the comparison of IBSL online and offline response times.

Comparison for Existing Scheme and IBSL Scheme Time

This section presents comparative results for the existing scheme [2] and the IBSL scheme [1]. The time is measured in seconds. For the existing scheme, the experiments were carried out on the processor specification mentioned in [1]. Fig. 6 shows that, whether implemented on a similar platform or on the specified one, the time required by the IBSL scheme is less than that of the existing scheme. One reason for the faster operation of the IBSL scheme is that it avoids relabeling nodes in subsequent queries. The overall comparison of labeling time is the most important factor, since the cost of any software depends on memory usage and response time. Because our results are in binary format, searching the results collectively takes less time.

Figure 6: Comparative Results of Existing Scheme and IBSL Scheme

The IBSL response time is thus less than the reference conventional Google search time. The effect on economy is directly proportional to the investment, efficiency and time ratio. Less memory is used because of the binary string conversions, and therefore less time is required.

Remark

In this work, it is demonstrated that the IBSL algorithm can be applied to areas where data search is extensive, as in search engine optimization. Data strings can be converted into binary format and saved at random memory locations instead of in a database. The offline search time in seconds is less than the online search time. This reduces the data storage cost and improves memory efficiency.


FUTURE SCOPE
As an extension of the existing work, we will implement the deletion of data storage nodes from random memory locations. This requires working through the permutations and combinations of the logic involved, which in turn means identifying and studying the relevant operating system algorithms in order to develop a reverse algorithm for different operating systems.

REFERENCES
1. Hye-Kyeong Ko and SangKeun Lee, A Binary String Approach for Updates in Dynamic Ordered XML Data, IEEE Transactions on Knowledge and Data Engineering, vol. 22, no. 4, pp. 602-607, April 2010.
2. R. Agrawal, A. Borgida, and H.V. Jagadish, Efficient Management of Transitive Relationships in Large Data and Knowledge Bases, Proc. ACM SIGMOD, pp. 23-262, 1989.
3. C. Zhang, J.F. Naughton, D.J. DeWitt, Q. Luo, and G.M. Lohman, On Supporting Containment Queries in Relational Database Management Systems, Proc. ACM SIGMOD, pp. 42-436, 2001.
4. C. Li, T.W. Ling, and M. Hu, Efficient Updates in Dynamic XML Data: From Binary String to Quaternary String, Very Large Data Bases J., vol. 17, no. 3, pp. 73-601, 2008.
5. E. Cohen, H. Kaplan, and T. Milo, Labeling Dynamic XML Trees, Proc. Symp. Principles of Database Systems (PODS), pp. 271-281, 2002.
6. A. Fomichev, M. Grinev, and S. Kuznetsov, Sedna: A Native XML DBMS, Proc. Intl Conf. Current Trends in Theory and Practice of Computer Science (SOFSEM), pp. 272-281, 2006.
7. C. Li, T.W. Ling, and M. Hu, Reuse or Never Reuse the Deleted Labels in XML Query Processing Based on Labeling Schemes, Proc. Intl Conf. Database Systems for Advanced Applications (DASFAA), pp. 69-673, 2006.
8. S. Boag, D. Chamberlin, M.F. Fernandez, D. Florescu, J. Robie, and J. Simon, XQuery 1.0: An XML Query Language, W3C working draft 04, Apr. 200.
