1. Introduction
Improvements in networking and Internet technology have had a significant impact on the need to access
thousands of heterogeneous information resources on various platforms [4], since these resources can be made
immediately available to many users through existing computer networks. There is a growing need for tools
that maximize the portability, reusability and interoperability of arbitrary computing services while maintaining
the autonomy of pre-existing applications in a federated approach [1]. The problems of achieving
interoperability among heterogeneous and autonomous sources, decomposing global queries, and
optimizing queries have long been subjects of intensive research in the field.
Web caching [7] is the caching of web documents (e.g. web pages, images) in order to reduce bandwidth
usage and server load and to improve latency [10]. A web cache stores copies of documents passing through
it. Caching helps bridge the performance gap between local activity and remote content [14]. In the short term,
caching improves Web performance by reducing the cost and end-user latency of Web access [5].
2. Literature Survey
One approach to coordinating caches in the same system is to set up a caching hierarchy. With hierarchical
caching, caches are placed at multiple levels of the network [12]. In distributed Web caching systems [8, 13,
11], there are no intermediate cache levels other than the institutional caches, which serve each other's misses. In
order to decide from which institutional cache to retrieve a missed document, every institutional cache keeps meta-
data about the contents of all the other institutional caches. With distributed caching [2], most of the
traffic flows through the lower network levels, which are less congested. In addition, distributed caching allows better
load sharing and is more fault tolerant. Nevertheless, a large-scale deployment of distributed caching may
encounter several problems, such as high connection times, higher bandwidth usage and administrative issues.
There are several approaches [9] to distributed caching [5]. One is the Internet Cache Protocol (ICP),
which supports discovery and retrieval of documents from neighbouring caches as well as parent caches.
Another is the Cache Array Routing Protocol (CARP), which divides the URL
space among an array of loosely coupled caches and lets each cache store only the documents whose URLs
hash to it.
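The hash-partitioning idea behind CARP can be sketched as follows. This is a minimal illustration, not the CARP specification: real CARP combines separate proxy and URL hashes and supports load factors, whereas this sketch uses a rendezvous-style MD5 score, and the cache names are invented.

```python
import hashlib

def carp_route(url: str, caches: list[str]) -> str:
    """Pick the cache responsible for a URL, CARP-style (illustrative).

    Because the score depends only on (url, cache), every client agrees
    on a document's single "home" cache without exchanging any metadata,
    and no duplicate copies are stored across the array.
    """
    def score(cache: str) -> int:
        # Hash the (cache, url) pair and treat the first 8 bytes as a score.
        digest = hashlib.md5((cache + url).encode()).digest()
        return int.from_bytes(digest[:8], "big")
    # The cache with the highest score owns the URL.
    return max(caches, key=score)

caches = ["cache-a", "cache-b", "cache-c"]
owner = carp_route("http://example.com/page.html", caches)
assert owner in caches
# Deterministic: repeated lookups route to the same cache.
assert owner == carp_route("http://example.com/page.html", caches)
```

A useful property of this scheme is that only the documents owned by a removed cache need to be re-homed, rather than the whole URL space being reshuffled.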
ISSN: 0975-5462 3206
Neera Batra et. al. / International Journal of Engineering Science and Technology
Vol. 2(7), 2010, 3206-3212
3. Proposed Model
In our proposed model, we achieve fast query processing by maintaining caches at three levels: one at the user
level, a second at the proxy server [3] level, and a third at the database server level. When a user first submits a
query, the proxy server fetches the results from the database server cache and returns them to the
end user. The result is also stored in the user's local cache, so that if the same user issues the same query
again, the results can be found directly in the local web cache. If the same query is issued by
another user, the proxy server checks for the results of that query in its own proxy cache, where query
results are stored. The query therefore need not reach the cache of the database server at all. This
approach reduces the time taken to retrieve results.
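The three-level lookup described above can be sketched as follows. The class and method names (`ThreeLevelCache`, `run_query`, and the use of plain dictionaries for each level) are illustrative assumptions, not the paper's implementation; in the real model the three caches live on different machines.

```python
class ThreeLevelCache:
    """Sketch of the three-level lookup: user cache, proxy cache,
    database-server cache, and finally the database itself."""

    def __init__(self, run_query):
        self.user_cache = {}    # per-user local web cache
        self.proxy_cache = {}   # shared proxy-server cache
        self.db_cache = {}      # database-server cache
        self.run_query = run_query  # fallback: actually execute the query

    def get(self, query):
        # 1. Local (user) cache: repeats by the same user never leave the client.
        if query in self.user_cache:
            return self.user_cache[query]
        # 2. Proxy cache: serves repeats of queries first issued by other users.
        if query in self.proxy_cache:
            result = self.proxy_cache[query]
        # 3. Database-server cache, then the database itself on a full miss.
        elif query in self.db_cache:
            result = self.db_cache[query]
        else:
            result = self.run_query(query)
            self.db_cache[query] = result
        # Populate the nearer caches on the way back to the user.
        self.proxy_cache[query] = result
        self.user_cache[query] = result
        return result
```

For example, wrapping a query executor and issuing the same query twice runs it only once; the second request is answered from the local cache:

```python
calls = []
cache = ThreeLevelCache(lambda q: (calls.append(q), q.upper())[1])
cache.get("select * from t")
cache.get("select * from t")
assert calls == ["select * from t"]  # executed once, served from cache after
```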
Figure 2: Proposed model for Web Cache Based Query Optimization in Distributed Database (users with local caches 1 to n, connected through a proxy server and its proxy cache)
4. Implementation
Dynamic scripting technologies such as Active Server Pages (ASP) (www.microsoft.com) and JavaServer
Pages (JSP) (www.sun.com) allow Web sites to assemble pages based on various run-time parameters in an
attempt to tailor content to each individual user. A major disadvantage of such dynamic scripting technologies,
however, is that they reduce Web and application server scalability because of the additional load placed on the
Web/application server. Beyond pure script-execution overhead, the delays caused by dynamic scripting
technologies include delays in fetching content from persistent storage (e.g., database systems), delays in
data transformations (e.g., XML to HTML), and delays in executing business logic (e.g.,
personalization software). Very little work has been done so far to address the problem of dynamic page-
generation delays.
If the requested fragment is not found in the cache, the corresponding code block is executed and the requested fragment is
generated and subsequently placed in the cache.
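The fragment-miss path just described can be sketched as a small get-or-generate routine. The names (`fragment_cache`, `get_fragment`) and the use of a plain dictionary are illustrative assumptions; the point is only that a code block runs once per fragment and subsequent requests are served from the cache.

```python
fragment_cache = {}  # fragment id -> generated markup

def get_fragment(fragment_id, code_block):
    """Return a page fragment, executing its code block only on a miss."""
    if fragment_id not in fragment_cache:
        # Miss: run the code block that generates this fragment,
        # then place the result in the cache for later requests.
        fragment_cache[fragment_id] = code_block()
    return fragment_cache[fragment_id]

html = get_fragment("headline", lambda: "<h1>Top stories</h1>")
assert html == "<h1>Top stories</h1>"
# Hit: the second code block is never executed.
assert get_fragment("headline", lambda: "NEVER RUN") == "<h1>Top stories</h1>"
```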
Figure 3: Page template assembled from cacheable fragments: a navigational component, a current headline component and a personalized component, each generated by a code block that writes its output to the response
A critical aspect of any caching solution is cache management. As the cache fills, the effectiveness of
the cache replacement policy dictates the hit rate of the cache, and thus its performance. Our cache replacement
algorithm, which we refer to as Least Likely to be Used (LLU), is based on a predictive technique. When
choosing a replacement victim, it takes into account not only how recently a cached item has been referenced,
but also whether any user is likely to need the item in the near future. Also, as the underlying source data
changes, some mechanism is required to keep the cached components fresh. Our solution supports several
existing invalidation techniques (e.g. time-based and event-based invalidation), which have been adapted to work
in the context of our component-level cache. A simplified depiction (without routers, firewalls, etc.) of an end-to-
end web site architecture is shown in figure 4. As the figure shows, the DCA sits adjacent to the server rack,
along with other resources such as site content databases.
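The combination of LLU-style replacement and time-based invalidation can be sketched as below. The paper does not spell out the LLU predictor, so the "likelihood of future use" here is a simple stand-in (access count, with recency breaking ties); the class name, fields and TTL mechanism are likewise illustrative assumptions.

```python
import time

class ComponentCache:
    """Sketch: LLU-flavoured replacement plus time-based invalidation."""

    def __init__(self, capacity, ttl_seconds):
        self.capacity = capacity
        self.ttl = ttl_seconds
        self.items = {}  # key -> (value, last_access, hits, stored_at)

    def get(self, key):
        entry = self.items.get(key)
        if entry is None:
            return None
        value, _, hits, stored_at = entry
        # Time-based invalidation: drop components older than the TTL.
        if time.time() - stored_at > self.ttl:
            del self.items[key]
            return None
        self.items[key] = (value, time.time(), hits + 1, stored_at)
        return value

    def put(self, key, value):
        if key not in self.items and len(self.items) >= self.capacity:
            # Evict the item least likely to be used: fewest hits first,
            # oldest last access breaking ties (a crude LLU proxy).
            victim = min(self.items,
                         key=lambda k: (self.items[k][2], self.items[k][1]))
            del self.items[victim]
        now = time.time()
        self.items[key] = (value, now, 0, now)
```

Event-based invalidation would hook `del self.items[key]` to change notifications from the source database rather than to a timer.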
Figure 4: Simplified end-to-end web site architecture: clients access the cache through an API, with a communication manager and a cache manager sitting in front of the cache and the site content
5. Performance Analysis
The main contribution of this work is a mechanism for intelligent, fragment-level caching that treats
static and dynamic user information separately. A tag-based classification of dynamically updated
information uses code blocks to perform the fragmentation. This saves data-retrieval time by
avoiding regeneration of the complete page and thus provides query optimization. Moreover, performance
is enhanced by the increased throughput and scalability that caching inherently provides.
The DCA technique achieves high performance by using fragment-level web caching efficiently and in a very
innovative way compared with other algorithms. Examples and performance results are presented for the
DCA architecture described in this paper. The performance is evaluated using the following measure: processing
time versus number of pages to be executed in the existing and the new scheme.
Figure 5: Processing time vs. number of pages to be executed in the old and new schemes
Figure 6: Performance analysis of the old and new schemes (processing time in ms vs. number of pages, 10 to 100)
These figures show how the DCA technique improves query optimization compared with the old
schemes. In the existing schemes [6], the processing time for executing the requested pages increases in the same ratio
as the number of pages. In the proposed scheme, the processing time still increases, but much more slowly,
because the scheme caches the static (reusable) part and generates only the required dynamic part on
demand. For example, if a requested page of 100 kb consists of 5 code blocks of which 2 are
reusable and already available in the cache, only 3 code blocks need to be regenerated, i.e. only 60% of
the whole page. This makes result generation faster under the proposed scheme than under
the old schemes, saves cache storage space and fulfils the aim of query optimization.
Now suppose a user requests modules m_i and m_j, which may or may not be located on the same page, and let m_i and m_j
reside on pages p_i and p_j respectively (where i = j or i ≠ j).
In the traditional approach (old scheme), the total number of bytes transferred for such requests is:

    B_t = p_b(i) + p_b(j)        (3)

where B_t is the number of bytes transferred, and p_b(i) and p_b(j) are the total numbers of bytes on the i-th and j-th pages respectively.
In the proposed (new) scheme:

    b_t = n_b(i) + n_b(j)        (4)

where n_b(i) is the number of bytes in the i-th module.
The number of bytes transferred is an effective parameter for measuring bandwidth utilization. We therefore define a
bandwidth utilization factor λ_b as the ratio of the number of bytes transmitted in the traditional scheme to that in the proposed
scheme:

    λ_b = B_t / b_t = (p_b(i) + p_b(j)) / (n_b(i) + n_b(j))        (5)
Generalizing to k requested modules:

    λ_gen = Σ_{i=1..k} p_b(i) / Σ_{i=1..k} n_b(i)        (6)
Another important parameter in cache management is the query response time. If t is the time for
one byte to travel over the network, then in the traditional scheme the total byte-transfer time t_T, i.e. the query
response time, is:

    t_T = ( Σ_{i=1..k} p_b(i) ) · t        (7)
In the proposed model, the total byte-transfer time T_p, i.e. the query response time, is:

    T_p = ( Σ_{i=1..k} n_b(i) ) · t        (8)

Therefore, the query response time is improved by a factor Q_t when the proposed scheme is used:

    Q_t = t_T / T_p = ( Σ_{i=1..k} p_b(i) · t ) / ( Σ_{i=1..k} n_b(i) · t )        (9)
6.1. An example:
Suppose a user requests module M1 from page P1 and module M3 from page P2, consisting of 100 kb and 200 kb of data
respectively. In the traditional scheme, both dynamic pages are executed in full upon the user's request, transferring
100 kb for M1 and 250 kb for M2 in P1, and 300 kb for M1 and 200 kb for M3 in P2, for a
total data transfer of 850 kb. In the proposed scheme, only a fragment of P1 and of P2 is executed, i.e.
the 100 kb of M1 in P1 and the 200 kb of M3 in P2, for a total data transfer of only 300 kb.
Table 1: Page-Module table (bytes per module; X = module not present)

          M1    M2    M3    M4    M5
    P1    100   250   X     X     X
    P2    300   X     200   X     X

That is, bandwidth utilization using the proposed scheme is 2.8 times better than using the traditional scheme (850/300 ≈ 2.83) in the given example.
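The example's arithmetic, i.e. equations (3)-(5) applied to the Page-Module table, can be checked directly. The dictionary layout below is just a convenient encoding of Table 1.

```python
# Page-Module table from the example: bytes per module ("X" entries omitted).
page_modules = {
    "P1": {"M1": 100, "M2": 250},
    "P2": {"M1": 300, "M3": 200},
}
requests = [("P1", "M1"), ("P2", "M3")]  # the user's two module requests

# Traditional scheme, eq. (3): every requested page is generated in full.
B_t = sum(sum(page_modules[page].values()) for page, _ in requests)

# Proposed scheme, eq. (4): only the requested fragments are generated.
b_t = sum(page_modules[page][module] for page, module in requests)

assert B_t == 850 and b_t == 300
# Bandwidth utilization factor, eq. (5):
print(round(B_t / b_t, 1))  # prints 2.8
```

Note that since the per-byte network time t cancels in equation (9), the response-time improvement Q_t in this example is the same factor of about 2.8.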
In the proposed scheme, a two-dimensional matrix named the Page-Module table (shown as Table 1) is
maintained in the Apache cache. It keeps a record of all dynamic requests and of the number
of bytes occupied by each module of a page, so as to give the total bytes consumed per page.
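A minimal sketch of such a table follows. The class and method names are illustrative assumptions (the paper only says the table lives in the Apache cache, not how it is accessed); the structure is a mapping from page to per-module byte counts, with per-page totals derived by summation.

```python
class PageModuleTable:
    """Sketch of the Page-Module table kept in the proxy-side cache."""

    def __init__(self):
        self.table = {}  # page -> {module: bytes}

    def record(self, page, module, nbytes):
        # Record the bytes occupied by one module of a dynamic page.
        self.table.setdefault(page, {})[module] = nbytes

    def page_bytes(self, page):
        # Total bytes consumed by a page = sum over its recorded modules.
        return sum(self.table.get(page, {}).values())

t = PageModuleTable()
t.record("P1", "M1", 100)
t.record("P1", "M2", 250)
assert t.page_bytes("P1") == 350  # matches row P1 of Table 1
```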
will share the information of all users on the basis of the requests made to the local server. For this purpose,
the main issue will be security in caching for every user, so as to keep all users' caches synchronized.
References:
[1] Evrendilek, C.; Dogac, A.; Nural, S.; Ozcan, F. (2007): Multidatabase query optimization. Distributed and Parallel Databases, 5(1), pp. 77–114.
[2] Adali, S.; Candan, K.S.; Papakonstantinou, Y.; Subrahmanian, V.S. (2006): Query caching and optimization in distributed mediator systems. ACM SIGMOD, 35, pp. 45–118.
[3] Luo, Qiong; Naughton, Jeffrey (2006): Form-Based Proxy Caching for Database-Backed Web Sites. Rome, 35(1-6), pp. 254–314.
[4] White Paper (2006): Distributed Application Platform. Zembu Labs.
[5] Bhattacharjee, B., et al. (2003): Efficient peer-to-peer searches using result-caching. 2nd Int. Workshop on Peer-to-Peer Systems.
[6] Wang, Jia (1999): A Survey of Web Caching Schemes for the Internet. ACM SIGCOMM, vol. 29, pp. 36–46.
[7] Davison, Brian D. (2001): A Web Caching Primer. IEEE Internet Computing, 5(4), pp. 38–45.
[8] Rodriguez, Pablo; Spanner, Christian; Biersack, Ernst W. (2001): Analysis of Web Caching Architectures: Hierarchical and Distributed Caching. IEEE/ACM Transactions on Networking, 9(4).
[9] Barish, Greg; Obraczka, Katia (2006): World Wide Web Caching: Trends and Techniques. IEEE Communications Magazine.
[10] Rabinovich, M.; Spatscheck, O. (2006): Web Caching and Replication. Addison Wesley.
[11] Feldman, M.; Chuang, J.: Service Differentiation in Web Caching and Content Distribution. http://citeseer.nj.nec.com/554650.html.
[12] Sun, Hairong; Zang, Xinyu; Trivedi, Kishor S. (2005): The effect of web caching on network planning, 12(84).
[13] Chong, E.I. (2004): Query Optimization in Distributed Database Systems and Multidatabase Systems. Ph.D. Thesis, Northwestern University.
[14] Florescu, D., et al. (2006): Query optimization in the presence of limited access patterns. ACM SIGMOD, 19(24).