Experimental Study of Parallel Downloading Schemes For Internet Mirror Sites

Spiro Philopoulos
Dept. of Elect. and Comp. Eng.
University of Manitoba
Winnipeg, Canada
email: umphilop@cc.umanitoba.ca

Muthucumaru Maheswaran
Dept. of Computer Science
University of Manitoba
Winnipeg, Canada
email: maheswar@cs.umanitoba.ca

ABSTRACT

A common method used to reduce document retrieval times is the use of content replication, i.e., mirror servers. The mirror servers provide several alternate sites from which to download a specific document and were traditionally used to increase the availability of content. Recently, several studies focused on using multiple mirror sites to concurrently download portions of a document from a set of mirror sites. Following are some of the issues involved in using multiple mirror sites concurrently: (a) selection of the “best” mirror servers from the client, (b) coping with dynamic overloading of the network and servers, and (c) coping with faults. This paper briefly examines two existing schemes for concurrent downloading or parallel-access downloading, or paraloading as it is called. It proposes a third paraloading scheme called the Dynamic Parallel Access. The performance of this scheme is experimentally evaluated. Recommendations for further improvements are also discussed.

1. Introduction

As various applications are Internet-enabled, the number of repositories in the Internet that hold valuable content is increasing. For example, Internet-based repositories are beginning to hold content such as multi-gigabit movie files, operating system and other large software distributions, and large multimedia documents. This creates a need for the clients to find faster downloading schemes for both online and offline usage. The conventional way of downloading files from an Internet-based server is to open one or more connections between it and the client. While opening multiple connections might reduce download times compared to a single connection, the performance enhancement can be limited by the following issues: (a) server load and capacity, (b) bottleneck link bandwidth, (c) instantaneous bandwidth, (d) multi-connection client overhead, and (e) interconnection resource allocation.

Replicating content such that it can be accessed from multiple locations is one way of decreasing the download times. Caching, content delivery, and mirroring are some of the techniques that replicate content using different policies and primary purposes. Mirror servers were traditionally used to improve the availability of the content. Recently, however, several projects have examined the concept of concurrently using multiple mirror servers to download content for a single client. The concurrent or parallel downloading is performed to reduce the download time for a client.

One of the key problems with paraloading schemes is the mirror site selection problem. This problem is complicated because accurate “performance” information to select the best set of mirror servers is not available to the clients. Additionally, the problems mentioned above for the single server case still exist. Further, network conditions can change during the download, which can lead to decreased performance. This observation motivated us to examine adaptive parallel download techniques that use a set of mirror servers. The membership of the mirror server set may dynamically change. In paraloading, segments of the file are downloaded from each server in the set and are then reassembled at the client. The parallel-access scheme was first proposed by [4]. Following are some of the advantages of paraloading:

- Depending on the topology, in the ideal case, the aggregate bandwidth of all individual connections may increase the overall throughput to the client.

- Because multiple connections are used, paraloading is more resilient to link and general route failures.

- Parallel loading provides inherent load balancing because connections are spread out over many servers rather than just one. Therefore, the scheme can be immune to individual server load fluctuations, bottleneck link bandwidth and traffic fluctuations.

Section 2 examines three different paraloading schemes (two existing and one new), all of which use application-level negotiation to schedule the transmission of different segments of a file. Section 3 describes the experiments performed to evaluate the different schemes and examines the results obtained from the experiments.

2. Parallel Downloading Schemes

2.1 History-Based Parallel Access

History-based parallel access [4] is a relatively simple scheme in which information regarding the previous transmission rates between the client and every mirror server is recorded in a database and used to determine how large a file segment is downloaded from each server. More specifically, for M mirror servers, the file will be divided into M unequally-sized disjoint segments, each of which will be assigned to a mirror server based on “expected” data rates derived from historical data between the client and each mirror, i.e., the faster a server is (based on the history), the larger the file block that will be downloaded from that server. File blocks are downloaded from each individual server using the HTTP 1.1 byte range header feature, resulting in a server-side transparent solution requiring modifications only on the client side.

The problem with history-based parallel access is of course the validity of the recorded transmission rates, i.e., how close those recorded rates are to the actual transmission rates in future downloads. The larger the divergence between historical and (future) actual data transmission rates, the worse the performance is, since slower servers will be assigned file blocks larger than they should have been, and vice versa for mirror servers faster than what historical data “proclaims.” One issue that must be addressed with such a scheme is exactly how server transmission rate data is obtained and updated.

2.2 Semi-dynamic Parallel Access

The semi-dynamic parallel access downloading method was first proposed by [4], where it is referred to as dynamic parallel access downloading; in this paper it is referred to as semi-dynamic to avoid confusion with our own parallel downloading method examined next. The semi-dynamic scheme is conceptually simple: initially the receiver will obtain the size of the file it wishes to download (e.g., by polling one of the mirror servers) and then the file is segmented by the receiver into equal-size blocks. At the beginning, the receiver will request one block from each of the mirror servers. Once a server has completed sending the requested file block, the client will request a new (undelivered) block from the server. This will continue until all file blocks have been downloaded, at which point the receiver will reassemble the file from the individual downloaded blocks. As in history-based parallel access downloading, the HTTP 1.1 byte range feature is used to download individual blocks of the file from each mirror server.

In order to reduce overhead, persistent TCP connections are used between the client and each server. One enhancement used is that if there are fewer file blocks left than servers, a file block that has already been requested but not yet completely downloaded can be downloaded simultaneously in parallel by another server, leading potentially to a faster download for that block. In this scheme, faster servers provide larger portions (more blocks) of the total transmitted data.

Based on the paraloading scheme proposed in [4], [3] proposed a modified paraloading scheme which is essentially identical to that proposed by [4] but with the following three enhancements:

- Minimizing the delay at startup by piggybacking onto a data block a request for the file size and a request for the list of mirror servers that possess the requested file.

- Minimizing the idle time between block downloads by pipelining the block requests.

- Minimizing the idle time in downloading the last block in one of three ways: a) use small block sizes, b) dynamically adjust the file block size for the last blocks, c) send requests to the idle servers to download the remaining portions of the last block.

2.3 Dynamic Parallel Access

In this section a new type of paraloading scheme developed by the authors of this paper is examined, called dynamic parallel access downloading. This new paraloading method is based to a certain extent on the semi-dynamic paraloading scheme first proposed by [4], as examined above; however, it was designed with large file transfers in mind. In this new scheme, like those before, the client segments the file into fixed-size blocks which are requested and downloaded from the individual mirror servers. More specifically, operation is as follows.

Initially connections are opened to all the mirror servers. In the current implementation the list of available mirror servers is obtained from a file containing the server list; however, for actual practical use in the future there must be a way to dynamically obtain the list of available servers. This could be done in various ways, such as retrieving the server list from some type of directory service, or by extending the DNS system to provide such information [2].

In order to reduce overhead, persistent TCP connections are used between the client and each server so that the TCP connection-setup three-way handshake delay is avoided and also so that several slow-start phases are avoided. This paraloading scheme, as opposed to the history-based and semi-dynamic parallel access paraloading methods examined before, is not based on the HTTP protocol (using the HTTP byte range header to download a block of a file), but uses a proprietary paraloading server and client running on top of the TCP transport layer protocol.

The file size is obtained from the first server with which the client comes into contact, and thus no time is lost by performing separate server probing for the explicit purpose of obtaining the file size value. The file, as mentioned above, is divided into fixed-size blocks and a request is made to each server to download a distinct block.
Once a given server has completed the assigned file block download, a new block is assigned to it for downloading. Currently the block size used is set to 1 Mbyte. The selection of the block size is an important issue, in which three facts should be taken into account:

i) The file block size should be such that the number of blocks is larger than the number of servers. Otherwise the faster servers will exhibit large idle times.

ii) Each block should be small enough so as to provide a fine enough granularity, for the same reason as mentioned in i).

iii) On the other hand, each block should be large enough so as to reduce the number of requests that need to be made to servers, thus reducing the ratio of idle time to download time.

The major point of difference between this paraloading algorithm and semi-dynamic paraloading is that the number of mirror servers used in downloading does not remain static. After a connection is established to each server and a block download request is made to every server, the so-called server downscaling testing commences, provided that 4 or more servers are currently being used. With downscaling testing, the transmission rate of every server is monitored during the parallel download. At given time intervals, the slowest of the servers (based on the recorded transmission rates) is selected to remain idle for a period of time by not being given any new block download requests. After the testing time has elapsed, the aggregate download transmission rate (i.e., the sum of the individual transmission rates of each server) is compared to the aggregate data rate of the time before the selected server was made idle.

If the new aggregate rate is lower than the old aggregate rate by less than a certain threshold percentage (currently a value of 15% is used, but it can be varied if desired), or the new rate happens to be equal or even higher, then the server is deemed to be unnecessary since it offers no substantial increase in bandwidth. It is deleted from the list of active mirror servers for the current download and is used no further in downloading, thus freeing up unnecessarily tied up server and network resources. In this case server downscaling testing will continue by proceeding to the next server (which will be the slowest among the currently active servers, as before). Otherwise, if the drop in the aggregate download rate is above the threshold percentage, then the server is taken out of the idle state and used again in downloading. In addition, server downscaling will cease in this case since it is considered that we have now reached the “ideal” number of servers needed, i.e., the least number of servers required to give the maximum possible download rate. Server downscaling will also terminate at any time if fewer than four servers are currently active in the download process.

After server downscaling has terminated (i.e., no more servers will be deactivated in the particular download), server upscaling testing will commence. Namely, if a significant decrease in the aggregate download rate exists for a sustained period of time, then an additional server will be added for use in paraloading. More specifically, at periodic time intervals (currently every 30 seconds, although this value can be adjusted) testing will start by monitoring the aggregate download rate twice at given intervals (10 seconds after test commencement and 10 seconds after that; again these values can be varied). If at both of those time intervals there is a significant drop in the aggregate download rate (of 15% or more, although like the previous testing parameters this can be varied), then it is concluded that there is a sustained drop in the download rate. To remedy this, a mirror server, if an available one exists, is added for use in downloading to try to increase the rate.

The fact that paraloading starts using all available mirror servers and then downscales, as opposed to, for example, starting from one server and then adding mirror servers until there is no substantial increase in bandwidth, is deliberate. The reason is that the purpose is to decrease download times, and thus bandwidth underutilization (using too few servers) is a much more significant factor than bandwidth overutilization (using too many servers). While bandwidth underutilization results in lost time, bandwidth overutilization simply results in temporary use of unneeded resources that will be released eventually.

One observation that should briefly be made here concerns the duration of the testing intervals used in server downscaling testing as examined above. On the one hand, it is desirable that testing periods be as short as possible so as to complete mirror server deletion as quickly as possible. On the other hand, these testing intervals must be long enough to obtain valid data (transmission rate readings) that can be used to make a valid decision. In other words, when a server is made idle in order to measure the effects on the aggregate download rate, sufficient time must elapse to allow the network to enter a steady state, so to speak, so as to measure the real effects on the aggregate download rate. The value used currently, and believed to satisfy both requirements stated above, is 10 seconds.

3. Experimental Results and Analysis

Experiments were conducted by measuring the download times using various downloading methods. More specifically, the download times using dynamic paraloading were compared to those of single server FTP file downloading and single server multiple parallel connection downloading using the dynamic paraloading client and server.

In all tests conducted, clients from the same domain were used along with a multitude of geographically dispersed servers in Canada and the United States. More specifically, a total of eight servers were used: three at the University of Victoria in British Columbia, two TRLabs servers in Winnipeg and Regina, two servers at Purdue University and an additional server at the University of Illinois.
The tests (dynamic paraloading, single-server multiple connection paraloading and simple single-server FTP downloading) were conducted over a 24-hour period, with results being obtained every hour, in order to measure the performance of each of the three downloading schemes under relatively heavy and varying traffic conditions (during the day) and under lighter and less varying traffic conditions, i.e., during the early and late hours.

In testing dynamic parallel access downloading, all eight remote hosts were used as mirror servers to download a 45 Mbyte file. The same 45 Mbyte file was also used for testing the other two single-server downloading schemes, testing with many different servers. For the single-server multiple connection paraloading scheme, a single mirror server was used each time with eight connections between client and server, i.e., as many as the number of mirror servers used in multi-server dynamic paraloading.

Figure 1 compares the results obtained for multi-server dynamic paraloading with those obtained for single-server FTP downloading (fastest and slowest cases). From the results it can be seen that paraloading significantly outperformed the slowest FTP downloading case, being approximately 10 times faster. The difference is significantly smaller when compared to the fastest FTP download, but paraloading is still much faster, which demonstrates its benefit. The maximum theoretical performance of multi-server dynamic paraloading is reached if the aggregate download rate is equal to the sum of the individual server download rates, which is not the case here. This is due to the fact that, given a sufficiently large number of servers, a saturation point is reached beyond which the aggregate rate can increase no more, no matter how many servers are added, due to a bottleneck at the receiving client and/or along the network path between client and mirror servers. This is also the reason for which server downscaling, as explained before, was added to the dynamic paraloading scheme.

Figure 1. Download times of multi-server dynamic paraloading and single server FTP. [Plot of loading time (sec) against hour of day (2:30 AM - 2:30 PM); series: Multiserver Paraloading, Fastest FTP, Slowest FTP.]

Figure 2 compares single-server multiple connection paraloading results to those obtained for multi-server dynamic paraloading. Here we notice that multi-server paraloading is somewhat slower than the fastest case of single-server multi-connection paraloading. This is not unexpected, as it is only natural that in multi-server paraloading the slower servers will degrade the performance, which will very likely be somewhat slower than when using the same number of connections to the fastest server(s). The advantage of multi-server paraloading even in this case is the better load balancing it achieves by spreading the load across multiple servers rather than just one, since with multiple connections to one server that server will very soon become congested. The case is the opposite, though, for the slowest single-server multi-connection paraloading, which multi-server paraloading significantly outperforms, due mainly to one of the advantages of using multiple servers: higher performance mirror servers compensate for the lower performance servers.

Figure 2. Download times of multi-server dynamic paraloading and single-server multiple connection paraloading. [Plot of loading time (sec) against hour of day (2:30 AM - 2:30 PM); series: Multiserver Paraloading, Fastest Single Server Paraloading, Slowest Single Server Paraloading.]

Figure 3 summarizes the results displayed in the previous two figures, comparing multi-server dynamic paraloading performance to the best FTP and single-server paraloading performances. It is worth mentioning here that the download performances, particularly that of FTP, degrade during the morning and afternoon hours, and the variation of the download times also increases during those hours. This applies mostly to FTP, with the two paraloading schemes being affected to a lesser extent, which suggests that the use of multiple connections also has a “smoothing” effect on download performance, isolating to a certain extent the overall download performance from the performance of any individual server or other traffic/network condition variations.

Figure 3. Download times of multi-server dynamic paraloading and best single-server multi-connection paraloading and FTP performances. [Plot of loading time (sec) against hour of day (2:30 AM - 2:30 PM); series: Multiserver Paraloading, Fastest Single Server Paraloading, Fastest FTP.]

3.1 Comparison to Semi-Dynamic Paraloading

From the experimental results, presented and analyzed in the previous section, it is apparent that the dynamic paraloading scheme performs very well, increasing download performance significantly when compared to slower servers.

One aspect where dynamic paraloading has an advantage over semi-dynamic is the adjustable number of mirror servers that are used at any given time. Instead of using all the mirror servers that are available, dynamic paraloading can reduce the number of active servers, thus releasing server and network resources that are unnecessarily utilized.

Another advantage of dynamic paraloading is the better server load balancing achieved compared to the other downloading schemes. While all downloading schemes achieve server load balancing to a certain extent, by distributing the connections among all mirror servers, dynamic paraloading with its server downscaling feature releases servers that are unnecessarily utilized (i.e., that offer very little to the aggregate download rate), and this would include heavily loaded servers.

4. Conclusions and Future Work

In this paper a new parallel access downloading scheme, referred to as the Dynamic Parallel Access scheme and developed by the authors of this paper, was presented and examined. Based on the experiments performed, dynamic parallel access downloading performed very well in terms of reducing download time, even under varying traffic/network conditions. Additionally, dynamic parallel access possesses some advantages over the other two downloading methods briefly examined, such as being able to adjust the number of mirror servers that are active, thus releasing server and network resources that are unnecessarily utilized, and also improved server load balancing compared to the other downloading schemes. It is strongly believed that performance can be further improved with additional enhancements.

Some possible enhancements to dynamic parallel access paraloading that should be examined as future work are:

- The ability to add additional connections between the client and a given server. More specifically, the ability to add a 2nd, 3rd etc. connection between the client and the fastest servers rather than just being able to add an additional mirror server.

- Use pipelining of block download requests in order to minimize the number of idle periods between block downloads.

- The development of a method to dynamically retrieve the list of mirror servers, such as a directory service for example.

- Determine the effects of paraloading in terms of network congestion.

References

[1] J. Byers, M. Luby, and M. Mitzenmacher, “Accessing multiple mirror sites in parallel: Using tornado codes to speed up downloads,” IEEE INFOCOM, 1999.

[2] J. Kangasharju, K. W. Ross, and J. W. Roberts, “Locating copies of objects using the domain name system,” 4th International Caching Workshop, Mar. 2000.

[3] A. Miu and E. Shih, Performance Analysis of a Dynamic Parallel Downloading Scheme from Mirror Sites Throughout the Internet, Technical Report, Laboratory of Computer Science, MIT, 2000.

[4] P. Rodriguez, A. Kirpal, and E. Biersack, “Parallel-access for mirror sites in the Internet,” IEEE INFOCOM, 2000.