
SOCKET BUFFER AUTOSIZING FOR BIG DATA TRANSFERS

THROUGH PIPELINING, PARALLELISM AND


CONCURRENCY
CONTENTS

Chapter No.   Title

              ABSTRACT
1             INTRODUCTION
              1.7 PROBLEM DEFINITION
2             LITERATURE SURVEY
3             SYSTEM ANALYSIS
              3.1 EXISTING SYSTEMS
              3.2 PROPOSED SYSTEM
4             PROPOSED ALGORITHM
5             SYSTEM REQUIREMENTS
              HARDWARE SPECIFICATION
              SOFTWARE SPECIFICATION
6             MODULE DESCRIPTION
7             SOFTWARE DESCRIPTION
8             RESULT AND DISCUSSION
9             CONCLUSION
              REFERENCES
ABSTRACT:
In continuous data transfers, a number of factors affect the data transfer
throughput, such as the network characteristics (e.g., network bandwidth, round-trip time,
background traffic); end-system characteristics (e.g., NIC capacity, number of CPU cores and
their clock rate, number of disk drives and their I/O rate); and the dataset characteristics (e.g.,
average file size, dataset size, file size distribution). Optimization of big data transfers over inter-
cloud and intra-cloud networks is a challenging task that requires joint consideration of all of
these parameters. This optimization task becomes even more challenging when transferring datasets
comprising heterogeneous file sizes (i.e., large files and small files mixed). Previous work in this
area focuses only on the end-system and network characteristics and does not provide models
regarding the dataset characteristics. In this study, we analyze the effects of the three most
important transfer parameters that are used to enhance data transfer throughput: pipelining,
parallelism and concurrency. We provide models and guidelines to set the best values for these
parameters and present two different transfer optimization algorithms that use the models
developed. Tests conducted over high-speed network and cloud testbeds show that our algorithms
outperform the most popular data transfer tools, such as Globus Online and UDT, in the majority
of cases.

CHAPTER 1

1. INTRODUCTION

As big-data processing and analysis dominates the usage of Cloud systems today,
the need for Cloud-hosted data scheduling and optimization services increases.
According to a recent study by Forrester Research, 77% of the 106 large IT
organizations surveyed operate three or more datacenters and run regular backup and
replication services among these sites. More than half of these organizations have over
a petabyte of data in their primary datacenter and expect their inter-datacenter
throughput requirements to double or triple over the next couple of years. Big Data is
an emerging hot topic, as human beings are generating data in an explosive fashion.
Big Data carries profound information about our society; therefore, it impacts numerous
aspects of human society, such as government, finance, security, and climate. Big
Data is usually so large and complex that it is far beyond the capacity of the existing
database management tools or traditional data processing applications. Currently, most
of the work on Big Data focuses on the business, application and information-processing
levels, such as data mining and analysis. However, Big Data definitely and desperately
needs support from the networking side, especially when real-time or near-real-time
applications are demanded. For example, the storage of Big Data must depend on
distributed systems and mechanisms. Therefore, it involves issues of network
performance, structure, security, privacy and so on.

1.1 APPLICATION LEVEL OPTIMIZATION

Application optimization is the process of modifying a software system to make


some aspect of it work more efficiently or use fewer resources. In general, an
application may be optimized so that it executes more rapidly, is capable of operating
with less memory or other resources, or draws less power. Optimization can
occur at a number of levels. Typically the higher levels have greater impact, and are
harder to change later on in a project, requiring significant changes or a complete rewrite
if they need to be changed. Thus optimization can typically proceed via refinement from
higher to lower, with initial gains being larger and achieved with less work, and later
gains being smaller and requiring more work. However, in some cases overall
performance depends on performance of very low-level portions of a program, and
small changes at a late stage or early consideration of low-level details can have
outsized impact. Typically some consideration is given to efficiency throughout a project
– though this varies significantly – but major optimization is often considered a
refinement to be done late, if ever. On longer-running projects there are typically cycles
of optimization, where improving one area reveals limitations in another, and these are
typically curtailed when performance is acceptable or gains become too small or costly.

In some cases, however, optimization relies on using more elaborate


algorithms, making use of "special cases" and special "tricks" and performing complex
trade-offs. A "fully optimized" program might be more difficult to comprehend and hence
may contain more faults than unoptimized versions. Beyond eliminating obvious
antipatterns, some code level optimizations decrease maintainability. Optimization will
generally focus on improving just one or two aspects of performance: execution time,
memory usage, disk space, bandwidth, power consumption or some other resource. This
will usually require a trade-off — where one factor is optimized at the expense of
others. For example, increasing the size of a cache improves runtime performance, but also
increases the memory consumption. Other common trade-offs include code clarity and
conciseness.

1.2 NETWORKING

Computer networks are no longer relegated to allowing a group of computers to access


a common set of files stored on a computer designated as a file server. Instead, with the
building of high-speed, highly redundant networks, network architects are seeing the
wisdom of placing a variety of traffic types on a single network. Examples include voice
and video, in addition to data. At its essence, a network’s purpose is to make
connections. These connections might be between a PC and a printer or between a
laptop and the Internet, as just a couple of examples. However, the true value of a
network comes from the traffic flowing over those connections. Consider a sampling of
applications that can travel over a network’s connections:

• File sharing between two computers

• Video chatting between computers located in different parts of the world

• Surfing the web (for example, to use social media sites, watch streaming video,
listen to an Internet radio station, or do research for a school term paper)

• Instant messaging (IM) between computers with IM software installed

• E-mail

• Voice over IP (VoIP), to replace traditional telephony systems

A term commonly given to a network transporting multiple types of traffic (for example,
voice, video, and data) is a converged network. A converged network might offer
significant cost savings to organizations that previously supported separate network
infrastructures for voice, data, and video traffic. This convergence can also potentially
reduce staffing costs, because only a single network needs to be maintained, rather than
separate networks for separate traffic types.

1.2.1 NETWORKING GLOSSARY

The following terms must be clearly understood for a complete understanding of
the concepts explained in this work.

Transmission Control Protocol (TCP): A connection-oriented transport protocol.


Connection-oriented transport protocols provide reliable transport, in that if a segment
is dropped, the sender can detect that drop and retransmit that dropped segment.
Specifically, a receiver acknowledges segments that it receives. Based on those
acknowledgments, a sender can determine which segments were successfully received.

User Datagram Protocol (UDP): A connectionless transport protocol. Connectionless


transport protocols provide unreliable transport, in that if a segment is dropped, the
sender is unaware of the drop, and no retransmission occurs.

Unicast : A unicast communication flow is a one-to-one flow.

Multicast: A multicast communication flow is a one-to-many flow.


Broadcast: A broadcast communication flow is one-to-all flow.

Bandwidth: Bandwidth is the raw capability of a communication channel to move data


through that channel. Typically measured in bits or bytes per second or Hertz.

Throughput: Throughput is the rate at which data actually moves through a system
end to end, as opposed to the raw capacity of the underlying channel.

Round Trip Time (RTT): Round trip time is the length of time it takes for a signal to be
sent plus the length of time it takes for an acknowledgment of that signal to be received.

Background traffic: Millions of corporate workers sit down every day at work and send
e-mail, browse the web, transfer files, and so on. They log into domains, send instant
messages, and remotely administer networking equipment. Some have VoIP at work and
make their phone calls over the Internet. All this data and more passes through
various network checkpoints while en route to its destination. One of these checkpoints is
likely to be a Network Intrusion Prevention System (IPS), since a Network IPS sits inline
in corporate networks inspecting traffic for various attacks aimed at remotely
exploitable vulnerabilities. All the data passed back and forth through the Network
IPS that is free of attacks is considered “background traffic.”

Network Interface Card (NIC): A network interface card (NIC) is a circuit board or card
that is installed in a computer so that it can be connected to a network. A network
interface card provides the computer with a dedicated, full-time connection to a
network.

CPU core: A core is usually the basic computation unit of the CPU - it can run a single
program context (or multiple ones if it supports hardware threads such as
hyperthreading on Intel CPUs), maintaining the correct program state, registers, and
correct execution order, and performing the operations through Arithmetic and Logical
Units. For optimization purposes, a core can also hold on-core caches with copies of
frequently used memory chunks.
Cloud Networking: Cloud networking is related to the concept of cloud computing, in
which centralized computing resources are shared among customers or clients. In cloud
networking, the network can be shared as well as the computing resources. It has
spurred a trend of pushing more network management functions into the cloud, so that
fewer customer devices are needed to manage the network.

Delay: The delay of a network specifies how long it takes for a bit of data to travel
across the network from one node or endpoint to another. It is typically measured in
multiples or fractions of seconds. Delay may differ slightly, depending on the location of
the specific pair of communicating nodes.

Network Latency: Network latency in a packet-switched network is measured either


one-way (the time from the source sending a packet to the destination receiving it), or
round-trip delay time (the one-way latency from source to destination plus the one-way
latency from the destination back to the source). Round-trip latency is more often
quoted, because it can be measured from a single point. Note that round trip latency
excludes the amount of time that a destination system spends processing the packet.

Jitter: Jitter is the variation in latency as measured in the variability over time of the
packet latency across a network. A network with constant latency has no variation (or
jitter). Packet jitter is expressed as an average of the deviation from the network mean
latency. However, for this use, the term is imprecise. The standards-based term is
"packet delay variation" (PDV). PDV is an important quality of service factor in
assessment of network performance.

BDP: Bandwidth-delay product refers to the product of a data link's capacity (in bits
per second) and its round-trip delay time (in seconds). The result, an amount of data
measured in bits (or bytes), is equivalent to the maximum amount of data on the
network circuit at any given time, i.e., data that has been transmitted but not yet
acknowledged.
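
For instance, on a 10 Gb/s path with a 40 ms round-trip time, the BDP is 10^10 x 0.04 =
4 x 10^8 bits (about 48 MiB), so the socket buffers must be at least that large for a single
TCP stream to fill the pipe. A minimal sketch of this arithmetic in Python (the link rate
and RTT values are illustrative assumptions, not measurements):

    # bdp.py -- back-of-the-envelope BDP calculation (illustrative values)
    link_rate_bps = 10e9   # assumed bottleneck capacity: 10 Gb/s
    rtt_s = 0.040          # assumed round-trip time: 40 ms

    bdp_bits = link_rate_bps * rtt_s   # data "in flight" when the path is full
    bdp_bytes = bdp_bits / 8
    print(f"BDP = {bdp_bits:.0f} bits = {bdp_bytes / 2**20:.1f} MiB")
    # A single TCP stream needs send/receive buffers of at least this
    # size to keep the path saturated.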
Pipelining: Pipelining refers to sending multiple transfer requests over a single data
channel without waiting for the “transfer complete” acknowledgement in order to
minimize the delay between individual transfers.

Parallelism: Parallelism refers to sending different chunks of the same file through
different data channels at the same time.

Concurrency: Concurrency refers to sending different files through different data


channels at the same time.
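
To make the distinction between the three parameters concrete, the sketch below
illustrates them with a thread pool: concurrency moves different files at once,
parallelism splits one file into byte ranges moved side by side, and pipelining keeps
several requests outstanding on one control channel. The helper fetch_range() is a
hypothetical placeholder, not a real transfer API:

    # transfer_knobs.py -- illustrative sketch of the three parameters.
    # fetch_range() is a hypothetical placeholder, not a real API.
    from concurrent.futures import ThreadPoolExecutor

    CONCURRENCY = 4   # files moved at the same time
    PARALLELISM = 8   # streams (byte ranges) per file
    PIPELINING = 16   # outstanding requests per control channel

    def transfer_file(path, size):
        # Parallelism: split one file into PARALLELISM byte ranges and
        # move each range over its own data channel.
        chunk = size // PARALLELISM
        ranges = [(i * chunk, min(size, (i + 1) * chunk))
                  for i in range(PARALLELISM)]
        with ThreadPoolExecutor(max_workers=PARALLELISM) as streams:
            list(streams.map(lambda r: fetch_range(path, *r), ranges))

    def transfer_dataset(files):
        # Concurrency: move CONCURRENCY different files at once.
        with ThreadPoolExecutor(max_workers=CONCURRENCY) as workers:
            list(workers.map(lambda f: transfer_file(*f), files))

    # Pipelining (per control channel): keep up to PIPELINING transfer
    # requests in flight instead of waiting for each "transfer complete"
    # reply -- crucial when files are small relative to the RTT.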

1.3 BIG DATA NETWORKING

Big data applications do, in fact, deal with large volumes of information that are
made even bigger as data is replicated across racks for resiliency. Yet the most
meaningful attribute of big data is not its size, but its ability to break larger jobs into lots
of smaller ones, distributing resources to work in parallel on a single task.

1.3.1 Network resiliency and big data applications

When you have a set of distributed resources that must coordinate through an
interconnection, availability is crucial. If the network is unavailable, the result is a
discontiguous collection of stranded compute resources and data sets. Accordingly, the
primary focus for most network architects and engineers is uptime. But the sources of
downtime in networks are varied. They include everything from device failures (both
hardware and software) to maintenance windows, to human error. Downtime is
unavoidable. While it is important to build a highly available network, designing for
perfect availability is impossible. Rather than making downtime avoidance the objective,
network architects should design networks that are resilient to failures. Resilience in
networks is determined by path diversity (having more than one way to get between
resources) and failover (being able to identify issues quickly and fail over to other
paths). The real design criteria for big data networks ought to explicitly include these
characteristics alongside more traditional mean time between failures, or MTBF,
methods.

1.3.2 Network partitioning to handle big data

Network partitioning is crucial in setting up big data environments. In its


simplest form, partitioning can mean the separation of big data traffic from residual
network traffic so that bursty demands from applications do not impact other mission-
critical workloads. Beyond that, there is a need to handle multiple tenants running
multiple jobs for performance, compliance and/or auditing reasons. Doing this requires
networks to keep workloads logically separate in some cases and physically separate in
others. Architects need to plan for both, though initial requirements might favor just one.

1.4 FILE TRANSFERS

File transfer plays a very important role in the cloud environment as


the transfer is expected to have a good quality. Moreover, efficient and flexible file
transfer with reliability has an important role in guaranteeing a good quality of service
for users. In recent times, the data produced by large-scale applications has grown
larger and hence needs to be stored in clouds that are reliable and efficient. In such a case,
when data is being transferred, it is necessary to ensure that throughput increases.
When data transfer takes place between systems, it is common practice to check the
system-level requirements. In networks, even though protocols keep being developed,
a protocol can become inefficient because of the end-system characteristics, thereby
resulting in underutilization of the protocol. Hence, we jointly consider parameters
such as NIC capacity, memory, and background traffic, and decide which transfer
technique should be applied.

1.5 DATA CENTER

Data centers are simply centralized locations where computing and networking
equipment is concentrated for the purpose of collecting, storing, processing, distributing
or allowing access to large amounts of data. They have existed in one form or another
since the advent of computers. In the days of the room-sized behemoths that were our
early computers, a data center might have had one supercomputer. As equipment got
smaller and cheaper, and data processing needs began to increase and they have
increased exponentially -- we started networking multiple servers (the industrial
counterparts to our home computers) together to increase processing power. We
connect them to communication networks so that people can access them, or the
information on them, remotely. Large numbers of these clustered servers and related
equipment can be housed in a room, an entire building or groups of buildings. Today's
data center is likely to have thousands of very powerful and very small servers running
24/7. Because of their high concentrations of servers, often stacked in racks that are
placed in rows, data centers are sometimes referred to as server farms. They provide
important services such as data storage, backup and recovery, data management and
networking. These centers can store and serve up Web sites, run e-mail and instant
messaging (IM) services, provide cloud storage and applications, enable e-commerce
transactions, power online gaming communities and do a host of other things that
require the wholesale crunching of zeroes and ones.

The configuration of servers, the network topology and the supporting equipment can
vary greatly depending upon the company, purpose, location, growth rate and initial
design concept of the data center. Its layout can greatly affect the efficiency of data flow
and the environmental conditions within the center. Some sites might divide their servers
into groups by functions, such as separating web servers, application servers and
database servers, and some might have each of its servers performing multiple duties.
There are no hard and fast rules, and there aren't many official standards. Of course,
some groups are trying to create guidelines. The Telecommunication Industry
Association developed a data center tier classification standard in 2005 called the TIA-
942 project, which identified four categories of data center, rated by metrics like
redundancy and level of fault tolerance. These include:
Tier 1 - Basic site infrastructure with a single distribution path that has no built-in
redundancy.

Tier 2 - Redundant site infrastructure with a single distribution path that includes
redundant components.

Tier 3 - Concurrently maintainable site infrastructure that has multiple paths, only one of
which is active at a time.

Tier 4 - Fault tolerant site infrastructure that has multiple active distribution paths
for lots of redundancy.

1.6 SCOPE OF THE WORK:

In this study, we analyze the effects of the three most important transfer parameters that
are used to enhance data transfer throughput: pipelining, parallelism and concurrency. We
provide models and guidelines to set the best values for these parameters and present two
different transfer optimization algorithms that use the models developed. An auto-tuning
technique that is based on active bandwidth estimation is the Work Around Daemon (WAD).
WAD uses ping to measure the minimum RTT prior to the start of a TCP connection, and
pipechar to estimate the capacity of the path. A similar approach is taken by the NLANR
auto-tuning FTP implementation, and similar socket buffer sizing guidelines have been
published elsewhere. The first proposal for automatic TCP buffer tuning aimed to allow a host
(typically a server) to fairly share kernel memory between multiple ongoing connections. The
proposed mechanism, even though simple to implement, requires changes in the operating
system. An important point about that proposal is that it estimated the BDP of a path from the
congestion window (cwnd) of the TCP connection. The receive socket buffer size was set to a
sufficiently large value so that it does not limit the transfer's throughput.
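
At the application layer, the buffer tuning discussed above ultimately reduces to a
setsockopt() call made before the connection is established. A minimal sketch using the
standard Python socket API (the 4 MiB value stands in for an estimated BDP and is an
assumption for illustration):

    # set_buffers.py -- request larger TCP socket buffers from the kernel
    import socket

    BUF_SIZE = 4 * 1024 * 1024  # assumed BDP estimate: 4 MiB

    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Set before connect()/listen() so TCP window scaling is negotiated
    # with a window large enough to use the buffer.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, BUF_SIZE)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, BUF_SIZE)

    # The kernel may clamp the request (e.g., net.core.wmem_max on Linux),
    # so read back what was actually granted.
    print("send buffer granted:",
          sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF))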
1.7 PROBLEM DEFINITION
In huge network infrastructures such as data centers, when a data transfer (which can be a
migration or a normal communication transfer) is made, several factors affect the data
transfer throughput, such as the network characteristics (e.g., network bandwidth, round-trip
time, background traffic); end-system characteristics (e.g., NIC capacity, number of CPU cores
and their clock rate, number of disk drives and their I/O rate); and the dataset characteristics
(e.g., average file size, dataset size, file size distribution). Optimization of big data transfers
over inter-cloud and intra-cloud networks is a challenging task that requires joint consideration
of all of these parameters. This optimization task becomes even more challenging when
transferring datasets comprised of heterogeneous file sizes (i.e., large files and small files
mixed). We analyze the effects of the three most important transfer parameters that are used
to enhance data transfer throughput: pipelining, parallelism and concurrency.
CHAPTER 2

LITERATURE SURVEY:
AUTHOR NAME      TITLE                                                      YEAR

E. Yildirim      GridFTP pipelining                                         2012

R. S. Prasad     Socket buffer auto-sizing for high-performance             2004
                 data transfers

G. Hasegawa      Scalable socket buffer tuning for high-performance         2001
                 web servers

E. César         Dynamic performance tuning supported by program            2002
                 specification

Tevfik           On parameter tuning of data transfer protocol GridFTP      2012
                 for wide-area networks

K. M. Choi       Efficient resource management scheme of TCP buffer         2005
                 tuned parallel stream to optimize system performance

Y. Zhao          Devising a cloud scientific workflow platform for          2014
                 big data

T. J. Hacker     The end-to-end performance effects of parallel TCP         2016
                 sockets on a lossy wide-area network

D. Lu            Modeling and taming parallel TCP on the wide area          2012
                 network

E. Yildirim      Prediction of optimal parallelism level in wide area       2011
                 data transfers
2.1 GRIDFTP PIPELINING

GridFTP is an exceptionally fast transfer protocol for large volumes of data.


Implementations of it are widely deployed and used on well-connected Grid environments
such as those of the TeraGrid because of its ability to scale to network speeds. However,
when the data is partitioned into many small files instead of few large files, it suffers from
lower transfer rates. The latency between the serialized transfer requests of each file directly
detracts from the amount of time data pathways are active, thus lowering achieved
throughput. Further, when a data pathway is inactive, the TCP window closes, and TCP must
go through the slow-start algorithm. The performance penalty can be severe. This situation is
known as the “lots of small files” problem. In this paper we introduce a solution to this
problem. This solution, called pipelining, allows many transfer requests to be sent to the
server before any one completes. Thus, pipelining hides the latency of each transfer request
by sending the requests while a data transfer is in progress. We present an implementation
and performance study of the pipelining solution. We have described a new open source
implementation of the GridFTP protocol. In designing this system, we set out to create a
robust, performant, and modular data transfer framework for use in a variety of
data-intensive tools and applications. The resulting Globus GridFTP system integrates a variety of
techniques, including a modular protocol processing pipeline and parallel I/O, to meet its
design goals in a way that no other system has done before. We have tested our system
thoroughly, as have early adopters. Performance is excellent in all situations studied,
comparing favorably with that of other FTP servers for single-stream transfers and doing far
better when striping is used. Performance with other network protocols, data transforms, and
storage systems remains to be studied. Our system’s modular structure has allowed its use in
many different contexts. We give four examples. The “TeraGrid Copy” (tgcp) program
automatically selects appropriate parallelism and window size parameters to maximize
performance on the TeraGrid network. The GT4 GRAM execution management service uses
our mechanisms for data staging and streaming. The NeST storage appliance and the Earth
System Grid’s OPeNDAP-G system use our libraries for data transport. We have many ideas
for further research and development. As indicated earlier, successful completion of an
end-to-end transfer may involve intermediate staging of data products, negotiation with firewalls,
use of alternative network protocols, and/or reservation of network or storage resources.
Some such functions may appropriately be placed within, or require support from, our
libraries. We also hope to exploit emerging Web services specifications to define more
powerful and standards-based control interfaces, and to implement proposed GridFTP
protocol improvements.

GridFTP is a well-known and robust protocol for fast data transfer on the Grid. Given
resources, the GridFTP implementation provided by the Globus Toolkit can scale to network
speeds and has been shown to deliver 27 Gb/s on 30 Gb/s links. The protocol is optimized to
transfer large volumes of data commonly found in Grid applications. Datasets of sizes from
hundreds of megabytes to terabytes and beyond can be transferred at close to network speeds
by using GridFTP. Given the high-speed networks commonly found in modern Grid
environments, datasets less than 100 MB are too small for the underlying protocols like TCP
to utilize the maximum capacity of the network. Therefore, GridFTP – and most bulk data
transfer protocols – experiences the highest levels of throughput when transferring large
volumes of data. Unfortunately, conventional implementations of GridFTP have a limitation
as to how the data must be partitioned to reach these high-throughput levels. Not only must
the amount of data to transfer be large enough to allow TCP to reach full throttle, but the data
must also be in large files, ideally in one single file. If the dataset is large but partitioned into
many small files (on gigabit networks we consider any file smaller than 100 MB as a small
file), the performance of GridFTP servers suffers drastically. This problem is known as the
“lots of small files” (LOSF) problem. In this paper we study the LOSF problem and present a
solution known as pipelining. We have implemented pipelining in the Globus Toolkit, and we
present here a performance evaluation of that implementation. The rest of this paper is as
follows. After discussing related work in Section 2, we provide details in Section 3 about the
LOSF problem. In Section 4, we describe our pipelining solution, and in Section 5 we discuss
the implementation of the proposed solution. In Section 6, we present experimental results.
We conclude in Section 7 with a brief discussion of future work.
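
The essence of pipelining, as described above, is that the client keeps issuing transfer
requests instead of waiting for each “transfer complete” reply, so each small file no
longer pays a full round trip of idle time. The sketch below shows the idea over a generic
command channel; send_cmd() and recv_reply() are hypothetical placeholders, not the
Globus GridFTP API:

    # pipelining_sketch.py -- hide per-file request latency on one channel.
    # channel.send_cmd()/channel.recv_reply() are hypothetical placeholders.

    WINDOW = 16  # outstanding requests allowed at once

    def pipelined_fetch(channel, files):
        outstanding = 0
        for f in files:
            channel.send_cmd(f"RETR {f}")   # issue next request immediately
            outstanding += 1
            if outstanding >= WINDOW:       # window full: reap one reply
                channel.recv_reply()
                outstanding -= 1
        while outstanding:                  # drain the remaining replies
            channel.recv_reply()
            outstanding -= 1

In the serialized case the client sends one request, waits one RTT for the reply, and only
then sends the next; with the window above, the data channel stays busy between requests.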
2.2 SOCKET BUFFER AUTO-SIZING FOR HIGH-PERFORMANCE
DATA TRANSFERS

It is often claimed that TCP is not a suitable transport protocol for data intensive Grid
applications in high-performance networks. We argue that this is not necessarily the case.
Without changing the TCP protocol, congestion control, or implementation, we show that an
appropriately tuned TCP bulk transfer can saturate the available bandwidth of a network
path. The proposed technique, called SOBAS, is based on automatic socket buffer sizing at
the application layer. In non-congested paths, SOBAS limits the socket buffer size based on
direct measurements of the received throughput and of the corresponding round-trip time.
The key idea is that the send window should be limited, after the transfer has saturated the
available bandwidth in the path, so that the transfer does not cause buffer overflows (“self-
induced losses”). A difference with other socket buffer sizing schemes is that SOBAS does
not require prior knowledge of the path characteristics, and it can be performed while the
transfer is in progress. Experimental results in several high bandwidth-delay product paths
show that SOBAS provides consistently a significant throughput increase (20% to 80%)
compared to TCP transfers that use the maximum possible socket buffer size. We expect that
SOBAS will be mostly useful for applications such as GridFTP in non-congested wide-area
networks. Common socket buffer sizing practices, such as setting the socket buffer size to the
default or maximum value, can lead to poor throughput. We developed SOBAS, an
application layer mechanism that automatically sets the socket buffer size while the transfer
is in progress, without prior knowledge of any path characteristics. SOBAS manages to
saturate the available bandwidth in the network path, without saturating the tight link buffer
in the path. SOBAS can be integrated with bulk transfer applications, such as GridFTP,
providing significantly better performance in non-congested wide-area network paths. We
plan to integrate SOBAS with popular Grid data transfer applications in the future.

The emergence of the Grid computing paradigm raises new interest in the end-to-end
performance of data intensive applications. In particular, the scientific community pushes the
edge of network performance with applications such as distributed simulation, remote
collaboratories, and frequent multi-gigabyte transfers. Typically, such applications run over
well provisioned networks (Internet2, ESnet, GEANT, etc) built with high bandwidth links
(OC-12 or higher) that are lightly loaded for most of the time. Additionally, through the
deployment of Gigabit and 10-Gigabit Ethernet interfaces, congestion also becomes rare at
network edges and end-hosts. With all this bandwidth, it is not surprising that Grid users
expect superb end-to-end performance. However, this is not always the case. A recent
measurement study at Internet2 showed that 90% of the bulk TCP transfers (i.e., more than
10MB) receive less than 5Mbps. It is widely believed that a major reason for the relatively
low end-to-end throughput is TCP. This is either due to TCP itself (e.g., congestion control
algorithms and parameters), or because of local system configuration (e.g., default or
maximum socket buffer size). TCP is blamed that it is slow in capturing the available
bandwidth of high performance networks, mostly because of two reasons: 1. Small socket
buffers at the end-hosts limit the effective window of the transfer, and thus the maximum
throughput. 2. Packet losses cause large window reductions, with a subsequent slow (linear)
window increase rate, reducing the transfer’s average throughput. Other TCP-related issues
that impede performance are multiple packet losses at the end of slow start (commonly
resulting in timeouts), the inability to distinguish between congestive and random packet
losses, the use of small segments, or the initial ssthresh value. Researchers have focused on
these problems, pursuing mostly three approaches: TCP modifications, parallel TCP transfers,
and automatic buffer sizing. Changes in TCP or new congestion control schemes, possibly
with cooperation from routers can lead to significant benefits for both applications and
networks. However, modifying TCP has proven to be quite difficult in the last few years.
Parallel TCP connections can increase the aggregate throughput that an application receives.
This technique raises fairness issues, however, because an aggregate of N connections
decreases its aggregate window by a factor of 1/(2N), rather than 1/2, upon a packet loss. Also,
the aggregate window increase rate is N times faster than that of a single connection. Finally,
techniques that automatically adjust the socket buffer size can be performed at the
application-layer, and so they do not require changes at the TCP implementation or protocol.
In this work, we adopt the automatic socket buffer sizing approach. How is the socket buffer
size related to the throughput of a TCP connection? The send and receive socket buffers
should be sufficiently large so that the transfer can saturate the underlying network path.
Specifically, suppose that the bottleneck link of a path has a transmission capacity of C bps and
the path between the sender and the receiver has a Round-Trip Time (RTT) of T sec. When
there is no competing traffic, the connection will be able to saturate the path if its send
window is C × T, i.e., the well known Bandwidth-Delay Product (BDP) of the path. For the
window to be this large, however, TCP's flow control requires that the smaller of the two
socket buffers (send and receive), S, should be equally large. If S is less than C × T, the
connection will underutilize the path. If S is larger than C × T, the connection will overload
the path. In that case, depending on the amount of buffering in the bottleneck link, the
transfer may cause buffer overflows, window reductions, and throughput drops.
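
The core idea of SOBAS can be paraphrased as a control loop: while the transfer runs,
periodically measure the received throughput R and the round-trip time T, and once the
throughput flattens out (the path is saturated), clamp the receive socket buffer near
R × T so the flow stops overfilling the bottleneck queue. A rough sketch under stated
assumptions (measure_throughput(), measure_rtt() and transfer_in_progress() are
hypothetical placeholders; the 1 s period and 2% flatness threshold are illustrative,
not the paper's exact constants):

    # sobas_sketch.py -- application-layer buffer clamping, loosely after SOBAS.
    # measure_throughput()/measure_rtt()/transfer_in_progress() are
    # hypothetical placeholders for application-level measurements.
    import socket
    import time

    def autosize(sock, period=1.0, flat=0.02):
        prev = 0.0
        while transfer_in_progress(sock):
            time.sleep(period)
            rate = measure_throughput(sock)   # bytes/s over the last period
            rtt = measure_rtt(sock)           # seconds
            if prev and abs(rate - prev) < flat * prev:
                # Throughput has flattened: the path is saturated. Limit the
                # receive buffer to ~BDP to avoid self-induced losses.
                sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF,
                                int(rate * rtt))
                return
            prev = rate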

2.3 SCALABLE SOCKET BUFFER TUNING FOR HIGH-PERFORMANCE WEB SERVERS

Although many research efforts have been devoted to network congestion in the face of an
increase in the Internet traffic, there is little recent discussion on performance improvements for
end hosts. In this paper, we propose a new architecture, called Scalable Socket Buffer Tuning
(SSBT), to provide high-performance and fair service for many TCP connections at Internet end
hosts. SSBT has two major features. One is to reduce the number of memory accesses at the
sender host by using some new system calls, called Simple Memory copy Reduction (SMR)
scheme. The other is Equation-based Automatic TCP Buffer Tuning (E-ATBT), where the
sender host estimates ‘expected’ throughput of the TCP connections through a simple
mathematical equation, and assigns a send socket buffer to them according to the estimated
throughput. If the socket buffer is short, max-min fairness policy is used. We confirm the
effectiveness of our proposed algorithm through both simulation technique and an experimental
system. From the experimental results, we have found that our SSBT can achieve up to a 30%
gain for Web server throughput, and a fair and effective usage of the sender socket buffer can be
achieved. In this paper, we have proposed SSBT (Scalable Socket Buffer Tuning), a novel
architecture for effectively and fairly utilizing the send socket buffer of the busy Internet server.
SSBT consists of the two algorithms, the E-ATBT and SMR schemes. The E-ATBT algorithm
assigns the send socket buffer to the connections according to the estimated throughput of
connections for fair and effective usage of the socket buffer. The SMR scheme can reduce the
number of memory-copy operations when the data packet is sent by TCP, and improve the
overall performance of the server host. We have confirmed the effectiveness of the SSBT
algorithm through the implementation experiments, and have shown that SSBT can improve the
overall performance of the server, and provide the fair assignment of the send socket buffer to
the heterogeneous TCP connections.
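
The allocation step of E-ATBT reduces to a classic max-min sharing problem: each
connection asks for a buffer proportional to its estimated throughput, small demands are
satisfied in full, and when total memory runs short the remaining heavy demands split
what is left equally. A compact sketch (the demand values in the example are made up):

    # eatbt_sketch.py -- max-min fair split of a send-buffer pool.
    # demands: per-connection buffer wanted (e.g., estimated throughput x RTT).
    def max_min_share(demands, total):
        alloc = {c: 0.0 for c in demands}
        remaining = dict(demands)
        budget = total
        while remaining and budget > 0:
            share = budget / len(remaining)           # equal split of what's left
            sat = {c: d for c, d in remaining.items() if d <= share}
            if not sat:                               # nobody fits: cap everyone
                for c in remaining:
                    alloc[c] += share
                return alloc
            for c, d in sat.items():                  # satisfy small demands fully
                alloc[c] += d
                budget -= d
                del remaining[c]
        return alloc

    print(max_min_share({"a": 10, "b": 40, "c": 100}, 100))
    # -> a gets 10, b gets 40, c gets the remaining 50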

Short TCP connections are becoming widespread. While large content transfers (e.g., high-
resolution videos) consume the most bandwidth, short “transactions” dominate the number of
TCP flows. In a large cellular network, for example, over 90% of TCP flows are smaller than 32
KB and more than half are less than 4 KB. Scaling the processing speed of these short
connections is important not only for popular user-facing online services that process small
messages. It is also critical for backend systems (e.g., memcached clusters) and middleboxes
(e.g., SSL proxies and redundancy elimination) that must process TCP connections at high
speed. Despite recent advances in software packet processing, supporting high TCP transaction
rates remains very challenging. For example, Linux TCP transaction rates peak at about 0.3
million transactions per second (shown in Section 5), whereas packet I/O can scale up to tens of
millions packets per second. Prior studies attribute the inefficiency to either the high system call
overhead of the operating system or inefficient implementations that cause resource contention
on multicore systems . The former approach drastically changes the I/O abstraction (e.g., socket
API) to amortize the cost of system calls. The practical limitation of such an approach, however,
is that it requires significant modifications within the kernel and forces existing applications to
be re-written. The latter one typically makes incremental changes in existing implementations
and, thus, falls short in fully addressing the inefficiencies. In this paper, we explore an alternative
approach that delivers high performance without requiring drastic changes to the existing code
base. In particular, we take a cleanslate approach to assess the performance of an untethered
design that divorces the limitation of the kernel implementation. To this end, we build a user-
level TCP stack from the ground up by leveraging high-performance packet I/O libraries that
allow applications to directly access the packets. Our user-level stack, mTCP, is designed for
three explicit goals: 1. Multicore scalability of the TCP stack. 2. Ease of use (i.e., application
portability to mTCP). 3. Ease of deployment (i.e., no kernel modifications). Implementing TCP
in the user level provides many opportunities. In particular, it can eliminate the expensive system
call overhead by translating syscalls into inter-process communication (IPC). However, it also
introduces fundamental challenges that must be addressed— processing IPC messages, including
shared memory messages, involve context-switches that are typically much more expensive than
the system calls themselves. Our key approach is to amortize the context-switch overhead over a
batch of packet-level and socket-level events. While packet-level batching and system-call
batching (including socket-level events) have been explored individually, integrating the two
requires a careful design of the networking stack that translates packet-level events to socket-
level events and vice-versa. This paper makes two key contributions: First, we demonstrate that
significant performance gain can be obtained by integrating packet- and socketlevel batching. In
addition, we incorporate all known optimizations, such as per-core listen sockets and load
balancing of concurrent flows on multicore CPUs with receive-side scaling (RSS). The resulting
TCP stack outperforms Linux and MegaPipe by up to 25x (w/o SO_REUSEPORT) and 3x,
respectively, in handling TCP transactions. This directly translates to application performance;
mTCP increases existing applications’ performance by 33% (SSLShader) to 320% (lighttpd).
Second, unlike other designs , we show that such integration can be done purely at the user level
in a way that ensures ease of porting without requiring significant modifications to the kernel.
mTCP provides BSD-like socket and epoll-like event-driven interfaces. Migrating existing
event-driven applications is easy since one simply needs to replace the socket calls to their
counterparts in mTCP (e.g., accept() becomes mtcp_accept()) and use the per-core listen socket.

2.4 DYNAMIC PERFORMANCE TUNING SUPPORTED BY PROGRAM SPECIFICATION

Performance analysis and tuning of parallel/distributed applications are very difficult


tasks for non-expert programmers. It is necessary to provide tools that automatically carry out
these tasks. These can be static tools that carry out the analysis on a post-mortem phase or can
tune the application on the fly. Both kinds of tools have their target applications. Static automatic
analysis tools are suitable for stable application while dynamic tuning tools are more appropriate
to applications with dynamic behavior. In this paper, we describe KappaPi as an example of a
static automatic performance analysis tool, and also a general environment based on parallel
patterns for developing and dynamically tuning parallel/distributed applications. We have
presented two kinds of tools for automatic performance analysis. KappaPi is a knowledge-based
static automatic performance analysis tool that analyses trace files looking for bottlenecks and
provides certain hints to users. Users can take advantage of these hints in order to modify the
application to improve performance. The second approach to parallel/distributed performance
analysis and tuning includes a pattern-based application design tool and a dynamic performance
tuning tool. The sets of patterns included in the pattern-based application design tool have been
selected to cover a wide range of applications. They offer well-defined behaviour, and the
bottlenecks that can occur are also very well determined. In this sense, both the analysis of the
application and performance tuning on the fly can be carried out successfully. Using this
environment, programmers can design their applications in a fairly simple way, and then have
no need to concern themselves with any performance analysis or tuning, as dynamic
performance tuning automatically takes care of these tasks.

The main goal of parallel and distributed computing is to obtain the highest performance
in a due environment. Designers of parallel applications are responsible for providing the best
possible behaviour on the target system. To reach this goal it is necessary to carry out a tuning
process of the application through a performance analysis and the modification of critical
application/system parameters. This tuning process implies the monitoring of application
execution in order to collect the relevant related information, then the analysis of this
information to find the performance bottlenecks and determination of the actions to be taken to
eliminate these bottlenecks. The classical way of carrying out this process has been to use a
monitoring tool that collects the information generated during the execution and use a
visualisation tool to present users with the information in a more comprehensive way that tries to
help in the performance analysis. This tool helps users in the collection of information and the
presentation, but obliges them to carry out the performance analysis on their own. Therefore, this
process requires a high degree of expertise in detecting the performance bottlenecks and, moreover,
in relating them to the source code of the application or to the system components. To complete
the tuning cycle, it is necessary to modify the application code or the system parameters in order
to improve application performance. Consequently, the participation of users in the whole
process is very significant. Many tools have been designed and developed to support this
approach. However, the requirements asked of users with respect to the degree of expertise and
the time consumed in this process, have not facilitated widespread use of such tools in real
applications. To overcome these difficulties, it is very important to offer users a new generation
of tools that guide them in the tuning process, avoiding the degree of expertise required by the
visualisation tools. This new generation of tools must introduce certain automatic features that
help users and guide them in the tuning process or even carry out certain steps automatically in
such a way that user participation can be reduced or even avoided. In this sense, two approaches
can be distinguished: the static and the dynamic.

2.5 ON PARAMETER TUNING OF DATA TRANSFER PROTOCOL


GRIDFTP FOR WIDE-AREA NETWORKS

In wide-area Grid computing, geographically distributed computational resources are


connected for enabling efficient and large-scale scientific/engineering computations. In the wide-
area Grid computing, a data transfer protocol called GridFTP has been commonly used for large
file transfers. GridFTP has the following features for solving problems of the existing TCP. First,
for accelerating the start-up in TCP’s slow start phase and achieving high throughput in TCP’s
congestion avoidance phase, multiple TCP connections can be established in parallel. Second,
according to the bandwidth-delay product of a network, the TCP socket buffer size can be
negotiated between GridFTP server and client. However, in the literature, sufficient investigation
has not been performed either on the optimal number of TCP connections or the optimal TCP
socket buffer size. In this paper, we therefore quantitatively investigate the optimal parameter
configuration of GridFTP in terms of the number of TCP connections and the TCP socket buffer
size. We first derive performance metrics of GridFTP in steady state (i.e., goodput and packet
loss probability). We then derive the optimal parameter configuration for GridFTP and
quantitatively show performance limitations of GridFTP through several numerical examples.
We also demonstrate validity of our approximate analysis by comparing simulation results with
analytic ones. The techniques described in this paper and implemented in both GridFTP and the
DPSS will be needed to realize the potential of next generation high bandwidth networks.
However, use of these techniques still requires extra effort and knowledge usually not available
to the application programmer. We feel that the example implementations here show not only
how to use these techniques, but also how these techniques can be accessed in a fashion that is
not much different from that of standard local file access, while at the same time taking full
advantage of a high speed wide area network. The basic functionality of GridFTP is currently in
place. The code is in late alpha testing and should be going to beta soon. When released it will be
available under the Globus public license at http://www.globus.org. As a result of our
experiences at SC 2000 we have already made two small but important improvements to our
current implementation. We have added 64 bit file support for larger than 2 GB files, and we
have added data channel caching. The data channel caching will be particularly useful since it
will avoid the overhead of setup and tear down of the sockets, which can be significant,
particularly when authentication is enabled on the data channels. We are also going to explore
the possibility of implementing our striped server on top of a parallel virtual file system.

Large-scale network-based applications spanning multiple sites over wide area networks heavily
depend on the underlying data transfer protocol for their data handling, and their end-to-end
performance may suffer significantly if the underlying protocol does not use the available
bandwidth effectively. Most of the widely used transfer protocols are based on TCP, which may
sacrifice performance for the sake of fairness. There has been considerable research on
enhancing TCP as well as tuning its parameters for improved performance. In the application
layer, opening parallel streams and tuning the buffer size could improve the bottlenecks of TCP
performance. Parallel streams achieve high throughput by mimicking the behavior of individual
streams and get an unfair share of the available bandwidth. On the other hand, using too many
simultaneous connections pushes the network to a congestion point, and beyond that threshold
the achievable throughput starts to drop. Unfortunately, it is difficult to predict the point of
congestion, which varies with parameters that are unique in both time and domain.
There are a few studies that try to find the optimal number of streams and they are mostly based
on approximate theoretical models. They all have specific constraints and assumptions and
cannot predict the complex behavior of throughput in existence of congestion. Also the
correctness of the proposed models is mostly proved with simulation results only. Hacker et al.
claim that the total set of streams behaves like one giant stream whose transfer capacity is the
total of each stream's achievable throughput. However, this model only works for uncongested
networks. Thus, it cannot provide a feasible solution for congested networks. Another study
declares the same theory but develops a protocol which at the same time provides fairness. Dinda
et al. model the bandwidth of multiple streams as a partial second order equation and require two
different throughput measurement of different stream numbers to predict the others.
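
The two-measurement approach attributed to Dinda et al. above can be sketched as
follows, assuming (this is our paraphrase, not the paper's exact notation) an
aggregate-throughput curve of the form Th(n) = n / sqrt(a*n^2 + b). Since
Th(n)^2 = n^2 / (a*n^2 + b), two throughput samples at different stream counts give two
linear equations in a and b, and the fitted curve predicts where adding streams stops
paying off (the sample numbers below are invented for illustration):

    # stream_model.py -- fit Th(n) = n / sqrt(a*n**2 + b) from two samples.
    # The model form is an assumed paraphrase of a parallel-stream throughput
    # curve; the measurements below are made-up illustrative values.

    def fit(n1, th1, n2, th2):
        # Th(n)^2 = n^2 / (a*n^2 + b)  =>  a*n^2 + b = (n / Th(n))^2,
        # which is linear in (a, b): solve from two samples.
        y1, y2 = (n1 / th1) ** 2, (n2 / th2) ** 2
        a = (y2 - y1) / (n2 ** 2 - n1 ** 2)
        b = y1 - a * n1 ** 2
        return a, b

    def predict(n, a, b):
        return n / (a * n ** 2 + b) ** 0.5

    a, b = fit(1, 90.0, 4, 280.0)     # Mbps at 1 and 4 streams (illustrative)
    ceiling = 1 / a ** 0.5            # asymptotic throughput as n grows
    for n in range(1, 64):
        if predict(n, a, b) >= 0.95 * ceiling:
            print(f"~{n} streams reach 95% of the {ceiling:.0f} Mbps ceiling")
            break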

2.6 Efficient Resource Management Scheme of TCP Buffer Tuned Parallel


Stream to Optimize System Performance
GridFTP is a high-performance, secure and reliable parallel data transfer protocol, used for
transferring widely distributed data. Currently it allows users to configure the number of
parallel streams and socket buffer size. However, the tuning procedure for its optimal
combinations is a time consuming task. The socket handlers and buffers are important system
resources and must therefore be carefully managed. In this paper, an efficient resource
management scheme which predicts optimal combinations based on a simple regression
equation is proposed. In addition, the equation is verified by comparing measured and
predicted values and we apply the equation to an actual experiment on the KOREN. The result
demonstrates that the equation predicts well, within an 8% error bound. This
approach eliminates the time-consuming tuning procedure. These results can be utilized directly
and widely for fast decisions in typical applications such as GridFTP.

Internet backbone speeds have increased considerably in the last few years due to projects
like Internet II and NGI. At the same time, projects like NTON and Superset are providing a
preview of the near future of wide area networks. Unfortunately, distributed applications often
do not take full advantage of these new high-speed networks. This is largely due to the fact that
the applications use the default parameters for TCP, which have been consciously designed to
sacrifice optimal throughput in exchange for fair sharing of bandwidth on congested networks. In
order to overcome this limitation, distributed applications running over high-speed wide-area
networks need to become “network-aware”, which means that they need to adjust their
networking parameters and resource demands to the current network conditions. There exists a
large body of work showing that good performance can be achieved using the proper tuning
techniques. The most important technique is the use of the optimal TCP buffer size, and
techniques for determining the optimal value for the TCP buffer size are described in. Another
important technique is to use parallel sockets, as described in. Using a combination of these
techniques, applications should be able to utilize all the available network bandwidth, which is
demonstrated in and. However, determining the correct tuning parameters can be quite difficult,
especially for users or developers who are not network experts. The optimal TCP buffer size and
number of parallel streams are different for every network path, vary over time, and vary
depending on the configuration of the end hosts. There are several tools that help determine these
values, such as iperf, pchar, pipechar, netspec, and nettest, but none of these includes a client
API, and all require some level of network expertise to use. Another tool is NWS, which
applications can use to determine upper bounds on throughput from the network, but it does not
tell the applications how to achieve that throughput. Other groups are addressing this problem at
the kernel level, such as the web100 project, Linux 2.4, and others, as described below. Still
others are addressing this within the application. The autoftp file transfer service from NCSA
attempts to determine and set the optimal TCP buffer size for each connection.

2.7 DEVISING A CLOUD SCIENTIFIC WORKFLOW PLATFORM FOR


BIG DATA

Scientific workflow management systems (SWFMSs) are facing unprecedented


challenges from the big data deluge. As revising all the existing workflow applications to fit
the Cloud computing paradigm is impractical, migrating SWFMSs into the Cloud to
leverage the functionalities of both Cloud computing and SWFMSs may provide a viable
approach to big data processing. In this paper, we first discuss the challenges for scientific
workflow applications and the available solutions in detail, and analyze the essential
requirements for a scientific computing Cloud platform. Then we propose a service
framework to normalize the integration of SWFMS with Cloud computing. Meanwhile, we
also present our implementation experience based on the service framework. Finally, we set
up a series of experiments to demonstrate the capability of our implementation and use a
Montage Image Mosaic Workflow as a showcase of the implementation. Clouds have become a
powerful platform for e-research as they enable scientists to have access to elastic, cost-
effective, and virtually infinite computing power. Because clouds provide their users the
view of infinite computing capacity, the real limitations on the scalability of the applications
lie in the available budget for cloud usage and limitations in the applications themselves.
Therefore, it is important that scientific application developers enable their applications to
get the most from the cloud. In this chapter, we discussed recent trends for execution of
workflows in clouds. The architecture we presented is composed of a platform layer and an
application layer. The platform layer enables operations such as dynamic resource
provisioning, autonomic scheduling of applications, fault tolerance, security, and privacy in
data access. The features enabled by this layer can be explored by virtually any application
that can be described as scientific workflow. In the application layer, we discussed a data
analytics application enabling simulation of the public transport system of Singapore and the
effect of abnormal events in the transport network. The application consists of an agent-based
simulation of the public transport system of Singapore, and it allows evaluation of effects of
incidents (such as train delays) in the flow of passengers in the country. Workflows are a
commonly used application model in computational science. They describe a series of
computations that enable the analysis of data in a structured and distributed manner and are
commonly expressed as a set of tasks and a set of dependencies between them. These
applications offer an efficient way of processing and extracting knowledge from the ever-
growing data produced by increasingly powerful tools such as telescopes, particle
accelerators, and gravitational wave detectors and have been successfully used to make
significant scientific advances in various fields such as biology, physics, medicine, and
astronomy . Scientific workflows are often data- and resource-intensive applications and
require a distributed platform in order for meaningful results to be obtained in a reasonable
amount of time. Their deployment is managed by Workflow Management Systems (WMS)
which are responsible for transparently orchestrating the execution of the workflow tasks in a
set of distributed compute resources while ensuring the dependencies are preserved. In
general, WMSs provide essential functionality to enable the execution of workflows, such as
data management and provenance, task scheduling, resource
provisioning, and fault tolerance among others. The latest distributed computing paradigm,
cloud computing, offers several advantages for the deployment of these applications. In
particular, Infrastructure as a Service (IaaS) clouds offer WMSs an easily accessible, flexible,
and scalable infrastructure by leasing virtualised compute resources, or Virtual Machines
(VMs). This allows workflows to be easily packaged and deployed and more importantly,
enables WMSs to access a virtually infinite pool of heterogeneous VMs that can be
elastically acquired and released and are charged on a pay-per-use basis. In this way, WMSs
can use cloud resources opportunistically based on the number and type of tasks that need to
be processed at a given point in time. This is a convenient feature as it is common for the task
parallelism of scientific workflows to significantly change throughout their execution. The
resource pool can be scaled out and in to adjust the number of resources as the execution of
the workflow progresses. This facilitates the fulfilment of quality-of-service (QoS)
requirements by allowing the WMS to fine-tune performance while ensuring the available
resources are efficiently used.

2.8 THE END-TO-END PERFORMANCE EFFECTS OF PARALLEL TCP


SOCKETS ON A LOSSY WIDE-AREA NETWORK

This paper examines the effects of using parallel TCP flows to improve end-to-end network
performance for distributed data intensive applications. A series of transmission experiments
were conducted over a wide-area network to assess how parallel flows improve throughput,
and to understand the number of flows necessary to improve throughput while avoiding
congestion. An empirical throughput expression for parallel flows based on experimental
data is presented, and guidelines for the use of parallel flows are discussed. In this paper, we
demonstrated that our combined parallel TCP approach can effectively consume available
network bandwidth on an uncongested network. We also showed that our approach is fairer
to competing TCP streams than the unmodified parallel TCP method when the network is
congested, and that the effectiveness and fairness tradeoff can be adjusted by changing the
virtual RTT multiplier. We showed that our method exploits a feature of the TCP congestion
avoidance algorithm in which short RTT streams dominate long RTT streams. The
fundamental characteristics of network technology have changed since the congestion
avoidance algorithm was designed in 1988. The goals of fairly sharing bandwidth and
efficiently using network resources have not changed. We believe that new approaches to
congestion avoidance must consider fairness as well as effectiveness to preserve shared
public internetworks.

If used too aggressively, however, the network might be overburdened to the detriment of the transfer and other users.
Furthermore, the optimal level of usage for each technique varies depending on network and
end-system conditions, meaning no combination of parameters is optimal for every scenario.
Dynamic optimization techniques provide a method for determining which combination of
parameters is “just right” for a given transfer. This paper proposes optimization techniques
that try to maximize transfer throughput by choosing optimal parallelism, concurrency, and
pipelining levels through file set analysis and clustering. Our algorithms also re-provision
idle control channels dynamically to improve the performance of “slower” file clusters,
ensuring that resources are effectively utilized. In this paper, we present four application-
level algorithms for heuristically tuning protocol parameters for data transfers in wide-area
networks. Our algorithms can tune the number of parallel data streams per file (for large file
optimization), the level of control and data channel pipelining (for small file optimization),
and the number of concurrent file transfers to fill network pipes (a technique useful for all
types of files) in an efficient manner. The developed algorithms are implemented as a
standalone service as well as being used in interaction with external data scheduling tools
such as Stork. The experimental results are very promising, and our algorithms outperform
other existing solutions in this area.

Globus Online offers fire-and-forget GridFTP file transfers as a service. The developers
mention that they set the pipelining, parallelism, and concurrency parameters to fixed values
for three different file sizes (i.e. less than 50MB, larger than 250MB, and in between).
However, the tuning Globus Online performs is non-adaptive; it does not change depending
on network conditions and transfer performance. Other approaches aim to improve
throughput by opening flows over multiple paths between end-systems; however, there are
cases where individual data flows fail to achieve optimal throughput because of end-system
bottlenecks. Several others propose solutions that improve utilization of a single path by
means of parallel streams, pipelining, and concurrent transfers. Although using parallelism,
pipelining, and concurrency may improve throughput in certain cases, an optimization
algorithm should also consider system configuration, since end-systems may present factors
(e.g., low disk I/O speeds or over-tasked CPUs) which can introduce bottlenecks. In our
previous work, we proposed network-aware transfer optimization by automatically detecting
bottlenecks and improving throughput by utilizing network and end-system parallelism. We
developed three highly-accurate models which would require as few as three sampling points
to provide accurate predictions for the optimal parallel stream number. These models have
proved to be more accurate than existing similar models, which fall short in predicting the
parallel stream number that gives the peak throughput. We have developed algorithms to determine
the best sampling size and the best sampling points for data transfers by using bandwidth,
Round-Trip Time (RTT), or Bandwidth-Delay Product (BDP).

2.9 MODELING AND TAMING PARALLEL TCP ON THE WIDE AREA


NETWORK

Parallel TCP flows are broadly used in the high performance distributed computing
community to enhance network throughput, particularly for large data transfers. Previous
research has studied the mechanism by which parallel TCP improves aggregate throughput, but
there doesn’t exist any practical mechanism to predict its throughput and its impact on the
background traffic. In this work, we address how to predict parallel TCP throughput as a
function of the number of flows, as well as how to predict the corresponding impact on cross
traffic. To the best of our knowledge, we are the first to answer the following question on behalf
of a user: what number of parallel flows will give the highest throughput with less than a given
impact on cross traffic? We term this the maximum nondisruptive throughput. We begin by studying the
behavior of parallel TCP in simulation to help derive a model for predicting parallel TCP
throughput and its impact on cross traffic. Combining this model with some previous findings we
derive a simple, yet effective, online advisor. We evaluate our advisor through extensive
simulations and wide-area experimentation. We have shown how to predict both parallel TCP
throughput and its impact on cross traffic as a function of the degree of parallelism using only
two probes at different parallelism levels. Both predictions are monotonically changing with
parallelism levels. Hence, the TameParallelTCP() function can be implemented using a simple
binary search. To the best of our knowledge, our work is the first to provide a practical parallel
TCP throughput prediction tool and to estimate the impact on the cross traffic. Although the
Internet paths show statistical stability, the transient stability won’t hold over the long term.
Either periodic resampling as in NWS or the dynamic sampling rate adjustment algorithm from
our other work can be applied for long-term monitoring.
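As a concrete illustration of the binary-search idea, the sketch below (in C#, with a placeholder impact model standing in for the paper's two-probe prediction) finds the largest number of flows whose predicted impact on cross traffic stays below a tolerance; all names here are our own, not from the original tool.

using System;

class TameParallelTcpSketch
{
    // Placeholder model: assumed monotonically increasing impact per flow count.
    static double PredictImpact(int flows) => 0.01 * flows * flows;

    // Binary search works because the predicted impact is monotonic in the
    // number of flows, as noted above.
    static int MaxNondisruptiveFlows(int maxFlows, double tolerance)
    {
        int lo = 1, hi = maxFlows, best = 1;
        while (lo <= hi)
        {
            int mid = (lo + hi) / 2;
            if (PredictImpact(mid) <= tolerance) { best = mid; lo = mid + 1; }
            else { hi = mid - 1; }
        }
        return best;
    }

    static void Main()
    {
        Console.WriteLine(MaxNondisruptiveFlows(64, 0.1)); // prints 3 for this toy model
    }
}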

Data intensive computing applications require efficient management and transfer of
terabytes of data over wide area networks. For example, the Large Hadron Collider (LHC) at the
European physics center CERN is predicted to generate several petabytes of raw and derived
data per year for approximately 15 years starting from 2005 [5]. Data grids aim to provide the
essential infrastructure and services for these applications, and a reliable, high-speed data
transfer service is a fundamental and critical component.

The available bandwidth of a path is defined as “the maximum rate that the path can
provide to a flow, without reducing the rate of the rest of the traffic.” Available bandwidth has
been a central topic of research in packet networks over the years. To measure it accurately,
quickly, and non-intrusively, researchers have developed a variety of algorithms and systems.
Tools that measure either the bottleneck link capacity or the available bandwidth include IGI,
Remos, Nettimer and Pathload, among others. Most of these tools use packet pair or packet train
techniques to conduct the measurements and typically take a long time to converge. Previous
research has shown that, in most cases, the throughput that TCP achieves is considerably lower
than the available bandwidth. Parallel TCP is one response to this observation. Sivakumar et al.
present PSockets, a library that stripes data over several sockets, and evaluate its performance
through wide-area experimentation. The authors concluded that this approach can enhance TCP
throughput and, in certain situations, be more effective than tuning the TCP window size.
Allcock et al. evaluate the performance of parallel GridFTP data transfers on the wide-area, and
applied GridFTP to the data management and transfer service in Grid environments.
Considerable effort has been spent on understanding the aggregate behavior of parallel TCP
flows on wide area networks. Shenker et al. were the first to point out that a small number of TCP
connections with the same RTT and bottleneck can get their congestion window synchronized.
Qiu et al. studied the aggregate TCP throughput, goodput and loss probability on a bottleneck
link via extensive ns2-based simulations. They found that a large number of TCP flows with the
same round trip time (RTT) can also become synchronized on the bottleneck link when the
average size of each TCP congestion window is larger than three packets. A detailed explanation
for this synchronization was given in subsequent work. Due to global synchronization, all the flows share the
resource fairly: in the steady state they experience the same loss rate, RTT and thus the same
bandwidth. The work most relevant to ours is that of Hacker et al. The authors observe that
parallel TCP increases aggregate throughput by recovering faster from a loss event when the
network is not congested. The authors go on to propose a theoretical model for the upper bound
of parallel TCP throughput for an uncongested path. The model produces a tight upper bound
only if the network is not congested before and after adding the parallel TCP flows; the
aggregated throughput then increases linearly with the number of parallel TCP flows. Clearly
this reduces the utility of the model as networks are often congested.
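For reference, and under the standard assumption that all n flows see the same maximum segment size (MSS), round-trip time (RTT) and loss rate p, this upper bound is commonly written as an n-fold extension of the Mathis steady-state TCP model:

    Th(n) <= (MSS x C / RTT) x (n / sqrt(p)),

where C is a constant; this is why, on an uncongested path, aggregate throughput grows roughly linearly with n until losses begin to rise.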

2.10 PREDICTION OF OPTIMAL PARALLELISM LEVEL IN WIDE


AREA DATA TRANSFERS

Wide area data transfers may be a major bottleneck for the end-to-end performance of
distributed applications. A practical way of increasing the wide area throughput at the
application layer is using multiple parallel streams. Although increased number of parallel
streams may yield much better performance than using a single stream, overwhelming the
network by opening too many streams may have an adverse effect. The congestion created by an
excess number of streams may cause a drop in the throughput achieved. Hence, it is
important to decide on the optimal number of streams without congesting the network.
Predicting this ’optimum’ number is not straightforward, since it depends on many
parameters specific to each individual transfer. Generic models that try to predict this number
either rely too much on historical information or fail to achieve accurate predictions. In this
paper, we present a set of new models which aim to approximate the optimal number with
the least history information and the lowest prediction overhead. An algorithm is introduced to
select the best combination of historical information for the prediction, as well as to optimize
the prediction by reducing the error rate. We measure the feasibility and accuracy of the
proposed prediction models by comparing them to actual GridFTP data transfers and, using
little historical information, have seen that we could predict the throughput of parallel
streams accurately and find a very close approximation of the optimal stream number. The
parallel transfer behavior of a TCP-based protocol, GridFTP, over wide area networks is
analyzed and several prediction models are presented and improved according to the
characteristics of the transfers. It has been observed that the aggregate throughput starts to
fall in the presence of congestion, and none of the models could mimic this behavior. By
using minimum information on historical results, we have improved the current models to
predict the throughput curve of GridFTP and we have observed promising results.
Theoretically we prove that our improved models can minimize the average error compared
with the existing models. Furthermore, we propose assumptions about the bounds of the
coefficients for our improved model and prove them mathematically. After the mathematical
analysis and proof of the correctness of our assumptions, we design an algorithm to
find out the coefficients of the throughput function. Also we present an intelligent strategy to
choose appropriate parallelism levels and decrease data set size. A detailed experimental
analysis is done to support our ideas using multiple sampling, enumerating and averaging.
Based on the results of our experiments we conclude that we are able to predict the
throughput behavior of parallel transfers with Full Second Order and Newton’s Iteration with
very good accuracy and with a limited data size of historical transfers.
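Under our reading of the Full Second Order model referenced above, the throughput of n streams takes the form Th(n) = n / sqrt(a*n^2 + b*n + c), so three sampled (streams, throughput) pairs determine (a, b, c) through the linear relations n^2 / Th(n)^2 = a*n^2 + b*n + c, and setting dTh/dn = 0 gives the optimal stream count n* = -2c/b. The C# sketch below is an assumption-laden illustration of this fit; the sample values are made up.

using System;

class FullSecondOrderSketch
{
    static void Main()
    {
        var n  = new double[] { 1, 4, 16 };     // sampled stream counts
        var th = new double[] { 10, 28, 30 };   // measured throughputs (Mbps)

        // Build the 3x3 system A * [a b c]^T = y, where y_i = n_i^2 / th_i^2.
        var A = new double[3, 3];
        var y = new double[3];
        for (int i = 0; i < 3; i++)
        {
            A[i, 0] = n[i] * n[i]; A[i, 1] = n[i]; A[i, 2] = 1;
            y[i] = (n[i] * n[i]) / (th[i] * th[i]);
        }

        // Solve by Cramer's rule (fine for a fixed 3x3 system).
        double Det(double[,] m) =>
            m[0,0] * (m[1,1]*m[2,2] - m[1,2]*m[2,1])
          - m[0,1] * (m[1,0]*m[2,2] - m[1,2]*m[2,0])
          + m[0,2] * (m[1,0]*m[2,1] - m[1,1]*m[2,0]);

        double[,] Sub(int col)
        {
            var m = (double[,])A.Clone();
            for (int r = 0; r < 3; r++) m[r, col] = y[r];
            return m;
        }

        double d = Det(A);
        double a = Det(Sub(0)) / d, b = Det(Sub(1)) / d, c = Det(Sub(2)) / d;
        double nOpt = -2 * c / b;   // maximizer of Th(n); about 8.5 for these samples

        Console.WriteLine($"a={a:F5} b={b:F5} c={c:F5} optimal n={nOpt:F1}");
    }
}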

The end-to-end performance of a data intensive distributed application heavily depends
on the wide area data transfer performance and the effective throughput achieved by the
application. Prediction of the effective throughput that is achievable by an application given
the capacity of a network and current load is a study area in which several different methods
have been developed either in high or low-level. As an example of low level methods,
different transport protocols have been developed and also tuning the existing protocol
parameters gave promising results. Among those protocols, TCP is the most widely used one
and different versions of TCP are implemented to increase efficiency in achievable transfer
rate. On the high level, other techniques are proposed which use the existing underlying
protocol. Opening parallel streams is one way of doing that and is widely used in many
application areas from data-intensive scientific computing to live multimedia and peer-to-
peer paradigms. It has been shown that parallel streams achieve high throughput by mimicking
the behavior of individual streams and obtaining an unfair share of the available bandwidth. On
the other hand, using too many simultaneous connections results in congestion, and the
achievable throughput starts to drop. Unfortunately, the point of congestion is difficult to
predict, as it varies with parameters that are unique to each transfer in both time and domain.
The prediction of the optimal parallelism level therefore remains an open problem.
CHAPTER 3

3.1 METHODOLOGY

In this module, we form the cloud service provider, which contains many datacenters for
storage. Users can upload files to this cloud service provider, and it transfers the files to a
datacenter for storage. Most scientific cloud applications require movement of large data sets
either inside a data center or between multiple data centers. Transferring large data sets,
especially with heterogeneous file sizes (i.e., many small and large files together), causes
inefficient utilization of the available network bandwidth. Small file transfers may prevent the
underlying transfer protocol from reaching full network utilization due to short-duration
transfers and connection start-up/tear-down overhead, while large file transfers may suffer
from protocol inefficiency and end-system limitations. Application-level TCP tuning
parameters such as pipelining, parallelism and concurrency are very effective in removing
these bottlenecks, especially when used together and in correct combinations.
3.1.1 ADAPTIVE PCP ALGORITHM:

 This algorithm sorts the dataset based on the file size and divides it into two sets; the first
set (Set1) containing files with sizes less than the BDP and the second set (Set2) containing
files with sizes greater than the BDP.
 Since setting a different pipelining level is effective for file sizes less than the BDP (Rule 2),
we apply a recursive chunk division algorithm to the first set, as sketched below.

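A minimal C# sketch of this partitioning step follows; the helper names, the C# 9 record syntax, and the division criterion are our own illustrative assumptions, not the original implementation.

using System;
using System.Collections.Generic;
using System.Linq;

record TransferFile(string Path, long SizeBytes);

static class AdaptivePcpSketch
{
    // BDP in bytes: bandwidth (bits/s) * RTT (s) / 8.
    public static long Bdp(double bandwidthBps, double rttSeconds) =>
        (long)(bandwidthBps * rttSeconds / 8);

    // Sort by size and split around the BDP (Set1 below, Set2 at or above).
    public static (List<TransferFile> set1, List<TransferFile> set2)
        SplitByBdp(IEnumerable<TransferFile> dataset, long bdp)
    {
        var sorted = dataset.OrderBy(f => f.SizeBytes).ToList();
        return (sorted.Where(f => f.SizeBytes < bdp).ToList(),
                sorted.Where(f => f.SizeBytes >= bdp).ToList());
    }

    // Recursively halve Set1 until each chunk is small enough to share a
    // single pipelining level (the size cap is a stand-in criterion).
    public static IEnumerable<List<TransferFile>> DivideChunks(
        List<TransferFile> set, int maxChunkFiles)
    {
        if (set.Count <= maxChunkFiles) { yield return set; yield break; }
        int mid = set.Count / 2;
        foreach (var c in DivideChunks(set.Take(mid).ToList(), maxChunkFiles))
            yield return c;
        foreach (var c in DivideChunks(set.Skip(mid).ToList(), maxChunkFiles))
            yield return c;
    }
}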
3.1.2 MULTI-CHUNK (MC) ALGORITHM:

 The Multi-Chunk algorithm basically tries to improve the transfer throughput of mixed
datasets which consist of both small and large files (small and large sizes are defined
based on the network BDP).
 It divides the dataset into chunks based on file sizes (Small, Middle, Large and Huge)
and finds the optimal parameter configuration (pipelining, parallelism and concurrency) for
each chunk separately.
 Then, it transfers multiple chunks simultaneously with their optimal parameters, as
sketched below.

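The following C# sketch illustrates the chunking idea; the class thresholds and the per-class parameter triples are illustrative assumptions, not the tuned values from the study.

enum ChunkClass { Small, Middle, Large, Huge }

static class MultiChunkSketch
{
    // Classify a file by its size relative to the network BDP (the
    // 10x/100x multipliers are assumed cut-offs for illustration).
    public static ChunkClass Classify(long sizeBytes, long bdpBytes) =>
        sizeBytes < bdpBytes       ? ChunkClass.Small  :
        sizeBytes < 10 * bdpBytes  ? ChunkClass.Middle :
        sizeBytes < 100 * bdpBytes ? ChunkClass.Large  : ChunkClass.Huge;

    // Per-class (pipelining, parallelism, concurrency): small files lean on
    // pipelining and concurrency, huge files on parallel streams.
    public static (int Pipelining, int Parallelism, int Concurrency)
        Parameters(ChunkClass c) => c switch
    {
        ChunkClass.Small  => (8, 1, 4),
        ChunkClass.Middle => (4, 2, 4),
        ChunkClass.Large  => (2, 4, 2),
        _                 => (1, 8, 2),
    };
}

The chunks would then be launched simultaneously (for example, one transfer task per chunk awaited together), each with its own parameter triple.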
3.1.3 ALGORITHM

Step 1: FOR signal samples x(k): FROM k = 1 TO N
Step 2: (1) set error signal e_0(k) = x(k)
Step 3: FOR every neuron: FROM j = 1 TO m
Step 4: (2) set w_j(0) randomly
Step 5: (3) set eta_j in accordance with the input variance
Step 6: FOR epoch index s: FROM s = 1 TO MAX_S
Step 7: FOR input samples: FROM k = 1 TO N
Step 8: (4) y_j(k) = w_j^T(k-1) x(k)
Step 9: (5) w_j(k) = w_j(k-1) + eta_j y_j(k) [e_{j-1}(k) - y_j(k) w_j(k-1)]
Step 10: IF ||w_j(k-1) - w_j(k)|| < epsilon THEN
Step 11: (6) w_j = w_j(k); GO TO STABLE
Step 12: (7) decrease eta_j exponentially
STABLE: FOR input samples: FROM k = 1 TO N
Step 13: (8) set y_j(k) = w_j^T x(k)
Step 14: (9) set error e_j(k) = e_{j-1}(k) - y_j(k) w_j

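A direct C# transcription of these steps is sketched below; the learning rate eta, its decay factor, and the stopping threshold epsilon are assumptions, since the listing above does not fix them.

using System;

static class HebbianPcaSketch
{
    // Trains m neurons one at a time on the residual (deflated) signal,
    // following steps (1)-(9) above. x holds N sample vectors of dimension d.
    public static double[][] Train(double[][] x, int m,
                                   int maxEpochs = 100,
                                   double eta0 = 0.01,
                                   double eps = 1e-6)
    {
        int n = x.Length, d = x[0].Length;
        var e = new double[n][];                                     // (1) e_0(k) = x(k)
        for (int k = 0; k < n; k++) e[k] = (double[])x[k].Clone();

        var w = new double[m][];
        var rng = new Random(0);
        for (int j = 0; j < m; j++)
        {
            var wj = new double[d];
            for (int i = 0; i < d; i++) wj[i] = rng.NextDouble() - 0.5;  // (2)
            double eta = eta0;                                           // (3)

            for (int s = 0; s < maxEpochs; s++)
            {
                double maxDelta = 0;
                foreach (var ek in e)
                {
                    double y = Dot(wj, ek);                              // (4)
                    for (int i = 0; i < d; i++)
                    {
                        double dw = eta * y * (ek[i] - y * wj[i]);       // (5)
                        wj[i] += dw;
                        maxDelta = Math.Max(maxDelta, Math.Abs(dw));
                    }
                }
                if (maxDelta < eps) break;                               // (6)
                eta *= 0.95;                                             // (7)
            }

            foreach (var ek in e)
            {
                double y = Dot(wj, ek);                                  // (8)
                for (int i = 0; i < d; i++) ek[i] -= y * wj[i];          // (9)
            }
            w[j] = wj;
        }
        return w;
    }

    static double Dot(double[] a, double[] b)
    {
        double s = 0;
        for (int i = 0; i < a.Length; i++) s += a[i] * b[i];
        return s;
    }
}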
3.2 EXISTING SYSTEM


Application-level transfer tuning parameters such as pipelining, parallelism and
concurrency are very powerful mechanisms for overcoming data transfer bottlenecks for
scientific cloud applications; however, their optimal values depend on the environment in which
the transfers are conducted. The problem with the existing system is that it only focuses on the
end-system and network characteristics, but does not provide models regarding the dataset
characteristics.

The following are the disadvantages in the existing system.

 Without models of the dataset characteristics, the services experience latency and
transfer inefficiencies, especially in situations requiring the transfer of large-scale data
(i.e., gigabytes of data and above).

 Lack of procedural algorithms to enable a clear structure of data transfer.

 With the transfer of data taking place between networks, it is vital to consider the nature
of the transfer happening in cloud networks (i.e., inter-cloud or intra-cloud), because the
data throughput may be affected.

 Analyzing the data transfer in a heterogeneous file transfer scenario may not be efficient,
and deciding which technique to implement may not be reliable.
3.3 PROPOSED SYSTEM
An auto-tuning technique that is based on active bandwidth estimation is the Work
Around Daemon (WAD). WAD uses ping to measure the minimum RTT prior to the start of a
TCP connection, and pipechar to estimate the capacity of the path. A similar approach is taken
by the NLANR Auto-Tuning FTP implementation, and similar socket buffer sizing guidelines
appear elsewhere in the literature. An early proposal for automatic TCP buffer tuning aimed
to allow a host (typically a server) to fairly share kernel memory between multiple ongoing
connections. The proposed mechanism, even though simple to implement, requires changes in
the operating system. An important point about that work is that the BDP of a path was estimated
based on the congestion window (cwnd) of the TCP connection. The receive socket buffer size
was set to a sufficiently large value so that it does not limit the transfer’s throughput.
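A minimal sketch of BDP-based socket buffer sizing in C# is shown below; the bandwidth and RTT values would come from probing (e.g., ping and pipechar as above) and are placeholders here.

using System.Net.Sockets;

static class BufferSizingSketch
{
    public static void SizeForPath(Socket s, double bandwidthBps, double rttSeconds)
    {
        // BDP in bytes; socket buffers smaller than this cap TCP throughput.
        int bdpBytes = (int)(bandwidthBps * rttSeconds / 8);
        s.SendBufferSize = bdpBytes;       // sender-side socket buffer
        s.ReceiveBufferSize = bdpBytes;    // receiver-side socket buffer
    }
}

For example, a 1 Gbps path with a 50 ms RTT gives a BDP of about 6.25 MB, so default buffer sizes of a few hundred kilobytes would limit the transfer well below the available bandwidth.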
3.3.1 ADVANTAGE
Our proposed work outperforms the most popular data transfer tools, such as Globus
Online and UDT, in the majority of cases.
CHAPTER 4

4. RESULTS AND DISCUSSION


4.1.1 Experimental results:
The algorithms were tested on real high-speed networking testbeds (FutureGrid and XSEDE)
and also on cloud networks using Amazon Web Services EC2 instances. Three different datasets
were used in the tests, classified as small (512KB-8MB), medium (25MB-100MB) and
large (512MB-2GB) file sizes. The total dataset size was around 10GB-20GB. The baseline disk
write throughput results measured with Bonnie++ and dd for every testbed used in the
experiments are presented in Table 2 for comparison purposes.

[Chart: data transfer throughput (0-90) under no optimization, pipelining, parallelism, and concurrency]
4.1.2 Time and costing:
In production, research, retail, and accounting, a cost is the value of money that has been
used up to produce something, and hence is not available for use anymore. In business, the cost
may be one of acquisition, in which case the amount of money expended to acquire it is counted
as cost; in this case, money is the input that is given up in order to acquire the thing. This
acquisition cost may be the sum of the cost of production as incurred by the original producer,
and further costs of transaction as incurred by the acquirer over and above the price paid to the
producer. Usually, the price also includes a mark-up for profit over the cost of production.

[Chart: data transfer comparison under no optimization, pipelining, parallelism, and concurrency]
4.1.3 Performance level:
The goal of that work was to allow a host (typically a server) to fairly share kernel
memory between multiple ongoing connections. The proposed mechanism, even though simple
to implement, requires changes in the operating system. An important point about that work is
that the BDP of a path was estimated based on the congestion window (cwnd) of the TCP
connection. The receive socket buffer size was set to a sufficiently large value so that it does not
limit the transfer’s throughput.

[Chart: performance comparison of no optimization, pipelining, parallelism, and concurrency]
.NET Framework

The Microsoft .NET Framework (pronounced dot net) is a software framework developed
by Microsoft that runs primarily on Microsoft Windows. It includes a large class library known
as Framework Class Library (FCL) and provides language interoperability (each language can
use code written in other languages) across several programming languages. Programs written
for .NET Framework execute in a software environment (as contrasted
to hardware environment), known as Common Language Runtime (CLR), an application virtual
machine that provides services such as security, memory management, and exception handling.
FCL and CLR together constitute .NET Framework.

FCL provides user interface, data access, database connectivity, cryptography, web
application development, numeric algorithms, and network communications. Programmers
produce software by combining their own source code with .NET Framework and other libraries.
.NET Framework is intended to be used by most new applications created for Windows
platform. Microsoft also produces an integrated development environment largely for .NET
software called Visual Studio.

HISTORY

Microsoft started development of .NET Framework in the late 1990s, originally under the
name of Next Generation Windows Services (NGWS). By late 2000, the first beta versions of
.NET 1.0 were released.
.NET Framework family also includes two versions for mobile or embedded device use.
A reduced version of the framework, .NET Compact Framework, is available on Windows
CE platforms, including Windows Mobile devices such as smartphones. Additionally, .NET
Micro Framework is targeted at severely resource-constrained devices.

ARCHITECTURE

COMMON LANGUAGE INFRASTRUCTURE:

The Common Language Infrastructure (CLI) provides a language-neutral platform for application
development and execution, including functions for exception handling, garbage collection,
security, and interoperability. By implementing the core aspects of .NET Framework within the
scope of CLI, this functionality will not be tied to a single language but will be available across
the many languages supported by the framework. Microsoft's implementation of CLI is Common
Language Runtime (CLR). It serves as the execution engine of .NET Framework. All .NET
programs execute under the supervision of CLR, guaranteeing certain properties and behaviors in
the areas of memory management, security, and exception handling.

For computer programs to run on CLI, they need to be compiled into Common
Intermediate Language (CIL) – as opposed to being compiled into machine code. Upon
execution, an architecture-specific Just-in-time compiler (JIT) turns the CIL code into machine
code. To improve performance, however, .NET Framework comes with Native Image
Generator (NGEN) that performs ahead-of-time compilation.
Figure 2: Visual overview of the Common Language Infrastructure (CLI)

CLASS LIBRARY

.NET Framework includes a set of standard class libraries. The class library is organized
in a hierarchy of namespaces. Most of the built-in APIs are part of
either System.* or Microsoft.* namespaces. These class libraries implement a large number of
common functions, such as file reading and writing, graphic rendering, database interaction, and
XML document manipulation, among others. .NET class libraries are available to all CLI
compliant languages. .NET Framework class library is divided into two parts: Framework Class
Library (FCL) and Base Class Library (BCL).

.NET CORE

.NET Core is a free and open-source partial implementation of the .NET Framework. It
consists of CoreCLR and CoreFX, which are partial forks of CLR and BCL respectively. .NET
Core comes with an improved JIT compiler, called RyuJIT.

ASSEMBLIES
Compiled CIL code is stored in CLI assemblies. As mandated by the specification,
assemblies are stored in Portable Executable (PE) file format, common on Windows platform for
all DLL and EXE files. Each assembly consists of one or more files, one of which must contain a
manifest bearing the metadata for the assembly. The complete name of an assembly (not to be
confused with the file name on disk) contains its simple text name, version number, culture,
and public key token. Assemblies are considered equivalent if they share the same complete
name, excluding the revision of the version number. A private key can also be used by the
creator of the assembly for strong naming. The public key token identifies which private key an
assembly is signed with. Only the creator of the keypair (typically .NET developer signing the
assembly) can sign assemblies that have the same strong name as a previous version assembly,
since the creator is in possession of the private key. Strong naming is required to add assemblies
to Global Assembly Cache.

DESIGN TENETS

LANGUAGE INDEPENDENCE

.NET Framework introduces a Common Type System (CTS) that defines all
possible datatypes and programming constructs supported by CLR and how they may or may not
interact with each other conforming to CLI specification. Because of this feature, .NET
Framework supports the exchange of types and object instances between libraries and
applications written using any conforming .NET language.

PORTABILITY

While Microsoft has never implemented the full framework on any system except
Microsoft Windows, it has engineered the framework to be platform-agnostic, and cross-
platform implementations are available for other operating systems. Microsoft submitted the
specifications for CLI (which includes the core class libraries, CTS, and CIL), and C++/CLI to
both ECMA and ISO, making them available as official standards. This makes it possible for
third parties to create compatible implementations of the framework and its languages on other
platforms.
SECURITY

.NET Framework has its own security mechanism with two general features: Code
Access Security (CAS), and validation and verification. CAS is based on evidence that is
associated with a specific assembly. Typically the evidence is the source of the assembly
(whether it is installed on the local machine or has been downloaded from the intranet or
Internet). CAS uses evidence to determine the permissions granted to the code. Other code can
demand that calling code be granted a specified permission. The demand causes CLR to perform
a call stack walk: every assembly of each method in the call stack is checked for the required
permission; if any assembly is not granted the permission a security exception is thrown.

MEMORY MANAGEMENT

CLR frees the developer from the burden of managing memory (allocating and freeing up
when done); it handles memory management itself by detecting when memory can be safely
freed. Instantiations of .NET types (objects) are allocated from the managed heap; a pool of
memory managed by CLR. As long as there exists a reference to an object, which might be either
a direct reference to an object or via a graph of objects, the object is considered to be in use.
When there is no reference to an object, and it cannot be reached or used, it becomes garbage,
eligible for collection. .NET Framework includes a garbage collector which runs periodically, on
a separate thread from the application's thread, that enumerates all the unusable objects and
reclaims the memory allocated to them.

SIMPLIFIED DEPLOYMENT

.NET Framework includes design features and tools which help manage the installation
of computer software to ensure that it does not interfere with previously installed software, and
that it conforms to security requirements.

FEATURES OF .NET

Microsoft .NET is a set of Microsoft software technologies for rapidly building and
integrating XML Web services, Microsoft Windows-based applications, and Web solutions.
The .NET Framework is a language-neutral platform for writing programs that can easily and
securely interoperate. There’s no language barrier with .NET: there are numerous languages
available to the developer, including Managed C++, C#, Visual Basic and JScript. The .NET
framework provides the foundation for components to interact seamlessly, whether locally or
remotely on different platforms. It standardizes common data types and communications
protocols so that components created in different languages can easily interoperate.

“.NET” is also the collective name given to various software components built upon
the .NET platform. These will be both products (Visual Studio.NET and Windows.NET Server,
for instance) and services (like Passport, .NET My Services, and so on).

THE .NET FRAMEWORK

The .NET Framework has two main parts:

1. The Common Language Runtime (CLR).

2. A hierarchical set of class libraries.

The CLR is described as the “execution engine” of .NET. It provides the environment
within which programs run. The most important features are

 Conversion from a low-level assembler-style language, called Intermediate


Language (IL), into code native to the platform being executed on.
 Memory management, notably including garbage collection.
 Checking and enforcing security restrictions on the running code.
 Loading and executing programs, with version control and other such features.
The following features of the .NET framework are also worth describing:

Managed Code:

The code that targets .NET, and which contains certain extra Information- “metadata” - to
describe itself. Whilst both managed and unmanaged code can run in the runtime, only managed
code contains the information that allows the CLR to guarantee, for instance, safe execution and
interoperability.
Managed Data

With Managed Code comes Managed Data. CLR provides memory allocation
and deallocation facilities, and garbage collection. Some .NET languages use Managed Data by
default, such as C#, Visual Basic.NET and JScript.NET, whereas others, namely C++, do not.
Targeting CLR can, depending on the language you’re using, impose certain constraints on the
features available. As with managed and unmanaged code, one can have both managed and
unmanaged data in .NET applications - data that doesn’t get garbage collected but instead is
looked after by unmanaged code.

Common Type System

The CLR uses something called the Common Type System (CTS) to strictly enforce
type-safety. This ensures that all classes are compatible with each other, by describing types in a
common way. CTS define how types work within the runtime, which enables types in one
language to interoperate with types in another language, including cross-language exception
handling. As well as ensuring that types are only used in appropriate ways, the runtime also
ensures that code doesn’t attempt to access memory that hasn’t been allocated to it.

Common Language Specification

The CLR provides built-in support for language interoperability. To ensure that you can
develop managed code that can be fully used by developers using any programming language, a
set of language features and rules for using them called the Common Language Specification
(CLS) has been defined. Components that follow these rules and expose only CLS features are
considered CLS-compliant.

THE CLASS LIBRARY

.NET provides a single-rooted hierarchy of classes, containing over 7000 types.
The root of the namespace is called System; this contains basic types like Byte, Double, Boolean,
and String, as well as Object. All objects derive from System.Object. As well as objects, there
are value types. Value types can be allocated on the stack, which can provide useful flexibility.
There are also efficient means of converting value types to object types if and when necessary.

The set of classes is pretty comprehensive, providing collections, file, screen, and
network I/O, threading, and so on, as well as XML and database connectivity.

The class library is subdivided into a number of sets (or namespaces), each
providing distinct areas of functionality, with dependencies between the namespaces kept to a
minimum.

OVERLOADING

Overloading is another feature in C#. Overloading enables us to define multiple
procedures with the same name, where each procedure has a different set of arguments.
Besides using overloading for procedures, we can use it for constructors and properties in a
class.
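A small example of overloading is sketched below: three methods share one name and are distinguished by their argument lists.

using System;

class Printer
{
    public void Print(int value)    => Console.WriteLine($"int: {value}");
    public void Print(string value) => Console.WriteLine($"string: {value}");

    // Overloaded by arity as well as by type.
    public void Print(string value, int times)
    {
        for (int i = 0; i < times; i++) Console.WriteLine(value);
    }
}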

MULTITHREADING:

C#.NET also supports multithreading. An application that supports multithreading can
handle multiple tasks simultaneously; we can use multithreading to decrease the time taken
by an application to respond to user interaction.
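The sketch below shows the basic idea using the Task API: two pieces of work run simultaneously, so the longer one does not block the shorter one.

using System;
using System.Threading.Tasks;

class MultithreadingDemo
{
    static void Main()
    {
        Task slow = Task.Run(() =>
        {
            Task.Delay(1000).Wait();           // simulate a long-running task
            Console.WriteLine("slow task done");
        });
        Task fast = Task.Run(() => Console.WriteLine("fast task done"));

        Task.WaitAll(slow, fast);              // both tasks progressed concurrently
    }
}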

STRUCTURED EXCEPTION HANDLING

C#.NET supports structured exception handling, which enables us to detect and handle errors
at runtime. In C#.NET, we use Try…Catch…Finally statements to create exception
handlers. Using Try…Catch…Finally statements, we can create robust and effective
exception handlers to improve the performance of our application.
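A minimal Try…Catch…Finally example: the Catch block handles the runtime error, and the Finally block runs regardless of the outcome.

using System;

class ExceptionDemo
{
    static void Main()
    {
        try
        {
            int[] numbers = new int[2];
            numbers[5] = 1;                     // raises IndexOutOfRangeException
        }
        catch (IndexOutOfRangeException ex)
        {
            Console.WriteLine($"Handled: {ex.Message}");
        }
        finally
        {
            Console.WriteLine("Cleanup always runs.");
        }
    }
}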

THE .NET FRAMEWORK

The .NET Framework is a new computing platform that simplifies application
development in the highly distributed environment of the Internet.

OBJECTIVES OF .NET FRAMEWORK

1. To provide a consistent object-oriented programming environment, whether object code is
stored and executed locally, executed locally but Internet-distributed, or executed remotely.

2. To provide a code-execution environment that minimizes software deployment and
versioning conflicts and guarantees safe execution of code.

3. To eliminate the performance problems of scripted or interpreted environments.

There are different types of applications, such as Windows-based applications and Web-based
applications.

MICROSOFT SQL SERVER

Microsoft SQL Server is a relational database management system developed
by Microsoft. As a database, it is a software product whose primary function is to store and
retrieve data as requested by other software applications, be it those on the same computer or
those running on another computer across a network (including the Internet). There are at least a
dozen different editions of Microsoft SQL Server aimed at different audiences and for workloads
ranging from small single-machine applications to large Internet-facing applications with
many concurrent users. Its primary query languages are T-SQL and ANSI SQL.

HISTORY:

GENESIS
Prior to version 7.0, the code base for MS SQL Server was sold by Sybase to
Microsoft, and was Microsoft's entry to the enterprise-level database market, competing
against Oracle, IBM, and, later, Sybase itself. Microsoft, Sybase and Ashton-Tate originally
worked together to create and market the first version, named SQL Server 1.0, for OS/2
(circa 1989), which was essentially the same as Sybase SQL Server 3.0 on Unix, VMS, etc.

Since the release of SQL Server 2000, advances have been made in performance, the client IDE
tools, and several complementary systems that are packaged with SQL Server 2005. These
include:

 an extract-transform-load (ETL) tool (SQL Server Integration Services or SSIS)


 a Reporting Server
 an OLAP and data mining server (Analysis Services)

Common Language Runtime (CLR) integration was introduced with this version,
enabling one to write SQL Server code as managed code executed by the CLR. For relational data, T-SQL has
been augmented with error handling features (try/catch) and support for recursive queries with
CTEs (Common Table Expressions). SQL Server 2005 has also been enhanced with new
indexing algorithms, syntax and better error recovery systems.

FEATURES SQL SERVER:

The OLAP Services feature available in SQL Server version 7.0 is now called SQL
Server 2000 Analysis Services. The term OLAP Services has been replaced with the term
Analysis Services. Analysis Services also includes a new data mining component. The
Repository component available in SQL Server version 7.0 is now called Microsoft SQL Server
2000 Meta Data Services. References to the component now use the term Meta Data Services.
The term repository is used only in reference to the repository engine within Meta Data Services.

An SQL Server database consists of the following types of objects:

1. TABLE

2. QUERY

3. FORM

4. REPORT

5. MACRO

TABLE:

A table is a collection of data about a specific topic.

VIEWS OF TABLE:

We can work with a table in two views:
1. Design View

2. Datasheet View

Design View

To build or modify the structure of a table, we work in the table design view. We can
specify what kind of data the table will hold.

Datasheet View

To add, edit, or analyse the data itself, we work in the table's datasheet view mode.

QUERY:

A query is a question that is asked of the data. Access gathers the data that answers the
question from one or more tables. The data that makes up the answer is either a dynaset (if you
edit it) or a snapshot (which cannot be edited). Each time we run a query, we get the latest
information in the dynaset. Access either displays the dynaset or snapshot for us to view, or
performs an action on it, such as deleting or updating.
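As an illustration, a query can be issued against SQL Server from C# through ADO.NET as sketched below; the connection string, table, and column names are placeholders.

using System;
using System.Data.SqlClient;

class QueryDemo
{
    static void Main()
    {
        var connStr = "Server=localhost;Database=TransferDb;Integrated Security=true;";
        using (var conn = new SqlConnection(connStr))
        using (var cmd = new SqlCommand(
            "SELECT FileName, SizeBytes FROM Files WHERE SizeBytes > @min", conn))
        {
            cmd.Parameters.AddWithValue("@min", 1024 * 1024);  // files over 1 MB
            conn.Open();
            using (var reader = cmd.ExecuteReader())
                while (reader.Read())
                    Console.WriteLine($"{reader.GetString(0)}: {reader.GetInt64(1)} bytes");
        }
    }
}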

INPUT DESIGN AND OUTPUT DESIGN

INPUT DESIGN

The input design is the link between the information system and the user. It comprises the
development of specifications and procedures for data preparation, and those steps that are
necessary to put transaction data into a usable form for processing. This can be achieved by
instructing the computer to read data from a written or printed document, or by having people
key the data directly into the system. The design of input focuses on controlling the amount of
input required, controlling the errors, avoiding delay, avoiding extra steps and keeping the
process simple. The input is designed in such a way that it provides security and ease of use
while retaining privacy. Input design considers the following things:

 What data should be given as input?


 How the data should be arranged or coded?
 The dialog to guide the operating personnel in providing input.
 Methods for preparing input validations and steps to follow when error occur.

OBJECTIVES

1. Input design is the process of converting a user-oriented description of the input into a
computer-based system. This design is important to avoid errors in the data input process and to
show the correct direction to the management for getting correct information from the
computerized system.

2. It is achieved by creating user-friendly screens for the data entry to handle large volumes of
data. The goal of designing input is to make data entry easier and to be free from errors. The data
entry screen is designed in such a way that all the data manipulations can be performed. It also
provides record viewing facilities.

3. When the data is entered, it will be checked for its validity. Data can be entered with the help
of screens. Appropriate messages are provided as and when needed, so that the user is never left
confused. Thus the objective of input design is to create an input layout that is easy to follow.

OUTPUT DESIGN

A quality output is one which meets the requirements of the end user and presents the
information clearly. In any system, the results of processing are communicated to the users and to
other systems through outputs. In output design, it is determined how the information is to be
displayed for immediate need, and also the hard copy output. It is the most important and direct
source of information to the user. Efficient and intelligent output design improves the system’s
relationship with the user and helps in decision-making.

1. Designing computer output should proceed in an organized, well thought out manner; the right
output must be developed while ensuring that each output element is designed so that people will
find the system easy to use effectively. When analysts design computer output, they should
identify the specific output that is needed to meet the requirements.

2. Select methods for presenting information.

3. Create document, report, or other formats that contain information produced by the system.

The output form of an information system should accomplish one or more of the following
objectives:

 Convey information about past activities, current status or projections of the future.
 Signal important events, opportunities, problems, or warnings.
 Trigger an action.
 Confirm an action.

FEASIBILITY STUDY:

The feasibility of the project is analyzed in this phase, and a business proposal is put forth with a
very general plan for the project and some cost estimates. During system analysis, the feasibility
study of the proposed system is carried out. This is to ensure that the proposed system is
not a burden to the company. For feasibility analysis, some understanding of the major
requirements for the system is essential.

Three key considerations involved in the feasibility analysis are

Economical feasibility

Technical feasibility

Social feasibility

ECONOMICAL FEASIBILITY:
This study is carried out to check the economic impact that the system will have on the
organization. The amount of funds that the company can pour into the research and development
of the system is limited. The expenditures must be justified. The developed system is well
within the budget, and this was achieved because most of the technologies used are freely
available. Only the customized products had to be purchased.

TECHNICAL FEASIBILITY:

This study is carried out to check the technical feasibility, that is, the technical requirements of
the system. Any system developed must not place a high demand on the available technical
resources, as this would lead to high demands being placed on the client. The developed system
must have modest requirements, as only minimal or null changes are required for implementing
this system.

SOCIAL FEASIBILITY:

This aspect of the study checks the level of acceptance of the system by the user. This
includes the process of training the user to use the system efficiently. The user must not feel
threatened by the system, but must instead accept it as a necessity. The level of acceptance by the
users solely depends on the methods that are employed to educate the users about the system and
to make them familiar with it. Their level of confidence must be raised so that they are also able
to make constructive criticism, which is welcomed, as they are the final users of the system.
CHAPTER 5

5.1 CONCLUSIONS
Common socket buffer sizing practices, such as setting the socket buffer size to the
default or maximum value, can lead to poor throughput. We developed SOBAS, an application-
layer mechanism that automatically sets the socket buffer size while the transfer is in progress,
without prior knowledge of any path characteristics. SOBAS manages to saturate the available
bandwidth in the network path without saturating the tight link buffer in the path. SOBAS can be
integrated with bulk transfer applications, such as GridFTP, providing considerably better
performance in non-congested wide-area network paths. We plan to integrate SOBAS with
popular Grid data transfer applications in the future.
5.2 FUTURE ENHANCEMENT
In future work, we intend to write an overhead-free implementation of a GridFTP client
to reduce the overhead regarding connection startup/tear down processes for different chunk
transfers.
REFERENCES:
[1] J. Bresnahan, M. Link, R. Kettimuthu, D. Fraser, and I. Foster, “GridFTP pipelining,” in
TeraGrid 2007, 2007.
[2] B. Allen, J. Bresnahan, L. Childers, I. Foster, G. Kandaswamy, R. Kettimuthu, J. Kordas,
M. Link, S. Martin, K. Pickett, and S. Tuecke, “Software as a service for data scientists,”
Communications of the ACM, vol. 55(2), pp. 81–88, 2012.
[3] R. S. Prasad, M. Jain, and C. Dovrolis, “Socket buffer autosizing for high-performance data
transfers,” Journal of Grid Computing, vol. 1(4), pp. 361–376, Aug. 2004.
[4] G. Hasegawa, T. Terai, T. Okamoto, and M. Murata, “Scalable socket buffer tuning for
high-performance web servers,” in International Conference on Network Protocols (ICNP 2001),
2001, p. 281.
[5] A. Morajko, “Dynamic tuning of parallel/distributed applications,” Ph.D. dissertation,
Universitat Autonoma de Barcelona, 2004.
[6] T. Ito, H. Ohsaki, and M. Imase, “On parameter tuning of data transfer protocol GridFTP
for wide-area networks,” International Journal of Computer Science and Engineering,
vol. 2(4), pp. 177–183, Sep. 2008.
[7] K. M. Choi, E. Huh, and H. Choo, “Efficient resource management scheme of TCP buffer
tuned parallel stream to optimize system performance,” in Proc. Embedded and Ubiquitous
Computing, Nagasaki, Japan, Dec. 2005.
[8] E. Yildirim, M. Balman, and T. Kosar, Data-intensive Distributed Computing: Challenges
and Solutions for Large-scale Information Management. IGI-Global, 2012, ch. Data-aware
Distributed Computing.
[9] T. J. Hacker, B. D. Noble, and B. D. Athey, “The end-to-end performance effects of parallel
TCP sockets on a lossy wide area network,” in Proc. IEEE International Symposium on Parallel
and Distributed Processing (IPDPS’02), 2002, pp. 434–443.
[10] D. Lu, Y. Qiao, P. A. Dinda, and F. E. Bustamante, “Modeling and taming parallel TCP on
the wide area network,” in Proc. IEEE International Symposium on Parallel and Distributed
Processing (IPDPS’05), Apr. 2005, p. 68b.
[11] S. Venugopal, K. Nadiminti, H. Gibbins, and R. Buyya, “Designing a resource broker for
heterogeneous grids,” Software: Practice and Experience, vol. 38(8), pp. 793–825, 2008.
[12] M. A. Rodriguez and R. Buyya, “A responsive knapsack-based algorithm for resource
provisioning and scheduling of scientific workflows in clouds,” in Proc. 44th International
Conference on Parallel Processing (ICPP), vol. 1, IEEE, 2015, pp. 839–848.
[13] G. Juve, A. Chervenak, E. Deelman, S. Bharathi, G. Mehta, and K. Vahi, “Characterizing
and profiling scientific workflows,” Future Generation Computer Systems, vol. 29(3),
pp. 682–692, 2013.
[14] S. Esteves and L. Veiga, “WaaS: Workflow-as-a-service for the cloud with scheduling of
continuous and data-intensive workflows,” The Computer Journal, 2015.
