Advanced Internet Technology
Mayur Patel**
Pratik Gandhi**
Abhishek Chandan*
We have tried our best to compile all the topics of AIT; the remaining topics will be added soon.
**Major Contribution, * Medium Contribution
Module 1: Advanced Internet Protocols
Domain Name System
DNS:
The domain name system (DNS) is the way that Internet domain names are located and
translated into Internet Protocol addresses. A domain name is a meaningful and easy-to-remember
"handle" for an Internet address. Because maintaining a central list of domain name/IP address
correspondences would be impractical, the lists of domain names and IP addresses are distributed
throughout the Internet in a hierarchy of authority. There is probably a DNS server within close
geographic proximity to your access provider that maps the domain names in your Internet requests
or forwards them to other servers in the Internet.
Name servers:
The Domain Name System is maintained by a distributed database system, which uses the
client-server model. The nodes of this database are the name servers. Each domain has at least one
authoritative DNS server that publishes information about that domain and the name servers of any
domains subordinate to it. The top of the hierarchy is served by the root nameservers, the servers to
query when looking up (resolving) a top-level domain name (TLD).
The domain name space consists of a tree of domain names. Each node or leaf in the tree has
zero or more resource records, which hold information associated with the domain name. The tree
sub-divides into zones beginning at the root zone. A DNS zone may consist of only one domain, or
may comprise many domains and sub-domains, depending on the administrative authority delegated
to the manager.
An authoritative name server is a name server that gives answers that have been configured
by an original source, for example, the domain administrator or by dynamic DNS methods, in
contrast to answers that were obtained via a regular DNS query to another name server. An
authoritative-only name server only returns answers to queries about domain names that have been
specifically configured by the administrator.
An authoritative name server can either be a master server or a slave server. A master server
is a server that stores the original (master) copies of all zone records. A slave server uses an
automatic updating mechanism of the DNS protocol in communication with its master to maintain an
identical copy of the master records.
In principle, authoritative name servers are sufficient for the operation of the Internet.
However, with only authoritative name servers operating, every DNS query must start with recursive
queries at the root zone of the Domain Name System and each user system must implement resolver
software capable of recursive operation.
To improve efficiency, reduce DNS traffic across the Internet, and increase performance in
end-user applications, the Domain Name System supports DNS cache servers which store DNS
query results for a period of time determined in the configuration (time-to-live) of the domain name
record in question. Typically, such caching DNS servers, also called DNS caches, also implement
the recursive algorithm necessary to resolve a given name starting with the DNS root through to the
The combination of DNS caching and recursive functions in a name server is not mandatory;
the functions can be implemented independently in servers for special purposes.
Internet service providers typically provide recursive and caching name servers for their
customers. In addition, many home networking routers implement DNS caches and recursors to
improve efficiency in the local network.
DNS resolvers:
The client-side of the DNS is called a DNS resolver. It is responsible for initiating and
sequencing the queries that ultimately lead to a full resolution (translation) of the resource sought,
e.g., translation of a domain name into an IP address.
A non-recursive query is one in which the DNS server provides a record for a domain for
which it is authoritative itself, or it provides a partial result without querying other servers.
A recursive query is one for which the DNS server will fully answer the query (or give an
error) by querying other name servers as needed. DNS servers are not required to support
recursive queries.
The resolver, or another DNS server acting recursively on behalf of the resolver, negotiates
use of recursive service using bits in the query headers.
Resolving usually entails iterating through several name servers to find the needed
information. However, some resolvers function simplistically and can communicate only with a
single name server. These simple resolvers (called "stub resolvers") rely on a recursive name server
to perform the work of finding information for them.
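Most application code exercises exactly this stub-resolver path through the operating system. As a rough sketch (assuming Python and a system with a working resolver configuration), a lookup can be delegated to the configured recursive name server like this:

```python
import socket

def resolve(hostname):
    """Ask the system's stub resolver (and, through it, a recursive
    name server) for the IPv4 addresses of a hostname."""
    infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # the sockaddr for IPv4 is an (address, port) pair.
    return sorted({info[4][0] for info in infos})

print(resolve("localhost"))  # typically ['127.0.0.1']
```

The application never speaks to the root servers itself; the recursion happens in the name server the stub resolver is configured to use.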
Reverse lookup:
A reverse lookup is a query of the DNS for domain names when the IP address is known.
Multiple domain names may be associated with an IP address. The DNS stores IP addresses in the
form of domain names as specially formatted names in pointer (PTR) records within the
infrastructure top-level domain arpa. When performing a reverse lookup, the DNS client converts
the address into this format and then queries the name for a PTR record, following the
delegation chain as for any DNS query.
For example, the IPv4 address 208.80.152.2 is represented as a DNS name as 2.152.80.208.in-
addr.arpa. The DNS resolver begins by querying the root servers, which point to ARIN's servers for
the 208.in-addr.arpa zone. From there the Wikimedia servers are assigned for 152.80.208.in-
addr.arpa, and the PTR lookup completes by querying the Wikimedia name server for 2.152.80.208.in-
addr.arpa, which results in an authoritative response.
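The octet-reversal step in this example is mechanical and can be sketched in a few lines of Python (a minimal illustration; production code would typically use the standard `ipaddress` module's `reverse_pointer` attribute instead):

```python
def reverse_pointer(ipv4_address):
    """Build the in-addr.arpa name queried for a PTR record:
    the octets are reversed and the in-addr.arpa suffix appended."""
    octets = ipv4_address.split(".")
    return ".".join(reversed(octets)) + ".in-addr.arpa"

print(reverse_pointer("208.80.152.2"))  # 2.152.80.208.in-addr.arpa
```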
Protocol details:
DNS primarily uses User Datagram Protocol (UDP) on port number 53 to serve requests.
DNS queries consist of a single UDP request from the client followed by a single UDP reply from
the server. The Transmission Control Protocol (TCP) is used when the response data size exceeds
512 bytes, or for tasks such as zone transfers. Some operating systems, such as HP-UX, are known
to have resolver implementations that use TCP for all queries, even when UDP would suffice.
A Resource Record (RR) is the basic data element in the domain name system. Each record
has a type (A, MX, etc.), an expiration time limit, a class, and some type-specific data. Resource
records of the same type define a resource record set. The order of resource records in a set, returned
by a resolver to an application, is undefined, but often servers implement round-robin ordering to
achieve load balancing. DNSSEC, however, works on complete resource record sets in a canonical
order.
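The round-robin ordering mentioned above can be pictured as rotating the record set between successive responses. A hypothetical sketch (the addresses are illustrative placeholders from the 192.0.2.0/24 documentation range, not real servers):

```python
from collections import deque

# A hypothetical RRset of A records for one name.
rrset = deque(["192.0.2.1", "192.0.2.2", "192.0.2.3"])

def next_response(records):
    """Return the set in its current order, then rotate it so the
    next query sees a different first record (round-robin)."""
    answer = list(records)
    records.rotate(-1)
    return answer

print(next_response(rrset))  # ['192.0.2.1', '192.0.2.2', '192.0.2.3']
print(next_response(rrset))  # ['192.0.2.2', '192.0.2.3', '192.0.2.1']
```

Clients that always try the first address returned are thereby spread across the servers in the set.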
NAME is the fully qualified domain name of the node in the tree. On the wire, the name may be
shortened using label compression where ends of domain names mentioned earlier in the packet can
be substituted for the end of the current domain name.
TYPE is the record type. It indicates the format of the data and gives a hint of its intended use. For
example, the A record is used to translate from a domain name to an IPv4 address, the NS record
lists which name servers can answer lookups in a DNS zone, and the MX record specifies the mail
server used to handle mail for a domain.
RDATA is data of type-specific relevance, such as the IP address for address records, or the priority
and hostname for MX records. Well known record types may use label compression in the RDATA
field, but "unknown" record types must not.
The CLASS of a record is set to IN (for Internet) for common DNS records involving Internet
hostnames, servers, or IP addresses. In addition, the classes Chaos (CH) and Hesiod (HS) exist. Each
class is an independent name space with potentially different delegations of DNS zones.
In addition to resource records defined in a zone file, the domain name system also defines several
request types that are used only in communication with other DNS nodes (on the wire), such as when
performing zone transfers (AXFR/IXFR) or for EDNS (OPT).
Dynamic DNS:
Most home networking routers today have this feature already built into their firmware. One
of the early routers to support Dynamic DNS was the UMAX UGate-3000 in 1999, which supported
the TZO.COM dynamic DNS service.
An example is residential users who wish to access their personal computer at home while
traveling. If the home computer has a fixed static IP address, the user can connect directly using this
address, but many provider networks force frequent changes of the IP address configured in their
customers' equipment. With dynamic DNS, the home computer can automatically associate its
current IP address with a domain name. As a result the remote user can resolve the host name used
for the dynamic DNS service entry to the current address of the home computer with a DNS query.
If a remote-control program such as a VNC server is kept running on a host in the private
network, the user can connect to the home network with a VNC client program.
Increasing efforts to secure Internet communications today involve encryption of all dynamic
updates via the public Internet, as these public dynamic DNS services have been abused increasingly
to design security breaches. Standards-based methods within the DNSSEC protocol suite, such as
TSIG, have been developed to secure DNS updates, but are not widely in use. Microsoft developed
alternative technology (GSS-TSIG) based on Kerberos authentication.
Dynamic Host Configuration Protocol (DHCP):
In the absence of DHCP, hosts may be manually configured with an IP address. Alternatively,
IPv6 hosts may use stateless address autoconfiguration to generate an IP address. IPv4 hosts may
use link-local addressing to achieve limited local connectivity.
There are two versions of DHCP, one for IPv4 and one for IPv6. While both versions bear
the same name and perform much the same purpose, the details of the protocol for IPv4 and IPv6 are
sufficiently different that they can be considered separate protocols.
Reliability
The DHCP protocol provides reliability in several ways: periodic renewal, rebinding, and
failover. DHCP clients are allocated leases that last for some period of time. Clients begin to attempt
to renew their leases once half the lease interval has expired. They do this by sending a unicast
DHCPREQUEST message to the DHCP server that granted the original lease. If that server is down
or unreachable, it will fail to respond to the DHCPREQUEST. However, the DHCPREQUEST will
be repeated by the client from time to time, so when the DHCP server comes back up or becomes
reachable again, the DHCP client will succeed in contacting it, and renew its lease.
If the DHCP server is unreachable for an extended period of time, the DHCP client will
attempt to rebind, by broadcasting its DHCPREQUEST rather than unicasting it. Because it is
broadcast, the DHCPREQUEST message will reach all available DHCP servers. If some other
DHCP server is able to renew the lease, it will do so at this time.
If rebinding fails, the lease will eventually expire. When the lease expires, the client must
stop using the IP address granted to it in its lease. At that time, it will restart the DHCP process from
the beginning by broadcasting a DHCPDISCOVER message. Since its lease has expired, it will
accept any IP address offered to it. Once it has a new IP address, presumably from a different DHCP
server, it will once again be able to use the network. However, since its IP address has changed, any
ongoing connections will be broken.
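The renewal and rebinding behaviour described above is commonly driven by two timers; RFC 2131 suggests defaults of T1 = 0.5 x lease and T2 = 0.875 x lease. A small sketch under that assumption:

```python
def dhcp_timers(lease_seconds):
    """Renewal (T1) and rebinding (T2) times using the RFC 2131
    default fractions: the client unicasts a DHCPREQUEST to its
    server at T1, and falls back to broadcasting the request at T2
    if renewal has not succeeded by then."""
    t1 = 0.5 * lease_seconds    # start renewing (unicast)
    t2 = 0.875 * lease_seconds  # start rebinding (broadcast)
    return t1, t2

print(dhcp_timers(86400))  # for a one-day lease: (43200.0, 75600.0)
```

If neither renewal nor rebinding succeeds by the end of the lease, the client stops using the address and restarts with DHCPDISCOVER, as described above.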
Security
The base DHCP protocol does not include any mechanism for authentication. Because of
this, it is vulnerable to a variety of attacks. These attacks fall into three main categories:
Because the client has no way to validate the identity of a DHCP server, unauthorized DHCP
servers can be operated on networks, providing incorrect information to DHCP clients. This can
serve either as a denial-of-service attack, preventing the client from gaining access to network
connectivity, or as a man-in-the-middle attack. Because the DHCP server provides the DHCP client
with server IP addresses, such as the IP address of one or more DNS servers, an attacker can
convince a DHCP client to do its DNS lookups through its own DNS server, and can therefore
provide its own answers to DNS queries from the client. This in turn allows the attacker to redirect
network traffic through itself, allowing it to eavesdrop on connections between the client and
network servers it contacts, or to simply replace those network servers with its own.
Because the DHCP server has no secure mechanism for authenticating the client, clients can
gain unauthorized access to IP addresses by presenting credentials, such as client identifiers, that
belong to other DHCP clients. This also allows DHCP clients to exhaust the DHCP server's store of
IP addresses: by presenting new credentials each time it asks for an address, a client can consume
all the available IP addresses on a particular network link, preventing other DHCP clients from
getting service.
DHCP does provide some mechanisms for mitigating these problems. The Relay Agent
Information Option protocol extension (RFC 3046) allows network operators to attach tags to DHCP
messages as these messages arrive on the network operator's trusted network. This tag is then used as
an authorization token to control the client's access to network resources. Because the client has no
access to the network upstream of the relay agent, the lack of authentication does not prevent the
DHCP server operator from relying on the authorization token.
Another extension, Authentication for DHCP Messages (RFC 3118), provides a mechanism
for authenticating DHCP messages.
File Transfer Protocol (FTP):
The first FTP client applications were interactive command-line tools, implementing
standard commands and syntax. Graphical user interface clients have since been developed for many
of the popular desktop operating systems in use today.
The session transcript below shows an example of using FTP to retrieve a list of items in a directory:
A host that provides an FTP service may additionally provide anonymous FTP access. Users
typically log into the service with an 'anonymous' account when prompted for user name. Although
users are commonly asked to send their email address in lieu of a password, no verification is
actually performed on the supplied data.
% ftp challenger.atc.fhda.edu
Connected to challenger.atc.fhda.edu
220 Server ready
Name: forouzan
Password: xxxxxxx
ftp> ls /usr/user/report
200 OK
150 Opening ASCII mode
...........
226 Transfer complete
ftp> close
221 Goodbye
ftp> quit
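The numeric replies in the transcript (220, 200, 150, 226, 221) follow the FTP convention that the first digit indicates the reply's class. A minimal sketch of that classification:

```python
REPLY_CLASSES = {
    "1": "positive preliminary",   # e.g. 150: transfer about to start
    "2": "positive completion",    # e.g. 220, 200, 226, 221
    "3": "positive intermediate",  # e.g. 331: user name OK, need password
    "4": "transient negative",     # retry may succeed
    "5": "permanent negative",     # do not retry as-is
}

def classify(code):
    """Classify an FTP reply code by its first digit."""
    return REPLY_CLASSES[str(code)[0]]

print(classify(226))  # positive completion
```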
(Figures: abbreviated address, CIDR address, address structure, loopback address, IPv6 datagram)
IP addresses are described as consisting of two groups of bits in the address: the most
significant part is the network address which identifies a whole network or subnet and the least
significant portion is the host identifier, which specifies a particular host interface on that network.
This division is used as the basis of traffic routing between IP networks and for address allocation
policies. Classful network design for IPv4 sized the network address as one or more 8-bit groups,
resulting in the blocks of Class A, B, or C addresses. Classless Inter-Domain Routing allocates
address space to Internet service providers and end users on any address bit boundary, instead of on
8-bit segments. In IPv6, however, the interface identifier has a fixed size of 64 bits by convention,
and smaller subnets are never allocated to end users.
CIDR notation is a syntax of specifying IP addresses and their associated routing prefix. It
appends to the address a slash character and the decimal number of leading bits of the routing prefix,
e.g., 192.168.0.0/16 for IPv4, and 2001:db8::/32 for IPv6.
CIDR Blocks:
An IP address is part of a CIDR block, and is said to match the CIDR prefix if the initial N
bits of the address and the CIDR prefix are the same. Thus, understanding CIDR requires that IP
addresses be visualized in binary. Since an IPv4 address is 32 bits long, an N-bit CIDR
prefix leaves 32 - N bits unmatched, meaning that 2^(32-N) IPv4 addresses match a given N-bit CIDR
prefix. Shorter CIDR prefixes match more addresses, while longer prefixes match fewer. An address
can match multiple CIDR prefixes of different lengths.
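Python's standard ipaddress module can be used to check this prefix-matching arithmetic directly; a short sketch:

```python
import ipaddress

net = ipaddress.ip_network("192.168.0.0/16")

# A /16 prefix leaves 32 - 16 = 16 host bits, so 2**16 addresses match.
print(net.num_addresses)                            # 65536
print(ipaddress.ip_address("192.168.42.7") in net)  # True
print(ipaddress.ip_address("10.0.0.1") in net)      # False

# The same address also matches a longer (more specific) prefix.
print(ipaddress.ip_address("192.168.42.7")
      in ipaddress.ip_network("192.168.42.0/24"))   # True
```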
CIDR is also used for IPv6 addresses, and the syntax and semantics are identical. A prefix length
can range from 0 to 128 due to the larger number of bits in the address; however, by convention a
subnet on broadcast MAC-layer networks always has 64-bit host identifiers. Larger prefixes are
rarely used even on point-to-point links.
Considering alternative designs, such as every router connected to every other router, or each
router connected to only two others, shows the convenience of hierarchical routing. It decreases
the complexity of network topology, increases routing efficiency, and causes much less congestion
because of fewer routing advertisements. With hierarchical routing, only core routers connected to
the backbone are aware of all routes. Routers that lie within a LAN only know about routes in the
LAN. Unrecognized destinations are passed to the default route.
Voice over IP (VoIP):
VoIP systems employ session control protocols to control the set-up and tear-down of calls as
well as audio codecs which encode speech allowing transmission over an IP network as digital audio
via an audio stream. The codec used varies between different implementations of VoIP (and often
a range of codecs is used); some implementations rely on narrowband and compressed speech,
while others support high-fidelity stereo codecs.
There are three types of VoIP tools that are commonly used: IP phones, software VoIP, and
mobile and integrated VoIP. IP phones are the most institutionally established but still the least
obvious of the VoIP tools. Of all the software VoIP tools that exist, Skype is probably the most
easily identifiable. The use of software VoIP increased during the global recession as many
people looking for ways to cut costs turned to these tools for free or inexpensive calling or
video-conferencing applications. Software VoIP can be further broken down into three classes or
subcategories: web calling, voice and video instant messaging, and web conferencing. Mobile and
integrated VoIP is just another example of the adaptability of VoIP. VoIP is available on many
smartphones and Internet devices, so even users of portable devices that are not phones can still
make calls or send SMS text messages over 3G or Wi-Fi.
Protocols:
Voice over IP has been implemented in various ways using both proprietary and open
protocols and standards. Examples of technologies used to implement Voice over IP include:
H.323
IP Multimedia Subsystem (IMS)
Media Gateway Control Protocol (MGCP)
Session Initiation Protocol (SIP)
Real-time Transport Protocol (RTP)
Session Description Protocol (SDP)
A notable proprietary implementation is the Skype protocol, which is in part based on the
principles of Peer-to-Peer (P2P) networking.
Adoption
Consumer market
A major development that started in 2004 was the introduction of mass-market VoIP services
that utilize existing broadband Internet access, by which subscribers place and receive telephone
calls in much the same manner as they would via the public switched telephone network (PSTN).
Full-service VoIP phone companies provide inbound and outbound service with Direct Inbound
Dialing. Many offer unlimited domestic calling for a flat monthly subscription fee. This sometimes
includes international calls to certain countries. Phone calls between subscribers of the same
provider are usually free when flat-fee service is not available.
A VoIP phone is necessary to connect to a VoIP service provider. This can be implemented
in several ways:
Dedicated VoIP phones connect directly to the IP network using technologies such as wired
Ethernet or wireless Wi-Fi. They are typically designed in the style of traditional digital
business telephones.
An analog telephone adapter is a device that connects to the network and implements the
electronics and firmware to operate a conventional analog telephone attached through a
modular phone jack. Some residential Internet gateways and cable modems have this
function built in.
A softphone is application software installed on a networked computer that is equipped with
a microphone and speaker, or headset. The application typically presents a dial pad and
display field to the user to operate the application by mouse clicks or keyboard input.
Smartphones and Wi-Fi enabled mobile phones may have SIP clients built into the firmware
or available as an application download. Such clients operate independently of the mobile telephone
network and use either the cellular data connection or Wi-Fi to make and receive phone calls.
Corporate use
VoIP solutions aimed at businesses have evolved into "unified communications" services that
treat all communications—phone calls, faxes, voice mail, e-mail, Web conferences and more—as
discrete units that can all be delivered via any means and to any handset, including cellphones. Two
kinds of competitors are competing in this space: one set is focused on VoIP for medium to large
enterprises, while another is targeting the small-to-medium business (SMB) market.
VoIP allows both voice and data communications to be run over a single network, which can
significantly reduce infrastructure costs.
The prices of extensions on VoIP are lower than for PBX and key systems. VoIP switches
may run on commodity hardware, such as PCs or Linux systems. Rather than closed architectures,
these devices rely on standard interfaces.
VoIP devices have simple, intuitive user interfaces, so users can often make simple system
configuration changes. Dual-mode cellphones enable users to continue their conversations as they
move between an outside cellular service and an internal Wi-Fi network, so that it is no longer
necessary to carry both a desktop phone and a cellphone. Maintenance becomes simpler as there are
fewer devices to oversee.
Skype, which originally marketed itself as a service among friends, has begun to cater to
businesses, providing free-of-charge connections between any users on the Skype network and
connecting to and from ordinary PSTN telephones for a charge.
In the United States the Social Security Administration (SSA) is converting its field offices
of 63,000 workers from traditional phone installations to a VoIP infrastructure carried over its
existing data network.
Benefits
Operational cost
VoIP can be a benefit for reducing communication and infrastructure costs. Examples include:
Routing phone calls over existing data networks to avoid the need for separate voice and data
networks.
Conference calling, IVR, call forwarding, automatic redial, and caller ID features that
traditional telecommunication companies (telcos) normally charge extra for, are available
free of charge from open source VoIP implementations.
Flexibility
VoIP can facilitate tasks and provide services that may be more difficult to implement using the
PSTN. Examples include:
The Internet:
Leased Line :
A leased line is a service contract between a provider and a customer, whereby the provider
agrees to deliver a symmetric telecommunications line connecting two or more locations in exchange
for a monthly rent (hence the term lease). It is sometimes known as a 'Private Circuit' or 'Data Line'
in the UK, or as CDN (Circuito Diretto Numerico) in Italy. Unlike traditional PSTN lines it does not
have a telephone number, each side of the line being permanently connected to the other. Leased
lines can be used for telephone, data or Internet services. Some are ringdown services, and some
connect two PBXes.
Typically, leased lines are used by businesses to connect geographically distant offices.
Unlike dial-up connections, a leased line is always active. The fee for the connection is a fixed
monthly rate.
For example, a T-1 channel can be leased, and provides a maximum transmission speed of
1.544 Mbps. The user can divide the connection into different lines for multiplexing data and voice
communication, or use the channel for one high speed data circuit.
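The 1.544 Mbps figure follows from the standard T-1 frame structure: 24 DS0 channels of 64 kbps each, plus 8 kbps of framing overhead. The arithmetic:

```python
channels = 24              # DS0 channels multiplexed in a T-1
per_channel_bps = 64_000   # each DS0 carries 64 kbps
framing_bps = 8_000        # framing overhead for the whole line

total = channels * per_channel_bps + framing_bps
print(total)  # 1544000, i.e. 1.544 Mbps
```

Dividing the line for voice and data amounts to assigning some of these 24 channels to each use.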
1. PPTP:
The Point-to-Point Tunneling Protocol (PPTP) is a method for implementing virtual private
networks. PPTP uses a control channel over TCP and a Generic Routing Encapsulation (GRE)
tunnel to encapsulate PPP packets.
The PPTP specification does not describe encryption or authentication features and relies on
the PPP protocol being tunneled to implement security functionality. However the most common
PPTP implementation, shipping with the Microsoft Windows product families, implements various
levels of authentication and encryption natively as standard features of the Windows PPTP stack.
The intended use of this protocol is to provide similar levels of security and remote access as typical
VPN products.
2. L2TP
Although L2TP acts like a Data Link Layer protocol in the OSI model, L2TP is in fact a
Session Layer protocol, and uses the registered UDP port 1701.
The entire L2TP packet, including payload and L2TP header, is sent within a UDP datagram.
It is common to carry Point-to-Point Protocol (PPP) sessions within an L2TP tunnel. L2TP does not
provide confidentiality or strong authentication by itself. IPsec is often used to secure L2TP packets
by providing confidentiality, authentication and integrity. The combination of these two protocols is
generally known as L2TP/IPsec (discussed below).
The two endpoints of an L2TP tunnel are called the LAC (L2TP Access Concentrator) and
the LNS (L2TP Network Server). The LAC is the initiator of the tunnel, while the LNS is the
server that waits for new tunnels.
The packets exchanged within an L2TP tunnel are categorized as either control packets or
data packets. L2TP provides reliability features for the control packets, but no reliability for data
packets. Reliability, if desired, must be provided by the nested protocols running within each session
of the L2TP tunnel.
3. IPSec:
Internet Protocol Security (IPsec) is a protocol suite for securing Internet Protocol (IP)
communications by authenticating and encrypting each IP packet of a communication session. IPsec
also includes protocols for establishing mutual authentication between agents at the beginning of the
session and negotiation of cryptographic keys to be used during the session.
IPsec is an end-to-end security scheme operating in the Internet Layer of the Internet
Protocol Suite. It can be used in protecting data flows between a pair of hosts (host-to-host), between
a pair of security gateways (network-to-network), or between a security gateway and a host
(network-to-host).
Some other Internet security systems in widespread use, such as Secure Sockets Layer (SSL),
Transport Layer Security (TLS) and Secure Shell (SSH), operate in the upper layers of the TCP/IP
model. Hence, IPsec protects any application traffic across an IP network. Applications do not need
to be specifically designed to use IPsec. The use of TLS/SSL, on the other hand, must be designed
into an application to protect the application protocols.
IPsec is a successor of the ISO standard Network Layer Security Protocol (NLSP). NLSP
was based on the SP3 protocol that was published by NIST, but designed by the Secure Data
Network System project of the National Security Agency (NSA).
IPsec is officially specified by the Internet Engineering Task Force (IETF) in a series of
Request for Comment documents addressing various components and extensions. It specifies the
spelling of the protocol name to be IPsec.
The IPsec suite is an open standard. IPsec uses the following protocols to perform various
functions:
Authentication Headers (AH) : AH is designed for authenticating the source host and to ensure the
integrity of the payload carried by the IP packet. The protocol calculates a message digest using a
hashing function and a symmetric key and inserts the digest in the authentication header.
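The keyed-digest idea behind AH can be illustrated with an HMAC. This is only a sketch of the mechanism: real AH uses specific HMAC variants negotiated in the security association and computes the digest over the packet with mutable fields zeroed, which this example does not attempt.

```python
import hmac
import hashlib

def auth_digest(shared_key: bytes, payload: bytes) -> bytes:
    """Keyed digest over the payload. Both ends hold shared_key,
    so the receiver recomputes the digest to verify integrity
    and the identity of the source."""
    return hmac.new(shared_key, payload, hashlib.sha256).digest()

key = b"pre-shared-key"        # would come from the SA in real IPsec
packet = b"example IP payload"
digest = auth_digest(key, packet)

# Receiver side: recompute and compare in constant time.
print(hmac.compare_digest(digest, auth_digest(key, packet)))  # True
```

A modified payload (or a wrong key) produces a different digest, which is how tampering is detected.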
Security associations (SA) provide the bundle of algorithms and data that supply the parameters
necessary to operate the AH and/or ESP operations. The Internet Security Association and Key
Management Protocol (ISAKMP) provides a framework for authentication and key exchange, with
actual authenticated keying material provided either by manual configuration with pre-shared keys,
Internet Key Exchange (IKE and IKEv2), Kerberized Internet Negotiation of Keys (KINK), or
IPSECKEY DNS records.
1. Transport Mode :
Transport mode is the default mode for IPSec, and it is used for end-to-end communications (for example,
for communications between a client and a server). When transport mode is used, IPSec encrypts only the IP
payload. Transport mode provides the protection of an IP payload through an AH or ESP header. Typical IP
payloads are TCP segments (containing a TCP header and TCP segment data) or UDP messages (containing a
UDP header and UDP message data).
2. Tunnel Mode :
When IPSec tunnel mode is used, IPSec encrypts the IP header and the payload, whereas transport
mode only encrypts the IP payload. Tunnel mode provides the protection of an entire IP packet by
treating it as an AH or ESP payload. With tunnel mode, an entire IP packet is encapsulated with an
AH or ESP header and an additional IP header. The IP addresses of the outer IP header are the tunnel
endpoints, and the IP addresses of the encapsulated IP header are the ultimate source and destination
addresses.
IPSec tunnel mode is useful for protecting traffic between different networks, when traffic must pass
through an intermediate, untrusted network. Tunnel mode is primarily used for interoperability with
gateways, or end-systems that do not support L2TP/IPSec or PPTP connections. You can use tunnel
mode in the following configurations:
Gateway-to-gateway
Server-to-gateway
Server-to-server
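The difference between the two IPSec modes is easiest to see as header layering. A purely illustrative sketch (the lists stand in for on-the-wire headers; this is not a packet parser):

```python
# The protected payload is the same in both modes.
inner = ["TCP header", "TCP data"]

# Transport mode: the original IP header stays outermost and only
# the payload is protected by the ESP (or AH) header.
transport_mode = ["IP header", "ESP header"] + inner

# Tunnel mode: the entire original packet, including its IP header,
# becomes the payload, and a new outer IP header addresses the
# tunnel endpoints (e.g. two gateways).
tunnel_mode = ["outer IP header", "ESP header", "inner IP header"] + inner

print(transport_mode)
print(tunnel_mode)
```

The outer addresses in tunnel mode are the gateways' addresses, so the untrusted intermediate network never sees the ultimate source and destination of the inner packet.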
Welcome to the world of web services. This chapter will ground you in the basics of web service
terminology and architecture by answering the most common questions.
A web service is any service that is available over the Internet, uses a standardized XML
messaging system, and is not tied to any one operating system or programming language.
There are several alternatives for XML messaging. For example, you could use XML
Remote Procedure Calls (XML-RPC) or SOAP, both of which are described later in this chapter.
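Python's standard library can serialize such an XML-RPC message directly. The method name and argument below are hypothetical, invented to match the order-status example later in this chapter:

```python
import xmlrpc.client

# "orders.getStatus" and the order ID are illustrative placeholders;
# any method name and XML-RPC-serializable arguments work the same way.
request_xml = xmlrpc.client.dumps(("PO-12345",), methodname="orders.getStatus")
print(request_xml)
```

The resulting XML (a `<methodCall>` element carrying the method name and parameters) is what travels over HTTP in an XML-RPC request, independent of the operating system or language on either end.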
Although they are not required, a web service may also have two additional (and desirable)
properties:
A web service should be self-describing. If you publish a new web service, you should also
publish a public interface to the service. At a minimum, your service should include human-
readable documentation so that other developers can more easily integrate your service. If you
have created a SOAP service, you should also ideally include a public interface written in a
common XML grammar. The XML grammar can be used to identify all public methods, method
arguments, and return values.
A web service should be discoverable. If you create a web service, there should be a relatively
simple mechanism for you to publish this fact. Likewise, there should be some simple
mechanism whereby interested parties can find the service and locate its public interface. The
exact mechanism could be via a completely decentralized system or a more logically centralized
registry system. To summarize, a complete web service is, therefore, any service that:
Is available over the Internet or private (intranet) networks
Uses a standardized XML messaging system
Is not tied to any one operating system or programming language
Is self-describing via a common XML grammar
Is discoverable via a simple find mechanism
To make web services more concrete, consider basic e-commerce functionality. For example,
Widgets, Inc. sells parts through its web site, enabling customers to submit purchase orders and
check on order status. To check on the order status, a customer logs into the company web site via a
web browser.
With web services, we move from a human-centric Web to an application-centric Web. This does not mean that humans are entirely out of the picture! It just means that conversations can take
place directly between applications as easily as between web browsers and servers. For example, we
can turn the order status application into a web service. Applications and agents can then connect to
the service and utilize its functionality directly. For example, an inventory application can query
Widgets, Inc. on the status of all orders. The inventory system can then process the data, manipulate
it, and integrate it into its overall supply chain management software.
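The order-status conversation above can be sketched with Python's standard xmlrpc modules; the service name, method, and order data below are hypothetical, purely to show an application-to-application exchange over standardized XML messaging:

```python
# A minimal sketch of the order-status scenario as an XML-RPC web service.
# The method name and order data are hypothetical.
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Hypothetical order database for Widgets, Inc.
ORDERS = {"1001": "shipped", "1002": "processing"}

def get_order_status(order_id):
    """Return the status of an order, or 'unknown' if not found."""
    return ORDERS.get(order_id, "unknown")

# Expose the function over standardized XML messaging (XML-RPC).
server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
server.register_function(get_order_status)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# Any application -- not just a browser -- can now query the service.
client = ServerProxy(f"http://localhost:{port}")
print(client.get_order_status("1001"))  # -> shipped
```

An inventory application could call the same method in a loop over its open orders, which is exactly the application-centric usage described above.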
There are numerous areas where an application-centric Web could prove extremely helpful.
Examples include credit card verification, package tracking, portfolio tracking, shopping bots,
currency conversion, and language translation. Other options include centralized repositories for
personal information, such as Microsoft's proposed .NET MyServices project. .NET MyServices
aims to centralize calendar, email, and credit card information and to provide web services for
sharing that data.
An application-centric Web is not a new notion. For years, developers have created CGI
programs and Java servlets designed primarily for use by other applications. For example,
companies have developed credit card services, search systems, and news retrieval systems. The
crucial difference is that most of these systems consisted of ad hoc solutions. With web services, we
have the promise of some standardization, which should hopefully lower the barrier to application integration.
Rather than focusing on one particular implementation or framework, this book focuses on
common definitions and technologies. Hopefully, this will better equip you to cut through the
marketing hype and understand and evaluate the current contenders.
Styles of use
Web services are a set of tools that can be used in a number of ways. The three most common
styles of use are RPC, SOA, and REST.
RPC Web services present a distributed function (or method) call interface that is familiar to
many developers. Typically, the basic unit of RPC Web services is the WSDL operation.
The first Web services tools were focused on RPC, and as a result this style is widely
deployed and supported. However, it is sometimes criticized for not being loosely coupled, because
it was often implemented by mapping services directly to language-specific functions or method
calls. Many vendors felt this approach to be a dead end, and pushed for RPC to be disallowed in the
WS-I Basic Profile.
Other approaches with nearly the same functionality as RPC are Object Management Group's
(OMG) Common Object Request Broker Architecture (CORBA), Microsoft's Distributed
Component Object Model (DCOM) or Sun Microsystems's Java/Remote Method Invocation (RMI).
Service-Oriented Architecture
SOA Web services are supported by most major software vendors and industry analysts.
Unlike RPC Web services, loose coupling is more likely, because the focus is on the "contract" that
WSDL provides, rather than the underlying implementation details.
REST attempts to describe architectures that use HTTP or similar protocols by constraining
the interface to a set of well-known, standard operations (like GET, POST, PUT, DELETE for
HTTP). Here, the focus is on interacting with stateful resources, rather than messages or operations.
An architecture based on REST (one that is 'RESTful') can use WSDL to describe SOAP
messaging over HTTP, can be implemented as an abstraction purely on top of SOAP (e.g., WS-
Transfer), or can be created without using SOAP at all.
WSDL version 2.0 offers support for binding to all the HTTP request methods (not only GET
and POST as in version 1.1) so it enables a better implementation of RESTful Web services.[5]
However, support for this specification is still poor in software development kits, which often offer tools only for WSDL 1.1.
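As a sketch of the REST constraints just described, the following stands up a tiny in-memory resource server whose only interface is the standard HTTP verbs; the /orders resource and its contents are invented for illustration:

```python
# A sketch of the REST style: stateful resources manipulated only through
# the standard HTTP verbs GET, PUT, and DELETE. Purely illustrative.
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from http.client import HTTPConnection

RESOURCES = {"/orders/1001": {"status": "shipped"}}  # in-memory resource state

class RestHandler(BaseHTTPRequestHandler):
    def log_message(self, *args):  # silence per-request logging
        pass

    def _reply(self, code, body=b""):
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):  # read a resource
        if self.path in RESOURCES:
            self._reply(200, json.dumps(RESOURCES[self.path]).encode())
        else:
            self._reply(404)

    def do_PUT(self):  # create or replace a resource
        length = int(self.headers["Content-Length"])
        RESOURCES[self.path] = json.loads(self.rfile.read(length))
        self._reply(200)

    def do_DELETE(self):  # remove a resource
        RESOURCES.pop(self.path, None)
        self._reply(204)

server = HTTPServer(("localhost", 0), RestHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = HTTPConnection("localhost", server.server_address[1])
conn.request("GET", "/orders/1001")
print(conn.getresponse().read())  # the order resource as JSON
```

Note that the client never calls a named operation; it only applies the uniform verbs to resource URLs, which is the defining constraint of the RESTful style.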
GOOGLE APP ENGINE
Google App Engine lets you run your web applications on Google's infrastructure. App Engine
applications are easy to build, easy to maintain, and easy to scale as your traffic and data storage
needs grow. With App Engine, there are no servers to maintain: You just upload your application,
and it's ready to serve your users.
You can serve your app from your own domain name (such as http://www.example.com/) using
Google Apps. Or, you can serve your app using a free name on the appspot.com domain. You can
share your application with the world, or limit access to members of your organization.
Google App Engine supports apps written in several programming languages. With App Engine's
Java runtime environment, you can build your app using standard Java technologies, including the
JVM, Java servlets, and the Java programming language—or any other language using a JVM-based
interpreter or compiler, such as JavaScript or Ruby. App Engine also features a dedicated Python
runtime environment, which includes a fast Python interpreter and the Python standard library. The
Java and Python runtime environments are built to ensure that your application runs quickly,
securely, and without interference from other apps on the system.
With App Engine, you only pay for what you use. There are no set-up costs and no recurring fees.
The resources your application uses, such as storage and bandwidth, are measured by the gigabyte and billed at competitive rates.
App Engine costs nothing to get started. All applications can use up to 500 MB of storage and
enough CPU and bandwidth to support an efficient app serving around 5 million page views a
month, absolutely free. When you enable billing for your application, your free limits are raised, and
you only pay for resources you use above the free levels.
Google App Engine makes it easy to build an application that runs reliably, even under heavy load
and with large amounts of data. App Engine includes the following features:
dynamic web serving, with full support for common web technologies
persistent storage with queries, sorting and transactions
automatic scaling and load balancing
APIs for authenticating users and sending email using Google Accounts
a fully featured local development environment that simulates Google App Engine on your computer
task queues for performing work outside of the scope of a web request
scheduled tasks for triggering events at specified times and regular intervals
Your application can run in one of two runtime environments: the Java environment, and the Python
environment. Each environment provides standard protocols and common technologies for web
application development.
The Sandbox
Applications run in a secure environment that provides limited access to the underlying operating
system. These limitations allow App Engine to distribute web requests for the application across
multiple servers, and start and stop servers to meet traffic demands. The sandbox isolates your
application in its own secure, reliable environment that is independent of the hardware, operating
system and physical location of the web server.
An application can only access other computers on the Internet through the provided URL fetch and
email services. Other computers can only connect to the application by making HTTP (or HTTPS)
requests on the standard ports.
An application cannot write to the file system. An app can read files, but only files uploaded with the
application code. The app must use the App Engine datastore, memcache or other services for all data
that persists between requests.
Application code only runs in response to a web request, a queued task, or a scheduled task, and must
return response data within 30 seconds. A request handler cannot spawn a sub-process or
execute code after the response has been sent.
The Java runtime environment uses Java 6. The App Engine Java SDK supports developing apps
using either Java 5 or 6.
The environment includes the Java SE Runtime Environment (JRE) 6 platform and libraries. The
restrictions of the sandbox environment are implemented in the JVM. An app can use any JVM
bytecode or library feature, as long as it does not exceed the sandbox restrictions. For instance,
bytecode that attempts to open a socket or write to a file will throw a runtime exception.
Your app accesses most App Engine services using Java standard APIs. For the App Engine
datastore, the Java SDK includes implementations of the Java Data Objects (JDO) and Java
Persistence API (JPA) interfaces. Your app can use the JavaMail API to send email messages with
the App Engine Mail service. The java.net HTTP APIs access the App Engine URL fetch service.
App Engine also includes low-level APIs for its services to implement additional adapters, or to use
directly from the application. See the documentation for the datastore, memcache, URL fetch, mail,
images and Google Accounts APIs.
Typically, Java developers use the Java programming language and APIs to implement web
applications for the JVM. With the use of JVM-compatible compilers or interpreters, you can also
use other languages to develop web applications, such as JavaScript, Ruby, or Scala.
For more information about the Java runtime environment, see The Java Runtime Environment.
With App Engine's Python runtime environment, you can implement your app using the Python
programming language, and run it on an optimized Python interpreter. App Engine includes rich
APIs and tools for Python web application development, including a feature rich data modeling API,
an easy-to-use web application framework, and tools for managing and accessing your app's data.
You can also take advantage of a wide variety of mature libraries and frameworks for Python web
application development, such as Django.
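App Engine's Python runtime ultimately serves standard WSGI applications, which frameworks such as webapp wrap; the following minimal handler, using only the standard library, shows the request/response shape involved (paths and messages are illustrative):

```python
# A minimal WSGI request handler of the kind the App Engine Python
# runtime serves. The route and response text are illustrative.
def application(environ, start_response):
    path = environ.get("PATH_INFO", "/")
    if path == "/":
        body = b"Hello, App Engine!"
        start_response("200 OK", [("Content-Type", "text/plain")])
    else:
        body = b"Not Found"
        start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [body]

# Exercise the app in-process, the way a WSGI server would:
status_holder = []
def start_response(status, headers):
    status_holder.append(status)

body = b"".join(application({"PATH_INFO": "/"}, start_response))
print(status_holder[0], body)
```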
The Python runtime environment uses Python version 2.5.2. Additional support for Python 3 is being
considered for a future release.
The Python environment includes the Python standard library. Of course, not all of the library's
features can run in the sandbox environment. For instance, a call to a method that attempts to open a
socket or write to a file will raise an exception. For convenience, several modules in the standard
library whose core features are not supported by the runtime environment have been disabled, and
code that imports them will raise an error.
Application code written for the Python environment must be written exclusively in Python.
Extensions written in the C language are not supported.
You can upload other third-party libraries with your application, as long as they are implemented in
pure Python and do not require any unsupported standard library modules.
For more information about the Python runtime environment, see The Python Runtime Environment.
The Datastore
App Engine provides a distributed data storage service that features a query engine and transactions.
Just as the distributed web server grows with your traffic, the distributed datastore grows with your
data. You have the choice between two different data storage options differentiated by their
availability and consistency guarantees.
The App Engine datastore is not like a traditional relational database. Data objects, or "entities,"
have a kind and a set of properties. Queries can retrieve entities of a given kind filtered and sorted by
the values of the properties. Property values can be of any of the supported property value types.
Datastore entities are "schemaless." The structure of data entities is provided and enforced by your application code. The Java JDO/JPA interfaces and the Python datastore interface include
features for applying and enforcing structure within your app. Your app can also access the datastore
directly to apply as much or as little structure as it needs.
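The schemaless model described above can be illustrated with a toy in-memory stand-in; this is not the App Engine API, just the concept of kinds, free-form properties, and filtered, sorted queries:

```python
# Toy stand-in for the datastore model: each entity has a kind and a
# free-form set of properties. (Conceptual only; not the App Engine API.)
entities = [
    {"kind": "Order", "status": "shipped", "total": 40},
    {"kind": "Order", "status": "processing", "total": 15},
    {"kind": "Customer", "name": "Widgets, Inc."},  # different properties: schemaless
]

def query(kind, filters=None, order_by=None):
    """Return entities of `kind` matching all property filters, sorted."""
    results = [e for e in entities if e["kind"] == kind]
    for prop, value in (filters or {}).items():
        results = [e for e in results if e.get(prop) == value]
    if order_by:
        results.sort(key=lambda e: e.get(order_by))
    return results

print(query("Order", filters={"status": "shipped"}))
```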
The datastore is strongly consistent and uses optimistic concurrency control. An update of an entity
occurs in a transaction that is retried a fixed number of times if other processes are trying to update
the same entity simultaneously. Your application can execute multiple datastore operations in a
single transaction which either all succeed or all fail, ensuring the integrity of your data.
The datastore implements transactions across its distributed network using "entity groups." A
transaction manipulates entities within a single group. Entities of the same group are stored together
for efficient execution of transactions. Your application can assign entities to groups when the
entities are created.
Google Accounts
App Engine supports integrating an app with Google Accounts for user authentication. Your
application can allow a user to sign in with a Google account, and access the email address and
displayable name associated with the account. Using Google Accounts lets the user start using your
application faster, because the user may not need to create a new account. It also saves you the effort
of implementing a user account system just for your application.
If your application is running under Google Apps, it can use the same features with members of your
organization and Google Apps accounts.
The Users API can also tell the application whether the current user is a registered administrator for
the application. This makes it easy to implement admin-only areas of your site.
App Engine provides a variety of services that enable you to perform common operations when
managing your application. The following APIs are provided to access these services:
URL Fetch
Applications can access resources on the Internet, such as web services or other data, using App
Engine's URL fetch service. The URL fetch service retrieves web resources using the same high-
speed Google infrastructure that retrieves web pages for many other Google products.
Mail
Applications can send email messages using App Engine's mail service. The mail service uses
Google infrastructure to send email messages.
Memcache
The Memcache service provides your application with a high performance in-memory key-value
cache that is accessible by multiple instances of your application. Memcache is useful for data that
does not need the persistence and transactional features of the datastore, such as temporary data or
data copied from the datastore to the cache for high speed access.
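The usual way to use such a cache is the cache-aside pattern sketched below; a plain dict stands in for the shared memcache service, and the function names are illustrative:

```python
# Cache-aside pattern with a memcache-style service. A plain dict stands
# in for the shared cache; on App Engine these calls would go to the
# memcache API instead. Function names are illustrative.
cache = {}

def expensive_lookup(key):
    """Stand-in for a slow datastore query or computation."""
    return key.upper()

def get_cached(key):
    value = cache.get(key)             # 1. try the fast in-memory cache
    if value is None:
        value = expensive_lookup(key)  # 2. miss: compute or fetch
        cache[key] = value             # 3. store for later requests
    return value

print(get_cached("order-1001"))  # miss: computed, then cached
print(get_cached("order-1001"))  # hit: served from the cache
```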
Image Manipulation
The Image service lets your application manipulate images. With this API, you can resize, crop,
rotate and flip images in JPEG and PNG formats.
Scheduled Tasks and Task Queues
An application can perform tasks outside of responding to web requests. Your application can
perform these tasks on a schedule that you configure, such as on a daily or hourly basis. Or, the
application can perform tasks added to a queue by the application itself, such as a background task
created while handling a request.
Scheduled tasks are also known as "cron jobs," handled by the Cron service. For more information
on using the Cron service, see the Python or Java cron documentation.
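On the Python runtime, scheduled tasks are declared in a cron.yaml configuration file; the URLs and schedules below are illustrative:

```yaml
cron:
- description: nightly report job
  url: /tasks/nightly_report
  schedule: every 24 hours
- description: frequent cleanup
  url: /tasks/cleanup
  schedule: every 30 minutes
```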
Task queues are currently released as an experimental feature. At this time, only the Python runtime
environment can use task queues.
AMAZON CLOUD
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides resizable compute
capacity in the cloud. It is designed to make web-scale computing easier for developers.
Amazon EC2’s simple web service interface allows you to obtain and configure capacity with
minimal friction. It provides you with complete control of your computing resources and lets you run
on Amazon’s proven computing environment. Amazon EC2 reduces the time required to obtain and
boot new server instances to minutes, allowing you to quickly scale capacity, both up and down, as
your computing requirements change. Amazon EC2 changes the economics of computing by
allowing you to pay only for capacity that you actually use. Amazon EC2 provides developers the
tools to build failure resilient applications and isolate themselves from common failure scenarios.
Amazon EC2 presents a true virtual computing environment, allowing you to use web service
interfaces to launch instances with a variety of operating systems, load them with your custom
application environment, manage your network’s access permissions, and run your image using as
many or as few systems as you desire.
Select a pre-configured, templated image to get up and running immediately. Or create an Amazon
Machine Image (AMI) containing your applications, libraries, data, and associated configuration
settings.
Configure security and network access on your Amazon EC2 instance.
Choose which instance type(s) and operating system you want, then start, terminate, and monitor as
many instances of your AMI as needed, using the web service APIs or the variety of management
tools provided.
Determine whether you want to run in multiple locations, utilize static IP endpoints, or attach
persistent block storage to your instances.
Pay only for the resources that you actually consume, like instance-hours or data transfer.
Service Highlights
Elastic – Amazon EC2 enables you to increase or decrease capacity within minutes, not hours or
days. You can commission one, hundreds or even thousands of server instances simultaneously. Of
course, because this is all controlled with web service APIs, your application can automatically scale
itself up and down depending on its needs.
Completely Controlled – You have complete control of your instances. You have root access to each
one, and you can interact with them as you would any machine. You can stop your instance while
retaining the data on your boot partition and then subsequently restart the same instance using web
service APIs. Instances can be rebooted remotely using web service APIs. You also have access to
console output of your instances.
Designed for use with other Amazon Web Services – Amazon EC2 works in conjunction with
Amazon Simple Storage Service (Amazon S3), Amazon Relational Database Service (Amazon
RDS), Amazon SimpleDB and Amazon Simple Queue Service (Amazon SQS) to provide a complete
solution for computing, query processing and storage across a wide range of applications.
Reliable – Amazon EC2 offers a highly reliable environment where replacement instances can be
rapidly and predictably commissioned. The service runs within Amazon’s proven network
infrastructure and datacenters. The Amazon EC2 Service Level Agreement commitment is 99.95%
availability for each Amazon EC2 Region.
Secure – Amazon EC2 provides numerous mechanisms for securing your compute resources.
Amazon EC2 includes web service interfaces to configure firewall settings that control
network access to and between groups of instances.
When launching Amazon EC2 resources within Amazon Virtual Private Cloud (Amazon
VPC), you can isolate your compute instances by specifying the IP range you wish to use,
and connect to your existing IT infrastructure using industry-standard encrypted IPsec VPN.
You can also choose to launch Dedicated Instances into your VPC. Dedicated Instances are
Amazon EC2 Instances that run on hardware dedicated to a single customer for additional
isolation.
Inexpensive – Amazon EC2 passes on to you the financial benefits of Amazon’s scale. You pay a
very low rate for the compute capacity you actually consume.
On-Demand Instances – On-Demand Instances let you pay for compute capacity by the hour
with no long-term commitments. This frees you from the costs and complexities of planning,
purchasing, and maintaining hardware and transforms what are commonly large fixed costs
into much smaller variable costs. On-Demand Instances also remove the need to buy “safety
net” capacity to handle periodic traffic spikes.
Reserved Instances – Reserved Instances give you the option to make a low, one-time
payment for each instance you want to reserve and in turn receive a significant discount on
the hourly usage charge for that instance. After the one-time payment for an instance, that
instance is reserved for you, and you have no further obligation; you may choose to run that
instance for the discounted usage rate for the duration of your term, or when you do not use
the instance, you will not pay usage charges on it.
Spot Instances – Spot Instances allow customers to bid on unused Amazon EC2 capacity and
run those instances for as long as their bid exceeds the current Spot Price. The Spot Price
changes periodically based on supply and demand, and customers whose bids meet or exceed
it gain access to the available Spot Instances. If you have flexibility in when your
applications can run, Spot Instances can significantly lower your Amazon EC2 costs. See
here for more details on Spot Instances.
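The trade-offs among the three pricing models can be made concrete with some back-of-envelope arithmetic; every rate below is hypothetical, chosen only to illustrate the break-even logic:

```python
# Hypothetical rates, purely to illustrate the pricing models above.
HOURS_PER_YEAR = 8760
ON_DEMAND_RATE = 0.10      # $/hour
RESERVED_UPFRONT = 300.00  # one-time payment
RESERVED_RATE = 0.04       # discounted $/hour

def yearly_cost(hours_used):
    """Return (on-demand cost, reserved cost) for a year of usage."""
    on_demand = hours_used * ON_DEMAND_RATE
    reserved = RESERVED_UPFRONT + hours_used * RESERVED_RATE
    return on_demand, reserved

# Lightly used instances favor On-Demand; always-on instances favor Reserved.
for hours in (1000, HOURS_PER_YEAR):
    od, rs = yearly_cost(hours)
    print("%5dh  on-demand $%8.2f  reserved $%8.2f" % (hours, od, rs))

# Spot rule: the instance runs only while the bid meets or exceeds the
# fluctuating Spot Price.
def spot_running(bid, price_history):
    return [bid >= price for price in price_history]

print(spot_running(0.05, [0.03, 0.04, 0.06, 0.05]))  # [True, True, False, True]
```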
Amazon EC2 provides a number of powerful features for building scalable, failure resilient,
enterprise class applications, including:
Amazon Elastic Block Store – Amazon Elastic Block Store (EBS) offers persistent storage
for Amazon EC2 instances. Amazon EBS volumes provide off-instance storage that persists
independently from the life of an instance. Amazon EBS volumes are highly available,
highly reliable volumes that can be leveraged as an Amazon EC2 instance’s boot partition or
attached to a running Amazon EC2 instance as a standard block device. When used as a boot
partition, Amazon EC2 instances can be stopped and subsequently restarted, enabling you to
only pay for the storage resources used while maintaining your instance’s state. Amazon EBS
volumes offer greatly improved durability over local Amazon EC2 instance stores, as
Amazon EBS volumes are automatically replicated on the backend (in a single Availability
Zone). For those wanting even more durability, Amazon EBS provides the ability to create
point-in-time consistent snapshots of your volumes that are then stored in Amazon S3, and
automatically replicated across multiple Availability Zones. These snapshots can be used as
the starting point for new Amazon EBS volumes, and can protect your data for long term
durability. You can also easily share these snapshots with co-workers and other AWS
developers. See Amazon Elastic Block Store for more details on this feature.
Multiple Locations – Amazon EC2 provides the ability to place instances in multiple
locations. Amazon EC2 locations are composed of Regions and Availability Zones.
Availability Zones are distinct locations that are engineered to be insulated from failures in
other Availability Zones and provide inexpensive, low latency network connectivity to other
Availability Zones in the same Region. By launching instances in separate Availability
Zones, you can protect your applications from failure of a single location. Regions consist of
one or more Availability Zones, are geographically dispersed, and will be in separate
geographic areas or countries. The Amazon EC2 Service Level Agreement commitment is
99.95% availability for each Amazon EC2 Region. Amazon EC2 is currently available in
five regions: US East (Northern Virginia), US West (Northern California), EU (Ireland), Asia
Pacific (Singapore), and Asia Pacific (Tokyo).
Elastic IP Addresses – Elastic IP addresses are static IP addresses designed for dynamic
cloud computing. An Elastic IP address is associated with your account not a particular
instance, and you control that address until you choose to explicitly release it. Unlike
traditional static IP addresses, however, Elastic IP addresses allow you to mask instance or
Availability Zone failures by programmatically remapping your public IP addresses to any
instance in your account. Rather than waiting on a data technician to reconfigure or replace
your host, or waiting for DNS to propagate to all of your customers, Amazon EC2 enables
you to engineer around problems with your instance or software by quickly remapping your
Elastic IP address to a replacement instance. In addition, you can optionally configure the
reverse DNS record of any of your Elastic IP addresses by filling out this form.
Amazon Virtual Private Cloud – Amazon VPC is a secure and seamless bridge between a company’s existing IT infrastructure and the AWS cloud. Amazon VPC enables enterprises to connect their existing infrastructure to a set of isolated AWS compute resources via a VPN connection, and to extend their existing management capabilities to include their AWS resources.
Amazon CloudWatch – Amazon CloudWatch is a web service that provides monitoring for
AWS cloud resources, starting with Amazon EC2. It provides you with visibility into
resource utilization, operational performance, and overall demand patterns—including
metrics such as CPU utilization, disk reads and writes, and network traffic. You can get
statistics, view graphs, and set alarms for your metric data. To use Amazon CloudWatch,
simply select the Amazon EC2 instances that you’d like to monitor; within minutes, Amazon
CloudWatch will begin aggregating and storing monitoring data that can be accessed using
web service APIs or Command Line Tools. See Amazon CloudWatch for more details.
Auto Scaling – Auto Scaling allows you to automatically scale your Amazon EC2 capacity
up or down according to conditions you define. With Auto Scaling, you can ensure that the
number of Amazon EC2 instances you’re using scales up seamlessly during demand spikes
to maintain performance, and scales down automatically during demand lulls to minimize
costs. Auto Scaling is particularly well suited for applications that experience hourly, daily,
or weekly variability in usage. Auto Scaling is enabled by Amazon CloudWatch and
available at no additional charge beyond Amazon CloudWatch fees. See Auto Scaling for
more details.
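The threshold rule Auto Scaling applies can be sketched as follows; the metric values and CPU thresholds are illustrative, not AWS defaults:

```python
# Sketch of a CloudWatch-driven scaling rule: add capacity when average
# CPU exceeds an upper bound, remove it below a lower bound.
# Thresholds and metric values are illustrative.
def scaling_decision(avg_cpu, upper=70.0, lower=30.0):
    if avg_cpu > upper:
        return "scale-out"   # launch additional instances
    if avg_cpu < lower:
        return "scale-in"    # terminate surplus instances
    return "no-change"

for cpu in (85.0, 50.0, 12.0):
    print(cpu, scaling_decision(cpu))
```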
BitTorrent is a protocol that enables fast downloading of large files using minimum Internet
bandwidth. It costs nothing to use and includes no spyware or pop-up advertising.
Unlike other download methods, BitTorrent maximizes transfer speed by gathering pieces of the file
you want and downloading these pieces simultaneously from people who already have them. This
process makes popular and very large files, such as videos and television programs, download much
faster than is possible with other protocols.
To understand why BitTorrent downloading is different from regular downloading, it helps first to look at how traditional client-server downloading works:
You open a Web page and click a link to download a file to your computer.
The Web browser software on your computer (the client) tells the server (a central computer that holds the
Web page and the file you want to download) to transfer a copy of the file to your computer.
The transfer is handled by a protocol (a set of rules), such as FTP (File Transfer Protocol) or HTTP
(HyperText Transfer Protocol).
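Those three steps can be sketched with the standard library: one central server holds the whole file, and the client pulls all of it from that single source:

```python
# The traditional client-server flow: every client downloads the entire
# file from one central server over HTTP. Purely illustrative.
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from http.client import HTTPConnection

FILE = b"the entire file comes from one central server"

class FileHandler(BaseHTTPRequestHandler):
    def log_message(self, *args):  # silence per-request logging
        pass
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(FILE)))
        self.end_headers()
        self.wfile.write(FILE)

server = HTTPServer(("localhost", 0), FileHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = HTTPConnection("localhost", server.server_address[1])
conn.request("GET", "/bigfile")
downloaded = conn.getresponse().read()
print(downloaded == FILE)  # True
```

The bottleneck is plain here: the server's upload bandwidth is shared by every client, which is exactly what BitTorrent's peer-to-peer approach avoids.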
BitTorrent, however, follows a different method of sharing files known as peer-to-peer sharing.
Peer-to-peer file sharing is different from traditional file downloading. In peer-to-peer sharing, you
use a software program (rather than your Web browser) to locate computers that have the file you
want. Because these are ordinary computers like yours, as opposed to servers, they are called peers.
The process works like this:
You run peer-to-peer file-sharing software (for example, a Gnutella program) on your
computer and send out a request for the file you want to download.
To locate the file, the software queries other computers that are connected to the Internet and
running the file-sharing software.
When the software finds a computer that has the file you want on its hard drive, the
download begins.
Others using the file-sharing software can obtain files they want from your computer’s hard
drive.
Unlike some other peer-to-peer downloading methods, BitTorrent is a protocol that offloads some of the file-tracking work to a central server (called a tracker). Another difference is that it uses a principle called tit-for-tat: in order to receive files, you have to give them. This solves leeching, a problem developer Bram Cohen specifically set out to eliminate. With BitTorrent,
the more files you share with others, the faster your downloads are. Finally, to make better use of
available Internet bandwidth (the pipeline for data transmission), BitTorrent downloads different
pieces of the file you want simultaneously from multiple computers.
Downloading pieces of the file at the same time helps solve a common problem with other peer-to-
peer download methods: Peers upload at a much slower rate than they download. By downloading
multiple pieces at the same time, the overall speed is greatly improved. The more computers
involved in the swarm, the faster the file transfer occurs because there are more sources of each
piece of the file. For this reason, BitTorrent is especially useful for large, popular files.
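The swarm behavior described above can be simulated in a few lines: the file is split into pieces, each peer holds a subset, and the downloader fetches pieces concurrently from whichever peers have them. This is purely illustrative, not the actual BitTorrent wire protocol:

```python
# Toy simulation of a BitTorrent swarm: a file split into pieces held by
# several peers, downloaded concurrently and reassembled in order.
from concurrent.futures import ThreadPoolExecutor

FILE = b"BitTorrent assembles files from pieces held by many peers."
PIECE_SIZE = 8
pieces = [FILE[i:i + PIECE_SIZE] for i in range(0, len(FILE), PIECE_SIZE)]

# Each peer holds an overlapping subset of piece indices, like a swarm.
peers = {
    "peer-a": {i: pieces[i] for i in range(0, len(pieces), 2)},  # even pieces
    "peer-b": {i: pieces[i] for i in range(1, len(pieces), 2)},  # odd pieces
    "peer-c": dict(enumerate(pieces)),                           # a seeder
}

def fetch_piece(index):
    """Ask peers in turn for one piece, as the tracker would direct us."""
    for holdings in peers.values():
        if index in holdings:
            return index, holdings[index]
    raise LookupError(index)

# Download all pieces simultaneously, then reassemble in order.
with ThreadPoolExecutor(max_workers=4) as pool:
    fetched = dict(pool.map(fetch_piece, range(len(pieces))))
result = b"".join(fetched[i] for i in range(len(pieces)))
print(result == FILE)  # the reassembled file matches the original
```

The more peers that hold a given piece, the more sources the downloader can draw on at once, which is why popular files transfer fastest.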
With BitTorrent, the original file remains intact, the download speeds are amazing, and of course,
everything is available at your fingertips for free. These are among the many reasons why it is popular today: instead of having to purchase a CD or DVD, downloads via BitTorrent cost nothing.
One of the main advantages of BitTorrent is that you can sample content prior to purchasing it. This
is great for both artists and users as well, who can end up buying albums if they like them. In this
way, movies and software that live up to expectations can be tested and then bought.
Many television shows, movies, and rare recordings may not be available in the market, but you are likely to find them on a BitTorrent site, whether they are TV shows that have not yet reached your country or songs that you cannot easily buy online.
BitTorrent is becoming popular, and many software publishers today include torrents in their downloads section, in addition to the growing number of dedicated torrent sites. This is because they have realized how quickly and conveniently files can be distributed this way. You can expect to download a large game much faster with BitTorrent than over plain HTTP.