
Introduction

Fekade Getahun fekadegetahun@gmail.com

Web

The World-Wide Web (WWW) is a pair of software applications, a server and a client (browser), which allow both distribution of and access to information on the Internet. The Web is not the Internet but a means of distributing and accessing the information that is on it.
"E-Commerce" (Electronic Commerce or EC) is the buying and selling of goods and services on the Internet, especially on the World-Wide Web.

Amazon.com

Web overview (Application level)

Components of the Web

URLs/URIs for addressing
HTTP for transport
HTML/XHTML for display
PHP, Python, Java, .NET, etc. for dynamic content and interaction

Participants: HTTP server and clients (browsers)

URIs

URI: Uniform Resource Identifier

Universal naming mechanism for identifying resources on the Web
A resource is anything to which we can attach identity (Web page, image, anchor in page, database record, etc.)
Web is an information space, URIs are handles
Unique Web naming/addressing technology (HTML/HTTP: not the only data format/Web protocol)

URL: Uniform Resource Locator

Subset of URIs for some existing Internet protocols (http, ftp, mailto, etc.)
No longer used in specifications


URI syntax

Human-readable form of request: URI


<scheme>:<scheme-dependent-information>

Scheme: tells the application the type of the resource and the mechanisms to use to access it

Example: http, ftp, news, mailto, telnet, file
Recent examples: Azureus magnet link, Skype call link, etc.

Most schemes need additional information for locating resources

Example:
http://www.hilcoe.com.et/index.html
ftp://www.hilcoe.com/webtec/ass-123.zip
mailto:info@hilcoe.com.et

HTTP

The Hypertext Transfer Protocol is the set of rules for exchanging files (text, graphic images, sound, video, and other multimedia files) on the World Wide Web.

URIs for HTTP

Syntax
http://<host>:<port>/<path>?<searchpart>
http://<host>:<port>/<path>#<fragment>
TCP port is optional (80 by default)

Hierarchical paths (elements separated by '/')

If path is empty, the system "home page" is returned
Path and search part are interpreted by server

Search part (optional) is used to pass information to server


http://hilcoe.com.et/registration.htm?term=spring&year=2010

Fragment corresponds to named anchor in HTML document

Fragment is not sent to server (used by client for display)

http://hilcoe.com.et/registration.htm#msc
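
The parts of an HTTP URI can be pulled apart with Python's standard urllib.parse module. This is a minimal sketch; the function name describe_http_uri and the dictionary keys are illustrative choices, not part of the slides.

```python
from urllib.parse import parse_qs, urlsplit


def describe_http_uri(uri: str) -> dict:
    """Split an HTTP URI into the parts named in the syntax above."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port or 80,             # port is optional, 80 by default
        "path": parts.path or "/",            # empty path means the home page
        "searchpart": parse_qs(parts.query),  # passed to and interpreted by the server
        "fragment": parts.fragment,           # kept by the client, not sent to the server
    }


if __name__ == "__main__":
    print(describe_http_uri("http://hilcoe.com.et/registration.htm?term=spring&year=2010"))
    print(describe_http_uri("http://hilcoe.com.et/registration.htm#msc"))
```

Note that parse_qs returns a list for every key, because a search part may repeat the same name several times.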

URIs for HTTP (cont'd)

Reserved symbols have a special meaning in URIs


; / ? : @ = &

Unsafe symbols may have a special meaning in context (to avoid)


< > " # % { } | \ ^ ~ [ ] `

Special characters represented by '%' (escape) and 2 hex digits
Example: %20 (space), %25 (%), %26 (&), %2D (-), %2F (/), %3D (=), %3F (?), etc.
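
Percent-escaping can be tried out with urllib.parse.quote and unquote from the Python standard library. A small sketch follows; the helper names escape and unescape are made up for this example, and safe="" is passed so that even '/' is escaped.

```python
from urllib.parse import quote, unquote


def escape(text: str) -> str:
    """Replace reserved and unsafe characters by '%' plus two hex digits."""
    return quote(text, safe="")


def unescape(text: str) -> str:
    """Turn every %XX sequence back into the character it stands for."""
    return unquote(text)


if __name__ == "__main__":
    print(escape("term=spring & fall"))  # prints term%3Dspring%20%26%20fall
    print(unescape("%2D%2F%3F"))         # prints -/?
```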

HTTP

HTTP is the method used to transmit information on the Web.

HTTP publishes and retrieves HTML pages on the World Wide Web. It is the language that the browser and the web server use to communicate.
The information transferred using HTTP can be plain text, audio, video, images, and hypertext.
Proxies, tunnels, and gateways may sit between the web browser (client) and the web server.
An HTTP client initiates a request by establishing a TCP connection to a particular port on the remote host (typically 80 or 8080).
An HTTP server listens on that port and receives the request message from the client.
Upon receiving the request, the server sends back a status line such as 200 OK together with a message of its own, for example the requested resource or an error message.

Protocols Involved in HTTP

[Figure: protocol stack. The client's HTTP message is carried in TCP segments, which travel as IP packets across two routers to the web server. Client and web server are attached through Ethernet interfaces; the two routers are joined by a SONET link.]

HTTP in Context

Major steps in a "browser process"

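1. Client is given a URL (http://origin/..)
2. DNS query: client asks the DNS server for the address of the origin server
3. Client establishes a TCP connection to the origin server
4. HTTP transaction: HTTP request, HTTP response (HTML)

Optional parallel connections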

HTTP Transactions

An HTTP transaction is a request/reply interaction between a Web client (e.g., browser) and a web server, using HTTP.

Client request:

GET / HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
Host: www.hilcoe.com.et
Connection: Keep-Alive

Origin server response:

HTTP/1.1 200 OK
Date: Mon, 15 Jul 2002 08:49:00 GMT
Server: Apache/1.3.26 (Unix) PHP/4.2.1
Last-Modified: Wed, 12 Jun 2002 08:49:49 GMT
ETag: "2a-50ea-3d070b2d"
Accept-Ranges: bytes
Content-Length: 20714
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html

<html> ...
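
A transaction like this can be reproduced by hand over a plain TCP socket. The sketch below is only an illustration: it starts a throwaway server on 127.0.0.1 with Python's http.server so that it runs without Internet access, and the names HelloHandler, fetch and demo are invented for the example. Python's built-in handler answers with HTTP/1.0 by default, so the status line differs from the Apache reply shown above.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Tiny stand-in for an origin server (assumption for this demo)."""

    def do_GET(self):
        body = b"<html>hello</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(host: str, port: int, path: str = "/") -> bytes:
    """Open a TCP connection, send one GET request, read the whole reply."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"  # empty line: end of the request headers
    )
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def demo() -> bytes:
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return fetch("127.0.0.1", server.server_address[1])
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo().decode("latin-1"))
```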

HTTP Transactions (Cont.)

Complex web pages made of multiple objects

1 object for the page skeleton
n objects, one for each page element (graphics, ...)

HTTP / TCP interaction to retrieve objects:

Client to origin server: TCP SYN
Origin server to client: TCP SYN, ACK
Client to origin server: TCP ACK
Client to origin server: HTTP Request
Origin server to client: HTTP Response
TCP FIN
TCP FIN ACK

Communication overhead! For each object:
3 TCP messages (3-way handshake)
2 HTTP messages
2 TCP messages (connection close)

HTTP 1.1 - Persistence

Eliminates the problem of establishing multiple TCP connections
Allows a CLIENT to re-use an existing TCP connection after the initial request

Client to origin server: TCP SYN
Origin server to client: TCP SYN, ACK
Client to origin server: TCP ACK
Client to origin server: HTTP Request 1
Origin server to client: HTTP Response 1
Client to origin server: HTTP Request 2
Origin server to client: HTTP Response 2
TCP FIN
TCP FIN ACK

Communication overhead
For the first object:
3 TCP messages (3-way handshake)
2 HTTP messages
2 TCP messages (connection close)
Subsequent objects:
2 HTTP messages
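
The two overhead counts can be compared with a little arithmetic. The sketch below simply encodes the per-object figures from the slides (3 handshake messages, 2 HTTP messages, 2 close messages); the function names are illustrative, and real traffic has more packets than this simplified count.

```python
HANDSHAKE = 3  # TCP SYN / SYN, ACK / ACK
HTTP_PAIR = 2  # one request, one response
CLOSE = 2      # TCP FIN / FIN ACK


def messages_non_persistent(n_objects: int) -> int:
    """Every object pays for its own connection set-up and tear-down."""
    return n_objects * (HANDSHAKE + HTTP_PAIR + CLOSE)


def messages_persistent(n_objects: int) -> int:
    """One connection is set up and closed once; each object adds 2 HTTP messages."""
    if n_objects == 0:
        return 0
    return HANDSHAKE + CLOSE + n_objects * HTTP_PAIR


if __name__ == "__main__":
    for n in (1, 10, 50):
        print(n, messages_non_persistent(n), messages_persistent(n))
```

For a page with 10 objects this gives 70 messages without persistence and 25 with it.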

Non-persistent, persistent connection

Non-persistent

http 1.0: server parses request, responds, closes TCP connection
2 RTTs (Round Trip Times) to fetch an object: one for the TCP connection, one for the object request/transfer
Each transfer suffers from TCP's initially slow sending rate
Many browsers open multiple parallel connections

Persistent

Default for http 1.1
On the same TCP connection: server parses request, responds, parses new request, ...
Client sends requests for all referenced objects as soon as it receives the base HTML
Fewer RTTs, less slow start

HTTP 1.1 - Pipelining

CLIENT does not have to wait for a response to one request before issuing a new request on the same TCP connection
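
Client to origin server: TCP SYN
Origin server to client: TCP SYN, ACK
Client to origin server: TCP ACK
Client to origin server: HTTP Request 1
Client to origin server: HTTP Request 2
Origin server to client: HTTP Response 1
Origin server to client: HTTP Response 2
TCP FIN
TCP FIN ACK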

HTTP 1.1 Pipelining (Cont)

Restrictions

CLIENTS should not pipeline until they are sure the connection is persistent
HTTP responses must be returned in the same order as the requests
CLIENTS should not pipeline requests that have side effects
On error, pipelining prevents clients from knowing which of a series of pipelined requests were executed by the server

HTTP Parallel connections


Recall: a browser could naively process each embedded object serially
HTTP allows clients to open multiple connections and perform multiple HTTP transactions in parallel

Properties / drawbacks

Parallel connections may make pages load faster
  Connection delays can be overlapped if client bandwidth is not saturated
Parallel connections are not always faster
  If client bandwidth is scarce, it is better to transfer each object as fast as possible
Parallel connections may "feel" faster
  Human perception of seeing multiple objects gradually appearing on the screen

HTTP Transactions (Recap.)


HTTP transaction: initiated by client
Requests/replies have a special format

Client request:

GET / HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
Host: www.hilcoe.com.et
Connection: Keep-Alive

Origin server response:

HTTP/1.1 200 OK
Date: Mon, 15 Jul 2002 08:49:00 GMT
Server: Apache/1.3.26 (Unix) PHP/4.2.1
Last-Modified: Wed, 12 Jun 2002 08:49:49 GMT
ETag: "2a-50ea-3d070b2d"
Accept-Ranges: bytes
Content-Length: 20714
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Content-Type: text/html

<html> ...

HTTP Requests

HTTP request is ASCII text (human-readable)


Request line (method, URI, HTTP version)
Header lines
An empty line (shown as <CR>) indicates the end of the headers
Optional payload

GET /index.htm HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
Host: www.hilcoe.com.et
Connection: Keep-Alive
<CR>

GET /index.htm HTTP/1.1: request "/index.htm" using HTTP version 1.1
Accept: types of documents accepted by browser
Accept-Language: preferred language is English
Accept-Encoding: browser understands compressed documents
User-Agent: identification of browser (real type is IE 5.01)
Host: what the client thinks the server host is
Connection: keep TCP connection open until explicitly disconnected

http request message: general format

Request line: method sp URL sp version cr lf
Header lines: header field name : sp value cr lf (one line per header)
End of headers: cr lf
Entity body
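
Because the format is plain text, a request can be taken apart with a few string operations. This is a minimal sketch, not a complete HTTP parser: parse_request is a name made up here, and it assumes a well-formed message with exactly one space between the three parts of the request line.

```python
def parse_request(raw: str):
    """Split a request into method, URL, version, headers and entity body."""
    head, _, body = raw.partition("\r\n\r\n")        # empty line ends the headers
    lines = head.split("\r\n")
    method, url, version = lines[0].split(" ")       # request line: method sp URL sp version
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")         # header field name ":" value
        headers[name.strip().lower()] = value.strip()  # field names are case-insensitive
    return method, url, version, headers, body


if __name__ == "__main__":
    raw = (
        "GET /index.htm HTTP/1.1\r\n"
        "Host: www.hilcoe.com.et\r\n"
        "Connection: Keep-Alive\r\n"
        "\r\n"
    )
    print(parse_request(raw))
```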

More on HTTP Requests

Structure of a client request


First line: tells the server the method to use, the entity (document) to apply it to, and the client's version of HTTP
General header: used in client and server messages
Request header: gives more information about the client
Entity header/body: used when an entity is sent by the client

Method URI HTTP-ver.
General-header
Request-header
Entity-header

Entity-body

Example:

POST /cgi-bin/query HTTP/1.0
Connection: Keep-Alive
Host: www.hilcoe.com.et
User-Agent: Mozilla/4.0
Content-type: application/x-www-form-urlencoded
Content-length: 23

query=knuth&type=author
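
The entity body and its Content-length can be generated rather than counted by hand. The sketch below uses urllib.parse.urlencode from the standard library; build_post is an illustrative helper name, and only the headers needed for the point are included.

```python
from urllib.parse import urlencode


def build_post(path: str, host: str, fields: dict) -> str:
    """Build a form-encoded POST request as text."""
    body = urlencode(fields)  # e.g. query=knuth&type=author
    lines = [
        f"POST {path} HTTP/1.0",
        f"Host: {host}",
        "Content-type: application/x-www-form-urlencoded",
        f"Content-length: {len(body.encode('ascii'))}",  # length of the body in bytes
    ]
    # Header lines end with CR LF; an empty line separates them from the body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


if __name__ == "__main__":
    print(build_post("/cgi-bin/query", "www.hilcoe.com.et",
                     {"query": "knuth", "type": "author"}))
```

For the two fields of the example the computed length is 23, matching the Content-length shown above.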

Client Methods

GET
Retrieve a resource from the server (static file, or dynamically-generated data)

DELETE
Remove a resource from the server

OPTIONS (HTTP 1.1)


Request the options available for a URI (methods understood by a server or allowed for a given URI)

HEAD
Get information about a resource (but not the actual resource)

POST
Client provides some information to the server, e.g., through forms (may update the state of the server)

TRACE (HTTP 1.1)


Ask proxies to declare themselves in the headers (used for debugging)

CONNECT (HTTP 1.1)


Used for HTTPS (secure HTTP) through a proxy

PUT
Provide a new or replacement resource to put on the server

HTTP Replies

Sample reply returned to client


Status line (protocol, status code):

HTTP/1.1 200 OK

Header lines:

Date: Mon, 15 Jul 2002 08:49:00 GMT
Server: Apache/1.3.26 (Unix) PHP/4.2.1
Last-Modified: Wed, 12 Jun 2002 08:49:49 GMT
ETag: "2a-50ea-3d070b2d"
Accept-Ranges: bytes
Content-Length: 20714
Connection: close
Content-Type: text/html
<CR>

Requested html file:

<html>...

HTTP/1.1 200 OK: document found (code 200); server is using HTTP 1.1
Date: current date at the server
Server: software run by the server
Last-Modified: most recent modification of the document
ETag: entity tag (unique identifier for the server resource, usable for caching)
Accept-Ranges: server can return subsections of a document
Content-Length: length of the body (which follows the header) in bytes
Connection: the connection will close after the server's response
Content-Type: what kind of document is included in the response
<html>...: document text (follows blank line)

More on HTTP Replies

Structure of a server response

First line: tells the client the server's version of HTTP, the status code, and a human-readable description of the status
General header: used in client and server messages
Response header: gives more information about the server
Entity header/body: response sent to the client

HTTP-ver. Status Reason
General-header
Response-header
Entity-header

Entity-body

Example:

HTTP/1.0 200 OK
Date: Mon, 15 Jul 2002 08:49:00 GMT
Server: Apache/1.3.26 (Unix) PHP/4.2.1
Content-type: text/html
Content-Length: 20714

<html>...

Server Response Codes

100 range: Informational
  100: Continue
  101: Switching protocols

200 range: Client request successful
  200: OK
  201: Created
  204: No content
  ...

300 range: Redirection
  301: Moved permanently
  305: Use proxy
  ...

400 range: Client errors (request incomplete or invalid)
  400: Bad request
  401: Unauthorized
  ...

500 range: Server errors
  500: Internal error
  501: Not implemented
  ...
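
The range of a status code is its first digit, so a code can be classified by integer division. The sketch below uses Python's http.HTTPStatus enumeration for the reason phrases; the RANGES table and the describe function are illustrative names, with labels taken from the list above.

```python
from http import HTTPStatus

# First digit of the code -> meaning of the range.
RANGES = {
    1: "Informational",
    2: "Client request successful",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def status_range(code: int) -> str:
    """Return the meaning of the range a status code belongs to."""
    return RANGES[code // 100]


def describe(code: int) -> str:
    """Combine the code, its standard reason phrase and its range."""
    return f"{code} {HTTPStatus(code).phrase} ({status_range(code)})"


if __name__ == "__main__":
    for code in (100, 200, 301, 401, 500):
        print(describe(code))
```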

Resource Retrieval

GET Method
GET /index.htm HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
Host: www.hilcoe.com.et
Connection: Keep-Alive
<CR>

Reply
HTTP/1.1 200 OK
Date: Mon, 15 Jul 2002 08:49:00 GMT
Server: Apache/1.3.26 (Unix) PHP/4.2.1
Last-Modified: Wed, 12 Jun 2002 08:49:49 GMT
ETag: "2a-50ea-3d070b2d"
Accept-Ranges: bytes
Content-Length: 20714
Connection: close
Content-Type: text/html
<CR>
<html>...

Sending Data to Server

POST Method
POST /cgi-bin/query HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
Host: www.hilcoe.com.et
Content-type: application/x-www-form-urlencoded
Content-length: 23
<CR>
query=knuth&type=author

GET Method (arguments encoded in URI)

GET /cgi-bin/query?query=knuth&type=author HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, */*
Accept-Language: en-us
Accept-Encoding: gzip, deflate
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
Host: www.hilcoe.com.et
<CR>

Reply as before

HTTP Proxying

Proxy forwards requests on behalf of clients

Proxy servers are both Web servers and Web clients
Can be nested, can be used as firewall, cache, anonymizer, etc.
Proxy may modify requests/replies (e.g., change image formats)

Client to proxy server 1: GET http://origin/..
Proxy server 1 to proxy server 2: GET http://origin/.. Via: proxy1
Proxy server 2 to origin server: GET http://origin/.. Via: proxy1,proxy2
Origin server to proxy server 2: 200 OK
Proxy server 2 to proxy server 1: 200 OK Via: proxy2
Proxy server 1 to client: 200 OK Via: proxy2,proxy1
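
The way the Via header grows along a chain of proxies can be sketched in a few lines. The helper add_via and the dictionary representation of headers are assumptions for this example; it follows the simplified form used on the slide, whereas real Via values also carry the protocol version (for example "1.1 proxy1").

```python
def add_via(headers: dict, proxy_name: str) -> dict:
    """Return a copy of the headers with this proxy appended to Via."""
    updated = dict(headers)  # a proxy forwards a modified copy of the message
    existing = updated.get("Via")
    updated["Via"] = f"{existing},{proxy_name}" if existing else proxy_name
    return updated


if __name__ == "__main__":
    request = {"Host": "origin"}
    at_proxy2 = add_via(request, "proxy1")    # forwarded by proxy server 1
    at_origin = add_via(at_proxy2, "proxy2")  # forwarded by proxy server 2
    print(at_origin["Via"])                   # prints proxy1,proxy2
```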

HTTP Proxy Deployment

Managed by client's organization

Managed by ISPs

Managed by web service provider (accelerators)

Managed by network companies to alleviate network congestion


HTTP Proxy: Internet Filter


HTTP Proxy: Server Access Control


Firewall

A firewall is a program, usually running on an Internet gateway server, that protects the resources of one network from users of other networks. An enterprise wants a firewall to prevent outsiders from accessing its private data resources. There are a number of firewall screening methods:

Screen requests to make sure they come from acceptable domain names and IP addresses.
Do not allow Telnet access into the network except for its own users.

HTTP Proxy: Security Firewall


HTTP Proxy: Web Cache


HTTP Caching

Cache data close to users

Improves Web performance, reduces load on server
Cache control directives in HTTP header (no-cache, age, etc.)

Client 1 to cache server: GET http://origin/..
Cache server to origin server: GET http://origin/..
Origin server to cache server: 200 OK
Cache server to client 1: 200 OK
Client 2 to cache server: GET http://origin/..
Cache server to client 2: 200 OK

Deployment issues:
How to best place caches?
How many caches to use?
How to dimension cache?
How long to cache?
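
The behaviour of the cache server in this exchange can be sketched as a dictionary with an expiry time. This is a deliberately simplified model: SimpleCache, fetch_from_origin and max_age are names invented for the example, and a real HTTP cache would follow the cache control directives in the headers rather than a single fixed lifetime.

```python
import time


class SimpleCache:
    """Keep responses close to the clients and reuse them while they are fresh."""

    def __init__(self, fetch_from_origin, max_age: float = 60.0, clock=time.monotonic):
        self._fetch = fetch_from_origin  # callable: url -> response body
        self._max_age = max_age          # "how long to cache?" in seconds
        self._clock = clock
        self._store = {}                 # url -> (time stored, body)

    def get(self, url: str):
        now = self._clock()
        entry = self._store.get(url)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]              # hit: the origin server is not contacted
        body = self._fetch(url)          # miss or stale: forward the GET to the origin
        self._store[url] = (now, body)
        return body


if __name__ == "__main__":
    origin_requests = []

    def origin(url):
        origin_requests.append(url)
        return f"200 OK for {url}"

    cache = SimpleCache(origin)
    cache.get("http://origin/page")  # client 1: fetched from the origin server
    cache.get("http://origin/page")  # client 2: answered by the cache
    print(len(origin_requests))      # prints 1
```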

HTTP Caching (Contd.)

Caches reduce redundant data transfers

Saves you money in network charges

Caches reduce network bottlenecks

Pages load faster without more bandwidth

Caches reduce demand on origin servers

Servers reply faster and avoid overload

Caches reduce distance delays

Indeed, pages load slower from farther away

HTTP Caching: GET request flowchart


HTTP Client Authentication

Simple username/password security mechanism

Basic scheme: username:password, base-64 encoded
echo -n "user:password" | openssl base64
echo "c2NvdHQ6dGlnZXI=" | openssl base64 -d

Client to origin server:
GET /private/ HTTP/1.1

Origin server to client:
HTTP/1.1 401 Unauthorized
WWW-Authenticate: Basic realm="secret"

Client asks the user for credentials (Username: joe, Password: ********)

Client to origin server:
GET /private/ HTTP/1.1
Authorization: Basic SHY3GH7D3SH==

Origin server to client:
HTTP/1.1 200 OK ... (HTML)
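
The same encoding can be done with Python's base64 module instead of openssl. A small sketch; the function names are illustrative. It also shows why the Basic scheme offers no secrecy on its own: the token is trivially decoded.

```python
import base64


def basic_authorization(username: str, password: str) -> str:
    """Build the value of the Authorization header for the Basic scheme."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_token(token: str):
    """Recover (username, password) from a Basic token."""
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


if __name__ == "__main__":
    print(basic_authorization("scott", "tiger"))   # prints Basic c2NvdHQ6dGlnZXI=
    print(decode_basic_token("c2NvdHQ6dGlnZXI="))  # prints ('scott', 'tiger')
```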

HTTP Client Authentication

Digest scheme (RFC 2617)

WWW-Authenticate header of the server's initial 401 response contains a nonce value

Unique, opaque, time-limited, previously unused value

Browser computes digest as:

A1 = username + ":" + realm + ":" + password
A2 = method + ":" + uri
digest = MD5(MD5(A1) + ":" + nonce + ":" + MD5(A2))

Digest is embedded in subsequent client requests

Authorization: Digest ...

Attacker cannot get password (even hashed)

Even so, only access to one realm compromised
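
The digest formula can be written out with hashlib from the Python standard library. This sketch follows the three lines of the formula above literally (the form without the optional qop extension of RFC 2617); the function names and the sample values in the usage part are made up for illustration.

```python
import hashlib


def md5_hex(text: str) -> str:
    """MD5 of a string, as 32 lowercase hex digits."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def compute_digest(username: str, realm: str, password: str,
                   method: str, uri: str, nonce: str) -> str:
    a1 = f"{username}:{realm}:{password}"  # the secret part, never sent in the clear
    a2 = f"{method}:{uri}"                 # ties the digest to this request
    return md5_hex(f"{md5_hex(a1)}:{nonce}:{md5_hex(a2)}")


if __name__ == "__main__":
    # Hypothetical values; the nonce would come from the server's 401 response.
    print(compute_digest("joe", "secret", "password", "GET", "/private/", "abc123"))
```

Since the nonce changes, a captured digest cannot simply be replayed with a later nonce.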

Cookies

Allow Web servers to store state at client


Key/value pairs
Mostly used for managing sessions (HTTP is stateless)
Server can store session ID or actual state in cookie

Client to origin server: GET http://origin/..
Origin server to client: 200 OK, Set-Cookie: ABC=XYZ
Client to origin server (later): GET http://origin/.., Cookie: ABC=XYZ
Origin server to client: 200 OK

Cookies are scoped by a site or domain
Server can specify desired expiration date
Client can reject cookies, limit their size/duration, etc.

Later use: authentication, remembering user preferences and previous choices
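
The client side of this exchange can be imitated with http.cookies.SimpleCookie from the Python standard library. A minimal sketch: cookie_header_from is a name invented here, and it ignores scoping and expiry, which a real browser would check before sending the cookie back.

```python
from http.cookies import SimpleCookie


def cookie_header_from(set_cookie_value: str) -> str:
    """Turn the value of a Set-Cookie header into the value of a Cookie header."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)  # parses the key/value pair and its attributes
    # Only name=value goes back to the server; attributes such as Path stay with the client.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


if __name__ == "__main__":
    print(cookie_header_from("ABC=XYZ"))                        # prints ABC=XYZ
    print(cookie_header_from("ABC=XYZ; Path=/; Max-Age=3600"))  # prints ABC=XYZ
```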

Three layers

Presentation
Business
Data access

The presentation layer is the UI, which communicates with the business layer.
The business layer contains a set of methods that validate user input before calling a method from the data layer. It also ensures that the output is correct.
The rules used to validate input are called business rules. Business rules are not restricted to data validation; they can also apply to any calculations.

Architecture

Single tier
2 tier
3 tier
N tier

1 tier

Mainframe
All processing in a single computer
All resources attached to the same computer
Access via dumb terminals

Advantage

Simple
Efficient
Uncomplicated

But the cost of the central machine was very high

2 tier

The personal computer
Client/server model
Logical system components are mostly on the client (UI, data access, and business rules); the server contains the data layer

Drawback:

Connections are very expensive
It is not scalable
Cost-ineffective

3 tier

It is a client/server model, but served from a web server
The client only displays the GUI and data and has no part in producing results
Application layer: user interface, business rules and data access
Data layer

3 tier (Cont )

Benefit:

Scalability

The application servers can be deployed on many machines
The database no longer requires a connection from every client, only from the application servers

Better re-use
Improved data integrity and security
Reduced distribution
Improved availability
Encapsulated database structure
