Lecture Chap2 App 2

Chapter 2: Application layer
Principles of network applications
  • Architecture: client-server or P2P
  • Services that an application needs
important application-level protocols
  • FTP, SMTP, P2P, ……
programming network applications
  • socket API
Web stuff
  • HTTP, DNS, Web searching

Web and HTTP
First some jargon
Web page consists of objects
Object can be HTML file, JPEG image, Java applet, audio file,…
Web page consists of base HTML-file which includes several referenced objects
Each object is addressable by a URL
Example URL:
  www.someschool.edu/someDept/pic.gif
  host name: www.someschool.edu
  path name: /someDept/pic.gif
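The host name / path name split of the example URL can be sketched in a few lines. This is only an illustration: `split_url` is a hypothetical helper, and it assumes a URL written without a scheme, as on the slide.

```python
def split_url(url: str) -> tuple[str, str]:
    """Split a scheme-less URL into (host name, path name)."""
    # Everything before the first "/" is the host name; the rest is the path.
    host, slash, path = url.partition("/")
    return host, slash + path


host, path = split_url("www.someschool.edu/someDept/pic.gif")
print(host)  # www.someschool.edu
print(path)  # /someDept/pic.gif
```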
2: Application Layer 1 2: Application Layer 2
HTTP overview
HTTP: hypertext transfer protocol
Web’s application layer protocol
client/server model
  • client: browser that requests, receives, “displays” Web objects
  • server: Web server sends objects in response to requests
HTTP 1.0: RFC 1945
HTTP 1.1: RFC 2068
[Figure: a PC running Explorer and a Mac running Navigator each exchange HTTP requests and HTTP responses with a server running Apache Web server]

HTTP overview (continued)
Uses TCP:
  • client initiates TCP connection (creates socket) to server, port 80
  • server accepts TCP connection from client
  • HTTP messages (application-layer protocol messages) exchanged between browser (HTTP client) and Web server (HTTP server)
  • TCP connection closed
HTTP is “stateless”
  • server maintains no information about past client requests
aside: Protocols that maintain “state” are complex!
  • past history (state) must be maintained
  • if server/client crashes, their views of “state” may be inconsistent, must be reconciled
HTTP connections
Nonpersistent HTTP
  • At most one object is sent over a TCP connection.
  • HTTP/1.0 uses nonpersistent HTTP
Persistent HTTP
  • Multiple objects can be sent over single TCP connection between client and server.
  • HTTP/1.1 uses persistent connections in default mode

Nonpersistent HTTP
Suppose user enters URL www.someSchool.edu/someDepartment/home.index (contains text, references to 10 jpeg images)
1a. HTTP client initiates TCP connection to HTTP server (process) at www.someSchool.edu on port 80
1b. HTTP server at host www.someSchool.edu waiting for TCP connection at port 80. “accepts” connection, notifying client
2. HTTP client sends HTTP request message (containing URL) into TCP connection socket. Message indicates that client wants object someDepartment/home.index
3. HTTP server receives request message, forms response message containing requested object, and sends message into its socket
(time runs downward through the steps)
Nonpersistent HTTP (cont.)
4. HTTP server closes TCP connection.
5. HTTP client receives response message containing html file, displays html. Parsing html file, finds 10 referenced jpeg objects
6. Steps 1-5 repeated for each of 10 jpeg objects

Non-Persistent HTTP: Response time
Definition of RTT: time for a small packet to travel from client to server and back.
Response time:
  • one RTT to initiate TCP connection
  • one RTT for HTTP request and first few bytes of HTTP response to return
  • file transmission time
total = 2RTT + transmit time
[Figure: client/server timeline showing: initiate TCP connection (one RTT), request file (one RTT), time to transmit file, file received]
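The formula total = 2RTT + transmit time can be checked with a small calculation. The sketch below assumes objects are fetched strictly one after another (no parallel connections) and that every object has the same transmit time; the function name and the numbers are illustrative, not from the slides.

```python
def nonpersistent_response_time(rtt: float, transmit_time: float,
                                num_objects: int = 1) -> float:
    """Time to fetch num_objects serially, each over its own TCP connection."""
    # One RTT to set up the TCP connection, one RTT for request/first bytes,
    # plus the time to transmit the file itself.
    per_object = 2 * rtt + transmit_time
    return num_objects * per_object


# Base HTML file plus 10 referenced JPEG objects: 11 objects in total.
# Assumed values: RTT = 100 ms, transmit time = 50 ms per object.
total = nonpersistent_response_time(rtt=0.1, transmit_time=0.05, num_objects=11)
print(f"{total:.2f} seconds")
```

With a persistent connection the per-object TCP setup RTT disappears after the first object, which is where the savings described on the next slide come from.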
Persistent HTTP
Nonpersistent HTTP issues:
  • requires 2 RTTs per object
  • OS overhead for each TCP connection
  • browsers often open parallel TCP connections to fetch referenced objects
Persistent HTTP
  • server leaves connection open after sending response
  • subsequent HTTP messages between same client/server sent over open connection
  • client sends requests as soon as it encounters a referenced object
  • as little as one RTT for all the referenced objects

HTTP request message
two types of HTTP messages: request, response
HTTP request message: ASCII (human-readable format)

request line (GET, POST, HEAD commands):
  GET /somedir/page.html HTTP/1.1
header lines:
  Host: www.someschool.edu
  User-agent: Mozilla/4.0
  Connection: close
  Accept-language:fr
(extra carriage return, line feed)

Carriage return, line feed indicates end of message
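To make the line structure concrete, here is a minimal sketch that assembles the request message shown above as bytes. `build_request` is a hypothetical helper written for this illustration; it only covers requests without an entity body.

```python
def build_request(method: str, path: str, headers: dict[str, str]) -> bytes:
    """Assemble an HTTP/1.1 request message with no entity body."""
    lines = [f"{method} {path} HTTP/1.1"]                      # request line
    lines += [f"{name}: {value}" for name, value in headers.items()]  # header lines
    # Each line ends with CR LF; the extra CR LF marks the end of the headers.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


request = build_request(
    "GET",
    "/somedir/page.html",
    {
        "Host": "www.someschool.edu",
        "User-agent": "Mozilla/4.0",
        "Connection": "close",
        "Accept-language": "fr",
    },
)
print(request.decode("ascii"))
```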
HTTP request message: general format
[Figure: HTTP request message: general format]

Uploading form input
Post method:
  • Web page often includes form input
  • Input is uploaded to server in entity body
URL method:
  • Uses GET method
  • Input is uploaded in URL field of request line:
    www.somesite.com/animalsearch?monkeys&banana
Method types
HTTP/1.0
  • GET
  • POST
  • HEAD
    • asks server to leave requested object out of response
HTTP/1.1
  • GET, POST, HEAD
  • PUT
    • uploads file in entity body to path specified in URL field
  • DELETE
    • deletes file specified in the URL field

HTTP response message
status line (protocol, status code, status phrase):
  HTTP/1.1 200 OK
header lines:
  Connection: close
  Date: Thu, 06 Aug 1998 12:00:15 GMT
  Server: Apache/1.3.0 (Unix)
  Last-Modified: Mon, 22 Jun 1998 …...
  Content-Length: 6821
  Content-Type: text/html
data, e.g., requested HTML file:
  data data data data data ...
HTTP response status codes
In first line in server->client response message.
A few sample codes:
200 OK
request succeeded, requested object later in this message
301 Moved Permanently
requested object moved, new location specified later in
this message (Location:)
400 Bad Request
request message not understood by server
404 Not Found
requested document not found on this server
505 HTTP Version Not Supported
Trying out HTTP (client side) for yourself
1. Telnet to your favorite Web server:
     telnet cis.poly.edu 80
   Opens TCP connection to port 80 (default HTTP server port) at cis.poly.edu. Anything typed in is sent to port 80 at cis.poly.edu
2. Type in a GET HTTP request:
     GET /~ross/ HTTP/1.1
     Host: cis.poly.edu
   By typing this in (hit carriage return twice), you send this minimal (but complete) GET request to HTTP server
3. Look at response message sent by HTTP server!

User-server state: cookies
Many major Web sites use cookies
Four components:
  1) cookie header line of HTTP response message
  2) cookie header line in HTTP request message
  3) cookie file kept on user’s host, managed by user’s browser
  4) back-end database at Web site
Example:
  • Susan always accesses Internet from PC
  • visits specific e-commerce site for first time
  • when initial HTTP request arrives at site, site creates:
    • unique ID
    • entry in backend database for ID
Cookies: keeping “state” (cont.)
[Figure: message exchange between client and Amazon server, with the client's cookie file and the server's backend database]
  • client's cookie file initially contains: ebay 8734
  • client sends usual http request msg; Amazon server creates ID 1678 for user and creates an entry in the backend database
  • server sends usual http response with Set-cookie: 1678; cookie file now contains: ebay 8734, amazon 1678
  • client sends usual http request msg with cookie: 1678; server takes cookie-specific action (accesses backend database) and sends usual http response msg
  • one week later: client sends usual http request msg with cookie: 1678; server again takes cookie-specific action and sends usual http response msg

Cookies (continued)
What cookies can bring:
  • authorization
  • shopping carts
  • recommendations
  • user session state (Web e-mail)
aside: Cookies and privacy:
  • cookies permit sites to learn a lot about you
  • you may supply name and e-mail to sites
How to keep “state”:
  • protocol endpoints: maintain state at sender/receiver over multiple transactions
  • cookies: http messages carry state
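The cookie exchange above can be mimicked with a toy model. This is a sketch under stated assumptions, not how any real site works: `CookieSite` is a hypothetical class, the "database" is a plain dictionary, and the starting ID 1678 is taken from the slide's example.

```python
import itertools


class CookieSite:
    """Toy Web site that identifies returning users by a cookie ID."""

    def __init__(self, first_id: int = 1678):
        self._next_id = itertools.count(first_id)
        self.database = {}  # back-end database: cookie ID -> per-user state

    def handle_request(self, cookie=None):
        """Return (cookie ID, Set-cookie header line or None)."""
        if cookie is None or cookie not in self.database:
            # First visit: create a unique ID and a database entry for it.
            cookie = next(self._next_id)
            self.database[cookie] = {"visits": 1}
            return cookie, f"Set-cookie: {cookie}"
        # Returning visitor: cookie-specific action using the stored state.
        self.database[cookie]["visits"] += 1
        return cookie, None


site = CookieSite()
cookie_file = {"ebay": 8734}                 # browser-managed cookie file

cookie_id, set_cookie = site.handle_request()            # first request, no cookie
cookie_file["amazon"] = cookie_id                         # browser stores the ID
_, later_header = site.handle_request(cookie_file["amazon"])  # one week later
```

HTTP itself stays stateless here: all state lives in the cookie file on the client and in the database on the server, and the messages only carry the ID.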
Web caches (proxy server)
Goal: satisfy client request without involving origin server
user sets browser: Web accesses via cache
browser sends all HTTP requests to cache
  • object in cache: cache returns object
  • else cache requests object from origin server, then returns object to client
[Figure: clients send HTTP requests to the proxy server; the proxy server sends HTTP requests to origin servers and relays the HTTP responses back to the clients]

More about Web caching
cache acts as both client and server
typically cache is installed by ISP (university, company, residential ISP)
Why Web caching?
  • reduce response time for client request
  • reduce traffic on an institution’s access link.
  • Internet dense with caches: enables “poor” content providers to effectively deliver content (but so does P2P file sharing)
Caching example
Assumptions
  • average object size = 100,000 bits
  • avg. request rate from institution’s browsers to origin servers = 15/sec
  • delay from institutional router to any origin server and back to router = 2 sec
Consequences
  • utilization on LAN = 15%
  • utilization on access link = 100%
  • total delay = Internet delay + access delay + LAN delay = 2 sec + minutes + milliseconds
[Figure: institutional network (10 Mbps LAN, institutional cache) connected over a 1.5 Mbps access link to the public Internet and origin servers]

Caching example (cont)
Possible solution
  • increase bandwidth of access link to, say, 10 Mbps
Consequences
  • utilization on LAN = 15%
  • utilization on access link = 15%
  • Total delay = Internet delay + access delay + LAN delay = 2 sec + msecs + msecs
  • often a costly upgrade
[Figure: same network with a 10 Mbps access link]
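The utilization figures in both versions of the caching example follow from one ratio: offered traffic (request rate times object size) divided by link rate. A short sketch, with a function name chosen here for illustration:

```python
def utilization(requests_per_sec: float, object_size_bits: float,
                link_rate_bps: float) -> float:
    """Fraction of the link capacity consumed by the offered traffic."""
    return requests_per_sec * object_size_bits / link_rate_bps


RATE = 15            # requests per second (from the assumptions above)
SIZE = 100_000       # bits per object (from the assumptions above)

lan = utilization(RATE, SIZE, 10_000_000)           # 10 Mbps LAN
access_slow = utilization(RATE, SIZE, 1_500_000)    # 1.5 Mbps access link
access_fast = utilization(RATE, SIZE, 10_000_000)   # upgraded 10 Mbps access link

print(f"LAN: {lan:.0%}, 1.5 Mbps link: {access_slow:.0%}, 10 Mbps link: {access_fast:.0%}")
```

A utilization near 100% is what drives the access delay into the range of minutes, since queueing delay grows without bound as the link saturates.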
Caching example (cont)
possible solution: install cache
  • suppose hit rate is 0.4
consequence
  • 40% requests will be satisfied almost immediately
  • 60% requests satisfied by origin server
  • utilization of access link reduced to 60%, resulting in negligible delays (say 10 msec)
  • total avg delay = Internet delay + access delay + LAN delay = .6*(2.01) secs + .4*milliseconds < 1.4 secs
[Figure: institutional network (10 Mbps LAN) with institutional cache, 1.5 Mbps access link, public Internet, origin servers]

Conditional GET
Goal: don’t send object if cache has up-to-date cached version
cache: specify date of cached copy in HTTP request
  If-modified-since: <date>
server: response contains no object if cached copy is up-to-date:
  HTTP/1.0 304 Not Modified
[Figure: two exchanges between cache and server]
  • object not modified: cache sends HTTP request msg with If-modified-since: <date>; server replies with HTTP response HTTP/1.0 304 Not Modified
  • object modified: cache sends HTTP request msg with If-modified-since: <date>; server replies with HTTP response HTTP/1.0 200 OK <data>
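The server-side decision behind a conditional GET can be sketched as a single comparison. This is a toy model, not a real server: `conditional_get` is a hypothetical function, and the dates and body below are made-up values.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def conditional_get(last_modified: datetime,
                    if_modified_since: Optional[datetime],
                    body: str) -> Tuple[str, str]:
    """Toy origin server: return (status line, entity body)."""
    if if_modified_since is not None and last_modified <= if_modified_since:
        # Cached copy is up to date: send no object.
        return "HTTP/1.0 304 Not Modified", ""
    # No date given, or object changed after the cached copy was stored.
    return "HTTP/1.0 200 OK", body


modified = datetime(1998, 6, 22, tzinfo=timezone.utc)      # assumed modification date
cached_later = datetime(1998, 8, 6, tzinfo=timezone.utc)   # cache copy newer than that
cached_earlier = datetime(1998, 1, 1, tzinfo=timezone.utc)  # cache copy older than that

print(conditional_get(modified, cached_later, "<html>...</html>")[0])
print(conditional_get(modified, cached_earlier, "<html>...</html>")[0])
```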
Chapter 2: Application layer
Principles of network applications
  • Architecture: client-server or P2P
  • Services that an application needs
important application-level protocols
  • FTP, SMTP, P2P, ……
programming network applications
  • socket API
Web stuff
  • DNS

DNS: Domain Name System
People: many identifiers:
  • SSN, name, passport #
Internet hosts, routers:
  • IP address (32 bit) - used for addressing datagrams
  • “name”, e.g., www.yahoo.com - used by humans
Q: map between IP addresses and name?
Domain Name System:
  • distributed database implemented in hierarchy of many name servers
  • application-layer protocol: host, routers, name servers to communicate to resolve names (address/name translation)
    • note: core Internet function, implemented as application-layer protocol
    • complexity at network’s “edge”
DNS
DNS services
  • Hostname to IP address translation
  • Host aliasing
    • Canonical and alias names
  • Mail server aliasing
  • Load distribution
    • Replicated Web servers: set of IP addresses for one canonical name
Why not centralize DNS?
  • single point of failure
  • traffic volume
  • distant centralized database
  • maintenance
doesn’t scale!

Distributed, Hierarchical Database
[Figure: tree with Root DNS Servers at the top; below them com DNS servers, org DNS servers, edu DNS servers; below those yahoo.com and amazon.com DNS servers (com), pbs.org DNS servers (org), poly.edu and umass.edu DNS servers (edu)]
Client wants IP for www.amazon.com; 1st approx:
  • Client queries a root server to find com DNS server
  • Client queries com DNS server to get amazon.com DNS server
  • Client queries amazon.com DNS server to get IP address for www.amazon.com
DNS: Root name servers
contacted by local name server that can not resolve name
root name server:
  • contacts authoritative name server if name mapping not known
  • gets mapping
  • returns mapping to local name server
13 root name servers worldwide:
  a Verisign, Dulles, VA
  b USC-ISI Marina del Rey, CA
  c Cogent, Herndon, VA (also Los Angeles)
  d U Maryland College Park, MD
  e NASA Mt View, CA
  f Internet Software C. Palo Alto, CA (and 17 other locations)
  g US DoD Vienna, VA
  h ARL Aberdeen, MD
  i Autonomica, Stockholm (plus 3 other locations)
  j Verisign, ( 11 locations)
  k RIPE London (also Amsterdam, Frankfurt)
  l ICANN Los Angeles, CA
  m WIDE Tokyo

TLD and Authoritative Servers
Top-level domain (TLD) servers: responsible for com, org, net, edu, etc, and all top-level country domains uk, fr, ca, jp.
  • Network Solutions maintains servers for com TLD
  • Educause for edu TLD
Authoritative DNS servers: organization’s DNS servers, providing authoritative hostname to IP mappings for organization’s servers (e.g., Web, mail).
  • can be maintained by organization or service provider
Local Name Server
Does not strictly belong to hierarchy
Each ISP (residential ISP, company, university) has one.
  • Also called “default name server”
When a host makes a DNS query, query is sent to its local DNS server
  • Acts as a proxy, forwards query into hierarchy.

DNS name resolution example
Host at cis.poly.edu wants IP address for gaia.cs.umass.edu
iterated query:
  • contacted server replies with name of server to contact
  • “I don’t know this name, but ask this server”
[Figure: numbered messages 1-8. 1: requesting host cis.poly.edu to local DNS server dns.poly.edu; 2, 3: local DNS server to root DNS server and back; 4, 5: local DNS server to TLD DNS server and back; 6, 7: local DNS server to authoritative DNS server dns.cs.umass.edu and back; 8: local DNS server to requesting host. Target host: gaia.cs.umass.edu]
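The iterated query can be simulated without any network access. In the sketch below everything is hypothetical: the `SERVERS` table, the server label `tld-edu`, and the address 192.0.2.10 (a documentation-only address) are placeholders, not real DNS data.

```python
# Hypothetical server tables: a server either refers a domain suffix to the
# next server to ask, or holds the hostname-to-address record itself.
SERVERS = {
    "root": {"referrals": {"edu": "tld-edu"}},
    "tld-edu": {"referrals": {"umass.edu": "dns.cs.umass.edu"}},
    "dns.cs.umass.edu": {"records": {"gaia.cs.umass.edu": "192.0.2.10"}},
}


def query(server: str, hostname: str):
    """One query/reply pair: ("answer", ip) or ("referral", next_server)."""
    entry = SERVERS[server]
    if hostname in entry.get("records", {}):
        return "answer", entry["records"][hostname]
    for suffix, next_server in entry.get("referrals", {}).items():
        if hostname == suffix or hostname.endswith("." + suffix):
            # "I don't know this name, but ask this server"
            return "referral", next_server
    raise LookupError(f"{server} cannot help with {hostname}")


def resolve_iteratively(hostname: str, start: str = "root"):
    """Local DNS server's loop: follow referrals until an answer arrives."""
    server, contacted = start, []
    while True:
        contacted.append(server)
        kind, value = query(server, hostname)
        if kind == "answer":
            return value, contacted
        server = value


ip, contacted = resolve_iteratively("gaia.cs.umass.edu")
print(ip, contacted)
```

The local DNS server does all the work here: each contacted server answers only once, which is the difference from the recursive variant.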
DNS name resolution example
recursive query:
  • puts burden of name resolution on contacted name server
  • heavy load?
[Figure: numbered messages 1-8. 1: requesting host cis.poly.edu to local DNS server dns.poly.edu; 2: local DNS server to root DNS server; 3: root to TLD DNS server; 4: TLD to authoritative DNS server dns.cs.umass.edu; 5: authoritative back to TLD; 6: TLD back to root; 7: root back to local DNS server; 8: local DNS server to requesting host. Target host: gaia.cs.umass.edu]

DNS: caching and updating records
once (any) name server learns mapping, it caches mapping
  • cache entries timeout (disappear) after some time
  • TLD servers typically cached in local name servers
    • Thus root name servers not often visited
update/notify mechanisms under design by IETF
  • RFC 2136
  • http://www.ietf.org/html.charters/dnsind-charter.html
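The "cache entries timeout after some time" rule can be sketched with a small cache that stores an expiry time per entry. `DnsCache` is a hypothetical class; the clock is passed in explicitly so the behaviour is easy to follow, and the address is a documentation-only placeholder.

```python
class DnsCache:
    """Toy name-server cache whose entries disappear after their TTL."""

    def __init__(self):
        self._entries = {}  # name -> (value, expiry time in seconds)

    def put(self, name: str, value: str, ttl: float, now: float) -> None:
        self._entries[name] = (value, now + ttl)

    def get(self, name: str, now: float):
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if now >= expires_at:
            del self._entries[name]  # entry timed out
            return None
        return value


cache = DnsCache()
cache.put("gaia.cs.umass.edu", "192.0.2.10", ttl=3600, now=0)
fresh = cache.get("gaia.cs.umass.edu", now=10)      # still cached
stale = cache.get("gaia.cs.umass.edu", now=3600)    # expired, must be resolved again
```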
DNS records
DNS: distributed db storing resource records (RR)
RR format: (name, value, type, ttl)
Type=A
name is hostname
value is IP address
E.g.: (dns.umass.edu, 128.119.40.111, A)
Type=NS
name is domain (e.g. foo.com)
value is hostname of authoritative name server for this domain
E.g.: (umass.edu, dns.umass.edu, NS)
Type=CNAME
name is alias name for some “canonical” (the real) name
www.ibm.com is really servereast.backup2.ibm.com
value is canonical name
E.g. : (www.ibm.com, servereast.backup2.ibm.com, CNAME)
Type=MX
value is name of mailserver associated with name
E.g.: (foo.com, mail.bar.foo.com, MX)
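The four record types can be tried out with the example tuples from this slide stored in a list. This is a minimal sketch: `lookup` and `canonical_name` are hypothetical helpers, and the ttl field is left out, as in the slide's examples.

```python
# Example resource records from the slide, as (name, value, type).
RECORDS = [
    ("dns.umass.edu", "128.119.40.111", "A"),
    ("umass.edu", "dns.umass.edu", "NS"),
    ("www.ibm.com", "servereast.backup2.ibm.com", "CNAME"),
    ("foo.com", "mail.bar.foo.com", "MX"),
]


def lookup(name: str, rtype: str) -> list:
    """Return the values of all records matching name and type."""
    return [value for (n, value, t) in RECORDS if n == name and t == rtype]


def canonical_name(name: str) -> str:
    """Follow CNAME records until a name with no alias record is reached."""
    seen = set()
    while name not in seen:          # guard against alias loops
        seen.add(name)
        aliases = lookup(name, "CNAME")
        if not aliases:
            break
        name = aliases[0]
    return name


print(lookup("umass.edu", "NS"))          # name server for the domain
print(canonical_name("www.ibm.com"))      # the "real" name behind the alias
print(lookup("foo.com", "MX"))            # mail server for the domain
```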
Example
Domain            Type  TTL   Answer
hotmail.com.      MX    3600  mx2.hotmail.com.
hotmail.com.      MX    3600  mx3.hotmail.com.
hotmail.com.      MX    3600  mx4.hotmail.com.
hotmail.com.      MX    3600  mx1.hotmail.com.
mx2.hotmail.com.  A     3600  65.54.244.168
mx2.hotmail.com.  A     3600  65.54.244.40
mx2.hotmail.com.  A     3600  65.54.190.50
mx2.hotmail.com.  A     3600  65.54.245.40
mx3.hotmail.com.  A     3600  65.54.244.200
mx3.hotmail.com.  A     3600  64.4.50.179
mx3.hotmail.com.  A     3600  65.54.244.72
mx3.hotmail.com.  A     3600  65.54.245.72
mx4.hotmail.com.  A     3600  65.54.244.104
mx4.hotmail.com.  A     3600  65.54.244.232
mx4.hotmail.com.  A     3600  65.54.245.104
mx4.hotmail.com.  A     3600  65.54.190.179
mx1.hotmail.com.  A     3600  65.54.244.8
mx1.hotmail.com.  A     3600  64.4.50.50
mx1.hotmail.com.  A     3600  65.54.245.8
mx1.hotmail.com.  A     3600  65.54.244.136

DNS protocol, messages
DNS protocol: query and reply messages, both with same message format
msg header
  • identification: 16 bit # for query, reply to query uses same #
  • flags:
    • query or reply
    • recursion desired
    • recursion available
    • reply is authoritative
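The message header can be packed and unpacked with the standard `struct` module. The slide names only the identification and flags fields; the four count fields and the flag bit positions used below follow the 12-byte header layout of RFC 1035. The helper names and the ID value 0x1A2B are chosen here for illustration.

```python
import struct

# Flag bits within the 16-bit flags field (RFC 1035 header layout).
QR_REPLY = 0x8000   # message is a reply (0 = query)
AA = 0x0400         # reply is authoritative
RD = 0x0100         # recursion desired
RA = 0x0080         # recursion available


def pack_header(ident: int, flags: int, qdcount: int = 0, ancount: int = 0,
                nscount: int = 0, arcount: int = 0) -> bytes:
    """Pack the 12-byte DNS header: six 16-bit fields in network byte order."""
    return struct.pack("!HHHHHH", ident, flags, qdcount, ancount, nscount, arcount)


def unpack_header(data: bytes):
    """Return (identification, flags, [qdcount, ancount, nscount, arcount])."""
    ident, flags, *counts = struct.unpack("!HHHHHH", data[:12])
    return ident, flags, counts


query = pack_header(0x1A2B, RD, qdcount=1)
reply = pack_header(0x1A2B, QR_REPLY | AA | RD | RA, qdcount=1, ancount=1)

reply_id, reply_flags, _ = unpack_header(reply)
print(hex(reply_id), bool(reply_flags & QR_REPLY), bool(reply_flags & AA))
```

The reply reuses the query's identification number, which is how the requester matches replies to outstanding queries.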
DNS protocol, messages
[Figure: message body sections]
  • Name, type fields for a query
  • RRs in response to query
  • records for authoritative servers
  • additional “helpful” info that may be used

Inserting records into DNS
Example: just created startup “Network Utopia”
Register name networkutopia.com at a registrar (e.g., Network Solutions)
  • Need to provide registrar with names and IP addresses of your authoritative name server (primary and secondary)
  • Registrar inserts two RRs into the com TLD server:
    (networkutopia.com, dns1.networkutopia.com, NS)
    (dns1.networkutopia.com, 212.212.212.1, A)
Put in authoritative server Type A record for www.networkutopia.com and Type MX record for mail.networkutopia.com
How do people get the IP address of your Web site?
Exercise
Suppose within your Web browser, you click on a link to obtain a Webpage. The IP address for the associated URL is not cached in your local host. Suppose that n DNS servers should be visited before your host receives the IP address. The successive visits incur an RTT of RTT1, RTT2, …, RTTn. Suppose that the base HTML file associated with the link references three very small objects (small pictures) on the same server. Let RTT0 denote the RTT between the local host and the server containing the objects. Neglecting transmission times, how much time elapses with
a) Non-persistent HTTP with no parallel TCP connections?
b) Non-persistent HTTP with parallel connections?
c) Persistent HTTP with pipelining?

Chapter 2: Application layer
Principles of network applications
  • Architecture: client-server or P2P
  • Services that an application needs
important application-level protocols
  • FTP, SMTP, P2P, ……
programming network applications
  • socket API
Web stuff
  • Web searching
How Search Engines Work
Gather the contents of all web pages (using a program called a crawler or spider)
Organize the contents of the pages in a way that allows efficient retrieval (indexing)
Take in a query, determine which pages match, and show the results (ranking and display of results)

Standard Web Search Engine Architecture
[Figure: crawler machines crawl the web; check for duplicates, store the documents (docIDs); create an inverted index; search engine servers use the inverted index]

Standard Web Search Engine Architecture
[Figure: same pipeline, plus the user query going to the search engine servers, which show results to the user]
More detailed architecture, from “The Anatomy of a Large-Scale Hypertextual Web Search Engine”, Brin & Page, 1998.
http://dbpubs.stanford.edu:8090/pub/1998-8
Spiders or crawlers
How to find web pages to visit and copy?
  • Can start with a list of domain names, visit the home pages there.
  • Look at the hyperlinks on the home page, and follow those links to more pages.
    • Use HTTP commands to GET the pages
  • Keep a list of URLs visited, and those still to be visited.
  • Each time the program loads in a new HTML page, add the links in that page to the list to be crawled.

Spider behaviour varies
  • Parts of a web page that are indexed
  • How deeply a site is indexed
  • Types of files indexed
  • How frequently the site is spidered

Slide adapted from Lew & Davis
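The crawl loop described under “Spiders or crawlers” (a list of visited URLs plus a list of URLs still to visit) can be sketched as a breadth-first traversal. To keep the example self-contained, the "web" is a hypothetical in-memory link table and `fetch_links` stands in for an HTTP GET followed by link extraction.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> URLs linked from that page.
WEB = {
    "a.example/": ["a.example/about", "b.example/"],
    "a.example/about": ["a.example/"],
    "b.example/": ["a.example/", "b.example/news"],
    "b.example/news": [],
}


def fetch_links(url: str) -> list:
    """Stand-in for an HTTP GET of the page plus HTML link extraction."""
    return WEB.get(url, [])


def crawl(seeds: list, max_pages: int = 100) -> list:
    """Visit pages breadth-first, never queueing the same URL twice."""
    to_visit = deque(seeds)   # URLs still to be visited
    seen = set(seeds)         # URLs already visited or queued
    visited = []
    while to_visit and len(visited) < max_pages:
        url = to_visit.popleft()
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:      # avoids cycles between pages
                seen.add(link)
                to_visit.append(link)
    return visited


order = crawl(["a.example/"])
print(order)
```

The `seen` set is what keeps the crawler out of the hyperlink cycles mentioned later, and `max_pages` is a crude way of not hogging resources.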
Four Laws of Crawling
A Crawler must show identification
A Crawler must obey the robots exclusion standard
  http://www.robotstxt.org/wc/norobots.html
A Crawler must not hog resources
A Crawler must report errors

Lots of tricky aspects
Servers are often down or slow
Hyperlinks can get the crawler into cycles
Some websites have junk in the web pages
Now many pages have dynamic content
  • The “hidden” web
  • E.g., schedule.berkeley.edu
    • You don’t see the course schedules until you run a query.
The web is HUGE
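Obeying the robots exclusion standard can be sketched with Python's standard `urllib.robotparser` module. The robots.txt content, the crawler name `ExampleCrawler`, and the URLs below are made up for the illustration; a real crawler would fetch the site's actual /robots.txt first.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed from a string instead of fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

USER_AGENT = "ExampleCrawler"  # hypothetical crawler identification

allowed = parser.can_fetch(USER_AGENT, "http://www.example.com/index.html")
blocked = parser.can_fetch(USER_AGENT, "http://www.example.com/private/data.html")
print(allowed, blocked)
```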
The Internet Is Enormous
[Image from http://www.nature.com/nature/webmatters/tomog/tomfigs/fig1.html]

“Freshness”
Need to keep checking pages
  • Pages change (25%,7% large changes)
    • At different frequencies
    • Who is the fastest changing?
  • Pages are removed
Many search engines cache the pages (store a copy on their own servers)

What really gets crawled?
A small fraction of the Web that search engines know about; no search engine is exhaustive
Not the “live” Web, but the search engine’s index
Not the “Deep Web”

ii. Index (the database)
Record information about each page
  • List of words
    • In the title?
    • How far down in the page?
    • Was the word in boldface?
  • URLs of pages pointing to this one
  • Anchor text on pages pointing to this one
    • The anchor text summarizes what the website is about.
    • <a href=http://web.njit… > CS 656 </a>
Inverted Index
How to store the words for fast lookup

Basic steps:
Make a “dictionary” of all the words in all of the web
pages
For each word, list all the documents it occurs in.
Often omit very common words
• “stop words”
Sometimes stem the words
• (also called morphological analysis)
• cats -> cat
• running -> run
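The basic steps above can be sketched as a small index builder. The stop-word list, the very crude `stem` function, and the three sample documents are assumptions made for this illustration; real search engines use far more careful morphological analysis.

```python
from collections import defaultdict

STOP_WORDS = {"the", "a", "is", "and", "of"}  # assumed, tiny stop-word list


def stem(word: str) -> str:
    """Very crude stemmer, for illustration only."""
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        if len(word) >= 2 and word[-1] == word[-2]:
            word = word[:-1]            # runn -> run
    elif word.endswith("s") and len(word) > 3:
        word = word[:-1]                # cats -> cat
    return word


def build_index(documents: dict) -> dict:
    """Map each stemmed word to the set of document IDs it occurs in."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for raw in text.lower().split():
            word = raw.strip(".,!?\"'")
            if not word or word in STOP_WORDS:
                continue                # omit very common words
            index[stem(word)].add(doc_id)
    return index


docs = {1: "The cat is running.", 2: "Cats and dogs", 3: "A dog run"}
index = build_index(docs)
print(sorted(index["cat"]), sorted(index["run"]))
```

Answering a query then becomes a dictionary lookup per query word followed by a set intersection, instead of a scan over every document.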
Inverted Index Example
[Image from http://developer.apple.com/documentation/UserExperience/Conceptual/SearchKitConcepts/searchKit_basics/chapter_2_section_2.html]
Inverted Index
In reality, this index is HUGE
Need to store the contents across many machines
Need to do optimization tricks to make lookup fast.

Query Serving Architecture
  • Index divided into segments, each served by a node
  • Each row of nodes replicated for query load
  • Query integrator distributes query and merges results
  • Front end creates an HTML page with the query results
[Figure: the query “travel” passes through the Load Balancer to a front end (FE1 FE2 … FE8 …), then to a query integrator (QI1 QI2 … QI8 …), which sends it to a row of nodes (Node1,1 Node1,2 Node1,3 … Node1,N)]
iii. Results ranking
Search engine receives a query, then
  • Looks up the words in the index, retrieves many documents, then
  • Rank orders the pages and extracts “snippets” or summaries containing query words.
Most web search engines assume the user wants all of the words (Boolean AND, not OR).
These are complex and highly guarded algorithms unique to each search engine.

Some ranking criteria
For a given candidate result page, use:
  • Number of matching query words in the page
  • Proximity of matching words to one another
  • Location of terms within the page
  • Location of terms within tags e.g. <title>, <h1>, link text, body text
  • Anchor text on pages pointing to this one
  • Frequency of terms on the page and in general
  • Link analysis of which pages point to this one
  • (Sometimes) Click-through analysis: how often the page is clicked on
  • How “fresh” is the page
Complex formulae combine these together.
Measuring Importance of Linking
PageRank Algorithm
Idea: important pages are pointed to by other important pages
Method:
  • Each link from one page to another is counted as a “vote” for the destination page
  • But the importance of the starting page also influences the importance of the destination page.
  • And those pages’ scores, in turn, depend on those linking to them.

Measuring Importance of Linking
Example: each page starts with 100 points.
Each page’s score is recalculated by adding up the score from each incoming link.
  • This is the score of the linking page divided by the number of outgoing links it has.
  • E.g., the page in green has 2 outgoing links and so its “points” are shared evenly by the 2 pages it links to.
Keep repeating the score updates until no more changes.

Image and explanation from http://www.economist.com/science/tq/displayStory.cfm?story_id=3172188
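The point-sharing procedure described above can be sketched directly. This follows the simplified version on the slide (no damping factor, unlike the published PageRank algorithm) and assumes every page has at least one outgoing link. The three-page link graph and the function names are made up for the illustration.

```python
def update_scores(links: dict, scores: dict) -> dict:
    """One round: each page's new score is the sum of shares from its in-links."""
    new_scores = {page: 0.0 for page in links}
    for page, outgoing in links.items():
        share = scores[page] / len(outgoing)   # points split evenly over out-links
        for target in outgoing:
            new_scores[target] += share
    return new_scores


def rank(links: dict, start: float = 100.0, tolerance: float = 1e-9,
         max_rounds: int = 1000) -> dict:
    """Repeat the score update until the scores stop changing."""
    scores = {page: start for page in links}
    for _ in range(max_rounds):
        new_scores = update_scores(links, scores)
        if all(abs(new_scores[p] - scores[p]) < tolerance for p in links):
            return new_scores
        scores = new_scores
    return scores


# Hypothetical link graph: page -> pages it links to.
LINKS = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
scores = rank(LINKS)
print({page: round(score, 2) for page, score in scores.items()})
```

In this small graph the total of 300 points is conserved in every round, and page B ends up with half the score of A and C because it receives only half of A's points.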
Search Engine Information
www.searchenginewatch.com
www.searchenginejournal.com
www.searchengineshowdown.com
http://battellemedia.com
http://jeremy.zawodny.com/blog/

Acknowledgement
Slides about web searching are adapted from the slides authored by Dr. Marti Hearst.