
Internet Quality-of-Service (QoS)

Henning Schulzrinne Columbia University Fall 2003

Quality of Service

Motivation
Service availability
Elementary queueing theory
Traffic characterization & control
Integrated services (RSVP, NSIS)
Differentiated services (DiffServ)

What is quality of service?

Many applications are sensitive to the effects of delay (+ jitter) and packet loss

[Figure: utility ($) as a function of bandwidth]

Utility may have a floor below which it drops to zero

The existing Internet architecture provides a best effort service.

All traffic is treated equally (generally, FIFO queuing)
No mechanism for distinguishing between delay-sensitive and best-effort traffic

Original IP architecture (IPv4) has TOS (type-of-service) byte in packet header

RFC 795: defined multiple axes (delay, throughput, reliability)
rarely used outside some (rumor) military networks


Motivation

QoS service availability

not good enough if all but 2 minutes of my phone call sound perfect

Support mission-critical applications that can't tolerate disruption

VoIP
VPNs (LAN emulation)
high-availability computing

Charge more for business applications vs. consumer applications

Service availability

Users do not care about QoS
  at least not about packet loss, jitter, delay
  rather, it's service availability: how likely is it that I can place a call and not get interrupted?

availability = MTBF / (MTBF + MTTR)
  MTBF = mean time between failures
  MTTR = mean time to repair

availability = successful calls / first call attempts
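The MTBF/MTTR definition converts directly into expected downtime per year. The sketch below is only an illustration: the function names and the sample MTBF/MTTR values are assumptions, not figures from the lecture.

```python
# Illustrative sketch; function names and sample inputs are assumptions.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 (non-leap year)


def availability(mtbf_hours, mttr_hours):
    """availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(avail):
    """Expected minutes of unavailability per year for a given availability."""
    return (1.0 - avail) * MINUTES_PER_YEAR


# Hypothetical device: fails every 999 hours on average, repaired in 1 hour.
print(f"availability: {availability(999, 1):.3f}")

# "Five nines" and a 99.5% SLA expressed as downtime.
for label, a in [("99.999%", 0.99999), ("99.5%", 0.995)]:
    print(f"{label}: {downtime_minutes_per_year(a):.1f} minutes/year")
```

Five nines works out to a little over 5 minutes of downtime per year, while 99.5% allows roughly 44 hours.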


equipment availability: 99.999% (5 nines): 5 minutes/year
AT&T (2003):
  long-distance voice 99.978%
  ATM data 99.999%
  frame relay data 99.998%
  IP 99.991%
Sprint IP frame relay SLA: 99.5%

Availability PSTN metrics

PSTN metrics (Worldbank study):


fault rate
  should be less than 0.2 per main line

fault clearance (~ MTTR)
  next business day

call completion rate
  during network busy hour
  varies from about 60% - 75%

dial tone delay

Example PSTN statistics

Source: Worldbank

Measurement setup

Node name        Location                            Connectivity   Network
columbia         Columbia University, NY             >= OC3         I2
wustl            Washington U., St. Louis                           I2
unm              Univ. of New Mexico                                I2
epfl             EPFL, Lausanne, CH                                 I2+
hut              Helsinki University of Technology                  I2+
rr               NYC                                 cable modem    ISP
rrqueens         Queens, NY                          cable modem    ISP
njcable          New Jersey                          cable modem    ISP
newport          New Jersey                          ADSL           ISP
sanjose          San Jose, California                cable modem    ISP
suna             Kitakyushu, Japan                   3 Mb/s         ISP
sh               Shanghai, China                     cable modem    ISP
Shanghaihome     Shanghai, China                     cable modem    ISP
Shanghaioffice   Shanghai, China                     ADSL           ISP

Measurement setup

Active measurements
call duration 3 or 7 minutes
UDP packets:
  36 bytes alternating with 72 bytes (FEC)
  40 ms spacing
September 10 to December 6, 2002
13,500 call hours

Call success probability

62,027 calls succeeded, 292 failed: 99.53% availability
roughly constant across I2, I2+, commercial ISPs

All                        99.53%
Internet2                  99.52%
Internet2+                 99.56%
Commercial                 99.51%
Domestic (US)              99.45%
International              99.58%
Domestic commercial        99.39%
International commercial   99.59%
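The overall figure follows from the call counts using the definition availability = successful calls / first call attempts. A minimal check, assuming that the number of attempts is simply succeeded plus failed calls (the function name is illustrative):

```python
# Illustrative check; assumes attempts = succeeded + failed.
def call_success_rate(succeeded, failed):
    """Fraction of call attempts that succeeded."""
    return succeeded / (succeeded + failed)


rate = call_success_rate(62_027, 292)
print(f"{100 * rate:.2f}%")  # 99.53%
```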

Overall network loss

PSTN: once connected, call usually of good quality
  exception: mobile phones

compute periods of time below loss threshold
  5% causes degradation for many codecs
  others acceptable till 20%

loss   All     ISP     I2      I2+     US      Int.    US ISP   Int. ISP
0%     82.3    78.6    97.7    86.8    83.6    81.7    73.6     81.2
5%     97.48   96.72   99.67   98.41   96.95   97.73   95.03    97.60
10%    99.16   99.04   99.77   99.32   99.27   99.11   98.92    99.10
20%    99.75   99.74   99.79   99.76   99.79   99.73   99.79    99.71
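One way to compute "periods of time below loss threshold" is to split a trace into fixed-length intervals, measure the loss rate in each, and count the share of intervals at or below each threshold. The sketch below assumes exactly that; the interval length, the "at or below" comparison, and the sample data are all assumptions, not details taken from the measurement study.

```python
# Illustrative sketch; interval definition and sample data are assumptions.
def percent_time_within(loss_rates, threshold):
    """Percentage of intervals whose loss rate is at or below the threshold."""
    if not loss_rates:
        raise ValueError("need at least one interval")
    within = sum(1 for rate in loss_rates if rate <= threshold)
    return 100.0 * within / len(loss_rates)


# Made-up per-interval loss rates (fraction of packets lost in each interval).
intervals = [0.0, 0.0, 0.02, 0.0, 0.08, 0.0, 0.15, 0.0, 0.30, 0.0]

for threshold in (0.0, 0.05, 0.10, 0.20):
    share = percent_time_within(intervals, threshold)
    print(f"loss <= {threshold:.0%}: {share:.1f}% of intervals")
```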

Network outages

sustained packet losses
  arbitrarily defined at 8 packets
  far beyond any recoverable loss (FEC, interpolation)

23% outages make up significant part of 0.25% unavailability
symmetric: A→B ⇒ B→A
spatially correlated: A→B ⇒ A→X
not correlated across networks (e.g., I2 and commercial)
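Detecting outages in a packet trace amounts to finding runs of consecutive losses. The sketch below assumes an outage is a run of at least 8 consecutive lost packets and uses the 40 ms packet spacing from the measurement setup to convert run length into duration; the function name and the trace are illustrative.

```python
# Illustrative sketch; assumes "outage" = run of >= 8 consecutive lost packets.
PACKET_SPACING_S = 0.040  # 40 ms between probe packets


def find_outages(received, min_run=8):
    """Return (start_index, length) for each loss run of at least min_run packets.

    received: list of booleans, True if the packet with that sequence
    number arrived, False if it was lost.
    """
    outages = []
    run_start = None
    for i, arrived in enumerate(received):
        if not arrived:
            if run_start is None:
                run_start = i          # a new loss run begins here
        elif run_start is not None:
            if i - run_start >= min_run:
                outages.append((run_start, i - run_start))
            run_start = None
    # A loss run that lasts until the end of the trace also counts.
    if run_start is not None and len(received) - run_start >= min_run:
        outages.append((run_start, len(received) - run_start))
    return outages


# Made-up trace: a 3-packet loss burst (ignored) and a 10-packet outage.
trace = [True] * 5 + [False] * 3 + [True] * 2 + [False] * 10 + [True]
for start, length in find_outages(trace):
    print(f"outage at packet {start}: {length} packets, "
          f"{length * PACKET_SPACING_S:.2f} s")
```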

Network outages

[Figure: complementary CDF of outage duration (sec), two panels: all paths vs. Internet2, and US domestic paths vs. international paths]
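A complementary CDF gives, for each duration x, the fraction of outages that last longer than x. A minimal sketch of the empirical version, with made-up durations rather than the measured data:

```python
# Illustrative sketch; the sample durations are made up.
def ccdf(durations, x):
    """Empirical complementary CDF: fraction of durations strictly greater than x."""
    return sum(1 for d in durations if d > x) / len(durations)


sample_durations_s = [1, 2, 2, 5, 10, 40, 120, 300]
for x in (0, 10, 100, 300):
    print(f"P[duration > {x} s] = {ccdf(sample_durations_s, x):.3f}")
```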

Network outages

       no. of    %           duration   duration   total        outages >
       outages   symmetric   (mean)     (median)   (all, h:m)   1000 packets
all    10,753    30%         145        25         17:20        10:58
I2     819       14.5%       360        25         3:17         2:33
I2+    2,708     10%         259        26         7:47         5:37
ISP    8,045     37%         107        24         9:33         4:58
US     1,777     18%         269        20         5:18         3:53
Int.   8,976     33%         121        26         12:02        6:42

Outage-induced call abortion probability


Long interruption: user likely to abandon call
from E.855 survey: $P[\text{holding}] = e^{-t/17.26}$ (t in seconds)
  half the users will abandon call after 12 s
2,566 have at least one outage
946 of 2,566 expected to be dropped: 1.53% of all calls

all     I2      I2+     ISP     US      Int.    US ISP   Int. ISP
1.53%   1.16%   1.15%   1.82%   0.99%   1.78%   0.86%    2.30%
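The "12 s" figure is the median of the exponential holding model: the time at which the probability of still holding drops to one half. A short sketch, with illustrative function names:

```python
# Illustrative sketch of the exponential holding model.
import math

TAU_S = 17.26  # time constant from the E.855-based model, in seconds


def p_holding(t):
    """Probability that a user is still holding after t seconds of interruption."""
    return math.exp(-t / TAU_S)


def median_holding_time():
    """Interruption length after which half the users have abandoned the call."""
    return TAU_S * math.log(2)


print(f"P[holding] after 12 s: {p_holding(12):.3f}")
print(f"median holding time:  {median_holding_time():.2f} s")
```

The median comes out just under 12 seconds, which matches the rounded value quoted above.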

Conclusions from measurement


Availability in space is (mostly) solved; availability in time restricts usability for new applications
initial investigation into service availability for VoIP
need to define metrics for, say, web access
unify packet loss and "no Internet dial tone"
far less than 5 nines
working on identifying fault sources and locations
looking for additional measurement sites

What's next?

Existing SLAs are mostly useless


too many exceptions
wrong time scales: month vs. minutes
no guarantees for interconnects

Existing measurements similarly dubious
Limited ability to learn from mistakes


what are the primary causes of service unavailability?
what can I do to protect myself?
  multi-homing via same fiber?
  diverse access mechanisms?

Consumers of services have no good ways to compare service availability

only some very large customers may get access to carrier-internal data

Thus, market failure
Need published metrics

similar to switch availability reporting

What's hard to scale (and not)

Signaling does not have to be hard:

one message, on a reliable peering channel or IP router alert option
NSIS effort in the IETF?
700 MHz Celeron processor: 10,000 flow setups/second, 300,000 soft-state flows

YESSIR: RTCP-based signaling


If scaling matters, sink-tree based reservation (BGRP)

Diversity is good

Unlike routing, no need for single signaling protocol:


multicast is much harder
dumb end devices
edge "pop-up" only show up in edge nodes

AAA

Signaling can easily be done in ASIC (no harder than IP), but

need cryptographic verification of request
need interface to Authentication, Authorization, Accounting (AAA)
cross-domain authentication hard, but 3G networks will do it anyway
easier if both sides ask their own access router
see also: iPass for dial-up, OSP (open settlement protocol)

AAA example

[Figure: source, AR1, Internet, AR2, destination in a chain; annotations: "reserves for both directions", "signs request"]

Cell phone model: both sides pay

Reservation scaling

Example: every long-distance call in the US uses VoIP with per-flow resource reservation
2000: 567.4 billion minutes @ 10 minutes each: 1,800 calls/second
single MySQL server can sustain 500-2,000 queries+updates/second
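The calls-per-second estimate is a straightforward average over the year. The sketch below reproduces the arithmetic; it assumes calls are spread evenly over a 365-day year, so busy-hour rates would be higher.

```python
# Illustrative arithmetic; assumes calls are spread evenly over the year.
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000


def average_calls_per_second(total_minutes, minutes_per_call):
    """Average call setup rate implied by yearly minutes and mean call length."""
    calls_per_year = total_minutes / minutes_per_call
    return calls_per_year / SECONDS_PER_YEAR


rate = average_calls_per_second(567.4e9, 10)
print(f"{rate:.0f} calls/second")  # about 1,800
```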

Business models don't work

Most of the time, "tin" service is no worse than "platinum" service


can't impress others with platinum AmEx card
no frequent flyer bonuses

everybody switches only when the network is in bad shape

Resource control & reservation


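[Figure: resource control and reservation in a router. Components: Application (Tspec), Reservation Protocol, Admission Control (Y/N), Routing Protocols & DBs, Traffic Control DB, Classifier & route selection on the data path, Packet Scheduler with QoS queuing and best-effort queuing. Source: USC EE-S 555]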

RED (Random Early Detection)

TCP synchronization effect: during overload, many connections lose packets and go into slow-start
RED: start dropping based on average queue occupancy (vs. instantaneous queue occupancy)
Parameter setting critical and non-trivial
See also RFC 2309

Average queue occupancy above THmax: discard
Between THmin and THmax: discard with increasing probability Pd
Between 0 and THmin: do not discard
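The three regions (no discard below THmin, probabilistic discard between THmin and THmax, always discard above THmax) can be sketched as follows. This is a simplified illustration, not a full RED implementation: it uses a plain exponentially weighted moving average and a linear ramp for Pd, and it omits refinements such as idle-time handling and spacing of drops. The class name and the default `max_p` and `weight` values are assumptions chosen only for the example.

```python
# Simplified RED sketch; class name and default parameters are assumptions.
import random


class RedQueue:
    def __init__(self, th_min, th_max, max_p=0.1, weight=0.002, rng=None):
        self.th_min = th_min      # below this average: never discard
        self.th_max = th_max      # at or above this average: always discard
        self.max_p = max_p        # Pd just below th_max
        self.weight = weight      # EWMA weight for the average queue length
        self.avg = 0.0            # average queue occupancy
        self.rng = rng or random.Random()

    def drop_probability(self):
        """Pd as a function of the current average queue occupancy."""
        if self.avg < self.th_min:
            return 0.0
        if self.avg >= self.th_max:
            return 1.0
        return self.max_p * (self.avg - self.th_min) / (self.th_max - self.th_min)

    def on_arrival(self, queue_len):
        """Update the average with the instantaneous length; return True to discard."""
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        return self.rng.random() < self.drop_probability()


q = RedQueue(th_min=5, th_max=15)
for avg in (3, 10, 14, 20):
    q.avg = avg
    print(f"avg queue {avg:2d}: Pd = {q.drop_probability():.3f}")
```

Because the decision depends on the average rather than the instantaneous queue length, short bursts pass through while sustained overload raises Pd gradually.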

ECN (Explicit Congestion Notification)


Extension of RED: mark instead of drop
RFC 2481 (A Proposal to add Explicit Congestion Notification (ECN) to IP)
IP TOS6 bit indicates congestion: ECN
IP TOS7 bit indicates support for mechanism
Needs cooperation of TCP (or similar protocols)
TCP should act almost as if packet was dropped

[Figure: packet leaves the sender with ECT=1, ECN=0; a congested router marks it ECT=1, ECN=1; the receiver returns a TCP ACK with ECN echo; the sender reduces its congestion window but doesn't do slow-start]
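The difference between reacting to an ECN echo and reacting to a timeout can be sketched with a toy sender. This is not real TCP: the class is hypothetical, windows are counted in whole segments, and halving the window is assumed as the conventional multiplicative decrease.

```python
# Toy model, not real TCP; class name and halving factor are assumptions.
class EcnSender:
    def __init__(self, cwnd=16, ssthresh=64):
        self.cwnd = cwnd          # congestion window, in segments
        self.ssthresh = ssthresh  # slow-start threshold, in segments

    def on_ecn_echo(self):
        """ACK carried an ECN echo: shrink the window, but skip slow-start."""
        self.ssthresh = max(self.cwnd // 2, 2)
        self.cwnd = self.ssthresh

    def on_timeout(self):
        """Retransmission timeout: fall back to slow-start from one segment."""
        self.ssthresh = max(self.cwnd // 2, 2)
        self.cwnd = 1


sender = EcnSender(cwnd=16)
sender.on_ecn_echo()
print("after ECN echo:", sender.cwnd)   # window halved, no slow-start
sender.on_timeout()
print("after timeout: ", sender.cwnd)   # back to one segment
```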

Next steps in signaling (NSIS)

RSVP not widely used for resource reservation


but is used for MPLS path setup
design heavily biased by multicast needs
marginal and after-the-fact security
limited support for IP mobility

Thus, IETF NSIS working group developing new framework for general state management protocol

resource reservation
NAT and firewall control
traffic and QoS measurement
MPLS and lambda path setup

Split into two components:

NSLP: services
NTLP: transport


NSIS

On-path vs. off-path


off-path bandwidth brokers
use router alert option

Discovery of next NTLP or NSLP hop

[Figure: protocol stack with NSLPs (QoS, NAT/FW, measure) over NTLP over UDP, TCP, SCTP]
