de Comunicații (APC)
Transport-Layer Protocols
End-to-end data transport
[Figure: two end hosts, each running Web apps (HTTP), e-mail (SMTP, POP, IMAP), file transfer (FTP), and other apps, connected by end-to-end data transport.]
© Octavian Catrina 3
Review: TCP/IP protocol stack
[Figure: TCP/IP protocol stack.]
⚫ Applications (user space): Web browser, e-mail, ..., other user applications. Application protocols: HTTP, SMTP, FTP, ...
⚫ Transport (OS kernel): TCP, UDP.
⚫ Network (OS kernel): IP, shown with IGMP, ICMP, RIP, OSPF, and with ARP, RARP.
⚫ Data Link: LAN and WAN data link (DL) technology.
Addressing
[Figure: applications on two hosts, each above the DL and PHY layers, addressed end to end by IP address + TCP/UDP port.]
⚫ Which host?
⚫ IP address. 32 bits (IPv4), in IP packet header.
⚫ Which transport protocol?
⚫ Protocol id. 8 bits (IPv4), in IP packet header.
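The two questions above map onto fixed fields of the IPv4 header. A minimal Python sketch of reading them, assuming a 20-byte header without options; the helper name `parse_ipv4_header` and the sample header are illustrative and not taken from the slides:

```python
import socket
import struct

PROTOCOL_NAMES = {1: "ICMP", 6: "TCP", 17: "UDP"}  # IANA protocol numbers

def parse_ipv4_header(header: bytes) -> dict:
    """Extract the addressing fields from the first 20 bytes of an IPv4 header."""
    (ver_ihl, _tos, _total_len, _ident, _flags_frag,
     _ttl, protocol, _checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", header[:20])
    return {
        "version": ver_ihl >> 4,
        "protocol_id": protocol,                      # 8 bits: which transport protocol
        "protocol": PROTOCOL_NAMES.get(protocol, "other"),
        "src": socket.inet_ntoa(src),                 # 32 bits: sending host
        "dst": socket.inet_ntoa(dst),                 # 32 bits: receiving host
    }

# Hand-built sample header (header checksum left at 0 for brevity).
sample = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, 0, 64, 6, 0,
                     socket.inet_aton("139.29.100.11"),
                     socket.inet_aton("141.85.43.8"))
info = parse_ipv4_header(sample)
```

Here `info["protocol_id"]` is 6, so the payload would be handed to TCP on the destination host.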
TCP/UDP ports and the Client-Server model
[Figure: distributed application running over a transport protocol.]
Server: Wait for service request → Receive request → Serve request → Send result.
Client: Start → Send request → Receive result → Stop.
⚫ Server
⚫ Listens continuously for requests on a port known to clients.
Reserved server ports: < 1024 (see RFC 1700).
However, many servers use port numbers > 1024.
⚫ Client
⚫ A currently unused port (≥ 1024) is dynamically allocated to the client for
the duration of its life. The client issues requests to the known server port.
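The server and client roles above can be sketched with Python's standard `socket` module. This is only an illustration on the loopback interface: the function names are hypothetical, and the server binds to port 0 (any free port) so the sketch runs without privileges, whereas a real server would bind to its known port.

```python
import socket
import threading

def serve_one_request(server_sock: socket.socket) -> None:
    """Server side: wait for a request, serve it, send the result."""
    conn, _client_addr = server_sock.accept()       # wait for service request
    with conn:
        request = conn.recv(1024)                   # receive request
        conn.sendall(b"result for " + request)      # serve request, send result

def demo(server_port: int = 0):
    """Run one request on localhost; return (server_port, client_port, reply)."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", server_port))    # port 0: let the OS pick one
    server_sock.listen(1)
    server_port = server_sock.getsockname()[1]
    worker = threading.Thread(target=serve_one_request, args=(server_sock,))
    worker.start()

    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client:
        client.connect(("127.0.0.1", server_port))  # OS allocates the client port
        client_port = client.getsockname()[1]
        client.sendall(b"request")
        reply = b""
        while chunk := client.recv(1024):           # read until the server closes
            reply += chunk

    worker.join()
    server_sock.close()
    return server_port, client_port, reply
```

The client never chooses its own port: `getsockname()` only reports the number the operating system allocated for the lifetime of the connection.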
Example: HTTP server and clients
[Figure: hosts neptun.elc.ro (141.85.43.8), hugo.int.fr (139.29.100.11), and zola.int.fr (139.29.35.18).]
[Figure: UDP datagram format, in 32-bit rows (bits 0-15 | 16-31).]
Pseudo-header: Source IP address
               Destination IP address
               0 | Protocol (17) | UDP datagram length
IP header (20 bytes + opt.)
UDP header (8 bytes): Source UDP port | Destination UDP port
                      Length | Checksum
UDP data: Data
⚫ Pseudo-header
⚫ Part of IP header contents. Accompanies UDP datagram at the
interface between UDP and IP.
⚫ UDP checksum
⚫ Covers UDP datagram and pseudo-header.
⚫ Checksum computation is optional.
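The checksum coverage described above can be made concrete with a short Python sketch. It assumes IPv4 and uses the standard 16-bit one's-complement Internet checksum; the function names, addresses, ports, and payload are illustrative only.

```python
import socket
import struct

def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data, zero-padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return total

def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    """Source IP, destination IP, zero byte, protocol (17), UDP datagram length."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!BBH", 0, 17, udp_length))

def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> int:
    length = 8 + len(payload)                      # UDP header + data
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    covered = pseudo_header(src_ip, dst_ip, length) + header + payload
    checksum = ~ones_complement_sum(covered) & 0xFFFF
    return checksum or 0xFFFF                      # 0 on the wire means "no checksum"

payload = b"hello"
checksum = udp_checksum("139.29.100.11", "141.85.43.8", 40000, 53, payload)
```

A receiver repeats the sum over the pseudo-header and the datagram with the checksum field filled in; an intact datagram sums to 0xFFFF.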
TCP packets ("segments")
[Figure: TCP segment format. Pseudo-header: Source IP address; Destination IP address; 0 | Protocol | TCP segment length. Header bit positions marked: 0, 4, 10, 16, 31.]
Overview of TCP operation
[Figure: TCP connection life cycle between an active opener (left) and a passive opener (right).]
Active side                      Segment            Passive side
Open-Active: SYN-SENT            SYN, ... →         Open-Passive: LISTEN
                                 ← SYN+ACK, ...     SYN-RCVD
Open-Success: ESTABLISHED        ACK, ... →         Open-Success: ESTABLISHED
Send(dt[100])                    ..., dt[100] →     Receive(dt[100])
                                 ← ACK, ...
Close: FIN-WAIT-1                FIN, ... →         Closing: CLOSE-WAIT
FIN-WAIT-2                       ← ACK, ...
Terminate: TIME-WAIT             ← FIN, ...         Close: LAST-ACK
                                 ACK, ... →         Terminate: CLOSED
CLOSED
TCP state machine
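[Figure: TCP state transition diagram for connection opening, shown next to the message sequence between users U-1, U-2 and TCP-1, TCP-2. Transitions are labeled event/action.]
⚫ CLOSED → LISTEN on Open-Passive.
⚫ CLOSED → SYN-SENT on Open-Active/SYN.
⚫ LISTEN → SYN-RCVD on SYN/SYN+ACK.
⚫ SYN-SENT → ESTABLISHED on SYN+ACK/ACK.
⚫ SYN-RCVD → ESTABLISHED on ACK.
⚫ The diagram also labels Close/ and RST/ transitions.
⚫ Message sequence: Open-Active and Open-Passive, [SYN, ...], [SYN+ACK, ...], [ACK, ...], Open-Success on both sides, then Send(100) carried as [..., data(100)].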
TCP state machine (cont.)
⚫ LISTEN - waiting for a connection request (CTRL=SYN) from any remote TCP.
⚫ SYN-SENT - waiting for a matching reply after having sent a connection request
(CTRL=SYN).
⚫ SYN-RECEIVED - waiting for a matching connection acknowledgment
(CTRL=ACK) after having received a connection request (CTRL=SYN) and
replied (CTRL=SYN+ACK).
⚫ ESTABLISHED - connection is open, user data can be sent and received.
⚫ FIN-WAIT-1 - waiting for connection termination request from the remote TCP
(CTRL=FIN+ACK), or acknowledgment of the termination request previously sent.
⚫ FIN-WAIT-2 - waiting for a connection termination request from the remote TCP.
⚫ CLOSE-WAIT - waiting for a connection termination request from the local user.
⚫ CLOSING - waiting for connection termination request acknowledgment from
remote TCP.
⚫ LAST-ACK - waiting for acknowledgment of the connection termination request
previously sent to the remote TCP (after acknowledging its termination request).
⚫ TIME-WAIT - waiting for enough time to pass to be sure the remote TCP received
the acknowledgment of its connection termination request.
⚫ CLOSED - no connection active or pending.
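The state descriptions above can be turned into a small transition table. The Python sketch below is a simplification under stated assumptions: it covers only the normal open and close paths (no RST handling, no simultaneous open, no retransmissions), and the event names such as `rcv_SYN` and `timeout_2MSL` are labels chosen here, not taken from the slides.

```python
# (state, event) -> (segment sent in response or None, next state)
TRANSITIONS = {
    ("CLOSED", "open_passive"): (None, "LISTEN"),
    ("CLOSED", "open_active"): ("SYN", "SYN-SENT"),
    ("LISTEN", "rcv_SYN"): ("SYN+ACK", "SYN-RECEIVED"),
    ("SYN-SENT", "rcv_SYN+ACK"): ("ACK", "ESTABLISHED"),
    ("SYN-RECEIVED", "rcv_ACK"): (None, "ESTABLISHED"),
    ("ESTABLISHED", "close"): ("FIN", "FIN-WAIT-1"),
    ("ESTABLISHED", "rcv_FIN"): ("ACK", "CLOSE-WAIT"),
    ("FIN-WAIT-1", "rcv_ACK"): (None, "FIN-WAIT-2"),
    ("FIN-WAIT-1", "rcv_FIN"): ("ACK", "CLOSING"),
    ("FIN-WAIT-2", "rcv_FIN"): ("ACK", "TIME-WAIT"),
    ("CLOSING", "rcv_ACK"): (None, "TIME-WAIT"),
    ("CLOSE-WAIT", "close"): ("FIN", "LAST-ACK"),
    ("LAST-ACK", "rcv_ACK"): (None, "CLOSED"),
    ("TIME-WAIT", "timeout_2MSL"): (None, "CLOSED"),
}

def run(events, state="CLOSED"):
    """Return the list of states visited while processing the events in order."""
    path = [state]
    for event in events:
        _segment, state = TRANSITIONS[(state, event)]
        path.append(state)
    return path

# The side that opens actively and closes first.
active_path = run(["open_active", "rcv_SYN+ACK", "close",
                   "rcv_ACK", "rcv_FIN", "timeout_2MSL"])
# The side that opens passively and closes second.
passive_path = run(["open_passive", "rcv_SYN", "rcv_ACK",
                    "rcv_FIN", "close", "rcv_ACK"])
```

An event that is not valid in the current state raises `KeyError`, which is enough for a sketch; a real TCP would answer many of those cases with a reset.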
⚫ Main functions
⚫ Error control.
⚫ Flow control.
⚫ Congestion control.
Error control: Data acknowledgement
[Figure: message sequence between User A / TCP A and TCP B / User B, both ESTABLISHED.]
A: Send(data[500])
A → B: ACK, seq=s1, ack=s2, data[500]
B: Receive(data[500])
B → A: ACK, seq=s2, ack=s1+500
A: Send(data[300])
A → B: ACK, seq=s1+500, ack=s2, data[300]
A: Send(data[200]); B: Receive(data[300])
A → B: ACK, seq=s1+800, ack=s2, data[200]
B: Send(data[400])
B → A: ACK, seq=s2, ack=s1+800
B: Receive(data[200])
B → A: ACK, seq=s2, ack=s1+1000, data[400]
A: Receive(data[400])
A → B: ACK, seq=s1+1000, ack=s2+400
TCP segment header fields used for error control:
• Sequence number (SEG.SEQ)
• Acknowledgment number (SEG.ACK)
• Checksum.
TCP state variables used for error control:
• Send Sequence Variables: SND.UNA - Send Unacknowledged; SND.NXT - Send Next
• Receive Sequence Variables: RCV.NXT - Receive Next
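To see how SND.UNA, SND.NXT, and RCV.NXT evolve, the exchange on this slide can be replayed in a few lines of Python. This is a sketch under simplifying assumptions: no loss, no reordering, no sequence-number wraparound, and the concrete values s1 = 1000 and s2 = 5000 are chosen only for illustration.

```python
from dataclasses import dataclass

@dataclass
class Endpoint:
    snd_una: int  # SND.UNA: oldest unacknowledged sequence number
    snd_nxt: int  # SND.NXT: next sequence number to send
    rcv_nxt: int  # RCV.NXT: next sequence number expected from the peer

    def send(self, length: int) -> dict:
        """Build a segment carrying `length` data octets (0 for a pure ACK)."""
        segment = {"seq": self.snd_nxt, "ack": self.rcv_nxt, "len": length}
        self.snd_nxt += length
        return segment

    def receive(self, segment: dict) -> None:
        # In-order data advances RCV.NXT.
        if segment["len"] and segment["seq"] == self.rcv_nxt:
            self.rcv_nxt += segment["len"]
        # An acknowledgment of new data advances SND.UNA.
        if self.snd_una < segment["ack"] <= self.snd_nxt:
            self.snd_una = segment["ack"]

def exchange(sender: Endpoint, receiver: Endpoint, length: int) -> dict:
    segment = sender.send(length)
    receiver.receive(segment)
    return segment

s1, s2 = 1000, 5000
a = Endpoint(snd_una=s1, snd_nxt=s1, rcv_nxt=s2)
b = Endpoint(snd_una=s2, snd_nxt=s2, rcv_nxt=s1)
trace = [
    exchange(a, b, 500),  # seq=s1,      ack=s2,      data[500]
    exchange(b, a, 0),    # seq=s2,      ack=s1+500
    exchange(a, b, 300),  # seq=s1+500,  ack=s2,      data[300]
    exchange(b, a, 0),    # seq=s2,      ack=s1+800
    exchange(a, b, 200),  # seq=s1+800,  ack=s2,      data[200]
    exchange(b, a, 400),  # seq=s2,      ack=s1+1000, data[400]
    exchange(a, b, 0),    # seq=s1+1000, ack=s2+400
]
```

At the end both sides have SND.UNA equal to SND.NXT, meaning that nothing is left unacknowledged.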
Basic data retransmission
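[Figure: message sequence between User A / TCP A and TCP B / User B, both ESTABLISHED. The first data segment does not reach TCP B.]
A: Send(data[500])
A → B: ACK, seq=s1, data[500] (lost)
A: Send(data[300])
A → B: ACK, seq=s1+500, data[300]; B stores it in its buffer (500 missing, 300 stored)
A: Send(data[200])
A → B: ACK, seq=s1+800, data[200]
B → A: ACK, ack=s1
A: Timeout. Retransmission of the data from s1 to s1+500, triggered by timeout.
A → B: ACK, seq=s1, data[500]; B's buffer now holds 500, 300, 200
B → A: ACK, ack=s1+1000
B: Receive(data[1000])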
[Figure: RTT measurement between TCP A and TCP B. Marked times: Tack[k] and Tack[k+1], with ACK, ack=s1+800 in between; RTT[k+1] = Tack[k+1] - Tdata[k+1].]
Selective retransmission
[Figure: message sequence between User A / TCP A and TCP B / User B, both ESTABLISHED. The first data segment does not reach TCP B.]
A: Send(data[500])
A → B: ACK, seq=s1, data[500] (lost)
A: Send(data[300])
A → B: ACK, seq=s1+500, data[300]; B stores it in its buffer (500 missing, 300 stored)
B → A: ACK, ack=s1, sack=(s1+500 → s1+800)
A: Send(data[200])
A → B: ACK, seq=s1+800, data[200]; B's buffer: 500 missing, 300 and 200 stored
B → A: ACK, ack=s1, sack=(s1+500 → s1+1000)
A: Retransmission of the data from s1 to s1+500, triggered by selective ack.
A → B: ACK, seq=s1, data[500]; B's buffer now holds 500, 300, 200
B → A: ACK, ack=s1+1000
B: Receive(data[1000])
If the SACK option is supported and enabled, the receiver uses selective
acknowledgments to tell the sender which out-of-order data (beyond RCV.NXT)
is saved in its buffer, and hence which data has to be retransmitted.
This gives faster recovery, especially when multiple data segments are lost.
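The receiver's side of this exchange can be sketched as follows. The Python class below is a simplified model, not the exact option encoding: it keeps out-of-order byte ranges as half-open `(start, end)` pairs, reports all of them in ascending order, and ignores the limit on the number of SACK blocks that fit in a real TCP header. The class and method names are illustrative.

```python
class SackReceiver:
    def __init__(self, rcv_nxt: int):
        self.rcv_nxt = rcv_nxt   # next in-order sequence number expected
        self.blocks = []         # sorted, disjoint (start, end) ranges beyond rcv_nxt

    def on_segment(self, seq: int, length: int):
        """Store the data and return (cumulative ack, list of SACK blocks)."""
        start, end = max(seq, self.rcv_nxt), seq + length
        if start < end:          # ignore data that is entirely old
            merged = []
            for s, e in sorted(self.blocks + [(start, end)]):
                if merged and s <= merged[-1][1]:      # overlaps or touches previous
                    merged[-1] = (merged[-1][0], max(merged[-1][1], e))
                else:
                    merged.append((s, e))
            self.blocks = merged
        # If the gap at rcv_nxt is filled, deliver everything that is now in order.
        if self.blocks and self.blocks[0][0] == self.rcv_nxt:
            self.rcv_nxt = self.blocks.pop(0)[1]
        return self.rcv_nxt, list(self.blocks)

s1 = 0
receiver = SackReceiver(rcv_nxt=s1)
ack_after_300 = receiver.on_segment(s1 + 500, 300)  # first segment still missing
ack_after_200 = receiver.on_segment(s1 + 800, 200)
ack_after_retx = receiver.on_segment(s1, 500)       # retransmission fills the gap
```

With s1 = 0 the three results correspond to the three acknowledgments in the sequence above: ack=s1 with one block, ack=s1 with the block extended to s1+1000, and finally ack=s1+1000 with no blocks.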
Example: Fast retransmission, no SACK
HTTP data transfer over TCP connection from 1.1.4.2 to 1.1.3.2 without SACK.
Fast retransmission: The sender retransmits after receiving three duplicate acknowledgements.
Data segment length: 1024 octets. Single lost packet.
[Figure: packet trace. Lost data: seq [78354, 79378), 1 data segment.]
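The sender-side trigger can be sketched as a counter over incoming acknowledgment numbers. This Python sketch simplifies what counts as a duplicate (a real TCP also checks that the ACK carries no data and does not change the advertised window), and the ACK values other than 78354 are made up for illustration.

```python
DUP_ACK_THRESHOLD = 3  # duplicate ACKs needed before fast retransmission

def fast_retransmit_points(acks):
    """Return (index, ack) pairs at which fast retransmission would be triggered."""
    triggers = []
    last_ack, dup_count = None, 0
    for index, ack in enumerate(acks):
        if ack == last_ack:
            dup_count += 1
            if dup_count == DUP_ACK_THRESHOLD:
                triggers.append((index, ack))      # retransmit data starting at `ack`
        else:
            last_ack, dup_count = ack, 0           # new data acknowledged
    return triggers

# The segment starting at 78354 is lost; later segments each trigger a duplicate ACK.
acks = [77330, 78354, 78354, 78354, 78354, 83474]
triggers = fast_retransmit_points(acks)
```

The first ACK for 78354 is the original one; the next three are duplicates, so the retransmission is triggered by the ACK at index 4.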
Selective acknowledgment (SACK)
[Figure: captured ACK segment with annotations.]
⚫ Cumulative acknowledgment: all data has been received up to sequence number 162232.
⚫ Selective acknowledgment option (SACK): further data has been received from sequence number 163346 to 164370.
Example: Selective retransmission (1)
HTTP data transfer over TCP connection from 1.1.4.2 to 1.1.3.2 with selective acknowledgements (SACK)
Data segment length: 1024 octets. SLE: SACK Left Edge. SRE: SACK Right Edge.
Single lost packet.
Example: Selective retransmission (2)
HTTP data transfer over TCP connection from 1.1.4.2 to 1.1.1.2 with selective acknowledgements (SACK)
Data segment length: 1024 octets. SLE: SACK Left Edge. SRE: SACK Right Edge.
Congested network path, multiple lost packets.
[Figure: packet trace showing the retransmission of the 4 lost data segments.]
Flow control
⚫ Allows the receiver to slow down a faster transmitter
⚫ End-to-end flow control using sliding-window mechanism.
[Figure: a sender transmits data (by sequence number) to a receiver, which is the bottleneck.]
Flow control: Sender/receiver windows
Flow control: example
Congestion control
⚫ Limits transmission to avoid network congestion
⚫ As congestion is building up, IP routers start dropping packets.
Also, the transfer delay increases (due to queuing delay).
⚫ TCP congestion control adjusts the transmission rate according
to implicit congestion signals from the network.
Assumes that packets are lost due to congestion rather than bit errors.
Details in next section.
[Figure: two LANs connected through an IP network (the bottleneck). Dropped packets, detected by error control, make the TCP transmitter slow down.]
Graceful close
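[Figure: message sequence between User A / TCP A and TCP B / User B, both ESTABLISHED.]
A: Close; TCP A enters FIN-WAIT-1 (closing stream A→B)
A → B: FIN, ACK, seq=s1, ack=s2
B: Closing indication to User B; TCP B enters CLOSE-WAIT (stream A→B closed)
B → A: ACK, seq=s2, ack=s1+1
A: TCP A enters FIN-WAIT-2 (stream A→B closed)
B: Close; TCP B enters LAST-ACK (closing stream B→A)
B → A: FIN, ACK, seq=s2, ack=s1+1
A: Terminate; TCP A enters TIME-WAIT
A → B: ACK, seq=s1+1, ack=s2+1
B: Terminate; TCP B enters CLOSED (streams A↔B closed)
A: TCP A enters CLOSED after TIME-WAIT (streams A↔B closed)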
Closing: avoiding anomalies
⚫ TIME-WAIT state
⚫ The TCP endpoint that sends the last ACK during the
closing procedure must delay the release of the
connection's state.
⚫ Duration: 2MSL, where MSL = Maximum Segment
Lifetime (e.g., 1-2 min.).
⚫ Purpose of the TIME-WAIT state
⚫ Allow recovery of the last closing handshake (when the
last ACK is lost and hence the last FIN is retransmitted).
⚫ Prevent the reuse of the connection's address pair as
long as its packets can survive (2MSL) in the network.
Avoid interference between successive connection
instances.
Congestion in IP networks
Non-responsive Flows
Packet forwarding model
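[Figure: routers R1 and R2 on one side and R5 and R6 on the other, joined through R3 and R4 by the bottleneck link. Routers have a limited packet queue (buffer) size; packets are dropped when the queue overflows.]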
Example: Congestion in IP networks
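[Figure: flow 1 (rate r1) enters at R1 and leaves at R5; flow 2 (rate r2) enters at R2 and leaves at R6; both cross the R3-R4 link.]
⚫ Link rates shown: r=100 (at R1 and at R2), r=10, r=20 (R3-R4), r=1.
⚫ Flow rates entering R3: r1=10, r2=90. On the R3-R4 link: r1=2, r2=18. Delivered: r1=2, r2=1.
⚫ Overall throughput: r1+r2 = 2+1 = 3.
⚫ But we could get: r1+r2 = 10+1 = 11.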
[Figure: throughput vs. load and delay vs. load, with regions labeled under-utilization, saturation, over-utilization, and congestion collapse.]
⚫ Knee: point after which
⚫ throughput increases slowly.
⚫ delay increases quickly.
⚫ Cliff: point after which
⚫ throughput decreases quickly to zero - congestion collapse.
⚫ delay goes to infinity.
Congestion experiments (2)
Test 2: H1 sends at r1 = 80 (UDP)
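[Figure: hosts H0 and H1 send through R1, across the bottleneck link (BW = 20) to R2, toward H5 and H6. Networks N3 and N4 are also labeled.]
⚫ Access links: BW0 = 100, BW1 = 100, BW5 = 10, BW6 = 10.
⚫ Offered rates: r0 = 8, r1 = 80.
⚫ Rates on the bottleneck link: r0' ≈ 1.82, r1' ≈ 18.18, total r3 = 20.
⚫ Received rates: r5 ≈ 1.8 at H5, r6 ≈ 10 at H6.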
Congestion experiments (3)
Test 3: H1 sends at r1 = 80 (UDP), BW6 = 1
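[Figure: same topology as in Test 2, with two bottleneck links: R1-R2 (BW = 20) and the link to H6 (BW6 = 1). Networks N3 and N4 are also labeled.]
⚫ Access links: BW0 = 100, BW1 = 100, BW5 = 10, BW6 = 1.
⚫ Offered rates: r0 = 8, r1 = 80.
⚫ Rates on the R1-R2 link: r0' ≈ 1.82, r1' ≈ 18.18, total r3 = 20.
⚫ Received rates: r5 ≈ 1.8 at H5, r6 ≈ 1 at H6.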
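The rates in Tests 2 and 3 are consistent with a simple fluid model in which an overloaded FIFO link drops traffic in proportion to each flow's offered load. The Python sketch below reproduces the numbers under that assumption; the model, the function names, and the pairing of H0 with H5 and H1 with H6 are inferred from the figures rather than stated on the slides.

```python
def fifo_share(offered, capacity):
    """Rates leaving a FIFO tail-drop link, assuming drops proportional to load."""
    total = sum(offered)
    if total <= capacity:
        return list(offered)                 # no overload, nothing is dropped
    return [rate * capacity / total for rate in offered]

def dumbbell(r0, r1, bw_bottleneck, bw5, bw6):
    """Flow 0 goes H0 -> H5 and flow 1 goes H1 -> H6 across a shared link."""
    r0_out, r1_out = fifo_share([r0, r1], bw_bottleneck)
    return {
        "r0'": r0_out,
        "r1'": r1_out,
        "r5": min(r0_out, bw5),              # last link toward H5
        "r6": min(r1_out, bw6),              # last link toward H6
    }

test2 = dumbbell(r0=8, r1=80, bw_bottleneck=20, bw5=10, bw6=10)
test3 = dumbbell(r0=8, r1=80, bw_bottleneck=20, bw5=10, bw6=1)
```

In Test 3 the non-responsive UDP flow still occupies about 18.18 units of the shared link although only 1 unit can reach H6, which is why flow 0 stays near 1.82 instead of its offered 8.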
Congestion in IP networks
TCP Congestion Control
IP and TCP congestion control
⚫ IP network behavior
When congestion builds up, packets accumulate in routers' packet queues.
The transfer delay increases, the routers start dropping packets.
⚫ TCP host behavior
TCP monitors the amount of data in transit, the round-trip time, and the lost packets.
It assumes that lost packets are congestion symptoms.
⚫ IP congestion control
Routers use queue management mechanisms to control the queue size and decide which packets to forward or drop and when.
⚫ TCP congestion control
TCP limits its transmission using a congestion window, dynamically adjusted based on congestion symptoms.
Goals
⚫ Efficiency: Avoid overload (collapse) as well as underutilization.
⚫ Fairness: Allocate a fair share of resources to all flows.
⚫ Smooth convergence (low oscillations) to efficiency and fairness.
TCP data transfer (1/3)
[Figure: data pipe with rate R and delay D carrying DATA one way and ACKs back; transmission rate R'. Amount of data in the pipe: N = R·D.]
[Figure: message sequence over one RTT (Round-Trip Time).]
A → B: seq=s1, data[1000]
A → B: seq=s1+1000, data[1000]
A → B: seq=s1+2000, data[1000]
B → A: ACK, ack=s1+1000
A → B: seq=s1+3000, data[1000]
B → A: ACK, ack=s1+2000
A → B: seq=s1+4000, data[1000]
A → B: seq=s1+5000, data[1000]
B → A: ACK, ack=s1+3000
B → A: ACK, ack=s1+4000
...
⚫ The TCP acknowledgment mechanism allows TCP to fill the pipe
(handle multiple unacknowledged data segments), and to estimate
the current RTT and the current amount of data in the pipe.
Data sent during one RTT ≈ unacknowledged data = SND.NXT - SND.UNA.
TCP data transfer (3/3)
[Figure: data pipe with rate R and delay D carrying DATA one way and ACKs back; amount of data in the pipe: N = R·D. The TCP sender does not know R and D (R=? D=?); it keeps CWND ≈ R·RTT and transmits at rate R' ≈ CWND / RTT.]
⚫ TCP adjusts its transmission rate to the available data rate on the
network path:
⚫ Maintains a congestion window CWND which approximates R·RTT.
⚫ Limits transmission such that the amount of unacknowledged data
(SND.NXT - SND.UNA) is less than CWND (also less than the
window advertised by the receiver, for end-to-end flow control).
⚫ Advances the window (and sends more data) when ACKs arrive,
indicating that some data was delivered (hence exited the pipe).
Therefore, ACKs also provide transmission timing (self-clocking).
⚫ How can CWND be adjusted dynamically so as to satisfy the goals
(efficiency, fairness, smooth convergence)?
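The relations on this slide can be checked with a few lines of arithmetic. The Python sketch below assumes windows counted in bytes and rates in bits per second; the function names and the sample numbers (a 10 Mbit/s path with a 100 ms RTT) are illustrative only.

```python
def pipe_size_bytes(rate_bps: float, rtt_s: float) -> float:
    """R·RTT: bytes that must be in flight to keep a path of rate R busy."""
    return rate_bps * rtt_s / 8

def usable_window(cwnd: int, rwnd: int, snd_una: int, snd_nxt: int) -> int:
    """Bytes the sender may still transmit right now.

    Unacknowledged data (SND.NXT - SND.UNA) is limited by both the congestion
    window and the window advertised by the receiver.
    """
    unacknowledged = snd_nxt - snd_una
    return max(0, min(cwnd, rwnd) - unacknowledged)

def approx_rate_bps(cwnd: int, rtt_s: float) -> float:
    """R' ≈ CWND / RTT, converted from bytes to bits per second."""
    return cwnd * 8 / rtt_s

ideal_cwnd = pipe_size_bytes(rate_bps=10_000_000, rtt_s=0.1)   # about 125000 bytes
room = usable_window(cwnd=125_000, rwnd=65_535, snd_una=1_000, snd_nxt=31_000)
rate = approx_rate_bps(cwnd=125_000, rtt_s=0.1)                # about 10 Mbit/s
```

In this example the receiver's advertised window (65535 bytes), not CWND, is the tighter limit, so only 35535 more bytes may be sent until further ACKs arrive.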
Efficiency and fairness
[Figure: control system model. Sources 1, 2, ..., n send at rates x1, x2, ..., xk, ..., xn toward the goals and receive binary feedback: decrease xk or increase xk.]
increase: xk(t+1) = aI·xk(t) + bI
decrease: xk(t+1) = aD·xk(t) + bD
[Figure: example with 2 flows, plotting flow 2 rate x2 against flow 1 rate x1. Fairness line: x1 = x2 (on the side where x2 > x1, too much for x2; where x1 > x2, too much for x1). Efficiency line: x1+x2 = C (x1+x2 < C is under-utilization; x1+x2 > C is over-utilization).]
⚫ This system converges to data rates meeting the efficiency and fairness goals only for
additive or multiplicative increase and multiplicative decrease.
⚫ Best: Additive Increase & Multiplicative Decrease (AIMD):
Additive Increase: xk(t+1) = xk(t) + bI
Multiplicative Decrease: xk(t+1) = aD·xk(t)
[Figure: AIMD trajectory in the same plane. From (x1, x2), a multiplicative decrease leads to (aD·x1, aD·x2), and the following additive increase to (aD·x1+bI, aD·x2+bI), moving toward the fairness line near the efficiency line.]
⚫ Basic solution used by TCP congestion control mechanisms (+ enhancements).
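The convergence claim for AIMD can be observed in a small simulation. The Python sketch below assumes two flows that receive the same binary feedback at the same time (synchronized updates) and uses illustrative parameter values bI = 1, aD = 0.5, and C = 10.

```python
def aimd(x1, x2, capacity, b_i=1.0, a_d=0.5, steps=200):
    """Simulate two AIMD flows sharing one link; return the list of (x1, x2)."""
    history = [(x1, x2)]
    for _ in range(steps):
        if x1 + x2 > capacity:               # feedback: decrease
            x1, x2 = a_d * x1, a_d * x2      # multiplicative decrease
        else:                                # feedback: increase
            x1, x2 = x1 + b_i, x2 + b_i      # additive increase
        history.append((x1, x2))
    return history

history = aimd(x1=1.0, x2=9.0, capacity=10.0)
final_x1, final_x2 = history[-1]
```

Additive increase leaves the difference x2 - x1 unchanged, while each multiplicative decrease scales it by aD, so the two rates approach the fairness line while their sum keeps oscillating around the efficiency line.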
TCP congestion control
⚫ Congestion window (CWND) adjustment
⚫ At steady state, CWND oscillates around the current optimal
value, CWND ≈ R·RTT, for the throughput R that the network
path can offer to the TCP flow, and the current RTT.
[Figure: congestion window vs. time, a sawtooth of additive increase and multiplicative decrease.]
Slow Start, Congestion Avoidance (3/3)
Data retransmission
[Figure: two message sequences in which the segment with seq=s1 is lost. In both, the sender transmits seq=s1, s1+1000, s1+2000, s1+3000 (data[1000] each) and receives three ACKs with ack=s1, the last 2 being duplicate ACKs.]
⚫ In one sequence, the sender could retransmit (?) seq=s1, data[1000] after the 2 duplicate ACKs; the receiver then returns ACK, ack=s1+3000.
⚫ In the other, the retransmission timer expires and triggers the timeout retransmission of seq=s1, data[1000]; the receiver then returns ACK, ack=s1+3000.
RFC 2581 requires the reception of three duplicate ACKs before fast retransmission.
⚫ Timeout retransmission
⚫ The timer is adjusted based on RTT measurements, but set to
a conservative value (substantially larger than RTT).
⚫ Duplicate ACKs
⚫ When data arrives out of order due to the loss of previous
segments, the receiver returns an ACK indicating the expected
sequence number. Can be used to trigger earlier retransmission.
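How a conservative timer can be derived from RTT measurements is sketched below in Python. The slides do not give the formulas; this sketch assumes the smoothed estimator standardized in RFC 6298 (gains 1/8 and 1/4, variance multiplier 4, minimum RTO of 1 second), and the class name and sample values are illustrative.

```python
class RtoEstimator:
    ALPHA, BETA, K = 1 / 8, 1 / 4, 4      # constants assumed from RFC 6298

    def __init__(self, clock_granularity: float = 0.001, min_rto: float = 1.0):
        self.srtt = None                  # smoothed RTT
        self.rttvar = None                # RTT variation
        self.g = clock_granularity
        self.min_rto = min_rto
        self.rto = 1.0                    # initial value before any measurement

    def on_rtt_sample(self, r: float) -> float:
        """Update the estimator with one RTT measurement (seconds); return RTO."""
        if self.srtt is None:             # first measurement
            self.srtt = r
            self.rttvar = r / 2
        else:                             # rttvar is updated before srtt
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - r)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        self.rto = max(self.min_rto, self.srtt + max(self.g, self.K * self.rttvar))
        return self.rto

estimator = RtoEstimator(min_rto=0.0)     # minimum disabled to expose the raw value
first_rto = estimator.on_rtt_sample(0.100)
second_rto = estimator.on_rtt_sample(0.100)
```

With a steady 100 ms RTT the raw timeout starts at 300 ms and shrinks slowly, always staying above the measured RTT; with the default 1 second minimum it would be clamped to 1 second.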
Fast Retransmit, Fast Recovery
[Figure: cwnd vs. time for two variants. TCP Tahoe (fast retransmit): Slow Start, Congestion Avoidance, then after the loss Slow Start and Congestion Avoidance again. TCP Reno (fast retransmit + fast recovery): Slow Start, Congestion Avoidance, then after the loss Congestion Avoidance continues.]
⚫ Fast Retransmit
⚫ Add another congestion symptom event: three duplicate ACKs.
⚫ Faster than waiting for timeout. Introduced in TCP Tahoe.
⚫ Fast Recovery
⚫ Duplicate ACKs are received ⇒ The network still delivers data
⇒ Light congestion ⇒ Do not empty the pipe, just reduce the
amount of data to half. Set CWND=ssthresh=SentUnacked/2.
⚫ See details in RFC 2581 and RFC 2582. Added in TCP Reno.
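The difference between the two reactions can be sketched as a toy sender in Python. It is a simplification under stated assumptions: windows are in bytes with an illustrative MSS of 1000, the lower bound of 2·MSS on ssthresh follows RFC 2581, and the temporary window inflation that Reno applies during fast recovery is left out.

```python
MSS = 1000  # illustrative segment size in bytes

class Sender:
    def __init__(self, variant: str, ssthresh: int = 16 * MSS):
        self.variant = variant            # "tahoe" or "reno"
        self.cwnd = MSS
        self.ssthresh = ssthresh

    def on_new_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            self.cwnd += MSS                        # slow start
        else:
            self.cwnd += MSS * MSS // self.cwnd     # congestion avoidance

    def on_timeout(self, sent_unacked: int) -> None:
        self.ssthresh = max(sent_unacked // 2, 2 * MSS)
        self.cwnd = MSS                             # empty the pipe, slow start

    def on_three_dup_acks(self, sent_unacked: int) -> None:
        self.ssthresh = max(sent_unacked // 2, 2 * MSS)
        if self.variant == "reno":
            self.cwnd = self.ssthresh               # fast recovery: halve the pipe
        else:
            self.cwnd = MSS                         # Tahoe: back to slow start

tahoe, reno = Sender("tahoe"), Sender("reno")
for sender in (tahoe, reno):
    sender.cwnd = 20 * MSS                          # pretend the window has grown
    sender.on_three_dup_acks(sent_unacked=20 * MSS)
```

After the same loss event both senders have ssthresh = 10000, but Tahoe restarts from one segment while Reno continues from 10000 in congestion avoidance.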
Analysis
Approximation of the TCP behavior
⚫ TCP transmission rate: R ≈ k·L/(T·√q) (bps); congestion window: W
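The approximation can be evaluated numerically. The slide does not define its symbols, so the Python sketch below assumes the usual reading of this square-root model: L is the segment length in bits, T the round-trip time in seconds, q the packet loss probability, and k a constant close to √(3/2) ≈ 1.22. These interpretations and the sample values are assumptions.

```python
import math

def tcp_rate_bps(segment_bits: float, rtt_s: float, loss_prob: float,
                 k: float = math.sqrt(3 / 2)) -> float:
    """R ≈ k·L / (T·√q), in bits per second."""
    return k * segment_bits / (rtt_s * math.sqrt(loss_prob))

# 1500-byte segments, 100 ms RTT, k = 1 to keep the arithmetic visible.
rate_1pct = tcp_rate_bps(segment_bits=12_000, rtt_s=0.1, loss_prob=0.01, k=1.0)
rate_4pct = tcp_rate_bps(segment_bits=12_000, rtt_s=0.1, loss_prob=0.04, k=1.0)
```

Quadrupling the loss probability halves the achievable rate, and doubling the RTT does the same, which is why long, lossy paths limit TCP throughput so strongly.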
[Figure: simulation setup. Routers R1 and R2 are joined by the bottleneck link. Queue management: FIFO, tail-drop. The simulation traces packets dropped when the queue overflows, monitors TCP cwnd, ..., and monitors end-to-end throughput.]
TCP Tahoe
[Figure: TCP Tahoe simulation plots: bottleneck link (TCP); R1 queue size (capacity: 5000); cwnd with ssthresh, showing the Slow Start and Congestion Avoidance phases.]
TCP Reno
[Figure: TCP Reno simulation plots: bottleneck link (TCP); R1 queue size (capacity: 5000); cwnd with ssthresh.]
TCP (Reno) + UDP/CBR
[Figure: TCP (Reno) + UDP/CBR simulation plots: link (TCP and CBR; CBR start: 80); R1 queue size (<= 5000).]