Alan Kaminsky Department of Computer Science Rochester Institute of Technology 4486 + 2220 = 6706
Home Page
Data Communications and Networks II 4003-541-70/4005-741-70 Spring Quarter 2006
Course Page

4003-541-70/4005-741-70
Data Communications and Networks II
Module 5. Transport Algorithms -- Lecture Notes

Prof. Alan Kaminsky -- Spring Quarter 2006
Rochester Institute of Technology -- Department of Computer Science


Transport Service Design Issues

  • Setup options
    • Connection-oriented
    • Connectionless
       
  • Delivery options
    • Reliable delivery
    • Unreliable delivery
       
  • Content options
    • Stream (unbounded size)
    • Messages (unbounded size)
    • Messages (bounded size)
       
  • Endpoint identification
     
  • Application programming interface
     
  • Segmentation
     
  • Sequence numbering options
    • Give each byte its own sequence number
    • Give each segment its own sequence number
       
  • Segment identification
    • A segment must be uniquely correlated to the connection it is part of
    • Problem: Delayed delivery of segments for previous connections
       
  • Corrupted segments: detection, response
     


TCP Design

  • RFC 793, "Transmission Control Protocol" (September 1981)
     
  • RFC 1122, "Requirements for Internet Hosts -- Communication Layers" (October 1989)
     
  • Setup: Connection-oriented
     
  • Delivery: Reliable delivery
     
  • Content: Stream (unbounded size)
    • A TCP connection carries a data stream (one in each direction)
    • Each data stream can consist of any content (unbounded length)
    • TCP connection provides one output stream for writing outgoing data stream's content
    • TCP connection provides one input stream for reading incoming data stream's content
       
  • Endpoint identification: Port number (16 bits)
     
  • TCP API
    • Class java.net.ServerSocket
    • Class java.net.Socket
       
  • Segmentation
    • Transmitter splits up outgoing stream into segments
    • Receiver tells transmitter the Maximum Segment Size (MSS) during connection setup (TCP header option)
       
  • Sequence numbering: Give each byte its own sequence number
     
  • Segment identification
    • A segment is correlated to a connection by source IP address, source TCP port, destination IP address, and destination TCP port
    • All those fields could be the same for two consecutive connections
    • Problem: Delayed delivery of a segment for the old connection could be interpreted as part of the new connection
    • "Solution:" Make sure old connection's sequence numbers do not overlap with new connection's sequence numbers
      • Requires initial sequence number negotiation during connection setup
      • Requires an elaborate scheme for picking initial sequence numbers
      • Requires an elaborate scheme for dealing with potential sequence number overlap with the old connection during the new connection
         
  • Corrupted segments
    • Detection: Checksum
      • Computed over pseudoheader (certain fields from IP header), TCP header, and TCP data
      • Checksum algorithm specified in RFC 1071, "Computing the Internet Checksum" (September 1988)
    • Response: Ignore a corrupted packet
       
  • TCP segment format
     0                   1                   2                   3   
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |          Source Port          |       Destination Port        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        Sequence Number                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    Acknowledgment Number                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Data |           |U|A|P|R|S|F|                               |
    | Offset| Reserved  |R|C|S|S|Y|I|            Window             |
    |       |           |G|K|H|T|N|N|                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |           Checksum            |         Urgent Pointer        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    Options                    |    Padding    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                             data                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    


RMTP Design

  • RIT Overlay Network (RON) Message Transport Protocol (RMTP)
     
  • Setup: Connection-oriented
     
  • Delivery: Reliable delivery
     
  • Content: Messages (unbounded size)
    • An RMTP connection carries a sequence of messages in each direction
    • Each message can consist of any content (unbounded length)
    • RMTP connection provides a separate output stream for writing each outgoing message's content
    • RMTP connection provides a separate input stream for reading each incoming message's content
       
  • Endpoint identification: Port number (16 bits)
     
  • RMTP API -- Package edu.rit.ron.rmtp
  • Segmentation
    • Transmitter splits up outgoing stream into segments
    • MSS is fixed at (RONP maximum payload size, 536 bytes) - (RMTP header size, 20 bytes) = 516 bytes
       
  • Sequence numbering: Give each segment its own sequence number
     
  • Segment identification
    • A segment is correlated to a connection by connecting RONP address and connection ID
    • Those are designed never to be the same for any two connections (globally unique ID)
    • Consequently, all of TCP's elaborate schemes for dealing with delayed segments go away
    • 64-bit connection ID
      • Use the Java system clock (System.currentTimeMillis())
      • Add in the connecting computer's address left-shifted 32 bits
      • If that's the same as the previous connection ID, add 1
         
  • Corrupted segments
    • Detection: Performed by the UDP layer under RMTP/RONP
    • Response: UDP layer drops a corrupted packet
       
  • RMTP segment format
     0                   1                   2                   3   
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |             Connection ID (most significant word)             |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |             Connection ID (least significant word)            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                 Sequence Number/Port Number*                  |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    Acknowledgment Number                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                     |S|C|F|D|E|                               |
    |       Reserved      |Y|O|I|A|O|            Window             |
    |                     |N|N|N|T|M|                               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                            Payload                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    
    *Connection setup: port number; connection established: sequence number
     


Connection Setup and Teardown


Data Transfer


Flow Control, Part 1


Lost Packet Recovery


Adaptive Timeouts


Flow Control, Part 2

  • A stop-and-wait protocol achieves flow control, but is inefficient on high-latency connections
     
  • Alternative: Window-based flow control
     
  • Window-based flow control in TCP
    • TCP header includes a 16-bit window size field
    • This tells the number of bytes available in the receive buffer of the host that sent the segment
    • Sender can send as many bytes as the receiver's window size
    • Then the sender must stop and wait until the receiver advertises a new window size
    • A window size of 0 is legal and means, "Don't send any data now" -- the receiver closed the window
    • When buffer space becomes available, the receiver sends a segment with a nonzero window size -- the receiver opened the window
    • When the receiver's window is closed, the sender periodically times out and sends a probe segment; the receiver responds with an ack segment stating the window size
       
  • Improving TCP performance
    • Goal: Sender should send data in big chunks, not little chunks
    • Goal: Receiver should ask for data in big chunks, not little chunks
    • Goal: Receiver should piggyback acks, not send separate small ack segments
       
  • Sending data in little chunks, e.g. Telnet or SSH
    • Nagle's algorithm (1984) -- RFC 896, "Congestion Control in IP/TCP Internetworks" (January 1984)
    • The first time, send a little chunk
    • Accumulate little chunks until the ack for the first chunk returns, then send the accumulated chunks as one big chunk
    • Can also send the accumulated chunks before that if they would fill half the receiver's window
    • Can also send the accumulated chunks before that if they would fill the MSS
       
  • Receiving data in little chunks, e.g. a byte-at-a-time application
    • Receiver will oscillate window size between 0 and 1, sender will send one-byte segments -- silly window syndrome
    • Clark's algorithm -- RFC 813, "Window and Acknowledgment Strategy in TCP" (July 1982)
    • Receiver cannot advertise a new window size until the receive buffer can hold the MSS or the receive buffer is half empty, whichever is smaller
       
  • Piggybacking acks
    • If there is data traffic from the sender back to the receiver, the acks can piggyback on the traffic immediately
    • If there is no data traffic from the sender back to the receiver, wait a short time to see if some data traffic shows up
    • If not, then send a separate ack segment
    • Reduces the overhead due to separate ack segments
    • Increases the RTT


Congestion Control

  • Each sending host's Transport Layer sends data at as high as rate as possible while losing as few packets as possible
     
  • Alternative 1: Explicit rate feedback
    • Each router (Network Layer) tells the sending hosts (Transport Layer) how fast to send data
    • Hosts send data no faster than that rate
    • Asynchronous Transfer Mode Available Bit Rate (ATM ABR) service works this way
       
  • Alternative 2: Explicit congestion feedback
    • Each router (Network Layer) tells the sending hosts (Transport Layer) whether the router is congested or not
    • Hosts reduce their sending rate until the routers report they're not congested
    • ATM can also work this way
       
  • Alternative 3: No feedback
    • The routers (Network Layer) don't tell the sending hosts (Transport Layer) anything
    • Hosts must deduce congestion based on network behavior
    • Hosts reduce their sending rate until it appears there is no congestion
    • The Internet (TCP/IP) works this way
       


TCP Congestion Control

  • Deducing congestion: Packet loss events
    • Triple acks
    • Timeouts
    • TCP assumes that if a packet is lost, the reason is congestion, so slow down the sender
      • This is usually a good assumption on sessile networks
      • This is usually a bad assumption on wireless networks
         
  • Controlling sending rate: Congestion window
    • Works exactly the same as the flow control receive window
    • Sender must stop and wait after sending min (receive window, congestion window) bytes
    • For the rest of this section, assume receive window > congestion window, so congestion control (not flow control) determines the sending rate
       
  • Determining congestion window size: Congestion control algorithm
     
  • "TCP Reno" algorithm, first implemented in the Reno release of BSD Unix (1990), most widely used today
    1. Threshold = infinity
    2. Window = 1 * MSS
    3. Send (Window) number of bytes
    4. Wait for success (normal ack) or failure (triple ack, timeout)
    5. If a triple ack occurs:
      1. Threshold = Window / 2 -- "Multiplicative decrease"
      2. Window = Threshold
    6. Else if a timeout occurs:
      1. Threshold = Window / 2 -- "Multiplicative decrease"
      2. Window = 1 * MSS
    7. Else if Window < Threshold:
      1. Window = 2 * Window -- "Slow start"
    8. Else:
      1. Window = Window + 1 * MSS -- "Additive increase"
    9. Go to Step 3
       
  • "TCP Tahoe" algorithm, predecessor of TCP Reno, very similar to TCP Reno except both a triple ack and a timeout initiate a slow start
    1. Threshold = infinity
    2. Window = 1 * MSS
    3. Send (Window) number of bytes
    4. Wait for success (normal ack) or failure (triple ack, timeout)
    5. If a triple ack or a timeout occurs:
      1. Threshold = Window / 2 -- "Multiplicative decrease"
      2. Window = 1 * MSS
    6. Else if Window < Threshold:
      1. Window = 2 * Window -- "Slow start"
    7. Else:
      1. Window = Window + 1 * MSS -- "Additive increase"
    8. Go to Step 3
       
  • Steady-state average throughput of a TCP connection transferring a large quantity of data
     
  • Throughput of a TCP connection transferring a small quantity of data, like a little web page or image

Data Communications and Networks II 4003-541-70/4005-741-70 Spring Quarter 2006
Course Page
Alan Kaminsky Department of Computer Science Rochester Institute of Technology 4486 + 2220 = 6706
Home Page
Copyright © 2006 Alan Kaminsky. All rights reserved. Last updated 22-May-2006. Please send comments to ark­@­cs.rit.edu.