TCP and ATLAS T/DAQ Dec 2002 R. Hughes-Jones Manchester TCP/IP and ATLAS T/DAQ With help from: Richard, HansPeter, Bob, & …


1 TCP/IP and ATLAS T/DAQ
With help from: Richard, HansPeter, Bob, & …

2 Micro Introduction to TCP/IP (1)
- TCP was designed for reliable, bit-wise correct data transfer over slow, unreliable Wide Area Networks.
- Stream orientated: the user has to ensure they have ALL of the message!
- Uses a sliding window to control the data flow. The window is limited by:
  - Transmit buffer size
  - Available space in the receive buffer
  - Congestion window, cwnd
[Figure: TCP sliding window. Regions: data sent and ACKed; sent data buffered waiting for ACK; unsent data that may be transmitted immediately; data to be sent, waiting for the window to open. The sending host advances the marker as data is transmitted; a received ACK advances the trailing edge; the receiver's advertised window advances the leading edge.]
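The window bookkeeping on this slide can be sketched in a few lines of Python. This is an illustration only, not how any real TCP stack is written: the names `usable_window` and `SlidingWindowSender` are hypothetical, and all quantities are counted in bytes for simplicity.

```python
def usable_window(cwnd, rwnd, tx_buffer, bytes_in_flight):
    """Bytes the sender may transmit right now."""
    # The window is bounded by the smallest of the three limits on the slide.
    window = min(cwnd, rwnd, tx_buffer)
    return max(0, window - bytes_in_flight)


class SlidingWindowSender:
    """Toy sender that tracks only the window edges (hypothetical helper)."""

    def __init__(self, cwnd, rwnd, tx_buffer):
        self.cwnd, self.rwnd, self.tx_buffer = cwnd, rwnd, tx_buffer
        self.acked = 0      # trailing edge: everything below is sent and ACKed
        self.next_seq = 0   # marker: everything below has been sent

    def send(self, nbytes):
        """Try to send nbytes; return how many bytes the window allowed."""
        in_flight = self.next_seq - self.acked
        allowed = min(nbytes, usable_window(self.cwnd, self.rwnd,
                                            self.tx_buffer, in_flight))
        self.next_seq += allowed  # sending advances the marker
        return allowed

    def on_ack(self, ack_seq, advertised_window):
        """A received ACK advances the trailing edge and updates rwnd."""
        self.acked = max(self.acked, min(ack_seq, self.next_seq))
        self.rwnd = advertised_window
```

With `cwnd=4000`, `rwnd=3000` and `tx_buffer=8000`, a 5000-byte write is cut to 3000 bytes; the rest has to wait until an ACK opens the window again.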

3 Micro Introduction to TCP/IP (2)
- TCP phases:
  - Slow start: cwnd is initially 1 segment, then increased by 1 segment for each ACK received, giving exponential growth
    - Send 1st packet, get 1 ACK, increase cwnd to 2
    - Send 2 packets, get 2 ACKs, increase cwnd to 4
    - …
  - Congestion avoidance: cwnd increased by 1/cwnd for each ACK (about 1 segment per round trip), giving a linear increase in rate
  - Slow start to congestion avoidance transition is determined by ssthresh
  - Fast Retransmit & Fast Recovery
  - SACKs
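The two growth phases can be sketched as a per-ACK update. This is a simplified model under stated assumptions: cwnd is counted in segments, every segment is ACKed individually, and there is no loss. The function name `cwnd_trace` is hypothetical.

```python
def cwnd_trace(rtts, ssthresh, initial_cwnd=1.0):
    """Return cwnd (in segments) at the start of each round trip."""
    cwnd = initial_cwnd
    trace = [cwnd]
    for _ in range(rtts):
        acks = int(cwnd)  # assumption: one ACK per segment sent this round trip
        for _ in range(acks):
            if cwnd < ssthresh:
                cwnd += 1.0          # slow start: +1 segment per ACK
            else:
                cwnd += 1.0 / cwnd   # congestion avoidance: +1/cwnd per ACK
        trace.append(cwnd)
    return trace
```

With `ssthresh=8`, the first three round trips double the window (1, 2, 4, 8); after that each round trip adds just under one segment.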

4 Micro Introduction to TCP/IP (3)
- TCP takes packet loss as an indication of congestion!
- Lost packets are detected by 2 methods:
  - Timeout: you just don't get an ACK back. [ Timeout = RTT + 3*σ(rtt) ]
  - 3 duplicate ACKs received by the sender
    [Figure: sender sends packets 1 2 3 4 5; receiver returns ACKs 1 2 2 2; sender re-transmits 3; receiver returns ACK 5]
- Action on packet loss:
  - Timeout: enter slow start, set cwnd to 1
  - 3 duplicate ACKs:
    - Set ssthresh to half cwnd, so enter the congestion avoidance phase
    - (Keep sending when duplicate ACKs arrive)
    - Set cwnd to half its original value
- Lose 1 packet at 1 Gbit/s between CERN and the US and it takes 27 min to recover!
- There is a difference between what the protocol says and what the implementation gives you.
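The recovery time after a single loss can be estimated with a back-of-envelope calculation. After the window is halved, congestion avoidance adds about one segment per round trip, so climbing back to the full window takes about W/2 round trips, where W is the window that fills the link. The slide does not give its inputs; the 200 ms RTT and 1500-byte segment size below are assumptions, and `recovery_time_s` is a hypothetical helper.

```python
def recovery_time_s(link_bps, rtt_s, mss_bytes=1500):
    """Rough time to regrow cwnd from W/2 back to W at one segment per RTT."""
    # W: number of segments in flight needed to fill the link for one RTT.
    window_segments = link_bps * rtt_s / (mss_bytes * 8)
    rtts_needed = window_segments / 2   # halved window regrows 1 segment per RTT
    return rtts_needed * rtt_s


if __name__ == "__main__":
    wan = recovery_time_s(1e9, 0.2)      # assumed transatlantic RTT of 200 ms
    lan = recovery_time_s(1e9, 100e-6)   # assumed LAN RTT of 100 us
    print("WAN: %.1f min, LAN: %.2f ms" % (wan / 60, lan * 1e3))
```

Under these assumptions the WAN case comes out a little under 28 minutes, the same order as the figure quoted on the slide, while the LAN case is well under a millisecond. The recovery time scales with the square of the RTT, which is why the effect matters far less inside a local T/DAQ network.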

5 An ATLAS TDAQ Candidate Architecture
- Message flows:
  - L2PU request to ROBs.
  - SFI request to ROBs.
  - Super to L2PU. Low rate, ~230 Hz / L2PU
  - Super to DFM. Grouped accept+reject from L2, ~2 kHz; 1 Super to 1 DFM
  - DFM to SFI. Low rate, ~20 Hz / SFI
  - DFM to ROB. Mcast clears

6 What does TCP/IP mean for T/DAQ?
- Properties of T/DAQ data transfers:
  - Many logical links are involved
  - Links remain alive for a long time: days!
  - Mainly request-response: 1 packet request, generally a 1-2 packet response
  - No continuous high-rate flows, i.e. no streaming
- TCP 3-way handshake is not an important time constraint
- TCP slow start is not important
  - Fragments are small: within / close to slow-start capability
- BW limitation due to congestion avoidance is not important
  - Fragments are small: halving of cwnd is not an issue
- Packet loss recovery
- You can get it out of the box!

7 Event Building: messages SFI - ROB
- Each SFI processes 15 events/s, so the exchange repeats every ~66 ms on average.
- Doubled no. of EB frames out of the SFI & into the ROB; total frames increased by 3/2
  - Extra ROB I/O is used
  - ROB has to compute the ACK
- Assume an SFI request is lost
  - TCP won't time out and re-try for ~35 ms: a long time cf. the RTT
- Assume a ROB response is lost
  - SFI won't get the ACK, so the SFI will time out and re-send the request.
  - ROB won't get its ACK, so TCP will think about timing out and re-trying.
  - Both ends re-try!!
- Assume the ACK from the SFI is lost
  - TCP in the ROB will resend before the next request.
  - ROB resends data you don't want!! You have it already
[Figure: message sequence between SFI Application, TCP, Network and ROB Application. Labels: Req. Event; Need to ACK; Got Data; Send Data; Response 1-2 kbyte in ~100 us; piggyback ACK; Need to ACK; Req. Event]
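The frame-count claim on this slide can be checked with a small tally. The sketch assumes the response fits in a single frame and that the ROB piggybacks its ACK of the request on the response, as in the slide's diagram; `frames_per_transaction` is a hypothetical helper.

```python
def frames_per_transaction(use_tcp, piggyback_request_ack=True):
    """Return (frames SFI->ROB, frames ROB->SFI) for one request-response."""
    sfi_to_rob = 1   # the request
    rob_to_sfi = 1   # the response (assumed to fit in one frame)
    if use_tcp:
        sfi_to_rob += 1              # the SFI must ACK the response
        if not piggyback_request_ack:
            rob_to_sfi += 1          # the ROB ACKs the request in its own frame
    return sfi_to_rob, rob_to_sfi


if __name__ == "__main__":
    without_tcp = frames_per_transaction(use_tcp=False)
    with_tcp = frames_per_transaction(use_tcp=True)
    print("frames out of SFI:", without_tcp[0], "->", with_tcp[0])
    print("total frames:", sum(without_tcp), "->", sum(with_tcp))
    print("interval at 15 events/s: %.1f ms" % (1000 / 15))
```

Frames out of the SFI go from 1 to 2 and the total from 2 to 3, which is the doubling and the factor of 3/2 quoted above. If the ROB cannot piggyback its ACK, the total rises to 4.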

8 Event Building: messages DFM - ROB - SFI
- Each DFM/SFI processes 15 events/s, so the exchange repeats every ~66 ms on average.
- Doubled no. of EB frames on the network
- ROB does extra work
  - Even more extra ROB I/O is used
  - ROB has to compute the ACK to send
  - ROB has to compute the ACK received
- Assume any ACK is lost
  - TCP resends data you don't want!! You have it already
[Figure: message sequence between DFM Application, TCP, Network, ROB Application and SFI Application. Labels: Req. Event be sent; Gets Data; Send Data; Response 1-2 kbyte; ACK; Req. Event; ACK]

9 Level2: messages L2PU - ROB (1)
- Individual L2PU - ROB request rates:
  - Hi Lumi. Calo: 1 ROB req. rate 6 kHz; L2PU requests 1 every ~62 ms
  - Many other rates
- L2PU accesses 20-40 ROBs per event
- In most cases the ACK from the ROB will piggyback on the response.
- In many cases TCP will generate an ACK from the L2PU to the ROB.
- Like SFI-ROB: have doubled no. of EB frames out of the SFI & into the ROB
  - Extra ROB I/O is used
  - Extra ROB CPU to compute the ACK
- Assume an L2PU request is lost. Just an example!
  - TCP will re-try: after the ~35 ms timeout / after the next 2 requests to the same ROB
  - Not what you want!
  - TCP re-try gives a long delay cf. the 10 ms processing time of the L2PU
  - Blocks all comms between that L2PU and the ROB until the lost packet is received
  - Other worker threads may stall
[Figure: message sequence between L2PU Application, TCP, Network and ROB Application. Labels: Req. Event; Need to ACK; Got Data; Send Data; Response 1-2 kbyte in ~100 us; piggyback ACK; Need to ACK; Req. Event]

10 LAN Tests - Req. every 66 ms (used 2.4.19-SMP)
[Figure: packet trace. Labels: 64 byte Req.; 1400 byte Response; ACK of Req.; Piggyback; ACK of Response; Extra packet]

11 Well, what do we conclude?
- TCP/IP is easy to maintain (but so is UDP/IP)
- What you get is what is in the box!
- There is a difference between what the protocol says and what the implementation gives you.
- TCP/IP is probably useful for:
  - Super to L2PU
  - Super to DFM
  - DFM to SFI
- TCP/IP (or the implementation) does things behind your back.
- TCP ACKs will generate extra traffic on already loaded links.
- TCP does packet loss recovery: good
  - But sometimes, when it has done it, you no longer want the data!
- TCP does timeouts just like the applications do now for UDP/IP or Raw, but much more crudely!
- TCP is doing what you are doing anyway, but T/DAQ loses fine control of the network transfers and thread operation/timing.
- TCP probably will do the job for all cases.
- But you can also wear a fur coat at sea level on the equator!!
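The application-level timeout that the conclusions compare TCP against can be sketched over UDP with the standard `socket` module. This is a minimal loopback illustration, not the T/DAQ code: the function names, the 50 ms timeout, the retry count and the toy responder that ignores the first request are all assumptions made for the example.

```python
import socket
import threading


def request_with_retry(sock, server_addr, payload, timeout_s=0.05, max_tries=3):
    """Send a datagram and wait for a reply, re-sending on timeout."""
    sock.settimeout(timeout_s)
    for attempt in range(1, max_tries + 1):
        sock.sendto(payload, server_addr)
        try:
            reply, _ = sock.recvfrom(2048)
            return reply, attempt
        except socket.timeout:
            continue  # here the application can decide the data is no longer wanted
    raise TimeoutError("no reply after %d tries" % max_tries)


def lossy_responder(sock, drop_first=1):
    """Toy stand-in for a ROB: ignores the first drop_first requests, then replies once."""
    seen = 0
    while True:
        data, addr = sock.recvfrom(2048)
        seen += 1
        if seen > drop_first:
            sock.sendto(b"data:" + data, addr)
            return


def demo():
    """Run one request against the toy responder on the loopback interface."""
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    worker = threading.Thread(target=lossy_responder, args=(server,), daemon=True)
    worker.start()
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        return request_with_retry(client, server.getsockname(), b"event-42")
    finally:
        worker.join(timeout=1.0)
        client.close()
        server.close()


if __name__ == "__main__":
    print(demo())
```

The point of doing this in the application is that the timeout, the retry count and the decision to abandon a request stay under T/DAQ control, which is the fine control the conclusions say is lost with TCP. A real implementation would also need to tag requests so that a late reply to an earlier attempt is not mistaken for the current one.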

