TCP Server Fault Tolerance Using Connection Migration to a Backup Server
Manish Marwah, Shivakant Mishra, Christof Fetzer
University of Colorado
IEEE International Conference on Dependable Systems and Networks (DSN), 2003
Presented by JIUN-JAU-CHIOU
Outline
  Introduction
  Overview of ST-TCP
    UDP channel & receive buffer
    Initialization
    Failure-free period
    Failure detection
  Performance
  Conclusion
Introduction
  ST-TCP (Server fault-Tolerant TCP)
    A primary/backup approach
    ST-TCP is transparent to clients
    Changes are required only on the server side
Overview of ST-TCP
  [Figure: client, primary, and backup; the primary and backup are connected by a UDP communication channel]
UDP channel
  Heartbeat message
  ACK message
    ACKs are sent from the backup to the primary
    Sent when NextByteExpected - LastByteAcked >= X
    Sent after a fixed time interval
  Packet retransmission
    The backup sends a request to the primary to get lost packets
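The ACK rule on this slide can be sketched as a small policy object. This is a minimal illustration, not the authors' implementation: the class name `BackupAckPolicy`, the parameter names, and the concrete values of X and of the time interval are all assumptions, and the current time is passed in explicitly so the logic stays easy to check.

```python
from dataclasses import dataclass


@dataclass
class BackupAckPolicy:
    """Decides when the backup should send an ACK to the primary (hypothetical sketch)."""

    threshold_bytes: int         # X on the slide; the actual value is an assumption
    max_interval_s: float        # the fixed time interval; the value is an assumption
    next_byte_expected: int = 0  # next client byte the backup expects
    last_byte_acked: int = 0     # last byte the backup has acknowledged to the primary
    last_ack_time: float = 0.0   # time of the most recent ACK

    def on_bytes_received(self, nbytes: int) -> None:
        # The backup has received nbytes more in-order bytes of client data.
        self.next_byte_expected += nbytes

    def should_ack(self, now: float) -> bool:
        # Condition 1: NextByteExpected - LastByteAcked >= X
        unacked = self.next_byte_expected - self.last_byte_acked
        # Condition 2: the fixed time interval has elapsed since the last ACK
        interval_elapsed = (now - self.last_ack_time) >= self.max_interval_s
        return unacked >= self.threshold_bytes or interval_elapsed

    def ack(self, now: float) -> int:
        # Record the ACK and return the byte number to place in the UDP ACK message.
        self.last_byte_acked = self.next_byte_expected
        self.last_ack_time = now
        return self.last_byte_acked
```

A caller would invoke `should_ack` after each received segment and on a periodic timer, and send the value returned by `ack` over the UDP channel.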
Primary's receive buffer
  The size of the buffer is doubled
  It is logically divided into two parts
  [Figure labels: LastByteAcked, LastByteRead, NextByteNeed]
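One plausible reading of this slide is that the primary keeps client bytes until both the local application has read them and the backup has acknowledged them, which is why the buffer is doubled. The sketch below models only the byte counters under that assumption; the class `PrimaryReceiveBuffer` and all of its method names are hypothetical and not taken from the paper.

```python
class PrimaryReceiveBuffer:
    """Byte-counter model of a doubled receive buffer (assumed semantics)."""

    def __init__(self, base_capacity: int) -> None:
        self.capacity = 2 * base_capacity  # buffer size is doubled
        self.last_byte_read = 0            # consumed by the primary's application
        self.last_byte_acked = 0           # acknowledged by the backup
        self.next_byte_expected = 0        # next byte expected from the client

    def retained_from(self) -> int:
        # Assumption: a byte may be discarded only once it has been both
        # read locally and acknowledged by the backup.
        return min(self.last_byte_read, self.last_byte_acked)

    def used(self) -> int:
        return self.next_byte_expected - self.retained_from()

    def free_space(self) -> int:
        return self.capacity - self.used()

    def receive(self, nbytes: int) -> int:
        # Accept as many client bytes as fit; return the number accepted.
        accepted = min(nbytes, self.free_space())
        self.next_byte_expected += accepted
        return accepted

    def app_read(self, nbytes: int) -> int:
        # The application reads bytes; they stay retained until the backup ACKs.
        readable = self.next_byte_expected - self.last_byte_read
        n = min(nbytes, readable)
        self.last_byte_read += n
        return n

    def backup_ack(self, byte_no: int) -> None:
        # The backup's ACK releases bytes the application has already read.
        clamped = min(byte_no, self.next_byte_expected)
        self.last_byte_acked = max(self.last_byte_acked, clamped)
```

In this model, bytes that the application has read but the backup has not yet acknowledged occupy the second logical part of the buffer, so the primary could still supply them if the backup asks for a retransmission.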
Initialization
  UDP channel
    It is created when the servers start
  Sequence number
    The backup server makes its sequence number match the primary's
Failure-free period
  Consistency
    ACK strategy
    Modified receive buffer
  Using multicast
  Heartbeat message
  [Figure: client, gateway, primary, and backup]
Failure detection
  Both servers monitor HB (heartbeat) messages
  Timeout mechanism
    Timeout = 3 consecutive missed HB messages
  The guess is always right
    The power of the suspected server is switched off
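The timeout rule above can be sketched as a simple heartbeat monitor. This is an illustration under stated assumptions: the class `HeartbeatMonitor`, the heartbeat interval, and the `power_off` callback are hypothetical, and the slide does not say how the power switch is actually driven.

```python
from typing import Callable


class HeartbeatMonitor:
    """Suspects the peer after a number of consecutive missed heartbeats (sketch)."""

    def __init__(self, interval_s: float, missed_limit: int = 3) -> None:
        self.interval_s = interval_s      # heartbeat period; the value is an assumption
        self.missed_limit = missed_limit  # 3 consecutive heartbeats, per the slide
        self.last_heartbeat = 0.0         # arrival time of the latest heartbeat
        self.peer_powered_off = False

    def on_heartbeat(self, now: float) -> None:
        # Any received heartbeat resets the count of missed heartbeats.
        self.last_heartbeat = now

    def missed(self, now: float) -> int:
        # Number of whole heartbeat periods elapsed without a heartbeat.
        return int((now - self.last_heartbeat) // self.interval_s)

    def peer_suspected(self, now: float) -> bool:
        return self.missed(now) >= self.missed_limit

    def check(self, now: float, power_off: Callable[[], None]) -> bool:
        # Powering off the suspect makes the suspicion correct by construction:
        # once it is off, the peer really has failed.
        if self.peer_suspected(now) and not self.peer_powered_off:
            power_off()  # hypothetical hook, e.g. a remote power switch
            self.peer_powered_off = True
        return self.peer_powered_off
```

Each server would run one such monitor for its peer, calling `on_heartbeat` when a heartbeat arrives on the UDP channel and `check` on a periodic timer.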
Performance
Conclusion
  Low performance overhead during the failure-free period
  ST-TCP is transparent to clients
  Fast failover