Presentation on theme: "Control-Plane Protocol Interactions in Cellular Networks"— Presentation transcript:
1 Control-Plane Protocol Interactions in Cellular Networks
Guan-Hua Tu*1, Yuanjie Li*1, Chunyi Peng2, Chi-Yu Li1, Hongyi Wang1, Songwu Lu1
1: University of California, Los Angeles; 2: The Ohio State University. * The first two authors contribute equally to this work.
[20s] Thanks for the introduction. Hi, everyone, I’m very pleased to present our work on the cellular network control plane. With the growing popularity of mobile usage, cellular networks have drawn more and more research effort. In this work, we explore the control-plane protocols and uncover problematic interactions between them. I’d like to show some interesting findings on how unexpected function violations or performance degradation happen because of troublesome signaling protocol interactions. This is joint work with Guan-Hua, Prof. Peng from OSU, Chi-Yu, Hongyi and Prof. Lu from UCLA.
2 Cellular Services are Ubiquitous
Large-scale wireless infrastructure. Offers data and voice services to anyone, anywhere, anytime. 6.8+ billion.
[30s: slow down] The cellular network is a large-scale wireless infrastructure that offers universal coverage and wide-area access. It provides mobile users with data service and carrier-grade voice service.
Source:
3 Cellular Network Architecture
[Diagram labels: Circuit Switching (CS); Packet Switching (PS); 3G (PS + CS); 4G (PS only); 3G base stations; 3G gateways; Mobile Switching Center; Mobility Management Entity (control node)]
[1min] Let us look at the mobile network architecture to see how these services are possible. Simply speaking, it has two parts. First, base stations offer radio access. Second, the core network, which consists of gateways (like edge routers) and switching centers, connects the base stations with external networks. To support voice and data services with carrier-grade quality, 3G networks conventionally use CS for voice and PS for data. 4G has a similar core network architecture, but compared with 3G it is migrating to a purely packet-switched design.
5 Control Plane in Cellular Network
Layered protocol stack: Connectivity Management (CM), Mobility Management (MM), Radio Resource Control (RRC).
[1min: slow down] The salient feature of the design is its control-plane protocols. The control plane provides critical functions such as radio resource management, mobility support and voice/data service. Similar to the Internet, these functions are organized as a layered protocol stack. The Radio Resource Control (RRC) layer manages radio resources and helps route signaling messages. The Mobility Management (MM) layer resides above RRC and provides location updates and mobility support for call/data sessions. The Connectivity Management (CM) layer is responsible for creating and managing voice calls and data sessions.
6 Control Plane in Cellular Network
Layered protocol stack: Connectivity Management (CM), Mobility Management (MM), Radio Resource Control (RRC). Domains separated for voice (CS) and data (PS).
[Diagram labels: RRC; CS domain: MM, CM; PS domain]
But the control plane is more complex than that, for two reasons. First, it has to regulate both the packet-switching and the circuit-switching domain. For historical reasons, and to adapt to the different domains, a similar function can be realized by multiple signaling protocols from different domains.
7 Control Plane in Cellular Network
Layered protocol stack. Domains separated for voice (CS) and data (PS). Hybrid 3G/4G systems.
[Diagram labels: 3G: RRC; CS domain: MM, CM; PS domain: MM, CM. 4G: RRC; PS domain: MM, CM]
Second, the control plane needs to operate both 3G and 4G systems. With incremental deployment, hybrid 3G/4G operation is very common today. Users can easily experience 3G/4G switching due to mobility.
8 Complex Interactions
Protocols work together to offer vital 3G/4G utilities. Rich patterns along three dimensions: cross-layer, cross-domain, cross-system. Problem: each individual protocol may be well designed. How about protocol interactions?
[Diagram labels: CM, MM and Radio Resource Control (RRC) in the 3G CS domain, 3G PS domain and 4G PS domain]
[1min] Each protocol implements a single function. To offer vital 3G/4G utilities, these signaling protocols have to work together and communicate with each other. The control-plane protocol interactions span three dimensions: cross-layer, cross-domain and cross-system. In a mature system, each signaling protocol may be well designed and tested. But the question is: how well do the protocol interactions behave? Note that this is not guaranteed today, even though each single protocol has been largely validated on its own. This is exactly what we focus on: we aim to explore whether there are any problematic interactions between signaling protocols.
9 Rich Protocol Interactions
Complex interactions in common scenarios: inevitable interplay between radio, mobility and data/voice; concurrent voice and data use; 3G/4G switching due to hybrid deployment, mobility and voice. Diagnosis over one layer/domain/system is insufficient.
Two causes of problematic interactions: design defects; operation/implementation slips. A single type of test fails to unveil both issues.
[1min] This is not an easy task, and there are two challenges. First, each dimension of protocol interaction includes very common usage scenarios. Functions in different layers have to communicate with each other. Voice and data can be used concurrently, so both domains have to interact. 3G/4G switching is very common in practice due to mobility and hybrid operation. Simply testing one layer, domain or system cannot uncover all the problems. Second, troublesome interactions can have two kinds of causes. They can be rooted in the 3GPP standards, in which case every implementation and operation is affected. They can also come from a carrier’s specific improper operation practice, or from implementation bugs. A single type of test may not offer sufficient information to tell whether a problem is a design defect or an operation slip. Moreover, in practice the cellular core network is largely a closed system to the research community, so only limited information can be leveraged.
10 Rich Protocol Interactions (same slide, with one bullet added)
Closed core network.
11 Our Solution: CNetVerifier
Cellular-specific model checking: extract a full-stack cellular model from the 3GPP standards; create a variety of usage scenarios; define desirable user-perspective properties; discover counterexamples for possible design defects.
[Diagram: protocol stacks, usage settings and desirable properties feed the model checker, which outputs the violated property and counterexamples]
[1.5min] To overcome these issues, we developed CNetVerifier, a tool that helps us uncover both design and operation issues from the rich interaction patterns. It has two phases. In the first phase, we model the full-stack cellular network based on the 3GPP standards. To emulate various usage scenarios, we enumerate all possible signaling exchanges. For scenarios with unlimited choices (like mobility at different speeds, or unspecified timers), we randomly sample these parameters with a run-time signal generator, so that various scenarios are activated. Then we apply formal verification techniques to this reference model and test whether it can always satisfy the user’s data and voice service demands. This way, we can examine the signaling exchange without accessing the carrier network. The model checker reports counterexamples, which are a superset of the possible design problems. In the second phase, we construct experimental scenarios based on these counterexamples and validate the design defects in real networks. Meanwhile, with extensive experiments in real networks, we can also uncover operation slips and implementation bugs on the mobile phone.
12 Our Solution: CNetVerifier
Cellular-specific model checking. Phone-based experimental validation: instrument end devices to collect traces for verification; discover operational slips in real networks.
[Diagram: protocol stacks, usage settings and desirable properties feed the model checker, which outputs the violated property and counterexamples; the counterexamples drive the scenario setup against the “black-box” network, which yields design flaws and operational slips]
Then we construct experimental scenarios based on these counterexamples and validate the design defects in real networks. Meanwhile, with extensive experiments in real networks, we can also uncover operation slips and implementation bugs on the mobile phone. For details about how we model the network, check its correctness and validate the problems with experiments, please read the paper or talk to me offline.
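The model-checking phase described above can be illustrated with a minimal sketch. This is not CNetVerifier itself: the toy model, the state encoding and every name below (`successors`, `data_service_available`, `find_counterexample`) are assumptions made for illustration. The toy device moves between 3G and 4G, may lose its data connectivity context while in 3G, and is assumed to need that context to be served in 4G. A breadth-first search over the reachable states returns the shortest event sequence that violates the desired property.

```python
from collections import deque

# Hypothetical toy state: (system, has_context, served)
INITIAL = ("4G", True, True)

def successors(state):
    """Yield (event, next_state) pairs of the toy 3G/4G model."""
    system, has_context, served = state
    if not served:
        return  # treat "out of service" as a terminal state in this sketch
    if system == "4G":
        # 4G -> 3G: the context is migrated along with the device
        yield "switch_to_3G", ("3G", has_context, True)
    else:
        if has_context:
            # Assumption: 3G may deactivate the data context and keep serving voice
            yield "deactivate_context", ("3G", False, True)
        # 3G -> 4G: assumed to succeed only if a context can be migrated back
        yield "switch_to_4G", ("4G", has_context, has_context)

def data_service_available(state):
    """Desired user-perspective property: the device is never out of service."""
    return state[2]

def find_counterexample(initial, successors, prop):
    """Breadth-first search; return the shortest event trace violating prop, or None."""
    parent = {initial: None}
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not prop(state):
            trace = []
            while parent[state] is not None:
                state, event = parent[state]
                trace.append(event)
            return trace[::-1]
        for event, nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = (state, event)
                queue.append(nxt)
    return None

print(find_counterexample(INITIAL, successors, data_service_available))
```

In a real tool the counterexample is only a candidate; as the notes say, it still has to be reproduced on phones in an operational network before it counts as a validated defect.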
13 Finding Overview
I. Necessary but problematic cooperation. II. Independent but coupled operations. Each appears cross-layer, cross-domain and cross-system.
[1min] With the help of CNetVerifier, we indeed find various problems in inter-protocol communication. They can be classified into two categories. The first category is “necessary but problematic cooperation”: protocols have to work together, but their interactions are not well defined. The second category is “independent but coupled operations”: some protocols are independent, but they are artificially correlated. Problems from both categories can result in functional incorrectness or performance degradation. For each category, the troublesome interactions appear across layers, domains and systems.
14 Improper cooperation: Cross-System
Scenario: run data services during 4G↔3G switching. 1. Set up 4G connectivity to access the Internet. 2. 4G→3G: the 4G connectivity context is converted to 3G for a seamless switch. 3. 3G→4G: the 3G connectivity context is converted back to 4G.
[Diagram labels: RRC, MM, CM; 3G PS, 3G CS, 4G PS; 3G connectivity context; 4G connectivity context]
[3min] Now let’s take a look at both categories. The first finding is an improper cooperation between the 3G and 4G systems. All the protocols from the 3G PS domain and the 4G PS domain are involved. The scenario is to run a data service while the user switches from 4G to 3G and back to 4G later. This is common in practice, because a roaming user may leave a 4G cell and enter a 3G cell, and vice versa. To use the data service, the user has to establish a connectivity context that holds critical information such as the IP address, reserved resources and QoS requirements. When switching to 3G, the context is migrated to 3G to keep the data service seamless. Most phones can stay only in 3G or in 4G, not both, so after the migration the old 4G context is deleted. Later, when the user switches back to 4G, the context is migrated back as well.
15 Improper cooperation: Cross-System. How and why?
Problematic scenario: the 3G context is deleted before returning to 4G. 1. The 3G connectivity context is deleted. 2. 3G→4G: no 3G context is transferred to the 4G context. Result in 4G: “out-of-service”.
The PS connectivity context is not mandatory in 3G (PS+CS), but mandatory in 4G (PS only). The shared context for 4G and 3G is not well protected in 3G. Causes of deletion (in 3GPP): low-layer failures; user disables data services; not enough resources; ….
This is the common procedure, but abnormal cases can happen. The problem arises when the connectivity context is deactivated in 3G. It can be deactivated for many reasons, such as lower-layer failures and limited phone resources. When the user later switches back to 4G, the device cannot register to the network, because 4G supports only the data service and therefore requires the connectivity context. As a result, the user loses both 3G and 4G services until the device re-registers to the 4G network. Note that context deactivation is allowed in 3G, because 3G supports both PS and CS service: without the context, the user can still use the voice service. Since the context is shared by 3G and 4G, the two systems should work together to protect the data connectivity. But due to their different demands on data services, they fail to achieve this goal.
16 Improper cooperation: Cross-System
Real-world impact: observed at a rate of 3.1% in the user study; “out-of-service” for up to 25s.
Lessons: a design defect. Different demands on packet switching in 3G and 4G. Desirable but not enforced: the shared context should be consistently protected in 4G and 3G.
Proposed remedies: avoid unnecessary 3G PS context deactivation; enable immediate 4G PS context reactivation.
This is indeed observed in our user study. Users can stay in the out-of-service state for up to 25s. Once we know the reason, the solution is straightforward. Unnecessary context deactivation should be avoided in 3G, and context recovery should be enabled in 4G, rather than simply deregistering the device from 4G.
17 Improper cooperation: cross-domain+system
Scenario: 4G users make calls via 3G CS Fallback. 1. To make a call, the 4G user switches 4G→3G. 2. When the call ends, 3G→4G.
[Diagram labels: RRC, MM, CM; 3G PS, 3G CS, 4G PS]
[2min] The second finding is an improper cooperation between 3G and 4G, and between the CS and PS domains. The scenario is that a 4G user makes a phone call. To reduce cost, 4G voice is supported today by reusing 3G circuit switching. The 4G user switches to 3G, makes the call with the CS service, and switches back to 4G after the call.
18 Improper cooperation: cross-domain+system. How and why?
Problematic scenario: call with background data. 1. A call triggers 4G→3G; data is migrated to 3G, too. 2. When the call ends, no 3G→4G switch happens (data is still on). The user gets stuck in 3G, losing 4G.
However, users cannot always finish this procedure. The problem happens when the user is still using the data service while making the call in 3G. After the call, the user finds the device stuck in 3G, without access to 4G’s high-rate data service. Why does it happen?
19 Improper cooperation: cross-domain+system. How and why?
Unexpected loop in the RRC state machine. RRC state transitions are inconsistent with dual-domain, inter-system settings. The user gets stuck in 3G, losing 4G.
[Diagram labels: 3G RRC (3G PS, 3G CS) and 4G RRC (4G PS), each with states CONNECTED and IDLE; transitions labelled “Voice only” and “Voice + Data (certain setting)”]
It turns out that the root cause lies in the RRC protocol. Both CS and PS domain behavior affects radio management, so although the two domains do not interact directly, they affect each other indirectly through RRC. When a call is made, the radio channel stays in the connected state. After the call, RRC goes back to the idle state and the device switches back to 4G. But when data is also in use, the radio channel stays in the connected state after the call. For some 3G→4G switch options, switching to 4G is not allowed while RRC is in the connected state. This is an illustrative example; the real scenario is even more subtle. If you are interested, please read our paper. The RRC state change policy is inconsistent with inter-system switching.
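The stuck-in-3G reasoning can be written down as a tiny decision function. This is a simplified sketch of the illustrative example in the notes, not the 3GPP state machine: the two-state `Rrc` enum, the `idle_only_switch` flag (standing in for "some 3G→4G switch options") and both function names are assumptions.

```python
from enum import Enum

class Rrc(Enum):
    IDLE = "IDLE"
    CONNECTED = "CONNECTED"

def rrc_after_call(data_active: bool) -> Rrc:
    """3G RRC state once the voice call ends (toy rule from the talk)."""
    # Ongoing PS data keeps the shared radio channel in the connected state.
    return Rrc.CONNECTED if data_active else Rrc.IDLE

def returns_to_4g(data_active: bool, idle_only_switch: bool = True) -> bool:
    """Whether the device goes back to 4G after a CS Fallback call.

    idle_only_switch is a hypothetical flag for a switch option that is
    permitted only from the RRC idle state.
    """
    state = rrc_after_call(data_active)
    return state is Rrc.IDLE or not idle_only_switch

print(returns_to_4g(data_active=False))  # voice only
print(returns_to_4g(data_active=True))   # voice + background data
```

With the idle-only option, background data is enough to keep the device in 3G after the call, even though the CS and PS domains never exchange a message directly.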
20 Improper cooperation: cross-domain+system
Real-world impact: 62.1% of 4G users get stuck in 3G after the call; they stay stuck in 3G for 39.6s on average.
Lessons: a design defect. 3G CS and 3G PS are indirectly coupled in RRC. Inconsistent state transition with all 3G→4G options.
Proposed remedies: revise the RRC state transitions for the possible settings.
This problem is widely observed in our user study. 62% of 4G calls get stuck in 3G after the call, and they stay stuck for 39s on average. The standard should regulate the RRC state transition policy so that it works coherently with 3G→4G switching.
21 Improper cooperation: Cross-Layer. How and why?
Problematic scenario: signaling loss during registration; “out-of-service” right after being attached. The upper layer (MM) assumes reliable, in-sequence signal transfer underneath, but the lower layer (RRC) cannot offer this guarantee.
[Diagram, message sequence: Attach request; Attach accept; Attach complete; Location update; Location update response (error). States shown: Deregistered, Registered. Layers: RRC, MM, CM; 3G PS, 3G CS, 4G PS]
[0.5min] We also observe improper cooperation between different layers. When the user powers on and registers to 4G, the mobility management protocol initiates a three-way handshake between the mobile device and the network. But the last message is lost, so the phone thinks it is registered while the network does not. When the device later reports its location, the network rejects the update and detaches the phone. The network has a valid reason to do so, because otherwise the registration state would stay inconsistent. But it is not prepared for the lost signal: it implicitly assumes that the underlying protocol guarantees reliable signal delivery. Unfortunately the underlying radio resource control protocol cannot do so, because RRC operates only between the mobile phone and the base station. A shim layer between MM and RRC is actually needed to offer reliable signal transfer. For more information about this scenario, please read our paper or talk to me offline.
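The registration mismatch can be traced step by step with a small simulation. It is a sketch under stated assumptions: the string states, the `attach` and `location_update` functions and the `lose_attach_complete` flag are hypothetical, and the three-way handshake is reduced to its effect on each side's registration state.

```python
def attach(lose_attach_complete: bool):
    """Return (phone_state, network_state) after the three-way attach handshake."""
    phone = network = "DEREGISTERED"
    # 1. Attach request (phone -> network), 2. Attach accept (network -> phone):
    # the phone considers itself registered once it has the accept.
    phone = "REGISTERED"
    # 3. Attach complete (phone -> network): the network registers the phone
    # only if this last message actually arrives.
    if not lose_attach_complete:
        network = "REGISTERED"
    return phone, network

def location_update(phone: str, network: str):
    """Return (phone_state, network_state, outcome) after a location update."""
    if network != "REGISTERED":
        # The network does not know the phone, rejects the update and detaches it.
        return "DEREGISTERED", network, "rejected"
    return phone, network, "accepted"

phone, network = attach(lose_attach_complete=True)
print(phone, network)                   # the two sides disagree
print(location_update(phone, network))  # the phone ends up detached
```

A reliable transfer layer between MM and RRC, as suggested in the notes, would amount to retransmitting the third message until the network acknowledges it, so the disagreement in the first print could not persist.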
22 Unnecessary Coupling: Cross-layer
Scenario: voice/data request with location update. 1. A location update is triggered by MM (e.g., the user moves). 2. After the location update, the user can send/receive voice and data.
[Diagram labels: RRC, MM, CM; 3G-CS, 3G-PS, 4G-PS; Location Update; MSC; Dial out]
[2min] Now let’s turn our attention to the second category of troublesome protocol interactions: artificial coupling between protocols. The first finding is an unnecessary coupling between the mobility management layer and data/voice management. To route data/voice to the user, the network has to know the user’s location. To achieve this, the mobile device reports its location to the network. After that, the user can send and receive voice and data.
23 Unnecessary Coupling: Cross-layer. How and why?
Problematic scenario: voice/data request during the location update. 1. A location update is triggered by MM (e.g., the user moves). 2. The user dials out. The outgoing call is delayed.
Common belief: “Without user location, the cellular network cannot route user voice/data.” In fact, outgoing voice/data requests can be routed without the user location. The location update serves possible incoming calls, so it can be deferred behind outgoing data/voice services. Root cause: unnecessary prioritization of the location update over outgoing call/data.
[Diagram labels: RRC, MM, CM; 3G-CS, 3G-PS, 4G-PS; “Updating the location”; Location Update; MSC; Dial out; 3G gateways; 3G base stations]
The problem happens when the mobile phone dials out during the location update. In this case, the call request is delayed because the location update has not finished. This seems plausible, because at this moment the network does not know the mobile device’s location, so the call cannot be served. However, note that the call is outbound: the device can always send it out, and the network does not need to route anything to the mobile device. Moreover, since the call request reaches the same mobile switching center, the call request itself can implicitly serve as the location update message. It should not be blocked by the underlying location update. The root cause is that, by design, the location update is unnecessarily assigned higher priority than outgoing data and voice requests.
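The head-of-line blocking can be made concrete with a small timing sketch. The function names, the `decoupled` flag and the numbers in the usage lines are illustrative assumptions, not measurements from the study; the sketch only shows that the added delay equals the remaining location-update time when the outgoing request is serialized behind it.

```python
def call_request_sent_at(dial_time: float, lu_start: float,
                         lu_duration: float, decoupled: bool) -> float:
    """Time at which the outgoing call request actually leaves the device."""
    lu_end = lu_start + lu_duration
    dialed_during_update = lu_start <= dial_time < lu_end
    if dialed_during_update and not decoupled:
        # Coupled design: the request waits until the location update finishes.
        return lu_end
    # Decoupled design (or no update in progress): the request goes out at once.
    return dial_time

def added_delay(dial_time: float, lu_start: float,
                lu_duration: float, decoupled: bool) -> float:
    """Extra waiting time caused by an ongoing location update."""
    return call_request_sent_at(dial_time, lu_start, lu_duration, decoupled) - dial_time

# Illustrative numbers: the update starts at t=0s and lasts 3s; the user dials at t=1s.
print(added_delay(1.0, 0.0, 3.0, decoupled=False))
print(added_delay(1.0, 0.0, 3.0, decoupled=True))
```

The decoupled branch corresponds to the idea of handling the two purposes in parallel, so that an outgoing request never queues behind a procedure that only matters for incoming traffic.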
24 Unnecessary Coupling: Cross-layer
Real-world impact: up to 8.3s call delay and 4.1s data delay; 7.6% of outgoing calls occur during a location update.
Lessons: a design defect. Outgoing data/voice requests and location update are independent, but they are artificially correlated.
Proposed remedies: decouple location update and outgoing data/voice requests, e.g., two parallel MM threads for different purposes.
In our study, such head-of-line blocking delays the call by up to about 8s and data by up to about 4s. To solve this problem, the two should be decoupled.
25 Unnecessary Coupling: Cross-domain
Scenario: dial a call during a data service in 3G. 1. Access the Internet at full rate. 2. Dial a call.
Voice (circuit switching, CS): low rate, low loss (e.g., 16QAM). Data (packet switching, PS): high rate, loss tolerant (e.g., 64QAM). For RRC control plane, we assign the modulation scheme. Voice and data have competing demands on the channel, but they have to share the radio channel. The data service rate declines by up to 74%.
[Diagram labels: MM, CM, RRC; 3G PS, 3G CS, 4G PS. Diagram values: 12.2 Kbps (CS), 10 Mbps, 2.5 Mbps]
[1min] Unnecessary coupling can also happen between the voice and data domains. Our user study shows that when both voice and data service are used in 3G, the data rate drops significantly, by up to 74%. This is caused by fate sharing between the CS and PS domains. The operator configures RRC so that CS and PS traffic share the radio channel. But voice and data have competing radio demands. Voice wants a low-loss radio channel, so it prefers robust modulation. The data service wants a high rate and is loss tolerant, so it prefers high-performance modulation. When they share the channel, a single modulation scheme is used, so the data service is significantly affected.
26 Unnecessary Coupling: Cross-system
Scenario: location update in 3G and 4G. 1. Update the 4G location and notify the 3G MSC. 2. The 3G location update fails (MSC unavailable), so 4G deregisters the device from the network (detach). 3G internal failures are exposed to 4G devices.
[Diagram labels: RRC, MM, CM; 3G PS, 3G CS, 4G PS; MSC]
Unnecessary coupling can also happen between the 3G and 4G systems. Recall that the 4G voice service is supported by reusing 3G’s circuit switching. To prepare for a possible incoming call, the 3G network has to know the 4G user’s location. When the 4G user updates its location, the 4G network propagates this location information to 3G. However, 3G’s location update may not always succeed, so 4G may receive an error message from 3G. The 4G network is then configured to deregister the 4G user immediately, and the user loses 4G service. The 3G network improperly exposes this failure to the 4G network, and the 4G network misbehaves. The 3G and 4G networks should contain their internal errors within the network infrastructure and should not propagate them to the 4G device. The real case is more subtle. For more details, please read the paper.
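The difference between propagating and containing the 3G failure can be sketched as follows. This is an illustrative assumption, not operator logic or a 3GPP procedure: the function name, the `isolate_faults` flag and the returned field names are all hypothetical.

```python
def handle_4g_location_update(msc_update_ok: bool, isolate_faults: bool) -> dict:
    """Toy reaction of the 4G network to the outcome of the 3G MSC location update."""
    if msc_update_ok:
        return {"registered_4g": True, "cs_voice_ready": True}
    if isolate_faults:
        # Contain the failure inside the infrastructure: keep the 4G data
        # service and leave the 3G update to be repaired on the network side.
        return {"registered_4g": True, "cs_voice_ready": False}
    # Observed behaviour in the talk: the 3G error leads to a 4G detach.
    return {"registered_4g": False, "cs_voice_ready": False}

print(handle_4g_location_update(msc_update_ok=False, isolate_faults=False))
print(handle_4g_location_update(msc_update_ok=False, isolate_faults=True))
```

In the isolated variant the user temporarily cannot be reached for CS voice, but keeps the 4G data service that never depended on the 3G MSC.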
27 Conclusion
Uncover problems in signaling protocol interactions in cellular networks. Three lessons: the layering rule should be fully honored (optimistic assumptions, coupled actions); inter-domain differences should be well recognized (coupling independent services); hybrid systems are not properly coordinated (context sharing, fault isolation). More rigorous efforts are needed.
In conclusion, in this work we show that some inter-protocol signaling is not well designed, while some is not properly operated. There are three lessons from these findings.
First, the layering rule should be properly honored. If the lower layer does not provide certain functions, the higher layer should be prepared to work without them. Coupling inter-layer actions is also not good practice unless properly justified. (Over-optimistic assumptions on the lower layer; coupled inter-layer actions.)
Second, in the cross-domain case, signaling design should recognize the inter-domain differences. Treating domains identically seems to reduce design and operational complexity, but makes the design overly simplistic and error prone. (Simplistically treating domains as identical; independent but coupled services.)
Third, in the cross-system case, failure messages can be shared and acted upon between systems. It is better not to expose such failure-handling operations outside the system unless absolutely needed. (Unprotected context sharing; fault isolation and tolerance.)
In the broader scope, research on control-plane protocols in cellular networks warrants more effort.
28 Questions?
Q: Why hybrid 3G/4G operation?
A: Incremental deployment.
Q: Why does 4G still use 3G’s CS voice?
A: 4G is designed to support voice service with VoLTE, which is similar to VoIP. For cost effectiveness, carriers would like to reuse the existing 3G infrastructure to offer the same service.
Q: Why should a single function be “duplicated” in different domains?
A: Historically, 3G started with CS service only. When PS was introduced later, it had to work without interrupting existing systems, so separate protocol stacks were used. From a technical perspective, it is challenging to integrate CS and PS domain protocols, because they serve traffic with quite different demands.
Q: In terms of solution, what’s the difference between cross-layer, cross-domain and cross-system?
A: The basic principles are largely the same, but the problems show up in different contexts.
Q: Does recent work on verification apply here?
A: (0) Definitely, the principles of these works can be applied. (1) But they should be tailored to the cellular-specific context, like radio control, mobility management and connectivity management.
Q: How to tackle the state explosion problem?
A: We use standard sampling techniques to reduce the state space, particularly in the usage scenario part. Completely solving this problem definitely needs more effort.
Q: How do you define the properties? How do they work?
A: In this work, we defined 3 properties, which represent the availability of data service, voice service and network access under mobility. These properties reflect the user’s demands. They are formulated as logic constraints over the protocols’ finite state machines. (For more info, please read our paper.)
Q: For the following findings, how did you find them? Model checking? Experiments?
A: We may not have time to discuss all details here. Please read our paper, or we can discuss after the talk. Briefly speaking, the design defects are found by the model checker and validated in experiments. Some operation slips are found in experiments.
Q: Why connectivity context?
A: In 3GPP terms, it’s called PDP context and EPS bearer. It keeps the QoS info, reserved resources and IP address.
Q: Location update lasts for 8s?
A: This is actually a chain effect. Another problematic cross-layer interaction, between MM and RRC, is involved. For more details, you can read the paper.
Q: “I didn’t get any content. Could you explain what you did?”
A: Please read the abstract.
Q: For the CSFB issue, will it disappear if VoLTE is used?
A: Users can still get stuck in 3G in other scenarios. For example, a roaming user switches from 4G to 3G and back to 4G later. In 3G, the user uses both data and voice. After the call, the same situation happens. CS Fallback is just an illustrative example to show how RRC state transitions are inconsistent with 3G→4G switching.
Q: Your methodology is not complete.
A: Yes, that’s true. We do have some limitations, like simplified data-plane modeling and a constrained set of tested scenarios. But even with these limitations, we still uncover two categories of troublesome protocol interactions in 3 dimensions. This is our initial step toward a complete understanding of cellular networks’ control planes.
Q: Is your solution justified?
A: We proposed some fixes and tested them in prototypes. We do not claim they are optimal. These fixes show the feasibility of addressing the problems. Better solutions call for more research effort.
Tips: Keep answers short (<1.5min). If you cannot explain something or get stuck: can we take it offline? We can explain it to you. Add more slides on the solution, otherwise people would think it is weird. Let people elaborate their questions; don’t rush to answer. Are you asking …? Structure the answer as bullet points. Expect more questions here: many people work on it. Slow down on control plane, 3G and control-plane interactions. Speed up on cross-domain/cross-system.
30 Related Work
Protocol verification for the Internet since the 1990s: single protocol with implementation, e.g., [Cohrs’89, SIGCOMM], [Holzmann’91], [Smith’96], TCP [NSDI’04], Routing [SIGCOMM’05], …
Emerging techniques for network verification, e.g., Anteater [SIGCOMM’11], Header Space Analysis [NSDI’12], NICE [NSDI’12], Alloy [SIGCOMM’13], NetCheck [NSDI’14], Software Dataplane [NSDI’14], …
Largely unexplored territory in cellular networks: few efforts, e.g., 2G handoff [Orava’92], Authentication [Tang’13].
There have been many great ideas on control-plane and data-plane verification from the SIGCOMM community. Cellular networks also greatly need formal methods for testing their functional correctness. In addition to forwarding on the data plane, cellular networks include many unique problems like radio control and mobility management, which call for more effort on verification methodology. This work is the first step toward that goal, and we hope more research effort can be devoted to this area.