Control-Plane Protocol Interactions in Cellular Networks

1 Control-Plane Protocol Interactions in Cellular Networks
Guan-Hua Tu*1, Yuanjie Li*1, Chunyi Peng2, Chi-Yu Li1, Hongyi Wang1, Songwu Lu1
1: University of California, Los Angeles; 2: The Ohio State University
* The first two authors contributed equally to this work.

[20s] Thanks for the introduction. Hi, everyone. I'm very pleased to present our work on the cellular network control plane. With the growing popularity of mobile usage, cellular networks have drawn more and more research effort. In this work, we explore their control-plane protocols and uncover problematic interactions between them. I'd like to show some interesting findings on how unexpected functional violations or performance degradation arise from troublesome signaling protocol interactions. This is joint work with Guan-Hua, Prof. Peng from OSU, and Chi-Yu, Hongyi, and Prof. Lu from UCLA.

2 Cellular Services are Ubiquitous
Large-scale wireless infrastructure
Offers data and voice services to anyone, anywhere, anytime
6.8+ billion
Source:

[30s, slow down] A cellular network is a large-scale wireless infrastructure that offers universal coverage and wide-area access. It provides mobile users with data service and carrier-grade voice service.

3 Cellular Network Architecture
[Figure: 3G (PS + CS) network with 3G base stations, 3G gateways (packet switching), and a Mobile Switching Center (circuit switching); 4G (PS only) network with the Mobility Management Entity (control node).]

[1min] Let us look at the mobile network architecture to see how these services are made possible. Simply speaking, it has two parts. First, base stations offer radio access. Second, the core network, which consists of gateways (like edge routers) and switching centers, connects the base stations with external networks. To support voice and data services with carrier-grade quality, 3G networks conventionally use CS for voice and PS for data. 4G has a similar core network architecture, but compared with 3G, 4G is migrating to a pure packet-switched design.

4 Control Plane in Cellular Network
[Figure: the same 3G/4G architecture, showing the 3G gateways (packet switching), the Mobile Switching Center (circuit switching), and the 4G Mobility Management Entity (control node).]

5 Control Plane in Cellular Network
Layered protocol stack: Connectivity Management (CM), Mobility Management (MM), Radio Resource Control (RRC)

[1min, slow down] The salient feature of its design is its control-plane protocols. The control plane provides critical functions such as radio resource management, mobility support, and voice/data service. Similar to the Internet, these functions are organized as a layered protocol stack. Radio resource control (RRC) manages radio resources and helps route signaling messages. The mobility management (MM) layer resides over RRC and provides location updates and mobility support for call/data sessions. The connectivity management (CM) layer is responsible for creating and managing voice calls and data sessions.

6 Control Plane in Cellular Network
Layered protocol stack: Connectivity Management (CM), Mobility Management (MM), Radio Resource Control (RRC)
Domains separated for voice (CS) and data (PS)
[Figure: RRC beneath separate MM and CM stacks for the CS domain and the PS domain.]

But it is more complex, for two reasons. First, the cellular network control plane must regulate both the packet-switching and the circuit-switching domains. For historical reasons, a similar function may be realized in multiple signaling protocols from different domains, each adapted to its own domain.

7 Control Plane in Cellular Network
Layered protocol stack
Domains separated for voice (CS) and data (PS)
Hybrid 3G/4G systems
[Figure: the 3G stack (RRC; MM and CM in the CS and PS domains) next to the 4G stack (RRC; MM and CM in the PS domain).]

Second, the control plane needs to operate both 3G and 4G systems. With incremental deployment, hybrid 3G/4G operation is very common today, and users can easily experience 3G/4G switching due to mobility.

8 Complex Interactions
Problem: each individual protocol may be well designed. How about protocol interactions?
Protocols work together to offer vital 3G/4G utilities
Rich patterns along three dimensions: cross-layer, cross-domain, cross-system
[Figure: CM, MM, and RRC stacks for the 3G CS domain, 3G PS domain, and 4G PS domain, with interactions across layers, domains, and systems.]

[1min] Each protocol implements a single function. To offer vital 3G/4G utilities, these signaling protocols must work together and communicate with each other. Control-plane protocol interactions span three dimensions: cross-layer, cross-domain, and cross-system. In a mature system, each signaling protocol may be well designed and tested. But the question is: how well do the protocol interactions behave? Note that this is not guaranteed today, even though individual protocols have been largely validated. This is exactly what we focus on: we aim to explore whether there are any problematic interactions between signaling protocols.

9 Rich Protocol Interactions
Complex interactions in common scenarios
  Inevitable interplay between radio, mobility, data/voice
  Concurrent voice and data use
  3G/4G switch due to hybrid deployment, mobility, voice
Two causes of problematic interactions
  Design defects
  Operation/implementation slips
Diagnosis over one layer/domain/system is insufficient
Single-type test fails to unveil both issues

[1min] This is not an easy task, and there are two challenges. First, each dimension of protocol interaction covers very common usage scenarios. Functions at different layers must communicate with each other. Voice and data can be used concurrently, so both domains must interact. 3G/4G switching is very common in practice due to mobility and hybrid operation. Simply testing one layer, domain, or system cannot uncover all the problems. Second, troublesome interactions can have two causes. They can be rooted in the 3GPP standards, in which case every implementation and operation is affected. They can also come from a specific carrier's improper operational practice, or from implementation bugs. To unveil the root cause, a single type of test may not offer enough information to tell whether it is a design defect or an operational slip. Moreover, in practice the cellular core network is largely a closed system to the research community, so only limited information can be leveraged.

10 Rich Protocol Interactions
Complex interactions in common scenarios
  Inevitable interplay between radio, mobility, data/voice
  Concurrent voice and data use
  3G/4G switch due to hybrid deployment, mobility, voice
Two causes of problematic interactions
  Design defects
  Operation/implementation slips
Diagnosis over one layer/domain/system is insufficient
Single-type test fails to unveil both issues
Closed Core Network

11 Our Solution: CNetVerifier
Cellular-specific model checking
  Extract full-stack cellular model from 3GPP standards
  Create a variety of usage scenarios
  Define desirable user-perspective properties
  Discover counterexamples for possible design defects
[Figure: protocol stacks, usage settings, and desirable properties feed a model checker, which reports violated properties as counterexamples.]

[1.5min] To overcome these issues, we developed CNetVerifier, a tool that helps us uncover both design and operational issues from rich interaction patterns. It has two phases. In the first phase, based on the 3GPP standards, we model the full-stack cellular network. To emulate various usage scenarios, we enumerate all possible signaling exchanges. For scenarios with unlimited choices (such as mobility at different speeds, or unspecified timers), we randomly sample these parameters with a run-time signal generator, so that a variety of scenarios are exercised. Then we apply formal verification techniques to this reference model and test whether it can always meet users' data and voice service demands. This way, we can examine the signaling exchanges without access to the carrier network. The model checker reports counterexamples, which form a superset of the possible design problems.
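The counterexample idea can be illustrated with a deliberately tiny example. The following is a minimal sketch, not CNetVerifier's implementation. It runs a breadth-first explicit-state search in Python over a hand-written toy model of a three-way attach. The function names, the state labels, and the single lossy message are all assumptions. The search returns the shortest action trace that reaches a state violating a property; that trace is what a counterexample is.

```python
from collections import deque


def check_invariant(initial, successors, invariant):
    """Breadth-first explicit-state search.

    Returns None if `invariant` holds in every reachable state; otherwise
    returns the shortest list of (action, state) steps from `initial` to a
    violating state (a counterexample trace).
    """
    parent = {initial: None}          # state -> (previous state, action)
    queue = deque([initial])
    while queue:
        state = queue.popleft()
        if not invariant(state):
            trace = []
            while parent[state] is not None:
                prev, action = parent[state]
                trace.append((action, state))
                state = prev
            return list(reversed(trace))
        for action, nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = (state, action)
                queue.append(nxt)
    return None


# Hypothetical toy model of a 3-way attach: state = (phone MM, network MM).
def attach_successors(state):
    if state == ("DEREG", "DEREG"):
        # Attach request reaches the network, which answers with attach accept.
        yield "attach_request", ("DEREG", "WAIT_COMPLETE")
    if state == ("DEREG", "WAIT_COMPLETE"):
        # Phone receives attach accept, considers itself registered,
        # and sends attach complete.
        yield "attach_accept", ("REG", "WAIT_COMPLETE")
    if state == ("REG", "WAIT_COMPLETE"):
        yield "attach_complete_delivered", ("REG", "REG")
        # The lower layer drops the message; the network never completes the attach.
        yield "attach_complete_lost", ("REG", "DEREG")


def consistent(state):
    """Property: the phone never believes it is registered when the network does not."""
    phone, net = state
    return not (phone == "REG" and net == "DEREG")


if __name__ == "__main__":
    trace = check_invariant(("DEREG", "DEREG"), attach_successors, consistent)
    for action, state in trace:
        print(action, "->", state)
```

On this toy model, the search reports the three-step trace attach_request, attach_accept, attach_complete_lost, which ends in the inconsistent state ('REG', 'DEREG'). Breadth-first order guarantees that the trace is a shortest one. The models derived from the standards are far larger, which is why the talk mentions sampling the usage scenarios.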

12 Our Solution: CNetVerifier
Cellular-specific model checking
Phone-based experimental validation
  Instrument end devices to collect traces for verification
  Discover operational slips in real networks
[Figure: the model checker's counterexamples drive the scenario setup for "black-box" experiments, which expose design flaws and operational slips.]

Then we construct experimental scenarios based on these counterexamples and validate the design defects in real networks. Meanwhile, with extensive experiments in real networks, we can also uncover operational slips and implementation bugs on mobile phones. For details about how we model the network, check its correctness, and validate the problems with experiments, please read our paper or talk to me offline.

13 Finding Overview
I. Necessary but problematic cooperation
II. Independent but coupled operations
Each category spans cross-layer, cross-domain, and cross-system interactions

[1min] With the help of CNetVerifier, we indeed find various problems in inter-protocol communication. They fall into two categories. The first category, "necessary but problematic cooperation", means that protocols must work together but their interactions are not well defined. The second category, "independent but coupled operations", means that some protocols are independent but are artificially correlated. Problems in both categories can result in functional incorrectness or performance degradation. In each category, the troublesome interactions appear across layers, domains, and systems.

14 Improper cooperation: Cross-System
Scenario: run data services during 4G->3G->4G
1. Set up 4G connectivity to access the Internet
2. 4G->3G: the 4G connectivity context is converted to 3G for a seamless switch
3. 3G->4G: the 3G connectivity context is converted back to 4G
[Figure: RRC/MM/CM stacks for 3G PS, 3G CS, and 4G PS; the connectivity context moves between 4G and 3G.]

[3min] Now let's take a look at both categories. The first finding is an improper cooperation between the 3G and 4G systems, involving all the protocols in the 3G PS domain and the 4G PS domain. The scenario is running a data service while the user switches from 4G to 3G and later switches back to 4G. This is common in practice, because a moving user may leave a 3G cell and enter a 4G cell, and vice versa. To use the data service, the user must establish a connectivity context that holds critical information such as the IP address, reserved resources, and QoS requirements. When the user switches to 3G, the context must be migrated to 3G to keep the data service seamless. Most phones can stay in only one of 3G or 4G at a time, not both, so after the migration the old 4G context is deleted. Later, when the user switches back to 4G, the context is migrated back.

15 Improper cooperation: Cross-System How and why?
Problematic scenario: the 3G context is deleted before returning to 4G
1. The 3G connectivity context is deleted.
2. 3G->4G: no 3G context is transferred to a 4G context, so the user is "out of service" in 4G.
Causes of deletion (in 3GPP): low-layer failures; user disables data services; not enough resources; …
Why: the PS connectivity context is not mandatory in 3G (PS + CS) but is mandatory in 4G (PS only); the context shared by 4G and 3G is not well protected in 3G.

This is the common procedure, but abnormal cases can happen. The problem arises when the connectivity context is deactivated in 3G. It can be deactivated for many reasons, such as lower-layer failures or limited phone resources. When the user later switches back to 4G, the device cannot register with the network, because 4G supports only data service and therefore requires the connectivity context. As a result, the user loses both 3G and 4G service until it re-registers with the 4G network. Note that context deactivation is allowed in 3G, because 3G supports both PS and CS service: without the context, the user can still use voice. Since the context is shared by 3G and 4G, the two systems should work together to protect the data connectivity, but because of their different demands on data service, they fail to do so.

16 Improper cooperation: Cross-System
Real-world impact
  Occurrence: 3.1% in the user study
  "Out-of-service" for up to 25 s
Lessons: a design defect
  Different demands of packet switching in 3G and 4G
  Desirable but not enforced: the shared context should be consistently protected in 4G and 3G
Proposed remedies
  Avoid unnecessary 3G PS context deactivation
  Immediately enable 4G PS context reactivation

This is indeed observed in our user study: users can stay in the out-of-service state for up to 25 s. Once we know the cause, the solution is straightforward. Unnecessary context deactivation should be avoided in 3G, and context recovery should be enabled in 4G, rather than simply deregistering the user from 4G.
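The two remedies can be made concrete with a minimal sketch that assumes a drastically simplified phone model. The `Phone` class, its fields, and the function names are hypothetical. Real 4G registration involves signaling with the network, which is abstracted here into a single `in_service` flag. The sketch replays the 4G->3G->4G scenario with and without 3G context deactivation, and with and without immediate 4G context reactivation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Phone:
    # Hypothetical, highly simplified view of the phone's PS state.
    system: str = "4G"
    context: Optional[str] = "4G"   # which system currently holds the connectivity context
    in_service: bool = True


def switch_to_3g(p: Phone) -> None:
    # 4G -> 3G: the context is converted to 3G; the 4G copy is deleted.
    p.system = "3G"
    if p.context is not None:
        p.context = "3G"


def deactivate_3g_context(p: Phone) -> None:
    # Allowed in 3G (e.g., low-layer failure, user disables data,
    # not enough resources): CS voice still works without a PS context.
    p.context = None


def switch_to_4g(p: Phone, reactivate_in_4g: bool = False) -> None:
    p.system = "4G"
    if p.context is not None:
        p.context = "4G"             # normal case: context converted back
    elif reactivate_in_4g:
        p.context = "4G"             # proposed remedy: re-establish a PS context
    else:
        p.in_service = False         # 4G (PS only) needs a context: out of service


def run(deactivate: bool, reactivate_in_4g: bool) -> Phone:
    p = Phone()
    switch_to_3g(p)
    if deactivate:
        deactivate_3g_context(p)
    switch_to_4g(p, reactivate_in_4g)
    return p


if __name__ == "__main__":
    for deactivate in (False, True):
        for remedy in (False, True):
            p = run(deactivate, remedy)
            print(f"deactivate={deactivate} remedy={remedy} -> in_service={p.in_service}")
```

Only the combination of a deactivated 3G context and no 4G reactivation ends out of service. Each remedy removes one of the two ingredients. Avoiding deactivation keeps `deactivate` false, and reactivation handles the case where deactivation cannot be avoided.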

17 Improper cooperation: cross-domain+system
Scenario: 4G users make calls via 3G CS Fallback
1. To make a call, the 4G user switches 4G->3G
2. When the call ends, 3G->4G
[Figure: RRC/MM/CM stacks for 3G PS, 3G CS, and 4G PS.]

[2min] The second finding is an improper cooperation between 3G and 4G, and between the CS and PS domains. The scenario is a 4G user making a phone call. To reduce cost, 4G voice is supported today by reusing 3G circuit switching: the 4G user switches to 3G, makes the call with CS service, and switches back to 4G after the call.

18 Improper cooperation: cross-domain+system How and Why?
Problematic scenario: call with background data
1. A call triggers 4G->3G; data is migrated to 3G, too
2. When the call ends, there is no 3G->4G (data is still on)
User gets stuck in 3G, losing 4G.

However, users cannot always complete this procedure. The problem happens when the user is still using the data service while making the call in 3G. After the call, the user finds itself stuck in 3G and cannot access 4G's high-rate data service. Why does this happen?

19 Improper cooperation: cross-domain+system How and Why?
Unexpected loop in the RRC state machine
[Figure: RRC state machines (IDLE, CONNECTED) for 3G (PS and CS domains) and 4G (PS), comparing "voice only" with "voice + data (certain setting)".]
RRC state transition is inconsistent with dual-domain, inter-system settings
User gets stuck in 3G, losing 4G.

It turns out that the root cause lies in the RRC protocol. Both CS and PS domain activity affect radio resource management, so although the two domains do not interact directly, they affect each other indirectly through RRC. When a call is made, the radio channel stays in the connected state. After a voice-only call, it returns to the idle state and the phone switches back to 4G. But when data is also in use, the radio channel stays in the connected state after the call, and some 3G->4G switch options do not allow switching to 4G while RRC is connected. This is an illustrative example; the real scenario is even more subtle, so if you are interested, please read our paper. In short, the RRC state transition policy is inconsistent with inter-system switching.

20 Improper cooperation: cross-domain+system
Real-world impact
  62.1% of 4G users get stuck in 3G after the call
  Stuck in 3G for 39.6 s on average
Lessons: a design defect
  3G CS and 3G PS are indirectly coupled in RRC
  RRC state transitions are not consistent with all 3G->4G switch options
Proposed remedies
  Revise the RRC state transition for all possible settings

This problem is widely observed in our user study: 62% of 4G calls get stuck in 3G afterwards, for 39 s on average. The standard should regulate the RRC state transition policy so that it works coherently with 3G->4G switching.
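The inconsistency boils down to two rules that are each reasonable alone. The sketch below assumes a two-state RRC and models with a single boolean whether the configured 3G->4G switch option requires IDLE. Both are simplifications, and the names are hypothetical. It shows that only the combination of the two rules strands the phone in 3G.

```python
from enum import Enum


class Rrc(Enum):
    IDLE = "IDLE"
    CONNECTED = "CONNECTED"


def rrc_after_call(data_active: bool) -> Rrc:
    # The 3G radio channel serves both CS (voice) and PS (data); after the
    # call it is released to IDLE only if no PS data is still flowing.
    return Rrc.CONNECTED if data_active else Rrc.IDLE


def can_return_to_4g(rrc: Rrc, switch_needs_idle: bool) -> bool:
    # Assumption: some 3G->4G switch options are only allowed from IDLE.
    return rrc is Rrc.IDLE or not switch_needs_idle


if __name__ == "__main__":
    for data_active in (False, True):
        for needs_idle in (False, True):
            rrc = rrc_after_call(data_active)
            print(f"data={data_active} needs_idle={needs_idle} "
                  f"rrc={rrc.value} back_to_4g={can_return_to_4g(rrc, needs_idle)}")
```

Only the row with background data and an IDLE-only switch option fails to return to 4G. Any revision of the RRC state transition therefore has to make these two rules agree for every switch option an operator may configure.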

21 Improper cooperation: Cross-Layer How and why?
Problem scenario: signaling loss during registration
"Out-of-service" right after being attached
[Figure: attach message sequence between phone MM and network MM (attach request, attach accept, attach complete) with the phone moving from Deregistered to Registered, followed by a location update whose error response leaves the phone Deregistered.]
The upper layer (MM) assumes reliable, in-sequence signal transfer underneath, but the lower layer (RRC) cannot offer this guarantee

[0.5min] We also observe improper cooperation between layers. When the user powers on and registers with 4G, the mobility management protocol runs a three-way handshake between the mobile device and the network. If the last message is lost, the phone thinks it is registered, but the network does not. When the user later reports its location, the network rejects the report and detaches the phone. The network has a valid reason to do so, since otherwise the registration state would be inconsistent. But MM is not prepared for the lost message: it implicitly assumes that the underlying protocol guarantees reliable signal delivery. Unfortunately, the underlying radio resource control protocol cannot do so, because RRC operates only between the mobile phone and the base station. A shim layer between MM and RRC is needed to offer reliable signal transfer. For more information about this scenario, please read our paper or talk to me offline.
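A shim layer of this kind solves a classic reliable-transfer problem. The sketch below assumes plain stop-and-wait ARQ, a textbook technique chosen here for illustration. The talk does not specify the shim's design. A lossy link stands in for RRC. The classes deliver each MM message to the peer exactly once and in order, and they tell MM explicitly when delivery fails instead of letting it assume success. All class names are hypothetical.

```python
import random


class LossyLink:
    """Stand-in for RRC-level transfer: each message is dropped with probability `loss`."""

    def __init__(self, loss: float, seed: int = 0):
        self.loss = loss
        self.rng = random.Random(seed)

    def transmit(self, msg):
        return None if self.rng.random() < self.loss else msg


class ShimReceiver:
    """Receiving side of the hypothetical MM/RRC shim."""

    def __init__(self):
        self.expected = 0
        self.delivered = []          # what MM sees: each message once, in order

    def on_frame(self, seq, payload):
        if seq == self.expected:     # new frame: hand it up exactly once
            self.delivered.append(payload)
            self.expected += 1
        return seq                   # acknowledge (duplicates are re-acknowledged)


class ShimSender:
    """Sending side: stop-and-wait ARQ with sequence numbers."""

    def __init__(self, link: LossyLink, receiver: ShimReceiver, max_tries: int = 50):
        self.link = link
        self.receiver = receiver
        self.max_tries = max_tries
        self.seq = 0

    def send(self, payload) -> bool:
        for _ in range(self.max_tries):
            frame = self.link.transmit((self.seq, payload))
            if frame is None:
                continue             # frame lost: retransmit (timer abstracted away)
            ack = self.link.transmit(self.receiver.on_frame(*frame))
            if ack == self.seq:
                self.seq += 1
                return True
            # ack lost: retransmit; the receiver discards the duplicate
        return False                 # report failure to MM instead of assuming delivery


if __name__ == "__main__":
    rx = ShimReceiver()
    tx = ShimSender(LossyLink(loss=0.3, seed=1), rx)
    msgs = ["attach complete", "location update"]
    print([tx.send(m) for m in msgs], rx.delivered)
```

The sequence number makes retransmission safe. If the message arrives but its acknowledgment is lost, the resent copy is acknowledged again but is not handed to MM a second time. This is what keeps the phone's and the network's registration views from diverging.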

22 Unnecessary Coupling: Cross-layer
Scenario: voice/data request with location update
1. Location update is triggered by MM (e.g., the user moves)
2. After the location update, the user can send/receive voice and data
[Figure: RRC/MM/CM stacks for 3G CS, 3G PS, and 4G PS; a location update to the MSC, followed by a dial-out.]

[2min] Now let's turn our attention to the second category of troublesome protocol interactions: artificial coupling between protocols. The first finding is an unnecessary coupling between the mobility management layer and data/voice management. To route data/voice to the user, the network must know the user's location, so the mobile device reports its location to the network. After that, the user can send and receive voice and data.

23 Unnecessary Coupling: Cross-layer How and why?
Problematic scenario: voice/data request during the location update
1. Location update is triggered by MM (e.g., the user moves)
2. User dials out while MM is "updating the location": the outgoing call is delayed
"Without user location, the cellular network cannot route user voice/data." But outgoing voice/data requests can be routed without the user's location.
Location update is for possible incoming calls, so it can be deferred in favor of outgoing data/voice services.
Unnecessary prioritization of location update over outgoing call/data
[Figure: RRC/MM/CM stacks with 3G base stations, 3G gateways, and the MSC; the dial-out waits behind the location update.]

The problem happens when the mobile phone dials out during a location update. In this case, the call request is delayed because the location update has not finished. This seems plausible, because at that moment the network does not know the device's location, so the call cannot be served. However, note that the call is outgoing: the device can always send it out, and the network does not need to route anything to the device. Moreover, since the call request must reach the same mobile switching center, the call request itself can implicitly serve as the location update message. It should not be blocked by the underlying location update. The root cause is that, by design, the location update is unnecessarily given higher priority than outgoing data and voice requests.

24 Unnecessary Coupling: Cross-layer
Real-world impact
  Up to 8.3 s call delay and 4.1 s data delay
  7.6% of outgoing calls occur during a location update
Lessons: a design defect
  Outgoing data/voice requests and location update are independent, but they are artificially correlated
Proposed remedies
  Decouple location update and outgoing data/voice requests, e.g., two parallel MM threads for different purposes

In our study, this head-of-line blocking delays calls by up to 8 s and data by up to 4 s. To solve this problem, the two should be decoupled.
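The head-of-line blocking and the proposed decoupling can be compared on a simple timeline. In this minimal sketch, the `LocationUpdate` record, the function name, and the numbers are hypothetical. It assumes a coupled MM holds every outgoing request until an in-progress location update finishes. It also assumes a decoupled MM, with a separate thread for outgoing requests, sends the request immediately.

```python
from dataclasses import dataclass


@dataclass
class LocationUpdate:
    start: float      # seconds
    duration: float   # seconds


def call_request_time(dial_time: float, lu: LocationUpdate, decoupled: bool) -> float:
    """When MM lets an outgoing call/data request go out (hypothetical model)."""
    lu_end = lu.start + lu.duration
    lu_running = lu.start <= dial_time < lu_end
    if lu_running and not decoupled:
        return lu_end      # coupled: the request waits behind the location update
    return dial_time       # decoupled (separate MM thread) or no update in progress


if __name__ == "__main__":
    lu = LocationUpdate(start=0.0, duration=6.0)   # illustrative numbers only
    for decoupled in (False, True):
        sent = call_request_time(dial_time=2.0, lu=lu, decoupled=decoupled)
        print(f"decoupled={decoupled}: request sent at {sent}s, extra delay {sent - 2.0}s")
```

In the coupled case, the extra delay equals the remaining time of the location update. This is why a long location update, such as the chain effect mentioned in the Q&A, translates directly into call setup delay.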

25 Unnecessary Coupling: Cross-domain
Scenario: dial a call during data service in 3G
1. Access the Internet at full rate
2. Dial a call
Circuit Switching (CS) voice: low rate (12.2 Kbps), low loss (e.g., 16QAM)
Packet Switching (PS) data: high rate, loss tolerant (e.g., 64QAM)
The RRC control plane assigns the modulation scheme
[Figure: a 3G radio channel shared by 12.2 Kbps voice and data, with data-rate labels of 10 Mbps and 2.5 Mbps.]
Voice and data have competing demands on the channel, but they have to share the radio channel
Data service rate declines by up to 74%

[1min] Unnecessary coupling can also happen between the voice and data domains. Our user study shows that when both voice and data services are used in 3G, the data rate drops by up to 74%. This is caused by fate sharing between the CS and PS domains. The operator configures RRC so that CS and PS traffic share the radio channel, but voice and data have competing radio demands. Voice wants a low-loss channel, so it prefers robust modulation. Data wants high rate and is loss tolerant, so it prefers high-rate modulation. When they share the channel, only one modulation is used, so the data service is significantly affected.
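The fate-sharing mechanism reduces to one rule: a shared channel runs a single modulation, so it has to satisfy the most loss-sensitive flow. The sketch below assumes exactly that rule. The function names and the "most robust wins" policy are illustrative assumptions, not the operator's actual RRC configuration. It uses the modulation examples from the slide. It isolates the modulation effect only and is not meant to reproduce the up-to-74% decline measured in the study.

```python
# Bits carried per symbol by each modulation.
BITS_PER_SYMBOL = {"QPSK": 2, "16QAM": 4, "64QAM": 6}
# From most robust (lowest loss) to highest rate.
ROBUST_FIRST = ["QPSK", "16QAM", "64QAM"]


def shared_modulation(preferences):
    """One channel, one modulation: pick the most robust one any active flow asks for."""
    return min(preferences, key=ROBUST_FIRST.index)


def data_rate(symbol_rate: float, preferences) -> float:
    """Raw rate available to data on the shared channel, ignoring coding and overhead."""
    return symbol_rate * BITS_PER_SYMBOL[shared_modulation(preferences)]


if __name__ == "__main__":
    alone = data_rate(1.0, ["64QAM"])              # data only
    shared = data_rate(1.0, ["64QAM", "16QAM"])    # data + concurrent voice call
    print(f"relative data rate with a concurrent call: {shared / alone:.2f}")
```

Under these assumptions, the data flow alone would get 6 bits per symbol but is held to 4 while the call lasts. The decoupling lesson amounts to not letting one flow's requirement set the operating point for the other.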

26 Unnecessary Coupling: Cross-system
Scenario: location update in 3G and 4G
1. Update the 4G location, and notify the 3G MSC
2. The 3G location update fails (MSC unavailable), so 4G detaches the user
3G internal failures are exposed to 4G devices
[Figure: RRC/MM/CM stacks for 3G PS, 3G CS, and 4G PS.]

Unnecessary coupling can also happen between the 3G and 4G systems. Recall that 4G voice is supported by reusing 3G's circuit switching. To prepare for a possible incoming call, the 3G network must know the 4G user's location, so when a 4G user updates its location, the 4G network propagates this information to 3G. However, 3G's location update may not always succeed, so 4G may receive an error message from 3G. The 4G network is then configured to immediately deregister the user, who loses 4G service. The 3G network improperly exposes this failure to the 4G network, and the 4G network misbehaves. 3G should confine its internal errors within the network infrastructure and not propagate them to 4G. The real case is more subtle; for more details, please read the paper.
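The fault-isolation lesson can be phrased as a choice in how 4G reacts to a 3G error. The sketch below is hypothetical throughout. The enum, the function, and the "keep 4G, mark voice unavailable" policy are illustrative assumptions, not a 3GPP-specified procedure. It contrasts the detach behavior described above with a policy that keeps the 3G failure inside the network.

```python
from enum import Enum, auto


class CsUpdate(Enum):
    OK = auto()
    MSC_UNAVAILABLE = auto()


def on_4g_location_update(cs_result: CsUpdate, isolate_faults: bool):
    """Return (4G registration kept, CS voice reachable) after a 4G location update.

    Hypothetical model of the 4G side reacting to the result it gets back
    from the 3G MSC location update.
    """
    if cs_result is CsUpdate.OK:
        return True, True
    if isolate_faults:
        # Contain the 3G failure: keep 4G service and mark voice unavailable
        # until the network retries the 3G update internally.
        return True, False
    # Behavior described on the slide: the 3G failure propagates and the
    # 4G user is detached.
    return False, False


if __name__ == "__main__":
    for isolate in (False, True):
        kept_4g, voice = on_4g_location_update(CsUpdate.MSC_UNAVAILABLE, isolate)
        print(f"isolate_faults={isolate}: 4G kept={kept_4g}, voice reachable={voice}")
```

With isolation, an MSC outage costs the user only CS-based voice, which is the service that actually depends on 3G, instead of the 4G data service as well.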

27 Conclusion
Uncover problems in signaling protocol interactions in cellular networks
Three lessons
  The layering rule should be fully honored (optimistic assumptions, coupled actions)
  Inter-domain differences should be well recognized (coupling independent services)
  Hybrid systems are not properly coordinated (context sharing, fault isolation)
More rigorous efforts are needed

In conclusion, we show that some inter-protocol signaling is not well designed, while other signaling is not properly operated. There are three lessons from these findings. First, the layering rule should be properly honored, avoiding over-optimistic assumptions about the lower layer and coupled inter-layer actions. If the lower layer does not provide a certain function, the higher layer should be prepared to work without it, and coupling inter-layer actions is not good practice unless properly justified. Second, in the cross-domain case, signaling design should recognize inter-domain differences. Simplistically treating the domains as identical, and coupling independent services, seems to reduce design and operational complexity, but it is overly simplistic and error-prone. Third, in the cross-system case, context sharing is unprotected, and failure messages can be shared and acted upon across systems. For fault isolation and tolerance, it is better not to expose such failure handling outside the system unless absolutely needed. In the broader scope, research on control-plane protocols in cellular networks warrants more effort.

28 Questions?
Q: Why hybrid 3G/4G operation? A: Incremental deployment.
Q: Why does 4G still use 3G's CS voice? A: 4G is designed to support voice with VoLTE, which is similar to VoIP. For cost-effectiveness, carriers would like to reuse existing 3G infrastructure to offer the same service.
Q: Why should a single function be "duplicated" in different domains? A: Historically, 3G started with CS service only. When PS was introduced later, it had to work without interrupting existing systems, so separate protocol stacks were used. Technically, it is also challenging to integrate CS and PS domain protocols, because they serve traffic with quite different demands.
Q: In terms of solutions, what's the difference between cross-layer, cross-domain, and cross-system? A: The basic principles are largely the same, but the problems show up in different contexts.
Q: Does recent work on verification apply here? A: (0) The principles of these works can definitely be applied. (1) But they must be tailored to the cellular-specific context, such as radio control, mobility management, and connectivity management.
Q: How do you tackle the state explosion problem? A: We use standard sampling techniques to reduce the state space, particularly in the usage-scenario part. Completely solving this problem will require more effort.
Q: How do you define the properties? How do they work? A: In this work, we define 3 properties, representing the availability of data service, voice service, and network access under mobility. These properties reflect users' demands. They are formulated as logic constraints over the protocols' finite state machines. (For more info, please read our paper.)
Q: For the findings above, how did you find them? Model checking? Experiments? A: We may not have time to discuss all the details here; please read our paper, or we can discuss after the talk. Briefly, the design defects are found by the model checker and validated in experiments. Some operational slips are found in experiments.
Q: Why a connectivity context? A: In 3GPP terms, it is called the PDP context (3G) and the EPS bearer (4G). It holds QoS info, reserved resources, and the IP address.
Q: The location update lasts 8 s? A: This is actually a chain effect: another problematic cross-layer interaction between MM and RRC is involved. For more details, you can read the paper.
Q: "I didn't get any content. Could you explain what you did?" A: Please read the abstract.
Q: For the CSFB issue, will it disappear if VoLTE is used? A: Users can still get stuck in 3G in other scenarios. For example, a moving user switches from 4G to 3G and back to 4G later; if the user uses both data and voice while in 3G, the same situation happens after the call. CS Fallback is just an illustrative example of how the RRC state transition is inconsistent with 3G->4G switching.
Q: Your methodology is not complete. A: Yes, that's true. We do have some limitations, such as simplified data-plane modeling and a constrained set of tested scenarios. But even with these limitations, we still uncover two categories of troublesome protocol interactions in 3 dimensions. This is our initial step toward a complete understanding of cellular networks' control planes.
Q: Is your solution justified? A: We proposed some fixes and tested them in prototypes. We do not claim they are optimal; they show the feasibility of addressing the problems. Better solutions call for more research effort.
Tips:
  Keep answers short (<1.5min)
  If you cannot explain, or get stuck: "Can we talk offline? We can explain it to you."
  More slides for the solution; otherwise people would think it is weird
  Let people elaborate their questions. Don't rush to answer: "Are you asking …?"
  Structure the answer as bullet points
  More questions here: many people work on it.
  Slow down: control plane, 3G, control-plane interactions
  Speed up: cross-domain/cross-system

29 Backup slides

30 Related Work
Protocol verification for the Internet
  Since the 1990s: single protocol with implementation, e.g., [Cohrs'89, SIGCOMM], [Holzmann'91], [Smith'96], TCP [NSDI'04], Routing [SIGCOMM'05], …
  Emerging techniques for network verification, e.g., Anteater [SIGCOMM'11], Header Space Analysis [NSDI'12], NICE [NSDI'12], Alloy [SIGCOMM'13], NetCheck [NSDI'14], Software Dataplane [NSDI'14], …
Largely unexplored territory in cellular networks
  Few efforts, e.g., 2G handoff [Orava'92], Authentication [Tang'13]

There have been many great ideas on control-plane and data-plane verification from the SIGCOMM community. Cellular networks also greatly need formal methods to test their functional correctness. Beyond data-plane forwarding, cellular networks involve many unique problems, such as radio control and mobility management, which call for more work on verification methodology. This work is a first step toward that goal, and we hope more research effort will be devoted to this area.

