Wi-Fi / WLAN Performance Management and Optimization


1 Wi-Fi / WLAN Performance Management and Optimization
Veli-Pekka Ketonen, CTO, 7signal Solutions
7signal has developed a Wi-Fi performance platform and is based in Akron, Ohio. Akron claims to have invented many things, such as oatmeal, the first synthetic tire and the first artificial fish bait. It also played a key role in developing Wi-Fi at Aironet, which became part of Cisco.
Audience poll: Education, Government, Healthcare, Other Enterprise, or selling solutions to these people?

2 Topics
The Wi-Fi Performance Challenge
Factors Impacting Performance
The Wi-Fi Performance Cycle
10 step performance optimization flow
Selected example data
Summary / Questions
Schedule:
10:30 – 10:40 Intro (10 minutes)
10:40 – 11:00 Factors (20 minutes)
11:00 – 11:10 Performance cycle
11:10 – 11:30 Optimization
11:30 – 11:40 Examples & summary
11:40 – 11:45 Questions

3 Wi-Fi Networks are Everywhere
Wi-Fi Networks are Everywhere! But they are transitioning from “nice to have” to “must have”

4 Wi-Fi Networks are Everywhere
Wi-Fi Networks are Everywhere! But they are transitioning from “nice to have” to “must have”
Challenges with mission-critical Wi-Fi networks:
Connection issues with new devices & machines
Bottlenecks from increasing data traffic
Dropped or noisy voice calls
Challenging physical environments
Changes hourly, daily and weekly

5 Dependable Wi-Fi is Costly and Complex
Chart: cost needed to achieve reliability ($) versus complexity of network (number of access points, clients, applications), with a reactive focus based on complaints. Drivers shown: BYOD, video apps, virtual desktop, location services, mobile computing, guest networks, Voice over Wi-Fi.
Notes: Today, you might use a series of troubleshooting steps and tools to identify and fix problems. But as your network gets more and more complex over time, the cost of deploying those tools and resources becomes enormous, and this approach does not scale.

6 2. Factors impacting the performance

7 Improper Antenna Selection / Placement
Diagram labels: Max gain sideways. Antenna gain pattern. Antenna gain direction. Attenuation upwards. Down-tilted pattern.
Behind metal grid? Near to conductive or “dense” surface?
In common ceiling-mounted APs, a sideways, down-tilted pattern is most useful.
Notes: Antennas are typically built in; in more challenging physical environments they may be external. In an environment that has issues, it is good to be aware of what type of antennas are in use: are they omnidirectional (typical) or some type of directional, and are they near a structure such as a metal grid or cement beam that affects transmissions? In common APs, you want a radiation pattern tilted downwards to reach clients.

8 RF power level is not that simple
RF power isn’t always what your datasheet and settings tell you.
Impact of:
AP/device model
Rate/MCS
HT 20/40/80
Assumed MIMO gain
Assumed diversity/STBC gain
Antenna gain
Channel #, regulation
Passing the Type Approval
Back annotation reliability
Lower the output power and use antenna gain to reach further with higher rates.
Power ladder shown on the slide, from top to bottom:
+20 dBm
  MIMO/TX div. gain, +3 dB
+17 dBm
  No high MCS/rates, +3 dB
+14 dBm
  HT40 -> HT20, +2 dB
+11 dBm
  Antenna gain, +3 dB
+8 dBm: radio output (no antenna), HT40, highest MCS
Rate labels on the slide: 180 Mbit/s, 300 Mbit/s, 300 Mbit/s
Notes: Another factor impacting performance is RF power level, and you have to be careful here, because the power is not always what the datasheet or controller settings tell you, and there are a lot of components to RF power.
MCS: modulation and coding scheme (QAM, rate, spatial streams, data rate, etc.)
STBC gain: space-time block code
You may think the higher the radio output power, the better, but that is not always the case. Transmitters are not linear, so at higher rates the APs cut back on power to provide a non-distorted signal. You may therefore be better off with a TX power setting slightly lower than max, using antenna gain to transmit higher rates successfully over longer distances.
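To put the dBm levels on this slide into perspective, they can be converted to milliwatts. This is only a minimal sketch: the function name is an illustration, and the five levels are the ones printed on the slide.

```python
def dbm_to_mw(dbm: float) -> float:
    """Convert a power level in dBm to milliwatts (0 dBm = 1 mW)."""
    return 10 ** (dbm / 10)


# Levels printed on the slide, from bare radio output up to the headline figure.
ladder_dbm = [8, 11, 14, 17, 20]

for level in ladder_dbm:
    print(f"{level:+d} dBm = {dbm_to_mw(level):6.1f} mW")
```

Each 3 dB step roughly doubles the power, so the +8 dBm radio output is only a small fraction of the +20 dBm headline value.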

9 WLAN Transmit Power Control (TPC) can create issues
A common implementation measures neighbor AP levels and keeps them below a fixed value.
Power levels may drift to the end of the allowed range.
Clients commonly use dBm power; running APs at much lower levels causes an imbalance in the link budget. Both uplink and downlink coverage are needed!
Floor plan diagram (rooms): a high received neighbor AP level may drive AP power down, and cause lack of coverage elsewhere.
Notes: The other problem with too high power is that it causes interference with other APs. WLAN manufacturers have automatic TX power control, which tries to create the ideal cell size. We’ve found that these algorithms tend to drive TX power too low, so clients can’t actually hear the APs! With most manufacturers you can tweak the range, and we’ve found it is better to reduce the range: raise the lower limits and lower the higher limits. But you don’t want to do this without having data that shows the impact; we’ll talk more about that later.

10 Channel & Utilization Issues
Channel overlap
APs outside channel grid
HT conflicts
Amount of APs/SSIDs
Empty AP vs. loaded AP
Notes: You can get the antennas and power right and have ideal coverage, but then have issues with the channels being used. Some manufacturers have invented their own protocols to enhance the basic collision avoidance in CSMA/CA, but in general adjacent APs need to use different channels, or else they will spend most of the time waiting to talk, which is very ineffective.
2.4 GHz: 5 MHz channel spacing, but channels are 20 MHz wide, so there will be overlap unless you stick with channels 1, 6 and 11.
5 GHz: 20 MHz channel spacing, so you can use adjacent channels, but when you combine channels to get more bandwidth you need to make sure you don’t create overlap.
On this graph, which comes from our software console, each line represents a BSSID. Channel 4 is a neighbor: a commercial grade rogue AP that someone has brought in because they’re unhappy with the Wi-Fi (such devices sometimes look for an “empty” channel), or it could be a default setting on a device like a printer. It is not a total disaster, but it raises the noise floor.
Performance is also impacted by other factors such as the number of SSIDs, which creates overhead from beaconing and probe responses and cuts down on airtime availability.
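The 2.4 GHz overlap rule in these notes can be checked with a few lines of arithmetic. The sketch below assumes the simplified model from the notes (5 MHz channel spacing, 20 MHz wide channels) and ignores the real spectral mask; the function names are illustrative only.

```python
def center_mhz(channel: int) -> int:
    """Center frequency of a 2.4 GHz channel (valid for channels 1-13)."""
    return 2407 + 5 * channel


def overlaps(ch_a: int, ch_b: int, width_mhz: int = 20) -> bool:
    """True if two channels of the given width share spectrum (simplified model)."""
    return abs(center_mhz(ch_a) - center_mhz(ch_b)) < width_mhz


print(overlaps(1, 6))  # two grid channels, 25 MHz apart
print(overlaps(4, 6))  # off-grid AP on channel 4 against a grid AP on channel 6
print(overlaps(1, 4))  # the same off-grid AP against channel 1
```

In this model an AP on channel 4 collides with both channel 1 and channel 6, which is why off-grid devices are a concern.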

11 Allocate channels properly
Use all the spectrum you have
The most important way to increase capacity: avoid interference and lower utilization!
Some devices do not support all 5 GHz channels, but try really hard to use all available channels
Channel automation parameters may help the network converge towards a better channel plan
If not, use a manual channel plan
Channel assignment shown on the slide: 1 6 6 1 11
Notes: You really want to use all the spectrum you have, both 5 GHz and 2.4 GHz. You really want channel separation, not only on the same floor but also between floors, so you have to think three-dimensionally. Manufacturers’ mechanisms for channel automation work fine in the lab or in a small environment, but sometimes they get confused, and you end up with something looking like this. Without a very good reason this should never happen.

12 Sometimes channel automation is not working well and needs help
Chart labels: Continuous channel switching; More stable operation
Channel automation features: Channel Fly, ARM, RRM
Notes: Here is an example of channel automation at one university. We call this channel flapping, and it is not very productive: end users waste a lot of time getting disconnected and reconnected.

13 Too high rates cause high retries
WLAN AP rate control often uses rates that are too high
This causes a high number of retries, which has a negative impact on performance
Chart label: Optimal rate
Notes: Data rates also impact performance. In general you’d think that you’d like to transmit at the highest rate possible: if you can send your information quickly, you can get off the air and let someone else talk. But the faster you transmit, the higher the SNR you need at the receiver to accurately receive the data. The general algorithm is to transmit at a higher rate, and if no acknowledgement arrives, different manufacturers will decide either to transmit again at the same rate or to go immediately to a lower rate. If you see a lot of retransmissions, you might be better off lowering the max rate. Acknowledgements may go out at a lower rate.
* Lakshmanan et al.: On link rate adaptation in 802.11n WLANs
* Haratcherev et al.: Automatic IEEE 802.11 Rate Control for Streaming Applications

14 What can rates and retries tell you?
Quadrant chart of data rates/MCS against retries:
Data rates/MCS high, retries high (typical in WLAN): unstable, high jitter, packet loss, limited capacity
Data rates/MCS high, retries low (target): good coverage, reliable operation, high speed and capacity
Data rates/MCS low, retries high: very slow, at the coverage boundary
Data rates/MCS low, retries low: speed limited, working OK
Notes: It really is a balance between data rates and retransmissions, as illustrated by this chart. You would like to operate in the green area.
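The four quadrants of this chart can be expressed as a small classifier. This is a sketch only: the slide gives no numeric boundaries, so the two thresholds below are hypothetical placeholders that would need to be tuned per network.

```python
def classify_link(avg_mcs: float, retry_pct: float,
                  mcs_threshold: float = 4.0,      # hypothetical boundary
                  retry_threshold: float = 20.0    # hypothetical boundary, in percent
                  ) -> str:
    """Map average MCS and retry percentage to one of the chart's four quadrants."""
    high_rate = avg_mcs >= mcs_threshold
    high_retry = retry_pct >= retry_threshold
    if high_rate and high_retry:
        return "unstable: high jitter, packet loss, limited capacity"
    if high_rate:
        return "target: good coverage, reliable operation, high speed and capacity"
    if high_retry:
        return "very slow, at the coverage boundary"
    return "speed limited, working ok"


print(classify_link(avg_mcs=6.5, retry_pct=35.0))
print(classify_link(avg_mcs=6.5, retry_pct=5.0))
```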

15 Non Wi-Fi Interference
Impact of devices at 2.4 GHz, not so much at 5 GHz
Sources: Bluetooth, microwave, video cameras, medical devices
Notes: It is important to understand that Wi-Fi will work in the presence of other devices, but the noise floor will be higher. A big mistake people make is to focus on these non-Wi-Fi devices without understanding how the network is working at a higher level, using metrics like throughput, delay and jitter, which I will talk about later.

16 Legacy mode drives speed down
The largest impact is from 802.11b protection
When an AP detects an associated 802.11b client, it turns on protection mode (signaled in beacons and probe responses).
The AP may also turn this on when it detects another AP using protection mode.
When protection mode is on, all clients need to start using either RTS/CTS or CTS-to-Self protection to avoid collisions.
This introduces significant overhead that usually limits throughput and capacity considerably.
If 802.11b support is off, it’s useful to try to remove 802.11b devices completely. Otherwise they keep probing with 802.11b rates.
Notes: The 802.11 protocol is designed to be backward compatible with slower, older devices, but this support has a big impact. Not only do 802.11b packets take longer to transmit, but the APs and clients have to go into protection mode because 802.11b clients can’t detect the higher rate carrier, which means the added overhead of using RTS/CTS. In practice, it is best to disable 802.11b. A lot of universities are doing this; in hospitals it is not so easy because of medical devices or VoIP badges still using 802.11b.
Joke: Why did it take God only 6 days to create the world? He didn’t have to deal with legacy systems!

17 TCP does not like lost packets or delay
TCP uses a mechanism called slow start
If a packet loss occurs, TCP assumes that it is due to network congestion and takes steps to rapidly reduce the offered load to the network
With slow start, TCP starts increasing the rate again when consecutive acknowledgements are received properly
Slow start may perform poorly with wireless networks that are losing packets
Notes: Let’s start to look at higher protocol issues. TCP is of course the basic mechanism to transfer files or web traffic, and it is very sensitive to delay and lost packets. It has a mechanism called slow start where it sends a few packets, waits for an acknowledgement, and based on that decides to increase the window and send more data before getting an acknowledgement. So a lost packet or acknowledgement can impact the overall data transfer rate.
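The effect of losses on slow start can be illustrated with a toy simulation. This is a deliberately simplified, Tahoe-style model assumed for illustration (the window restarts at one segment after any loss); real TCP stacks use more refined recovery, and the round and loss values are made up.

```python
def simulate_cwnd(rounds: int, loss_rounds: set, ssthresh: int = 64) -> list:
    """Return the congestion window (in segments) at the start of each round trip."""
    cwnd = 1
    history = []
    for rnd in range(rounds):
        history.append(cwnd)
        if rnd in loss_rounds:
            # Loss is treated as congestion: halve the threshold, restart from 1.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            # Slow start: double the window every round trip.
            cwnd = min(cwnd * 2, ssthresh)
        else:
            # Congestion avoidance: grow by one segment per round trip.
            cwnd += 1
    return history


clean = simulate_cwnd(10, set())
lossy = simulate_cwnd(10, {3, 7})
print("segments sent without loss:", sum(clean))
print("segments sent with two losses:", sum(lossy))
```

Even two isolated losses in ten round trips keep the window small for the whole transfer in this model, which is the behavior the slide warns about on lossy wireless links.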

18 Retries at different layers using TCP
Layer diagram (the data packet shown is for illustration purposes only):
User: user data. The user may lose patience in 4-10 s.
Application (Layer 5-7): varies. Desktop virtualization is sometimes used to help with layer 1-4 problems.
TCP (Layer 4): Not ACK’d within 2x RTT? -> Resend with slow start
WLAN (Layer 1-2): Not ACK’d? -> Resend, 7-25 times
Notes: This shows the process. At the TCP layer you have one level of transmit / acknowledge, and below that you have the transmit / acknowledge at the Wi-Fi layer. A lost or delayed packet will impact the overall speed of the perceived connection, and the user could get frustrated.

19 Retries at different layers using UDP
Layer diagram (the data packet shown is for illustration purposes only):
User: VoIP call, etc.
Application (Layer 5-7)
UDP (Layer 4): UDP does not retransmit, so a lost packet is permanently lost
WLAN (Layer 1-2): Not ACK’d? -> Resend, 7-25 times
Result: jitter, packet loss
Notes: With UDP, as used in Voice over IP, there are no high-level acknowledgements, so a lost or delayed packet results in jitter or noise. How many of you have been on a conference call where one person is on a bad Wi-Fi connection and the echo cancellers start going crazy? Not a pleasant experience.
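A rough feel for what the layer 2 retry limit buys can be had from a simple probability model. The sketch assumes every transmission attempt fails independently with the same probability, which is optimistic for bursty interference; the 30% failure figure is an arbitrary example.

```python
def residual_loss(p_fail: float, retries: int) -> float:
    """Probability a frame is still undelivered after the first try plus `retries` resends."""
    return p_fail ** (retries + 1)


def expected_attempts(p_fail: float, retries: int) -> float:
    """Average number of transmissions spent per frame, capped at 1 + retries."""
    if p_fail == 1.0:
        return float(retries + 1)
    return (1.0 - p_fail ** (retries + 1)) / (1.0 - p_fail)


for retry_limit in (7, 25):
    print(retry_limit,
          residual_loss(0.3, retry_limit),
          expected_attempts(0.3, retry_limit))
```

The retries hide most losses from UDP, but every extra attempt adds delay, which shows up as jitter for voice traffic.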

20 Layer 2 packet fragmentation makes radio more robust
If all goes well, efficiency is good: packet #1, then packet #2 (1500 B), each followed by an ACK.
If an error is detected, the content of the whole 1500 B packet is lost and needs to be retransmitted: #1 (1500 B), no ACK (lost or any error), then #1 retry 1 (1500 B).
The probability of errors in a smaller packet is lower, and transmitting it has taken less time in the first place: #1 to #4 (750 B each), each followed by an ACK.
Fragmenting packets increases robustness, but increases overhead
Aggregating (e.g. Block ACK) reduces robustness, but increases efficiency
The fragmentation threshold default value is usually 2346 B (>1500 B, no fragmenting)
Notes: Fragmentation is another factor that can actually improve performance in a noisy environment. By breaking up the packet into smaller fragments, there is a better chance of getting through. The tradeoff is more overhead.
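The robustness versus overhead tradeoff can be sketched numerically. The model below is an assumption for illustration: independent bit errors, equal-sized fragments, unlimited retries, and a hypothetical 50 bytes of per-frame overhead. It estimates how many bytes go on air to deliver a 1500 B payload.

```python
def expected_bytes_on_air(payload_bytes: int, fragment_bytes: int,
                          overhead_bytes: int, bit_error_rate: float) -> float:
    """Expected bytes transmitted to deliver the payload, resending failed fragments."""
    fragments = -(-payload_bytes // fragment_bytes)  # ceiling division
    frame_bytes = fragment_bytes + overhead_bytes
    # A frame succeeds only if every one of its bits arrives intact.
    p_ok = (1.0 - bit_error_rate) ** (8 * frame_bytes)
    # On average 1 / p_ok transmissions are needed per fragment.
    return fragments * frame_bytes / p_ok


for ber in (1e-6, 1e-4):
    whole = expected_bytes_on_air(1500, 1500, 50, ber)
    halves = expected_bytes_on_air(1500, 750, 50, ber)
    print(f"BER {ber:g}: unfragmented {whole:.0f} B, two fragments {halves:.0f} B")
```

Under these assumptions, fragmenting costs slightly more on a clean channel and noticeably less on a noisy one, matching the slide's point.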

21 Higher QoS helps prioritize data
Voice (VO), Video (VI), Best Effort (BE) and Background (BK) classes
Notes: Another mechanism is assigning QoS traffic classes to help prioritize data within the network. Often this gets translated to different SSIDs for voice and video, which are more delay sensitive.
* Source: IEEE / aa QoS Tutorial

22 3. The Wi-Fi Performance Cycle

23 Answering the Wi-Fi Challenge
Problem -> Solution
Wait for complaints -> Proactive measurements
Limited view of network -> Check end-to-end performance
Little historical data -> Analyze historical trends
Guess at service levels -> Use metrics based reporting
Remote issues costly to resolve -> Centralize diagnosis of problems

24 Bending the Cost Curve
Chart: cost needed to achieve reliability ($) versus complexity of network (number of access points, clients, applications). Curves labeled: reactive focus based on complaints; proactive focus based on continuous measurements. Drivers shown: BYOD, video apps, virtual desktop, location services, mobile computing, guest networks, Voice over Wi-Fi.

25 Performance Management with a Systematic Approach
Diagram: Sensor, Access Point(s), Mgmt Station
Simulate client traffic (active tests)
Listen to AP / client traffic (passive tests)

26 The Eye’s Capabilities
Synthetic Tests
  End-to-end view at the application layer
  Data and voice quality measurements (throughput, packet loss, latency, jitter)
Traffic Analysis
  Radio frame header analysis for traffic flow between clients and APs
  KPIs for each client, SSID, AP, band and antenna beam
RF Analysis
  AP settings, capabilities, signal levels, channels and noise levels
  KPIs for each AP, channel and antenna beam
Spectrum Analysis
  High resolution (280 kHz) for ISM band
  Interference source analysis with compass directional data on beams
Full Packet Capture
  Capture remotely
  Easy export to Wireshark or other tool

27 The Wi-Fi Performance Cycle
“If you can’t measure it, you can’t manage it!” - Peter Drucker
Cycle: Measure, Analyze, Optimize, Verify, Assure

28 4. Optimization flow, 10 step process

29 The most important KPIs
End user metrics (active tests), used to assess:
  Connection success
  Throughput
  Packet loss
  Latency
  Jitter
  Voice quality (MOS)
Layer 2 / Layer 1 metrics (passive tests), used to optimize:
  Data rates
  Retry rates
  Utilization
  Traffic volume
  Channels
  Signal level
  Spectrum data
Notes: We want to use high-level end user metrics to assess the network while making changes at the lower levels of the network.
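Several of the end user metrics can be derived from a single series of active probe results. The sketch below is illustrative only: it assumes round-trip times in milliseconds with `None` marking a lost probe, and it uses a simple jitter definition (mean absolute difference between consecutive received samples), not the estimator of any particular product or standard.

```python
from statistics import mean


def summarize(rtts_ms: list) -> dict:
    """Compute packet loss, average latency and jitter from probe round-trip times."""
    received = [r for r in rtts_ms if r is not None]
    if len(received) < 2:
        raise ValueError("need at least two received probes")
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    return {
        "loss_pct": 100.0 * (len(rtts_ms) - len(received)) / len(rtts_ms),
        "latency_ms": mean(received),
        "jitter_ms": mean(diffs),
    }


# Hypothetical probe results: the third probe was lost.
print(summarize([20, 22, None, 30, 24]))
```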

30 Optimization flow at a glance
1. Preparations and baseline
  Ensure that APs and antennas are positioned correctly
  Collect baseline data for a few days, check WLAN SW release, upgrade
2. Channel plan
  Maximize available spectrum, organize channels for max capacity potential
  Use manual channel plan in dense areas
3. Minimize utilization
  Minimize utilization due to unnecessary traffic
  # of SSIDs, standards, beaconing, probing, data rates, protection, etc.
4. Adjust power levels
  Adjust AP power levels & TPC settings for improved SNR at both ends
5. Reduce non-WLAN interference
  Remove non-WLAN interference as much as possible
  There is always interference; understand whether it has significant impact
6. Improve radio robustness
  Make radio more robust towards remaining interference/noise
  Increased power, dropping max MCS, fragmentation, directional antennas
7. Prioritize and balance traffic
  QoS categories, AP power levels, load balancing, SSID strategy, roaming
8. LAN/WAN capabilities
  Ensure sufficient LAN/WAN capacity and performance are present
9. Improve client operation
  Drivers, location, models, settings
10. Physical network changes
  If performance is not sufficient, consider HW changes
  Directional antennas, add/move APs, replace equipment, end user devices

31 #1. Understand the baseline
Collect and review all radio parameter settings
Verify AP type, antenna performance and placement
Collect baseline performance data for 3-5 days
Understand peaks and valleys in performance
Nighttime data is extremely useful: if an empty network can’t provide good throughput, it won’t do so under load either!
Analyze and find likely bottlenecks
Draft a plan for optimization steps
Make small changes and verify each step

32 #2. Plan the channels carefully
Understand # of APs per channel in the whole area
Use the maximum amount of radio spectrum & channels
Align all APs to a common channel grid (1, 6, 11, etc.)
Fix the HT bonding side, HT40+ or HT40-
Do not overlap a bonded channel with a main channel
If automation does not provide a balanced plan, assign channels manually
  Rotate channels evenly within a floor
  Rotate with an offset between floors
Remove out-of-grid devices if possible
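The "rotate evenly within a floor, with an offset between floors" rule can be sketched as follows. This is a toy model under stated assumptions: APs are arranged in a row with the same index stacked vertically on each floor, and the function name and parameters are hypothetical. A real plan has to follow the actual building layout and measured coverage.

```python
def channel_plan(floors: int, aps_per_floor: int,
                 channels=(1, 6, 11), floor_offset: int = 1) -> dict:
    """Assign channels round-robin along each floor, shifted by an offset per floor."""
    plan = {}
    for floor in range(floors):
        for ap in range(aps_per_floor):
            index = (ap + floor * floor_offset) % len(channels)
            plan[(floor, ap)] = channels[index]
    return plan


plan = channel_plan(floors=3, aps_per_floor=4)
for floor in range(3):
    print(f"floor {floor}:", [plan[(floor, ap)] for ap in range(4)])
```

With three channels and an offset of one, neighbors along a floor and APs directly above or below each other end up on different channels in this model.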

33 #3. Minimize utilization
Reduce the number of SSIDs per AP to max. 3-4
  Note: Every SSID sends its own beacon, day and night
  It’s common that networks run high utilization without clients!
Remove 802.11b rates (1, 2, 5.5, 11 Mbit/s) and their support
Remove low MCS and SS multiples
Increase the beacon interval from 100 ms to 300 ms
  Note: Some devices do not allow this, e.g. Vocera badges, older VoIP phones and older equipment in general
Increase the CCA threshold
Remove printers and other devices that keep the air busy
Notes: Why do people assign different SSIDs? Different authentications (guest, etc.) and traffic types. What examples have we found? Typically we find 3 (guest, VoIP, general); the worst we’ve seen is 15-30. What is a typical beacon interval? 100 ms. What do you change from 3 to 1? DTIM (delivery traffic indication message).
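A back-of-the-envelope estimate shows why SSID count and beacon interval matter. The numbers below are assumptions for illustration, not measurements: a 300-byte beacon sent at 1 Mbit/s with a 192 µs long preamble, counted for a single AP on the channel.

```python
def beacon_airtime_pct(ssids: int, beacon_interval_ms: float = 100.0,
                       beacon_bytes: int = 300,      # assumed beacon size
                       rate_mbps: float = 1.0,       # assumed lowest basic rate
                       preamble_us: float = 192.0    # assumed long preamble
                       ) -> float:
    """Percentage of airtime one AP spends on beacons for the given number of SSIDs."""
    airtime_us = preamble_us + beacon_bytes * 8 / rate_mbps
    beacons_per_second = ssids * 1000.0 / beacon_interval_ms
    return 100.0 * beacons_per_second * airtime_us / 1_000_000


for ssids, interval in [(1, 100.0), (4, 100.0), (15, 100.0), (4, 300.0)]:
    pct = beacon_airtime_pct(ssids, interval)
    print(f"{ssids} SSIDs at {interval:.0f} ms: {pct:.1f}% of airtime")
```

The estimate scales linearly with the number of SSIDs and with the number of APs sharing a channel, and drops when the lowest rates are removed or the beacon interval is lengthened.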

34 #4. Adjust power levels
Define a limited range for TPC algorithms instead of the default
Observe power level changes from metrics as well. Do they correlate with settings?
Assign a 3-5 dB higher power range for 5 GHz vs. 2.4 GHz
Use manual power levels if TPC does not yield good results
If possible, do not exceed the power level that still supports all data rates/MCSs. Consider compensating with higher gain antennas if needed

35 #5. Reduce non-Wi-Fi interference
Interference is present, always!
Understand the level of impact
  How are end user metrics impacted?
  Correlate spectrum data with metrics
Analyze the spectrum: where does the noise come from?
  Bluetooth is the most common non-WLAN source: keyboard, mouse, headset, handheld readers
  Many other potential sources, especially in the 2.4 GHz band
Remove sources when possible
Observe the impact on throughput and other end user metrics when changes are made
  If changes are helping, it’s visible in active data

36 #6. Improve WLAN robustness
Remove the highest rates/MCS (most sensitive)
Run voice SSIDs in -g/-a mode only, without -n
Use radio packet fragmentation
Enable interference resistant mode if supported

37 #7. Prioritize and balance traffic
Separate SSIDs (but keep the quantity to a minimum)
Assign QoS classes with WMM (Wi-Fi Multimedia, also known as Wireless Multimedia Extensions)
Adjust relative AP power levels to move clients
Consider the use of load balancing, band steering/select and admission control features
Different features are offered depending on the vendor
Notes: Balancing APs (by forcing highly utilized APs to lower power levels): do we do this a lot?

38 #8. Ensure sufficient LAN/WAN capacity
Observe utilization at the switch/router interfaces
Observe packet loss metrics
Internet connection speed may be a bottleneck at remote sites
Always routing data packets to the controller may impact performance
Understand what is sufficient throughput for the end user and dimension connections accordingly

39 #9. Improve client operation
Review all client devices and understand where their antennas are
Ensure that antennas are not hidden within metal enclosures and have space to operate properly
Upgrade WLAN drivers
Turn roaming aggressiveness to medium or low
Adjust client power level
CTS-to-Self may be more efficient than RTS/CTS

40 #10. Physical changes to network
Move APs
Add APs
Upgrade APs
Use good quality external antennas of the right type
Every network can be made to perform well!

41 5. Examples

42 Akron Children’s Medical Center

43 Uplink throughput
Average improved from ~11 to ~14 Mbit/s (27%)
The worst APs improved from ~4 to ~13 Mbit/s (225%)
Chart markers: Antenna change ready, Channel change, Power level change, Codec changes, Core LAN upgrade
Notes: Antenna change is step 10 in the optimization process, but this was step 1?

44 Downlink throughput
Average improved from 13 to 17 Mbit/s (30%)
The worst APs improved from 7 to 15 Mbit/s (110%)
Chart markers: Antenna change ready, Channel change, Power level change, Codec changes, Core LAN upgrade
Notes: Antenna change is step 10 in the optimization process, but this was step 1?
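The improvement percentages on the uplink and downlink slides follow from the before and after values. A quick check, using the rounded figures from the slides (several are marked approximate, so the results only roughly match the quoted percentages):

```python
def improvement_pct(before: float, after: float) -> float:
    """Relative improvement from `before` to `after`, in percent."""
    return 100.0 * (after - before) / before


# Before/after throughput in Mbit/s, as printed on the slides.
cases = [
    ("uplink average", 11, 14),
    ("uplink worst APs", 4, 13),
    ("downlink average", 13, 17),
    ("downlink worst APs", 7, 15),
]

for label, before, after in cases:
    print(f"{label}: {improvement_pct(before, after):.0f}%")
```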

45 Packet loss
From ~2.5% to ~0.5%
Chart markers: Antenna change ready, Channel change, Power level change, Codec changes, Core LAN upgrade

46 University, Iowa

47 Downlink throughput (daily)
Downlink throughput daily averages have improved 50%
Chart markers, in order of change:
1st) Disabling power saving
2nd) Disabling b-data rates, area 1
3rd) Disabling b-data rates in other locations
4th) New channel plan, areas 1 & 2
5th) New TxPwr settings in XXX and channel plan in YYY
6th) Beacon interval change
7th) Channel re-plan, area 3, 2.4 GHz

48 Downlink throughput (hour)
Minimum values increase up to ~10x
Chart markers, in order of change:
1st) Disabling power saving
2nd) Disabling b-data rates, area 1
3rd) Disabling b-data rates in other locations
4th) New channel plan, areas 1 & 2
5th) New TxPwr settings in XXX and channel plan in YYY
6th) Beacon interval change
7th) Channel re-plan, area 3, 2.4 GHz

49 Avans University of Applied Sciences

50 TCP downlink throughput
900% improvement in 1st floor
100% improvement in ground floor
Chart markers: 1 2 3 4 5
Changes labeled on the chart: AP power levels, More channels, Beacon 300 ms, HT40

51 HTTP downlink throughput
1 2 3 4 5 90%/50% improvements

52 Voice Quality (MOS), downlink, hourly
+0.25MOS in ground +0.25MOS in 1st floor 1 2 3 4 5

53 Network latency (RTT)
50% improvement in 1st floor
Chart markers: 1 2 3 4 5

54 Performance Dashboard
Dashboard panels: Before analysis and optimization; After analysis and optimization
We baseline, then we optimize, tracking the trends.

55 6. Summary

56 Summary
Wi-Fi is very sensitive to its surroundings and network parameters, even though it somehow works almost no matter where you put it
Performance can often be improved significantly by adjusting the network parameters
Relevant continuous data is needed to validate changes
Knowledge of WLAN/RF is needed to decide the actions
Optimization requires a pragmatic approach

57 Thank You! www.7signal.com @7signal

