Published by Regina McCarthy. Modified over 9 years ago.
1
ProxySG Performance
Thank you for joining today's Blue Coat Customer Support Technical Webcast!
The Webcast will begin a minute or so after the top of the hour to allow today's very large audience sufficient time to join.
You may join the teleconference through the numbers provided in your invite, or listen through your computer speakers.
Audio broadcast will only go live when the Webcast begins; there will be silence until then.
The presentation will run approximately 60 minutes, followed by a 30-minute Q/A session.
Please submit questions using the Webex Q/A feature!
2
ProxySG Performance Webcast
Paul Kao Director Product Management December 16, 2014
3
Agenda
ProxySG Overview: Architecture (SGOS, CW, SW, policy checkpoints); System resources/metrics
Performance Model
Factors Impacting Performance: Authentication, ICAP, Policy, SSL, misc.
Critical Resource Monitoring: CPU, Memory, CW, network
Troubleshooting Performance Problems: Baseline, CPU monitor, Policy trace, Sysinfo
4
ProxySG Overview
5
SGOS Overview
SGOS is a secure, hardened, proprietary OS developed by Blue Coat to be robust and scalable at the highest levels of performance.
It is unlike other operating systems:
  Microkernel, message-passing architecture using an "admin" and "worker" model for processes
  Run-to-completion semantics
  Uses an object store (cache engine/cache admin); no file system, no directory structure
Policy is deeply integrated into SGOS:
  Checkpoints at entry/exit of the proxy traffic flow evaluate the policy transaction
6
SGOS Architecture
Client Worker (CW): Processes the HTTP session between the SG and the client.
Server Worker (SW): Processes the HTTP session between the SG and the OCS.
Retrieval Worker (RW): Pipelines requests and keeps the content of the cache fresh.
Specialized Worker: Handles a specific protocol, such as streaming or CIFS.
7
Policy Checkpoints
Workers provide available information to policy, for example:
  server_url.domain=
  client.address=
  set(response.header.Set-Cookie, "x")
  http.response.apparent_data_type=
The policy transaction is re-evaluated at each checkpoint.
Policy decisions are stored in a policy ticket.
8
ProxySG Appliance Physical Resources
Core appliance resources are CPU, memory, disk, and network interface.
CPU: No CPU throttling. The appliance continues to handle more load until it reaches its CPU limit (assuming other resources are available). At that point, requests take longer to process and transaction times grow.
Memory: Threshold Monitor (TM) engages at 80% memory pressure and goes into regulation, which limits HTTP acceptance to reduce the rate at which new incoming connections are processed.
Disk: At high disk utilization, back-off mechanisms engage to maintain throughput at the expense of cache efficiency (disk reads/writes).
Network interface: An event is logged if the network interface is saturated (TCP livelock).
9
ProxySG Appliance Metric: User Count & Client Worker
The appliance has fixed CPU/memory/disk/network resources.
One additional metric is "Licensed Client IP":
  From a sizing perspective, "Licensed Client IP" is the maximum number of unique IPs that a given SG appliance should handle.
  Usually, a client IP is synonymous with a user/employee.
  Licensed Client IP is a "soft" limit on hardware appliances and a "hard" limit on virtual appliances.
Appliance performance is constrained by the number of HTTP/TCP-Tunnel "Client Workers" (CW) available for processing:
  Each appliance model has its own CW limit.
  The CW limit does not limit any other TCP session on the SG.
  The CW limit is only a count of active client-side sessions.
10
Performance Model
11
Performance Model
12
Factors impacting Performance
Client Network deployment Authentication mode DNS, Content Filtering ICAP REQMOD (DLP) ICAP RESPMOD (CAS) System services, logging Policy SSL
13
Performance Factors 1. Client
14
1. Client Side
Client-to-SG connection (client side):
  Limited by HTTP/TCP-Tunnel CW.
  User (client IP) is not an enforced metric; user is a model for sizing.
  The CW limit does not include other TCP sessions (auth, ICAP, bypass, ...).
  Don't confuse the TCP-Tunnel proxy CW limit with the TCP connection limit!
S-Series hardware:
  S-Series models are sized at 5 connections per user (user = unique client IP).
  Examples of per-user variation: financial trader, 50 connections per user; kiosk, 1 connection per user.

  Model     Users     Max CW
  S200-10   400       2,000
  S200-20   1,200     -
  S200-30   2,600     13,000
  S200-40   5,000     -
  S400-20   6,000     -
  S400-30   14,000    70,000
  S400-40   25,000    125,000
  S500-10   30,000    150,000
  S500-20   50,000    250,000
  (- = value not available; each listed Max CW equals 5 x Users)
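The sizing arithmetic on this slide can be sketched in a few lines of Python. This is only an illustration: the user ratings come from the slide, but the function names are hypothetical, and the per-model CW budget is assumed to be rated users times 5 connections (the slide's stated sizing assumption), not a published limit for every model.

```python
import math

# Rated users per S-Series model, as listed on the slide (smallest to largest).
MODEL_USERS = {
    "S200-10": 400, "S200-20": 1_200, "S200-30": 2_600, "S200-40": 5_000,
    "S400-20": 6_000, "S400-30": 14_000, "S400-40": 25_000,
    "S500-10": 30_000, "S500-20": 50_000,
}
SIZING_CONNS_PER_USER = 5  # sizing assumption stated on the slide


def required_client_workers(users, conns_per_user=SIZING_CONNS_PER_USER):
    """Client-side sessions (CWs) needed for a given user population."""
    return math.ceil(users * conns_per_user)


def smallest_model_for(users, conns_per_user=SIZING_CONNS_PER_USER):
    """Return the first model whose assumed CW budget covers the load, else None.

    Assumption: CW budget = rated users x 5. Check the real per-model limit
    before relying on this.
    """
    needed = required_client_workers(users, conns_per_user)
    for model, rated_users in MODEL_USERS.items():  # dicts keep insertion order
        if needed <= rated_users * SIZING_CONNS_PER_USER:
            return model
    return None


if __name__ == "__main__":
    print(smallest_model_for(1_000))       # typical users, 5 connections each
    print(smallest_model_for(1_000, 50))   # financial traders, 50 connections each
    print(smallest_model_for(1_000, 1))    # kiosks, 1 connection each
```

The same 1,000 client IPs land on very different models depending on connections per user, which is why the slide treats user count as a sizing model and CW as the real constraint.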
15
Performance Factors 2. Network Deployment
16
2. Network Deployment
Network 101: Check link/duplex settings.
WCCP: GRE vs. L2. Set the MTU appropriately to avoid fragmentation with GRE.
Physically inline (bridging): Good for smaller sites. At larger sites, significant non-web (bypass) traffic can consume network resources.
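To make the MTU point concrete, here is a small Python sketch of the headroom calculation. It assumes a basic GRE encapsulation over IPv4 (20-byte outer IP header plus a 4-byte GRE header with no optional fields); any additional redirect header used by a given WCCP deployment is passed in through the hypothetical `extra_overhead` parameter rather than hard-coded.

```python
IPV4_HEADER_BYTES = 20  # outer IPv4 header without options
GRE_HEADER_BYTES = 4    # basic GRE header without optional fields


def max_inner_packet(path_mtu=1500, extra_overhead=0):
    """Largest original packet that fits in one GRE-encapsulated frame."""
    return path_mtu - IPV4_HEADER_BYTES - GRE_HEADER_BYTES - extra_overhead


def needs_fragmentation(inner_packet_bytes, path_mtu=1500, extra_overhead=0):
    """True if encapsulating this packet would exceed the path MTU."""
    return inner_packet_bytes > max_inner_packet(path_mtu, extra_overhead)


if __name__ == "__main__":
    print(max_inner_packet())           # headroom on a 1500-byte path
    print(needs_fragmentation(1500))    # full-size packet no longer fits
    print(needs_fragmentation(1400))    # smaller packet fits
```

A full-size 1500-byte client packet no longer fits once it is wrapped, so either the MTU/MSS on the redirected path is lowered or the encapsulated packets are fragmented.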
17
Performance Factors 3. Authentication Mode
18
3. Authentication
Evaluated at Client In (CI).
The choice of authentication mode can impact performance:
  Explicit proxy with NTLM: SG issues a 407 challenge for each connection.
  IP surrogate: After initial authentication, the authentication cache is used.
  Kerberos: Credentials are validated without the need to contact the DC.
NTLM does not scale well:
  NTLM credentials cannot be cached and must be validated by the DC.
  The default Windows configuration processes only one request at a time via Schannel.
  The problem is exacerbated by latency and load on the DC (SG-DC or SG-BCAA-DC).
Kerberos is preferred for scalability.
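A rough way to see why NTLM validation is sensitive to DC latency: if validations are serialized, the ceiling on validations per second is simply the inverse of the round-trip time. The Python sketch below is a simplified model under that assumption; the function name and the sample latencies are hypothetical, not measured values.

```python
def max_auth_rate(round_trip_ms, concurrent_requests=1):
    """Upper bound on credential validations per second.

    Simplified model: each validation occupies one slot for a full
    round trip to the DC, and only `concurrent_requests` slots exist.
    """
    return concurrent_requests * 1000.0 / round_trip_ms


if __name__ == "__main__":
    for latency_ms in (5, 20, 100):  # hypothetical SG-to-DC round trips
        print(latency_ms, "ms ->", max_auth_rate(latency_ms), "validations/sec")
```

Because explicit-proxy NTLM challenges every connection, a ceiling of a few dozen validations per second is reached quickly; cached (IP surrogate) or ticket-based (Kerberos) modes avoid the per-connection round trip.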
19
Performance Factors 4. DNS, Content Filtering
20
4. DNS, Content Filtering
DNS:
  Not a high consumer of CPU, but can be a cause of latency.
  If external DNS servers are slow or overloaded, the proxy will amplify the problem.
  Use caution with policies/logging that trigger RDNS lookups.
Content Filtering (evaluated at Client In):
  BCWF: Efficient categorization for high performance; settings available for lower-memory-footprint appliances.
  WebPulse DRTR: Minimal overhead.
21
Performance Factors 5. ICAP REQMOD (DLP)
22
5. ICAP General & ICAP REQMOD
ICAP (Internet Content Adaptation Protocol) is used to vector both REQuest and RESPonse traffic for scanning.
ICAP general performance considerations:
  Use persistent connections with re-use.
  Provide sufficient ICAP connections to handle throughput, or queuing will occur.
  Relatively "expensive": content must be sent over ICAP.
  Policy dictates how much content is sent (see ICAP best practices); the worst case is all content sent to ICAP.
ICAP REQMOD is evaluated at CI (before Server Out):
  Scans data on the outbound request (POST body data).
  Incremental cost due to the low volume of data (POST body data).
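The "sufficient ICAP connections or queuing will occur" point can be estimated with Little's law: connections in use equal the request rate multiplied by the average time each request holds a connection. The sketch below is an illustration only; the function names and the sample rates and scan times are hypothetical.

```python
import math


def icap_connections_needed(requests_per_sec, avg_scan_ms):
    """Average number of ICAP connections busy at once (Little's law)."""
    return math.ceil(requests_per_sec * avg_scan_ms / 1000)


def will_queue(available_connections, requests_per_sec, avg_scan_ms):
    """True if demand exceeds the configured ICAP connection count."""
    return icap_connections_needed(requests_per_sec, avg_scan_ms) > available_connections


if __name__ == "__main__":
    # Hypothetical load: 200 scanned requests/sec, 50 ms per scan.
    print(icap_connections_needed(200, 50))
    # The same load with slower scans needs proportionally more connections.
    print(icap_connections_needed(200, 250))
    print(will_queue(25, 200, 250))
```

This is an average; bursts need headroom on top of it, so treat the result as a lower bound rather than a target.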
23
Performance Factors 6. ICAP RESPMOD (CAS/AV)
24
6. ICAP RESPMOD (Content Analysis)
Evaluated at Server In (SI).
Higher cost due to the volume of incoming response data.
For ICAP RESPMOD, cache to disk for performance (no need to return the payload when the response is 204 No Modification).
Related features:
  Infinite streams
  ICAP deferred connections
  ICAP mirroring (SG6.5)
Secure ICAP:
  SSL cost is in the initial connection setup.
  SSL overhead of bulk encryption is low.
25
Performance Factors 7. System Services
26
7. System Services
Access logging:
  A log entry is written when a connection is complete.
  A few percent overhead when enabled.
  More overhead if multiple log facilities are in use.
Other services that consume resources:
  Health checks
  SNMP
  Attack detection
  Failover, SGRP (VRRP)
  Connection forwarding
  Scripts, polling of local policy
  Snapshots, debug logs
27
Performance Factors 8. Policy
28
8. Policy and CPU
Policy impact can range from minimal to the majority of CPU cost on the SG.
Look for policy best practices: avoid regexes, order rules so the most likely matches come first, group rules, etc.
A point of reference:
  Policy used for SWG/ICAP/SSL consumes about 15% of total CPU.
  Scale appropriately for higher/lower policy usage.
  There is variation across platforms.
Use this only as a rule of thumb: it is not guaranteed to be exact and may change in the future.
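The advice to order rules with the most likely match first can be illustrated with a simple expected-cost calculation. The Python sketch below is a toy model, not a description of how SGOS evaluates policy: it assumes rules are tried top to bottom, evaluation stops at the first match, every rule costs the same to test, and each request matches at most one rule. The probabilities are hypothetical.

```python
def expected_rules_evaluated(match_probabilities):
    """Average number of rules tested per request under first-match semantics.

    match_probabilities[i] is the chance that rule i is the one that matches;
    requests that match nothing fall through the whole list.
    """
    n = len(match_probabilities)
    matched = sum(p * (i + 1) for i, p in enumerate(match_probabilities))
    fall_through = (1.0 - sum(match_probabilities)) * n
    return matched + fall_through


if __name__ == "__main__":
    as_written = [0.05, 0.15, 0.70]               # rare rules listed first
    reordered = sorted(as_written, reverse=True)  # most likely match first
    print(round(expected_rules_evaluated(as_written), 2))
    print(round(expected_rules_evaluated(reordered), 2))
```

With only three rules the saving is small in absolute terms, but the same effect compounds across long layers and expensive conditions such as regexes.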
29
Performance Factors 9. SSL
30
9. SSL Intercept
31
SSL: DHE and 1024/2048-bit keys
DHE cipher:
  DHE cipher was added for FIPS certification for secure administrative management.
  DHE cipher is computationally expensive; it requires 3-6x more CPU.
  DHE cipher is also available to the SSL proxy for intercepting data plane traffic.
  Will see CPU increase from DHE in SSL upgrading from earlier releases to x
DHE resolution:
  Patch has CLI to do same: #co t -> ssl -> proxy dhe-ciphers disable
  Upgrade to , DHE for SSL proxy is now disabled by default (can still be enabled)
2048-bit keys:
  Support for 2048-bit keys was added in 6.5.
  There is a significant cost to emulate 2048-bit certificates, but certs are cached in steady state (2 hrs).
  There is an incremental cost to support 2048-bit key exchange.
2048-bit keys resolution:
  Upgrade to , use new CLI to increase certificate cache timeout to tune perf: #co t -> ssl -> proxy set-cert-cache-timeout <hrs>
  Upgrade to , use new CLI to force emulation back to 1024 (only if necessary): #co t -> ssl -> proxy force-emulated-cert-keysize 1024
32
Certificate Emulation Statistics (SG6.5.5.1)
SSL Statistics (in Sysinfo and SSL/Statistics URL)
Certificate Emulation
  SPS51  Total certificates emulated                                                        2,264
  SPS52  Total RSA 2048 bit key certificates emulated                                       2,250
  SPS53  Current cached emulated server certificates                                        1,078
  SPS54  Total emulated server certificates added to cache                                  1,390
  SPS55  Total emulated server certificates removed from cache due to timeout
  SPS56  Total emulated server certificates removed from cache due to maxsize
  SPS57  Total emulated server certificates removed from cache due to signature mismatch    312
  SPS58  Total emulated server certificates removed from cache due to config changes
  SPS59  Total emulated server certificates add to cache failures                           874
  SPS61  Total server certificate cache successful lookups                                  42,109
  SPS62  Total proxy certificates emulated                                                  5
  SPS63  Total certificate emulation failures
% certificate emulation = SPS51 / (SPS51 + SPS61)
In steady state, the % of new emulations should be very small.
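As a worked example of the formula on this slide, the Python snippet below plugs in the SPS51 and SPS61 counters shown above. The function name is hypothetical; the counters are read from the slide, not from a live appliance.

```python
def emulation_percent(sps51_emulated, sps61_cache_hits):
    """Share of certificate requests that required a new emulation, in percent.

    Implements the slide's formula: SPS51 / (SPS51 + SPS61).
    """
    total = sps51_emulated + sps61_cache_hits
    if total == 0:
        return 0.0
    return 100.0 * sps51_emulated / total


if __name__ == "__main__":
    pct = emulation_percent(2_264, 42_109)
    print(f"{pct:.1f}% of lookups required a new emulation")
```

With the counters on the slide this works out to roughly 5%; a value that climbs over time suggests the certificate cache is not being hit, which is the symptom the wildcard-certificate discussion addresses.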
33
SSL & Wildcard certificates
Wildcard certificates (e.g., *.google.com and others):
  Google and other properties are starting to use wildcard certificates.
  Wildcards allow certs with the same CN to appear on multiple servers. Different servers have different certs (different expiration, keys, extensions, etc.).
  SG's emulated certificates are cached using the CN as the key value.
  SG sees these different certs all with the same CN, causing a collision in the certificate cache and forcing SG to re-emulate the certificate.
  This can lead to high CPU on all SG6.x versions (6.2 through 6.5).
  A future certificate cache enhancement is planned; until then, use the policy resolution below.
Wildcard certificates resolution:
  Install the following policy (creates a unique instance for each certificate):
    <ssl-intercept>
      ssl.forward_proxy(https) ssl.forward_proxy.splash_text("$(x-rs-certificate-serial-number)$(x-rs-certificate-valid-from)$(x-rs-certificate-valid-to)")
  Monitor efficacy using % certificate emulations (= SPS51 / (SPS51 + SPS61)).
34
SSL Proxy Certificate Cache
Advanced URL: SSL Proxy Certificate Cache
URL path: /sslproxy/certcache
Certificate Cache Contents
Number of cache entries: 1078
Common Name, Splash Text, Splash URL, Server Keyring
rtax.criteo.com,,
cloudfront.net,,
s3.wpc.edgecastcdn.net,,
beacon.walmart.com,,
*.linkedin.com, FAAB168CFFE4A Apr 17 12:30: GMT Apr 17 12:30: GMT,
beis.cc.iup.edu,,
*.widget.custhelp.com, BAC372720E3496C661336F0Feb 28 00:00: GMTMar 30 23:59: GMT,
ads.dotomi.com,02F7CASep 3 03:33: GMTNov 5 14:50: GMT,
*.wer.microsoft.com,28DB34EB Apr 4 17:56: GMTApr 4 17:56: GMT,
*.ebay.com,,
*.googleusercontent.com,,
*.reson8.com,D3C03378DC74A2ABF36132E69E273C45Jun 2 00:00: GMTJul 21 23:59: GMT,
stage.tracker.springserve.com,,
services.addons.mozilla.org,,
*.tapad.com,024906Jun 2 08:10: GMTSep 3 03:30: GMT,
*.dropbox.com,,
Splash text format: $(x-rs-certificate-serial-number) $(x-rs-certificate-valid-from) $(x-rs-certificate-valid-to)
35
Wildcard Certificate resolution VPM
From the VPM, edit the SSL-Intercept layer.
Click on "Splash Text" and paste the text below into the box:
  $(x-rs-certificate-serial-number)$(x-rs-certificate-valid-from)$(x-rs-certificate-valid-to)
36
Critical Resource Monitoring & Troubleshooting Performance
37
Critical Resource Monitoring
What key metrics should be monitored?
  CPU utilization
  Memory pressure
  Network throughput
  Client-side HTTP connections (CWs)
  Response time through ProxySG (and DNS response time)
Establish a baseline and peak utilization:
  Beware trend averages over long time intervals that "flatten" peaks.
  Identify true peak CPU utilization in the busy hour.
  Peak CPU typically correlates with memory and connections.
  Baseline the CPU distribution across components with CPU monitor.
SNMP MIBs:
  See BLUECOAT-SG-PROXY-MIB.txt for resource monitoring.
  BLUECOAT-SG-ICAP-MIB.txt has also been added in SG6.5.
See "Critical Resource Monitoring of the ProxySG" on BTO, which has the connection limit for each platform.
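The warning about averages that "flatten" peaks is easy to demonstrate. The Python sketch below summarizes one hour of CPU samples taken every five minutes; the sample values are invented for illustration and the function name is hypothetical.

```python
from statistics import mean


def summarize_cpu(samples):
    """Return the average and the true peak of a series of CPU % samples."""
    return {"average": mean(samples), "peak": max(samples)}


if __name__ == "__main__":
    # Hypothetical busy hour: twelve 5-minute samples with a short spike.
    busy_hour = [20, 22, 25, 30, 95, 90, 35, 25, 22, 20, 20, 20]
    summary = summarize_cpu(busy_hour)
    print(f"hourly average: {summary['average']:.1f}%")
    print(f"true peak:      {summary['peak']}%")
```

An hourly trend line for this data would look comfortable, while the appliance actually spent ten minutes near its CPU limit. Baselines should therefore record peaks from short intervals, not only long-interval averages.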
38
Troubleshooting Performance
Common performance issues:
  High CPU
  Slowness
Troubleshooting is easier if you have already established a point of reference (baseline).
Questions to ask:
  Is the issue repeatable?
  Time of occurrence: over a long period of time? Over a short period of time? Intermittent?
Tools:
  CPU Monitor
  Sysinfo snapshots
  Policy trace
39
Troubleshooting Performance High CPU
External network factors:
  Typically not going to be the cause of high CPU on the SG.
Dependent factors:
  Problem with the authentication server or auth configuration (Kerberos falling back to NTLM).
Internal factors:
  Audit config changes to the SG: complex policy/regexes?
  Loops: authentication loops, forwarding loops.
  Upgrade of SG version/bug?
  Undersized?
  Self-inflicted: enabling snapshots/debug logs too frequently?
Traffic patterns that change SG resource utilization:
  A change in traffic pattern resulted in a lack of available resources.
  A change in traffic pattern is hitting expensive policy.
  Under attack? Viruses, rogue apps, open proxy.
Bug?
40
Troubleshooting Performance High CPU
Data collection:
  Enable CPU monitor.
  Create and enable 5-minute snapshots.
  Don't change the existing daily or hourly snapshot values.
  Is high CPU constant, randomly spiking, or just at the peak busy hour?
41
Troubleshooting Performance High CPU Example 1
Example 1: CPU is high for policy evaluation.
Possible causes:
  Lots of regex rules in policy
  Very complex policy (lots of rules)
  Authentication problem
  High number of transactions per second
CPU Monitor components shown (percentages not preserved): Policy evaluation - HTTP, HTTP and FTP, Object Store, Access Logging, Miscellaneous, TCPIP, DNS service
42
Troubleshooting Performance High CPU Example 2
Example 2: CPU is high in Object Store.
The system had a hard time reading from or writing to disk, which indicates a problem with the disk.
CPU Monitor components shown (percentages not preserved): Object Store, ce_admin, Access Logging, TCPIP, tcpip, HTTP and FTP, http, kernel, Policy evaluation - HTTP, policy_enforcement
43
Troubleshooting Performance High CPU Example 3
Example 3: CPU is high across multiple components.
CPU is almost evenly distributed between HTTP and FTP, TCPIP, Object Store, and Policy evaluation. This points to a load/sizing issue.
CPU Monitor: Configured interval duration: 5 seconds. Current interval complete in: 2 seconds.
Components shown (percentages not preserved): TCPIP, HTTP and FTP, Object Store, Policy evaluation - HTTP, DNS service, Access Logging, Miscellaneous
44
Troubleshooting Performance High CPU Example 4
Example 4: CPU is high in TCP.
Possible causes:
  Too much bypass traffic
  Too many TCP connections
  May be a TCP attack
  Too many entries in time-wait state
CPU Monitor: Configured interval duration: 5 seconds. Current interval complete in: 0 seconds.
Components shown (percentages not preserved): Object Store, HTTP and FTP, Policy evaluation - HTTP, Miscellaneous, TCPIP, DNS service
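The four examples amount to a simple triage rule: find the component that dominates CPU Monitor and map it to a likely cause, or suspect sizing when nothing dominates. The Python sketch below encodes that reading. The component names come from the slides; the sample percentages, the `dominance` threshold, and the function name are assumptions made for illustration.

```python
# Likely cause per dominant CPU Monitor component, following Examples 1, 2 and 4.
HINTS = {
    "Policy evaluation - HTTP": "policy",  # regexes, complex policy, auth, high TPS
    "Object Store": "disk",                # trouble reading from or writing to disk
    "TCPIP": "tcp",                        # bypass traffic, connection floods, time-wait
}


def triage(cpu_by_component, dominance=1.5):
    """Classify a CPU Monitor breakdown into a likely problem area.

    Returns "sizing" when the top component is less than `dominance` times
    the runner-up (Example 3: load spread evenly across components).
    """
    ranked = sorted(cpu_by_component.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return "unknown"
    top_name, top_pct = ranked[0]
    runner_up_pct = ranked[1][1] if len(ranked) > 1 else 0.0
    if runner_up_pct > 0 and top_pct / runner_up_pct < dominance:
        return "sizing"
    return HINTS.get(top_name, "unknown")


if __name__ == "__main__":
    # Hypothetical breakdowns, one per example on the slides.
    print(triage({"Policy evaluation - HTTP": 60, "HTTP and FTP": 15, "Object Store": 5}))
    print(triage({"Object Store": 55, "Access Logging": 10, "TCPIP": 8}))
    print(triage({"TCPIP": 22, "HTTP and FTP": 20, "Object Store": 19,
                  "Policy evaluation - HTTP": 18}))
    print(triage({"TCPIP": 65, "HTTP and FTP": 10, "DNS service": 3}))
```

The result is only a starting point for investigation; Sysinfo snapshots and policy traces are still needed to confirm the cause.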
45
Troubleshooting Performance Slowness
Slowness can be difficult to troubleshoot, especially if intermittent.
External network factors:
  Audit change requests to the (upstream) network over the last week, e.g., a new firewall installed last weekend.
  Network: packet loss, retransmissions, asymmetric routing.
Dependent factors:
  DNS, authentication, third-party ICAP servers.
Internal factors:
  Audit config changes to the SG, starting with the most recent (work backwards over the last 2-3 days if the problem is intermittent).
  Traffic patterns that change SG resource utilization.
  SSL ciphers.
  Attack/bot.
46
Troubleshooting Performance Slowness
Data collection:
  May require multiple rounds of troubleshooting (PCAP and Sysinfo snapshots).
  It is easiest to target a specific client or server to test.
  You may need to test with different configurations and capture with different filters to narrow down the issue.
It is important to analyze the snapshots:
  Check whether resource load is high (e.g., CPU, memory, HTTP workers).
  Check for any priority 1 events and health check changes that occurred at the time of the issue.
  Check the trend of the issue (how frequently it occurs and any correlation with other components or stats).
47
Summary
ProxySG Architecture: Appliance resources, CW limit
Performance Model
Factors Impacting Performance:
  ICAP (built into sizing model/guide)
  Policy (sky is the limit)
  SSL (SSL traffic mix, amount of SSL decryption)
Resource and Health Monitoring: Critical resource monitoring, health monitoring
Troubleshooting: Importance of establishing a performance baseline; tools to troubleshoot performance
49
Thank you for Joining Today!
Please provide feedback on this webcast and suggestions for future webcasts to:
Webcast replay and slide deck found here within 48 hours: support-technical-webcasts (requires BTO log-in)
50
Blue Coat Customer Forums
Community where you can learn from and share your valuable knowledge and experience with other Blue Coat customers Research, post and reply to topics relevant to you at your own convenience Blue Coat Moderator Team ready to offer guidance, answer questions, and help get you on the right track Access at forums.bluecoat.com and register for an account today!
51
Questions for Paul? Quick Survey
We are truly committed to continuous improvement for these Technical Webcasts. At the end of the event you will be redirected to a very short survey about satisfaction with this program. Please help us out by taking two minutes to complete it. Thank you!
52
Questions?
53
ProxySG Performance Webcast Questions
Q1: Is a Client Worker (CW) created for every HTTP connection, or can a single CW handle multiple HTTP connections?
Q2: The cost with the wildcard certificates -- does that generate a lower "cost" in a reverse proxy model where the wildcard cert is on the proxy, not on the OCS/Internet?
Q3: How does Licensed Client IP correlate to concurrent users?
Q4: Is it possible to monitor the number of client connections per IP in the SG?
54
ProxySG Performance Webcast Questions
Q5: So I notice that BC has recommended the S-series to replace many 510/810/300/600 ProxySGs - does this mean the S-series is exactly the same, or is it truly an improvement in performance and connection numbers?
Q6: The "Critical Resource Monitoring" guide talks about a connection limit per device. Is that the same as the CW limit for that particular model?
Q7: What about multiple users on a single Citrix server? Licenses?
Q8: Client side, how will you typically handle reaching the TCP limit (65,000) for a specific IP in the larger models that could handle a lot more connections?
55
ProxySG Performance Webcast Questions
Q9: Note: IP surrogate won't work in a NATed load balancer configuration. About IP surrogate, do you have advice about implementation, especially about the possibility that one IP can be shared between users (hidden IPs or Citrix users)?
Q10: We are currently logging the category of URLs. What kind of impact can we expect if we add the application field in our logs for BCWF?
Q11: Where can we view utilization on the proxy as a pie graph, like the BC SWG policy pie chart?
Q12: Can you talk about ECDHE? From a performance standpoint, what should the default policy be set to?
56
ProxySG Performance Webcast Questions
Q13: For ICAP, will the proxy perform better if a dedicated interface is assigned for ICAP communication versus the same interface for all other user traffic?
Q14: Today's sizing guides assume 15% SSL traffic. That's not realistic. At least 60% of Web browsing is SSL. Is there any sizing guide that assumes a higher SSL percentage? We're having serious problems sizing the right SG for our customers.
Q15: Will the sysinfo "reader" (the tool that support reps use) be available for channels?
Q16: For troubleshooting we saw recommendations to add snapshots every 5 minutes. How much CPU (%) should be free to enable this without generating a new problem?
57
ProxySG Performance Webcast Questions
Q17: When the ProxySG is in high CPU/high memory panic mode, is there anything we can do to bring that down other than rebooting the device?
Q18: Regarding the CW limit: We've long seen it as our primary bottleneck. Does Blue Coat publish the CW figures publicly yet, or do we have to ask our VAR to get the figures on the proxy models at purchase time?
Q19: Does rebooting the proxy impact performance from a caching standpoint?
Q20: Hi, regarding memory pressure - do I understand correctly that while the proxy is in the regulation state, it just regulates NEW client connections but keeps processing the active ones?
58
ProxySG Performance Webcast Questions
Q21: Is it common to see a small amount of traffic bound for blocked URLs on our outside sensors? Is this part of the handshake process before the block is implemented?
Q22: Good morning. Regarding the licensed client IP... Is there a way for us to identify the "soft" limit in the ProxySG's GUI or CLI?
Q23: Is there a way to monitor the number of CWs in use?
Q24: What is the cost of running Trace layers (80 and 443) in the VPM?
Q25: What might it indicate if the memory utilization is significantly higher than the CPU utilization on average?
59
ProxySG Performance Webcast Questions
Q26: For bandwidth performance issues, is there a way to see who is downloading what in real time?
Q27: If the network throughput is above the threshold recommended by Blue Coat but CPU is still normal, will this cause any performance issue?
Q28: From a performance standpoint, what are the recommendations around attack detection and delete on abandonment?