
2 About Ivan Neganov
- Founder and CEO of SoftForte, Inc.
- 11 years of experience in developing WCM solutions based on ASP.NET and SharePoint platforms.
- Focusing on SharePoint since
- Blog: neganov.blogspot.com, the Science of Quality
- Web:

3 Agenda
- Part I – Planning for Performance
- Part II – Planning for Throughput

4 Part I – Planning for Performance
- Performance Defined

5 How Fast is “Fast”?
- Human Physiology Factor
  - Under 0.1 sec: virtually unnoticeable.
  - Under 1 sec: perceived as interactive.
  - Under 10 sec: willing to focus on a task.
- 2006 Akamai/Jupiter Research
  - 33% of broadband consumers will wait no longer than 4 sec for a page to load.
- 2009 Akamai/Forrester Research
  - 2 sec: average expectation of an online shopper.
  - 3 sec: max time 40% of shoppers are willing to wait for a page to load.
- KB40 – Keynote Business 40. Keynote Systems, Inc. maintains an index of the fastest business internet sites:
- WM100 – Webmetrics maintains an index of the top 100 sites by performance

6 SharePoint Response Time Guidance

| Type of operation | Examples | Acceptable user response time |
| Common operation | Browsing to the home page; browsing to a document library | <3 seconds |
| Uncommon operation | Creating a subsite; creating a list; uploading a document to a document library | <5 seconds |
| Rare operation | Backing up a site; creating a site collection | <7 seconds |

7 How Fast is “Fast” in my Company?
- Study publicly available metrics
- Study the organization’s historical metrics
- Estimate average and peak traffic
- Define a matrix of PLT1 and PLT2:
  - For various pages
  - For various authentication groups
  - For peak and average usage

8 Response Time
- Page Load Time (PLT) or User Response Time (URT): time until a page fully renders.
- Microsoft uses PLT1 and PLT2: PLT1 is the very first access to a page, PLT2 is a subsequent access to the same page.

9 Part I – Planning for Performance
- CNS Model: Client, Network, Server

10 URT Formula

11 URT Formula (Netforecast)
- R: response time
- Payload: total size of the page and all its resources
- AppTurns: round trips made at application level (excluding TCP handshake/congestion control round trips and authentication)
- RTT: round trip time
- C_s: constant server time component
- C_c: constant client time component
- Reference: bandwidth-and-response-times.html
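The formula image itself did not survive in this transcript. The following is only a sketch, assuming the commonly quoted simplified form R ≈ Payload/Bandwidth + AppTurns × RTT + C_s + C_c. The bandwidth term, the function name `estimate_urt` and all sample numbers are assumptions for illustration, not taken from the slide.

```python
def estimate_urt(payload_bytes, bandwidth_bps, app_turns, rtt_s, c_server_s, c_client_s):
    """Estimate user response time in seconds.

    Assumed simplified form (not confirmed by the slide):
        R = Payload / Bandwidth + AppTurns * RTT + Cs + Cc
    """
    transfer_s = payload_bytes * 8 / bandwidth_bps  # time to push the payload through the link
    turns_s = app_turns * rtt_s                     # cost of application-level round trips
    return transfer_s + turns_s + c_server_s + c_client_s


# Hypothetical page: 500 KB payload, 3 Mbit/s link, 20 app turns, 100 ms RTT,
# 0.3 s constant server time, 0.2 s constant client time.
r = estimate_urt(500_000, 3_000_000, 20, 0.1, 0.3, 0.2)
print(f"Estimated URT: {r:.2f} s")
```

In this made-up case the round trips (2.0 s) cost more than the payload transfer (about 1.3 s), which is why every term needs calibration against measurements.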

12 Need for Testing
- Simply applying the formula will lead to significant errors.
- You need to calibrate every part of it.
- Testing produces data for calibration.

13 Part I – Planning for Performance
- Client Performance

14 Client Scripting Performance
- The jQuery profiler from John Resig measures performance by method and calculates a Big-O breakdown: http://ejohn.org/blog/function-call-profiling/
- Profiling script from within script is very imprecise, partly due to platform implementation. For example, on Windows XP a timer would show intervals shorter than 15ms as 0.
- Profilers:
  - YSlow for Firebug
  - JScript Profiler: ie8-developer-tools-jscript-profiler.aspx
  - DynaTrace profiler (can profile script parsing time!)
- Article:

15 Part I – Planning for Performance
- Network Performance

16 Network Performance – the Bottleneck
- Bandwidth limitations: can be addressed via technology
- Latency limitations: speed of light
  RTT/2 = (36,000 km * 2) / 300,000 km/s = 0.24 sec, so RTT ~ 0.5 sec.
- TCP limitations
- Signal strength/QoS
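The speed-of-light arithmetic above can be checked with a few lines. This sketch assumes that the 36,000 figure is the altitude in kilometers of a geostationary satellite, so that a one-way trip goes up and back down; the slide does not say so explicitly, and the function name is hypothetical.

```python
SPEED_OF_LIGHT_KM_S = 300_000  # rounded value used on the slide


def round_trip_time_s(one_way_path_km):
    """Lower bound on RTT from propagation delay alone."""
    return 2 * one_way_path_km / SPEED_OF_LIGHT_KM_S


# Assumption: one-way path = up to the satellite and back down to the ground.
geo_one_way_km = 36_000 * 2
print(f"RTT >= {round_trip_time_s(geo_one_way_km):.2f} s")
```

No amount of extra bandwidth removes this delay, which is the point of listing latency separately from bandwidth.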

17 Latency and Bandwidth
- Overall link bandwidth = 3 Mbit/s
- What is my actual bandwidth & latency? detects your local bandwidth and latency.

18 TCP Communication
- The max. packet size on Ethernet is 1500 bytes, aka MTU or maximum transmission unit.
- On IPv4 networks the IP and TCP headers take 40 bytes (20 bytes each, without options), hence the max payload equals 1460 bytes, aka MSS or max. segment size.
- TCP requires acknowledgement (ACK) of all packets sent but allows sending a number of packets without waiting for an ACK to improve speed. Eventually the ACK must arrive.
- If some packets are lost, i.e. there is no ACK within a timeout, then the packets are retransmitted.
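As a small illustration of the MTU/MSS arithmetic, the sketch below derives the MSS from the figures on the slide and counts how many full-size segments a payload needs. The helper name `segments_needed` and the 100 KB payload are assumptions for illustration.

```python
import math

MTU_BYTES = 1500      # Ethernet maximum transmission unit
HEADER_BYTES = 40     # IPv4 header (20) + TCP header (20), no options
MSS_BYTES = MTU_BYTES - HEADER_BYTES


def segments_needed(payload_bytes, mss=MSS_BYTES):
    """Number of TCP segments required to carry the payload."""
    return math.ceil(payload_bytes / mss)


print(MSS_BYTES)                   # 1460
print(segments_needed(100_000))    # segments for a hypothetical 100 KB resource
```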

19 TCP Communication: Naïve Model

20 TCP Communication: Realistic Model

21 TCP Communication: TCP Window

22 TCP Window
- TCP Window is the number of bytes a receiver can accept without sending an ACK immediately.
- Too large a window means network congestion >> lost packets >> retransmission >> performance degradation
- Too small a window means low bandwidth utilization >> performance degradation

23 TCP Slow Start
- Optimal window size is twice the amount of data that can be “in flight” on the wire from sender to receiver at any given time:
  RWIN = 2 * (Bandwidth * RTT/2), or RWIN = 2 * BDP
  - BDP: bandwidth-delay product.
  - RWIN: TCP receive window buffer.
- TCP detects bandwidth and latency and dynamically sets the window size.
- Usually the initial RWIN = 64KB. Once a connection is established, TCP increases the window, a process known as “Slow Start”.
- On a slower WAN it can take up to 12 round trips to optimize the receive window.
- Initial RWIN size on W2K3: http://msdn.microsoft.com/en-us/library/ms aspx
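A minimal sketch of the RWIN rule above, using the slide's definition of BDP as bandwidth times one-way delay (RTT/2). The function names and the 3 Mbit/s, 100 ms link are hypothetical values chosen for illustration.

```python
def bdp_bytes(bandwidth_bps, rtt_s):
    """Bandwidth-delay product as used on the slide: Bandwidth * RTT/2, in bytes."""
    return bandwidth_bps / 8 * rtt_s / 2


def optimal_rwin_bytes(bandwidth_bps, rtt_s):
    """RWIN = 2 * BDP."""
    return 2 * bdp_bytes(bandwidth_bps, rtt_s)


# Hypothetical WAN link: 3 Mbit/s with 100 ms round trip time.
rwin = optimal_rwin_bytes(3_000_000, 0.1)
print(f"Optimal RWIN: {rwin:,.0f} bytes")
```

For this link the result is well below the usual 64KB initial window; on faster or longer links the product grows and the window has to be scaled up.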

24 TCP Congestion Control
- The sender maintains a congestion window, CWND, and constantly tweaks it according to bandwidth and delay to avoid congestion:
  Effective bandwidth = CWND/RTT
- Various congestion control algorithms are known, e.g. Tahoe, Reno.
- Windows Vista, 7 and 2008 use CTCP. It is advantageous over WAN, enabled by default on 2008, but not on Vista and Windows 7.
- Reference: http://technet.microsoft.com/en-us/library/bb aspx
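The relation Effective bandwidth = CWND/RTT can be explored numerically. This is a sketch under assumed values: the 64 KB window and the two RTTs are illustrative, and `effective_bandwidth_bps` is a hypothetical helper name.

```python
def effective_bandwidth_bps(cwnd_bytes, rtt_s):
    """Effective bandwidth = CWND / RTT, converted to bits per second."""
    return cwnd_bytes * 8 / rtt_s


cwnd = 64 * 1024  # assumed 64 KB congestion window
for rtt in (0.1, 0.5):
    mbit = effective_bandwidth_bps(cwnd, rtt) / 1_000_000
    print(f"RTT {rtt * 1000:.0f} ms -> {mbit:.2f} Mbit/s")
```

With the window held constant, a five times longer round trip gives one fifth of the throughput, regardless of the raw link speed.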

25 TCP Congestion Window Scaling

26 TCP Packet Loss
- Packet loss may occur for many reasons, e.g. when the network is congested, equipment is misconfigured, or there is a signal loss.
- Packet loss severely impacts throughput:
  Throughput <= 0.7 * MSS / (RTT * sqrt(P_loss))
  - MSS: max. segment size, 1460 bytes for IPv4, 1440 bytes for IPv6 on Ethernet.
  - P_loss: probability of a packet loss.
- Example: At 100ms round trip time and probability of a packet loss you would get no more than 8Mbit/s throughput.
- Contemporary networks have a very low packet loss probability, yet some packet loss occurs on long links.
- WAN testing is sometimes done assuming 1 to 3% packet loss.
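The throughput bound above is easy to evaluate. The loss probability used in the slide's own example did not survive in this transcript, so the sketch below instead uses the 1% and 3% loss rates mentioned for WAN testing; the function name is hypothetical.

```python
import math


def max_throughput_bps(mss_bytes, rtt_s, p_loss):
    """Upper bound from the slide: 0.7 * MSS / (RTT * sqrt(P_loss)), in bits per second."""
    return 0.7 * mss_bytes * 8 / (rtt_s * math.sqrt(p_loss))


MSS_IPV4 = 1460  # bytes
for p in (0.01, 0.03):
    mbit = max_throughput_bps(MSS_IPV4, 0.1, p) / 1_000_000
    print(f"loss {p:.0%}, RTT 100 ms -> at most {mbit:.2f} Mbit/s")
```

At these loss rates a single TCP connection stays under 1 Mbit/s at 100 ms RTT, however fast the underlying link is.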

27 Addressing TCP Limitations
- Using UDP instead of TCP
- Minimizing number of round trips
- Using few large files vs. many small files
- Using multiple browser connections
- Using HTTP persistent connections
- Using client-side caching
- Using Content Delivery Networks (CDN)
- Using WAN accelerators & offloading devices

28 Multiple Browser Connections
- Contemporary browsers use multiple TCP connections per hostname: IE6, IE7 – 2 connections max; IE8, Firefox 3.5 – 6 connections max.
- Open multiple (source) ports for multiple TCP connections.
- Despite having multiple connections a lot of sequential loading still takes place. IE8 is the first browser to download multiple script files in parallel.

29 HTTP Persistent Connections
- HTTP 1.1 supports persistent connections through the Keep-Alive header.
- The goal is to re-use the underlying TCP connection with its current CWND, avoiding having to go through Slow Start again.
- Enabled by default on most browsers and on IIS 6, 7. Keep-alive timeout is 1 min for IE and 15 sec. for Firefox, and is adjustable. For changing the timeout on IE6, 7 see:
- Enabling Keep-Alive in IIS7: us/library/cc772183(WS.10).aspx

30 Content Delivery Networks
- CDNs distribute cached content on multiple servers, which are close to end users. Internet traffic is redirected to the closest CDN server instead of the origin server.
- Advantages:
  - Low latency & high bandwidth when accessing a CDN server result in much better performance for the end users.
  - As a result of many users hitting the CDN cache, the load on the origin server is reduced.
  - Excellent for media streaming.
- Disadvantages:
  - Very expensive, typically affordable to large enterprises only. Ex. $0.50/GB on 50 TB monthly is about $25,000/month.
  - Less efficient for highly volatile content.
  - It can be technically difficult to invalidate the CDN cache explicitly.
- Free CDNs, primarily AJAX support:
  - Google AJAX Libraries API
  - Microsoft AJAX CDN: cdn.aspx
- More Info about CDNs:
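The cost example above is plain arithmetic, sketched here so the price and volume can be varied. The conversion of 1 TB = 1000 GB and the function name are assumptions; actual CDN pricing is usually tiered.

```python
def monthly_cdn_cost_usd(traffic_tb, price_per_gb_usd, gb_per_tb=1000):
    """Flat-rate estimate: monthly traffic times price per GB."""
    return traffic_tb * gb_per_tb * price_per_gb_usd


# Figures from the slide: 50 TB per month at $0.50 per GB.
print(f"${monthly_cdn_cost_usd(50, 0.5):,.0f} per month")
```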

31 WAN Accelerators & Offloading Devices
- Use packet compression, differencing, caching, optimal route calculation algorithms, reducing packet loss.
- Solutions include Cisco, Citrix, Packeteer, Riverbed, F5, Brocade.
- Microsoft’s ISA and IAG, and their successor Unified Access Gateway (UAG 2010) provide caching, offloaded compression, differencing and authentication delegation.

32 Determining Network Performance
- The nature of network transmission complicates its mathematical modeling and the projection of results between different networks. This increases the amount of calibration testing needed.
- Create a reference set of web pages and test them on various networks. Calibrate the CNS formulas discussed earlier using these test results.
- Tools are available:

33 Part I – Planning for Performance
- Server Performance

34 Server Performance
- Create baseline measurements for various load profiles and PLT1/PLT2
- Use performance counters:
  - ASP.NET Request Execution Time
  - ASP.NET Request Wait Time
  - Server Response Time (SRT) = the sum of the two.
- Essential performance counters:

35 SharePoint 2010 Performance Improvements
- More load on WFE, SQL & Client
- PLT performance improvements and optimization for WAN, early page rendering
- “Cobalt” protocol: asynchronous uploading of an Office file from client cache to server.
- Developer Dashboard: improves bottleneck diagnostics

37 Part II – Planning for Throughput
- Objectives
- Models
- Rules of Thumb
- Selecting Hardware
- SharePoint 2010 & Capacity Management

38 About Capacity Planning
- Objectives:
  - Know expected load levels for the application
  - Ensure acceptable performance at expected load levels
  - Determine how to scale the application for the future
- In the CNS model above, the focus is primarily on the Server part.
- The Networking part matters, however:
  - CDNs do reduce server load for Internet scenarios.
  - In geographically distributed farms WAN bandwidth and latency affect capacity planning.

39 Theoretical Web Server Model

40 Server Under Load: Theoretical Model
- M/M/1 queue for a single web server and M/M/c queue for load-balanced servers
- Poisson Distribution, memorylessness: knowledge of the last occurred event does not have an impact on successive events
- Little’s Law: N_queue = SRT * Rate_arrival
- Consequences:
  - Understanding of physical capacity limits
  - Approximate but practical server load function
  - Importance of RPS as a measure of capacity
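Little's Law as stated above links the arrival rate in RPS to the number of requests held by the server at any moment. A minimal sketch, with a hypothetical function name and made-up sample values:

```python
def requests_in_system(srt_s, arrival_rate_rps):
    """Little's Law: N = SRT * arrival rate."""
    return srt_s * arrival_rate_rps


# Hypothetical farm: 30 requests per second arriving, 0.5 s server response time.
print(requests_in_system(0.5, 30))  # average number of requests being handled at once
```

This is one reason RPS is a convenient capacity measure: together with the response time it fixes how many requests the server must be able to hold concurrently.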

41 Theoretical Server Response Time
- Server performance is analyzed together with the server load. From queuing analysis for an M/M/1 queue:
  SRT = SRT(0) / (1 – U)
  - SRT: server response time
  - SRT(0): server response time at 0 utilization
  - U: utilization, or average percentage of time the server is busy.
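The M/M/1 relation SRT = SRT(0)/(1 – U) can be tabulated to see how quickly response time degrades as utilization approaches 100%. The baseline of 0.2 s and the function name are assumptions chosen for illustration.

```python
def srt_at_utilization(srt0_s, utilization):
    """M/M/1 estimate: SRT = SRT(0) / (1 - U), valid for 0 <= U < 1."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return srt0_s / (1 - utilization)


SRT0 = 0.2  # assumed response time of an idle server, in seconds
for u in (0.0, 0.3, 0.5, 0.8, 0.9, 0.95):
    print(f"U = {u:.0%}: SRT = {srt_at_utilization(SRT0, u):.2f} s")
```

The curve is nearly flat at low utilization and climbs steeply past roughly 80%, which is consistent with rules of thumb that keep average CPU utilization low.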

42 SRT is a Function of Utilization

43 Load-Balanced Servers

44 SharePoint Farm Capacity Planning
- Theory explains guidance parameters & helps with rough estimates
- Rules-of-Thumb, best practices & reference performance tests are used to determine components of the farm
- Requests per Second (RPS) are used to measure farm capacity
- Additional tools: SPCP:

45 Throughput Targets: Classic Usage Model
1. All SharePoint site users can be classified into 4 groups:
   - Light users: generate 20 RPH or 2 User Ops/Hour
   - Typical users: generate 36 RPH or 3.6 User Ops/Hour
   - Heavy users: generate 60 RPH or 6 User Ops/Hour
   - Extreme users: generate 120 RPH or 12 User Ops/Hour
2. RPH are calculated based on daily average non-401 requests made by distinct users.
3. Given the total number of users in each class, set the percentage of them that is active, i.e. actively using the SharePoint site. This is also known as concurrency. Even at peak usage 10% is a high concurrency, 5% is typical.
4. The weighted sum yields total demand in RPS.
Reference:

46 Classic Usage Model - Example
There are a total of 30,000 users of the portal.
- 25,000 of them are typical users.
- 4,500 of them are heavy users.
- 500 of them are extreme users.
During the peak hour on average 10% of typical users and 5% of heavy and extreme users are accessing the site. What is the required farm capacity?
Capacity = (0.1 * 25,000 * 36 + 0.05 * 4,500 * 60 + 0.05 * 500 * 120)/3600 = 29.6 RPS
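The weighted sum in this example can be written as a short calculation so that other user mixes are easy to try. The RPH figures come from the classic usage model; the dictionary layout and the function name `required_rps` are assumptions for illustration.

```python
# Requests per hour generated by one active user of each class.
RPH = {"light": 20, "typical": 36, "heavy": 60, "extreme": 120}


def required_rps(users, concurrency):
    """Weighted sum of active users times their RPH, converted to requests per second."""
    total_rph = sum(users[group] * concurrency[group] * RPH[group] for group in users)
    return total_rph / 3600


users = {"typical": 25_000, "heavy": 4_500, "extreme": 500}
concurrency = {"typical": 0.10, "heavy": 0.05, "extreme": 0.05}
print(round(required_rps(users, concurrency), 1))  # 29.6
```

Typical users contribute 90,000 of the 106,500 requests per hour here, so the concurrency assumed for that group dominates the result.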

47 SharePoint Activities Affect Capacity
- A farm is serving a number of activities:
  - User operations (web page & file requests)
  - Search indexing
  - Publishing
  - Profile import/sync
  - Variations, workflows, scheduled jobs
  - Backup
  - Office client requests
  - AJAX calls
- User activity and number of concurrent users are the primary factors used in capacity planning.
- The picture is different when backend activities cannot be confined to a 12-hour window.
- Plan for Peak Concurrency!

48 Rules-of-Thumb: Web Front End
- Portal Collaboration Scenario
- WSS Collaboration Scenario
- technet.microsoft.com/en-us/library/cc aspx

49 Rules-of-Thumb: Web Front End
- HA requirements prevail over capacity requirements for small and medium installations.
- Max RPS is achieved at 5 WFEs per DB server. More WFEs overload the ConfigDB.
- 1 DC per 3-4 WFEs, if NTLM authentication is used.
- Set 1 WFE as the crawl target and remove it from the load balancer.
- Average WFE CPU utilization should be 30%.

50 Rules-of-Thumb: Storage Sizing
- Important for performance planning because storage estimates contribute to the IOPS requirement for the disk subsystem.
- 100 GB per content database
- Use reference installations, or Microsoft estimation guidance: ca/library/cc aspx

51 IOPS
- Two common measures of disk throughput:
  - IOPS: used for random access to disk, typical for SharePoint workloads.
  - MB/s: used for mostly sequential access, common when serving large files or running large reports on cubes.
- Use the performance counter Disk Transfers/sec to determine peak IOPS based on RPS.
- 10K RPM drives give IOPS; 15K RPM drives give IOPS.
- Use the sqlio.exe utility to determine the actual IOPS of the hardware.

52 Rules-of-Thumb: SQL Server
- Resources on SQL for SharePoint Planning:
- Resources on SQL Mirroring:

53 Rules-of-Thumb: SQL Server
- Disk Latency (Disk sec/transfer):
  - Data files < 10ms
  - T-log files < 5ms
- Disk Capacity:
  *RAID-5 can be used for static web content.

54 Rules-of-Thumb: SQL Server
Typical Deployment Sizes:

| Metric | Small | Medium | Large |
| Content db size | < 50GB | 50GB | > 50GB |
| # of content dbs | < 20 | 20 | > 20 |
| # of concurrent requests to SQL | < | | > 200 |
| # of users | < | | > 1000 |
| # of items in regularly accessed list | < | | > 2000 |
| # of columns in regularly accessed list | < 20 | 20 | > 20 |

55 Rules-of-Thumb: SQL Server
Recommended Capacities (Small / Medium / Large):
- Recommended DB server memory: 8 GB+ / 16 GB+ / 32 GB+
- Processor L2 cache: 2 MB / > 2 MB
- Bus bandwidth: Medium / High
- Disk latencies (msec): < 20 / < 10 / < 10 (data), < 5 (T-log)
- Network: Gigabit
- Network latency (msec): < 1

56 Global Solutions
- Central
- Central with regional sites
- Distributed
- us/library/cc aspx

57 Stretched SharePoint Farms
- LAN requirement: All servers within a farm must reside on the same LAN. Separating servers across WAN links is not supported. The network segment should be the same in order to avoid added latency in switches and routers.
- MOSS 2007 SSP requirement: All SSP server roles (index, query, Excel Services) need to reside in the same data center as the database server.
- Latency requirement: must be < 1ms. This is achievable if data centers are located typically within 10 miles, or in the best case within 100 miles of each other.
- Bandwidth requirement: at least 1Gbit/s

58 Capacity Planning Summary

59 SharePoint 2010 Capacity Improvements
- Large list throttling
- WFE will return 503 when overloaded
- Office clients are aware of this, and will in turn throttle server requests
  - Co-authoring of documents; PPT broadcasting.
- HTTP throttling
  - Blocks robots, search indexing
  - Gives first priority to client traffic
- Bit rate throttling: used by the assets library, implemented in the IIS Media Services extension
- SQL Server 2008 throttling: Resource Governor can limit use of resources by specific processes
- Software boundaries improvement

60 SharePoint 2010 Capacity Planning

61 SharePoint 2010 Capacity Management
- Logging DB
- Developer Dashboard
- Load Testing Toolkit (a part of SharePoint Administration Toolkit)
- There is more to come…

63 Performance Counters
- Performance counters are central in determining all aspects of performance. One example for capacity planning: ASP.NET Applications\Requests/Sec
- A comprehensive list of relevant counters is available here: performance-counters.aspx

64 Load Testing Tools
- SharePoint 2010: Load Testing Kit, part of SharePoint Administration Toolkit, with reference Web & Load tests.
- VSTT
- A useful blog post by Bill Baer lists tools used for stress testing of SharePoint: microsoft-office-sharepoint-server-2007-windows-sharepoint-services-3-0.aspx

65 Part III – Best Practices
- Information Architecture
- Web Front End (WFE) Servers
- SQL Server

66 Information Architecture: Best Practices
- Account for software boundaries:
- For large lists, follow performance guidance:
- Separate content with different usage profiles into different site collections
- Account for authentication performance impact, from fastest to slowest:
  - Anonymous (fastest)
  - Kerberos
  - NTLM
  - Basic
  - Forms (slowest)

67 WFE Best Practices: Caching
- Output caching & cache profiles
  - Native to ASP.NET 2.0, individual page level
  - Turned off by default in
  - Need Publishing Infrastructure Feature on.
  - Enable for read-only users.
  - Never cache search results for authenticated users, alternatively disable the search results page.
  - Uses RAM on WFE, adjust the ASP.NET private byte limit
- BLOB caching
  - Used on document libraries only
  - Minimizes round-trips to the database for HTML, CSS, image or media files, etc. by creating a disk-based cache on the WFE
  - Not enabled by default
  - Important to use the max-age attribute to instruct clients to cache resources
  - Affects disk I/O of the WFE servers
- Object caching
  - Benefit for certain page items: navigation data, cross-list query data
  - Uses RAM: default 100MB. Monitor cache hit ratio counters and adjust RAM to have over 90% hits.
  - The only caching turned on by default
- Office Web Applications Caching (SharePoint 2010)
- Branch Caching (Windows 2008)
- More Info:

68 WFE Best Practices: IIS Compression
- Static IIS compression is on by default in IIS 6, 7. Used for *.html, *.htm, *.css, *.txt files by default.
- Dynamic compression is off by default on both IIS 6 & 7. Used for *.asp, *.exe files by default.
- Using IIS compression increases load on the WFE CPU, but it reduces disk I/O, which is much slower, so it can dramatically boost performance.
- You need to configure compression levels, and add extensions for *.js, *.aspx etc.
- IIS 7 can be configured to compress items before adding them to cache. This needs to be turned on to reduce load on the CPU.

69 WFE Best Practices: Custom Code
- Releasing resources for SPSite, SPWeb
- Avoid thread synchronization issues when caching objects
- Accessing folders and lists
  - Do not use SPList.Items
  - Use SPList.GetItems(SPQuery)
  - Do not iterate over SPList.Items
  - Use PortalSiteMapProvider to enumerate lists
- Scalability: avoid code that enumerates OM objects for a large # of concurrent users
- SPQuery objects
  - Do not use unbounded SPQuery objects
  - Use indexed fields in queries
- Timer jobs
  - Break long-running operations into small pieces to minimize redo work when restarting a job.

70 WFE Best Practices: Other  Load scripts outside of script engine using document.write(
