
Windows NT Scalability Jim Gray Microsoft Research http://www.research.Microsoft.com/~Gray/talks/


1

2 Windows NT Scalability Jim Gray Microsoft Research Gray@Microsoft.com http://www.research.Microsoft.com/~Gray/talks/

3 Outline Scalability: What & Why? Scale UP: NT SMP scalability Scale OUT: NT Cluster scalability Key Message: –NT can do the most demanding apps today. –Tomorrow will be even better. [Diagram: Scale Up, Scale Out, Scale Down]

4 What is Scalability? Grow without limits –Capacity –Throughput Do not add complexity –design –administer –operate –use [Diagram: Scale Down to Win Term, NetPC, handheld, portable, TV; Scale Up from PC/Workstation through Server to SuperServer; Scale Out to Server Cluster]

5 Scale UP & OUT (focus here) Grow without limits –SMP: 4, 8, 16, 32 CPUs –64-bit addressing –Huge storage Cluster Requirements –Auto manage –High availability –Transparency to apps –Programming tools & apps [Diagram: Scale Up to SuperServer; Scale Out to Server Cluster]

6 Scalability is Important Automation benefits growing –ROI of 1 month.... Slice price going to zero –Cyberbrick costs 5k$ Design, Implement & Manage cost going down –DCOM & Viper make it easy! –NT Clusters are easy! Billions of clients imply millions of HUGE servers. Thin clients imply huge servers.

7 Q: Why Does Microsoft Care? A: Billions of clients need millions of servers Expect Microsoft to work hard on Scaleable Windows NT and Scaleable BackOffice. Key technique: INTEGRATION. [Chart: servers shipped per year, 1994-2001, for Unix, Windows NT Server, and NetWare; 97-01 are MS estimates]

8 Outline Scalability: What & Why? Scale UP: NT SMP scalability Scale OUT: NT Cluster scalability Key Message: –NT can do the most demanding apps today. –Tomorrow will be even better. [Diagram: Scale Up, Scale Out, Scale Down]

9 How Scaleable is NT?? The Single Node Story 64-bit file system in NT 1, 2, 3, 4, 5 8-node SMP in NT 4.E, 32-node OEM 64-bit addressing in NT 5 1 Terabyte SQL Databases (PetaByte capable) 10,000 users (TPC-C benchmark) 100 Million web hits per day (IIS) 50 GB Exchange mail store, next release designed for 16 TB 50,000 POP3 users on Exchange (1.8 M messages/day) And, more coming…..

10 Windows NT Server Enterprise Edition Scalability –8x SMP support (32x in OEM kit) –Larger process memory (3GB Intel) –Unlimited Virtual Roots in IIS (web) Transactions –DCOM transactions (Viper TP mon) –Message Queuing (Falcon) Availability –Clustering (WolfPack) –Web, File, Print, DB … servers fail over.

11 What Happened? Moore’s law: Things get 4x better every 3 years (applies to computers, storage, and networks) New Economics: commodity pricing
class          price/mips   software k$/mips/year
mainframe      10,000       100
minicomputer   100          10
microcomputer  10           1
GUI: Human-computer tradeoff: optimize for people, not computers [Graph: price vs. time for mainframe, mini, and micro]

12 Billions Of Clients Need Millions Of Servers All clients networked to servers May be nomadic or on-demand Fast clients want faster servers Servers provide Shared Data, Control, Coordination, Communication [Diagram: mobile and fixed clients connected to servers and superservers]

13 Thesis Many little beat few big Smoking, hairy golf ball How to connect the many little parts? How to program the many little parts? Fault tolerance? [Diagram: hardware generations from Mainframe ($1 million) to Mini ($100 K) to Micro ($10 K) to Nano, disc form factors from 14" down to 1.8"; a future chip with 1 M SPECmarks, 1 TFLOP, 10^6 clocks to bulk ram, event-horizon on chip, VM reincarnated, multiprogram cache, on-chip SMP; a storage hierarchy from 10 pico-second ram through 10 nano-second ram, 10 microsecond ram, 10 millisecond disc, to 10 second tape archive, with capacities from 1 MB to 100 TB]

14 Future Super Server: 4T Machine Array of 1,000 4B machines 1 Bips processors 1 BB DRAM 10 BB disks 1 Bbps comm lines 1 TB tape robot A few megabucks Challenge: Manageability Programmability Security Availability Scaleability Affordability As easy as a single system Future servers are CLUSTERS of processors, discs Distributed database techniques make clusters work [Diagram: a CyberBrick, a 4B machine: CPU, 5 GB RAM, 50 GB Disc]

15 The Hardware Is In Place… And then a miracle occurs ?  SNAP: scaleable network and platforms  Commodity-distributed OS built on:  Commodity platforms  Commodity network interconnect  Enables parallel applications

16 Thesis: Scaleable Servers Scaleable Servers –Commodity hardware allows new applications –New applications need huge servers –Clients and servers are built of the same “stuff” Commodity software and Commodity hardware Servers should be able to –Scale up (grow node by adding CPUs, disks, networks) –Scale out (grow by adding nodes) –Scale down (can start small) Key software technologies –Objects, Transactions, Clusters, Parallelism

17 Scaleable Servers BOTH SMP And Cluster Grow up with SMP; 4xP6 is now standard Grow out with cluster Cluster has inexpensive parts [Diagram: personal system, departmental server, SMP super server, cluster of PCs]

18 SMPs Have Advantages Single system image: easier to manage, easier to program; threads in shared memory, disk, Net 4x SMP is commodity Software capable of 16x Problems: –>4 not commodity –Scale-down problem (starter systems expensive) –There is a BIGGEST one [Diagram: personal system, departmental server, SMP super server]

19 TPC-C Web-Based Benchmarks Client is a Web browser (9,200 of them!) Submits –Order –Invoice –Query to server via Web page interface Web server translates to DB; SQL does DB work Net: –easy to implement –performance is GREAT! [Diagram: browser speaks HTTP to the IIS web server, which uses ODBC to reach SQL Server]
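Below is a minimal sketch (not from the talk) of the browser-to-SQL path this slide describes: the web server accepts the submitted form over HTTP, translates it to SQL over ODBC, and lets the database do the work. Python's http.server stands in for IIS and pyodbc for the ODBC layer; the DSN name "tpcc" and the "orders" table are assumed names for illustration.

```python
# Sketch: browser -> web server -> ODBC -> SQL Server, as described on the slide.
# http.server stands in for IIS; pyodbc is the ODBC layer.
# The DSN "tpcc" and the "orders" table are assumed names for illustration.
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs
import pyodbc

conn = pyodbc.connect("DSN=tpcc", autocommit=False)   # ODBC connection to the database

class OrderHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The browser submits an order as a query string; the web tier translates it to SQL.
        qs = parse_qs(urlparse(self.path).query)
        warehouse = int(qs.get("w", ["1"])[0])
        customer = int(qs.get("c", ["1"])[0])
        cur = conn.cursor()
        cur.execute("INSERT INTO orders (warehouse_id, customer_id) VALUES (?, ?)",
                    warehouse, customer)               # the database does the real work
        conn.commit()
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"order accepted")

if __name__ == "__main__":
    HTTPServer(("", 8080), OrderHandler).serve_forever()
```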

20 1987: 256 tps $14 million computer A dozen people Two rooms of machines 1997: 1,250 tps 50 k$ computer One person 1 micro-dollar per transaction (1,000x cheaper) What Happens in 10 Years? Ready for the next 10 years?
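A back-of-the-envelope check of the claims above (the 3-year amortization period is an assumption, not from the slide):

```python
# Back-of-the-envelope check of the "1,000x cheaper" and micro-dollar claims.
# The 3-year amortization period is an assumption, not from the slide.
cost_1987, tps_1987 = 14_000_000, 256     # $14 million system doing 256 tps
cost_1997, tps_1997 = 50_000, 1_250       # 50 k$ system doing 1,250 tps

per_tps_1987 = cost_1987 / tps_1987       # ~54,700 $ per unit of throughput
per_tps_1997 = cost_1997 / tps_1997       # ~40 $ per unit of throughput
print(per_tps_1987 / per_tps_1997)        # ~1,370x cheaper, consistent with "1,000x"

lifetime_txns = tps_1997 * 3 * 365 * 24 * 3600   # ~1.2e11 transactions over 3 years
print(cost_1997 / lifetime_txns)                 # ~4e-7 $, on the order of a micro-dollar each
```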

21 1988: DB2 + CICS Mainframe 65 tps IBM 4391 Simulated network of 800 clients 2m$ computer Staff of 6 to do benchmark 2 x 3725 network controllers 16 GB disk farm (4 x 8 x .5 GB) Refrigerator-sized CPU

22 NT vs UNIX SMPs NT traditionally ran on 1 to 4 CPUs –Scales near-linear on them UNIX boxes: 32- to 64-way SMPs –They do 3x more tpmC –They cost 10x more. 10-way NT machines are available –They cost more –They are faster My view (shared by many) –Need clusters for availability –Cluster commodity servers to make huge systems –a la Tandem, Teradata, VMScluster, IBM Sysplex, IBM SP2 –Clusters reduce need for giant SMPs

23 Transaction Throughput TPC-C On comparable hardware: NT scales better! SQL Server & NT Improving 250% per year NT has best Price Performance (2x cheaper)

24 NT Scales Better Than Solaris Microsoft SQL/NT/Intel scales to 6x Beats Sybase/Solaris/UltraSPARC up to 11-way [Chart: throughput vs. CPU count for Sybase/Solaris/UltraSPARC and MS SQL/NT/Intel]

25 New News: SUN is Waking Up Sybase on 4x Sun UltraSPARC –4x250MHz 57$/tpmC @ 11.6 ktpmC –6x300MHz 69$/tpmC @ 14.6 ktpmC Microsoft & Unisys –4x200MHz 43$/tpmC @ 10.7 ktpmC –6x200MHz 40$/tpmC @ 12.2 ktpmC SUN: –10% better performance –20% higher unit price

26 New News: WOW! HPUX-HPPA-Sybase Sybase on HP 16x SMP scales to 40 ktpmC! Price/Performance is flat (no diseconomy)

27 Low end More Competitive: the premium is on CPUs, disks, & Oracle

28 Only NT Has Economy of Scale NT is 2x less expensive 40$/tpmC vs 110$/tpmC Only NT has economy of scale Unix has dis-economy of scale

29 TPC-D Decision Support Benchmark NT has good performance and price/performance.

30 Scaleup To Big Databases? NT 4 and SQL Server 6.5 –DBs up to 1 Billion records, 100 GB –Covers most (80%) data warehouses SQL Server 7.0 –Designed for Terabytes: hundreds of disks per server, SMP parallel search –Data Mining and Multi-Media: TerraServer is a good MM example [Diagram: data sizes from an Excel spreadsheet to the Manhattan phone book (15 MB), Human Genome (3 GB), Dayton-Hudson sales records (300 GB), and satellite photos of Earth (1 TB)]

31 Database Scaleup: TerraServer™ Demo NT and SQL Server scalability Stress test SQL Server 7.0 Requirements –1 TB –Unencumbered (put on www) –Interesting to everyone everywhere –And not offensive to anyone anywhere Loaded –1.1 M place names from Encarta World Atlas –1 M Sq Km from USGS (1 meter resolution) –2 M Sq Km from Russian Space agency (2 m) Will be on web (world’s largest atlas) Sell images with commerce server. USGS CRDA: 3 TB more coming.

32 TerraServer System DEC Alpha 4100 (4x smp) + 324 StorageWorks Drives (1.4 TB) RAID 5 Protected SQL Server 7.0 USGS 1-meter data (30% of US) Russian Space data: two-meter resolution images (2 M km², 2% of earth) SPIN-2

33 http://msrlab/terraserver Demo

34 Manageability Windows NT 5.0 and Windows 98 Active Directory tracks all objects in net Integration with IE 4. –Web-centric user interface Management Console –Component architecture Zero Admin Kit and Systems Management Server PlugNPlay, Instant On, Remote Boot,.. Hydra and Intelli-Mirroring

35 Windows NT Server with “Hydra” Thin Client Support TSO comes to NT: lower per-client costs [Diagram: dedicated Windows terminal, existing desktop PC, MS-DOS/UNIX/Mac clients, and Net PC all served by the Hydra server]

36 Best of PC and centralized computing advantages Windows NT 5.0 IntelliMirror ™ Extends CMU Coda File System ideas Files and settings mirrored on client and server Great for disconnected users Facilitates roaming Easy to replace PCs Optimizes network performance

37 Outline Scalability: What & Why? Scale UP: NT SMP scalability Scale OUT: NT Cluster scalability Key Message: –NT can do the most demanding apps today. –Tomorrow will be even better. [Diagram: Scale Up, Scale Out, Scale Down]

38 Scale OUT Clusters Have Advantages Fault tolerance: –Spare modules mask failures without limits Modular growth without limits –Grow by adding small modules Parallel data search –Use multiple processors and disks Clients and servers made from the same stuff –Inexpensive: built with commodity CyberBricks

39 How scaleable is NT?? The Cluster Story 16-node Tandem Cluster –64 cpus –2 TB of disk –Decision support 45-node Compaq Cluster –140 cpus –14 GB DRAM –4 TB RAID disk –OLTP (Debit Credit) 1 B tpd (14 k tps)

40 microsoft.com Production –Windows NT 4 and IIS 3 –20 HTTP, 3 download, 3 FTP, 5 SQL 6.5 servers –Index Server + 3 search servers Stagers –Site Server for content –DCOM Publishing wizard Network –6 DS3 –4 TB/day download capacity Replicas in UK and Japan 90m hits/day –17m page views –#4 site on Internet –900k visitors per day Not cheap –Data Centers –Bandwidth –27 people on content –22 people on systems

41 Tandem 2 Ton 2 TB SQL database 1.2 TB user data 16 node cluster 64 cpus, 480 disks Decision support parallel data-mining Will be Wolf Pack aware Demoed at DB Expo in ServerNet™ interconnect

42 Billion Transactions per Day Project Built a 45-node Windows NT Cluster (with help from Intel & Compaq) > 900 disks All off-the-shelf parts Using SQL Server & DTC distributed transactions DCOM & ODBC clients on 20 front-end nodes DebitCredit Transaction Each server node has 1/20th of the DB Each server node does 1/20th of the work 15% of the transactions are “distributed”
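A small sketch of the partitioning idea behind these numbers: accounts are spread across 20 server nodes, and a transaction whose second account lives on another node (about 15% of them) must run as a distributed transaction. The modulo partitioning and the account range are assumptions for illustration, not the benchmark's actual scheme.

```python
# Sketch: 20 server nodes, each owning 1/20th of the accounts; ~15% of DebitCredit
# transactions touch a second account on another node and so become "distributed".
# Modulo partitioning and the account range are assumptions for illustration.
import random

N_NODES = 20

def node_for_account(account_id: int) -> int:
    """Each server node owns 1/20th of the database."""
    return account_id % N_NODES

def plan_transaction(src_account: int, accounts: range):
    src_node = node_for_account(src_account)
    if random.random() < 0.15:
        # Distributed case: the other account lives on a different node, so the
        # update must run under DTC as a distributed transaction.
        dst = random.choice([a for a in accounts if node_for_account(a) != src_node])
    else:
        dst = random.choice([a for a in accounts if node_for_account(a) == src_node])
    return src_node, node_for_account(dst)

print(plan_transaction(12345, range(100_000)))
```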

43 Billion Transactions Per Day Hardware 45 nodes (Compaq Proliant) Clustered with 100 Mbps Switched Ethernet 140 cpu, 13 GB, 3 TB (RAID 1, 5).

44 Cluster Architecture [Diagram: a switch connects the driver, database, DTC, and control nodes: DC2-DC21, DTC1-DTC5, and DC42-DC51]

45 DebitCredit Driver [Diagram: driver thread lifecycle: Init, Run, then a Loop in which each iteration calls the DebitCredit component, which runs the DebitCredit transaction against the local database]
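For concreteness, a sketch of the local DebitCredit transaction the driver loop issues. Table and column names follow the classic DebitCredit profile and are assumptions here, since the slide does not show the schema; pyodbc and the DSN "debitcredit" stand in for the ODBC client layer.

```python
# Sketch of the local DebitCredit transaction issued inside the driver's loop.
# Table and column names follow the classic DebitCredit profile and are assumed;
# pyodbc and the DSN "debitcredit" stand in for the ODBC client layer.
import pyodbc

def debit_credit(conn, account_id: int, teller_id: int, branch_id: int, delta: float):
    cur = conn.cursor()
    cur.execute("UPDATE account SET balance = balance + ? WHERE account_id = ?", delta, account_id)
    cur.execute("UPDATE teller  SET balance = balance + ? WHERE teller_id  = ?", delta, teller_id)
    cur.execute("UPDATE branch  SET balance = balance + ? WHERE branch_id  = ?", delta, branch_id)
    cur.execute("INSERT INTO history (account_id, teller_id, branch_id, amount) VALUES (?, ?, ?, ?)",
                account_id, teller_id, branch_id, delta)
    conn.commit()   # local case: an ordinary single-database transaction, no DTC needed

# conn = pyodbc.connect("DSN=debitcredit")    # assumed DSN
# debit_credit(conn, account_id=12345, teller_id=7, branch_id=3, delta=100.0)
```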

46 Distributed Debit Credit - Same DTC [Diagram: the DebitCredit component updates Database1, calls UpdateAcct on Database2 for the remote account, and both databases enlist in one transaction coordinated by a single DTC]

47 Distributed Debit Credit - Different DTC [Diagram: same flow, but Database1 enlists with DTC1 and Database2 with DTC2; the two DTCs carry out the two-phase commit between themselves]
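The essence of what the DTC(s) do in these two slides is two-phase commit. The toy sketch below illustrates the protocol only (collect prepare votes, then commit or abort everywhere); it is not the MSDTC API.

```python
# Toy sketch of the two-phase commit a DTC performs when a transaction spans
# two databases. This illustrates the protocol only; it is not the MSDTC API.
class Participant:
    def __init__(self, name):
        self.name = name
        self.pending = []

    def do_work(self, op):
        self.pending.append(op)           # tentative update, held until commit

    def prepare(self) -> bool:
        # Phase 1: force the updates and a "prepared" record to stable storage,
        # then vote. A real resource manager could vote False here.
        print(f"{self.name}: prepared {self.pending}")
        return True

    def commit(self):
        print(f"{self.name}: committed")  # Phase 2: make the updates durable and visible

    def abort(self):
        print(f"{self.name}: aborted")
        self.pending.clear()

def coordinator(parts):
    if all(p.prepare() for p in parts):   # phase 1: collect votes
        for p in parts:
            p.commit()                    # phase 2: everyone voted yes
    else:
        for p in parts:
            p.abort()                     # any "no" vote aborts the whole transaction

db1, db2 = Participant("Database1"), Participant("Database2")
db1.do_work("debit account 123 by $100")
db2.do_work("credit account 456 by $100")
coordinator([db1, db2])
```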

48 1.2 B tpd 1 B tpd ran for 24 hrs. Out-of-the-box software Off-the-shelf hardware AMAZING! Sized for 30 days Linear growth 5 micro-dollars per transaction

49 How Much Is 1 Billion Tpd? 1 billion tpd = 11,574 tps ~ 700,000 tpm (transactions/minute) AT&T –185 million calls per peak day (worldwide) Visa ~20 million tpd –400 million customers –250K ATMs worldwide –7 billion transactions (card+cheque) in 1994 New York Stock Exchange –600,000 tpd Bank of America –20 million tpd checks cleared (more than any other bank) –1.4 million tpd ATM transactions Worldwide Airlines Reservations: 250 Mtpd

50 1 B tpd: So What? Shows what is possible, easy to build –Grows without limits Shows scaleup of DTC, MTS, SQL… Shows (again) that shared-nothing clusters scale Next task: make it easy. –auto partition data –auto partition application –auto manage & operate

51 Parallelism The OTHER aspect of clusters Clusters of machines allow two kinds of parallelism –Many little jobs: online transaction processing TPC-A, B, C… –A few big jobs: data search and analysis TPC-D, DSS, OLAP Both give automatic parallelism

52 Kinds of Parallel Execution Pipeline: one sequential operator feeds the next Partition: outputs split N ways, inputs merge M ways [Diagram: any sequential program composed with any other sequential program, by pipeline and by partition]
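A minimal sketch of pipeline parallelism under these definitions: two ordinary sequential stages run concurrently, connected by a bounded queue, so the second stage consumes records while the first is still producing them. The stage names and data are illustrative assumptions.

```python
# Pipeline parallelism sketch: two sequential stages run at the same time,
# connected by a bounded queue, so stage 2 works on records while stage 1
# is still producing them. The stages and data are illustrative.
import threading, queue

q = queue.Queue(maxsize=100)     # bounded queue also gives flow control
DONE = object()

def scan_stage():                # "any sequential program": produce records
    for record in range(10):
        q.put(record)
    q.put(DONE)

def aggregate_stage():           # "any sequential program": consume records
    total = 0
    while (rec := q.get()) is not DONE:
        total += rec
    print("sum =", total)

t1 = threading.Thread(target=scan_stage)
t2 = threading.Thread(target=aggregate_stage)
t1.start(); t2.start()
t1.join(); t2.join()
```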

53 Data Rivers: Split + Merge Streams N producers, M consumers, N x M data streams Producers add records to the river, consumers consume records from the river Purely sequential programming. The river does flow control and buffering, and does partition and merge of data records River = Split/Merge in Gamma = Exchange operator in Volcano.
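A small sketch of the river idea: N producers route records by key into M consumer streams; producers and consumers are plain sequential code, while the river (queues plus a hash split) supplies the partitioning, merging, and flow control. N, M, and the record source are assumptions for illustration.

```python
# Data river sketch: N producers partition their output records by key across
# M consumer queues; the queues do buffering and flow control, the hash does
# the split, and each consumer merges the streams aimed at it.
import threading, queue

N_PRODUCERS, M_CONSUMERS = 3, 2
rivers = [queue.Queue(maxsize=100) for _ in range(M_CONSUMERS)]   # one stream per consumer

def producer(pid: int):
    for i in range(pid * 100, pid * 100 + 100):      # this producer's share of the records
        rivers[hash(i) % M_CONSUMERS].put(i)          # split: route record by key
    for r in rivers:
        r.put(None)                                   # one end-of-stream marker per producer

def consumer(cid: int):
    done, total = 0, 0
    while done < N_PRODUCERS:                         # merge: read until all producers finish
        rec = rivers[cid].get()
        if rec is None:
            done += 1
        else:
            total += rec
    print(f"consumer {cid}: sum of its partition = {total}")

threads = [threading.Thread(target=producer, args=(p,)) for p in range(N_PRODUCERS)] + \
          [threading.Thread(target=consumer, args=(c,)) for c in range(M_CONSUMERS)]
for t in threads: t.start()
for t in threads: t.join()
```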

54 Partitioned Execution Spreads computation and IO among processors Partitioned data gives NATURAL parallelism
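A sketch of partitioned execution with processes: the data is split into per-processor partitions, each worker scans only its own partition (spreading computation and IO), and the partial results are merged at the end. The partition count and the predicate are assumptions.

```python
# Partitioned execution sketch: split the data into per-processor partitions,
# run the same sequential scan on each partition in parallel, merge the results.
from multiprocessing import Pool

N_PARTITIONS = 4
data = list(range(1_000_000))
partitions = [data[i::N_PARTITIONS] for i in range(N_PARTITIONS)]   # partitioned data

def scan_partition(part):
    # The same sequential scan runs on every partition; parallelism comes from the data split.
    return sum(1 for x in part if x % 97 == 0)

if __name__ == "__main__":
    with Pool(N_PARTITIONS) as pool:
        partial_counts = pool.map(scan_partition, partitions)        # natural parallelism
    print(sum(partial_counts))                                       # merge the partial results
```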

55 N x M way Parallelism N inputs, M outputs, no bottlenecks. [Diagram: partitioned data feeding partitioned and pipelined data flows]

56 Clusters (Plumbing) Single system image –naming –protection/security –management/load balance Fault Tolerance –Wolfpack Hot Pluggable hardware & Software

57 Windows NT clusters Key goals: –Easy: to install, manage, program –Reliable: better than a single node –Scaleable: added parts add power Microsoft & 60 vendors defining NT clusters –Almost all big hardware and software vendors involved No special hardware needed - but it may help Enables –Commodity fault-tolerance –Commodity parallelism (data mining, virtual reality…) –Also great for workgroups! Initial: two-node failover –Beta testing since December 96 –SAP, Microsoft, Oracle giving demos. –File, print, Internet, mail, DB, other services –Easy to manage –Each node can be 4x (or more) SMP Next (NT5) “Wolfpack” is modest size cluster –About 16 nodes (so 64 to 128 CPUs) –No hard limit, algorithms designed to go further

58 SQL Failover Using NT Clusters Each server “owns” half the database When one fails… –The other server takes over the shared disks –Recovers the database and serves it [Diagram: clients connected to servers A and B; each server has private disks and both attach to shared SCSI disk strings]
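A conceptual sketch of this failover sequence (not the actual MSCS/"Wolfpack" interfaces): the surviving node notices the missing heartbeat, takes over the failed node's shared disks, recovers that half of the database from its log, and serves it. The timeout value is an assumption.

```python
# Conceptual failover sketch: node B watches node A's heartbeat; if A goes silent,
# B acquires A's shared disks, recovers A's half of the database, and serves it.
# This illustrates the idea only; it is not the MSCS/"Wolfpack" API.
import time

HEARTBEAT_TIMEOUT = 5.0   # seconds without a heartbeat before declaring the peer dead (assumed)

class Node:
    def __init__(self, name, owned_partitions):
        self.name = name
        self.partitions = set(owned_partitions)
        self.last_heartbeat_from_peer = time.time()

    def heartbeat_received(self):
        self.last_heartbeat_from_peer = time.time()

    def check_peer(self, peer_partitions):
        if time.time() - self.last_heartbeat_from_peer > HEARTBEAT_TIMEOUT:
            self.take_over(peer_partitions)

    def take_over(self, peer_partitions):
        for p in peer_partitions:
            print(f"{self.name}: acquiring shared disks for {p}")
            print(f"{self.name}: running log recovery on {p}")   # bring the DB to a clean state
            self.partitions.add(p)                               # start serving the partition
        print(f"{self.name}: now serving {sorted(self.partitions)}")

b = Node("B", ["db_half_2"])
b.last_heartbeat_from_peer -= 10          # simulate: no heartbeat from A for 10 seconds
b.check_peer(["db_half_1"])
```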

59 So, What’s New? When slices cost 50k$, you buy 10 or 20. When slices cost 5k$ you buy 100 or 200. Manageability, programmability, usability become key issues (total cost of ownership). PCs are MUCH easier to use and program [Diagram: the MPP vicious cycle (each new MPP needs a new OS and new apps, so no customers) vs. the commodity virtuous cycle (a standard platform accumulates apps and customers; standards allow progress and investment protection)]

60 Thesis: Scaleable Servers Scaleable Servers –Commodity hardware allows new applications –New applications need huge servers –Clients and servers are built of the same “stuff” Commodity software and Commodity hardware Servers should be able to –Scale up (grow node by adding CPUs, disks, networks) –Scale out (grow by adding nodes) –Scale down (can start small) Key software technologies –Objects, Transactions, Clusters, Parallelism

61 IIS & SQL Failover Demo [Diagram: a browser talks to a two-node WolfPack cluster, Alice and Betty; the web site and database, with their files, fail over from one node to the other]

62 Summary SMP Scale UP: OK but limited Cluster Scale OUT: OK and unlimited Manageability: –fault tolerance OK & easy! –more needed CyberBricks work Manual Federation now Automatic in future [Diagram: Scale Up, Scale Out, Scale Down]

63 Scalability Research Problems Automatic everything Scaleable applications –Parallel programming with clusters –Harvesting cluster resources Data and process placement –auto load balance –dealing with scale (thousands of nodes) High-performance DCOM –active messages meet ORBs? Process pairs, other FT concepts? Real time: instant failover Geographic (WAN) failover

