What Happens When Processing Storage Bandwidth are Free and Infinite?


1 What Happens When Processing Storage Bandwidth are Free and Infinite?
Jim Gray Microsoft Research

2 Outline
Hardware CyberBricks
  all nodes are very intelligent
Software CyberBricks
  standard way to interconnect intelligent nodes
What next?
  Processing migrates to where the power is
  Disk, network, display controllers have full-blown OS
  Send RPCs (SQL, Java, HTTP, DCOM, CORBA) to them
  Computer is a federated distributed system.

3 A Hypothetical Question Taking things to the limit
Moore’s law: 100x per decade
  Exa-instructions per second in 30 years
  Exa-bit memory chips
  Exa-byte disks
Gilder’s Law of the Telecosm: 3x/year more bandwidth
  ,000x per decade!
  40 Gbps per fiber today
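
The growth rates on this slide compound. A minimal sketch of the arithmetic follows; the helper name `growth_factor` is illustrative and not from the talk.

```python
def growth_factor(rate_per_period, periods):
    """Total improvement after compounding `rate_per_period` over `periods`."""
    return rate_per_period ** periods

# Moore's law as stated on the slide: 100x per decade, for three decades.
moore_30_years = growth_factor(100, 3)

# Gilder's law as stated on the slide: 3x per year, for ten years.
gilder_decade = growth_factor(3, 10)

print(f"Moore, 30 years:  {moore_30_years:,}x")
print(f"Gilder, 10 years: {gilder_decade:,}x")
```

A million-fold gain in 30 years is what separates tera from exa, and 3x per year compounds to roughly 59,000x in a decade.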

4 Grove’s Law Link Bandwidth doubles every 100 years!
Not much has happened to telephones lately Still twisted pair

5 Gilder’s Telecosm Law: 3x bandwidth/year for 25 more years
Today:
  10 Gbps per channel
  4 channels per fiber: 40 Gbps
  32 fibers/bundle = 1.2 Tbps/bundle
In lab: 3 Tbps/fiber (400 x WDM)
In theory: 25 Tbps per fiber
1 Tbps = USA 1996 WAN bisection bandwidth
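
The per-bundle figure follows from the per-channel numbers on the slide. A short check, with variable names chosen here for illustration:

```python
GBPS_PER_CHANNEL = 10     # from the slide
CHANNELS_PER_FIBER = 4    # from the slide
FIBERS_PER_BUNDLE = 32    # from the slide

gbps_per_fiber = GBPS_PER_CHANNEL * CHANNELS_PER_FIBER         # 40 Gbps
tbps_per_bundle = gbps_per_fiber * FIBERS_PER_BUNDLE / 1000    # 1.28 Tbps

# Lab result quoted on the slide: 3 Tbps per fiber, versus 40 Gbps deployed.
lab_vs_deployed = 3000 / gbps_per_fiber                         # 75x

print(gbps_per_fiber, tbps_per_bundle, lab_vs_deployed)
```

The product is 1.28 Tbps, which the slide quotes as 1.2 Tbps per bundle.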

6 Thesis: Many little beat few big
[Figure labels: price classes $1 million, $100 K, $10 K; machine classes Mainframe, Mini, Micro, Nano, Pico Processor; disc form factors 14", 9", 5.25", 3.5", 2.5", 1.8".]
Storage hierarchy labels: 10 pico-second ram, 10 nano-second ram, 10 microsecond ram, 10 millisecond disc, 10 second tape archive; capacities 1 MB, 10 MB, 10 GB, 1 TB, 100 TB.
Pico processor labels: 1 MM 3; 1 M SPEC marks, 1 TFLOP; 10^6 clocks to bulk ram; event-horizon on chip; VM reincarnated; multi-program cache, on-chip SMP.
Smoking, hairy golf ball
How to connect the many little parts?
How to program the many little parts?
Fault tolerance?
Gray OGI 12/11/97

7 The Year 2000 commodity PC
Billion Instructions/Sec
.1 Billion Bytes RAM
Billion Bits/s Net
10 B Bytes Disk
Billion Pixel display: 3000 x 3000 x 24
1,000 $
[Figure: Year B Machine, with 1 Bips Processor, .1 B byte RAM, 10 GB byte Disk, 1 B bits/sec LAN/WAN.]

8 4 B PC’s: The Bricks of Cyberspace
Cost 1,000 $
Come with
  OS (NT, POSIX,..)
  DBMS
  High speed Net
  System management
  GUI / OOUI
  Tools
Compatible with everyone else
CyberBricks

9 Super Server: 4T Machine
Array of 1,000 4B machines
  1 Bips processors
  1 BB DRAM
  10 BB disks
  1 Bbps comm lines
  1 TB tape robot
A few megabucks
Challenge: Manageability, Programmability, Security, Availability, Scaleability, Affordability
  As easy as a single system
Future servers are CLUSTERS of processors, discs
Distributed database techniques make clusters work
[Figure: a Cyber Brick (a 4B machine) with CPU, 5 GB RAM, 50 GB Disc.]
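
The "4T" name follows from multiplying the per-brick figures by 1,000. A rough sketch, assuming the per-node processor, disk, network, and price figures given for the commodity PC; the price total covers the bricks only, not the tape robot or interconnect.

```python
NODES = 1_000  # from the slide: array of 1,000 4B machines

# Per-node figures as listed for the commodity PC (assumed to apply per brick).
node = {"bips": 1, "disk_gb": 10, "net_gbps": 1, "price_usd": 1_000}

cluster = {name: value * NODES for name, value in node.items()}

# 1,000 Bips is a tera-instruction per second, 10,000 GB is 10 TB of disk,
# 1,000 Gbps is 1 Tbps of network, and the bricks alone cost 1,000,000 $.
print(cluster)
```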

10 Functionally Specialized Cards
Storage, Network, and Display cards each carry a P mips processor, M MB DRAM, and an ASIC.
Today: P = 50 mips, M = 2 MB
In a few years: P = 200 mips, M = 64 MB

11 It’s Already True of Printers Peripheral = CyberBrick
You buy a printer
You get
  several network interfaces
  a Postscript engine: cpu, memory, software, a spooler (soon)
  and… a print engine.

12 System On A Chip
Integrate Processing with memory on one chip
  chip is 75% memory now
  1 MB cache >> 1960 supercomputers
  256 Mb memory chip is 32 MB!
  IRAM, CRAM, PIM,… projects abound
Integrate Networking with processing on one chip
  system bus is a kind of network
  ATM, FiberChannel, Ethernet,.. logic on chip.
  Direct IO (no intermediate bus)
Functionally specialized cards shrink to a chip.

13 All Device Controllers will be Cray 1’s
TODAY
  Disk controller is 10 mips risc engine with 2 MB DRAM
  NIC is similar power
SOON
  Will become 100 mips systems with 100 MB DRAM.
  They are nodes in a federation (can run Oracle on NT in disk controller).
Advantages
  Uniform programming model
  Great tools
  Security
  Economics (cyberbricks)
  Move computation to data (minimize traffic)
[Figure: Central Processor & Memory on a Tera Byte Backplane.]

14 With Tera Byte Interconnect and Super Computer Adapters
Processing is incidental to
  Networking
  Storage
  UI
Disk Controller/NIC is
  faster than device
  close to device
  Can borrow device package & power
So use idle capacity for computation.
Run app in device.
[Figure: Tera Byte Backplane.]

15 Implications
Conventional
  Offload device handling to NIC/HBA
  higher level protocols: I2O, NASD, VIA…
  SMP and Cluster parallelism is important.
Radical
  Move app to NIC/device controller
  higher-higher level protocols: CORBA / DCOM.
  Cluster parallelism is VERY important.
[Figure: Central Processor & Memory on a Tera Byte Backplane.]

16 How Do They Talk to Each Other?
Each node has an OS
Each node has local resources: A federation.
Each node does not completely trust the others.
Nodes use RPC to talk to each other
  CORBA? DCOM? IIOP? RMI?
  One or all of the above.
Huge leverage in high-level interfaces.
Same old distributed system story.
[Figure: two protocol stacks joined by Wire(s): Applications; RPC, streams, datagrams, ?; VIAL/VIPL.]

17 Outline
Hardware CyberBricks
  all nodes are very intelligent
Software CyberBricks
  standard way to interconnect intelligent nodes
What next?
  Processing migrates to where the power is
  Disk, network, display controllers have full-blown OS
  Send RPCs (SQL, Java, HTTP, DCOM, CORBA) to them
  Computer is a federated distributed system.

18 Objects!
It’s a zoo
ORBs, COM, CORBA,..
Object Relational Databases
Objects and 3-tier computing

19 History and Alphabet Soup
[Timeline figure, 1985 / 1990 / 1995: UNIX International, Open Software Foundation (OSF), X/Open, Object Management Group (OMG), Open Group; Solaris, NT; OSF DCE; CORBA; ODBC; XA / TX; COM.]
DCE: RPC, GUIDs, IDL, Kerberos, DNS
Microsoft DCOM based on OSF-DCE Technology
DCOM and ActiveX extend it

20 The Promise
Objects are Software CyberBricks
  productivity breakthrough (plug ins)
  manageability breakthrough (modules)
Microsoft promises Cairo: distributed objects, secure, transparent, fast invocation
IBM/Sun/Oracle/Netscape promise CORBA + Open Doc + Java Beans +
All will deliver
Customers can pick the best one
Both camps share key goals:
  Encapsulation: hide implementation
  Polymorphism: generic ops, key to GUI and reuse
  Uniform Naming
  Discovery: finding a service
  Fault handling: transactions
  Versioning: allow upgrades
  Transparency: local/remote
  Security: who has authority
  Shrink-wrap: minimal inheritance
  Automation: easy

21 The OLE-COM Experience
Macintosh had Publish & Subscribe
PowerPoint needed graphs: plugged MS Graph in as a component.
Office adopted OLE: one graph program for all of Office
Internet arrived: URLs are object references, Office is Web Enabled right away!
Office97 smaller than Office95 because of shared components
It works!!

22 Linking And Embedding
Objects are data modules; transactions are execution modules
Link: pointer to object somewhere else
  Think URL in Internet
Embed: bytes are here
Objects may be active; can callback to subscribers

23 Objects Meet Databases
basis for universal data servers, access, & integration
Object-oriented (COM oriented) interface to data
Breaks DBMS into components
Anything can be a data source
Optimization/navigation “on top of” other data sources
Makes an RDBMS an O-R DBMS, assuming optimizer understands objects
[Figure: DBMS engine over data sources: Database, Spreadsheet, Photos, Mail, Map, Document.]

24 The BIG Picture: Components and transactions
Software modules are objects
Object Request Broker (a.k.a., Transaction Processing Monitor) connects objects (clients to servers)
Standard interfaces allow software plug-ins
Transaction ties execution of a “job” into an atomic unit: all-or-nothing, durable, isolated
ActiveX Components are a 250M$/year business.
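
The all-or-nothing property can be seen with any transactional store. A minimal sketch using Python's built-in `sqlite3` module; the `account` table, the names, and the `transfer` helper are hypothetical and not from the talk.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('alice', 100)")
conn.execute("INSERT INTO account VALUES ('bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one atomic unit."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 60)       # succeeds
failed = transfer(conn, "alice", "bob", 60)   # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

In the second call the credit to bob is applied first and then undone when the debit violates the constraint, so no half-finished "job" is ever visible.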

25 Object Request Broker (ORB)
Orchestrates RPC
Registers Servers
Manages pools of servers
Connects clients to servers
Does Naming, request-level authorization
Provides transaction coordination
Direct and queued invocation
Old names: Transaction Processing Monitor, Web server, NetWare
[Figure label: Transaction, Object-Request Broker.]
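
A toy, in-process sketch of the broker duties listed on this slide: registration, server pools, naming, and request-level authorization. The `Broker` class and all of its method names are hypothetical; a real ORB or TP monitor also handles transport, queuing, and transaction coordination.

```python
from collections import defaultdict
from itertools import cycle

class Broker:
    """Toy request broker: registers servers by name and dispatches calls."""

    def __init__(self):
        self._pools = defaultdict(list)  # service name -> pool of callables
        self._next = {}                  # service name -> round-robin iterator
        self._acl = defaultdict(set)     # service name -> authorized clients

    def register(self, service, server):
        """Add a server to the pool for `service`."""
        self._pools[service].append(server)
        self._next[service] = cycle(self._pools[service])

    def allow(self, client, service):
        """Authorize `client` to call `service`."""
        self._acl[service].add(client)

    def invoke(self, client, service, *args):
        """Look up the service by name, check authority, pick a pooled server."""
        if service not in self._pools:
            raise LookupError(f"no server registered for {service}")
        if client not in self._acl[service]:
            raise PermissionError(f"{client} may not call {service}")
        server = next(self._next[service])
        return server(*args)

orb = Broker()
orb.register("price", lambda item: ("server-A", item))
orb.register("price", lambda item: ("server-B", item))
orb.allow("web-client", "price")

replies = [orb.invoke("web-client", "price", "widget") for _ in range(3)]
print(replies)
```

The client names only the service; which pooled server answers is the broker's decision, which is what lets the pool grow without changing clients.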

26 The OO Points So Far
Objects are software Cyber Bricks
Object interconnect standards are emerging
Cyber Bricks become Federated Systems.
Next points:
  put processing close to data
  do parallel processing.

27 Three Tier Computing
Clients do presentation, gather input
Clients do some workflow (Xscript)
Clients send high-level requests to ORB
ORB dispatches work-flows and business objects -- proxies for client, orchestrate flows & queues
Server-side workflow scripts call on distributed business objects to execute task
[Figure: tiers Presentation, workflow, Business Objects, Database.]

28 The Three Tiers
[Figure labels, client to server:]
Web Client: HTML; VB or Java Script Engine (VBScript, JavaScript); VB or Java Virt Machine; plug-ins
Internet: HTTP + DCOM
Middleware: ORB, TP Monitor, Web Server...; Object server Pool
DCOM (oleDB, ODBC,...)
Object & Data server.
LU6.2, IBM, Legacy, Gateways

29 Transaction Processing Evolution to Three Tier
Intelligence migrated to clients
Mainframe Batch processing (centralized)
Dumb terminals & Remote Job Entry
Intelligent terminals, database backends
Workflow Systems, Object Request Brokers, Application Generators
[Figure labels: cards, green screen 3270, Mainframe, Server, TP Monitor, ORB, Active.]

30 Web Evolution to Three Tier
Intelligence migrated to clients (like TP)
Character-mode clients, smart servers
GUI Browsers - Web file servers
GUI Plugins - Web dispatchers - CGI
Smart clients - Web dispatcher (ORB), pools of app servers (ISAPI, Viper), workflow scripts at client & server
[Figure labels: WAIS, archie, gopher, green screen, Mosaic, NS & IE, Active, Server.]

31 PC Evolution to Three Tier
Intelligence migrated to server
Stand-alone PC (centralized)
PC + File & print server: message per I/O
PC + Database server: message per SQL statement
PC + App server: message per transaction
ActiveX Client, ORB, ActiveX server, Xscript
[Figure labels: IO request/reply, disk I/O, SQL Statement, Transaction.]

32 Why Did Everyone Go To Three-Tier?
Manageability
  Business rules must be with data
  Middleware operations tools
Performance (scaleability)
  Server resources are precious
  ORB dispatches requests to server pools
Technology & Physics
  Put UI processing near user
  Put shared data processing near shared data
  Minimizes data moves
  Encapsulate / modularity
[Figure: tiers Presentation, workflow, Business Objects, Database.]

33 Why Put Business Objects at Server?
MOM’s Business Objects
  Customer comes to store with list
  Gives list to clerk
  Clerk gets goods, makes invoice
  Customer pays clerk, gets goods
  Easy to manage
  Clerk controls access
  Encapsulation
DAD’s Raw Data
  Customer comes to store
  Takes what he wants
  Fills out invoice
  Leaves money for goods
  Easy to build
  No clerks

34 The OO Points So Far
Objects are software Cyber Bricks
Object interconnect standards are emerging
Cyber Bricks become Federated Systems.
Put processing close to data
Next point:
  do parallel processing.

35 Parallelism: the OTHER half of Super-Servers
Clusters of machines allow two kinds of parallelism
  Many little jobs: Online transaction processing (TPC A, B, C,…)
  A few big jobs: data search & analysis (TPC D, DSS, OLAP)
Both give automatic Parallelism

36 Why Parallel Access To Data?
At 10 MB/s: 1.2 days to scan
1,000 x parallel: 100 second SCAN.
BANDWIDTH
Parallelism: divide a big problem into many smaller ones to be solved in parallel.
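
The scan figures can be reproduced with a few lines. This sketch assumes the data set is 1 TB, which is what 10 MB/s for about 1.2 days works out to; the function name is illustrative.

```python
SECONDS_PER_DAY = 86_400

def scan_seconds(data_bytes, bytes_per_second, ways=1):
    """Time to scan data split evenly across `ways` devices read in parallel."""
    return data_bytes / (bytes_per_second * ways)

ONE_TB = 1e12   # assumed data size, consistent with the slide's 1.2-day figure
RATE = 10e6     # 10 MB/s, from the slide

serial_days = scan_seconds(ONE_TB, RATE) / SECONDS_PER_DAY
parallel_seconds = scan_seconds(ONE_TB, RATE, ways=1_000)

print(f"serial: {serial_days:.2f} days, 1,000-way: {parallel_seconds:.0f} s")
```

The model assumes perfectly even partitioning and no coordination cost, so it gives the best case rather than a prediction.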

37 Kinds of Parallel Execution
Pipeline: one Any Sequential Program feeds the next Any Sequential Program
Partition: outputs split N ways, inputs merge M ways; each partition runs the same Any Sequential Program
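
A small sketch of the two forms, using ordinary sequential functions as the "any sequential program" pieces. All names are illustrative. Generators show the pipeline structure and a thread pool shows the partition structure; in CPython neither gives real CPU parallelism, so this illustrates the dataflow shape only.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge

def scan(rows):
    """First pipeline stage: an ordinary sequential program."""
    for row in rows:
        yield row

def select_even(rows):
    """Second pipeline stage: consumes the first stage's output as it arrives."""
    for row in rows:
        if row % 2 == 0:
            yield row

def split(rows, n):
    """Partition: split one stream N ways (round robin here)."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def sort_partition(part):
    """The same sequential program, run once per partition."""
    return sorted(part)

data = [9, 4, 7, 2, 8, 1, 6, 3, 0, 5]

# Pipeline: the output of one stage streams into the next.
pipelined = list(select_even(scan(data)))

# Partition: split 3 ways, sort each part independently, merge the results.
with ThreadPoolExecutor(max_workers=3) as pool:
    sorted_parts = list(pool.map(sort_partition, split(data, 3)))
merged = list(merge(*sorted_parts))

print(pipelined, merged)
```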

38 Why are Relational Operators Successful for Parallelism?
Relational data model: uniform operators on uniform data stream
Closed under composition
Each operator consumes 1 or 2 input streams
Each stream is a uniform collection of data
Sequential data in and out: Pure dataflow
Partitioning some operators (e.g. aggregates, non-equi-join, sort,..) requires innovation
AUTOMATIC PARALLELISM

39 Database Systems “Hide” Parallelism
Automate system management via tools
  data placement
  data organization (indexing)
  periodic tasks (dump / recover / reorganize)
Automatic fault tolerance
  duplex & failover
  transactions
Automatic parallelism
  among transactions (locking)
  within a transaction (parallel execution)

40 SQL a Non-Procedural Programming Language
SQL: functional programming language, describes answer set.
Optimizer picks best execution plan
  Picks data flow web (pipeline),
  degree of parallelism (partitioning),
  other execution parameters (process placement, memory,...)
[Figure labels: GUI, Schema, Optimizer, Plan, Planning, Execution, Monitor, Executors, Rivers.]

41 Automatic Data Partitioning
Split a SQL table to subset of nodes & disks
Partition within set:
  Range: Good for equijoins, range queries, group-by
  Hash: Good for equijoins
  Round Robin: Good to spread load
Shared disk and memory less sensitive to partitioning.
Shared nothing benefits from "good" partitioning.
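
The three partitioning schemes can each be written as a function from a row to a node number. A minimal sketch with illustrative names; the boundary values and the choice of CRC-32 as the hash are assumptions made here, not taken from the talk.

```python
import bisect
from zlib import crc32

def range_partition(key, boundaries):
    """Node index for `key`, given sorted split points (one fewer than nodes)."""
    return bisect.bisect_right(boundaries, key)

def hash_partition(key, nodes):
    """Node index from a stable hash, so equal keys always land together."""
    return crc32(str(key).encode()) % nodes

def round_robin_partition(row_number, nodes):
    """Node index by arrival order; spreads load but ignores the key."""
    return row_number % nodes

keys = [3, 17, 42, 58, 99]
by_range = [range_partition(k, [25, 50, 75]) for k in keys]
by_round_robin = [round_robin_partition(i, 4) for i in range(len(keys))]
by_hash = [hash_partition(k, 4) for k in keys]

print(by_range, by_round_robin, by_hash)
```

Range placement keeps neighboring keys on the same node, which is why it helps range queries; hash placement only guarantees that equal keys meet, which is enough for equijoins.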

42 N x M way Parallelism N inputs, M outputs, no bottlenecks.

43 Parallel Objects?
How does all this DB parallelism connect to hardware/software Cyber Bricks?
To scale to large client sets
  need lots of independent parallel execution.
  Comes for free from ORB.
To scale to large data sets
  need intra-program parallelism (like parallel DBs)
  Requires some invention.

44 Outline
Hardware CyberBricks
  all nodes are very intelligent
Software CyberBricks
  standard way to interconnect intelligent nodes
What next?
  Processing migrates to where the power is
  Disk, network, display controllers have full-blown OS
  Send RPCs (SQL, Java, HTTP, DCOM, CORBA) to them
  Computer is a federated distributed system.
  Parallel execution is important

45 MORE SLIDES but there is only so much time.
Too bad

46 The Disk Farm On a Card
The 100GB disc card: an array of discs
Can be used as
  100 discs
  1 striped disc
  10 Fault Tolerant discs
  ....etc
LOTS of accesses/second, bandwidth
Life is cheap, it’s the accessories that cost ya.
Processors are cheap, it’s the peripherals that cost ya (a 10k$ disc card).
[Figure label: 14".]

47 Parallelism: Performance is the Goal
Goal is to get 'good' performance. Trade time for money.
Law 1: parallel system should be faster than serial system
Law 2: parallel system should give near-linear scaleup or near-linear speedup or both.
Parallel DBMSs obey these laws
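
Speedup and scaleup are both elapsed-time ratios. A sketch of the usual definitions follows; the timings are made-up illustrative numbers, not measurements from the talk, and the 20% tolerance for "near-linear" is an arbitrary choice made here.

```python
def speedup(small_system_seconds, big_system_seconds):
    """Same problem, bigger system. Linear speedup on N x hardware is N."""
    return small_system_seconds / big_system_seconds

def scaleup(small_job_small_system_seconds, big_job_big_system_seconds):
    """N x problem on N x hardware. Linear scaleup is 1.0."""
    return small_job_small_system_seconds / big_job_big_system_seconds

def near_linear(measured, ideal, tolerance=0.2):
    """True when `measured` is within `tolerance` of the ideal ratio."""
    return measured >= ideal * (1 - tolerance)

# Hypothetical timings: 1 node versus 10 nodes.
s = speedup(1000.0, 110.0)      # same job, 10x hardware
c = scaleup(1000.0, 1050.0)     # 10x job, 10x hardware

print(f"speedup {s:.2f} (ideal 10), scaleup {c:.2f} (ideal 1)")
print(near_linear(s, 10), near_linear(c, 1.0))
```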

48 Success Stories
Online Transaction Processing
  many little jobs
  SQL systems support 50 k tpm-C (44 cpu, 600 disk, 2 node)
Batch (decision support and Utility)
  few big jobs, parallelism inside
  Scan data at 100 MB/s
  Linear Scaleup to 1,000 processors
[Figure axes: transactions / sec vs hardware; recs / sec vs hardware.]

49 The New Law of Computing
Grosch's Law: 2x $ is 4x performance
  1 MIPS: 1 $
  1,000 MIPS: 32 $ (.03 $/MIPS)
Parallel Law: 2x $ is 2x performance
  1 MIPS: 1 $
  1,000 MIPS: 1,000 $
  Needs Linear Speedup and Linear Scaleup
  Not always possible
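
The two price curves on this slide follow from the two rules. A short sketch with illustrative function names: under Grosch's law performance grows as the square of cost, so cost grows as the square root of performance; under the parallel law cost grows linearly.

```python
import math

def grosch_cost(mips, base_mips=1.0, base_cost=1.0):
    """Grosch's law: 2x $ buys 4x performance, so cost ~ sqrt(performance)."""
    return base_cost * math.sqrt(mips / base_mips)

def parallel_cost(mips, base_mips=1.0, base_cost=1.0):
    """Parallel law with linear scaling: 2x $ buys 2x performance."""
    return base_cost * mips / base_mips

grosch = grosch_cost(1_000)       # about 31.6 $, the slide's 32 $
parallel = parallel_cost(1_000)   # 1,000 $

print(f"Grosch:   {grosch:.1f} $ ({grosch / 1_000:.2f} $/MIPS)")
print(f"Parallel: {parallel:.0f} $ ({parallel / 1_000:.2f} $/MIPS)")
```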

50 Clusters being built
Teradata: 1,000 nodes (30k$/slice)
Tandem, VMScluster: 150 nodes (100k$/slice)
Intel: 9,000 55M$ (6k$/slice)
Teradata, Tandem, DEC moving to NT + low slice price
IBM: 512 nodes ASCI @ 100m$ (200k$/slice)
PC clusters (bare handed) at dozens of nodes: web servers (msn, PointCast,…), DB servers
KEY TECHNOLOGY HERE IS THE APPS.
  Apps distribute data
  Apps distribute execution

51 Great Debate: Shared What? SMP or Cluster?
Shared Memory (SMP): Easy to program, Difficult to build, Difficult to scaleup (Sequent, SGI, Sun)
Shared Disk (VMScluster, Sysplex)
Shared Nothing (network): Hard to program, Easy to build, Easy to scaleup (Tandem, Teradata, SP2)
Winner will be a synthesis of these ideas
Distributed shared memory (DASH, Encore) blurs distinction between Network and Bus (locality still important)
But gives Shared memory message cost.

52 BOTH SMP and Cluster?
Grow Up with SMP: 4xP6 is now standard
Grow Out with Cluster: Cluster has inexpensive parts
[Figure: Cluster of PCs.]

53 Clusters Have Advantages
Clients and Servers made from the same stuff.
Inexpensive: Built with commodity components
Fault tolerance: Spare modules mask failures
Modular growth: grow by adding small modules

54 Meta-Message: Technology Ratios Are Important
If everything gets faster & cheaper at the same rate THEN nothing really changes.
Things getting MUCH BETTER:
  communication speed & cost 1,000x
  processor speed & cost 100x
  storage size & cost 100x
Things staying about the same
  speed of light (more or less constant)
  people (10x more expensive)
  storage speed (only 10x better)

55 Storage Ratios Changed
10x better access time
10x more bandwidth
4,000x lower media price
DRAM/DISK 100:1 to 10:10 to 50:1

56 Today’s Storage Hierarchy: Speed & Capacity vs Cost Tradeoffs
[Figure: two plots against Access Time (seconds), 10^-9 to 10^3. Size vs Speed: Typical System (bytes). Price vs Speed: $/MB. Levels: Cache, Main, Secondary (Disc), Online Tape, Nearline Tape, Offline Tape.]

57 Network Speeds
Speed of light did not change
Link bandwidth grew 60% / year
WAN speeds limited by politics
  if voice is X$/minute, how much is video?
Gbps to desktop today!
10 Gbps channel is coming.
3 Tbps fibers in laboratory thru parallelism (WDM).
Paradox: WAN link has 40 Gbps, Processor bus is Gbps
[Figure: Comm Speedups, 1960 to 2000, 1e3 to 1e9: Processors (i/s), LANs & WANs (b/s).]

58 MicroProcessor Speeds Went Up
Clock rates went from 10 KHz to 400 MHz
Processors now 6x issue
SPECInt fits in Cache, it tracks cpu speed
Peak Advertised Performance (PAP) is 1.2 BIPS
Real Application Performance (RAP) is 100 MIPS
Similar curves for
  DEC VAX & Alpha
  HP/PA
  IBM R6000/ PowerPC
  MIPS & SGI
  SUN
[Figure: Intel MicroProcessor Speeds (mips), 1980 to 2000, 0.1 to 1000: 8088, 286, 386, 486, Pentium, P6. Source: Intel.]

59 Performance = Storage Accesses not Instructions Executed
In the “old days” we counted instructions and IO’s
Now we count memory references
Processors wait most of the time
70 MIPS
“real” apps have worse Icache misses, so run at 60 MIPS if well tuned, 20 MIPS if not
[Figure: Where the time goes: clock ticks used by AlphaSort Components: Sort, Disc Wait, OS, Memory Wait, D-Cache Miss, I-Cache Miss, B-Cache Data Miss.]

60 Storage Latency: How Far Away is the Data?

61 Tape Farms for Tertiary Storage Not Mainframe Silos
One robot: 10K$, 14 tapes, 500 GB, 5 MB/s, 20$/GB, 30 Maps, 27 hr Scan
100 robots: 1M$, 50TB, 50$/GB, 3K Maps
Scan in 27 hours.
many independent tape robots (like a disc farm)
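
The per-robot numbers can be checked directly, and they show why a farm of independent robots scans in the same time as one. A sketch using only figures from the slide; variable names are illustrative.

```python
ROBOT_GB = 500          # capacity of one robot, from the slide
ROBOT_MB_PER_S = 5      # transfer rate of one robot, from the slide
ROBOT_COST_USD = 10_000
ROBOTS = 100

scan_hours = ROBOT_GB * 1_000 / ROBOT_MB_PER_S / 3_600   # about 27.8 hours
dollars_per_gb = ROBOT_COST_USD / ROBOT_GB               # 20 $/GB

# Each robot scans only its own tapes, so capacity grows with the farm
# while the time to scan everything stays at one robot's scan time.
farm_tb = ROBOTS * ROBOT_GB / 1_000
farm_scan_hours = scan_hours

print(f"{scan_hours:.1f} h per robot, {dollars_per_gb:.0f} $/GB, "
      f"{farm_tb:.0f} TB farm scanned in {farm_scan_hours:.1f} h")
```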

62 The Metrics: Disk and Tape Farms Win
Data Motel: Data checks in, but it never checks out
[Figure: log-scale chart, 0.01 to 1,000,000, of GB/K$, Kaps, Maps, and SCANS/Day. Series: 1000x Disc Farm; STC Tape Robot; 100x DLT Tape Farm; 6,000 tapes, 8 readers.]

63 Tape & Optical: Beware of the Media Myth
Optical is cheap: 200 $/platter, 2 GB/platter => 100 $/GB (2x cheaper than disc)
Tape is cheap: 30 $/tape, 20 GB/tape => 1.5 $/GB (100x cheaper than disc).

64 Tape & Optical Reality: Media is 10% of System Cost
Tape needs a robot (10 k$ m$ ) tapes (at 20GB each) => 20$/GB $/GB (1x…10x cheaper than disc)
Optical needs a robot (100 k$)
  100 platters = 200GB (TODAY) => 400 $/GB (more expensive than mag disc)
Robots have poor access times
Not good for Library of Congress (25TB)
Data motel: data checks in but it never checks out!

65 The Access Time Myth
The Myth: seek or pick time dominates
The reality:
  (1) Queuing dominates
  (2) Transfer dominates BLOBs
  (3) Disk seeks often short
Implication: many cheap servers better than one fast expensive server
  shorter queues
  parallel transfer
  lower cost/access and cost/byte
This is now obvious for disk arrays
This will be obvious for tape arrays

66 Billions Of Clients
Every device will be “intelligent”
Doors, rooms, cars…
Computing will be ubiquitous

67 Billions Of Clients Need Millions Of Servers
All clients networked to servers
  May be nomadic or on-demand
Fast clients want faster servers
Servers provide
  Shared Data
  Control
  Coordination
  Communication
[Figure: Clients (mobile clients, fixed clients) and Servers (server, super server).]

68 1987: 256 tps Benchmark
14 M$ computer (Tandem)
A dozen people: Manager, Admin expert, Hardware experts, Network expert, Performance expert, OS expert, DB expert, Auditor
False floor, 2 rooms of machines
A 32 node processor array
A 40 GB disk array (80 drives)
Simulate 25,600 clients

69 1988: DB2 + CICS Mainframe 65 tps
IBM 4391
Simulated network of 800 clients
2m$ computer
Staff of 6 to do benchmark
2 x 3725 network controllers
Refrigerator-sized CPU
16 GB disk farm: 4 x 8 x .5GB

70 1997: 10 years later
1 Person and 1 box = 1250 tps
1 Breadbox ~ 5x 1987 machine room
23 GB is hand-held
One person does all the work: Hardware expert, OS expert, Net expert, DB expert, App expert
Cost/tps is 1,000x less: 25 micro dollars per transaction
4 x 200 Mhz cpu, 1/2 GB DRAM, 12 x 4GB disk
3 x 7 x 4GB disk arrays

71 What Happened?
Moore’s law: Things get 4x better every 3 years (applies to computers, storage, and networks)
New Economics: Commodity
  [Table columns: class, price ($/mips), software (k$/year); rows: mainframe, minicomputer, microcomputer.]
GUI: Human - computer tradeoff
  optimize for people, not computers
[Figure: price vs time for mainframe, mini, micro.]

72 What Happens Next?
Last 10 years: 1000x improvement
Next 10 years: ????
Today: text and image servers are free
  25 m$/hit => advertising pays for them
Future: video, audio, … servers are free
“You ain’t seen nothing yet!”
[Figure: performance vs year: 1985, 1995, 2005.]

73 Smart Cards
Then (1979): Bull CP8 two chip card, first public demonstration 1979
Now (1997): EMV card with dynamic authentication (EMV = Europay, MasterCard, Visa standard)
  door key, vending machines, photocopiers
Courtesy of Dennis Roberson NCR.

74 Smart Card Memory Capacity
16 KB today but growing super-exponentially
Applications: cards will be able to store
  data (e.g. medical)
  books, movies,…
  money
[Figure: Smart Card Memory Capacity, Memory Size (Bits) with ticks 3 K, 10 K, 1 M, 300 M, for 1990 to 2004, marked "You are here". Source: PIN/Card-Tech. Courtesy of Dennis Roberson NCR.]
One of the factors limiting smart card deployment is the limited amount of data that can be stored on the card. Smart cards today, with 3 to 10 kilobytes of storage, have advantages over magnetic stripe cards but are limited in their ability to carry massive amounts of application data. As memory costs and chip miniaturization continue to improve, smart cards will move to a few hundred megabytes and so be able to store enough data for practical applications. The past ten years of smart card evolution have taught us that no single application is strong enough to drive market acceptance. Multifunction cards with massive memory capabilities can and will change that.
