8 The MAD Future Terror Bytes
- In the beginning there was the Paramagnetic Limit: 10 Gbpsi
- Limit keeps growing (now ~200 Gbpsi)
- Mark H. Kryder, Seagate (Future Magnetic Recording Technologies, FAST) apologizes: “Only 100x density improvement, then we are out of ideas”
- That’s 20 TB desktop, TB laptop!
9 Outline
- History
- Changing Ratios
  - Disk to RAM
  - DASD is Dead
  - Disk space is free
  - Disk Archive-Interchange
  - Network faster than disk
  - Capacity, Access
  - TCO == people cost
  - Smart disks happened
  - The entry cost barrier
- Who Needs a Petabyte?
10 Storage Ratios Changed
- 10x better access time
- 10x more bandwidth
- 100x more capacity
- Data 25x cooler (1 Kaps/20 MB vs 1 Kaps/500 MB)
- 4,000x lower media price
- 20x to 100x lower disk price
- Scan takes 10x longer (3 min vs 45 min)
- RAM/disk media price ratio changed: 1:1:1 today ~ $/GB disk 200: $/GB DRAM
11 Price_Ram_TB(t+10) = Price_Disk_TB(t) Disk Data Can Move to RAM in 10 years
- Disk ~100x cheaper than RAM per byte
- Both get 100x bigger in 10 years
- Move data to main memory
- Seems: RAM/Disk bandwidth ~100:1
- (Figure labels: 100:1, 10 years)
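The rule in the slide title can be checked with a little arithmetic. The sketch below is illustrative only: it reads "100x bigger in 10 years" as a 100x drop in price per byte for both media, and the 1,000 $/TB starting disk price is a hypothetical round number chosen for the example.

```python
# Illustrative check of Price_RAM_TB(t+10) = Price_Disk_TB(t).
# Assumptions (the slide's rules of thumb, not measured data):
#   - disk is ~100x cheaper than RAM per byte
#   - both get 100x cheaper per byte every 10 years
RAM_TO_DISK_PRICE_RATIO = 100.0
IMPROVEMENT_PER_DECADE = 100.0


def price_after(price_now: float, years: float) -> float:
    """Price per TB after `years`, assuming a steady 100x drop per decade."""
    return price_now / IMPROVEMENT_PER_DECADE ** (years / 10.0)


disk_price_now = 1_000.0  # $/TB, hypothetical starting point
ram_price_now = disk_price_now * RAM_TO_DISK_PRICE_RATIO

ram_price_in_10_years = price_after(ram_price_now, 10)
annual_factor = IMPROVEMENT_PER_DECADE ** (1 / 10)  # yearly price improvement

print(f"RAM today:       {ram_price_now:,.0f} $/TB")
print(f"RAM in 10 years: {ram_price_in_10_years:,.0f} $/TB")
print(f"Disk today:      {disk_price_now:,.0f} $/TB")
print(f"Implied yearly improvement: {annual_factor:.2f}x")
```

Under these assumptions the 100x price gap is exactly one decade of improvement, which is why today's disk-resident data becomes affordable in RAM ten years later.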
12 DASD (direct access storage device) is Dead
- Accesses got cheaper
- Better disks
- Cheaper disks!
- Disk access/bandwidth: the scarce resource
- 2003: 100 minute Scan : 5 minute Scan
- Sequential bandwidth 50x faster than random
- Random Scan 3 days
- Ratio will get 10x worse in 10 years: 100x more capacity, 10x more bandwidth
- Invent ways to trade capacity for bandwidth
- Use the capacity without using bandwidth
- (Figure labels: 300 GB, 50 MB/s)
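The scan times on this slide follow from the disk in the figure (300 GB, 50 MB/s). A minimal sketch of that arithmetic, assuming decimal units and taking the 50x sequential-to-random ratio and the 100x/10x ten-year projection directly from the slide:

```python
# Scan-time arithmetic for the 2003 disk on the slide: 300 GB at 50 MB/s.
CAPACITY_GB = 300
SEQUENTIAL_MB_PER_S = 50
SEQ_TO_RANDOM_SPEEDUP = 50  # slide: sequential bandwidth 50x faster than random


def scan_seconds(capacity_gb: float, mb_per_s: float) -> float:
    """Seconds to read the whole disk once (decimal units: 1 GB = 1000 MB)."""
    return capacity_gb * 1000 / mb_per_s


sequential_s = scan_seconds(CAPACITY_GB, SEQUENTIAL_MB_PER_S)
random_s = sequential_s * SEQ_TO_RANDOM_SPEEDUP

sequential_minutes = sequential_s / 60
random_days = random_s / 86_400

# Slide's projection: 100x more capacity, 10x more bandwidth in 10 years.
future_sequential_minutes = (
    scan_seconds(CAPACITY_GB * 100, SEQUENTIAL_MB_PER_S * 10) / 60
)

print(f"sequential scan:  {sequential_minutes:.0f} minutes")
print(f"random scan:      {random_days:.1f} days")
print(f"scan in 10 years: {future_sequential_minutes:.0f} minutes")
```

The last line shows the "10x worse" ratio: capacity grows 100x while bandwidth grows only 10x, so a full scan takes ten times longer.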
13 Disk Space is “free” Bandwidth & Accesses/sec are not
- 1k$/TB going to 100$/TB
- 20 TB disks on the (distant) horizon
- 100x density
- Waste capacity intelligently
  - Version everything
  - Never delete anything
  - Keep many copies
    - Snapshots
    - Mirrors (triple and geoplex)
    - Cooperative caching (Farsite and OceanStore)
    - Disk Archive
14 Disk as Archive-Interchange
- Tape is archive / interchange / low cost
- Disk now competitive in all 3 categories
- What format? FAT? CDFS?..
- What tools?
- Need the software to do disk-based backup/restore
- Commonly snapshot (multi-version FS)
- Radical: peer-to-peer file archiving
- Many researchers looking at this: OceanStore, Farsite, others…
15 Disk vs Network Now the Network is Faster (!)
- Old days:
  - 10 MBps disk, low CPU cost (0.1 ins/b)
  - 1 MBps net, huge CPU cost (10 ins/b)
- New days:
  - 50 MBps disk, low CPU cost
  - 100 MBps net, low CPU cost (TOE, RDMA)
- Consequence:
  - You can remote disks
  - Allows consolidation
  - Aggregate (bisection) bandwidth still a problem
16 Storage TCO == people time
- 1980 rules-of-thumb:
  - 1 systems programmer per mips
  - 1 data admin per 10 GB
- 800 sys programmers + 4 data admins for your laptop
- Sometimes it must seem like that but…
- Today one data admin per 1 TB TB, depending on process and data value
- Automate everything
- Use redundancy to mask (and repair) problems
- Save people, spend hardware
17 Disk Evolution: Smart Disks
- System on a chip
- High-speed LAN
- Disk is super computer!
- (Scale figure: Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta)
18 Smart Disks Happened
- Disk appliances are here: Cameras, Games, PVRs, FileServers
- Challenge: entry price
19 The Entry Cost Barrier Connect the Dots
- Consumer electronics want low entry cost
- 1970: 20,000$
- 1980: 2,000$
- 2000: $$
- If magnetics can’t do this, another technology will
- Think: copiers, hydraulic shovels,…
- (Figure: ln(price) vs. Time; labels: Wanted, Today)
20 Outline
- History
- Changing Ratios
- Who Needs a Petabyte?
  - Petabyte for 1k$ in years
  - Affordable but useless
  - How much information is there?
  - The Memex vision
  - MyLifeBits
  - The other 20% (enterprise storage)
- (Scale figure: Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta; “We are here”)
21 A Bleak Future: The ½ Platter Society?
- Conclusion from Information Storage Industry Consortium HDD Applications Roadmap Workshop: “Most users need only 20GB”
- We are heading to a ½ platter industry
- 80% of units and capacity is personal disks (not enterprise servers)
- The end of disk capacity demand
- A zero billion dollar industry?
22 Try to fill a terabyte in a year
Item                          Items/TB   Items/day
300 KB JPEG                   3 M        9,800
1 MB Doc                      1 M        2,900
1 hour 256 kb/s MP3 audio     9 K        26
1 hour 1.5 Mb/s MPEG video    290        0.8
- Petabyte volume has to be some form of video
23 Growth Comes From NEW Apps
- The 10M$ computer of 1980 costs 1k$ today
- If we were still doing the same things, IT would be a 0 B$/y industry
- NEW things absorb the new capacity
- 2010 Portable?
  - 100 Gips processor
  - 1 GB RAM
  - 1 TB disk
  - 1 Gbps network
  - Many form factors
24 The Terror Bytes are Here
- 1 TB costs 1k$ to buy
- 1 TB costs 300k$/y to own
- Management & curation are expensive (I manage about 15 TB in my spare time; no, I am not paid 4.5M$/y to manage it)
- Searching 1 TB takes minutes or hours or days or..
- I am Petrified by Peta Bytes
- But… people can “afford” them so, we have lots to do – Automate!
- (Scale figure: Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta; “We are here”)
25 How much information is there?
- Soon everything can be recorded and indexed
- Most bytes will never be seen by humans
- Data summarization, trend detection, anomaly detection are key technologies
- See Mike Lesk: How much information is there
- See Lyman & Varian: How much information
- (Scale figure: Kilo, Mega, Giga, Tera, Peta, Exa, Zetta, Yotta; labels: Everything! Recorded; All Books MultiMedia; All books (words); .Movie; A Photo; A Book)
- 24 yocto, 21 zepto, 18 atto, 15 femto, 12 pico, 9 nano, 6 micro, 3 milli
26 Memex As We May Think, Vannevar Bush, 1945
- “A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility”
- “yet if the user inserted 5000 pages of material a day it would take him hundreds of years to fill the repository, so that he can be profligate and enter material freely”
27 Why Put Everything in Cyberspace?
- Low rent: min $/byte
- Shrinks time: now or later
- Shrinks space: here or there
- Automate processing: knowbots
- (Figure labels: Point-to-Point OR Broadcast; Immediate OR Time Delayed; Locate, Process, Analyze, Summarize)
28 How Will We Find Anything?
- Need Queries, Indexing, Pivoting, Scalability, Backup, Replication, Online update, Set-oriented access
- If you don’t use a DBMS, you will implement one!
- Simple logical structure:
  - Blob and link is all that is inherent
  - Additional properties (facets == extra tables) and methods on those tables (encapsulation)
- More than a file system
- Unifies data and meta-data
- (Figure label: SQL ++ DBMS)
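The "blob and link" structure with facets as extra tables can be sketched in a few lines of SQL. This is one possible reading of the slide, not the actual MyLifeBits schema: the table and column names (`blob`, `link`, `photo`, `taken`, `caption`) and the sample rows are hypothetical, and SQLite is used only because it ships with Python.

```python
import sqlite3

# Hypothetical schema: only "blob" and "link" are inherent; "photo" is one
# example facet (an extra table keyed by blob id). All names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE blob  (id INTEGER PRIMARY KEY, content BLOB NOT NULL);
    CREATE TABLE link  (src INTEGER REFERENCES blob(id),
                        dst INTEGER REFERENCES blob(id));
    CREATE TABLE photo (blob_id INTEGER PRIMARY KEY REFERENCES blob(id),
                        taken TEXT, caption TEXT);
""")

conn.executemany("INSERT INTO blob (id, content) VALUES (?, ?)",
                 [(1, b"trip report text"), (2, b"jpeg bytes"), (3, b"jpeg bytes 2")])
conn.executemany("INSERT INTO link (src, dst) VALUES (?, ?)", [(1, 2), (1, 3)])
conn.executemany("INSERT INTO photo VALUES (?, ?, ?)",
                 [(2, "2003-06-01", "harbor"), (3, "2003-06-02", "bridge")])


def photos_linked_from(doc_id: int) -> list[tuple[str, str]]:
    """Set-oriented query: captions and dates of photos a document links to."""
    rows = conn.execute(
        """SELECT p.caption, p.taken
           FROM link AS l JOIN photo AS p ON p.blob_id = l.dst
           WHERE l.src = ?
           ORDER BY p.taken""",
        (doc_id,),
    )
    return rows.fetchall()


print(photos_linked_from(1))
```

Adding a new kind of metadata means adding another facet table; the blob and link tables stay unchanged, and queries, indexing, and backup come from the DBMS rather than being reimplemented.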
29 MyLifeBits The guinea pig
- Gordon Bell is digitizing his life
- Has now scanned virtually all:
  - Books written (and read when possible)
  - Personal documents (correspondence, memos, bills, legal, …)
  - Photos
  - Posters, paintings, photo of things (artifacts, …medals, plaques)
  - Home movies and videos
  - CD collection
  - And, of course, all PC files
- Now recording: phone, radio, TV (movies), web pages… conversations
- Paperless throughout ” scanned, 12’ discarded.
- Only 30 GB!!! Excluding digital videos
- Video is 2+ TB and growing fast
32 gbell wag: 67 yr, 25 Kday life; a Personal Petabyte (1 PB)
33 80% of data is personal / individual. But, what about the other 20%?
- Business
  - Wal-Mart online: 1 PB and growing….
  - Paradox: most “transaction” systems < 1 PB
  - Have to go to image/data monitoring for big data
- Government
  - Government is the biggest business
- Science
  - LOTS of data
34 Information Avalanche
- Both better observational instruments and better simulations are producing a data avalanche
- Examples
  - Turbulence: 100 TB simulation, then mine the information
  - BaBar: grows 1 TB/day; 2/3 simulation information, 1/3 observational information
  - CERN: LHC will generate 1 GB/s, 10 PB/y
  - VLBA (NRAO) generates 1 GB/s today
  - NCBI: “only ½ TB” but doubling each year, very rich dataset
  - Pixar: 100 TB/movie
- Image courtesy of C. Meneveau & A. Szalay @ JHU
35 Q: Where will the Data Come From? A: Sensor Applications
- Earth Observation: 15 PB by 2007
- Medical Images & Information + Health Monitoring: potential 1 GB/patient/y => 1 EB/y
- Video Monitoring: ~1E8 video, 1E5 MBps => 10 TB/s => 100 EB/y => filtered???
- Airplane Engines: 1 GB sensor data/flight, 100,000 engine hours/day => 30 PB/y
- Smart Dust: ?? EB/y
36 Instruments: CERN – LHC Peta Bytes per Year
- Looking for the Higgs Particle
- Sensors: GB/s (1 TB/s ~ 30 EB/y)
- Events GB/s
- Filtered GB/s
- Reduced GB/s ~ 2 PB/y
- Data pyramid: 100 GB : 1 TB : 100 TB : 1 PB : 10 PB
- (Figure label: CERN Tier 0)
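The "1 TB/s ~ 30 EB/y" conversion is a rate multiplied by the number of seconds in a year. A minimal sketch, assuming decimal units and an instrument that runs continuously all year (an instrument that runs only part of the year yields proportionally less):

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000

# Decimal byte units, as used for storage capacities.
UNIT = {"GB": 1e9, "TB": 1e12, "PB": 1e15, "EB": 1e18}


def yearly_volume(rate: float, rate_unit: str, out_unit: str) -> float:
    """Convert a sustained rate in <rate_unit>/s into <out_unit> per year."""
    bytes_per_year = rate * UNIT[rate_unit] * SECONDS_PER_YEAR
    return bytes_per_year / UNIT[out_unit]


sensor_eb_per_year = yearly_volume(1, "TB", "EB")  # the slide's 1 TB/s example
one_gb_s_pb_per_year = yearly_volume(1, "GB", "PB")

print(f"1 TB/s ~ {sensor_eb_per_year:.1f} EB/y")
print(f"1 GB/s ~ {one_gb_s_pb_per_year:.1f} PB/y")
```

A handy rule of thumb falls out of this: a year is about 3 x 10^7 seconds, so a sustained rate in X/s becomes roughly 30 million X per year.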
37 LHC Requirements (2005- )
- 1E9 events pa @ 1 MB/ev = 1 PB/year/expt
- Reconstructed = 100 TB/recon/year/expt
- Send to Tier1 Regional Centres => 400 TB/year to RAL?
- Keep one set + derivatives on disk… and rest on tape
- But UK plans a Tier1 clone
- Many data clones
- Source: John Gordon, IT Department, CLRC/RAL, CUF Meeting, October 2000
38 Science Data Volume
- ESO/STECF Science Archive
- 100 TB archive
- Similar at Hubble, Keck, SDSS,…
- ~1 PB aggregate
39 Data Pipeline: NASA
- Level 0: raw data (data stream)
- Level 1: calibrated data (measured values)
- Level 1A: calibrated & normalized (flux/magnitude/…)
- Level 2: derived data metrics (vegetation index)
- Data volume: 0 ~ 1 ~ 1A << 2
- Level 2 >> level 1 because
  - MANY data products
  - Must keep all published data Editions (versions)
- 4 editions of 4 Level 2 products, each is small, but…
- (Figure labels: E1, E2, E3, E4; time; Level 1A)
- EOSDIS Core System Information for Scientists
40 DataGrid Computing
- Store exabytes twice (for redundancy)
- Access them from anywhere
- Implies huge archive/data centers
- Supercomputer centers become super data centers
- Examples: Google, Yahoo!, Hotmail, BaBar, CERN, Fermilab, SDSC, …
41 Outline
- History
- Changing Ratios
- Who Needs a Petabyte?
- Thesis: in 20 years, Personal Petabyte will be affordable
- Most personal bytes will be video
- Enterprise Exabytes will be sensor data
43 TerraServer V4
- 8 web front end
- 4x8cpu+4GB DB
- 18 TB triplicate disks
- Classic SAN (tape not shown)
- ~2M$
- Works GREAT!
- 2000…2004
- Now replaced by..
- (Figure labels: WEB x8, SAN, SQL x4)
44 TerraServer V5
- Storage Bricks
  - “White-box commodity servers”
  - 4 TB raw / 2 TB RAID1 SATA storage
  - Dual Hyper-threaded Xeon 2.4 GHz, 4 GB RAM
- Partitioned Databases (PACS – partitioned array)
  - 3 Storage Bricks = 1 TerraServer data
  - Data partitioned across 20 databases
  - More data & partitions coming
- Low Cost Availability
  - 4 copies of the data
    - RAID1 SATA Mirroring
    - 2 redundant “Bunches”
  - Spare brick to repair failed brick, 2N+1 design
  - Web Application “bunch aware”
    - Load balances between redundant databases
    - Fails over to surviving database on failure
- ~100K$ capital expense
- (Figure labels: Storage Bricks, KVM / IP)
45 How Do You Move A Terabyte?
Time/TB      $/TB Sent   $/Mbps   Rent $/month   Speed Mbps   Context
6 years      3,086       1,000    40             0.04         Home phone
5 months     360         117      70             0.6          Home DSL
2 months     2,469       800      1,200          1.5          T1
2 days       2,010       651      28,000         43           T3
14 hours     976         316      49,000         155          OC3
14 minutes   617         200      1,920,000      9600         OC 192
1 day                                            100          100 Mbps
2.2 hours                                        1000         Gbps
Source: TeraScale Sneakernet, Microsoft Research, Jim Gray et al.
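The Time/TB column follows from the link speed alone. A minimal sketch that recomputes it, assuming 1 TB = 8e12 bits and a link that sustains its nominal speed with no protocol overhead; the helper names are made up for this example:

```python
def seconds_to_move_one_tb(speed_mbps: float) -> float:
    """Seconds to send 1 TB (8e12 bits) over a link of `speed_mbps` megabits/s."""
    bits_per_tb = 8e12
    return bits_per_tb / (speed_mbps * 1e6)


def humanize(seconds: float) -> str:
    """Render a duration in the largest unit that gives a value >= 1."""
    for unit, size in (("years", 365 * 86_400), ("days", 86_400),
                       ("hours", 3_600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.0f} seconds"


# Link speeds (Mbps) copied from the table's "Speed Mbps" column.
links = {"Home phone": 0.04, "Home DSL": 0.6, "T1": 1.5, "T3": 43,
         "OC3": 155, "OC 192": 9600, "100 Mbps": 100, "Gbps": 1000}

for name, mbps in links.items():
    print(f"{name:>10}: {humanize(seconds_to_move_one_tb(mbps))}")
```

The $/Mbps column is the monthly rent divided by the speed, and $/TB Sent is the rent paid over the time the transfer takes, so both can be derived the same way from the two rightmost numeric columns.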
46 Key Observations for Personal Store And for Larger Stores
- Schematized storage can help organization and search
- Schematized XML data sets are a universal way to exchange data: answers and new data
- If data are objects, then need standard representation for classes & methods
47 Longhorn - For Knowledge Workers
- Simple (Self-*): auto install/manage/tune/repair
- Schema: data carries semantics
- Search: find things fast (driven by schema)
- Sync: “desktop state” anywhere
- Security: (Palladium) -- trustworthy
  - privacy
  - trustworthy (virus, spam,..)
  - DRM (protect IP)
- Shell: task-based UI (aka activity-based UI)
- Office-Longhorn
  - Intelligent documents
  - XML and Schemas
48 How Do We Represent It To The Outside World? Schematized Storage
<?xml version="1.0" encoding="utf-8" ?>
<DataSet xmlns="http://WWT.sdss.org/">
  <xs:schema id="radec" xmlns="" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:msdata="urn:schemas-microsoft-com:xml-msdata">
    <xs:element name="radec" msdata:IsDataSet="true">
      <xs:element name="Table">
        <xs:element name="ra" type="xs:double" minOccurs="0" />
        <xs:element name="dec" type="xs:double" minOccurs="0" />
  …
  <diffgr:diffgram xmlns:msdata="urn:schemas-microsoft-com:xml-msdata" xmlns:diffgr="urn:schemas-microsoft-com:xml-diffgram-v1">
    <radec xmlns="">
      <Table diffgr:id="Table1" msdata:rowOrder="0"> <ra> </ra> <dec> </dec> </Table>
      <Table diffgr:id="Table10" msdata:rowOrder="9"> <ra> </ra> <dec> </dec> </Table>
    </radec>
  </diffgr:diffgram>
</DataSet>
- File metaphor too primitive: just a blob
- Table metaphor too primitive: just records
- Need Metadata describing data context
  - Format
  - Provenance (author/publisher/citations/…)
  - Rights
  - History
  - Related documents
- In a standard format
- XML and XML schema
- DataSet is great example of this
- World is now defining standard schemas
- (Figure labels: schema; Data or diffgram)
49 There Is A Problem
- Niklaus Wirth: Algorithms + Data Structures = Programs
- GREAT!!!!
  - XML documents are portable objects
  - XML documents are complex objects
  - WSDL defines the methods on objects (the class)
- But will all the implementations match?
  - Think of UNIX or SQL or C or…
- This is a work in progress.
50 Disk Storage Cheaper Than Paper
- File Cabinet:
  - Cabinet (4 drawer): 250$
  - Paper (24,000 sheets): 250$
  - Space (10€/ft2): 180$
  - Total: 700$
  - $/sheet: pennies per page
- Disk:
  - Disk (250 GB =): 250$
  - ASCII: 100 m pages, 2e-6 $/sheet (10,000x cheaper): micro-dollar per page
  - Image: m photos, 3e-4 $/photo (100x cheaper): milli-dollar per photo
- Store everything on disk
- Note: Disk is 100x to 1000x cheaper than RAM
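The "10,000x" and "100x" ratios can be rederived from the totals on the slide. A minimal sketch, assuming "100 m pages" means 100 million pages; the per-sheet paper cost is computed here because the figure itself did not survive on the slide:

```python
# Figures taken from the slide; the per-sheet cost is derived, not quoted.
cabinet_total_usd = 700.0  # cabinet + paper + floor space
sheets = 24_000
disk_usd = 250.0           # one 250 GB disk
ascii_pages = 100e6        # "100 m pages" read as 100 million
usd_per_photo = 3e-4       # as stated on the slide

paper_usd_per_sheet = cabinet_total_usd / sheets
disk_usd_per_page = disk_usd / ascii_pages

ascii_ratio = paper_usd_per_sheet / disk_usd_per_page
image_ratio = paper_usd_per_sheet / usd_per_photo

print(f"paper:        {paper_usd_per_sheet:.3f} $/sheet")
print(f"disk (ASCII): {disk_usd_per_page:.1e} $/page, {ascii_ratio:,.0f}x cheaper")
print(f"disk (image): {image_ratio:,.0f}x cheaper per photo")
```

The results land close to the slide's rounded figures: about three cents per paper sheet, a few micro-dollars per ASCII page, and roughly two orders of magnitude in favor of disk for images.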
51 Data Analysis
- Looking for
  - Needles in haystacks – the Higgs particle
  - Haystacks: Dark matter, Dark energy
- Needles are easier than haystacks
- Global statistics have poor scaling
  - Correlation functions are N^2, likelihood techniques N^3
- As data and computers grow at same rate, we can only keep up with N log N
- A way out?
  - Discard notion of optimal (data is fuzzy, answers are approximate)
  - Don’t assume infinite computational resources or memory
- Requires combination of statistics & computer science
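The scaling argument can be made concrete: if the data set and the available compute both grow by the same factor, wall-clock time stays roughly flat only for algorithms that are close to linear. A minimal sketch, where the catalog size of 1e9 items and the 100x growth factor are hypothetical values chosen for illustration:

```python
import math


def relative_runtime(growth: float, n0: float, cost) -> float:
    """Runtime after data AND compute both grow by `growth`, relative to today.

    `cost(n)` is the operation count for n items; faster hardware divides
    the wall-clock time by `growth`.
    """
    return cost(n0 * growth) / (cost(n0) * growth)


costs = {
    "N log N": lambda n: n * math.log2(n),
    "N^2": lambda n: n ** 2,
    "N^3": lambda n: n ** 3,
}

n0 = 1e9        # hypothetical catalog size today
growth = 100.0  # data and computers both grow 100x

for name, cost in costs.items():
    print(f"{name:>8}: {relative_runtime(growth, n0, cost):,.2f}x slower")
```

With these inputs the N log N analysis slows down by only about 20 percent, while the N^2 and N^3 analyses slow down by factors of 100 and 10,000, which is the sense in which only N log N "keeps up".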
52 Analysis and Databases
- Much statistical analysis deals with
  - Creating uniform samples – data filtering
  - Assembling relevant subsets
  - Estimating completeness
  - Censoring bad data
  - Counting and building histograms
  - Generating Monte-Carlo subsets
  - Likelihood calculations
  - Hypothesis testing
- Traditionally these are performed on files
- Most of these tasks are much better done inside DB
- Bring Mohamed to the mountain, not the mountain to him
53 Data Access is hitting a wall FTP and GREP are not adequate
- You can GREP 1 MB in a second
- You can GREP 1 GB in a minute
- You can GREP 1 TB in 2 days
- You can GREP 1 PB in 3 years
- Oh!, and 1 PB ~5,000 disks
- At some point you need indices to limit search, parallel data search and analysis
- This is where databases can help
- You can FTP 1 MB in 1 sec
- You can FTP 1 GB / min (= 1 $/GB)
- … days and 1K$
- … 3 years and 1M$
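The GREP figures are rules of thumb rather than a single fixed scan rate. The sketch below works out the sustained rate each claim implies, assuming decimal units and a 365-day year:

```python
# (label, bytes, seconds) as stated on the slide, using decimal units.
DAY = 86_400
YEAR = 365 * DAY
grep_claims = [
    ("1 MB in a second", 1e6, 1),
    ("1 GB in a minute", 1e9, 60),
    ("1 TB in 2 days", 1e12, 2 * DAY),
    ("1 PB in 3 years", 1e15, 3 * YEAR),
]


def implied_mb_per_s(n_bytes: float, seconds: float) -> float:
    """Sustained scan rate, in MB/s, implied by scanning n_bytes in `seconds`."""
    return n_bytes / seconds / 1e6


for label, n_bytes, seconds in grep_claims:
    print(f"{label:>17}: {implied_mb_per_s(n_bytes, seconds):5.1f} MB/s")
```

All four claims imply a single stream of somewhere between 1 and 17 MB/s, so the time to scan grows linearly with the data. That is why the slide turns to indices and parallel search once the data reaches terabytes.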
54 Smart Data (active databases)
- If there is too much data to move around, take the analysis to the data!
- Do all data manipulations at database
  - Build custom procedures and functions in the database
- Automatic parallelism
- Easy to build-in custom functionality
  - Databases & Procedures being unified
  - Example: temporal and spatial indexing, pixel processing, …
- Easy to reorganize the data
  - Multiple views, each optimal for certain types of analyses
  - Building hierarchical summaries is trivial
- Scalable to Petabyte datasets
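"Take the analysis to the data" can be illustrated with a histogram computed inside the database, so that only the bin counts travel to the client. This is a toy sketch: the `obs` table, its `magnitude` column, and the generated values are hypothetical, and SQLite stands in for a server-class DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE obs (id INTEGER PRIMARY KEY, magnitude REAL)")
# Hypothetical measurements; a real table would be far too big to ship around.
conn.executemany("INSERT INTO obs (magnitude) VALUES (?)",
                 [(m / 10,) for m in range(100, 200)])  # 10.0 .. 19.9


def histogram(bin_width: float) -> list[tuple[float, int]]:
    """Count rows per magnitude bin inside the database; only bins come back."""
    rows = conn.execute(
        """SELECT CAST(magnitude / ? AS INTEGER) * ? AS bin_start,
                  COUNT(*) AS n
           FROM obs
           GROUP BY bin_start
           ORDER BY bin_start""",
        (bin_width, bin_width),
    )
    return rows.fetchall()


for bin_start, n in histogram(2.0):
    print(f"[{bin_start:4.1f}, {bin_start + 2.0:4.1f}): {n}")
```

The same pattern covers the other tasks listed for statistical analysis (filtering, subsetting, counting): the query moves to the data, and the result set is a summary that is orders of magnitude smaller than the table.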