Presentation on theme: "Technion, Haifa Israel, June 2013"— Presentation transcript:
1 21st Century Computer Architecture: A community white paper. Technion, Haifa Israel, June 2013. Outline: Information & Commun. Tech’s Impact; Semiconductor Technology’s Challenges; Computer Architecture’s Future; Example: Bypassing Paged Virtual Memory
2 White Paper Participants: Sarita Adve, U Illinois *; David H. Albonesi, Cornell U; David Brooks, Harvard U; Luis Ceze, U Washington *; Sandhya Dwarkadas, U Rochester; Joel Emer, Intel/MIT; Babak Falsafi, EPFL; Antonio Gonzalez, Intel/UPC; Mark D. Hill, U Wisconsin *,**; Mary Jane Irwin, Penn State U *; David Kaeli, Northeastern U *; Stephen W. Keckler, NVIDIA/U Texas; Christos Kozyrakis, Stanford U; Alvin Lebeck, Duke U; Milo Martin, U Pennsylvania; José F. Martínez, Cornell U; Margaret Martonosi, Princeton U *; Kunle Olukotun, Stanford U; Mark Oskin, U Washington; Li-Shiuan Peh, M.I.T.; Milos Prvulovic, Georgia Tech; Steven K. Reinhardt, AMD; Michael Schulte, AMD/U Wisconsin; Simha Sethumadhavan, Columbia U; Guri Sohi, U Wisconsin; Daniel Sorin, Duke U; Josep Torrellas, U Illinois *; Thomas F. Wenisch, U Michigan *; David Wood, U Wisconsin *; Katherine Yelick, UC Berkeley/LBNL *. “*” contributed prose; “**” effort coordinator. Thanks to CCC, Erwin Gianchandani & Ed Lazowska for guidance and Jim Larus & Jeannette Wing for feedback
3 20th Century ICT Set Up. Information & Communication Technology (ICT) Has Changed Our World <long list omitted>. Required innovations in algorithms, applications, programming languages, … , & system software. Key (invisible) enablers of (cost-)performance gains: Semiconductor technology (“Moore’s Law”); Computer architecture (~80x per Danowitz et al.)
5 21st Century Promise. ICT Promises Much More: Data-centric personalized health care; Computation-driven scientific discovery; Human network analysis; Much more: known & unknown. Characterized by: Big Data; Always Online; Secure/Private; … Whither enablers of future (cost-)performance gains?
6 Technology’s Challenges 1/2. Late 20th Century → The New Reality: Moore’s Law, 2× transistors/chip → Transistor count still 2× BUT…; Dennard Scaling, ~constant power/chip → Gone. Can’t repeatedly double power/chip
7 Classic CMOS Dennard Scaling: the Science behind Moore’s Law (Finding 2). Scaling: Voltage: V/a; Oxide: tOX/a. Results: Power/ckt: 1/a^2; Power Density: ~Constant. Source: Future of Computing Performance: Game Over or Next Level?, National Academy Press, 2011. National Research Council (NRC) – Computer Science and Telecommunications Board (CSTB.org)
8 Post-classic CMOS Dennard Scaling: Post Dennard CMOS Scaling Rule. Scaling: Voltage: V (was V/a); Oxide: tOX/a. Results: Power/ckt: 1 (was 1/a^2); Power Density: a^2 (was ~Constant). TODO: Chips w/ higher power (no), smaller (), dark silicon (), or other (?). National Research Council (NRC) – Computer Science and Telecommunications Board (CSTB.org)
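The two scaling slides above can be checked with a little arithmetic. A minimal sketch in Python, assuming the standard dynamic-power model P = C · V^2 · f, with capacitance shrinking as 1/a, frequency rising as a, and a^2 more circuits per unit area; these three assumptions are the usual Dennard ones and are not spelled out on the slides. The function name `scaled_power` is hypothetical.

```python
from fractions import Fraction


def scaled_power(a, voltage_scales):
    """Return (power per circuit, power density) relative to the previous
    generation, for a linear scaling factor a.

    Assumptions (not from the slides): P = C * V^2 * f, C scales as 1/a,
    f scales as a, and circuits per unit area scale as a^2.
    """
    a = Fraction(a)
    capacitance = 1 / a                                  # smaller devices, less capacitance
    frequency = a                                        # shorter gate delay, higher clock
    voltage = 1 / a if voltage_scales else Fraction(1)   # classic vs. post-Dennard
    power_per_circuit = capacitance * voltage ** 2 * frequency
    circuits_per_area = a ** 2
    return power_per_circuit, power_per_circuit * circuits_per_area


# Classic Dennard scaling with a = 2: power/ckt 1/a^2, density constant.
print(scaled_power(2, voltage_scales=True))
# Post-Dennard (voltage no longer scales): power/ckt 1, density a^2.
print(scaled_power(2, voltage_scales=False))
```

With the voltage held fixed, each circuit keeps its power while a^2 more of them fit in the same area, which is why the chip's power budget, rather than its transistor count, becomes the limit.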
9 Technology’s Challenges 2/2. Late 20th Century → The New Reality: Moore’s Law, 2× transistors/chip → Transistor count still 2× BUT…; Dennard Scaling, ~constant power/chip → Gone. Can’t repeatedly double power/chip; Modest (hidden) transistor unreliability → Increasing transistor unreliability can’t be hidden; Focus on computation over communication → Communication (energy) more expensive than computation; 1-time costs amortized via mass market → One-time cost much worse & want specialized platforms. How should architects step up as technology falters?
10 21st Century Comp Architecture. 20th Century → 21st Century: Single-chip in generic computer → Architecture as Infrastructure: Spanning sensors to clouds; Performance plus security, privacy, availability, programmability, …. Performance via invisible instr.-level parallelism → Energy First: Parallelism; Specialization; Cross-layer design. Predictable technologies: CMOS, DRAM, & disks → New technologies (non-volatile memory, near-threshold, 3D, photonics, …); Rethink: memory & storage, reliability, communication. Cross-Cutting: Break current layers with new interfaces
11 21st Century Comp Architecture. 20th Century → 21st Century: Single-chip in stand-alone computer → Architecture as Infrastructure: Spanning sensors to clouds; Performance plus security, privacy, availability, programmability, …. Performance via invisible instr.-level parallelism → Energy First: Parallelism; Specialization; Cross-layer design. Predictable technologies: CMOS, DRAM, & disks → New technologies (non-volatile memory, near-threshold, 3D, photonics, …); Rethink: memory & storage, reliability, communication. Cross-Cutting: Break current layers with new interfaces
12 What Research Exactly? Research areas in white paper (& backup slides): Architecture as Infrastructure: Spanning Sensors to Clouds; Energy First; Technology Impacts on Architecture; Cross-Cutting Issues & Interfaces. Much more research developed by future PIs! E.g.: Efficient Virtual Memory for Big Memory Servers, Basu, Gandhi, Chang, Hill, & Swift [ISCA 2013]. Big Memory: graph500, memcached, databases. Self-manage most memory (e.g., bufferpool)
13 Execution Time Overhead: TLB Misses. Chart annotations: Significant waste; Larger memory?; Byte-addr NVM?; Lower is better. First, let’s see whether the problem actually exists. For a set of workloads, the first column shows how many cycles the hardware page table walker in a 32 nm Sandy Bridge/Westmere machine spends, as a percentage of the execution time. The second column shows the number of L1 + L2 TLB misses experienced per 1K instructions. More interestingly, when we do the same experiment on non-server workloads, things are very different. 10/5/12
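The two metrics described on the slide above reduce to simple ratios. A minimal sketch in Python, assuming raw counter totals are already in hand; the function name, parameter names, and sample numbers are hypothetical and are not measurements from the talk.

```python
def tlb_overhead(walk_cycles, total_cycles, tlb_misses, instructions):
    """Return (percent of execution time spent in page walks,
    TLB misses per 1K instructions) from raw counter totals."""
    percent_in_walks = 100.0 * walk_cycles / total_cycles
    misses_per_kilo_instr = 1000.0 * tlb_misses / instructions
    return percent_in_walks, misses_per_kilo_instr


# Hypothetical counter values, for illustration only.
pct, mpki = tlb_overhead(walk_cycles=25, total_cycles=100,
                         tlb_misses=3, instructions=1000)
print(f"{pct:.1f}% of cycles in page walks, {mpki:.1f} misses per 1K instructions")
```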
14 Hardware: Direct Segment. (1) Conventional Paging; (2) Direct Segment. Figure labels: BASE, LIMIT, VA, OFFSET, PA. Why Direct Segment? Matches Big Memory Workload needs. NO Paging => NO TLB Miss
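The BASE, LIMIT, and OFFSET labels on the slide above suggest a translation path along the following lines. A simplified sketch in Python, not the authors' hardware design: it assumes the segment covers the half-open range [BASE, LIMIT), that a physical address inside it is VA + OFFSET, and that everything else falls back to conventional paging. The dictionary standing in for the page table and the 4 KiB page size are illustrative assumptions.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages for the paged fallback path


def translate(va, base, limit, offset, page_table):
    """Translate a virtual address to a physical address.

    Addresses inside the direct segment need no page-table lookup, so they
    cannot miss in the TLB. All other addresses use conventional paging,
    modeled here as a dict from virtual page number to physical frame number.
    """
    if base <= va < limit:
        return va + offset                  # direct segment: one add
    vpn, page_offset = divmod(va, PAGE_SIZE)
    frame = page_table[vpn]                 # KeyError stands in for a page fault
    return frame * PAGE_SIZE + page_offset


# Hypothetical register values and a one-entry page table.
print(hex(translate(0x10000, 0x10000, 0x20000, 0x100000, {})))
print(hex(translate(0x1234, 0x10000, 0x20000, 0x100000, {1: 7})))
```

The point of the design choice is visible in the first branch: a range check and an addition replace the page-table walk for the bulk of a big-memory workload's address space.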
15 Execution Time Overhead: TLB Misses. 92-100% TLB “misses” to direct segment. Requires: Both small SW + small HW changes
16 21st Century Computer Architecture: A community white paper. Outline: Information & Commun. Tech’s Impact; Semiconductor Technology’s Challenges; Computer Architecture’s Future; Example: Bypassing Paged Virtual Memory
17 Pre-Competitive Research Justified. Retain (cost-)performance enabler to ICT revolution. Successful companies cannot do this by themselves: Lack needed long-term focus; Don’t want to pay for what benefits all; Resist transcending interfaces that define their products
18 White Paper Process. Late March 2012: CCC contacts coordinator & forms group. April 2012: Brainstorm (meetings/online doc); Read related docs (PCAST, NRC Game Over, ACAR1/2, …); Use online doc for intro & outline then parallel sections; Rotated authors to revise sections. May 2012: Brainstorm list of researchers in/out of comp. architecture; Solicit researcher feedback/endorsement; Do distributed revision & redo of intro; Release May 25 to CCC & via. Kudos to participants on executing on a tight timetable
19 Back Up Slides. Detailed research areas in white paper: Architecture as Infrastructure: Spanning Sensors to Clouds; Energy First; Technology Impacts on Architecture; Cross-Cutting Issues & Interfaces. Findings on National Academy “Game Over” Study. Glimpse at DARPA/ISAT Workshop “Advancing Computer Systems without Technology Progress”
20 1. Architecture as Infrastructure: Spanning Sensors to Clouds. Beyond a chip in a generic computer to pillar of 21st century societal infrastructure. Computation in context (sensor, mobile, …, data center): Systems often large & distributed; Communication issues can dominate computation; Goals beyond performance (battery life, form factor). Opportunities (not exhaustive): Reliable sensors harvesting (intermittent) energy; Smart phones to Star Trek’s medical “tricorder”; Cloud infrastructure suitable for both “Big Data” streams & low-latency quality-of-service with stragglers; Analysis & design tools that scale
21 2. Energy First. Beyond single-core performance computer to (cost-)performance per watt/joule. Energy across the layers: Circuit/technology (near-threshold CMOS, 3D stacking); Architecture (reducing unnecessary data movement); Software (communication-reducing algorithms). Parallelism to save energy: Vast (fine-grained) homogeneous & heterogeneous; Improved SW stack; Applications focus (beyond graphics processing units). Specialization for performance & energy efficiency: Abstractions for specialization (reducing 1-time cost); Energy-efficient memory hierarchies; Reconfigurable logic structures
22 3. Technology Impacts on Architecture. Beyond CMOS, DRAM, & Disks of last 3+ decades to: Using replacement circuit technologies (Sub/near-threshold CMOS, QWFETs, TFETs, and QCAs); Non-volatile storage (Beyond flash memory to STT-RAM, PCRAM, & memristor); 3D die stacking & interposers (logic, cache, small main memory); Photonic interconnects (Inter- & even intra-chip); Design automation (from circuit-design w/ new technologies to pre-RTL functional, performance, power, area modeling of heterogeneous chips & systems)
23 4. Cross-Cutting Issues & Interfaces. Beyond performance w/ stable interfaces to: New design goals (for pillar of societal infrastructure): Verifiability (bugs kill); Reliability (“dependability” computing base?); Security/Privacy (w/ non-volatile memory?); Programmability (time to correct-performant solution). Better Interfaces: High-level information (quality of service, provenance); Parallelism ((in)dependence, (lack of) side-effects); Orchestrating communication ((recursive) locality); Security/Reliability (fine-grain protection)
24 Executive summary (Added to National Academy Slides). Highlights of National Academy Findings: (F1) Computer hardware has transitioned to multicore; (F2) Dennard scaling of CMOS has broken down; (F3) Parallelism and locality must be exploited by software; (F4) Chip power will soon limit multicore scaling. Eight recommendations from algorithms to education. We know all of this at some level, BUT: Are we all acting on this knowledge or hoping for business as usual? Thinking beyond next paper to where future value will be created? Questions Asked but Not Answered Embedded in NA Talk. Briefly Close with Kübler-Ross Stages of Grief: Denial … Acceptance. Source: Future of Computing Performance: Game Over or Next Level?, National Academy Press, 2011. Mark Hill talk (http://www.cs.wisc.edu/~markhill/NRCgameover_wisconsin_2011_05.pptx)
25 The Graph. Figure: System Capability (log) by decade (80s, 90s, 00s, 10s, 20s, 30s, 40s, 50s); labels: CMOS, Fallow Period, New Technology, Our Focus. Source: Advancing Computer Systems without Technology Progress, ISAT Outbrief (http://www.cs.wisc.edu/~markhill/papers/isat2012_ACSWTP.pdf), Mark D. Hill and Christos Kozyrakis, DARPA/ISAT Workshop, March 26-27, 2012. Approved for Public Release, Distribution Unlimited. The views expressed are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.
26 Surprise 1 of 2: Can Harvest in the “Fallow” Period! 2 decades of Moore’s Law-like perf./energy gains. Wring out inefficiencies used to harvest Moore’s Law: HW/SW Specialization/Co-design (3-100x); Reduce SW Bloat (2-1000x); Approximate Computing (2-500x). ~1000x = 2 decades of Moore’s Law!
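The "~1000x = 2 decades" equivalence on the slide above follows from compounding. A minimal sketch in Python, assuming one doubling every two years; that doubling period is the commonly quoted figure and is an assumption here, since the slides say only "2×". The function name `moore_gain` is hypothetical.

```python
def moore_gain(years, doubling_period_years=2):
    """Cumulative improvement factor after `years` of steady doubling.

    The two-year doubling period is an assumption, not a figure from the slides.
    """
    return 2 ** (years / doubling_period_years)


# Two decades at one doubling every two years is ten doublings.
print(moore_gain(20))  # 1024.0
```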
27 “Surprise” 2 of 2. Systems must exploit LOCALITY-AWARE parallelism. Parallelism Necessary, but not Sufficient, as communication’s energy costs dominate. Shouldn’t be a surprise, but many are in denial. Both surprises hard, requiring “vertical cut” thru SW/HW