1
Prof. Srinidhi Varadarajan, Director, Center for High-End Computing Systems

2
- We need a paradigm shift to make supercomputers more usable for mainstream computational scientists.
- A similar shift occurred in computing in the 1970s, when the arrival of inexpensive minicomputers in academia spurred a large body of computing research.
- Results from this research flowed back to industry, creating a growth cycle that led to computing becoming a commodity.
- This requires a comprehensive rethink of programming languages, runtime systems, operating systems, scheduling, reliability, and operations and management.
- Moving to petascale and exascale class systems significantly complicates this challenge.
- We need a computing environment that can efficiently and usably span the scales from department-sized systems to national resources.

3
- The majority of our supercomputers today are distributed-memory systems that use the message-passing model of parallel computation (a minimal example follows below).
- The shared/distributed memory view is a dichotomy imposed by hardware constraints.
- Modern high-performance interconnects such as InfiniBand are memory-based systems.
- This provides the hardware basis to envision DSM systems that deepen the memory hierarchy.
- The most common operations are accelerated through hardware offload.
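As an illustration of the message-passing model the slide refers to (my example, not from the presentation): each process owns a private address space, and data is combined only through explicit communication calls.

```c
/* Illustration of the message-passing model: per-process data is combined
 * only through explicit MPI communication.  Build with mpicc, run with mpirun. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each process owns its own copy of 'local'; no other process can
     * see it without a message. */
    int local = rank + 1, global = 0;

    /* Explicit communication: combine the per-process values. */
    MPI_Allreduce(&local, &global, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d ranks = %d\n", nprocs, global);

    MPI_Finalize();
    return 0;
}
```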

4
- A common question: "My application runs on my desktop, but it takes too long. Can I just run it on the supercomputer and make it run faster?"
- Short answer: no. Longer answer: almost certainly not.
- As core frequencies have flattened, multi-core and many-core architectures are here to stay.
- This is increasing the prevalence of threaded codes.
- Can we take standard threaded codes (such as the Pthreads sketch below) and run them on a cluster supercomputer without any modifications or recompilation?
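For contrast, a minimal sketch of the kind of "standard threaded code" the slide has in mind: the same sum as above, written against a single shared address space with Pthreads. This example assumes a single multi-core node; running such a code unmodified across a cluster is exactly the question the slide poses.

```c
/* A "standard threaded code": one shared address space, synchronization
 * instead of explicit communication.  Build with: gcc -pthread */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4

static int shared_sum;                          /* visible to all threads */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    int my_value = (int)(long)arg + 1;
    pthread_mutex_lock(&lock);                  /* synchronize, don't send */
    shared_sum += my_value;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (long i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("sum over %d threads = %d\n", NTHREADS, shared_sum);
    return 0;
}
```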

5
- The goal of our work is to enable Pthread-based threaded codes to run transparently on cluster supercomputers.
- The DSM system acts as the runtime and provides a globally consistent memory abstraction.
- A new consistency algorithm with release-consistency semantics guarantees correct operation for valid threaded codes (illustrated in the sketch below).
- No, it won't fix your bugs, but it may make deadlock and livelock detection easier, possibly even automatic.
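A minimal sketch, not the authors' code, of the release-consistency contract that a "valid" Pthreads program already satisfies: writes made before an unlock (a release) must be visible to another thread after its matching lock (an acquire). A DSM with release-consistency semantics only needs to propagate updates at these synchronization points.

```c
/* Release consistency as seen by a correctly synchronized Pthreads program:
 * data written before a release becomes visible after the matching acquire. */
#include <pthread.h>
#include <stdio.h>

static int payload;                 /* ordinary shared data */
static int ready;                   /* guarded flag         */
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;

static void *producer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&m);         /* acquire */
    payload = 42;                   /* writes made under the lock...       */
    ready = 1;
    pthread_mutex_unlock(&m);       /* release: a DSM may defer publishing
                                       the updates until this point        */
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&m);     /* acquire: updates published at the
                                       last release must now be visible    */
        int done = ready, value = payload;
        pthread_mutex_unlock(&m);
        if (done) { printf("got %d\n", value); return NULL; }
    }
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```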

6
- Separation of concerns: the system is divided into a consistency layer and a lower-level communication layer.
- The communication layer uses a well-defined architecture, similar to MPI's ADI, to support a wide variety of lower-level interconnects.
- The system uses either dedicated memory servers, or nodes can contribute a portion of their memory to a global pool.
- Dedicated memory servers are essentially low-end servers that can host a large amount of memory behind a fast interconnect.
- Memory-striping algorithms are employed to mitigate memory access hotspots (a sketch follows below).
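A hypothetical sketch of block-cyclic striping across dedicated memory servers; the block size, server count, and function names are assumptions for illustration, not the system's actual interface. Spreading consecutive blocks over different servers keeps a hot region from hammering a single node.

```c
/* Hypothetical block-cyclic striping of a global address space across
 * memory servers.  All names and parameters are illustrative. */
#include <stdint.h>
#include <stdio.h>

#define BLOCK_SIZE  4096u   /* stripe unit: one page                 */
#define NUM_SERVERS 8u      /* servers backing the global memory pool */

struct remote_loc {
    uint32_t server;        /* which memory server holds the block    */
    uint64_t offset;        /* byte offset within that server's pool  */
};

/* Consecutive blocks land on consecutive servers, so hot regions are
 * spread across the pool instead of concentrating on one node. */
static struct remote_loc stripe(uint64_t global_addr)
{
    uint64_t block = global_addr / BLOCK_SIZE;
    struct remote_loc loc = {
        .server = (uint32_t)(block % NUM_SERVERS),
        .offset = (block / NUM_SERVERS) * BLOCK_SIZE
                  + global_addr % BLOCK_SIZE,
    };
    return loc;
}

int main(void)
{
    for (uint64_t a = 0; a < 6 * BLOCK_SIZE; a += BLOCK_SIZE) {
        struct remote_loc l = stripe(a);
        printf("addr %#10llx -> server %u, offset %#llx\n",
               (unsigned long long)a, l.server, (unsigned long long)l.offset);
    }
    return 0;
}
```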

7
- The DSM architecture uses a global scheduler that treats cluster nodes as a set of processors.
- Thread migration is simple and relatively inexpensive.
- This enables load balancing through runtime migration.
- Two issues must be balanced (see the sketch below): compute load imbalance and data affinity.
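A hypothetical sketch of how a global scheduler might weigh the two issues: migrate a thread only when the load-imbalance gain outweighs the data-affinity cost of re-fetching its resident pages over the interconnect. The structure fields and the threshold rule are invented for illustration, not the scheduler's actual policy.

```c
/* Hypothetical migration heuristic: load-imbalance gain vs. data-affinity cost. */
#include <stdbool.h>
#include <stdio.h>

struct node_stats {
    double runnable_threads;   /* current load on the node              */
};

struct thread_stats {
    double local_pages;        /* pages resident on the thread's node   */
    double remote_page_cost;   /* average cost of re-fetching one page  */
};

/* Migrate when the load gap is large enough to amortize the pages the
 * thread would have to pull back across the interconnect. */
static bool should_migrate(const struct node_stats *src,
                           const struct node_stats *dst,
                           const struct thread_stats *t,
                           double time_slice)
{
    double imbalance_gain = (src->runnable_threads - dst->runnable_threads)
                            * time_slice;
    double affinity_cost  = t->local_pages * t->remote_page_cost;
    return imbalance_gain > affinity_cost;
}

int main(void)
{
    struct node_stats busy = { 12 }, idle = { 2 };
    struct thread_stats t = { .local_pages = 64, .remote_page_cost = 0.05 };
    printf("migrate? %s\n", should_migrate(&busy, &idle, &t, 1.0) ? "yes" : "no");
    return 0;
}
```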

8
- Extending the threads model to support adaptivity.
- Transactional memory: an artifact of our consistency protocol enables us to provide transactional-memory semantics fairly inexpensively.
- This enables speculative and/or adaptive execution models, particularly in hard-to-parallelize sequential sections of code.
- Speculation lets us explore multiple execution paths, with the DSM guaranteeing that there are no memory side effects; invalid paths are simply pruned.
- Adaptive execution enables optimistic and conservative algorithms to be started concurrently (sketched below).
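A hypothetical sketch of the adaptive-execution idea: run the optimistic algorithm speculatively, keep its writes only if it succeeds, and otherwise prune the path and fall back on the conservative algorithm. The dsm_txn_* names and both solver routines are invented placeholders, not the system's API; they are stubbed out here so the example compiles.

```c
/* Adaptive/speculative execution over transaction-style memory semantics.
 * All dsm_txn_* names and the solver stubs are placeholders for illustration. */
#include <stdbool.h>
#include <stdio.h>

typedef struct { bool open; } dsm_txn;          /* placeholder handle          */

static dsm_txn dsm_txn_begin(void)        { return (dsm_txn){ true }; }
static bool    dsm_txn_commit(dsm_txn *t) { t->open = false; return true; }
static void    dsm_txn_abort(dsm_txn *t)  { t->open = false; }

/* Stand-ins: a fast algorithm that may fail, and a safe one that never does. */
static bool optimistic_solve(double *x)   { *x = 1.0; return false; }
static void conservative_solve(double *x) { *x = 2.0; }

static void adaptive_solve(double *x)
{
    dsm_txn t = dsm_txn_begin();         /* writes are buffered from here     */
    if (optimistic_solve(x) && dsm_txn_commit(&t))
        return;                          /* speculative path kept             */
    dsm_txn_abort(&t);                   /* invalid path pruned: the DSM would
                                            guarantee no memory side effects  */
    conservative_solve(x);               /* conservative fallback             */
}

int main(void)
{
    double x = 0.0;
    adaptive_solve(&x);
    printf("x = %g\n", x);
    return 0;
}
```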

9
- Current threaded and message-passing models are inadequate for peta- and exascale systems.
- Growth in heterogeneous multi-core systems significantly complicates this problem.
- We need more comprehensive runtime systems that can aid in load balancing, profile-guided optimization, and code adaptation.
- The move in the compilers community toward greater emphasis on dynamic analysis is a step in this direction.

10
- We are working on hybrid programming models that embed von Neumann, program-counter-based elements within dataflow constructs (a toy sketch follows below).
- A new model must provide insights into problem decomposition as well as map existing decomposition methods.
- We need coordination models that can operate at peta- and exascale.
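A toy, illustrative-only sketch of the hybrid idea: ordinary sequential (program-counter) kernels written as plain C functions, embedded as nodes of a small dataflow graph that fires a node once all of its inputs are ready. The graph representation and scheduling loop are invented for this example.

```c
/* Toy hybrid model: sequential kernels as nodes of a tiny dataflow graph. */
#include <stdio.h>

#define MAX_DEPS 2

typedef void (*kernel_fn)(double *in, double *out);

/* von Neumann kernels: ordinary sequential code inside each node */
static void load_a(double *in, double *out) { (void)in; *out = 3.0; }
static void load_b(double *in, double *out) { (void)in; *out = 4.0; }
static void add_ab(double *in, double *out) { *out = in[0] + in[1]; }

struct node {
    kernel_fn fn;              /* NULL once the node has fired          */
    int       deps[MAX_DEPS];  /* indices of producer nodes, -1 = none  */
    int       pending;         /* producers that have not yet fired     */
    double    value;           /* the token this node produces          */
};

int main(void)
{
    struct node g[] = {
        { load_a, { -1, -1 }, 0, 0.0 },   /* node 0 */
        { load_b, { -1, -1 }, 0, 0.0 },   /* node 1 */
        { add_ab, {  0,  1 }, 2, 0.0 },   /* node 2: fires when 0 and 1 are done */
    };
    int n = 3, fired = 0;

    /* Dataflow scheduling loop: fire any node whose inputs are all ready. */
    while (fired < n) {
        for (int i = 0; i < n; i++) {
            if (g[i].fn && g[i].pending == 0) {
                double in[MAX_DEPS] = { 0 };
                for (int d = 0; d < MAX_DEPS; d++)
                    if (g[i].deps[d] >= 0) in[d] = g[g[i].deps[d]].value;
                g[i].fn(in, &g[i].value);
                g[i].fn = NULL;                       /* mark as fired   */
                fired++;
                for (int j = 0; j < n; j++)           /* notify consumers */
                    for (int d = 0; d < MAX_DEPS; d++)
                        if (g[j].deps[d] == i) g[j].pending--;
            }
        }
    }
    printf("result = %g\n", g[2].value);
    return 0;
}
```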

11
- Methods to evolve applications easily when requirements change.
- Working with the compilers, programming languages, architectures, software engineering, applications, and systems communities to realize this goal.

12
- System G: 2600-core Intel x86 Xeon (Penryn) cluster with Quad Data Rate InfiniBand; 12,000 thermal sensors and 5,000 power sensors.
- System X: 2200-processor PowerPC cluster with an InfiniBand interconnect.
- Anantham: 400-processor Opteron cluster with a Myrinet interconnect.
- Several 8-32 processor research clusters.
- 12-processor SGI Altix shared-memory system.
- 8-processor AMD Opteron shared-memory system.
- 16-core AMD Opteron shared-memory system.
- 16-node PlayStation 3 cluster.

