
1 University of Illinois at Urbana-Champaign. Memory Architectures for Protein Folding: MD on million PIM processors. Fort Lauderdale, May 03,

2 Overview
 EIA-0081307: “ITR: Intelligent Memory Architectures and Algorithms to Crack the Protein Folding Problem”
 PIs:
 – Josep Torrellas and Laxmikant Kale (University of Illinois)
 – Mark Tuckerman (New York University)
 – Michael Klein (University of Pennsylvania)
 – Also associated: Glenn Martyna (IBM)
 Period: 8/00 - 7/03

3 Project Description
 Multidisciplinary project in computer architecture and software, and computational biology
 Goals:
 – Design improved algorithms to help solve the protein folding problem
 – Design the architecture and software of general-purpose parallel machines that speed up the solution of the problem

4 Some Recent Progress: Ideas
 Developed REPSWA (Reference Potential Spatial Warping Algorithm)
 – A novel algorithm for accelerating conformational sampling in molecular dynamics, a key element in protein folding
 – Based on a “spatial warping” variable transformation: the transformation is designed to shrink barrier regions on the energy landscape and grow attractive basins without altering the equilibrium properties of the system
 – Result: large gains in sampling efficiency
 – Z. Zhu, M. E. Tuckerman, S. O. Samuelson and G. J. Martyna, “Using novel variable transformations to enhance conformational sampling in molecular dynamics,” Phys. Rev. Lett. 88, 100201 (2002)
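A generic sketch of the underlying idea (not the exact REPSWA equations, which are in the cited paper): sampling is carried out in transformed variables u = g(x), and the potential acquires a Jacobian correction so that equilibrium averages are unchanged:

    \tilde{U}(\mathbf{u}) \;=\; U\bigl(\mathbf{x}(\mathbf{u})\bigr) \;-\; k_B T \,\ln\bigl|\det J(\mathbf{u})\bigr|,
    \qquad J_{ij}(\mathbf{u}) = \frac{\partial x_i}{\partial u_j}

Sampling e^{-\beta \tilde{U}(\mathbf{u})} in u and mapping back through x(u) reproduces the original Boltzmann distribution; the freedom in choosing g is what lets barrier regions be compressed and basins expanded.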

5 Some Recent Progress: Tools
 Developed LeanMD, a parallel molecular dynamics program that targets very large-scale parallel machines
 – A research-quality program based on the Charm++ parallel object-oriented language
 – A descendant of NAMD (another parallel molecular dynamics application), which achieved unprecedented speedups on thousands of processors
 – LeanMD is designed to run on next-generation parallel machines with tens of thousands or even millions of processors, such as Blue Gene/L or Blue Gene/C
 – Requires a new parallelization strategy that breaks up the simulation problem in a more fine-grained manner, generating enough parallelism to effectively distribute work across a million processors
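As an illustration of the scale of decomposition this implies (a hedged sketch, not LeanMD source code; box size and cutoff are made-up values): the simulation box is divided into cells roughly one cutoff radius across, and each cell and each neighboring cell pair becomes an independently schedulable object, so the number of work units far exceeds the processor count.

    #include <cmath>
    #include <cstdio>

    // Hypothetical illustration of fine-grained spatial decomposition.
    // Each cell and each neighboring cell pair is a separate work unit
    // that a runtime such as Charm++ could map onto processors.
    struct Box { double x, y, z; };

    int main() {
        Box box{200.0, 200.0, 200.0};   // simulation box in Angstroms (assumed)
        double cutoff = 12.0;           // nonbonded cutoff (typical value)

        // One cell per cutoff length in each dimension.
        int nx = static_cast<int>(box.x / cutoff);
        int ny = static_cast<int>(box.y / cutoff);
        int nz = static_cast<int>(box.z / cutoff);
        long cells = static_cast<long>(nx) * ny * nz;

        // Each cell interacts with itself and 13 unique neighbors
        // (half of the 26 surrounding cells, to avoid double counting).
        long pairWork = cells * 14;

        std::printf("cells = %ld, pairwise work units = %ld\n", cells, pairWork);
        // For this box: ~4000 cells and ~57000 pair objects, already far more
        // parallelism than one object per processor; larger systems push the
        // object count toward millions.
        return 0;
    }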

6 Some Recent Progress: Tools
 Developed a high-performance communication library for collective communication operations
 – AlltoAll personalized communication, AlltoAll multicast, and AllReduce
 – These operations can be complex and time-consuming on large parallel machines
 – They are especially costly for applications with all-to-all patterns, such as 3-D FFT and sorting
 – The library optimizes collective communication operations by performing message combining over an imposed virtual topology
 – The overhead of AlltoAll communication for 76-byte message exchanges between 2058 processors is in the low tens of milliseconds
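A rough sketch of the message-combining idea (assuming a 2D-mesh virtual topology; this is not the library's code): instead of sending P-1 small messages directly, each processor first combines data and sends it along its virtual row, then forwards combined messages down its virtual column, so it sends roughly 2*(sqrt(P)-1) larger messages.

    #include <cmath>
    #include <cstdio>

    // Illustration (not the library itself) of why routing an all-to-all
    // over a virtual 2D mesh reduces the number of messages per processor.
    int main() {
        const int P = 2048;            // processor count (close to the 2058 cited)
        const int bytes = 76;          // per-destination payload from the slide

        int side = static_cast<int>(std::sqrt(static_cast<double>(P)));

        long directMsgs = P - 1;
        long directBytes = static_cast<long>(P - 1) * bytes;

        // Phase 1: combine messages destined for each column, send along the row.
        // Phase 2: forward the combined messages down each column.
        long meshMsgs = 2L * (side - 1);
        // Data is forwarded through an intermediate hop, so total bytes sent
        // roughly double, but each message is ~sqrt(P) times larger.
        long meshBytes = 2L * (side - 1) * side * bytes;

        std::printf("direct:  %ld messages, %ld bytes sent per processor\n",
                    directMsgs, directBytes);
        std::printf("2D mesh: %ld messages, %ld bytes sent per processor\n",
                    meshMsgs, meshBytes);
        // Per-message software overhead dominates for 76-byte payloads, so
        // cutting the message count from ~2047 to ~88 is the main win.
        return 0;
    }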

7 Some Recent Progress: People
 The following graduate student researchers have been supported:
 – Sameer Kumar (University of Illinois)
 – Gengbin Zheng (University of Illinois)
 – Jun Nakano (University of Illinois)
 – Zhongwei Zhu (New York University)

8 Overview
 Rest of the talk:
 – Objective: develop a molecular dynamics program that will run effectively on a million processors, each with a low memory-to-processor ratio
 – Method: use the parallel objects methodology; develop an emulator/simulator that allows one to run full-fledged programs on the simulated architecture
 – Presenting today: simulator details; LeanMD simulation on BG/L and BG/C

9 Performance Prediction on Large Machines
 Problem:
 – How to predict performance of applications on future machines?
 – How to do performance tuning without continuous access to a large machine?
 Solution:
 – Leverage virtualization
 – Develop a machine emulator
 – Simulator: accurate time modeling
 – Run a program on “100,000 processors” using only hundreds of processors

10 Blue Gene Emulator: functional view
[Diagram: each emulated node has worker threads and communication threads fed by an inBuff, with affinity and non-affinity message queues and a correction queue; nodes are driven by the Converse scheduler and Converse queue]

11 Emulator to Simulator
 Emulator: study the programming model and application development
 Simulator: adds performance prediction capability
 – Models communication latency based on a network model
 – Does not model on-chip memory access or network contention
 Parallel performance is hard to model
 – Communication subsystem: out-of-order messages, communication/computation overlap
 – Event dependencies
 Parallel Discrete Event Simulation
 – The emulation program executes in parallel with event time-stamp correction
 – Exploits the inherent determinacy of the application

12 How to simulate?
 Time stamping events
 – Per-thread timer (sharing one physical timer)
 – Time stamp messages: calculate communication latency based on the network model
 Parallel event simulation
 – When a message is sent out, calculate the predicted arrival time for the destination BlueGene processor
 – When a message is received, update the current time as: currTime = max(currTime, recvTime)
 – Time stamp correction
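A minimal sketch of the send-side stamping and receive-side update (the max rule is from the slide; the linear latency model and its constants are assumptions, not the simulator's actual network model):

    #include <algorithm>
    #include <cstddef>
    #include <cstdio>

    // Per-thread logical clock of one simulated BlueGene processor/thread.
    struct SimThread {
        double currTime = 0.0;  // virtual time in seconds
    };

    // Assumed linear network model: per-message overhead plus byte cost.
    double predictLatency(std::size_t bytes) {
        const double alpha   = 5e-6;  // per-message overhead, seconds (assumed)
        const double perByte = 1e-9;  // inverse bandwidth, seconds/byte (assumed)
        return alpha + perByte * static_cast<double>(bytes);
    }

    // Sender side: stamp the message with its predicted arrival time.
    double sendTimestamp(const SimThread& sender, std::size_t bytes) {
        return sender.currTime + predictLatency(bytes);
    }

    // Receiver side: the triggered event cannot start before the message
    // arrives, so the local virtual clock jumps forward if necessary.
    void onReceive(SimThread& recvr, double recvTime) {
        recvr.currTime = std::max(recvr.currTime, recvTime);
    }

    int main() {
        SimThread a{1.0e-3}, b{0.8e-3};
        double recvTime = sendTimestamp(a, 76);  // a sends a 76-byte message
        onReceive(b, recvTime);                  // b advances to max(currTime, recvTime)
        std::printf("b.currTime = %.9f s\n", b.currTime);
        return 0;
    }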

13 Parallel correction algorithm
 Sort message execution by receive time
 Adjust time stamps when needed
 Use correction messages to announce the change in an event's startTime
 Send correction messages along the path the original message took
 Events already in the timeline may have to move
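A simplified sequential sketch of that correction pass (assumptions: each event records its receive time and duration; the real algorithm runs in parallel and propagates corrections along the original message paths rather than printing them):

    #include <algorithm>
    #include <cstdio>
    #include <vector>

    // One executed event in a simulated processor's timeline.
    struct Event {
        const char* name;
        double recvTime;    // when its triggering message arrives
        double startTime;   // currently recorded start time
        double duration;    // execution time of the event
    };

    // Re-derive start times from receive times and report which events moved;
    // in the parallel algorithm each moved event would generate a correction
    // message sent along the same path as the original message.
    void correctTimeline(std::vector<Event>& timeline) {
        std::sort(timeline.begin(), timeline.end(),
                  [](const Event& a, const Event& b) { return a.recvTime < b.recvTime; });

        double clock = 0.0;
        for (Event& e : timeline) {
            double newStart = std::max(clock, e.recvTime);
            if (newStart != e.startTime) {
                std::printf("correction: %s moves from %.3f to %.3f\n",
                            e.name, e.startTime, newStart);
                e.startTime = newStart;   // would trigger a correction message
            }
            clock = e.startTime + e.duration;
        }
    }

    int main() {
        // Example: M4's receive time turned out later than originally assumed,
        // so M4 and the events after it must be shifted on the timeline.
        std::vector<Event> t = {
            {"M1", 0.0, 0.0, 1.0}, {"M2", 0.5, 1.0, 1.0}, {"M3", 1.0, 2.0, 1.0},
            {"M4", 4.5, 3.0, 1.0}, {"M5", 3.5, 4.0, 1.0}, {"M6", 4.0, 5.0, 1.0},
        };
        correctTimeline(t);
        return 0;
    }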

14–17 Timestamps Correction
[Figure sequence: execution timelines of messages M1–M8 ordered by RecvTime, showing how a correction message for M4 changes its receive time and shifts M4 and the subsequent events M5 and M6 within the timeline]

18 Validation
[Figure: predicted time vs. latency factor]

19 LeanMD
 LeanMD is a molecular dynamics simulation application written in Charm++
 The next generation of NAMD, the Gordon Bell Award winner at SC2002
 Requires a new parallelization strategy
 – Break up the problem in a more fine-grained manner to effectively distribute work across the extremely large number of processors

20 LeanMD Performance Analysis
[Performance graphs. Slide note: “Need readable graphs: 1 to a page is fine, but with larger fonts, thicker lines”]


