
1 Combinatorial Scientific Computing: Experiences, Directions, and Challenges
John R. Gilbert, University of California, Santa Barbara
DOE CSCAPES Workshop, June 11, 2008
Support: DOE Office of Science, DARPA, SGI, MIT Lincoln Labs

2 Combinatorial Scientific Computing “I observed that most of the coefficients in our matrices were zero; i.e., the nonzeros were ‘sparse’ in the matrix, and that typically the triangular matrices associated with the forward and back solution provided by Gaussian elimination would remain sparse if pivot elements were chosen with care” - Harry Markowitz, describing the 1950s work on portfolio theory that won the 1990 Nobel Prize for Economics

3 Combinatorial Scientific Computing “The emphasis on mathematical methods seems to be shifted more towards combinatorics and set theory – and away from the algorithm of differential equations which dominates mathematical physics.” - John von Neumann & Oskar Morgenstern, 1944

4 “Combinatorial problems generated by challenges in data mining and related topics are now central to computational science. Finally, there’s the Internet itself, probably the largest graph-theory problem ever confronted.” Combinatorial Scientific Computing - Isabel Beichl & Francis Sullivan, 2008

5 A few directions in CSC
- Hybrid discrete & continuous computations
- Multiscale combinatorial computation
- Analysis, management, and propagation of uncertainty
- Economic & game-theoretic considerations
- Computational biology & bioinformatics
- Computational ecology
- Knowledge discovery & machine learning
- Relationship analysis
- Web search and information retrieval
- Sparse matrix methods
- Geometric modeling
- ...

6 Ten Challenges in Combinatorial Scientific Computing

7 #1: The Parallelism Challenge
- LANL / IBM Roadrunner: > 1 PFLOPS
- Two Nvidia 8800 GPUs: > 1 TFLOPS
- Intel 80-core chip: > 1 TFLOPS
- Different programming models
- Different levels of fit to irregular problems & graph algorithms

8 #2: The Architecture Challenge
- The memory wall: most of memory is hundreds or thousands of cycles away from the processor that wants it.
- Computations that follow the edges of irregular graphs are unavoidably latency-limited.
- Speed of light: "You can buy more bandwidth, but you can't buy less latency."
- Some help from encapsulating coarse-grained primitives in carefully tuned library routines...
- ... but the difficulty is intrinsic to most graph computations, hence can likely only be addressed by machine architecture.

9 An architectural approach: Cray MTA / XMT
- Hide latency by massive multithreading
- Per-tick context switching
- Slower clock rate
- Uniform (sort of) memory access time
- But the economic case is less than completely clear...

10 #3: The Algorithms Challenge
- Efficient sequential algorithms for combinatorial problems often follow long sequential dependencies.
  - Example: Assefaw's talk on graph coloring
- Several parallelization strategies exist, but no silver bullet:
  - Partitioning (e.g. for coloring)
  - Pointer-jumping (e.g. for connected components)
- Sometimes it just depends on the graph...
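For instance, the pointer-jumping strategy for connected components can be simulated sequentially (an illustrative sketch in the spirit of Shiloach–Vishkin, not code from the talk; on a real parallel machine the hook and jump phases run concurrently across vertices):

```python
def connected_components(edges, n):
    """Pointer-jumping sketch of connected components, simulated
    sequentially. Each vertex holds a parent pointer; 'hook' attaches
    a root to a smaller-labeled neighboring root, and 'jump' halves
    tree depth (parent = parent[parent]) until every tree is a star."""
    parent = list(range(n))
    changed = True
    while changed:
        changed = False
        # Hook: point a root at a strictly smaller-labeled neighbor's root.
        for u, v in edges:
            ru, rv = parent[u], parent[v]
            if parent[ru] == ru and rv < ru:
                parent[ru] = rv
                changed = True
            elif parent[rv] == rv and ru < rv:
                parent[rv] = ru
                changed = True
        # Jump: shortcut pointers so every vertex points at its root.
        for v in range(n):
            while parent[v] != parent[parent[v]]:
                parent[v] = parent[parent[v]]
    return parent  # parent[v] is the smallest label in v's component
```

Each round strictly decreases some root label, so the loop terminates with every vertex labeled by its component's minimum vertex.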

11 Sample kernel: sort a logically triangular matrix
- Used in sparse linear solvers (e.g. Matlab's)
- Simple kernel; abstracts many other graph operations (see next)
- Sequential: linear time; greedy topological sort; no locality
- Parallel: very unbalanced; one DAG level per step; possibly long sequential dependencies
[Figure: the original matrix, and the same matrix permuted to upper triangular form]
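The greedy topological sort mentioned above can be sketched as follows (an illustrative implementation, not code from the talk): the matrix's nonzero structure is viewed as a DAG, and a Kahn-style sweep peels off vertices with no remaining predecessors — exactly the "one DAG level per step" behavior that limits parallelism.

```python
from collections import deque

def topological_order(adj):
    """Greedy (Kahn-style) topological sort of a DAG given as
    {vertex: [successors]}. Linear time, but the ready queue advances
    one dependency level at a time and touches memory with no locality,
    as the slide notes."""
    indeg = {v: 0 for v in adj}
    for v in adj:
        for w in adj[v]:
            indeg[w] += 1
    ready = deque(v for v, d in indeg.items() if d == 0)
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for w in adj[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    return order  # relabeling vertices in this order makes the matrix upper triangular
```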

12 Matching in bipartite graphs
- Perfect matching: a set of edges that hits each vertex exactly once
- Equivalent to a matrix permutation placing nonzeros on the diagonal
- Variant: maximum-weight matching
[Figure: matrix A and its row permutation PA, with the matched nonzeros moved to the diagonal]
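A minimal sketch of the unweighted case via augmenting paths (illustrative only, and O(V·E); production codes use faster algorithms such as Hopcroft–Karp, or weighted variants for maximum-weight matching):

```python
def perfect_matching(rows):
    """Augmenting-path bipartite matching. rows[i] lists the columns
    with a nonzero in row i; a perfect matching yields a permutation
    placing a nonzero on every diagonal entry."""
    match_col = {}  # column -> matched row

    def augment(r, seen):
        # Try to match row r, rerouting previously matched rows if needed.
        for c in rows[r]:
            if c in seen:
                continue
            seen.add(c)
            if c not in match_col or augment(match_col[c], seen):
                match_col[c] = r
                return True
        return False

    for r in range(len(rows)):
        if not augment(r, set()):
            return None  # structurally singular: no perfect matching
    return {r: c for c, r in match_col.items()}  # row -> matched column
```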

13 Strongly connected components
- Symmetric permutation to block triangular form
- Diagonal blocks are strong Hall (irreducible / strongly connected)
- Sequential: linear time by depth-first search [Tarjan]
- Parallel: divide & conquer algorithm; performance depends on input [Fleischer, Hendrickson, Pinar]
[Figure: the permuted matrix PAP^T and the graph G(A)]
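As an illustration (not code from the talk), the sequential linear-time computation can be sketched with Kosaraju's two-pass DFS, a simpler cousin of the single-pass Tarjan algorithm the slide cites:

```python
def strongly_connected_components(adj):
    """Kosaraju's algorithm: record DFS finishing order on G, then
    collect components by DFS on the reverse graph in reverse finishing
    order. Linear time, but inherently sequential, as the slide notes."""
    visited, order = set(), []

    def dfs_finish(v):
        # Iterative DFS recording vertices in finishing order.
        stack = [(v, iter(adj[v]))]
        visited.add(v)
        while stack:
            u, it = stack[-1]
            for w in it:
                if w not in visited:
                    visited.add(w)
                    stack.append((w, iter(adj[w])))
                    break
            else:
                order.append(u)
                stack.pop()

    for v in adj:
        if v not in visited:
            dfs_finish(v)

    radj = {v: [] for v in adj}          # reverse graph
    for v in adj:
        for w in adj[v]:
            radj[w].append(v)

    comps, assigned = [], set()
    for v in reversed(order):
        if v in assigned:
            continue
        comp, stack = [], [v]
        assigned.add(v)
        while stack:
            u = stack.pop()
            comp.append(u)
            for w in radj[u]:
                if w not in assigned:
                    assigned.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps
```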

14 Strong components of 1M-vertex RMAT graph

15 Coloring for parallel nonsymmetric preconditioning [Aggarwal, Gibou, G]
- Level set method for multiphase interface problems in 3D
- Nonsymmetric-structure, second-order-accurate octree discretization
- BiCGSTAB preconditioned by parallel triangular solves
- 263 million DOF

16 #4: The Primitives Challenge
By analogy to numerical linear algebra, what would the combinatorial BLAS look like?
- BLAS 1 (sum of scaled n-vectors)
- BLAS 2 (n-by-n matrix-vector multiply)
- BLAS 3 (n-by-n matrix-matrix multiply)
[Figure: performance of the three BLAS levels relative to machine peak]

17 Primitives for HPC graph programming
Visitor-based multithreaded (MTGL + XMT):
  + search templates natural for many algorithms
  + relatively simple load balancing
  – complex thread interactions, race conditions
  – unclear how applicable to standard architectures
Array-based data parallel (GAPDT + parallel Matlab / Python):
  + relatively simple control structure
  + user-friendly interface
  – some algorithms hard to express naturally
  – load balancing not so easy
Scan-based vectorized (NESL): something of a wild card
We don't really know the right set of primitives yet!

18 Graph algorithms study [Kepner, Fineman, Kahn, Robinson]

19 Graph Algorithms in the Language of Linear Algebra
Editors: Jeremy Kepner (MIT-LL) and John R. Gilbert (UCSB)
Contributors: Bader (GA Tech), Buluc (UCSB), Chakrabarti (CMU), Dunlavy (Sandia), Faloutsos (CMU), Fineman (MIT-LL & MIT), Gilbert (UCSB), Kahn (MIT-LL & Brown), Kegelmeyer (Sandia), Kepner (MIT-LL), Kleinberg (Cornell), Kolda (Sandia), Leskovec (CMU), Madduri (GA Tech), Robinson (MIT-LL & NEU), Shah (ISC & UCSB)

20 Multiple-source breadth-first search
[Figure: sparse matrix A^T alongside a block of frontier vectors X, one column per source]

21 Multiple-source breadth-first search
[Figure: the product A^T X advances all BFS frontiers in one sparse matrix-matrix multiply]

22 Multiple-source breadth-first search
- Sparse array representation => space efficient
- Sparse matrix-matrix multiplication => work efficient
- Load balance depends on SpGEMM implementation
- Not a panacea for the memory latency wall!
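The idea on slides 20–22 — advancing many BFS frontiers at once, where each step is a sparse product A^T X over the boolean (OR, AND) semiring — can be sketched in plain Python (an illustration, not code from the talk; dicts and sets stand in for the sparse arrays):

```python
def multi_source_bfs(adj, sources):
    """Multi-source BFS phrased as repeated 'A^T X' steps: each source
    owns one column of X (its current frontier set), and one step ORs
    together the adjacency rows of all frontier vertices."""
    levels = {s: {s: 0} for s in sources}   # per-source: vertex -> BFS level
    frontiers = {s: {s} for s in sources}   # current columns of X
    step = 0
    while any(frontiers.values()):
        step += 1
        for s in sources:
            nxt = set()
            for v in frontiers[s]:          # semiring "multiply": follow edges
                for w in adj[v]:
                    if w not in levels[s]:  # semiring "add": first discovery wins
                        levels[s][w] = step
                        nxt.add(w)
            frontiers[s] = nxt
    return levels
```

All sources advance in lockstep, which is what makes the matrix formulation both space- and work-efficient on a real sparse backend.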

23 SpGEMM: Sparse Matrix x Sparse Matrix [Buluc, G]
- Shortest path calculations (APSP)
- Betweenness centrality
- BFS from multiple source vertices
- Subgraph / submatrix indexing
- Graph contraction
- Cycle detection
- Multigrid interpolation & restriction
- Colored intersection searching
- Applying constraints in finite element computations
- Context-free parsing

24 Parallel dense case
- In the dense case, the 2-D layout scales better with the number of processors
- Turns out to be the same for the sparse case...
[Figure: parallel-efficiency expressions for the 1-D and 2-D layouts (the overhead term should be zero for perfect efficiency), over a 3x3 process grid p(0,0)...p(2,2)]

25 Upper bounds on speedup, sparse 1-D & 2-D [ICPP'08]
- 1-D algorithms do not scale beyond 40x
- Break-even point is around 50 processors
[Figure: speedup as a function of problem size N and processor count P for the 1-D and 2-D algorithms]

26 2-D example: Sparse SUMMA
- C_ij += A_ik * B_kj
- Based on dense SUMMA
- Generalizes to nonsquare matrices, etc.
[Figure: in round k, block A_ik meets block B_kj to update block C_ij]
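A sequential simulation of the SUMMA schedule may make the picture concrete (an illustrative sketch, not the talk's implementation; blocks are dicts keyed by local (row, col), and the per-round broadcast is simulated by a loop over k):

```python
def summa_sparse(A_blocks, B_blocks, pgrid):
    """Sequential simulation of SUMMA on a pgrid x pgrid block grid:
    in round k, block column k of A and block row k of B are 'broadcast'
    and every processor (i, j) accumulates C_ij += A_ik * B_kj.
    Blocks are sparse dicts {(r, c): value}, so empty products are free."""
    def spgemm(X, Y):
        # Naive sparse block product; real codes use hypersparse kernels.
        Z = {}
        for (r, k1), x in X.items():
            for (k2, c), y in Y.items():
                if k1 == k2:
                    Z[(r, c)] = Z.get((r, c), 0) + x * y
        return Z

    C = {(i, j): {} for i in range(pgrid) for j in range(pgrid)}
    for k in range(pgrid):              # one broadcast round per k
        for i in range(pgrid):
            for j in range(pgrid):
                for (r, c), v in spgemm(A_blocks[(i, k)], B_blocks[(k, j)]).items():
                    blk = C[(i, j)]
                    blk[(r, c)] = blk.get((r, c), 0) + v
    return C
```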

27 Submatrices are hypersparse (i.e. nnz << n)
- With an average of c nonzeros per column in the whole matrix, each submatrix block holds far fewer per column
- A data structure or algorithm that depends on the matrix dimension n (e.g. CSR or CSC) is asymptotically too wasteful for submatrices
[Figure: total storage nnz' of the blocked representation]

28 Complexity measure trends with increasing p
- Standard algorithm is O(nnz + flops + n)
[Figure: how the flops, nnz, and n terms scale as p increases]

29 #5: The Libraries Challenge
- The software version of the primitives challenge!
- What languages, libraries, and environments will support combinatorial scientific computing?
- Zoltan, (P)BGL, MTGL, ...

30 SNAP [Bader & Madduri]

31 GAPDT: Toolbox for graph analysis and pattern discovery [G, Reinhardt, Shah]
Layer 1: Graph Theoretic Tools
- Graph operations
- Global structure of graphs
- Graph partitioning and clustering
- Graph generators
- Visualization and graphics
- Scan and combining operations
- Utilities

32 Sample application stack
- Distributed Sparse Matrices: arithmetic, matrix multiplication, indexing, solvers (\, eigs)
- Graph Analysis & PD Toolbox: graph querying & manipulation, connectivity, spanning trees, geometric partitioning, nested dissection, NNMF, ...
- Preconditioned Iterative Methods: CG, BiCGStab, etc. + combinatorial preconditioners (AMG, Vaidya)
- Applications: computational ecology, CFD, data exploration

33 Landscape connectivity modeling
- Habitat quality, gene flow, corridor identification, conservation planning
- Pumas in southern California: 12 million nodes, < 1 hour
- Targeting larger problems: Yellowstone-to-Yukon corridor
Figures courtesy of Brad McRae, NCEAS

34 #6: The Productivity Challenge “Once we settled down on it, it was sort of like digging the Panama Canal - one shovelful at a time.” - Ken Appel (& Wolfgang Haken), 1976

35 Productivity
Raw performance isn't always the only criterion. Other factors include:
- Seamless scaling from desktop to HPC
- Interactive response for exploration and visualization
- Rapid prototyping
- Usability by non-experts
- Just plain programmability

36 Interactive graph viz [Hollerer & Trethewey]
- Nonlinearly-scaled breadth-first search
- Distant vertices stay put; the selected vertex moves to place
- Real-time click & drag for moderately large graphs

37 Click, Drag, & Smooth

38 #7: The Data Size Challenge “Can we understand anything interesting about our data when we do not even have time to read all of it?” - Ronitt Rubinfeld

39 Issues in (many) large graph applications
- Where does the graph live? Disk or memory?
- Often want approximate answers from sampling
- Multiple simultaneous queries to the same graph
  - Graph may be fixed, or slowly changing
  - Throughput and response time both important
- Dynamic subsetting
  - User needs to solve the problem on "my own" version of the main graph
  - E.g. landscape data masked by geographic location, filtered by obstruction type, resolved by species of interest

40 Factoring network flow behavior [Karpinski, Almeroth, Belding, G]

41 NMF traffic analysis results

42 #8: The Uncertainty Challenge
- "Discrete" quantities may be probability distributions
- May want to manage and quantify uncertainty between multiple levels of modeling
- May want to statistically sample too-large data, or extrapolate probabilistically from incomplete data
- For example, in graph algorithms:
  - The graph itself may not be known with certainty
  - Vertex / edge labels may be stochastic
  - May want analysis of sensitivities or thresholds

43 Horizontal-vertical decomposition of dynamical systems [Mezic et al.]

44 Propagation of uncertainty
- Stable and unstable directions at multiple scales?
- How to identify functional vs. regulatory components?

45 Model reduction and graph decomposition [Mezic group, UCSB]
Approach:
1. Decompose networks
2. Propagate uncertainty through components
3. Iteratively aggregate component uncertainty
A spectral graph decomposition technique, combined with dynamical-systems analysis, deconstructs a possibly unknown network into inputs, outputs, and forward and feedback loops, and allows identification of a minimal functional unit (MFU) of the system. Trim the network, preserve the dynamics! (Here node 4 and several connections are pruned with no loss of performance.)
- Minimal functional units: sensitive edges (leading to lack of production) are easily identifiable
- Allows identification of the roles of different feedback loops
[Figure: H-V decomposition labeling the input/initiator, forward production unit, output/execution, and feedback loops; levels of output for the MFU alone and with feedback loops]

46

47 Larval connectivity

48 Regional ocean modeling + particle advection

49 Connectivity matrix (conditional probabilities)

50 Future: Desired state

51 Parallel modeling of fish interaction [Barbaro, Trethewey, Youssef, Birnir, G]
- Capelin schools in the seas around Iceland
  - Economic and ecological impact
  - Collapse of the stock in several prominent fishing areas demonstrates the need for careful tracking of fish
- Limitations on modeling
  - Group-behavior phenomena are missed by lumped models
  - Real schools contain billions of fish; thousands of iterations
- Challenges include dynamic load balancing and accurate multiscale modeling

52 #9: The Education Challenge
- How do you teach this stuff?
- Where do you go to take courses in
  - graph algorithms ...
  - ... on massive data sets ...
  - ... in the presence of uncertainty ...
  - ... analyzed on parallel computers ...
  - ... applied to a domain science?
- This is another whole discussion, but a crucial one.

53 #10: The Foundations Challenge “Numerical analysis is the study of algorithms for the problems of continuous mathematics.” - L. Nick Trefethen

54 Combinatorial Scientific Computing: CS and CS&E (Hendrickson 2003)
What's in the intersection?
[Figure: Venn diagram of Computer Science, Computational Science, and Algorithmics]

55 NIST workshop 2007: "Foundations of Measurement Science for Information Systems"
A few suggested research areas in measurement science for complex networks:
- Measurement of global properties of networks:
  - Not just density, diameter, degree distribution, etc.
  - Connectivity, robustness
  - Spectral properties: Laplacian eigenvectors, Cheeger bounds, ...
  - Other global measures of complexity?
  - Sensitivity analysis of all of the above
  - Stochastic settings for all of the above
- Multiscale modeling of complex networks
- Building useful reference data sets and generators
- Fundamentals of high-performance combinatorial computing

56 Ten Challenges in CSC
1. Parallelism
2. Architecture
3. Algorithms
4. Primitives
5. Libraries
6. Productivity
7. Data size
8. Uncertainty
9. Education
10. Foundations

57 Morals (from Hendrickson, 2003)
- Things are clearer if you look at them from multiple perspectives
- Combinatorial algorithms are pervasive in scientific computing and will become more so
- Lots of exciting opportunities
  - High impact for discrete algorithms work
  - Enabling for scientific computing

58 Conclusion This is a great time to be doing research in combinatorial scientific computing!

59 Thanks … Vikram Aggarwal, David Bader, Alethea Barbaro, Jon Berry, Aydin Buluc, Alan Edelman, Jeremy Fineman, Frederic Gibou, Bruce Hendrickson, Tobias Hollerer, Crystal Kahn, Stefan Karpinski, Jeremy Kepner, Jure Leskovec, Brad McRae, Igor Mezic, Cleve Moler, Steve Reinhardt, Eric Robinson, Rob Schreiber, Viral Shah, Peterson Trethewey, James Watson, Lamia Youssef