Benchmarking Parallel Eigen Decomposition for Residuals Analysis of Very Large Graphs
Edward Rutledge, Benjamin Miller, Michelle Beard
HPEC 2012, September 10-12, 2012

1 Benchmarking Parallel Eigen Decomposition for Residuals Analysis of Very Large Graphs Edward Rutledge, Benjamin Miller, Michelle Beard HPEC 2012 September 10-12, 2012 This work is sponsored by the Intelligence Advanced Research Projects Activity (IARPA) under Air Force Contract FA8721-05-C-0002. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the author and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA or the U.S. Government.

2 Outline: Introduction; Algorithm description; Implementation; Benchmarks; Summary

3 Application of Very Large Graph Analysis. Cross-mission challenge: detection of subtle patterns in massive, multi-source, noisy datasets. Cyber: graphs represent communication patterns of computers on a network (1,000,000s – 1,000,000,000s of network events); GOAL: detect cyber attacks or malicious software. Social: graphs represent relationships between individuals or documents (10,000s – 10,000,000s of individuals and interactions); GOAL: identify hidden social networks. ISR: graphs represent entities and relationships detected through multi-INT sources (1,000s – 1,000,000s of tracks and locations); GOAL: identify anomalous patterns of life.

4 Approach: Analysis of Graph Residuals. [Figures: analogy between residuals in linear regression and residuals in graph regression.]

5 Processing Chain. Input: a graph, with no cue. Output: statistically anomalous subgraph(s). Stages: Graph Model Construction → Residual Decomposition → Component Selection → Anomaly Detection → Identification; the Residual Decomposition and Component Selection stages together form the Dimensionality Reduction step.

6 Focus: Dimensionality Reduction (the Residual Decomposition and Component Selection stages of the processing chain). This is the computational driver for the graph analysis method; the dominant kernel is eigen decomposition, and a parallel implementation is required for large problems. Goal: benchmark parallel eigen decomposition for dimensionality reduction of graph residuals.

7 Outline: Introduction; Algorithm description; Implementation; Benchmarks; Summary

8 Directed Graph Basics. G = (V, E), where V = vertices (entities) and E = edges (relationships). Adjacency matrix A: A(i,j) ≠ 0 if an edge exists from vertex i to vertex j. [Figure: example 8-vertex directed graph G and its 8x8 adjacency matrix A.]
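
As context for the implementation slides that follow, here is a minimal sketch (not from the talk) of how such a sparse adjacency matrix might be assembled as a distributed PETSc matrix. The function name, the edge-list arrays, and the omission of preallocation are illustrative assumptions; a production code would preallocate roughly the average degree per row.

```c
/* Sketch only: build a distributed sparse adjacency matrix A in PETSc from an
 * edge list (srcs[e] -> dsts[e]).  Names and the lack of preallocation are
 * illustrative; error handling follows the ierr/CHKERRQ idiom. */
#include <petscmat.h>

PetscErrorCode BuildAdjacency(MPI_Comm comm, PetscInt nVerts, PetscInt nEdges,
                              const PetscInt srcs[], const PetscInt dsts[],
                              Mat *A)
{
  PetscErrorCode ierr;

  ierr = MatCreate(comm, A);CHKERRQ(ierr);
  ierr = MatSetSizes(*A, PETSC_DECIDE, PETSC_DECIDE, nVerts, nVerts);CHKERRQ(ierr);
  ierr = MatSetType(*A, MATAIJ);CHKERRQ(ierr);   /* CSR-style sparse storage */
  ierr = MatSetUp(*A);CHKERRQ(ierr);
  /* Allow insertions beyond the default preallocation (slow but simple). */
  ierr = MatSetOption(*A, MAT_NEW_NONZERO_ALLOCATION_ERR, PETSC_FALSE);CHKERRQ(ierr);
  for (PetscInt e = 0; e < nEdges; e++) {
    /* A(i,j) != 0 if an edge exists from vertex i to vertex j */
    ierr = MatSetValue(*A, srcs[e], dsts[e], 1.0, INSERT_VALUES);CHKERRQ(ierr);
  }
  ierr = MatAssemblyBegin(*A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  ierr = MatAssemblyEnd(*A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
  return 0;
}
```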

9 Modularity for Directed Graphs*: our baseline residuals model for directed graphs. The model is built from four quantities: the adjacency matrix (A), the out-degree vector (k_out), the in-degree vector (k_in), and the number of edges (|E|). [Figure: 7-vertex example graph G with its adjacency matrix, out-degree vector, and in-degree vector.] *E.A. Leicht and M.E.J. Newman, "Community Structure in Directed Networks," Phys. Rev. Lett., vol. 100, no. 11, pp. 118703(1-4), Mar. 2008.
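
Combining these four quantities with the Observed minus Expected equals Residuals picture in the backup slide (and the cited Leicht and Newman model), the residuals/modularity matrix takes the following form; the exact normalization used in the talk is an assumption here.

```latex
% Directed modularity (residuals) matrix assembled from A, k_out, k_in, and |E|
% (reconstruction consistent with the Leicht--Newman reference; normalization assumed):
B \;=\; A - \frac{k_{\mathrm{out}}\, k_{\mathrm{in}}^{T}}{|E|},
\qquad
B_{ij} \;=\; A_{ij} - \frac{k_{i}^{\mathrm{out}}\, k_{j}^{\mathrm{in}}}{|E|}.
```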

10 Dimensionality Reduction: select the vectors pointing towards the strongest residuals. [Figure: the residuals matrix expressed in terms of components 1, 2, ..., N of its eigen decomposition.]

11 Computational Scaling. Matrix-vector multiplication is at the heart of eigensolver algorithms. Bx can be computed without storing B (the modularity matrix): a dot product is O(|V|), a scalar-vector product is O(|V|), and a sparse matrix-vector product is O(|E|), versus O(|V|^2) for a dense matrix-vector product.
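
A minimal sketch (not the authors' code) of this implicit product, assuming the residuals model written above, i.e. Bx = Ax - k_out (k_in . x)/|E|. The names are illustrative; only MatMult, VecDot, and VecAXPY are needed.

```c
/* Sketch only: y = B x = A x - (k_in . x / |E|) k_out, without ever storing B.
 * Cost: one sparse mat-vec O(|E|), one dot product O(|V|), one AXPY O(|V|). */
#include <petscmat.h>

typedef struct {
  Mat       A;      /* sparse adjacency matrix */
  Vec       kin;    /* in-degree vector        */
  Vec       kout;   /* out-degree vector       */
  PetscReal nEdges; /* |E|, number of edges    */
} ModularityCtx;

PetscErrorCode ModularityMult(ModularityCtx *ctx, Vec x, Vec y)
{
  PetscScalar    dot;
  PetscErrorCode ierr;

  ierr = MatMult(ctx->A, x, y);CHKERRQ(ierr);      /* y   = A x      */
  ierr = VecDot(ctx->kin, x, &dot);CHKERRQ(ierr);  /* dot = k_in . x */
  ierr = VecAXPY(y, -dot / ctx->nEdges, ctx->kout);CHKERRQ(ierr); /* y -= (dot/|E|) k_out */
  return 0;
}
```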

12 Outline: Introduction; Algorithm description; Implementation; Benchmarks; Summary

13 SLEPc Overview. SLEPc is a free parallel eigensolver 'C' library built on widely available software. Software stack: the application sits on SLEPc (Scalable Library for Eigen Problem Computations), which is built on PETSc (Portable, Extensible Toolkit for Scientific Computation), which in turn uses MPI (Message Passing Interface), LAPACK (Linear Algebra Package), and BLAS (Basic Linear Algebra Subprograms); the application supplies a PETSc "matrix shell". SLEPc: http://www.grycap.upv.es/slepc/ PETSc: http://www.mcs.anl.gov/petsc/ MPI: http://www.mcs.anl.gov/research/projects/mpi/ LAPACK: http://www.netlib.org/lapack/ BLAS: http://www.netlib.org/blas/

14 Implementing Eigen Decomposition of the Modularity Matrix using SLEPc. [Diagram: the modularity matrix is implemented as a PETSc "matrix shell" whose user-defined matrix-vector multiplication operates on the adjacency matrix (a PETSc sparse matrix) and on the in-degree and out-degree vectors (PETSc vectors); the SLEPc Krylov-Schur eigensolver operates on the shell.] The PETSc "matrix shell" enables an efficient modularity matrix implementation. We used the default PETSc/SLEPc build parameters and solver options: the Compressed Sparse Row (CSR) matrix data structure, double-precision (8-byte) values for matrix and vector entries, and the Krylov-Schur eigensolver algorithm. Limitation: the current implementation will not scale past 2^32 vertices because it uses 32-bit integers to represent vertices; it was tested only up to 2^30 vertices. SLEPc/PETSc supports an efficient implementation of modularity matrix eigen decomposition.
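
A condensed sketch of how this wiring might look, reusing the ModularityCtx/ModularityMult sketch above. It is illustrative rather than the benchmarked source; the problem type (non-Hermitian) and the choice of which eigenpairs to request are assumptions.

```c
/* Sketch only: wrap the implicit modularity product in a PETSc matrix shell
 * and hand it to SLEPc's Krylov-Schur eigensolver.  ModularityCtx and
 * ModularityMult are from the sketch on the Computational Scaling slide. */
#include <slepceps.h>

static PetscErrorCode ShellMult(Mat B, Vec x, Vec y)
{
  ModularityCtx *ctx;
  PetscErrorCode ierr;
  ierr = MatShellGetContext(B, &ctx);CHKERRQ(ierr);
  return ModularityMult(ctx, x, y);            /* y = B x without storing B */
}

PetscErrorCode SolveModularityEigs(MPI_Comm comm, ModularityCtx *ctx,
                                   PetscInt nVerts, PetscInt nev)
{
  Mat            B;
  EPS            eps;
  PetscErrorCode ierr;

  ierr = MatCreateShell(comm, PETSC_DECIDE, PETSC_DECIDE, nVerts, nVerts,
                        ctx, &B);CHKERRQ(ierr);
  ierr = MatShellSetOperation(B, MATOP_MULT, (void (*)(void))ShellMult);CHKERRQ(ierr);

  ierr = EPSCreate(comm, &eps);CHKERRQ(ierr);
  ierr = EPSSetOperators(eps, B, NULL);CHKERRQ(ierr);
  ierr = EPSSetProblemType(eps, EPS_NHEP);CHKERRQ(ierr);  /* assumed: directed B is non-symmetric */
  ierr = EPSSetType(eps, EPSKRYLOVSCHUR);CHKERRQ(ierr);   /* Krylov-Schur, as on the slide        */
  ierr = EPSSetDimensions(eps, nev, PETSC_DEFAULT, PETSC_DEFAULT);CHKERRQ(ierr);
  ierr = EPSSetWhichEigenpairs(eps, EPS_LARGEST_REAL);CHKERRQ(ierr); /* assumed selection criterion */
  ierr = EPSSolve(eps);CHKERRQ(ierr);

  ierr = EPSDestroy(&eps);CHKERRQ(ierr);
  ierr = MatDestroy(&B);CHKERRQ(ierr);
  return 0;
}
```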

15 PETSc y = Bx Parallel Mapping (4-processor example). 1. Each processor begins receiving the non-local parts of x it needs. 2. Each processor computes partial results from its local parts of x and B, and stores them in y. 3. Each processor finishes receiving the non-local parts of x it needs. 4. Each processor computes partial results from the non-local parts of x and B, and adds them to the partial results in y. [Figure: block-row distribution of B and the corresponding pieces of x and y across Processors 1-4; legend distinguishes the local part of each data object from the buffer for its non-local part.]
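
As a small illustration of this layout (an assumed helper, not part of the talk): the default PETSc distribution gives each MPI rank a contiguous block of matrix rows and the matching slice of each vector, which can be queried directly; the overlap of communication and computation in steps 1-4 happens inside PETSc's MatMult.

```c
/* Sketch only: report the row-wise parallel layout PETSc chose for A and x.
 * Each rank owns a contiguous block of rows and the matching vector entries. */
#include <petscmat.h>

PetscErrorCode ReportLayout(Mat A, Vec x)
{
  PetscMPIInt    rank;
  PetscInt       rStart, rEnd, vStart, vEnd;
  PetscErrorCode ierr;

  MPI_Comm_rank(PetscObjectComm((PetscObject)A), &rank);
  ierr = MatGetOwnershipRange(A, &rStart, &rEnd);CHKERRQ(ierr); /* local rows [rStart, rEnd) */
  ierr = VecGetOwnershipRange(x, &vStart, &vEnd);CHKERRQ(ierr); /* local entries of x        */
  ierr = PetscPrintf(PETSC_COMM_SELF, "[rank %d] rows %d..%d, x entries %d..%d\n",
                     (int)rank, (int)rStart, (int)(rEnd - 1),
                     (int)vStart, (int)(vEnd - 1));CHKERRQ(ierr);
  return 0;
}
```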


19 Outline: Introduction; Algorithm description; Implementation; Benchmarks; Summary

20 Overview of Experiments. Parameter space: number of graph vertices 1M, 2M, 4M, 8M, 16M, 32M, 64M, 128M, 256M, 512M, 1B; number of processors 1, 2, 4, 8, 16, 32, 64; number of computed eigenvectors 1, 10, 100. Hardware: LLGrid, limited to 64 nodes per job; per node: 2x 3.2 GHz Intel Xeon processors, 8 GB RAM, Gigabit Ethernet network. Data sets: generated with a parallel R-MAT generator (a single-process R-MAT generator runs out of memory for the larger data sets); parameters: average in- (out-) degree ~8 (the generator does not iterate if there is a collision), quadrant probabilities 0.5, 0.125, 0.125, 0.25; vertex labels are randomized to make load balancing easier. A simplified generator sketch follows this slide.
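
For concreteness, here is a minimal serial sketch of R-MAT edge generation with the quadrant probabilities above. It is not the parallel generator used for the benchmarks, and it omits the vertex-label randomization and any duplicate-edge handling (matching the note that collisions are not re-drawn).

```c
/* Sketch only: serial R-MAT edge generator with quadrant probabilities
 * a = 0.5, b = 0.125, c = 0.125, d = 0.25.  nVerts must be a power of two.
 * The benchmark data sets ranged from 1M to 1B vertices, average degree ~8. */
#include <stdio.h>
#include <stdlib.h>

static double uniform01(void) { return rand() / (RAND_MAX + 1.0); }

static void rmat_edge(unsigned nVerts, unsigned *src, unsigned *dst)
{
  unsigned row = 0, col = 0;
  /* Recursively descend into one quadrant of the adjacency matrix per level. */
  for (unsigned span = nVerts; span > 1; span /= 2) {
    double p = uniform01();
    if (p < 0.5) {
      /* quadrant a (probability 0.5): top-left, no offset */
    } else if (p < 0.625) {
      col += span / 2;             /* quadrant b (probability 0.125): top-right   */
    } else if (p < 0.75) {
      row += span / 2;             /* quadrant c (probability 0.125): bottom-left */
    } else {
      row += span / 2;             /* quadrant d (probability 0.25): bottom-right */
      col += span / 2;
    }
  }
  *src = row;
  *dst = col;
}

int main(void)
{
  const unsigned nVerts = 1u << 10;            /* small demo: 1024 vertices */
  const unsigned long nEdges = 8ul * nVerts;   /* average degree ~8         */
  srand(12345);
  for (unsigned long e = 0; e < nEdges; e++) {
    unsigned s, d;
    rmat_edge(nVerts, &s, &d);
    printf("%u %u\n", s, d);                   /* one edge per line: src dst */
  }
  return 0;
}
```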

21 Results: SLEPc vs. MATLAB, Average Execution Time. Single-processor SLEPc and Matlab have similar performance; problem size is limited by node memory. Note: on a workstation with 96 GB of memory, the Matlab implementation was 2-3x faster for the 100-eigenvector computation than on LLGrid. [Plot: average execution time vs. problem size; numbers in parentheses are iterations of the method.]

22 Results: SLEPc 64-Node Average Execution Time. Able to compute 2 eigenvectors of a 1-billion-vertex graph (in ~9 hours). Problem size is limited by memory; larger problems could be solved with more than 64 compute nodes. Computing the 10 leading eigenvalues of the 64M-vertex data set took ~3 trillion operations at ~0.1% efficiency. [Plot: average execution time vs. problem size; numbers in parentheses are iterations of the method.]

23 Results: Effect of Processor Count on Execution Time. Additional processing resources decrease processing time; speedup is nearly linear for a few nodes and decreases with increasing node count. [Plot: execution time vs. processor count; numbers in parentheses are iterations of the method.]

24 Outline: Introduction; Algorithm description; Implementation; Benchmarks; Summary

25 Summary. Reviewed the problem of computing the eigen decomposition of the directed graph modularity matrix. Benchmarked directed graph modularity matrix eigen decomposition using SLEPc: performance is similar to Matlab on a single node, and it scales reasonably well as compute nodes are added. Able to solve large problems on commodity cluster hardware: 1.1 hours for 1 eigenvalue of a billion-vertex graph; 9 hours for 2 eigenvalues of a billion-vertex graph; 5.8 hours for 10 eigenvalues of a 512-million-vertex graph; 3.2 hours for 100 eigenvalues of a 128-million-vertex graph. Graph analysis based on modularity matrix eigen decomposition is feasible for graphs with billions of nodes and edges.

26 Potential Future Work. Optimize the implementation: use SLEPc/PETSc parameters better suited to our application (for example, storing values in single precision instead of double precision would roughly halve memory use), and further specialize data structures for our application (for example, eliminate storage of non-zero adjacency matrix entries). Run with more than 64 nodes to process larger problems. Modify the implementation to remove the 4-billion-vertex limitation. Experiment with other eigensolvers (specifically, ANASAZI). Apply these methods to other graph problems, e.g., finding the smallest-magnitude eigenvectors of the graph Laplacian.

27 Backup

28 Graph Model Construction: A - E(A) = R(A), i.e., Observed - Expected = Residuals. [Figure: matrix illustration of forming the residuals matrix R(A) by subtracting the expected matrix E(A) from the observed adjacency matrix A.]

29 Readily Available Free Parallel Eigensolvers*

Name    | Description                                   | Distributed Memory? | Latest Release | Language
ANASAZI | Block Krylov-Schur, block Davidson, LOBPCG    | yes                 | 2012           | C++
BLOPEX  | LOBPCG                                        | yes                 | 2011           | C/Matlab
BLZPACK | Block Lanczos                                 | yes                 | 2000           | F77
MPB     | Conjugate Gradient, Davidson                  | yes                 | 2003           | C
PDACG   | Deflation-accelerated Conjugate Gradient      | yes                 | 2000           | F77
PRIMME  | Block Davidson, JDQMR, JDQR, LOBPCG           | yes                 | 2006           | C/F77
PROPACK | SVD via Lanczos                               | no                  | 2005           | F77/Matlab
SLEPc   | Krylov-Schur, Arnoldi, Lanczos, RQI, Subspace | yes                 | 2012           | C/F77
TRLAN   | Lanczos (dynamic thick-restart)               | yes                 | 2010           | F90

* V. Hernandez, J. E. Roman, A. Tomas, and V. Vidal (2009). A Survey of Software for Sparse Eigenvalue Problems. SLEPc Technical Report STR-6, Universidad Politecnica de Valencia.

Both SLEPc and ANASAZI are actively supported and either should meet our needs.

