Carnegie Mellon Optimized Parallel Distribution Load Flow Solver on Commodity Multi-core CPU Tao Cui (Presenter) Franz Franchetti Dept. of ECE. Carnegie.

Carnegie Mellon Optimized Parallel Distribution Load Flow Solver on Commodity Multi-core CPU Tao Cui (Presenter) Franz Franchetti Dept. of ECE. Carnegie Mellon University Pittsburgh PA tcui@ece.cmu.edu This work is supported by NSF 0931978 & 1116802

Carnegie Mellon Smart Grids 2 Image by Dr. M.Sanchez

Carnegie Mellon Smart Grids New players in the grid Challenges  Undispatchable, large variances, great impact on grid  Large population exhibits stochastic properties 3 Images from wikipedia Source: LBNL-3884eSource: ORNL/TM2004/291 Source: Pantos 2011

Carnegie Mellon 4 Conventional Distribution System  Passively receiving power  Few realtime monitoring or controls Challenges in Distribution System  Solar, wind, stochastic  Large variance and impact Smart Distribution System  New Sensors: Smart Meters  High Performance Deskside Supercomputer A Computational Tool for Probabilistic Grid Monitoring Motivation Image from: Wikipedia ~Tflop/s $1000 1kW power Image from: Dell

Carnegie Mellon Outline Motivation Distribution System Load Flow Analysis Code Optimization Real Time Implementation Conclusion 5

Carnegie Mellon 6 Core: Distribution Load Flow Distribution System:  Radial, high R/X ratio, varying Z, unbalanced  NOT suitable for transmission load flow  Forward / Backward Sweep (FBS)  Implicit Z-matrix, detail model, convergence Generalized Component Model [Kersting2006]  One Terminal Node Model: Constant PQ:  Two Terminal Link Model: Source: IEEE PES Distribution System Analysis Subcommittee IEEE 37 NodeTest Feeder: Based on an actual feeder in California Backward: Forward:

Carnegie Mellon 7 Core: Distribution Load Flow Forward / Backward Sweep [Kersting2006]  Branch current based  Input: substation voltage, load; output: all node voltages  Steps:  1: Initial current = 0, Initial voltage V = V 0 ;  2: Compute node current I n using Node model;  3: Backward: Compute branch current I b using Link model & KCL;  4: Forward: Update V k+1 = V k based on I b over Link model;  5: Check convergence (|dS|<Error Limit) stop or go to step 2. Forward Backward IEEE 4 Node Test Feeder Example

Carnegie Mellon 8 Core: Distribution Load Flow 3-Phase Voltage on IEEE 37 Nodes Test Feeder Phase APhase B Phase C ANSI C84.1: Nominal: 115, Range A:110~126V Range B:107~127V 1.1 0.90 Nominal

Carnegie Mellon 9 Our Approach Random Number Generator  Basic Uniform RNG + Transformation for different PDFs  Parallel strategy for multi-thread implementation Optimized Parallel Distribution Load Flow Solver  Code optimizations  Highly parallel implementation for Monte Carlo applications Density Estimation & Visualization  Kernel density estimation Random Variable Sampling Parallel High Performance Power Flow Solver

Carnegie Mellon 11 Optimization: Data Structures Data Structure Optimization  Baseline: C++ object oriented, a tree object  Translate to array access, exploit spatial/temporal locality  Other techniques: unroll innermost loops, scalar replacement, pre- compute as much as possible (C++) (C Array) GridLab-D: the Smart Grid Simulator www.gridlabd.orgwww.gridlabd.org, opensource since 2009

Carnegie Mellon Optimization: Pattern Based Syntehsis 12 Algorithm-level Optimization  Pattern based matrix-vector multiplication For A,B,c,d matrices:  Multi-grounded Cable: diagonal matrix  Ignore shunt & coupling: c = 0, d = I, A = I Reduce unnecessary operations Reduce unnecessary storage for better memory access Similar to [Belgin2009] case 1:case 2:case N: … code 1code 2code N (C Pattern) switch (mat_type){ case real_diag_equal_mat: output[0] = *constant * input[0];... output[5] = *constant * input[5]; break; case imag_diag_equal_mat: output[0] = -*constant * input[3]; output[1] = -*constant * input[4]; output[2] = -*constant * input[5]; output[3] = *constant * input[0]; output[4] = *constant * input[1]; output[5] = *constant * input[2]; break;... }

Carnegie Mellon 13 Data Parallelism (SIMD) SIMD parallelization  SIMD: Single Instruction Multiple Data SSE: Streaming SIMD Extensions  128bit, 4 floats in one register  eg. 4 “fadd” at cost of 1 “addps” AVX: Advanced Vector eXtensions (256bit, 8 floats), Larrabee (512bit, 16 floats)  Vectorized solver on SIMD level for MCS:  Assumptions & Limitations: converge at same step vector register xmm1 vector operation addps xmm0, xmm1 xmm0 4-way SSE example (SIMD)

Carnegie Mellon Variant Synthesis with SPIRAL 14 Symbolic process [Puschel2005] :  pattern based matrix vector product code case 1:case 2:case N: … SPL Compiler

Carnegie Mellon Multithreading, Run across All CPUs  Vectorized load flow solver in each thread  Each thread pinned to a physical core exclusively  Fully utilize computation power of Multi-core CPUs  Double buffer (automatic load balancing for MCS application) 15 (Multi-Core) Multithreading

Carnegie Mellon 16 Performance Results: Across Sizes Performance of Optimized Code, Mass Amount Load Flow Pseudo flop/s: >60 % peak Flop/s: 50% Peak

Carnegie Mellon Details: Performance Gains 17 >20x >50x

Carnegie Mellon Performance Results: Across Machines 18 Problem Size (IEEE Test Feeders) Approx. flops Approx. Time / Core2 Extreme Approx. Time / Core i7 Baseline. C++ ICC –o3 (~300x faster then pure Matlab scripts) Comments IEEE37: one iteration12 K~ 0.3 us IEEE37: one load flow (5 Iter)60 K~ 1.5 us 0.01 kVA error IEEE37: 1 million load flow60 G~ < 2 s~ < 1 s~ 60 s (>5 hrs Matlab)SCADA Interval: 4 seconds IEEE123: 1 million load flow200 G~ < 10 s~ < 3.5 s~ 200 s (>15 hrs Matlab)

Carnegie Mellon 19 Accuracy Convergence of Monte Carlo  Very crude. MCG59+ICDF, 50 trials with “time(NULL)” seeds Out: Voltage on Node 738 In: Active Power P~ u=0,std=100kw on Phase A of Node 738,711,741

Carnegie Mellon 21 System Implementation Distribution System Probabilistic Monitoring System (DSPMS)  System Structure:  MCS solver running on Multi-core Desktop Server (Code optimization)  Results published via ECE Web Server (TCP/IP socket)  Web based dynamic User Interface by client side scripts (JavaScript)  Smart meters in campus building (MODBUS/TCP)

Carnegie Mellon 22 System Implementation Distribution System Probabilistic Monitoring System (DSPMS)  Web Server and User Interface  Link: www.ece.cmu.edu/~tcui/test/DistSim/DSPMS.htmwww.ece.cmu.edu/~tcui/test/DistSim/DSPMS.htm

Carnegie Mellon 23 Conclusion Smart Distribution Network: Impact of renewable and stochastic Commodity HPC & code optimization: Millions of cases /sec on $1K class machine Distribution System Probabilistic Monitor: A prove of concept real time application source: LBNL-3884e

Carnegie Mellon References [LBNL-3884E]. Mills, A. Implications of Wide-Area Geographic Diversity for Short-Term Variability of Solar Power, LBNL-3884E. Lawrence Berkeley National Laboratory, Berkeley, [ORNL/TM2004/291]. B. Kirby, "Frequency Regulation Basics and Trends," ORNL/TM 2004/291, Oak Ridge National Laboratory, December 2004. [Pantos2011]. Miloš Pantoš Stochastic optimal charging of electric-drive vehicles with renewable energy Energy, Volume 36, Issue 11, November 2011 [Ghosh97]. A.K. Ghosh, D. L. Lubkeman, M. J. Downey, R. H. Jones “Distribution Circuit State Estimation Using a Probabilistic Approach,” IEEE Transactions on Power Systems, vol. 12, no. 1, pp. 45-51, 1997 [Belgin2009]. M. Belgin, G. Back, and C. J. Ribbens, “Pattern-based sparse matrix representation for memory-efficient smvm kernels,” in Proceedings of the 23rd international conference on Supercomputing, ser. ICS ’09. New York, NY, USA: ACM, 2009, pp. 100–109. [Puschel2005]. M. Puschel, J. M. F. Moura, J. Johnson, D. Padua, M. Veloso, B. Singer, J. Xiong, F. Franchetti, A. Gacic, Y. Voronenko, K. Chen, R. W. Johnson, and N. Rizzolo, “SPIRAL: Code generation for DSP transforms,” Proceedings of the IEEE, special issue on “Program Generation, Optimization, and Adaptation”, vol. 93, no. 2, pp. 232– 275, 2005. [Kersting2006]. W. Kersting, Distribution system modeling and analysis. CRC, 2006 24

Carnegie Mellon The End Thank You! Q&A 25

Carnegie Mellon Optimized Parallel Distribution Load Flow Solver on Commodity Multi-core CPU Tao Cui (Presenter) Franz Franchetti Dept. of ECE. Carnegie.

Similar presentations

Presentation on theme: "Carnegie Mellon Optimized Parallel Distribution Load Flow Solver on Commodity Multi-core CPU Tao Cui (Presenter) Franz Franchetti Dept. of ECE. Carnegie."— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Carnegie Mellon Optimized Parallel Distribution Load Flow Solver on Commodity Multi-core CPU Tao Cui (Presenter) Franz Franchetti Dept. of ECE. Carnegie.

Similar presentations

Presentation on theme: "Carnegie Mellon Optimized Parallel Distribution Load Flow Solver on Commodity Multi-core CPU Tao Cui (Presenter) Franz Franchetti Dept. of ECE. Carnegie."— Presentation transcript:

Similar presentations

About project

Feedback