On parallelizing dual decomposition in stochastic integer programming

Name: On parallelizing dual decomposition in stochastic integer programming
Uploaded: 2017-10-19T19:20:33+00:00
Duration: PTM12S49
Channel: Blake Leonard
Description: On parallelizing dual decomposition in stochastic integer programming

On parallelizing dual decomposition in stochastic integer programming
Cosmin Petra
Mathematics and Computer Science Division, Argonne National Laboratory, USA
Joint work with Miles Lubin (MIT), Burhaneddin Sandikçi, Kipp Martin (U Chicago)
INFORMS Annual Meeting, Oct 2012

Overview
- Block-angular structure
- Motivation: stochastic optimization of the power grid
- Revisiting the dual decomposition algorithm of Carøe and Schultz
- Parallelizing the solution of the Lagrangian dual
- Parallel numerical experiments

Stochastic Formulation
Discrete distribution leads to block-angular (MI)LP

Large-scale (dual) block-angular LPs
- Extensive form
- In the terminology of stochastic LPs:
  - First-stage variables (decision now): x0
  - Second-stage variables (recourse decision): x1, …, xN
- Each diagonal block is a realization of a random variable (scenario)
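For reference, the extensive form can be sketched in commonly used notation. Only x0, x1, …, xN and Wi appear in the slides; the symbols c_i, A, T_i, and b_i below are assumed names, and nonnegativity stands in for whatever bounds and integrality restrictions the actual model has.

```latex
\begin{aligned}
\min_{x_0, x_1, \dots, x_N} \quad & c_0^T x_0 + \sum_{i=1}^{N} c_i^T x_i \\
\text{s.t.} \quad & A x_0 = b_0, \\
& T_i x_0 + W_i x_i = b_i, \quad i = 1, \dots, N, \\
& x_0 \ge 0, \quad x_i \ge 0, \quad i = 1, \dots, N.
\end{aligned}
```

The first-stage columns x_0 are the only coupling between the N scenario blocks (T_i, W_i), which is what gives the constraint matrix its dual block-angular shape.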

Stochastic Optimization and the Power Grid
- Unit Commitment: determine the optimal on/off schedule of thermal (coal, natural gas, nuclear) generators. Day-ahead market prices (hourly). Mixed-integer.
- Economic Dispatch: set real-time market prices (every 5-10 min.). Continuous linear/quadratic.
- Challenge: integrate energy produced by highly variable renewable sources into these control systems.
- Minimize operating costs, subject to:
  - Physical generation and transmission constraints
  - Reserve levels
  - Demand
  - …

Stochastic unit commitment with wind power
- Scenarios obtained using numerical weather prediction codes
- Real-time grid-nested 24h parallel simulation using WRF
- [Figure labels: thermal generator, wind farm]
- Slide courtesy of V. Zavala & E. Constantinescu

Computational challenges and difficulties in power grid
- May require many scenarios (100s, 1,000s, 10,000s, …) to accurately model uncertainty
- “Large” scenarios (Wi up to 100,000 x 100,000)
- “Large” first stage (1,000s, 10,000s of variables)
- Easy to build a practical instance that requires 100+ GB of RAM to solve, which requires distributed memory
- Integer constraints
- Real-time solution needed in our applications

Dual decomposition - formulation
- Feasibility sets (one per scenario) define the notation
- The extensive form is restated in this notation
- Split-variable formulation: non-anticipativity constraints are explicitly enforced

Dual decomposition (Carøe and Schultz, 1999)
- Apply Lagrangian relaxation (LR) to the non-anticipativity constraints
- For mixed-integer problems, LR provides a lower bound on the optimal value
- Branch and bound on non-anticipativity variables

Dual decomposition – computational appeal
- Relaxing non-anticipativity decouples the scenarios
- Typically a better lower bound than from the LP relaxation
- The Lagrangian dual has a similar computational pattern to Benders decomposition
- At each iteration, solve a continuous master problem and an independent MILP for each scenario
- Trivially parallel? Not quite; there are interesting algorithmic and computational questions, as we'll see
- To our knowledge, there are no previously published parallel implementations, although scope for parallelism has been observed

“State variable” formulation of non-anticipativity
- Carøe and Schultz enforce non-anticipativity through r-1 constraints
- We instead use the “state variable” formulation with r constraints (Sen, 2005), and state the Lagrangian dual problem in those terms
- We will see later why this formulation is useful for parallel computations
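The equations on this slide did not survive in the transcript. A sketch of the two formulations in assumed notation follows: scenarios j = 1, …, r with probabilities p_j, first-stage copies x_j, recourse variables y_j, feasibility sets S_j, a state variable z, and multipliers λ_j. The chained form of the r-1 constraints is one common choice and may differ from the one shown in the talk.

```latex
\begin{aligned}
&\text{Chained form } (r-1 \text{ constraints}): && x_j - x_{j+1} = 0, \quad j = 1, \dots, r-1, \\
&\text{State-variable form } (r \text{ constraints}): && x_j - z = 0, \quad j = 1, \dots, r, \\[4pt]
&\text{Scenario dual function:} && D_j(\lambda_j) = \min_{(x_j, y_j) \in S_j} \; p_j \left( c^T x_j + q_j^T y_j \right) + \lambda_j^T x_j, \\
&\text{Lagrangian dual:} && \max_{\lambda} \; \Big\{ \sum_{j=1}^{r} D_j(\lambda_j) \; : \; \sum_{j=1}^{r} \lambda_j = 0 \Big\}.
\end{aligned}
```

The constraint on the multipliers arises because the free variable z enters the Lagrangian only through the term -(Σ_j λ_j)^T z, which is unbounded below unless Σ_j λ_j = 0.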

Characterization of the solution
- Proposition (Carøe and Schultz): the optimal value of the Lagrangian dual equals the optimal value of a linear program, namely a partially convexified LP relaxation
- Theoretical equivalence with Lulli and Sen’s (2004) branch-and-price (column generation)
- However, solving the Lagrangian dual also gives a solution to (1) (useful in branch-and-bound)

Optimization of the Lagrangian Dual
- Each scenario term of the Lagrangian dual objective is concave and non-differentiable
- For a fixed multiplier, a subgradient is available from an optimal solution of the corresponding scenario subproblem
- By solving/evaluating the subproblem (a mixed-integer LP), we get at least one subgradient for free
- Carøe and Schultz suggest using proximal bundle black-box solvers. Alternatives are other variants of cutting-plane algorithms (boxstep or level regularization)
- Use subgradients to form a piecewise linear model of the objective function
- Find the optimum of the model to determine the next trial point (master problem)
- Evaluate the trial point (MILP subproblems) to check the objective value for convergence. If not converged, update the model with new subgradient(s) and repeat
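The model/master/evaluate loop above can be illustrated with a minimal sketch of the plain (unregularized) cutting-plane method on a one-dimensional toy dual. This is not the proximal bundle method and not the talk's implementation: the toy problem, `COST`, `TARGET`, and all function names are hypothetical, and the master is solved by enumerating cut intersections, which only works in one dimension.

```python
from itertools import combinations

# Hypothetical toy problem: minimize COST[x] over x in {0, 1, 2, 3}
# subject to the single relaxed constraint x == TARGET.
COST = [0.0, 1.0, 4.0, 9.0]
TARGET = 2


def evaluate_dual(lam):
    """Solve the relaxed subproblem by enumeration.

    Returns D(lam) = min_x COST[x] + lam * (x - TARGET) and one subgradient
    (x* - TARGET), which comes for free from the minimizer x*.
    """
    value, x_star = min((COST[x] + lam * (x - TARGET), x) for x in range(len(COST)))
    return value, float(x_star - TARGET)


def solve_master(cuts, lo, hi):
    """Maximize the piecewise linear model min_k (b_k + s_k * lam) on [lo, hi].

    In one dimension the maximizer is an interval endpoint or a point where
    two cuts intersect, so enumerating those candidates is enough.
    """
    def model(lam):
        return min(b + s * lam for s, b in cuts)

    candidates = [lo, hi]
    for (s1, b1), (s2, b2) in combinations(cuts, 2):
        if s1 != s2:
            lam = (b2 - b1) / (s1 - s2)
            if lo <= lam <= hi:
                candidates.append(lam)
    best = max(candidates, key=model)
    return best, model(best)


def cutting_plane(lam=0.0, lo=-10.0, hi=10.0, tol=1e-9, max_iter=50):
    """Return the best dual value found and the final trial point."""
    cuts = []                      # each cut is stored as (slope, intercept)
    best_value = float("-inf")
    for _ in range(max_iter):
        value, g = evaluate_dual(lam)            # evaluate the trial point
        best_value = max(best_value, value)
        cuts.append((g, value - g * lam))        # cut: value + g * (mu - lam)
        lam, model_value = solve_master(cuts, lo, hi)
        if model_value - best_value <= tol:      # model bound meets best value
            break
    return best_value, lam


best_value, best_lam = cutting_plane()
print(best_value, best_lam)
```

In the setting of the talk, `evaluate_dual` would correspond to solving one MILP per scenario and `solve_master` to an LP or QP over all multipliers; the box [lo, hi] here plays the role that regularization plays in the bundle variants.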

Let’s open the proximal bundle black box
- Proximal bundle QP master, with notation for the trial point and the regularization parameter
- The dual of this QP is typically advantageous for computation
- The proximal bundle master QP has a (dual) block-angular structure
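The master QP itself is missing from the transcript. One standard way to write a proximal bundle master for the multiplier-constrained dual is sketched below; all symbols are assumptions: θ_j models the scenario dual function D_j, K_j indexes the cuts collected for scenario j, g_j^k is the subgradient obtained at the earlier point λ_j^k, \bar{λ} is the current stability center, and t > 0 is the regularization parameter. The maximizer λ is the next trial point.

```latex
\begin{aligned}
\max_{\lambda, \theta} \quad & \sum_{j=1}^{r} \theta_j \; - \; \frac{1}{2t} \sum_{j=1}^{r} \left\| \lambda_j - \bar{\lambda}_j \right\|^2 \\
\text{s.t.} \quad & \theta_j \le D_j(\lambda_j^k) + (g_j^k)^T (\lambda_j - \lambda_j^k), \quad k \in K_j, \; j = 1, \dots, r, \\
& \sum_{j=1}^{r} \lambda_j = 0.
\end{aligned}
```

In this form the cuts and the proximal term separate by scenario, and only the equality constraint on the multipliers couples the scenario blocks.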

Dual of Proximal Bundle QP Master
- Block-angular structure
- A direct result of the equality-constrained formulation; it does not hold for other formulations of non-anticipativity
- It also applies to other forms of regularization

Computational significance
- The master QP is sparse overall (but may be block dense)
- Dense QP solvers, which are typically used in black-box proximal bundle codes, can't solve it efficiently
- The master QP’s structure can be exploited: PIPS (Petra) or OOPS (Gondzio)
- With the ability to solve the master in parallel, we address a serial bottleneck of execution
- Now both the MILP subproblems and the QP master can be solved in parallel
- Greater potential for parallel speedup (Amdahl's law)
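The Amdahl's law argument can be made concrete with a small sketch. The function name and the 50/50 split between master and subproblem time are illustrative assumptions, not measurements from the talk.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Ideal speedup when `parallel_fraction` of the runtime scales
    perfectly across `workers` and the remainder stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must lie in [0, 1]")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


# Hypothetical split: subproblems take half the serial runtime, the master the rest.
only_subproblems = amdahl_speedup(0.5, 8)   # master left serial
master_too = amdahl_speedup(1.0, 8)         # master also parallelized
print(only_subproblems, master_too)
```

With a serial master, the attainable speedup is capped by the master's share of the runtime no matter how many processes work on the subproblems, which is why parallelizing the master raises the ceiling.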

Numerical experiments
- Implementation in C++ using MPI
- Looking at solving the Lagrangian dual; no branching implemented
- SCIP used for MILP subproblems
- No “compression of bundle” (removing cuts/subgradients)
- Serial experiments
  - Cutting plane vs. black-box proximal bundle vs. our implementation
- Parallel experiments
  - Scalability of the full proximal bundle algorithm
  - More detailed look at the scalability of the parallel QP solver

Test instances
- Stochastic mixed-integer instances dcap and sslp from SIPLIB by Shabbir Ahmed
- Stochastic LP product instance from Huseyin Topaloglu

Test architecture
- “Fusion” high-performance cluster at Argonne
- 320 nodes
- InfiniBand QDR interconnect
- Two 2.6 GHz Xeon processors per node (total 8 cores)
- Most nodes have 36 GB of RAM, some have 96 GB
- We use 1 MPI process (= parallel process) per core

Serial experiments
- Time limit: 7200 seconds
- OOQP: general sparsity-exploiting QP IPM solver
- PIPS: specialized IPM solver for block-angular structure
- ConicBundle: open-source off-the-shelf proximal bundle code

Parallel experiments – serial master, parallel subproblems

Parallel experiments – parallel master, parallel subproblems
- Overall 10x speedup vs. 2x speedup on dcap332 by solving the master in parallel
- Parallel master has little effect for sslp
- Also little speedup from solving subproblems in parallel: imbalance in the time to solve MILP subproblems

Scalability of master QP
- Scope for parallelism in solving the master QP depends on the number of linking variables (= number of first-stage variables) being small relative to the diagonal blocks (= subgradient cuts per scenario)
- The stochastic integer test problems have a very small first stage (order of 10)
- It is important to consider performance on a larger first stage for practical problems
- Stochastic LP product instances are used for this (1,000 scenarios)
- We look only at QP solves, not proximal bundle convergence

Small first stage

Medium first stage

Large first stage

Energy application – stochastic unit commitment
- State of Illinois power grid
- 12-hour horizon, 64 scenarios
- 3,621,180 vars., 3,744,468 cons., 3,132 binary
- LP relaxation objective: 939,208
- LP relaxation + CglProbing cuts: 939,626
  - Feasible solution (rounding): 942,237
  - Optimality gap: 0.27% (0.5% is acceptable in industry practice)
- Lagrangian relaxation: 941,176
  - Feasible solution (rounding): 943,351
  - Optimality gap: 0.23% (combined: 0.11%)
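The reported gaps can be checked from the bounds listed above. The sketch assumes the gap is measured relative to the feasible (upper-bound) objective, which is one common convention and is not stated on the slide; under that assumption the three values agree with the slide's figures to within rounding in the second decimal.

```python
def relative_gap(upper_bound, lower_bound):
    """Optimality gap relative to the feasible (upper-bound) objective."""
    return (upper_bound - lower_bound) / upper_bound


# Values from the slide.
lp_cuts_bound = 939_626        # LP relaxation + CglProbing cuts
lp_feasible = 942_237          # rounding from the LP relaxation
lagrangian_bound = 941_176     # Lagrangian relaxation
lagrangian_feasible = 943_351  # rounding from the Lagrangian relaxation

gap_lp = relative_gap(lp_feasible, lp_cuts_bound)
gap_lagrangian = relative_gap(lagrangian_feasible, lagrangian_bound)
# "Combined": best feasible solution against the best lower bound.
gap_combined = relative_gap(min(lp_feasible, lagrangian_feasible),
                            max(lp_cuts_bound, lagrangian_bound))

for name, gap in [("LP + cuts", gap_lp),
                  ("Lagrangian", gap_lagrangian),
                  ("combined", gap_combined)]:
    print(f"{name}: {100 * gap:.3f}%")
```

The combined gap is the smallest because the tighter lower bound comes from the Lagrangian relaxation while the better feasible solution comes from rounding the LP relaxation.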

Conclusions and future work
- Revisited dual decomposition from the perspective of parallel computation; addressed the bottleneck of solving the master
- Dual decomposition is a promising approach for the parallel solution of stochastic mixed-integer programs
- More work is needed to address load imbalance, perhaps through asynchronism as in Linderoth and Wright
- Branch-and-bound implementation
- Large-scale computational study for stochastic unit commitment

[Diagram labels: Weather, System Operator, ON/OFF, Generation Levels, Demand, Distribution, Supplier, Consumer]
