Robust Optimization Concepts and Examples. Yuriy Zinchenko, Shane G. Henderson. ORIE, Cornell University.
Zinchenko and Henderson 2005
Outline: What can go wrong with LP? A familiar blend problem. The general picture. Robust linear programming. Software, resources, practicalities. Radiation therapy for cancer treatment.
What can go wrong with LP? Tough LP problem: max $x + y$ s/t $1 \cdot x \le 1$, $1 \cdot y \le 1$, $x, y \ge 0$.
Blend Problem: blend inputs of differing cost to get the required output properties at minimum cost. But properties change with time, so the blend should work for any input properties within reason.
Blend constraints: A typical constraint looks like $\text{Low} \le 10 x_1 + 12 x_2 + 7 x_3 \le \text{High}$. It changes to $\text{Low} \le a_1 x_1 + a_2 x_2 + a_3 x_3 \le \text{High}$ for any vector $a$ that is “close” to $(10, 12, 7)$.
General robust LP: min $c^T x$ s/t $A^{(1)} x \le b_1$, $A^{(2)} x \le b_2$, $A^{(3)} x \le b_3$.
A more detailed view: Simple linear constraint $a x \le 1$, $x \ge 0$, with $a$ “close” to 1, namely $0 \le a \le 2$. Want $x$ to work for all such $a$. How do we deal with it?
$a x \le 1$, $x \ge 0$ for all $0 \le a \le 2$ is equivalent to $\max_{0 \le a \le 2} a x \le 1$, $x \ge 0$. Since $x \ge 0$, the worst case is $a = 2$, which gives $2 x \le 1$, $x \ge 0$, that is, $x \le 1/2$, $x \ge 0$.
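The reduction above can be checked numerically. The following is a minimal sketch, not part of the original slides; the function names are hypothetical and the bounds 0 and 2 on the coefficient are the ones from the example.

```python
def worst_case_lhs(x, a_lo=0.0, a_hi=2.0):
    """Largest value of a*x over a in [a_lo, a_hi]."""
    # a*x is linear in a, so the maximum sits at an endpoint of the interval.
    return max(a_lo * x, a_hi * x)


def is_robust_feasible(x, rhs=1.0):
    """True if x >= 0 and a*x <= rhs for every a in [0, 2]."""
    return x >= 0 and worst_case_lhs(x) <= rhs


if __name__ == "__main__":
    for x in (0.25, 0.5, 0.6):
        print(x, is_robust_feasible(x))
```

Any x above 1/2 fails for a = 2, which matches the reduced constraint.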
A slightly more involved example: $a x + b y \le 1$ where $(a, b)$ is “close” to $(1, 1)$, namely in an ellipsoidal (spherical) “uncertainty” set $U$: $(a, b)$ is in $U$ if $(a, b) = (a_0, b_0) + (\Delta a, \Delta b)$ with $(a_0, b_0) = (1, 1)$ and $\Delta a^2 + \Delta b^2 \le 1$.
Ellipsoidal “uncertainty” set $U$: $(a, b) = (a_0, b_0) + (\Delta a, \Delta b)$, $(a_0, b_0) = (1, 1)$, $\Delta a^2 + \Delta b^2 \le 1$. Want $(x, y)$ to satisfy $a x + b y \le 1$ for all $(a, b)$ from $U$. [Figure: the set $U$ centered at $(a_0, b_0)$.]
What can we say about $a x + b y$? $a x + b y \le 1$ for all $(a, b)$ in $U$ is equivalent to $\max_{(a, b) \in U} (a x + b y) \le 1$. Decompose: $a x + b y = (a_0 + \Delta a) x + (b_0 + \Delta b) y = (a_0 x + b_0 y) + (\Delta a \, x + \Delta b \, y)$.
For a moment, think of $(x, y)$ as your objective function (fixed). Is $\max_{(a, b) \in U} (a x + b y) \le 1$? Same as: is $(a_0 x + b_0 y) + \max_{\Delta a^2 + \Delta b^2 \le 1} (\Delta a \, x + \Delta b \, y) \le 1$? [Figure: the direction $(x, y)$ drawn against the set $U$ centered at $(a_0, b_0)$.]
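Is $\max_{\Delta a^2 + \Delta b^2 \le 1} (\Delta a \, x + \Delta b \, y) \le 1 - (a_0 x + b_0 y)$? Here $\Delta a \, x + \Delta b \, y \le \|(x, y)\| = (x^2 + y^2)^{1/2}$, the “length” of $(x, y)$, with equality when $(\Delta a, \Delta b)$ points along $(x, y)$. [Figure: two directions $(x_1, y_1)$ and $(x_2, y_2)$ drawn against the set $U$ centered at $(a_0, b_0)$.]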
$a x + b y \le 1$ for all $(a, b)$ in $U$ is equivalent to $\max_{(a, b) \in U} (a x + b y) \le 1$, which is equivalent to $(a_0 x + b_0 y) + \max_{\Delta a^2 + \Delta b^2 \le 1} (\Delta a \, x + \Delta b \, y) \le 1$, which is equivalent to $\|(x, y)\| \le 1 - (a_0 x + b_0 y)$.
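A quick numerical sanity check of this chain, as a sketch rather than anything from the original slides: the closed-form worst case (nominal value plus the length of (x, y)) is compared with a brute-force maximum over points sampled on the boundary of U. The function names, sample count, and seed are assumptions made for illustration.

```python
import math
import random

A0, B0 = 1.0, 1.0  # nominal coefficients (a0, b0) from the example


def worst_case_lhs(x, y):
    """Closed form: max of a*x + b*y over the unit disc centered at (A0, B0)."""
    return A0 * x + B0 * y + math.hypot(x, y)


def sampled_worst_case(x, y, n=20000, seed=0):
    """Brute-force estimate using random points on the boundary of U.

    A linear function attains its maximum over a disc on the boundary,
    so sampling the circle is enough.
    """
    rng = random.Random(seed)
    best = -math.inf
    for _ in range(n):
        t = rng.uniform(0.0, 2.0 * math.pi)
        a, b = A0 + math.cos(t), B0 + math.sin(t)
        best = max(best, a * x + b * y)
    return best


if __name__ == "__main__":
    x, y = 0.2, 0.1
    print(worst_case_lhs(x, y), sampled_worst_case(x, y))
```

The sampled value never exceeds the closed form and approaches it as the number of samples grows.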
Good news: Can handle constraints of this type, $\|(x, y)\| \le 1 - (1 \cdot x + 1 \cdot y)$, easily (so-called second-order cone programming (SOCP)). Not much harder than linear programming!
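General Robust LP formulation: max $c^T x$ s/t $A^{(i)} x \le b_i$, $i = 1, \dots, m$, where $c, x \in \mathbb{R}^n$, $A^{(i)} \in \mathbb{R}^{1 \times n}$, $A^{(i)} = A^{(i)}_0 + w_i P_i$ with $w_i \in \mathbb{R}^{1 \times k_i}$, $\|w_i\| \le 1$, $P_i \in \mathbb{R}^{k_i \times n}$, $i = 1, \dots, m$.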
SOCP equivalent: max $c^T x$ s/t $\|P_i x\| \le b_i - A^{(i)}_0 x$, $i = 1, \dots, m$. Probabilistic interpretation: think of $A^{(i)}$ as taken from an $\alpha$-level set of your favorite probability distribution (e.g. multivariate normal); the robust constraint then reads: satisfy the constraint with a given probability $\alpha$.
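To make the SOCP form concrete, here is a minimal sketch that only checks whether a given x satisfies the robust constraints; it does not solve the optimization problem, for which an SOCP solver would be needed. The function names and the list-of-rows data layout are assumptions for illustration.

```python
import math


def robust_row_ok(a0, P, b, x, tol=1e-9):
    """Check ||P x|| <= b - a0.x for one uncertain row a0 + w P, ||w|| <= 1."""
    nominal = sum(aj * xj for aj, xj in zip(a0, x))
    Px = [sum(pj * xj for pj, xj in zip(row, x)) for row in P]
    # The worst case of w (P x) over ||w|| <= 1 is the Euclidean norm of P x.
    margin = math.sqrt(sum(v * v for v in Px))
    return margin <= b - nominal + tol


def robust_feasible(rows, x):
    """rows is a list of (a0, P, b) triples, one per constraint."""
    return all(robust_row_ok(a0, P, b, x) for a0, P, b in rows)


if __name__ == "__main__":
    # The two-variable example: a0 = (1, 1), P = identity, b = 1.
    rows = [([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]], 1.0)]
    print(robust_feasible(rows, [0.2, 0.1]))  # nominal 0.3 plus norm ~0.224
    print(robust_feasible(rows, [0.5, 0.5]))  # nominal 1.0 plus norm ~0.707
```

With P equal to the identity this reduces to the spherical example; a general P reshapes the ball into an ellipsoid.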
Where’d the ellipse come from? Expert opinion. Statistics: averages live in ellipsoids. Doesn’t have to be an ellipse; it can be some other shape (e.g., boxes).
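For a box, the worst case stays linear: if each coefficient a_j ranges over [a0_j - r_j, a0_j + r_j], the largest value of a.x is a0.x plus the sum of r_j |x_j|. The sketch below compares that closed form with enumeration of the box corners. The radii r and the point x are made-up illustration values, not data from the slides; only the nominal vector (10, 12, 7) comes from the blend example.

```python
from itertools import product


def box_worst_case(a0, r, x):
    """Closed form for max a.x over the box |a_j - a0_j| <= r_j."""
    nominal = sum(a * v for a, v in zip(a0, x))
    return nominal + sum(rj * abs(v) for rj, v in zip(r, x))


def box_worst_case_by_enumeration(a0, r, x):
    """Same maximum, found by checking every corner of the box."""
    best = float("-inf")
    for signs in product((-1.0, 1.0), repeat=len(a0)):
        a = [aj + s * rj for aj, s, rj in zip(a0, signs, r)]
        best = max(best, sum(aj * v for aj, v in zip(a, x)))
    return best


if __name__ == "__main__":
    a0 = [10.0, 12.0, 7.0]  # nominal blend coefficients
    r = [1.0, 0.5, 2.0]     # hypothetical box half-widths
    x = [0.3, 0.5, 0.2]     # hypothetical blend fractions
    print(box_worst_case(a0, r, x), box_worst_case_by_enumeration(a0, r, x))
```

When x is nonnegative the absolute values drop out, so the robust constraint is an ordinary linear constraint with inflated coefficients.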
Software: Commercial: Mosek (http://www.mosek.com/). “Free”: SeDuMi (http://sedumi.mcmaster.ca/), SDPT3.x (http://www.math.nus.edu.sg/~mattohkc/sdpt3.html/).
Practicalities: Realistic problem sizes: number of variables/constraints on the order of $10^3$ to $10^4$; depends (greatly) on the problem data structure/sparsity. Possible to obtain a “good”, “inexpensive” approximation with LP.
Generality: Possible to extend this approach to quite a few other convex programming problems. Resources: Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications, by A. Ben-Tal and A. S. Nemirovskii; Google for Robust Optimization (robust LP etc.).
In practice... Joint work with Millie Chu (Cornell) and Michael B. Sharpe (Princess Margaret Hospital, Toronto).
Cancer treatment: About 1.3 million new cancer cases in the U.S. each year. Nearly 60% receive radiation therapy (in conjunction with surgery, chemotherapy etc).
External beam radiation therapy: Radiation delivered by a linear accelerator. Cancer cells more susceptible than normal cells. Overlay beams from different angles. Dose given in daily fractions for ~ 6 weeks.
Intensity Modulated Radiation Therapy: Block parts of the radiation beam: discretize the whole beam into a grid of smaller “beamlets”. Choose different intensities for each beamlet. (Intensity Modulated Radiation Therapy Collaborative Working Group, 2001)
Treatment Planning: Goal: choose beam angles and beamlet intensities that deliver enough radiation to kill all tumor cells, while avoiding healthy organs & tissue as much as possible. Take CT scan. Delineate target region and healthy structures. Discretize body as small cubes, or “voxels”. Formulate & solve a mathematical program to find a “good” plan. (Princess Margaret Hospital)
Robust Treatment Planning: Setup errors + patient motion + structural changes during treatment = uncertainty in geometry. Don’t rescan patient much, if at all. Use RO to “robustify” the mathematical program.
Model Formulation: Many different formulations exist; we use a penalty formulation. minimize: penalty objective, subject to: $\Pr(\text{dose to voxel } i \text{ in healthy structure } k \le U_k) \ge 0.95$, $\Pr(\text{dose to voxel } i \text{ in tumor} \ge L) \ge 0.95$, $x$ = beamlet intensities $\ge 0$.
Computational Results: Prostate: tumor + 5 healthy regions. 5 equi-spaced beams, ~ 225 beamlets from each angle. Voxel size = 2 cm, ~ 400 total voxels. Solver: Mosek, v. 3.0.1.18. Solve time = 6 seconds (LP), 45 minutes (SOCP).
Dose-Volume Histograms (DVH): a DVH plots the % of structure receiving ≥ x Gy. [Figure: DVH of expected dose for the deterministic solution’s plan and the stochastic solution’s plan.]
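A DVH curve for one structure can be computed directly from the per-voxel doses. The following is a small sketch with a hypothetical function name and made-up dose values, included only to pin down the definition.

```python
def dvh(doses, thresholds):
    """Percent of voxels in a structure receiving at least each threshold dose.

    doses: total dose (Gy) per voxel of the structure.
    thresholds: dose levels (Gy) at which to evaluate the curve.
    """
    n = len(doses)
    return [100.0 * sum(d >= t for d in doses) / n for t in thresholds]


if __name__ == "__main__":
    voxel_doses = [10.0, 20.0, 30.0, 40.0]       # made-up values
    print(dvh(voxel_doses, [0.0, 25.0, 50.0]))   # [100.0, 50.0, 0.0]
```

The curve is nonincreasing in the threshold; for a healthy structure a curve that drops quickly is preferable, and for the tumor one that stays near 100% up to the prescribed dose.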
Comparison: Simulate 10 treatments (45 fractions each). For each of the 10 treatments, and for each solution (deterministic & stochastic): calculated dose delivered to each voxel in each fraction; summed over the 45 fractions to get total dose delivered to each voxel; plotted DVH.
DVH, Treatments 1 to 10: [Figures: one DVH per simulated treatment, each comparing the deterministic (det) and stochastic (stoch) plans.]
Conclusions: LP “pushes you into a corner”. True situation never same as data. Robust LP: find a good solution that is always feasible within reason. Efficient solution methods: can solve real problems. Software available. The End.
A Bit More Detail: $D_i(x)$ = dose delivered to voxel $i$ in $N$ fractions, with intensities $x$, a random variable. $D_i(x)$ is the sum of $N$ random variables ($N = 45$); assume iid and apply the CLT, so $D_i(x)$ is approximately normally distributed. Take $n$ sample shifts, $s_1, \dots, s_n$, with associated probabilities $p = (p_1, \dots, p_n)^T$. Let $a_i(\cdot)^T$ be the matrix with rows $a_i(s_1)^T, a_i(s_2)^T, \dots, a_i(s_n)^T$, where $a_i(s_j)$ = dose delivered to voxel $i$, shifted by $s_j$, from each beamlet with unit intensity, so that $N p^T a_i(\cdot)^T x$ = expected total dose delivered to voxel $i$ for $N$ fractions. Let $v_i(x)$ = sample variance of dose delivered to voxel $i$. Then $D_i(x) \sim \text{Normal}(N p^T a_i(\cdot)^T x, \; N v_i(x))$ …
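The two parameters of the normal approximation can be sketched in a few lines. The slides do not spell out the formula for the sample variance, so the probability-weighted variance across shifts used below is an assumption, as are the function name and the toy data.

```python
def dose_moments(A, p, x, N=45):
    """Mean and variance of the total dose to one voxel over N fractions.

    A: one row per sample shift s_j; row j holds the dose to the voxel
       from each beamlet at unit intensity under that shift.
    p: probability of each shift.
    x: beamlet intensities.
    """
    per_shift = [sum(a * v for a, v in zip(row, x)) for row in A]
    mean_fraction = sum(pj * d for pj, d in zip(p, per_shift))
    # ASSUMPTION: per-fraction variance is the p-weighted variance over shifts.
    var_fraction = sum(pj * (d - mean_fraction) ** 2 for pj, d in zip(p, per_shift))
    # Fractions are treated as iid, so means and variances both scale by N.
    return N * mean_fraction, N * var_fraction


if __name__ == "__main__":
    A = [[1.0, 0.0], [0.0, 1.0]]  # toy data: two shifts, two beamlets
    p = [0.5, 0.5]
    x = [2.0, 4.0]
    print(dose_moments(A, p, x))  # (135.0, 45.0)
```

Under this assumed variance, the standard deviation is the Euclidean norm of a linear function of x, which is the structure a second-order cone constraint needs.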
Probabilistic Constraints: Want constraints to be violated with low probability (say, $\delta = .05$). Example: maximum dose constraint $m_k$ on voxel $i$ in $H_k$: assuming $D_i(x) \sim \text{Normal}(N p^T a_i(\cdot)^T x, \; N v_i(x))$, want $P(D_i(x) > m_k) \le \delta$. This gives a second-order cone program (SOCP).
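For a normally distributed dose, the chance constraint has a standard deterministic equivalent: P(D > m_k) ≤ δ holds exactly when mean + z · (standard deviation) ≤ m_k, where z is the (1 − δ) quantile of the standard normal (about 1.645 for δ = .05). The sketch below checks this for given moments; the function names and numbers are illustrative assumptions, not values from the slides.

```python
from statistics import NormalDist


def max_dose_ok(mean, variance, m_k, delta=0.05):
    """Deterministic form of P(D > m_k) <= delta for D ~ Normal(mean, variance)."""
    z = NormalDist().inv_cdf(1.0 - delta)
    return mean + z * variance ** 0.5 <= m_k


def violation_probability(mean, variance, m_k):
    """P(D > m_k) for D ~ Normal(mean, variance); requires variance > 0."""
    return 1.0 - NormalDist(mean, variance ** 0.5).cdf(m_k)


if __name__ == "__main__":
    mean, variance = 135.0, 45.0  # illustrative moments of the total dose
    for m_k in (140.0, 150.0):
        print(m_k, max_dose_ok(mean, variance, m_k),
              violation_probability(mean, variance, m_k))
```

Because the standard deviation term is a norm of a linear function of x in the setting above, the deterministic form is a second-order cone constraint in x.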
Dose-Volume Constraints: Physicians like constraints of the form: “≤ fraction $f_k$ of structure $H_k$ gets ≥ $d_k$”. 0-1 variable for each voxel: = 1 if dose is > $d_k$. MIP: hard to solve! Many voxels get near max allowed dose. Alternative: upper bound the “excess” dose. For healthy structure $H_k$, we require: [constraint shown on slide]. Linear constraints ☺