The Complexity of Matrix Completion
Nick Harvey, David Karger, Sergey Yekhanin
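What is matrix completion?
- Given a matrix containing variables, substitute values for the variables so that the resulting matrix has full rank.
- Example: $M = \begin{pmatrix} 1 & x \\ 1 & y \end{pmatrix}$
- $x = 1,\ y = 0$: full rank (good)
- $x = 1,\ y = 1$: rank 1 (bad)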
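The example can be checked mechanically. The sketch below assumes the example matrix has rows (1, x) and (1, y) and works over F_2; both the layout and the field are assumptions made for illustration, and `rank_mod_p` is a hypothetical helper, not something named on the slides.

```python
def rank_mod_p(rows, p):
    """Rank of an integer matrix over the prime field F_p (Gaussian elimination)."""
    m = [[v % p for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        # Find a row at or below the current rank with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse, Python 3.8+
        m[rank] = [(v * inv) % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def complete(x, y):
    """Assumed layout of the example: rows (1, x) and (1, y)."""
    return [[1, x], [1, y]]


good = rank_mod_p(complete(1, 0), 2)  # x = 1, y = 0
bad = rank_mod_p(complete(1, 1), 2)   # x = 1, y = 1
print(good, bad)
```

The first assignment yields a full-rank matrix and the second does not, matching the good/bad labels.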
Why should I care? Combinatorics
- Many combinatorial problems relate to matrices of variables.
- Problem: relation to algebra
- Graph Matching: Tutte ’47, Edmonds ’67, Lovasz ’79
- Matroid Intersection: Tomizawa-Iri ’74, Murota ’00
- Counting paths in a DAG: Gessel-Viennot ’85
- [Diagram annotation: God (i.e., the BOOK)]
Why should I care? Algorithms
- Often yields highly efficient algorithms.
- Problem: algorithms
- Graph Matching: RNC (KUW ’86, MVV ’87); sequential $O(n^{2.38})$ time (MS ’04, H ’06)
- Matroid Intersection: $O(n r^{1.38})$ time (H ’06)
- Counting paths in a DAG: random network codes (Koetter-Medard ’03, Ho et al. ’03)
Why should I care? Complexity
- Depending on the parameters, the problem can be NP-complete, in RP, or in P.
- Key parameters: field size, # variables, # occurrences of each variable.
- Contains polynomial identity testing (PIT) as a special case (Valiant ’79).
- Derandomizing PIT implies strong circuit lower bounds (Kabanets-Impagliazzo ’03).
Field Size
- Why care about field size?
- Relevant to complexity: random substitution works over large fields.
- Understanding smaller fields may provide insight into derandomization.
- Important for network coding efficiency (i.e., complexity of routers).
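Complexity Regions
[Figure: complexity regions. Axes: field size and # occurrences of a variable (ticks 2, 3, 5, 7, n). Region labels: NP-hard (Buss et al. ’99), RP (Lovasz ’79), P (H., Karger, Murota ’05), P (Geelen ’99), and an open region marked "??????".]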
Complexity Regions
[Figure: complexity regions, same axes: field size and # occurrences of a variable (ticks 2, 3, 5, 7, n). Region labels: NP-hard, RP, NP-hard, P, P.]
Variant: Simultaneous Completion
- We have a set of matrices $\mathcal{A} := \{A_1, \ldots, A_d\}$.
- Each variable appears at most once per matrix.
- A variable can appear in several matrices.
- Def: A simultaneous completion for $\mathcal{A}$ assigns values to the variables while preserving the rank of all matrices.
- The RP algorithm still works over a large field.
- The application to network coding uses simultaneous completion.
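A minimal sketch of the randomized approach over a large field: draw one random value per variable and substitute it into every matrix. The two template matrices, the prime 2^31 - 1, and all function names are assumptions chosen for illustration. Variables are represented as strings; every other entry is a constant.

```python
import random


def rank_mod_p(rows, p):
    """Rank of an integer matrix over the prime field F_p (Gaussian elimination)."""
    m = [[v % p for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse, Python 3.8+
        m[rank] = [(v * inv) % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def substitute(template, values):
    """Replace each variable name (a string) by its assigned field element."""
    return [[values[e] if isinstance(e, str) else e for e in row] for row in template]


def random_simultaneous_completion(matrices, variables, p, rng):
    """One shared random assignment, applied to every matrix."""
    values = {v: rng.randrange(p) for v in variables}
    ranks = [rank_mod_p(substitute(m, values), p) for m in matrices]
    return values, ranks


P = 2**31 - 1  # a large prime (illustrative choice)
A1 = [[1, "x"], [1, "y"]]  # x and y each appear once
A2 = [["x", 1], [1, "z"]]  # x is shared with A1
values, ranks = random_simultaneous_completion(
    [A1, A2], ["x", "y", "z"], P, random.Random(0)
)
print(ranks)
```

The assignment fails to reach full rank only if x = y or x·z = 1 in the field, which a random draw from a field of this size hits with negligible probability.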
Relationship to Single Matrix Completion
- Hardness for simultaneous completion implies hardness for single matrix completion with many occurrences of variables.
- [Figure: several small matrices with entries 1, A, B, C, D, E (simultaneous completion) combined into one matrix (single matrix completion).]
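One standard way to combine several matrices into a single one is to place them on a block diagonal: the rank of the combined matrix is the sum of the block ranks, and a variable shared by d blocks then occurs d times in the single matrix. Whether the slide's figure shows exactly this construction is an assumption; the sketch below only checks the rank identity on numeric blocks over F_2.

```python
def rank_mod_p(rows, p):
    """Rank of an integer matrix over the prime field F_p (Gaussian elimination)."""
    m = [[v % p for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse, Python 3.8+
        m[rank] = [(v * inv) % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def block_diag(blocks):
    """Place the given matrices on the diagonal of one larger matrix."""
    total_cols = sum(len(b[0]) for b in blocks)
    out, offset = [], 0
    for b in blocks:
        width = len(b[0])
        for row in b:
            out.append([0] * offset + list(row) + [0] * (total_cols - offset - width))
        offset += width
    return out


blocks = [[[1, 1], [1, 0]], [[1, 1], [1, 1]]]
combined = block_diag(blocks)
sum_of_ranks = sum(rank_mod_p(b, 2) for b in blocks)
combined_rank = rank_mod_p(combined, 2)
print(sum_of_ranks, combined_rank)
```

So the combined matrix keeps its rank under an assignment exactly when every block does.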
Simultaneous Completion Algorithm
- Input: d matrices
- Compute the rank of all matrices
- Pick a variable x
- for $i \in \{0, \ldots, d\}$:
  - Set x := i
  - If all matrices have unchanged rank: recurse (# variables has decreased)
- Simple self-reducibility algorithm
- Operates over the field $\mathbb{F}_q$, where d := # matrices < q
- Non-trivial! Murota ’93.
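The self-reduction can be sketched on tiny instances. In the sketch below the rank of a matrix with variables is replaced by a brute-force stand-in (the best rank over all assignments in F_p), which is exponential and is not the rank computation the slides refer to. The matrices, the prime p = 3, and all names are assumptions for illustration.

```python
from itertools import product


def rank_mod_p(rows, p):
    """Rank of an integer matrix over the prime field F_p (Gaussian elimination)."""
    m = [[v % p for v in row] for row in rows]
    rank = 0
    n_cols = len(m[0]) if m else 0
    for col in range(n_cols):
        pivot = next((r for r in range(rank, len(m)) if m[r][col]), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        inv = pow(m[rank][col], -1, p)  # modular inverse, Python 3.8+
        m[rank] = [(v * inv) % p for v in m[rank]]
        for r in range(len(m)):
            if r != rank and m[r][col]:
                f = m[r][col]
                m[r] = [(a - f * b) % p for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def substitute(template, values):
    """Replace each variable name (a string) by its assigned field element."""
    return [[values[e] if isinstance(e, str) else e for e in row] for row in template]


def max_rank(template, fixed, free, p):
    """Brute-force stand-in for the rank of a matrix with variables:
    best rank over all assignments of the free variables (tiny inputs only)."""
    best = 0
    for vals in product(range(p), repeat=len(free)):
        assignment = {**fixed, **dict(zip(free, vals))}
        best = max(best, rank_mod_p(substitute(template, assignment), p))
    return best


def simultaneous_completion(matrices, variables, p):
    d = len(matrices)
    assert p > d, "the algorithm needs field size q > d"
    fixed, free = {}, list(variables)
    target = [max_rank(m, fixed, free, p) for m in matrices]
    while free:
        x, free = free[0], free[1:]
        for i in range(d + 1):  # try x := 0, 1, ..., d
            trial = {**fixed, x: i}
            if all(max_rank(m, trial, free, p) == t for m, t in zip(matrices, target)):
                fixed = trial  # ranks unchanged: keep the value, continue with fewer variables
                break
        else:
            raise ValueError(f"no value in 0..{d} preserves all ranks for {x}")
    return fixed


A1 = [[1, "x"], [1, "y"]]
A2 = [["x", 1], [1, "z"]]
completion = simultaneous_completion([A1, A2], ["x", "y", "z"], 3)
final_ranks = [rank_mod_p(substitute(m, completion), 3) for m in (A1, A2)]
print(completion, final_ranks)
```

The loop is iterative rather than recursive, but it follows the same step: fix one variable to a value that leaves every rank unchanged, then continue with one fewer variable.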
A Sharp Threshold
- Simple self-reducibility algorithm
- Operates over the field $\mathbb{F}_q$, where d := # matrices < q
- Thm: Simultaneous completion for d matrices over $\mathbb{F}_q$ is:
  - in P if q > d [HKM ’05]
  - NP-hard if q ≤ d [This paper]
A Sharp Threshold
- Thm: Simultaneous completion for d matrices over $\mathbb{F}_q$ is:
  - in P if q > d [HKM ’05]
  - NP-hard if q ≤ d [This paper]
- Cor: Single matrix completion with d occurrences of variables over $\mathbb{F}_q$ is NP-hard if q ≤ d.
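Approach
- Reduction from Circuit-SAT
- NAND gate with inputs A, B and output C: $C = \neg(A \wedge B)$
- Arithmetization: $C = 1 - A \cdot B$ (if $A, B, C \in \{0, 1\}$)
- Gadget: $\det \begin{pmatrix} 1 & A \\ B & C \end{pmatrix} \neq 0$ (if $A, B, C \in \{0, 1\}$)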
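The gadget can be verified by enumeration. The sketch assumes the gadget matrix has rows (1, A) and (B, C), so that its determinant is C - A·B; this layout is a reconstruction, and the function names are hypothetical.

```python
from itertools import product


def nand_gadget_det(a, b, c, q):
    """Determinant of [[1, a], [b, c]], i.e. c - a*b, reduced mod q."""
    return (c - a * b) % q


def gadget_matches_nand(q):
    """True if, for all 0/1 inputs, det != 0 exactly when c = NAND(a, b)."""
    return all(
        (nand_gadget_det(a, b, c, q) != 0) == (c == 1 - a * b)
        for a, b, c in product((0, 1), repeat=3)
    )


results = {q: gadget_matches_nand(q) for q in (2, 3, 5, 7)}
print(results)
```

For 0/1 values the determinant C - A·B lies in {-1, 0, 1}, so it is nonzero exactly when C differs from A·B, which is the NAND condition.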
What have we shown so far?
- Simultaneous completion of an unbounded number of matrices over $\mathbb{F}_2$ is NP-hard.
- Can we use fewer?
- Combine small matrices into a huge matrix?
- Problem: variables appear too many times.
- Need to somehow make "copies" of a variable.
- Coming up next: completing two matrices over $\mathbb{F}_2$ is NP-hard.
A Curious Matrix
- $R_n :=$ [Matrix figure: entries are the variables $x_1, x_2, x_3, \ldots, x_n$ and 1s.]
- Thm: $\det R_n =$ [formula shown on the slide]
Linearity of Determinant
- [Figure: by linearity of the determinant, $\det R_n$ is written as the sum of two determinants.]
Column Expansion
- [Figure: column expansion gives $(-1)^{n+1}$ times the determinant of a smaller matrix with entries $x_1, \ldots, x_n$ and 1s.]
Schur Complement Identity
- [Figure: a determinant is rewritten using the Schur complement identity.]
Applying Outer Product
- [Figure: after applying the outer product, the remaining determinant has entries of the form $1 - x_i$.]
Finishing up
- [Figure: evaluation of the remaining determinant. QED]
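Replicating Variables
- Corollary: If $x_1, x_2, \ldots, x_n \in \{0, 1\}$ then $\det R_n \neq 0 \iff x_i = x_j \ \forall i, j$.
- Proof: $\det R_n$ is the arithmetization of $\left(\bigwedge_i x_i\right) \vee \left(\bigwedge_i \neg x_i\right)$, so either all variables are true, or all are false.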
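As a sanity check on the corollary, the sketch below uses one standard arithmetization of "all true or all false", namely p(x) = ∏ x_i + ∏ (1 - x_i). This polynomial is an assumption chosen for illustration; it is not claimed to equal det R_n term by term, and the function names are hypothetical.

```python
from itertools import product
from math import prod


def all_equal_poly(xs, q):
    """Illustrative arithmetization: prod(x_i) + prod(1 - x_i), reduced mod q."""
    return (prod(xs) + prod(1 - x for x in xs)) % q


def nonzero_iff_all_equal(n, q):
    """True if, over all 0/1 inputs of length n, the polynomial is nonzero
    exactly when all inputs are equal."""
    return all(
        (all_equal_poly(xs, q) != 0) == (len(set(xs)) == 1)
        for xs in product((0, 1), repeat=n)
    )


checks = {(n, q): nonzero_iff_all_equal(n, q) for n in range(1, 6) for q in (2, 3, 5)}
print(all(checks.values()))
```

On 0/1 inputs the first product is 1 only when all inputs are 1 and the second only when all are 0, so the sum is nonzero exactly in those two cases.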
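Replicating Variables
- Corollary: If $x_1, x_2, \ldots, x_n \in \{0, 1\}$ then $\det R_n \neq 0 \iff x_i = x_j \ \forall i, j$.
- Consequence: over $\mathbb{F}_2$, only 2 matrices are needed.
- [Figure: the two matrices A and B, assembled from NAND gadgets and copies of $R_n$.]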
What have we shown so far?
- Simultaneous completion of:
  - an unbounded number of matrices over $\mathbb{F}_2$ is NP-hard
  - two matrices over $\mathbb{F}_2$ is NP-hard
- Next: q matrices over $\mathbb{F}_q$ is NP-hard
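Handling Fields $\mathbb{F}_q$
- Previous gadgets only work if each $x \in \{0, 1\}$. How can we ensure this over $\mathbb{F}_q$?
- Introduce q-2 auxiliary variables: $x = x^{(1)}, x^{(2)}, \ldots, x^{(q-1)}$.
- Sufficient to enforce that $x^{(i)} \neq x^{(j)} \ \forall i \neq j$ and $x^{(i)} \notin \{0, 1\} \ \forall i \geq 2$: the q-2 auxiliary variables then take every value outside $\{0, 1\}$, leaving only 0 or 1 for x.
- Each constraint is a small determinant condition, e.g. $\det \begin{pmatrix} 1 & x^{(i)} \\ 1 & x^{(j)} \end{pmatrix} \neq 0$, etc.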
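The counting argument behind these constraints can be checked by brute force. The constraints only involve inequalities, so the sketch labels the q field elements 0, ..., q-1 without using any field arithmetic; the function name is hypothetical.

```python
from itertools import product


def forced_values(q):
    """Values that x = x^(1) can take when x^(1), ..., x^(q-1) are pairwise
    distinct and the auxiliary variables x^(2), ..., x^(q-1) avoid {0, 1}."""
    allowed = set()
    for xs in product(range(q), repeat=q - 1):
        pairwise_distinct = len(set(xs)) == len(xs)
        aux_avoid_01 = all(v not in (0, 1) for v in xs[1:])
        if pairwise_distinct and aux_avoid_01:
            allowed.add(xs[0])
    return allowed


forced = {q: forced_values(q) for q in (2, 3, 4, 5, 7)}
print(forced)
```

For each q tried, the only values left for x are 0 and 1, because the q-2 auxiliary variables use up every other element.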
Handling Fields $\mathbb{F}_q$
- Constraints: $x^{(i)} \neq x^{(j)} \ \forall i \neq j$ and $x^{(i)} \notin \{0, 1\} \ \forall i \geq 2$
- [Figure: constraint graph on the vertices 0, 1, $x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)}, \ldots, x^{(q-1)}$; an edge indicates that its endpoints are non-equal.]
- Pack these constraints into few matrices.
- Each variable is used once per matrix.
- This amounts to edge-coloring.
- From $\chi'(K_n)$, conclude that q matrices suffice.
What have we shown so far?
- Simultaneous completion of:
  - an unbounded number of matrices over $\mathbb{F}_2$ is NP-hard
  - two matrices over $\mathbb{F}_2$ is NP-hard
  - q matrices over $\mathbb{F}_q$ is NP-hard
Main Results
- Thm: Simultaneous completion for d matrices over $\mathbb{F}_q$ is NP-hard if q ≤ d.
- Cor: Completion of a single matrix with variables appearing d times is NP-hard if q ≤ d.
- Cor: Completion of a skew-symmetric matrix with variables appearing d times is NP-hard if q ≤ d.
Open Questions
- Improved hardness results / algorithms for matrix completion?
- Lower bounds / hardness for field size in network coding?
- More combinatorial uses of matrix completion?