General Strong Polarization
Madhu Sudan (Harvard University)
Joint work with Jaroslaw Blasiok (Harvard), Venkatesan Guruswami (CMU), Preetum Nakkiran (Harvard), and Atri Rudra (Buffalo)
May 18, 2018, Cornell
This Talk: Martingales and Polarization
A [0,1]-martingale is a sequence of random variables $X_0, X_1, X_2, \ldots$ with $X_t \in [0,1]$ and $\mathbb{E}[X_{t+1} \mid X_0, \ldots, X_t] = X_t$.
Well studied in algorithms: [Motwani-Raghavan, Sec. 4.4], [Mitzenmacher-Upfal, Sec. 12]. E.g., $X_t$ = expected approximation factor of an algorithm given its first $t$ random choices.
Usually one studies concentration: w.h.p. $X_{\mathrm{final}} \in [X_0 \pm \epsilon]$.
Today, polarization: w.h.p. $X_{\mathrm{final}} \notin (\epsilon, 1-\epsilon)$; equivalently, $\lim_{t \to \infty} X_t \sim \mathrm{Bern}(X_0)$ (consistent with $\mathbb{E}[\lim_{t \to \infty} X_t] = X_0$).
Why? Polar codes!
Main Result: Definition and Theorem
Strong polarization (informally): $\Pr[X_t \in (\tau, 1-\tau)] \le \epsilon$ once $t \ge \max\{O(\log 1/\epsilon),\ o(\log 1/\tau)\}$.
Formally: $\forall \gamma > 0\ \exists \beta < 1, c$ s.t. $\forall t\ \Pr[X_t \in (\gamma^t, 1-\gamma^t)] \le c \cdot \beta^t$.
A non-polarizing example: $X_0 = \frac12$; $X_{t+1} = X_t \pm 2^{-t-2}$.
Local polarization:
- Variance in the middle ($X_t \in (\tau, 1-\tau)$): $\forall \tau > 0\ \exists \sigma > 0$ s.t. $\forall t$, $X_t \in (\tau, 1-\tau) \Rightarrow \mathrm{Var}[X_{t+1} \mid X_t] \ge \sigma$.
- Suction at the ends ($X_t \notin (\tau, 1-\tau)$): $\exists \theta > 0$ s.t. $\forall c < \infty\ \exists \tau > 0$ s.t. $X_t < \tau \Rightarrow \Pr[X_{t+1} < X_t/c] \ge \theta$. (This is the "low end" condition; a similar condition holds at the high end.)
Theorem: Local polarization ⇒ strong polarization. Note that both definitions are qualitative!
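To make these definitions concrete, here is a minimal Python sketch (a toy example of my own, not from the talk) of a locally polarizing martingale: the update $X_{t+1} \in \{X_t^2,\ 2X_t - X_t^2\}$, each w.p. 1/2, has mean $X_t$, variance in the middle, and suction at both ends, and empirically the mass stuck in $(\gamma^t, 1-\gamma^t)$ decays exponentially:

```python
import random

def step(x):
    # Martingale check: E[X_{t+1} | X_t = x] = (x*x + (2*x - x*x)) / 2 = x.
    # x -> x^2 gives suction at the low end; x -> 1 - (1-x)^2 at the high end.
    return x * x if random.random() < 0.5 else 2 * x - x * x

def fraction_unpolarized(t, trials=100_000, gamma=0.8):
    """Monte Carlo estimate of Pr[X_t in (gamma^t, 1 - gamma^t)]."""
    stuck = 0
    for _ in range(trials):
        x = 0.5
        for _ in range(t):
            x = step(x)
        if gamma ** t < x < 1 - gamma ** t:
            stuck += 1
    return stuck / trials

for t in (5, 10, 20):
    print(f"t={t}: {fraction_unpolarized(t):.2e}")
```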
Rest of the Talk
Background/motivation: the Shannon capacity challenge.
Resolution by [Arikan'08], [Guruswami-Xia'13], [Hassani-Alishahi-Urbanke'13]: polar codes!
Key ideas: polar codes and the Arikan martingale; implications of weak/strong polarization.
Our contributions: simplification and generalization.
- Local polarization ⇒ strong polarization.
- The Arikan martingale polarizes locally (in general).
Shannon and Channel Capacity
Shannon ['48]: For every (noisy) channel of communication, there exists a capacity $C$ s.t. communication at rate $R < C$ is possible and at $R > C$ is not. (Needs a lot of formalization; omitted.)
Rate: Encode $E : \mathbb{F}_2^{k_n} \to \mathbb{F}_2^n$ with $\lim_{n \to \infty} k_n/n \ge R$.
Example: the binary symmetric channel $BSC(p)$ acts independently on bits, mapping $X \in \mathbb{F}_2$ to $X$ w.p. $1-p$ and to $1-X$ w.p. $p$.
Capacity $= 1 - H(p)$, where $H(p) = p \log \frac{1}{p} + (1-p) \log \frac{1}{1-p}$ is the binary entropy.
"Achieving" Channel Capacity
A commonly used phrase, with many mathematical interpretations. Some possibilities:
- How small can $n$ be? Shannon '48: $n = O((C-R)^{-2})$.
- Get $R > C - \epsilon$ with poly-time algorithms? [Luby et al.'95]. Forney '66: time $= \mathrm{poly}(n, 2^{1/\epsilon^2})$.
- Make the running time $\mathrm{poly}(n/\epsilon)$? Open till 2008.
Arikan'08: Invented "polar codes."
Final resolution: Guruswami-Xia'13, Hassani-Alishahi-Urbanke'13 (strong analysis).
Polar Codes and Polarization
Arikan: Defined polar codes, one for every $t$, with an associated martingale $X_0, \ldots, X_t, \ldots$. The $t$-th code has length $n = 2^t$.
If $\Pr[X_t \in (\gamma, 1-\delta)] \le \epsilon$, then the $t$-th code is $(\epsilon + \delta)$-close to capacity, and $\Pr[\mathrm{Decode}(\mathrm{Encode}(m) + \mathrm{error}) \ne m] \le n \cdot \gamma$.
Need $\gamma = o(1/n)$ (strong polarization ⇒ $\gamma = \mathrm{neg}(n)$).
Need $\epsilon = 1/n^{\Omega(1)}$ (⇐ strong polarization).
Context for Today's Talk
[Arikan'08]: A martingale associated to a general class of codes. If the martingale polarizes (weakly), the code achieves capacity (weakly); and the martingale polarizes weakly for the entire class.
[GX'13, HAU'13]: Strong polarization, but for one specific code. Why so specific? Bottleneck: strong polarization.
Rest of the talk: polar codes, the martingale, and strong polarization.
Lesson 0: Compression ⇒ Coding
Linear compression scheme: $(H, D)$ is a compression scheme for $\mathrm{Bern}(p)^n$ if
- $H : \mathbb{F}_2^n \to \mathbb{F}_2^m$ is a linear map, and
- $\Pr_{Z \sim \mathrm{Bern}(p)^n}[D(H \cdot Z) \ne Z] = o(1)$.
Hopefully $m/n \le H(p) + \epsilon$.
Compression ⇒ coding: Let $H^\perp$ be such that $H \cdot H^\perp = 0$.
- Encoder: $X \mapsto H^\perp \cdot X$.
- Error corrector: $Y = H^\perp \cdot X + \eta \mapsto Y - D(H \cdot Y)$, which equals $H^\perp \cdot X$ w.p. $1 - o(1)$ (see the derivation below).
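Filling in the one-line check behind the error corrector (assuming the channel noise $\eta \sim \mathrm{Bern}(p)^n$, so the compression guarantee applies to it):

```latex
H \cdot Y = H \cdot (H^{\perp} X + \eta) = H \cdot \eta
\;\Longrightarrow\;
D(H \cdot Y) = D(H \cdot \eta) = \eta \ \text{w.p.}\ 1 - o(1)
\;\Longrightarrow\;
Y - D(H \cdot Y) = H^{\perp} X .
```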
Question: How to Compress?
Arikan's key idea: Start with the $2 \times 2$ "polarization transform" $(U, V) \to (U+V, V)$.
Invertible, so it does nothing? No: if $U, V$ are independent, then
- $U + V$ is "more random" than either, and
- $V \mid U+V$ is "less random" than either.
Iterate (ignoring conditioning): end with bits that are almost random, or almost determined (by the others). Output the "totally random part" to get compression!
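A quick numeric sanity check (not from the talk; the bias 0.11 and all names are mine) that one $2 \times 2$ step conserves total entropy while spreading it apart:

```python
from math import log2

def H(p):
    """Binary entropy in bits; H(0) = H(1) = 0 by convention."""
    if p in (0.0, 1.0):
        return 0.0
    return p * log2(1 / p) + (1 - p) * log2(1 / (1 - p))

p = 0.11                       # an arbitrary illustrative bias
q = 2 * p - 2 * p * p          # U + V ~ Bern(q) for independent U, V ~ Bern(p)
h_plus = H(q)                  # H(U+V): the "more random" bit
h_minus = 2 * H(p) - h_plus    # H(V | U+V), via H(U+V, V) = H(U, V) = 2 H(p)

print(f"H(p)     = {H(p):.4f}")
print(f"H(U+V)   = {h_plus:.4f}   (> H(p): more random)")
print(f"H(V|U+V) = {h_minus:.4f}   (< H(p): less random)")
assert h_minus < H(p) < h_plus   # the single-step polarization claim
```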
Information Theory Basics
Entropy: For a random variable $X \in \Omega$, $H(X) = \sum_{x \in \Omega} p_x \log \frac{1}{p_x}$, where $p_x = \Pr[X = x]$.
Bounds: $0 \le H(X) \le \log |\Omega|$.
Conditional entropy: For jointly distributed $(X, Y)$, $H(X \mid Y) = \sum_y p_y \, H(X \mid Y = y)$.
Chain rule: $H(X, Y) = H(X) + H(Y \mid X)$.
Conditioning does not increase entropy: $H(X \mid Y) \le H(X)$, with equality ⇔ $X \perp Y$.
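A small Python sketch (the joint distribution is an arbitrary illustration of mine) verifying the chain rule and the conditioning inequality numerically:

```python
from math import log2
from collections import defaultdict

def entropy(dist):
    """Shannon entropy (bits) of a {outcome: probability} dict."""
    return sum(p * log2(1 / p) for p in dist.values() if p > 0)

# A small joint distribution over (X, Y), chosen arbitrarily for illustration.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

pX, pY = defaultdict(float), defaultdict(float)
for (x, y), p in joint.items():
    pX[x] += p
    pY[y] += p

# H(Y | X) = sum_x p_x * H(Y | X = x)
HY_given_X = sum(
    pX[x] * entropy({y: joint[(x, y)] / pX[x] for y in pY if (x, y) in joint})
    for x in pX
)

print(f"H(X,Y)        = {entropy(joint):.4f}")
print(f"H(X) + H(Y|X) = {entropy(pX) + HY_given_X:.4f}")   # chain rule
print(f"H(Y|X) = {HY_given_X:.4f} <= H(Y) = {entropy(pY):.4f}")  # conditioning
```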
Iteration by Example
$(A, B, C, D) \to (A+B+C+D,\ B+D,\ C+D,\ D)$
$(E, F, G, H) \to (E+F+G+H,\ F+H,\ G+H,\ H)$
The next level pairs the two blocks (e.g., $B+D$ with $F+H$, producing $B+D+F+H$), and the relevant conditional entropies split accordingly:
$H(B+D \mid A+B+C+D)$ and $H(F+H \mid E+F+G+H)$ become
$H(B+D+F+H \mid A+B+C+D,\ E+F+G+H)$ and
$H(F+H \mid A+B+C+D,\ E+F+G+H,\ B+D+F+H)$.
The Polarization Butterfly
$P_n(U, V) = (P_{n/2}(U+V),\ P_{n/2}(V))$, where $U = (Z_1, \ldots, Z_{n/2})$ and $V = (Z_{n/2+1}, \ldots, Z_n)$: the first level computes $Z_1 + Z_{n/2+1},\ Z_2 + Z_{n/2+2},\ \ldots,\ Z_{n/2} + Z_n$ alongside $Z_{n/2+1}, \ldots, Z_n$, and the outputs are $W_1, \ldots, W_n$.
Polarization ⇒ $\exists S \subseteq [n]$ with $H(W_{[n] \setminus S} \mid W_S) \to 0$ and $|S| \le (H(p) + \epsilon) \cdot n$.
Compression: $E(Z) = (P_n(Z))_S$. Encoding time $= O(n \log n)$ (given $S$); a recursive sketch follows below.
To be shown: 1. Decoding in $O(n \log n)$. 2. Polarization!
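A minimal recursive sketch of the transform (the function name and bit-list representation are mine); each level does $O(n)$ XORs over $\log n$ levels, giving $O(n \log n)$:

```python
def polar_transform(z):
    """Apply P_n (the butterfly) to a bit list z, with |z| a power of 2.
    P_n(U, V) = (P_{n/2}(U + V), P_{n/2}(V))."""
    n = len(z)
    if n == 1:
        return z[:]
    half = n // 2
    u, v = z[:half], z[half:]
    plus = [a ^ b for a, b in zip(u, v)]   # U + V over F_2
    return polar_transform(plus) + polar_transform(v)

# Compression would output only the coordinates in S: E(Z) = P_n(Z) restricted to S.
w = polar_transform([1, 0, 1, 1, 0, 0, 1, 0])
print(w)
```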
The Polarization Butterfly: Decoding
Decoding idea: Given $S$ and $W_S$ where $W = P_n(U, V)$:
- First recover $U + V \sim \mathrm{Bern}(2p - 2p^2)^{n/2}$ recursively.
- Then determine $V$. But $V \mid U+V \sim \prod_i \mathrm{Bern}(r_i)$: a non-identical product distribution.
Key idea: Stronger induction, handling arbitrary product distributions!
Decoding algorithm $D(W_S, (p_1, \ldots, p_n))$:
1. Compute $(q_1, \ldots, q_{n/2}) \leftarrow (p_1, \ldots, p_n)$.
2. Determine $U + V = D(W_S^+, (q_1, \ldots, q_{n/2}))$.
3. Compute $(r_1, \ldots, r_{n/2}) \leftarrow (p_1, \ldots, p_n;\ U+V)$.
4. Determine $V = D(W_S^-, (r_1, \ldots, r_{n/2}))$,
where $W_S^+$ and $W_S^-$ denote the stored coordinates falling in the first and second halves.
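A hedged Python rendering of this recursion for the compression setting (the dict representation of $W_S$, the ML guess at the base case, and all names are mine, not the talk's):

```python
def sc_decode(w, S, priors, offset=0):
    """Successive-cancellation decoding sketch: recover Z from the stored
    coordinates W_S of W = P_n(Z), where Z_i ~ Bern(priors[i]) independently.
    `w` maps absolute output index -> stored bit (only indices in S present)."""
    n = len(priors)
    if n == 1:
        if offset in S:
            return [w[offset]]
        return [1 if priors[0] > 0.5 else 0]    # ML guess for a dropped bit
    half = n // 2
    pu, pv = priors[:half], priors[half:]
    # Step 1: priors for U+V, q_i = Pr[U_i + V_i = 1].
    q = [a * (1 - b) + (1 - a) * b for a, b in zip(pu, pv)]
    # Step 2: decode U+V from the first half of W.
    s = sc_decode(w, S, q, offset)
    # Step 3: posteriors r_i = Pr[V_i = 1 | U_i + V_i = s_i]:
    # a non-identical product distribution, hence the stronger induction.
    r = []
    for a, b, si in zip(pu, pv, s):
        if si == 0:
            num, den = a * b, a * b + (1 - a) * (1 - b)
        else:
            num, den = (1 - a) * b, (1 - a) * b + a * (1 - b)
        r.append(num / den)
    # Step 4: decode V from the second half, then recover U = (U+V) + V.
    v = sc_decode(w, S, r, offset + half)
    u = [si ^ vi for si, vi in zip(s, v)]
    return u + v
```

Each level does $O(n)$ arithmetic and recurses on two halves, so the total time is $O(n \log n)$, matching the claim above.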
Polarization? Martingale?
Idea: Follow $X_t = H(Z_{i,t} \mid Z_{<i,t})$ for a random index $i$, where $Z_{i,t}$ is the $i$-th bit after $t$ levels of the butterfly: $Z_{i,0}, Z_{i,1}, \ldots, Z_{i,t}, \ldots, Z_{i,\log n}$, with $Z_{i,0} = Z_i$ and $Z_{i,\log n} = W_i$.
Martingale? $X_t$ can't distinguish $i = j$ from $i = k$ when $j, k$ are paired at step $t+1$, and $H(Z_{j,t}, Z_{k,t}) = H(Z_{j,t+1}, Z_{k,t+1})$ since the step is invertible.
Chain rule ⇒ $\mathbb{E}[X_{t+1} \mid X_t] = X_t$.
Remaining Tasks
- Prove that $X_t \coloneqq H(Z_{i,t} \mid Z_{<i,t})$ polarizes locally.
- Prove that local polarization ⇒ strong polarization.
Recall strong polarization (informally): $\Pr[X_t \in (\tau, 1-\tau)] \le \epsilon$ once $t \ge \max\{O(\log 1/\epsilon),\ o(\log 1/\tau)\}$; formally, $\forall \gamma > 0\ \exists \beta < 1, c$ s.t. $\forall t\ \Pr[X_t \in (\gamma^t, 1-\gamma^t)] \le c \cdot \beta^t$.
Recall local polarization:
- Variance in the middle: $\forall \tau > 0\ \exists \sigma > 0$ s.t. $\forall t$, $X_t \in (\tau, 1-\tau) \Rightarrow \mathrm{Var}[X_{t+1} \mid X_t] \ge \sigma$.
- Suction at the ends: $\exists \theta > 0$ s.t. $\forall c < \infty\ \exists \tau > 0$ s.t. $X_t < \tau \Rightarrow \Pr[X_{t+1} < X_t/c] \ge \theta$ (similarly at the high end).
Local ⇒ (Global) Strong Polarization
Step 1, medium polarization: $\exists \alpha, \beta < 1$ s.t. $\Pr[X_t \in (\beta^t, 1-\beta^t)] \le \alpha^t$.
Proof: Use the potential $\Phi_t = \mathbb{E}[\min\{\sqrt{X_t}, \sqrt{1-X_t}\}]$.
Variance + suction ⇒ $\exists \eta < 1$ s.t. $\Phi_t \le \eta \cdot \Phi_{t-1}$, so $\Phi_t \le \eta^t$, and hence $\Pr[X_t \in (\beta^t, 1-\beta^t)] \le (\eta/\sqrt{\beta})^t$.
(Aside: Why $\sqrt{X_t}$? Clearly $X_t$ itself won't work, and $X_t^2$ increases!)
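A Monte Carlo illustration of Step 1 on the earlier toy martingale (my toy example again, not the Arikan martingale): the estimated ratios $\Phi_t / \Phi_{t-1}$ stay bounded below 1, as the contraction claim requires.

```python
import random

def step(x):
    # Toy locally polarizing update from the earlier sketch:
    # X -> X^2 or 2X - X^2, each w.p. 1/2.
    return x * x if random.random() < 0.5 else 2 * x - x * x

def potential(t, trials=200_000):
    """Monte Carlo estimate of Phi_t = E[min(sqrt(X_t), sqrt(1 - X_t))]."""
    total = 0.0
    for _ in range(trials):
        x = 0.5
        for _ in range(t):
            x = step(x)
        total += min(x, 1 - x) ** 0.5
    return total / trials

prev = potential(0)
for t in range(1, 8):
    cur = potential(t)
    print(f"t={t}: Phi_t ~= {cur:.4f}, Phi_t/Phi_(t-1) ~= {cur / prev:.3f}")
    prev = cur   # ratios staying below some eta < 1 illustrate Step 1
```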
Local ⇒ (Global) Strong Polarization (contd.)
Step 1, medium polarization: $\exists \alpha, \beta < 1$ s.t. $\Pr[X_t \in (\beta^t, 1-\beta^t)] \le \alpha^t$.
Step 2: Medium polarization + $t$ more steps ⇒ strong polarization.
Proof: Assume $X_0 \le \alpha^t$; want $X_t \le \gamma^t$ w.p. $1 - O(\alpha_1^t)$ for some $\alpha_1 < 1$. Pick $c$ large and $\tau$ s.t. $\Pr[X_i < X_{i-1}/c \mid X_{i-1} \le \tau] \ge \theta$.
2.1: $\Pr[\exists i$ s.t. $X_i \ge \tau] \le \tau^{-1} \cdot \alpha^t$ (Doob's inequality).
2.2: Let $Y_i = \log X_i$. Assuming $X_i \le \tau$ for all $i$: $Y_i - Y_{i-1} \le -\log c$ w.p. $\ge \theta$, and $\ge k$ w.p. $\le 2^{-k}$. So $\mathbb{E}[Y_t] \le -t(\theta \log c - 1) \le 2t \log \gamma$ for $c$ large enough. Concentration: $\Pr[Y_t \ge t \log \gamma] \le \alpha_2^t$. QED!
Polarization of the Arikan Martingale
Ignoring conditioning: $X_t = H(p)$ ⇒ $X_{t+1} = H(2p - 2p^2)$ w.p. $\frac12$, and $X_{t+1} = 2H(p) - H(2p - 2p^2)$ w.p. $\frac12$.
Variance in the middle: continuity of $H(\cdot)$ and the separation of $2p - 2p^2$ from $p$.
Suction at the high end: $X_t = 1 - \epsilon$ ⇒ $X_{t+1} = 1 - O(\epsilon^2)$ w.p. $\frac12$.
Suction at the low end: $X_t = H(p)$ ⇒ $X_{t+1} \approx H(p)/\log(1/p)$ w.p. $\frac12$ (so $X_{t+1} < X_t/c$ once $p$ is small enough).
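A simulation sketch of this conditioning-ignored martingale (the bisection inversion of $H$, the bias 0.11, and $\gamma = 0.8$ are my own devices for tracking the underlying bias):

```python
import random
from math import log2

def H(p):
    """Binary entropy in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

def Hinv(h):
    """Invert H on [0, 1/2] by bisection (H is increasing there)."""
    lo, hi = 0.0, 0.5
    for _ in range(60):
        mid = (lo + hi) / 2
        if H(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def arikan_step(p):
    # "Plus" branch: X_{t+1} = H(2p - 2p^2); "minus" branch: 2H(p) - H(2p - 2p^2).
    q = 2 * p - 2 * p * p
    if random.random() < 0.5:
        return q
    return Hinv(max(0.0, 2 * H(p) - H(q)))

random.seed(0)
t, trials, gamma = 20, 10_000, 0.8
stuck = 0
for _ in range(trials):
    p = 0.11
    for _ in range(t):
        p = arikan_step(p)
    if gamma ** t < H(p) < 1 - gamma ** t:
        stuck += 1
print(f"Pr[X_t in (gamma^t, 1-gamma^t)] ~= {stuck / trials:.2e}")
```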
General Strong Polarization?
So far: $P_n = H_2^{\otimes \ell}$ for $H_2 = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}$.
What about other matrices? E.g., $P_n = H_3^{\otimes \ell}$ for $H_3 = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$. Open till our work!
Necessary conditions: $H$ invertible, not lower triangular.
Our work: The necessary conditions suffice (for local polarization)!
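A small sketch (helper names mine) of building the tensor-power transform $H_3^{\otimes \ell}$ over $\mathbb{F}_2$ with Kronecker products:

```python
import numpy as np

H3 = np.array([[1, 1, 0],
               [1, 0, 0],
               [0, 0, 1]], dtype=np.uint8)   # the 3x3 kernel from the slide

def tensor_power(M, ell):
    """Compute M^{tensor ell} over F_2 via repeated Kronecker products."""
    P = np.array([[1]], dtype=np.uint8)
    for _ in range(ell):
        P = np.kron(P, M) % 2
    return P

P9 = tensor_power(H3, 2)                      # a 9 x 9 transform
z = np.random.randint(0, 2, size=9, dtype=np.uint8)
w = P9 @ z % 2                                # apply the transform to a sample Z
print(P9)
print(w)
```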
Conclusion
A simple, general analysis of strong polarization.
But the main reason to study general polarization: maybe some matrices polarize even faster! This analysis does not get us there, but hopefully following the (simpler) steps in specific cases can lead to improvements.
Thank You!