Pseudo-derandomizing learning and approximation


Pseudo-derandomizing learning and approximation Igor Carboni Oliveira Oxford / Simons Institute RANDOM 2018 Princeton, 20/August/2018 Joint work with Rahul Santhanam

Randomness is a powerful resource in computation: cryptography, distributed computing, property testing, etc. But it comes with issues:
- Truly random bits might not be available.
- Randomness introduces uncertainty.

However, the power of randomness in computation remains open in general:
- Is BPP = P?
- Can we deterministically construct a hard Boolean function h in time poly(|h|)?
There has been progress in derandomization for weaker models: bounded space, small-depth circuits, concrete combinatorial settings, etc. In many cases randomness can be eliminated; primality testing is an example. But we seem to be a long way from showing that BPP is contained in P…

Given this state of affairs: even if BPP = P, eliminating randomness might still cause significant overhead.
1. Is there a way to exploit the benefits of randomness while reducing its shortcomings (such as uncertainty in the output)?
2. Can we prove some form of unconditional derandomization in strong computational models?

Pseudodeterministic Algorithms (Gat-Goldwasser, 2011)
- Randomized algorithms for search problems that output a canonical solution with high probability.
- Algorithms that exploit randomness, yet have little to no uncertainty in the output.
Example: Given 1^n, output a prime on n bits. Easy with randomness… but how to output a canonical prime?
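To make the example concrete, here is a minimal illustrative sketch (not the construction from the talk): scan candidates in a fixed order and output the first one that passes a randomized primality test. Miller-Rabin never rejects a prime, and accepts a composite with probability at most 4^(-rounds), so w.h.p. the output is the smallest n-bit prime, a canonical answer, even though the algorithm tosses coins. (This says nothing about the running time guarantees studied in the literature; it only illustrates "randomized, but canonical output".)

```python
import random

def is_probable_prime(n, rounds=40):
    """Miller-Rabin: primes always pass; a composite passes all
    rounds with probability at most 4**(-rounds)."""
    small_primes = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37)
    if n < 2:
        return False
    for p in small_primes:
        if n % p == 0:
            return n == p
    d, r = n - 1, 0
    while d % 2 == 0:
        d //= 2
        r += 1
    for _ in range(rounds):
        a = random.randrange(2, n - 1)
        x = pow(a, d, n)
        if x in (1, n - 1):
            continue
        for _ in range(r - 1):
            x = pow(x, 2, n)
            if x == n - 1:
                break
        else:
            return False
    return True

def canonical_prime(n_bits):
    """Scan from 2**(n_bits - 1) upward; w.h.p. the output is the
    smallest n_bits-bit prime, hence canonical across runs."""
    candidate = 1 << (n_bits - 1)
    while True:
        if is_probable_prime(candidate):
            return candidate
        candidate += 1
```

For instance, `canonical_prime(8)` returns 131 (the smallest 8-bit prime) with overwhelming probability, regardless of the internal coin tosses.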

Previous work - Pseudodeterministic algorithms have been investigated in a variety of settings:
- Polynomial-time computation: [OS17], [Hol17]
- Parallel computation: the bipartite matching problem [GG17]
- Bounded-space computation: [GL18]
- Sub-linear time and query complexity: [GGR13]
- Computational number theory: primitive roots [Gro15], quadratic non-residues [GG11]
- Proof systems: [GGH17]

“Pseudo-derandomization”: the relaxed goal of turning randomized algorithms into pseudodeterministic ones (randomized, but with canonical output). Unconditional pseudo-derandomization is possible even in strong computational models:
Theorem [O-Santhanam, 2017]. Let Q be a dense language in P. There is an algorithm A running in sub-exponential time such that, on infinitely many values of n, A(1^n) outputs a canonical n-bit string w_n in Q.

This work: Pseudo-derandomization in Learning and Approximation

Learning Theory. We consider learning under the uniform distribution with membership queries. Fix a circuit class C. A learning algorithm for C is given oracle access to an unknown function f in C. It should output, w.h.p. over its internal randomness, a hypothesis h that approximates f well under the uniform distribution. The majority of algorithms designed in this model are randomized… but there had been no systematic investigation of the power of randomness in learning theory before this work.
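A small sketch of where the randomness typically enters in this model (function and hypothesis names here are illustrative): the learner queries the membership oracle on uniformly random points, e.g. to estimate how well a candidate hypothesis agrees with the target under the uniform distribution.

```python
import random

def estimate_error(f, h, n, samples=1000):
    """Estimate Pr_{x ~ uniform over {0,1}^n}[f(x) != h(x)] by
    querying the membership oracle f on random points -- a typical
    use of a learner's internal coins."""
    disagreements = 0
    for _ in range(samples):
        x = tuple(random.randrange(2) for _ in range(n))
        if f(x) != h(x):
            disagreements += 1
    return disagreements / samples
```

By a Chernoff bound, O(1/eps^2 · log(1/delta)) samples estimate the error to within eps with probability 1 - delta; but different runs query different points, which is exactly the behavior pseudodeterministic learning constrains.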

Some learning algorithms:
- DNF: randomized learning in poly time [Jackson, 1997]
- AC0: randomized learning in quasi-poly time [LMN, 1993]; deterministic learning in quasi-poly time [Sitharam, 1995]
- AC0[p]: randomized learning in quasi-poly time [CIKK, 2016]

Pseudodeterministic Learning. In addition, we require that the randomized learner “behaves” as a deterministic learner:
- w.r.t. the output: for every f in C, there is a canonical hypothesis h_f (a circuit) that is output w.h.p.
- w.r.t. the input access: for every f in C, there is a canonical set Q_f of queries that is made w.h.p.
Running the learner again is likely to access the same data and to generate the same hypothesis.

Conditional pseudo-derandomization of learning

First result: the observation that standard assumptions suffice to (pseudo-)derandomize learning algorithms. This is not so obvious, since PRGs are not originally designed to operate in the presence of oracles (the unknown function to be learned).
Proposition 1 (Informal). Suppose functions in a circuit class C can be learned by a randomized algorithm running in time t(n).
- If E requires circuits of exponential size almost everywhere, then C can be deterministically learned in time poly(t(n)).
- If BPE requires circuits of exponential size almost everywhere, then C can be pseudo-deterministically learned in time poly(t(n)).
Corollaries: Jackson’s algorithm for learning DNFs and the CIKK algorithm for learning AC0[p] can be conditionally derandomized.
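The generic shape of such a derandomization can be sketched as follows (a toy illustration, not the construction behind Proposition 1): replace the algorithm’s random tape by G(s) for every short seed s and output the plurality answer. If the assumed generator G fools the algorithm, the plurality answer agrees with the w.h.p. answer of the original randomized algorithm.

```python
from collections import Counter
from itertools import product

def derandomize(randomized_alg, seed_len, expand):
    """Run the algorithm on expand(s) for every seed s of length
    seed_len and return the plurality outcome.  `expand` plays the
    role of the (assumed) PRG; here it is just a stand-in."""
    outcomes = Counter()
    for seed in product((0, 1), repeat=seed_len):
        outcomes[randomized_alg(expand(seed))] += 1
    return outcomes.most_common(1)[0][0]
```

The cost is a 2^(seed_len) slowdown, which is why a PRG with logarithmic seed length yields only a polynomial overhead, i.e. the poly(t(n)) bound in Proposition 1.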

Unconditional pseudo-derandomization

Current techniques allow us to show an unconditional result… but only for circuit classes that are unlikely to admit learning algorithms.
Theorem (Informal). Suppose Circuit[poly] can be learned by a randomized algorithm running in quasi-polynomial time (similarly to AC0[p], for instance). Then Circuit[poly] can be learned pseudo-deterministically in sub-exponential time exp(n^{o(1)}) on infinitely many input lengths.
- The result holds for any self-learnable circuit class: one whose learning algorithm is implementable by (larger) circuits from the same class.
- Our argument requires the self-learnable class to contain TC0.

Comments about the proof:
- Employs a powerful theory showing that learning algorithms for C imply lower bounds in BPTIME[·] against C. [FK06], [KKO13], [OS17]
- Such worst-case lower bounds are then amplified to average-case lower bounds using standard techniques (requires TC0 contained in C).
- For self-learnable classes, one can construct (pseudodeterministic) PRGs that are powerful enough to pseudo-derandomize the learning computation (Proposition 1).

Previous results: “pseudorandomness derandomizes learning”. Now the converse direction: learning implies pseudorandomness.

Learning implies HSGs.
Theorem. Suppose a class C can be deterministically learned in time T(n) to error < 1/4. Then there is a uniform hitting set generator that hits every 1/2-dense function in C.
Observations:
- Works for all circuit classes, including DNF.
- Extends to pseudo-deterministic learning, which yields pseudo-deterministic HSGs.
Therefore: the conditional derandomizations of Jackson’s algorithm for DNF and of CIKK for AC0[p] cannot be made unconditional without improving the state of the art in pseudorandomness.
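To keep the object concrete: a hitting set for a family of 1/2-dense functions is just a small set of points such that every function in the family evaluates to 1 on at least one of them. The brute-force greedy construction below (purely illustrative; the theorem above gives a *uniform generator*, not a greedy search) shows what the generator must achieve.

```python
def greedy_hitting_set(tables, n):
    """Greedily pick input points (as truth-table indices in
    {0, ..., 2**n - 1}) until every function in `tables` (given as a
    length-2**n truth table) is 1 on some chosen point."""
    m = 1 << n
    remaining = list(tables)
    hs = []
    while remaining:
        # pick the point that hits the most still-unhit functions
        best = max(range(m), key=lambda i: sum(t[i] for t in remaining))
        hs.append(best)
        remaining = [t for t in remaining if t[best] == 0]
    return hs
```

On the family of all 1/2-dense functions over {0,1}^2 this yields a hitting set of size 3, matching the obvious lower bound (any two points miss the dense function supported exactly on their complement).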

Summary of learning results

Pseudo-derandomizing Approximate Counting

Approximate Counting. We are interested in integer-valued functions f. Example: counting the number of solutions to a CSP.
Q. Does the existence of an efficient randomized approximation scheme for f imply the existence of a somewhat efficient pseudodeterministic approximation scheme?
This is different from the previous settings: given a candidate value v, we can’t efficiently check whether v is the correct number of solutions, i.e. f could be very hard to compute (as opposed to generating a canonical prime, where candidates can be tested).

Example: [JSV04] The (0,1)-Permanent has a randomized approximation scheme running in polynomial time.
Our contribution: we establish a generic pseudo-derandomization of randomized approximation schemes.
Caveat: the pseudo-derandomization is only guaranteed to work on infinitely many input lengths, and w.h.p. over any poly-time samplable distribution of inputs.
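One classical ingredient for making an approximation scheme output a canonical value can be sketched as follows (a hedged toy version; the names and the exact amplification are illustrative, not the paper’s construction): amplify the scheme’s accuracy, take a median, and round multiplicatively to a coarse grid. Whenever the true value is not too close to a grid boundary, every run rounds to the same grid point, so the output is canonical w.h.p.

```python
import math

def pseudodet_approx(scheme, x, eps):
    """Illustrative rounding trick: run the randomized scheme
    `scheme(x, eps')` at higher accuracy eps' = eps/10, take the
    median of 41 runs, then snap to the nearest power of (1 + eps).
    Away from grid boundaries, the snapped value is run-independent."""
    runs = sorted(scheme(x, eps / 10) for _ in range(41))
    med = runs[len(runs) // 2]
    k = round(math.log(med, 1 + eps))   # nearest power of (1 + eps)
    return (1 + eps) ** k
```

The failure mode is visible in the code: if the true value sits near the midpoint between two grid powers, different runs may snap to different points, which is one reason generic pseudo-derandomization here requires more work (and the caveats stated above).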

An application of our results: Approximate Canonization of AC0[p]

Approximate Canonization. (Exact) canonization of Boolean circuits (informal): let C be a circuit class; given a circuit D from C computing a function g, output a canonical form of D that depends only on g. This cannot be done efficiently even for C = DNF, unless P = NP. In some applications, outputting a canonical circuit that 1/poly(n)-approximates g is sufficient. We consider randomized approximate canonizers, which output a fixed circuit with high probability.
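Note that canonization is trivial information-theoretically: mapping a circuit to its truth table sends any two circuits computing the same function to the same object. The whole question is efficiency, since the truth table has size 2^n. A two-line sketch makes this explicit:

```python
from itertools import product

def canonize_by_truth_table(circuit, n):
    """Exponential-time canonizer: evaluate the circuit on all 2**n
    inputs.  Syntactically different circuits for the same function g
    map to the same tuple -- canonical, but hopelessly inefficient."""
    return tuple(circuit(x) for x in product((0, 1), repeat=n))
```

For example, `lambda x: x[0] and x[1]` and `lambda x: not (not x[0] or not x[1])` are syntactically different but produce the same canonical form.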

Some open problems:
- Approximate canonization: an unconditional sub-exponential time algorithm for AC0[p]?
- Pseudodeterministic pseudorandomness: better PRGs/HSGs in the pseudodeterministic setting for classes such as DNF? (given a seed, produce w.h.p. a fixed output string)
- Pseudo-derandomization of learning: is it possible to pseudo-derandomize the [CIKK16] learning algorithm?
- Circuit complexity of learning: can we learn AC0[p] in quasi-polynomial size TC0?

Thank you! Pseudo-derandomizing learning and approximation RANDOM 2018 Princeton, 20/August/2018