# Randomized Computation

## Presentation on theme: "Randomized Computation"— Presentation transcript:

Randomized Computation
Slides submitted by Igor Nor , Alex Pomeransky and Shimon Pomeransky Adapted from Ely Porat’s course lecture

Randomized computation
Extend the notion of “efficient computation” by allowing Turing machines to toss coins and run no more than a fixed polynomial in the input length. We consider a language that can be recognized by a polynomial time probabilistic TM (Turing Machine) with good probability as relatively easy.

Randomized computation class types
One-sided error machines (RP, co-RP) Two-sided error machines (BPP) Zero error machines (ZPP) Two-sided error machines without gap restriction (PP) Randomized space complexity classes (RSPACE, badRSPACE)

Random vs. Non-deterministic computations
In an NDTM (Non Deterministic Turing Machine), one accepting computation is enough to cause the input to be accepted. In the randomized model - we require the probability of getting an accepting computation to be larger than a certain threshold to cause the input be accepted.

Two approaches of Randomized computation
There are two ways to define randomized computation: Online approach – enable us to enter randomized steps Offline approach – enables us to use an additional randomizing input and evaluate the output on such random input.

Online approach NDTM: d: (<state>,<symbol>,<guess>) ® (<state>,<move>,<symbol>) Randomized TM: d: (<state>,<symbol>,<toss>) ® (<state>,<move>,<symbol>) The main difference between these two machines is at definition of the accepted language: In NDTM is required to have at least one accepting computation for the input to be accepted. In Randomized TM the probability of accepting computation is a cause for the input to be accepted

Offline approach Another way to look at NDTM is to give him an additional guess input for a witness. In an offline approach of randomized TM there also an additional guess input for the outcome of coin tosses, which can be imagined as the output from an external ‘coin tossing device’.

Offline approach (cont)
A language LNP iff there exists a DTM M and a polynomial p(), s.t. for every xL \$y |y|£p(|x|) and M(x,y) accepts. The DTM has access to a “witness” string of polynomial length. Similarly, a random TM has access to a “guess” input string. But there must be sufficient such witnesses

The classes RP and coRP Definition (RP): A language LRP, iff there exists a probabilistic TM M, such that xL  Prob[M(x)=1] ³ ½ xL  Prob[M(x)=1] = 0 Definition (coRP): A language LcoRP, iff there exists a probabilistic TM exists M, such that xL  Prob[M(x)=1] = 1 xL  Prob[M(x)=0] ³ ½ These classes complement each other: coRP = {~L: LRP }

Comparing RP with NP Let L be a language in NP or in RP.
Let RL be the relation defining the witness/guess for L for a certain TM. Then the definition for NP and RP are: L  NP L  RP xL  \$y s.t. (x,y)RL xL  Prob[(x,r)RL] ³ ½ xL  "y (x,y)RL xL  "r (x,r)RL

Explanations

Claim: RPNP Let L be an arbitrary language in RP. If x  L then there exist a TM M and a coin-tosses y such that M(x,y)=1 in more than ½ coin tosses. We can use the same coin toss y as a witness. If x  L then Probr[M(x,r) = 1] = 0, so there is no witness. ∎

Explanations

Amplification The constant ½ in the definition of RP is an arbitrary.
If we have a probabilistic TM that accepts xL with probability p<½, we can run this TM several times to “amplify” the probability. If xL and we run it n times, then the probability that none of these accepts is Prob[Mn(x)=1] = 1-Prob[Mn(x)¹1] = = 1-Prob[M(x)¹1]n = = 1-(1-Prob[M(x)=1])n = 1-(1-p)n If xL, all runs will return 0.

Explanations

Alternative Definitions for RP (RP1 and RP2)
Definition (RP1): LRP1 iff \$ probabilistic Poly-time TM M and a polynomial p(), s.t. xL  Prob[M(x)=1] ³ xL  Prob[M(x)=1] = 0 Definition (RP2): LRP2 iff \$ probabilistic Poly-time TM M and a polynomial p(), s.t. xL  Prob[M(x)=1] ³ 1-2-p(|x|) xL  Prob[M(x)=1] = 0

Explanations The definitions for RP1 and RP2 seem to be very far from each other, because: RP1 answers correctly with a very small probability (but not negligible) RP2 can almost ignore the probability of it’s mistake. However these two definitions actually define the same class. This implies that having algorithm with a noticeable probability of success implies existence of an efficient algorithm with negligible probability of error.

Lemma: RP1=RP2 RP2RP1 – this direction is trivial.
RP1RP2 – the method of amplification is used for proving this. Suppose LRP1: then there exists M1 s.t. "xL: Prob[M1(x,r)=1] ³ 1/p(|x|) If M2 runs M1 t(|x|) times, then Prob[M2(x,r)=0] £ (1-1/p(|x|))t(|x|) Solving equation (1-1/p(|x|))t(|x|) = 1-2-p(|x|) gives us that t(|x|) ³ p(|x|)2/log2e is polynomial to |x|. Hence, M2 runs in polynomial time and thus LRP2  RP1RP ∎

Explanations RP2RP1: if |x| is long enough, then 1-2-p(|x|) ³ 1/p(|x|). So being in RP2 implies being in RP1 for almost all inputs. Claim: RP= RP1 = RP2 RP2RP : if |x| is long enough, then 1-2-p(|x|) ³ ½. RPRP1 : if |x| is long enough, then ½³ 1/p(|x|). From these two claims, it follows that : RP2RP RP1 . It was proven beforehand that RP1 = RP2 and with RP2RP RP1 , we get that RP= RP1 = RP2 . ∎

The class BPP Definition (BPP): LBPP iff there exists a polynomial-time probabilistic TM M, such that "xL: Prob[M(x)=cL(x)] ³

Explanations We can say that RP is too strict because it ask that the machine has to give 100% correct answer for inputs that are not in the language. We derived the definition of RP from the definition of NP , but NP didn't reflect an actual computational model for search problems but rather a model for verification. But it seems that looking at a two­sided error is more appealing as a model for search problem computations. We want a machine that will recognize the language with high probability, where probability refers to the event “The machine answers correctly on an input x regardless if x  L or x  L”.

Explanations The BPP machine is a machine that makes mistakes but returns the correct answer most of the time. By running the machine a large number of times and returning the majority of the answers we are guaranteed by the law of large numbers that our mistake will be very small. The idea behind the BPP class is that M accept by majority with a noticeable gap between the probability to accept inputs that are in language and the probability to accept inputs that are not in the language, and it's running time is bounded by a polynomial.

Invariance of constant
The in the definition of BPP is, again, an arbitrary constant. Replacing the in the definition by any other constant greater than ½ does not change the class defined.

Explanations In the RP case we had two probability spaces that we could distinguish easily because we had a guarantee that if x  L then the probability to get one is zero, hence if you get M(x) = 1 for some input x, you could say for sure that x  L. In the BPP case, the amplification is less trivial because we have zeroes and ones in both probability spaces. The reason that we can apply amplification in the BPP case is that invoking the machine many times and counting how many times it returns one gives us an estimation on the fraction of ones in the whole probability space.

Invariance of constant(cont)
We consider the following change of constants: xL  Prob[M(x)=1] ³ p+e xL  Prob[M(x)=1] £ p-e for any given p  (0,1) and 0 < e < min{p,1-p}. We can build a machine M2 that runs M n times, and return the majority of results of M. After a constant number of executions (depending on p and e but not on x) we would get the desired probability.

Explanations

The weakest possible BPP definition(BPP1)
Definition (BPP1): LBPP1 iff there exists a polynomial-time computable function f:N®[0,1], a positive polynomial p() and a probabilistic polynomial-time TM M, such that: "xL  Prob[M(x)=1] ³ f(|x|) + 1/p(|x|) "xL  Prob[M(x)=1] £ f(|x|) - 1/p(|x|) Every BPP language satisfies the above condition.

Explanations If we set f(|x|)=½ and p(|x|)=6 we get the original definition. Original definition of BPP is: LBPP iff there exists a polynomial-time probabilistic TM M, such that "xL: Prob[M(x)=cL(x)] ³

Claim: BPP=BPP1 Proof: Let LBPP1 and let M be its computing TM.
Then, for any given input x, we look at the random variable M(x), which is a Bernoulli random variable with unknown parameter p = Exp[M(x)]. Using a well known fact that the expectation of a Bernoulli random variable is exactly the probability to get one we get that p = Prob[M(x) = 1]. So by estimating p we can say something about whether x  L or x  L. The most natural estimator is to take the mean of n samples of the random variable.

Explanations Bernoulli random variable
A Bernoulli trial is an experiment that can result in only one of two possible outcomes, labeled success and failure.  The Bernoulli random variable is defined as X=0 if a failure occurs and X=1 if a success occurs. If p = P(success) then E(X) = p and Var(X) = p(1-p). In our case we use a fact that the expectation of a Bernoulli random variable is exactly the probability to get one(success).

Explanations We will use the known statistical method of confidence intervals on the parameter p. The confidence interval method gives a bound within which a parameter is expected to lie with a certain probability. Interval estimation of a parameter is often useful in observing the accuracy of an estimator as well as in making statistical inferences about the parameter in question.

Claim: BPP=BPP1(cont) In our case we want to know with probability higher than if p  [0, f(|x|) - 1/p(|x|)] or p  [f(|x|) + 1/p(|x|), 1]. This is enough because if p  [0, f(|x|) - 1/p(|x|)] then x  L and if p  [f(|x|) + 1/p(|x|), 1] then x  L. Note that p  (f(|x|) - 1/p(|x|),f(|x|) + 1/p(|x|)) is impossible.

Explanations Because we assumed that LBPP1 then according to BPP1 definition : "xL  Prob[M(x)=1] ³ f(|x|) + 1/p(|x|) "xL  Prob[M(x)=1] £ f(|x|) - 1/p(|x|) We get that : p  (f(|x|) - 1/p(|x|),f(|x|) + 1/p(|x|)) is impossible.

Claim: BPP=BPP1(cont) Define M’(x)  invoke ti=M(x) "i=1…n, compute p=ti/n, if p>f(|x|) return YES else return NO. Notice p is exactly the mean of a sample of size n taken from the random variable M(x).

Explanations This machine does the normal statistical process of estimating a random variable by taking samples and using the mean as an estimator for the expectation. If we will be able to show that with an appropriate n the estimator will not fall too far from the real value with a good probability, it will follow that this machine answers correctly with the same probability. To resolve n we will use Chernoff's inequality.

Claim: BPP=BPP1(cont) Chernoff’s inequality:
If Xi are n independent Bernoulli random variables with the same expectation p£ ½, then "d: 0<d£p(p-1), we have

Claim: BPP=BPP1(cont) By setting and
we get that our Turing machine M’ will decide L(M) with probability greater than , suggesting that L(M)  BPP. ∎

The strongest possible BPP definition(BPP2)
Definition (BPP2): LBPP2 iff there exists a polynomial-time probabilistic TM M and a polynomial p(), such that "xL: Prob[M(x)=cL(x)] ³ 1-2-p(|x|).

Explanations If we set p(|x|)=2 we get the original definition, because ³ Hence, BPP2BPP.

Claim: BPP=BPP2 Let LBPP and let M be its computing TM.
We can amplify the probability of right answer by invoking M many times and taking the majority of it's answers. Define: M’(x)  invoke ti=M(x) "i=1…n, compute p=ti/n, if p > ½ return YES else return NO.

Claim: BPP=BPP2(cont) From the original definition, we get that if we know that Exp[M(x)] > ½ it follows that x  L and if we know that Exp[M(x)] < ½ it follows that x  L. But original definition gives us more. It says that the expectation of M(x) is bounded away from ½ so we can use the confidence interval method.

Explanations Original definition of BPP is:
LBPP iff there exists a polynomial-time probabilistic TM M, such that "xL: Prob[M(x)=cL(x)] ³ The confidence interval method gives a bound within which a parameter is expected to lie with a certain probability.

Claim: BPP=BPP2(cont) From Chernoff’s inequality we get that:
Prob[|M’(x)-Exp[M(x)]|£ ] ³ 1-2e-n/18 But if |M’(x)-Exp[M(x)]| < we get that the answer of M’ is correct, because it is close enough to the the expectation of M(x) which is guaranteed to be above when x  L and below when x  L.

Claim: BPP=BPP2(cont) So we get that: Prob[M’(x)=cL(x)] ³ 1-2e-n/18.
If we choose n=18ln(2)p(|x|) we get the desired probability. ∎

Explanations We see that a gap of 1/p(|x|) and a gap of 1- 2-p(|x|) which look like “weak” and “strong” versions of BPP are the same. As shown above the “weak” version is actually equivalent to the “strong” version, and both are equivalent to the original definition of BPP.

RP BPP – one side error is a special case of two sided error. It’s not known if BPP NP. It might be so but we don't get it from the definition like we did in RP. coBPP = BPP from the symmetry of the definition of BPP. And of course P  BPP.

The class PP Definition (PP): probabilistic TM M s.t.
LPP iff exists a polynomial-time probabilistic TM M s.t. xL  Prob[M(x)=1] > ½ xL  Prob[M(x)=1] < ½

Explanations The class PP is wider than what we have seen so far.
In the BPP case we had a gap between the number of accepting computations and non­accepting computations. This gap enabled us to determine with good probability if x  L or x  L. The gap was wide enough so we could invoke the machine polynomially many times and notice the difference between inputs that are in the language and inputs that are not in the language. The PP class don't put the gap restriction, hence the gap may be very small (even one guess can make a difference), so running the machine polynomially many times may not help.

Claim: PP  PSPACE Let LPP, M be the TM that recognizes L
and p(|x|) be the time bound. Define M’(x)  run M(x) using all p(|x|) possible coin tosses and decide by majority. M’ uses polynomial space for each of the invocations and answers correctly since M is a PP machine that answers correctly for more than half of the guesses.

The class PP1 Definition (PP1): probabilistic TM M s.t.
L PP1 iff exists a polynomial-time probabilistic TM M s.t. xL  Prob[M(x)=1] > ½ xL  Prob[M(x)=1] £ ½ Note: PP  PP1.

Explanations The idea of class PP1: since we cannot add the equal
value to both xL and xL cases (because it will not give us information – any coin flipping will satisfy the definition), we want to try it in xL case only , and to show that it will not change the class of languages. Obviously , PP  PP1 since if exists a machine that satisfies PP definition , it clearly satisfies PP1 definition.

Claim: PP1  PP Proof : and p(|x|) be the time bound.
Let LPP1, M be the TM that recognizes L and p(|x|) be the time bound. M’(x,(a1,a2,...,ap(|x|),b1,b2,...,bp(|x|)))  if ( a1=a2=...=ap(|x|)=1 ) return NO else return M(x,(b1,b2,...,bp(|x|)))

Explanations We define MM’ that choose one of two possible moves. One move, which happens with probability 2-p(|x|)+1 , will return NO. The second move , which happens with probability 1-2-p(|x|)+1 , will invoke M with independent coin tosses. Note: M' is exactly the same as M, except for an additional probability of 2-p(|x|)+1 for returning NO.

Claim: PP1  PP (cont) Thus PP1  PP  PP1 = PP ∎
xL  Prob[M(x)=1] > ½  Prob[M(x)=1] ³ ½+2-p(|x|)+1  Prob[M’(x)=1] ³ (½+2-p(|x|))(1-2-p(|x|)+1) > ½ xL  Prob[M(x)=0] ³ ½  Prob[M’(x)=1] ³ ½(1-2-p(|x|)+1)+2-p(|x|)+1 > ½ Thus PP1  PP  PP1 = PP ∎

Claim: NPPP Let LNP, M be the NDTM that recognizes L, and p(|x|) be the time bound. M’(x,(b1,b2,...,bp(|x|)))  if ( b1=1 ) return M(x,(b2,...,bp(|x|))) else return YES

Explanations We define MM’ that uses it's random
coin-tosses as a witness to M with only one toss that it does not pass to M’. This toss is used to choose it's move. One of the two possible moves gets it to the ordinary computation of M with the same input (and the witness is the random input). The other choice gets it to the computation that always accepts.

Claim: NPPP (cont) Thus NP  PP1. Note: coNP  PP because of symmetry
xL  Prob[M’(x)=1]>½ (a witness exists). xL  Prob[M’(x)=1]=½ (b1=1 only). Thus NP  PP1. Since PP1 = PP, NP  PP ∎ Note: coNP  PP because of symmetry of PP definition.

The class ZPP Definition (ZPP): LZPP iff there exists a polynomial-time probabilistic TM M, s.t. xL: M(x)={0,1, ^}, Prob[M(x)=^] £ ½, and Prob[M(x)= c(x) or M(x)=^] = 1 ( then Prob[M(x)= c(x) ]>½ ). The symbol ^ means “I don’t know”. The value ½ is arbitrary and can be replaced by 2-p(|x|) or 1-1/p(|x|).

Explanations The idea of class ZPP: we want to let the machine say
“I don’t know” for some inputs. We define the machine that will never give a wrong answer - it will say “I don’t know” if it is not sure. Same time , this machine will define class , that will be equal (as we will prove) to RPcoRP.

Claim: ZPP = RP  coRP Let LZPP, M be a polynomial-time
probabilistic TM recognizes L. let b=M(x). M’(x)  if ( b = ^ ) return 0 else return b

Claim: ZPP = RP  coRP (cont)
xL  Prob[M’(x)=1]>½. xL  M’(x) will never return 1. Thus ZPP  RP. Note: The proof that ZPP is in coRP is similar ∎

Explanations We define MM’ so that for xL the probability of getting the right answer with M’ is greater than ½ and M’s definite answer are always correct (it never return a wrong answer because it returns ^ when it is uncertain). In case that xL , by returning 0 when M(x) = ^ we will always answer correctly , because in this case M(x)!=^  M’(x)= c(x)  M’(x)= 0.

Randomized space complexity
Definition (RSPACE): LRSPACE(S) iff there exists a randomized TM M s.t. "x  {0,1}* xL  Prob[M(x)=1] ³ ½ xL  Prob[M(x)=1] = 0 and M uses at most S(|x|) space and runs at most exp(S(|x|)) time.

Randomized space complexity
Definition (badRSPACE): LbadRSPACE(S) iff there exists a randomized TM M s.t. " x  {0, 1}* xL  Prob[M(x)=1] ³ ½ xL  Prob[M(x)=1] = 0 and M uses at most S(|x|) space

Explanations Note: RSPACE machine has both
space and time restriction , while badRSPACE has no time restriction.

Let LbadRSPACE(S). xL  x NSPACE(S). xL  x NSPACE(S). Thus badRSPACE(S)  NSPACE(S)

Explanations If xL , there are many witnesses
( but one is enough). Thus x NSPACE(S). On other hand , if xL , there are no witness , thus x NSPACE(S).

Let LNSPACE(S), and M be the NDTM for L, which decides L in space S(|x|). For every x  L, there an accepting computation in exp(S(|x|)) time  \$r , |r| < exp(S(|x|)) s.t. M(x,r)=1. Thus , for uniform distributed random r: Prob[M(x,r)=1] ³ exp(S(|x|)) .

Idea: running M 2exp(s|x|) times will guarantee that one of these runs should find the right r. Problem: counting 2exp(s|x|) runs needs exp(S(|x|)) space ( which we don’t have! )