Reverse Chernoff bound


Is there a reverse Chernoff bound, i.e. one showing that the tail probability is at least this large?

That is, if $X_1, X_2, \ldots, X_n$ are independent binomial random variables and $\mu = \mathbb{E}[\sum_{i=1}^n X_i]$, can we prove $\Pr[\sum_{i=1}^n X_i \ge (1+\delta)\mu] \ge f(\mu, \delta, n)$ for some function $f$?

Your example is asking for too much: with $p = n^{-2/3}$, a standard Chernoff bound shows that $\Pr[|T \cap S_1| \ge 1.1\,n^{1/3}]$ and $\Pr[|T \cap S_2| \ge 1.1\,n^{1/3}]$ are at most $\exp(-c n^{1/3})$ for some $c$.
Colin McQuillan,

You are right, I got confused about which term in the Chernoff bound has the square. I have edited the question to reflect a weaker bound. I don't think it will help in my current application, but it might be interesting for other reasons.
Ashwinkumar BV,



Here is an explicit proof that a standard Chernoff bound is tight up to constant factors in the exponent, for a particular range of the parameters. (In particular, whenever the variables are 0 or 1, each is 1 with probability 1/2 or less, $\epsilon \in (0,1/2)$, and the Chernoff upper bound is less than a constant.)

If you find a mistake, please let me know.

Lemma 1. (tightness of the Chernoff bound) Let $X$ be the average of $k$ independent 0/1 random variables (r.v.s). For any $\epsilon \in (0,1/2]$ and $p \in (0,1/2]$, assuming $\epsilon^2 pk \ge 3$:

(i) If each r.v. is 1 with probability at most $p$, then
$$\Pr[X \le (1-\epsilon)p] \;\ge\; \exp(-9\epsilon^2 pk).$$

(ii) If each r.v. is 1 with probability at least $p$, then
$$\Pr[X \ge (1+\epsilon)p] \;\ge\; \exp(-9\epsilon^2 pk).$$

Proof. We use the following observation:

Claim 1. If $1 \le \ell \le k-1$, then
$$\binom{k}{\ell} \;\ge\; \frac{1}{e\sqrt{2\pi\ell}}\,\Big(\frac{k}{\ell}\Big)^{\ell}\Big(\frac{k}{k-\ell}\Big)^{k-\ell}.$$

Proof of Claim 1. By Stirling's approximation, $i! = \sqrt{2\pi i}\,(i/e)^i e^{\lambda}$ where $\lambda \in \big[\tfrac{1}{12i+1}, \tfrac{1}{12i}\big]$.

Therefore $\binom{k}{\ell}$, which equals $\frac{k!}{\ell!\,(k-\ell)!}$, is at least
$$\frac{\sqrt{2\pi k}\,\big(\frac{k}{e}\big)^{k}}{\sqrt{2\pi\ell}\,\big(\frac{\ell}{e}\big)^{\ell}\cdot\sqrt{2\pi(k-\ell)}\,\big(\frac{k-\ell}{e}\big)^{k-\ell}}\;\exp\Big(\frac{1}{12k+1}-\frac{1}{12\ell}-\frac{1}{12(k-\ell)}\Big)
\;\ge\;\frac{1}{e\sqrt{2\pi\ell}}\,\Big(\frac{k}{\ell}\Big)^{\ell}\Big(\frac{k}{k-\ell}\Big)^{k-\ell},$$
using $\sqrt{2\pi k} \ge \sqrt{2\pi(k-\ell)}$ and the fact that the argument of the exponential is at least $-1$. QED
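As an aside (not part of the original argument), Claim 1 is easy to sanity-check numerically against the exact binomial coefficient; the values of $k$ below are arbitrary choices:

```python
# Numeric spot check of Claim 1:
#   binom(k, l) >= (1 / (e * sqrt(2*pi*l))) * (k/l)^l * (k/(k-l))^(k-l)
# for every 1 <= l <= k-1, at a few values of k.
import math

def claim1_rhs(k, l):
    # Right-hand side of Claim 1.
    return (k / l)**l * (k / (k - l))**(k - l) / (math.e * math.sqrt(2 * math.pi * l))

for k in (10, 100, 1000):
    for l in range(1, k):
        assert math.comb(k, l) >= claim1_rhs(k, l)
```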

Proof of Lemma 1 Part (i). Without loss of generality assume each 0/1 random variable in the sum $X$ is 1 with probability exactly $p$. Note $\Pr[X \le (1-\epsilon)p]$ equals the sum $\sum_{i=0}^{\lfloor(1-\epsilon)pk\rfloor} \Pr[X = i/k]$, and $\Pr[X = i/k] = \binom{k}{i}\,p^{i}(1-p)^{k-i}$.

Fix $\ell = \lfloor(1-2\epsilon)pk\rfloor + 1$. The terms in the sum are increasing, so the terms with index $i \ge \ell$ each have value at least $\Pr[X = \ell/k]$, so their sum has total value at least $(\epsilon pk - 2)\Pr[X = \ell/k]$. To complete the proof, we show that
$$(\epsilon pk - 2)\,\Pr[X = \ell/k] \;\ge\; \exp(-9\epsilon^2 pk).$$

The assumptions $\epsilon^2 pk \ge 3$ and $\epsilon \le 1/2$ give $\epsilon pk \ge 6$, so the left-hand side above is at least $\frac{2}{3}\epsilon pk\,\binom{k}{\ell}\,p^{\ell}(1-p)^{k-\ell}$. Using Claim 1 to bound $\binom{k}{\ell}$, this is in turn at least $AB$ where $A = \frac{2}{3}\epsilon pk\big/e\sqrt{2\pi\ell}$ and $B = \big(\frac{k}{\ell}\big)^{\ell}\big(\frac{k}{k-\ell}\big)^{k-\ell}\,p^{\ell}(1-p)^{k-\ell}$.

To finish we show $A \ge \exp(-\epsilon^2 pk)$ and $B \ge \exp(-8\epsilon^2 pk)$.

Claim 2. $A \ge \exp(-\epsilon^2 pk)$.

Proof of Claim 2. The assumptions $\epsilon^2 pk \ge 3$ and $\epsilon \le 1/2$ imply (i) $pk \ge 12$.

By definition, $\ell \le pk + 1$. By (i), $pk \ge 12$. Thus, (ii) $\ell \le 1.1\,pk$.

Substituting the right-hand side of (ii) for $\ell$ in $A$ gives (iii) $A \ge \frac{2}{3}\epsilon\sqrt{pk}\big/e\sqrt{2.2\pi}$.

The assumption $\epsilon^2 pk \ge 3$ implies $\epsilon\sqrt{pk} \ge \sqrt{3}$, which with (iii) gives (iv) $A \ge \frac{2}{3}\sqrt{3}\big/e\sqrt{2.2\pi} \ge 0.1$.

From $\epsilon^2 pk \ge 3$ it follows that (v) $\exp(-\epsilon^2 pk) \le \exp(-3) \le 0.05$.

(iv) and (v) together give the claim. QED

Claim 3. $B \ge \exp(-8\epsilon^2 pk)$.

Proof of Claim 3. Fix $\delta$ such that $\ell = (1-\delta)pk$.
The choice of $\ell$ implies $\delta \le 2\epsilon$, so the claim will hold as long as $B \ge \exp(-2\delta^2 pk)$. Taking each side of this latter inequality to the power $-1/\ell$ and simplifying, it is equivalent to
$$\frac{\ell}{pk}\Big(\frac{k-\ell}{(1-p)k}\Big)^{k/\ell - 1} \;\le\; \exp\Big(\frac{2\delta^2 pk}{\ell}\Big).$$
Substituting $\ell = (1-\delta)pk$ and simplifying, it is equivalent to
$$(1-\delta)\Big(1 + \frac{\delta p}{1-p}\Big)^{\frac{1}{(1-\delta)p} - 1} \;\le\; \exp\Big(\frac{2\delta^2}{1-\delta}\Big).$$
Taking the logarithm of both sides and using $\ln(1+z) \le z$ twice, it will hold as long as
$$-\delta + \frac{\delta p}{1-p}\Big(\frac{1}{(1-\delta)p} - 1\Big) \;\le\; \frac{2\delta^2}{1-\delta}.$$
The left-hand side above simplifies to $\frac{\delta^2}{(1-p)(1-\delta)}$, which is less than $\frac{2\delta^2}{1-\delta}$ because $p \le 1/2$. QED

Claims 2 and 3 imply $AB \ge \exp(-\epsilon^2 pk)\exp(-8\epsilon^2 pk)$. This implies part (i) of the lemma.

Proof of Lemma 1 Part (ii). Without loss of generality assume each random variable is $1$ with probability exactly $p$.

Note $\Pr[X \ge (1+\epsilon)p] = \sum_{i=\lceil(1+\epsilon)pk\rceil}^{k} \Pr[X = i/k]$. Fix $\hat\ell = \lceil(1+2\epsilon)pk\rceil - 1$.

The terms in the sum are decreasing, so the first $\epsilon pk$ terms total at least $(\epsilon pk - 2)\Pr[X = \hat\ell/k]$, which is at least $\exp(-9\epsilon^2 pk)$. (The proof of that is the same as for (i), except with $\ell$ replaced by $\hat\ell$ and $\delta$ replaced by $\hat\delta$ such that $\hat\ell = (1+\hat\delta)pk$.) QED
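Not part of the proof, but the lemma can be sanity-checked with exact binomial probabilities; the parameters below are an arbitrary choice satisfying $\epsilon^2 pk \ge 3$:

```python
# Sanity check of Lemma 1 with exact binomial probabilities.
# Parameters are an arbitrary illustration with eps^2 * p * k >= 3.
import math

def binom_pmf(k, p, i):
    # Pr[exactly i of k independent Bernoulli(p) variables are 1].
    return math.comb(k, i) * p**i * (1 - p)**(k - i)

def lower_tail(k, p, eps):
    # Pr[X <= (1-eps)p], where X is the average of k Bernoulli(p) variables.
    cut = math.floor((1 - eps) * p * k)
    return sum(binom_pmf(k, p, i) for i in range(cut + 1))

def upper_tail(k, p, eps):
    # Pr[X >= (1+eps)p].
    cut = math.ceil((1 + eps) * p * k)
    return sum(binom_pmf(k, p, i) for i in range(cut, k + 1))

k, p, eps = 1000, 0.3, 0.2
assert eps**2 * p * k >= 3
bound = math.exp(-9 * eps**2 * p * k)
assert lower_tail(k, p, eps) >= bound    # Lemma 1, part (i)
assert upper_tail(k, p, eps) >= bound    # Lemma 1, part (ii)
```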

Several [math processing error]s -- any chance of fixing them?

Those math expressions used to display just fine. For some reason the \choose command is not working in MathJax. Neither is \binom. E.g. $a \choose b$ gives (ab). Presumably this is a bug in the MathJax configuration. Hopefully it will be fixed soon. Meanwhile see Lemma 5.2 in the appendix of or
Neal Young


The Berry-Esseen theorem can give tail probability lower bounds, as long as they are higher than $n^{-1/2}$.

Another tool you can use is the Paley-Zygmund inequality. It implies that for any even integer $k$ and any real-valued random variable $X$,

$$\Pr\big[X^k > \theta\,\mathbb{E}[X^k]\big] \;\ge\; (1-\theta)^2\,\frac{\mathbb{E}[X^k]^2}{\mathbb{E}[X^{2k}]} \qquad \text{for any } \theta \in [0,1].$$

Together with the multinomial theorem, for $X$ a sum of $n$ Rademacher random variables, Paley-Zygmund can get you pretty strong lower bounds. It also works with bounded-independence random variables. For example, you easily get that the sum of $n$ 4-wise independent $\pm 1$ random variables is $\Omega(\sqrt{n})$ in absolute value with constant probability.
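To make the Rademacher example concrete, here is a sketch of the $k=2$ case, using the exact moments $\mathbb{E}[X^2] = n$ and $\mathbb{E}[X^4] = 3n^2 - 2n$ of a sum of $n$ independent signs (the parameter choices below are illustrative, not from the answer above):

```python
# Sketch: Paley-Zygmund with k = 2 for X = sum of n independent +/-1 signs.
# E[X^2] = n and E[X^4] = 3n^2 - 2n, so Paley-Zygmund gives
#   Pr[X^2 > theta * n] >= (1 - theta)^2 * n^2 / (3n^2 - 2n),
# a constant-probability lower bound on |X| exceeding sqrt(theta * n).
import math

def rademacher_tail(n, theta):
    # Exact Pr[X^2 > theta * n]; X = 2B - n where B ~ Binomial(n, 1/2).
    return sum(math.comb(n, b) / 2**n
               for b in range(n + 1) if (2 * b - n)**2 > theta * n)

n, theta = 100, 0.5
pz_bound = (1 - theta)**2 * n**2 / (3 * n**2 - 2 * n)
assert rademacher_tail(n, theta) >= pz_bound
```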


If you are indeed okay with bounding sums of Bernoulli trials (and not, say, bounded random variables), the following is pretty tight.

Slud's Inequality*. Let $\{X_i\}_{i=1}^{n}$ be i.i.d. draws from a Bernoulli r.v. with $\mathbb{E}(X_1) = p$, and let an integer $k \le n$ be given. If either (a) $p \le 1/4$ and $np \le k$, or (b) $np \le k \le n(1-p)$, then
$$\Pr\Big[\sum_i X_i \ge k\Big] \;\ge\; 1 - \Phi\Big(\frac{k - np}{\sqrt{np(1-p)}}\Big),$$

where $\Phi$ is the cdf of a standard normal.

(Treating the argument to $\Phi$ as transforming the standard normal, this agrees exactly with what the CLT tells you; in fact, it tells us that Binomials satisfying the conditions of the theorem will dominate their corresponding Gaussians on upper tails.)

From here, you can use bounds on $\Phi$ to get something nicer. For instance, in Feller's first book, in the section on Gaussians, it is shown for every $z > 0$ that
$$\frac{z}{1+z^2}\,\varphi(z) \;<\; 1 - \Phi(z) \;<\; \frac{1}{z}\,\varphi(z),$$
where $\varphi$ is the density of a standard normal. There are similar bounds in the Wikipedia article for "Q-function" as well.
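Combining the two displays, a short script can check the chain $\Pr[\sum_i X_i \ge k] \ge 1-\Phi(z) > \frac{z}{1+z^2}\varphi(z)$ against exact binomial probabilities; the parameters below are my own choice, satisfying case (a):

```python
# Check, on exact binomial tails, the chain given by Slud's inequality and
# Feller's Gaussian bound:  Pr[sum >= k] >= 1 - Phi(z) > z/(1+z^2) * phi(z).
# Parameters are an arbitrary choice with p <= 1/4 and np <= k (case (a)).
import math

def binom_upper_tail(n, p, k):
    # Exact Pr[Binomial(n, p) >= k].
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def normal_upper_tail(z):
    # 1 - Phi(z), via the complementary error function.
    return 0.5 * math.erfc(z / math.sqrt(2))

n, p, k = 1000, 0.2, 230
z = (k - n * p) / math.sqrt(n * p * (1 - p))
phi = math.exp(-z * z / 2) / math.sqrt(2 * math.pi)   # standard normal density
assert binom_upper_tail(n, p, k) >= normal_upper_tail(z)   # Slud
assert normal_upper_tail(z) > z / (1 + z * z) * phi        # Feller
```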

Other than that, and what other people have said, you can also try using the Binomial directly, perhaps with some Stirling.

(*) Some newer statements of Slud's inequality leave out some of these conditions; I've reproduced the one in Slud's paper.


The de Moivre-Laplace Theorem shows that variables like $|T \cap S_1|$, after being suitably normalised and under certain conditions, will converge in distribution to a normal distribution. That's enough if you want constant lower bounds.

For lower bounds like $n^{-c}$, you need a slightly finer tool. Here's one reference I know of (but only by accident - I've never had the opportunity to use such an inequality myself). Some explicit lower bounds on tail probabilities of binomial distributions are given as Theorem 1.5 of the book Random Graphs by Béla Bollobás, Cambridge, 2nd edition, where further references are given to An Introduction to Probability and Its Applications by Feller and Foundations of Probability by Rényi.


The Generalized Littlewood-Offord Theorem isn't exactly what you want, but it gives what I think of as a "reverse Chernoff" bound by showing that the sum of random variables is unlikely to fall within a small range around any particular value (including the expectation). Perhaps it will be useful.

Formally, the theorem is as follows.

Generalized Littlewood-Offord Theorem: Let $a_1, \ldots, a_n$ and $s > 0$ be real numbers such that $|a_i| \ge s$ for $1 \le i \le n$, and let $X_1, \ldots, X_n$ be independent random variables that have values zero and one. For $0 < p \le \frac{1}{2}$, suppose that $p \le \Pr[X_i = 0] \le 1-p$ for all $1 \le i \le n$. Then, for any $r \in \mathbb{R}$,
$$\Pr\Big[r \le \sum_{i=1}^{n} a_i X_i < r+s\Big] \;\le\; \frac{c_p}{\sqrt{n}},$$

where $c_p$ is a constant depending only on $p$.
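For intuition, here is an illustration (my own, not from the theorem's source) of the special case $a_i = 1$, $s = 1$, $\Pr[X_i = 1] = 1/2$: the interval $[r, r+s)$ then contains at most one point of the support, so the probability in question is at most the largest point mass of $\mathrm{Binomial}(n, 1/2)$, which decays like $1/\sqrt{n}$:

```python
# Special case a_i = 1, s = 1, fair coins: the sum is Binomial(n, 1/2), and
# [r, r+1) contains at most one support point, so the small-ball probability
# is at most the largest point mass, which is below sqrt(2/(pi*n)).
import math

def max_point_mass(n):
    # max over r of Pr[Binomial(n, 1/2) = r].
    return max(math.comb(n, r) for r in range(n + 1)) / 2**n

for n in (100, 400, 1600):
    assert max_point_mass(n) <= math.sqrt(2 / (math.pi * n))
```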

It may be helpful to others to know that this type of result is also known as a "small ball inequality", and Nguyen and Vu have a terrific survey. My perspective here slightly differs from yours. I think of a "reverse Chernoff" bound as giving a lower estimate of the probability mass of the small ball around 0. I think of a small ball inequality as qualitatively saying that the small ball probability is maximized by the ball at 0. In this sense reverse Chernoff bounds are usually easier to prove than small ball inequalities.
Sasho Nikolov


The exponent in the standard Chernoff bound as it is stated on Wikipedia is tight for 0/1-valued random variables. Let $0 < p < 1$ and let $X_1, X_2, \ldots$ be a sequence of independent random variables such that for each $i$, $\Pr[X_i = 1] = p$ and $\Pr[X_i = 0] = 1-p$. Then for every $\varepsilon > 0$,
$$\frac{2^{-D(p+\varepsilon\,\|\,p)\,n}}{n+1} \;\le\; \Pr\Big[\sum_{i=1}^{n} X_i \ge (p+\varepsilon)n\Big] \;\le\; 2^{-D(p+\varepsilon\,\|\,p)\,n}.$$


Here, $D(x\,\|\,y) = x\log_2(x/y) + (1-x)\log_2((1-x)/(1-y))$, which is the Kullback-Leibler divergence between Bernoulli random variables with parameters $x$ and $y$.

As mentioned, the upper bound in the inequality above is proved on Wikipedia under the name "Chernoff-Hoeffding Theorem, additive form". The lower bound can be proved using e.g. the "method of types"; see Lemma II.2 in [1]. This is also covered in the classic textbook on information theory by Cover and Thomas.

[1] Imre Csiszár: The Method of Types. IEEE Transactions on Information Theory (1998).
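The sandwich above can be checked numerically with exact binomial tails; the parameters below are an arbitrary illustration, chosen so that $(p+\varepsilon)n$ is an integer:

```python
# Numeric check of the sandwich
#   2^{-n D(p+eps||p)} / (n+1)  <=  Pr[sum >= (p+eps)n]  <=  2^{-n D(p+eps||p)}
# for one arbitrary, illustrative choice of parameters.
import math

def kl(x, y):
    # Kullback-Leibler divergence, in bits, between Bernoulli(x) and Bernoulli(y).
    return x * math.log2(x / y) + (1 - x) * math.log2((1 - x) / (1 - y))

def upper_tail(n, p, t):
    # Exact Pr[Binomial(n, p) >= t] for an integer threshold t.
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(t, n + 1))

n, p, eps = 200, 0.5, 0.1
t = round((p + eps) * n)        # threshold (p+eps)n = 120, an integer here
d = kl(p + eps, p)
tail = upper_tail(n, p, t)
assert 2**(-d * n) / (n + 1) <= tail <= 2**(-d * n)
```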

It is also worth noting that $D(p+\delta p\,\|\,p) = \frac{p\,\delta^2}{2(1-p)} + O(\delta^3)$ (measuring $D$ in nats rather than bits), and for the common case of $p = 1/2$ it is $\frac{1}{2}\delta^2 + O(\delta^4)$. This shows that when $\delta = O(n^{-1/3})$ the typical $e^{-C\delta^2 n}$ bound is sharp. (And when $\delta = O(n^{-1/4})$ for $p = 1/2$.)
Thomas Ahle
Licensed under cc by-sa 3.0 with attribution required.