

36

A very simple version of the central limit theorem is the following:

$$\sqrt{n}\left(\left(\frac{1}{n}\sum_{i=1}^n X_i\right)-\mu\right)\ \xrightarrow{d}\ N(0,\sigma^2)$$
which is the Lindeberg–Lévy CLT. I do not understand why there is a $\sqrt{n}$ on the left-hand side. And the Lyapunov CLT says
$$\frac{1}{s_n}\sum_{i=1}^n (X_i-\mu_i)\ \xrightarrow{d}\ N(0,1)$$
but why not $\sqrt{s_n}$? Could someone tell me what these factors are, such as $\sqrt{n}$ and $\frac{1}{s_n}$, and how we get them in the theorem?

3
This is explained at stats.stackexchange.com/questions/3734 . That answer is long, because it aims at "intuition". It concludes, "This simple approximation, however, suggests how de Moivre might originally have suspected that there is a universal limiting distribution, that its logarithm is a quadratic function, and that the proper scale factor $s_n$ must be proportional to $\sqrt{n}$ ...".
whuber

1
Intuitively, if all $\sigma_i = \sigma$, then the 2nd line follows from the 1st line: $s_n = \sqrt{\sum_i \sigma_i^2} = \sqrt{n}\,\sigma$, so dividing by $\sigma = s_n/\sqrt{n}$ (equivalently, multiplying by $\sqrt{n}/s_n$) gives
$$\sqrt{n}\left(\left(\frac{1}{n}\sum_{i=1}^n X_i\right)-\mu\right) = \frac{1}{\sqrt{n}}\sum_{i=1}^n (X_i-\mu)\ \xrightarrow{d}\ N(0,\sigma^2)$$
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n (X_i-\mu)\cdot\frac{\sqrt{n}}{s_n} = \frac{1}{s_n}\sum_{i=1}^n (X_i-\mu_i)\ \xrightarrow{d}\ N(0,1)$$
(of course the Lyapunov condition, which combines all the $\sigma_i$, is another question)
Sextus Empiricus

Answers:


33

Nice question (+1)!!

You will remember that for independent random variables $X$ and $Y$, $Var(X+Y) = Var(X) + Var(Y)$ and $Var(aX) = a^2\,Var(X)$. So the variance of $\sum_{i=1}^n X_i$ is $\sum_{i=1}^n \sigma^2 = n\sigma^2$, and the variance of $\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i$ is $n\sigma^2/n^2 = \sigma^2/n$.

This is for the variance. To standardize a random variable, you divide it by its standard deviation. As you know, the expected value of $\bar{X}$ is $\mu$, so the variable

$$\frac{\bar{X} - E(\bar{X})}{\sqrt{Var(\bar{X})}} = \sqrt{n}\,\frac{\bar{X}-\mu}{\sigma}$$
has expected value 0 and variance 1. So if it tends to a Gaussian, it has to be the standard Gaussian $N(0,1)$. Your formulation in the first equation is equivalent: by multiplying the left-hand side by $\sigma$ you set the variance to $\sigma^2$.

Regarding your second point, I believe that the equation shown above shows that you have to divide by $\sigma$ and not $\sqrt{\sigma}$ to standardize the equation, which explains why you use $s_n$ (the estimator of $\sigma$) and not $\sqrt{s_n}$.
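
As a quick illustration (my addition, not part of the original answer): a minimal simulation sketch, assuming Exponential(1) samples so that $\mu = \sigma^2 = 1$, showing that $\sqrt{n}(\bar{X}_n - \mu)$ keeps its variance near $\sigma^2$ at every $n$, which is exactly what the standardization above predicts.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma2 = 1.0, 1.0  # Exponential(1) has mean 1 and variance 1
for n in (10, 100, 1000):
    x = rng.exponential(scale=1.0, size=(10_000, n))
    z = np.sqrt(n) * (x.mean(axis=1) - mu)
    # Var(sqrt(n) * (Xbar - mu)) stays near sigma^2 = 1 for every n,
    # while Var(Xbar - mu) alone would shrink like 1/n
    print(n, z.var())
```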

Addition: @whuber suggests discussing the why of the scaling by $\sqrt{n}$. He does it there, but because the answer is very long I will try to capture the essence of his argument (which is a reconstruction of de Moivre's thoughts).

If you add a large number of $+1$'s and $-1$'s, you can approximate the probability that the sum will be $j$ by elementary counting. The log of this probability is proportional to $-j^2/n$. So if we want the probability above to converge to a constant as $n$ grows large, we have to use a normalizing factor in $O(\sqrt{n})$.

Using modern (post de Moivre) mathematical tools, you can see the approximation mentioned above by noticing that the sought probability is

$$P(j) = \binom{n}{n/2+j}2^{-n} = \frac{n!}{2^n\,(n/2+j)!\,(n/2-j)!}$$

which we approximate by Stirling's formula

$$P(j) \approx \frac{n^n e^{n/2+j} e^{n/2-j}}{2^n e^n (n/2+j)^{n/2+j}(n/2-j)^{n/2-j}} = \left(\frac{1}{1+2j/n}\right)^{n/2+j}\left(\frac{1}{1-2j/n}\right)^{n/2-j}.$$

$$\log(P(j)) = -(n/2+j)\log(1+2j/n) - (n/2-j)\log(1-2j/n) \approx -2j(n/2+j)/n + 2j(n/2-j)/n \propto -j^2/n.$$
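
A quick numerical check of this proportionality (my addition, using the exact binomial probability instead of the Stirling approximation): the quantity $\log(P(j)/P(0)) \cdot n/j^2$ should come out roughly the same constant (about $-2$) for every $j$.

```python
import math

def log_p(n, j):
    # log of P(j) = C(n, n/2 + j) * 2^(-n), computed via log-gamma
    return (math.lgamma(n + 1) - math.lgamma(n // 2 + j + 1)
            - math.lgamma(n // 2 - j + 1) - n * math.log(2))

n = 10_000  # n even, so n/2 + j is an integer
for j in (10, 20, 40, 80):
    print(j, (log_p(n, j) - log_p(n, 0)) * n / j**2)  # ~ -2 for every j
```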

Please see my comments on the earlier answers by Michael C. and guy.
whuber

Shouldn't it be $\sqrt{n}\left(\left(\frac{1}{n}\sum_{i=1}^n X_i\right)-\mu\right) \xrightarrow{d} N(0,1)$? That confused me as well, that $\sigma^2$ appeared as the variance.
B_Miner

If you parametrize the Gaussian with mean and variance (not standard deviation) then I believe OP's formula is correct.
gui11aume

1
Ahh.. Given that $\frac{\bar{X}-E(\bar{X})}{\sqrt{Var(\bar{X})}} = \sqrt{n}\,\frac{\bar{X}-\mu}{\sigma} \xrightarrow{d} N(0,1)$: if we multiply $\frac{\bar{X}-E(\bar{X})}{\sqrt{Var(\bar{X})}}$ by $\sigma$ we get what was shown by the OP (the $\sigma$'s cancel), namely $\sqrt{n}\left(\left(\frac{1}{n}\sum_{i=1}^n X_i\right)-\mu\right)$. But we know that $Var(aX) = a^2\,Var(X)$, where in this case $a = \sigma$ and $Var(X)$ is 1, so the distribution is $N(0,\sigma^2)$.
B_Miner

Gui, if not too late, I wanted to make sure I had this correct. If we assume $\frac{\bar{X}-E(\bar{X})}{\sqrt{Var(\bar{X})}} = \sqrt{n}\,(\bar{X}-\mu)/\sigma \xrightarrow{d} N(0,1)$ and we multiply by a constant ($\sigma$), the expected value of this quantity (i.e. $\sqrt{n}\,(\bar{X}-\mu)$), which was zero, is still zero, since $E[aX] = a\,E[X] \Rightarrow \sigma \cdot 0 = 0$. Is this correct?
B_Miner

8

There is a nice theory of what kinds of distributions can be limiting distributions of sums of random variables. A nice resource is the following book by Petrov, which I personally enjoyed immensely.

It turns out that if you are investigating limits of this type

$$\frac{1}{a_n}\sum_{i=1}^n X_i - b_n, \qquad (1)$$
where $X_i$ are independent random variables, then only certain distributions can arise as limits.

There is a lot of mathematics that goes into this, which boils down to several theorems that completely characterize what happens in the limit. One such theorem is due to Feller:

Theorem. Let $\{X_n;\, n=1,2,\dots\}$ be a sequence of independent random variables, let $V_n(x)$ be the distribution function of $X_n$, and let $a_n$ be a sequence of positive constants. In order that

$$\max_{1 \le k \le n} P(|X_k| \ge \varepsilon a_n) \to 0, \text{ for every fixed } \varepsilon > 0,$$

and

$$\sup_x \left|P\left(a_n^{-1}\sum_{k=1}^n X_k < x\right) - \Phi(x)\right| \to 0,$$

it is necessary and sufficient that

$$\sum_{k=1}^n \int_{|x| \ge \varepsilon a_n} dV_k(x) \to 0 \text{ for every fixed } \varepsilon > 0,$$

$$a_n^{-2}\sum_{k=1}^n \left(\int_{|x| < a_n} x^2\, dV_k(x) - \left(\int_{|x| < a_n} x\, dV_k(x)\right)^2\right) \to 1,$$

and

$$a_n^{-1}\sum_{k=1}^n \int_{|x| < a_n} x\, dV_k(x) \to 0.$$

This theorem then gives you an idea of what $a_n$ should look like.

The general theory in the book is constructed in such a way that the norming constant is not restricted in any way, but the final theorems, which give necessary and sufficient conditions, do not leave any room for norming constants other than $\sqrt{n}$.
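
To make the role of $a_n$ concrete, here is a numeric sketch (my addition, not from Petrov's book): for iid standard normal $X_k$, the middle condition above reduces to $n\,E[X^2;\,|X| < a_n]/a_n^2 \to 1$, and solving it for $a_n$ at each $n$ shows $a_n/\sqrt{n} \to 1$, i.e. the norming constant is forced to be $\sqrt{n}\,\sigma$.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def trunc_second_moment(a):
    # E[X^2; |X| < a] for X ~ N(0,1): integrating x^2 * phi(x) by parts
    # gives (2*Phi(a) - 1) - 2*a*phi(a)
    return 2.0 * norm.cdf(a) - 1.0 - 2.0 * a * norm.pdf(a)

for n in (100, 1_000, 10_000):
    # solve n * E[X^2; |X| < a] / a^2 = 1 for the norming constant a_n
    f = lambda a: n * trunc_second_moment(a) / a**2 - 1.0
    a_n = brentq(f, 0.5 * np.sqrt(n), 2.0 * np.sqrt(n))
    print(n, a_n / np.sqrt(n))  # -> 1, so a_n ~ sqrt(n) * sigma
```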


4

$s_n$ represents the sample standard deviation of the sample mean. $s_n^2$ is the sample variance of the sample mean and it equals $S_n^2/n$, where $S_n^2$ is the sample estimate of the population variance. Since $s_n = S_n/\sqrt{n}$, that explains how $\sqrt{n}$ appears in the first formula. Note there would be a $\sigma$ in the denominator if the limit were $N(0,1)$, but the limit is given as $N(0,\sigma^2)$. Since $S_n$ is a consistent estimate of $\sigma$, it is used in the second equation to take $\sigma$ out of the limit.
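
A brief simulation sketch of that consistency argument (my addition; Slutsky's theorem is what licenses replacing $\sigma$ with the sample estimate): assuming Exponential samples with scale 2, the studentized mean $\sqrt{n}(\bar{X}_n-\mu)/S_n$ behaves like $N(0,1)$ even though $\sigma$ itself is never used.

```python
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(1)
n, reps = 200, 20_000
mu = 2.0
x = rng.exponential(scale=2.0, size=(reps, n))  # mean 2, sd 2
t = np.sqrt(n) * (x.mean(axis=1) - mu) / x.std(axis=1, ddof=1)
print(t.mean(), t.std())  # ~ 0 and ~ 1
print(kstest(t, "norm"))  # small KS distance from N(0,1), sigma never used
```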


What about the other (more basic and important) part of the question: why $s_n$ and not some other measure of dispersion?
whuber

@whuber That may be up for discussion but it was not part of the question. The OP just wanted to know why $s_n$ and $\sqrt{n}$ appear in the formula for the CLT. Of course $S_n$ is there because it is consistent for $\sigma$, and in that form of the CLT $\sigma$ is removed.
Michael R. Chernick

1
To me it's not at all clear that $s_n$ is present because it is "consistent for $\sigma$". Why wouldn't that also imply, say, that $s_n$ should be used to normalize extreme-value statistics (which would not work)? Am I missing something simple and self-evident? And, to echo the OP, why not use $\sqrt{s_n}$: after all, that is consistent for $\sqrt{\sigma}$!
whuber

The theorem as stated has convergence to $N(0,1)$, so to accomplish that you either have to know $\sigma$ and use it, or use a consistent estimate of it, which works by Slutsky's theorem I think. Was I that unclear?
Michael R. Chernick

I don't think you were unclear; I just think that an important point may be missing. After all, for many distributions we can obtain a limiting normal distribution by using the IQR instead of $s_n$, but then the result is not as neat (the SD of the limiting distribution depends on the distribution we begin with). I'm just suggesting that this deserves to be called out and explained. It will not be quite as obvious to someone who does not have the intuition developed by 40 years of standardizing all the distributions they encounter!
whuber

2

Intuitively, if $Z_n \xrightarrow{d} N(0,\sigma^2)$ for some $\sigma^2$, we should expect that $Var(Z_n)$ is roughly equal to $\sigma^2$; it seems like a pretty reasonable expectation, though I don't think it is necessary in general. The reason for the $\sqrt{n}$ in the first expression is that the variance of $\bar{X}_n - \mu$ goes to $0$ like $\frac{1}{n}$, and so the $\sqrt{n}$ is inflating the variance so that the expression just has variance equal to $\sigma^2$. In the second expression, the term $s_n$ is defined to be $\sqrt{\sum_{i=1}^n Var(X_i)}$, while the variance of the numerator grows like $\sum_{i=1}^n Var(X_i)$, so we again have that the variance of the whole expression is a constant ($1$ in this case).

Essentially, we know something "interesting" is happening with the distribution of $\bar{X}_n := \frac{1}{n}\sum_i X_i$, but if we don't properly center and scale it we won't be able to see it. I've heard this described sometimes as needing to adjust the microscope. If we don't blow up (e.g.) $\bar{X} - \mu$ by $\sqrt{n}$, then we just have $\bar{X}_n - \mu \to 0$ in distribution by the weak law; an interesting result in its own right, but not as informative as the CLT. If we inflate by any factor $a_n$ which is dominated by $\sqrt{n}$, we still get $a_n(\bar{X}_n - \mu) \to 0$, while any factor $a_n$ which dominates $\sqrt{n}$ gives $a_n(\bar{X}_n - \mu) \to \infty$. It turns out $\sqrt{n}$ is just the right magnification to be able to see what is going on in this case (note: all convergence here is in distribution; there is another level of magnification which is interesting for almost sure convergence, which gives rise to the law of the iterated logarithm).
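
A small simulation sketch of this "microscope" picture (my addition, assuming Uniform(0,1) samples): scaling $\bar{X}_n - \mu$ by $n^{\alpha}$ for $\alpha$ below, at, and above $1/2$ produces the three regimes described above: collapse to $0$, a stable spread, and blow-up.

```python
import numpy as np

rng = np.random.default_rng(2)
mu, sigma = 0.5, np.sqrt(1 / 12)  # Uniform(0,1): mean 1/2, sd ~ 0.2887
for alpha in (0.25, 0.5, 0.75):
    for n in (100, 10_000):
        x = rng.uniform(size=(2_000, n))
        z = n**alpha * (x.mean(axis=1) - mu)
        print(alpha, n, round(z.std(), 4))
# alpha = 0.25: spread shrinks toward 0 as n grows (degenerate limit);
# alpha = 0.75: spread blows up; alpha = 0.5: stabilizes near sigma ~ 0.2887
```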


4
A more fundamental question, which ought to be addressed first, is why the SD is used to measure dispersion. Why not the absolute central $k$th moment for some other value of $k$? Or why not the IQR or any of its relatives? Once that is answered, then simple properties of covariance immediately give the $\sqrt{n}$ dependence (as @Gui11aume has recently explained).
whuber

1
@whuber I agree, which is why I presented this as heuristic. I'm not certain it is amenable to a simple explanation, though I'd love to hear one. For me I'm not sure that I have a simpler, explainable reason past "because the square term is the relevant term in the Taylor expansion of the characteristic function once you subtract off the mean."
guy
Licensed under cc by-sa 3.0 with attribution required.