What is the intuition behind defining completeness in a statistic as being impossible to form an unbiased estimator of $0$ from it?


In classical statistics, there is a definition according to which a statistic $T$ of a set of data $y_1, \ldots, y_n$ is said to be complete for a parameter $\theta$ if it is impossible to form a non-trivial unbiased estimator of $0$ from it. That is, the only way to have $E\, h(T(y)) = 0$ for all $\theta$ is for $h$ to be $0$ almost surely.

Is there an intuition behind this? It seems like a rather mechanical way of defining it. I am aware this has been asked before, but I was wondering whether there is a very easy-to-understand intuition that would make the material easier for introductory students to digest.

— user1398057


This is a great question; I had to dig around myself. It turns out that the reason it is such a mechanical definition, and does not seem intuitively meaningful to a standard practitioner like me, is that it is mainly used to prove fundamental results in mathematical statistics. In particular, my brief search revealed that the Lehmann–Scheffé theorem and Basu's theorem require completeness of a statistic in order to hold. These are contributions from the mid-1950s. I can't offer you an intuitive explanation, but if you really want to build one, perhaps the associated proofs are the place to start.

— Jeremias K


I'll try to add to the other answer. First, completeness is a technical condition that is mainly justified by the theorems that use it. So let us start with some related concepts and theorems in which it occurs.

Let $X=(X_1,X_2,\dotsc,X_n)$ represent a vector of iid data, which we model as having a distribution $f(x;\theta), \theta \in \Theta$, where the parameter $\theta$ governing the data is unknown. $T=T(X)$ is sufficient if the conditional distribution of $X \mid T$ does not depend on the parameter $\theta$. $V=V(X)$ is ancillary if the distribution of $V$ does not depend on $\theta$ (within the family $f(x;\theta)$). $U=U(X)$ is an unbiased estimator of zero if its expectation is zero, irrespective of $\theta$. $S=S(X)$ is a complete statistic if any unbiased estimator of zero based on $S$ is identically zero, that is, if $\DeclareMathOperator{\E}{\mathbb{E}} \E g(S)=0$ (for all $\theta$) then $g(S)=0$ a.e. (for all $\theta$).

Now suppose we have two different unbiased estimators of $\theta$ based on the sufficient statistic $T$, namely $g_1(T)$ and $g_2(T)$. That is, in symbols, $\E g_1(T)=\theta$, $\E g_2(T)=\theta$ and $\DeclareMathOperator{\P}{\mathbb{P}} \P(g_1(T) \neq g_2(T)) > 0$ (for all $\theta$). Then $g_1(T)-g_2(T)$ is an unbiased estimator of zero which is not identically zero, proving that $T$ is not complete. So completeness of a sufficient statistic $T$ gives us that there is at most one unbiased estimator of $\theta$ based on $T$ (unique up to almost-sure equality). That is already very close to the Lehmann–Scheffé theorem.

Let us look at some examples. Suppose $X_1, \dotsc, X_n$ are now uniform on the interval $(\theta, \theta+1)$. We can show that (with $X_{(1)} < X_{(2)} < \dotsm < X_{(n)}$ the order statistics) the pair $(X_{(1)}, X_{(n)})$ is sufficient, but it is not complete, because the difference $X_{(n)}-X_{(1)}$ is ancillary: we can compute its expectation, call it $c$ (which is a function of $n$ only), and then $X_{(n)}-X_{(1)} -c$ will be an unbiased estimator of zero which is not identically zero. So our sufficient statistic, in this case, is not complete sufficient. And we can see what that means: there exist functions of the sufficient statistic that are not informative about $\theta$ (in the context of the model). This cannot happen with a complete sufficient statistic; it is in a sense maximally informative, in that no non-constant function of it is uninformative. On the other hand, if there is some function of the minimally sufficient statistic that has expectation zero, it could be seen as a noise term, since disturbance/noise terms in models have expectation zero. So we could say that non-complete sufficient statistics do contain some noise.
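To make the uniform example concrete, here is a minimal Monte Carlo sketch (assuming NumPy; the helper name `simulate_order_stats`, the sample size $n=5$ and the chosen $\theta$ values are illustrative, not from the answer). It uses $c = E(X_{(n)}-X_{(1)}) = (n-1)/(n+1)$, the expected range of $n$ uniforms on an interval of length one, and checks that $X_{(n)}-X_{(1)}-c$ averages to zero for every $\theta$ while clearly not being identically zero. It also builds two different unbiased estimators of $\theta$ from $(X_{(1)}, X_{(n)})$, whose difference is exactly that unbiased estimator of zero.

```python
import numpy as np

def simulate_order_stats(theta, n, reps, rng):
    """Draw `reps` samples of size n from Uniform(theta, theta + 1)
    and return the sample minimum X_(1) and the sample maximum X_(n)."""
    x = rng.uniform(theta, theta + 1.0, size=(reps, n))
    return x.min(axis=1), x.max(axis=1)

n, reps = 5, 200_000
c = (n - 1) / (n + 1)          # E[X_(n) - X_(1)]: depends on n only, not on theta
rng = np.random.default_rng(0)

for theta in (-2.0, 0.0, 3.7):
    x_min, x_max = simulate_order_stats(theta, n, reps, rng)
    u = (x_max - x_min) - c    # unbiased estimator of zero, not identically zero
    # Two different unbiased estimators of theta, both functions of (X_(1), X_(n)):
    g1 = x_min - 1 / (n + 1)   # uses E[X_(1)] = theta + 1/(n+1)
    g2 = x_max - n / (n + 1)   # uses E[X_(n)] = theta + n/(n+1)
    print(f"theta={theta:5.1f}  mean(U)={u.mean():+.4f}  sd(U)={u.std():.4f}  "
          f"mean(g1)={g1.mean():.4f}  mean(g2)={g2.mean():.4f}")
```

The sample mean of `u` stays near zero for every $\theta$ while its standard deviation does not shrink toward zero, which is exactly the failure of completeness described above.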

Look again at the range $R=X_{(n)}-X_{(1)}$ in this example. Since its distribution does not depend on $\theta$, it does not by itself contain any information about $\theta$. But, together with the sufficient statistic, it does! How? Look at the case where $R=1$ is observed. Then, in the context of our (known to be true) model, we have perfect knowledge of $\theta$! Namely, we can say with certainty that $\theta = X_{(1)}$. You can check that any other value for $\theta$ then leads to either $X_{(1)}$ or $X_{(n)}$ being an impossible observation, under the assumed model. On the other hand, if we observe $R=0.1$, then the range of possible values for $\theta$ is rather large (exercise ...).
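The set of parameter values compatible with the data can be written down directly: under the model every observation lies in $[\theta, \theta+1]$, so $\theta$ must satisfy $X_{(n)}-1 \le \theta \le X_{(1)}$, an interval of length $1-R$. A small sketch (the function name `theta_interval` is hypothetical, and it treats the support as closed so that $R=1$ gives a single point):

```python
def theta_interval(x_min, x_max):
    """Values of theta compatible with the observed sample under the
    Uniform(theta, theta + 1) model, treating the support as closed:
    every observation must lie in [theta, theta + 1], i.e.
    x_max - 1 <= theta <= x_min. The interval has length 1 - R."""
    r = x_max - x_min
    if r > 1:
        raise ValueError("a range larger than 1 is impossible under this model")
    return x_max - 1.0, x_min

lo, hi = theta_interval(2.0, 3.0)    # R = 1: the interval collapses to a point
print(lo, hi)                        # 2.0 2.0
lo, hi = theta_interval(2.45, 2.55)  # R = 0.1: an interval of length about 0.9
print(lo, hi, hi - lo)
```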

In this sense, the ancillary statistic $R$ does contain some information about the precision with which we can estimate $\theta$ based on this data and model. In this example, and others, the ancillary statistic $R$ "takes over the role of the sample size". Usually, confidence intervals and the like need the sample size $n$, but in this example we can make a conditional confidence interval whose width is computed using only $R$, not $n$ (exercise). This was an idea of Fisher: inference should be conditional on some ancillary statistic.
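One possible construction of such a conditional interval, offered as a sketch rather than as the author's intended solution to the exercise: after integrating out the middle order statistics, the joint density of $(X_{(1)}-\theta, R)$ is $n(n-1)r^{n-2}$ on $0<x<1-r$, so given $R=r$ the quantity $X_{(1)}-\theta$ is uniform on $(0,1-r)$. Inverting that gives the equal-tailed interval $[X_{(1)}-(1-\alpha/2)(1-R),\ X_{(1)}-(\alpha/2)(1-R)]$, whose width $(1-\alpha)(1-R)$ depends on $R$ but not on $n$. The simulation below (assuming NumPy; the function name `conditional_ci`, $\theta=0.8$ and the sample sizes are arbitrary choices) checks that the coverage is close to $1-\alpha$ both for small and for large observed ranges.

```python
import numpy as np

def conditional_ci(x_min, x_max, alpha=0.05):
    """Equal-tailed (1 - alpha) interval for theta, conditional on R = x_max - x_min.
    Relies on: given R = r, X_(1) - theta ~ Uniform(0, 1 - r).
    The sample size n does not appear anywhere."""
    slack = 1.0 - (x_max - x_min)
    return x_min - (1 - alpha / 2) * slack, x_min - (alpha / 2) * slack

rng = np.random.default_rng(1)
theta, reps = 0.8, 100_000
for n in (2, 5, 20):
    x = rng.uniform(theta, theta + 1.0, size=(reps, n))
    x_min, x_max = x.min(axis=1), x.max(axis=1)
    r = x_max - x_min
    lo, hi = conditional_ci(x_min, x_max)
    covered = (lo <= theta) & (theta <= hi)
    small = r < np.median(r)   # split samples by the size of the observed range
    print(f"n={n:2d}  coverage={covered.mean():.3f}  "
          f"coverage|R small={covered[small].mean():.3f}  "
          f"coverage|R large={covered[~small].mean():.3f}  "
          f"mean width={np.mean(hi - lo):.3f}")
```

Because the coverage is $1-\alpha$ for every value of $R$, it is also $1-\alpha$ within any group of samples selected by $R$; the average width shrinks with $n$ only because large ranges become more likely.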

Now, Basu's theorem: If $T$ is complete sufficient, then it is independent of any ancillary statistic. That is, inference based on a complete sufficient statistic is simpler, in that we do not need to consider conditional inference. Conditioning on a statistic which is independent of $T$ does not change anything, of course.

Then, a last example to give some more intuition. Change our uniform distribution example to a uniform distribution on the interval $(\theta_1, \theta_2)$ (with $\theta_1<\theta_2$ ). In this case the statistic $(X_{(1)}, X_{(n)})$ is complete and sufficient. What changed? We can see that completeness is really a property of the model. In the former case, we had a restricted parameter space. This restriction destroyed completeness by introducing relationships on the order statistics. By removing this restriction we got completeness! So, in a sense, lack of completeness means that the parameter space is not big enough, and by enlarging it we can hope to restore completeness (and thus, easier inference).

Some other examples where lack of completeness is caused by restrictions on the parameter space:

See my answer to: What kind of information is Fisher information?
Let $X_1, \dotsc, X_n$ be iid $\operatorname{Cauchy}(\theta,\sigma)$ (a location-scale model). Then the vector of order statistics is sufficient but not complete. But now enlarge this model to a fully nonparametric model, still iid but from some completely unspecified distribution $F$. Then the vector of order statistics is sufficient and complete.
For full-rank exponential families with the canonical parameter space (that is, as large as possible) the minimal sufficient statistic is also complete. But in many cases, introducing restrictions on the parameter space, as with curved exponential families, destroys completeness.
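As an illustration of the last point (the example is not from the answer), take the curved exponential family $N(\theta, \theta^2)$ with $\theta>0$, where the mean and the variance are tied together. Here $(\bar X, S^2)$ is sufficient, and since $E\bar X^2 = \theta^2 + \theta^2/n$ and $E S^2 = \theta^2$, the function $\frac{n}{n+1}\bar X^2 - S^2$ is an unbiased estimator of zero that is not identically zero, so $(\bar X, S^2)$ is not complete. A quick Monte Carlo check, assuming NumPy and arbitrary choices of $n$ and $\theta$:

```python
import numpy as np

n, reps = 10, 200_000
rng = np.random.default_rng(2)

for theta in (0.5, 1.0, 4.0):
    # Curved family: mean theta and standard deviation theta (variance theta^2)
    x = rng.normal(loc=theta, scale=theta, size=(reps, n))
    xbar = x.mean(axis=1)
    s2 = x.var(axis=1, ddof=1)          # unbiased sample variance
    g = n / (n + 1) * xbar**2 - s2      # E[g] = theta^2 - theta^2 = 0 for every theta
    print(f"theta={theta}  mean(g)={g.mean():+.4f}  sd(g)={g.std():.4f}")
```

If the parameter space were enlarged to the full two-parameter normal family $N(\mu, \sigma^2)$, this relationship between $E\bar X^2$ and $E S^2$ would no longer hold for all parameter values, which is the sense in which the restriction is what destroys completeness.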

A very relevant paper is "An Interpretation of Completeness and Basu's Theorem".

— kjetil b halvorsen


Some intuition may be available from the theory of best (minimum variance) unbiased estimators.

If $E_\theta W=\tau(\theta)$ then $W$ is a best unbiased estimator of $\tau(\theta)$ iff $W$ is uncorrelated with all unbiased estimators of zero.

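Proof: Suppose $W$ with $E_\theta W=\tau(\theta)$ is uncorrelated with all unbiased estimators of zero, and let $W'$ be any other estimator satisfying $E_\theta W'=E_\theta W=\tau(\theta)$. Write $W'=W+(W'-W)$, where $W'-W$ is an unbiased estimator of zero. By assumption, $Var_\theta W'=Var_\theta W+Var_\theta (W'-W)$. Hence, for any $W'$, $Var_\theta W'\geq Var_\theta W$.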

Now assume that $W$ is a best unbiased estimator. Let there be some other estimator $U$ with $E_\theta U=0$ . $\phi_a:=W+aU$ is also unbiased for $\tau(\theta)$ . We have

$Var_\theta \phi_a=Var_\theta W+2a\,Cov_\theta(W,U)+a^2Var_\theta U.$ If there were a $\theta_0\in\Theta$ such that $Cov_{\theta_0}(W,U)<0$, we would obtain $Var_{\theta_0} \phi_a<Var_{\theta_0} W$ for $a\in(0,-2Cov_{\theta_0}(W,U)/Var_{\theta_0} U)$ (if instead $Cov_{\theta_0}(W,U)>0$, take $a\in(-2Cov_{\theta_0}(W,U)/Var_{\theta_0} U,0)$). $W$ could then not be the best unbiased estimator. QED
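The improvement step in the proof can be seen numerically. As a hypothetical illustration (not part of the answer), take the uniform model on $(\theta, \theta+1)$ with $W=\bar X$, which is unbiased for $\tau(\theta)=\theta+1/2$, and $U = M - \bar X$, where $M=(X_{(1)}+X_{(n)})/2$ is the midrange, also unbiased for $\theta+1/2$. Then $U$ is an unbiased estimator of zero, and for $n\ge 3$ its covariance with $W$ is negative, so $\phi_a = W + aU$ has smaller variance than $W$ for every $a$ in the interval from the proof. The sketch assumes NumPy, $n=5$ and $\theta=0.3$.

```python
import numpy as np

n, reps, theta = 5, 200_000, 0.3
rng = np.random.default_rng(3)
x = rng.uniform(theta, theta + 1.0, size=(reps, n))

w = x.mean(axis=1)                           # W: unbiased for tau(theta) = theta + 1/2
mid = (x.min(axis=1) + x.max(axis=1)) / 2    # midrange, also unbiased for theta + 1/2
u = mid - w                                  # U: an unbiased estimator of zero

cov_wu = np.cov(w, u)[0, 1]                  # sample Cov(W, U); negative here
print(f"mean(U)={u.mean():+.5f}  Cov(W,U)={cov_wu:+.5f}")

# Any a in (0, -2 Cov(W,U) / Var(U)) makes phi_a = W + aU less variable than W.
a_max = -2 * cov_wu / u.var(ddof=1)
for frac in (0.25, 0.5, 0.75):
    a = frac * a_max
    phi = w + a * u
    print(f"a={a:.3f}  Var(W)={w.var():.5f}  Var(phi_a)={phi.var():.5f}")
```

In this model $Cov(W,U) = -Var(U)$, so the upper end of the interval is $a=2$ and the best choice $a=1$ turns $\phi_a$ into the midrange itself; $\bar X$ fails the "uncorrelated with all unbiased estimators of zero" test and is therefore not a best unbiased estimator.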

Intuitively, the result says that if an estimator is optimal, it must not be possible to improve it by just adding some noise to it, in the sense of combining it with an estimator that is just zero on average (being an unbiased estimator of zero).

Unfortunately, it is difficult to characterize all unbiased estimators of zero. The situation becomes much simpler if zero itself is the only unbiased estimator of zero, as any statistic $W$ satisfies $Cov_\theta(W,0)=0$ . Completeness describes such a situation.

— Christoph Hanck