A problem on estimability of parameters

Let $Y_1,Y_2,Y_3$ and $Y_4$ be four random variables such that $E(Y_1)=\theta_1-\theta_3;\space\space E(Y_2)=\theta_1+\theta_2-\theta_3;\space\space E(Y_3)=\theta_1-\theta_3;\space\space E(Y_4)=\theta_1-\theta_2-\theta_3$, where $\theta_1,\theta_2,\theta_3$ are unknown parameters. Also assume that $Var(Y_i)=\sigma^2$, $i=1,2,3,4.$ Then which one is true?

A. $\theta_1,\theta_2,\theta_3$ are estimable.

B. $\theta_1+\theta_3$ is estimable.

C. $\theta_1-\theta_3$ is estimable and $\dfrac{1}{2}(Y_1+Y_3)$ is the best linear unbiased estimate of $\theta_1-\theta_3$.

D. $\theta_2$ is estimable.

The answer is C, which seems strange to me (because I got D).

Why did I get D? Because $E(Y_2-Y_4)=2\theta_2$.

Why can't I see how C could be the answer? OK, I can see that $\dfrac{Y_1+Y_2+Y_3+Y_4}{4}$ is an unbiased estimator of $\theta_1-\theta_3$ and that its variance is less than that of $\dfrac{Y_1+Y_3}{2}$.

Please tell me where I am going wrong.

Also posted here: /math/2568894/a-problem-on-estimability-of-parameters

self-study estimation inference

— Stat_prob_001

Add the self-study tag, or someone will come along and close your question.

— Carl

@Carl done, but why?

— Stat_prob_001

Those are the site's rules, not mine.

— Carl

$Y_1\neq Y_3$?

— Carl

@Carl you can think of it this way: $Y_1=\theta_1-\theta_3+\epsilon_1$, where $\epsilon_1$ is a random variable with mean $0$ and variance $\sigma^2$, and $Y_3=\theta_1-\theta_3+\epsilon_3$, where $\epsilon_3$ is a random variable with mean $0$ and variance $\sigma^2$.

— Stat_prob_001

Answers:

This answer emphasizes verifying estimability; the minimum-variance property is a secondary consideration for me.

To begin with, summarize the information in the matrix form of a linear model as follows:

$\begin{align} Y := \begin{bmatrix} Y_1 \\ Y_2 \\ Y_3 \\ Y_4 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -1 \\ 1 & 1 & -1 \\ 1 & 0 & -1 \\ 1 & -1 & -1 \\ \end{bmatrix} \begin{bmatrix} \theta_1 \\ \theta_2 \\ \theta_3 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \end{bmatrix}:= X\beta + \varepsilon, \tag{1} \end{align}$

where $E(\varepsilon) = 0$ and $\text{Var}(\varepsilon) = \sigma^2 I$. (The sphericity assumption is not needed to discuss estimability, but it must be assumed in order to discuss the Gauss-Markov property.)
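As a quick numerical check of the setup, here is a minimal sketch in plain Python with exact `fractions.Fraction` arithmetic (the helper name `rank` is a hypothetical illustration, not from any reference). It confirms that the design matrix in (1) has rank 2 < 3, because its third column is the negative of its first:

```python
from fractions import Fraction

# Design matrix X of model (1): one row per Y_i, one column per theta_j.
X = [
    [1, 0, -1],
    [1, 1, -1],
    [1, 0, -1],
    [1, -1, -1],
]


def rank(matrix):
    """Rank by Gauss-Jordan elimination in exact rational arithmetic."""
    m = [[Fraction(v) for v in row] for row in matrix]
    r = 0  # number of pivots found so far
    for c in range(len(m[0])):
        # Look for a nonzero pivot in column c at or below row r.
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        # Eliminate column c from every other row.
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c] / m[r][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


print(rank(X))  # 2, since column 3 is -1 times column 1
```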

If the design matrix $X$ has full column rank, then the original parameter $\beta$ admits a unique least-squares estimate $\hat{\beta} = (X'X)^{-1}X'Y$. Consequently, any parameter $\phi$, defined as a linear function $\phi(\beta) = p'\beta$ of $\beta$, is estimable in the sense that it can be uniquely estimated from the data via the least-squares estimate $\hat{\beta}$ as $\hat{\phi} = p'\hat{\beta}$.

The subtlety arises when $X$ is not of full rank. To have a thorough discussion, let us first fix some notation and terms (I follow the conventions of The Coordinate-Free Approach to Linear Models, Section 4.8; some of the terms may seem unnecessarily technical). The discussion applies to the general linear model $Y = X\beta + \varepsilon$ with $X \in \mathbb{R}^{n \times k}$ and $\beta \in \mathbb{R}^k$.

A regression manifold is the collection of mean vectors as $\beta$ varies over $\mathbb{R}^k$: $M = \{X\beta: \beta \in \mathbb{R}^k\}.$

A parametric functional $\phi = \phi(\beta)$ is a linear functional of $\beta$, $\phi(\beta) = p'\beta = p_1\beta_1 + \cdots + p_k\beta_k.$

As mentioned above, when $\text{rank}(X) < k$, not every parametric functional $\phi(\beta)$ is estimable. But wait, what is the technical definition of the term "estimable"? It seems hard to give a clear definition without invoking a little linear algebra. One definition, which I find the most intuitive, is the following (from the same reference):

Definition 1. A parametric functional $\phi(\beta)$ is estimable if it is uniquely determined by $X\beta$, in the sense that $\phi(\beta_1) = \phi(\beta_2)$ whenever $\beta_1,\beta_2 \in \mathbb{R}^k$ satisfy $X\beta_1 = X\beta_2$.

Interpretation. The definition above requires the mapping from the regression manifold $M$ to the parameter space of $\phi$ to be well defined: each point $X\beta$ of $M$ must correspond to a single value of $\phi$. This is guaranteed when $\text{rank}(X) = k$ (i.e., when $X$ itself is one-to-one). When $\text{rank}(X) < k$, we know that there exist $\beta_1 \neq \beta_2$ such that $X\beta_1 = X\beta_2$. The definition in effect rules out those structurally deficient parametric functionals that take different values even though the corresponding points of $M$ coincide, which naturally makes no sense. On the other hand, an estimable parametric functional $\phi(\cdot)$ does allow $\phi(\beta_1) = \phi(\beta_2)$ with $\beta_1 \neq \beta_2$, as long as the condition $X\beta_1 = X\beta_2$ is satisfied.

Other equivalent conditions for checking the estimability of a parametric functional are given in the same reference, Proposition 8.4.
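One equivalent condition follows directly from Definition 1: $X\beta_1 = X\beta_2$ holds exactly when $z = \beta_1 - \beta_2$ satisfies $Xz = 0$, and $\phi(\beta_1) = \phi(\beta_2)$ holds exactly when $p'z = 0$, so $p'\beta$ is estimable if and only if $p$ is orthogonal to the null space of $X$. For the design matrix in $(1)$, which has rank 2, that null space is spanned by $z = (1, 0, 1)'$. A minimal Python sketch of the resulting check (the names `NULL_BASIS` and `is_estimable` are illustrative, not taken from the cited book):

```python
# Design matrix X of model (1).
X = [
    [1, 0, -1],
    [1, 1, -1],
    [1, 0, -1],
    [1, -1, -1],
]
# rank(X) = 2 < 3, so the null space of X is one-dimensional;
# z = (1, 0, 1)' spans it because column 3 = -column 1.
NULL_BASIS = (1, 0, 1)


def matvec(A, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def is_estimable(p):
    """p'beta is estimable iff p is orthogonal to the null space of X."""
    return sum(pi * zi for pi, zi in zip(p, NULL_BASIS)) == 0


assert matvec(X, NULL_BASIS) == [0, 0, 0, 0]

candidates = {
    "theta_1": (1, 0, 0),
    "theta_2": (0, 1, 0),
    "theta_3": (0, 0, 1),
    "theta_1 + theta_3": (1, 0, 1),
    "theta_1 - theta_3": (1, 0, -1),
}
for name, p in candidates.items():
    print(f"{name}: estimable = {is_estimable(p)}")
```

Only $\theta_2$ and $\theta_1 - \theta_3$ pass the check, in line with the case analysis of options A to D.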

After such a detailed introduction, let's return to your question.

A. $\beta$ itself is not estimable, because $\text{rank}(X) = 2 < 3$, so there exist $\beta_1 \neq \beta_2$ with $X\beta_1 = X\beta_2$. Although the definition above is stated for scalar functionals, it generalizes easily to vector-valued functionals.

B. $\phi_1(\beta) = \theta_1 + \theta_3 = (1, 0, 1)'\beta$ is not estimable. To see this, consider $\beta_1 = (0, 1, 0)'$ and $\beta_2 = (1, 1, 1)'$, which give $X\beta_1 = X\beta_2$ but $\phi_1(\beta_1) = 0 + 0 = 0 \neq \phi_1(\beta_2) = 1 + 1 = 2$.
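The counterexample for option B can be checked mechanically. A small Python sketch (the helper names `matvec` and `phi_1` are illustrative):

```python
# Design matrix X of model (1).
X = [
    [1, 0, -1],
    [1, 1, -1],
    [1, 0, -1],
    [1, -1, -1],
]


def matvec(A, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def phi_1(beta):
    """phi_1(beta) = theta_1 + theta_3."""
    return beta[0] + beta[2]


beta_1 = (0, 1, 0)
beta_2 = (1, 1, 1)
print(matvec(X, beta_1), matvec(X, beta_2))  # [0, 1, 0, -1] [0, 1, 0, -1]
print(phi_1(beta_1), phi_1(beta_2))  # 0 2
```

Same mean vector, different values of $\theta_1 + \theta_3$: exactly the situation Definition 1 excludes.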

C. $\phi_2(\beta) = \theta_1 - \theta_3 = (1, 0, -1)'\beta$ is estimable, because $X\beta_1 = X\beta_2$ trivially implies (compare the first components) $\theta_1^{(1)} - \theta_3^{(1)} = \theta_1^{(2)} - \theta_3^{(2)}$, i.e., $\phi_2(\beta_1) = \phi_2(\beta_2)$.

D. $\phi_3(\beta) = \theta_2 = (0, 1, 0)'\beta$ is also estimable. The derivation from $X\beta_1 = X\beta_2$ to $\phi_3(\beta_1) = \phi_3(\beta_2)$ is also straightforward: the second component of $X\beta$ minus the fourth equals $2\theta_2$.

Once estimability is verified, a theorem (Proposition 8.16, same reference) establishes the Gauss-Markov property of $\phi(\hat{\beta})$. Based on that theorem, the second part of option C is incorrect: the best linear unbiased estimate of $\theta_1 - \theta_3$ is $\bar{Y} = (Y_1 + Y_2 + Y_3 + Y_4)/4$, not $(Y_1 + Y_3)/2$, by the theorem below.

Theorem. Let $\phi(\beta) = p'\beta$ be an estimable parametric functional. Then its best linear unbiased estimate (a.k.a. the Gauss-Markov estimate) is $\phi(\hat{\beta})$ for any solution $\hat{\beta}$ of the normal equations $X'X\hat{\beta} = X'Y$.

Applying the theorem to $\phi_2(\beta) = \theta_1 - \theta_3$ goes as follows:

Proof. Straightforward calculation shows that the normal equations are

$\begin{equation} \begin{bmatrix} 4 & 0 & -4 \\ 0 & 2 & 0 \\ -4 & 0 & 4 \end{bmatrix} \hat{\beta} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 0 & 1 & 0 & -1 \\ -1 & -1 & -1 & -1 \end{bmatrix} Y, \end{equation}$

which, after dividing each row by 4, become

$\begin{equation} \begin{bmatrix} \phi_2(\hat{\beta}) \\ \hat{\theta}_2/2 \\ -\phi_2(\hat{\beta}) \end{bmatrix} = \begin{bmatrix} \bar{Y} \\ (Y_2 - Y_4)/4 \\ -\bar{Y} \end{bmatrix}, \end{equation}$

i.e., $\phi_2(\hat{\beta}) = \hat{\theta}_1 - \hat{\theta}_3 = \bar{Y}$ for every solution $\hat{\beta}$.

Therefore, option D is the only correct answer.

Addendum: The connection of estimability and identifiability

When I was at school, a professor briefly mentioned that the estimability of a parametric functional $\phi$ corresponds to model identifiability. I took this claim for granted then. However, the equivalence needs to be spelled out more explicitly.

According to A.C. Davison's monograph Statistical Models p.144,

Definition 2. A parametric model in which each parameter $\theta$ generates a different distribution is called identifiable.

Linear model $(1)$, regardless of the sphericity condition $\text{Var}(\varepsilon) = \sigma^2 I$, can be reformulated as

$\begin{equation} E[Y] = X\beta, \quad \beta \in \mathbb{R}^k. \tag{2} \end{equation}$

It is such a simple model that only the first moment of the response vector $Y$ is specified. When $\text{rank}(X) = k$, model $(2)$ is identifiable since $\beta_1 \neq \beta_2$ implies $X\beta_1 \neq X\beta_2$ (under model $(2)$, the word "distribution" in the original definition naturally reduces to "mean").

Now suppose that $\text{rank}(X) < k$ and that we are given a parametric functional $\phi(\beta) = p'\beta$. How do we reconcile Definition 1 and Definition 2?

Well, by manipulating notation and words, we can show (the "proof" is rather trivial) that the estimability of $\phi(\beta)$ is equivalent to the identifiability of model $(2)$ when it is parametrized by $\phi = \phi(\beta) = p'\beta$ (the design matrix $X$ is likely to change accordingly). To prove it, suppose $\phi(\beta)$ is estimable, so that $X\beta_1 = X\beta_2$ implies $p'\beta_1 = p'\beta_2$. Equivalently, $\phi(\beta_1) \neq \phi(\beta_2)$ implies $X\beta_1 \neq X\beta_2$: different values of $\phi$ generate different means, hence model $(2)$ is identifiable when indexed by $\phi$. Conversely, suppose model $(2)$ is identifiable when indexed by $\phi$, so that $X\beta_1 = X\beta_2$ forces equal values of $\phi$, which is precisely $\phi(\beta_1) = \phi(\beta_2)$.

Intuitively, when $X$ is rank-deficient, the model indexed by $\beta$ is parameter-redundant (too many parameters), hence a non-redundant lower-dimensional reparametrization (which could consist of a collection of linear functionals) is possible. When is such a new representation possible? The key is estimability.

To illustrate these statements, let's reconsider your example. We have verified that the parametric functionals $\phi_2(\beta) = \theta_1 - \theta_3$ and $\phi_3(\beta) = \theta_2$ are estimable. Therefore, we can rewrite model $(1)$ in terms of the reparametrized parameter $\gamma = (\phi_2, \phi_3)'$ as follows:

$\begin{equation} E[Y] = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 0 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} \phi_2 \\ \phi_3 \end{bmatrix} = \tilde{X}\gamma. \end{equation}$

Clearly, since $\tilde{X}$ has full column rank, the model with the new parameter $\gamma$ is identifiable.
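A short Python sketch of the reparametrization (the test vectors for $\beta$ are arbitrary, chosen for illustration). It checks that $X\beta = \tilde{X}\gamma$ with $\gamma = (\theta_1 - \theta_3, \theta_2)'$, and that $\tilde{X}$ has a nonzero $2 \times 2$ minor, so its rank is 2:

```python
# Original design matrix X of model (1) and the reduced matrix X_tilde.
X = [[1, 0, -1], [1, 1, -1], [1, 0, -1], [1, -1, -1]]
X_tilde = [[1, 0], [1, 1], [1, 0], [1, -1]]


def matvec(A, v):
    """Matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


for beta in [(0, 1, 0), (1, 1, 1), (2, -3, 5), (7, 0, -4)]:
    gamma = (beta[0] - beta[2], beta[1])  # (phi_2, phi_3)
    assert matvec(X, beta) == matvec(X_tilde, gamma)

# A nonzero 2x2 minor (first two rows) shows X_tilde has full column rank 2.
minor = X_tilde[0][0] * X_tilde[1][1] - X_tilde[0][1] * X_tilde[1][0]
print(minor)  # 1
```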

— Zhanxiong

If you need a proof for the second part of option C, I will supplement my answer.

— Zhanxiong

Thanks for such a detailed answer! Now, about the second part of C: I know that "best" relates to minimum variance. So why isn't $\dfrac{1}{4}(Y_1+Y_2+Y_3+Y_4)$ "best"?

— Stat_prob_001

Oh, I don't know why I thought it was the estimator in C. Actually, $(Y_1 + Y_2 + Y_3 + Y_4)/4$ is the best estimator. I will edit my answer.

— Zhanxiong

Apply the definitions.

I will provide details to demonstrate how you can use elementary techniques: you don't need to know any special theorems about estimation, nor will it be necessary to assume anything about the (marginal) distributions of the $Y_i$ . We will need to supply one missing assumption about the moments of their joint distribution.

Definitions

All linear estimates are of the form

$t_\lambda(Y) = \sum_{i=1}^4 \lambda_i Y_i$

for constants $\lambda = (\lambda_i)$.

An estimator of $\theta_1-\theta_3$ is unbiased if and only if its expectation is $\theta_1-\theta_3$ . By linearity of expectation,

$\eqalign{ \theta_1 - \theta_3 &= E[t_\lambda(Y)] = \sum_{i=1}^4 \lambda_i E[Y_i]\\ & = \lambda_1(\theta_1-\theta_3) + \lambda_2(\theta_1+\theta_2-\theta_3) + \lambda_3(\theta_1-\theta_3) + \lambda_4(\theta_1-\theta_2-\theta_3) \\ &=(\lambda_1+\lambda_2+\lambda_3+\lambda_4)(\theta_1-\theta_3) + (\lambda_2-\lambda_4)\theta_2. }$

Comparing coefficients of the unknown quantities $\theta_i$ reveals

$\lambda_2-\lambda_4=0\text{ and }\lambda_1+\lambda_2+\lambda_3+\lambda_4=1.\tag{1}$
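To see the constraints $(1)$ in action, here is a small Python sketch using exact fractions (the function names and the test values of $\theta$ are illustrative). It computes $E[t_\lambda(Y)]$ directly from the given means and confirms that both $(Y_1+Y_3)/2$ and $(Y_1+Y_2+Y_3+Y_4)/4$ satisfy $(1)$ and are therefore unbiased for $\theta_1 - \theta_3$:

```python
from fractions import Fraction as F


def expectation(lam, theta):
    """E[t_lambda(Y)] computed from the given means E(Y_i)."""
    t1, t2, t3 = theta
    means = [t1 - t3, t1 + t2 - t3, t1 - t3, t1 - t2 - t3]
    return sum(l * m for l, m in zip(lam, means))


def satisfies_constraints(lam):
    """Constraints (1): lambda_2 - lambda_4 = 0 and sum(lambda) = 1."""
    return lam[1] - lam[3] == 0 and sum(lam) == 1


half_pair = (F(1, 2), 0, F(1, 2), 0)   # (Y_1 + Y_3) / 2
grand_mean = (F(1, 4),) * 4            # (Y_1 + Y_2 + Y_3 + Y_4) / 4
for lam in (half_pair, grand_mean):
    assert satisfies_constraints(lam)
    for theta in [(0, 0, 0), (1, 2, 3), (5, -4, 2)]:
        assert expectation(lam, theta) == theta[0] - theta[2]
```

Unbiasedness alone cannot separate the two estimators; the variance comparison that follows does.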

In the context of linear unbiased estimation, "best" always means with least variance. The variance of $t_\lambda$ is

$\operatorname{Var}(t_\lambda) = \sum_{i=1}^4 \lambda_i^2 \operatorname{Var}(Y_i) + \sum_{i\ne j}^4 \lambda_i\lambda_j \operatorname{Cov}(Y_i,Y_j).$

The only way to make progress is to add an assumption about the covariances: most likely, the question intended to stipulate they are all zero. (This does not imply the $Y_i$ are independent. Furthermore, the problem can be solved by making any assumption that stipulates those covariances up to a common multiplicative constant. The solution depends on the covariance structure.)

Since $\operatorname{Var}(Y_i)=\sigma^2,$ we obtain

$\operatorname{Var}(t_\lambda) =\sigma^2(\lambda_1^2 + \lambda_2^2 + \lambda_3^2 + \lambda_4^2).\tag{2}$
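Under the zero-covariance assumption, $(2)$ already settles the comparison raised in the question. A tiny Python sketch, reporting variances in units of $\sigma^2$ (the function name is illustrative):

```python
from fractions import Fraction as F


def variance_over_sigma2(lam):
    """Var(t_lambda) / sigma^2 from (2); assumes uncorrelated Y_i with Var(Y_i) = sigma^2."""
    return sum(l * l for l in lam)


option_c = (F(1, 2), 0, F(1, 2), 0)   # (Y_1 + Y_3) / 2
grand_mean = (F(1, 4),) * 4           # (Y_1 + Y_2 + Y_3 + Y_4) / 4
print(variance_over_sigma2(option_c))    # 1/2
print(variance_over_sigma2(grand_mean))  # 1/4
```

The estimator in option C has twice the variance of $\bar{Y}$, so it cannot be the best linear unbiased estimate.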

The problem therefore is to minimize $(2)$ subject to constraints $(1)$ .

Solution

The constraints $(1)$ permit us to express all the $\lambda_i$ in terms of just two linear combinations of them. Let $u=\lambda_1-\lambda_3$ and $v=\lambda_1+\lambda_3$ (which are linearly independent). These determine $\lambda_1$ and $\lambda_3$ while the constraints determine $\lambda_2$ and $\lambda_4$ . All we have to do is minimize $(2)$ , which can be written

$\sigma^2(\lambda_1^2 + \lambda_2^2 + \lambda_3^2 + \lambda_4^2) = \frac{\sigma^2}{4}\left(2u^2 + (2v-1)^2 + 1\right).$
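The change of variables can be checked with exact arithmetic. Solving $u = \lambda_1 - \lambda_3$, $v = \lambda_1 + \lambda_3$ together with $(1)$ gives $\lambda_1 = (u+v)/2$, $\lambda_3 = (v-u)/2$ and $\lambda_2 = \lambda_4 = (1-v)/2$. A short Python sketch that confirms the identity on a handful of arbitrary $(u, v)$ values (the helper name is illustrative):

```python
from fractions import Fraction as F


def lambdas_from_uv(u, v):
    """Recover lambda from u = l1 - l3, v = l1 + l3 and constraints (1)."""
    l1 = (u + v) / 2
    l3 = (v - u) / 2
    l2 = l4 = (1 - v) / 2  # lambda_2 = lambda_4 and the lambdas sum to 1
    return (l1, l2, l3, l4)


for u in (F(-1), F(0), F(1, 3), F(2)):
    for v in (F(-2), F(1, 2), F(1), F(5, 7)):
        lam = lambdas_from_uv(u, v)
        lhs = sum(l * l for l in lam)
        rhs = (2 * u**2 + (2 * v - 1) ** 2 + 1) / 4
        assert lhs == rhs
        assert lam[1] == lam[3] and sum(lam) == 1
```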

No constraints apply to $(u,v)$ . Assume $\sigma^2 \ne 0$ (so that the variables aren't just constants). Since $u^2$ and $(2v-1)^2$ are smallest only when $u=2v-1=0$ , it is now obvious that the unique solution is

$\lambda = (\lambda_1,\lambda_2,\lambda_3,\lambda_4) = (1/4,1/4,1/4,1/4).$
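As a sanity check on the minimization (not a substitute for the algebraic argument), a brute-force Python sketch over a grid of $(u, v)$ values lands on $u = 0$, $v = 1/2$, i.e., on $\lambda = (1/4, 1/4, 1/4, 1/4)$ with variance $\sigma^2/4$. The grid range and step are arbitrary choices:

```python
from fractions import Fraction as F
from itertools import product


def variance_over_sigma2(u, v):
    """Var(t_lambda) / sigma^2 in terms of u = l1 - l3 and v = l1 + l3."""
    return (2 * u**2 + (2 * v - 1) ** 2 + 1) / 4


def lambdas_from_uv(u, v):
    """Recover (lambda_1, ..., lambda_4) using constraints (1)."""
    return ((u + v) / 2, (1 - v) / 2, (v - u) / 2, (1 - v) / 2)


grid = [F(k, 8) for k in range(-8, 9)]  # u and v in [-1, 1], step 1/8
u_best, v_best = min(product(grid, grid), key=lambda uv: variance_over_sigma2(*uv))
print(u_best, v_best, variance_over_sigma2(u_best, v_best))  # 0 1/2 1/4
print(lambdas_from_uv(u_best, v_best))
```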

Option (C) is false because it does not give the best unbiased linear estimator. Option (D), although it doesn't give full information, nevertheless is correct, because

$\theta_2 = E[t_{(0,1/2,0,-1/2)}(Y)]$

is the expectation of a linear estimator.

It is easy to see that neither (A) nor (B) can be correct, because the space of expectations of linear estimators is generated by $\{\theta_2, \theta_1-\theta_3\}$ and none of $\theta_1,\theta_3,$ or $\theta_1+\theta_3$ is in that space.
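The span argument can be made concrete. By the expansion of $E[t_\lambda(Y)]$ above, its coefficients on $(\theta_1, \theta_2, \theta_3)$ always have the form $(a, b, -a)$ with $a = \sum_i \lambda_i$ and $b = \lambda_2 - \lambda_4$, and every such pair $(a, b)$ is attainable. So $p'\theta$ lies in the space exactly when $p_1 + p_3 = 0$. A Python sketch of this check (function names are illustrative):

```python
from fractions import Fraction as F


def expectation_coefficients(lam):
    """Coefficients of (theta_1, theta_2, theta_3) in E[t_lambda(Y)]."""
    a = sum(lam)         # coefficient of theta_1 - theta_3
    b = lam[1] - lam[3]  # coefficient of theta_2
    return (a, b, -a)


def in_span(p):
    """p'theta is the expectation of some linear estimator iff p = (a, b, -a)."""
    return p[0] + p[2] == 0


# Option (D): lambda = (0, 1/2, 0, -1/2) has expectation exactly theta_2.
print(expectation_coefficients((0, F(1, 2), 0, F(-1, 2))))
targets = {"theta_1": (1, 0, 0), "theta_3": (0, 0, 1), "theta_1 + theta_3": (1, 0, 1)}
for name, p in targets.items():
    print(name, in_span(p))  # False for all three
```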

Consequently (D) is the unique correct answer.

— whuber