What is the expected correlation between residual and the dependent variable?


In multiple linear regression, I can understand that the correlations between the residual and the predictors are zero, but what is the expected correlation between the residual and the criterion variable? Should it be zero or highly correlated? What is the meaning of that?

regression residuals

— Jfly

What is a "criterion variable"??

— whuber


@whuber I guess Jfly means the response / outcome / dependent / etc. variable. davidmlane.com/hyperstat/A101702.html It's interesting to see the many names of such variables: en.wikipedia.org/wiki/…

— Jeromy Anglim

@Jeromy Thanks! I had guessed that was the meaning, but wasn't sure. This is a new term for me, and for Wikipedia, evidently.

— whuber

I would have thought it would be equal to $E[R^2]$ or something similar, as $R^2=[corr(y,\hat{y})]^2$.

— Probislogic

$y = f(x) + e$, where $f$ is the regression function, $e$ is the error, and $Cov(f(x),e) = 0$. Then $Corr(y,e) = SD(e)/SD(y) = \sqrt{1-R^2}$. This is the sample statistic; its expected value would be similar but messier.

— Ray Koopman
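A quick numerical check of that sample identity, offered as a minimal sketch. It assumes NumPy, an OLS fit that includes an intercept (the identity uses the fact that the residuals then sum to zero), and an arbitrary simulated dataset; the helper name `residual_response_corr` is hypothetical.

```python
import numpy as np

def residual_response_corr(X, y):
    """Fit OLS with an intercept; return (sample corr(y, residuals), sqrt(1 - R^2))."""
    Xc = np.column_stack([np.ones(len(y)), X])       # prepend the intercept column
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)    # OLS coefficients
    resid = y - Xc @ beta                             # residuals e = y - y_hat
    sse = resid @ resid                               # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)                 # total sum of squares
    r2 = 1.0 - sse / tss
    corr = np.corrcoef(y, resid)[0, 1]
    return corr, np.sqrt(1.0 - r2)

rng = np.random.default_rng(0)
n, p = 200, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -0.5, 2.0]) + rng.normal(scale=1.5, size=n)
corr, root = residual_response_corr(X, y)
print(corr, root)  # the two values agree up to floating-point rounding
```

Without the intercept column the residuals need not sum to zero, and the two numbers can differ.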


In the regression model:

$y_i=\mathbf{x}_i'\beta+u_i$

the usual assumption is that $(y_i,\mathbf{x}_i,u_i)$, $i=1,...,n$ is an iid sample. Assuming that $E\mathbf{x}_iu_i=0$ and that $E(\mathbf{x}_i\mathbf{x}_i')$ has full rank, the ordinary least squares estimator

$\widehat{\beta}=\left(\sum_{i=1}^n\mathbf{x}_i\mathbf{x}_i'\right)^{-1}\sum_{i=1}^n\mathbf{x}_iy_i$

is consistent and asymptotically normal. The expected covariance between the error term $u_i$ and the response variable is then:

$Ey_iu_i=E(\mathbf{x}_i'\beta+u_i)u_i=Eu_i^2$

If we further assume that $E(u_i|\mathbf{x}_1,...,\mathbf{x}_n)=0$ and $E(u_i^2|\mathbf{x}_1,...,\mathbf{x}_n)=\sigma^2$, we can compute the expected covariance between $y_i$ and its regression residual:

$\begin{align*} Ey_i\widehat{u}_i&=Ey_i(y_i-\mathbf{x}_i'\widehat{\beta})\\ &=E(\mathbf{x}_i'\beta+u_i)(u_i-\mathbf{x}_i'(\widehat{\beta}-\beta))\\ &=E(u_i^2)\left(1-E\mathbf{x}_i' \left(\sum_{j=1}^n\mathbf{x}_j\mathbf{x}_j'\right)^{-1}\mathbf{x}_i\right) \end{align*}$

Now to get the correlation we need to compute $\text{Var}(y_i)$ and $\text{Var}(\hat{u}_i)$. It turns out that

$\text{Var}(\hat u_i)=E(y_i\hat{u}_i),$

so

$\text{Corr}(y_i,\hat u_i)=\sqrt{1-E\mathbf{x}_i' \left(\sum_{j=1}^n\mathbf{x}_j\mathbf{x}_j'\right)^{-1}\mathbf{x}_i}$
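The formula can be checked by simulation if we read it conditionally on the design, holding $X$ fixed so that $\text{Var}(y_i|X)=\sigma^2$. In that case the correlation across repeated error draws should be $\sqrt{1-h_{ii}}$ for each observation, where $h_{ii}=\mathbf{x}_i'\left(\sum_j\mathbf{x}_j\mathbf{x}_j'\right)^{-1}\mathbf{x}_i$. This is a minimal NumPy sketch under assumed choices (normal design and errors, $\beta$ a vector of ones, n=10, p=3, 20000 replications); `mc_corr_vs_leverage` is a hypothetical helper.

```python
import numpy as np

def mc_corr_vs_leverage(n=10, p=3, reps=20000, sigma=1.0, seed=1):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, p))                   # fixed design, reused in every replication
    beta = np.ones(p)
    H = X @ np.linalg.solve(X.T @ X, X.T)         # hat matrix H = X (X'X)^{-1} X'
    M = np.eye(n) - H                             # residual maker M = I - H
    U = rng.normal(scale=sigma, size=(reps, n))   # one row of errors u per replication
    Y = X @ beta + U                              # each row is a simulated response vector y
    Uhat = Y @ M.T                                # each row is the residual vector M y
    # correlation between y_i and uhat_i across replications, for each observation i
    Yc = Y - Y.mean(axis=0)
    Rc = Uhat - Uhat.mean(axis=0)
    mc = (Yc * Rc).sum(axis=0) / np.sqrt((Yc ** 2).sum(axis=0) * (Rc ** 2).sum(axis=0))
    theory = np.sqrt(1.0 - np.diag(H))            # sqrt(1 - h_ii)
    return mc, theory

mc, theory = mc_corr_vs_leverage()
print(np.round(mc, 3))
print(np.round(theory, 3))
```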

Now the term $\mathbf{x}_i' \left(\sum_{j=1}^n\mathbf{x}_j\mathbf{x}_j'\right)^{-1}\mathbf{x}_i$ comes from the diagonal of the hat matrix $H=X(X'X)^{-1}X'$, where $X=[\mathbf{x}_1,...,\mathbf{x}_N]'$. The matrix $H$ is idempotent, so it satisfies the following property

$\text{trace}(H)=\sum_{i}h_{ii}=\text{rank}(H),$

where $h_{ii}$ is the diagonal term of $H$. $\text{rank}(H)$ is the number of linearly independent variables in $\mathbf{x}_i$, which is usually the number of variables. Let's call it $p$. The number of $h_{ii}$ is the sample size $N$. So we have $N$ non-negative terms that should sum up to $p$. Usually $N$ is much larger than $p$, so many $h_{ii}$ would be close to zero, which means that the correlation between the residual and the response would be close to 1 for the majority of observations.
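To see the trace property and its consequence numerically, here is a minimal NumPy sketch with an arbitrary simulated design ($N=500$, $p=4$, standard normal regressors). It computes the leverages $h_{ii}=\mathbf{x}_i'(X'X)^{-1}\mathbf{x}_i$ directly rather than forming the full $N\times N$ hat matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 500, 4
X = rng.normal(size=(N, p))

# h_ii = x_i' (X'X)^{-1} x_i for every row i, via einsum
XtX_inv = np.linalg.inv(X.T @ X)
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)

print(h.sum())                                      # trace(H) = rank(H) = p, up to rounding
print(np.sqrt(1 - h).min(), np.sqrt(1 - h).mean())  # per-observation correlations, mostly near 1
```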

The term $h_{ii}$ is also used in various regression diagnostics for determining influential observations.

— mpiktas

+1 This is exactly the right analysis. But why not finish the job and answer the question? The OP asks whether this correlation is "high" and what it might mean.

— whuber

So you could say that the correlation is approximately

$\sqrt{1-\frac{p}{N}}$

— Probislogic


The correlation is different for each observation, but yes, you can say that, provided that X has no outliers.

— mpiktas
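A small illustration of that caveat, as a sketch under assumed data: with one deliberately extreme row in $X$ (a hypothetical outlier at (25, 25)), that observation's $\sqrt{1-h_{ii}}$ falls far below the rule of thumb $\sqrt{1-p/N}$, while the typical observation stays close to it. The helper `leverages` is hypothetical.

```python
import numpy as np

def leverages(X):
    """Diagonal of the hat matrix: h_ii = x_i' (X'X)^{-1} x_i."""
    return np.einsum("ij,jk,ik->i", X, np.linalg.inv(X.T @ X), X)

rng = np.random.default_rng(3)
N, p = 100, 2
X = rng.normal(size=(N, p))
X[0] = [25.0, 25.0]              # hypothetical high-leverage outlier in X

h = leverages(X)
approx = np.sqrt(1 - p / N)      # rule of thumb sqrt(1 - p/N)
per_obs = np.sqrt(1 - h)         # per-observation correlation sqrt(1 - h_ii)
print(approx, per_obs[0], np.median(per_obs[1:]))
```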


The correlation depends on $R^2$. If $R^2$ is high, it means that much of the variation in the dependent variable can be attributed to variation in the independent variables and NOT to the error term.

However, if $R^2$ is low, it means that much of the variation in the dependent variable is unrelated to variation in the independent variables and thus must be related to the error term.

Consider the following model:

$Y=X\beta+\varepsilon$, where $Y$ and $X$ are uncorrelated.

Assume sufficient regularity conditions for the CLT to hold.

$\hat{\beta}$ will converge to $0$, since $X$ and $Y$ are uncorrelated. Therefore $\hat{Y}=X\hat{\beta}$ will always be zero. Thus, $\varepsilon:=Y-\hat{Y}=Y-0=Y$. $\varepsilon$ and $Y$ are perfectly correlated!!!

Holding everything else fixed, increasing $R^2$ will decrease the correlation between the error and the dependent variable. A strong correlation is not necessarily cause for alarm; it may simply mean that the underlying process is noisy. However, a low $R^2$ (and hence a high correlation between error and dependent variable) may be due to model misspecification.
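The claim that a higher $R^2$ lowers the correlation can be illustrated with a sketch like the one below. It assumes a single standard normal regressor, unit-variance noise, an intercept in the fit, and a hypothetical helper `corr_y_resid`. Raising the slope `signal` while keeping the noise fixed raises $R^2$, and the printed correlation between $y$ and the residuals falls, consistent with $\sqrt{1-R^2}$.

```python
import numpy as np

def corr_y_resid(signal, n=1000, seed=4):
    """Simulate y = signal * x + noise, fit OLS with intercept, return (R^2, corr(y, residuals))."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    y = signal * x + rng.normal(size=n)
    Xc = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
    resid = y - Xc @ beta
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return r2, np.corrcoef(y, resid)[0, 1]

for s in [0.0, 0.5, 1.0, 3.0]:
    r2, c = corr_y_resid(s)
    print(f"signal={s}: R^2={r2:.3f}, corr(y, resid)={c:.3f}")
```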

— opaco

I find this answer confusing, in part through the use of "$\varepsilon$" to stand for both the error terms in the model and the residuals $Y-\hat Y$. Another point of confusion is the reference to "converge to" even though there is no sequence of anything in evidence to which convergence could apply. The assumption that $X$ and $Y$ are uncorrelated seems special and not illustrative of general circumstances. All this obscures whatever this answer is trying to say and which claims are generally true.

— whuber


I find this topic quite interesting, and the current answers are unfortunately incomplete or partly misleading, despite the relevance and high popularity of this question.

By definition of the classical OLS framework, there should be no relationship between $\hat y$ and $\hat u$, since the residuals obtained are, by construction, uncorrelated with $\hat y$ when deriving the OLS estimator. The variance-minimizing property under homoskedasticity ensures that the residual error is randomly distributed around the fitted values. This can be shown formally by:

$\text{Cov}(\hat y,\hat u|X)=\text{Cov}(Py,My|X)=\text{Cov}(Py,(I-P)y|X)=P\,\text{Cov}(y,y|X)(I-P)'$

$=P\sigma^2-P\sigma^2=0$

where $M$ and $P$ are idempotent matrices defined as $P=X(X'X)^{-1}X'$ and $M=I-P$.
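These matrix facts are easy to confirm numerically. The following minimal NumPy sketch (arbitrary simulated $X$ and $y$) checks that $P$ and $M$ are idempotent, that $PM'=0$ (the matrix product behind the zero covariance above, since $\text{Cov}(y,y|X)=\sigma^2I$), and that the fitted values and residuals are orthogonal in the sample.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 50, 3
X = rng.normal(size=(n, k))
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=n)

P = X @ np.linalg.solve(X.T @ X, X.T)   # projection P = X (X'X)^{-1} X'
M = np.eye(n) - P                       # annihilator M = I - P
y_hat, u_hat = P @ y, M @ y

print(np.abs(P @ P - P).max(), np.abs(M @ M - M).max())  # both idempotent, up to rounding
print(np.abs(P @ M.T).max())            # P (I - P)' = 0
print(y_hat @ u_hat)                    # fitted values orthogonal to residuals
```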

This result relies on strict exogeneity and homoskedasticity, and practically holds in large samples. The intuition for their uncorrelatedness is the following: the fitted values $\hat y$ conditional on $X$ are centered around $\hat u$, which are treated as independent and identically distributed. However, any departure from the strict exogeneity and homoskedasticity assumptions could make the explanatory variables endogenous and induce a latent correlation between $\hat u$ and $\hat y$.

Now the correlation between the residuals $\hat u$ and the "original" $y$ is a completely different story:

$\text{Cov}(y,\hat u|X)=\text{Cov}(y,My|X)=\text{Cov}(y,(I-P)y|X)=\text{Cov}(y,y|X)(I-P)'=\sigma^2 M$

A quick check of the theory shows that this covariance matrix is identical to the covariance matrix of the residual $\hat{u}$ itself: since $M$ is symmetric and idempotent, $\text{Var}(\hat u|X)=M(\sigma^2 I)M'=\sigma^2M$. We have:

$\text{Var}(\hat u|X)=\sigma^2 M=\text{Cov}(y,\hat u|X)$

If we want to compute the (scalar) covariance between $y$ and $\hat{u}$ as requested by the OP, we get:

$\implies \text{Cov}_{scalar}(y,\hat u|X)=\text{Var}(\hat u|X)=\left(\sum \hat u_i^2 \right)/N$

(= summing the diagonal entries of the covariance matrix and dividing by N)

The formula above points to an interesting fact. If we test the relationship by regressing $y$ on the residuals $\hat{u}$ (+ constant), the slope coefficient is $\beta_{\hat{u},y}=1$, which can easily be derived by dividing the expression above by $\text{Var}(\hat u|X)$.
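The slope-of-one claim can be checked with a sketch like this one, which assumes NumPy, an intercept in the original fit, and arbitrary simulated data. Note the direction: it regresses $y$ on a constant and $\hat u$. Because $\hat u$ has mean zero, the slope reduces to $\hat u'y/\hat u'\hat u = y'My/y'My = 1$, using that $M$ is symmetric and idempotent.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 100
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])   # design with an intercept
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta

# auxiliary regression of y on a constant and the residuals
Z = np.column_stack([np.ones(n), u_hat])
gamma, *_ = np.linalg.lstsq(Z, y, rcond=None)
print(gamma)   # intercept equals mean(y); slope on u_hat equals 1, up to rounding
```

Regressing in the other direction ($\hat u$ on $y$) gives a slope of $1-R^2$ instead.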

On the other hand, the correlation is the covariance standardized by the respective standard deviations. Now, the variance matrix of the residuals is $\sigma^2 M$, while the variance of $y$ is $\sigma^2 I$. The correlation $\text{Corr}(y,\hat u)$ therefore becomes:

$\text{Corr}(y,\hat u)=\frac{\text{Var}(\hat u)}{\sqrt{\text{Var}(\hat{u})\text{Var}(y)}}=\sqrt{\frac{\text{Var}(\hat u)}{\text{Var}(y)}}=\sqrt{\frac{\text{Var}(\hat u)}{\sigma^2}}$

This is the core result which ought to hold in a linear regression. The intuition is that $\text{Corr}(y,\hat u)$ compares a proxy for the variance of the error term, based on the residuals, with the true variance of the error term. Notice that the variance of $y$ is equal to the variance of $\hat{y}$ plus the variance of the residuals $\hat{u}$. So it can be more intuitively rewritten as:

$\text{Corr}(y,\hat u)=\frac{1}{\sqrt{1+\frac{\text{Var}(\hat{y})}{\text{Var}(\hat u)}}}$
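A numerical sketch of the rewritten form, assuming a simple regression with an intercept on simulated data. It checks the sample decomposition $\text{Var}(y)=\text{Var}(\hat y)+\text{Var}(\hat u)$ and that the sample correlation between $y$ and $\hat u$ equals $1/\sqrt{1+\text{Var}(\hat y)/\text{Var}(\hat u)}$. Both identities hold for sample variances; the intercept is what removes the cross term.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 300
x = rng.normal(size=n)
y = 1.0 + 0.8 * x + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta
u_hat = y - y_hat

# sample decomposition Var(y) = Var(y_hat) + Var(u_hat), same ddof throughout
print(np.var(y), np.var(y_hat) + np.var(u_hat))
lhs = np.corrcoef(y, u_hat)[0, 1]
rhs = 1.0 / np.sqrt(1.0 + np.var(y_hat) / np.var(u_hat))
print(lhs, rhs)
```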

There are two forces at work here. If we have a great fit of the regression line, the correlation is expected to be low due to $\text{Var}(\hat u)\approx 0$. On the other hand, $\text{Var}(\hat{y})$ is a bit of a fudge to estimate, as it is unconditional and a line in parameter space. Comparing unconditional and conditional variances within a ratio may not be an appropriate indicator after all. Perhaps that's why it is rarely done in practice.

An attempt to conclude the question: the correlation between $y$ and $\hat u$ is positive and relates to the ratio of the variance of the residuals to the variance of the true error term, proxied by the unconditional variance of $y$. Hence, it is a bit of a misleading indicator.

Although this exercise may give us some intuition about the workings and inherent theoretical assumptions of an OLS regression, we rarely evaluate the correlation between $y$ and $\hat u$. There are certainly more established tests for checking properties of the true error term. Secondly, keep in mind that the residuals are not the error term, and tests on the residuals $\hat u$ that make predictions about the characteristics of the true error term $u$ are limited; their validity needs to be handled with utmost care.

For example, I would like to point out a statement made by a previous poster here. It is said that,

"If your residuals are correlated with your independent variables, then your model is heteroskedastic..."

I think that may not be entirely valid in this context. Believe it or not, the OLS residuals $\hat u$ are by construction orthogonal to each independent variable $x_k$ (and hence uncorrelated with it when the model includes an intercept). To see this, consider:

$X'\hat u=X'My=X'(I-P)y=X'y-X'Py$

$=X'y-X'X(X'X)^{-1}X'y=X'y-X'y=0$

$\implies X'\hat u=0 \implies \text{Cov}(X',\hat u|X)=0 \implies \text{Cov}(x_{ki},\hat u_i|x_{ki})=0$
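The normal equations can be confirmed numerically. In this sketch the errors are deliberately heteroskedastic (their scale grows with $|x_1|$, an assumed choice), and the residuals are still orthogonal to every column of $X$. Because the design includes a constant, the residuals have mean zero, so their sample correlations with the regressors are zero as well.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 80
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # constant + two regressors
scale = 0.5 + np.abs(X[:, 1])                                 # assumed heteroskedastic error scale
y = X @ np.array([0.3, 1.5, -2.0]) + scale * rng.normal(size=n)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
u_hat = y - X @ beta

print(X.T @ u_hat)  # normal equations: X' u_hat = 0 up to rounding
corrs = [np.corrcoef(X[:, k], u_hat)[0, 1] for k in (1, 2)]
print(corrs)        # sample correlations with the non-constant regressors, ~0
```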

However, you may have heard claims that an explanatory variable is correlated with the error term. Notice that such claims are based on assumptions about the whole population with a true underlying regression model, which we do not observe first hand. Consequently, checking the correlation between $y$ and $\hat u$ seems pointless in a linear OLS framework. However, when testing for heteroskedasticity, we take into account the second conditional moment; for example, we regress the squared residuals on $X$ or a function of $X$, as is often the case with FGLS estimators. This is different from evaluating the plain correlation. I hope this helps to make matters clearer.
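The auxiliary regression of squared residuals on $X$ can be sketched as follows. The statistic used here, $nR^2$ from regressing $\hat u^2$ on $X$ (the studentized Breusch-Pagan form associated with Koenker), is one common way to use that regression and is offered only as an illustration, not necessarily the procedure the poster had in mind. Under homoskedasticity it is approximately $\chi^2$ with as many degrees of freedom as non-constant columns. The data-generating process and the helper names `aux_lm_stat` and `simulate` are hypothetical.

```python
import numpy as np

def aux_lm_stat(X, u_hat):
    """n * R^2 from regressing squared residuals on X (X must include a constant column)."""
    n = len(u_hat)
    z = u_hat ** 2
    g, *_ = np.linalg.lstsq(X, z, rcond=None)   # auxiliary OLS of u_hat^2 on X
    e = z - X @ g
    r2 = 1.0 - (e @ e) / np.sum((z - z.mean()) ** 2)
    return n * r2

def simulate(hetero, n=2000, seed=9):
    """Fit y on [1, x] and return the auxiliary statistic for an assumed data-generating process."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])
    # error SD grows with x in the heteroskedastic case, constant otherwise
    scale = np.exp(0.5 * x) if hetero else np.ones(n)
    y = 1.0 + 2.0 * x + scale * rng.normal(size=n)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    u_hat = y - X @ beta
    return aux_lm_stat(X, u_hat)

# roughly chi-square(1) (typically small) without heteroskedasticity, much larger with it
print(simulate(hetero=False), simulate(hetero=True))
```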

— Majte

Note that we have

$\frac{var(\hat{u})}{var(y)}=\frac{SSE}{TSS}=1-R^2$ (at least roughly anyway). This gives

$corr(y,\hat{u})=\sqrt{1-R^2}$, which is some further intuition about what you mention in later paragraphs.

— probabilityislogic


What I find interesting about this answer is that the correlation is always positive.

— probabilityislogic

You state that $Var(y)$ is a matrix, yet you divide by it.

— mpiktas

@probabilityislogic: Not sure if I can follow your step. Then under the square root it would be $1+\frac{1}{1-R^2}$, which is $\frac{2-R^2}{1-R^2}$? Yet what's true is that it remains positive. The intuition is that if you have a line through a scatterplot, and you regress this line on the errors from that line, it should be obvious that as the value $y$ of that line increases, the value of the residuals increases as well. This is because the residuals are positively dependent on $y$ by construction.

— Majte

@mpiktas: In this case the matrix becomes a scalar, as we are dealing with $y$ in only one dimension.

— Majte


Adam's answer is wrong. Even with a model that fits the data perfectly, you can still get a high correlation between the residuals and the dependent variable. That's the reason no regression book asks you to check this correlation. You can find the answer in Dr. Draper's "Applied Regression Analysis" book.

— Jeff

Even if correct, this is more of an assertion than an answer according to CV's standards, @Jeff. Would you mind elaborating / backing up your claim? Even just a page number & edition of Draper & Smith would suffice.

— gung - Reinstate Monica


So, the residuals are your unexplained variance, the difference between your model's predictions and the actual outcome you're modeling. In practice, few models produced through linear regression will have all residuals close to zero unless linear regression is being used to analyze a mechanical or fixed process.

Ideally, the residuals from your model should be random, meaning they should not be correlated with either your independent or dependent variables (what you term the criterion variable). In the classical linear regression model, the error term is assumed to be normally distributed, so your residuals should also be normally distributed. If you have significant outliers, or if your residuals are correlated with either your dependent variable or your independent variables, then you have a problem with your model.

If you have significant outliers and non-normal distribution of your residuals, then the outliers may be skewing your weights (Betas), and I would suggest calculating DFBETAS to check the influence of your observations on your weights. If your residuals are correlated with your dependent variable, then there is a significantly large amount of unexplained variance that you are not accounting for. You may also see this if you're analyzing repeated observations of the same thing, due to autocorrelation. This can be checked for by seeing if your residuals are correlated with your time or index variable. If your residuals are correlated with your independent variables, then your model is heteroskedastic (see: http://en.wikipedia.org/wiki/Heteroscedasticity). You should check (if you haven't already) if your input variables are normally distributed, and if not, then you should consider scaling or transforming your data (the most common kinds are log and square-root) in order to make it more normalized.

For both your residuals and your independent variables, you should make a QQ-plot, as well as perform a Kolmogorov-Smirnov test (this particular implementation is sometimes referred to as the Lilliefors test) to make sure that your values fit a normal distribution.

Three quick things that may be helpful in dealing with this problem: examining the median of your residuals, which should be as close to zero as possible (the mean will almost always be zero as a result of how the error term is fitted in linear regression); a Durbin-Watson test for autocorrelation in your residuals (especially, as I mentioned before, if you are looking at multiple observations of the same things); and a partial residual plot, which will help you look for heteroscedasticity and outliers.
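The Durbin-Watson statistic mentioned above is simple to compute from a residual series, $DW=\sum_{t=2}^n(e_t-e_{t-1})^2/\sum_{t=1}^n e_t^2$. A minimal sketch follows; the simulated white-noise and AR(1) series stand in for residuals and are assumptions for illustration. Values near 2 suggest no first-order autocorrelation, while values well below 2 point to positive autocorrelation (roughly $2(1-\rho)$ for an AR(1) coefficient $\rho$).

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: sum of squared successive differences over the sum of squares."""
    e = np.asarray(resid, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(10)
n = 500
white = rng.normal(size=n)          # stand-in for residuals without autocorrelation
ar = np.empty(n)                    # stand-in for AR(1) residuals with rho = 0.8
ar[0] = white[0]
for t in range(1, n):
    ar[t] = 0.8 * ar[t - 1] + white[t]

print(durbin_watson(white), durbin_watson(ar))  # roughly 2, and roughly 2 * (1 - 0.8)
```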

— Adam

Thank you very much. Your explanation is very helpful to me.

— Jfly


+1 Nice, comprehensive answer. I'm going to nitpick on 2 points. "If your residuals are correlated with your independent variables, then your model is heteroskedastic"--I would say that if the variance of your residuals depends on the level of an independent variable, then you have heteroscedasticity. Also, I have heard the Kolmogorov-Smirnov/Lilliefors tests described as "notoriously unreliable," and in practice I have certainly found this to be true. Better to make a subjective determination based on a Q-Q plot or a simple histogram.

— rolando2


The claim that "the residuals from your model... should not be correlated with... your... dependent variable" is not generally true, as explained in other answers on this thread. Would you mind correcting this post?

— gung - Reinstate Monica


(-1) I think this post is not relevant enough to the question asked. It is good as general advice, but perhaps a case of the "right answer to the wrong question".

— probabilityislogic