Relationship between $R^2$ and the correlation coefficient

40

Say I have two one-dimensional arrays, $a_1$ and $a_2$. Each contains 100 data points. $a_1$ is the actual data, and $a_2$ is the model prediction. In this case, the $R^2$ value would be:

$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} \quad (1).$$

Meanwhile, this would be equal to the squared value of the correlation coefficient,

$$R^2 = (\text{Correlation Coefficient})^2 \quad (2).$$

Now suppose I swap the two: $a_2$ is the actual data and $a_1$ is the model prediction. From equation $(2)$, because the correlation coefficient does not care which comes first, the $R^2$ value would be the same. However, from equation $(1)$, with $SS_{tot}=\sum_i(y_i - \bar y )^2$, the $R^2$ value will change, because $SS_{tot}$ changes when $y$ is switched from $a_1$ to $a_2$; meanwhile, $SS_{res}=\sum_i(y_i-f_i)^2$ does not change.

My question is: how can these two contradict each other?

Edit:

I was wondering: will the relationship in Eq. (2) still hold if this is not a simple linear regression, i.e., if the relationship between the IV and the DV is not linear (it could be exponential / log)?
Will this relationship still hold if the sum of the prediction errors does not equal zero?
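To make the puzzle concrete, here is a minimal Python sketch with made-up numbers. The helper names `r2_eq1` and `corr` are illustrative and do not come from any library, and the "prediction" is deliberately an arbitrary array rather than a least-squares fit. Under those assumptions, Eq. (1) gives different values after the swap, while the squared correlation stays the same.

```python
def mean(v):
    return sum(v) / len(v)

def r2_eq1(actual, predicted):
    """Eq. (1): 1 - SS_res / SS_tot, with SS_res = sum (y_i - f_i)^2."""
    ybar = mean(actual)
    ss_res = sum((y - f) ** 2 for y, f in zip(actual, predicted))
    ss_tot = sum((y - ybar) ** 2 for y in actual)
    return 1 - ss_res / ss_tot

def corr(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    ubar, vbar = mean(u), mean(v)
    s_uv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    s_uu = sum((a - ubar) ** 2 for a in u)
    s_vv = sum((b - vbar) ** 2 for b in v)
    return s_uv / (s_uu * s_vv) ** 0.5

# Hypothetical data: a1 plays the "actual" values, a2 an arbitrary prediction.
a1 = [1.0, 2.0, 3.0, 4.0, 5.0]
a2 = [1.5, 1.5, 3.5, 3.5, 6.0]

r2_forward = r2_eq1(a1, a2)       # a1 actual, a2 predicted
r2_swapped = r2_eq1(a2, a1)       # roles swapped
r_squared = corr(a1, a2) ** 2     # symmetric in its arguments

print(r2_forward, r2_swapped, r_squared)
```

With these numbers the three values all differ, which suggests that Eq. (1) and Eq. (2) only coincide under extra conditions on how the prediction was obtained.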

correlation r-squared

— Shawn Wang

I found this presentation very useful and non-technical: google.com/…

— ihadanny

19

It is true that $SS_{tot}$ will change ... but you forgot that the regression sum of squares will change as well. So let's consider the simple regression model and denote the squared correlation coefficient as $r_{xy}^2=\dfrac{S_{xy}^2}{S_{xx}S_{yy}}$, where I use the sub-index $xy$ to emphasize that $x$ is the independent variable and $y$ is the dependent variable. Obviously, $r_{xy}^2$ is unchanged if you swap $x$ with $y$. We can easily show that $SSR_{xy}=S_{yy}(R_{xy}^2)$, where $SSR_{xy}$ is the regression sum of squares and $S_{yy}$ is the total sum of squares when $x$ is the independent and $y$ is the dependent variable. Therefore:

$$R_{xy}^2=\dfrac{SSR_{xy}}{S_{yy}}=\dfrac{S_{yy}-SSE_{xy}}{S_{yy}},$$

where $SSE_{xy}$ is the corresponding residual sum of squares when $x$ is the independent and $y$ is the dependent variable. Note that in this case we have $SSR_{xy}=S_{yy}-SSE_{xy}=b^2_{xy}S_{xx}$ with $b_{xy}=\dfrac{S_{xy}}{S_{xx}}$. (See, e.g., Eq. (34)-(41) here.) Therefore:

$$R_{xy}^2=\dfrac{\dfrac{S^2_{xy}}{S^2_{xx}}\cdot S_{xx}}{S_{yy}}=\dfrac{S^2_{xy}}{S_{xx}\cdot S_{yy}}.$$

Clearly, the above equation is symmetric with respect to $x$ and $y$. In other words:

$$R_{xy}^2=R_{yx}^2.$$

To summarize: when you swap $x$ with $y$ in the simple regression model, both the numerator and the denominator of $R_{xy}^2=\dfrac{SSR_{xy}}{S_{yy}}$ will change in such a way that $R_{xy}^2=R_{yx}^2.$
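The symmetry can also be checked numerically. The following is a minimal Python sketch on made-up data; the function names `sums_of_squares` and `r2_ols` are hypothetical helpers written for this illustration. It fits the least-squares line in both directions, computes $R^2$ as $1 - SSE/S_{yy}$ each time, and compares both values with $S_{xy}^2/(S_{xx}S_{yy})$.

```python
def sums_of_squares(x, y):
    """Return the means and the centered sums S_xx, S_yy, S_xy."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    s_xx = sum((a - xbar) ** 2 for a in x)
    s_yy = sum((b - ybar) ** 2 for b in y)
    s_xy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    return xbar, ybar, s_xx, s_yy, s_xy

def r2_ols(x, y):
    """R^2 of the least-squares line y = a + b*x, computed as 1 - SSE / S_yy."""
    xbar, ybar, s_xx, s_yy, s_xy = sums_of_squares(x, y)
    b = s_xy / s_xx                 # slope
    a = ybar - b * xbar             # intercept
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - sse / s_yy

# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 2.9, 4.2, 4.8, 6.5, 6.9]

_, _, s_xx, s_yy, s_xy = sums_of_squares(x, y)
r2_xy = r2_ols(x, y)                    # x independent, y dependent
r2_yx = r2_ols(y, x)                    # roles swapped
r_sq = s_xy ** 2 / (s_xx * s_yy)        # squared correlation coefficient

print(r2_xy, r2_yx, r_sq)
```

All three numbers should agree up to floating-point error, because each regression is refit by least squares after the swap instead of reusing the other variable as the prediction.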

— Stat

Thank you very much! I noticed that this might be where I went wrong: $R^2 = r^2$ holds only if 1) the model prediction is a straight line and 2) the mean of the model prediction equals the mean of the sample points. If the relationship between the DV and the IV is not a straight line, or the sum of the prediction errors is non-zero, the relationship will not hold. Could you please let me know whether this is correct?

— Shawn Wang

1

I thought about this because you used $R^2=SS_{reg}/SS_{tot}$, while I was using the equation I posted in the OP. These two equations are equivalent to each other only when the sum of the prediction errors is zero. Hence, in my OP, $SS_{res}=\sum_i(y_i-f_i)^2$ does not change while $SS_{tot}$ changed, and therefore $R^2$ changed.

— Shawn Wang

Do you happen to have a reference on how to work this out for the general case of p-variate Gaussians?

— jmb

26

One way of interpreting the coefficient of determination $R^{2}$ is to look at it as the squared Pearson correlation coefficient between the observed values $y_{i}$ and the fitted values $\hat{y}_{i}$.

The complete proof of how to derive the coefficient of determination $R^2$ from the squared Pearson correlation coefficient between the observed values $y_i$ and the fitted values $\hat{y}_i$ can be found under the following link:

http://economictheoryblog.wordpress.com/2014/11/05/proof/

In my eyes it should be pretty easy to understand; just follow the individual steps. I guess looking at it is essential to understanding how the relationship between the two key figures actually works.

— Andreas Dibiasi

6

In the case of simple linear regression with only one predictor, $R^2 = r^2 = Corr(x,y)^2$. But in multiple linear regression with more than one predictor, the concept of correlation between the predictors and the response does not extend automatically. The formula becomes:

$$R^2 = Corr(y_{estimated},y_{observed})^2$$

That is, the square of the correlation between the response and the fitted linear model.
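As a rough illustration of this formula, here is a minimal Python sketch with two predictors and made-up data. The names `cross` and `corr` are hypothetical helpers, and the least-squares coefficients are obtained from the 2x2 normal equations on centered data (Cramer's rule), which is just one convenient way to fit such a small model.

```python
def mean(v):
    return sum(v) / len(v)

def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    ubar, vbar = mean(u), mean(v)
    return sum((a - ubar) * (b - vbar) for a, b in zip(u, v))

def corr(u, v):
    """Pearson correlation coefficient."""
    return cross(u, v) / (cross(u, u) * cross(v, v)) ** 0.5

# Hypothetical data: response y and two predictors x1, x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [3.1, 3.9, 7.2, 6.8, 11.1, 10.7]

# Least-squares slopes from the normal equations on centered variables.
s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
s1y, s2y = cross(x1, y), cross(x2, y)
det = s11 * s22 - s12 ** 2
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det
b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)   # intercept

fitted = [b0 + b1 * u + b2 * v for u, v in zip(x1, x2)]
ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
r2 = 1 - ss_res / cross(y, y)                  # coefficient of determination

print(r2, corr(fitted, y) ** 2)                # these two should match
print(corr(x1, y) ** 2, corr(x2, y) ** 2)      # single-predictor r^2 values
```

The first two printed values should coincide, while the squared correlation of $y$ with either single predictor is in general a different (and never larger) number.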

— aman

5

@Stat has provided a detailed answer. In my short answer I'll briefly show, in a somewhat different way, the similarity and the difference between $r$ and $r^2$.

$r$ is the standardized regression coefficient beta of $Y$ by $X$ or of $X$ by $Y$ and, as such, it is a measure of the (mutual) effect size. This is most clearly seen when the variables are dichotomous. Then an $r$ of, for example, $.30$ means that 30% of cases will change their value to the opposite in one variable when the other variable changes its value to the opposite.

$r^2$, on the other hand, is the expression of the proportion of co-variability in the total variability: $r^2 = (\frac {cov}{\sigma_x \sigma_y})^2 = \frac {|cov|} {\sigma_x^2} \frac {|cov|} {\sigma_y^2}$. Note that this is a product of two proportions or, more precisely, two ratios (a ratio can be >1). If we loosely take any proportion or ratio to be a quasi-probability or propensity, then $r^2$ expresses a "joint probability (propensity)". Another, equally valid, expression for the joint product of two proportions (or ratios) would be their geometric mean, $\sqrt{prop*prop}$, which is $r$ itself (in absolute value).

(The two ratios are multiplicative, not additive, to stress the idea that they collaborate and cannot compensate for each other in their teamwork. They have to be multiplicative because the magnitude of $cov$ depends on both magnitudes $\sigma_x^2$ and $\sigma_y^2$ and, accordingly, $cov$ has to be divided twice at once in order to convert itself into a proper "proportion of the shared variance". But $cov$, the "cross-variance", shares the same measurement units with both $\sigma_x^2$ and $\sigma_y^2$, the "self-variances", and not with $\sigma_x \sigma_y$, the "hybrid variance"; that is why $r^2$, not $r$, is more adequate as the "proportion of shared variance".)

So, you see that the meanings of $r$ and $r^2$ as measures of the quantity of association are different (both meanings are valid), but these coefficients in no way contradict each other. And both are the same whether you predict $Y \sim X$ or $X \sim Y$.
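The algebra behind these two readings is easy to verify on a small example. Below is a minimal Python sketch with made-up data, and all variable names are illustrative. It checks that $r$ equals the standardized slope in both directions and that $r^2$ equals the product of the two ratios $|cov|/\sigma_x^2$ and $|cov|/\sigma_y^2$.

```python
# Hypothetical data, chosen only for illustration.
x = [2.0, 4.0, 5.0, 7.0, 9.0, 12.0]
y = [10.0, 9.0, 7.5, 8.0, 4.0, 3.0]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Sample covariance and variances (n - 1 in the denominator).
cov = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / (n - 1)
var_x = sum((a - xbar) ** 2 for a in x) / (n - 1)
var_y = sum((b - ybar) ** 2 for b in y) / (n - 1)
sd_x, sd_y = var_x ** 0.5, var_y ** 0.5

r = cov / (sd_x * sd_y)

# Raw least-squares slopes, then standardized (beta) versions.
beta_y_on_x = (cov / var_x) * sd_x / sd_y
beta_x_on_y = (cov / var_y) * sd_y / sd_x

# r^2 as a product of two ratios, and their geometric mean.
ratio_x = abs(cov) / var_x
ratio_y = abs(cov) / var_y
product = ratio_x * ratio_y
geometric_mean = product ** 0.5

print(r, beta_y_on_x, beta_x_on_y)
print(r ** 2, product, geometric_mean)
```

The data here are negatively correlated on purpose, so the geometric mean comes out as $|r|$ rather than $r$.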

— ttnphns

Thank you so much! I am starting to wonder whether I am using the wrong definition, i.e., that two definitions of $R^2$ co-exist and are not equivalent to each other. Could you please help me with this question: if I am thinking about more general cases where the model is not a simple linear regression (it could be exponential), is my equation in the OP still correct for calculating $R^2$? Or is this a different quantity, also called $R^2$, but different from the "coefficient of determination"?

— Shawn Wang

The coefficient of determination, or R-square, is a wider concept than r^2, which is only about simple linear regression. Please read Wikipedia: en.wikipedia.org/wiki/Coefficient_of_determination.

— ttnphns

Thanks again! That I do understand. My question is: for more complex regressions, can I still square the r value to get the coefficient of determination?

— Shawn Wang

1

For a "complex regression", you get R-square, but you don't get r.

— ttnphns

1

I think you might be mistaken. If $R^2=r^2$ , I assume you have a bivariate model: one DV, one IV. I don't think $R^2$ will change if you swap these, nor if you replace the IV with the predictions of the DV that are based on the IV. Here's code for a demonstration in R:

    x <- rnorm(1000); y <- rnorm(1000)   # store random data
    summary(lm(y ~ x))                   # fit a linear regression model (a)
    summary(lm(x ~ y))                   # swap variables and fit the opposite model (b)
    z <- lm(y ~ x)$fitted.values         # predictions of y from model (a)
    summary(lm(y ~ z))                   # substitute predictions for IV in model (a)

If you aren't working with a bivariate model, your choice of DV will affect $R^2$ ...unless your variables are all identically correlated, I suppose, but this isn't much of an exception. If all the variables have identical strengths of correlation and also share the same portions of the DV's variance (e.g. [or maybe "i.e."], if some of the variables are completely identical), you could just reduce this to a bivariate model without losing any information. Whether you do or don't, $R^2$ still wouldn't change.

In all other cases I can think of with more than two variables, $R^2\ne r^2$ where $R^2$ is the coefficient of determination and $r$ is a bivariate correlation coefficient of any kind (not necessarily Pearson's; e.g., possibly also a Spearman's $\rho$ ).

— Nick Stauner

1

I recently ran a Theil linear regression and then calculated $R^2=-0.1468$ and $SSR>SST$. I have seen Excel produce negative $R^2$ values as well; at first I laughed at it, then understanding slowly came and it ceased to be funny. So is the general definition of $R^2$ correct? What gives?
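One way to see how a negative value can arise from the definition $R^2 = 1 - SS_{res}/SS_{tot}$ is a minimal Python sketch with made-up numbers (the helper `r2_eq1` is hypothetical). A prediction that is not obtained by least squares, as is the case for a Theil fit, is not guaranteed to beat the plain mean, and whenever its residual sum of squares exceeds the total sum of squares the result drops below zero.

```python
def r2_eq1(actual, predicted):
    """1 - SS_res / SS_tot, with SS_res = sum (y_i - f_i)^2."""
    ybar = sum(actual) / len(actual)
    ss_res = sum((y - f) ** 2 for y, f in zip(actual, predicted))
    ss_tot = sum((y - ybar) ** 2 for y in actual)
    return 1 - ss_res / ss_tot

# Hypothetical data: the second "model" predicts the trend backwards.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
mean_prediction = [3.0, 3.0, 3.0, 3.0, 3.0]
bad_prediction = [5.0, 4.0, 3.0, 2.0, 1.0]

r2_mean = r2_eq1(actual, mean_prediction)   # SS_res = SS_tot = 10, so R^2 = 0
r2_bad = r2_eq1(actual, bad_prediction)     # SS_res = 40, SS_tot = 10, so R^2 = -3

print(r2_mean, r2_bad)
```

In this toy case the squared correlation between `actual` and `bad_prediction` is still 1, so the two definitions of $R^2$ clearly part ways once the prediction is not a least-squares fit with an intercept.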

— Carl