What is the residual standard error?


35

When running a multiple regression model in R, one of the outputs is a residual standard error of 0.0589 on 95,161 degrees of freedom. I know that the 95,161 degrees of freedom is given by the difference between the number of observations in my sample and the number of variables in my model. What is the residual standard error?
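For context, a minimal R sketch (the data frame d, the predictors x1 and x2, and the seed are made up here, so the numbers will not match the 0.0589 on 95,161 degrees of freedom above) showing where these quantities live in a fitted lm object:

# Hypothetical data: a response y and two predictors x1, x2
set.seed(1)
d <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
d$y <- 1 + 0.5 * d$x1 - 0.3 * d$x2 + rnorm(200, sd = 0.06)

fit <- lm(y ~ x1 + x2, data = d)

sigma(fit)                     # the "residual standard error" reported by summary(fit)
df.residual(fit)               # its degrees of freedom
nobs(fit) - length(coef(fit))  # same number: n minus the number of estimated coefficients
                               # (including the intercept), here 200 - 3 = 197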


2
This question and its answers might help: Why do we say residual standard error?
Antoine Vernet

A quick question: Is "residual standard error" the same as "residual standard deviation"? Gelman and Hill (p.41, 2007) seem to use them interchangeably.
JetLag

Answers:


26

A fitted regression model uses its parameters to generate point-estimate predictions, which are the means of the observed responses if you were to replicate the study with the same X values an infinite number of times (and when the linear model is true). The differences between these predicted values and the values used to fit the model are called "residuals", which, when the data-collection process is replicated, have the properties of random variables with mean 0.

The observed residuals are then used to estimate the variability of these values and the sampling distribution of the parameters. When the residual standard error is exactly 0, the model fits the data perfectly (likely due to overfitting). If the residual standard error cannot be shown to be significantly different from the variability of the unconditional response, then there is little evidence to suggest that the linear model has any predictive ability.
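To illustrate that last point with simulated data (the variable names, seed, and coefficients here are arbitrary): compare the residual standard error with the variability of the unconditional response, and let an F-test against the intercept-only model formalize the comparison.

set.seed(42)
n <- 500
x <- rnorm(n)
y <- 2 + 0.8 * x + rnorm(n, sd = 1)   # a setting where the linear model is true

dat  <- data.frame(x, y)
fit0 <- lm(y ~ 1, data = dat)  # intercept-only model: the unconditional response
fit1 <- lm(y ~ x, data = dat)

sigma(fit0)        # equals sd(y): variability of the unconditional response
sigma(fit1)        # residual standard error after conditioning on x
anova(fit0, fit1)  # tests whether the reduction in residual variation is significant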


3
This may have been answered before. See if this question provides the answers you need: Interpretation of R's lm() output (stats.stackexchange.com/questions/5135/…)
doug.numbers

26

Say we have the following ANOVA table (adapted from R's example(aov) command):

          Df Sum Sq Mean Sq F value Pr(>F)
Model      1   37.0   37.00   0.483  0.525
Residuals  4  306.3   76.57               

If you divide the sum of squares from any source of variation (model or residuals) by its respective degrees of freedom, you get the mean square. Particularly for the residuals:

306.3 / 4 = 76.575 ≈ 76.57

So 76.57 is the mean square of the residuals, i.e., the amount of residual variation (after applying the model) in your response variable.

The residual standard error you've asked about is nothing more than the positive square root of the mean square of the residuals. In my example, the residual standard error would be equal to √76.57, or approximately 8.75. R would output this information as "8.75 on 4 degrees of freedom".
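The same bookkeeping in R, using a small simulated fit of my own (so the numbers differ from the table above): pull the residual mean square out of the ANOVA table, take its square root, and compare with what summary() reports.

set.seed(7)
dat <- data.frame(x = rnorm(30))
dat$y <- 1 + 2 * dat$x + rnorm(30)

fit <- lm(y ~ x, data = dat)
tab <- anova(fit)                      # columns: Df, Sum Sq, Mean Sq, F value, Pr(>F)

ms_res <- tab["Residuals", "Mean Sq"]  # residual sum of squares / residual df
sqrt(ms_res)                           # positive square root of the residual mean square
sigma(fit)                             # identical: the residual standard error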


1
I up-voted the answer from @AdamO because as a person who uses regression directly most often, that answer was the most straightforward for me. However, I appreciate this answer as it illustrates the notational/conceptual/methodological relationship between ANOVA and linear regression.
svannoy

12

Typically you will have a regression model that looks like this:

Y=β0+β1X+ϵ
where ϵ is an error term independent of X.

If β0 and β1 are known, we still cannot perfectly predict Y from X because of ϵ. Therefore, we use the RSE as an estimate of the standard deviation of ϵ.

The RSE is explained quite clearly in "An Introduction to Statistical Learning".
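As a rough check (a simulation of my own, not from the book): generate data with a known error standard deviation and see that the RSE, computed as √(RSS / (n − 2)), recovers it and matches what R reports.

set.seed(123)
n   <- 10000
x   <- runif(n)
eps <- rnorm(n, sd = 0.25)   # true standard deviation of the error term
y   <- 3 + 1.5 * x + eps

fit <- lm(y ~ x)
rss <- sum(resid(fit)^2)
sqrt(rss / (n - 2))          # RSE by the formula sqrt(RSS / (n - 2))
sigma(fit)                   # same value, as reported by summary(fit)
# both should be close to 0.25, the standard deviation of eps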


2
This should be the accepted answer. The RSE is just an estimate of the standard deviation of ϵ, i.e. of the residuals. It's also known as the residual standard deviation (RSD), and it can be defined as RSE = √(RSS / (n − 2)) (e.g. see ISL page 66).
Amelio Vazquez-Reina

1
For anyone reading the epub of ISL, you can locate "page 66" with ctrl-f "residual standard error." (Epub files do not have true page numbers).
user2426679
Licensed under cc by-sa 3.0 with attribution required.