Standard errors for multiple regression coefficients?

18

I realize that this is a very basic question, but I can't find an answer anywhere.

I'm computing regression coefficients using either the normal equations or QR decomposition. How can I compute standard errors for each coefficient? I usually think of standard errors as being computed as:

$$SE_{\bar{x}} = \frac{\sigma_{\bar{x}}}{\sqrt{n}}$$

What is $\sigma_{\bar{x}}$ for each coefficient? What is the most efficient way to compute this in the context of OLS?

Belmont

Answers:

19

When doing least squares estimation (assuming a normal random component), the regression parameter estimates are normally distributed with mean equal to the true regression parameter and covariance matrix $\Sigma = s^2 (X^TX)^{-1}$, where $s^2$ is the residual variance and $X$ is the design matrix. $X^T$ is the transpose of $X$, and $X$ is defined by the model equation $Y = X\beta + \epsilon$, with $\beta$ the regression parameters and $\epsilon$ the error term. The estimated standard deviation of a beta parameter is obtained by taking the corresponding diagonal term of $(X^TX)^{-1}$, multiplying it by the sample estimate of the residual variance, and then taking the square root. This is not a very simple calculation, but any software package will compute it for you and provide it in the output.
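Since the question mentions QR decomposition, here is a minimal sketch of the calculation described above, assuming NumPy is available. The function name `ols_standard_errors` and the small data set are made up for illustration. It relies on the identity $X^TX = R^TR$, so $(X^TX)^{-1} = R^{-1}R^{-T}$ and the cross-product matrix never has to be formed explicitly.

```python
import numpy as np

def ols_standard_errors(X, y):
    """Return (beta_hat, standard_errors, s2) for the model y = X @ beta + error.

    X is the n-by-p design matrix (include a column of ones for an intercept).
    The name and return layout are illustrative, not from any library.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Reduced QR: X = Q R, with Q (n x p) orthonormal and R (p x p) upper triangular.
    Q, R = np.linalg.qr(X)
    beta_hat = np.linalg.solve(R, Q.T @ y)

    # Residual variance estimate with n - p degrees of freedom.
    residuals = y - X @ beta_hat
    s2 = residuals @ residuals / (n - p)

    # Because X^T X = R^T R, we have (X^T X)^{-1} = R^{-1} R^{-T}.
    R_inv = np.linalg.inv(R)
    cov = s2 * (R_inv @ R_inv.T)

    # Standard errors are the square roots of the diagonal of the covariance matrix.
    return beta_hat, np.sqrt(np.diag(cov)), s2

# Hypothetical usage with made-up data: an intercept plus one predictor.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
X = np.column_stack([np.ones_like(x), x])
beta_hat, se, s2 = ols_standard_errors(X, y)
print("coefficients:", beta_hat)
print("standard errors:", se)
```

For a badly conditioned design matrix, working from $R$ is usually preferable to inverting $X^TX$ directly, because forming $X^TX$ squares the condition number.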

Example

On page 134 of Draper and Smith (referenced in my comment), they provide the following data for fitting by least squares a model $Y = \beta_0 + \beta_1 X + \varepsilon$ where $\varepsilon \sim N(0, I\sigma^2)$.

                      X                      Y                    XY
                      0                     -2                     0
                      2                      0                     0
                      2                      2                     4
                      5                      1                     5
                      5                      3                    15
                      9                      1                     9
                      9                      0                     0
                      9                      0                     0
                      9                      1                     9
                     10                     -1                   -10
                    ---                     --                   ---
Sum                  60                      5                    32
Sum of  Squares     482                     21                   528

Looks like an example where the slope should be close to 0.

$$X^T = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 2 & 2 & 5 & 5 & 9 & 9 & 9 & 9 & 10 \end{pmatrix}.$$

So

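$$X^TX = \begin{pmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{pmatrix} = \begin{pmatrix} 10 & 60 \\ 60 & 482 \end{pmatrix}$$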

and

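$$(X^TX)^{-1} = \begin{pmatrix} \dfrac{\sum X_i^2}{n\sum (X_i - \bar{X})^2} & \dfrac{-\bar{X}}{\sum (X_i - \bar{X})^2} \\ \dfrac{-\bar{X}}{\sum (X_i - \bar{X})^2} & \dfrac{1}{\sum (X_i - \bar{X})^2} \end{pmatrix} = \begin{pmatrix} \dfrac{482}{10(122)} & -\dfrac{6}{122} \\ -\dfrac{6}{122} & \dfrac{1}{122} \end{pmatrix} = \begin{pmatrix} 0.395 & -0.049 \\ -0.049 & 0.008 \end{pmatrix}$$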

where $\bar{X} = \sum X_i / n = 60/10 = 6$.
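As a check on the inverse, the usual $2 \times 2$ inverse formula gives the same entries. This is only a sketch of the arithmetic, using the sums from the table:

$$\det(X^TX) = n\sum X_i^2 - \Big(\sum X_i\Big)^2 = n\sum (X_i - \bar{X})^2 = 10(482) - 60^2 = 1220,$$

$$(X^TX)^{-1} = \frac{1}{1220}\begin{pmatrix} 482 & -60 \\ -60 & 10 \end{pmatrix},$$

so the diagonal entries are $482/1220 \approx 0.395$ and $10/1220 = 1/122 \approx 0.008$, and the off-diagonal entries are $-60/1220 = -6/122 \approx -0.049$.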

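Estimate for $\beta$: $$\hat{\beta} = (X^TX)^{-1}X^TY = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix} = \begin{pmatrix} \bar{Y} - b_1\bar{X} \\ S_{xy}/S_{xx} \end{pmatrix}$$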

$b_1 = S_{xy}/S_{xx} = 2/122 = 1/61 = 0.0164$ and $b_0 = 0.5 - 0.0164(6) = 0.402$

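From $(X^TX)^{-1}$ above, $S_{b_1} = S_e\sqrt{0.008}$ and $S_{b_0} = S_e\sqrt{0.395}$, where $S_e$ is the estimated standard deviation of the error term. Here $S_e = \sqrt{2.3085}$. Using the unrounded entries $1/122$ and $482/1220$, this gives $S_{b_1} \approx 0.138$ and $S_{b_0} \approx 0.955$.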
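The arithmetic in this example can be reproduced with a short script. This is a sketch using only the Python standard library and the closed-form expressions for simple linear regression. The function name `simple_ols_with_se` and the dictionary keys are made up for illustration.

```python
import math

# Data from the table above (the Draper and Smith example).
x = [0, 2, 2, 5, 5, 9, 9, 9, 9, 10]
y = [-2, 0, 2, 1, 3, 1, 0, 0, 1, -1]

def simple_ols_with_se(x, y):
    """Fit y = b0 + b1*x by least squares; return estimates and standard errors."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar

    # Residual variance estimate with n - 2 degrees of freedom.
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)

    # Diagonal entries of (X^T X)^{-1} for the design matrix with columns [1, x].
    c00 = sum(xi ** 2 for xi in x) / (n * sxx)
    c11 = 1.0 / sxx

    return {
        "b0": b0,
        "b1": b1,
        "s2": s2,
        "se_b0": math.sqrt(s2 * c00),
        "se_b1": math.sqrt(s2 * c11),
    }

result = simple_ols_with_se(x, y)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

The printed values should agree with the hand calculation up to rounding. The slope is well within one standard error of zero, which matches the earlier remark that the slope should be close to 0.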

Sorry that the equations didn't carry subscripting and superscripting when I cut and pasted them. The table didn't reproduce well either because the spaces got ignored. The first string of three numbers corresponds to the first values of X, Y and XY, and the same for the following strings of three. After Sum come the sums for X, Y and XY respectively, and then the sums of squares for X, Y and XY respectively. The 2x2 matrices got messed up too. The values after the brackets should be in brackets underneath the numbers to the left.

Michael R. Chernick
2
Not meant as a plug for my book, but I go through the computations of the least squares solution in simple linear regression (Y = aX + b) and calculate the standard errors for a and b on pp. 101-103 of The Essentials of Biostatistics for Physicians, Nurses, and Clinicians, Wiley, 2011. A more detailed description can be found in Draper and Smith, Applied Regression Analysis, 3rd Edition, Wiley, New York, 1998, pp. 126-127. In my answer that follows I will give an example from Draper and Smith.
Michael R. Chernick
8
When I started interacting with this site, Michael, I had similar feelings. With experience, they have changed. It's worthwhile knowing some TeX and once you do, it's (almost) as fast to type it in as it is to type in anything in English. I also learned, by studying exemplary posts (such as many replies by @chl, cardinal, and other high-reputation-per-post users), that providing references, clear illustrations, and well-thought-out equations is usually highly appreciated and well received. High quality is one thing distinguishing this site from most others.
whuber
2
That is all nice, Bill, and it is nice that so many people are dedicated to giving those high-quality posts. I may use LaTeX for other purposes, like publishing papers. But I don't have the time to go to all the effort that people expect of me on this site. I am not going to invest the time just to provide service on this site.
Michael R. Chernick
4
I think the disconnect is here: "This is just one of many things about this site that requires those posting to put in extra time and effort" - @whuber and I are both saying that it, in fact, does not take extra time if you know how to do it. We don't learn TeX so that we can post on this site - we (at least I) learn TeX because it's an important skill to have as a statistician and happens to make posts much more readable on this site.
Macro
3
Like many of the people on here, yes, I work as a statistician, but I also happen to find it fun - this site is recreational for me and it's a nice bonus that others find some of my posts useful. If you find marking up your equations with TeX to be work and don't think it's worth learning then so be it, but know that some of your content will be overlooked.
Macro