Frank Wood, fwood@stat.columbia.edu — Linear Regression Models, Lecture 11, Slide 22

Residuals

The residuals, like the fitted values \hat{Y}_i, can be expressed as linear combinations of the response values Y_i.

An important idempotent-matrix property is HH = H. In general, a matrix P is idempotent if P^2 = P, and the eigenvalues of an idempotent matrix are all either 0 or 1. Since the trace of a matrix equals the sum of its eigenvalues, for a symmetric and idempotent matrix A we have rank(A) = tr(A), the number of non-zero eigenvalues of A.

1 Hat Matrix

1.1 From Observed to Fitted Values

The OLS estimator was found to be given by the (p × 1) vector

b = (X^T X)^{-1} X^T y.

The predicted values \hat{y} can then be written as

\hat{y} = X b = X (X^T X)^{-1} X^T y =: H y,

where H := X (X^T X)^{-1} X^T is an n × n matrix, which "puts the hat on" y. We call H the "hat matrix" because it turns Y's into \hat{Y}'s. The corresponding residual operator is M = I − H, so the vector of residuals is e = M y = (I − H) y.

The element in the ith row and jth column of H equals the covariance between the jth response value and the ith fitted value, divided by the variance of the former: h_{ij} = Cov(\hat{y}_i, y_j) / Var(y_j). Consequently, the covariance matrix of the residuals is Cov(e) = σ^2 (I − H).

Leverage can be read off a scatter plot: if three data points — the smallest x value, an x value near the mean, and the largest x value — are labeled with their corresponding leverages, the points at the extremes of the x range carry the largest leverages. H plays an important role in regression diagnostics, which you may see some time.

Estimated Covariance Matrix of b

The estimator b is a linear combination of the elements of Y.
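The definitions above can be illustrated numerically. This is a minimal NumPy sketch (the toy data and variable names are my own, not from the lecture):

```python
import numpy as np

# Toy data: n = 5 observations from a simple linear model with intercept (p = 2).
rng = np.random.default_rng(0)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=5)

# Design matrix X: a column of ones (intercept) plus the predictor.
X = np.column_stack([np.ones_like(x), x])

# Hat matrix H = X (X^T X)^{-1} X^T  "puts the hat on" y.
H = X @ np.linalg.inv(X.T @ X) @ X.T

y_hat = H @ y                      # fitted values  \hat{y} = H y
e = (np.eye(len(y)) - H) @ y       # residuals via the residual operator M = I - H

# H is symmetric and idempotent (HH = H), and fitted values plus residuals
# recover the observations exactly.
assert np.allclose(H, H.T)
assert np.allclose(H @ H, H)
assert np.allclose(y_hat + e, y)
```

Note that `np.linalg.inv` is used here only for transparency; in practice a solver such as `np.linalg.lstsq` is numerically preferable.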
When the weights for each observation are identical and the errors are uncorrelated, the estimated parameters are

b = (X^T X)^{-1} X^T y,

and therefore the projection matrix (hat matrix) is

H = X (X^T X)^{-1} X^T.

The above may be generalized to the cases where the weights are not identical and/or the errors are correlated: if Σ is the covariance matrix of the error vector (and, by extension, of the response vector as well), the estimator and hat matrix become

b = (X^T Σ^{-1} X)^{-1} X^T Σ^{-1} y,    H = X (X^T Σ^{-1} X)^{-1} X^T Σ^{-1}.

When Σ = σ^2 I, this reduces to the ordinary expressions above.

Properties of the leverages h_ii, the diagonal elements of H:

1. 0 ≤ h_ii ≤ 1 (can you show this?)
2. Σ_{i=1}^n h_ii = p, so the mean leverage is \bar{h} = (Σ_{i=1}^n h_ii)/n = p/n (show it).

The leverages describe the influence each response value has on the fitted value for that same observation; points farther away at the extremes of the x range carry higher leverage. For linear models, the trace of the projection matrix equals the rank of X, which is the number of coefficients p in the model.

In matrix form, \hat{Y} = X b = X (X'X)^{-1} X' Y = H Y, where H = X(X'X)^{-1}X'. The hat matrix is symmetric, idempotent, and positive semi-definite. Geometrically, the closest point to the vector y in the column space of X is \hat{y} = H y; the residual y − \hat{y} is orthogonal to that column space. The present article derives and discusses the hat matrix and gives an example to illustrate its usefulness; Section 2 defines the hat matrix and derives its basic properties.

Matrix operations on block matrices can be carried out by treating the blocks as matrix entries, and one can use such a partition of the design matrix to compute the hat matrix blockwise.

(A side remark from the variance calculations: for the scalar linear combinations q^T \hat{β} and k^T y, the definition of variance gives Var(q^T \hat{β} − k^T y) = Var(q^T \hat{β}) + Var(k^T y) − 2 Cov(q^T \hat{β}, k^T y); because these are scalars, constants factor out of the covariance.)
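The two leverage properties can be checked numerically. This is a sketch under assumed inputs (the design with n = 8, p = 3 and an intercept column is an arbitrary illustration, not from the text):

```python
import numpy as np

# Hypothetical toy design: n = 8 observations, p = 3 coefficients,
# with a first column of ones for the intercept.
rng = np.random.default_rng(1)
n, p = 8, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])

H = X @ np.linalg.inv(X.T @ X) @ X.T
h = np.diag(H)                      # leverages h_ii

assert np.all((h >= 0) & (h <= 1))  # property 1: 0 <= h_ii <= 1
assert np.isclose(h.sum(), p)       # property 2: sum of leverages = tr(H) = p
assert np.all(h >= 1 / n - 1e-12)   # with a constant term, h_ii >= 1/n
```

The mean leverage is therefore p/n here (3/8), which is the usual baseline against which individual leverages are judged.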
Properties of the Hat Matrix

A related matrix is the hat matrix H, which makes \hat{y}, the predicted y, out of y: \hat{y} = H y. The matrix X is called the design matrix; it is n × p, where p is the number of coefficients in the regression model and n is the number of observations. Our model will usually contain a constant term, so one of the columns of X contains only ones; we can take this to be the first column, and it is treated exactly the same as any other column in the X matrix. The least-squares estimate β̂ = (X^T X)^{-1} X^T y is obtained by minimizing the objective function (y − Xβ)^T (y − Xβ), written in matrix form.

Define the residual operator M = I − H. Like H, the matrix M is symmetric (M' = M) and idempotent (M² = M).

Key facts, with proofs:

1. Idempotence: HH = X(X'X)^{-1}X' X(X'X)^{-1}X' = X(X'X)^{-1}X' = H.
2. Symmetry: X'X is symmetric, and therefore so is (X'X)^{-1}; hence H' = (X(X'X)^{-1}X')' = X(X'X)^{-1}X' = H. (In general, Z'Z is symmetric for any matrix Z, and so is (Z'Z)^{-1} when it exists.)
3. Eigenvalues: if A is idempotent and Av = λv with v ≠ 0, then λv = Av = A²v = λ²v, so λ² = λ and hence λ ∈ {0, 1}. Since the determinant of a matrix equals the product of its eigenvalues, it follows that if A is idempotent, then det(A) is equal to either 0 or 1.
4. Rank equals trace: for a symmetric idempotent matrix, rank(A) = tr(A); in particular tr(H) = p and tr(M) = n − p.
5. Projection: H is the perpendicular projection onto the linear space spanned by the columns of X; the subspace inclusion criterion follows essentially from the definition of the range of the projection.
6. Covariance of the estimator: Cov(β̂) = σ²(X'X)^{-1}.

Leverage facts: each diagonal element satisfies 0 ≤ h_ii ≤ 1, and the minimum value of h_ii is 1/n for a model with a constant term. An observation with large leverage can have a large effect on the results of a regression.

In some derivations, we may need different P matrices that depend on different sets of variables; the projection matrix can then be decomposed accordingly.[9]

Many models and techniques are subject to this formulation \hat{y} = L y for some smoother matrix L: a few examples are linear least squares, smoothing splines, regression splines, local regression, kernel regression, and linear filtering. Although such a smoother matrix is in general not a projection matrix, it shares many of the same geometric properties as its parametric counterpart.

Practice problems (solutions provided below):

(1) Let A be an n × n matrix. Show that if A is idempotent, then det(A) is equal to either 0 or 1.
(2) Let A ∈ R^{m×n} and b ∈ R^m, and suppose that A A⁺ b = b, where A⁺ denotes the pseudoinverse of A. Show that the system Ax = b is then consistent (for example, x = A⁺b is a solution).
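The spectral and trace facts proved above can be verified numerically. This is a sketch with a hypothetical toy design (n = 6 and p = 2 are arbitrary choices of mine):

```python
import numpy as np

# Hypothetical design: n = 6 observations, p = 2 coefficients (intercept + slope).
rng = np.random.default_rng(2)
n, p = 6, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])

H = X @ np.linalg.inv(X.T @ X) @ X.T
M = np.eye(n) - H                  # residual operator M = I - H

# Eigenvalues of the idempotent matrix H are all 0 or 1.
eig = np.linalg.eigvalsh(H)
assert np.all(np.isclose(eig, 0.0) | np.isclose(eig, 1.0))

# rank(H) = tr(H) = p; since p < n here, H is singular and det(H) = 0.
assert np.isclose(np.trace(H), p)
assert np.linalg.matrix_rank(H) == p
assert np.isclose(np.linalg.det(H), 0.0)

# M is symmetric and idempotent, with tr(M) = n - p.
assert np.allclose(M, M.T) and np.allclose(M @ M, M)
assert np.isclose(np.trace(M), n - p)
```

The check that det(H) = 0 is exactly practice problem (1) in action: with p < n, at least one eigenvalue of H is 0, so the product of eigenvalues vanishes.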