Linear Models

pdf version here

Homoskedastic Linear Model

Gauss Markov Assumptions

  • Linearity : $Y = X\beta + \epsilon$
  • Strict Exogeneity : $E(\epsilon_i|X) = 0$
    • Unconditional mean of error: $E(\epsilon_i) = 0$
    • Cross moment of errors and regressors is zero, i.e. X is orthogonal to $\epsilon$ : $E(X_i \epsilon_i) = 0$
  • No multicollinearity - $rank(X) = k$ (full column rank)
  • Spherical error variance : $E(\epsilon_i^2|X) = \sigma^2$ ; $E(\epsilon \epsilon'|X) = \sigma^2 I_n$
  • Normality : $\epsilon|X \sim N(0,\sigma^2 I_n)$
  • $\{(Y_i,X_i): i = 1, \ldots, n\}$ are i.i.d.

This gives us the OLS estimator

$\hat{\beta} = (X'X)^{-1}X'Y$, with $V(\hat{\beta}|X) = \sigma^2(X'X)^{-1}$,

where, under homoskedasticity, $\hat{\sigma}^2 = \frac{e'e}{n-k}$, with residuals $e = Y - X\hat{\beta}$
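
A minimal numpy sketch of these formulas (the function name and the convention that `X` already contains an intercept column are my own assumptions):

```python
import numpy as np

def ols(X, y):
    """OLS coefficients with classical (homoskedastic) standard errors.

    Sketch: X is an (n, k) design matrix including a column of ones,
    y is an (n,) outcome vector.
    """
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y           # (X'X)^{-1} X'Y
    e = y - X @ beta_hat                   # residuals e = Y - X beta_hat
    sigma2_hat = (e @ e) / (n - k)         # e'e / (n - k)
    V_beta = sigma2_hat * XtX_inv          # sigma^2 (X'X)^{-1}
    return beta_hat, np.sqrt(np.diag(V_beta)), sigma2_hat
```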

MLE
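
A standard sketch of the Gaussian MLE for this model, assuming $\epsilon|X \sim N(0,\sigma^2 I_n)$ as above: the log-likelihood is $\ell(\beta,\sigma^2) = -\frac{n}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}(Y-X\beta)'(Y-X\beta)$, which is maximised at $\hat{\beta}_{MLE} = (X'X)^{-1}X'Y = \hat{\beta}_{OLS}$ and $\hat{\sigma}^2_{MLE} = \frac{e'e}{n}$ (biased in finite samples, unlike the $\frac{e'e}{n-k}$ estimator above).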

Generalised least squares

If the covariance matrix of the errors is known, $E(\epsilon \epsilon'|X) = \Omega$, the GLS estimator is $\hat{\beta}_{GLS} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}Y$.

Restricted OLS - to impose linear restrictions $Rb = r$, optimise the Lagrangian: $L(b,\lambda) = (Y-Xb)'(Y-Xb)+2\lambda'(Rb-r)$
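
A numpy sketch of the GLS estimator, assuming the error covariance `Omega` is known and supplied by the caller (names are my own):

```python
import numpy as np

def gls(X, y, Omega):
    """Generalised least squares when E[ee'|X] = Omega is known (sketch)."""
    Omega_inv = np.linalg.inv(Omega)
    A = X.T @ Omega_inv @ X
    beta_gls = np.linalg.solve(A, X.T @ Omega_inv @ y)   # (X'Ω^{-1}X)^{-1} X'Ω^{-1} Y
    V_gls = np.linalg.inv(A)                             # variance (X'Ω^{-1}X)^{-1}
    return beta_gls, V_gls
```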

Huber-White Sandwich ‘Robust’ SEs

Under homoskedasticity, the variance simplifies to $V(\hat{\beta}) = \sigma^2(X'X)^{-1}$ because of the assumption $E(\epsilon \epsilon') = \sigma^2 I$. If this does not hold (i.e. heteroskedasticity is present), the variance-covariance formula is

$V(\hat{\beta}) = Q^{-1}\Omega Q^{-1}$, where $Q=\mathbb{E}[X_i X_i']$ and $\Omega = \mathbb{E}[u_i^2 X_i X_i']$, estimated in practice with the OLS residuals $\hat{u}_i$.
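
A numpy sketch of the sandwich formula (this is the HC0 variant; the function name is my own):

```python
import numpy as np

def robust_se(X, y):
    """Huber-White (HC0) sandwich standard errors for OLS (sketch)."""
    n, k = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    u_hat = y - X @ beta_hat
    Q_inv = np.linalg.inv(X.T @ X / n)               # inverse of sample analogue of E[X_i X_i']
    Omega_hat = (X * u_hat[:, None] ** 2).T @ X / n  # (1/n) sum u_i^2 X_i X_i'
    V = Q_inv @ Omega_hat @ Q_inv / n                # sandwich, scaled for beta_hat
    return np.sqrt(np.diag(V))
```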

Fitted values and residuals

Define two matrices that are symmetric, idempotent and positive semi-definite:

  • $P_x = X(X'X)^{-1}X'$ - Hat Matrix - projection onto $span(X)$
  • $M_x = I_n - P_x = I_n - X(X'X)^{-1}X'$ - Annihilator Matrix - projection onto $span^{\bot}(X)$

Fitted Value: $\hat{Y} = P_x Y$ ; Residual: $e = M_x Y$
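
A quick numerical check of these properties on toy data (the data here are made up purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = rng.normal(size=n)

P = X @ np.linalg.inv(X.T @ X) @ X.T       # hat matrix P_x
M = np.eye(n) - P                          # annihilator M_x

Y_hat = P @ Y                              # fitted values
e = M @ Y                                  # residuals

assert np.allclose(P @ P, P)               # idempotent
assert np.allclose(M @ M, M)
assert np.allclose(P, P.T)                 # symmetric
assert np.allclose(X.T @ e, 0)             # residuals orthogonal to the columns of X
assert np.allclose(Y, Y_hat + e)           # Y splits into span(X) and its complement
```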

Model Fit : $R^2 , F$

R-squared

ESS = Explained Sum of Squares $= \sum_i (\hat{Y}_i - \bar{Y})^2$

TSS = Total Sum of Squares $= \sum_i (Y_i - \bar{Y})^2$

RSS = Residual Sum of Squares $= \sum_i e_i^2 = e'e$

With an intercept in the model, $TSS = ESS + RSS$, so $R^2 = \frac{ESS}{TSS} = 1 - \frac{RSS}{TSS}$.

Adjusted $R^2$

$\bar{R}^2 = 1 - \frac{RSS/(n-k)}{TSS/(n-1)}$ - penalises extra regressors, unlike $R^2$, which never decreases when a regressor is added.

Mean Squared Error (MSE) = $\mathbb{E}[(Y_i-X_i'\hat{\beta})^2]$
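
A sketch computing these fit measures from an OLS fit (assumes `X` includes an intercept column so that $R^2 = 1 - RSS/TSS$ applies; names are my own):

```python
import numpy as np

def fit_stats(X, y):
    """R^2, adjusted R^2 and in-sample MSE for an OLS fit (sketch)."""
    n, k = X.shape
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ beta_hat
    rss = e @ e                                    # residual sum of squares
    tss = np.sum((y - y.mean()) ** 2)              # total sum of squares
    r2 = 1 - rss / tss
    r2_adj = 1 - (rss / (n - k)) / (tss / (n - 1))
    mse = rss / n                                  # in-sample mean squared error
    return r2, r2_adj, mse
```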

F statistic

For a test of $q$ linear restrictions, $F = \frac{(RSS_r - RSS_u)/q}{RSS_u/(n-k)} \sim F_{q,\,n-k}$ under $H_0$, where $RSS_r$ and $RSS_u$ are the restricted and unrestricted residual sums of squares.

Wald Statistic: reject $H_0$ if $W > \chi^2_{q,1-\alpha}$ ; asymptotically $F = W/q$.

Bonferroni correction - multiple hypothesis correction with $J$ hypotheses: test each at level $\tau = \alpha/J$. Holm-Bonferroni - step-down version: compare the $j$-th smallest p-value to $\alpha/(J-j+1)$ at each step, stopping at the first non-rejection.
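
A small sketch of both corrections (the function names and interface are my own):

```python
import numpy as np

def bonferroni(pvals, alpha=0.05):
    """Reject hypothesis j if p_j <= alpha / J."""
    p = np.asarray(pvals)
    return p <= alpha / len(p)

def holm_bonferroni(pvals, alpha=0.05):
    """Step-down Holm-Bonferroni: thresholds alpha/J, alpha/(J-1), ..."""
    p = np.asarray(pvals)
    J = len(p)
    order = np.argsort(p)                      # smallest p-value first
    reject = np.zeros(J, dtype=bool)
    for step, idx in enumerate(order):
        if p[idx] <= alpha / (J - step):
            reject[idx] = True
        else:
            break                              # stop at the first non-rejection
    return reject
```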

Instrumental Variables

Exogeneity is violated when $E(X_i \epsilon_i) \neq 0$; OLS estimates are then no longer consistent.

IV requirements:
  • $Cov(Z,X) \neq 0$ - Relevance
  • $Cov(Z,\epsilon) = 0 ; Z \bot \epsilon$ - Exogeneity / Exclusion restriction
  • Affects Y only through X
  • $dim(Z_i) \geq dim(X_i)$

Terminology
  • First Stage : Regress X on Z
  • Reduced form : Regress Y on Z

If $dim(Z_i) > dim(X_i)$ (more instruments than endogenous regressors), the model is over-identified; the 2SLS estimator first projects $X$ onto the instruments: $\hat{\beta}_{2SLS} = (X'P_Z X)^{-1}X'P_Z Y$, where $P_Z = Z(Z'Z)^{-1}Z'$.
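
A numpy sketch of 2SLS under these definitions (function name is my own; no standard errors are computed here):

```python
import numpy as np

def tsls(X, Z, y):
    """Two-stage least squares: X is (n, k) regressors, Z is (n, l) instruments, l >= k."""
    Pz = Z @ np.linalg.solve(Z.T @ Z, Z.T)                    # projection onto span(Z)
    beta_2sls = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)   # (X'P_Z X)^{-1} X'P_Z Y
    return beta_2sls
```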

GMM

If $dim(Z_i) > dim(X_i)$, the moment conditions $E[Z_i(Y_i - X_i'\beta)] = 0$ over-identify $\beta$; GMM minimises a quadratic form in the sample moments, $\hat{\beta}_{GMM} = \arg\min_b \, \bar{g}(b)'W\bar{g}(b)$, where $\bar{g}(b) = \frac{1}{n}\sum_i Z_i(Y_i - X_i'b)$ and $W$ is a weight matrix.

Efficient GMM (taking $W = \Omega^{-1}$) : $\mathbb{V}(\hat{\beta}_{gmm}) = (Q'\Omega^{-1}Q)^{-1}$, where $Q = \mathbb{E}[Z_i X_i']$ and $\Omega = \mathbb{E}[u_i^2 Z_i Z_i']$
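
A sketch of two-step efficient GMM for this linear IV setting (step one is 2SLS, step two re-weights with the robust $\hat{\Omega}$; names are my own):

```python
import numpy as np

def gmm_linear_iv(X, Z, y):
    """Two-step efficient GMM for the linear IV model (sketch)."""
    n = len(y)
    # Step 1: W = (Z'Z)^{-1}, i.e. 2SLS, to get preliminary residuals.
    A1 = X.T @ Z @ np.linalg.inv(Z.T @ Z) @ Z.T
    b1 = np.linalg.solve(A1 @ X, A1 @ y)
    u = y - X @ b1
    # Step 2: re-weight with Omega_hat = (1/n) sum u_i^2 Z_i Z_i'.
    Omega_hat = (Z * u[:, None] ** 2).T @ Z / n
    W = np.linalg.inv(Omega_hat)
    A2 = X.T @ Z @ W @ Z.T
    beta_gmm = np.linalg.solve(A2 @ X, A2 @ y)
    # Efficient GMM variance (Q'Ω^{-1}Q)^{-1} / n with sample Q = (1/n) Z'X.
    Q_hat = Z.T @ X / n
    V_gmm = np.linalg.inv(Q_hat.T @ W @ Q_hat) / n
    return beta_gmm, np.sqrt(np.diag(V_gmm))
```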

Sargan’s Over-ID Test

$H_0 : E[Z_i(Y_i - X_i'\beta)] = 0$ (all $l$ moment conditions are valid)

Reject $H_0$ if $S > \chi^2_{l-k,\,1-\alpha}$, where $l = dim(Z_i)$, $k = dim(X_i)$, so $l-k$ is the number of over-identifying restrictions
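
A sketch of the test statistic in its heteroskedasticity-robust (Hansen J) form, which matches the efficient-GMM weighting above; under homoskedasticity Sargan's original statistic replaces $\hat{\Omega}$ with $\hat{\sigma}^2 Z'Z/n$. Names are my own:

```python
import numpy as np
from scipy.stats import chi2

def overid_test(X, Z, y, beta_hat, alpha=0.05):
    """Sargan/Hansen over-identification test at the GMM estimate (sketch)."""
    n, l = Z.shape
    k = X.shape[1]
    u = y - X @ beta_hat                          # residuals at the estimate
    g_bar = Z.T @ u / n                           # sample moments (1/n) sum Z_i u_i
    Omega_hat = (Z * u[:, None] ** 2).T @ Z / n   # robust estimate of E[u_i^2 Z_i Z_i']
    S = n * g_bar @ np.linalg.solve(Omega_hat, g_bar)
    crit = chi2.ppf(1 - alpha, df=l - k)          # chi^2_{l-k, 1-alpha}
    return S, crit, S > crit
```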