*log using OLS_examples.log

*** Open file (adjust your path)
use reg.dta, clear

*** Open window to visualise data structure:
browse

*** Example of saving data in an "old" format, to be read by previous releases of Stata
saveold reg.dta, version(11) replace

*** Initial description of dataset:
des

*** Basic summary stats:
sum
sum, d
pwcorr

*** Generate command:
gen ln_wage=log(real_wage)

**** Practicing with OLS ***************
****************************************

* Basic estimates
reg ln_wage grade
reg ln_wage grade, noconst

* Robust std. errors:
reg ln_wage grade, robust
reg ln_wage grade, vce(robust)
reg ln_wage grade, vce(hc2)
reg ln_wage grade, cluster(idcode)
reg ln_wage grade, cluster(year)
egen pippo=group(state black)
reg ln_wage grade, cluster(pippo)
drop pippo
reg ln_wage grade, vce(bootstrap, reps(50))

* Predictions
reg ln_wage grade tenure black, cluster(idcode)
predict fitted_res, resid
predict fitted_Y, xb
drop fitted_res fitted_Y

* Adding dummies (keeping them or not as extra vars in the dataset)
* i.year creates the dummies on the fly; xi: adds them to the dataset as _I* variables
reg ln_wage grade i.year
xi: reg ln_wage i.grade
drop _Igrade_*

************* Omitted variable bias ************************************

* Reference model with a lot of variables potentially omitted
reg ln_wage grade, cluster(idcode)

* We expect positive bias due to omitting black: let's see this
reg ln_wage grade black, cluster(idcode)

* We expect perhaps a positive bias due to omitting tenure: let's verify this
reg ln_wage grade tenure, cluster(idcode)

* Looking at the correlations in the data helps to guess what to expect:
pwcorr ln_wage grade black tenure

******** IV regression via 2SLS *******************************

**** Example with data on housing values: study effect of house value on rent price

* Data are available online:
webuse hsng2, clear
* Or use the local copy provided by me:
use hsng2.dta, clear
saveold hsng2.dta, version(11) replace
des

*** Compare OLS and IV (with IV given by family income)
reg rent hsngval
help ivregress
ivregress 2sls rent (hsngval = faminc)

*** Check that ivregress estimates correspond to 2 separate OLS regressions
*** (but notice that the SEs are different: the manual second stage ignores that X_hat is estimated):
* 2SLS by hand:
reg hsngval faminc
predict X_hat, xb
reg rent X_hat
* IVREG:
ivregress 2sls rent (hsngval = faminc)

*** Hausman test for endogeneity via command "hausman"
reg rent hsngval
est store OLS
ivregress 2sls rent (hsngval = faminc)
est store IV
* Note: the first estimator you pass to the hausman command must be the one that is always consistent (IV);
* the second must be the one that is efficient under the null but inconsistent under the alternative (OLS in this case)
hausman IV OLS

* Problem: the hausman command does not work if the estimates were obtained with robust S.E.
* See the error (capture noisily shows it without stopping the do-file):
reg rent hsngval, rob
est store OLS
ivregress 2sls rent (hsngval = faminc), rob
est store IV
capture noisily hausman IV OLS
* See "help hausman": the "suest" command provides an alternative, but it does not always work either.
* Another alternative is the test via auxiliary regression.

*** Hausman test via auxiliary regressions
* Note: the test is still on the null "X is not endogenous"
* The t-test on the coefficient of fittedX (or, equivalently, of fittedV) is the test
reg hsngval faminc, robust
predict fittedX, xb
predict fittedV, res
reg rent hsngval fittedX, robust
reg rent hsngval fittedV, robust

****** Other useful features of ivregress:

** use with more IVs:
ivregress 2sls rent pcturban (hsngval = faminc i.region)

** report first-stage results: the significance of the IVs and the F-stat provide initial evidence that the IVs are ok
ivregress 2sls rent pcturban (hsngval = faminc i.region), first

*** ivregress postestimation provides a battery of diagnostics
*** see help "ivregress postestimation"
ivregress 2sls rent pcturban (hsngval = faminc i.region)
* First-stage useful statistics:
estat firststage
* Endogeneity tests
estat endogenous
* Test of overidentifying restrictions
estat overid

*** NB: a number of additional programs/commands are available via the web, especially on weak IVs
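
*** Sketch (not part of the original examples): robust endogeneity test and a user-written weak-IV command
* The lines below are a minimal illustration under stated assumptions, not a definitive recipe.
* Assumption 1: hsng2 is still in memory.
* Assumption 2: you have internet access and accept installing the user-written
*               packages ivreg2 and ranktest from SSC (they are not part of official Stata).

* With robust S.E., "estat endogenous" after ivregress gives a robust endogeneity test,
* which sidesteps the failure of the hausman command shown above:
ivregress 2sls rent pcturban (hsngval = faminc i.region), vce(robust)
estat endogenous

* Install ivreg2 (and ranktest, which it needs) only if they are missing:
capture which ranktest
if _rc ssc install ranktest
capture which ivreg2
if _rc ssc install ivreg2

* ivreg2 reports weak-identification statistics next to the 2SLS estimates;
* "first" also shows the first-stage regression:
ivreg2 rent pcturban (hsngval = faminc), robust first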