Stata Practice
Published:
I used the command webdoc init "markdown-name.md", logall to make this markdown page from a Stata log file…
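A minimal sketch of how such a page might be produced, assuming the user-written webdoc package is installed. The do-file name practice.do is hypothetical and does not come from the original post; only the webdoc init call is taken from the text above.

```stata
* practice.do : a hypothetical do-file name, used here for illustration only
webdoc init "markdown-name.md", logall   // the call quoted in the text

/***
Stata Practice
***/

sysuse census
describe

* To build the page, run this from the Stata command line
* (assumes webdoc is installed, for example via: ssc install webdoc):
*     webdoc do practice.do
```

With logall, the output of every command in the do-file is meant to be copied into the document, which is why the rest of this page consists mostly of log output.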
Use given datasets
sysuse dataname loads one of the example datasets shipped with Stata.
This page uses the 1980 Census data by state (census.dta).
    . sysuse dir // check the given datasets
      auto.dta        bpwide.dta      citytemp4.dta   network1.dta    pop2000.dta     tsline1.dta     voter.dta
      auto2.dta       cancer.dta      educ99gdp.dta   network1a.dta   sandstone.dta   tsline2.dta     xtline1.dta
      autornd.dta     census.dta      gnp96.dta       nlsw88.dta      sp500.dta       uslifeexp.dta
      bplong.dta      citytemp.dta    lifeexp.dta     nlswide1.dta    surface.dta     uslifeexp2.dta

    . sysuse census
    (1980 Census data by state)

    . describe

    Contains data from C:\Program Files\Stata16\ado\base/c/census.dta
      obs:            50                          1980 Census data by state
     vars:            13                          6 Apr 2018 15:43
    ------------------------------------------------------------------------------
                  storage   display    value
    variable name   type    format     label      variable label
    ------------------------------------------------------------------------------
    state           str14   %-14s                 State
    state2          str2    %-2s                  Two-letter state abbreviation
    region          int     %-8.0g     cenreg     Census region
    pop             long    %12.0gc               Population
    poplt5          long    %12.0gc               Pop, < 5 year
    pop5_17         long    %12.0gc               Pop, 5 to 17 years
    pop18p          long    %12.0gc               Pop, 18 and older
    pop65p          long    %12.0gc               Pop, 65 and older
    popurban        long    %12.0gc               Urban population
    medage          float   %9.2f                 Median age
    death           long    %12.0gc               Number of deaths
    marriage        long    %12.0gc               Number of marriages
    divorce         long    %12.0gc               Number of divorces
    ------------------------------------------------------------------------------
    Sorted by:

    . codebook region // lookup the value label used for region variable

    ------------------------------------------------------------------------------
    region                                                           Census region
    ------------------------------------------------------------------------------

                      type:  numeric (int)
                     label:  cenreg

                     range:  [1,4]                        units:  1
             unique values:  4                        missing .:  0/50

                tabulation:  Freq.   Numeric  Label
                                 9         1  NE
                                12         2  N Cntrl
                                16         3  South
                                13         4  West
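Besides codebook, the value label can be inspected directly. The following is a minimal sketch using the label name cenreg reported by describe above; it is an alternative way to see the same mapping, not part of the original log.

```stata
sysuse census, clear

* Show the numeric-code-to-text mapping stored in the value label
label list cenreg

* Same frequencies, once with labels and once with the raw codes
tabulate region
tabulate region, nolabel
```

Comparing the two tabulate tables side by side should show the same mapping that codebook reports (1 = NE, 2 = N Cntrl, 3 = South, 4 = West).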
Descriptive statistics
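Summarize the data by subgroup (here, by Census region). bysort region: is shorthand for sort region followed by by region:, and tab region, sum(pop) reports the same regional means in a single table.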
    . bysort region: summarize pop // same as sort+by

    ------------------------------------------------------------------------------
    -> region = NE

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
             pop |          9     5459476     5925235     511456   1.76e+07

    ------------------------------------------------------------------------------
    -> region = N Cntrl

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
             pop |         12     4905473     3750094     652717   1.14e+07

    ------------------------------------------------------------------------------
    -> region = South

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
             pop |         16     4670877     3277853     594338   1.42e+07

    ------------------------------------------------------------------------------
    -> region = West

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
             pop |         13     3320961     6217177     401851   2.37e+07

    . sort region

    . by region: sum pop

    ------------------------------------------------------------------------------
    -> region = NE

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
             pop |          9     5459476     5925235     511456   1.76e+07

    ------------------------------------------------------------------------------
    -> region = N Cntrl

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
             pop |         12     4905473     3750094     652717   1.14e+07

    ------------------------------------------------------------------------------
    -> region = South

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
             pop |         16     4670877     3277853     594338   1.42e+07

    ------------------------------------------------------------------------------
    -> region = West

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
             pop |         13     3320961     6217177     401851   2.37e+07

    . tab region, sum(pop)

         Census |      Summary of Population
         region |        Mean   Std. Dev.       Freq.
    ------------+------------------------------------
             NE |   5459475.9   5925235.4           9
        N Cntrl |   4905472.5   3750093.7          12
          South |   4670876.8   3277853.1          16
           West |   3320960.8   6217176.6          13
    ------------+------------------------------------
          Total |   4518149.4   4715037.8          50

    . egen average_pop = mean(pop), by(region) // make average population variable

    . tab region, sum(average_pop) // sd = 0 for regional average population

         Census |     Summary of average_pop
         region |        Mean   Std. Dev.       Freq.
    ------------+------------------------------------
             NE |     5459476           0           9
        N Cntrl |   4905472.5           0          12
          South |     4670877           0          16
           West |   3320960.8           0          13
    ------------+------------------------------------
          Total |   4518149.5   766394.85          50
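The regional standard deviation of average_pop is 0 because egen assigns every state in a region the same value. A minimal sketch of that check follows, together with tabstat as another way to get the grouped summary; the variable name pop_dev is hypothetical and not from the original log.

```stata
sysuse census, clear

* One table with mean, SD, and count of pop by region
tabstat pop, by(region) statistics(mean sd n)

* Regional mean attached to every state, as in the log above
egen average_pop = mean(pop), by(region)

* Within each region the new variable is constant, hence SD = 0
bysort region: assert average_pop == average_pop[1]

* Hypothetical follow-up: each state's deviation from its regional mean
generate double pop_dev = pop - average_pop
summarize pop_dev
```

If the assert line runs without an error message, the variable really is constant within each region.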
Regression
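After a regression, predict y stores the fitted values and predict resid, residuals stores the residuals. The factor-variable prefix (i.) turns a categorical variable such as region into indicator variables; ib3. changes the base level, and ibn. together with noconstant keeps every level.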
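A minimal sketch of what the two predict calls give you, assuming the census data used on this page. The names yhat, ehat, and rebuilt are hypothetical. The check relies on the OLS identity that fitted values plus residuals reproduce the outcome, which is also why regressing death on pop65p and the residual gives a perfect fit.

```stata
sysuse census, clear
quietly regress death pop65p

* Fitted values and residuals, stored in double precision
predict double yhat, xb
predict double ehat, residuals

* Outcome = fitted value + residual, up to rounding error
generate double rebuilt = yhat + ehat
assert reldif(rebuilt, death) < 1e-8

* With a constant in the model, the residual mean should be about zero
summarize ehat
```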
    . reg death pop65p

          Source |       SS           df       MS      Number of obs   =        50
    -------------+----------------------------------   F(1, 48)        =   4012.20
           Model |  8.4369e+10         1  8.4369e+10   Prob > F        =    0.0000
        Residual |  1.0094e+09        48  21028232.5   R-squared       =    0.9882
    -------------+----------------------------------   Adj R-squared   =    0.9879
           Total |  8.5379e+10        49  1.7424e+09   Root MSE        =    4585.7

    ------------------------------------------------------------------------------
           death |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          pop65p |   .0769946   .0012155    63.34   0.000     .0745506    .0794386
           _cons |   245.3045    896.729     0.27   0.786     -1557.69    2048.299
    ------------------------------------------------------------------------------

    . rvfplot, yline(0)

    . predict predicted_death
    (option xb assumed; fitted values)

    . predict residual_death, residuals

    . reg death pop65p residual_death

          Source |       SS           df       MS      Number of obs   =        50
    -------------+----------------------------------   F(2, 47)        =         .
           Model |  8.5379e+10         2  4.2689e+10   Prob > F        =         .
        Residual |           0        47           0   R-squared       =    1.0000
    -------------+----------------------------------   Adj R-squared   =    1.0000
           Total |  8.5379e+10        49  1.7424e+09   Root MSE        =         0

    --------------------------------------------------------------------------------
             death |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ---------------+----------------------------------------------------------------
            pop65p |   .0769946          .        .       .            .           .
    residual_death |          1          .        .       .            .           .
             _cons |   245.3045          .        .       .            .           .
    --------------------------------------------------------------------------------

    . reg predicted_death pop65p

          Source |       SS           df       MS      Number of obs   =        50
    -------------+----------------------------------   F(1, 48)        =         .
           Model |  8.4369e+10         1  8.4369e+10   Prob > F        =         .
        Residual |           0        48           0   R-squared       =    1.0000
    -------------+----------------------------------   Adj R-squared   =    1.0000
           Total |  8.4369e+10        49  1.7218e+09   Root MSE        =         0

    ------------------------------------------------------------------------------
    predicted_~h |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          pop65p |   .0769946          .        .       .            .           .
           _cons |    245.305          .        .       .            .           .
    ------------------------------------------------------------------------------

    . reg death i.region

          Source |       SS           df       MS      Number of obs   =        50
    -------------+----------------------------------   F(3, 46)        =      0.86
           Model |  4.5280e+09         3  1.5093e+09   Prob > F        =    0.4693
        Residual |  8.0851e+10        46  1.7576e+09   R-squared       =    0.0530
    -------------+----------------------------------   Adj R-squared   =   -0.0087
           Total |  8.5379e+10        49  1.7424e+09   Root MSE        =     41924

    ------------------------------------------------------------------------------
           death |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          region |
        N Cntrl  |  -9684.611   18486.77    -0.52   0.603    -46896.55    27527.33
          South  |  -12333.17   17468.35    -0.71   0.484    -47495.15     22828.8
           West  |  -27888.19   18179.49    -1.53   0.132     -64481.6    8705.224
                 |
           _cons |   52996.11   13974.68     3.79   0.000     24866.53    81125.69
    ------------------------------------------------------------------------------

    . reg death i.region, baselevels

          Source |       SS           df       MS      Number of obs   =        50
    -------------+----------------------------------   F(3, 46)        =      0.86
           Model |  4.5280e+09         3  1.5093e+09   Prob > F        =    0.4693
        Residual |  8.0851e+10        46  1.7576e+09   R-squared       =    0.0530
    -------------+----------------------------------   Adj R-squared   =   -0.0087
           Total |  8.5379e+10        49  1.7424e+09   Root MSE        =     41924

    ------------------------------------------------------------------------------
           death |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          region |
             NE  |          0  (base)
        N Cntrl  |  -9684.611   18486.77    -0.52   0.603    -46896.55    27527.33
          South  |  -12333.17   17468.35    -0.71   0.484    -47495.15     22828.8
           West  |  -27888.19   18179.49    -1.53   0.132     -64481.6    8705.224
                 |
           _cons |   52996.11   13974.68     3.79   0.000     24866.53    81125.69
    ------------------------------------------------------------------------------

    . reg death ib3.region, baselevels

          Source |       SS           df       MS      Number of obs   =        50
    -------------+----------------------------------   F(3, 46)        =      0.86
           Model |  4.5280e+09         3  1.5093e+09   Prob > F        =    0.4693
        Residual |  8.0851e+10        46  1.7576e+09   R-squared       =    0.0530
    -------------+----------------------------------   Adj R-squared   =   -0.0087
           Total |  8.5379e+10        49  1.7424e+09   Root MSE        =     41924

    ------------------------------------------------------------------------------
           death |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          region |
             NE  |   12333.17   17468.35     0.71   0.484     -22828.8    47495.15
        N Cntrl  |   2648.562   16010.01     0.17   0.869    -29577.92    34875.04
          South  |          0  (base)
           West  |  -15555.01   15654.19    -0.99   0.326    -47065.26    15955.23
                 |
           _cons |   40662.94   10481.01     3.88   0.000     19565.75    61760.12
    ------------------------------------------------------------------------------

    . reg death ibn.region, noconstant

          Source |       SS           df       MS      Number of obs   =        50
    -------------+----------------------------------   F(4, 46)        =     11.73
           Model |  8.2439e+10         4  2.0610e+10   Prob > F        =    0.0000
        Residual |  8.0851e+10        46  1.7576e+09   R-squared       =    0.5049
    -------------+----------------------------------   Adj R-squared   =    0.4618
           Total |  1.6329e+11        50  3.2658e+09   Root MSE        =     41924

    ------------------------------------------------------------------------------
           death |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          region |
             NE  |   52996.11   13974.68     3.79   0.000     24866.53    81125.69
        N Cntrl  |    43311.5   12102.43     3.58   0.001     18950.57    67672.43
          South  |   40662.94   10481.01     3.88   0.000     19565.75    61760.12
           West  |   25107.92   11627.64     2.16   0.036     1702.698    48513.15
    ------------------------------------------------------------------------------

    . tab region, sum(death)

         Census |  Summary of Number of deaths
         region |        Mean   Std. Dev.       Freq.
    ------------+------------------------------------
             NE |   52,996.11   59,289.85           9
        N Cntrl |    43,311.5   33,059.96          12
          South |   40,662.94   27,726.55          16
           West |   25,107.92    49,307.3          13
    ------------+------------------------------------
          Total |   39,474.26   41,742.35          50
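The tables above imply that each regional mean of death equals the constant plus that region's coefficient; for South, 52996.11 - 12333.17 = 40662.94, which matches both the ibn.region coefficient and the tab output. A minimal sketch of recovering those means from the i.region model, using lincom and margins as one possible approach:

```stata
sysuse census, clear
quietly regress death i.region

* Mean for South = constant (the NE base level) + South coefficient
lincom _cons + 3.region

* Predicted mean of death for every region in one table
margins region

* Raw regional means, for comparison
tabulate region, summarize(death)
```

The margins table should line up with the coefficients from reg death ibn.region, noconstant, since both report one mean per region.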
Make dataset by hand
Make variables with random values for a Monte Carlo experiment.
    . set obs 30 // make 30 cases, they are empty for now
    number of observations (_N) was 0, now 30

    . gen id = _n // id number of each case

    . gen x = rnormal() // x has random value of normal distribution(mean 0, sd 1)

    . gen y = 10*x -1 + rnormal()

    . reg y x

          Source |       SS           df       MS      Number of obs   =        30
    -------------+----------------------------------   F(1, 28)        =   2241.08
           Model |   1950.7145         1   1950.7145   Prob > F        =    0.0000
        Residual |  24.3722057        28  .870435918   R-squared       =    0.9877
    -------------+----------------------------------   Adj R-squared   =    0.9872
           Total |  1975.08671        29  68.1064382   Root MSE        =    .93297

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |   9.775934   .2065046    47.34   0.000     9.352929    10.19894
           _cons |  -1.002713    .170671    -5.88   0.000    -1.352317   -.6531092
    ------------------------------------------------------------------------------
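Because rnormal() draws new values on every run, the estimates above will differ each time. A minimal sketch of a reproducible version follows; the seed value 12345 is an arbitrary assumption and was not used in the original log, so the numbers will not match the table above.

```stata
clear
set seed 12345                      // arbitrary seed, chosen for illustration
set obs 30                          // 30 empty observations
generate id = _n                    // case identifier
generate x = rnormal()              // standard normal regressor
generate y = 10*x - 1 + rnormal()   // true slope 10, true intercept -1
regress y x
```

The estimated slope and intercept should land near 10 and -1, and rerunning the block with the same seed gives the same draws.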
Monte Carlo simulation (with the simulate command) is the approach I prefer for making my own dataset.
    . ** monte carlo simulation
    . program model, eclass
      1. drop _all
      2. set obs 30
      3. gen x = rnormal()
      4. gen y = 10*x - 1 + rnormal()
      5. reg y x
      6. test x = 10
      7. end

    . simulate _b _se, reps(100): model // run the model 100 times and get coefficients and SEs

          command:  model

    Simulations (100)
    ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
    ..................................................    50
    ..................................................   100

    . sum

        Variable |        Obs        Mean    Std. Dev.       Min        Max
    -------------+---------------------------------------------------------
            _b_x |        100    10.00136     .175676   9.666389   10.47112
         _b_cons |        100   -.9900542    .1853214  -1.599854  -.5627291
           _se_x |        100    .1888646    .0352951   .1100566   .3144236
        _se_cons |        100    .1846541    .0254627   .1050955   .2459035
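One way to use the simulated coefficients and standard errors is to check how often the 95% confidence interval covers the true slope of 10. The following is a sketch under stated assumptions: the program name mcmodel, the seed, and the variables tcrit and covered are hypothetical, and the test step from the original program is left out because simulate only collects _b and _se here.

```stata
clear all
set seed 2024                         // arbitrary seed, for reproducibility

capture program drop mcmodel          // allows the block to be rerun
program define mcmodel, eclass
    drop _all
    set obs 30
    generate x = rnormal()
    generate y = 10*x - 1 + rnormal()
    regress y x
end

* 100 replications; one row per replication with _b_x, _b_cons, _se_x, _se_cons
simulate _b _se, reps(100) nodots: mcmodel

* 95% CI half-width uses the t critical value with 30 - 2 = 28 df
generate double tcrit = invttail(28, 0.025)
generate byte covered = abs(_b_x - 10) <= tcrit*_se_x

* The mean of covered is the share of intervals containing the true slope
summarize _b_x _se_x covered
```

With only 100 replications, the coverage share is itself noisy, so it should be near 0.95 rather than exactly equal to it.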