Stata Practice

I used the webdoc command (webdoc init "markdown-name.md", logall) to generate this markdown page from a Stata log file…

Use given datasets

sysuse dataname loads one of the example datasets shipped with Stata. Here I use the 1980 Census state data.

. sysuse dir // check the given datasets
  auto.dta        bpwide.dta      citytemp4.dta   network1.dta    pop2000.dta     tsline1.dta     voter.dta
  auto2.dta       cancer.dta      educ99gdp.dta   network1a.dta   sandstone.dta   tsline2.dta     xtline1.dta
  autornd.dta     census.dta      gnp96.dta       nlsw88.dta      sp500.dta       uslifeexp.dta
  bplong.dta      citytemp.dta    lifeexp.dta     nlswide1.dta    surface.dta     uslifeexp2.dta

. sysuse census 
(1980 Census data by state)

. describe 

Contains data from C:\Program Files\Stata16\ado\base/c/census.dta
  obs:            50                          1980 Census data by state
 vars:            13                          6 Apr 2018 15:43
------------------------------------------------------------------------------------------------------------------------
              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------------------------------------------------
state           str14   %-14s                 State
state2          str2    %-2s                  Two-letter state abbreviation
region          int     %-8.0g     cenreg     Census region
pop             long    %12.0gc               Population
poplt5          long    %12.0gc               Pop, < 5 year
pop5_17         long    %12.0gc               Pop, 5 to 17 years
pop18p          long    %12.0gc               Pop, 18 and older
pop65p          long    %12.0gc               Pop, 65 and older
popurban        long    %12.0gc               Urban population
medage          float   %9.2f                 Median age
death           long    %12.0gc               Number of deaths
marriage        long    %12.0gc               Number of marriages
divorce         long    %12.0gc               Number of divorces
------------------------------------------------------------------------------------------------------------------------
Sorted by: 

. codebook region // lookup the value label used for region variable

------------------------------------------------------------------------------------------------------------------------
region                                                                                                     Census region
------------------------------------------------------------------------------------------------------------------------

                  type:  numeric (int)
                 label:  cenreg

                 range:  [1,4]                        units:  1
         unique values:  4                        missing .:  0/50

            tabulation:  Freq.   Numeric  Label
                             9         1  NE
                            12         2  N Cntrl
                            16         3  South
                            13         4  West
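
The value label can also be listed directly, without the rest of the codebook output. A minimal sketch, assuming the census data can be loaded as above; cenreg is the label name that describe and codebook report for region:

```stata
sysuse census, clear   // reload the example data so the snippet stands alone
label list cenreg      // print each numeric code of cenreg with its label
```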

Descriptive statistics

Summarize the data by subgroup.

. bysort region: summarize pop // same as sort+by

------------------------------------------------------------------------------------------------------------------------
-> region = NE

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         pop |          9     5459476     5925235     511456   1.76e+07

------------------------------------------------------------------------------------------------------------------------
-> region = N Cntrl

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         pop |         12     4905473     3750094     652717   1.14e+07

------------------------------------------------------------------------------------------------------------------------
-> region = South

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         pop |         16     4670877     3277853     594338   1.42e+07

------------------------------------------------------------------------------------------------------------------------
-> region = West

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         pop |         13     3320961     6217177     401851   2.37e+07


. sort region

. by region: sum pop

------------------------------------------------------------------------------------------------------------------------
-> region = NE

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         pop |          9     5459476     5925235     511456   1.76e+07

------------------------------------------------------------------------------------------------------------------------
-> region = N Cntrl

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         pop |         12     4905473     3750094     652717   1.14e+07

------------------------------------------------------------------------------------------------------------------------
-> region = South

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         pop |         16     4670877     3277853     594338   1.42e+07

------------------------------------------------------------------------------------------------------------------------
-> region = West

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
         pop |         13     3320961     6217177     401851   2.37e+07

. tab region, sum(pop) 

     Census |        Summary of Population
     region |        Mean   Std. Dev.       Freq.
------------+------------------------------------
         NE |   5459475.9   5925235.4           9
    N Cntrl |   4905472.5   3750093.7          12
      South |   4670876.8   3277853.1          16
       West |   3320960.8   6217176.6          13
------------+------------------------------------
      Total |   4518149.4   4715037.8          50

. egen average_pop = mean(pop), by(region) // make average population variable 

. tab region, sum(average_pop) // sd = 0 for regional average population 

     Census |       Summary of average_pop
     region |        Mean   Std. Dev.       Freq.
------------+------------------------------------
         NE |     5459476           0           9
    N Cntrl |   4905472.5           0          12
      South |     4670877           0          16
       West |   3320960.8           0          13
------------+------------------------------------
      Total |   4518149.5   766394.85          50
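
The zero standard deviations arise because egen assigns every state in a region the same regional mean. One way to check this, sketched here under the assumption that the census data is available through sysuse:

```stata
sysuse census, clear
egen average_pop = mean(pop), by(region)   // same variable as in the log above

* Within each region, every observation should equal the first one;
* assert stops with an error if any observation differs.
bysort region: assert average_pop == average_pop[1]

* tabstat gives the same regional means, standard deviations, and counts
* as tab region, sum(pop), without creating a new variable.
tabstat pop, by(region) statistics(mean sd n)
```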

Regression

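Use predict to get fitted values, predict with the residuals option to get residuals, and the factor-variable prefix (i.) for categorical regressors.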

. reg death pop65p 

      Source |       SS           df       MS      Number of obs   =        50
-------------+----------------------------------   F(1, 48)        =   4012.20
       Model |  8.4369e+10         1  8.4369e+10   Prob > F        =    0.0000
    Residual |  1.0094e+09        48  21028232.5   R-squared       =    0.9882
-------------+----------------------------------   Adj R-squared   =    0.9879
       Total |  8.5379e+10        49  1.7424e+09   Root MSE        =    4585.7

------------------------------------------------------------------------------
       death |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      pop65p |   .0769946   .0012155    63.34   0.000     .0745506    .0794386
       _cons |   245.3045    896.729     0.27   0.786     -1557.69    2048.299
------------------------------------------------------------------------------

. rvfplot, yline(0) 

. predict predicted_death
(option xb assumed; fitted values)

. predict residual_death, residuals

. reg death pop65p residual_death

      Source |       SS           df       MS      Number of obs   =        50
-------------+----------------------------------   F(2, 47)        =         .
       Model |  8.5379e+10         2  4.2689e+10   Prob > F        =         .
    Residual |           0        47           0   R-squared       =    1.0000
-------------+----------------------------------   Adj R-squared   =    1.0000
       Total |  8.5379e+10        49  1.7424e+09   Root MSE        =         0

--------------------------------------------------------------------------------
         death |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
        pop65p |   .0769946          .        .       .            .           .
residual_death |          1          .        .       .            .           .
         _cons |   245.3045          .        .       .            .           .
--------------------------------------------------------------------------------

. reg predicted_death pop65p

      Source |       SS           df       MS      Number of obs   =        50
-------------+----------------------------------   F(1, 48)        =         .
       Model |  8.4369e+10         1  8.4369e+10   Prob > F        =         .
    Residual |           0        48           0   R-squared       =    1.0000
-------------+----------------------------------   Adj R-squared   =    1.0000
       Total |  8.4369e+10        49  1.7218e+09   Root MSE        =         0

------------------------------------------------------------------------------
predicted_~h |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      pop65p |   .0769946          .        .       .            .           .
       _cons |    245.305          .        .       .            .           .
------------------------------------------------------------------------------

. reg death i.region 

      Source |       SS           df       MS      Number of obs   =        50
-------------+----------------------------------   F(3, 46)        =      0.86
       Model |  4.5280e+09         3  1.5093e+09   Prob > F        =    0.4693
    Residual |  8.0851e+10        46  1.7576e+09   R-squared       =    0.0530
-------------+----------------------------------   Adj R-squared   =   -0.0087
       Total |  8.5379e+10        49  1.7424e+09   Root MSE        =     41924

------------------------------------------------------------------------------
       death |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      region |
    N Cntrl  |  -9684.611   18486.77    -0.52   0.603    -46896.55    27527.33
      South  |  -12333.17   17468.35    -0.71   0.484    -47495.15     22828.8
       West  |  -27888.19   18179.49    -1.53   0.132     -64481.6    8705.224
             |
       _cons |   52996.11   13974.68     3.79   0.000     24866.53    81125.69
------------------------------------------------------------------------------

. reg death i.region, baselevels

      Source |       SS           df       MS      Number of obs   =        50
-------------+----------------------------------   F(3, 46)        =      0.86
       Model |  4.5280e+09         3  1.5093e+09   Prob > F        =    0.4693
    Residual |  8.0851e+10        46  1.7576e+09   R-squared       =    0.0530
-------------+----------------------------------   Adj R-squared   =   -0.0087
       Total |  8.5379e+10        49  1.7424e+09   Root MSE        =     41924

------------------------------------------------------------------------------
       death |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      region |
         NE  |          0  (base)
    N Cntrl  |  -9684.611   18486.77    -0.52   0.603    -46896.55    27527.33
      South  |  -12333.17   17468.35    -0.71   0.484    -47495.15     22828.8
       West  |  -27888.19   18179.49    -1.53   0.132     -64481.6    8705.224
             |
       _cons |   52996.11   13974.68     3.79   0.000     24866.53    81125.69
------------------------------------------------------------------------------

. reg death ib3.region, baselevels

      Source |       SS           df       MS      Number of obs   =        50
-------------+----------------------------------   F(3, 46)        =      0.86
       Model |  4.5280e+09         3  1.5093e+09   Prob > F        =    0.4693
    Residual |  8.0851e+10        46  1.7576e+09   R-squared       =    0.0530
-------------+----------------------------------   Adj R-squared   =   -0.0087
       Total |  8.5379e+10        49  1.7424e+09   Root MSE        =     41924

------------------------------------------------------------------------------
       death |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      region |
         NE  |   12333.17   17468.35     0.71   0.484     -22828.8    47495.15
    N Cntrl  |   2648.562   16010.01     0.17   0.869    -29577.92    34875.04
      South  |          0  (base)
       West  |  -15555.01   15654.19    -0.99   0.326    -47065.26    15955.23
             |
       _cons |   40662.94   10481.01     3.88   0.000     19565.75    61760.12
------------------------------------------------------------------------------

. reg death ibn.region, noconstant

      Source |       SS           df       MS      Number of obs   =        50
-------------+----------------------------------   F(4, 46)        =     11.73
       Model |  8.2439e+10         4  2.0610e+10   Prob > F        =    0.0000
    Residual |  8.0851e+10        46  1.7576e+09   R-squared       =    0.5049
-------------+----------------------------------   Adj R-squared   =    0.4618
       Total |  1.6329e+11        50  3.2658e+09   Root MSE        =     41924

------------------------------------------------------------------------------
       death |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      region |
         NE  |   52996.11   13974.68     3.79   0.000     24866.53    81125.69
    N Cntrl  |    43311.5   12102.43     3.58   0.001     18950.57    67672.43
      South  |   40662.94   10481.01     3.88   0.000     19565.75    61760.12
       West  |   25107.92   11627.64     2.16   0.036     1702.698    48513.15
------------------------------------------------------------------------------

. tab region, sum(death)

     Census |     Summary of Number of deaths
     region |        Mean   Std. Dev.       Freq.
------------+------------------------------------
         NE |   52,996.11   59,289.85           9
    N Cntrl |    43,311.5   33,059.96          12
      South |   40,662.94   27,726.55          16
       West |   25,107.92    49,307.3          13
------------+------------------------------------
      Total |   39,474.26   41,742.35          50
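
The tables above are the same model in different forms. With i.region, the constant is the mean of the base region and each coefficient is a difference from that base. With ibn.region and noconstant, each coefficient is the regional mean itself, which matches tab region, sum(death). A sketch of how to recover the regional means from the default model, assuming the census data:

```stata
sysuse census, clear
regress death i.region

* Base (NE) mean plus the N Cntrl difference gives the N Cntrl mean:
* 52996.11 + (-9684.611), about 43311.5 as in the tables above.
lincom _cons + 2.region

* Same for West: 52996.11 + (-27888.19), about 25107.92.
lincom _cons + 4.region

* margins reports the predicted mean for every region at once.
margins region
```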

Make dataset by hand

Make variables with random values, then run a Monte Carlo simulation.

. set obs 30 // make 30 cases, they are empty for now 
number of observations (_N) was 0, now 30

. gen id = _n // id number of each case

. gen x = rnormal() // x is drawn from a normal distribution (mean 0, sd 1)

. gen y = 10*x -1 + rnormal() 

. reg y x 

      Source |       SS           df       MS      Number of obs   =        30
-------------+----------------------------------   F(1, 28)        =   2241.08
       Model |   1950.7145         1   1950.7145   Prob > F        =    0.0000
    Residual |  24.3722057        28  .870435918   R-squared       =    0.9877
-------------+----------------------------------   Adj R-squared   =    0.9872
       Total |  1975.08671        29  68.1064382   Root MSE        =    .93297

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |   9.775934   .2065046    47.34   0.000     9.352929    10.19894
       _cons |  -1.002713    .170671    -5.88   0.000    -1.352317   -.6531092
------------------------------------------------------------------------------

A Monte Carlo simulation, run with the simulate command, is how I prefer to make my own datasets.

. ** monte carlo simulation 
. program model, eclass
  1. drop _all
  2. set obs 30 
  3. gen x = rnormal()
  4. gen y = 10*x - 1 + rnormal()
  5. reg y x
  6. test x = 10
  7. end 

. simulate _b _se, reps(100): model // run the model 100 times and get coefficients and SEs 

      command:  model

Simulations (100)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
..................................................    50
..................................................   100

. sum 

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
        _b_x |        100    10.00136     .175676   9.666389   10.47112
     _b_cons |        100   -.9900542    .1853214  -1.599854  -.5627291
       _se_x |        100    .1888646    .0352951   .1100566   .3144236
    _se_cons |        100    .1846541    .0254627   .1050955   .2459035
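
The program above runs test x = 10, but simulate keeps only _b and _se, so the test result is discarded. One possible way to collect it is sketched below. The program name model_p, the variable name p, and the seed are hypothetical choices for illustration, not part of the original log:

```stata
capture program drop model_p        // allow the snippet to be rerun
program define model_p, rclass
    drop _all
    set obs 30
    generate x = rnormal()
    generate y = 10*x - 1 + rnormal()
    regress y x
    test x = 10                     // H0: slope equals its true value
    return scalar p = r(p)          // hand the p-value back to simulate
end

set seed 2024                       // arbitrary seed for reproducibility
simulate p = r(p), reps(100) nodots: model_p

* Since H0 is true by construction, roughly 5 of the 100 replications
* would be expected to reject at the 5% level.
count if p < 0.05
```

In the summary above, the standard deviation of _b_x (.176) is close to the mean of _se_x (.189), which is what one would expect if the reported standard errors describe the sampling variability of the slope.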