Summary of Basic STATA Commands and Syntax

14 January 2014
Prepared by Sarah Bales
Each entry below gives the STATA command, its function, and its syntax.

Commands for loading data

use
    Function: Load Stata format dataset
    Syntax:   use filename [, clear nolabel]

insheet
    Function: Read ASCII (text) data created by a spreadsheet (e.g. tab- or comma-delimited data)
    Syntax:   insheet [varlist] using filename [, options]
              Common options include: tab, comma, clear

edit
    Function: Edit data with Data Editor, allows manual data entry
    Syntax:   edit [varlist] [if] [in] [, nolabel]

clear
    Function: Clear memory
    Syntax:   clear

save
    Function: Save datasets
    Syntax:   save [filename] [, save_options]
              Common options include: replace
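The loading and saving commands above can be tried out with a short do-file. This is a minimal sketch, assuming the auto example dataset that ships with Stata; the file names auto_copy.dta and auto_copy.csv are hypothetical and are written to the current working directory.

```stata
* Load the auto example dataset shipped with Stata (clears data in memory)
sysuse auto, clear

* Save a copy in Stata format, then load it again with use
save auto_copy.dta, replace
use auto_copy.dta, clear

* Write a comma-delimited text file, then read it back with insheet
outsheet using auto_copy.csv, comma replace
insheet using auto_copy.csv, comma clear
```

Newer Stata releases also offer import delimited for reading text files; insheet is used here because it is the command listed in this summary.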
Commands for checking and verifying data

describe
    Function: datafile name and path, n, no. of variables, variable names, types and labels, how the data are sorted. (Note: type includes string and various numeric types requiring different amounts of memory.)
    Syntax:   d [varlist] [, options]
              Common option: fullnames => does not abbreviate variable names

summarize
    Function: mean, sd, min, max, n
    Syntax:   su [varlist] [if] [, options]

su, detail
    Function: percentiles, median, kurtosis, skewness, highest and lowest values
    Syntax:   su [varlist] [if], detail

ci
    Function: standard errors (SE) and confidence intervals
    Syntax:   ci [varlist] [if] [, options]
              Common option is: level(#); default is level(95)

browse / edit
    Function: allows you to browse (no changes allowed) or edit (changes allowed) data as if it were in spreadsheet form
    Syntax:   browse [varlist] [if] [, nolabel]
              edit [varlist] [if] [, nolabel]
              Note: nolabel displays codes instead of labels.

list
    Function: lists values of variables
    Syntax:   list [varlist]
              list in 1/n (will list the first n observations)

sort / gsort
    Function: sort the data by the variable(s) indicated in ascending order; gsort allows sorting also in descending order
    Syntax:   sort varlist
              gsort [+|-] varname [[+|-] varname ...]
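As an illustration of the checking commands, the sketch below runs them on the auto example dataset that ships with Stata (an assumption; any dataset in memory would do). The variables make, price, mpg and foreign belong to that dataset, not to this summary.

```stata
* Load the example dataset and inspect its structure
sysuse auto, clear
describe, fullnames

* Basic and detailed summary statistics
summarize price mpg
summarize price if foreign == 1, detail

* Sort ascending and list the five cheapest cars
sort price
list make price in 1/5

* Sort descending with gsort and list the five most expensive cars
gsort -price
list make price in 1/5
```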
by / bysort
    Function: Repeat Stata commands on subsets of the data sorted by varlist. bysort first sorts by varlist then performs the command.
    Syntax:   by varlist [, sort]: stata_cmd
              bysort varlist: stata_cmd

tab
    Function: One-way or two-way frequency tables.
    Syntax:   tabulate varname1 [varname2] [if] [, options]
              Common options include: missing, nofreq, nolabel
              Additional common options for two-way tables: chi2, col, row, cell, nofreq

tab, sum
    Function: One- or two-way tables of summary statistics including mean, standard deviation, frequency and observations
    Syntax:   tabulate varname1 [varname2] [if] [, summarize(varname3) otheroptions]
              Common otheroptions include: [no]means, [no]standard, [no]freq, [no]obs
              Prefix [no] suppresses the item.

table
    Function: Table of summary statistics
    Syntax:   table rowvar [colvar] [if] [, contents(clist) otheroptions]
              Ex: table region, c(mean income n income)
                  shows mean income and number of obs. (n) for each region.
              Ex: table region gender, c(mean income)
                  shows the mean income for male and female in each region.

if exp
    Function: Allows analysis of specific subsets.
    Syntax:   if comes before the comma that introduces the options.
              Operators in exp include >, >=, ==, <=, <, ~=, != (both ~= and != mean "not equal") and conditions can be combined with & (and), | (or).
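The tabulation commands and the if qualifier can be combined as in the following sketch, which assumes the auto example dataset shipped with Stata. The table command is left out of the sketch because its options differ between Stata releases.

```stata
sysuse auto, clear

* One-way frequency table, including missing values
tabulate rep78, missing

* Two-way table with row percentages and a chi-squared test
tabulate foreign rep78, row chi2

* Mean, standard deviation and frequency of price by group
tabulate foreign, summarize(price)

* Repeat a command for each group (bysort sorts first)
bysort foreign: summarize mpg

* Restrict a command to a subset; if comes before the comma
summarize price if foreign == 1 & mpg > 25, detail
```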
Basic programming commands

* /* */ // ///
    Function: Symbols to indicate comments by ignoring text following or in between the symbols
    Syntax:   *    a line starting with * is ignored by STATA when running a do-file
              /* the part in between is ignored */
              //   ignores all text after this symbol on the same line
              ///  ignores all text after this symbol on the same line and ignores the carriage return, to allow commands to extend beyond one line

log
    Function: Used to record commands and results of running a do-file in a text or smcl (STATA markup language) file.
    Syntax:   log using filename [, append replace [text|smcl] name(logname)]

set mem
    Function: To assign RAM for STATA's use
    Syntax:   set mem 400m [, permanently]
              (depends on the size of your dataset, and whether you want it permanently set or just for this STATA session)
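The comment symbols and the log command are easiest to see together in a small do-file. The sketch below assumes the auto example dataset; the log file name mysession.log and the log name demo are hypothetical. The // and /// forms are meant for do-files rather than for commands typed interactively.

```stata
* This whole line is a comment
log using mysession.log, text replace name(demo)

sysuse auto, clear   // comment at the end of a command line

/* A block comment
   can span several lines */

* /// continues the command on the next line
summarize price mpg ///
    weight length

log close demo
```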
set more
    Function: set more off allows the program to run continuously without stopping. set more on stops after each page displayed and waits for you to press a key to continue displaying.
    Syntax:   set more {on|off} [, permanently]

global
    Function: Used to replace long text that will be repeated in the do-file
    Syntax:   For example, for long paths indicating where your .dta files are, you can create the macro as follows:
                  global pathname "d:\data\original\women data"
              Then when you use the data you type:
                  use "$pathname\datafile.dta"
              instead of
                  use "d:\data\original\women data\datafile.dta"
              (The quotation marks are needed because the path contains a space.)

cd
    Function: Change directory where STATA accesses or saves data files
    Syntax:   cd "drive:directory_name"
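A global macro holding a folder path can be sketched as follows. To keep the example runnable anywhere, it assumes the folder is simply the current working directory (read from c(pwd)); the macro name datadir and the file name auto_copy.dta are hypothetical.

```stata
* Let output scroll without pausing at each page
set more off

* Store a folder path in a global macro (here: the current directory)
global datadir "`c(pwd)'"

* Refer to the macro with $ ; quotes protect paths that contain spaces
sysuse auto, clear
save "$datadir/auto_copy.dta", replace
use "$datadir/auto_copy.dta", clear
```

Forward slashes are used in the path because Stata accepts them on all operating systems.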
Basic variable creation / alteration commands

drop
    Function: Eliminates variables or observations
    Syntax:   To drop variables: drop varlist
              To drop observations: drop if exp

keep
    Function: Keeps variables or observations and drops everything else
    Syntax:   To keep variables: keep varlist
              To keep observations: keep if exp

generate
    Function: Create contents of a new variable
    Syntax:   gen newvar = exp [if]

replace
    Function: Replace contents of existing oldvar
    Syntax:   replace oldvar = exp [if]

recode
    Function: Recodes categorical variables
    Syntax:   recode varname (rule) [(rule) ...]
              For example, if gender is coded male=1, female=2, but we want male=1, female=0, we would type:
                  recode gender 2=0
              If we want to recode age into age groups (15 and under, 16-65, and above 65), we can use the recode() function, which assigns each observation the upper bound of its group:
                  generate agegroup = recode(age, 15, 65, 100)
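The variable creation commands can be sketched on the auto example dataset (an assumption; the new variable names pricek, cheap, rep3 and mpggroup are hypothetical).

```stata
sysuse auto, clear

* Keep a few variables and drop observations with missing rep78
keep make price mpg foreign rep78
drop if missing(rep78)

* Create a new variable, then change some of its values
generate pricek = price / 1000
generate byte cheap = 0
replace cheap = 1 if price < 5000

* recode command: collapse five repair categories into three
recode rep78 (1 2 = 1) (3 = 2) (4 5 = 3), generate(rep3)

* recode() function: each car gets the upper bound of its mpg group
generate mpggroup = recode(mpg, 20, 30, 50)
tabulate mpggroup
```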
egen
    Function: Extensions to generate variable, allows creation of max, min, sd, mean, total, count of non-missing, deviation from mean, etc.
    Syntax:   egen [type] newvar = fcn(arguments) [if] [, options]
              Ex 1: to get a total income from the variables wageinc rentinc farminc, we could type:
                  egen totincome = rsum(wageinc rentinc farminc)
              (In current versions of STATA the documented name of this function is rowtotal().)
              Ex 2: to get highest education of any hh member:
                  bysort hhid: egen maxeduc = max(educ)
              Pay attention to how this treats missing values!!

tab, gen
    Function: Creates dummy variables for categorical variables. Dummy variable indicates 1 if condition is true and 0 otherwise.
    Syntax:   tab varname, gen(stubname)
              e.g. tab region, gen(reg) => to create dummy variables for each region.

mvdecode
    Function: Change numeric values to missing. STATA missing value code is indicated by a dot .
    Syntax:   mvdecode varlist [if], mv(numlist)
              mvdecode _all, mv(-999)
              This transforms -999 to .

mvencode
    Function: Change missing values to numeric values
    Syntax:   mvencode varlist [if], mv(#) [override]
              mvencode _all, mv(-999)
              This transforms . to -999
              The override option allows the transformation even if -999 is an existing value in the data series.

rename
    Function: Renames the variable
    Syntax:   rename old_varname new_varname

label
    Function: Creates label for datasets, for variables, and for values
    Syntax:   label data ["label"]                  (label for the dataset)
              label variable varname ["label"]      (label for a variable)
              label define lblname # "label" [# "label" ...] [, add modify replace]
              label values varlist [lblname]        (labels for values)

destring, replace
    Function: Convert string to numeric variable.
    Syntax:   destring [varlist], replace [destring_options]
              Options include: ignore(chars) or force (any remaining strings are converted to missing)

compress
    Function: Attempts to reduce the amount of memory used by your data by converting to lower data types, but without reducing precision. Useful when dataset gets very big and uses too much RAM.
    Syntax:   compress [varlist]
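The commands in this part of the summary can be chained in one short sketch. It assumes the auto example dataset shipped with Stata; all new names (maxmpg, rowsum, the rep stub, milespergallon, yesno, cheap, pricetxt) are hypothetical. It uses rowtotal(), which treats missing values as zero, so the warning about missing values applies here too.

```stata
sysuse auto, clear

* egen: group maximum, and a row total across variables
bysort foreign: egen maxmpg = max(mpg)
egen rowsum = rowtotal(rep78 headroom)   // missing rep78 counts as 0

* Dummy variables rep1-rep5, one per category of rep78
tabulate rep78, generate(rep)

* Missing values to -999 and back again
mvencode rep78, mv(-999)
mvdecode rep78, mv(-999)

* Rename a variable and attach labels
rename mpg milespergallon
label variable milespergallon "Mileage (miles per gallon)"
generate byte cheap = price < 5000
label define yesno 0 "No" 1 "Yes"
label values cheap yesno

* Make a string copy of price, then convert it back to numeric
generate pricetxt = string(price)
destring pricetxt, replace

* Store variables in the smallest type that loses no precision
compress
```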
foreach
    Function: Loops over items in varlist. Pay attention: the opening ` differs from the closing '.
    Syntax:   foreach x of varlist varlist {
                  commands referring to `x'
              }

Commands for manipulating datasets

merge
    Function: Merge datasets to add variables by matching observations according to some id code indicated by varlist.
              Example of 1:m merge: housing information has a single observation per household. Education has one observation per household member, and one household contains multiple members.
    Syntax:   merge 1:1 varlist using filename.dta
                  (matches individual observations)
              merge 1:m varlist using filename.dta
                  (matches one observation in the master file, i.e. the data in memory, to multiple observations in the using file)
              merge m:1 varlist using filename.dta
                  (matches multiple observations in the master file to a single observation in the using file)

append
    Function: Add observations to existing variables
    Syntax:   append using filename [, options]
              Useful option is generate(newvar) => specifies the name of a variable to be created that will mark the source of observations as master or appending file.

collapse
    Function: Makes a dataset of summary statistics
    Syntax:   collapse clist [if] [, options]
              Where clist is either:
                  [(stat)] varlist
                  [(stat)] target_var=varname
              And stat can be mean, sd, semean, sum, count, max, min, etc.

reshape
    Function: Convert data from wide to long form and vice versa. Used with panel data.
    Syntax:   reshape wide stub, i(i) j(j)   (=> j indicates existing var)
              reshape long stub, i(i) j(j)   (=> j indicates new var)
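A loop, a collapse followed by an m:1 merge, and a reshape can be sketched as below. The first part assumes the auto example dataset; the second part types in a tiny made-up panel (id, inc2010, inc2011) purely for illustration. All new names are hypothetical.

```stata
sysuse auto, clear

* foreach: note the opening ` and closing ' around the loop macro
foreach x of varlist price mpg weight {
    quietly summarize `x'
    display "`x': mean = " r(mean)
}

* collapse: build a file with one observation per group
preserve
collapse (mean) meanprice = price (count) ncars = price, by(foreign)
tempfile groupstats
save `groupstats'
restore

* merge m:1: many cars in memory match one group record in the using file
merge m:1 foreign using `groupstats'
tabulate _merge

* reshape: a small illustrative panel in wide form
clear
input id inc2010 inc2011
1 100 110
2 200 210
end
reshape long inc, i(id) j(year)   // j(year) is a new variable
list
reshape wide inc, i(id) j(year)   // j(year) is an existing variable
```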
