
14 January 2014

Prepared by Sarah Bales

Summary of basic STATA commands and syntax

Each entry gives the STATA command, its function, and its syntax.
Commands for loading data

use
    Function: Load a Stata-format dataset.
    Syntax:   use filename [, clear nolabel]

insheet
    Function: Read ASCII (text) data created by a spreadsheet
              (e.g. tab- or comma-delimited data).
    Syntax:   insheet [varlist] using filename [, options]
              Common options include: tab, comma, clear

edit
    Function: Edit data with the Data Editor; allows manual data entry.
    Syntax:   edit [varlist] [if] [in] [, nolabel]

clear
    Function: Clear memory.
    Syntax:   clear

save
    Function: Save datasets.
    Syntax:   save [filename] [, save_options]
              Common options include: replace
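
The loading commands above can be tried together in a short do-file. This is only a sketch: the auto dataset (loaded with sysuse), the file names mycopy.dta and mycopy.csv, and the outsheet command used to create a text file are assumptions for illustration and do not come from the handout.

```stata
* Load an example dataset shipped with Stata (assumption: not from the handout)
sysuse auto, clear

* Save a copy in Stata format; replace overwrites an existing file
save mycopy.dta, replace

* Clear memory, then load the saved file again
clear
use mycopy.dta, clear

* Write a comma-delimited text file, then read it back with insheet
outsheet using mycopy.csv, comma replace
insheet using mycopy.csv, comma clear
```

Both files are written to the current working directory, so the do-file needs write permission there.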
Commands for checking and verifying data

describe
    Function: Shows the data file name and path, number of observations (n),
              number of variables, variable names, types and labels, and
              how the data are sorted. (Note: types include string and
              various numeric types requiring different amounts of memory.)
    Syntax:   d [varlist] [, options]
              Common option: fullnames => does not abbreviate variable names

summarize
    Function: mean, sd, min, max, n
    Syntax:   su [varlist] [if] [, options]

su, detail
    Function: Percentiles, medians, kurtosis, skewness, highest and
              lowest values.
    Syntax:   su [varlist] [if], detail

ci
    Function: Standard errors (SE) and confidence intervals.
    Syntax:   ci [varlist] [if] [, options]
              Common option: level(#); default is level(95)

browse, edit
    Function: Allows you to browse (no changes allowed) or edit (changes
              allowed) data as if it were in spreadsheet form.
    Syntax:   browse [varlist] [if] [, nolabel]
              edit [varlist] [if] [, nolabel]
              Note: nolabel displays codes instead of labels.

list
    Function: Lists values of variables.
    Syntax:   list [varlist]
              list in 1/n (lists the first n observations)

sort, gsort
    Function: Sort the data by the variable(s) indicated, in ascending
              order; gsort also allows sorting in descending order.
    Syntax:   sort varlist
              gsort [+|-] varname [[+|-] varname ...]
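
A short session using the checking commands above might look like the following sketch. The auto dataset and its variables (make, price, mpg) are assumptions used only for illustration; ci is left out because its syntax differs between Stata versions.

```stata
sysuse auto, clear          // example dataset shipped with Stata (assumption)

describe                    // variable names, types, labels, sort order
summarize price mpg         // mean, sd, min, max, n
summarize price, detail     // adds percentiles, skewness, kurtosis

list make price in 1/5      // first five observations

sort price                  // ascending order
gsort -price +mpg           // descending price, then ascending mpg
list make price mpg in 1/3  // the three most expensive cars
```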

by, bysort
    Function: Repeat Stata commands on subsets of the data sorted by
              varlist. bysort first sorts by varlist, then performs the
              command.
    Syntax:   by varlist [, sort]: stata_cmd
              bysort varlist: stata_cmd

tab
    Function: One-way or two-way frequency tables.
    Syntax:   tabulate varname1 [varname2] [if] [, options]
              Common options include: missing, nofreq, nolabel
              Additional common options for two-way tables: chi2, col,
              row, cell, nofreq

tab, sum
    Function: One- or two-way tables of summary statistics, including
              mean, standard deviation, frequency and observations.
    Syntax:   tabulate varname1 [varname2] [if] [, summarize(varname3)
              otheroptions]
              Common otheroptions include: [no]means, [no]standard,
              [no]freq, [no]obs
              The prefix no suppresses the item.

table
    Function: Table of summary statistics.
    Syntax:   table rowvar [colvar] [if] [, contents(clist) otheroptions]
              Ex: table region, c(mean income n income)
              shows mean income and number of obs. (n) for each region.
              Ex: table region gender, c(mean income)
              shows the mean income for males and females in each region.

if exp
    Function: Allows analysis of specific subsets.
    Syntax:   if comes before the comma that introduces the options.
              exp can include >, >=, ==, <=, <, ~=, !=
              and conditions can be combined with & (and), | (or).
Basic programming commands

* /* */ // ///
    Function: Symbols to indicate comments; STATA ignores the text
              following or in between the symbols.
    Syntax:   * comments ignored by STATA in running a do-file
              /* the part between these symbols is ignored */
              // ignores all text after this symbol in the same line
              /// ignores all text after this symbol in the same line and
              ignores the carriage return, allowing a command to extend
              beyond one line

log
    Function: Used to record commands and results of running a do-file
              in a text or smcl (Stata Markup and Control Language) file.
    Syntax:   log using filename [, append replace [text|smcl]
              name(logname)]

set mem
    Function: Assigns RAM for STATA's use.
    Syntax:   set mem 400m [, permanently]
              The amount depends on the size of your dataset; permanently
              keeps the setting beyond this STATA session. Stata 12 and
              later manage memory automatically, so set mem is not needed
              there.
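
The comment symbols and the log command above could appear in a do-file as sketched below. The auto dataset, the file name mysession.log and the log name demo are assumptions for illustration. The // and /// forms are meant for do-files, so the lines should be run as a do-file rather than typed one at a time.

```stata
* A whole-line comment
/* A block comment
   spanning two lines */
sysuse auto, clear   // comment to the end of the line

* Record commands and results in a plain-text log named demo
log using mysession.log, text replace name(demo)

summarize price mpg ///
    weight length       // the command continues across the line break

log close demo
```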

set more
    Function: set more off allows the program to run continuously
              without stopping. set more on stops after each page
              displayed and waits for you to press a key to continue
              displaying.
    Syntax:   set more {on|off} [, permanently]

global
    Function: Defines a macro used to replace long text that will be
              repeated in the do-file.
    Syntax:   For example, for long paths indicating where your .dta
              files are, you can create the macro as follows:
              global pathname "d:\data\original\women data"
              Then when you use the data you type:
              use "$pathname\datafile.dta"
              instead of
              use "d:\data\original\women data\datafile.dta"
              The quotes are needed because the path contains a space.

cd
    Function: Changes the directory where STATA accesses or saves data
              files.
    Syntax:   cd "drive:directory_name"
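
The sketch below shows set more, global and cd working together without depending on a particular folder. It assumes the current working directory is writable; the auto dataset, the file name autocopy.dta and the use of c(pwd) to obtain the current directory are illustrative choices, not part of the handout.

```stata
set more off                          // do not pause after each screen

* Store the current working directory in a global macro
global pathname "`c(pwd)'"

sysuse auto, clear                    // example dataset (assumption)
save "$pathname/autocopy.dta", replace

* Use the macro in place of the long path; quotes protect any spaces
use "$pathname/autocopy.dta", clear

* Or change directory once and refer to files by short names
cd "$pathname"
use autocopy.dta, clear
```

Forward slashes in paths are accepted by Stata on Windows as well, which avoids problems with a backslash placed directly before a macro.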
Basic variable creation/alteration commands

drop
    Function: Eliminates variables or observations.
    Syntax:   To drop variables: drop varlist
              To drop observations: drop if exp

keep
    Function: Keeps variables or observations and drops everything else.
    Syntax:   To keep variables: keep varlist
              To keep observations: keep if exp

generate
    Function: Creates a new variable and its contents.
    Syntax:   gen newvar = exp [if]

replace
    Function: Replaces the contents of the existing variable oldvar.
    Syntax:   replace oldvar = exp [if]

recode
    Function: Recodes categorical variables.
    Syntax:   recode varname (rule) [(rule) ...]
              For example, if gender is coded male=1, female=2, but we
              want male=1, female=0, we would type:
              recode gender 2=0
              To group age into age groups (15 and under, 16-65, and 66
              to 100), use the recode() function with generate:
              generate agegroup = recode(age, 15, 65, 100)
              agegroup then takes the value 15, 65 or 100, the upper
              bound of each group.
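
The following sketch applies the commands above to one dataset. The auto dataset and the new variable names (expensive, domestic, mpggroup) are assumptions for illustration. It also contrasts the recode command, which changes codes, with the recode() function, which assigns each value the upper bound of its group.

```stata
sysuse auto, clear                    // example dataset (assumption)

keep make price mpg rep78 foreign     // keep these variables, drop the rest
drop if missing(rep78)                // drop observations with missing rep78

generate expensive = 0
replace expensive = 1 if price > 6000

* recode command: swap the codes 0 and 1 into a new variable
recode foreign (0 = 1) (1 = 0), generate(domestic)

* recode() function: each value gets the upper bound of its group
generate mpggroup = recode(mpg, 20, 30, 50)
tabulate mpggroup
```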

egen
    Function: Extensions to generate; allows creation of max, min, sd,
              mean, total, count of non-missing, deviation from mean, etc.
    Syntax:   egen [type] newvar = fcn(arguments) [if] [, options]
              Ex 1: to get a total income from the variables wageinc,
              rentinc and farminc, we could type:
              egen totincome = rsum(wageinc rentinc farminc)
              (In Stata 9 and later this function is named rowtotal().)
              Ex 2: to get the highest education of any household (hh)
              member:
              bysort hhid: egen maxeduc = max(educ)
              Pay attention to how these functions treat missing values:
              the row total counts a missing value as 0, and max()
              ignores missing values.

tab, gen
    Function: Creates dummy variables for categorical variables. A dummy
              variable is 1 if the condition is true and 0 otherwise.
    Syntax:   tab varname, gen(stubname)
              e.g. tab region, gen(reg) => creates dummy variables for
              each region.

mvdecode
    Function: Changes numeric values to missing. The STATA missing value
              code is a dot (.).
    Syntax:   mvdecode varlist [if], mv(numlist)
              mvdecode _all, mv(-999)
              This transforms -999 to .

mvencode
    Function: Changes missing values to numeric values.
    Syntax:   mvencode varlist [if], mv(#) [override]
              mvencode _all, mv(-999)
              This transforms . to -999
              The override option allows the transformation even if -999
              is an existing value in the data series.

rename
    Function: Renames the variable.
    Syntax:   rename old_varname new_varname

label
    Function: Creates labels for datasets, for variables and for values.
    Syntax:   label data ["label"]
              label variable varname ["label"]
              label define lblname # "label" [# "label" ...] [, add
              modify replace]
              label values varlist [lblname]

destring, replace
    Function: Converts a string variable to a numeric variable.
    Syntax:   destring [varlist], replace [destring_options]
              Options include: ignore(chars) or force (any remaining
              strings are converted to missing)

compress
    Function: Attempts to reduce the amount of memory used by your data
              by converting to lower data types, but without reducing
              precision. Useful when the dataset gets very big and uses
              too much RAM.
    Syntax:   compress [varlist]
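
The commands above can be seen working together in the sketch below. The auto dataset and every new name (maxprice, size, rep, origin_type, yesno, cheap, mpgtext) are assumptions for illustration. The row total uses rowtotal(), the current name of the row-sum function.

```stata
sysuse auto, clear                       // example dataset (assumption)

* egen: a group-level maximum and a row-level total
bysort foreign: egen maxprice = max(price)
egen size = rowtotal(weight length)      // missing values count as 0

* Dummy variables rep1, rep2, ... for each category of rep78
tabulate rep78, generate(rep)

* Missing-value codes: . becomes -999, then -999 becomes . again
mvencode rep78, mv(-999)
mvdecode rep78, mv(-999)

* Rename a variable, then label variables and values
rename foreign origin_type
label variable origin_type "Origin of car"
label define yesno 0 "No" 1 "Yes"
generate byte cheap = price < 5000
label values cheap yesno

* Convert a string variable that holds numbers to numeric
generate str5 mpgtext = string(mpg)
destring mpgtext, replace

compress                                 // use smaller storage types
```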

foreach
    Function: Loops over items in varlist. Pay attention: when referring
              to the loop variable, the opening quote ` differs from the
              closing quote '.
    Syntax:   foreach x of varlist varlist {
                  commands referring to `x'
              }
              The loop variable x is named without quotes in the foreach
              line.

Commands for manipulating datasets

merge
    Function: Merge datasets to add variables by matching observations
              according to some id code indicated by varlist.
              Example of 1:m merge: housing information has a single
              observation per household. Education has one observation
              per household member, and one household contains multiple
              members.
    Syntax:   merge 1:1 varlist using filename.dta
              (matches individual observations)
              merge 1:m varlist using filename.dta
              (matches 1 observation in the dataset in memory to multiple
              observations in the using file)
              merge m:1 varlist using filename.dta
              (matches multiple observations in the dataset in memory to
              a single observation in the using file)

append
    Function: Adds the observations of the using file to the dataset in
              memory.
    Syntax:   append using filename [, options]
              Useful option: generate(newvar) => specifies the name of a
              variable to be created that will mark the source of
              observations as master or appending file.

collapse
    Function: Makes a dataset of summary statistics.
    Syntax:   collapse clist [if] [, options]
              where clist is either:
              [(stat)] varlist
              [(stat)] target_var=varname
              and stat can be mean, sd, semean, sum, count, max, min, etc.

reshape
    Function: Converts data from wide to long form and vice versa. Used
              with panel data.
    Syntax:   reshape wide stub, i(i) j(j) (=> j indicates an existing
              variable)
              reshape long stub, i(i) j(j) (=> j indicates a new variable)
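
A small workflow using foreach, collapse, merge and append is sketched below; reshape is not shown. The auto dataset, the file name groupstats.dta, the new variable names and the preserve/restore pair used to protect the data in memory are all assumptions for illustration.

```stata
sysuse auto, clear                    // example dataset (assumption)

* foreach: name the loop variable without quotes, then refer to it as `x'
foreach x of varlist price mpg weight {
    summarize `x'
}

* collapse: one observation per group, saved as a separate file
preserve
collapse (mean) meanprice = price (count) ncars = price, by(foreign)
save groupstats.dta, replace
restore

* m:1 merge: many cars in memory match one row per group in the using file
merge m:1 foreign using groupstats.dta
tabulate _merge

* append: add the group rows and mark the source of each observation
append using groupstats.dta, generate(source)
tabulate source
```

After the merge every car carries the mean price and car count of its group; after the append the variable source is 0 for the original observations and 1 for the appended ones.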
