STATA TUTORIAL
Stata is easy to learn. It is powerful software that is widely used in research.
STATA tutorial: Part One

Contents: introduction; creating and using log and do files; memory; input and load data; examine data; generate and organize variables; modify data; label data; data management; import data; descriptive statistics and statistical tests.
Introduction
Open Stata from the Programs folder in the Start menu.
Use the help files! Syntax: help [command name]
. help regress
If you want to search all the sources that have to do with regression in general, type
. findit regress
Set up your working directory

You have to tell Stata where to find your files by setting your directory path. Put the path in double quotes if it contains spaces.
. cd "c:/name of your folder"
Creating and using log and do files

The log command starts a log file that keeps a record of the commands and output during your Stata session. Example: Let's create a log file named mylog.
If the filename is specified without an extension, the default extension is .smcl. You may prefer the extension .log, which produces a plain-text log that you can open in Notepad or MS Word:
. log using mylog.log

To save and close your log file when you are ending a session:
. log close
To temporarily suspend sending output to your log file use:

. log off
And to resume...
. log on

When you start your next session in Stata, you can decide whether to start a new log file, add to an existing one, or replace an old one.
. log using mylog.log, append
. log using mylog.log, replace

The log file saves everything from your session, both commands and output. The cmdlog command, however, saves only the commands you type.
. cmdlog using mycmdlog.txt
.txt is the default extension. The append and replace options apply here too. A command log is useful if you want to create a do-file.

You can run all your commands in your cmdlog file, so you don't have to type them all over again.
. do mycmdlog.txt
If you want to suppress the output, use run instead of do:

. run mycmdlog.txt
You can add the extension .do to your do-files.
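The logging and do-file steps above can be combined in one small do-file. The following is only a sketch: the file name mysession.do is hypothetical, and it assumes a dataset mydata.dta already exists in the working directory.

```stata
* mysession.do: hypothetical do-file name
* Close any log left open from an earlier run; capture ignores the error if none is open
capture log close
* Start a plain-text log, overwriting the previous one
log using mylog.log, replace

* Assumes mydata.dta exists in the working directory
use mydata, clear
describe
summarize

* Always close the log at the end of the do-file
log close
```

Running `do mysession.do` would execute these commands in order and leave the output in mylog.log.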
Memory
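Sometimes your dataset is too big for the memory Stata has allocated. In Stata 11 and earlier you can then allocate more memory for your data. Example: increase memory to 300 MB
. set memory 300m
Stata will warn you if your allocated memory is too small. From Stata 12 onward, memory is managed automatically and this command is no longer needed.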
Input and load data

To create a dataset manually:
. input wage school public female
  1. 94 8 1 0
  2. 75 7 0 1
  3. 70 16 0 0
  4. 75 8 1 1
  5. 78 11 1 1
  6. end
. save mydata
That last command saves your newly created data as mydata.dta. The save command always writes a Stata-format file, whatever extension you give it; to write other formats, use an export command such as outsheet.
If you want to work with this data, type
. use mydata
This command loads a Stata file, that is, a file with the extension .dta.
Examine dataset
Now you have loaded your data file (mydata.dta). For an overview of your data:
. des
des is short for describe, which describes your dataset. If you want to see the actual values of all variables in your dataset, type:
. list
To examine certain variables, type:
. list wage school
To list a particular range of observations for one or all variables, use the in qualifier:
. list in 1/3
This shows the first three observations of all variables.

. list wage in 1/3
This shows the first three observations of the variable wage.
To get a quick overview of the data, type
. inspect
The if qualifier lets you list only the observations that satisfy a stated condition. Example: list the females with at least 8 years of schooling. Your condition will then be:

female==1 & school>=8
So, combined with the list command, type
. list if female==1 & school>=8
== is the usual programming syntax for a test of equality ("is equal to"), as opposed to a single =, which assigns a value.
The count command shows the number of observations in the dataset:
. count
You can also count the number of observations that fulfill a condition:
. count if female==1 & school>=8
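Conditions can also be built with other operators, and the if and in qualifiers can be used together. A short sketch, assuming the small wage dataset entered above is still in memory:

```stata
* | means "or": females, or anyone with at least 8 years of schooling
count if female==1 | school>=8

* != means "not equal to"
count if female!=1

* if and in together: females among the first four observations
list wage school if female==1 in 1/4
```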
Generate and organize variables

Creating new variables: you will often need to calculate new values based on existing variables. Use the generate command, or its shorthand gen. Example: Let's create a squared wage variable
. gen wagesqr=wage^2
Or to log-transform the wage variable (log() is the natural logarithm)

. gen lnwage=log(wage)
Type list and see the result

To organize your variables, it may be prudent to generate ID numbers for each observation:
. gen ID=_n
To give separate ID numbers to males and females, use the by prefix. The data must first be sorted by these groups.
. sort female
. by female: gen Idg=_n
Type . list and see the result
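The sort and by steps can also be written as a single bysort prefix. In this sketch, Idg2 and groupsize are illustrative variable names that do not appear elsewhere in the tutorial.

```stata
* bysort sorts the data by female and then runs the command within each group
bysort female: gen Idg2 = _n

* _N is the number of observations in the current by-group
bysort female: gen groupsize = _N

list female Idg2 groupsize
```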

Some more data manipulation. To replace the values of an existing variable:
. replace wagesqr=wagesqr/1000
To create a variable that holds a summary measure, here the mean wage, for all observations:

. egen avgwage=mean(wage)
Type list to see result

To categorize a variable, you can use the recode command. Example: generate a new variable based on the school variable and recode its values as 0 and 1 (a dummy variable). The conditions are:
School is less than or equal to 8 --> 0
School is between 9 and 21 --> 1
Type
. gen schlevel=school
. recode schlevel (1/8=0) (9/21=1)
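A dummy variable like this can also be built directly from a logical expression, which evaluates to 1 when true and 0 when false. This is a sketch of an alternative, not the tutorial's own method; highschool is an illustrative name. Unlike the recode rules above, it codes any value of 9 or more as 1, including values above 21.

```stata
* 1 if school is 9 or more, 0 otherwise; missing values of school stay missing
gen highschool = (school >= 9) if !missing(school)

* Cross-tabulate against the recoded variable to compare the two codings
tab schlevel highschool
```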
Modify variables
To rename a variable
. rename wage timlon
So the structure is rename [oldname] [newname]. Don't forget to save the data after renaming!
. save, replace
To delete a variable
. drop schlevel
Label variables
A variable label lets you attach a description to a variable. The label text must be enclosed in double quotes. Example: Let's label the variable female
. label variable female "Indicates gender of the individual"
. des female
To attach an interpretation to the values, we use label define

. label define mf 0 men 1 women
mf is the name of the newly created value label for the 1/0 values of female. To connect mf with the values in the variable female:
. label values female mf
Type list to see the result!
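The same three steps work for any other variable. In this sketch the label texts and the value-label name yesno are assumptions made for illustration; the tutorial does not say what public measures beyond its name.

```stata
* Variable label: a description shown by describe
label variable school "Years of schooling"

* Value label: define it once, then attach it to a variable
label define yesno 0 "no" 1 "yes"
label values public yesno

* Check the definition and see the labels in a frequency table
label list yesno
tab public
```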
Data management
We used the drop command before to delete a variable. If we instead want to specify which variables to keep and discard the rest, type:
. keep timlon school public female ID
We can also drop or keep observations by condition. Example: let's create a dataset of only females
. drop if female==0
Dataset
We will use a different dataset now. Download Lnu.xls, open it, and save it as Lnu.csv instead.
Import data
First, before we load new data, let's get rid of the old one
. clear
To import a .csv file

. insheet using Lnu.csv
If the delimiter is a semicolon (or something else), specify it:
. insheet using Lnu.csv, delimiter(";")
Save it as a Stata file

. save mylnu.dta
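In Stata 13 and later, insheet has been superseded by import delimited. A minimal sketch of the same import, assuming a semicolon-delimited Lnu.csv in the working directory:

```stata
* Requires Stata 13 or later
* clear drops the data currently in memory before importing
import delimited using Lnu.csv, delimiters(";") clear

* replace overwrites mylnu.dta if it already exists
save mylnu.dta, replace
```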
Descriptive statistics and statistical tests

Your data is hopefully loaded now, so let's look at it:
. des
Frequency table with the tabulate command

. tab skolar
If you want a frequency table for several variables type:

. tab1 skolar kvinna

We can create a cross-tabulation
. tab skolar kvinna
To get the column and/or row percentages

. tab skolar kvinna, column row
To instead hide the frequencies and only see the percentages, use nofreq
. tab skolar kvinna, column nofreq

We can use a condition here as well:
. tab skolar kvinna if timlon>100, column nofreq
Summary statistics for a number of variables using summarize

. summ timlon skolar
Here again we can produce these summary statistics by group, for example for men and women
. sort kvinna
. by kvinna: summ timlon skolar

We can group by more than one variable. Example: summary statistics for men and women working in the private and public sectors
. sort kvinna offentlg
. by kvinna offentlg: summ timlon skolar
Summary statistics such as the mean, median and standard deviation can be listed in one table with the tabstat command.
. tabstat timlon skolar, stat(mean var sd min max N)

Here we can use the by() option to get summary statistics for men and women in a single table
. tabstat timlon skolar, stat(mean sd) by(kvinna)
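The median mentioned above is requested in the same way as the other statistics. A short sketch using the same variables, with quartiles added for illustration:

```stata
* median is the 50th percentile; p25 and p75 are the lower and upper quartiles
tabstat timlon skolar, stat(mean median p25 p75 sd) by(kvinna)
```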
Correlations
We can get the correlation between two or more variables. Example: look at the correlation between wage, years of schooling and years of work experience
. corr timlon skolar erfarnht

T-test (mean comparison test)
To test the equality of means, we use a t-test.
One-sample mean-comparison test
Test whether the mean of a specified variable is equal to a hypothesized value. Example: Test whether the average wage in Sweden is equal to 100
. ttest timlon==100

The confidence level is 95% by default; this can be changed with the level() option:
. ttest timlon==100, level(99)
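To see what the one-sample test computes, the t statistic can be reproduced by hand from the results that summarize leaves behind in r(). This is a sketch for checking your understanding, assuming mylnu.dta is in memory.

```stata
* summarize stores the mean, standard deviation and sample size in r()
quietly summarize timlon

* t = (sample mean - hypothesized mean) / (sd / sqrt(n))
display "t = " (r(mean) - 100) / (r(sd) / sqrt(r(N)))
```

The displayed value should match the t reported by ttest timlon==100.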
Two-group mean-comparison test

To test whether the mean of a variable is the same in two groups. Example: test whether men and women earn the same wage on average
. ttest timlon, by(kvinna)
End session
Don't forget to close your log files:
. log close
. cmdlog close
STATA tutorial: Part Two

Contents: regression analysis; regression: extract results; regression: predictions; graphing data.
Start Session
Don't forget to reopen your log files:
. cmdlog using mycmdlog.txt, append
. log using mylog.log, append
Regression analysis
We use regression analysis to study the effect of one variable X (independent) on another variable Y (dependent). In this section we will see how to run linear regressions, extract regression results, generate predicted values and run joint hypothesis tests. We will continue to use mylnu.dta
. use mylnu
Explore data
Use the Data Editor in the Stata menu to browse through your data in a spreadsheet.
Data -->data editor
Regression analysis
Example: Let's look at the effect of years of schooling on the hourly wage rate, using an Ordinary Least Squares (OLS) regression with dependent variable timlon and independent variable skolar
. reg timlon skolar
What does this result tell us about the effect of years of schooling on hourly wage rate?
To see whether the coefficient of each variable in the regression is statistically significant, check its t-value and p-value.
Also check the R-squared value to see how much of the variation in the dependent variable the model explains.
Holding years of schooling constant, we can also look at the relationship between gender and the wage rate.
. reg timlon skolar kvinna
What does the coefficient for kvinna tell us?
Heteroskedasticity?
The robust option reports standard errors that are robust to heteroskedasticity; the coefficient estimates do not change.
. reg timlon skolar kvinna, robust
Shorthand...
. reg timlon skolar kvinna, r
We can again use the by prefix to run separate analyses for groups of observations. Example: Run a regression for private and public sector employees separately but with a single command. Again we need to sort the data first
. sort offentlg
. by offentlg: reg timlon skolar kvinna, r
We can also use the if qualifier to run a regression on a subset of the observations. Example: Run a regression just for public sector employees. The condition is then offentlg==1
. reg timlon skolar kvinna if offentlg==1, robust
Regression: Extract results

You can see your stored results from a regression run
. ereturn list
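Individual stored results can be used directly in expressions. A minimal sketch, which reruns the regression quietly so that it stands on its own:

```stata
* quietly runs the regression without printing the output table
quietly reg timlon skolar kvinna, robust

* _b[] and _se[] hold the coefficient and standard error of a regressor
display "Coefficient on skolar: " _b[skolar]
display "Standard error:        " _se[skolar]

* e() holds model-level results such as R-squared and sample size
display "R-squared:             " e(r2)
display "Observations:          " e(N)
```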
You can store the results of your last regression under a name, for later use in an estimates table

. est store model1
To examine the coefficients from our regression model1:

. est table model1
If you want to see more statistics, add the desired options after table
. est table, b se t stats(N r2 F)
Example: Let's run another regression and store it as model2
. reg timlon skolar kvinna erfarnht, r
. est store model2
. est table model1 model2, b se stats(N r2 F)
Regression: Predictions
The predict command computes the predicted (fitted) value and the residual for each observation. To calculate predicted values of timlon from our regression, which we will name yhat:
. predict yhat
Calculate the residuals and store them as uhat
. predict uhat, resid
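The residual is simply the observed value minus the fitted value, which is easy to verify. In this sketch ucheck is an illustrative variable name.

```stata
* Residual by hand: observed wage minus fitted wage
gen ucheck = timlon - yhat

* The two variables should have the same summary statistics
summarize uhat ucheck

* Residuals against fitted values, with a reference line at zero
scatter uhat yhat, yline(0)
```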
Check your new variables

. des yhat uhat
Regression: Joint Hypothesis test (F-test)

To test whether one or more independent variables are jointly statistically significant in explaining variation in the dependent variable. That is, do the listed independent variables, taken together, explain variation in the dependent variable?
. reg timlon skolar kvinna erfarnht, r
. test kvinna erfarnht
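The test command leaves its results in r(), and it can also test other linear restrictions. A sketch under the same model; the second restriction is chosen purely for illustration and is not part of the original tutorial.

```stata
quietly reg timlon skolar kvinna erfarnht, r

* Joint test that both coefficients are zero
test kvinna erfarnht
display "F = " r(F) ", p-value = " r(p)

* A single linear restriction: are the two coefficients equal?
test skolar = erfarnht
```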
Graphing Data
Histograms
. hist timlon
. hist uhat
Superimpose a normal curve

. hist uhat, normal
Scatter plot: shows the relationship between 2 variables.
. graph twoway scatter timlon skolar
Shorthand...
. twoway scatter timlon skolar
Write a title for your graph

. twoway scatter timlon skolar, title("Hourly wage vs Years of schooling")
We can add a fitted regression line to our scatter plot to see any relationship more clearly.
. twoway (scatter timlon skolar) (lfit timlon skolar)
Export as a PostScript file

. graph export mygraph.ps
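The graphing steps can be combined in a do-file: a scatter plot with a fitted line, a title and axis titles, exported to a file. This is a sketch; the axis titles and the PNG file name are assumptions, and PNG export may not be available in every Stata installation.

```stata
* /// continues a command on the next line (do-files only)
twoway (scatter timlon skolar) (lfit timlon skolar), ///
    title("Hourly wage vs Years of schooling") ///
    ytitle("Hourly wage") xtitle("Years of schooling")

* The file extension determines the export format
graph export mygraph.png, replace
```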
Copy the graph directly into MS Word by right-clicking it and choosing Copy.
End Session
Don't forget to close your log files:
. log close
. cmdlog close
