Introduction
Open Stata from the Programs folder of the Start menu. Use the help files! Syntax: help [command name]
. help regress
If you want to search all the sources that have to do with regression in general, type
. findit regress
And to resume...
. log on
.txt is the default extension. Append and replace apply here too. This is useful if we want to create a do-file.
Memory
Sometimes your dataset is too big for Stata to handle. You can then allocate more memory for reading your data (Stata 12 and later manage memory automatically, so this is only needed in older versions). Example: increase memory to 300 MB
. set memory 300m
. save mydata
The use command loads a Stata file, that is, a file with the extension .dta
Examine dataset
Now you have loaded your data file (mydata.dta). For an overview of your data:
. des
des is short for describe, which describes your dataset. If you want to see the actual values of all variables in your dataset, type:
. list
To examine certain variables, type:
. list wage school
To list a particular range of numbers in one or all variables, use the in-option:
. list in 1/3
The if qualifier restricts a command to the observations that satisfy a stated condition. Example: list the observations of females with at least 8 years of schooling. Combined with the list command, type
. list if female==1 & school>=8
To get a quick overview of the data, type
. inspect
The count command shows the number of observations in the dataset:
. count
You can also count the number of observations that fulfill a condition
. count if female==1 & school>=8
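One detail worth checking when using conditions like school>=8: Stata treats a missing numeric value as larger than any number, so observations with missing schooling also satisfy the condition. The sketch below uses a small made-up dataset (the four rows are hypothetical, not taken from mydata.dta) to show the difference.

```stata
* Hypothetical toy data: 4 people, one with missing schooling
clear
input female school
1 12
1 6
1 .
0 9
end

* Counts 2 observations: the one with 12 years AND the one with missing schooling
count if female==1 & school>=8

* Counts 1 observation: missing values are excluded explicitly
count if female==1 & school>=8 & !missing(school)
```

Whether missing values should be excluded depends on the analysis, but it is usually safer to state it explicitly.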
To give separate ID numbers to males and females, use the by prefix. The data must first be sorted by these groups.
. sort female
. by female: gen Idg=_n
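As a minimal sketch of how numbering within groups works, the example below builds a small made-up dataset (the values are hypothetical) and uses bysort, which combines the sort and by steps in one command. The variable groupsize is an illustrative addition that uses _N, the number of observations in each group.

```stata
* Hypothetical toy data: 5 people, female coded 0/1
clear
input female wage
0 100
1 90
0 120
1 110
1 95
end

* bysort sorts by female and then runs the command within each group
* _n is the running observation number within the group
bysort female: gen Idg = _n

* _N is the total number of observations in the group
bysort female: gen groupsize = _N

list female wage Idg groupsize
```

Here the two males get Idg 1 and 2, and the three females get Idg 1, 2 and 3.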
Modify variables
To rename a variable:
. rename wage timlon
So the structure is rename [oldname] [newname]. Don't forget to save the data with the new variable name!
. save, replace
To delete a variable
. drop schlevel
Label variables
Variable labels allow you to include a description of a variable. Example: Let's label the variable female
. label variable female "Indicates gender of the individual"
. des female
mf is a value label defined for the 1/0 codes of female. To attach the value label mf to the values in the variable female:
. label values female mf
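The full labelling sequence can be sketched as follows. The label texts "Male" and "Female" and the three data rows are assumptions made for illustration; only the names female and mf come from the text above.

```stata
* Hypothetical toy data
clear
input female
0
1
1
end

* Define the value label mf (label texts are assumed for illustration)
label define mf 0 "Male" 1 "Female"

* Attach the value label to the variable, and describe the variable itself
label values female mf
label variable female "Indicates gender of the individual"

* Tabulations now show Male/Female instead of 0/1
tab female
```

The underlying values stay 0 and 1, so conditions such as if female==1 still work after labelling.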
Data management
We used the drop command before to delete a variable. If we instead want to specify which variables to keep, and discard the rest, type:
. keep timlon school public female ID
We can also keep or drop observations by condition. Example: let's create a dataset of only females
. drop if female==0
Dataset
We will use a different dataset now. Download Lnu.xls. Open Lnu.xls and save it as lnu.csv instead.
Import data
First, before we load new data, let's get rid of the old one
. clear
To instead hide the frequencies and only see the percentages, use nofreq
. tab skolar kvinna, column nofreq
Here we can again compute these summary statistics by group, for example for men and women:
. sort kvinna
. by kvinna: summ timlon skolar
Summary statistics such as the mean, variance, standard deviation and so on can be listed in one table with the tabstat command.
. tabstat timlon skolar, stat(mean var sd min max N)
Correlations
We can get the correlation between two or more variables. Example: look at the correlation between wage, years of schooling and years of work experience
. corr timlon skolar erfarnht
End session
Don't forget to
. log close
. cmdlog close
Start Session
Don't forget to
. cmdlog using mylog, append
. log using mylog.log, append
Regression analysis
We use regression analysis to study the effect of one variable X (independent) on another variable Y (dependent). In this section we will see how to run a linear regression, extract regression results, generate predicted values and run a joint hypothesis test. We will continue to use mylnu.dta
. use mylnu
Explore data
Use the Data Editor in the Stata menu to browse through your data in a spreadsheet.
Data --> Data Editor
Regression analysis
Example: Let's see the effect of years of schooling on hourly wage rates, using an Ordinary Least Squares (OLS) regression with dependent variable timlon and independent variable skolar
. reg timlon skolar
What does this result tell us about the effect of years of schooling on hourly wage rate?
To see if the coefficient of a variable in the regression is statistically significant, check
p-value t-value
Also check the R-squared value to see how much of the variation in the dependent variable the model explains.
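The numbers in the regression table can also be retrieved after estimation. The sketch below assumes the auto example dataset that ships with Stata (variables price and mpg) as a stand-in for the LNU data; with the tutorial's data, the same lines would apply to reg timlon skolar.

```stata
* Example dataset shipped with Stata (stand-in for mylnu.dta)
sysuse auto, clear
regress price mpg

* Coefficient and standard error of mpg
display _b[mpg]
display _se[mpg]

* t-value = coefficient divided by its standard error
display _b[mpg] / _se[mpg]

* Two-sided p-value from the t distribution with e(df_r) residual degrees of freedom
display 2 * ttail(e(df_r), abs(_b[mpg] / _se[mpg]))

* R-squared: share of the variation in the dependent variable explained by the model
display e(r2)
```

A small p-value (for example below 0.05) indicates that the coefficient is statistically different from zero at that level.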
Holding years of schooling constant, we can also see the relationship between gender and wage rate.
. reg timlon skolar kvinna
Heteroskedasticity?
Correct for this with the robust option, which reports heteroskedasticity-robust standard errors
. reg timlon skolar kvinna, robust
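To see what the robust option changes, the following sketch compares a regression with and without it. It assumes the auto example dataset that ships with Stata rather than the LNU data, and the scalar names b_ols and se_ols are chosen only for illustration. The estat hettest line runs the Breusch-Pagan test for heteroskedasticity after the ordinary regression.

```stata
* Example dataset shipped with Stata (stand-in for mylnu.dta)
sysuse auto, clear

* Ordinary regression: save the coefficient and standard error of mpg
regress price mpg foreign
scalar b_ols = _b[mpg]
scalar se_ols = _se[mpg]

* Breusch-Pagan test; a small p-value suggests heteroskedasticity
estat hettest

* Same regression with heteroskedasticity-robust standard errors
regress price mpg foreign, robust

* The coefficient is unchanged; only the standard error differs
display "coefficient: " b_ols "  vs  " _b[mpg]
display "std. error:  " se_ols "  vs  " _se[mpg]
```

Because only the standard errors change, the t-values and p-values change while the estimated effects stay the same.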
We can again use the by prefix to run separate analyses for groups of observations. Example: Run a regression for private and public sector employees separately but at the same time. Again we need to sort the data first
. sort offentlg
. by offentlg: reg timlon skolar kvinna, r
We can also use the if qualifier to run a regression on a subset of observations. Example: Run a regression just for public sector employees. The condition is then offentlg==1
. reg timlon skolar kvinna if offentlg==1, robust
Example: Let's run another regression and store it as model2
. reg timlon skolar kvinna erfarnht, r
. est store model2
. est table model1 model2, b se stats(N r2 F)
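For est table to show both models, each regression has to be stored right after it is run. A minimal sketch of the whole sequence follows, assuming the auto example dataset that ships with Stata instead of the LNU data, and assuming model1 is simply an earlier, smaller regression stored the same way.

```stata
* Example dataset shipped with Stata (stand-in for mylnu.dta)
sysuse auto, clear

* First model: store it immediately after estimation
regress price mpg
estimates store model1

* Second model with one more regressor
regress price mpg weight
estimates store model2

* Side-by-side table of coefficients (b), standard errors (se),
* number of observations, R-squared and F statistic
estimates table model1 model2, b se stats(N r2 F)
```

A stored model can be made active again later with estimates restore, for example before running predict.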
Regression: Predictions
The predict command computes the predicted (fitted) value and the residual for each observation. To calculate predicted values for timlon from our regression, type the following. We will name the new variable yhat
. predict yhat
Calculate the residuals and store them as uhat
. predict uhat, resid
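A quick way to check what predict produced is to confirm that the fitted value plus the residual gives back the dependent variable. The sketch below assumes the auto example dataset that ships with Stata instead of the LNU data. It stores the new variables as double for precision (plain predict yhat would store them as float), and the variable name check is illustrative.

```stata
* Example dataset shipped with Stata (stand-in for mylnu.dta)
sysuse auto, clear
regress price mpg

* Fitted values and residuals, stored in double precision
predict double yhat
predict double uhat, resid

* Fitted value + residual should reproduce the dependent variable
generate double check = yhat + uhat

* With a constant in the model, OLS residuals average to zero
summarize uhat

list price yhat uhat check in 1/5
```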
Graphing Data
Histograms
. hist timlon
. hist uhat
Scatter plot: to show the relationship between 2 variables.
. graph twoway scatter timlon skolar
Shorthand...
. twoway scatter timlon skolar
We can fit a straight line onto our scatter plot to see any relationship more clearly.
. twoway (scatter timlon skolar) (lfit timlon skolar)
End Session
Don't forget to
. log close
. cmdlog close