
Empirical Methods in Finance

Lecture 1

Florian Peters

Universiteit van Amsterdam


Finance Group

January 7, 2020
Objectives of this Course

Main goal: Equipping you with the data analytics skills for writing your thesis
and preparing you for jobs that require sophisticated data analysis.

Various components:
• Knowledge of empirical methods used in finance – especially how to
implement them
• Acquaintance with datasets and statistical software
• Hands-on introduction to data management and analysis using STATA
• Main assignment is to replicate a published academic paper
• Some advice on thesis proposal

At the end of this course you will be able to conduct data analysis at the
level of top-tier academic research, i.e. replicate published studies.
Organization

Four lectures. Mondays, 13:00-15:00, REC A0.01


Four tutorials.
• You have been assigned to one tutorial per week and to a working group of 3-4
students within that tutorial.
• You can find your tutorial and working group in Canvas ⇒ People ⇒ Group.
• Group 8 is reserved for RE finance students. All other groups are open to
students from all other tracks.
• Content: student presentations. The sessions are interactive: I will ask questions
and give feedback during the presentations, and I will grade based on the presentation.
• Each tutorial counts for 10% of the course grade, i.e. 40% in total.
Content of tutorials

Week 1: Exploration of databases, retrieval of specified datasets, merging of
datasets, basic data preparation, descriptive statistics, graphs, simple regressions
Week 2: Guided replication of a published empirical paper
Week 3: Presentations of selected empirical research articles (one paper per group)
Week 4: Presentations of your solutions of the main assignment
Differences between tutorials

The content of the RE finance tutorial will slightly differ from the others.
RE finance students will work with RE data in week 1, and discuss RE articles in
week 3. Tutorials of week 2 and 4 will be the same for all groups.
Evaluation

Individual, take-home assignment (60%)
Thesis proposal (pass/fail)
Tutorial presentations: T1=10%, T2=10%, T3=10%, T4=10%.
Grading is at the group level.
Presence at tutorials will be checked. You may miss a tutorial without penalty only if you
present a doctor’s note. Otherwise, you can miss at most one tutorial, and you will get a
-1.0 grade penalty relative to your group’s grade for the missed tutorial.
Workload: This course is designed for 140 hours of students’ total study time, i.e.
35 hours per week.
Main assignment

Main grade component: Individual empirical assignment.
I will ask you to replicate a published empirical article with a few modifications.
I will post the assignment on Wednesday next week. The deadline is two weeks later.
This is an individual assignment, not a group assignment. You can, of course,
discuss issues and general solution approaches with classmates. But you cannot
copy and paste code from anyone.
The deadline is Wed, 29 Jan. Late submissions can only be accepted until Thu, 30 Jan,
9 AM.
There will be a resit opportunity for this assignment.
First tutorial (this week)

Assignment of students to tutorials and working groups is available in Canvas ⇒
People ⇒ Group
The assignments for this week have been posted in the “Week 1” module on
Canvas.
Choose the assignment (numbered 1-9 for regular finance students and 1-4 for RE
finance students) indicated for your working group in the Excel file “Groups.xlsx”
(will be posted today after the lecture)
Contact your group members immediately
Start working on this as soon as possible; ideally, meet with your group members tomorrow
morning in the library
If you don’t have a WRDS account yet, sign up today
Download and install STATA on your computer today (see instructions on the
syllabus)
I will be in the library tomorrow 11-12 and 13-14 to answer questions
Correspondence

You can reach me at emf-abs@uva.nl or during office hours
Office hours are held on Jan 20 and Jan 27 from 15:00 to 16:00 in M3.15
Please use email for administrative questions only, i.e. absence, illness, etc. For
questions relating to course content, please come to my office hours or talk to me after
the lecture.
Outline

1 Structure of Thesis Proposal

2 Intro to STATA

3 Data sources
Structure of Thesis Proposal Intro to STATA Data sources

Overall Structure

You need to submit a first draft of a thesis proposal by next Tuesday.

Your thesis proposal should have the following four parts:


1. Research question
2. Related literature
3. Data
4. Methodology

It can be at most 2 pages long including references (font size: 12pt).


Pass/fail: the proposal will not receive a grade, but you need to pass it to pass the course.
I have posted an assignment on Canvas through which you need to upload your
thesis proposal.

Florian Peters (UvA) Empirical Methods in Finance January 7, 2020 10 / 56



1. Research question - take “question” literally

A thesis proposal must have a clear question.
• Take the word ‘question’ literally. Try to formulate your research topic as a
question, even if that question is not going to be the literal title of your thesis.
• That rules out purely descriptive topics, e.g. “Bankruptcy Codes around
the World: A Comparative Study”
• But it does not rule out a topic like: “Do Bankruptcy Codes Matter for Capital
Structure? A Cross-Country Analysis”
The nature of a question is that we don’t know the answer (yet).


Research question - start with a puzzle

Great research is often motivated by a puzzle, i.e. we have a theory or an idea
about how the world works but the data hasn’t yet confirmed it.

• Example 1 (Hong and Sraer, 2016, “Speculative Betas”)


“Over the last twenty years, financial economists have developed a large and impressive body of
findings on the excess predictability of cross-sectional asset returns. These studies reject the
CAPM beta. But there is suggestive evidence that the risk and return relationship is not only not
strong, but is frequently going the wrong way. In this paper, we provide a theory of this low risk,
high expected return puzzle by incorporating speculative disagreement and costly short-selling
into the CAPM.”


Research question - start with a puzzle

Another good setting for a research question is a situation where there are
competing theories around, but we don’t really know which one is correct.

• Example 2 (Bertrand and Mullainathan, 2003, “Enjoying the Quiet Life? Corporate Governance
and Managerial Preferences”)

“Much of our understanding of corporations builds on the idea that managers, when they are not
closely monitored, will pursue goals that are not in shareholders interests. But what goals would
managers pursue? While existing models [...] emphasize empire building, our results seem to fit a
different class of models, which we refer to as “quiet life” models, very much as in Hicks’s (1935)
suggestion that the best of all monopoly profits is a quiet life”


Research question - point out your contribution

If you choose a topic that is closely related to other research, you need to ask
yourself why it is necessary or useful to do what you do. Ask yourself: How does my
thesis help us learn something in addition to what we already know? This is especially
important if you replicate an existing study with new or different data.

• Example 1 (not so good):

“There is now abundant evidence across the world of a January effect in stock returns. This
thesis examines whether a January effect also exists in Dutch companies.”
• Example 2 (good, from Campello, Ribas, and Wang, 2014, “Is the Stock Market a Side Show?
Evidence from a Structural Reform”):
“Do stock market transactions affect the real economy? If so, how? It has long been argued that
stock market transactions are largely a side show to corporate activity. At the same time, there
are reasons to believe those transactions might matter. [...] A recent intervention in the Chinese
stock market, however, puts us close to an experiment on the effect of secondary market
transactions on corporate welfare. The 2005 split-share reform was to swiftly convert non-tradable
shares into tradable status.”


Research question - relate to the big picture

Keep the big question(s) in mind.


You will probably work on a question that is part of or contributes to a bigger
question. Mention that big question as part of the motivation.
If your thesis does not contribute to any bigger question, it is probably not so great.
But usually you can think of something.

• Ex 1 (Slightly adapted from Ahern and Dittmar, 2012, “The Changing of the Boards: The Impact
on Firm Value of Mandated Female Board Representation”):
We analyze the value implications of female board representation using the introduction in
2003 of a 40% quota in Norway. We find that the constraint imposed by the quota caused a
significant drop in the stock price at the announcement of the law and a large decline in Tobin’s Q
over the following years. Our results are relevant to academics, investors, and policy makers.
They quantify the costs of such laws borne by shareholders and point to the potential causes of
the value decline. More broadly, they are consistent with the idea that firms choose boards to
maximize value, and put into question the popular notion that externally imposed changes to
corporate governance benefit shareholders.


Research question - summary

1. You need to have a question (hypothesis).
2. The question should not have already been answered exhaustively, i.e. you need to
make clear what we can learn from your work in addition to what’s already out
there. Ask yourself what the contribution of your work might be.
3. You should make clear why you think the question is relevant. Ask yourself “Why
should we care?” or “Who should care (investors, consumers, policy-makers,
managers)?” Keep the big(ger) question in mind!


2. Related literature

You should briefly describe the most relevant related literature.


Don’t be too exhaustive in this first proposal. When you describe another paper,
one or two sentences are usually enough to summarize its main contribution.
Make clear what the unique contribution of your work is. There should be an
original aspect of your work. A pure replication of another study or a review of
other studies does not constitute a valid thesis topic.


Related literature - how to search

Once you have become interested in a topic, first try to find the (most recent)
review articles on that topic. The references in these articles often give you the
most comprehensive list of articles on the subject.
Look for the main contributing researchers in the specific field you are writing
about. Often, they are located in the top US schools. So, look for big names, both
with respect to people as well as to universities.
Go to their web page. Check their papers, in particular, the most recent work
(including unpublished working papers). Check their co-authors and whom they
cite, then go to those people’s web pages and check their work, etc. etc.
Also, check conference programs for related papers: AFA, WFA, EFA, FIRS, NBER,
and other conferences.


Related literature - how to search (cont’d)

Another important indication for whether a paper matters is the journal in which it
is published. If it’s published in a so-called A-journal, then it’s probably relevant
and well-executed. If it’s in a B- or C-journal you can probably ignore it.
The Finance A-journals are Journal of Finance, Journal of Financial Economics,
Review of Financial Studies.
Econ A-journals are American Economic Review, Journal of Political Economy,
Econometrica, Quarterly Journal of Economics, Review of Economic Studies.
If you don’t find any prominent researcher publishing on the topic you are interested
in, and you find no paper published in the finance or econ A-journals, then there are
three possibilities: (1) the topic is not interesting; (2) the topic is interesting but
impossible to analyze; (3) the topic is interesting and can be analyzed, and you are
the first one who came up with the idea.


3. Data

You have to make sure that you will be able to obtain the necessary data before
you start.
Sometimes there are situations in which you are not yet 100% sure that you will get
the “ideal” data, but you still want to pursue the topic. In this case you need to
have a plan B. Can you do the same or a similar analysis with other, available but
less-ideal data?
Specify the databases that you will use and make sure that you have access to them.
Sometimes part of the data you want to analyze needs to be hand-collected. In that
case, make sure you have access to the necessary data sources, and make an
estimate of how long it will take you to collect those data. E.g., collect data for one
full day, and extrapolate.


4. Methodology

You need to describe the methodology that you will be using, e.g. OLS, IV, Probit,
Diff-in-diff, panel regressions with/without fixed effects, event study, etc.
You should write down the regression equation(s) that you want to estimate, and
mention the coefficient of interest, i.e. the coefficient that answers your research
question.
In a good thesis, there is typically one particular coefficient that answers your
research question. Make that connection clear.
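
To illustrate what "writing down the regression equation and naming the coefficient of interest" can look like, here is a sketch of a generic diff-in-diff specification. All symbols are placeholders chosen for this illustration (outcome y, a treatment-group indicator, a post-event indicator, controls X, firm and year fixed effects); your own equation will depend on your question and data.

```latex
\[
y_{it} = \alpha_i + \lambda_t
       + \beta \,(\mathit{Treat}_i \times \mathit{Post}_t)
       + \gamma' X_{it} + \varepsilon_{it}
\]
```

In this sketch, \beta is the coefficient of interest: it measures how the outcome of treated firms changes after the event relative to untreated firms, and that difference is what should answer the research question.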


Remarks

The above guidelines are ambitious. If you meet all of them, you will have an ideal
thesis proposal (grade=10).
We don’t expect all students to write a perfect thesis. So, more moderate proposals
lead to theses that get passing or even good grades.
In any case, you should be able to write something on every one of the above four
points, even if your contribution is more modest.
I am aware that the deadline for the first proposal is very early. Some of you will
decide to change topic later (key courses have not yet been taught). That’s OK.


Advisers
You will be asked to name your preferred adviser in the first proposal. Here is the list of
advisers at our department and their areas of expertise:

Finance Supervisors

Stefan Arping, Arnoud Boot, Tolga Cascurlu, Torsten Jochem

• Arping: Corporate Finance, Corporate Governance, Banking
• Boot: Banking, Financial Intermediation, Financial Regulation, Corporate
Governance
• Cascurlu: M&A, Entrepreneurship, Banking, Law&Finance
• Jochem: Corporate Governance, Insider Trading, Executive Compensation,
Empirical Banking


Advisers
Finance Supervisors

Tomislav Ladika, Jeroen Ligterink, Jens Martin, Evgenia Zhivotova, Enrico Perotti

• Ladika: credit default swaps, syndicated loans, and corporate investment
strategies
• Ligterink: Intl corporate finance, risk management, capital structure, corporate
restructuring
• Martin: private equity, IPOs, M&A, corporate governance
• Zhivotova: corporate governance, boards, executive compensation
• Perotti: banking, financial regulation, liquidity regulation, credit cycles, repo
markets


Advisers
Finance Supervisors

Florian Peters, Rafael Ribas, Vladimir Vladimirov, Tanju Yorulmazer, Aleksandar Andonov

• Peters: Behavioral Finance, M&A, Executive Turnover & Compensation
• Ribas: Financing new ventures (informal financing, crowdfunding), human
rights violations and firm value, collaborative R&D, stock liquidity and
corporate decisions, executive networks
• Vladimirov: M&A, Corporate Cash Holdings, Capital Structure, Bankruptcy &
Distress
• Yorulmazer: Banking & Financial Regulation, Financial Crises
• Andonov: Institutional investors, asset management, alternative asset classes and
governance


Advisers
Finance Supervisors

Simon Rottke, Spyros Terovitis

• Rottke: asset pricing (classical and behavioral)
• Terovitis: information asymmetry, information disclosure, rating agencies


Real Estate Finance Advisers


The Real Estate Finance Group

Name                      Email                  Room
dr. M.I. Dröes            m.i.droes@uva.nl       M 3.10
prof. dr. M.K. Francke    m.k.francke@uva.nl     M 3.12
dr. M.A.J. Theebe         m.a.j.theebe@uva.nl    M 3.10
D.W. van Dijk             d.w.vandijk@uva.nl     M 4.04
prof. dr. P. van Gool     p.vangool@uva.nl       M 3.12
prof. dr. J.B.S. Conijn   j.b.s.conijn@uva.nl    M 3.12

• Dröes: house price risk, price dynamics (international), externalities,
household finances, mortgage markets, urban/spatial economics
• Francke: commercial property prices and liquidity (indices), housing and
finance, house price dynamics, economic obsolescence
• Theebe: commercial RE markets, commercial RE investments and portfolios,
pricing and investment performance
• Van Dijk: price dynamics, mortgage markets, search behavior
• Van Gool: pension funds, real estate valuation
• Conijn/Schilder: housing corporations, (international) housing markets


Preliminaries

STATA is a powerful statistical package with smart data-management facilities, a
wide array of up-to-date statistical techniques, and an excellent system for
producing publication-quality graphs and tables.
For this course you cannot use other statistical software even if you are currently
more familiar with it.
You can download and install STATA on your own computer using the instructions
on the syllabus or work on UvA library computers where STATA is installed.
STATA has excellent online video tutorials on all the important commands:
http://www.stata.com/links/video-tutorials/
There are also good tutorials available, e.g., at http://data.princeton.edu/stata/


STATA Interface

Command window (bottom): Write commands & execute


Output window (center): See results
Variables window (right): List of variables and labels
Command history (left): History of commands, can roll back or click prior commands.


Data Management

1. Importing data into STATA
Option 1, Data Editor: Open the data editor (at top of window); copy & paste data
from Excel directly into the data editor.
• STATA recognizes variable names in the first row
• You can also change cell entries directly in the data editor

Option 2, ‘insheet’ command: If you have your data in a text file (tab-delimited is
the best format), you can directly load the data using the ‘insheet’ command.
• insheet using /Users/YourName/EMF/Data/RawData.txt

Option 3, load a Stata file: If you already have a data file in Stata format (extension
.dta), use the ‘use’ command
• use /Users/YourName/EMF/Data/RawData.dta
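
In recent Stata versions (13 and later), ‘import delimited’ has taken over the role of ‘insheet’ for text files. The sketch below runs without any external data: it first writes Stata's built-in example dataset ‘auto’ to a tab-delimited file and then reads it back. The file names ‘auto_tab.txt’ and ‘auto_tab.dta’ are made up for this illustration.

```stata
* Create a small tab-delimited text file from a built-in example dataset
sysuse auto, clear
export delimited using "auto_tab.txt", delimiter(tab) replace

* Option 2 with the newer command: variable names are taken from row 1;
* Stata detects by itself whether the file is tab- or comma-delimited
import delimited using "auto_tab.txt", varnames(1) clear
describe

* Save in Stata format and reload it (Option 3)
save "auto_tab.dta", replace
use "auto_tab.dta", clear
```

Adding ‘clear’ tells Stata to discard the data currently in memory; without it, the import stops with an error if unsaved data are loaded.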


Data Management (Cont’d)

2. Creating and changing variables
Creating new variables: use the ‘generate’ command (abbreviated ‘g’)
• simple: g leverage=debt/equity
• using conditional statements:
g highleverage=0
replace highleverage=1 if leverage>=0.5

Replacing/changing variables: use the ‘replace’ command
• replace leverage=0 if leverage<0
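
One caveat with conditional statements: Stata treats a missing value as larger than any number, so a condition like ‘leverage>=0.5’ is also true when leverage is missing. The toy data below (four made-up firms) sketch one way to keep the dummy missing in those cases; the numbers are invented purely for illustration.

```stata
clear
input debt equity
10 40
30 30
.  20
5  0
end

* Division by zero and missing inputs both yield a missing value
generate leverage = debt/equity

* The dummy is 1/0 where leverage is observed and missing otherwise
generate highleverage = (leverage >= 0.5) if !missing(leverage)

list
count if missing(highleverage)
```

In this example the third and fourth firms end up with a missing ‘highleverage’ instead of being classified as high-leverage firms.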


Data Management (Cont’d)

3. Descriptive statistics and histograms
Before you start running regressions, always check descriptive stats!

Descriptive statistics, two options:
• tabstat y, stat(min p1 p5 p25 p50 p75 p95 p99 max mean sd)
• summarize y or summarize y, detail

I recommend you always plot histograms of newly coded variables
• hist leverage, bin(100) frac
• hist leverage, width(0.05) frac
• hist leverage if leverage>0 & leverage<1.0, width(0.05) frac
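
To try these commands before you have your own data, you can use Stata's built-in example dataset ‘auto’. The variables ‘price’ and ‘mpg’ simply stand in for your own variables; the choice of statistics and bins is arbitrary.

```stata
sysuse auto, clear

* Summary table with one row per variable and one column per statistic
tabstat price mpg, statistics(n mean sd min p50 max) columns(statistics)

* Detailed summary including percentiles, skewness and kurtosis
summarize price, detail

* Histogram showing the fraction of observations in each of 20 bins
histogram price, bin(20) fraction
```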


Outliers

4. Handling outliers
It is often very important to minimize the influence of outliers in the data. Outliers
can be visually detected in histograms. Some are due to data entry
errors; others are simply ‘atypical’ cases that need to be handled.

Outliers are often of concern when the variable of interest is a ratio:
• price-to-book (book equity is zero or negative for some firms)
• price-earnings ratio (earnings can be zero or negative)


Outliers (Cont’d)

Option 1, Winsorizing: replace extreme values with the upper/lower cutoff
• The user-written ‘winsor2’ command (install once with ‘ssc install winsor2’):
winsor2 leverage, suffix(_win) cuts(0.5 99.5)

Option 2, Truncating (or trimming): Delete extreme observations
• winsor2 leverage, suffix(_win) cuts(0.5 99.5) trim

Winsorizing (truncating) by subsets of the data: Sometimes it makes sense to
winsorize (truncate) within a category/value of another variable, e.g. within each year:
• winsor2 leverage, suffix(_win) cuts(0.5 99.5) trim by(year)
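
To see what winsorizing does under the hood, here is a sketch using only built-in commands. It uses the variable ‘price’ from Stata's example dataset ‘auto’ and cutoffs at the 1st and 99th percentile; both choices are assumptions made for illustration.

```stata
sysuse auto, clear

* Compute the 1st and 99th percentile of price
_pctile price, percentiles(1 99)
local lo = r(r1)
local hi = r(r2)

* Copy the variable and pull extreme values in to the cutoffs
generate price_win = price
replace price_win = `lo' if price < `lo'
replace price_win = `hi' if price > `hi' & !missing(price)

* Compare minimum and maximum before and after
summarize price price_win
```

The ‘!missing(price)’ condition matters because a missing value counts as larger than any number and would otherwise be overwritten with the upper cutoff.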


Merging (joining) datasets


5. Merging datasets
For almost every empirical study, you will need to merge different datasets.

In STATA, there are two commands available: ‘joinby’ and ‘merge’. I recommend
‘joinby’.
• First, you need to be clear about the variables on which you want to merge, e.g.
firm ID (gvkey) and year.
• Second, you need to prepare the two datasets you want to merge. Typically,
both datasets should have only one observation for each combination of the
merge variables. Use ‘duplicates report gvkey year’ to check, and ‘duplicates
tag gvkey year, g(tag)’ and ‘drop if tag>0’ or ‘duplicates drop gvkey year,
force’ to delete duplicates.
• Third, merge the files and check the number of successfully merged
observations:
clear
use dataset1
joinby gvkey year using dataset2, unmatched(both)
tab _merge
drop if _merge!=3
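
The three steps can be tried out on toy data. The two small datasets below are invented for this sketch (the variable names ‘assets’ and ‘ret’ are placeholders), and the block should be run as a do-file so that the temporary file name stays available.

```stata
* Toy "accounting" dataset, stored in a temporary file
clear
input gvkey year assets
1001 2010 50
1001 2011 55
1002 2010 20
end
tempfile accounting
save `accounting'

* Toy "returns" dataset, kept in memory as the master data
clear
input gvkey year ret
1001 2010  0.10
1001 2011 -0.05
1003 2010  0.02
end

* Step 2: stop with an error if gvkey-year is not unique
isid gvkey year

* Step 3: merge and inspect the result
joinby gvkey year using `accounting', unmatched(both)
tabulate _merge
keep if _merge == 3
drop _merge
list
```

In ‘_merge’, the value 3 marks observations found in both datasets, while 1 and 2 mark observations found only in the data in memory or only in the using file. Here the two 1001 firm-years match; firms 1002 and 1003 appear in only one file each.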

Lead and lag operators

Stata has very convenient time series functions that can be used for data
preparation and regression

In order to use these, you need to (1) sort the data along the panel and time
dimension, (2) tell Stata what the panel and time identifiers are, (3) make sure that
there are no panel-time duplicates in your dataset
duplicates report gvkey year
duplicates drop gvkey year, force
sort gvkey year
tsset gvkey year
Then you can use lead (F.), lag (L.) and difference (D.) operators:
reg income L.income L2.income L3.income
g retL1 = L.ret
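
A small panel with made-up numbers shows how the operators behave, in particular that lags and leads never reach across firms. Note that the panel identifier must be numeric; if ‘gvkey’ is stored as text in your download, you would first need to convert it, e.g. with ‘destring’.

```stata
clear
input gvkey year ret
1 2010  0.10
1 2011  0.05
1 2012 -0.02
2 2010  0.01
2 2011  0.03
end

* Declare panel (gvkey) and time (year) identifiers
xtset gvkey year

generate ret_lag1  = L.ret   // previous year's return, same firm
generate ret_lead1 = F.ret   // next year's return, same firm
generate ret_diff  = D.ret   // ret minus its first lag

list, sepby(gvkey)
```

The first observation of each firm has a missing lag and the last one a missing lead.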


Date and time

Stata has a large number of built-in functions to handle calendar dates and times.
For the full documentation type ‘help datetime functions’
Stata internally stores dates as integers counting the days since Jan 1, 1960, e.g. the calendar day
Jan 1, 1960 is stored as 0, Jan 2, 1960 as 1, etc.
It is therefore often necessary to format dates, so that they are displayed in a
readable format. See next slide.
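
You can check the internal numbering yourself with ‘display’:

```stata
display date("1/1/1960", "MDY")        // shows 0
display date("1/2/1960", "MDY")        // shows 1
display %td date("1/15/1990", "MDY")   // shows 15jan1990
```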


Creating date variables

There are many different ways to create date variables
Assume a date is given in string (=text) format, e.g. the variable called “date_raw”
contains strings of the following form: “1/15/1990”.
Then one can create a Stata date by typing
g date = date(date_raw,"MDY")
The new variable “date” now contains the number corresponding to the
calendar date in each row
You can then format the date, so that it has a familiar, readable form, e.g. by typing
format date %td


Creating date variables

Sometimes one needs to create a broader date variable than daily, e.g. indicating
the month or quarter
Such variables can be created from a daily date variable by using the “mofd”,
“qofd” or “yofd” functions
To create a variable containing the calendar month of a date you can type:
g month = mofd(date)
format month %tm
To create a variable containing the calendar quarter of a date you can type:
g quarter = qofd(date)
format quarter %tq
etc.
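
Put together, the date steps look as follows on two made-up text dates. The variable name ‘date_raw’ follows the example in the text; everything else is illustrative.

```stata
clear
input str10 date_raw
"1/15/1990"
"12/31/1999"
end

* Convert the text to a Stata daily date and make it readable
generate date = date(date_raw, "MDY")
format date %td

* Derive month, quarter and year from the daily date
generate month = mofd(date)
format month %tm
generate quarter = qofd(date)
format quarter %tq
generate year = yofd(date)

list
```

The year variable needs no special format because ‘yofd’ returns the calendar year itself (e.g. 1990).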


Operations across rows of data: the EGEN function

Stata’s ‘egen’ command, in combination with the ‘by’ prefix, lets you do all kinds
of operations across observations of one variable within a category of another
variable
• Ex 1: Compute the sum of total compensation for the whole executive team
• Ex 2: Compute the number of analysts covering a firm
• Ex 3: Compute the average CEO compensation by year or industry
• Ex 4: Compute the weighted return of a portfolio, e.g. decile 10 of stocks sorted
by book-to-market
Ex 1: ‘by gvkey year, sort: egen tdc1_team = total(tdc1)’
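
A toy executive-level dataset (all numbers invented) shows what ‘egen’ produces. ‘bysort’ is shorthand for ‘by ..., sort’; the names ‘n_exec’ and ‘tdc1_mean’ are chosen for this sketch.

```stata
clear
input gvkey year str3 exec tdc1
1 2010 "CEO" 5000
1 2010 "CFO" 2000
1 2011 "CEO" 6000
2 2010 "CEO" 3000
2 2010 "CFO" 1000
end

* Total compensation of the executive team per firm-year (Ex 1)
bysort gvkey year: egen tdc1_team = total(tdc1)

* Number of executives with compensation data per firm-year
bysort gvkey year: egen n_exec = count(tdc1)

* Average compensation across all executives in a year
bysort year: egen tdc1_mean = mean(tdc1)

list, sepby(gvkey year)
```

Each new variable repeats the group-level value on every row of the group, so the data keep their original number of observations.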


The EGEN Function (Cont’d)


Convenient also for plotting averages over time, etc.
• Ex 3: Plot average CEO compensation over time (by year)
by year, sort: egen tdc1_mean = mean(tdc1)
egen tag = tag(year)
label variable tdc1_mean "Avg. total compensation"
scatter tdc1_mean year if tag==1, connect(line) scheme(s1mono) msize(small) xlabel(1992(2)2011)
[Figure: average total compensation (y-axis) by fiscal year (x-axis), 1992 to 2010]


Regression

In STATA, OLS regression is done using the ‘regress’ or ‘reg’ command
By default a constant is included in the regression (no need to specify)
There are various options to specify how standard errors are computed (ols, robust,
cluster, etc.)
• reg y x1 x2, cluster(gvkey)

‘areg’ is a command that provides an easy way to include many dummy variables in
the regression
• areg y x1 x2, a(year) cluster(gvkey)

Many other regression methods are available, e.g. ivreg2 (IV regression), probit,
dprobit, logit (discrete choice models), xtreg (panel regressions), etc.
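
The following sketch runs these commands on Stata's built-in ‘auto’ dataset, with ‘vce()’ as the current spelling of the standard-error options. The variables are stand-ins: clustering or absorbing by ‘rep78’ (a repair-record category with only five values) has no economic meaning and is far too few clusters for real inference; it only shows the syntax.

```stata
sysuse auto, clear

* OLS with heteroskedasticity-robust standard errors
regress price mpg weight, vce(robust)

* OLS with standard errors clustered by a group variable
regress price mpg weight, vce(cluster rep78)

* Same regression with a full set of rep78 dummies absorbed
areg price mpg weight, absorb(rep78) vce(robust)
```

Observations with a missing value in the cluster or absorb variable are dropped, so compare the number of observations across the three outputs.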


Handling Regression Output

The regression tables shown in the STATA output window are not easy to import
into Excel/Word/Latex/etc.

The user-written ‘outreg2’ command (install once with ‘ssc install outreg2’) is extremely
useful for these purposes. Almost anything
can be specified: decimal places of coefficients and t-stats/SEs/P-vals, number of
stars indicating significance, adding column title and notes, R-squared, dropping
variables, adding additional statistics, ...
The formatted tables can be displayed in the data editor and copy/pasted into
Excel.
Example:
reg y1 x1 x2, cluster(gvkey)
outreg2 using ols-results1.txt, adjr2 bdec(2) tdec(2) label
reg y2 x1 x2 , cluster(gvkey)
outreg2 using ols-results1.txt, adjr2 bdec(2) tdec(2) label append
seeout
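
A version of this example that runs on the built-in ‘auto’ dataset is sketched below. It assumes ‘outreg2’ has been installed from SSC; the output file name ‘auto_results.txt’ is made up.

```stata
* Requires the user-written package: ssc install outreg2
sysuse auto, clear

* First column: start a fresh table file
regress price mpg weight, vce(robust)
outreg2 using "auto_results.txt", replace adjr2 bdec(2) tdec(2) label

* Second column: add another specification to the same table
regress price mpg weight foreign, vce(robust)
outreg2 using "auto_results.txt", append adjr2 bdec(2) tdec(2) label
```

Using ‘replace’ on the first call avoids accidentally appending new columns to a table left over from an earlier run.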


Example: STATA Regression output


Organizing your code


Do-files:
You should always write up and save the commands that you use for a given project
In Stata, such text files have the extension ‘.do’ (and are called do-files)
Open a new do-file by hitting the ‘Do-file Editor’ button at the top of the interface
window
You can highlight and run specific parts of your code using the ‘Do’ or ‘Run’ button
in the do-file window
Setting the directory:
For every project, you should create a folder, e.g. ‘EMF’, with a subfolder where your
Stata files will be located, e.g. called ‘Data’
At the beginning of each do-file, you point Stata to that folder:
clear
set more off
cd C://.../EMF/T1/Data
use dataset1
...
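
A slightly fuller do-file header might look like the sketch below. It is one possible layout, not a required one: the log file name is invented, and the project folder is set to the current working directory only so that the example runs anywhere; in your own do-file you would put your project path there.

```stata
* Start from a clean state
clear all
set more off
capture log close

* Assumption for this sketch: use the current folder as project folder
local projdir "`c(pwd)'"
cd "`projdir'"

* Keep a text log of everything the do-file produces
log using "emf_example.log", replace text

* Load data and run the analysis (built-in example data here)
sysuse auto, clear
summarize price

log close
```

The ‘capture’ prefix lets the do-file continue when no log is open. A log file gives you a record of the exact output behind every number you report.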


Data sets
Corporate accounting data:
• for US firms: Compustat North America (through WRDS)
• worldwide: Compustat Global (through WRDS)
Stock price data:
• for US firms: CRSP (through WRDS); data for NYSE, AMEX, and NASDAQ
stocks for entire history starting in 1926; daily and monthly frequency.
• international: Compustat Global (through WRDS) or Datastream
Merged CRSP-Compustat database: Accounting data for all stocks that are also in
CRSP. Provides a link of the firm identifiers gvkey (Compustat) and permno
(CRSP) (through WRDS).
US executive compensation:
• Execucomp (through WRDS): yearly data for executives of S&P1500 firms
from 1993 to present
• ISS Incentive Labs (through WRDS): Like Execucomp but more detailed data
on compensation, like vesting periods etc. Coverage: 1998-present.
US Corporate Governance & board of directors: ISS Directors and ISS Governance
(through WRDS)

Data Sets

Fama-French, momentum, and liquidity factor returns, the risk-free rate,
characteristics-based portfolio returns, industry returns, industry definitions: Ken
French’s web page
(http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html)
The Fama-French, momentum, liquidity factor returns, and the risk-free rate are
also available through CRSP (WRDS)
Analyst earnings forecasts and actuals: I/B/E/S (through WRDS)
Insider trading data: Thomson Reuters (through WRDS)


Data Sets

M&A data: Zephyr or Factset.
Fixed income (bond) data: Mergent Fixed Income Securities Database.
Data on bond issues, ratings, etc.
Macroeconomic data (including GDP, inflation, employment, US population):
Federal Reserve Bank of St. Louis (http://research.stlouisfed.org/fred2/).
Note: a very useful user-written STATA command called ‘freduse’ draws these data
directly from the FRED webpage.
Employment and price level/inflation data: Bureau of Labor Statistics
(http://www.bls.gov/data)


Data Sets (Real Estate)

WoON 2002, 2006, 2009, 2012


Survey (micro)data, cross-sectional data, contains house characteristics, household
characteristics, price data. Available for scientific research, through DANS
Centraal Bureau v/d Statistiek (CBS) ⇒ Statline and Statistics Netherlands.
For free. Per municipality: average prices, number of houses, transactions,
population, unemployment, institutional investors, investment in real estate,
number of non-residential properties, etc.
Cadastre (Kadaster)
Microdata, Transaction prices, Assessed value (WOZ)
Dutch Association of Realtors (NVM)
Free price data per NVM region. Available for scientific research microdata on
transaction prices, list prices, house characteristics.
Commercial real estate: IPD/ROZ, JLL, Cushman and Wakefield
Return indices, market reports.


Data Sets (Real Estate)

SNL (snl.com)
International data on Real Estate Investment Trusts (REITs), available to all UvA
students.
CRSP/Ziman Real Estate Data Series
US REIT stock return data.
Datastream
European and Asian REIT stock return data.
Bank for International Settlements (BIS)
Country level data, property price statistics (by different classes).
Hypostat, European Mortgage Federation
Mortgages, house prices (not harmonized) for European countries.
Eurostat
HPI index, harmonized, recent data.


Data Sets (Real Estate)

American Housing Survey (a survey similar to the Dutch WoON data)
Has an MSA and a national (panel) data version. Available for research.
Federal Housing Finance Agency (FHFA)
MSA level, house prices
Case-Shiller house price indices: National and MSA level price index
spindices.com/index-family/real-estate/sp-case-shiller
INREV
Investor association, non-listed real estate vehicles, Europe
EPRA
Industry organization, European listed real estate


Data Support at the UvA

For specific questions contact Bjorn Witlox (b.witlox@uva.nl)


Next Lecture

Methodology:
• Event Study
• Diff-in-diff methods
• Panel regressions
• Standard errors
