Lecture 1
Florian Peters
January 7, 2020
Objectives of this Course
Main goal: Equipping you with the data analytics skills for writing your thesis
and preparing you for jobs that require sophisticated data analysis.
Various components:
- Knowledge of empirical methods used in finance – especially how to
implement them
- Acquaintance with datasets and statistical software
- Hands-on introduction to data management and analysis using STATA
- Main assignment is to replicate a published academic paper
- Some advice on thesis proposal
At the end of this course you will be able to conduct data analysis at the
level of top-tier academic research, i.e. replicate published studies.
Organization
The content of the RE finance tutorial will slightly differ from the others.
RE finance students will work with RE data in week 1, and discuss RE articles in
week 3. Tutorials of week 2 and 4 will be the same for all groups.
Evaluation
2 Intro to STATA
3 Data sources
Structure of Thesis Proposal Intro to STATA Data sources
Overall Structure
Another good setting for a research question is a situation where there are
competing theories around, but we don’t really know which one is correct.
- Example 2 (Bertrand and Mullainathan, 2003, “Enjoying the Quiet Life? Corporate Governance
and Managerial Preferences”):
“Much of our understanding of corporations builds on the idea that managers, when they are not
closely monitored, will pursue goals that are not in shareholders’ interests. But what goals would
managers pursue? While existing models [...] emphasize empire building, our results seem to fit a
different class of models, which we refer to as ‘quiet life’ models, very much as in Hicks’s (1935)
suggestion that the best of all monopoly profits is a quiet life.”
If you choose a topic that is closely related to other research, you need to ask
yourself why it is necessary or useful to do what you do. Ask yourself: How does my
thesis help us learn something in addition to what we already know? This is especially
important if you replicate an existing study with new or different data.
- Example 1 (bad):
“There is now abundant evidence across the world of a January effect in stock returns. This
thesis examines whether a January effect also exists in Dutch companies.”
- Example 2 (good, from Campello, Ribas, and Wang, 2014, “Is the Stock Market a Side Show?
Evidence from a Structural Reform”):
“Do stock market transactions affect the real economy? If so, how? It has long been argued that
stock market transactions are largely a side show to corporate activity. At the same time, there
are reasons to believe those transactions might matter. [...] A recent intervention in the Chinese
stock market, however, puts us close to an experiment on the effect of secondary market
transactions on corporate welfare. The 2005 split-share reform was to swiftly convert non-tradable
shares into tradable status.”
- Ex 1 (Slightly adapted from Ahern and Dittmar, 2012, “The Changing of the Boards: The Impact
on Firm Value of Mandated Female Board Representation”):
We analyze the value implications of female board representation using the introduction in
2003 of a 40% quota in Norway. We find that the constraint imposed by the quota caused a
significant drop in the stock price at the announcement of the law and a large decline in Tobin’s Q
over the following years. Our results are relevant to academics, investors, and policy makers.
They quantify the costs of such laws borne by shareholders and point to the potential causes of
the value decline. More broadly, they are consistent with the idea that firms choose boards to
maximize value, and call into question the popular notion that externally imposed changes to
corporate governance benefit shareholders.
2. Related literature
Once you have become interested in a topic, first try to find the (most recent)
review articles on that topic. The references in these articles often give you the
most comprehensive list of articles on the subject.
Look for the main contributing researchers in the specific field you are writing
about. Often, they are located in the top US schools. So, look for big names, both
with respect to people as well as to universities.
Go to their web pages. Check their papers, in particular the most recent work
(including unpublished working papers). Check their co-authors and whom they
cite, then go to those people’s web pages and check their work, and so on.
Also, check conference programs for related papers: AFA, WFA, EFA, FIRS, NBER,
and other conferences.
Another important indication for whether a paper matters is the journal in which it
is published. If it’s published in a so-called A-journal, then it’s probably relevant
and well-executed. If it’s in a B- or C-journal you can probably ignore it.
The Finance A-journals are Journal of Finance, Journal of Financial Economics,
Review of Financial Studies.
Econ A-journals are American Economic Review, Journal of Political Economy,
Econometrica, Quarterly Journal of Economics, Review of Economic Studies.
If you don’t find any prominent researcher publishing on the topic you are interested
in, and you find no paper published in the finance or econ A-journals, then there are
three possibilities: (1) the topic is not interesting; (2) the topic is interesting but
impossible to analyze; (3) the topic is interesting and can be analyzed, and you are
the first one who came up with the idea.
3. Data
You have to make sure that you will be able to obtain the necessary data before
you start.
Sometimes there are situations in which you are not yet 100% sure that you will get
the “ideal” data, but you still want to pursue the topic. In this case you need to
have a plan B. Can you do the same or a similar analysis with other, available but
less-ideal data?
Specify the databases that you will use and make sure that you have access to them.
Sometimes part of the data you want to analyze needs to be hand-collected. In that
case, make sure you have access to the necessary data sources, and make an
estimate of how long it will take you to collect those data. For example, collect
data for one full day and extrapolate.
4. Methodology
You need to describe the methodology that you will be using, e.g. OLS, IV, Probit,
Diff-in-diff, panel regressions with/without fixed effects, event study, etc.
You should write down the regression equation(s) that you want to estimate, and
mention the coefficient of interest, i.e. the coefficient that answers your research
question.
In a good thesis, there is typically one particular coefficient that answers your
research question. Make that connection clear.
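As an illustration of what such an equation could look like, here is a generic difference-in-differences specification. It is a sketch and not taken from the lecture; the outcome, the treatment and post indicators, and the controls are all placeholder names.

```latex
\[
  y_{it} = \alpha_i + \lambda_t
         + \beta \,(\mathit{Treated}_i \times \mathit{Post}_t)
         + \gamma' X_{it} + \varepsilon_{it}
\]
```

Here $\alpha_i$ and $\lambda_t$ are firm and year fixed effects, $X_{it}$ is a vector of controls, and $\beta$ is the coefficient of interest: under the usual parallel-trends assumption it measures the change in $y$ for treated firms after the event, relative to untreated firms.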
Remarks
The above guidelines are ambitious. If you meet all of them, you will have an ideal
thesis proposal (grade=10).
We don’t expect all students to write a perfect thesis. So, more moderate proposals
lead to theses that get passing or even good grades.
In any case, you should be able to write something on every one of the above four
points, even if your contribution is more modest.
I am aware that the deadline for the first proposal is very early. Some of you will
decide to change topic later (key courses have not yet been taught). That’s OK.
Advisers
You will be asked to name your preferred adviser in the first proposal. Here is the list of
advisers at our department and their areas of expertise:
Finance Supervisors
Florian Peters (UvA) Empirical Methods in Finance January 7, 2020 23 / 56
Tomislav Ladika, Jeroen Ligterink, Jens Martin, Evgenia Zhivotova, Enrico Perotti
Florian Peters, Rafael Ribas, Vladimir Vladimirov, Tanju Yorulmazer, Aleksandar Andonov
dr. M.I. Dröes, m.i.droes@uva.nl, Room M 3.10
prof. dr. M.K. Francke, m.k.francke@uva.nl, Room M 3.12
dr. M.A.J. Theebe, m.a.j.theebe@uva.nl, Room M 3.10
D.W. van Dijk, d.w.vandijk@uva.nl, Room M 4.04
prof. dr. P. van Gool, p.vangool@uva.nl, Room M 3.12
prof. dr. J.B.S. Conijn, j.b.s.conijn@uva.nl, Room M 3.12
Outline
2 Intro to STATA
3 Data sources
Preliminaries
STATA Interface
Data Management
Option 2, ‘insheet’ command: If you have your data in a text file (tab-delimited is
the best format), you can load the data directly using the ‘insheet’ command.
insheet using /Users/YourName/EMF/Data/RawData.txt
Option 3, load Stata file: If you already have a data file in Stata format (extension
.dta), use the ‘use’ command.
use /Users/YourName/EMF/Data/RawData.dta
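The two loading options can be tried end to end on a toy file. The sketch below uses made-up data and hypothetical file names (emf_demo_raw.txt, emf_demo.dta) written to the working directory. It uses ‘import delimited’, which supersedes ‘insheet’ in Stata 13 and later; ‘insheet’ still works in the same way.

```stata
* Minimal sketch with made-up data; the file names emf_demo_raw.txt and
* emf_demo.dta are hypothetical and are written to the working directory.
clear
input gvkey year sales
1001 2018 50
1001 2019 55
1002 2018 12
end

* Write a tab-delimited text file, then read it back in (Option 2).
export delimited using "emf_demo_raw.txt", delimiter(tab) replace
import delimited using "emf_demo_raw.txt", clear

* Save in Stata format and reload with 'use' (Option 3).
save "emf_demo.dta", replace
use "emf_demo.dta", clear
describe
```

After loading, ‘describe’ is a quick check that variable names and storage types came through as expected.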
Outliers
4. Handling outliers
It is often very important to minimize the influence of outliers in the data. Outliers
can be visually detected in histograms. They are sometimes due to data entry
errors, sometimes they are simply ‘atypical’ cases that need to be handled.
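One common way to limit the influence of outliers is winsorizing, i.e. replacing values beyond chosen percentiles with the percentile values. The sketch below is an assumption about how this could be done with built-in commands, not necessarily the approach used in this course. The variable roa, the simulated data, and the 1st/99th percentile cutoffs are all illustrative choices.

```stata
* Sketch on simulated data; roa and the 1%/99% cutoffs are illustrative.
clear
set seed 12345
set obs 1000
generate double roa = rnormal(0.05, 0.10)
replace roa = 25 in 1                  // mimic a data entry error

* Visual check: the outlier shows up as an isolated bar far to the right.
histogram roa, bin(50)

* Store the 1st and 99th percentiles and winsorize at those values.
summarize roa, detail
scalar lo_cut = r(p1)
scalar hi_cut = r(p99)
generate double roa_w = roa
replace roa_w = scalar(lo_cut) if roa < scalar(lo_cut)
replace roa_w = scalar(hi_cut) if roa > scalar(hi_cut) & !missing(roa)

* Compare the raw and winsorized variables.
summarize roa roa_w
```

Keeping the raw variable next to the winsorized one makes it easy to check how sensitive later results are to the treatment of outliers.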
Outliers (Cont’d)
In STATA, there are two commands available: ‘joinby’ and ‘merge’. I recommend
‘joinby’.
- First, you need to be clear about the variables on which you want to merge, e.g.
firm ID (gvkey) and year.
- Second, you need to prepare the two datasets you want to merge. Typically,
both datasets should have only one observation for each combination of the
merge variables. Use ‘duplicates report gvkey year’ to check, and ‘duplicates
tag gvkey year, g(tag)’ and ‘drop if tag>0’ or ‘duplicates drop gvkey year,
force’ to delete duplicates.
- Third, merge the files and check the number of successfully merged
observations:
clear
use dataset1
joinby gvkey year using dataset2, unmatched(both)
tab _merge
drop if _merge!=3
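The three steps can be tried on two toy datasets. This is a sketch with made-up numbers; the variables ceo_pay and assets and the temporary file are hypothetical. With ‘unmatched(both)’, ‘joinby’ creates the variable ‘_merge’, equal to 3 for observations found in both datasets.

```stata
* Sketch with made-up data: a compensation file and an accounting file.
clear
input gvkey year ceo_pay
1001 2018 3.1
1001 2019 3.4
1002 2018 1.2
end
tempfile comp
save "`comp'"

clear
input gvkey year assets
1001 2018 500
1001 2019 540
1003 2018 80
end

* Check that gvkey-year uniquely identifies observations, then join.
duplicates report gvkey year
joinby gvkey year using "`comp'", unmatched(both)

* _merge: 1 = only in the data in memory, 2 = only in the using data, 3 = both.
tabulate _merge
keep if _merge == 3
drop _merge
list
```

In this toy example only the two firm-years of gvkey 1001 appear in both files, so two observations remain after the final ‘keep’.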
Stata has very convenient time-series operators that can be used for data
preparation and regressions.
In order to use these, you need to (1) sort the data along the panel and time
dimension, (2) tell Stata what the panel and time identifiers are, and (3) make sure
that there are no panel-time duplicates in your dataset:
duplicates report gvkey year
duplicates drop gvkey year, force
sort gvkey year
tsset gvkey year
Then you can use lead (F.), lag (L.) and difference (D.) operators:
reg income L.income L2.income L3.income
g retL1 = L.ret
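A small panel shows what the operators return. The data and the variable names income_lag, income_chg, and income_lead are made up for illustration. Note that the operators respect the panel structure: the lag of a firm's first year is missing and is not taken from the previous firm.

```stata
* Sketch with made-up data: two firms, unbalanced panel.
clear
input gvkey year income
1001 2017 10
1001 2018 12
1001 2019 15
1002 2018 7
1002 2019 6
end

duplicates report gvkey year
sort gvkey year
tsset gvkey year

generate income_lag  = L.income    // previous year's value within firm
generate income_chg  = D.income    // income - L.income
generate income_lead = F.income    // next year's value within firm
list, sepby(gvkey)
```

Because the operators work on the declared time variable rather than on row order, a gap in the years of a firm also produces a missing lag instead of silently using the wrong year.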
Stata has a large number of built-in functions to handle calendar dates and times.
For the full documentation type ‘help datetime functions’
Stata internally stores dates and times as integer numbers, e.g. the calendar day
Jan 1, 1960 is stored as “0”, Jan 2, 1960 as “1”, etc.
It is therefore often necessary to format dates, so that they are displayed in a
readable format. See next slide.
Sometimes one needs to create a broader date variable than daily, e.g. indicating
the month or quarter.
Such variables can be created from a daily date variable by using the “mofd”,
“qofd” or “yofd” functions.
To create a variable containing the calendar month of a date you can type:
g month = mofd(date)
format month %tm
To create a variable containing the calendar quarter of a date you can type:
g quarter = qofd(date)
format quarter %tq
etc.
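The conversions above can be checked on two hand-typed dates. This is a sketch; the string variable datestr and its values are made up, and it assumes the raw dates arrive as text in year-month-day order.

```stata
* Sketch with made-up dates stored as strings.
clear
input str10 datestr
"2020-01-07"
"2020-05-15"
end

* Convert the string to a Stata daily date and make it readable.
generate date = date(datestr, "YMD")
format date %td

* Derive month, quarter, and year from the daily date.
generate month = mofd(date)
format month %tm
generate quarter = qofd(date)
format quarter %tq
generate year = yofd(date)
list

* Day zero of Stata's calendar is 1 January 1960.
display td(01jan1960)
display %td 0
```

Without the ‘format’ lines the variables would still hold the correct values, but they would be listed as raw integers such as the number of days or months since January 1960.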
Stata’s ‘egen’ command, in combination with the ‘by’ prefix, lets you do all kinds
of operations across observations of one variable within a category of another
variable:
- Ex 1: Compute the sum of total compensation for the whole executive team
- Ex 2: Compute the number of analysts covering a firm
- Ex 3: Compute the average CEO compensation by year or industry
- Ex 4: Compute the weighted return of a portfolio, e.g. decile 10 of stocks sorted
by book-to-market
Ex 1: ‘by gvkey year, sort: egen tdc1_team = total(tdc1)’
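The first three examples can be sketched on a toy executive-level dataset. The data are made up, execid is a hypothetical executive identifier, and the count of executives per firm-year stands in for the analyst count of Ex 2, which would work the same way on analyst-level data.

```stata
* Sketch with made-up data: one row per executive, firm, and year.
clear
input gvkey year execid tdc1
1001 2018 1 900
1001 2018 2 300
1001 2019 1 1000
1002 2018 3 500
1002 2018 4 100
end

* Ex 1: total compensation of the executive team in each firm-year.
bysort gvkey year: egen tdc1_team = total(tdc1)

* Analogous to Ex 2: number of executives observed in each firm-year.
bysort gvkey year: egen n_exec = count(execid)

* Ex 3: average compensation across all executives in each year.
bysort year: egen tdc1_avg = mean(tdc1)

list, sepby(gvkey year)
```

‘bysort gvkey year:’ is shorthand for ‘by gvkey year, sort:’. Each ‘egen’ result is repeated on every row of its group, so the dataset keeps its original number of observations.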
[Figure not recovered; x-axis: Fiscal Year, 1992 to 2010]
Regression
‘areg’ is a command that provides an easy way to include many dummy variables in
the regression:
areg y x1 x2, a(year) cluster(gvkey)
Many other regression methods are available, e.g. ivreg2 (IV regression; user-written),
probit, dprobit, logit (discrete choice models), xtreg (panel regressions), etc.
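To see what ‘areg’ does, one can compare it with a regression that includes the dummies explicitly. The sketch below uses simulated data; the data-generating process, the seed, and the scalar b_areg are illustrative assumptions. Absorbing year with ‘areg’ gives the same slope coefficients as adding ‘i.year’ to ‘regress’; only the dummies themselves are not reported.

```stata
* Sketch on simulated data: 20 firms observed over 10 years each.
clear
set seed 2020
set obs 200
generate gvkey = ceil(_n/10)
bysort gvkey: generate year = 2000 + _n
generate x1 = rnormal()
generate x2 = rnormal()
generate y = 1 + 0.5*x1 - 0.3*x2 + 0.1*(year - 2000) + rnormal()

* Year fixed effects absorbed, standard errors clustered by firm.
areg y x1 x2, absorb(year) vce(cluster gvkey)
scalar b_areg = _b[x1]

* Same slopes with explicit year dummies.
regress y x1 x2 i.year, vce(cluster gvkey)
display "difference in x1 coefficient: " _b[x1] - scalar(b_areg)
```

With many categories (e.g. thousands of firms), ‘areg’ is the more practical of the two because it avoids estimating and printing every dummy.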
The regression tables shown in the STATA output window are not easy to import
into Excel/Word/Latex/etc.
The user-written ‘outreg2’ command (install it once with ‘ssc install outreg2’) is
extremely useful for these purposes. Almost anything can be specified: decimal places
of coefficients and t-stats/SEs/P-vals, number of stars indicating significance,
adding column title and notes, R-squared, dropping variables, adding additional
statistics, ...
The formatted tables can be displayed in the data editor and copy/pasted into
Excel.
Example:
reg y1 x1 x2, cluster(gvkey)
outreg2 using ols-results1.txt, adjr2 bdec(2) tdec(2) label
reg y2 x1 x2 , cluster(gvkey)
outreg2 using ols-results1.txt, adjr2 bdec(2) tdec(2) label append
seeout
Outline
2 Intro to STATA
3 Data sources
Data sets
Corporate accounting data:
- for US firms: Compustat North America (through WRDS)
- worldwide: Compustat Global (through WRDS)
Stock price data:
- for US firms: CRSP (through WRDS); data for NYSE, AMEX, and NASDAQ
stocks for entire history starting in 1925; daily and monthly frequency.
- international: Compustat Global (through WRDS) or Datastream
Merged CRSP-Compustat database: Accounting data for all stocks that are also in
CRSP. Provides a link of the firm identifiers gvkey (Compustat) and permno
(CRSP) (through WRDS).
US executive compensation:
- Execucomp (through WRDS): yearly data for executives of S&P1500 firms
from 1993 to present
- ISS Incentive Labs (through WRDS): Like Execucomp but more detailed data
on compensation, like vesting periods etc. Coverage: 1998-present.
US Corporate Governance & board of directors: ISS Directors and ISS Governance
(through WRDS)
Data Sets
SNL (snl.com)
International data on Real Estate Investment Trusts (REITs), available to all UvA
students.
CRSP/Ziman Real Estate Data Series
US REIT stock return data.
Datastream
European and Asian REIT stock return data.
Bank for International Settlements (BIS)
Country-level data, property price statistics (by different classes).
Hypostat, European Mortgage Federation
Mortgages, house prices (not harmonized) for European countries.
Eurostat
HPI index, harmonized, recent data.
Next Lecture
Methodology:
- Event Study
- Diff-in-diff methods
- Panel regressions
- Standard errors