DATA ANALYSIS
A Practical Guide to Report-Writing
2. Finding Data
Before Getting Started
Where to Look for Data
Navigating a Statistics Website
Choosing the Correct Variables
Downloading Your Data
Creating Your Database
Additional Excel Tools
5. Putting It All Together
Rough Draft, or Final Product?
Using Data to Tell Your Story
Integrating Graphs, Tables, and Text
Producing High-Resolution Images
Fonts, Colors, and Other Considerations
A Few Writing Tips
Proofreading
Presenting With Slides
An Example
Oil Prices and Oil-Price Volatility
A GARCH Analysis of Oil-Price Volatility
Moving on to Your Own Projects
Appendix
Books and Software Resources
Data Sources and Variables
INTRODUCTION
That said, this is not a tutorial per se. It would be impossible
to introduce web searches, Excel formulas, formatting, and
graphic design in any detail all in one document, let alone
review macroeconomic concepts. In fact, I have made many of
my graphs in the open-source software R, but don’t even touch
upon it here. Since I teach R in a separate class (with some
overlap, however), I treat “Macroeconomic Data Analysis” as
more of a true introduction to the topic. I have put the
associated .R files, as well as links to some useful videos, on my
website at www.scotthegerty.com or bit.ly/2SlGXU0.
In presenting this material, I have to assume that readers
have some sort of a starting point. Some of my students have
better Excel skills than I had starting out (or have now), so it
makes no sense for me to go into too much detail here. Other
students skip Excel and go straight to more advanced software.
Besides, I do not put much value in “cookbook” tutorials that
have specific, yet unexplained, goals (such as “change the line
color to 50% gray”). It’s impossible to instill much real learning
that way.
Instead, the objective here is to show what could be done,
with ideas, data, interpretation, and presentation, so that
students can practice and learn these skills themselves. A few
sources are listed at the end for further reading. But this
document itself serves as one possible example, incorporating
macroeconomic examples throughout. I hope that it will help you
with your own future macroeconomic projects.
1. WHAT’S YOUR STORY?
CHOOSING WHAT PROJECT TO PURSUE
One of the most important parts of studying or writing about
economics is coming up with an interesting question that is
worth answering. In fact, even academic papers often have a
hard time doing this—so they wind up making tiny changes to
an existing idea and going from there. If you are an undergraduate
student or are just starting out, you don’t need to
come up with something groundbreaking. Just ask yourself 1)
what you want to find out about and 2) who might find it
interesting if you tell them. But knowing what you are looking
for before you start makes the whole process easier.
Suppose gasoline prices are rising, and you want to know if
the price of oil is higher than in the past. If you know it is rising,
you might be able to make better investment decisions. You
could find the right data, make a graph, and see where it’s
headed. But what data do you need?
You can start by finding data for oil prices. As we will see
later, we can get them starting in 1980. But if prices were lower
then, a relatively high oil price would seem low compared to
today. So you need to use a price index to construct real oil prices so
that you can compare different years. Looking at a single
variable over time (often called a univariate analysis) can be
very interesting in its own right. But you need to combine your
knowledge of formulas, math, and economics to do it correctly.
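As a rough illustration of the price-index step, here is a minimal Python sketch (Python is not the software used in this guide). The numbers are hypothetical round values chosen so the arithmetic is easy to follow, and the names `to_real` and `BASE_YEAR` are assumptions made for this example.

```python
# Hypothetical round numbers for illustration only; not actual oil prices or CPI values.
nominal_oil = {1980: 20.0, 2000: 40.0, 2020: 40.0}    # dollars per barrel
price_index = {1980: 50.0, 2000: 100.0, 2020: 200.0}  # e.g., a consumer price index

BASE_YEAR = 2020  # assumed base year; any year in the index would work


def to_real(nominal, index, base_year):
    """Restate each nominal value in base-year prices."""
    base = index[base_year]
    return {year: value * base / index[year] for year, value in nominal.items()}


real_oil = to_real(nominal_oil, price_index, BASE_YEAR)
for year in sorted(real_oil):
    print(year, nominal_oil[year], real_oil[year])
```

In this made-up series the nominal price doubles between 1980 and 2000 while the real price does not change at all, which is exactly the kind of distortion the deflator is meant to remove.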
If you wanted to go further and look at how high oil prices
might affect the whole U.S. economy, you could get data for real
GDP. This type of analysis would be bivariate (or multivariate if
it’s more than two). Oftentimes this requires more use of
statistics, but even the tools discussed here will allow you to
make an effective report or presentation.
The easiest approach is to simply ask for clarification
beforehand. That way, you can present your analysis
appropriately. But be prepared to explain some things you
thought everybody knew, and to skip some background if it’s
clear that people are familiar with it.
In general, you can present the same ideas in different ways,
depending on your audience. In increasing order of complexity,
these include:
▪ A simple graph. While not very statistical, this conveys a lot of
information. In particular, when people often base ideas or
arguments on guesswork, having your data presented at all
can make a powerful statement.
▪ Basic statistics. Many people with a college degree have taken
at least one statistics class, so summary stats such as mean,
standard deviation, minimum, and maximum will help clarify
your data further. For bivariate analyses, simple correlations
can be useful. It might help to remind your audience of the
difference between “mean” and “median.” Just because people
took stats doesn’t mean they actually remember it.
▪ More advanced econometric methods. T-tests, regression
analysis, and ARIMA (for time series) can help make your case
very effectively. But, regardless of your audience, your
explanation is just as important as your results. You also must
make sure you present your results properly, so table
formatting is important. It is also essential to note that these
techniques need to be done correctly to be useful. Running an
Ordinary Least Squares (OLS) regression without specifying it
properly will produce useless results that could lead to bad
decisions. Don’t feel the need to use more advanced
techniques…a simple graph might work just as well!
2. FINDING DATA
2.1. U.S. oil prices (WTI) and Consumer Price Index in Microsoft Excel.
WHERE TO LOOK FOR DATA
Probably the most useful “one-stop” data source is the Federal
Reserve Bank of St. Louis’ FRED Economic Data site
(fred.stlouisfed.org). It contains thousands of data series, from
various sources, and is easy to use. It also can create graphs for
you. Here, though, we want to be able to download our own
data—and that is straightforward as well.
NAVIGATING A STATISTICS WEBSITE
While it is possible to do a tutorial on how to navigate every
step, for every data site, there are far too many to memorize
them all. It is much better to look for common patterns and
layouts, and know how to move through the steps of the data-
collection process. In general, you will have to find a tab or
header marked “Data,” and then click through the links as you
choose data.
The BLS site looks a lot like FRED’s. Note the “Data Tools”
section. Clicking here takes you to a number of data resources
(including inflation and prices, employment, and other labor
statistics); you need to choose accordingly. I usually then go to
the “multi-screen data search.” The BLS site has a couple of
other issues I haven’t seen elsewhere; I explain how to handle
them below using some simple data manipulation tools.
2.4. The European Central Bank’s main page header (April 2020)
Pretty much every data site has some sort of a “Data” tab, or
you could try searching directly through Google. Once you know
what to look for, the process is usually pretty similar. The
European Central Bank’s website (in 2.4), for example, has
statistical data that are easy to find.
CHOOSING THE CORRECT VARIABLES
As you move through the site, you will have to be aware of a few
properties of the data that you download:
▪ The frequency of your data
▪ Whether your data are in nominal or real terms
▪ Seasonality and seasonal adjustment.
When gathering data, you also might wish to download
variables that are already in real terms, or you might want to
get nominal data and convert them yourself. The latter approach gives you the
choice of deflator (such as CPI, PPI, or the GDP deflator), so you
have more flexibility. Real (rather than nominal) GDP,
investment, and other variables are typically used by
themselves, but you may need to download both the nominal
variable and the real version. One instance where you
might need nominal variables, for example, is if you are
calculating a country’s trade balance as a share of the entire
economy. You can use nominal exports, imports, and GDP to
calculate [(𝑋 – 𝑀) / 𝑌], which gives you a share (multiply by 100 for a percentage). (Recall
that prices and dollar values “cancel out” in this calculation.) If
you already have price-level data, you could calculate real GDP
yourself and not have to download it.
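The trade-balance share can be sketched in a few lines of Python. The function name `trade_balance_share` and the figures are hypothetical; the only requirement is that exports, imports, and GDP are all nominal and in the same units.

```python
def trade_balance_share(exports, imports, gdp):
    """Return (X - M) / Y as a percentage of GDP; all inputs nominal, same units."""
    return 100 * (exports - imports) / gdp


# Hypothetical nominal values, in billions of dollars
share = trade_balance_share(exports=250.0, imports=300.0, gdp=2000.0)
print(f"Trade balance: {share:.1f}% of GDP")
```

Because the numerator and denominator are both in current dollars, the result is a pure ratio, and a negative value simply indicates a trade deficit.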
deseasonalized data show the same overall trends over time, but
seasonality is clear in the original series. If you only paid
attention to seasonal properties, you might think the economy
was collapsing in January—when really this is predictable.
You might be able to directly download data that are already
deseasonalized by the data provider using statistical procedures
(such as the Census-X12 method). Or, you can use the
appropriate software (such as EViews) and do it yourself. Another
way to avoid this issue, if you are using percentage
changes, is to take the change over the corresponding period
of the previous year (such as June to June). If you don’t have much
experience with these procedures, I recommend finding
deseasonalized data and just using that.
programs as well. Some statistical software will not recognize
Excel files at all. For this reason, make sure you save your final
dataset as a .csv when you are ready to use other programs. I
have never downloaded data as a .pdf, because I would have to
put it in Excel anyway.
Regardless of your choice of software, you will next have to
create a single, concise database that includes all your original
variables for the same time period.
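For readers working outside Excel, the idea of one dataset covering a common time period might look like the following Python sketch. The series values, the variable names OIL and CPI, and the helper `merge_series` are all assumptions for illustration; the output is written to an in-memory buffer, which you could replace with a real .csv file.

```python
import csv
import io

# Hypothetical monthly series keyed by date label; the values are made up.
oil = {"2019m12": 60.0, "2020m1": 58.0, "2020m2": 50.0}
cpi = {"2020m1": 100.0, "2020m2": 100.5, "2020m3": 101.0}


def merge_series(**series):
    """Keep only the dates present in every series, in the first series' order."""
    names = list(series)
    first = series[names[0]]
    common = [d for d in first if all(d in s for s in series.values())]
    rows = [["DATE"] + names]
    for d in common:
        rows.append([d] + [series[name][d] for name in names])
    return rows


rows = merge_series(OIL=oil, CPI=cpi)

# Write to an in-memory buffer; swap in open("dataset.csv", "w", newline="") for a file.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
print(buffer.getvalue())
```

Dates that appear in only one series (here 2019m12 and 2020m3) are dropped, so every row in the final dataset is complete.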
I suggest renaming all your variables using 1) a consistent
method and 2) relatively few characters. Choose names that you
wouldn’t mind having in print (such as on a graph). Some
software cuts off titles longer than eight characters. Also avoid
punctuation, other than _ (underscore).
For example, I would use Y for GDP (and maybe NOM_Y for
nominal GDP). If we use the transformations in Chapter 3, log
GDP would be LNY, and log differences could be DLNY. There is
no single “correct” way to name variables, but other economists
would have no problem interpreting what I wrote.
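If you rename many variables at once, a small check can catch names that break these guidelines. This is only a sketch: the eight-character limit follows the remark above, and the function `is_safe_name` is a hypothetical helper, not part of any statistical package.

```python
import re

MAX_LENGTH = 8  # assumed limit, following the eight-character remark above


def is_safe_name(name, max_length=MAX_LENGTH):
    """True if the name is short and uses only letters, digits, and underscores."""
    pattern = r"[A-Za-z][A-Za-z0-9_]*"
    return len(name) <= max_length and re.fullmatch(pattern, name) is not None


for candidate in ["Y", "NOM_Y", "LNY", "DLNY", "Real GDP (2012$)"]:
    print(candidate, is_safe_name(candidate))
```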
If you are not very familiar with Excel, one very important
command is called “Paste Special.” You can Transpose data
(turn rows into columns), which is important if your data are
presented horizontally instead of vertically. It is also a good idea
to remove all formatting (such as cell borders) when you paste
into Excel; Paste Special lets you do this too (as Text).
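The Transpose operation has a simple equivalent in most languages. A minimal Python sketch, with hypothetical data laid out horizontally:

```python
# Hypothetical data presented horizontally: one row of dates, one row of values.
wide = [
    ["DATE", "2018", "2019", "2020"],
    ["Y", 100, 103, 99],
]

# zip(*rows) pairs up the first items, then the second items, and so on,
# which turns rows into columns.
long = [list(row) for row in zip(*wide)]

for row in long:
    print(row)
```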
2.7. BLS energy-price data in Excel, before and after splitting columns.
3. CALCULATING AND
TRANSFORMING VARIABLES
SOFTWARE
Here, most of the examples will make use of Microsoft Excel,
which is good for the type of analysis that this chapter focuses
on. Your data most likely are in .xlsx, .xls, or .csv format, so you
can simply create new variables in your existing document. It is
also easy to apply one variable’s transformation to other
variables. I don’t focus too much on Excel formulas or commands
(you can find whole books on them, or do a web search for
specific tools), but I do explain some basics and point out some
useful tips.
If you want to learn statistical or econometric software
(which I recommend), there are a number of options. Some, like
SPSS and Minitab, are not really used much in economics.
Some are better for time-series data than others. But in
general, you will have to make a choice involving 1) cost, 2) ease
of use, and 3) workplace value.
The most useful in business and academia, hands down, are
R and SAS. Both involve programming languages and structures
that often take time to learn. A big difference is that R is free to
download, while SAS is proprietary and can cost thousands of
dollars for a license. Many jobs require proficiency in one or
both, and in recent years, I have seen SAS’ dominance erode.
Many people really like Python (and it is worth learning!), but it
is likely better for other data-science topics. R has a number of
packages that are specifically designed for econometrics.
If you open up R itself, it is just a command prompt; this can
be intimidating. It is much better to use the interface RStudio,
but even then, you need to properly code everything you want
the software to do. In addition, you can download pre-written
“packages” rather than code the actual statistical tests you want
to do. So while it’s not easy to learn, it might look more difficult
than it actually is. Most of the graphics here (other than the
Excel examples) are done in R, and I advise my own students to
learn it if at all possible.
much statistically as the others, and I don’t think the
professional world puts as much value on them. Likewise, many
people put “knowledge of Excel” on their résumés—this is
considered pretty basic now, so don’t highlight it unless you
have specialized knowledge above and beyond what we do here.
TRANSFORMING VARIABLES
If you chose “raw” variables, you will want to transform them
into the format you need before beginning. The easiest way for
beginners to do this is in Excel, although more advanced users
might find that other software will allow them to write
programs that can handle numerous variables very quickly.
Here, we focus on a few very basic transformations:
▪ Creating real variables
▪ Applying a common scale (such as millions of dollars)
▪ Calculating percentage changes
▪ Calculating natural logarithms
▪ Using log changes as percentage changes.
much, but you can easily multiply 1.412 billion by 1000 to get
1,412 million.
Again, “Paste Special” in Excel can help you here. Simply
type 1000 in a blank cell, copy it, and select every cell you wish
to convert. Then make sure you check “Multiply.”
3.2. Microsoft Excel’s “Paste Special” allows you to multiply across multiple cells,
as well as transpose, remove formatting, and perform other useful functions.
your data are in column C, one value might be calculated as
100*(C3-C2)/C2 or 100*(C3/C2-1). You may wish to annualize
these percentage changes if you are working with quarterly or
monthly data. Since these involve 1/4 or 1/12 of the year, you
can simply replace “100” with “400” or “1200.”
You may also wish to calculate year-over-year percentage
changes. You would then subtract the same quarter or month of
the previous year, skipping the observations in between.
January 2020 would be compared to January 2019. Make sure
you subtract the correct number of cells. With quarterly data,
you would have to type 100*(C6/C2-1), a gap of four quarters
(2019Q1 to 2020Q1), using Excel. The year-over-year series
is much smoother and looks different than the other series.
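The three calculations described here (period-over-period, annualized, and year-over-year) differ only in the scale factor and the number of rows you reach back. A minimal Python sketch, using a hypothetical quarterly series and an assumed helper named `pct_change`:

```python
def pct_change(series, lag=1, scale=100):
    """Percentage change versus the value `lag` periods earlier (None where unavailable)."""
    return [
        None if i < lag else scale * (series[i] / series[i - lag] - 1)
        for i in range(len(series))
    ]


# Hypothetical quarterly levels, 2019Q1 through 2020Q1
levels = [100.0, 101.0, 102.0, 103.0, 104.0]

quarterly = pct_change(levels)               # like 100*(C3/C2-1)
annualized = pct_change(levels, scale=400)   # simple annualization: 400 instead of 100
year_over_year = pct_change(levels, lag=4)   # like 100*(C6/C2-1)

print(quarterly)
print(annualized)
print(year_over_year)
```

Note that the year-over-year series has no value for the first four quarters, since there is no observation from a year earlier to compare against.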
important to know that you cannot take logs of negative
numbers or of zero. If you have values at or below zero, you will have to try a
different approach.
I always think of logs as a “flattener,” or a function that
“demotes” mathematical functions by knocking them down one
step. For example, you can rank the functions in order:
You can try 2^3, 2 × 3, and 2 + 3 and see that 8 > 6 > 5. While
I’m not even going to try to get into all the log rules from math
class, here are the most important:
3.4. A randomly-generated exponential growth series (left) and its natural log
(right). The log series is flatter. Note the scale on the left side of each graph.
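The “flattener” idea can be checked numerically. The sketch below uses Python's standard math module; the 5% growth series is hypothetical, and `safe_log` is an assumed helper that returns None where the log is undefined.

```python
import math

# Logs knock each operation down one step:
# powers become products, and products become sums.
print(math.log(2 ** 3), 3 * math.log(2))
print(math.log(2 * 3), math.log(2) + math.log(3))

# A hypothetical series growing 5% per period: its log rises by a constant amount,
# so the exponential curve becomes a straight line.
levels = [100 * 1.05 ** t for t in range(6)]
log_levels = [math.log(v) for v in levels]
steps = [b - a for a, b in zip(log_levels, log_levels[1:])]
print([round(s, 4) for s in steps])


def safe_log(value):
    """Natural log, or None where it is undefined (zero or negative values)."""
    return math.log(value) if value > 0 else None


print(safe_log(-1.0))
```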
so putting the two together, you get Δx/x or the percentage
change. The monetary equation ln(M) + ln(V) = ln(P) + ln(Y) can
be turned into %ΔM + %ΔV = %ΔP + %ΔY. If velocity (V) doesn’t
change (according to the theory), changes in the money supply
turn into inflation, real GDP growth, or some of both.
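A short numerical check may help here. The sketch below compares log changes with ordinary percentage changes and then verifies the growth-rate version of the monetary equation. All values are hypothetical and are constructed so that M × V = P × Y holds in both periods with velocity fixed.

```python
import math


def pct_change(new, old):
    return 100 * (new / old - 1)


def log_change(new, old):
    return 100 * math.log(new / old)


# The approximation is close for small changes and drifts apart for large ones.
for growth in (1, 5, 25):
    new = 100 + growth
    print(growth, round(pct_change(new, 100), 3), round(log_change(new, 100), 3))

# Hypothetical values satisfying M*V = P*Y in both periods, with V held fixed.
M0, V0, P0 = 100.0, 2.0, 1.0
Y0 = M0 * V0 / P0
M1, V1, P1 = 110.0, 2.0, 1.05
Y1 = M1 * V1 / P1

lhs = log_change(M1, M0) + log_change(V1, V0)
rhs = log_change(P1, P0) + log_change(Y1, Y0)
print(round(lhs, 3), round(rhs, 3))
```

In log changes the two sides match exactly; with ordinary percentage changes they would match only approximately.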
also explain correlation, which is the simplest measure of a
relationship between two variables.
The mean, or average, is a well-known measure of a variable’s
“center,” or where it is located. Given the symbol μ, it is the sum
of all values divided by the number of observations. The median
is the middle value, or the average of the two middle values if
there are an even number of observations. For variables such as
income, the median is often preferred, because one “outlier” can
drastically change the mean, but not the median.
For example, with the numbers [1, 3, 5, 9, 12], the mean
(average) is 6, and the median is 5. But if the largest number
increases to 27, the mean goes up to 9, while the median
remains 5. Medians are less sensitive to these outliers, so they
are good for measuring income or housing prices, where one
billionaire or large mansion can make an entire neighborhood
richer on average.
Analyses often report the minimum and maximum values of
a variable. This helps establish the range over which the values
are likely to fall. The numbers [1, 3, 5, 9, 12] have a minimum of
1 and a maximum of 12, but these values do not say how spread
out they are. For that, we use the standard deviation, which
captures dispersion. For example, [5, 5, 5, 5, 5] has a mean of 5,
but no spread. The numbers [1, 3, 5, 9, 12] have a mean of 6, but
each number differs more from the average value.
The standard deviation, labeled σ, is the square root of the
variance, which takes the difference of each value from the
mean, squares it, and then averages all the squared values. For
example, the numbers [5, 5, 5, 5, 5] have a standard deviation of
zero, while [1, 3, 5, 9, 12] have a standard deviation of 4. The
numbers [1, 3, 5, 9, 27] are spread out even more, which is
reflected in a standard deviation of 9.38.
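The figures quoted in these examples can be reproduced with Python's standard statistics module. This is a sketch for checking the arithmetic rather than a replacement for the Excel formulas; `pstdev` is the population standard deviation.

```python
import statistics

samples = [
    [5, 5, 5, 5, 5],
    [1, 3, 5, 9, 12],
    [1, 3, 5, 9, 27],
]

for values in samples:
    print(
        values,
        "mean:", statistics.mean(values),
        "median:", statistics.median(values),
        "min:", min(values),
        "max:", max(values),
        "sd:", round(statistics.pstdev(values), 2),
    )
```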
Standard deviations are useful when you want to tell how
“unusual” something is. If you are starting with [5, 5, 5, 5, 5],
the number 7 stands out, because the difference is much larger
than the standard deviation. It might be important to learn the
causes of this difference. But, if you are comparing this to [1, 3,
5, 9, 12], the number 7 fits right in, even if it is “above average.”
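One way to make “unusual” concrete is to express the gap from the mean in units of standard deviations. The helper below, `deviations_from_mean`, is a hypothetical name for this sketch; it treats a series with no spread as a special case to avoid dividing by zero.

```python
import math
import statistics


def deviations_from_mean(value, values):
    """Distance of `value` from the mean, measured in population standard deviations."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        # No spread at all: any different value is infinitely "unusual."
        return 0.0 if value == mu else math.inf
    return (value - mu) / sigma


print(deviations_from_mean(7, [5, 5, 5, 5, 5]))
print(deviations_from_mean(7, [1, 3, 5, 9, 12]))
```

Against the second series, 7 is only a quarter of a standard deviation above the mean, which is why it “fits right in.”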
Means and standard deviations are used together for
statistical tests, and they are also combined to put standard
deviations in perspective. The coefficient of variation divides the
standard deviation by the mean, or 𝜎 / 𝜇 . This is useful because
large means and large variances often go together. For example,
a neighborhood with million-dollar homes might have price
ranges that differ by hundreds of thousands of dollars. The
coefficient of variation puts this large standard deviation in
context by taking into account the large average value. That
way, these price differences can be compared to those in a
neighborhood with less-expensive homes that might differ in
price from one another by only a few thousand dollars.
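To see how the coefficient of variation puts spread in context, consider two hypothetical neighborhoods whose prices differ by a factor of ten. The prices below are invented for illustration, and `coefficient_of_variation` is an assumed helper name.

```python
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


# Hypothetical home prices, in dollars
expensive = [900_000, 1_000_000, 1_100_000]
modest = [90_000, 100_000, 110_000]

for label, prices in (("expensive", expensive), ("modest", modest)):
    print(
        label,
        "sd:", round(statistics.pstdev(prices)),
        "cv:", round(coefficient_of_variation(prices), 4),
    )
```

The standard deviations differ tenfold, yet the coefficients of variation are identical, so relative to their own averages the two neighborhoods are equally spread out.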
Finally, the correlation coefficient (ρ) captures the
association between two variables. This value can range from -1,
which means that the variables move in perfectly opposite
directions, to +1, where they move perfectly together. A
correlation of 0 means there is no association. Typically, you
don’t need a perfect correlation to say there is at least some
relationship, although there is no universally agreed-upon
minimum value.
As anyone who understands regression will tell you,
correlations do not take into account any other variables.
Sometimes outside events are the “real reason” why two things
move together. Also, as the saying goes, “correlation does not
imply causation.” It is impossible to tell what caused what, or
whether it is coincidence or because of something else. For that
reason, economic studies only use correlations for preliminary
analysis, and use regression or other methods for their main
work. Nonetheless, many non-economist audiences will
appreciate the insights provided by this measurement.
Microsoft Excel easily allows for all these measures to be
calculated. These are calculated over ranges of cells. The
formula AVERAGE(B2:B144) gives the mean over some range of
cells; you might also wish to cover a “generic” range with
AVERAGE(B:B). STDEV.P(B2:B144) calculates the “population”
standard deviation (which I use here, instead of STDEV.S), and
dividing the two would give the coefficient of variation. The
correlation formula requires two ranges, which must be of equal
length. If you wish to calculate ρ for data in columns B and C,
you might type CORREL(B2:B144, C2:C144) or
CORREL(B:B,C:C). Note the comma that separates the columns.
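For comparison with CORREL, here is a hand-written Pearson correlation in Python. It is a sketch of the standard formula, not Excel's internal code, and the two short series are hypothetical. Like CORREL, it requires two ranges of equal length.

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    if len(x) != len(y):
        raise ValueError("both ranges must have the same length")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)


x = [1, 3, 5, 9, 12]
same_direction = [2, 6, 10, 18, 24]      # always moves with x
opposite_direction = [24, 18, 10, 6, 2]  # falls as x rises

print(round(correlation(x, same_direction), 3))
print(round(correlation(x, opposite_direction), 3))
```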
All these methods calculate a single statistic for each
variable. For time-series data, you can create means, standard
deviations, and correlations that change over time, and these
can be added for nearly every quarter or month of your dataset.
These new variables can be plotted and examined like any of
your original ones.
3.6. Rolling standard deviation JPYVOL (black) vs. log changes in the exchange
rate, DLNJPY (gray). Note high volatility during the 1997 Asian crisis and 2008.
variables are closely connected can be examined alongside times
when they are not. The calculation is similar, using code such as
CORREL(B2:B13,C2:C13), and so forth.
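The rolling calculations amount to applying the same formula to a window that slides down the column one row at a time. A minimal Python sketch follows; the helper names `rolling` and `correlation`, the three-period window, and the data are all assumptions made for illustration (a 12-month window would match the CORREL example above).

```python
import math
import statistics


def correlation(x, y):
    """Pearson correlation of two equal-length lists (undefined if either is constant)."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(spread_x * spread_y)


def rolling(func, window, *columns):
    """Apply func to each trailing window; None until a full window is available."""
    results = []
    for end in range(1, len(columns[0]) + 1):
        if end < window:
            results.append(None)
        else:
            results.append(func(*(col[end - window:end] for col in columns)))
    return results


# Hypothetical series; y is always twice x, so every window is perfectly correlated.
x = [1.0, 2.0, 4.0, 3.0, 5.0, 7.0]
y = [2.0, 4.0, 8.0, 6.0, 10.0, 14.0]
WINDOW = 3

rolling_sd = rolling(statistics.pstdev, WINDOW, x)
rolling_corr = rolling(correlation, WINDOW, x, y)
print(rolling_sd)
print(rolling_corr)
```

The first window minus one entries are empty, which is why a rolling series always starts later than the original data.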
4. PRESENTING YOUR DATA
There are a number of things wrong with this graph: There
are no dates on the x-axis, and the title (“Mexico”) is given twice
when it might not need to be presented at all. More minor points
include extra decimal places and (in my opinion) the horizontal
lines. Note that in every graph, I name the data source.
Plus, the line was originally blue. I generally avoid colors for
anything printed, because it costs extra. Journals have extra
charges, or if you print at home or at work, color ink is
expensive. Many people print in black and white for that reason.
Colors might become illegible, and any references (such as “the
green line”) will be useless.
Here are some more Industrial Production data:
The colors were originally blue, red, green, purple, and light
blue. The dates are “squished” and randomly assigned between
January, May, and September. The vertical axis still has extra
zeros. In addition, there is “white space” below 4.2 that could be
eliminated (basically, you can “zoom in”).
To fix these issues, you can (right) click on the main data
box, as well as on every axis. For example, right-clicking the
vertical axis allows you to “Format Axis,” where you can change
the minimum and maximum values, the rounding (under
“number”), and other features. You can also change fonts and
other aspects of your graph. You can click on and delete the
horizontal lines (that’s more my personal preference). If you
right-click the main body, you can “Select Data” (add, remove, or
format variables) or format the plot area. It is quite possible to
go way beyond what I do here, and good graphic design requires it.
Still, the formatting of graph 4.3 is plenty
for an academic report, PowerPoint, or academic journal.
Clicking on each line allows you to change the line
style, including width and color. There are enough shades of
gray and dash types to make each line distinguishable (I think
that more than 5 or 6 lines on one graph is too much, anyway).
You will have to do each line individually. I am not a proponent
of “cookbook” manuals, which tell the reader exactly what color
to make each line. Instead, the best way to learn is to practice
doing it yourself and try what looks good to you personally.
The x-axis is particularly problematic. Dates in your
database, which might read something like “2009m2,” need to be
entered as years only. To get a usable year variable, I suggest
copying the date column and pasting it in an empty column
(with no data immediately to its right). Then, use Text to
Columns, choose “Fixed width,” and place the break after the correct number of characters (here:
4). This cuts off everything to the right of the year.
If you right-click on the data area of your chart, you can “Select
Data” and use this year column as your date axis.
You will still have to adjust the spacing on your horizontal
axis. It is enough to have dates listed only every five years, so
you can adjust the interval between labels to 20 (if you have
quarterly data) or 60 (for monthly data).
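The same two steps (cutting the date label down to a year, then labeling only every fifth year) can be sketched in Python. The date labels are generated for illustration, and `year_of` and `LABEL_EVERY` are hypothetical names.

```python
def year_of(label):
    """Return the four-digit year at the start of a label such as '2009m2'."""
    year = label[:4]
    if not year.isdigit():
        raise ValueError(f"no four-digit year at the start of {label!r}")
    return int(year)


# Hypothetical monthly labels, January 2000 through December 2010
dates = [f"{year}m{month}" for year in range(2000, 2011) for month in range(1, 13)]
years = [year_of(d) for d in dates]

LABEL_EVERY = 60  # monthly data: one label every five years (use 20 for quarterly)
axis_labels = [str(y) if i % LABEL_EVERY == 0 else "" for i, y in enumerate(years)]
shown = [label for label in axis_labels if label]
print(shown)
```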
I personally leave the labeling outside the graph itself, opting
to label everything (particularly the title) separately. The
individual line labels might be helpful, but seem to take up a lot
of space on the right-hand side of a graph. Oftentimes, a chart
title or variable label will be redundant if it is in the graphic and
the text above or below. At the very least, make sure you do not
have the default text “Chart Title” anywhere in your graph.
Also, make sure your variable names are clear. For example, it
might be a good idea to rename “Mexico” to “Mexican IP” in 4.1
above. If your chart is titled “Mexican Industrial Production,”
delete the line label entirely. The country names read well for
Figures 4.2 and 4.3. The original variable names included MXIP
and BRIP (for Mexico and Brazil), for example, but it is better to
rename your variables to make them clear to the reader.
▪ The interval between axis labels is now 60 (months), or 5
years. I opted not to make these intervals round (such as 2000,
2005, etc.), but that can be done as well.
▪ The interval between “tick marks” was changed from 1 to 12,
or once per year.
▪ As with most graphics here, I added a border. There is also a
description of the entire graph at the bottom.
For example, here is a chart I made up for some country’s
composition of capital inflows, split between Foreign Direct
Investment, portfolio investment, and Other investment:
FDI 32%
Portfolio 41%
Other 27%
numbers themselves. Sometimes students with minimum page
requirements try to be as inefficient as possible, filling up two
pages of a seven-page assignment with a couple of pie charts.
Trust me, professors have figured this one out.
Here are a few bar charts that are each 5 inches wide. The 3-
bar chart has redundant percentages (above the bars and on the
left-hand scale).
Combining the three bars into one shows each type of flow’s
share of the total. Even if you were to eliminate all the white
space on the sides of the bar, this chart is going to be much
larger than the original table. From an efficiency standpoint, it
is far better to use the table and to avoid using any graphs at
all. But, of the three types of graph presented, the third option
(the 100% bar chart in 4.7) is probably the best. It has no
distorting angles, and it conveys the information more succinctly
than do the other two. I personally would tweak it to make the
fonts larger and get rid of the extra space.
4.7. A 100% bar chart of the country’s foreign investment. This still can be
improved, and is still inefficient in terms of useful information per unit of space.
TABLES
Given that graphics are not an efficient use of space, much of
your non-time-series data will be in the form of tables. There are
a few rules that I personally follow:
1) Avoid any type of “grid” lines (boxes around every cell).
Instead, use a limited number of horizontal lines, usually
below the header and at the bottom.
2) Round your data appropriately. Usually two or three
decimal places are enough. The letter E should never
appear in a reported number; it signals scientific notation, in which the digits after E give the power of ten.
For example, the (very large) number 123456789012
sometimes reads in Excel as 1.23E+11, and the (very
small) number 0.0000123456789 reads as 1.23E-05. If you
have really large numbers, consider either re-scaling (into
billions, for example), or looking at your data for a
problem. The same goes for very small numbers.
Oftentimes, regression coefficients are so small that they
have no economic meaning. You may have to put “0.000,”
but you also have to ask why the number is so small in
the first place.
3) Try to follow some formatting rules. As we discuss in
Chapter 5, you might want to use a sans serif font for
tables, with numbers right-justified. At the very least,
make sure that your headers are all on one line (by
widening the columns, if necessary), that you have proper
headings, and that it is legible.
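The rounding and scientific-notation points in rule 2 can be illustrated with Python's built-in number formatting, using the same two numbers as above. This is a sketch of the idea, not a description of how Excel formats cells.

```python
big = 123456789012
small = 0.0000123456789

# Scientific notation, as it might appear in raw output
print(format(big, ".2E"))
print(format(small, ".2E"))

# Friendlier alternatives: re-scale the large number, round the small one
print(f"{big / 1e9:.2f} billion")
print(f"{small:.3f}")
```

The last line prints 0.000, which is a reminder to ask why the number is so small rather than simply reporting it.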
REGRESSION OUTPUT
Whether you’ve taken a single econometrics course or you have a
Ph.D., never paste software output into your document.
This seems to happen all the time, and in my opinion it is
tacky. In fact, academic papers that do this come off as a little
unprofessional (and might be less likely to get published as a
result). If you are running some type of regression analysis,
make sure you 1) format your results as a table and 2) only
include the most relevant information.
Output from the software EViews is presented in 4.9. It is an
AR(2) estimation of log changes in the yen-dollar exchange rate
from 1971 to 2016. If you are familiar with econometrics and
ARIMA modeling, you can see that the AR(2) coefficient is
insignificant (in fact, an autoregressive model of order one
performs much better). Keep in mind that you will have to draw
on your previous statistical knowledge for this type of
macroeconomic analysis.
The raw output has a number of redundancies. Standard
errors, t-statistics, and p-values are all provided, but most
analyses only use one (I personally prefer the p-value). Far more
statistics related to the estimation are provided than are
typically reported. There are also too many decimal places for
each number.
Creating and formatting a table properly involves selecting
only the essential results and presenting them concisely:
common, sans serif font. The variable names are left-justified,
while the estimates are right-justified.
A further estimation (volatility modeling using GARCH)
provides another table, as well as a software-generated graph:
Mean equation
Variable     Coefficient   (p-value)
Constant          -0.212     (0.208)
AR(1)              0.309     (0.000)
Variance equation
Variable     Coefficient   (p-value)
Constant           1.247     (0.163)
ARCH               0.070     (0.078)
GARCH              0.736     (0.000)
AIC                4.71
4.11. An AR(1)-GARCH(1,1) estimate of yen volatility 1971-2016.
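If your estimates are available as numbers rather than pasted output, a few lines of code can produce a clean table with consistent rounding and alignment. The Python sketch below reuses the estimates from the table above; the layout (column widths, three decimals, p-values in parentheses) and the helper `format_block` are choices made for this example only.

```python
# Estimates copied from the table above: (variable, coefficient, p-value)
mean_equation = [("Constant", -0.212, 0.208), ("AR(1)", 0.309, 0.000)]
variance_equation = [
    ("Constant", 1.247, 0.163),
    ("ARCH", 0.070, 0.078),
    ("GARCH", 0.736, 0.000),
]


def format_block(title, rows):
    """Left-justify names, right-justify numbers, and round to three decimals."""
    lines = [title, f"{'Variable':<10}{'Coefficient':>12}{'(p-value)':>11}"]
    for name, coefficient, p_value in rows:
        p_text = f"({p_value:.3f})"
        lines.append(f"{name:<10}{coefficient:>12.3f}{p_text:>11}")
    return "\n".join(lines)


print(format_block("Mean equation", mean_equation))
print(format_block("Variance equation", variance_equation))
```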
This volatility series is a more sophisticated version of the
rolling standard deviation in 3.6. The same “spikes” can be
found during the 1997 Asian Crisis and the 2008 financial crisis.
This graph is a little more visually appealing than those that
Microsoft Excel can generate. But many of the formatting
options in EViews and other software still need to be adjusted.
The GARCH graph in 4.12 uses my own personal settings,
including the retro, typewriter-looking Courier New fonts for the
axes. Whatever your tastes, line color and width, background
color, and fonts can all be changed—and sometimes have to be,
so that your graphs look good on the page.
5. PUTTING IT ALL TOGETHER
WORKING ON A TEAM?
Graphs and figures might need to be checked extra-
carefully. They can be copied incorrectly, or created by an
artist who is not concerned with the content. Sometimes
they are placed independently of the editors. In particular,
you want to:
▪ Check the placement, that is, that the graph is located
where you want it.
▪ Verify the header/chart title; these are often copied
incorrectly or typed by hand.
▪ Make sure the images are clear and imported correctly.
▪ Check spacing; also make sure that cells aren’t shifted.
▪ Check the fonts; sometimes these “disappear” and are
replaced with defaults.
▪ Look at the footer and any additional notes.
USING DATA TO TELL YOUR STORY
Before you begin putting together your final document, take
time to really think about how your data will be analyzed. Any
graphs or tables are there to support your interpretation. Look
everything over for any interesting patterns. Think about what
your results mean. Much of your analysis will be presented
through your writing and explanation, rather than your charts
and figures. And of course, keep in mind your audience and how
your findings will be interesting to them.
No matter what format you are using, and who your
audience is, a good report generally has four sections:
1) An introduction that explains the issue at hand and why
it needs to be addressed. Academic papers often use this
section to discuss previous literature on the subject.
2) A methods section, which explains the data used and any
statistical procedures. One challenge is to explain neither
too much nor too little. For that, you need to know your
audience. For example, if your audience knows basic
statistics, you can simply say, “means and standard
deviations are presented here.” But it is not difficult to
imagine others who might need a bit more explanation.
Likewise, I’ve seen economists use obscure econometric
methods without going into enough detail. One example
in this document is that I assume you know what a .pdf
is, but not a TIFF file.
3) An explanation of results, often tying together the main
idea, previous research, and each table or graph.
4) A conclusion that brings back the “big picture” and makes
specific recommendations.
INTEGRATING GRAPHS, TABLES, AND TEXT
Different projects require different formats. Many reports are
written, while others are presented orally, with a set of slides to
support the presentation. They may be done in a corporate
setting, or by students as part of a class assignment. Here, we
focus on self-produced, written, academic assignments.
Many professional (or aspiring) economists prefer not to use
Microsoft Word or PowerPoint. (Instead, they often use LaTeX,
which supports mathematical equations, for documents.
Presentations are produced with a related LaTeX package called
Beamer.) I’m not going to get into that here. While much of this
explanation is for Word, much of it applies universally, no
matter what software you use.
First, while many academic papers (particularly unpublished
working papers) place all tables and graphs at the end, a
macroeconomic report should have all these elements in the
main body, as close to the text that references them as possible.
Each table or graph can be placed on a separate line, with a
paragraph space (return) above and below.
Make sure you have a concise, clear title (above or below the
element). I have seen lengthy
explanations of multiple variables in table footers; this is mostly
redundant and can be avoided. A good idea is to number tables
or figures (these can have separate numbers, such as “Table 1”
and “Figure 1”). Here, I number all of mine together. In your
text, make sure you refer to each table or figure by its number, as in
“Figure 4 shows…” rather than “The table above shows…”.
Another option for image placement
is to embed it directly in the text. This
might be a good idea if you have wide
paragraphs or are making your own
document (such as a market report).
One trick is that since images “bump”
the text, you have to format them:
right-click on the image, choose “Wrap
Text,” and choose “Tight.” The volatility series from 4.12 is
repeated here, but it clearly needs fixing. There’s not enough
space for a title, and it is very close to the text on its left. For
that reason, I recommend placing larger graphs on separate
lines, without wasting too much space.
Make sure you crop your images if necessary. Right-
clicking the image shows the “crop” tool (which has an
icon similar to the one here on the right). You can leave
some space on each side, but don’t cut off anything important.
Your text itself can be either single- or double-spaced.
Double-spacing is for editing purposes, so that you can make
corrections (or get comments) in the white space. This is good for
a class paper, particularly if this format is requested as part of
the assignment. But if you take a look at any book, magazine, or
professional document, there isn’t any extra space—just enough
so that the text isn’t crowded. Some of these design
considerations are explained later.
Specific length requirements are determined by the nature of
the analysis project, as well as by a course professor or academic
journal. Resist the temptation to excessively pad a document, or
to cut too much to get below a certain maximum. While you
should read and re-read your document for any redundant
phrases or sections, make sure you explain everything that you
need to. Another problem that writers sometimes have is that
they assume they wrote something, when really they just
thought it. Look through your document with this in mind.
resolution the image. If you’ve ever seen a “blown up” picture
that looks boxy or blurry, it is because the resolution is too low.
sure you have zoomed in as much as you can before you take the
snapshot, so that your pasted image is as clear as possible.
artistry. Still, trust your instincts or try to model other
documents you have seen.
Choosing common fonts might not seem very exciting, but it
ensures that your reader will see exactly what you see. If their
computer doesn’t have your font, their software might substitute a
similar, but not identical, font. This might cause weird spacing or
move things around. One way to avoid this is to save your
document as a .pdf in a way that preserves the original fonts.
Make sure your text is the right size for its purpose. Usually,
10- or 12-point font is acceptable for academic papers. Some
fonts are “wider” than others. College professors often specify
fonts and point size, because Courier New takes more space to
write the same words than Arial does. And while many an attempt
has been made to stretch a short paper with 36-point fonts, it is
pretty obvious. Outside of this format, though, you will want to
make sure your reader’s needs are met. For example, font sizes
smaller than 6 points are often illegible, so large tables might
need to be formatted accordingly, on multiple pages.
You also might want to change the default spacing between
lines and paragraphs. There are options in between single- and
double-spacing. If your leading (the vertical spacing between
lines of text) is too small, the bottoms of your
letters will touch the tops of the letters in the line below.
Likewise, too large a leading might result in too much white
space. I generally don’t put any extra space between paragraphs;
this requires me to change Word’s default settings. I do put
space around tables, figures, and headers and footers.
Word’s default margins are 1 inch on each side. You might
wish to adjust these—here, I have 1.5 inches on each side except
the left, which is 2 inches to allow for binding. Even so, because I
single-space, there are more words per page (about 350) than if
it were formatted like a term paper (which would have about
250 words). Sometimes, if you have a large table, you can reduce
the margins to get it to fit. Make sure you don’t go below 0.5
inches, though, because it might not print properly if you do.
One book I like is Document Design: A Guide for Technical
Communicators by Miles Kimball and Ann Hawkins. There are
a number of similar books, which are used by designers. Even if
that’s not your goal, it’s good to know about color, type, and
layout if you are putting together a document that you want
people to read.
PROOFREADING
You also want to make sure your report is well-written and
thoroughly edited. If English is not your first language, get
someone to read it over for common issues (such as when to use
“a” and “the”), where rules don’t seem to apply.
Read through your document multiple times, for different
levels of detail. A quick read-through can help to see the “big
picture,” while going over every word can help catch tiny
mistakes. Make sure you look for things that Word’s squiggles
can’t catch, such as typing “work” for “word.” It’s not a
misspelling, so it won’t be flagged. If you mention people by
name (such as your college professor), make sure you get his or
her name right. The same is true for the person’s title and any
other details. Look it up if you have to.
If you have someone editing your work, don’t be extra-sloppy
and rely on your editor to catch everything. Not only will more
mistakes raise the likelihood that something gets through, but your
editor may also stop being your friend.
AN EXAMPLE
OIL PRICES AND OIL-PRICE VOLATILITY
Here, we combine everything we’ve discussed so far, as well as
our knowledge of macroeconomics, to examine trends in oil
prices. We can go through each of the five steps:
2) GATHERING DATA
For this example, we will use monthly data from FRED. There
are multiple oil prices, but here we will use West Texas
Intermediate (WTI). A search for “WTI” will provide a number of
results, but here, we choose Global price of WTI Crude from
January 1980 to September 2016. The data are not
seasonally adjusted, but this doesn’t seem to cause too big a
problem. Download these data as either an Excel file or a .csv.
While my website has R code that uses these data to generate similar
(but not identical!) results, it is important that
you know how to find data yourself.
Since these data are nominal, we will need to create real
values using a price index. Two options are the Consumer Price
Index (CPI) and the Producer Price Index (PPI); here we use the
PPI because of oil’s importance in industry. A search for “PPI”
shows that monthly data are also available; make sure you
download at least January 1980-September 2016. You can
always cut the longer series. Once you download this series, you can
combine PPI and WTI into a single file. The first column (DATE)
should begin with 1/1/1980, followed by columns for WTI and PPI.
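If you would rather combine the two downloads with a script than by copying and pasting, here is a minimal Python sketch. It assumes each file has a DATE column plus one value column named WTI or PPI (your downloaded headers may differ), and the numbers below are made-up placeholders, not FRED values. The helper names read_series and combine are hypothetical.

```python
import csv
import io

# Illustrative stand-ins for the two downloaded files. The numbers are
# made-up placeholders, NOT actual FRED values.
WTI_CSV = """DATE,WTI
1980-01-01,30.0
1980-02-01,32.0
1980-03-01,31.0
"""
PPI_CSV = """DATE,PPI
1979-12-01,80.0
1980-01-01,82.0
1980-02-01,84.0
1980-03-01,85.0
"""

def read_series(text, value_column):
    """Return {date string: float value} for one two-column CSV."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["DATE"]: float(row[value_column]) for row in reader}

def combine(wti, ppi):
    """Keep only dates present in both series, sorted, as (DATE, WTI, PPI)."""
    common = sorted(set(wti) & set(ppi))  # ISO dates sort chronologically
    return [(d, wti[d], ppi[d]) for d in common]

rows = combine(read_series(WTI_CSV, "WTI"), read_series(PPI_CSV, "PPI"))
for row in rows:
    print(row)
```

Keeping only the dates that appear in both series has the same effect as cutting the longer series by hand.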
3) CALCULATING VARIABLES
Here, we are going to deflate nominal WTI by the PPI (making
sure to multiply by 100). Then, we are going to create three
measures of change: Monthly log changes, monthly percentage
changes, and yearly percentage changes. Finally, we will
calculate 12-month moving standard deviations for the monthly
log-changes series.
E.2. Nominal and Real Oil Prices (dollars per barrel), 1980-2016. Source: FRED.
If you’re not that familiar with Excel, the formula for the
January 1980 real WTI would be 100*(B2/C2); then, you can
copy that cell and paste it into all cells below, or simply drag the lower
right corner of that original cell down to fill in the rest. I named my
new variable RWTI. If you’re curious how Nominal and Real
WTI compare, the nominal value is higher than the real value
after the “base year” (1982-1984), because price levels have
risen. Not controlling for this will make oil prices look higher
than they really are later in the series.
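The same deflation step can be sketched outside Excel. This short Python example is only an illustration: the function name deflate is hypothetical, and the values are placeholders rather than FRED data.

```python
def deflate(nominal, price_index):
    """Real value = 100 * nominal / index, the same as Excel's 100*(B2/C2)."""
    if len(nominal) != len(price_index):
        raise ValueError("series must have the same length")
    return [100.0 * n / p for n, p in zip(nominal, price_index)]

wti = [30.0, 32.0, 31.0, 40.0]      # placeholder nominal prices
ppi = [80.0, 100.0, 100.0, 125.0]   # placeholder index values (base = 100)
rwti = deflate(wti, ppi)
print(rwti)
```

With these placeholder numbers, the real price is above the nominal one when the index is below 100 and below it when the index is above 100, which is the pattern described above.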
Next, we calculate percentage changes. We do this three
different ways, even though we are only going to use one. First,
we calculate log changes, multiplying by 100 to make it into a
percentage. In Excel, you can do this as 100*(LN(D3)−LN(D2)).
Note that the parentheses “nest” the functions, following the
order of operations. The logs are grouped, then this group is
multiplied by 100. We also create monthly percentage changes.
Assuming that Column D is your real WTI, the formula will be
100*(D3/D2−1). This is the same as subtracting before dividing,
as in 100*(D3−D2)/D2, since D2/D2 = 1. We can make annual percentage changes as
well, but we cannot do it until we have a full year of values. If
we start in January 1981, we can subtract the value from
January 1980, as in 100*(D14/D2−1). Make sure that you copy
your formula into all cells in the column, so that it updates to
100*(D15/D3−1) and so on. When you are finished, notice that
the numbers differ between the monthly and yearly versions,
since we are not annualizing them here.
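Here is one way the three measures of change could be written in Python. It is a sketch under my own assumptions: the function names are hypothetical, the 14 monthly values are placeholders, and None stands in for the cells that Excel would leave blank.

```python
import math

def log_changes(x):
    """100*(ln(x_t) - ln(x_{t-1})); the first entry is None (no prior month)."""
    return [None] + [100.0 * (math.log(b) - math.log(a)) for a, b in zip(x, x[1:])]

def pct_changes(x, lag=1):
    """100*(x_t / x_{t-lag} - 1); the first `lag` entries are None."""
    return [None] * lag + [100.0 * (b / a - 1.0) for a, b in zip(x, x[lag:])]

# Placeholder real prices for 14 months (not actual data).
rwti = [40.0, 42.0, 41.0, 43.0, 45.0, 44.0, 46.0,
        48.0, 47.0, 49.0, 50.0, 52.0, 50.0, 51.0]

dlnrwti = log_changes(rwti)           # like 100*(LN(D3)-LN(D2))
momrwti = pct_changes(rwti, lag=1)    # like 100*(D3/D2-1)
yoyrwti = pct_changes(rwti, lag=12)   # like 100*(D14/D2-1)

print(dlnrwti[1], momrwti[1], yoyrwti[12])
```

The yearly series has 12 missing values at the start, just as the Excel column cannot begin until January 1981. For small changes, the log and percentage versions come out close to each other.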
Finally, we can create a rolling standard deviation of log
changes in the oil price (rather than percentage changes). Here,
we do it for 12 months of data. The first value we can calculate
is January 1981, using values beginning in February 1980. The
second value will be February 1981, using values beginning in
March 1980. The formula will be STDEV.P(E3:E14),
STDEV.P(E4:E15), and so on. The first months’ cells will be
blank, since there aren’t enough data to calculate them without
earlier months. We now have all our variables.
I named them RWTI, DLNRWTI, MOMRWTI, YOYRWTI,
and SD12DLN. These abbreviations include “R” for real, “D” for
difference, “LN” for natural log, and “MOM” for “month over
month.” SD12 signifies standard deviations over 12-month
windows. There are other ways to name variables, but these will
be consistent and clear.
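The rolling standard deviation behind SD12DLN can be sketched in Python as well. This is a minimal illustration, assuming that missing values are stored as None; the function name rolling_pstdev is hypothetical and the input series is a placeholder. Python's statistics.pstdev is the population standard deviation, which matches Excel's STDEV.P.

```python
import statistics

def rolling_pstdev(values, window=12):
    """Population SD over each trailing window.

    Returns None until a full window of non-missing values is available,
    like the blank cells at the top of the Excel column.
    """
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        if len(chunk) < window or any(v is None for v in chunk):
            out.append(None)
        else:
            out.append(statistics.pstdev(chunk))
    return out

# Placeholder log changes: None for January 1980, then 13 made-up values.
dln = [None] + [1.0, -1.0] * 6 + [1.0]
sd12 = rolling_pstdev(dln, window=12)
print(sd12)
```

Because the first log change is missing, the first standard deviation appears in the thirteenth position (January 1981) and uses the twelve values that start in February 1980.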
My starting database is presented in E.3. Next, we can graph
and summarize important relationships among the variables.
E.4. Real WTI Oil Price (dollars per barrel), 1980-2016. Source: FRED.
in 2008. When doing your analysis, make sure to look for any
important patterns, and try to explain them in terms of real
events. We can see these patterns as well with our graph of log
changes in the real oil price, which shows a sharp drop in the
mid-1980s, a spike around the 1991 Gulf War, and some large
fluctuations after 2013. The source is again named in the footer.
E.5. Real WTI Oil Price (monthly log changes), 1980-2016. Source: FRED.
For these and all the Excel graphs here, I made some
important changes to the default settings. First, I made a
column of just the years (cutting off the months and days) using
“text to columns,” and used that for my date axis. I made the
axis text darker and larger, so that it can be shrunk on a page
and still be legible. I also made sure to set the interval between
dates to 60 (months), so that the values appear only every five
years. I added vertical and horizontal lines to the axes. I also
made the time-series line thinner, but that is a matter of
personal taste. I printed each figure to a 600 dpi .jpg file, then
inserted it into my main document, and then rotated it and
cropped it to fit. If your goal is an academic report, you are
probably fine copying and pasting.
Your main goal is to be legible. There is no “correct” font or
line style for most academic documents, other than what works.
The rule of thumb is to look at your document in its final form
and see if you can actually read it. If it’s too small or the text is
too light, you might have to go back and adjust your figures. I
suggest right-clicking on all parts of an Excel graph (main chart
area, title, and horizontal and vertical axes) and looking at all
the options. Taking time to try different options helps you learn
how to do it, so it is time well spent if you plan on doing more
graphs in the future.
Our graph of rolling standard deviations is formatted much
the same way. Note that while the series doesn’t start until 1981,
I did not adjust the dates on the graph. This makes it easier to
compare across graphs, plus it keeps the listed dates as
multiples of 5 (rather than 1981, 1986, etc.).
We see important economic patterns as well. Here, the time
periods mentioned above (1980s, 1991, 2008, and after 2013)
show large volatility in the oil price. Your analysis would seek to
explain this.
Having all our charts, we will next make two tables: First,
we will present the correlations among our three measures of
price changes. Second, we will make a summary table for WTI,
RWTI, DLNRWTI, and SD12DLN.
We first calculate the correlations among log changes,
monthly percentage changes, and annual percentage changes.
We will calculate three separate correlation coefficients, because
there are three unique pairs among our three series.
Econometric software often calculates multiple pairs’
correlations more quickly, but it’s not hard to do it with the
CORREL() formula here. Just make sure you select each column
for each member of the pair separately. Applying some
formatting (horizontal lines, cell spacing, and changing the font
to Arial 10) and adding names gives us the following table:
sake and not really discussed. There’s often not much to say, so
don’t try to milk too much of a story out of it. It is necessary to
include, however, so your reader gets a sense of the overall data.
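The three pairwise correlations among the measures of change can also be computed with a short script. The sketch below writes out the Pearson formula by hand; the function name correl is borrowed from the Excel formula for readability, the values are placeholders, and dropping any row where either value is missing is my assumption about how the blank cells should be handled.

```python
import math
from itertools import combinations

def correl(x, y):
    """Pearson correlation over rows where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    var_x = sum((a - mean_x) ** 2 for a, _ in pairs)
    var_y = sum((b - mean_y) ** 2 for _, b in pairs)
    return cov / math.sqrt(var_x * var_y)

# Placeholder series (not actual results); None marks a blank cell.
series = {
    "DLNRWTI": [None, 4.9, -2.4, 4.8, 4.5, -2.2],
    "MOMRWTI": [None, 5.0, -2.4, 4.9, 4.7, -2.2],
    "YOYRWTI": [None, None, 10.0, 12.0, 8.0, 9.0],
}

# Three series give three unique pairs.
results = {(a, b): correl(series[a], series[b]) for a, b in combinations(series, 2)}
for (a, b), r in results.items():
    print(a, b, round(r, 3))
```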
First, following the standard Box-Jenkins procedure, we
establish the order of our ARIMA model using Autocorrelation
and Partial Autocorrelation Functions (ACFs and PACFs). I
printed them to .jpg files for E.9, because pasting them from
Eviews doesn’t reproduce them perfectly. I suggest going further
and cropping out everything but the bar graphs.
I tried different combinations of ARMA(p,q) from (1,0) to (2,2),
and settled on a simple AR(1) for my base model. Remember,
Eviews output looks like this (E.10):
E.10. Eviews output for an AR(1) estimation of log changes in real oil prices.
E.11. Estimation results, ARIMA estimation of log changes in real oil prices.
We can then estimate a GARCH(1,1) model. I use Bollerslev-
Wooldridge heteroskedasticity-consistent standard errors and
get what is in E.12:
Mean Equation
Variable    Coeff.   (p-value)
Constant    -0.393   (0.288)
AR(1)        0.216   (0.000)

Variance Equation
Variable    Coeff.   (p-value)
Constant     1.339   (0.038)
ARCH(1)      0.295   (0.003)
GARCH(1)     0.730   (0.000)

AIC          6.627
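To see how the variance-equation coefficients turn into a volatility series, here is a minimal Python sketch of the standard GARCH(1,1) recursion, in which each period's variance equals the constant, plus the ARCH coefficient times last period's squared residual, plus the GARCH coefficient times last period's variance. It is not how Eviews estimates the model. The coefficients are the ones reported in the table above, while the residuals, the starting variance, and the function name garch_variance are hypothetical placeholders.

```python
def garch_variance(resid, omega, alpha, beta, initial_variance):
    """sigma2[t] = omega + alpha * resid[t-1]**2 + beta * sigma2[t-1]."""
    sigma2 = [initial_variance]
    for e in resid[:-1]:
        sigma2.append(omega + alpha * e ** 2 + beta * sigma2[-1])
    return sigma2

# Variance-equation coefficients as reported in the table above.
OMEGA, ALPHA, BETA = 1.339, 0.295, 0.730

# Placeholder mean-equation residuals and starting variance (assumptions,
# not estimates from the oil-price data).
resid = [0.5, -3.0, 8.0, -1.0, 0.2]
sigma2 = garch_variance(resid, OMEGA, ALPHA, BETA, initial_variance=50.0)
volatility = [v ** 0.5 for v in sigma2]   # conditional standard deviation

print("persistence (ARCH + GARCH):", round(ALPHA + BETA, 3))
print(volatility)
```

The reported ARCH and GARCH coefficients sum to 1.025, so in this recursion a large shock fades from the variance very slowly.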
just put it in the text). But you can make the same statements
and draw similar conclusions regardless of the methods that you
use.
APPENDIX
R code for recreating tables and/or graphs for the Japanese yen,
Madison employment, and the WTI exercise is available at
www.scotthegerty.com or bit.ly/2SlGXU0.
Software
www.r-project.org www.Eviews.com
gretl.sourceforge.net www.stata.com
Appendix: Data Sources and Variables