
ECON20003 – QUANTITATIVE METHODS 2

TUTORIAL 1

During the tutorial class, work at your own pace. If you run out of time or miss the class for some
reason, make sure that you catch up by the next tutorial.

After you have completed the tutorial exercises, you might start working on the “Exercise for
Assessment” during this class. Note that your solutions and answers to the additional exercises
might be collected and marked by your tutor during the next class, so copy and save everything
in a Word document and bring a hard copy ready to be handed in at the next tutorial.

The first and second tutorials serve as an introduction to R and RStudio.

About R and RStudio

In QM2 tutorials some of the calculations will be performed manually (i.e. with your Casio
fx-82 calculator) to get a feel for statistics and to help you understand the various statistical
procedures, but the emphasis will be on learning how to use two free and open-source software
programs.

The first program is R, which was originally created by Ross Ihaka and Robert Gentleman in
the 1990s at the University of Auckland, New Zealand, and is currently developed by the R
Core Team. It is a programming language and environment for statistical computing and data
visualization. The second program is RStudio, an integrated development environment (IDE)
for R. Among other things, it comprises the R environment, an advanced text editor, and the
help system of R. RStudio itself does not perform any statistical operations; they are performed
by R in the background, and R can be used perfectly well without it. However, RStudio makes
working with R much easier, as it makes it more convenient to set up a working directory and
access files on the computer, to write and execute R code, and to view and use the various
available R objects.

R has a text-based interface where one can enter R commands. These commands are
processed by R and the results (if any) are printed on the screen. This interface is not as
elegant and convenient as the point-and-click interface of popular proprietary statistical /
econometric packages, such as that of EViews, which makes it relatively hard to learn how to
use R. Although RStudio certainly flattens the learning curve, you might still wonder why you
should bother to use R and RStudio at all, rather than some more stylish program like EViews, or
even Microsoft Excel, which most students already have on their computers and are familiar with.

The main advantage of R and RStudio compared to EViews is that anybody can download and
install them on their own computer free of charge. This way students can work on the tutorial
exercises and assignment questions off campus at any time. As for Excel and similar
spreadsheet programs, they do not naturally lend themselves to ‘serious’ data analysis. They
can be convenient for basic statistics, but using spreadsheets for more complex calculations is
cumbersome and generally a bad idea in the long run. By striving to do so you would dig
yourself into a very deep hole.

L. Kónya, 2020, Semester 1 ECON20003 - Tutorial 1

R is an open-source program. On the one hand, this is a disadvantage because it means that
there is no single company that looks after the maintenance and development of R. That
is why R is not as neat as commercial packages. On the other hand, it is an advantage because
it makes R highly extensible. Being an open-source program, R has a vast community both in
academia and in business who create and maintain literally thousands of well-documented
extension packages for a wide variety of statistical and graphical techniques, greatly extending
the base functionality of R. These packages are available free of charge and can be downloaded
from a worldwide repository system called the Comprehensive R Archive Network (CRAN). In
October 2019, it featured more than 15,000 contributed packages.[1]

R and RStudio are installed in the computer labs where the QM2 tutorials are held. If you wish
to have them on your own computer, read the rest of this section, which briefly describes how to
download and install R and RStudio on Windows. Otherwise, just skip the rest of this section
and move on to the next one, ‘Getting Started with R and RStudio’.

To download R onto your computer, visit the Comprehensive R Archive Network (CRAN)
website (https://cran.r-project.org/) and select the download link for your operating system.

If, for example, you use Windows, select Download R for Windows. Then, select ‘install R for
the first time’ (see next page) to download the latest version of R, which in January 2020 is
R-3.6.2.

[1] To get some idea about the ever-expanding library of R extension packages, you might wish to visit the following
site: https://rviews.rstudio.com/2019/07/24/june-2019-top-40-r-packages/.

On the web page that opens (see below), click ‘Download R 3.6.2 for Windows’, choose ‘Save
File’, and once the R-3.6.2-win.exe file has been downloaded, double-click on it to install R.
Follow the instructions; there is no need to change the default installation parameters.

Although R is a fully functional standalone program, RStudio can assist in writing, compiling,
debugging and executing R code. For this reason, it is a good idea to have RStudio on your
computer too.

To install RStudio, visit https://www.rstudio.com/products/rstudio/download/, click ‘Download’
under the free ‘RStudio Desktop - Open Source License’ option, choose the installer for your
operating system and just follow the instructions.[2][3]

Getting Started with R and RStudio

When both R and RStudio are installed on a computer, R can be used either outside or inside
RStudio.

Although for most users it is more convenient to use R from within RStudio, for the sake of
illustration, launch R first from the Start menu or by clicking on its shortcut on the desktop, to
check whether it has been installed properly. A window, like the one on the next page, should
appear on your screen. It shows the RGui (R Graphical user interface) window with the R
Console in it, a panel in which you can type R commands, submit them for execution, and view
the results.

As you can see in the last paragraph in the R Console, you can access some demos or get
some help by typing demo(), help() or help.start() after the red > prompt. Do not worry about
them at this stage, just type q() and press Enter to quit R.

[2] Note that if you are on a 32-bit system, you need to download and install an older version of RStudio.
[3] Make sure to have R on your computer before installing RStudio.

In response, R displays a dialog box asking whether to save the workspace image.

The workspace is your current R working environment. It is a snapshot of your work to the point
of saving it and it includes all objects that you created during the current session, or have
loaded from a previous session.

If you click the ‘Yes’ button, R saves an image of your workspace and reloads it automatically
the next time you start R. This is often a convenient option, especially if you have not completed
your project yet and are merely suspending it. However, even in this case it is probably better
to save the objects you intend to keep from time to time during every session and not to wait
until you quit R. This time you do not have any objects in your workspace yet, so just click the
‘No’ button.

Now launch RStudio from the Start menu or by clicking on its shortcut on the desktop and wait
for its window to appear.[4]

[4] Do not worry if some of the details on your screen are not the same as on the screenshot below. You probably
have different Global Options.

The Main Menu, just below the Title Bar, is a set of drop-down menus titled File, Edit, Code,
View, Plots, Session, Build, Debug, Profile, Tools, and Help. These (sub-)menus can be used
in the same way as drop-down menus in other Windows programs.

When the program is launched the first time on a computer, under the Title Bar and the Main
Menu, RStudio displays three panels or windows: the Console/Terminal/Jobs panel (left), the
Environment/History/Connections panel (top-right), and the Files/Plots/Packages/Help/Viewer
panel (bottom-right).

RStudio has a fourth panel as well, but at this stage it is hidden by default. To open it, click the
File drop-down menu and choose New File / R Script. You should now have two panels on the left
half of your screen: the new Source panel (top-left) and the Console/Terminal/Jobs panel
(bottom-left).

The Source panel (top-left) serves as a built-in text editor that allows you to create a new R
script or to open a file containing an existing R script. An R script, in general, is just a text file
with the .R extension that keeps a record of your R code. By default, every new script created
by RStudio is called Untitled. If you save it, your code will be available when you re-open RStudio.

The Console/Terminal/Jobs panel (bottom-left) is for entering R commands that execute
immediately and for viewing their output (Console tab), for accessing the system shell
directly from the RStudio IDE (Terminal tab), and for running R scripts in batch mode in the
background while the user works interactively in a separate R session[5] (Jobs tab).

The Environment/History/Connections panel (top-right) shows the list of R objects (i.e., data
frames, arrays, values and functions) you have created in the Console during your R session
(Environment tab), the history of all previous commands (History tab), and all the existing and
currently active connections to supported data sources (Connections tab).

Finally, the Files/Plots/Packages/Help/Viewer panel (bottom-right) has a navigable file manager
that shows all the files that are currently available in the working directory (Files tab), displays
the plots and charts you have created (Plots tab), shows the R packages that are installed on
the computer and those that can be installed (Packages tab), lets you search the R
documentation for help directly from RStudio (Help tab), and allows you to view local web
content (Viewer tab).

Before you begin working in RStudio, a working directory must be set up. It is simply a folder
that serves as the default location for all project files (input data sets, plots and other objects)
read into R and saved out of R.

To check the current working directory, click on the Files tab. As you can see on the screenshot
below, my working directory is at > F: > Dady > Teaching > Quantitative Methods 2 > Tutorials
> 2020 > R.

[5] In batch mode, a series of commands is run to completion without manual intervention, while in interactive
mode the user types an instruction into the command line, the instruction is executed, the result is displayed
on screen, and then the user is prompted to enter the next command.

You should have your working directory in a convenient place where you can easily find it. Once
you have set up a folder there, you can set the working directory by following the Session / Set
Working Directory / Choose Directory… menu steps and navigating to your folder.[6]

To check the current working directory, in the Console type:

getwd()
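
As a rough sketch of how the working directory can also be handled from code rather than from the menus, the lines below query it with getwd(), switch to another folder with setwd(), and then switch back. The folder name qm2_demo and the use of R's temporary directory are assumptions made purely for illustration; in practice you would use your own folder.

```r
# Remember the current working directory so that it can be restored later
old_wd <- getwd()
print(old_wd)

# Create a hypothetical demo folder inside R's temporary directory
demo_dir <- file.path(tempdir(), "qm2_demo")
dir.create(demo_dir, showWarnings = FALSE)

# Make the demo folder the working directory and confirm the change
setwd(demo_dir)
print(basename(getwd()))

# Return to the original working directory
setwd(old_wd)
```

Storing the original location first makes it easy to undo the change, which is handy when experimenting.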

Then, click on the Help tab to open the Help Home page (see the screenshot below). It provides
links to extensive online help both for R and RStudio. Look at some of the options, for example,
the RStudio Cheat Sheets and the Search Engine & Keywords links.

[6] Alternatively, the working directory can be set by executing the setwd("file location") command.

Finally, quit RStudio by following the File / Quit Session… menu steps.[7] RStudio will ask you
whether to save an image of your workspace to your working directory. Since you have not
done any work yet and thus there is nothing to be saved, just click on the Don’t Save button.

[7] Alternatively, you can type q() in the Console.

Basic Data Handling

Exercise 1

Consider the table below. It displays the name, gender, age (years), height (cm) and weight (kg)
of six teenagers. Each row is a case and each column is a variable. Age, Height and Weight
are quantitative variables, while Name and Gender are qualitative variables as they are made
up not of meaningful numbers but of letters.[8] Enter this data into RStudio.

Name      Gender   Age (years)   Height (cm)   Weight (kg)
Alfred    M        14            175           51
Alice     F        13            142           38
Barbara   F        14            157           46
Henry     M        15            170           61
John      M        16            178           75
Sally     F        16            160           54

Launch RStudio. It is highly recommended to create a new RStudio project every time you start
working on a new tutorial exercise.

An RStudio project is a working directory designated with an .Rproj file that keeps the workspace,
command history and source documents together in one place. Projects are not mandatory for
working in RStudio, but they are useful as they make it straightforward to divide your work into
multiple contexts and to separate them from each other.

To create a new project, click File / New Project…. The program will ask you whether to start
the new project in a brand new working directory or associate it with an existing working
directory (see the first screenshot on the next page).[9]

[8] Note that even if we used some numbers to denote the possible categories, e.g. 1 for female and 2 for male,
Name and Gender would still be qualitative variables because we could not use them in any meaningful calculation.
[9] There is a third option as well, “Checkout a project from a version control repository”, but it is not relevant for us.

By default, an RStudio project inherits the name of the folder in which it is saved. Hence, to
keep every project separate, it is necessary to start every new project in a new working
directory.

For this reason, select the first option, New Directory, and then Project Type: New Project. In
the dialogue window that opens (see the next page), enter t1e1 in the Directory name box and
your preferred root directory, i.e. “file/path”, in the Create project as subdirectory of box, and
click on the Create Project button.

In return, RStudio creates a new folder named t1e1 in your working directory and saves the
t1e1.Rproj project file in it (see the second screenshot on the next page).

Next, create a new script by following the File / New File / R Script menu steps and then click
on the save icon or follow the File / Save As… menu steps to save the Untitled script under a
new name, say t1e1.

You should always save every new untitled script under a unique name before even starting to
type in it to make sure that you do not lose it unexpectedly if something crashes on your
computer while you are using R.

Now you should have two items on the Files tab with the same name, t1e1, but different
extensions, .Rproj and .R. They are both saved in the t1e1 folder of your working directory (see
the screenshot on the next page).

Having initialized the project, we are now ready to enter the data into RStudio. Yet, before doing
so, it is important to introduce a few concepts and definitions.

In every software package and programming language, the various pieces of information or data
used in a program need to be stored in some reserved memory locations. R does not provide direct
access to these locations but offers several specialized data structures, called objects. R is an
object-based program: everything is treated as an object and is referred to through some
symbol or variable. The symbols themselves are also objects and can be manipulated in the
same way as any other object.

The five most frequently used R objects or data structures are the atomic vector, list, matrix,
data frame, and array. They can be classified by their dimensionality, i.e. one-dimensional (atomic
vector and list), two-dimensional (matrix and data frame), or n-dimensional (array), and by whether
they contain a single type of contents (homogeneous: atomic vector, matrix and array) or can contain
different types of contents (heterogeneous: list and data frame). Hence, atomic vectors and lists
are both one-dimensional, but the former contain a single type of contents while the latter can
contain different types of contents. Similarly, matrices and data frames are both two-dimensional,
but the former contain a single type of contents while the latter can contain different types of
contents.

R distinguishes six basic data types: character (e.g. “Laszlo”, “True”, “3.14”), numeric[10] (e.g.
201, 3.14), integer (e.g. 2L, where the L suffix forces 2 to be stored as an integer), logical
(TRUE, FALSE), complex (e.g. 2+3i, where i is the imaginary unit, defined as the square root
of -1), and raw (used to store data byte by byte).
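
To make the six data types a little more tangible, here is a small sketch that asks R how it stores a few example values. It uses the base R function typeof(); the object names values and types are arbitrary choices for this illustration.

```r
# One example value (or two) for each of the six basic data types
values <- list("Laszlo", "3.14", 201, 3.14, 2L, TRUE, 2+3i, as.raw(255))

# typeof() reports the internal storage type of each value;
# numeric values are reported as "double"
types <- sapply(values, typeof)
print(types)

# class() uses the name "numeric" for the same values
print(class(3.14))
```

Note that "3.14" in quotation marks is a character string, not a number, even though it looks like one.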

An atomic vector is the simplest type of data structure. It is a one-dimensional array of
contiguous cells containing a single type of data. For example, (1, 2, 3) is a numeric atomic
vector and (“one”, “two”, “three”) is a character atomic vector. All other R objects are built upon
atomic vectors. For example, {(1, 2, 3), (“one”, “two”, “three”)} is a list object that combines a
numeric atomic vector and a character atomic vector.
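
The following sketch shows one way the five data structures could be created in R with the base functions c(), list(), matrix(), array() and data.frame(). All object names (num_vec, chr_vec, my_list and so on) are made up for this illustration.

```r
# Atomic vectors: one-dimensional, a single type of contents
num_vec <- c(1, 2, 3)
chr_vec <- c("one", "two", "three")

# Mixing types in an atomic vector forces everything to a single type
mixed <- c(1, "two", 3)
print(typeof(mixed))          # all elements are stored as character

# List: one-dimensional, can hold different types of contents
my_list <- list(num_vec, chr_vec)

# Matrix (two-dimensional) and array (n-dimensional): a single type of contents
m <- matrix(1:6, nrow = 2)
a <- array(1:24, dim = c(2, 3, 4))
print(dim(m))
print(dim(a))

# Data frame: two-dimensional, columns can have different types
df <- data.frame(number = num_vec, word = chr_vec)
print(df)
```

The mixed vector illustrates why heterogeneous data call for a list or a data frame rather than an atomic vector.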

[10] It is also known as double.

In our example, there are five variables (Name, Gender, Age, Height, Weight) and six data
points for each. The first two variables, Name and Gender, are qualitative (also known as
categorical) and the observations on them can be stored in two character-type atomic vectors.
The other three variables (Age, Height, Weight) are quantitative (also known as numerical) and
the observations on them can be stored in three numeric-type atomic vectors.
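
For comparison with the spreadsheet-based data entry used in this exercise, here is a sketch of how the same five atomic vectors could be typed directly as R code with c(). This is an alternative route, not the method the tutorial follows; the data frame name teens is a hypothetical choice.

```r
# Two character-type atomic vectors for the qualitative variables
Name   <- c("Alfred", "Alice", "Barbara", "Henry", "John", "Sally")
Gender <- c("M", "F", "F", "M", "M", "F")

# Three numeric-type atomic vectors for the quantitative variables
Age    <- c(14, 13, 14, 15, 16, 16)
Height <- c(175, 142, 157, 170, 178, 160)
Weight <- c(51, 38, 46, 61, 75, 54)

# Optionally, combine the vectors into a single data frame
# with one row per teenager and one column per variable
teens <- data.frame(Name, Gender, Age, Height, Weight)
print(teens)
```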

A data set can be entered in RStudio either by typing it straight from the keyboard into an RStudio
spreadsheet or by importing data previously saved in the native R format or in a foreign file
format. Although you will usually import data from Excel spreadsheets, it is useful to start by
entering our small data set from the keyboard into an RStudio spreadsheet.[11]

There are two places for typing commands in RStudio, the Source panel and the Console tab.
The main difference between these two options is that the Source panel is just a built-in text
editor, whereas RStudio interacts with R via the Console. Hence, a command that you type in
Source is not evaluated after hitting the Enter key. You need to instruct RStudio to do so by
highlighting the code in the Source panel that you want to be evaluated and clicking on the Run
button on the top right of the Source panel. In return, RStudio sends the highlighted code to the
Console, where R evaluates and executes it.

A command that you type straight in the Console gets executed automatically after you press
Enter. For this reason, typing commands straight in the Console might seem to be the more
convenient and better option, especially when the code is very short. However, not all code
can be executed interactively, i.e. command by command; some of it must be entered in
the Source panel and executed in batch mode. Moreover, the content of the Console panel is
not editable, so if you make a mistake in typing your code into the Console, you need to re-type
everything. Conversely, you can edit your code in the Source panel and save it
for future use. For these reasons, it is highly recommended to use the Source panel rather than
the Console tab right from the start.

To enter our data from the keyboard to an RStudio spreadsheet, type data.entry(1) in the
Source panel and click on the Run button in the menu bar of the Source panel.

[11] In Exercise 3 of Tutorial 2 you will learn how to import the data from an Excel spreadsheet to an RStudio
spreadsheet.

In return, RStudio echoes the command in the Console and opens the Data Editor window (see
the screenshot on the next page).

The RStudio Data Editor looks like an Excel spreadsheet with rows and columns. The active
cell is highlighted by thickening its borders and you can navigate in the spreadsheet by using
the left/right and up/down arrows on your keyboard. There is no constraint on the number of
rows or columns in use; the grid scrolls automatically when you reach the last
visible column or row in the spreadsheet.

As you can see, at this stage both the first variable and its first value are “1”. To rename the
first variable to Name, click on the first header cell, and in the Variable editor dialogue
window that opens enter variable name: Name, specify that it has character type values and
click X in the upper right corner (see the next page).

When you name an R object, you must keep in mind the following rules:

(i) a name can be a combination of letters, numbers, periods (.) and underscores (_), but it
cannot start with a number or an underscore;
(ii) a name can contain neither the ^, !, $, @, +, -, ?, * special characters nor spaces;
(iii) R is a case-sensitive language, so variable ‘A’ and variable ‘a’ are treated as two
different variables in R;
(iv) if you give a new object the name of an existing object, R overwrites any previous
information stored in the existing object without warning or asking for permission.
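
The naming rules above can be tried out with a short sketch. It relies on the base R function make.names(), which converts any string into a syntactically valid name, so a candidate that comes back unchanged is valid as it stands. The candidate names and the objects A and a are invented for this illustration.

```r
# Candidate names: the first three follow the rules, the last three do not
candidates <- c("Height", "height.cm", "weight_kg", "2nd", "my var", "a-b")

# A candidate is valid if make.names() leaves it unchanged
valid <- candidates == make.names(candidates)
print(data.frame(candidates, valid))

# Rule (iii): R is case sensitive, so A and a are two different objects
A <- 1
a <- 2
print(A == a)

# Rule (iv): reusing an existing name overwrites the old content silently
a <- "overwritten"
print(a)
```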

Enter the data for Name by typing in the names of the teenagers one by one and hitting Enter
after each. If you need to navigate, just use the up and down arrows. If an entry does not fit in
the cell, right-click on the column and select the Autosize column option from the drop-down
menu that opens.

To create the second variable, click on the second header cell. In the Variable editor dialogue
window that opens, enter variable name: Gender, select type: character again, and type in the
genders of the teenagers. Create the remaining three variables, Age, Height and Weight,
similarly, except that you choose type: numeric for each. This is important because otherwise the
program would treat the values of these variables as strings of characters rather than numbers
and it would be impossible to perform any arithmetic operation on them.
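
To see why the numeric type matters, consider the sketch below, in which a few ages are stored as character strings by mistake. The object names age_chr, age_num and attempt are hypothetical; tryCatch() is used only so that the expected error does not stop the script.

```r
# Ages stored (wrongly) as character strings
age_chr <- c("14", "13", "14")

# Arithmetic on character strings fails; catch the error instead of stopping
attempt <- tryCatch(age_chr + 1, error = function(e) "error")
print(attempt)

# After conversion to numeric, arithmetic works as expected
age_num <- as.numeric(age_chr)
print(age_num + 1)
print(mean(age_num))
```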

At this stage your data editor should look like this:

If you are satisfied with what you see on your screen, close the Data Editor window by clicking
X in its upper right corner or by following the File / Close menu steps. Either way, the data set
is added to your R working environment and displayed in the alphabetic order of the variable
names on the Environment tab of the top-right panel:

As you can see, the Environment tab displays not only the names of the variables, but also the
type (num or chr), the length ([1:6]) and the elements of the atomic vectors.
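
Similar information can be requested from the Console or a script. The sketch below assumes that two of the vectors from this exercise exist and uses the base R functions str(), class() and length() to report their type and length; the vectors are recreated here only so that the example is self-contained.

```r
# Recreate two of the exercise variables so the sketch runs on its own
Name <- c("Alfred", "Alice", "Barbara", "Henry", "John", "Sally")
Age  <- c(14, 13, 14, 15, 16, 16)

# str() gives a compact summary: type, index range and the first elements
str(Name)
str(Age)

# class() and length() return the type and the number of elements separately
print(class(Name))
print(class(Age))
print(length(Age))
```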

Any time you enter some data to RStudio, it is recommended to save it. RStudio has four save
toolbar buttons. Two of them are on the left in the main menu bar, the third is below them on
the Source panel, and the fourth is on the right in the Environment/History/Connections panel.
The two buttons in the main menu bar are for saving the script you are currently working on and
all open documents, respectively, while the right save button is for saving your environment
(that is, the objects in your workspace).

Save this data set as t1e1 in your preferred location, “file/path”, using the Save button on the
right of your screen in the Environment tab. RStudio echoes this command in the Console:

save.image("file/path/t1e1.RData")

This shows that RStudio saved the file named t1e1 in RData format, which is specific to R and
can store any number of R objects within a single file. It contains not only your data set
but your entire workspace, i.e. all the objects in your R working environment. You should now
see this new file as well on your Files tab.
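
As a hedged illustration of how an RData file stores and restores R objects, the sketch below saves two objects, removes them, and loads them back. It uses save() with explicitly named objects instead of save.image(), so that it does not write out your whole workspace, and it writes to R's temporary directory; the file name t1e1_demo.RData is an assumption made for this example.

```r
# Two objects to be stored
Age    <- c(14, 13, 14, 15, 16, 16)
Height <- c(175, 142, 157, 170, 178, 160)

# Save the named objects into a single RData file in the temporary directory
rdata_file <- file.path(tempdir(), "t1e1_demo.RData")
save(Age, Height, file = rdata_file)

# Remove the objects, then restore them from the file
rm(Age, Height)
load(rdata_file)
print(Age)
print(Height)
```

save.image() works along the same lines but stores every object in the workspace rather than a chosen few.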

Quit RStudio by following the File / Quit Session… menu steps. RStudio warns you that your
RData and R files have unsaved changes.

Since you have just created the t1e1.RData file and have not executed any command since
then, there is no need to save the Workspace image. Hence, deselect the first file and click on
the Save Selected button.

After RStudio has shut down, open Windows File Explorer and check the t1e1 folder on your
hard drive or USB key, wherever it is. You should have four items in the folder: an unnamed
RHistory file, an R file, an R Workspace file and an R Project file (see the second screenshot on
the next page).

Finally, relaunch RStudio. By default, RStudio returns to the latest project, in this case
t1e1.Rproj. Hence, it displays the t1e1.R file in the Source panel and shows the content of the
t1e1 project folder on the Files tab.

Quit RStudio again without saving anything.

Exercise for Assessment

Exercise 2

One of the major measures of the quality of service provided by any organisation is the speed
with which the organisation responds to customer complaints. Last year the flooring department
of a large family-owned department store received 50 complaints about carpet installation. The
following data represent the number of days between the receipt and resolution of these
complaints.

Days
 54   35   29    2    1
 11  126    4   35   26
 12  165   27   26   74
 13    5   29   22   26
 33  137   28  123   14
  5  110   52   94   20
 19   32  152   25   27
  4   27   61   36    5
 10   31   29   81   13
 68  110   30   31   23

a) Is the variable Days qualitative or quantitative? If it is quantitative, is it discrete or
continuous? In addition, determine its level of measurement. Explain your answers.

b) Launch RStudio and close the Script tab, if it is open. Create a new RStudio project and
script, and name both t1e2.

c) Enter the observations from your keyboard to an RStudio spreadsheet and save them in
an RData file. Quit RStudio. When prompted, save only the t1e2.R file.

d) Open your working directory. Capture your screen by taking a screenshot (Alt + Print
Screen) and paste it with your answers for part (a) in a Word document.
