
LECTURE NOTES FOR

ECO232 COMPUTER APPLICATIONS II

Part One

Using R Commander
for Data Analysis

March 2015
Written by

N.Nilgn oka

Introduction to R Commander
What is R Commander
The R Commander (Fox, 2005) provides a graphical user interface ("GUI") to the open-source
R statistical computing environment. R is a command-driven system, and new users often
find learning R challenging. This is particularly true of those who are new to statistical
methods, such as students in basic-statistics courses.
R Commander is free statistical software. It was developed by Prof. John Fox as an easy-to-use
GUI for R (a freeware statistical programming language), so that statistics courses can be
taught without software complexity getting in the way of learning statistics. This means it has
drop-down menus that drive the statistical analysis of data.
It also has a series of plug-ins which extend its range of applications:
RcmdrPlugin.Export          Graphically export objects to LaTeX or HTML
RcmdrPlugin.FactoMineR      Graphical User Interface for FactoMineR
RcmdrPlugin.HH              Rcmdr support for the HH package
RcmdrPlugin.IPSUR           Introduction to Probability and Statistics Using R
RcmdrPlugin.SurvivalT       Rcmdr Survival Plug-In
RcmdrPlugin.TeachingDemos   Rcmdr Teaching Demos Plug-In
RcmdrPlugin.epack           Rcmdr plugin for time series
RcmdrPlugin.orloca          orloca Rcmdr Plug-in
Installing R Commander
You need to install R first and then R Commander. After starting R, open the Packages
menu and select a CRAN mirror.


Then choose to install packages from the same menu. In the Packages dialog box select Rcmdr.


After installing Rcmdr, you must load it before you can work with it. When loading is done you
will see a screen like the one in the picture.
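The same two steps can also be typed at the R console. This is a minimal sketch that assumes an internet connection and a CRAN mirror that R can reach; the installation is needed only once.

```r
# Install the Rcmdr package from CRAN (only needed the first time).
install.packages("Rcmdr")

# Load the package; loading Rcmdr opens the R Commander window.
library(Rcmdr)
```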
Rcmdr Screen
Drop-down menus

Toolbar

Script Window: R commands generated by the GUI.
You can type commands directly here. Select them by highlighting and
then send the code by pressing the Submit button (on the right, below the
script window).

Output Window
DARK BLUE: printed output
RED: command that was used

Message Window
RED: error messages
GREEN: warnings
BLUE: other information

Graphs will appear in a separate Graphics Device Window. Only the most recent graph will
appear. You can use page up and page down keys to recall previous graphs.
Drop down Menu item
File
Menu items for loading and saving script files; for saving output and the R
workspace; and for exiting.
Edit
Menu items (Cut, Copy, Paste, etc.) for editing the contents of the script and
output windows. Right clicking in the script or output window also brings up an
edit context menu
Data
Submenus containing menu items for reading and manipulating data.
Statistics
Submenus containing menu items for a variety of basic statistical analyses.
Graphs
Menu items for creating simple statistical graphs.
Models
Menu items and submenus for obtaining numerical summaries, confidence
intervals, hypothesis tests, diagnostics, and graphs for a statistical model, and
for adding diagnostic quantities, such as residuals, to the data set.
Distributions
Probabilities, quantiles, and graphs of standard statistical distributions (to be
used, for example, as a substitute for statistical tables).
Tools
Menu items for loading R packages unrelated to the Rcmdr package (e.g., to
access data saved in another package), and for setting some options.
Help
Menu items to obtain information about the R Commander (including an
introductory manual derived from this paper). As well, each R Commander
dialog box has a Help button.


Toolbar buttons
Data set
Shows the name of the active dataset.
Button: allows you to choose which of the datasets currently in memory is
active.
Edit data set
Allows you to open the active dataset for editing.
View data set
Allows you to view the active dataset.
Model
Shows the name of the active statistical model, e.g. a linear model.
Button: allows you to choose among the models currently in memory.

Menu items are inactive (i.e., greyed out) if they are not applicable to the current context.

Manual entry
Start a new data set through Data -> New data set
Enter a new name for the dataset -> OK

Note: the name cannot have spaces in it


Note: R is case-sensitive, hence mydata and MyData are different names
A data editor window opens, where you can type in your data using a typical spreadsheet format.
Each row corresponds to an independent object, e.g. a subject on which a measurement was
made.

Define the variables (columns) by clicking on the column label and then entering the name and
type in the resulting dialog box. The type can be numeric (quantitative) or character
(qualitative). Click on the x in the right-hand corner to close this dialog box.
This data frame is then the active dataset for R Commander.
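For comparison, a small data frame can also be built directly in the Script window. The sketch below uses made-up values and the hypothetical names mydata, subject and glucose; it only illustrates the row/column layout and the case-sensitivity note.

```r
# Hypothetical values, for illustration only.
mydata <- data.frame(
  subject = c("A", "B", "C"),   # character (qualitative) variable
  glucose = c(5.1, 6.3, 4.8),   # numeric (quantitative) variable
  stringsAsFactors = FALSE      # keep subject as plain character data
)

str(mydata)        # one row per subject, one column per variable

# R is case-sensitive: mydata exists, but MyData has not been created.
exists("mydata")
exists("MyData")
```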


Import from text file


Note: the data file will need to be organized as a classic data frame. Each column represents
a single variable e.g. glucose level. Each row represents an individual. The header
information needs to be contained in a single row.
Data -> Import data -> from text file

Choose a name for the new dataset (note that it cannot contain spaces)
Specify the characteristics of the data file (e.g. commas as the field separator for csv files) -> OK
Browse to the file, select it and click Open
Once data is imported you should double-check the file was read-in correctly:
Message window: are there any errors?
Do the number of rows and columns look as expected?
View the data via View data set button
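The import and the checks listed above can be sketched with base R as follows. To keep the example self-contained it first writes a tiny comma-separated file to a temporary location; the file contents and the name MyData are hypothetical.

```r
# Create a small csv file so that the example runs anywhere.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("subject,glucose", "A,5.1", "B,6.3", "C,4.8"), csv_path)

# header = TRUE: the single header row holds the variable names.
# sep = ",": fields are separated by commas, as in a csv file.
MyData <- read.table(csv_path, header = TRUE, sep = ",")

dim(MyData)    # number of rows and columns: do they look as expected?
head(MyData)   # first rows, similar to the View data set button
```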


Using R for Data Analysis


The Simple Linear Regression Model
DATA INPUT
Import from Excel
Data files can be read in from Excel; however, they often have issues. It is recommended that
the file instead be converted to a text file and then imported as detailed under Import from
text file.

Opening the food.xls file in R Commander: Data -> Import data -> from Excel, Access or dBase data set

Before importing data from Excel, you must give a name to the dataset. In our example we name
it Food and then confirm with OK.
After confirming, the Open dialog box appears so that you can select the file to work on. Our
file is food.xls

In the R Commander window you will notice that the Data set button on the toolbar now shows
Food. Click View data set to see your Food data.

View data set opens the Food data in a separate window. There you can check that your dataset
is correct and the same as in Excel.

Numerical Summaries
In R Commander you can get summary statistics for your dataset. To do this, select Statistics
from the menu bar.

Statistics -> Summaries -> Numerical summaries will give you some statistics of your dataset.


In the Numerical Summaries dialog box you must select the variables to summarize. You can
select one or more.

The results appear in the Output window of R Commander. To understand the results, look up
what each parameter means in the following table.
Understanding the output:

parameter   What is it?
mean        Measure of central tendency
sd          Standard deviation - a measure of variability in the data
N           Number of readings
NA          Number of missing values
0%          Minimum value
25%         The value below which 25 percent of the observations may be found.
50%         The value below which 50 percent of the observations may be found.
75%         The value below which 75 percent of the observations may be found.
100%        Maximum value
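Each entry of the table can be reproduced with a base R function. The sketch below uses a small made-up stand-in for the Food data (not the values in food.xls), and it is not necessarily the command that the Numerical Summaries dialog itself generates.

```r
# Hypothetical values standing in for the Food data set; one income is missing.
Food <- data.frame(
  income   = c(4, 8, 12, 16, 20, NA),
  food_exp = c(120, 160, 230, 260, 330, 350)
)
x <- Food$income

mean(x, na.rm = TRUE)       # mean: measure of central tendency
sd(x, na.rm = TRUE)         # sd: standard deviation
sum(!is.na(x))              # number of (non-missing) readings
sum(is.na(x))               # NA: number of missing values
quantile(x, na.rm = TRUE)   # 0%, 25%, 50%, 75% and 100% points
```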

Plotting the Food Expenditure Data


Identifying the outlier: Graphs -> Index plot


You can either leave the Options in <auto> mode or write your own y-axis label and graph title.
Do not forget to change the title and the label in your future work.

Select the variable of concern


Tick Identify observations with mouse
Look at the graphical output and click on the observation that is the outlier to display its
index number.
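An index plot simply draws each value against its position in the dataset. A minimal sketch with made-up data, in which the last value is deliberately extreme, is shown here; which.max() is used as a non-interactive stand-in for clicking on the point (base R's identify() is the interactive counterpart).

```r
# Hypothetical values; the last observation is deliberately extreme.
Food <- data.frame(food_exp = c(120, 160, 230, 260, 330, 900))

# Index plot: one vertical spike per observation, in data order.
plot(Food$food_exp, type = "h",
     xlab = "Observation index", ylab = "food_exp",
     main = "Index plot of food_exp")

# Index number of the largest value, without using the mouse.
outlier_index <- which.max(Food$food_exp)
outlier_index
```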


XY Conditioning Plot

To get an XY conditioning plot, open the Graphs menu and select the XY conditioning plot command.

In the XY Conditioning Plot dialog box you must select the dependent and independent
variables from your dataset.


The plot appears in the R GUI graphics window.
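A comparable plot can be drawn from the Script window with xyplot() from the lattice package, which is distributed with standard R installations. This is a sketch with made-up values; the options that the dialog itself passes to xyplot() may differ.

```r
library(lattice)

# Hypothetical values standing in for the Food data set.
Food <- data.frame(
  income   = c(4, 8, 12, 16, 20, 24, 28, 32),
  food_exp = c(120, 160, 230, 260, 330, 340, 420, 430)
)

# Dependent variable on the left of ~, independent variable on the right.
p <- xyplot(food_exp ~ income, data = Food, type = "p", pch = 16)
print(p)   # lattice plots are drawn when the object is printed
```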


You can compare the graphs produced in Excel and in R.


Food Expenditure Chart in Excel

[Figure: Excel chart of the food_exp series]

Scatterplots
Graphs -> Scatterplot

Select the variables for the x-axis and y-axis


Enter the x-axis label and the y-axis label
If you wish, the x or y axis can be plotted on a log scale.
Jitter: this is useful when many data points may lie on top of one another. A function randomly
perturbs the plotted points so that overlaps become visible; this does not influence line fitting.
Least-squares line can be selected to add a best-fit linear regression line.

Plot by groups allows you to select a categorical variable, so that the scatterplot uses
colour to distinguish the groups and fits a regression line independently
for each group.
Interpretation of the output

The dotted line is the best-fit linear regression line.



The solid line is the loess line. A loess line is a locally weighted regression line and is used
to assess whether the assumption of linearity is appropriate. Visually, you are looking to see
whether the loess line suggests a marked deviation from the straight line.
The box plots give an indication of the spread of each variable independently.
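The elements described above can be imitated with base R graphics. The sketch uses made-up values and lowess(), base R's locally weighted smoother, as a stand-in for the loess line drawn by R Commander; the marginal box plots are not reproduced.

```r
# Hypothetical values standing in for the Food data set.
Food <- data.frame(
  income   = c(4, 8, 12, 16, 20, 24, 28, 32),
  food_exp = c(120, 160, 230, 260, 330, 340, 420, 430)
)

# Jitter perturbs only the plotted x positions, not the data used for fitting.
x_jit <- jitter(Food$income)
plot(x_jit, Food$food_exp, xlab = "income", ylab = "food_exp")

# Least-squares line, fitted to the original (unjittered) data, drawn dashed.
ls_fit <- lm(food_exp ~ income, data = Food)
abline(ls_fit, lty = 2)

# Locally weighted smooth, drawn solid, to judge linearity by eye.
smooth <- lowess(Food$income, Food$food_exp)
lines(smooth, lty = 1)
```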

Explanatory and Response variables


Remember that a statistical model attempts to approximate the response variable Y as a
mathematical function of the explanatory variables X1, . . . ,Xn.
Estimating A Simple Regression
Estimating a simple regression model is important because it gives us an idea about the
relationship between the variables in our dataset. To estimate a linear regression, click the
Statistics menu in the R Commander window and choose Fit models:
Statistics -> Fit models -> Linear regression

In the Linear Regression dialog box you can give your model a name. By default it is RegModel.1
(you can either change it or keep it).

Then select the response (dependent) variable and the explanatory (independent) variable.



In our example the response variable is food_exp and the explanatory variable is income.

You will see the results in the Output window.
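In the Script window the dialog produces an lm() call; a sketch of the kind of command to expect is shown here. The data values are made up and are not those in food.xls, so the estimates will differ from the ones in the Output window.

```r
# Hypothetical values standing in for the Food data set.
Food <- data.frame(
  income   = c(4, 8, 12, 16, 20, 24, 28, 32),
  food_exp = c(120, 160, 230, 260, 330, 340, 420, 430)
)

# Response on the left of ~, explanatory variable on the right.
RegModel.1 <- lm(food_exp ~ income, data = Food)

summary(RegModel.1)   # coefficients, standard errors, R-squared
coef(RegModel.1)      # intercept and slope only
```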

Saving Output for Text


When working with R, I would advise you to change your working directory to a specific
folder. For each piece of work with a different dataset, create a different working directory.

When working with R it is important to save results in a text file. To do this, select File ->
Save output as.


You can then see every detail in the text file.
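The same housekeeping can be done with commands. The sketch below assumes a folder named Food inside R's temporary directory and an output file called regression_output.txt; both names are hypothetical and should be replaced with your own.

```r
# Hypothetical working folder; replace with a folder of your own.
work_dir <- file.path(tempdir(), "Food")
dir.create(work_dir, showWarnings = FALSE)
old_dir <- setwd(work_dir)   # setwd() returns the previous directory

# Hypothetical values standing in for the Food data set.
Food <- data.frame(
  income   = c(4, 8, 12, 16, 20, 24, 28, 32),
  food_exp = c(120, 160, 230, 260, 330, 340, 420, 430)
)
RegModel.1 <- lm(food_exp ~ income, data = Food)

# Capture the printed summary as lines of text and write them to a file.
writeLines(capture.output(summary(RegModel.1)), "regression_output.txt")

setwd(old_dir)               # go back to the original directory
```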

Plotting A Simple Regression


Estimating the regression gives us an idea about our dataset; plots give us a visual
impression of it. For this we will use the Models menu.


Models -> Graphs -> Basic diagnostic plots
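The basic diagnostic plots correspond to calling plot() on the fitted model. A minimal sketch with made-up data, arranging the four standard plots in a 2 x 2 grid:

```r
# Hypothetical values standing in for the Food data set.
Food <- data.frame(
  income   = c(4, 8, 12, 16, 20, 24, 28, 32),
  food_exp = c(120, 160, 230, 260, 330, 340, 420, 430)
)
RegModel.1 <- lm(food_exp ~ income, data = Food)

old_par <- par(mfrow = c(2, 2))   # 2 x 2 grid of plots on one page
plot(RegModel.1)                  # residual, Q-Q, scale-location, leverage plots
par(old_par)                      # restore the previous layout
```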


Saving Workspace

You must save the script (Save script as), the R Markdown file (Save R Markdown file as) and the
R workspace (Save R workspace as) in the Food folder.
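Saving and restoring objects can also be done with save() and load(). The sketch below saves a single made-up data frame to a hypothetical file Food.RData in R's temporary directory; save.image() would save every object in the workspace instead.

```r
# Hypothetical values standing in for the Food data set.
Food <- data.frame(
  income   = c(4, 8, 12),
  food_exp = c(120, 160, 230)
)

# Hypothetical file location; replace with a path inside your Food folder.
ws_path <- file.path(tempdir(), "Food.RData")
save(Food, file = ws_path)   # write the object to disk

rm(Food)                     # remove it from memory ...
load(ws_path)                # ... and restore it from the file
```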

Correlation
As we know, correlation is a measure of the strength (and direction) of the linear
relationship between two variables. To get the correlation matrix, simply choose
Statistics -> Summaries -> Correlation matrix.

In the Correlation Matrix dialog box you must select the variables (both response and
explanatory variables).


You will again get the results in the Output window.
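A correlation matrix of the selected variables can be sketched with cor(). The values are made up, so the resulting correlation is not the one for the real food data.

```r
# Hypothetical values standing in for the Food data set.
Food <- data.frame(
  income   = c(4, 8, 12, 16, 20, 24, 28, 32),
  food_exp = c(120, 160, 230, 260, 330, 340, 420, 430)
)

# Pearson correlations between the selected variables;
# use = "complete.obs" drops rows with missing values.
cor_matrix <- cor(Food[, c("food_exp", "income")], use = "complete.obs")
cor_matrix
```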


Histogram in R Commander
Making a histogram is different in R Commander than in Excel. In Excel you need to create a BIN
column, but in R Commander there is no need for one. To plot a histogram of the regression
residuals, however, you first need to add observation statistics to your data.

To do this you need to be sure that the active dataset is Food and the active model
is RegModel.1; otherwise you could get results for some other dataset.

To add observation statistics, select Models -> Add observation statistics to data


In the Add Observation Statistics to Data dialog box, tick Residuals for the histogram.
After adding the residuals to your data, select Graphs -> Histogram.

In the Histogram dialog box select the residuals.RegModel.1 variable to get the
histogram plot.

Never make this histogram without first adding the observation statistics to your data.
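In terms of commands, adding the residuals means storing residuals(RegModel.1) as a new column of the data frame, after which hist() can plot it. A sketch with made-up data, using the column name residuals.RegModel.1 from the dialog:

```r
# Hypothetical values standing in for the Food data set.
Food <- data.frame(
  income   = c(4, 8, 12, 16, 20, 24, 28, 32),
  food_exp = c(120, 160, 230, 260, 330, 340, 420, 430)
)
RegModel.1 <- lm(food_exp ~ income, data = Food)

# Add the residuals to the data set as a new variable.
Food$residuals.RegModel.1 <- residuals(RegModel.1)

# Histogram of the residuals; R chooses the bins automatically.
hist(Food$residuals.RegModel.1,
     xlab = "residuals.RegModel.1", main = "Histogram of residuals")
```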


After analyzing your dataset you can create a fitted values plot for your regression. For this
you again need to add observation statistics to your data.


Models -> Add observation statistics to data


In the dialog box, this time select all observation statistics.

Graphs -> Scatterplot


In the Scatterplot dialog box select the X and Y variables. The X variable will be
fitted.RegModel.1 and the Y variable will be food_exp.

If you select them correctly, your graph will be the same as the one shown in the picture.
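The fitted values plot can be sketched the same way: store fitted(RegModel.1) as a new column and plot food_exp against it. The data are made up, and the dashed 45-degree reference line is an addition for orientation, not part of the dialog's output.

```r
# Hypothetical values standing in for the Food data set.
Food <- data.frame(
  income   = c(4, 8, 12, 16, 20, 24, 28, 32),
  food_exp = c(120, 160, 230, 260, 330, 340, 420, 430)
)
RegModel.1 <- lm(food_exp ~ income, data = Food)

# Add the fitted values to the data set as a new variable.
Food$fitted.RegModel.1 <- fitted(RegModel.1)

# X: fitted values, Y: observed food_exp.
plot(food_exp ~ fitted.RegModel.1, data = Food)
abline(0, 1, lty = 2)   # points on this line are fitted exactly
```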
