
DSC 5190 Data Analytics for Business

Instructor: Zhixin (Richard) Kang, Ph.D.


UNC Pembroke

Homework Assignment #4: R Chapter 6

Please complete the following questions and submit/upload your answers in a word file and
two R script files (one is for linear regression, and the other is for logistic regression) for
grading.

Question 1 (50 Points): After running and understanding all the code lines in the Linear
Regression in R-A Case Study.R script file that covers linear regression modeling and
forecasting, please use the 90 Days Bank Bill Data.csv data to conduct a data analysis using a linear
regression model in R. This dataset studies the Australian financial market. The data file contains
16 variables in total. The dependent variable is BankBillRate (90 days bank bill rate), and all
other 15 variables are the independent (predictor) variables. The purpose of this study is to
explore whether the 90 days bank bill rate is impacted by the other 15 predictor variables in a
linear regression framework.

This dataset spans from October 1991 to August 1997 and was collected and posted by Pritchard,
F., and Dixon, G. (1997) from the Australian Market Quote - AAP Financial Markets (Data
Source: http://www.statsci.org/data/oz/bankbill.html).

After implementing the linear regression modeling and forecasting process in R, please copy and
paste all the results, including the histogram, density, and box plots for the dependent variable,
into a word file. In the word file you submit, please also answer the following questions:
1. Are all the independent variables statistically significant in the linear regression model (Clue:
look at the t-values of estimated coefficients in the linear regression model)?

Per the absolute t-values of the estimated coefficients in the linear regression model, none of the
independent variables are significant. Below are the results of the t-value. All absolute values are far
less than 1.96, the threshold a coefficient must exceed to be statistically significant at the 5%
significance level.
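
The t-value check described above can be sketched in R as follows. This is a minimal illustration on simulated data, not the assignment's results: the data frame `sim` and its columns `x1`, `x2`, `x3`, and `y` are hypothetical stand-ins for the bank bill data, which is not reproduced here.

```r
# Simulated stand-in data (hypothetical); the real analysis would read the
# assignment's CSV file instead.
set.seed(42)
n <- 60
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
sim$y <- 5 + 2 * sim$x1 + rnorm(n)  # only x1 actually drives y

# Fit a linear regression of y on all other columns.
fit <- lm(y ~ ., data = sim)

# Coefficient table columns: Estimate, Std. Error, t value, Pr(>|t|).
coefs <- summary(fit)$coefficients
t_vals <- coefs[, "t value"]

# Two-sided 5% critical value for the residual degrees of freedom;
# 1.96 is the large-sample approximation of this value.
t_crit <- qt(0.975, df = df.residual(fit))

# A coefficient is significant at the 5% level when |t| exceeds the critical value.
significant <- abs(t_vals) > t_crit
print(data.frame(t_value = round(t_vals, 3), significant = significant))
```

With a small sample, the exact critical value from `qt()` is slightly larger than 1.96, so comparing against it is marginally stricter than the 1.96 rule of thumb.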
2. How well does the linear regression model perform in forecasting the 90 days bank bill rate
using these 15 independent variables (Clue: look at the min_max accuracy and the MAPE
measure, as illustrated in the Linear Regression in R-A Case Study.R script file)?

The min_max accuracy is 0.9427054, which is very high because it is close to 1.
Additionally, the MAPE value is 0.05922598. The smaller this number is, the better the
forecasting performance is. In this case, the average error is about 5.92% of the observed values.
This is good accuracy.
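
For reference, the two measures can be computed as in the sketch below. It assumes the commonly used definitions (min_max accuracy as the mean ratio of the smaller to the larger of each actual/predicted pair, MAPE as the mean absolute error relative to the actual value); the case study script itself is not shown here, so its exact formulas may differ. The function names and the example numbers are illustrative only.

```r
# Min-max accuracy: mean of min(actual, predicted) / max(actual, predicted).
# Values close to 1 indicate predictions close to the observed values.
min_max_accuracy <- function(actual, predicted) {
  mean(pmin(actual, predicted) / pmax(actual, predicted))
}

# MAPE: mean absolute percentage error, expressed as a proportion.
# Values close to 0 indicate better forecasts.
mape <- function(actual, predicted) {
  mean(abs(predicted - actual) / actual)
}

# Illustrative values only, not taken from the bank bill data.
actual    <- c(10, 20)
predicted <- c(11, 18)

print(min_max_accuracy(actual, predicted))
print(mape(actual, predicted))
```

Both measures require nonzero (and, for min_max accuracy, positive) actual values, which holds for interest rates such as the 90 days bank bill rate.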

Please upload your R script file for the linear regression modeling and forecasting process in this
question.

Question 2 (50 Points): After running and understanding all the code lines in the
LogisticRegressionInR.R script file that covers logistic regression modeling and forecasting,
please use the Titanic Dataset.csv data to conduct a data analysis using a logistic regression model
in R. This dataset studies the famous Titanic tragedy on her maiden voyage. The data file contains
12 variables in total. The dependent variable is the binary variable, Survived. Among the remaining
variables, not every variable will be an independent variable. Please only choose the following
variables in your logit regression model: TicketClass, Sex, Age, SiblingsOrSpousesAboard,
ParentsOrChildrenAboard, Fare, and PortOfEmbarkation. Among these independent variables,
TicketClass, Sex, and PortOfEmbarkation are categorical variables, and Age,
SiblingsOrSpousesAboard, ParentsOrChildrenAboard, and Fare are continuous numeric
variables. The purpose of this study is to explore whether these independent variables (factors)
can predict a passenger's survival status (survived or not) using a logit modeling framework.

Data Source: https://www.kaggle.com/datasets.

After implementing the logistic regression modeling and forecasting process in R, please copy
and paste all the results, including the histogram, density, and box plots for any one of the three
independent numeric variables, into a word file. In the word file you submit, please also answer
the following questions:

[Histogram, density, and box plots for Age]
1. Are all the independent variables statistically significant in the logistic regression model
(Clue: look at the z-values of estimated coefficients in the logistic regression model)?

Per the z-values of the estimated coefficients, the TicketClass (3) and Sex variables are statistically
significant in the logistic regression model.
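
The z-value check can be sketched in R as follows. The data are simulated, not the Titanic file: the data frame `sim`, the simulated effect sizes, and the reduced set of predictors are all assumptions made for illustration, with column names borrowed from the assignment.

```r
# Simulated stand-in data (hypothetical); the real analysis would read the
# assignment's CSV file and include all seven predictors.
set.seed(1)
n <- 300
sim <- data.frame(
  Sex         = factor(sample(c("female", "male"), n, replace = TRUE)),
  TicketClass = factor(sample(1:3, n, replace = TRUE)),
  Age         = round(runif(n, 1, 70))
)

# Simulated outcome: survival odds depend on Sex and TicketClass only.
lin <- 1.5 - 2.5 * (sim$Sex == "male") - 1.2 * (sim$TicketClass == "3")
sim$Survived <- rbinom(n, 1, plogis(lin))

# Logistic regression with a logit link.
logit_fit <- glm(Survived ~ TicketClass + Sex + Age,
                 data = sim, family = binomial(link = "logit"))

# Coefficient table columns: Estimate, Std. Error, z value, Pr(>|z|).
coefs <- summary(logit_fit)$coefficients
z_vals <- coefs[, "z value"]

# Significant at the 5% level when |z| exceeds the normal critical value (about 1.96).
significant <- abs(z_vals) > qnorm(0.975)
print(data.frame(z_value = round(z_vals, 3), significant = significant))
```

Because TicketClass is a factor, R reports one coefficient per non-reference level (here `TicketClass2` and `TicketClass3`), which is why significance is stated per level rather than for the variable as a whole.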

2. How well does the logistic regression model perform in forecasting the Titanic passengers'
survival status using these independent variables?

The logistic regression model has an accuracy rate of 89.31%. This accuracy is sufficient
for the model to be used for forecasting.
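
An accuracy rate of this kind is typically computed by turning predicted probabilities into class labels and comparing them with the observed outcomes. The sketch below assumes a 0.5 probability cutoff and uses made-up vectors; the cutoff and accuracy definition used in the course script are not shown here and may differ.

```r
# Made-up observed outcomes and predicted survival probabilities (illustrative only).
actual <- c(1, 0, 0, 1, 1, 0, 0, 1, 0, 0)
prob   <- c(0.91, 0.20, 0.35, 0.62, 0.48, 0.10, 0.55, 0.80, 0.05, 0.30)

# Assumed cutoff: classify as survived (1) when the probability is at least 0.5.
predicted <- as.integer(prob >= 0.5)

# Confusion matrix: rows are observed classes, columns are predicted classes.
conf_mat <- table(Actual = actual, Predicted = predicted)
print(conf_mat)

# Accuracy: share of correctly classified observations (the matrix diagonal).
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(accuracy)
```

For a fitted `glm` model, the probabilities would come from `predict(model, newdata, type = "response")` rather than from a hand-written vector.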
