Midterm Data Analytics (Fall 2016-17) - v1 - B

TUNIS BUSINESS SCHOOL
Fall 2016-2017
Midterm Exam: Data Analytics

Date: January 17th, 2017
Professor name: Amor Messaoud
Time: 11:30 AM – 1:00 PM
Number of pages: 7
INSTRUCTIONS
1. Books and notes are not permitted.

2. Calculators are allowed.
3. Cell phones are not allowed.
Questions (30 points)
1. In the multiple regression model, the t-statistic for testing that the slope is
significantly different from zero is calculated
a. by multiplying the p-value by 1.96.
b. using the adjusted R2 and the confidence interval.
c. by dividing the estimate by its standard error.
d. from the square root of the F-statistic.
2. Analysis of variance is a statistical method of comparing the ________ of several
populations.
a. standard deviations
b. variances
c. means
d. proportions
e. none of the above
3. To determine whether the test statistic of ANOVA is statistically significant, it
can be compared to a critical value. What two pieces of information are needed to
determine the critical value?
a. sample size, number of groups
b. mean, sample standard deviation
c. expected frequency, obtained frequency
d. Mean Square Between Groups, Mean Square Within Groups
4. A regression analysis between sales (in $1000) and price (in dollars) resulted in
the following equation:
Sales = 50,000 – 8 Price
The above equation implies that an
a. increase of $1 in price is associated with a decrease of $8 in sales
b. increase of $8 in price is associated with an increase of $8,000 in sales
c. increase of $1 in price is associated with a decrease of $42,000 in sales
d. increase of $1 in price is associated with a decrease of $8,000 in sales
5. The data used to build a data mining model is called
a. validation data
b. training data
c. test data
d. hidden data
6. Logistic regression is a ________ regression technique that is used to model data
having a ________ outcome.
a. linear, numeric
b. linear, binary
c. nonlinear, numeric
d. nonlinear, binary
7. Simple regression assumes a __________ relationship between the input attribute
and output attribute.
a. linear
b. quadratic
c. reciprocal
d. inverse
8. The correlation coefficient is used to determine:
a. A specific value of the y-variable given a specific value of the x-variable
b. A specific value of the x-variable given a specific value of the y-variable
c. The strength of the relationship between the x and y variables
d. None of these
9. In the multiple regression model, the least squares estimator is derived by
a. minimizing the sum of squared prediction mistakes.
b. setting the sum of squared errors equal to zero.
c. minimizing the absolute difference of the residuals.
d. forcing the smallest distance between the actual and fitted values.
10. In ANOVA, if the sample standard deviations within all four groups under study
are approximately equal, we should:
a. immediately decide not to reject the null hypothesis
b. immediately reject the null hypothesis
c. mildly reject the null hypothesis
d. redo the test, since the standard deviations are equal
e. there is not enough information to make a decision on whether to reject
or not reject the null hypothesis
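Several of the questions above refer to a handful of simple-regression quantities: the least squares slope, its standard error, the t-statistic and the correlation coefficient. The following is a minimal Python sketch of how these quantities are computed. The data values and the function name `simple_ols` are made up for illustration and are not part of the exam, which relies on R output.

```python
import math

# Hypothetical toy data (NOT from the exam): one input x, one output y.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]


def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares (illustrative helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    # Least squares estimates minimize the sum of squared residuals.
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual variance uses n - 2 degrees of freedom (two estimated parameters).
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_slope = math.sqrt(sse / (n - 2) / sxx)

    return {
        "intercept": intercept,
        "slope": slope,
        "se_slope": se_slope,
        "t_stat": slope / se_slope,          # estimate divided by its standard error
        "r": sxy / math.sqrt(sxx * syy),     # correlation coefficient
    }


fit = simple_ols(x, y)
for name, value in fit.items():
    print(f"{name}: {value:.4f}")
```

In simple regression the squared t-statistic of the slope equals the F-statistic of the model, which gives a convenient consistency check on hand calculations.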
Problem #1 (10 points)
A consumer organization wants to know whether there is a difference in the price of a
particular toy at three different types of stores. The price of the toy was checked in a sample
of 5 stores for each type of store. Complete the ANOVA table and decide whether the mean
price differs by store type.
Discount   Variety   Department
$12        $15       $19
$13        $17       $17
$14        $14       $16
$12        $18       $20
$15        $17       $19
ANOVA
price
                  Sum of Squares   df   Mean Square   F      Sig.
Between Groups    63.333            2   ____          ____   .001
Within Groups     28.400           12   ____
Total             91.733           14
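The sums of squares in the table can be checked directly from the price data. The sketch below is a minimal pure-Python version of the one-way ANOVA arithmetic; the function name `one_way_anova` and the dictionary layout are illustrative assumptions, not something prescribed by the exam.

```python
# Prices copied from the Problem #1 data table (5 stores per store type).
prices = {
    "Discount":   [12, 13, 14, 12, 15],
    "Variety":    [15, 17, 14, 18, 17],
    "Department": [19, 17, 16, 20, 19],
}


def one_way_anova(groups):
    """Return the entries of a one-way ANOVA table (illustrative helper)."""
    all_values = [v for g in groups.values() for v in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total

    # Between-groups: squared distance of each group mean from the grand mean,
    # weighted by group size.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    # Within-groups: squared distance of each observation from its own group mean.
    ss_within = sum(
        (v - sum(g) / len(g)) ** 2 for g in groups.values() for v in g
    )

    df_between = k - 1
    df_within = n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    return {
        "ss_between": ss_between,
        "ss_within": ss_within,
        "ss_total": ss_between + ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "ms_between": ms_between,
        "ms_within": ms_within,
        "f_stat": ms_between / ms_within,
    }


table = one_way_anova(prices)
for name, value in table.items():
    print(f"{name}: {value:.3f}")
```

The Sig. column is the upper-tail probability of the F distribution with (df_between, df_within) degrees of freedom. It is not computed here because the Python standard library has no F distribution; a statistics package would be needed for that step.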

Problem #2 (30 points)
Tayko Software is a software catalog firm that sells games and educational software. It started out
as a software manufacturer, and added third party titles to its offerings. It has recently put
together a revised collection of items in a new catalog, which it mailed out to its customers. This
mailing yielded 1000 purchases. Based on these data, Tayko wants to devise a model for
predicting the spending amount that a purchasing customer will yield. The data set contains the
following information on 1000 purchases:

Codelist
Var. #  Variable Name         Description                                            Variable Type  Code Description
1.      US                    Is it a US address?                                    binary         1: yes, 0: no
17.     Freq.                 Number of transactions in last year at source catalog  numeric
18.     last_update_days_ago  How many days ago was last update to cust. record      numeric
20.     Web_order             Customer placed at least 1 order via web               binary         1: yes, 0: no
21.     Gender=male           Customer is male                                       binary         1: yes, 0: no
22.     Address_is_res        Address is a residence                                 binary         1: yes, 0: no
24.     Spending              Amount spent by customer in test mailing ($)           numeric
The following R output presents the results of the regression analysis.
1. Obtain the estimated regression model.

2. Is the model valid?
3. What is the value of the coefficient of determination? Interpret.
4. What are the significant attributes?
5.

Universal Bank would like to know which customers are likely to accept a personal loan.
What characteristics would forecast this? If the bank were to consider expending
advertising efforts to contact customers who would be likely to consider a personal loan,
which customers should the bank contact first? By answering this question correctly the
bank will be able to optimize its advertising effort by directing its attention to the highest-
yield customers.
The data set is taken from Shmueli et al. (2010). It contains information on 5000 loan
applications. Table 1 presents a description of the different variables. The response
variable is Y = PersonalLoan. It indicates whether or not an offered loan was accepted on
an earlier occasion. The explanatory variables include:
• X1 = Age,
• X2 = Experience,
• X3 = Income,
• X4 = Family,
• X5 = CCavg,
• X6 = Education,
• X7 = Mortgage,
• X8 = SecuritiesAccount,
• X9 = CDAccount,
• X10 = Online and
• X11 = CreditCard.

The objective is to build a scorecard using logistic regression. The data set is divided into
a training set (3000 cases) and a validation set (2000 cases). Dummy variables are created from
the variable Education (the reference class is Undergrad). The following R output
presents the results of the logistic regression.

1. Is the model valid?
2. What are the significant attributes?
3. Interpret the obtained estimates of the coefficients of Edu.Grad and Edu.Prof.
4. Interpret the following confusion matrix.
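As a hedged illustration of the two interpretation tasks above, the Python sketch below converts a logistic regression coefficient into an odds ratio and summarizes a 2x2 confusion matrix. The coefficient value and the counts are hypothetical; they are not the exam's estimates or its confusion matrix, and the helper names `odds_ratio` and `confusion_metrics` are assumptions made for this example.

```python
import math


def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient.

    For a dummy variable, this is the factor by which the odds of the event
    are multiplied relative to the reference class, other inputs held fixed.
    """
    return math.exp(coefficient)


def confusion_metrics(tp, fp, fn, tn):
    """Common summaries of a 2x2 confusion matrix (illustrative helper)."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,      # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),      # share of actual acceptors detected
        "specificity": tn / (tn + fp),      # share of actual non-acceptors detected
        "precision": tp / (tp + fp),        # share of predicted acceptors who accepted
    }


# Hypothetical coefficient for a dummy variable such as Edu.Grad (NOT from the exam).
example_coefficient = 0.7
print(f"odds ratio vs. reference class: {odds_ratio(example_coefficient):.3f}")

# Hypothetical counts (NOT the exam's confusion matrix).
metrics = confusion_metrics(tp=40, fp=10, fn=20, tn=130)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

For a targeting problem like this one, sensitivity and precision are usually more informative than overall accuracy, because customers who accept a loan are a small share of all customers.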
END
Good Luck
