
ECON 1203 Tutorial Sample Solutions

Semester 1 2016
Weeks 5 and 6
1. A random number generator is designed to draw numbers at random from within a specified range. We
can consider any number in the range as a possible outcome.
(a) What type of distribution is the random number generator drawing from?
A continuous uniform distribution.
(b) Suppose we program a random number generator to generate a random number with a value
falling in the interval [0, 2]. What is the height of the density of the distribution from which the
random number generator is drawing? Draw a graph of the probability density function.
f(y) = 1/2 for 0 ≤ y ≤ 2
f(y) = 0 otherwise

(c) What is the cumulative probability distribution of the random variable from which draws are
being taken? Draw a graph of the cumulative probability distribution function.
The cumulative probability distribution function is F(y) = P(Y ≤ y), and its graph plots F(y) against y. From
the density above, we can see F(0)=0 and F(2)=1. Since the probability accumulates uniformly, the graph must be a
straight line with an upward slope (since the density cannot be negative), rising from the point
(y,F(y))=(0,0) to (y,F(y))=(2,1). Specifically, F(y)=0.5y for 0 ≤ y ≤ 2. If y<0, F(y)=0 and if y>2,
F(y)=1.
(d) Find the following for this case: P(Y<0.6); P(Y≤0.6); P(0.5<Y<1.5), using both the density
function and the cumulative probability function. Show that your answers match whichever
you use.
P(Y < 0.6) = 0.6 × 0.5 = 0.3
P(Y ≤ 0.6) = P(Y < 0.6) = 0.3
P(0.5 < Y < 1.5) = 1 × 0.5 = 0.5
Whether you get these values from the uniform probability density function as given here, or from
F(y)=0.5y (the cumulative probability distribution), the results are identical:
P(Y < 0.6) = F(0.6) = 0.5 × 0.6 = 0.3 and
P(0.5 < Y < 1.5) = F(1.5) - F(0.5) = 0.75 - 0.25 = 0.5.
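The agreement between the two routes in part (d) can also be checked numerically. The following is a minimal Python sketch, assuming only the density and F(y) stated above; the function names pdf, cdf and area are illustrative choices, not part of the tutorial.

```python
# Uniform distribution on [0, 2], as in parts (b) and (c).
A, B = 0.0, 2.0           # interval endpoints
HEIGHT = 1.0 / (B - A)    # density height, 1/2


def pdf(y):
    """Density f(y): 1/2 inside [0, 2], 0 elsewhere."""
    return HEIGHT if A <= y <= B else 0.0


def cdf(y):
    """Cumulative probability F(y) = P(Y <= y)."""
    if y < A:
        return 0.0
    if y > B:
        return 1.0
    return HEIGHT * (y - A)


def area(lo, hi):
    """Rectangle area under the density between lo and hi (both inside [0, 2])."""
    return (hi - lo) * HEIGHT


# Route 1: area under the density. Route 2: differences of F(y).
p_below_area = area(0.0, 0.6)
p_below_cdf = cdf(0.6)
p_mid_area = area(0.5, 1.5)
p_mid_cdf = cdf(1.5) - cdf(0.5)

print("P(Y < 0.6):      ", p_below_area, p_below_cdf)
print("P(0.5 < Y < 1.5):", p_mid_area, p_mid_cdf)
```

Because Y is continuous, P(Y = 0.6) = 0, which is why P(Y < 0.6) and P(Y ≤ 0.6) coincide.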

2. From several years' records, a fish market manager has determined that the weight of deep sea bream
sold in the market (X) is approximately normally distributed with a mean of 450 grams and a standard
deviation of 100 grams. Assuming this distribution will remain unchanged in the future, calculate the
expected proportions of deep sea bream sold over the next year weighing
a) between 300 and 400 grams.
P(300 < X < 400) = P((300 - 450)/100 < Z < (400 - 450)/100)
= P(-1.5 < Z < -0.5)
= P(-∞ < Z < -0.5) - P(-∞ < Z < -1.5)
= 0.3085 - 0.0668
= 0.2417
b) between 400 and 600 grams.

P(400 < X < 600) = P((400 - 450)/100 < Z < (600 - 450)/100)
= P(-0.5 < Z < 1.5)
= P(-∞ < Z < 1.5) - P(-∞ < Z < -0.5)
= 0.9332 - 0.3085
= 0.6247

c) more than 625 grams.

P(X > 625) = P(Z > (625 - 450)/100)
= P(Z > 1.75)
= 1 - P(-∞ < Z < 1.75)
= 1 - 0.9599 = 0.0401
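The three table-based answers for question 2 can be cross-checked in software. This is a minimal sketch using Python's standard-library statistics.NormalDist, assuming the stated mean of 450 grams and standard deviation of 100 grams; the variable names are illustrative.

```python
from statistics import NormalDist

# Weight of deep sea bream: normal with mean 450 g and standard deviation 100 g.
weight = NormalDist(mu=450, sigma=100)

p_a = weight.cdf(400) - weight.cdf(300)   # between 300 and 400 grams
p_b = weight.cdf(600) - weight.cdf(400)   # between 400 and 600 grams
p_c = 1 - weight.cdf(625)                 # more than 625 grams

for label, p in [("a", p_a), ("b", p_b), ("c", p_c)]:
    print(f"({label}) {p:.4f}")
```

Any small differences from the values above would come from the four-decimal rounding of the printed standard normal table.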

3. In a certain large city, household annual incomes are considered approximately normally distributed with
a mean of $40,000 and a standard deviation of $6,000. What proportion of households in the city have
an annual income over $35,000? If a random sample of 120 households were selected, how many of
these households would we expect to have annual incomes between $35,000 and $45,000?
Let X denote household annual income, so X ~ N(40000, 6000²).
P(X > 35000) = P(Z > (35000 - 40000)/6000)
= P(Z > -0.833)
= 1 - P(-∞ < Z < -0.833)
= 1 - 0.2033
= 0.7967

So 79.67% of households in the city would be expected to have annual incomes greater than $35,000.
P(35000 < X < 45000) = P((35000 - 40000)/6000 < Z < (45000 - 40000)/6000)
= P(-0.83 < Z < 0.83)
= 1 - 2 P(-∞ < Z < -0.83)
= 1 - 2 × 0.2033
= 0.5934

Therefore we expect 0.5934 × 120 ≈ 71 households in the sample to have annual incomes between $35,000
and $45,000.
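As a cross-check on question 3, the same quantities can be computed without rounding z to two decimals. This is a minimal Python sketch, assuming the stated mean of $40,000 and standard deviation of $6,000; the variable names are illustrative. The table-based solution rounds z to -0.83, so the software values differ slightly in the third decimal place while the expected count still rounds to 71.

```python
from statistics import NormalDist

# Household annual income: normal with mean 40,000 and standard deviation 6,000.
income = NormalDist(mu=40_000, sigma=6_000)

p_over_35k = 1 - income.cdf(35_000)
p_between = income.cdf(45_000) - income.cdf(35_000)

SAMPLE_SIZE = 120
expected_households = SAMPLE_SIZE * p_between

print(f"P(X > 35000)         = {p_over_35k:.4f}")
print(f"P(35000 < X < 45000) = {p_between:.4f}")
print(f"Expected households  = {expected_households:.1f}")
```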
4. What is the 75th percentile of the normal distribution N(10, 9)?
Let x be the required percentile. First find z, the 75th percentile of a standard normal.
P(-∞ < Z < z) = 0.75,
so z = 0.675.
For X ~ N(10, 9), the standard deviation is 3, so the 75th percentile x satisfies:
(x - 10)/3 = 0.675
x = 12.025.
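The percentile calculation can be reproduced with an inverse CDF. This is a minimal sketch using Python's statistics.NormalDist, assuming (as the solution does) that the 9 in N(10, 9) is the variance, so the standard deviation is 3; the variable names are illustrative.

```python
from statistics import NormalDist

VARIANCE = 9
x_dist = NormalDist(mu=10, sigma=VARIANCE ** 0.5)   # standard deviation 3

# Step 1: 75th percentile of the standard normal.
z_75 = NormalDist().inv_cdf(0.75)

# Step 2: transform back to the scale of X, x = mean + z * sd.
x_75_via_z = 10 + z_75 * 3

# Direct route: ask for the percentile of X itself.
x_75_direct = x_dist.inv_cdf(0.75)

print(f"z  = {z_75:.4f}")
print(f"x  = {x_75_via_z:.4f} (via z), {x_75_direct:.4f} (direct)")
```

The software value of z carries more decimals than the table value of 0.675, so x agrees with 12.025 only to about two decimal places.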

5. In a certain city, it is estimated that 60% of households have access to the internet. A company wishing
to sell services to internet users randomly chooses 150 households in the city and sends them
advertising material.
(a) Calculate the probability that more than 90 contacted households have internet access.
Let X be the number of households contacted that have internet access. Then assume X is a binomial
random variable with n=150 and p=0.6. Because n is large, we can use the normal approximation to the
binomial where:
μ = np = 150 × 0.6 = 90
σ² = np(1 - p) = 150 × 0.6 × 0.4 = 36, so σ = 6

Thus, incorporating the continuity correction, we need to find:
P(X > 90) = P(X ≥ 91) ≈ P(X ≥ 90.5)
= P(Z ≥ (90.5 - 90)/6)
= P(Z ≥ 0.083)
= 1 - P(Z < 0.083)
= 1 - 0.5319 = 0.4681
(b) Calculate the probability that between 60 and 100 (inclusive) contacted households have
internet access.
P(60 ≤ X ≤ 100) ≈ P(59.5 < X < 100.5)
= P((59.5 - 90)/6 < Z < (100.5 - 90)/6)
= P(-5.08 < Z < 1.75)
= P(Z < 1.75) - P(Z < -5.08)
= 0.9599 - 0 = 0.9599

(c) There is an 80% chance (probability of .8) that the number of contacted households with internet
access equals or exceeds what value?
P(X ≥ x) = 0.8
Using the continuity correction,
P(X > x - 0.5) = 0.8
P(Z > (x - 0.5 - 90)/6) = 0.8
P(-∞ < Z < (x - 90.5)/6) = 0.2
(x - 90.5)/6 = -0.84, so x = 85.46
There is an 80% chance that the number of contacted households with internet access is 85 or more.
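To see how well the normal approximation with continuity correction works in question 5, the approximate answers can be set beside exact binomial probabilities. This is a minimal Python sketch, assuming X is binomial with n = 150 and p = 0.6 as stated; the helper names binom_pmf and binom_prob are illustrative, not from the tutorial.

```python
from math import comb
from statistics import NormalDist

N, P = 150, 0.6
MU = N * P                           # 90
SIGMA = (N * P * (1 - P)) ** 0.5     # 6
approx = NormalDist(MU, SIGMA)


def binom_pmf(k):
    """Exact P(X = k) for X ~ Binomial(N, P)."""
    return comb(N, k) * P ** k * (1 - P) ** (N - k)


def binom_prob(lo, hi):
    """Exact P(lo <= X <= hi)."""
    return sum(binom_pmf(k) for k in range(lo, hi + 1))


# (a) P(X > 90) = P(X >= 91)
a_exact = binom_prob(91, N)
a_approx = 1 - approx.cdf(90.5)

# (b) P(60 <= X <= 100)
b_exact = binom_prob(60, 100)
b_approx = approx.cdf(100.5) - approx.cdf(59.5)

# (c) largest integer x with P(X >= x) >= 0.8, and the approximate cut-off
c_exact = max(x for x in range(N + 1) if binom_prob(x, N) >= 0.8)
c_approx = approx.inv_cdf(0.2) + 0.5

print(f"(a) exact {a_exact:.4f}, approx {a_approx:.4f}")
print(f"(b) exact {b_exact:.4f}, approx {b_approx:.4f}")
print(f"(c) exact {c_exact}, approx cut-off {c_approx:.2f}")
```

The approximate values here use unrounded z scores, so they can differ from the table-based figures above in the third decimal place.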
6. Using your personalized Course Project data:
(a) Calculate the sample averages of all variables. Which of these averages are meaningful?
Express the meaning of each average in words that are understandable and effective for a
layperson such as your client.
As each student will have a customized data set, exact answers will differ for this question. In my data, the
averages are:
pid: 147
visits: 1.74
dayofweek: 3.85
freqmallvisit: 4.67
spend: 468.27
age: 2.69
Hhinc: 3.67
Sex: 0.72
Food: 0.48
Apparel: 0.19
Department: 0.09
Grocery: 0.09
Other: 0.21
An effective expression of the average for the variable freqmallvisit, for example, would be: "On average,
based on the data provided, a customer to this mall in October 2015 had visited the mall between 4 and 5
times in the previous 3 months." Other statements should similarly demonstrate both a reasonable (but not
obsessive) level of precision, and an interpretation that makes sense, made in everyday terms with which the
customer will be familiar.
(b) Do you need to manipulate the raw data provided, before proceeding to statistical analyses, in
order to address the client's question? If so, how?

7. Work through problem 28 on page 264 of Sharpe (Chapter 7), referring to the 68-95-99.7 Rule explained
on pages 239-240 of Sharpe.
8. UNSW wants to measure the attractiveness of its brand to potential students. The university performs
an experiment by inviting 100 high school students from different public schools across New South
Wales to browse a few websites related to different universities, and then to choose the one that they
would prefer most.
(a) Is this a random sample? Can you think of any potential source of selection bias?
The sample is not perfectly random. First of all, only students in NSW are sampled, and
therefore the attitudes of students from other states of Australia and overseas students are
missed. Also the students are all coming from public schools, and public school graduates
might have different aspirations or expectations compared to private school graduates.

(b) Suppose that a perfectly random sample of students is drawn from the target population, and
these students take part in the exercise described above. With reference to the brief discussion
on page 732 of Sharpe (Confounding and Lurking Variables), can you think of any
confounding factors, that is, factors that might lead to lack of confidence in using students'
expressed preferences, as measured in this exercise, as an indicator of their degree of overall
attraction to the UNSW brand?
Even if the sample is perfectly random and even if the website-browsing activity provides
students with the right sort of information to decide which university they prefer (debatable),
universities have different qualities in different fields. For instance, UNSW engineering and
science might be leading faculties, but the medical faculty might not be the top. A student's
choice of a university depends not only on the attractiveness of the university as a whole,
but also on whether it is a leader in the student's particular field of interest. While part of
the appeal of the university as a whole may be due to such field-specific factors, the stated
preference data alone cannot be used to distinguish these factors from other factors purely
related to the overall appeal of the university's brand. As another example, Australian
students have historically been reluctant to travel in order to attend university, so proximity
(which is not what the question originally targets) has also been an important factor in
determining university choice. It's very likely that feelings or thoughts other than
brand loyalty would push students to prefer a particular university over others (for example,
whether friends or family attended there or work there).
Therefore, to measure overall attractiveness of the brand as an independent construct, we
should control for additional factors like the strengths and weaknesses across schools,
differences in travel times, and students' prior experience with each university.

(c) Suppose that the exercise described in part (b) is conducted. The resulting data include each
student's high school, the selection of universities whose websites they browsed, and the one
amongst those that they chose as their most-preferred university. Sketch on a piece of paper or
in an Excel sheet what these data would look like once they are made ready for quantitative
analysis.
The data might look something like this; note that the question does not indicate how many
other websites were viewed, so in the table below we have shown a binary variable for each of
a number of Sydney-area universities (where 0=website not viewed, and 1=website viewed),
and not restricted the number of sites viewed to be any particular number or even the same
number across observations. Also note that the student's preference is shown as a single
variable, although one could construct a dummy array (including binary variables for each
university, where 0=not preferred, and 1 = preferred):
Student   UNSW_1   UTS_2   USyd_3   MacQ_4   UWS_5   Pref
...
100
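The dummy array mentioned above can be built mechanically from the single Pref variable. This is a minimal Python sketch under two loud assumptions: the rows are hypothetical values invented purely for illustration (they are not real responses), and Pref is assumed to hold the numeric code that appears as the suffix of each column name (1 = UNSW through 5 = UWS). The Pref_ column names are likewise illustrative.

```python
# University codes, assumed to match the suffixes in the column names above.
UNIVERSITIES = {1: "UNSW", 2: "UTS", 3: "USyd", 4: "MacQ", 5: "UWS"}

# HYPOTHETICAL rows for illustration only; not real data.
rows = [
    {"Student": 1, "UNSW_1": 1, "UTS_2": 1, "USyd_3": 0, "MacQ_4": 0, "UWS_5": 1, "Pref": 1},
    {"Student": 2, "UNSW_1": 1, "UTS_2": 0, "USyd_3": 1, "MacQ_4": 1, "UWS_5": 0, "Pref": 3},
    {"Student": 3, "UNSW_1": 0, "UTS_2": 1, "USyd_3": 1, "MacQ_4": 0, "UWS_5": 0, "Pref": 2},
]


def add_pref_dummies(row):
    """Return a copy of row with one 0/1 preference column per university."""
    out = dict(row)
    for code, name in UNIVERSITIES.items():
        out[f"Pref_{name}"] = 1 if row["Pref"] == code else 0
    return out


def preferred_site_was_viewed(row):
    """Sanity check: the preferred university must be among the sites viewed."""
    code = row["Pref"]
    return row[f"{UNIVERSITIES[code]}_{code}"] == 1


expanded = [add_pref_dummies(r) for r in rows]
for r in expanded:
    print(r)
```

Each expanded row has exactly one Pref_ column equal to 1, which is what makes the array usable in cross-tabulations.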

(d) Add to the display in part (c) any additional variables that your answer to part (b) indicated you
might like to have access to. Show these variables in a form that is analysis-ready.
The answer here depends on what additional variables are to be included. Whatever is shown
should make logical sense and also be in an analysis-ready form: i.e., numbers rather than
words should appear in the data for each observation.
(e) Suppose you had access to the expanded data set constructed in part (d). Describe what sort of
analyses you could conduct that might help to shed light on UNSW's core question about the
attractiveness of its brand.
Again, this will depend on the particular variables, but in general one could report, for
example, how UNSW fares in head-to-head comparisons with certain sets of other universities
amongst certain subgroups of potential students, based on cross-tabulations of the Pref
variable with respect to the mix of websites viewed and/or student demographics.
(f) Based on your analysis, what would you be able to tell UNSW leadership about the core drivers
of its brand appeal, that is, what it is about UNSW that students are drawn to?
Unless students are explicitly asked about why they find UNSW appealing (or why they don't),
the answer to this question can only really be guessed at. The key thing to take away is that
even with lots of variables that allow for the measurement of the strength of appeal of UNSW,
the reasons for that appeal are likely to remain masked without a set of survey questions
explicitly trying to ascertain the potential sources of that appeal. We will consider a way to
operationalise this idea later in the course when speaking of hedonic regression models.

9. Work through problem 22 on page 325 of Sharpe (Chapter 9).


All histograms are centered at 0.85, the true value of p in the distribution from which the samples
were drawn (so this is a mechanical result of the process of simulating!), but as the sample size
increases, the distribution becomes more and more symmetric and unimodal, and the variability in the
sample proportion reduces. This is what happens to the sampling distribution of a statistic in
general terms: as n, the sample size, rises, the sampling distribution (which can only be imagined or
drawn conditional on n!) of the statistic starts to look more and more like the normal distribution.
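The behaviour described above can be reproduced with a small simulation. This is a minimal Python sketch, assuming p = 0.85 as stated; the sample sizes, the number of repetitions and the random seed are arbitrary illustrative choices and are not taken from the Sharpe problem.

```python
import random
from statistics import mean, pstdev

P_TRUE = 0.85                    # true proportion, from the solution above
REPS = 2000                      # simulated samples per sample size (assumed)
SAMPLE_SIZES = (20, 50, 100, 200)  # assumed sample sizes for illustration
rng = random.Random(1203)        # fixed seed so the run is repeatable


def sample_proportion(n):
    """Draw n Bernoulli(P_TRUE) outcomes and return the proportion of successes."""
    return sum(rng.random() < P_TRUE for _ in range(n)) / n


results = {}
for n in SAMPLE_SIZES:
    props = [sample_proportion(n) for _ in range(REPS)]
    theory_sd = (P_TRUE * (1 - P_TRUE) / n) ** 0.5
    results[n] = (mean(props), pstdev(props), theory_sd)
    print(f"n={n:4d}  mean={results[n][0]:.4f}  "
          f"sd={results[n][1]:.4f}  theory sd={theory_sd:.4f}")
```

The simulated means should sit near 0.85 for every n, while the spread shrinks roughly in line with sqrt(p(1 - p)/n) as n grows.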
10. Work through problem 44 on page 328 of Sharpe (Chapter 9).
11. Work through problem 60 on page 329 of Sharpe (Chapter 9).
12. Work through problem 36 on page 356 of Sharpe (Chapter 10).
