
Assignment 2: Twitter Analysis

300700 Statistical Decision Making


Assignment Due: Friday of Week 7 (11th of April)
Review Due: Friday of Week 8 (18th of April)

Complete the following tasks using RStudio and record the results and your analysis in an R Markdown file.

Apple or Pi

We have found that many Twitter users do not use the term #applepi, but prefer to mention #apple and
#raspberrypi separately. A report from Twitter shows that, of all the tweets from the past month, the
terms #apple and #raspberrypi appear in the following numbers of tweets.
                          #apple
                          Present         Absent
#raspberrypi   Present    1,230,443       2,224,178
               Absent     17,233,452      1,245,776,300

1. What is the probability of a tweet from that month containing the term #apple?
2. What is the probability of a tweet from that month containing the term #apple and #raspberrypi?
3. What is the probability of a tweet from that month containing the term #apple given that it contains the
term #raspberrypi?
4. Examine if the terms #apple and #raspberrypi are independent for that month.
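
The four questions above reduce to ratios of cell counts. The following R sketch shows one way to set this up. It assumes that the first pair of counts in the table belongs to the #apple Present column (so that 17,233,452 tweets contain #apple but not #raspberrypi); the variable names are illustrative only.

```r
# Cell counts from the table. ASSUMPTION: columns are #apple Present/Absent,
# rows are #raspberrypi Present/Absent (matrix() fills column by column).
counts <- matrix(c(1230443, 17233452, 2224178, 1245776300), nrow = 2,
                 dimnames = list(raspberrypi = c("Present", "Absent"),
                                 apple = c("Present", "Absent")))
total <- sum(counts)

# Marginal, joint and conditional probabilities
p.apple <- sum(counts[, "Present"]) / total     # P(#apple)
p.rpi <- sum(counts["Present", ]) / total       # P(#raspberrypi)
p.both <- counts["Present", "Present"] / total  # P(#apple and #raspberrypi)
p.apple.given.rpi <- p.both / p.rpi             # P(#apple | #raspberrypi)

# Under independence P(A and B) = P(A) * P(B), so this ratio would be near 1
independence.ratio <- p.both / (p.apple * p.rpi)
print(c(p.apple, p.both, p.apple.given.rpi, independence.ratio))
```

A ratio far from 1 suggests dependence between the two terms; how far is "far" still has to be argued in the written analysis.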

Tweeters from Crazy Mac

We found that many of the tweets containing information about Apple and Raspberry Pi are from employees at
the magazine Crazy Mac. Of the numerous employees at Crazy Mac, we counted the number of days in which
62 employees wrote a tweet containing information about Apple and Raspberry Pi during January 2014. The
following table contains the count of days from each of the 62 employees.
3  3  2  2  3  1  2  4  4  2
4  2  3  2  4  1  3  1  3  3
3  3  4  1  7  3  3  2  2  1
2  9  3  4  1  3  3  0  4  0
4  4  4  1  2  3  4  5  6  4
4  6  4  2  3  7  3  3  2  3
4  5

1. If this sample comes from a Binomial distribution, what are the parameters of the distribution (n and p),
estimated from this sample?
2. Compute the difference between the sample standard deviation and the Binomial standard deviation
(using the computed parameters n and p for the Binomial distribution). Does this provide evidence for
or against the distribution being Binomial?
3. Given that the distribution is Binomial, what is the probability of an employee of Crazy Mac mentioning
Apple and Raspberry Pi in a tweet on more than 5 days of the month of February 2014 (assuming that
the parameter p is the same for January and February)?
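
As a rough guide to the calculations involved, the sketch below estimates the Binomial parameters by matching the sample mean to n * p. It assumes that n is the number of days in the month (31 for January 2014, 28 for February 2014), which is one reading of the question; the variable names are illustrative only.

```r
# Days in January 2014 on which each of the 62 employees tweeted
# (the order of the values does not affect the statistics below)
days <- c(3, 3, 2, 2, 3, 1, 2, 4, 4, 2,
          4, 2, 3, 2, 4, 1, 3, 1, 3, 3,
          3, 3, 4, 1, 7, 3, 3, 2, 2, 1,
          2, 9, 3, 4, 1, 3, 3, 0, 4, 0,
          4, 4, 4, 1, 2, 3, 4, 5, 6, 4,
          4, 6, 4, 2, 3, 7, 3, 3, 2, 3,
          4, 5)

# ASSUMPTION: each day of January is one trial, so n = 31
n.jan <- 31
p.hat <- mean(days) / n.jan   # from E[X] = n * p

# Sample standard deviation versus the Binomial value sqrt(n * p * (1 - p))
sd.sample <- sd(days)
sd.binom <- sqrt(n.jan * p.hat * (1 - p.hat))
sd.diff <- sd.sample - sd.binom

# P(X > 5) in February 2014 (28 days), keeping the same p
p.more.than.5 <- pbinom(5, size = 28, prob = p.hat, lower.tail = FALSE)
print(c(p.hat, sd.sample, sd.binom, sd.diff, p.more.than.5))
```

Using `lower.tail = FALSE` gives P(X > 5) directly, which avoids computing `1 - pbinom(5, ...)` by hand.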

Tweets per day

A report from Australian Twits has stated that there is an average of 10.2 tweets per day containing the
term #applepi. Using the following R code, and the given average, we can simulate counting the tweets per
day for fifty days, compute the mean of that sample, and repeat the process 1000 times:
# set up storage for the 1000 sample means
sample.means = rep(0, 1000)
# loop 1000 times
for (a in 1:1000) {
  # obtain a random sample of 50 daily counts from a Poisson distribution
  poisson.sample = rpois(50, lambda = 10.2)
  # compute the mean of the sample, and store the mean
  sample.means[a] = mean(poisson.sample)
}
1. Find the mean and standard deviation of the sample means from the simulation and compare them to the
theoretical mean and standard deviation of the sample mean.
2. Provide an appropriate plot to examine if the distribution of the sample mean follows a Normal distribution.
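
For reference, a Poisson distribution with mean lambda also has variance lambda, so the mean of n independent counts has mean lambda and standard deviation sqrt(lambda / n). The sketch below compares these values with a simulation and draws a Normal Q-Q plot, which is one suitable choice of plot among several. The seed and the variable names are arbitrary choices made for this illustration.

```r
set.seed(1)   # arbitrary seed, used only to make the run reproducible
lambda <- 10.2
n <- 50

# 1000 simulated sample means, each from 50 Poisson daily counts
sample.means <- replicate(1000, mean(rpois(n, lambda = lambda)))

# Simulated versus theoretical values
sim.mean <- mean(sample.means)
sim.sd <- sd(sample.means)
theory.mean <- lambda
theory.sd <- sqrt(lambda / n)
print(c(sim.mean, theory.mean, sim.sd, theory.sd))

# Normal Q-Q plot: points close to the line suggest approximate Normality
qqnorm(sample.means)
qqline(sample.means)
```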

Check the tweets

To examine the validity of the statement that there is an average of 10.2 tweets per day containing the term
#applepi, we observed the number of tweets per day containing the term #applepi for 50 days and obtained
the following sample:
5  0  1  1  3  4  2  5  2  3
2  3  6  2  4  0  6  1  2  0
3  3  1  4  3  4  3  3  3  3
6  2  0  2  4  5  2  2  3  4
6  0  2  5  3  3  9  8  1  2

1. Compute the mean x̄ and standard deviation s of the sample.
2. Given that the population mean is 10.2, and using the information obtained from the previous question
(about the distribution of the sample mean), compute the probability of obtaining a sample mean less
than or equal to x̄.
3. What can we conclude from the previous probability result?
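
One way to carry out these steps in R is sketched below. It assumes that the sample mean is approximately Normal with mean 10.2 and standard deviation sqrt(10.2 / 50), which follows from the Poisson model of the previous section and the Central Limit Theorem; the variable names are illustrative only.

```r
# Observed #applepi tweet counts for 50 days
# (the order of the values does not affect the statistics below)
tweets <- c(5, 0, 1, 1, 3, 4, 2, 5, 2, 3,
            2, 3, 6, 2, 4, 0, 6, 1, 2, 0,
            3, 3, 1, 4, 3, 4, 3, 3, 3, 3,
            6, 2, 0, 2, 4, 5, 2, 2, 3, 4,
            6, 0, 2, 5, 3, 3, 9, 8, 1, 2)

xbar <- mean(tweets)   # sample mean
s <- sd(tweets)        # sample standard deviation

# ASSUMPTION: sample mean ~ Normal(10.2, sqrt(10.2 / n)) under the claim
se <- sqrt(10.2 / length(tweets))
p.le.xbar <- pnorm(xbar, mean = 10.2, sd = se)
print(c(xbar, s, se, p.le.xbar))
```

A very small probability indicates that a sample mean this low would be highly unusual if the claimed average were true.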

Assignment Submission
One assignment is to be submitted per student by the due date, containing the description and results from
performing the tasks in sections 1 to 4. The assignment is to be written in R Markdown, Knitted to HTML,
then converted to PDF.
To submit the assignment, log in to the 300700 control panel at
http://staff.scem.uws.edu.au/~lapark/300700/login.php, go to the Assignment Submission box, and submit the PDF.
If submitted successfully, the assignment will appear in your list of submitted assignments. Please compare the
MD5 sum of the recorded file with your own to ensure that the correct file has been received [1]. Note that any
resubmission will overwrite the previous submission.
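
The MD5 sum of a local file can be computed from within R using `md5sum` from the `tools` package that ships with R. The sketch below wraps it in a small helper; `file.md5` is a hypothetical name chosen for this illustration, and a temporary file stands in for the real PDF.

```r
# Hypothetical helper: return the MD5 sum of a file as a plain string
file.md5 <- function(path) {
  stopifnot(file.exists(path))
  unname(tools::md5sum(path))
}

# Demonstration on a temporary file standing in for the submitted PDF
demo.file <- tempfile(fileext = ".pdf")
writeLines("placeholder content", demo.file)
checksum <- file.md5(demo.file)
print(checksum)
```

Replacing `demo.file` with the path to your own PDF gives the value to compare against the one shown in the control panel.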
The first page of your assignment should contain the declaration shown in Figure 1. Note: An examiner
or lecturer/tutor has the right not to mark this project report if the above declaration has not
been added to the cover of the report. Each group member's name and student number should be written
after the declaration, along with the percentage contribution of the member to the assignment. Note that a
contribution may involve a solution to a problem, writing up the solution, helping a group member, or any
other task that has led to increasing the group members' understanding of the assignment content.
The assignment should begin on the second page. No identifying information (Name or student
number) should be placed on any page except the first. This is so that the report can be anonymised
by removing the cover page.
[1] For more information see: http://en.wikipedia.org/wiki/Md5sum

Marking Criteria
The assignment will contribute a maximum of 5 marks towards each student's final mark. Four of the marks
will be awarded based on the report. One mark will be awarded based on each student's peer review.

Report Assessment
Each of the four sections is worth one mark. A mark (or fraction of it) will be awarded in proportion to the
understanding of the problem and the solution presented. Remember that the assessor will only be able to read
what you have written, so clearly explain all decisions made. Each section will be marked according to:
Solution Type                                                              Marks
Problem is answered correctly and the student's intentions are correct.   1
The student's intentions are correct, but the answer is wrong.            0.5
The student's intentions are incorrect.                                    0

The same mark out of four will be awarded to each member of the group, given that the contribution of
each member was equal.

Peer Review Assessment


Once the assignments have been submitted, members of each group that submitted an assignment will receive
five anonymous submitted assignments. Each student is to then rank the assignments based on the above
marking criteria, using the assignment solutions from vUWS.
The ranking should be uploaded using the 300700 control panel (log in at
http://staff.scem.uws.edu.au/~lapark/300700/login.php), where the top-ranked assignment is the one which
deserves the greatest number of marks from the provided set.
Once the ranking has been submitted, each student will receive an additional mark using the following
criteria:
[1 mark] The Kendall tau correlation [2] of the student's ranked list of a sample of five reports with the
tutor's ranked list of the same five reports is greater than zero.
Note that if a student does not submit a report, they will not receive the chance to take part in this peer
review assessment, and hence receive no marks for the assignment.
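
To illustrate the criterion, the Kendall tau correlation of two rankings can be computed in R with `cor` and `method = "kendall"`. Both rankings below are hypothetical and exist only to show the calculation.

```r
# Hypothetical rankings of the same five reports (1 = top ranked)
student.rank <- c(1, 2, 3, 4, 5)
tutor.rank <- c(2, 1, 3, 5, 4)

# Kendall tau: (concordant pairs - discordant pairs) / total pairs
tau <- cor(student.rank, tutor.rank, method = "kendall")
print(tau)

# The peer review mark is awarded when tau is greater than zero
mark.awarded <- tau > 0
```

Here 8 of the 10 pairs of reports are ordered the same way in both lists and 2 are reversed, giving tau = (8 - 2) / 10 = 0.6.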

[2] For more information see: http://en.wikipedia.org/wiki/Kendall_tau

By including this statement, we the authors of this work, verify that:


We hold a copy of this assignment that we can produce if the original is lost or damaged.
We hereby certify that no part of this assignment/product has been copied from any other
student's work or from any other source except where due acknowledgement is made in the
assignment.
No part of this assignment/product has been written/produced for us by another person except
where such collaboration has been authorised by the subject lecturer/tutor concerned.
We are aware that this work may be reproduced and submitted to plagiarism detection software
programs for the purpose of detecting possible plagiarism (which may retain a copy on its
database for future plagiarism checking).
We are aware that this work may be used in Unit based peer review assessment. There is
no student identification information contained within this work other than the information
provided on this cover page.
We hereby certify that we have read and understand what the School of Computing and
Mathematics defines as minor and substantial breaches of misconduct as outlined in the learning
guide for this unit.
Group Member Name        Student Number        % Contribution to Assignment

Figure 1: Statement to be included on the first page of each submission.
