
BUM 2413 APPLIED STATISTICS
ASSIGNMENT PART 1 & PART 2
PUSAT SAINS MATEMATIK
SEMESTER I 2019/2020
TOTAL MARK: 86

DUE DATE:
PART 1: 18/10/2019 (Friday, not later than 5.00 PM)
PART 2: 22/11/2019 (Friday, not later than 5.00 PM)

The objectives of this assignment are to help you to understand the statistical analysis process, to
analyse data using software and to develop your integrity in reporting your assignment. The best
way to understand statistics is by involving yourself in the whole statistical process and not just
limited to studying statistics from books, videos, or websites. This assignment requires you to
follow the steps of statistical problem-solving methodology by conducting your own study. PART
1 involves Step 1 to Step 4 of statistical problem-solving methodology, while PART 2 involves
Step 5 to Step 6. You will experience how to collect, organise, summarise, analyse, present,
interpret, and draw conclusions from data, as well as how to prepare a report of your study.

INSTRUCTIONS:
1. Set up a group that consists of four (4) or five (5) members from your section only and name
your group using any statistical term.

2. Obtain an APPROVAL of your chosen topic from your lecturer BEFORE you start
collecting data and begin your statistical analysis.

3. Use the template on page 5 and page 6 as the cover for your assignment booklet of Part 1
and Part 2, respectively. Fill in all the required particulars clearly.

4. Answer ALL questions in PART 1 and PART 2, and use appropriate statistical notations.

5. Perform ALL analyses using Microsoft Excel and P-value approach.

6. Submit the following items for EACH group:


(i) A hardcopy report that includes all relevant evidence (Microsoft Excel outputs,
handwritten data records, photos, Google Docs, etc.) as attachments in the appendix
section.
(ii) A softcopy report uploaded via KALAM in one compressed file. Name your file
‘Section_group’, for example, 01G_means.

7. LATE submission of assignment will not be entertained.


ASSIGNMENT
BUM2413 APPLIED STATISTICS
SEMESTER I 2019/2020

GROUP NAME : EXCEL


NAME STUDENT ID SECTION
MUHAMMAD MIRZA BIN MSTAPAH MA17035 13P
MUHAMMAD FIRDAUS BIN ILIAS MA17140 13P
MUHAMMAD ASYRAF BIN ABDUL WAHAB MH17016 13P

MUHAMMAD IMRAN BIN MD RAZALI MA17070 13P

LECTURER
DR. NORYANTI BINTI MUHAMMAD

PART 1
FOR EXAMINER USE ONLY
Question Marks Your Marks Question Marks Your Marks
1 2 8 4
2 2 9 12
3 1 10 12
4 7 11 2
5 6 12 2
6 5 13 1
7 4
TOTAL 60
PART 1 (60 Marks: 7%)
1. Identify a problem that you are interested to study. Provide a brief description of
your study.
The problem we are interested in studying is how many hours UMP students
spend on social media (Instagram, Twitter, Facebook and Snapchat) in a day.
We chose this topic because we want to know how the time UMP students
spend on social media in a day differs between male and female students.
2. Choose a single quantitative variable that describes your chosen problem. Identify
the level of measurement for the variable.
The variable we chose is the time (in hours) spent on social media in a day. It is at
the ratio level of measurement, since it possesses all the characteristics of the
interval level and also has a true zero.
3. State your population.

Our population is 190 Universiti Malaysia Pahang (UMP), Pekan students.

4. Divide the data collected into two significant groups that are related to the study (e.g.
gender, faculty, year of study, etc.)
i. State the name of the groups.
Gender

ii. Present the data collected according to the groups.


[Figure: bar chart "Response from the survey" showing the count of respondents by gender (Female, Male) and by time spent on social media per day (30 minutes, 1 hour, 2 hours, 3 hours, 4 hours).]

Table of the overall data for how much time UMP students spend on social media in a day

Time spent (hours)    0.5    1     2     3     4
Male                  6      14    3     12    4
Female                2      7     13    46    20

iii. Identify the method of data collection being used. Provide the significant
evidence.

The method of data collection used is a survey (an online questionnaire), which
collects both quantitative and qualitative data.

iv. Identify which sampling method you used to collect the data. Explain the
sampling method process.

The sampling method used is voluntary sampling. We created a
questionnaire in Google Docs for UMP students to answer, and
distributed it through the WhatsApp application.

5. For each group, select two sets of data of different sizes (n<30, n>30). Therefore, you
should have four sets of data in total.
(i) Present the data selected as shown in the following table.
Sample size Group 1 Group 2
n<30 26 62
n>30 24 78

(ii) Identify which sampling method you used to select the four sets of data. Explain
the sampling method process.

The sampling method we used for this questionnaire is voluntary sampling.
The questionnaire, which we created as a Google Form, was shared with all UMP
students in Pekan through WhatsApp Messenger.

6. For each set of data, obtain the descriptive statistics using Microsoft Excel. Then,
summarise the measures of central tendency and measures of variation in the
following table.

Set of data        Measures of central tendency    Measures of variation

Male, n<30         Mode: 3                         Range, R: 2.5
                   Median: 3                       Sample variance, s^2: 0.1758
                   Mean: 2.0962                    Standard deviation, s: 0.3459
                   Midrange: 3

Male, n>30         Mode: 3                         Range, R: 3.5
                   Median: 3                       Sample variance, s^2: 0.1196
                   Mean: 2.7016                    Standard deviation, s: 0.3459
                   Midrange: 3

Female, n<30       Mode: 1                         Range, R: 3.5
                   Median: 1                       Sample variance, s^2: 0.07783
                   Mean: 1.4                       Standard deviation, s: 0.3128
                   Midrange: 3

Female, n>30       Mode: 2                         Range, R: 3.5
                   Median: 2                       Sample variance, s^2: 0.08797
                   Mean: 2.6025                    Standard deviation, s: 0.2966
                   Midrange: 3
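The assignment requires Microsoft Excel for the analysis, so the Python sketch below is only an optional cross-check of how the measures in this table are defined. It assumes the overall male frequencies from the table in Question 4 (the raw values of the four sub-samples are not listed in this report), so its results are not expected to reproduce the figures above. The names `male_freq`, `expand` and `describe` are illustrative.

```python
import math
import statistics

# Overall male frequencies from the Question 4 table: hours -> number of students.
male_freq = {0.5: 6, 1.0: 14, 2.0: 3, 3.0: 12, 4.0: 4}


def expand(freq):
    """Turn a {value: count} frequency table into a flat list of observations."""
    return [value for value, count in freq.items() for _ in range(count)]


def describe(data):
    """Return the measures listed in the Question 6 table for one data set."""
    variance = statistics.variance(data)  # sample variance, divides by n - 1
    return {
        "n": len(data),
        "mode": statistics.mode(data),
        "median": statistics.median(data),
        "mean": statistics.mean(data),
        "midrange": (min(data) + max(data)) / 2,
        "range": max(data) - min(data),
        "variance": variance,
        "std_dev": math.sqrt(variance),  # s is the square root of s^2
    }


male_summary = describe(expand(male_freq))
for name, value in male_summary.items():
    print(f"{name}: {value}")
```

Because the standard deviation is the square root of the sample variance, each variance and standard deviation pair reported from Excel can be checked against the other in the same way.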

7. Compare and comment on the measures of central tendency and measures of
variation between groups.
Based on the measures of central tendency and measures of variation for the
time UMP students spend on social media in a day, male students have a
higher mode, median and mean than female students. The standard deviation
for male students is also higher than for female students, which means the
time male students spend on social media in a day is more spread out.
Therefore, we can conclude that male students spend more time on social
media in a day than female students.

8. Do different sample sizes affect the conclusion of the study by comparing its
measures of central tendency and measures of variation? Justify your
answer.

As observed above, different sample sizes produce different values for the
measures of central tendency and the measures of variation. For male
students, the sample with n<30 gives a variance of 0.1758 and the sample with
n>30 gives a variance of 0.1196. For female students, the variance is 0.07783
for n<30 and 0.08797 for n>30. The median is the better indicator of the most
typical value when a data set contains an outlier. However, when the sample
size is large and the data do not include outliers, the mean gives a more
accurate measure of central tendency.
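The point about outliers can be illustrated with a small Python sketch. The two lists below are hypothetical values chosen for illustration and are NOT taken from the survey; the second list differs from the first only in its last value.

```python
import statistics

# Hypothetical hours per day, not survey data.
typical = [1, 2, 2, 3, 3, 3, 4]
with_outlier = [1, 2, 2, 3, 3, 3, 24]  # last value replaced by an extreme one

for label, data in (("typical", typical), ("with outlier", with_outlier)):
    mean = statistics.mean(data)
    median = statistics.median(data)
    print(f"{label}: mean = {mean:.2f}, median = {median}")
```

The median stays at 3 in both lists, while the mean is pulled upwards by the single extreme value.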
9. Construct histograms for the four sets of data (be sure to label it properly!).
Identify the shape of distribution for each histogram and give your comments
based on the data distribution.

For female n<30
The distribution is symmetric.

For female n>30
The distribution is negatively skewed.

For male n<30
The distribution is positively skewed.

For male n>30
The distribution is symmetric.
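The shape read from a histogram can be cross-checked numerically with a skewness coefficient. The Python sketch below is one possible check, not part of the required Excel workflow. It assumes the overall male and female frequencies from the table in Question 4 rather than the four sub-samples plotted here, so its labels need not match the histograms above. The function names and the tolerance of 0.1 are arbitrary choices made for illustration.

```python
# Overall frequencies from the Question 4 table: hours -> number of students.
male_freq = {0.5: 6, 1.0: 14, 2.0: 3, 3.0: 12, 4.0: 4}
female_freq = {0.5: 2, 1.0: 7, 2.0: 13, 3.0: 46, 4.0: 20}


def skewness(freq):
    """Moment coefficient of skewness, m3 / m2**1.5, for a frequency table."""
    n = sum(freq.values())
    mean = sum(x * c for x, c in freq.items()) / n
    m2 = sum(c * (x - mean) ** 2 for x, c in freq.items()) / n
    m3 = sum(c * (x - mean) ** 3 for x, c in freq.items()) / n
    return m3 / m2 ** 1.5


def shape(g1, tolerance=0.1):
    """Label a distribution from its skewness; the tolerance is an assumption."""
    if g1 > tolerance:
        return "positively skewed"
    if g1 < -tolerance:
        return "negatively skewed"
    return "approximately symmetric"


for label, freq in (("Male", male_freq), ("Female", female_freq)):
    g1 = skewness(freq)
    print(f"{label}: skewness = {g1:.3f} ({shape(g1)})")
```

A positive coefficient indicates a longer right tail and a negative coefficient a longer left tail.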

10. Construct boxplots for all data sets on the same x-axis. Identify the shape of the
distributions. Compare and comment on the average and variability of the boxplots.
Item              Male     Male     Female   Female
                  n<30     n>30     n<30     n>30
Minimum           0.5      2        0.5      2
Quartile 1        0.75     2        1        3
Median            1        3        2        3
Quartile 3        1        3        2        3
Maximum           4        4        3        4
IQR               0.25     1        1        0
Q1 - 1.5 IQR      0.375    0.5      0        3
Q3 + 1.5 IQR      1.375    4.5      4.5      4
Outlier           2        0        2        0
Left whisker      0.25     0        0.5      1
Right whisker     3        1        1        1
[Figure: "Boxplot", horizontal boxplots of the data sets on a common x-axis running from 0 to 3.5 hours. Plotted series values: Series1 = 0.75, 1, 1; Series2 = 2, 3, 3; Series3 = 1, 2, 2.]

11. What is the best measure of central tendency to describe your data? Give a reason.
The median is the best measure of central tendency for our data because it describes the
middle of the data well even when the data are skewed.
12. What is the best variability measure to describe your data? Give a reason.
The best variability measure is the interquartile range (IQR). It lets us make an initial
identification of outliers: any value more than one and a half times the IQR below the first
quartile or above the third quartile is flagged as an outlier, that is, any value below
Q1 - 1.5(IQR) or above Q3 + 1.5(IQR). The IQR is often preferred over the range because it
is not affected by most outliers.
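The 1.5(IQR) rule can be sketched in Python as follows. The example assumes the quartiles of the "Male, n<30" column in the Question 10 table (Q1 = 0.75, Q3 = 1, minimum = 0.5, maximum = 4); the function names are illustrative, and the sketch is a cross-check rather than a replacement for the Excel calculation.

```python
def tukey_fences(q1, q3, k=1.5):
    """Return the (lower, upper) outlier fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_outlier(x, q1, q3, k=1.5):
    """True if x lies below the lower fence or above the upper fence."""
    lower, upper = tukey_fences(q1, q3, k)
    return x < lower or x > upper


# Quartiles taken from the "Male, n<30" column of the Question 10 table.
lower, upper = tukey_fences(0.75, 1.0)
print(lower, upper)               # 0.375 1.375
print(is_outlier(0.5, 0.75, 1.0))  # False: the minimum lies inside the fences
print(is_outlier(4, 0.75, 1.0))    # True: the maximum lies above the upper fence
```

The fences 0.375 and 1.375 agree with the values reported for that column.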
13. Based on your problem stated in (1), give any relevant conclusion for the study.
In conclusion, this survey was carried out to find out how much time UMP
students spend on social media in a day. We chose this problem because we wanted to know
how the time UMP Pekan students spend on social media differs between male and female
students, as we know that the results of the last semester's examination varied
between them.
