Data Analysis

Table 1‐ Data set
Mix 1 (psi) Mix 2 (psi)
2,298 3617.75
3,205 3675.75
3,325 3784.50
3,609 3857.00
3,918 3871.50
3,992 3944.00
4,057 3958.50
4,188 4045.50
4,289 4081.75
4,363 4110.75
4,377 4125.25
4,448 4190.50
4,450 4197.75
4,524 4263.00
4,536 4270.25
4,565 4277.50
4,591 4321.00
4,657 4335.50
4,666 4335.50
4,670 4350.00
4,724 4364.50
4,737 4386.25
4,763 4386.25
4,784 4386.25
4,816 4415.25
4,817 4415.25
4,852 4429.75
4,887 4458.75
4,905 4487.75
4,908 4589.25
4,923 4596.50
4,941 4705.25
4,993 4705.25
4,998 4763.25
5,035 4872.00
5,041 4879.25
5,058 4937.25
5,142 4951.75
5,152 4995.25
5,152 5038.75
5,330
5,535


Develop:
1. Q‐plots for both datasets, compare and discuss
Table 2 – Mix 1 Rank & Percentile
Point Mix 1 (psi) Rank Percent
42 5,535 1 100.00%
41 5,330 2 97.50%
39 5,152 3 92.60%
40 5,152 3 92.60%
38 5,142 5 90.20%
37 5,058 6 87.80%
36 5,041 7 85.30%
35 5,035 8 82.90%
34 4,998 9 80.40%
33 4,993 10 78.00%
32 4,941 11 75.60%
31 4,923 12 73.10%
30 4,908 13 70.70%
29 4,905 14 68.20%
28 4,887 15 65.80%
27 4,852 16 63.40%
26 4,817 17 60.90%
25 4,816 18 58.50%
24 4,784 19 56.00%
23 4,763 20 53.60%
22 4,737 21 51.20%
21 4,724 22 48.70%
20 4,670 23 46.30%
19 4,666 24 43.90%
18 4,657 25 41.40%
17 4,591 26 39.00%
16 4,565 27 36.50%
15 4,536 28 34.10%
14 4,524 29 31.70%
13 4,450 30 29.20%
12 4,448 31 26.80%
11 4,377 32 24.30%
10 4,363 33 21.90%
9 4,289 34 19.50%
8 4,188 35 17.00%
7 4,057 36 14.60%
6 3,992 37 12.10%
5 3,918 38 9.70%
4 3,609 39 7.30%
3 3,325 40 4.80%
2 3,205 41 2.40%
1 2,298 42 0.00%

Mix 2 Rank & Percentile
Point Mix 2 (psi) Rank Percent
40 5038.75 1 100.00%
39 4995.25 2 97.40%
38 4951.75 3 94.80%
37 4937.25 4 92.30%
36 4879.25 5 89.70%
35 4872.00 6 87.10%
34 4763.25 7 84.60%
32 4705.25 8 79.40%
33 4705.25 8 79.40%
31 4596.50 10 76.90%
30 4589.25 11 74.30%
29 4487.75 12 71.70%
28 4458.75 13 69.20%
27 4429.75 14 66.60%
25 4415.25 15 61.50%
26 4415.25 15 61.50%
22 4386.25 17 53.80%
23 4386.25 17 53.80%
24 4386.25 17 53.80%
21 4364.50 20 51.20%
20 4350.00 21 48.70%
18 4335.50 22 43.50%
19 4335.50 22 43.50%
17 4321.00 24 41.00%
16 4277.50 25 38.40%
15 4270.25 26 35.80%
14 4263.00 27 33.30%
13 4197.75 28 30.70%
12 4190.50 29 28.20%
11 4125.25 30 25.60%
10 4110.75 31 23.00%
9 4081.75 32 20.50%
8 4045.50 33 17.90%
7 3958.50 34 15.30%
6 3944.00 35 12.80%
5 3871.50 36 10.20%
4 3857.00 37 7.60%
3 3784.50 38 5.10%
2 3675.75 39 2.50%
1 3617.75 40 0.00%

Figure 1‐ Q Plot for Mix 1 & Mix 2 (cumulative percent, 0% to 100%, against strength, 2,000 to 6,000 psi)
Figure 1 suggests that the two data sets could plausibly come from the same population: the Q plots follow
similar curves, and the data points lie close to each other over most of the range. Mix 1, however, has a few
points at the low end that lie far off the curve, suggesting noise or outliers, whereas the Mix 2 points are
spread evenly along the curve, indicating less noise in that data set.
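The percentages in the rank and percentile tables can be reproduced with a short script. The following is a minimal Python sketch, assuming the percent of each point is the number of values strictly below it divided by n - 1 (this agrees with the tabulated values, for example 38/41, or about 92.6%, for the two tied 5,152 psi points). The function name percent_rank is hypothetical, and only the Mix 1 data from Table 1 is used.

```python
from bisect import bisect_left

# Mix 1 compressive strengths (psi), copied from Table 1.
mix1 = [2298, 3205, 3325, 3609, 3918, 3992, 4057, 4188, 4289, 4363, 4377,
        4448, 4450, 4524, 4536, 4565, 4591, 4657, 4666, 4670, 4724, 4737,
        4763, 4784, 4816, 4817, 4852, 4887, 4905, 4908, 4923, 4941, 4993,
        4998, 5035, 5041, 5058, 5142, 5152, 5152, 5330, 5535]


def percent_rank(values):
    """Return (value, fraction) pairs in ascending order.

    fraction = (number of values strictly smaller) / (n - 1), so the
    minimum maps to 0.0, a unique maximum maps to 1.0, and tied values
    share the same fraction.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [(v, bisect_left(ordered, v) / (n - 1)) for v in ordered]


# Points of the Q plot: strength on the x axis, fraction on the y axis.
q_points = percent_rank(mix1)
for value, frac in q_points[-4:]:
    print(f"{value:>6} psi  {100 * frac:6.2f}%")
```

Plotting the pairs in q_points for each mix on the same axes gives a Q plot of the kind shown in Figure 1.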
2. Q‐Q plot (correlation of deciles) – discuss
Decile Mix 1 (psi) Mix 2 (psi)
90% 5140 4880
80% 5000 4750
70% 4950 4460
60% 4825 4405
50% 4780 4390
40% 4610 4300
30% 4460 4200
20% 4240 4075
10% 3960 3850


Figure 2‐ Q‐Q Plot for Mix 1 & Mix 2 (Mix 2 deciles against Mix 1 deciles, both axes 0 to 6,000 psi)

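Since the data points, omitting the 0% and 100% deciles, fall approximately along the 45-degree
reference line, it can be inferred that the two data sets have similar probability distributions.
The Mix 1 deciles are nevertheless consistently higher than the Mix 2 deciles, by about 110 to
490 psi, so the distributions are similar in shape rather than identical. Omitting the extreme
deciles, which contain the outliers, improves the correlation.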
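The decile comparison can also be computed directly from the raw data. The sketch below is one possible approach, not necessarily the one used for the table above: it assumes deciles are obtained by linear interpolation between order statistics (Python's statistics.quantiles with method="inclusive"), so the results may differ somewhat from the rounded values tabulated above. The helper names deciles and pearson_r are hypothetical.

```python
import statistics as st

# Compressive strengths (psi), copied from Table 1.
mix1 = [2298, 3205, 3325, 3609, 3918, 3992, 4057, 4188, 4289, 4363, 4377,
        4448, 4450, 4524, 4536, 4565, 4591, 4657, 4666, 4670, 4724, 4737,
        4763, 4784, 4816, 4817, 4852, 4887, 4905, 4908, 4923, 4941, 4993,
        4998, 5035, 5041, 5058, 5142, 5152, 5152, 5330, 5535]
mix2 = [3617.75, 3675.75, 3784.50, 3857.00, 3871.50, 3944.00, 3958.50,
        4045.50, 4081.75, 4110.75, 4125.25, 4190.50, 4197.75, 4263.00,
        4270.25, 4277.50, 4321.00, 4335.50, 4335.50, 4350.00, 4364.50,
        4386.25, 4386.25, 4386.25, 4415.25, 4415.25, 4429.75, 4458.75,
        4487.75, 4589.25, 4596.50, 4705.25, 4705.25, 4763.25, 4872.00,
        4879.25, 4937.25, 4951.75, 4995.25, 5038.75]


def deciles(values):
    """Return the 10% to 90% deciles (nine values), linearly interpolated."""
    return st.quantiles(values, n=10, method="inclusive")


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = st.mean(x), st.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


dec1, dec2 = deciles(mix1), deciles(mix2)
r = pearson_r(dec1, dec2)
for pct, a, b in zip(range(10, 100, 10), dec1, dec2):
    print(f"{pct}%  {a:8.1f}  {b:8.1f}")
print(f"correlation of deciles: {r:.3f}")
```

The pairs (dec1[i], dec2[i]) are the points of the Q-Q plot; a high correlation alone shows that the deciles rise together, while closeness to the 45-degree line shows how similar the two distributions are in location and spread.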

3. Box and whiskers plots for each data set, compare and discuss
Table 4 – Mix 1 & 2 Quantiles
Parameter Mix 1 (psi) Mix 2 (psi)
Max 5535.0 5038.8
Q3 4936.5 4591.1
Q2 4730.5 4357.3
Q1 4394.7 4121.6
Min 2298.0 3617.8
IQR 541.8 469.4


Figure 3‐ Box & Whiskers Plot for Mix 1 & Mix 2

• Mix 1 appears from Figure 3 to be a left-skewed data set, with an average strength about 200 psi
greater than that of Mix 2. The mean is lower than the median and clearly separated from it, indicating
asymmetry. Three data points lie more than 1.5 × IQR beyond the box, indicating the presence of outliers.

• Mix 2 appears from Figure 3 to be a symmetric data set, with an average strength about 200 psi
lower than that of Mix 1. The mean and median are very close to each other, which is consistent with an
approximately normal distribution. All the data points lie within the 1.5 × IQR limits represented by the
whiskers, so the data set has no outliers.

4. Identify outliers
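• Outliers in Mix 1: 2,298, 3,205 and 3,325 psi, all below the lower limit Q1 - 1.5 × IQR = 4394.7 - 1.5 × 541.8 ≈ 3582 psi
• No outliers in Mix 2: all values lie between 4121.6 - 1.5 × 469.4 ≈ 3417.5 psi and 4591.1 + 1.5 × 469.4 ≈ 5295.2 psi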
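The outlier check above can be sketched in a few lines of Python. This is an illustration under stated assumptions: quartiles are taken by linear interpolation (statistics.quantiles with method="inclusive", which reproduces the Q1 and Q3 values in Table 4), and a point is flagged when it lies more than 1.5 × IQR beyond Q1 or Q3. The function name tukey_outliers is hypothetical.

```python
import statistics as st

# Compressive strengths (psi), copied from Table 1.
mix1 = [2298, 3205, 3325, 3609, 3918, 3992, 4057, 4188, 4289, 4363, 4377,
        4448, 4450, 4524, 4536, 4565, 4591, 4657, 4666, 4670, 4724, 4737,
        4763, 4784, 4816, 4817, 4852, 4887, 4905, 4908, 4923, 4941, 4993,
        4998, 5035, 5041, 5058, 5142, 5152, 5152, 5330, 5535]
mix2 = [3617.75, 3675.75, 3784.50, 3857.00, 3871.50, 3944.00, 3958.50,
        4045.50, 4081.75, 4110.75, 4125.25, 4190.50, 4197.75, 4263.00,
        4270.25, 4277.50, 4321.00, 4335.50, 4335.50, 4350.00, 4364.50,
        4386.25, 4386.25, 4386.25, 4415.25, 4415.25, 4429.75, 4458.75,
        4487.75, 4589.25, 4596.50, 4705.25, 4705.25, 4763.25, 4872.00,
        4879.25, 4937.25, 4951.75, 4995.25, 5038.75]


def tukey_outliers(values, k=1.5):
    """Return (q1, q3, outliers) using the k * IQR rule around the box."""
    q1, _, q3 = st.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return q1, q3, [v for v in values if v < low or v > high]


q1_a, q3_a, out1 = tukey_outliers(mix1)
q1_b, q3_b, out2 = tukey_outliers(mix2)
print(f"Mix 1: Q1={q1_a:.2f} Q3={q3_a:.2f} outliers={out1}")
print(f"Mix 2: Q1={q1_b:.2f} Q3={q3_b:.2f} outliers={out2}")
```

Run on the Table 1 data, the sketch flags the same three low Mix 1 values and nothing in Mix 2.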

5. Frequency distribution for each data set
Table 5 – Mix 1 & 2 Classes
Mix 1 classes (psi) Mix 2 classes (psi)
2200 3600
2400 3700
2600 3800
2800 3900
3000 4000
3200 4100
3400 4200
3600 4300
3800 4400
4000 4500
4200 4600
4400 4700
4600 4800
4800 4900
5000 5000
5200 5100
5400
5600

Mix 1:
n=42
Table 6 – Frequency distribution of Mix 1
Bin Frequency Rel.Frequency Cumulative Freq.
2200 0 0.00 0.00
2400 1 0.02 0.02
2600 0 0.00 0.02
2800 0 0.00 0.02
3000 0 0.00 0.02
3200 0 0.00 0.02
3400 2 0.05 0.07
3600 0 0.00 0.07
3800 1 0.02 0.10
4000 2 0.05 0.14
4200 2 0.05 0.19
4400 3 0.07 0.26
4600 6 0.14 0.40
4800 7 0.17 0.57
5000 10 0.24 0.81
5200 6 0.14 0.95
5400 1 0.02 0.98
5600 1 0.02 1.00
More 0

Figure 4‐ Frequency distribution of Mix 1 (histogram of frequency against bin)

Mix 2:
n=40
Table 7 – Frequency distribution of Mix 2
Bin Frequency Rel.Frequency Cumulative Freq.
3600 0 0.00 0.00
3700 2 0.05 0.05
3800 1 0.03 0.08
3900 2 0.05 0.13
4000 2 0.05 0.18
4100 2 0.05 0.23
4200 4 0.10 0.33
4300 3 0.08 0.40
4400 8 0.20 0.60
4500 5 0.13 0.73
4600 2 0.05 0.78
4700 0 0.00 0.78
4800 3 0.08 0.85
4900 2 0.05 0.90
5000 3 0.08 0.98
5100 1 0.03 1.00
More 0

Figure 5‐ Frequency distribution of Mix 2 (histogram of frequency against bin, bins 3600 to 5100 and More)
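The frequency tables in this section can be rebuilt from the raw data. The sketch below assumes each bin value is an upper class limit and that a value equal to the limit is counted in that bin, which is consistent with Table 6. The function name frequency_table is hypothetical, and only Mix 1 is shown; the same function applies to Mix 2 with limits from 3600 to 5100 in steps of 100.

```python
from bisect import bisect_right

# Mix 1 compressive strengths (psi), copied from Table 1.
mix1 = [2298, 3205, 3325, 3609, 3918, 3992, 4057, 4188, 4289, 4363, 4377,
        4448, 4450, 4524, 4536, 4565, 4591, 4657, 4666, 4670, 4724, 4737,
        4763, 4784, 4816, 4817, 4852, 4887, 4905, 4908, 4923, 4941, 4993,
        4998, 5035, 5041, 5058, 5142, 5152, 5152, 5330, 5535]


def frequency_table(values, limits):
    """Return rows of (limit, frequency, relative freq., cumulative freq.).

    Each limit is an upper class limit: a row counts the values that are
    greater than the previous limit and less than or equal to this one.
    """
    ordered = sorted(values)
    n = len(ordered)
    rows, previous = [], 0
    for limit in limits:
        up_to = bisect_right(ordered, limit)  # number of values <= limit
        count = up_to - previous
        rows.append((limit, count, count / n, up_to / n))
        previous = up_to
    return rows


table6 = frequency_table(mix1, range(2200, 5601, 200))
for limit, count, rel, cum in table6:
    print(f"{limit:>5} {count:>3} {rel:5.2f} {cum:5.2f}")
```

The relative frequency column is what the relative frequency histograms plot, and the cumulative column gives the probability of not exceeding each class limit.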

6. Relative frequency distribution for each set

Figure 6‐ Relative Frequency Histogram of Mix 1 (relative frequency, 0.00 to 0.25, against bin, 2200 to 5600)
Figure 7‐ Relative Frequency Histogram of Mix 2 (relative frequency, 0.00 to 0.25, against bin, 3600 to 5100)

7. Find probability exceeding 4,000 for each set

• From Table 6, the probability that strength does not exceed 4,000 psi for Mix 1 = 6/42 = 0.14
Hence the probability that strength exceeds 4,000 psi = 1.0 - 0.14 = 0.86

• From Table 7, the probability that strength does not exceed 4,000 psi for Mix 2 = 7/40 = 0.175 (shown rounded as 0.18)
Hence the probability that strength exceeds 4,000 psi = 1.0 - 0.175 = 0.825, or about 0.83
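These probabilities can be checked by counting directly in the raw data, which avoids the rounding in the cumulative frequency columns. The following is a minimal sketch of the empirical (relative frequency) estimate; the function name prob_exceeding is hypothetical.

```python
# Compressive strengths (psi), copied from Table 1.
mix1 = [2298, 3205, 3325, 3609, 3918, 3992, 4057, 4188, 4289, 4363, 4377,
        4448, 4450, 4524, 4536, 4565, 4591, 4657, 4666, 4670, 4724, 4737,
        4763, 4784, 4816, 4817, 4852, 4887, 4905, 4908, 4923, 4941, 4993,
        4998, 5035, 5041, 5058, 5142, 5152, 5152, 5330, 5535]
mix2 = [3617.75, 3675.75, 3784.50, 3857.00, 3871.50, 3944.00, 3958.50,
        4045.50, 4081.75, 4110.75, 4125.25, 4190.50, 4197.75, 4263.00,
        4270.25, 4277.50, 4321.00, 4335.50, 4335.50, 4350.00, 4364.50,
        4386.25, 4386.25, 4386.25, 4415.25, 4415.25, 4429.75, 4458.75,
        4487.75, 4589.25, 4596.50, 4705.25, 4705.25, 4763.25, 4872.00,
        4879.25, 4937.25, 4951.75, 4995.25, 5038.75]


def prob_exceeding(values, threshold):
    """Empirical probability: share of observations strictly above threshold."""
    return sum(v > threshold for v in values) / len(values)


p1 = prob_exceeding(mix1, 4000)  # 36 of 42 values exceed 4,000 psi
p2 = prob_exceeding(mix2, 4000)  # 33 of 40 values exceed 4,000 psi
print(f"Mix 1: P(strength > 4000 psi) = {p1:.3f}")
print(f"Mix 2: P(strength > 4000 psi) = {p2:.3f}")
```

This is an estimate from the sample itself; it makes no assumption about the underlying distribution of either mix.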

8. Discuss overall differences and similarities between the two data sets
Table 8 – Descriptive Statistics
Parameter Mix 1 Mix 2
Mean 4576.69 4359.425
Standard Error 94.44719 57.62361
Median 4730.5 4357.25
Mode 5152 4386.25
Standard Deviation 612.0878 364.4437
Sample Variance 374651.4 132819.2
Kurtosis 4.013122 -0.45627
Skewness -1.7223 0.019934
Range 3237 1421
Minimum 2298 3617.75
Maximum 5535 5038.75
Sum 192221 174377
Count 42 40


The relative frequency distribution for Mix 1 indicates that the data set is asymmetric, skewed left, and
has outliers. The Mix 2 data set, by contrast, appears symmetric with no outliers, and its relative
frequency distribution resembles that of a normal distribution. Since the Q‐Q plot exhibits a positive
correlation, the data sets can be considered to have similar, but not identical, probability distributions.
Furthermore, Table 8 reports a kurtosis of 4.01 for Mix 1 against -0.46 for Mix 2. A negative value is
possible only for excess kurtosis, for which a normal distribution scores 0 rather than 3, so the large
positive value for Mix 1 reflects heavy tails and the presence of outliers. The negative skewness of
Mix 1 (-1.72) indicates a longer left tail, which causes the asymmetry.
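The main entries of Table 8 can be reproduced with the Python standard library. The sketch below assumes the skewness and kurtosis in Table 8 are the bias-corrected sample estimators, with kurtosis reported as excess kurtosis; the formulas coded here are that assumption, and the function names sample_skewness and excess_kurtosis are hypothetical.

```python
import statistics as st

# Compressive strengths (psi), copied from Table 1.
mix1 = [2298, 3205, 3325, 3609, 3918, 3992, 4057, 4188, 4289, 4363, 4377,
        4448, 4450, 4524, 4536, 4565, 4591, 4657, 4666, 4670, 4724, 4737,
        4763, 4784, 4816, 4817, 4852, 4887, 4905, 4908, 4923, 4941, 4993,
        4998, 5035, 5041, 5058, 5142, 5152, 5152, 5330, 5535]
mix2 = [3617.75, 3675.75, 3784.50, 3857.00, 3871.50, 3944.00, 3958.50,
        4045.50, 4081.75, 4110.75, 4125.25, 4190.50, 4197.75, 4263.00,
        4270.25, 4277.50, 4321.00, 4335.50, 4335.50, 4350.00, 4364.50,
        4386.25, 4386.25, 4386.25, 4415.25, 4415.25, 4429.75, 4458.75,
        4487.75, 4589.25, 4596.50, 4705.25, 4705.25, 4763.25, 4872.00,
        4879.25, 4937.25, 4951.75, 4995.25, 5038.75]


def sample_skewness(x):
    """Bias-corrected sample skewness (assumed estimator, see lead-in)."""
    n, m, s = len(x), st.mean(x), st.stdev(x)
    return n / ((n - 1) * (n - 2)) * sum(((v - m) / s) ** 3 for v in x)


def excess_kurtosis(x):
    """Bias-corrected sample excess kurtosis (a normal distribution gives 0)."""
    n, m, s = len(x), st.mean(x), st.stdev(x)
    fourth = sum(((v - m) / s) ** 4 for v in x)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    return scale * fourth - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))


for name, data in (("Mix 1", mix1), ("Mix 2", mix2)):
    print(name,
          f"mean={st.mean(data):.2f}",
          f"median={st.median(data):.2f}",
          f"stdev={st.stdev(data):.2f}",
          f"skew={sample_skewness(data):.3f}",
          f"kurt={excess_kurtosis(data):.3f}")
```

A strongly negative skewness together with a large positive excess kurtosis, as for Mix 1, is the numerical counterpart of the long left tail seen in the histograms.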