Data collection
Identification of probability distribution
Choosing Distribution Parameters & Estimators
) is defined by
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} \qquad (1)$$
and the sample variance $S^2$ is defined by
$$S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1} \qquad (2)$$
Table 2: Sample Mean and Sample Variance of the Parameters
Sl.no Parameters Sample mean Sample variance
1. Inspection Hours 0.95 0.49
2. Code lines 5.00 10.50
3. Inspection Rate 6.25 8.25
4. Defects 6.25 30.90
5. Error Density 4.16 10.16
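Equations (1) and (2) can be checked numerically. The following is a minimal sketch: the function name `sample_mean_and_variance` and the example observations are hypothetical and are not the data collected in this study.

```python
from typing import Sequence, Tuple


def sample_mean_and_variance(xs: Sequence[float]) -> Tuple[float, float]:
    """Return (sample mean, sample variance) following equations (1) and (2)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n                                # equation (1)
    sum_sq = sum(x * x for x in xs)                   # sum of squared observations
    variance = (sum_sq - n * mean * mean) / (n - 1)   # equation (2)
    return mean, variance


# Hypothetical observations, for illustration only (not the paper's data).
example = [2.0, 4.0, 4.0, 6.0]
mean, variance = sample_mean_and_variance(example)
print(mean, variance)
```

Equation (2) is the computational shortcut for the unbiased sample variance, so on well-scaled data the result should agree with a library routine such as Python's `statistics.variance`.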
29 Elsevier Publications, 2013
Puneet Goswami
Table 3. Estimator values of the system parameters
Sl.no Parameter Distribution Estimator
1. Inspection hours lognormal $\hat{\mu} = 0.95$
2. Code lines exponential $\hat{\lambda} = 0.2$
3. Inspection Rate exponential $\hat{\lambda} = 0.16$
4. Defects exponential $\hat{\lambda} = 0.16$
5. Error Density exponential $\hat{\lambda} = 0.24$
Numerical estimates [5] of the distribution parameters are needed to reduce the family of distributions to a
specific distribution and to test the resulting hypothesis. The numerical estimates used here are those listed
in Table 3.
Note that a distribution parameter is an unknown constant, whereas an estimator is a
statistic that depends on the sample values.
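The exponential estimators in Table 3 are consistent with the usual rate estimator, the reciprocal of the sample mean, applied to the sample means in Table 2. The short sketch below checks this. That the authors used this particular estimator is an assumption inferred from the numbers, and the variable names are illustrative.

```python
# Sample means copied from Table 2 (exponentially modelled parameters only).
sample_means = {
    "Code lines": 5.00,
    "Inspection Rate": 6.25,
    "Defects": 6.25,
    "Error Density": 4.16,
}

# Assumption: the exponential rate is estimated as 1 / (sample mean),
# rounded to two decimals as in Table 3.
rate_estimates = {
    name: round(1.0 / mean, 2) for name, mean in sample_means.items()
}

for name, rate in rate_estimates.items():
    print(f"{name}: estimated rate = {rate}")
```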
4.4. Evaluations
A goodness-of-fit test is used to test the hypothesis obtained from the above three steps.
If the hypothesis is accepted, the corresponding parameter can be considered reliable.
Two well-known goodness-of-fit tests are the Chi-Square test and the Kolmogorov-Smirnov test. Here we
conduct the Chi-Square test on our data.
The histogram of the inspection rate shown in Fig. 3 appeared to follow an exponential distribution, so the
parameter $\hat{\lambda} = 0.16$ was computed. Thus, the following hypotheses are formed:
$H_0$: the inspection rates are exponentially distributed
$H_1$: the inspection rates are not exponentially distributed
The critical value $\chi^2_{\alpha,k-s-1}$ can be found in standard statistical tables; see [5]. The hypothesis $H_0$
is rejected if $\chi_0^2 > \chi^2_{\alpha,k-s-1}$, where
$$\chi_0^2 = \text{test statistic} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \qquad (3)$$
$k$ = number of intervals = 4
$n$ = sample size = 25
$p$ = expected probability = $1/k$ = 0.25
$E_i$ = expected frequencies = $n \cdot p$ = 6.25
$O_i$ = observed frequencies
$s$ = number of parameters estimated from the data ($\hat{\lambda}$), which is 1
Degrees of freedom = $k - s - 1$ = 4 - 1 - 1 = 2
$\alpha$ = level of significance = 0.05
Table 4: Calculation of $\chi_0^2$
Class interval Observed frequency ($O_i$) $(O_i - E_i)^2 / E_i$
(70-140] 8 0.49
(140-210] 8 0.49
(210-280] 7 0.09
>= 280 2 2.89
Total 25 3.96
$$\chi_0^2 = \text{test statistic} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} = 3.96$$
Software quality assessment of components based on input modelling technique
$$\chi^2_{0.05,2} = 5.99$$
Since 3.96 is less than 5.99, the hypothesis $H_0$ can be accepted, and hence the inspection rate can be considered a
reliable metric while designing the system.
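The arithmetic of this test can be reproduced with a short script. This is a sketch under stated assumptions: the observed frequencies are taken from Table 4, the variable names are illustrative, and the critical value is computed from the closed form that holds only for 2 degrees of freedom instead of being read from a statistical table.

```python
import math

observed = [8, 8, 7, 2]   # observed frequencies O_i from Table 4
n = sum(observed)         # sample size, 25
k = len(observed)         # number of intervals, 4
s = 1                     # number of parameters estimated from the data
alpha = 0.05              # level of significance

expected = n / k          # E_i = n * p with p = 1/k, giving 6.25
chi2_stat = sum((o - expected) ** 2 / expected for o in observed)

dof = k - s - 1           # degrees of freedom, 2
if dof != 2:
    raise ValueError("the closed form below is valid only for 2 degrees of freedom")

# With exactly 2 degrees of freedom the chi-square CDF is 1 - exp(-x / 2),
# so the upper-alpha critical value is -2 * ln(alpha).
critical = -2.0 * math.log(alpha)

reject_h0 = chi2_stat > critical
print(f"statistic={chi2_stat:.2f}, critical={critical:.2f}, reject H0={reject_h0}")
```

For other degrees of freedom the critical value would have to come from a table or a statistics library, since no such simple closed form is available in general.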
The chi-square test can be conducted at different levels of significance. In the above test for inspection rate,
we used a level of significance of 0.05. The test results for the other system parameters and for
different levels of significance can be calculated similarly, and the results are listed in Table 5.
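As an illustration of repeating the test at several levels of significance, the sketch below re-evaluates the inspection-rate decision. The levels 0.10, 0.05 and 0.01 are chosen here for illustration and are not necessarily the ones used in Table 5. The helper `critical_value_2dof` is a hypothetical name and relies on the closed form that is valid only for 2 degrees of freedom.

```python
import math

chi2_stat = 3.96              # test statistic for inspection rate (Table 4)
levels = [0.10, 0.05, 0.01]   # illustrative significance levels (assumption)


def critical_value_2dof(alpha: float) -> float:
    """Upper-alpha critical value of a chi-square variable with 2 degrees of freedom."""
    return -2.0 * math.log(alpha)


# True means H0 is rejected at that level of significance.
decisions = {a: chi2_stat > critical_value_2dof(a) for a in levels}

for a, rejected in decisions.items():
    print(f"alpha={a}: critical={critical_value_2dof(a):.2f}, reject H0={rejected}")
```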
Table 5: Test results of chi-square test
System parameter
2