
SOME STATISTICAL TECHNIQUES IN

PLANT BREEDING EXPERIMENTS


A.R. RAO AND V.K. BHATIA
Indian Agricultural Statistics Research Institute
Library Avenue, New Delhi 110 012
vkbhatia@iasri.res.in
1. Introduction
As early as 1865, Mendel recognized the statistical nature of genetic variables.
Building on this premise, Fisher, Sewall Wright and Haldane developed generalized
theories applicable to more complex genetic phenomena. Generally, qualitative
characters are analyzed at the population level, while quantitative characters are
analyzed from the point of view of plant or animal improvement. Here, only quantitative
characters are considered, and three analyses are discussed: path analysis, diallel
analysis and line × tester analysis.
2. Path Analysis
The theory of path coefficients is established as a general statistical method for cause and
effect analysis in a system of correlated variables. If the cause and effect relationship is
well defined, the whole system of variables can be represented in the form of a
diagram, known as a path diagram. Suppose the yield Y of barley (the effect) is linearly
related to causal factors such as number of ears per plant (X1), number of grains per ear
(X2) and 100-grain weight (X3). It is also assumed that these factors are associated with
one another as shown in Fig. 1.

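[Fig. 1: Path diagram showing cause and effect relationship. Paths a, b and c lead from
X1, X2 and X3 to Y, a residual path leads from R to Y, and the causes are linked by the
correlations r(X1 X2), r(X1 X3) and r(X2 X3).]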

From the figure it is obvious that yield is the result of X1, X2, and X3 and some other
undefined factors designated by R. Further, X1, X2, and X3 in turn are also correlated. In
the figure a, b, c and h are the path-coefficients due to respective variables.


Definitions
Path coefficient is defined as the ratio of the standard deviation of the effect due to a given
cause to the total standard deviation of the effect, i.e., if Y is the effect and X1 is the cause,
the path coefficient for the path from cause X1 to the effect Y, denoted by (X1 → Y), is
$\sigma_{X_1}/\sigma_Y$.
The path coefficient for the path from any cause to the effect can also be defined as the
standardized partial regression coefficient of the effect on that cause.
The advantage of the path diagram is that a set of simultaneous equations can be written
directly from the diagram and a solution of these equations provides information on the
direct and indirect contribution of these causal factors to the effect. The theoretical basis
of these equations may be explained as below:
Let r(X1, Y) be the correlation between X1 and Y as shown in Fig. 1.
Assuming that
$$Y = X_1 + X_2 + X_3 + R$$
it can be shown that
$$r(X_1, Y) = \frac{V(X_1)}{[V(X_1)V(Y)]^{1/2}} + \frac{r(X_1 X_2)\,\sigma_{X_1}\sigma_{X_2}}{[V(X_1)V(Y)]^{1/2}} + \frac{r(X_1 X_3)\,\sigma_{X_1}\sigma_{X_3}}{[V(X_1)V(Y)]^{1/2}}$$
$$= \frac{\sigma_{X_1}}{\sigma_Y} + r(X_1 X_2)\,\frac{\sigma_{X_2}}{\sigma_Y} + r(X_1 X_3)\,\frac{\sigma_{X_3}}{\sigma_Y}$$
where, as per definition,
$\sigma_{X_1}/\sigma_Y$ = a, the path coefficient from X1 to Y,
$\sigma_{X_2}/\sigma_Y$ = b, the path coefficient from X2 to Y,
$\sigma_{X_3}/\sigma_Y$ = c, the path coefficient from X3 to Y.
Thus, r(X1,Y) = a + r(X1 X2)b + r(X1 X3)c.
Similarly, one can work out the equations for r(X2,Y), r(X3,Y) and r(R,Y), which finally
gives the following set of simultaneous equations:
r(X1,Y) = a + r(X1 X2)b + r(X1 X3)c
r(X2,Y) = r(X2 X1)a + b + r(X2 X3)c
r(X3,Y) = r(X3 X1)a + r(X3 X2)b + c
r(R,Y) = h
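Considering only the first three factors, i.e., X1, X2 and X3, the simultaneous equations
given above can be presented in matrix notation as:
$$\begin{bmatrix} r_{X_1Y} \\ r_{X_2Y} \\ r_{X_3Y} \end{bmatrix} = \begin{bmatrix} r_{X_1X_1} & r_{X_1X_2} & r_{X_1X_3} \\ r_{X_2X_1} & r_{X_2X_2} & r_{X_2X_3} \\ r_{X_3X_1} & r_{X_3X_2} & r_{X_3X_3} \end{bmatrix} \begin{bmatrix} a \\ b \\ c \end{bmatrix}$$
Say, A = B C, so that C = B^{-1} A, provided B is a non-singular matrix.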

After having calculated the values of path coefficients, i.e., C, it is possible to obtain the
path value for residual R in the following way:
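From the model given in the diagram (Fig. 1) it is obvious that:
$$Y = X_1 + X_2 + X_3 + R$$
and hence,
$$\sigma_Y^2 = \sigma_{X_1}^2 + \sigma_{X_2}^2 + \sigma_{X_3}^2 + \sigma_R^2 + 2\sigma_{X_1X_2} + 2\sigma_{X_2X_3} + 2\sigma_{X_1X_3}$$
where $\sigma_{X_1X_2} = r(X_1, X_2)\,\sigma_{X_1}\sigma_{X_2}$, and similarly for the other pairs.
Dividing throughout by $\sigma_Y^2$, the contribution of the residual is thus,
$$h^2 = 1 - a^2 - b^2 - c^2 - 2r(X_1X_2)ab - 2r(X_1X_3)ac - 2r(X_2X_3)bc.$$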
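The matrix solution C = B^{-1}A and the residual formula can be sketched in Python as follows. This is only an illustration: the function name `path_coefficients` and the correlation values are hypothetical and are not taken from the barley data, and NumPy is assumed to be available.

```python
import numpy as np

def path_coefficients(B, A):
    """Solve B C = A and return (C, effects, h2).

    B: correlation matrix among the causes, A: correlations of causes with the effect.
    """
    B = np.asarray(B, dtype=float)
    A = np.asarray(A, dtype=float)
    C = np.linalg.solve(B, A)   # direct effects a, b, c
    effects = B * C             # effects[i, j] = r(Xi, Xj) * path coefficient of Xj
    h2 = 1.0 - C @ B @ C        # equals 1 - a^2 - b^2 - c^2 - 2 r12 ab - 2 r13 ac - 2 r23 bc
    return C, effects, h2

# Hypothetical correlations, for illustration only.
B = [[1.0, 0.3, -0.2],
     [0.3, 1.0, 0.1],
     [-0.2, 0.1, 1.0]]
A = [0.6, 0.5, 0.2]

C, effects, h2 = path_coefficients(B, A)
a, b, c = C
print("direct effects (a, b, c):", np.round(C, 4))
print("direct (diagonal) and indirect (off-diagonal) effects:")
print(np.round(effects, 4))
print("residual h^2:", round(float(h2), 4))
```

Each row of `effects` adds up to the corresponding correlation with Y, which is exactly the decomposition expressed by the simultaneous equations.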
Example
In a replicated trial (r = 4), eight varieties of barley were tested and observations were
recorded on number of ears per plant (Table 1), ear length (Table 2), 100-grain weight
(Table 3) and grain yield per plant (Table 4) (Singh and Chaudhary, 1995), as listed below:
Table 1. Data on number of ears per plant

Parents   R-I     R-II    R-III   R-IV    Total
1         50.2    41.4    36.2    39.8    167.6
2         41.8    47.2    39.6    46.6    175.2
3         39.2    37.6    38.8    33.6    149.2
4         37.8    49.6    35.4    41.8    164.6
5         35.6    31.4    33.2    29.8    130.0
6         53.4    50.2    49.6    57.8    211.0
7         43.8    46.8    41.4    43.6    175.6
8         50.6    47.8    41.8    46.8    187.0
Total     352.4   352.0   316.0   339.8   1360.2


Table 2. Data on ear length

Parents   R-I     R-II    R-III   R-IV    Total
1         20.5    20.6    20.5    19.6    81.2
2         19.5    20.1    19.3    20.1    79.0
3         19.0    18.5    18.1    19.3    74.9
4         20.0    20.3    20.6    20.3    81.2
5         20.0    20.8    20.3    19.9    81.0
6         19.2    19.5    20.3    19.9    78.9
7         19.5    20.4    20.7    20.3    80.9
8         19.7    19.8    20.1    20.5    80.1
Total     157.4   160.0   159.9   159.9   637.2

Table 3. Data on 100-grain weight (g)

Parents   R-I     R-II    R-III   R-IV    Total
1         3.9     4.0     3.8     3.9     15.6
2         3.7     3.6     3.6     3.7     14.6
3         4.5     4.6     4.6     4.7     18.4
4         4.3     4.4     4.2     4.3     17.2
5         4.1     4.0     4.2     4.1     16.4
6         4.2     4.5     4.3     4.5     17.5
7         4.3     4.3     4.2     4.3     17.1
8         4.2     4.0     4.3     4.1     16.6
Total     33.2    33.4    33.2    33.6    133.4

Table 4. Data on grain yield (g)

Parents   R-I     R-II    R-III   R-IV    Total
1         104.9   84.3    77.0    76.5    342.7
2         88.0    106.5   89.8    108.7   393.0
3         80.0    71.3    77.5    69.5    298.3
4         80.8    106.5   83.3    95.9    366.5
5         60.0    52.5    53.0    51.0    216.5
6         96.4    98.8    99.1    107.2   401.5
7         91.4    99.7    83.3    89.5    363.9
8         91.8    84.8    70.0    81.5    328.1
Total     693.3   704.4   633.0   679.8   2710.5

Here, the SPAR1 package will be used to find the direct and indirect path effects.
Inferences can then be drawn from the results obtained.
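For readers without access to SPAR1, the computation can be approximated with a short Python sketch. It is not the SPAR1 procedure: as an assumption made here, the correlations are simple correlations among the eight variety means (the "Total" columns of Tables 1 to 4 divided by 4), whereas a package may work with genotypic or phenotypic correlations derived from the analysis of variance and covariance.

```python
import numpy as np

# Variety totals over 4 replications, copied from Tables 1-4.
ears   = [167.6, 175.2, 149.2, 164.6, 130.0, 211.0, 175.6, 187.0]   # X1: ears per plant
length = [81.2, 79.0, 74.9, 81.2, 81.0, 78.9, 80.9, 80.1]           # X2: ear length
weight = [15.6, 14.6, 18.4, 17.2, 16.4, 17.5, 17.1, 16.6]           # X3: 100-grain weight
grain  = [342.7, 393.0, 298.3, 366.5, 216.5, 401.5, 363.9, 328.1]   # Y: grain yield

# Variety means, one column per character (assumption: correlations on means).
means = np.array([ears, length, weight, grain]).T / 4.0

corr = np.corrcoef(means, rowvar=False)   # 4 x 4 correlation matrix
B_mat = corr[:3, :3]                      # correlations among the causes
A_vec = corr[:3, 3]                       # correlations of the causes with yield

paths = np.linalg.solve(B_mat, A_vec)     # direct effects of X1, X2, X3 on Y
decomposition = B_mat * paths             # diagonal: direct, off-diagonal: indirect effects
residual_sq = 1.0 - paths @ A_vec         # h^2, using B C = A

names = ["ears/plant", "ear length", "100-grain wt"]
for name, row, r_xy in zip(names, decomposition, A_vec):
    print(f"{name:>12}: effects {np.round(row, 3)}  r with yield = {r_xy:.3f}")
print("residual h^2 =", round(float(residual_sq), 3))
```

In each printed row the diagonal entry is the direct effect and the remaining entries are the indirect effects through the other characters; together they add up to the correlation with yield.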
3. Diallel Analysis
A Few Definitions
Diallel cross: A diallel cross is a set of all possible matings between several genotypes,
which may be individuals, clones, homozygous lines etc.


General combining ability: The general combining ability (g.c.a.) of an inbred line is
defined as the average performance of the hybrids which this line produces with other
lines chosen from a random mating population. In general terms, such an effect is genic
and therefore its variance, taking into account epistatic interactions of digenic type and
denoted by $\sigma^2_{gca}$, is given by:
$$\sigma^2_{gca} = \tfrac{1}{2}\sigma^2_A + \tfrac{1}{4}\sigma^2_{AA}$$

Specific combining ability: It refers to a pair of inbred lines involved in a cross. The
specific combining ability (s.c.a.) of a cross is defined as the deviation of the performance
of the cross from the expectation on the basis of the average g.c.a. effects of the two lines
involved. Its existence indicates non-additive genetic effects and therefore its variance,
denoted by $\sigma^2_{sca}$, is given by:
$$\sigma^2_{sca} = \sigma^2_D + \tfrac{1}{2}\sigma^2_{AA} + \sigma^2_{AD} + \sigma^2_{DD}$$
It may also be noted that the total genotypic variance is related to these two variances in
the following manner:
$$\sigma^2_G = \sigma^2_A + \sigma^2_D + \sigma^2_{AA} + \sigma^2_{AD} + \sigma^2_{DD} = 2\sigma^2_{gca} + \sigma^2_{sca}$$

Here, emphasis will be given only to diallel analysis by the method of Griffing (1956). For p
inbred lines, the total number of possible crosses is p^2. However, there are situations
(determined mainly by the requirements of the breeding programme) where a smaller number
of crosses may be made. The diallel system may be classified into four types. These are:

(1) A set of p^2 crosses including selfings and reciprocal crosses.

(2) A set of p(p + 1)/2 crosses including selfings and only one set of F1 crosses.

(3) A set of p(p - 1) crosses using F1 crosses and their reciprocals but excluding selfings,
and

(4) A set of p(p - 1)/2 crosses using only one set of F1 crosses and excluding selfings.
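The four counts can be checked quickly with a small helper. The function name `n_crosses` is hypothetical and used only for this illustration.

```python
def n_crosses(p: int, method: int) -> int:
    """Number of entries in a diallel with p parents under Griffing's methods 1-4."""
    if p < 2:
        raise ValueError("a diallel needs at least two parents")
    counts = {
        1: p * p,              # selfings, F1 crosses and reciprocals
        2: p * (p + 1) // 2,   # selfings and one set of F1 crosses
        3: p * (p - 1),        # F1 crosses and reciprocals, no selfings
        4: p * (p - 1) // 2,   # one set of F1 crosses only
    }
    if method not in counts:
        raise ValueError("method must be 1, 2, 3 or 4")
    return counts[method]

for m in (1, 2, 3, 4):
    print(f"p = 6, method {m}: {n_crosses(6, m)} entries")
```

For p = 6 parents, method 4 gives 15 crosses, the layout used in the exercise later in this section.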

Griffing has used the term "method" for what has been called "types" above. It is
presumed that the experiment has been conducted in a randomized complete block design
using t (= 1, 2, ..., p^2) crosses in b blocks having c observations in each of the tb plots. The
model for the analysis is:
$$Y_{ijkl} = \mu + t_{ij} + b_k + (tb)_{ijk} + e_{ijkl}; \quad i, j = 1, 2, \ldots, p; \; k = 1, 2, \ldots, b; \; l = 1, 2, \ldots, c$$
where Y_ijkl is the value of the lth observation from the ijth cross in the kth block, μ the
population mean, t_ij the effect of the ijth cross, b_k the effect of the kth block, (tb)_ijk the
interaction of the ijth cross with the kth block and e_ijkl the deviation of Y_ijkl
from its expectation. The analysis of diallel crosses can be considered under two
assumptions, that is, when the different effects in the model are considered as either fixed or
random.

Model-I (Fixed effects)


Here, the effects tij, bk and (tb)ijk are assumed as fixed while the deviation eijkl is assumed
as random and normally distributed. In this model, the primary interest is the study of the
performance of the parents through their hybrids and to identify promising crosses. Since
the material is to be treated as a population the inferences drawn are applicable strictly to
the specific material studied.
Model-II (Random effects)
All the effects tij, bk, (tb)ijk and eijkl are assumed to be random and distributed normally.
Here the stress is upon the variability in the population and not much upon the
performance of an individual parental line or cross. Essential features of the analysis of
variance, along with the expectations of mean squares for "varieties", "blocks × varieties"
and "plants within plots", are given for the two models in Table 5.
Table 5. Analysis of variance for Models I and II

Source                DF               MS     Expectation of MS
                                              Model-I         Model-II
Blocks (B)            b - 1            MB     v + tc kb       v + c vtb + ct vb
Varieties (T)         t - 1            MT     v + bc kt       v + c vtb + cb vt
B × T                 (b - 1)(t - 1)   MBT    v + c ktb       v + c vtb
Plants within plots   bt(c - 1)        ME     v               v

where
$$k_b = \frac{\sum_i b_i^2}{b - 1}; \quad k_t = \frac{\sum_i t_i^2}{t - 1}; \quad k_{tb} = \frac{\sum_i \sum_j (tb)_{ij}^2}{(t - 1)(b - 1)}$$
and v_x is the population variance of x. The appropriate F-tests for varieties from Table 5
are:

Model-I:   F[t - 1, bt(c - 1)] = MT / ME

Model-II:  F[t - 1, (b - 1)(t - 1)] = MT / MBT

where the degrees of freedom for the numerator and the denominator are given in the
brackets following F. In the detailed analysis the term t_ij can be split as:
t_ij = g_i + g_j + s_ij + r_ij (F1 and reciprocal crosses)
t_ij = g_i + g_j + s_ij (F1 but no reciprocal crosses)
where g_i is the gca effect of the ith parent, s_ij the sca effect resulting from the crossing of
the ith with the jth parent and r_ij the reciprocal effect involving the reciprocal crosses between
these two parents. The subsequent analysis of the data for combining ability is done on the
varietal means taken over the b blocks. The error mean squares for the two models (I and
II) are calculated as ME1 = ME/bc and ME2 = MBT/bc. In either case E(ME) = v. Since
the analysis is done on a varietal mean basis, the two models in terms of means are different
in form. For fixed effects the constraints are:
$$\sum_k b_k = 0; \quad \sum_k (tb)_{ijk} = 0$$

The above constraints do not apply to the random effects model. Using Y_ij as the new
symbol for the ijth varietal mean, the two models (I and II) are now given by
$$Y_{ij} = \mu + t_{ij} + \frac{1}{bc}\sum_k \sum_l e_{ijkl}$$
and
$$Y_{ij} = \mu + t_{ij} + \frac{1}{b}\sum_k b_k + \frac{1}{b}\sum_k (tb)_{ijk} + \frac{1}{bc}\sum_k \sum_l e_{ijkl}$$

Notation for summation
The notation used can be better understood from the pattern of cells involved in the
two-way tables for the 4 methods as given in Table 6. The mean values are denoted by Yij.
The illustration is for a 3 × 3 diallel.

Table 6. Pattern of cells involved in the 4 methods

Method - 1             Method - 2
Y11  Y12  Y13          Y11  Y12  Y13
Y21  Y22  Y23               Y22  Y23
Y31  Y32  Y33                    Y33

Method - 3             Method - 4
     Y12  Y13               Y12  Y13
Y21       Y23                    Y23
Y31  Y32

Method - 1: $Y_{i.} = \sum_j Y_{ij}$; $Y_{.j} = \sum_i Y_{ij}$; $Y_{..} = \sum_i \sum_j Y_{ij}$
Method - 2: $Y_{i.} = \sum_j Y_{ij}$ with $Y_{ij} = Y_{ji}$; $Y_{..} = \sum_{i \le j} Y_{ij}$
Method - 3: $Y_{i.} = \sum_{j \ne i} Y_{ij}$; $Y_{.j} = \sum_{i \ne j} Y_{ij}$; $Y_{..} = \sum_{i \ne j} Y_{ij}$
Method - 4: $Y_{i.} = \sum_{j \ne i} Y_{ij}$ with $Y_{ij} = Y_{ji}$; $Y_{..} = \sum_{i < j} Y_{ij}$
Statistical Analysis
In what follows only Method 4 of Griffing's approach is presented (analysis by the
other methods will be explained through SPAR1). Variances of variance components,
treatment of mixed models and the extension of the technique to two or more variables
are not included.
Method - 4:
Let us consider the models I and II for fixed and random effects as
$$Y_{ij} = \mu + g_i + g_j + s_{ij} + \frac{1}{bc}\sum_k \sum_l e_{ijkl}$$
and
$$Y_{ij} = \mu + g_i + g_j + s_{ij} + \frac{1}{b}\sum_k b_k + \frac{1}{b}\sum_k (tb)_{ijk} + \frac{1}{bc}\sum_k \sum_l e_{ijkl}$$
(i, j = 1, 2, ..., p; k = 1, 2, ..., b; l = 1, 2, ..., c) and s_ij = s_ji.

Restrictions imposed upon the parameters of Model-I are:
$$\sum_i g_i = 0; \quad \sum_{j \ne i} s_{ij} = 0 \ \text{(for each } i\text{)}$$

Sum of squares:
$$SSG = \frac{1}{p - 2}\sum_i Y_{i.}^2 - \frac{4}{p(p - 2)}\,Y_{..}^2$$
$$SSS = \sum_{i < j} Y_{ij}^2 - \frac{1}{p - 2}\sum_i Y_{i.}^2 + \frac{2}{(p - 1)(p - 2)}\,Y_{..}^2$$
The analysis of variance and the expectations of mean squares are given in Table 7.
Table 7. Analysis of variance for Models I and II

Source   DF           SS    MS   Expectation of MS
                                 Model-I                                          Model-II
gca      p - 1        SSG   MG   $v + (p-2)\frac{1}{p-1}\sum_i g_i^2$             v + vs + (p - 2)vg
sca      p(p - 3)/2   SSS   MS   $v + \frac{2}{p(p-3)}\sum_{i < j} s_{ij}^2$      v + vs
Error    m            SSE   ME   v                                                v

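The estimates of the various effects in the model are:
$$\hat{\mu} = \frac{2Y_{..}}{p(p - 1)}$$
$$\hat{g}_i = \frac{pY_{i.} - 2Y_{..}}{p(p - 2)}$$
$$\hat{s}_{ij} = Y_{ij} - \frac{Y_{i.} + Y_{.j}}{p - 2} + \frac{2Y_{..}}{(p - 1)(p - 2)}$$

The estimates of variance components are obtained by equating the observed mean squares to
their expectations given in Table 7 and they are:
$$\hat{v}_g = \frac{1}{p - 2}(MG - MS); \quad \hat{v}_s = MS - ME$$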

F-tests of significance associated with Model-I: Let df(x) be the degrees of freedom for
the factor x, where x stands for the symbols g or s. The primary interest in Model I is the
overall heterogeneity of the estimates within each class. It will be seen from the analysis
of variance in Table 7 that the appropriate tests are:

F[df(x), tb(c - 1)] = Mx / ME

F-tests of significance associated with Model-II:
The primary interest in this model is the estimation of the variability parameters vg and vs.
Hence suitable procedures have to be adopted to test the significance of vx. A study of
the expectations of mean squares in Table 7 reveals that the variance components can be
tested by calculating F as a ratio of the concerned observed mean square and another
suitable observed mean square. For testing vg = 0 and vs = 0 the following statistics are used:

F[p - 1, df(s)] = MG / MS

F[df(s), (t - 1)(b - 1)] = MS / ME2

Exercise
In an experiment on barley, six inbred lines were crossed in all possible combinations
excluding selfs and reciprocals, the total number of entries being p(p - 1)/2 (i.e., 15 in the
case of p = 6 parents). The crosses were sown in a randomized complete block design with
two replicates. The observations on the average number of kernels per head are given below:

Cross   Block 1   Block 2
12      47.60     48.70
13      40.53     38.15
14      31.30     30.75
15      39.38     33.65
16      52.50     55.40
23      39.00     35.17
24      30.45     31.10
25      45.08     42.15
26      53.00     52.50
34      30.50     29.75
35      35.70     31.15
36      37.89     40.90
45      34.75     35.75
46      30.15     32.75
56      43.00     43.56

Using the SPAR1 package, the above diallel cross will be analyzed to estimate the general
and specific combining ability effects, their variances and the variance components.
Inferences will be drawn from the results obtained.
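As a rough cross-check that does not depend on SPAR1, the Method 4 formulas can be applied to the exercise data in Python. This is a minimal sketch under stated assumptions: the analysis is done on cross means over the two blocks, there is one value per plot (c = 1), and the error mean square on a mean basis is taken as the block × cross mean square divided by b. The variable names are illustrative.

```python
import numpy as np

p, b = 6, 2   # parents and blocks in the exercise

# (line i, line j): (Block 1, Block 2), copied from the exercise table.
plots = {
    (1, 2): (47.60, 48.70), (1, 3): (40.53, 38.15), (1, 4): (31.30, 30.75),
    (1, 5): (39.38, 33.65), (1, 6): (52.50, 55.40), (2, 3): (39.00, 35.17),
    (2, 4): (30.45, 31.10), (2, 5): (45.08, 42.15), (2, 6): (53.00, 52.50),
    (3, 4): (30.50, 29.75), (3, 5): (35.70, 31.15), (3, 6): (37.89, 40.90),
    (4, 5): (34.75, 35.75), (4, 6): (30.15, 32.75), (5, 6): (43.00, 43.56),
}

# Symmetric p x p table of cross means (diagonal unused in Method 4).
Y = np.zeros((p, p))
for (i, j), obs in plots.items():
    Y[i - 1, j - 1] = Y[j - 1, i - 1] = np.mean(obs)

upper = np.triu_indices(p, k=1)
Yi = Y.sum(axis=1)        # Y_i. : total of all crosses involving parent i
Ytot = Y[upper].sum()     # Y_.. : total over the p(p-1)/2 crosses

mu = 2 * Ytot / (p * (p - 1))
g = (p * Yi - 2 * Ytot) / (p * (p - 2))                       # gca effects
s = Y - (Yi[:, None] + Yi[None, :]) / (p - 2) + 2 * Ytot / ((p - 1) * (p - 2))
np.fill_diagonal(s, 0.0)                                      # sca effects

ssg = (Yi ** 2).sum() / (p - 2) - 4 * Ytot ** 2 / (p * (p - 2))
sss = (Y[upper] ** 2).sum() - (Yi ** 2).sum() / (p - 2) + 2 * Ytot ** 2 / ((p - 1) * (p - 2))
mg = ssg / (p - 1)
ms = sss / (p * (p - 3) / 2)

# Error from the RCBD analysis of plot values, then put on a mean basis.
X = np.array(list(plots.values()))                            # 15 crosses x 2 blocks
cf = X.sum() ** 2 / X.size
ss_err = (X ** 2).sum() - (X.sum(axis=0) ** 2).sum() / X.shape[0] \
         - (X.sum(axis=1) ** 2).sum() / b + cf
me = ss_err / ((b - 1) * (X.shape[0] - 1)) / b

print("gca effects:", np.round(g, 3))
print("MG, MS, ME :", round(mg, 3), round(ms, 3), round(me, 3))
print("v_g, v_s   :", round((mg - ms) / (p - 2), 3), round(ms - me, 3))
```

The gca effects sum to zero and each row of the sca table sums to zero, as required by the restrictions on Model I; SSG and SSS together equal the sum of squares among the 15 cross means.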

4. Line × Tester Analysis

When a breeding programme envisages the study of the performance of a large number of
inbred lines, termed "lines", with a view to choosing promising combiners, they are crossed
with a set of a few inbred lines of proven merit, termed "testers". If the numbers of lines and
testers are l and t respectively, the number of crosses or hybrids or full-sib progenies is lt.
Let the lt crosses, with or without the lines and testers, be tried in a randomized complete
block design with r replications. The linear model involving effects pertaining to
combining ability is:
$$y_{ijk} = \mu + l_i + t_j + (lt)_{ij} + r_k + e_{ijk}$$
where y_ijk is the value of the cross obtained from the ith line and jth tester in the kth
replication, μ a general parameter common to all the plots, l_i the gca effect of the ith line,
t_j the gca effect of the jth tester, (lt)_ij the sca (interaction) effect of the ijth cross, r_k the
effect of the kth replication and e_ijk the deviation of y_ijk from its expectation. It is assumed
that the effects l, t, lt, r and e are random and μ is fixed. The least squares technique leads to
the analysis of variance whose essential features are given in Table 8.
Table 8. Analysis of variance in line × tester analysis

Source             DF                 SS     MS    Expectation of MS
Replications (R)   r - 1              SSR    MR    v + lt vr
Lines (L)          l - 1              SSL    ML    v + r vlt + rt vl
Testers (T)        t - 1              SST    MT    v + r vlt + rl vt
L × T              (l - 1)(t - 1)     SSLT   MLT   v + r vlt
Error              (r - 1)(lt - 1)    SSE    ME    v

where v is the population error variance and vl, vt etc. are the population variances
corresponding to lines, testers etc. The various sums of squares are calculated as:

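$$CF = \frac{Y_{...}^2}{rlt}$$
$$TSS = \sum_k \sum_j \sum_i y_{ijk}^2 - CF$$
$$SSR = \sum_k \frac{Y_{..k}^2}{lt} - CF$$
$$SSL = \sum_i \frac{Y_{i..}^2}{tr} - CF$$
$$SST = \sum_j \frac{Y_{.j.}^2}{lr} - CF$$
$$SSLT = \sum_j \sum_i \frac{Y_{ij.}^2}{r} - CF - SSL - SST$$
$$SSE = TSS - SSR - \sum_j \sum_i \frac{Y_{ij.}^2}{r} + CF$$

where (.) in the suffix of Y follows the conventional notation for summation over
the corresponding subscript.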
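The estimates of parameters are given by
$$\hat{\mu} = \frac{Y_{...}}{ltr}$$
$$\hat{l}_i = \frac{Y_{i..}}{tr} - \frac{Y_{...}}{ltr}$$
$$\hat{t}_j = \frac{Y_{.j.}}{lr} - \frac{Y_{...}}{ltr}$$
$$\widehat{(lt)}_{ij} = \frac{Y_{ij.}}{r} - \frac{Y_{i..}}{tr} - \frac{Y_{.j.}}{lr} + \frac{Y_{...}}{ltr}$$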

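The variances of the estimates of parameters can be computed as

$$\mathrm{var}(\hat{\mu}) = \frac{1}{rlt}\,\hat{v}$$
$$\mathrm{var}(\hat{l}_i) = \frac{(l - 1)}{rlt}\,\hat{v}; \quad \mathrm{var}(\hat{t}_j) = \frac{(t - 1)}{rlt}\,\hat{v}$$
$$\mathrm{var}(\widehat{(lt)}_{ij}) = \frac{(l - 1)(t - 1)}{rlt}\,\hat{v}$$
$$\mathrm{var}(\hat{l}_i - \hat{l}_j) = \frac{2}{rt}\,\hat{v}; \quad \mathrm{var}(\hat{t}_i - \hat{t}_j) = \frac{2}{rl}\,\hat{v}$$
$$\mathrm{var}(\widehat{lt}_{ij} - \widehat{lt}_{ik}) = \frac{2(l - 1)}{rl}\,\hat{v}; \quad \mathrm{var}(\widehat{lt}_{ij} - \widehat{lt}_{kj}) = \frac{2(t - 1)}{rt}\,\hat{v}$$
$$\mathrm{var}(\widehat{lt}_{ij} - \widehat{lt}_{km}) = \frac{2(lt - l - t)}{rlt}\,\hat{v}$$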

where $\hat{v}$ is the estimate of v and is equal to ME. The estimates of variance components
can also be computed by equating the expectations of mean squares to the observed values,
and it is easily seen that:
$$\hat{v}_l = \frac{ML - MLT}{rt}; \quad \hat{v}_t = \frac{MT - MLT}{rl}; \quad \hat{v}_{lt} = \frac{MLT - ME}{r}$$

Plant breeders are always interested in the estimates of variance components of
gca and sca effects. The variances vl and vt are associated with gca while vlt is
associated with sca effects. It may be pointed out that the term gca variance is generally
used to refer to vl, vt, vl + vt, (ML + MT - 2MLT)/[r(l + t)] etc. The term sca variance,
however, can be used for vlt without ambiguity.


Tests of significance of estimates of variance components

The different variance components vl, vt and vlt can be tested by calculating F as a ratio
of two observed mean squares. For testing the null hypotheses

1. vl = vgca = 0,
calculate F(d1, d2) = ML/MLT where d1 = l - 1; d2 = (l - 1)(t - 1)

2. vt = vgca = 0,
calculate F(d1, d2) = MT/MLT where d1 = t - 1; d2 = (l - 1)(t - 1)

3. vlt = vsca = 0,
calculate F(d1, d2) = MLT/ME where d1 = (l - 1)(t - 1); d2 = (r - 1)(lt - 1)

and when

4. vgca = (ML + MT - 2MLT)/[r(l + t)] and vgca = 0,
calculate F(d1, d2) = M*/MLT where M* = (ML + MT)/2,
$$d_1 = \frac{(ML + MT)^2 (l - 1)(t - 1)}{(t - 1)(ML)^2 + (l - 1)(MT)^2}; \quad d_2 = (l - 1)(t - 1)$$

5. vgca = vl + vt = [l(ML) + t(MT) - (l + t)(MLT)]/(rlt) and vgca = 0,
calculate F(d1, d2) = M*/MLT where M* = [l(ML) + t(MT)]/(l + t),
$$d_1 = \frac{\{l(ML) + t(MT)\}^2 (l - 1)(t - 1)}{(t - 1)\,l^2 (ML)^2 + (l - 1)\,t^2 (MT)^2}; \quad d_2 = (l - 1)(t - 1)$$

In all the above expressions d1 and d2 denote the degrees of freedom corresponding to the
numerator and denominator of the right hand side of the expression for F.

Exercise
Suppose there are 3 testers and 5 lines, giving 5 × 3 = 15 crosses. These crosses along with
the 8 parents, i.e., 5 lines and 3 testers (23 entries in total), were tested in an R.B.D. with 4
replications and the data on grain yield were obtained (Table 9).
Table 9. Yield data on parents and crosses

Genotypes   R1        R2        R3        R4        Total
16          74.40     70.86     60.94     68.00     274.20
17          91.82     99.18     118.88    120.68    430.56
18          48.08     62.10     58.54     41.84     210.56
26          59.06     65.62     81.62     86.76     293.06
27          84.16     109.74    102.14    94.52     390.56
28          96.92     91.44     79.86     74.38     342.60
36          109.86    98.16     93.26     102.26    403.54
37          117.20    100.28    116.16    112.52    446.16
38          109.86    116.16    123.92    120.86    470.94
46          103.14    109.66    90.98     119.40    423.18
47          53.40     60.86     74.46     69.08     257.80
48          53.86     48.30     40.64     44.62     187.42
56          98.46     73.10     89.18     75.86     336.60
57          81.36     72.82     89.82     83.74     327.74
58          86.62     94.18     90.32     108.16    379.28
1           104.86    84.32     76.92     76.48     342.58
2           88.02     106.54    89.82     108.68    393.04
3           77.94     71.34     77.52     69.48     296.28
4           80.82     106.52    83.28     95.92     366.56
5           59.96     52.48     52.98     50.98     216.40
6           96.44     98.82     99.14     107.16    401.56
7           91.44     99.66     83.28     89.46     363.84
8           91.78     84.82     69.92     81.48     328.00
Total       1959.28   1977.28   1943.58   2002.32   7882.46

ANOVA for line × tester analysis including parents

Source                DF   SS          MS               F
Replications          3    83.000      27.667           0.304
Treatments            22   32552.940   1479.679         16.249
Parents               7    6299.620    899.945          9.882
Parents vs. crosses   1    53.666      53.666           0.589
Crosses               14   26199.654   1871.404         20.550
Lines                 4    10318.361   Ml = 2579.590    1.457
Testers               2    1718.925    Mt = 859.463     0.485
Lines × Testers       8    14162.368   Mlt = 1770.296
Error                 66   6010.295    Me = 91.065
Total                 91   38646.235

It is to be noted here that the M.S. due to lines (Ml) and testers (Mt) are to be tested against
the M.S. due to lines × testers (Mlt). The latter is, in turn, tested against the M.S. due to error
(Me). The gca and sca effects, variance components and estimates of variance components
are further obtained from the SPAR1 package.
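The line, tester and line × tester entries of the ANOVA can be reproduced from the cross totals of Table 9 with a few lines of Python. This is a sketch rather than the SPAR1 procedure. It assumes that a genotype code such as 16 denotes line 1 crossed with tester 6 (lines 1 to 5, testers 6 to 8), and it takes the error sum of squares and its degrees of freedom from the ANOVA table instead of recomputing them from the plot data.

```python
import numpy as np

r, l, t = 4, 5, 3   # replications, lines, testers

# Cross totals over the 4 replications (Table 9): rows = lines 1-5, columns = testers 6-8.
totals = np.array([
    [274.20, 430.56, 210.56],
    [293.06, 390.56, 342.60],
    [403.54, 446.16, 470.94],
    [423.18, 257.80, 187.42],
    [336.60, 327.74, 379.28],
])

cf = totals.sum() ** 2 / (r * l * t)                         # correction factor for crosses
ss_crosses = (totals ** 2).sum() / r - cf
ss_lines = (totals.sum(axis=1) ** 2).sum() / (t * r) - cf
ss_testers = (totals.sum(axis=0) ** 2).sum() / (l * r) - cf
ss_lt = ss_crosses - ss_lines - ss_testers

ml = ss_lines / (l - 1)
mt = ss_testers / (t - 1)
mlt = ss_lt / ((l - 1) * (t - 1))
me = 6010.295 / 66            # error SS and DF as given in the ANOVA table

f_lines = ml / mlt            # lines tested against lines x testers
f_testers = mt / mlt          # testers tested against lines x testers
f_lt = mlt / me               # lines x testers tested against error

# Variance components from the expectations of mean squares in Table 8.
v_l = (ml - mlt) / (r * t)
v_t = (mt - mlt) / (r * l)
v_lt = (mlt - me) / r

print("SS lines, testers, LxT:", round(ss_lines, 3), round(ss_testers, 3), round(ss_lt, 3))
print("F lines, testers, LxT :", round(f_lines, 3), round(f_testers, 3), round(f_lt, 3))
print("v_l, v_t, v_lt        :", round(v_l, 3), round(v_t, 3), round(v_lt, 3))
```

A variance component estimated in this way can turn out negative when the corresponding mean square is smaller than MLT; such an estimate is usually interpreted as zero.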
