Available at http://ecoqua.ecologia.ufrgs.br
Cluster Analysis
(single linkage method)
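Sampling unit    1    2    3    4    5    6    7
Sp. A           26   28   29   29   30   35   39
Sp. B           28   30   31   33   27   38   36
Sp. C           18   14   13   13   19   15   15

[Figure: scatter plot of sampling units 1 to 7, Species A (horizontal axis, 20 to 40) against Species B (vertical axis, 20 to 40).]

Distance matrix between sampling units:

        1       2       3       4       5       6
1       0
2    5.83       0
3    6.56    1.00       0
4    7.68    2.24    2.00       0
5    4.24    6.48    7.28    8.54       0
6   13.78    9.27    9.43    8.06   12.73       0
7   15.56   11.23   11.36   10.63   13.34    4.47

[Figure: single linkage dendrogram; order of sampling units 7 6 4 3 2 5 1; distance axis from 0 to 8.]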
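The single linkage example can be retraced with a short script. This is a minimal sketch in plain Python, not the software used for the slides: the function names `dist` and `single_linkage` are illustrative, and the only data used are the seven-unit distance matrix values shown in the example.

```python
# Lower triangle of the distance matrix from the example (units 1 to 7).
LOWER = [
    [],
    [5.83],
    [6.56, 1.00],
    [7.68, 2.24, 2.00],
    [4.24, 6.48, 7.28, 8.54],
    [13.78, 9.27, 9.43, 8.06, 12.73],
    [15.56, 11.23, 11.36, 10.63, 13.34, 4.47],
]


def dist(j, k):
    """Distance between sampling units j and k (1-based labels)."""
    if j == k:
        return 0.0
    hi, lo = max(j, k), min(j, k)
    return LOWER[hi - 1][lo - 1]


def single_linkage(n):
    """Return the merge history as (group_a, group_b, level) tuples."""
    groups = [frozenset([u]) for u in range(1, n + 1)]
    history = []
    while len(groups) > 1:
        best = None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                # Single linkage: minimum distance over all cross-group pairs.
                level = min(dist(j, k) for j in groups[a] for k in groups[b])
                if best is None or level < best[0]:
                    best = (level, a, b)
        level, a, b = best
        history.append((sorted(groups[a]), sorted(groups[b]), level))
        merged = groups[a] | groups[b]
        groups = [g for i, g in enumerate(groups) if i not in (a, b)] + [merged]
    return history


if __name__ == "__main__":
    for left, right, level in single_linkage(7):
        print(left, "+", right, "at", level)
```

The seven units give six merges (n-1), at levels 1.00, 2.00, 4.24, 4.47, 5.83 and 8.06. Cutting the merge history at any of these levels yields one of the possible classifications.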
Hierarchical clustering
Algorithms may be agglomerative or divisive.
The clustering process is represented by a dendrogram.
It does not produce one classification but n-1 possible classifications, because the number of groups is defined a posteriori.
Some agglomerative algorithms: single linkage, complete linkage, average linkage (UPGMA, WPGMA), sum of squares (Ward).
Non-hierarchical clustering
The number of groups is specified a priori and the result is a single classification.
References:
Legendre, P.; Legendre, L. 1998. Numerical Ecology. Elsevier, N. York.
Orlóci, L.; Kenkel, N.C.; Orlóci, M. 1987. Data Analysis in Population and Community Ecology. University of Hawaii, Honolulu / New Mexico State University, Las Cruces. p. 175-182.
Pielou, E.C. 1984. The Interpretation of Ecological Data; a Primer on Classification and Ordination. New York, J. Wiley. p. 13-40 and 63-81.
Pillar, V.D. 1999. How sharp are classifications? Ecology 80: 2508-2516.
Podani, J. 2000. Introduction to the Exploration of Multivariate Biological Data. Leiden, Backhuys. p. 135-174.
Some criteria for redefining the resemblance matrix after each agglomerative step (from Podani 1994:82)
Single Linkage
When the matrix contains dissimilarities, the dissimilarity between groups P and Q is:
dPQ = INF [ djk, for j=1, ..., n-1 and k=j+1, ..., n objects, provided that j belongs to group P and k to group Q ]
where:
djk is an element of the dissimilarity matrix
INF is the minimum value in the set between []
Complete Linkage
When the matrix contains dissimilarities, the dissimilarity between groups P and Q is:
dPQ = SUP [ djk, for j=1, ..., n-1 and k=j+1, ..., n objects, provided that j belongs to group P and k to group Q ]
where:
djk is an element of the dissimilarity matrix
SUP is the maximum value in the set between []
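The INF and SUP rules differ only in which cross-group value is kept, as the following sketch illustrates. The function name `linkage` and the four-object dissimilarities in `D` are hypothetical, chosen only to show the two criteria side by side.

```python
def linkage(d, group_p, group_q, pick):
    """Dissimilarity between groups P and Q.

    d maps a pair (j, k) with j < k to the dissimilarity d_jk.
    pick is min for single linkage (INF) or max for complete linkage (SUP).
    """
    return pick(d[(min(j, k), max(j, k))] for j in group_p for k in group_q)


# Hypothetical dissimilarities among four objects.
D = {
    (1, 2): 2.0, (1, 3): 6.0, (1, 4): 10.0,
    (2, 3): 5.0, (2, 4): 9.0, (3, 4): 4.0,
}
P, Q = {1, 2}, {3, 4}

if __name__ == "__main__":
    print("single linkage  :", linkage(D, P, Q, min))  # 5.0
    print("complete linkage:", linkage(D, P, Q, max))  # 10.0
```

Only pairs with one object in P and one in Q enter the set; the within-group values d12 and d34 are never consulted.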
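Sums of squares computed from squared dissimilarities:

\[ Q_{PQ} = \frac{1}{n_p + n_q} \sum_{h} \sum_{i} d_{hi}^{2} \]
for h=1, ..., n-1 and i=h+1, ..., n objects, provided that h and i belong to group P or Q

\[ Q_{P} = \frac{1}{n_p} \sum_{h} \sum_{i} d_{hi}^{2} \]
for h=1, ..., n-1 and i=h+1, ..., n objects, provided that h and i belong to group P

\[ Q_{Q} = \frac{1}{n_q} \sum_{h} \sum_{i} d_{hi}^{2} \]
for h=1, ..., n-1 and i=h+1, ..., n objects, provided that h and i belong to group Q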
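A small numerical sketch of these sums of squares follows. It assumes that the merging criterion is the increase in sum of squares when P and Q are joined (the pooled sum minus the two within-group sums), as in Ward's method; the surviving text does not state the criterion explicitly. The names `sum_of_squares` and `merge_cost` and the three one-dimensional objects are hypothetical.

```python
from itertools import combinations


def sum_of_squares(d2, group):
    """(1/n) times the sum of squared dissimilarities over pairs h < i in group."""
    members = sorted(group)
    if len(members) < 2:
        return 0.0
    total = sum(d2[(h, i)] for h, i in combinations(members, 2))
    return total / len(members)


def merge_cost(d2, group_p, group_q):
    """Assumed criterion: increase in sum of squares caused by joining P and Q."""
    pooled = sum_of_squares(d2, group_p | group_q)
    return pooled - sum_of_squares(d2, group_p) - sum_of_squares(d2, group_q)


# Hypothetical objects on a line at positions 0, 2 and 6.
positions = {1: 0.0, 2: 2.0, 3: 6.0}
D2 = {
    (h, i): (positions[h] - positions[i]) ** 2
    for h, i in combinations(sorted(positions), 2)
}

if __name__ == "__main__":
    P, Q = {1, 2}, {3}
    print(sum_of_squares(D2, P))   # 2.0
    print(merge_cost(D2, P, Q))    # about 16.67
```

For these points the cost equals n_p n_q / (n_p + n_q) times the squared distance between the group centroids (2/3 x 25), which is the usual centroid form of the same quantity.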
Cluster analysis (sum of squares) with simulated data: 50 units described by four random variables.
Cluster analysis (sum of squares) with simulated data: 50 units described by four random variables defining 2 sharp groups.
Are the groups sharp?
Cluster analysis of 20 communities in grassland vegetation (quadrats of 0.1 x 1 m) (Cadenazzi 1996). The clustering method is minimum variance and the analysis is based on Euclidean distances. What is the probability that a classification (e.g., community types) obtained from one survey will hold when the survey is repeated in the same sampling universe?
The n + nz sampling units in the reference sample and in the bootstrap sample are points in a space defined by p variables.

\[ G_z = 1 - \frac{S_z}{T_z} \]

where

\[ T_z = \frac{1}{n + n_z} \sum_{h=1}^{n+n_z-1} \sum_{i=h+1}^{n+n_z} d_{hi}^{2} \]

is the total sum of squares, involving (n + nz)(n + nz - 1)/2 squared dissimilarities of n + nz sampling units, n is the size of the reference sample and nz is the size of the bootstrap sample.
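When the dissimilarities are Euclidean distances, the total sum of squares computed from pairs equals the sum of squared deviations from the centroid. The sketch below checks this on four hypothetical points; the function names and the data are illustrative only.

```python
from itertools import combinations


def total_sum_of_squares(points):
    """T = (1/N) * sum of squared Euclidean distances over all pairs h < i."""
    n_units = len(points)
    pair_sum = sum(
        sum((a - b) ** 2 for a, b in zip(x, y))
        for x, y in combinations(points, 2)
    )
    return pair_sum / n_units


def centroid_sum_of_squares(points):
    """Sum of squared deviations of every point from the overall centroid."""
    n_units = len(points)
    n_vars = len(points[0])
    centroid = [sum(x[v] for x in points) / n_units for v in range(n_vars)]
    return sum((x[v] - centroid[v]) ** 2 for x in points for v in range(n_vars))


# Hypothetical sampling units described by two variables.
POINTS = [(1.0, 2.0), (3.0, 0.0), (4.0, 4.0), (0.0, 1.0)]

if __name__ == "__main__":
    print(total_sum_of_squares(POINTS))     # 18.75
    print(centroid_sum_of_squares(POINTS))  # 18.75
```

The pairwise form is the convenient one here because the method works from a dissimilarity matrix, with N = n + nz units, rather than from the raw variables.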
(3) Reference partition with 2 groups generated by cluster analysis:

Sampling units:   1   2   3   4   5
Groups:           1   1   2   2   1

(5) Distance matrix (squared Euclidean) of sampling units (reference plus bootstrap):

  0
 34    0
134  234    0
 41  129   45    0
 51   89  285  150    0
  0   34  134   41   51    0
 51   89  285  150    0   51    0
 41  129   45    0  150   41  150    0
 41  129   45    0  150   41  150    0    0
 51   89  285  150    0   51    0  150  150    0

(6) Bootstrap sample partition with 2 groups generated by cluster analysis:

Sampling units:   1   5   4   4   5
Groups:           3   4   3   3   4

(7) Sum of squares for contrasts between groups of sampling units in the reference (rows) and bootstrap sample (columns); matrix is rearranged:

        3      4               4      3
1    78.2   32.8     >  1   32.8   78.2
2    28.6    206        2    206   28.6

One-to-one nearest neighbor sum of squares between partitions: S = 32.8 + 28.6 = 61.3667
Nearest neighbor groups: 1,4; 2,3
G*z = 1 - S/T = 0.8509

(9) Null bootstrap sample (the units in each group are taken at random from the nearest group in the reference sample):

Sampling units:   3   1   3   4   5
Groups:           3   4   3   3   4

(10) Distance matrix of sampling units (reference plus null bootstrap sample):

  0
 34    0
134  234    0
 41  129   45    0
 51   89  285  150    0
134  234    0   45  285    0
  0   34  134   41   51  134    0
134  234    0   45  285    0  134    0
 41  129   45    0  150   45   41   45    0
 51   89  285  150    0  285   51  285  150    0

(11) Sum of squares for contrasts between nearest neighbor groups of sampling units in the reference and null bootstrap sample:
1,4: 6.5
2,3: 1.5

(12) Total sum of squares computed from distance matrix of step (10): T = (34 + ... + 51 + ... + 285 + 150)/10 = 495.8
Exclusive nearest neighbor sum of squares between partitions: S = 6.5 + 1.5 = 8
G°z = 1 - S/T = 0.9839

G°z is larger than G*z, so this iteration will add zero to the cumulative frequency F(G°z ≤ G*z).
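The numbers in this worked iteration can be retraced with a short script. This is a sketch under stated assumptions, not the author's program: the contrast between two groups is taken to be the pooled sum of squares minus the two within-group sums, which reproduces the values 32.8, 28.6, 6.5 and 1.5 of the example; the nearest neighbor pairs (1,4) and (2,3) are copied from the example rather than searched for; and the names `ss`, `contrast` and `sharpness` are illustrative.

```python
from itertools import combinations

# Squared Euclidean distances among the five reference units (steps 5 and 10).
D2 = {
    (1, 2): 34, (1, 3): 134, (2, 3): 234, (1, 4): 41, (2, 4): 129,
    (3, 4): 45, (1, 5): 51, (2, 5): 89, (3, 5): 285, (4, 5): 150,
}


def d2(a, b):
    """Squared distance between units a and b; a resampled copy is at distance 0."""
    return 0 if a == b else D2[(min(a, b), max(a, b))]


def ss(units):
    """Sum of squares of a list of units (repeats allowed): pair sum / size."""
    return sum(d2(a, b) for a, b in combinations(units, 2)) / len(units)


def contrast(group_p, group_q):
    """Assumed contrast: pooled sum of squares minus the within-group sums."""
    return ss(group_p + group_q) - ss(group_p) - ss(group_q)


def sharpness(reference, sample, pairs):
    """Return (S, T, G) with G = 1 - S/T for the given nearest neighbor pairs."""
    s = sum(contrast(reference[r], sample[b]) for r, b in pairs)
    all_units = [u for g in reference.values() for u in g]
    all_units += [u for g in sample.values() for u in g]
    t = ss(all_units)
    return s, t, 1 - s / t


REFERENCE = {1: [1, 2, 5], 2: [3, 4]}      # step (3)
BOOTSTRAP = {3: [1, 4, 4], 4: [5, 5]}      # step (6)
NULL_SAMPLE = {3: [3, 3, 4], 4: [1, 5]}    # step (9)
PAIRS = [(1, 4), (2, 3)]                   # nearest neighbor groups, from the example

if __name__ == "__main__":
    s, t, g_star = sharpness(REFERENCE, BOOTSTRAP, PAIRS)
    print(round(s, 4), round(t, 1), round(g_star, 4))   # 61.3667 411.6 0.8509
    s0, t0, g_null = sharpness(REFERENCE, NULL_SAMPLE, PAIRS)
    print(round(s0, 4), round(t0, 1), round(g_null, 4))  # 8.0 495.8 0.9839
    # The null value exceeds G*, so this iteration adds 0 to the frequency.
    print(int(g_null <= g_star))                         # 0
```

Repeating the resampling many times and accumulating the last value gives the frequency from which the probability P(G°z ≤ G*z) is estimated.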
Since
= 0 .9068.
Evaluation of sampling sufficiency and significance for group partition levels in different data sets by probabilities P(G°z ≤ G*z). Probabilities were generated in 10000 bootstrap iterations at each sample size. Data sets and partition levels are: (A) Artificial data of 60 units described by random variables, partition level 3 groups; (B-C) Artificial data set of 3 well defined groups, partition levels 3 and 4 groups. The groups were defined by sum of squares clustering. (From Pillar 1998)
Evaluation of sampling sufficiency and significance for group partition levels in different data sets by probabilities P(G°z ≤ G*z). Probabilities were generated in 10000 bootstrap iterations at each sample size. Data sets and partition levels are: (D-F) EEA grassland data set (Pillar et al. 1992), partition levels 2, 3 and 4 groups; and (G-J) Santa Catarina grassland data set (Pillar and Tcacenco 1986), partition levels 2, 3, 4 and 5 groups. The groups were defined by sum of squares clustering. (From Pillar 1998)
Probability curve of P(G°z ≤ G*z) for increasing separation between groups in simulated data. Random data sets were defined with 2 groups separated by expected difference d between centroids ranging from d = 0 (a single group) to d = 0.32 (clearly two groups). The groups have equal sizes (20 and 20 sampling units). The data contain 40 variables with normal (solid line) and uniform (dotted line) distribution within each group. Standard deviations of the means based on 10 data sets in each case are indicated. The partition level after cluster analysis is indicated on each line. The number of iterations is 1000 for each combination of centroid difference, partition level, distribution type and data set replicate. (From Pillar 1999)
Dimensions: 245 sampling units, 9 variables
Data type: (5) mixed
Type: 3 3 3 3 3 2 3 3 3
Resemblance measure: (5)Gower index, (1)between sampling units
Clustering criterion: (4)average linkage (UPGMA)

SAMPLER
Bootstrap resampling
Sample attribute: sharpness of group structure (G*)
Considering partitions with 2 to 5 groups.
Sample size at 1 sampling step(s): 245
Probabilities P(GNull<=G*) generated in 1000 iterations of bootstrap resampling:
2 groups: 0.281
3 groups: 0.141
4 groups: 0.106
5 groups: 0.027
Cluster analysis (UPGMA) of 245 vegetation patches delimited on grassland, Morro Santana, Porto Alegre (Klebe 2003). Description used 6 structural variables and the analysis was based on Gower similarities.
Types of grassland vegetation patches, Morro Santana, Porto Alegre (Klebe 2003). Classification based on 9 variables describing vegetation structure.