15
doi:10.1093/hmg/ddq198
Advance Access published on May 12, 2010
2927–2935
¹Institute of Medical Informatics and Statistics, Christian-Albrechts University, 24105 Kiel, Germany and ²Department
of Forensic Molecular Biology, Erasmus University Medical Center, 3015 GE Rotterdam, The Netherlands
Received March 10, 2010; Revised April 28, 2010; Accepted May 6, 2010
INTRODUCTION
The availability of high-density panels of genetic polymorphisms has led to the recent discovery of extended regions of
autozygosity in the human genome. At the genotype level,
these regions present as sizeable stretches, or runs, of homozygosity (ROH) (1). Increased levels of autozygosity have
long been implicated as a cause of the higher prevalence of
recessive diseases in small and isolated populations. Initially,
ROH analysis was thus endorsed successfully as a means to
map recessive disease genes (2–7), but ROHs may also be
useful for disease gene identification under other genetic
models (8,9), may be indicative of selective sweeps (10,11)
and should be interesting from a human population genetics
point of view.
An abundance of ROHs was first demonstrated in Europeans using short tandem repeat markers (1). Subsequent
analyses of single nucleotide polymorphisms (SNPs) in large
To whom correspondence should be addressed at: Institute of Medical Informatics and Statistics, Christian-Albrechts University, Brunswiker Str. 10,
D-24105 Kiel, Germany. Tel: +49 4315973181; Fax: +49 4315973193; Email: nothnagel@medinfo.uni-kiel.de
© The Author 2010. Published by Oxford University Press. All rights reserved.
For Permissions, please email: journals.permissions@oxfordjournals.org
The availability of high-density panels of genetic polymorphisms has led to the discovery of extended
regions of apparent autozygosity in the human genome. At the genotype level, these regions present as sizeable stretches, or runs, of homozygosity (ROH). Here, we investigated both the genomic and the geographic
distribution of ROHs in a large European sample of individuals originating from 23 subpopulations. The
genomic ROH distribution was found to be characterized by a pattern of highly significant non-uniformity
that was virtually identical in all subpopulations studied. Some 77 chromosomal regions contained ROHs
at considerable frequency, thereby forming ROH islands that were not explicable by high linkage disequilibrium alone. At the geographic level, the number and cumulative length of ROHs followed a prominent South
to North gradient in agreement with expectations from European population history. The individual ROH
length, in contrast, showed only minor and unsystematic geographic variation. While our findings are thus
consistent with a larger effective population size in Southern than in Northern Europe, combined with a
higher historic population density and mobility, they also indicate that the patterns of meiotic recombination
in humans must have been very similar throughout the continent. Extending previous reports of a strong
correlation between geography and identity-by-state, our data show that the genomic identity-by-descent patterns of Europeans are also clinal. As a consequence, the planning, design and interpretation of ROH-based
genetic studies must take sample origin into account in order for such studies to be sensible and valid.
Table 1
Subpopulation             Code   Sample   Latitude   Longitude   Weighted ROH number   Median-weighted ROH length
                                 size                            per individual        per individual (Mb)
                                                                 (mean ± SD)           (mean ± SD)
Norway (Førde)            NO     52       59.36      5.28        42.16 ± 6.72          1.31 ± 0.08
Sweden (Uppsala)          SE     46       59.51      17.38       41.49 ± 6.37          1.27 ± 0.07
Finland (Helsinki)        FI     47       60.10      24.56       48.04 ± 7.34          1.30 ± 0.08
Ireland                   IR     35       53.19      −6.15       40.14 ± 5.02          1.28 ± 0.08
UK (London)               UK     194      51.30      −0.07       38.51 ± 6.30          1.30 ± 0.09
Denmark (Copenhagen)      DK     59       55.40      12.34       40.11 ± 6.26          1.30 ± 0.09
Netherlands (Rotterdam)   NE     280      51.55      4.28        38.79 ± 6.83          1.30 ± 0.08
Germany I (Kiel)          NG     494      54.14      10.04       40.49 ± 6.25          1.31 ± 0.08
Germany II (Augsburg)     SG     489      48.37      10.89       38.39 ± 6.09          1.31 ± 0.08
Austria (Tyrol)           AU     50       47.16      11.23       36.42 ± 6.39          1.29 ± 0.09
Switzerland (Lausanne)    SW     133      46.31      6.37        37.64 ± 5.89          1.29 ± 0.08
France (Lyon)             FR     50       45.46      4.50        36.89 ± 6.71          1.28 ± 0.10
Portugal                  PG     16       38.43      −9.08       34.15 ± 7.56          1.26 ± 0.08
Spain I                   S1     81       40.25      −3.42       37.70 ± 5.89          1.30 ± 0.08
Spain II (Barcelona)      S2     47       41.20      2.10        36.37 ± 6.94          1.29 ± 0.10
Italy I                   I1     106      41.53      13.68       35.59 ± 6.19          1.30 ± 0.09
Italy II (Marches)        I2     49       43.37      13.30       34.60 ± 5.11          1.28 ± 0.07
Former Yugoslavia         YU     55       44.49      20.30       36.88 ± 5.72          1.31 ± 0.08
Northern Greece           GR     51       40.38      22.27       33.69 ± 5.22          1.28 ± 0.11
Hungary                   HU     17       47.27      19.06       33.68 ± 5.25          1.25 ± 0.07
Romania                   RO     12       44.25      26.07       32.55 ± 4.48          1.17 ± 0.06
Poland (Warsaw)           PO     49       52.15      21.01       41.45 ± 5.98          1.29 ± 0.07
Czech Republic (Prague)   CZ     45       50.04      14.28       39.21 ± 5.11          1.27 ± 0.06
Total                            2457                            38.74 ± 6.60          1.30 ± 0.08
Given are the number of samples after data cleaning (21), the geographical location of the sampling sites (subpopulations), the subpopulation-specific mean ±
SD of the weighted number of ROHs and of the median-weighted ROH length per individual.
RESULTS
Genomic ROH distribution
Table 2
Bin size (Mb)   χ²            df     P-value
1               558199.8639   2696   <10⁻¹⁰⁰
5               53382.4379    555    <10⁻¹⁰⁰
10              15142.9161    284    <10⁻¹⁰⁰
20              4768.6176     146    <10⁻¹⁰⁰
50              764.6029      62     <10⁻¹⁰⁰
100             239.1148      35     3.1 × 10⁻³²
250             98.7624       21     4.8 × 10⁻¹²
which is in fact only marginally better than random classification (AUC = 0.50).
The definition of an ROH does not only depend upon the
properties of single markers but takes adjacent SNPs simultaneously into account. We therefore correlated the average
single-marker gene diversity, taken over a sliding 1 Mb
window, with the average ROH frequency per SNP in that
window. Although an unambiguously negative correlation
emerged (genome-wide Pearson's r = −0.268 ± 0.018;
range of subpopulation-specific r-values: −0.230 to
−0.293), the size of the observed correlation implied that
gene diversity explains only 7% (coefficient of determination r² = 0.072) of the variation in ROH frequency
per SNP.
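The window-based correlation described above can be sketched as follows. This is only an illustration under stated assumptions: the 500 kb step between windows, the function names and the toy marker data are hypothetical and are not taken from the study; only the 1 Mb window size and the reported r = −0.268 come from the text.

```python
from math import sqrt

def window_means(positions, values, window=1_000_000, step=500_000):
    """Average per-SNP values over sliding windows along one chromosome.

    The 1 Mb window follows the text; the 500 kb step is an assumption.
    """
    means = []
    if not positions:
        return means
    start, last = min(positions), max(positions)
    while start <= last:
        inside = [v for p, v in zip(positions, values) if start <= p < start + window]
        means.append(sum(inside) / len(inside) if inside else None)
        start += step
    return means

def pearson_r(x, y):
    """Pearson correlation, skipping windows that contain no SNP."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / sqrt(sxx * syy)

# Hypothetical toy data: SNP positions (bp), gene diversity and ROH frequency per SNP.
positions = [100_000, 400_000, 900_000, 1_300_000, 1_800_000, 2_200_000, 2_700_000]
gene_diversity = [0.45, 0.40, 0.30, 0.20, 0.25, 0.42, 0.48]
roh_frequency = [0.02, 0.03, 0.08, 0.12, 0.10, 0.03, 0.01]

gd_windows = window_means(positions, gene_diversity)
rf_windows = window_means(positions, roh_frequency)
r_toy = pearson_r(gd_windows, rf_windows)  # negative for this toy example

# Coefficient of determination for the genome-wide r reported in the text.
r_squared = (-0.268) ** 2
```

Squaring the reported correlation reproduces the stated share of explained variation, about 7%.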
We also investigated the effects of LD upon the definition of
ROHs. To this end, we correlated the average ROH frequency
per SNP, taken over a sliding 1 Mb window, with the average
of the squared genotypic correlation coefficient g² within this
window. The average of the genome-wide subpopulation-specific Pearson correlation coefficients was 0.453 ± 0.024,
with a range of 0.382 (PG) to 0.503 (S2). Thus, 20% (i.e.
0.453² = 0.205) of the variation in ROH frequency per SNP
could be explained by the extent of LD in the vicinity of a
given marker. A correlation between LD and ROH prevalence
became particularly apparent for the three genomic regions
(on chromosomes 3, 4 and 14) with the highest ROH frequency per SNP (Fig. 3). However, increased LD in the vicinity of a given SNP was neither necessary nor sufficient for
SNPs to be included in an ROH.
Geographical pattern of ROH distribution in Europe
LD can act as a potential confounder in comparative ROH
analyses of different populations because the local level of
LD determines the effective number of SNPs used for ROH
definition. When characterizing the geographic distribution
of ROHs, we therefore weighted individual ROHs by their
internal level of LD, approximated by one minus the
average of the pair-wise squared genotypic correlation coefficient g² (see Materials and Methods).
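A minimal sketch of such a weight is given below. It assumes that genotypes are coded as 0, 1 or 2 copies of an allele and that the genotypic correlation is the ordinary Pearson correlation between the codes of two SNPs across individuals; the function names and the toy genotypes are hypothetical, and how the weight enters the ROH number and length is defined in Materials and Methods, not here.

```python
from itertools import combinations
from math import sqrt

def genotype_correlation(a, b):
    """Pearson correlation between two SNPs' genotype codes (0/1/2) across individuals."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    if saa == 0 or sbb == 0:
        return None  # monomorphic SNP: correlation undefined
    return sab / sqrt(saa * sbb)

def roh_ld_weight(snp_genotypes):
    """One minus the average pair-wise squared genotypic correlation of the SNPs in an ROH."""
    squared = []
    for a, b in combinations(snp_genotypes, 2):
        r = genotype_correlation(a, b)
        if r is not None:
            squared.append(r * r)
    if not squared:
        return 1.0  # assumption: no usable pair means no down-weighting
    return 1.0 - sum(squared) / len(squared)

# Hypothetical genotypes of four individuals at two SNPs.
snp_a = [0, 0, 2, 2]
snp_b = [0, 2, 0, 2]  # uncorrelated with snp_a

weight_full_ld = roh_ld_weight([snp_a, snp_a])        # identical SNPs
weight_no_ld = roh_ld_weight([snp_a, snp_b])          # uncorrelated SNPs
weight_mixed = roh_ld_weight([snp_a, snp_a, snp_b])   # one of three pairs in full LD
```

Under these assumptions, an ROH made up of SNPs in complete LD receives a weight of 0 and one made up of uncorrelated SNPs a weight of 1.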
The weighted ROH number per individual ranged from 10.5
to 60.4 in the overall sample, with all subpopulation-specific
IQRs falling between 25 and 55 (see Supplementary Material,
Fig. S1). The subpopulation average of the weighted ROH
number per individual varied between 32.55 (standard error,
SE: 1.3) in the Romanians and 48.0 (SE: 1.1) in the Finns
(Table 1). Similarly, the subpopulation average of the cumulative weighted ROH length per individual ranged from
49.7 Mb (SE: 2.3 Mb) in the Romanians to 81.5 Mb (SE:
2.2 Mb) in the Finns. Of the 2457 individuals analysed, 40
(1.6%) exhibited a cumulative weighted ROH length
≥100 Mb (3.3% of the human genome). These individuals originated from South Germany (10), North Germany and
Norway (6 each), Italy I, Spain I, Finland and The Netherlands
(3 each), Portugal (2), and from Austria, Denmark, the UK and
former Yugoslavia (1 each). As a consequence, particularly
high proportions of samples from Finland (6.4%), Norway
(11.5%) and Portugal (12.5%) were found to have at least
100 Mb of their genome located in ROHs. Twelve individuals
(0.5% of the total) had weighted ROHs comprising ≥150 Mb
Each chromosome was divided into bins of equal size. The average ROH count
per SNP per bin in the overall sample was subjected to a χ² goodness-of-fit test
over all chromosomes. The last line (bin size 250 Mb) contains the result of a
test for uniformity between chromosomes. df: degrees of freedom.
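The goodness-of-fit test named in the table note can be illustrated with a short sketch. It assumes that the expected value is the same in every bin (uniformity) and that the degrees of freedom equal the number of bins minus one; the function name and the toy counts are hypothetical, and the exact expected values used in the study are not given in this excerpt.

```python
def chi_square_uniform(observed):
    """χ² statistic and degrees of freedom for a test of equal values across bins."""
    k = len(observed)
    expected = sum(observed) / k  # uniform expectation (assumption)
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    return statistic, k - 1

# Hypothetical per-bin averages.
stat_flat, df_flat = chi_square_uniform([10, 10, 10, 10])
stat_uneven, df_uneven = chi_square_uniform([12, 8, 10, 10])

# One bin per chromosome: 22 bins give 21 degrees of freedom,
# which matches the df reported for the 250 Mb bin size.
_, df_chromosomes = chi_square_uniform([1.0] * 22)
```

A perfectly even distribution yields a statistic of zero; any departure from uniformity increases it.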
Table 3. Regions of at least 50 SNPs with high ROH frequency per SNP (ROH islands)
Chr.   Location (kb)               Size (kb)   No. SNPs
14     65,754.607–66,956.534       1,201.927   106
4      33,305.316–34,167.260       861.944     58
3      50,382.348–51,835.857       1,453.509   101
12     110,249.612–111,461.573     1,211.961   90

1      35,023.369–36,505.444       1,482.075   101
5      129,845.818–131,423.014     1,577.196   105
11     47,998.479–49,391.209       1,392.730   114

16     65,360.598–66,845.475       1,484.877   72
16     46,391.563–46,826.430       434.867     55
10     74,211.870–75,086.795       874.925     59
Given are the mean and the range of the ROH frequency per SNP in the overall sample, both taken over all SNPs in the respective region.
Table 4. Relationship between single-marker gene diversity and heterozygote
deficit, respectively, and ROH frequency per SNP
Subpopulation: AU, DK, FI, YU, FR, GR, HU, IR, IT1, IT2, NO, PO, PG, RO, SE, SG, S1, S2, SW, CZ, UK; Mean; SD
Gene diversity
  P-value: <10⁻¹⁰⁰ (21 entries)
  OR: 0.377, 0.321, 0.465, 0.367, 0.319, 0.376, 0.287, 0.331, 0.371, 0.349, 0.440, 0.318, 0.412, 0.287, 0.330, 0.428, 0.339, 0.320, 0.304, 0.344, 0.377; mean 0.354; SD 0.049
Fixation index F
  P-value: <10⁻¹⁰⁰ (9 entries), 8.1 × 10⁻³⁸, <10⁻¹⁰⁰ (8 entries), 1.1 × 10⁻⁹, <10⁻¹⁰⁰, 2.4 × 10⁻³
  OR: 1.576, 1.629, 1.884, 1.527, 1.442, 1.894, 1.553, 1.691, 1.457, 1.225, 1.916, 1.532, 1.958, 1.575, 1.538, 1.739, 1.855, 1.098, 1.935, 1.046, 1.576; mean 1.604; SD 0.261
AUC: 0.55, 0.55, 0.54, 0.55, 0.55, 0.55, 0.56, 0.55, 0.55, 0.55, 0.54, 0.55, 0.56, 0.56, 0.55, 0.54, 0.55, 0.55, 0.56, 0.55, 0.55; mean 0.550; SD 0.005
Odds-ratios (OR) and P-values are from a logistic regression analysis of the
ROH frequency per SNP, using single-marker gene diversity and local fixation
index F as covariates. AUC, area-under-curve (for details, see text).
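To make the AUC values easier to read, the following sketch computes the area under the ROC curve as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The function name, the binary labels and the toy scores are hypothetical; how ROH frequency per SNP was turned into a classification outcome is described in the text of the study, not in this excerpt.

```python
def auc(scores, labels):
    """Area under the ROC curve from pairwise comparisons (ties count as one half)."""
    positives = [s for s, lab in zip(scores, labels) if lab == 1]
    negatives = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

labels = [1, 1, 0, 0]
auc_perfect = auc([0.9, 0.8, 0.3, 0.2], labels)   # complete separation
auc_partial = auc([0.9, 0.4, 0.6, 0.2], labels)   # one of four pairs misordered
auc_random = auc([0.5, 0.5, 0.5, 0.5], labels)    # uninformative scores
```

Uninformative scores give an AUC of 0.5, which is the reference point for the values of about 0.55 listed in the table.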
trend was observed for the subpopulation average of the cumulative weighted ROH length per individual (latitude: r = 0.61,
P = 1.8 × 10⁻³; longitude: r = −0.14, P = 0.5). Nevertheless, since the Finnish are known to be genetically quite distinct from other Europeans, and because some of the
Norwegian sampling sites included in our study (e.g. Førde)
also may have represented genetic isolates, it remained possible that the above correlations hinged mainly on a few
founder populations from the northern fringes of the continent.
However, exclusion of the Finnish and/or Norwegian samples
from our analysis hardly changed the observed correlation
between weighted ROH number and latitude (without
Figure 4. Geographic distribution of weighted ROHs in European genomes. White dots mark the location of the 23 sampling sites where individuals were
recruited into subpopulations (as defined in the text). (A) Subpopulation average of the weighted ROH number per individual; (B) subpopulation average
of the median-weighted ROH length (Mb) per individual. Contour maps were derived through spline interpolation.
Figure 3. ROH frequency, local linkage disequilibrium and gene diversity per SNP in selected chromosomal regions in the North German (NG) subpopulation.
Regions were selected from the top of Table 3. Vertical gray dashed lines: region limits. Green horizontal bars: extent of individual ROHs. Black ticks: physical
location of analysed SNP. Green line: ROH frequency per SNP. Red line: average genotypic correlation within bins of approximately 200 kb (marked by gray
ticks). Blue line: gene diversity per SNP.
DISCUSSION
At the level of the individual genome, the distribution of
SNP-defined ROHs was found in our study to be highly structured in all of the European subpopulations analysed. This
R software v2.9.2 (31) was used for statistical analysis and for
creating graphs. The akima R package v0.5-2 (32) was used
for gridded bivariate cubic interpolation using splines (33).
The significance of the correlation of certain ROH characteristics with either longitude or latitude was assessed by a two-sided test at the 5% level, as implemented in the cor.test
function of the R stats library. Data on European geographic
boundaries were obtained from http://www.oceanteacher.org/.
Graphs were edited with Adobe Illustrator CS2. Spatial autocorrelation was analysed and correlograms were generated
using PASSAGE v1.1 (34).
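As an illustration of the correlation test, the sketch below recomputes a Pearson correlation between latitude and the subpopulation mean of the weighted ROH number, using the values printed in Table 1. It is a plain-Python stand-in, not the authors' R code: it assumes the Pearson method, treats the printed coordinates as plain decimal numbers, and compares the t statistic with the tabulated two-sided 5% critical value for 21 degrees of freedom instead of computing a P-value.

```python
from math import sqrt

# Table 1, in row order (Norway ... Czech Republic).
latitude = [59.36, 59.51, 60.10, 53.19, 51.30, 55.40, 51.55, 54.14,
            48.37, 47.16, 46.31, 45.46, 38.43, 40.25, 41.20, 41.53,
            43.37, 44.49, 40.38, 47.27, 44.25, 52.15, 50.04]
roh_number = [42.16, 41.49, 48.04, 40.14, 38.51, 40.11, 38.79, 40.49,
              38.39, 36.42, 37.64, 36.89, 34.15, 37.70, 36.37, 35.59,
              34.60, 36.88, 33.69, 33.68, 32.55, 41.45, 39.21]

def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

n = len(latitude)
r = pearson_r(latitude, roh_number)

# Test statistic of the Pearson correlation test: t with n - 2 degrees of freedom.
t_stat = r * sqrt((n - 2) / (1 - r * r))

T_CRIT_21DF = 2.080  # two-sided 5% critical value of Student's t, 21 df
significant = abs(t_stat) > T_CRIT_21DF
```

Because only subpopulation means enter the calculation, the result describes the 23 sampling sites, not individual-level variation.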
SUPPLEMENTARY MATERIAL
Supplementary Material is available at HMG online.
ACKNOWLEDGEMENTS
All sample donors are gratefully acknowledged for their participation. We thank the following colleagues for their help
and support: P. Arp, M. Balascakova, C. Becker, A. van
Belkum, J. Bertranpetit, L.A. Bindoff, R. Borup, S. Brauer,
A. Caliebe, J. Chambers, D. Comas, G. Eckstein, H. von Eller-Eberstein, F.C. Nielsen, S. Freitag-Wolf, U. Gether, C. Gieger,
E. Haastrup, A. Hofman, G. Holmlund, W. van IJken,
M. Jhamai, O. Junge, K. King, E. Knipers, J. Kooner,
A. Kouvatsi, O. Lao, J. Laven, P. Lichtner, J. Lindemans,
M. Macek, T. Meitinger, I. Mollet, V. Mooser, P. Nurnberg,
J. Palo, W. Parson, R. Ploski, F. Rivadeneira, A. Ruther,
A. Sajantila, R. van Schaik, C. Schjerling, S. Schreiber,
E. Sijbrands, M. Simoons, B. Stricker, A. Tagliabracci, A.G.
Uitterlinden, H. Ullum, P. Vollenweider, G. Waeber,
D. Waterworth, T. Werge and H.-E. Wichmann. We also
thank M. Wittig for helpful discussions.
Conflict of Interest statement. None declared.
FUNDING
This work was supported by the Netherlands Forensic Institute
(to M.Ka.); by Affymetrix Inc. (to M.Ka., M.Kr.); by the
German Federal Ministry of Education and Research
(BMBF) through the National Genome Research Network
REFERENCES
1. Broman, K.W. and Weber, J.L. (1999) Long homozygous chromosomal
segments in reference families from the Centre d'Etude du Polymorphisme
Humain. Am. J. Hum. Genet., 65, 1493–1500.
2. Hildebrandt, F., Heeringa, S.F., Rüschendorf, F., Attanasio, M., Nürnberg,
G., Becker, C., Seelow, D., Huebner, N., Chernin, G., Vlangos, C.N. et al.
(2009) A systematic approach to mapping recessive disease genes in
individuals from outbred populations. PLoS Genet., 5, e1000353.
3. Lander, E.S. and Botstein, D. (1987) Homozygosity mapping: a way to
map human recessive traits with the DNA of inbred children. Science,
236, 1567–1570.
4. Miano, M.G., Jacobson, S.G., Carothers, A., Hanson, I., Teague, P.,
Lovell, J., Cideciyan, A.V., Haider, N., Stone, E.M., Sheffield, V.C. et al.
(2000) Pitfalls in homozygosity mapping. Am. J. Hum. Genet., 67,
1348–1351.
5. Seelow, D., Schuelke, M., Hildebrandt, F. and Nürnberg, P. (2009)
HomozygosityMapper – an interactive approach to homozygosity
mapping. Nucleic Acids Res., 37, W593–W599.
6. Wang, S., Haynes, C., Barany, F. and Ott, J. (2009) Genome-wide
autozygosity mapping in human populations. Genet. Epidemiol., 33,
172–180.
7. Woods, C.G., Cox, J., Springell, K., Hampshire, D.J., Mohamed, M.D.,
McKibbin, M., Stern, R., Raymond, F.L., Sandford, R., Malik Sharif, S.
et al. (2006) Quantification of homozygosity in consanguineous
individuals with autosomal recessive disease. Am. J. Hum. Genet., 78,
889–896.
8. Jiang, H., Orr, A., Guernsey, D.L., Robitaille, J., Asselin, G., Samuels,
M.E. and Dubé, M.P. (2009) Application of homozygosity haplotype
analysis to genetic mapping with high-density SNP genotype data. PLoS
ONE, 4, e5280.
9. Miyazawa, H., Kato, M., Awata, T., Kohda, M., Iwasa, H., Koyama, N.,
Tanaka, T., Huqun, Kyo, S., Okazaki, Y. et al. (2007) Homozygosity
haplotype allows a genomewide search for the autosomal segments shared
among patients. Am. J. Hum. Genet., 80, 1090–1102.
10. Rosenberg, N.A. and Jakobsson, M. (2008) The relationship between
homozygosity and the frequency of the most frequent allele. Genetics,
179, 2027–2036.
11. Sabeti, P.C., Reich, D.E., Higgins, J.M., Levine, H.Z., Richter, D.J.,
Schaffner, S.F., Gabriel, S.B., Platko, J.V., Patterson, N.J., McDonald,
G.J. et al. (2002) Detecting recent positive selection in the human genome
from haplotype structure. Nature, 419, 832–837.
12. Auton, A., Bryc, K., Boyko, A.R., Lohmueller, K.E., Novembre, J.,
Reynolds, A., Indap, A., Wright, M.H., Degenhardt, J.D., Gutenkunst,
R.N. et al. (2009) Global distribution of genomic diversity underscores
rich complex history of continental human populations. Genome Res., 19,
795–803.
13. Gibson, J., Morton, N.E. and Collins, A. (2006) Extended tracts of
homozygosity in outbred human populations. Hum. Mol. Genet., 15,
789–795.
14. Li, L.H., Ho, S.F., Chen, C.H., Wei, C.Y., Wong, W.C., Li, L.Y., Hung,
S.I., Chung, W.H., Pan, W.H., Lee, M.T. et al. (2006) Long contiguous
stretches of homozygosity in the human genome. Hum. Mutat., 27,
1115–1121.
15. Wang, H., Lin, C.H., Service, S., Chen, Y., Freimer, N. and Sabatti, C.
(2006) Linkage disequilibrium and haplotype homozygosity in population
samples genotyped at a high marker density. Hum. Hered., 62, 175–189.
16. McQuillan, R., Leutenegger, A.L., Abdel-Rahman, R., Franklin, C.S.,
Pericic, M., Barac-Lauc, L., Smolej-Narancic, N., Janicijevic, B., Polasek,