
for SHORT title: Killifish population genome variation

see http://arthropods.eugenes.org/EvidentialGene/vertebrates/
TABLE GQ. Quality effect of gene set construction methods for vertebrate fish,
comparing methods NCBI Eukaryote annotation (nc), EvidentialGene (evg),
MAKER2 (mk), EvidenceModeller (em), within and across species.
Similar method quality rankings are obtained in gene sets of plants and arthropods.
- Don Gilbert, 2016 Feb.
Highly conserved ortholog genes (BUSCO vertebrate set)
Expectation ~ 99% found, ~90% align, ~ <1% tiny or big extremes
Geneset      nGroup  nFound  %Found  %Align  %Tiny  %Big
-----------  ------------------------------------------------------
kfish.evg      4097    4045    98.7    91.5    0.3   0.5  * high accuracy
kfish.nc       4097    4031    98.4    89.5    1.3   0.7
notfur.em      4097    3996    97.5    87.5    2.9   1.1
notfur.mk      4097    3726    90.9    76.6    5.2   2.2  - low accuracy
pike.nc        4097    4060    99.1    93.2    0.6   0.6  * high accuracy
pike.mk        4097    3114    76.0    56.5    7.8   0.2  - low accuracy
amolly.nc      4097    4050    98.9    92.2    0.8   0.8  * high accuracy
guppy.nc       4097    4048    98.8    91.5    0.8   1.0  * high accuracy
--------------------------------------------------------------------
Ortholog groups common to 3+ reference fish
Expectation ~ 90% found, ~85% align, ~ 1-3% tiny or big extremes
Geneset      nGroup  nFound  %Found  %Align  %Tiny  %Big
-----------  ------------------------------------------------------
kfish.evg     17904   16345    91.3    86.6    0.8   2.5  * high accuracy
kfish.nc      17904   15701    87.7    82.9    1.8   1.9
notfur.em     17904   15277    85.3    78.8    2.9   2.7
notfur.mk     17904   13706    76.6    67.4    5.8   6.9  - low accuracy
pike.nc       17904   15726    87.8    82.5    1.0   2.4
pike.mk       17904   11676    65.2    51.9   16.6   1.8  - low accuracy
amolly.nc     17904   15888    88.7    85.6    0.9   2.4  * high accuracy
guppy.nc      17904   15797    88.2    84.6    1.2   2.6  * high accuracy
zebrafish     17904   15050    84.1    77.6    2.4   1.3
--------------------------------------------------------------------
Gene set method suffix: nc= NCBI Eukaryote annotation,
evg=EvidentialGene, mk= MAKER2, em = EvidenceModeller
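
The %Found column in the BUSCO table above appears to be 100 * nFound / nGroup, rounded to one decimal. The following is a minimal sketch that recomputes it from the printed counts; the function and variable names are illustrative and not part of the EvidentialGene tools.

```python
# Recompute %Found for the BUSCO vertebrate rows as 100 * nFound / nGroup.
# Counts are copied from the table; names here are hypothetical helpers.
N_GROUP = 4097

N_FOUND = {
    "kfish.evg": 4045,
    "kfish.nc": 4031,
    "notfur.em": 3996,
    "notfur.mk": 3726,
    "pike.nc": 4060,
    "pike.mk": 3114,
    "amolly.nc": 4050,
    "guppy.nc": 4048,
}


def pct_found(n_found: int, n_group: int = N_GROUP) -> float:
    """Percentage of ortholog groups found, rounded to one decimal place."""
    return round(100.0 * n_found / n_group, 1)


if __name__ == "__main__":
    for geneset, n_found in N_FOUND.items():
        print(f"{geneset:<10} {n_found:>5} {pct_found(n_found):5.1f}")
```

The same function applies to the second table by passing n_group=17904.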
Teleostei fish taxonomy tree
Euteleosteomorpha
+ Neoteleostei
+ + + + + + Haplochromini
+ + + + + + + Maylandia zebra                # mayzebr = African cichlid Zebra Mbuna
+ + + + + Cyprinodontiformes
+ + + + + + + Nothobranchius furzeri         # notfur = African turquoise killifish
+ + + + + + + Poeciliidae
+ + + + + + + + + + Xiphophorus maculatus    # platyfish
+ + + + + + + + + + Poecilia formosa         # amolly = Amazon molly
+ + + + + + + + + + Poecilia reticulata      # guppy
+ + + + + + + Fundulidae
+ + + + + + + + Fundulus heteroclitus        # kfish = Atlantic killifish
+ Protacanthopterygii
+ + Esox lucius                              # northern pike
Otomorpha
+ + Cypriniphysae
+ + + Danio rerio                            # zebrafish
---------------------------------------------------------------------

Fish comparison gene sets:
kfish.evg  Fundulus heteroclitus, Evigene 2014, eugenes.org/EvidentialGene/killifish/Genes/
kfish.nc   Fundulus heteroclitus, NCBI 2015, ftp.ncbi.nih.gov/genomes/all/GCF_000826765.1_Fundulus_heteroclitus-3.0.2
notfur.em  Nothobranchius furzeri, EvidenceModeller, doi:10.1016/j.cell.2015.10.071
notfur.mk  Nothobranchius furzeri, MAKER2, doi:10.1016/j.cell.2015.11.008
pike.nc    Esox lucius, NCBI, ftp.ncbi.nih.gov/genomes/all/GCF_000721915.2_ASM72191v2
pike.mk    Esox lucius, MAKER2, doi:10.1371/journal.pone.0102089
amolly.nc  Poecilia formosa, NCBI, ftp.ncbi.nih.gov/genomes/all/GCF_000485575.1_Poecilia_formosa-5.1.2
guppy.nc   Poecilia reticulata, NCBI, ftp.ncbi.nih.gov/genomes/all/GCF_000633615.1_Guppy_female_1.0_MT
Orthology reference set of 10 fish + human:
  23042 Astyanax_mexicanus, 46251 catfish, 23194 Maylandia_zebra, 19686 medaka, 20366 platyfish,
  18341 spotted gar, 20787 stickleback, 19602 tetraodon, 21437 tilapia, 26247 zebrafish, 39357 human
Refs:
pmk:  Rondeau EB, Minkley DR, Leong JS, Messmer AM, Jantzen JR, et al. (2014) doi:10.1371/journal.pone.0102089
nfmk: Valenzano et al. (2015) Cell 163, 1539-1554, doi:10.1016/j.cell.2015.11.008
nfem: Reichwald et al. (2015) Cell 163, 1527-1538, doi:10.1016/j.cell.2015.10.071
--------------------------------------------------------------------
TABLE GF. Funhe coding sequence loci found in related fish, of 34991 loci, using blastn.
Related fish are amolly (Poecilia formosa, close relative), guppy (Poecilia reticulata), notfur
(Nothobranchius furzeri, mid-close, notfur.em set), and zfish (Danio rerio,
distant relative). Ortholog loci share 1:1 clustering among protein
gene models of 2+ fish species. Non-ortholog loci do not cluster with
other fish gene models, but include those with protein
homology and those without. Inparalogs (n=3723) are excluded from both
subsets; they share homology with the ortholog subset and align similarly to
orthologs, but determining uniqueness is complex.
Suppl. Table GFxxx lists each Funhe2EKm gene ID and alignments in other species.
  kfishloc2fishmapngene_orde.tab table ;
  include genome locs/species tables at eugenes.org/EvidentialGene/killifish/Genes/inotherfish/
  chralign/{chrazmolly,chrguppy,chrturkfish,chrzfish}-kfish2rae5h.allcds.tall3ba
Query              Source               Bits  Ident  Align  Qlen  Qspan                          Sspan
Funhe2EKm000003t1  amollNW_006799965.1   558    422    495   645  19-663/19-318,469-663          702938-703222,698865-699059:
Funhe2EKm000003t1  guppy14LG10           693    510    592   645  19-663/19-320,337-431,469-663  25443154-25443440,14716643-14716737,25439041-25439235:
Funhe2EKm000003t1  turkf_sgr01           648    542    647   645  19-663/19-318,325-473,474-663  24830259-24830555,24829332-24829480,24825247-24825432:
Funhe2EKm000003t1  zfishNW_003335281.1   215    161    183   645  325-520/325-441,456-520        25666-25782,21854-21918:-
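
The Qspan values in the example rows above can be unpacked with a few lines of code. This is a minimal sketch that assumes the part before "/" is the overall aligned extent on the query and the comma-separated part lists the aligned segments; that reading is inferred from the example rows and is not stated in the source. Function names are hypothetical.

```python
# Parse a Qspan string such as "19-663/19-318,469-663".
# Assumed layout (inferred, not documented): "start-end/seg1,seg2,..."
# where each segment is "start-end" in 1-based inclusive query coordinates.

def parse_range(text):
    """Turn 'start-end' into an (int, int) tuple."""
    start, end = text.split("-")
    return int(start), int(end)


def parse_qspan(qspan):
    """Return (overall_extent, [segments]) for one Qspan value."""
    extent_part, segment_part = qspan.split("/")
    segments = [parse_range(seg) for seg in segment_part.split(",")]
    return parse_range(extent_part), segments


def aligned_length(qspan):
    """Sum of inclusive segment lengths on the query."""
    _, segments = parse_qspan(qspan)
    return sum(end - start + 1 for start, end in segments)


if __name__ == "__main__":
    for qspan in ("19-663/19-318,469-663", "19-663/19-320,337-431,469-663"):
        print(qspan, aligned_length(qspan))
```

For the amolly and guppy rows the segment sum equals the Align column (495 and 592). For the turkf and zfish rows it gives 639 and 182 against printed Align values of 647 and 183, so Align is evidently not always a plain sum of these segments.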

Alignment to genome assemblies:
  blastn -task blastn -evalue 1e-5 -db otherfish-chrasm -query killifish2.mrna
  blast output is parsed/reduced to table of gene coding exon locations on chrasm
  with evigene script makeblastscore3.pl
Alignment of gene coding sequences:
  blastn -task dc-megablast -template_type coding -template_length 18 -evalue 1e9 -ungapped
    -db killifish2-cds -query otherfish.cds
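
The two blastn invocations above can be assembled as argument lists for scripted runs. This is only a sketch: the options and values are copied verbatim from the commands as printed (including the -evalue 1e9 of the second command, which may be an extraction artifact), the database and query names are the placeholders used in the text, and the function names are hypothetical. Nothing is executed here; the commands are only printed.

```python
import shlex

# Build the two blastn command lines from the text as argument lists.
# All flags and values are taken as printed; db/query names are placeholders.

def genome_alignment_cmd(db="otherfish-chrasm", query="killifish2.mrna"):
    """mRNA queries against another fish's genome assembly."""
    return ["blastn", "-task", "blastn", "-evalue", "1e-5",
            "-db", db, "-query", query]


def cds_alignment_cmd(db="killifish2-cds", query="otherfish.cds"):
    """Other-fish coding sequences against the killifish CDS set."""
    return ["blastn", "-task", "dc-megablast",
            "-template_type", "coding", "-template_length", "18",
            "-evalue", "1e9",  # value as printed in the source
            "-ungapped", "-db", db, "-query", query]


if __name__ == "__main__":
    print(shlex.join(genome_alignment_cmd()))
    print(shlex.join(cds_alignment_cmd()))
```

Keeping the commands as lists makes it straightforward to pass them to subprocess.run once per species database.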
---------------------------------------------
GF.A Ortholog loci (n=21099) found on 4 related fish species.
92% have functions characterized from homology, 2% are one-exon genes
---------------------------------------------
N Fish   Genome Assembly   Gene Models
1+/4     20973,99%         19979,94%
2+/4     20896,99%         19337,91%
3+/4     20462,96%         17534,83%
4/4      18129,85%         na
0        126,0%            1120,5%
GF.B Non-ortholog loci (n=10169) found on 4 related fish species.
40% have functions characterized from homology, 25% are one-exon genes
---------------------------------------------
N Fish   Genome Assembly   Gene Models
1+/4     7897,77%          1899,18%
2+/4     7182,70%          871,8%
3+/4     5881,57%          352,3%
4/4      3670,36%          na
0        2272,22%          8270,81%
GF.C All loci (n=34991) found on 4 related fish species.
76% have functions characterized from homology.
---------------------------------------------
N Fish   Genome Assembly   Gene Models
1+/4     32454,92%         24300,69%
2+/4     31569,90%         21991,62%
3+/4     29508,84%         18979,54%
4/4      23939,68%         na
0        2537,7%           10691,30%
---------------------------------------------
GF.D Funhe NCBI vs Evigene genes found in 1+/4 other fish species.
ncbi-same  22110,99%  of 22237 equivalent NCBI+Evigene
ncbi-none  10358,81%  of 12768 only Evigene
--------------------------------------------------------
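
The counts in tables GF.A to GF.C can be cross-checked with simple arithmetic. The sketch below assumes the printed percentages are integer percentages truncated (not rounded) from count / subset size, which is consistent with every count,percent pair in GF.A to GF.C; the helper names are hypothetical.

```python
# Cross-check the Table GF subset sizes and the "count,percent" entries.
# Assumption: percentages are truncated to whole numbers, as the printed
# values suggest (e.g. 19979 of 21099 is 94.7% but is printed as 94%).
N_ALL = 34991
N_ORTHOLOG = 21099
N_NON_ORTHOLOG = 10169
N_INPARALOG = 3723


def pct_truncated(count, total):
    """Whole-number percentage, truncated toward zero."""
    return (100 * count) // total


def not_found(total, found_in_one_or_more):
    """Loci in row '0' = subset size minus loci found in 1+/4 species."""
    return total - found_in_one_or_more


if __name__ == "__main__":
    # Ortholog + non-ortholog + inparalog loci account for all loci.
    print(N_ORTHOLOG + N_NON_ORTHOLOG + N_INPARALOG == N_ALL)
    # GF.A, Gene Models column, rows 1+/4 and 0.
    print(pct_truncated(19979, N_ORTHOLOG), not_found(N_ORTHOLOG, 19979))
```

The row labelled 0 in each table is therefore the complement of the 1+/4 row within the same subset.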
