
Uncovering genes associated with human infections in pathogenic bacteria:

a novel integrative analysis linking genome, bibliome, and diseasome


Frank Lin
Centre for Health Informatics, The University of New South Wales, Sydney, Australia

Motivations

The identification of virulence genes in bacteria is an important topic in the management and study of infectious diseases (ID). We have developed a computational pipeline that assimilates the genomic (whole genome sequences), bibliomic (MEDLINE) and diseasomic (SNOMED-CT) databases to assist with this discovery task. Previously, we have reported that the co-occurrence of disease and pathogen names in MEDLINE can be used to infer causal associations [1]. We have also observed that protein phylogenetic profiles (pp) may be used to predict pathway memberships [2] and to suggest potential virulence mechanisms [3]. It is perceived that the tool described in this work can be useful in suggesting markers for ID biosurveillance or in assisting with the development of effective vaccines.

[Figure 1 labels: E. coli, Klebsiella spp., Pseudomonas spp., Proteus spp.; columns: Pathogens, Genomes, Clinical manifestations or syndromes]

Figure 1. The “shared” virulence gene hypothesis. The relationships between genes shared by multiple pathogens and clinical syndromes can be exemplified by this illustration. The genes encoding a specific bacterial virulence mechanism may be evolutionarily conserved and hence are present in different species or genus (the “shared” virulence genes) [3]. These genes are candidates to be discovered by this method, based on multi-genomic analysis of different bacterial species with comparative genomics.

Data sources
Bibliome: MEDLINE
Diseasome: SNOMED CT
Genome: NCBI whole genome sequences

List of pathogen names. The names of the entire list of bacterial species known to be associated with infectious diseases in humans were short-listed from the NCBI reference genome database. Two hundred and fifty-four bacterial species were used in the subsequent analysis.

Pathogen names shown: Staphylococcus aureus, Escherichia coli, Streptococcus agalactiae, Klebsiella pneumoniae, Streptococcus pneumoniae, Enterococcus faecalis, Staphylococcus epidermidis, Listeria monocytogenes, Pseudomonas aeruginosa, Haemophilus influenzae, Enterobacter cloacae, Mycoplasma hominis, Clostridium difficile, Chlamydia trachomatis, Campylobacter jejuni, Ureaplasma urealyticum, Neisseria meningitidis, Bacteroides fragilis, Acinetobacter baumannii, Campylobacter fetus, Salmonella typhi, Streptococcus pyogenes, Neisseria gonorrhoeae, ...

Extracting clinical concepts related to infectious disease from ontology database. The list of infectious disease names was extracted from the Systematized Nomenclature of Medicine - Clinical Terms (SNOMED-CT) database by traversing the hierarchy of concepts beneath the concept of “Infectious disease (disorder)” (SNOMED ID #40,733,004). A total of 327 infectious disease-related concepts were extracted.

SNOMED ID    Description
128070006    "Abdominal infection"
62479008     "Acquired immune deficiency syndrome", ...
194922003    "Acute bacterial endocarditis", ...
193857008    "Acute conjunctivitis", Conjunctivitis, ...
54839009     "Acute poliomyelitis", Polio, ...
155264006    "Acute rheumatic fever", "Rheumatic fever"
27321001     "Acute sore throat", "Acute pharyngitis"
13906002     Anaplasmosis, "Tick fever", Gallsickness
33937009     "Lyme arthritis"
186103005    "Bacillary dysentery", "Shigella dysenteriae"
128350005    "Bacterial conjunctivitis"
274080003    "Bacterial gastroenteritis"
95883001     "Bacterial meningitis"
197171003    "Bacterial peritonitis"
10001005     "Bacterial septicaemia", "Bacterial sepsis", ...
198221007    "Bacterial vaginosis"
53084003     "Bacterial pneumonia"
55604004     "Bird flu", "Avian influenza", "Avian flu"
56304002     "Bovine virus diarrhoea", ...
86500004     Campylobacteriosis
...

Determining the pathogen-disease matrix. The co-occurrences of the disease names with pathogen names in MEDLINE abstracts were used to infer the potential causal relationship. For example, the concept “urinary tract infection” co-occurs frequently with the pathogen name “Escherichia coli”, which is the principal causal pathogen of UTI in humans and other mammals. The concept “pneumonia” is expected to occur at higher frequencies in articles that mention its primary causal pathogens (e.g., Streptococcus pneumoniae).
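The pathogen-disease matrix described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the poster does not say how names were matched in MEDLINE, so the whole-phrase, case-insensitive matching, the function names `mentions` and `cooccurrence_counts`, and the example abstracts are all hypothetical.

```python
import re
from collections import Counter
from itertools import product


def mentions(name, text):
    """True if `name` occurs in `text` as a whole phrase, ignoring case."""
    pattern = r"\b" + re.escape(name) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def cooccurrence_counts(abstracts, pathogens, diseases):
    """Count abstracts that mention both a pathogen name and a disease name."""
    counts = Counter()
    for text in abstracts:
        found_pathogens = [p for p in pathogens if mentions(p, text)]
        found_diseases = [d for d in diseases if mentions(d, text)]
        for pair in product(found_pathogens, found_diseases):
            counts[pair] += 1
    return counts


# Hypothetical abstracts, made up for illustration only.
abstracts = [
    "Escherichia coli is the leading cause of urinary tract infection.",
    "Urinary tract infection due to Escherichia coli in children.",
    "Streptococcus pneumoniae remains a major cause of pneumonia.",
    "Streptococcus pneumoniae meningitis in adults.",
]
pathogens = ["Escherichia coli", "Streptococcus pneumoniae"]
diseases = ["urinary tract infection", "pneumonia"]

matrix = cooccurrence_counts(abstracts, pathogens, diseases)
for (pathogen, disease), n in sorted(matrix.items()):
    print(f"{pathogen} / {disease}: {n}")
```

Matching on word boundaries matters here: a plain substring test would find “pneumonia” inside every mention of “Streptococcus pneumoniae” and inflate that cell of the matrix.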
Bacterial genomes used in this analysis. 1,131 genomes with available whole genome sequences (WGS) were retrieved from the NCBI FTP site. The 525 WGS that also belong to the list of pathogens (see above) were used in the subsequent analyses.

Genomes shown:
Staphylococcus aureus JH9
Staphylococcus aureus JH1
Staphylococcus aureus Newman
...
Escherichia coli S88
Escherichia coli O103 H2 12009
Escherichia coli O157 H7 TW14359
...
Streptococcus agalactiae A909
Streptococcus agalactiae NEM316
Streptococcus agalactiae 2603
Klebsiella pneumoniae 342
Klebsiella pneumoniae NTUH K2044
Klebsiella pneumoniae MGH 78578
Streptococcus pneumoniae R6
Streptococcus pneumoniae JJA
Streptococcus pneumoniae Hungary19A 6
Staphylococcus epidermidis RP62A
Staphylococcus epidermidis ATCC 12228
Listeria monocytogenes 4b F2365
Pseudomonas aeruginosa LESB58
Pseudomonas aeruginosa PA7
Haemophilus influenzae PittEE
Haemophilus influenzae PittGG
Enterobacter cloacae ATCC 13047 uid48363
Mycoplasma hominis
Chlamydia trachomatis A HAR-13
Chlamydia trachomatis B TZ1A828 OT
...

Determining the phylogenetic profiles of each gene in the 525 genomes. The presence or absence of homologous genes in a set of reference genomes (phylogenetic profiles) was determined by performing all-against-all BLASTP with the E-value threshold at 10⁻¹⁰. The procedure was described in detail in Refs. [2] and [3].

[Phylogenetic profile matrix; axis labels: Genes, Reference genomes]

A Manhattan plot showing the association of all genes in the S. pneumoniae D39 genome with literature that frequently mentions the ID concept “pneumonia”. For each gene, the relative association of the gene with an infectious disease concept (e.g., “pneumonia”) can be calculated by comparing the observed distributions of PP with the corresponding null distributions. In this example, the hyaluronidase gene (hysA, red oval) was among the highest ranked genes associated with frequent mentions of “pneumonia” in the literature.
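As a minimal sketch of the phylogenetic profile step, the snippet below turns a list of BLASTP hits into a presence/absence vector per gene, using the 10⁻¹⁰ E-value threshold stated above. The hit triples, gene names, and genome names are hypothetical placeholders, the function `phylogenetic_profiles` is not taken from Refs. [2] or [3], and treating the threshold as inclusive is an assumption.

```python
E_VALUE_THRESHOLD = 1e-10  # threshold stated on the poster


def phylogenetic_profiles(hits, genes, genomes, threshold=E_VALUE_THRESHOLD):
    """Build a 0/1 presence vector per gene across the reference genomes.

    `hits` is an iterable of (query_gene, subject_genome, e_value) triples,
    for example parsed from BLASTP tabular output. A gene counts as present
    in a genome if at least one hit has an E-value at or below `threshold`
    (the inclusive comparison is an assumption).
    """
    present = {
        (gene, genome)
        for gene, genome, e_value in hits
        if e_value <= threshold
    }
    return {
        gene: tuple(int((gene, genome) in present) for genome in genomes)
        for gene in genes
    }


# Hypothetical hits, made up for illustration only.
genomes = ["genome_1", "genome_2", "genome_3"]
genes = ["gene_a", "gene_b"]
hits = [
    ("gene_a", "genome_1", 1e-40),
    ("gene_a", "genome_2", 3e-12),
    ("gene_a", "genome_3", 0.5),    # too weak, treated as absent
    ("gene_b", "genome_3", 1e-30),
]

profiles = phylogenetic_profiles(hits, genes, genomes)
for gene, profile in profiles.items():
    print(gene, profile)
```

Each profile is a fixed-length tuple ordered by `genomes`, so profiles of different genes can be compared position by position.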

Figure 2. The computational workflow for discovering genes associated with human infections in pathogenic bacteria.
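The scoring step of the workflow compares an observed phylogenetic profile with a null distribution, but the poster does not specify the statistic used. The sketch below substitutes a simple permutation test as one possible stand-in, not the author's method: it asks whether the genomes carrying a gene have a higher mean literature score for a disease concept than randomly chosen genomes. The function name, the per-genome scores, and the number of permutations are all assumptions.

```python
import random


def permutation_p_value(profile, disease_scores, n_permutations=2000, seed=0):
    """Permutation test for one gene against one disease concept.

    `profile` is a 0/1 presence vector over genomes; `disease_scores` holds,
    for the same genomes, a literature score for the disease concept (for
    example a pathogen-disease co-occurrence count). Returns the observed
    mean score over genomes carrying the gene and a one-sided p-value.
    """
    if len(profile) != len(disease_scores):
        raise ValueError("profile and disease_scores must have equal length")
    n_present = sum(profile)
    if n_present == 0:
        raise ValueError("gene is absent from every genome")

    observed = sum(s for bit, s in zip(profile, disease_scores) if bit) / n_present

    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    scores = list(disease_scores)
    at_least_as_high = 0
    for _ in range(n_permutations):
        # Null model: the gene sits in a random set of genomes of the same size.
        sample = rng.sample(scores, n_present)
        if sum(sample) / n_present >= observed:
            at_least_as_high += 1

    # Add-one correction keeps the estimate away from an impossible p = 0.
    p_value = (1 + at_least_as_high) / (1 + n_permutations)
    return observed, p_value


# Hypothetical literature scores for ten genomes, made up for illustration.
scores = [90, 80, 85, 1, 2, 0, 3, 1, 2, 0]
gene_in_top_genomes = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
gene_elsewhere = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0]

observed, p_value = permutation_p_value(gene_in_top_genomes, scores)
control_observed, control_p = permutation_p_value(gene_elsewhere, scores)
print(observed, p_value)
print(control_observed, control_p)
```

A gene confined to the genomes with high scores receives a small p-value, while a gene found only in low-scoring genomes does not. Ranking genes by such p-values gives the kind of ordering a Manhattan plot displays.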

[Figure 3, panels A, B, and C]
Summary

The preliminary results from this study are encouraging but more analyses are required to prove its generalisability. We anticipate that this integrative approach is applicable to other pathogens, and is also able to extend to the translation of multi-omic knowledge into other biomedical disciplines.

Figure 3. A case study on the discovery of genes associated with urinary tract infection (UTI) in the Escherichia coli UTI89 genome: 327 SNOMED-CT ID concepts were used to mine gene-ID associations against all genes in 525 bacterial genomes. The process is demonstrated here for genes associated with the SNOMED concept “urinary tract infection” (#68,566,005). In the UTI89 genome, the fim gene cluster (boxed area in A and B) was found to be highly associated with frequent mentions of “UTI” in the literature (fimG gene, p=4.1×10⁻²⁰). Other known virulence gene clusters were also recovered, including the sfa (sfaH, p=2.5×10⁻¹⁸) and pap clusters (papC, p=9.8×10⁻⁷). The gene product of the fim cluster is involved in the biosynthesis of type 1 fimbriae (panel C), an important determinant of uropathogenesis [4,5]. The fimF gene (p=1.6×10⁻¹⁶), encoding the type 1 fimbriae minor subunit precursor (GenBank Accession YP_543949), was also found to be associated with pathogens with mentions of other infectious disease concepts, including endotoxic shock, gastroenteritis, and bacterial meningitis (panel D). This finding illustrates the potential virulence role of fimbriae in gram-negative pathogens in the pathogenesis of human infections. The transmission electron micrograph in panel C was adapted from Ref [4] under the Creative Commons license.

Panel D.
Score     SNOMED ID    Description
50.857    371770009    Endotoxaemia, Endotoxemia
47.036    71057007     E. coli infection, Infection due to E. coli
39.170    371769008    Endotoxic shock
32.367    186091002    Enteric fever, Typhoid fever
25.127    186094005    Salmonella food poisoning/gastroenteritis
20.272    186103005    Bacillary dysentery, Shigella dysenteriae
18.908    197925009    Asymptomatic bacteriuria
18.862    154410004    Cestode/Trematode infestation, Helminthiases
17.171    154374002    Malaria
16.665    61462000     Paludism, Malaria, Plasmodiosis
16.382    74286002     TEM, Transmissible mink encephalopathy
16.361    11840006     Traveler's diarrhea, Turista
15.769    68566005     Urinary tract infection
12.481    276674008    Neonatal meningitis
12.454    111852003    Vaccinia
12.183    155862004    Acute pyelonephritis or pyonephrosis
12.111    76571007     Septicemic shock, Septic shock
12.045    36188001     Flexner's dysentery, Shigellosis
10.915    10087007     Schistosomiasis, Bilharzia, ...
9.376     11836002     Primary/spontaneous bacterial peritonitis
8.977     24043009     Malignant catarrhal fever, Snotsiekte
8.638     111909004    Amoebic infection, Amebiasis

References:
1. Sintchenko V, Anthony S, Phan X-H, Lin F, Coiera EW. PLoS ONE 2010; 5(3): e9535.
2. Lin F, Coiera E, Lan R, Sintchenko V. BMC Bioinformatics 2009; 10:86.
3. Lin F, Lan R, Sintchenko V, Kong F, Gilbert GL, Coiera E. PLoS ONE (in press).
4. Gross L. PLoS Biol 4(9): e314. doi:10.1371/journal.pbio.0040314
5. Connell I, Agace W, Klemm P, Schembri M, Mărild S, Svanborg C. PNAS 1996; 93(18): 9827-32.

Acknowledgments
The author thanks Drs. Vitali Sintchenko, Stephen Anthony, Xuan-Hieu Phan, and Fanrong Kong, and Profs. Ruiting Lan, Gwendolyn Gilbert and Enrico Coiera for collaborating on the earlier works [1-3]. The author is supported by a postdoctoral fellowship of the Australian National Health and Medical Research Council (NH&MRC) program grant #568,612. The body silhouette image in Figure 1 was a public domain image retrieved from Wikimedia Commons (http://commons.wikimedia.org/wiki/File:Female_shadow_-_upper.png).

Contact: Dr Frank Lin
Centre for Health Informatics
University of New South Wales
SYDNEY NSW 2052, Australia
E-mail: f.lin@unsw.edu.au
Website: http://www.chi.unsw.edu.au/
Phone: +61 (2) 9385 3437
