
Breast Cancer and Biomedical Informatics: The PrognoChip Project

G. Potamias1,2*, A. Analyti1, D. Kafetzopoulos3, D. Plexousakis1,2, P. Poirazi3, M. Reczko1, I.G. Tollis1,2, M. E. Sanidas4, E. Stathopoulos5, M. Tsiknakis1, S. Vassilaros6
Institute of Computer Science, Foundation for Research & Technology - Hellas (FORTH), Heraklion, 711 10 Greece
Phone: +30-2810-391693, Fax: +30-2810-311601, E-mail: potamias@ics.forth.gr

Abstract - Breast cancer is the most common malignancy affecting women, with a lifetime risk of approximately 10%. It is both genetically and histopathologically heterogeneous, and the underlying development mechanisms remain largely unknown. Global expression analysis using microarrays offers unprecedented opportunities to obtain molecular signatures of the state of activity of diseased cells and patient samples. The predictive power of this approach is much greater than that of currently used approaches, but remains to be validated in prospective clinical studies. The PrognoChip project is based on the synergy between Bioinformatics and Medical Informatics, following the lines of the emerging discipline of Biomedical Informatics. In this context we are moving towards the specification and creation of an Integrated Clinico-Genomics Information Technology Environment (ICG-ITE), in which the smooth integration of the clinical and genomic worlds, together with the intelligent processing of the underlying data, enables the identification of reliable and clinically valid (i.e., in terms of prognosis) molecular (gene) markers.

Keywords: Breast cancer, biomedical informatics, semantic integration, data mining

* Corresponding author. 1 Institute of Computer Science (ICS), FORTH; 2 Dept. of Computer Science, University of Crete; 3 Institute of Molecular Biology and Biotechnology (IMBB), FORTH; 4 Dept. of Surgical Oncology, Medical School, University of Crete, Heraklion, Crete, Greece; 5 Dept. of Pathology, Medical School, University of Crete, Crete, Greece; 6 Prolipsis Diagnostic Breast Center, Athens, Greece.

I. INTRODUCTION

The completion of the human genome and the development of post-genomic applications have introduced new holistic approaches and challenges in the analysis of diseases that will, in the years to come, revolutionize biomedical research and health care. A characteristic of medicine in the post-genomic era will be the consultation of both the comprehensive genotypic information of the patient and the detailed molecular classification of the disease in order to specify, with precision and high efficiency, an individualized treatment. Breast cancer is one of the most common malignancies affecting women, with a lifetime risk of approximately 10%. Breast cancer is both genetically and histopathologically heterogeneous, and the mechanisms underlying its development remain largely unknown. Breast cancer patients diagnosed with the same stage of disease often have remarkably different responses to therapy and different overall outcomes. Even with the strongest prognostic indicators, such as lymph node status, estrogen receptor expression and histological grade, it is not possible to accurately classify breast tumors according to their clinical behavior. Genomic background and variations in the transcriptional programs account for much of the observed diversity. The PrognoChip project aims at the identification and validation of signature gene expression profiles of breast tumors that correlate with other epidemiological or clinical parameters. Towards these goals, scientists from distant scientific disciplines join forces: Molecular Biology (Institute of Molecular Biology & Biotechnology, FORTH; http://www.imbb.forth.gr), Medicine (Surgical Oncology, University Hospital, University of Crete; and Prolipsis, a diagnostic centre in Athens), Biostatistics and Computer Science (Institute of Computer Science, FORTH; http://www.ics.forth.gr). We expect that the synergy between Medicine, Molecular Biology, and Biomedical Informatics will provide us with unique means and experience to evaluate gene expression signatures that will outperform the currently used parameters in therapy prediction and clinical prognosis of breast cancer.

II. POST-GENOMICS, MICROARRAYS AND BREAST CANCER

Since the discovery of the first oncogene about 25 years ago, a large body of research has convincingly demonstrated that the initiation and progression of cancers involve the accumulation of genetic aberrations in the cell. Recently, through studying blood samples of families in which there is a history of breast cancer,
scientists have isolated and identified a gene linked to breast cancer. A person who carries this modified gene, labelled BRCA1, has an 85% lifetime risk of developing breast cancer, as well as a significantly higher risk of ovarian cancer. By identifying such genes through particular markers associated with them, doctors will know which individuals are more susceptible to cancer and can therefore follow the appropriate procedure. The recent isolation of the gene BRCA1 has prompted investigators to identify other genes that may contribute to breast cancer, ovarian cancer and the breast-ovarian cancer syndrome. Research and technological development have implicated a number of other breast-cancer-related genes. These genes and their role in starting or growing breast cancer are listed in Table I (refer to http://www.breasted.org/genetics.html for a detailed description and references). Molecular diagnostics is a rapidly advancing field in which insights into disease mechanisms are being elucidated by use of new gene-based biomarkers. Until recently, diagnostic and prognostic assessment of diseased tissues and tumours relied heavily on indirect indicators that permitted only general classifications into broad histological or morphological subtypes and did not take into account the alterations in individual gene expression. In this context, global gene expression analysis using microarrays now offers unprecedented opportunities to obtain molecular signatures of the state of activity of diseased cells and patient samples. This groundbreaking approach to studying cancer promises to provide a better understanding of the underlying mechanism of tumourigenesis, more accurate diagnosis, more comprehensive prognosis, and more effective therapeutic interventions [KHA, 01]. Within the past years, two major advances have taken place.
First, microarray-based expression profiling has shown promise with the preliminary demonstration that clustering techniques can predict clinical outcome in lymphoma [ALI, 00], paediatric leukaemia [YEO, 02], and breast cancer [SOR, 01], [VEE, 02]. Related results for breast cancer have demonstrated the ability of microarray-based expression profiling to detect tumour cells in peripheral blood samples, to predict chemotherapy responses in fine-needle aspiration samples in neoadjuvant chemotherapy, and, most importantly, to predict disease-free survival and overall survival from profiles in breast cancer surgical specimens [BER, 00], [HED, 01]. Second, in breast cancer genetics, genes like CHEK2 and the HER2/neu receptor tyrosine kinase were identified as low-penetrance breast cancer susceptibility genes and are targets of specific drugs [LAB, 01]. These studies demonstrate the transition of basic biologic research to clinical application.

TABLE I
BREAST CANCER GENES AND THEIR ROLE

Gene                          Role
BRCA1, BRCA2                  Tumor suppressor
BP1                           Stimulates cell growth
HER2, erb-B, Erb-B2, neu      Stimulates cell growth
P65                           Stimulates cell growth
ATM                           Controls cell division
ZNF21                         Increases the longevity of cells
PDGF                          Stimulates the growth of blood vessels
Bcl-1                         Regulates the cell cycle
RB                            Regulates the cell cycle
EK2                           Involved in repair of damaged DNA

Furthermore, analysis of primary tumours and derived metastases showed very similar expression profiles, indicating that the molecular program of a primary tumour is generally retained in its metastases [SCH, 03]. Given the clinical heterogeneity of breast cancer, microarrays are an ideal tool to establish a more accurate classification [PIN, 03]. The predictive power of this approach is much greater than that of currently used approaches, but remains to be validated in prospective clinical studies. If confirmed in that setting, the expression profiling classifier would result, at minimum, in about a four-fold drop in the number of patients receiving adjuvant therapy unnecessarily.

III. INDIVIDUALIZED MEDICINE AND BIOMEDICAL INFORMATICS

It becomes evident that, in order to fully grasp the mechanisms of a disease, we need not only an understanding of its genetic basis, which involves large amounts of data and related functional genomics approaches (such as gene-expression profiling), but also the integration of the knowledge normally processed in the clinical setting. The use of genetic and proteomic data in addition to clinical symptoms for medical decision-making will contribute to the expected, continued shift towards evidence-based medicine.
This vision can only be realized with an enormous investment into: (i) technology able to produce the genomic and proteomic data and to perform the initial comparison of the results with reference databases; (ii) creation of standardized databases that combine clinical history, symptoms and signs, laboratory and procedural results, and genetic and proteomic data in raw as well as intelligently processed formats; (iii) technology that assures confidential access to these data by those who need it, and foolproof security against unauthorized access; (iv) extraction of knowledge from these huge databases, its expert interpretation and matching against existing computational models; (v) development of novel explanatory and predictive models for the above, abstraction of the results to the clinical level, and incorporation of the extracted knowledge into algorithms and standardized clinical guidelines where feasible; and finally (vi) implementation of the new guidelines in the clinical decision-making process.

In this setting a new discipline, Biomedical Informatics (BMI), is emerging. BMI aims to offer the appropriate technology to support the emerging individualized medicine environment and to allow optimized, individualized healthcare using all relevant sources of information. Collaborative efforts between Medical Informatics (MI) and Bioinformatics (BI) could provide new insights and create the synergy needed to meet the challenges of creating novel genomic applications in medicine (refer to http://bioinfomed.isciii.es for a whitepaper on the field, and to http://www.infobiomed.net for a relevant EU-funded NoE project). BI enables us to understand the fundamental knowledge about biological processes. The inclusion of clinical information in biomedical informatics opens the gateway to genetic risk profiling of patients, new paradigms in disease diagnosis and prognosis, and novel approaches to drug discovery based on the correlation of genetic and molecular knowledge of diseases with clinical information of the patients.
In this setting the respective biomedical informatics R&D agenda is directed towards the design, development and deployment of an integrated clinico-genomics operational framework in which functional genomics and disease-combating research are coupled with, and guided by, related medical knowledge.

IV. THE PROGNOCHIP PROJECT

PrognoChip is a (running) project that joins forces from different scientific disciplines: Molecular Biology (Institute of Molecular Biology & Biotechnology, FORTH), Medicine (Dept. of Surgical Oncology, University of Crete, and PROLIPSIS, a diagnostic breast cancer center), and Computer Science (Institute of Computer Science, FORTH). The major tasks within PrognoChip (already scheduled and initiated) are briefly presented below.

Medicine/ Tissue collection & Histopathology: (a) surgical specimens are collected from breast cancer patients who undergo any type of surgical treatment; as soon as the specimen is removed from the

patient it is carried immediately (in less than 20 minutes) to the histopathology department in order to avoid ex vivo ischemia phenomena; (b) a tissue procurement protocol is designed for tissue collection and storage; sections are taken from the growing edge of the tumour, stored in a -80 °C dry freezer for further reference, placed in RNAlater reagent for subsequent RNA extraction, and covered with optimal cutting temperature compound (OCT) for immunohistochemistry - a TissueBank system was designed and developed (already in use) for proper tissue filing and management; (c) a set of immunohistology and FISH methods for growth factors and their receptors, especially HER-2 (up-regulated in 30% of breast carcinomas), are assessed for the characterization of breast carcinomas; all patients with malignant disease are staged according to the new TNM system. In the context of PrognoChip the plan is to obtain full-genome expression profiles from approximately 200 individual breast carcinomas.

Ethical Issues: Patients are informed and consent to the molecular and genetic data analysis of their tissue and blood samples. They also consent to the use of the data for scientific purposes provided that their anonymity is secured. For this purpose, special security and authorization mechanisms are provided and made operational in the context of the deployed clinical information systems (see below).

Molecular Biology/ Microarrays: A DNA microarray of long oligonucleotide probes has been designed, representing all known human genes: approximately 35,000 different transcripts of 27,000 different genes. Additional positive and negative control oligos have been included for the quality control of the procedure and the normalization of data. Oligonucleotide probes are spotted on a coated activated glass slide, at a density of approximately 2250 elements/cm2.
A common reference material has been chosen for the study, consisting of a defined set of cell-line extracts, ensuring accurate quantitation of gene expression for most of the genes. An RNA extraction, amplification and fluorescent labeling protocol has been developed, allowing the analysis of small samples. After hybridization, fluorescence intensity images are acquired with a confocal laser scanner as 16-bit TIFF files. From these images, fluorescence intensities are obtained using dedicated image analysis software. Special plug-ins are developed for data pre-processing (filtering, normalization) and analysis.

V. TOWARDS AN INTEGRATED CLINICO-GENOMICS ENVIRONMENT

In the context of the PrognoChip project we have planned and initiated efforts towards the delivery of an Integrated Clinico-Genomics Information Technology Environment (ICG-ITE), with combined genetic and individualized medicine as the target.
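
The pre-processing step mentioned in Section IV (filtering and normalization of sample intensities against the common reference) could look roughly like the sketch below. It is only an illustration under stated assumptions: the intensity floor of 100, the use of log2 ratios with global median centering, and the function names `log_ratios` and `median_center` are not taken from the project's plug-ins, which are not described in detail here.

```python
import math
from statistics import median

def log_ratios(sample, reference, min_intensity=100.0):
    """Return log2(sample/reference) for each spot.

    Spots whose intensity is below the (assumed) floor in either
    channel are filtered out and reported as None.
    """
    ratios = []
    for s, r in zip(sample, reference):
        if s < min_intensity or r < min_intensity:
            ratios.append(None)  # filtered: too dim to be reliable
        else:
            ratios.append(math.log2(s / r))
    return ratios

def median_center(ratios):
    """Global normalization: subtract the median of the retained log-ratios."""
    kept = [x for x in ratios if x is not None]
    m = median(kept)
    return [None if x is None else x - m for x in ratios]

# Toy intensities for four spots (hypothetical values).
sample = [400.0, 1600.0, 50.0, 800.0]
reference = [200.0, 200.0, 300.0, 200.0]

normalized = median_center(log_ratios(sample, reference))
print(normalized)
```

Median centering is one of the simplest normalization choices; intensity-dependent methods are common alternatives and may suit real two-channel data better.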

VI. KNOWLEDGE DISCOVERY AND SYNERGISTIC CLINICO-GENOMICS DECISION-MAKING

The vision of PrognoChip is to realize and operationalize integrated clinico-genomics knowledge-discovery and decision-making scenarios, along the lines of the tasks and procedures outlined below.

A. From Phenotypes to Genotypes

Applying advanced data-mining operations (e.g., discriminatory analysis for gene selection) on the acquired gene-expression matrix, we are able to identify potential discriminatory genes, i.e., genes that distinguish between identified phenotypes (e.g., phenotypes A and B; see Figure 2). These genes compose and indicate the molecular signature, or gene markers, of the specific patients' phenotypes. In other words, we are able to link potential phenotypical profiles to respective molecular or genotypical ones. Such an advancement may be utilised in the course of both prognostic and therapeutic decision-making processes. That is, patients whose gene-expression profiles match the discovered molecular signature could be detected as belonging to one of the identified phenotypes. Then, according to established guidelines and treatment protocols, prognostic indicators may be assessed and patients admitted to (potentially) available treatment protocols.

B. From Genotypes to Phenotypes

The above scenario could also be initiated the other way around. That is, again applying data-mining operations (e.g., unsupervised learning such as clustering), we are able to identify clusters of samples based on their gene-expression profiles. These clusters may represent potentially interesting genotypes. Assume that two such genotypical profiles, X and Y, are discovered and identified (depending on the exact parameterization of the clustering process, more clusters may be identified; see Figure 2).
Having at our disposal recorded phenotypical information and data about the samples (i.e., response, positive reaction or resistance to specific chemotherapeutic agents and/or the clinico-histopathological state of the tumour), we may assign each as yet untreated sample to one of the two classes, X or Y. Then, we may initiate a supervised data-mining process (e.g., classification) in order to discover respective predictive models. Each of these models represents a potential phenotype. In this mode of the scenario we may achieve a re-classification of breast cancer, i.e., a hierarchical organization of different disease-related phenotypes - a major task in cancer research. In this context, patients with different phenotypical profiles are (potentially) candidates for different chemo- and/or radiotherapeutic protocols. Thus, a more individualised healthcare plan may be devised.
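
As a rough illustration of the phenotype-to-genotype scenario of Section VI.A, the sketch below ranks genes by how well they separate two phenotypes and then assigns a new profile to the nearest phenotype centroid. It is a minimal sketch under assumptions, not the project's method: a Welch-style t statistic stands in for the "discriminatory analysis", a nearest-centroid rule stands in for signature matching, and the gene names (`g1`, `g2`, `g3`) and expression values are invented toy data.

```python
from statistics import mean, stdev

def t_score(a, b):
    """Welch-style t statistic between two groups (each needs >= 2 values)."""
    var_term = stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)
    return (mean(a) - mean(b)) / var_term ** 0.5 if var_term > 0 else 0.0

def select_genes(matrix, labels, k):
    """Return the k genes with the largest |t| between phenotypes A and B.

    matrix: {gene: [expression per sample]}, labels: 'A' or 'B' per sample.
    """
    scores = {}
    for gene, values in matrix.items():
        a = [v for v, lab in zip(values, labels) if lab == "A"]
        b = [v for v, lab in zip(values, labels) if lab == "B"]
        scores[gene] = abs(t_score(a, b))
    return sorted(scores, key=scores.get, reverse=True)[:k]

def centroids(matrix, labels, genes):
    """Mean expression of each selected gene within each phenotype."""
    return {
        label: {
            g: mean(v for v, lab in zip(matrix[g], labels) if lab == label)
            for g in genes
        }
        for label in sorted(set(labels))
    }

def classify(profile, cents):
    """Assign a profile to the phenotype with the nearest centroid."""
    def dist(label):
        return sum((profile[g] - c) ** 2 for g, c in cents[label].items())
    return min(cents, key=dist)

# Toy gene-expression matrix: 3 genes x 6 samples (hypothetical values).
labels = ["A", "A", "A", "B", "B", "B"]
matrix = {
    "g1": [2.1, 1.9, 2.0, -1.0, -1.2, -0.8],
    "g2": [0.1, -0.2, 0.3, 0.0, 0.2, -0.1],
    "g3": [-1.5, -1.4, -1.6, 1.5, 1.3, 1.7],
}

signature = select_genes(matrix, labels, k=2)
cents = centroids(matrix, labels, signature)
predicted = classify({"g1": 1.8, "g2": 0.0, "g3": -1.2}, cents)
print(signature, predicted)
```

For the reverse scenario of Section VI.B, the labels would come from a clustering step rather than from recorded phenotypes, after which the same selection and classification steps could be reused.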

Fig. 1. Architectural layout and building blocks of the Integrated Clinico-Genomics Information Technology Environment

The envisioned building blocks of ICG-ITE include (see Figure 1): a set of clinical information systems that keep the patients' clinical information (i.e., clinical, laboratory and histopathology information systems), based on standard Electronic Health Care Record (EHCR) data models [TSI, 02], [COA, 99], [HL7, 02]; a genomic information system (GIS) to store and manage the specifications of the respective microarray experiments (i.e., chip design, hybridizations, etc.), to analyze the measured bioassays, and to store the samples' genomic information; and a middleware layer for information/data integration and intelligent processing. The GIS is based on the BASE system (http://base.thep.lu.se), whose underlying standard genomic data model [MIA, 04] and functionality were extended to meet the project requirements. The middleware layer is realized by a set of integrated software components that enable: (i) the seamless and efficient extraction of data from the various data and information sources (clinical and genomic); (ii) uniform information modeling, enabled by the utilization of standard clinical/genomic data models and respective ontologies [XML, 04], [KAR, 03]; (iii) uniform information representation, enabled by the utilization and appropriate customization of RDF/XML technology; and (iv) intelligent data processing and visualization, enabled by a suite of data-mining components and tools [TIB, 99], [ZWE, 99], [PER, 00], [POT, 04], [SYM, 04]. The demanding clinical and genomic data integration environment poses the need to elaborate on the concept of Integrated Electronic Health Care Record (IEHCR) architectures [TSI, 02], to utilize the respective technological advances, and to extend the standard clinical data models to include and amalgamate genomic ones. In this context, the provided security and authorisation infrastructure is fully employed.
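
The idea of a uniform representation over clinical and genomic sources can be pictured with the sketch below, which flattens records from two hypothetical sources into subject-predicate-object triples (the data model underlying RDF) and answers a combined clinico-genomic query. Everything in it is an assumption made for illustration: the `clin:` and `gen:` prefixes, the field names, the patient identifiers and the values are invented, no RDF library is used, and the actual ICG-ITE schemas and middleware components are not shown in this paper.

```python
def to_triples(record_id, record, namespace):
    """Flatten one record into (subject, predicate, object) triples."""
    return [(record_id, f"{namespace}:{field}", value)
            for field, value in record.items()]

def query(store, predicate, test):
    """Return the subjects whose value for `predicate` satisfies `test`."""
    return {s for s, p, o in store if p == predicate and test(o)}

# Hypothetical extracts from a clinical system and a genomic system.
clinical = {
    "P001": {"er_status": "positive", "grade": 2},
    "P002": {"er_status": "negative", "grade": 3},
    "P003": {"er_status": "positive", "grade": 1},
}
genomic = {  # normalized log-ratios per gene
    "P001": {"HER2": 1.8},
    "P002": {"HER2": 2.1},
    "P003": {"HER2": -0.3},
}

# One store, one representation, regardless of the source system.
store = []
for pid, rec in clinical.items():
    store += to_triples(pid, rec, "clin")
for pid, rec in genomic.items():
    store += to_triples(pid, rec, "gen")

# Combined query: ER-positive patients with up-regulated HER2.
er_positive = query(store, "clin:er_status", lambda v: v == "positive")
her2_up = query(store, "gen:HER2", lambda v: v > 1.0)
both = sorted(er_positive & her2_up)
print(both)
```

Once both sources share one representation, a clinico-genomic question reduces to set operations over predicates, which is the kind of query a dedicated RDF query language would express declaratively.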

Fig. 2. Synergistic clinico-genomics decision-making and knowledge-discovery support.

VII. CONCLUSION & FUTURE WORK

Much of the genomic data of clinical relevance generated so far is in a format that is inappropriate for diagnostic testing. Very large epidemiological population samples, followed prospectively (over a period of years) and characterized for their biomarker and genetic variation, will be necessary to demonstrate the clinical utility of these tools. Obstacles to the routine application of these data in clinical practice include a cultural gap between the approach to clinical practice that is currently employed and the one made possible by these new tools. Bridging it will require a change of mind among clinical oncologists. In the next 10 years clinical protocols will require a translational section based on the type of targeted treatment under study [CEL, 03].

In this paper we have presented PrognoChip, a multidisciplinary project that meets the aforementioned challenges and targets the rising need for individualised medicine (in terms of both prognosis and treatment). In the context of the project an Integrated Clinico-Genomics Environment was designed. The building blocks of this environment are identified and specified. Various enabling components of the environment are already developed and deployed (the clinical and genomic information systems). Furthermore, experimentation with and evaluation of known and newly developed data-mining techniques is in progress. On-going R&D work (as related to information technology) is now directed to the development of the integration infrastructure, i.e., to the operationalisation of the middleware layer of the ICG-ITE. The plan is to have a first (prototype) implementation of the whole system by June 2005. By then, the clinical and genomic profiles of a number of original patient samples will also be available and recorded in the respective information systems.

PrognoChip is a very demanding project in terms of both human and infrastructure resources. Resources from other directly related, on-going projects (in which the organizations participating in PrognoChip are involved) are therefore also utilised. In this context, we want to acknowledge INFOBIOMED (a network of excellence project funded by the EU IST program; http://www.infobiomed.net), in which results from a nationally funded project such as PrognoChip will be utilised and exploited in the context of a trans-European one.

REFERENCES

[ALI, 00] Alizadeh et al, Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling, Nature, 403, pp. 503-511, 2000.
[BER, 00] F. Bertucci et al, Gene expression profiling of primary breast carcinomas using arrays of candidate genes, Hum Mol Genet, 9, pp. 2981-2991, 2000.
[CEL, 03] J. Celis, Proteomics and Functional Genomics in Translational Cancer Research: towards an integrated approach. Presentation in Cancer: Molecular Targets for novel Therapies. 3rd Simposio Scientifico, Pabellón San Carlos, Hospital Clinico, Madrid, April 2003.
[COA, 99] COAS, Clinical Observations Access Service (COAS), Final Submission, OMG Document: corbamed/99-0325, 1999.
[HED, 01] I. Hedenfalk et al, Gene-expression profiles in hereditary breast cancer, N Engl J Med, 344, pp. 539-548, 2001.
[HL7, 02] HL7 Health Level 7: Reference Information Model (RIM), http://www.hl7.org/library/data-model/RIM/C30118/rim.htm.
[KAR, 03] G. Karvounarakis, A. Magkanaraki, S. Alexaki, V. Christophides, D. Plexousakis, M. Scholl, and K. Tolle, Querying the Semantic Web with RQL, Computer Networks and ISDN Systems Journal, 42(5), pp. 617-640, 2003.
[KHA, 01] J. Khan et al, Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks, Nat Med, 7, pp. 673-679, 2001.
[LAB, 01] E. Landesman-Bollag et al, Protein kinase CK2 in mammary gland tumourigenesis, Oncogene, 20, pp. 3247-3257, 2001.
[MIA, 04] MIAME Web site. http://www.mged.org/Workgroups/MIAME/miame.html, accessed Dec. 2004.
[PER, 00] C.M. Perou et al, Molecular portraits of human breast tumours, Nature, 406, pp. 747-752, 2000.
[PIN, 03] R. Pinedo, Cancer Clinical Trials in the next decade. Presentation in Cancer: Molecular Targets for novel Therapies. 3rd Simposio Scientifico, Pabellón San Carlos, Hospital Clinico, Madrid, April 2003.
[POT, 04] G. Potamias, L. Koumakis, and V. Moustakis, Gene Selection via Discretized Gene-Expression Profiles and Greedy Feature-Elimination, LECT NOTES ARTIF INT (LNAI), 3025, pp. 256-266, 2004.
[SCH, 03] U. Schmidt et al, Cancer diagnosis and microarrays, Int J Biochem Cell Biol, 35(2), pp. 119-124, 2003.
[SOR, 01] T. Sorlie et al, Gene expression patterns of breast carcinomas distinguish tumour subclasses with clinical implications, Proc Natl Acad Sci, Sep 11, 98(19), pp. 10869-10874, 2001.
[SYM, 04] A. Symeonidis and I.G. Tollis, Visualization of Biological Information with Circular Drawings, LNCS, 3337, pp. 468-478, 2004.
[TIB, 99] R. Tibshirani, T. Hastie, M. Eisen, D. Ross, D. Botstein, and P. Brown, Clustering methods for the analysis of DNA microarray data, Technical Report, Department of Statistics, Stanford University, 1999.
[TSI, 02] M. Tsiknakis, D.G. Katehakis, and S.C. Orphanoudakis, An Open, Component-based Information Infrastructure for Integrated Health Information Networks, International Journal of Medical Informatics, 68(1-3), pp. 3-26, 2002.
[VEE, 02] L.J. van 't Veer et al, Gene expression profiling predicts clinical outcome of breast cancer, Nature, 415(6871), pp. 530-536, 2002.
[XML, 04] XML Semantics. http://www.w3.org/DesignIssues/Toolbox.html, accessed Dec. 2004.
[YEO, 02] E.J. Yeoh et al, Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling, Cancer Cell, 1(2), pp. 133-143, 2002.
[ZWE, 99] G. Zweiger, Knowledge discovery in gene-expression-microarray data: mining the information output of the genome, Trends Biotechnol., 17(11), pp. 429-436, 1999.
