1 Department of Biomedical Informatics, Stanford University School of Medicine
2 Department of Biology, Stanford University
3 Department of Human Biology, Stanford University
June 6, 2017
1 Abstract
Whole tissue biopsies are an invasive means of tumor subtyping: they require patients to undergo risky
surgical procedures and increase the cost of patient care. Non-invasive methods, such as needle and liquid
biopsies, are attractive alternatives for diagnosing human glioma subtype and grade. In this project, an
additive, regularized, gradient-boosted ensemble tree (XGBoost) model trained on microarray-derived mRNA
expression levels from whole tissue tumor biopsies was used to identify important biomarker genes
for classifying brain tumor grade and subtype. Though our XGBoost classification models were
not viable as classifiers (because of their high misclassification error when distinguishing among all
grades and all subtypes), we report the successful identification of a pair of diagnostic biomarkers
(TNFRSF1B and YBX3) that distinguish glioblastoma from oligodendroglioma tumors, as well as a pair of
diagnostic biomarkers (TNFRSF1B and VMP1) that distinguish grade II from grade IV gliomas. Our analysis
identifies important biomarkers that may aid in the non-invasive diagnosis of various glioma subtypes and
grades, and provides novel insights into the genetic identification and development of gliomas.
2 Introduction/Background
Brain neoplasia represents the second most common cancer in children [1]. Early detection and treatment of
brain neoplasia can lead to better outcomes and survival rates for affected patients, and an accurate diagnosis
plays an important role at every stage of cancer care. Currently, tumor subtyping relies on whole tissue
biopsies, which require invasive and risky procedures. Diagnostic tools enable doctors to determine
the best treatment approach, monitor the progress of treatment, and modify current therapies when needed.
Specifically, non-invasive methods such as needle biopsies are less risky and less invasive alternatives
to traditional whole-tissue biopsies for diagnosing glioma subtype and grade.
Here, we trained regularized, gradient-boosted, multinomial, ensemble tree classification (XGBoost) models
on a glioma mRNA expression dataset, and used these models to identify genetic biomarkers for diagnosing
glioma subtype and grade. Our findings have implications for understanding how brain tumors develop and
how certain genes are regulated in the presence of disease, as well as for the development of
non-invasive screening methods and novel diagnostic criteria.
1. Generating an XGBoost model to classify tumor grade as a function of mRNA expression levels at
various genes.
2. Generating an XGBoost model to classify tumor subtype as a function of mRNA expression levels at
various genes.
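For reference, the objective minimized by a regularized, multinomial, gradient-boosted tree model of this kind can be sketched as follows. This is the standard textbook form of the XGBoost objective, not a formula quoted from this report; the symbols (x_i for the expression vector of tumor i, y_i for its class label, f_k^{(c)} for the k-th tree of class c, T for the number of leaves in a tree, w for its leaf weights, and the penalties gamma and lambda) are notation assumed here for illustration.

```latex
\begin{align}
\hat{y}_i^{(c)} &= \sum_{k=1}^{K} f_k^{(c)}(x_i), \qquad
p_i^{(c)} = \frac{\exp\bigl(\hat{y}_i^{(c)}\bigr)}{\sum_{c'} \exp\bigl(\hat{y}_i^{(c')}\bigr)}, \\
\mathcal{L} &= -\sum_{i=1}^{n} \sum_{c} \mathbf{1}[y_i = c] \, \log p_i^{(c)}
  + \sum_{c} \sum_{k=1}^{K} \Omega\bigl(f_k^{(c)}\bigr), \qquad
\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^2 .
\end{align}
```

The first line is the additive part (each class score is a sum of trees, passed through a softmax), and the penalty term in the second line is what makes the model regularized: it discourages trees with many leaves or large leaf weights.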
1. A null hypothesis (no difference in mean expression of TNFRSF1B between glioblastoma and
oligodendroglioma) was tested against the alternative that glioblastomas have lower mean expression of
TNFRSF1B than oligodendrogliomas. With p-value < 2.2e-16 << 0.05, there is evidence
that glioblastomas have significantly lower mean expression of TNFRSF1B than oligodendrogliomas.
2. A null hypothesis (no difference in mean expression of YBX3 between glioblastoma and oligodendroglioma)
was tested against the alternative that glioblastomas have greater mean expression of YBX3 than
oligodendrogliomas. With p-value = 2.271e-12 << 0.05, there is evidence that glioblastomas
have significantly greater mean expression of YBX3 than oligodendrogliomas.
3. A null hypothesis (no difference in mean expression of TNFRSF1B between grade II and grade IV tumors)
was tested against the alternative that grade II tumors have greater mean expression of TNFRSF1B
than grade IV tumors. With p-value < 2.2e-16 << 0.05, there is evidence that grade II
tumors have significantly greater mean expression of TNFRSF1B than grade IV tumors.
4. A null hypothesis (no difference in mean expression of VMP1 between grade II and grade IV tumors) was
tested against the alternative that grade II tumors have lower mean expression of VMP1 than grade
IV tumors. With p-value = 6.399e-13 << 0.05, there is evidence that grade II tumors have
significantly lower mean expression of VMP1 than grade IV tumors.
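The one-sided comparisons above can be illustrated with a minimal sketch. The text does not say which test produced the reported p-values, so the choice of Welch's unequal-variance t statistic is an assumption, as are the function name `welch_one_sided` and the toy numbers (made up for illustration, not taken from the expression dataset). The p-value uses a normal approximation to the t distribution, which is only reasonable for large samples.

```python
import math
from statistics import NormalDist, mean, variance


def welch_one_sided(a, b):
    """Test H0: mean(a) == mean(b) against H1: mean(a) < mean(b).

    Returns Welch's t statistic, the Welch-Satterthwaite degrees of
    freedom, and an approximate one-sided p-value.
    """
    # Squared standard error contributed by each group (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    # Normal approximation to the lower tail of the t distribution
    # (an assumption; adequate only when both groups are large).
    p_approx = NormalDist().cdf(t)
    return t, df, p_approx


# Toy expression values for two hypothetical groups (not real data).
low_group = [1.0, 2.0, 3.0, 4.0, 5.0]
high_group = [2.0, 4.0, 6.0, 8.0, 10.0]
t, df, p_approx = welch_one_sided(low_group, high_group)
print(f"t = {t:.3f}, df = {df:.2f}, approximate one-sided p = {p_approx:.4f}")
```

To test the opposite alternative (first group greater than second), swap the two arguments. For small samples, the exact t distribution with `df` degrees of freedom should replace the normal approximation.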
References
[1] Ilie, Marius, and Paul Hofman. Pros: Can Tissue Biopsy Be Replaced by Liquid Biopsy? Translational
Lung Cancer Research. AME Publishing Company, Aug. 2016. Web. 06 June 2017.
[2] Sun, Lixin, Ai-Min Hui, Qin Su, Alexander Vortmeyer, Yuri Kotliarov, Sandra Pastorino, Antonino
Passaniti, Jayant Menon, Jennifer Walling, Rolando Bailey, Marc Rosenblum, Tom Mikkelsen, and Howard
A. Fine. Neuronal and Glioma-derived Stem Cell Factor Induces Angiogenesis within the Brain. Cancer
Cell 9.4 (2006): 287-300.
[3] Hastie, Trevor, Robert Tibshirani, and Jerome H. Friedman. The Elements of Statistical Learning: Data
Mining, Inference, and Prediction. New York, NY: Springer, 2016. Print.
[4] Tnfrsf1b TNF Receptor Superfamily Member 1B [Rattus Norvegicus (Norway Rat)] - Gene - NCBI.
National Center for Biotechnology Information. U.S. National Library of Medicine, n.d. Web. 06 June
2017.
[5] YBX3 Y-box Binding Protein 3 [Homo Sapiens (human)] - Gene - NCBI. National Center for
Biotechnology Information. U.S. National Library of Medicine, n.d. Web. 06 June 2017.
[6] Guo, X. Z., X. L. Ye, W. Z. Xiao, X. N. Wei, Q. H. You, X. H. Che, Y. J. Cai, F. Chen, H. Yuan, X.
J. Liu, and M. H. Yu. Downregulation of VMP1 Confers Aggressive Properties to Colorectal Cancer.
Oncology Reports. U.S. National Library of Medicine, Nov. 2015. Web. 06 June 2017.
5 Data Appendix
Figure 1: Variable importance plot for top predictors used in subtype classification model. 203608_at
corresponds to TNFRSF1B, and 201161_s_at corresponds to YBX3.
Figure 2: Variable importance plot for top predictors used in grade classification model. 203608_at
corresponds to TNFRSF1B, and 1569003_at corresponds to VMP1.
Figure 3: Distribution of TNFRSF1B and YBX3 mRNA expression levels in astrocytoma, glioblastoma, and
oligodendroglioma tumor types.
Figure 4: Distribution of TNFRSF1B and VMP1 mRNA expression levels in grade II, III, and IV glioma.