Amir R Razavi
Department of Biomedical Engineering, Division of Medical Informatics
Linköpings universitet, Linköping, Sweden
• Introduction
• Knowledge Discovery in Databases
• Clinical Guidelines
• Discussion
• Future work
• Medical Registers
– Monitoring trends in the incidence of
conditions and diseases
– Monitoring outcomes after the implementation
of disease-prevention and treatment programs
– Assessing the safety of new drugs and
procedures, identifying best clinical practice, and
comparing healthcare systems
• Playground
– Breast cancer register
– Knowledge Discovery in Databases
– Clinical Guidelines
• Aim
– Decision Support to clinicians in oncology
• KDD steps:
– Understanding the domain
– Creating the main dataset for the KDD
– Data pre-processing
– Data mining
– Interpretation of the results or discovered patterns
– Evaluation
• Data pre-processing:
– Tasks in data pre-processing
• Cleaning
• Data integration
• Handling missing values
• Transformation
• Data reduction
• Data pre-processing:
– Any type of processing performed
on raw data to prepare it for another processing
procedure.
– Why? Real-world data are generally:
• Incomplete
• Noisy
• Inconsistent
• Cleaning
– Outliers
– Multies
– Noise
• Data integration
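Two of the tasks listed above, handling missing values and spotting outliers, can be sketched in a few lines. This is a minimal illustration only: the tumour sizes are invented, and median imputation and the 1.5 x IQR rule are assumed choices, not methods stated in these slides.

```python
from statistics import median, quantiles


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical tumour sizes in mm; None marks a missing register entry.
tumour_size_mm = [12, 15, None, 18, 20, 22, None, 25, 140]

filled = impute_missing(tumour_size_mm)
outliers = iqr_outliers(filled)
print(filled)
print(outliers)
```

Whether a flagged value is a data-entry error or a genuine extreme case is a domain question, so the sketch only reports candidates and does not delete them.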
• Data reduction:
– Obtain a reduced representation of the dataset
that is much smaller in volume yet produces
the same or almost the same analytical results.
• Why do it?
– The dataset may be gigantic in volume
– Processing time
• Dimension reduction
– Removes unimportant attributes: Canonical
Correlation Analysis (CCA)
• Data Compression
• Reducing the number of instances
• Discretization and concept hierarchy
generation
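As an illustration of the last item, discretization and concept hierarchy generation, the sketch below bins a numeric attribute into equal-width intervals and maps exact ages to coarser groups. The ages, the number of bins, and the group boundaries are hypothetical and chosen only for the example.

```python
def equal_width_bins(values, n_bins):
    """Map each numeric value to a bin index in 0..n_bins-1 (equal-width bins)."""
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value lands exactly on the upper edge, so clamp it
    # into the last bin.
    return [min(int((v - low) / width), n_bins - 1) for v in values]


def to_concept(age):
    """Hypothetical concept hierarchy: exact age -> coarse age group."""
    if age < 40:
        return "young"
    if age < 60:
        return "middle-aged"
    return "older"


ages = [31, 38, 45, 52, 59, 66, 73, 80]
bins = equal_width_bins(ages, 3)
groups = [to_concept(a) for a in ages]
print(bins)
print(groups)
```

Both functions reduce the number of distinct values an attribute can take, which is the sense in which discretization counts as data reduction.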
• An example:
Exploring Cancer Register Data to find Risk
Factors for Recurrence of Breast Cancer-
Application of Canonical Correlation
Analysis
Razavi AR, Gill H, Stål O, Sundquist M, Thorstenson S, Åhlfeldt H,
Shahsavar N, the South-East Swedish Breast Cancer Study Group
• Data Mining
– “…the process of discovering meaningful new
correlations, patterns, and trends by sifting through
large amounts of data…” (Gartner Group)
– “…the analysis of observational data sets to find
unsuspected relationships and to summarize data in
novel ways…” (Hand et al.)
– “…is an interdisciplinary field bringing together
techniques from machine learning, pattern recognition,
statistics, databases, and visualization…” (Cabena et
al.)
– …
• An example:
• Decision tree induction (DTI)
– Pros
• Reasonable training time
• Fast application
• Easy to interpret
• Easy to implement
• Can handle large number of features
– Cons
• Cannot handle complicated relationships between
features
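The core step of decision tree induction, choosing which feature to split on, can be sketched as follows. This is a simplified illustration using plain information gain on categorical features; it is not the specific algorithm used in the studies cited in these slides, and the feature names and cases are invented.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Reduction in entropy obtained by splitting on one categorical feature."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_split(rows, labels):
    """Return the feature with the highest information gain."""
    return max(rows[0].keys(), key=lambda f: information_gain(rows, labels, f))


# Hypothetical toy cases; feature names and values are illustrative only.
rows = [
    {"nodes_positive": "yes", "grade": "high"},
    {"nodes_positive": "yes", "grade": "low"},
    {"nodes_positive": "no", "grade": "high"},
    {"nodes_positive": "no", "grade": "low"},
]
labels = ["recurrence", "recurrence", "no recurrence", "no recurrence"]

print(best_split(rows, labels))
```

A full tree is built by applying `best_split` recursively to each resulting subset. Because each split looks at one feature at a time, interactions between features are only captured indirectly, which relates to the limitation listed under Cons.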
• Validation methods:
– Examining an independent dataset.
– Cross validation:
• Divide the whole dataset by random sampling into n
folds (partitions) and test n times.
– In each round, one partition is used as the test
set and the rest as the training set.
• Leave-one-out cross-validation
–…
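The fold construction described above can be sketched as a small generator. The function name, the fixed seed, and the sample counts are assumptions made for the example; leave-one-out is obtained by setting the number of folds equal to the number of samples.

```python
import random


def k_fold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random assignment to folds
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for k, fold in enumerate(folds) if k != i for j in fold]
        yield train_idx, test_idx


# 5-fold cross validation on 10 samples.
splits = list(k_fold_indices(10, 5))

# Leave-one-out: as many folds as samples, so each test set has one case.
loo = list(k_fold_indices(4, 4))

for train_idx, test_idx in splits:
    print(sorted(test_idx), "tested;", len(train_idx), "used for training")
```

In practice a model would be trained on `train_idx` and scored on `test_idx` in each round, and the n scores averaged.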
• An example:
Predicting Metastasis in Breast Cancer:
Comparing a Decision Tree with
Domain Experts
Amir R. Razavi, Hans Gill, Hans Åhlfeldt, and Nosrat Shahsavar
• 3699 patients
• A decision tree was trained with all patients except
for 100 cases and tested with those 100 cases.
• Two domain experts were asked to give their
opinion about the probability of recurrence of a
certain outcome for these 100 patients.
• ROC curves and area under the ROC curves
(AUC) for predictions were computed and
compared.
[Figure: ROC curves for DTI_J48, Oncologist_1, and Oncologist_2. Vertical axis: Sensitivity; horizontal axis: 100-Specificity]
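The ROC and AUC computation used to compare the decision tree with the experts can be sketched as below. The labels and scores are invented, rates are expressed as fractions rather than percentages, and the trapezoidal rule is an assumed choice for the area; the study's actual software is not specified in these slides.

```python
def roc_points(labels, scores):
    """Return ROC points as (1 - specificity, sensitivity) pairs."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Sweep the decision threshold from the highest score downwards.
    ranked = sorted(zip(scores, labels), reverse=True)
    for i, (score, label) in enumerate(ranked):
        if label:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all cases tied at this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )


# Hypothetical data: 1 = outcome occurred, scores = predicted probabilities.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(auc(roc_points(labels, scores)))
```

Running the same two functions on each predictor's scores for the same test cases gives AUC values that can be compared directly; an AUC of 0.5 corresponds to chance and 1.0 to perfect ranking.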
• An example:
A Data Mining Approach to Analyze Non-
compliance with a Guideline for the
Treatment of Breast Cancer
Razavi AR, Gill H, Åhlfeldt H, Shahsavar N