CDISC Italian User Group 2007
Analysis Data Model (ADaM)
Annamaria Muraro
Helsinn Healthcare
Data flow in clinical studies
• Raw Datasets (=SDTM)
– Data from a clinical trial
– Source: CRF
• Analysis Datasets (=ADaM data)
– Datasets used in the analysis, restructured and containing additional information (derived variables, flags, etc.)
– Source: raw datasets
• Two sets of data
• Each with a specific purpose
FDA requirements
Analysis Data Model: General Considerations Document
http://www.cdisc.org/models/adam/V2.0/index.html
• Analysis Data Model Version 2.0 (November 2006)
– key principles for analysis datasets
– conventions for standard analysis variables
– provides a model for subject-level analysis dataset
– Metadata for Analysis Datasets
• Analysis dataset metadata
• Analysis variable metadata
• Analysis results metadata
– Analysis datasets are discussed within the context of electronic
submissions to the FDA but “the same principles and standards
will apply, regardless of the purpose of the analysis datasets”
Key Principles for Analysis Datasets Creation
Analysis datasets should:
• facilitate a clear and unambiguous communication of the
content, source and quality of the datasets supporting the
statistical analyses
• be useable by currently available tools (SAS XPT)
• be “Analysis-ready” or “One Statistical Procedure Away”
• redundancy may be acceptable
• well documented: metadata and other documentation
should provide clear description of the analytic results,
including statistical method, transformations, assumptions,
derivations and imputations performed
• include the optimum number of datasets
• guarantee traceability
SDTM and ADaM
SDTM | ADaM
Source data | Derived data
Vertical | Structure may not necessarily be vertical
No redundancy | Redundancy is needed for easy analysis
Character variables | Numeric variables
Each domain is specific to itself | Combines variables across multiple domains
Dates are ISO8601 character strings | Dates are formatted as numeric (e.g. SAS dates) to allow manipulation
Two chars for dataset name | Dataset Name: ADXXXX
Data transfer | Analytic & graphical analysis
Interoperability | Clear communication of statistical analysis and related decisions
BOTH ARE NEEDED FOR FDA REVIEW!
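The date row of the comparison can be illustrated with a minimal Python sketch (the slides themselves assume SAS). It converts a complete ISO 8601 date string, as stored in SDTM, into a SAS date number, which counts days from 1 January 1960. The function name `iso_to_sas_date` is hypothetical, and partial dates are simply returned as missing because the imputation rule is study-specific.

```python
from datetime import date, datetime

SAS_EPOCH = date(1960, 1, 1)  # SAS date value 0


def iso_to_sas_date(iso_text):
    """Convert an ISO 8601 date or datetime string to a SAS date number.

    Only the date part (YYYY-MM-DD) is used. Partial dates such as
    '2007-03' return None: imputing them needs a documented rule that
    this sketch does not attempt.
    """
    if not iso_text or len(iso_text) < 10:
        return None
    try:
        parsed = datetime.strptime(iso_text[:10], "%Y-%m-%d").date()
    except ValueError:
        return None
    return (parsed - SAS_EPOCH).days
```

Keeping the original character date next to the numeric one in the analysis dataset is one way to preserve traceability back to SDTM.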
Analysis Dataset Variables
• Analysis dataset variables should be compliant with
SDTM standards
– Maintain SDTM variable attributes (if the identical variable also
exists in an SDTM dataset)
– Follow naming conventions for datasets and variables consistent
with the SDTM conventions, where feasible
• Analysis variables to be included
– Identifiers
– Analysis Population Indicators
– Analysis Date Variables
– Analysis Study Day Variables
– Visit time Variables
– Numeric Code Variables
– Analysis Treatment Variables
Analysis Dataset Variables
Analysis Population
• Analysis datasets should include analysis population flags at whatever level (e.g. subject, visit or measurement) is necessary to clearly describe the population set used for any analysis
• Variables used to identify specific population
– FULLSET, SAFETY, PPROT
• Population flags may be required at Visit level
– FULLV, SAFV
• Population flags may be present in the SDTM
(supplemental domain)
Analysis Dataset Variables
Numeric Code Variables
• When a numeric version of a categorical variable is required for statistical purposes, append an ‘N’ to the SDTM variable name
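As a small illustration of the naming rule, the Python sketch below adds RACEN next to RACE. The code list (1=White, 2=Black, 3=Hispanic, 4=Asian, 9=Other) is the one shown in the vital signs metadata example later in this deck; the helper name `add_numeric_code` is hypothetical.

```python
# Code list taken from the RACEN entry in the vital signs metadata example.
RACE_CODES = {"WHITE": 1, "BLACK": 2, "HISPANIC": 3, "ASIAN": 4, "OTHER": 9}


def add_numeric_code(record, variable, codes):
    """Return a copy of `record` with <variable>N added next to <variable>.

    Missing or unmapped values give None rather than a guessed code.
    """
    coded = dict(record)
    value = record.get(variable)
    coded[variable + "N"] = codes.get(value.upper()) if value else None
    return coded
```

For example, `add_numeric_code({"USUBJID": "0001-2", "RACE": "ASIAN"}, "RACE", RACE_CODES)` adds `RACEN` with the value 4 and leaves the character variable unchanged.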
Analysis Dataset Variables
Analysis Treatment Variables
• Treatment variables are required to be present in all analysis datasets
– Planned Treatment (TRTP char, TRTPN numeric)
– Actual Treatment (TRTA char, TRTAN numeric)
• If an analysis is performed on the actual treatment instead of the planned treatment, actual treatment variables are required in addition to the planned treatment variables
Subject-Level Dataset (ADSL)
• One record per subject
• All the variables for describing the analysis population
• Demographic data (age, sex, race, other relevant factors)
• Baseline characteristics
• Disease factors
• Treatment code/group
• Factors that could affect response to therapy
• Other relevant variables (smoking, alcohol intake, ...)
• Population flags
• Data included in the subject-level analysis dataset can
be used as source for data used in other analysis
datasets (derive variables only once!)
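The "derive only once" idea can be sketched in Python as a simple lookup: subject-level variables are taken from ADSL and copied onto the records of another analysis dataset, never re-derived there. The function name `merge_adsl` and the default list of variables to carry over are assumptions made for illustration.

```python
# Hypothetical default list of ADSL variables to carry into other datasets.
ADSL_KEEP = ("AGE", "SEX", "TRTP", "TRTPN", "SAFETY")


def merge_adsl(records, adsl, keep=ADSL_KEEP):
    """Copy subject-level variables from ADSL onto each record by USUBJID.

    A subject missing from ADSL raises KeyError, which surfaces the
    inconsistency instead of silently producing incomplete rows.
    """
    by_subject = {row["USUBJID"]: row for row in adsl}
    merged = []
    for rec in records:
        subject = by_subject[rec["USUBJID"]]
        merged.append({**rec, **{name: subject[name] for name in keep}})
    return merged
```

Because every dataset reads the same ADSL values, a change to a derivation (for example a population flag) only has to be made in one place.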
ADSL, Example
SAMPLE DATASET FOR ADSL
Obs | Studyid | USUBJID | SAFETY | ITT | PPROT | COMPLT | DSREAS        | AGE | AGEGRP
1   | XX0001  | 0001-1  | Y      | Y   | Y     | Y      |               | 30  | 21-35
2   | XX0001  | 0001-2  | Y      | Y   | N     | N      | ADVERSE EVENT | 38  | 36-50
SAMPLE DATASET FOR ADSL (continued)
Obs | AGEGRPN | SEX | RACE  | RACEN | TRTP    | TRTPN | HEIGHTBL | WEIGHTBL | BMIBL
1   | 2       | F   | WHITE | 1     | DRUG A  | 1     | 170      | 63.5     | 21.97
2   | 3       | M   | ASIAN | 4     | PLACEBO | 0     | 183      | 86.2     | 25.74
Callouts: dataset named “ADxxxxxx”; SDTM variable with no changes; ADaM treatment variable
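The BMIBL values in the sample are consistent with the usual body mass index formula, weight in kg divided by the square of height in metres. The Python sketch below reproduces them, assuming HEIGHTBL is in cm and WEIGHTBL in kg (the units given for these variables in the adverse events example) and assuming rounding to two decimals; the function name `bmi` is hypothetical.

```python
def bmi(weight_kg, height_cm):
    """Body mass index in kg/m^2, rounded to two decimals."""
    height_m = height_cm / 100
    return round(weight_kg / height_m ** 2, 2)
```

With the two sample subjects, `bmi(63.5, 170)` gives 21.97 and `bmi(86.2, 183)` gives 25.74, matching the BMIBL column.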
Vital Signs Analysis Dataset: horizontal structure
Variable Name | Variable Label | Type | Controlled Terms or Format | Source
STUDYID | Study Identifier | Char | $15. | VS.STUDYID
USUBJID | Unique Subject Identifier | Char | $30. | VS.USUBJID
SUBJID | Subject Identifier for the Study | Char | $5. | ADSL.SUBJID
SITEID | Study Site Identifier | Char | $5. | ADSL.SITEID
[VS SDTM is the source]
VSBLFL | Baseline Flag | Char | Y or Null | VS.VSBLFL (where VS.VSTESTCD in ('DIABP' 'SYSBP' 'HR'))
VISITNUM | Visit Number | Num | 3. | VS.VISITNUM
VISIT | Visit Name | Char | $100. | VS.VISIT
WGT_BASE | Body Weight Baseline Measurement | Num | 5.1 | VS.VSSTRESN (where VSTESTCD = 'WEIGHT' and
WGT_VAL | Body Weight Visit Measurement | Num | 5.1 | VS.VSSTRESN (where VSTESTCD = 'WEIGHT' )
WGT_CHG | Body Weight Change from Baseline | Num | 5.1 | ADVS.WGT_VAL - ADVS.WGT_BASE
HR_BASE | Heart Rate (beats/minute) Baseline | Num | 3. | VS.VSSTRESN (where VSTESTCD = 'HR' and VS.VSBLFL='Y')
HR_VAL | Heart Rate (beats/minute) Visit | Num | 3. | VS.VSSTRESN (where VSTESTCD = 'HR' )
HR_CHG | Heart Rate (beats/minute) Change | Num | 3. | ADVS.HR_VAL - ADVS.HR_BASE
SBP_BASE | Systolic Blood Pressure (mmHg) Baseline | Num | 3. | VS.VSSTRESN (where VSTESTCD = 'SYSBP' and
SBP_VAL | Systolic Blood Pressure (mmHg) Visit | Num | 3. | VS.VSSTRESN (where VSTESTCD = 'SYSBP' )
SBP_CHG | Systolic Blood Pressure (mmHg) Change | Num | 3. | ADVS.SBP_VAL - ADVS.SBP_BASE
[Demographic variables: ADSL is the source]
AGE | Age in AGEU at Reference Date/Time | Num | 3. | ADSL.AGE
AGEU | Age Units | Char | years | ADSL.AGEU
SEX | Sex | Char | F,M,U | ADSL.SEX
SEXN | Sex Numeric | Num | 1=Male, 2=Female | ADSL.SEXN
RACE | Race | Char | White, Black, Hispanic, Asian, Other | ADSL.RACE
RACEN | Race Numeric | Num | 1=White, 2=Black, 3=Hispanic, 4=Asian, 9=Other | ADSL.RACEN
[Treatment variables]
TRTP | Planned Treatment Group | Char | | ADSL.TRTP
TRTPN | Planned Treatment Group Numeric Code | Num | | ADSL.TRTPN
TRTA | Actual Treatment Group | Char | | ADSL.TRTA
TRTAN | Actual Treatment Group Numeric Code | Num | | ADSL.TRTAN
[Analysis Population]
SAFETY | Safety Set | Char | Y, N | ADSL.SAFETY
FULLSET | Full Analysis Set | Char | Y, N | ADSL.FULLSET
PPROT | Per-Protocol Set | Char | Y, N | ADSL.PPROT
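The horizontal structure above can be sketched in Python as a pivot from vertical VS records (one row per subject, visit and test) to one row per subject and visit, with `_BASE`, `_VAL` and `_CHG` columns per test. This is only an illustration under stated assumptions: the baseline is taken from the record flagged VSBLFL='Y', the function name `to_horizontal` and the `PREFIX` mapping are hypothetical, and a real implementation would also carry the ADSL variables listed in the table.

```python
# Hypothetical mapping from VSTESTCD to the column prefix used in the table.
PREFIX = {"WEIGHT": "WGT", "HR": "HR", "SYSBP": "SBP"}


def to_horizontal(vs_records):
    """Pivot vertical VS records into one row per subject and visit."""
    # Baseline value per (subject, test), from the record flagged VSBLFL='Y'.
    baseline = {
        (r["USUBJID"], r["VSTESTCD"]): r["VSSTRESN"]
        for r in vs_records
        if r.get("VSBLFL") == "Y"
    }
    rows = {}
    for r in vs_records:
        prefix = PREFIX.get(r["VSTESTCD"])
        if prefix is None:
            continue  # test not part of this analysis dataset
        key = (r["USUBJID"], r["VISITNUM"])
        row = rows.setdefault(
            key, {"USUBJID": r["USUBJID"], "VISITNUM": r["VISITNUM"]}
        )
        base = baseline.get((r["USUBJID"], r["VSTESTCD"]))
        value = r["VSSTRESN"]
        row[prefix + "_BASE"] = base
        row[prefix + "_VAL"] = value
        # Change from baseline is missing when either operand is missing.
        row[prefix + "_CHG"] = (
            None if base is None or value is None else round(value - base, 1)
        )
    return [rows[key] for key in sorted(rows)]
```

The result is "one statistical procedure away" for a per-visit summary, at the cost of one set of three columns per test, which is the redundancy trade-off discussed earlier in the deck.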
Adverse Events Analysis Dataset
Variable Name | Variable Label | Type
STUDYID | Study Identifier | Char
USUBJID | Unique Subject Identifier | Char
SUBJID | Subject Identifier for the Study | Char
SITEID | Study Site Identifier | Char
AESEQ | Sequence Number | Num
AETERM | Reported Term for the Adverse Event | Char
AEDECOD | Dictionary-Derived Term | Char
AEBODSYS | Body System or Organ Class | Char
AESEV | Severity/Intensity | Char
AESEVN | Severity/Intensity Numeric | Num
AESER | Serious Event | Char
AEACN | Action Taken with Study Treatment | Char
AEREL | Causality | Char
AERELN | Causality Numeric | Num
AEOUT | Outcome of Adverse Event | Char
AEOUTN | Outcome of Adverse Event Numeric | Num
AESTDT | Start Date of Adverse Event Numeric | Num
AESTDY | Study Day of Onset of Event | Num
AERELAT | Event Related to Study Drug | Char
AEDUR | Duration of Adverse Event (days) | Num
AEPRE | Pre-Treatment Adverse Event | Char
AETRTEM | Treatment Emergent Adverse Event | Char
AEPOST | Post-Treatment Adverse Event | Char
HEIGHTBL | Baseline Height (cm) | Num
WEIGHTBL | Baseline Body Weight (kg) | Num
AGE | Age in AGEU at Reference Date/Time | Num
AGEU | Age Units | Char
SEX | Sex | Char
SEXN | Sex Numeric | Num
RACE | Race | Char
RACEN | Race Numeric | Num
RACEOTH | Specify Other Race | Char
TRTP | Planned Treatment Group | Char
TRTPN | Planned Treatment Group Numeric Code | Num
TRTA | Actual Treatment Group | Char
TRTAN | Actual Treatment Group Numeric Code | Num
SAFETY | Safety Set | Char
Callouts:
• Keep variables from AE SDTM
• Add numeric variables
• Add derived variables
• Add flags for Treatment Emergent AE
• Add demographic variables from ADSL
• Add treatment variables from ADSL
• Add population flag from ADSL
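A few of the derived variables above can be sketched in Python. The slide does not state the derivation rules, so everything here is an assumption to be replaced by the study's SAP: an event counts as treatment emergent when it starts between first and last dose inclusive, AESTDY follows the common convention with no day 0 (the reference date is day 1), and AEDUR counts both the start and end day. The function names `study_day` and `derive_ae` are hypothetical.

```python
from datetime import date


def study_day(event_date, ref_date):
    """Study day with no day 0: the reference date itself is day 1."""
    delta = (event_date - ref_date).days
    return delta + 1 if delta >= 0 else delta


def derive_ae(ae_start, ae_end, first_dose, last_dose):
    """Derive AEPRE/AETRTEM/AEPOST flags, AESTDY and AEDUR for one event.

    Assumed rule: treatment emergent means the event starts between
    first and last dose inclusive. Many studies add a follow-up window.
    """
    pre = ae_start < first_dose
    post = ae_start > last_dose
    return {
        "AEPRE": "Y" if pre else "N",
        "AETRTEM": "Y" if not pre and not post else "N",
        "AEPOST": "Y" if post else "N",
        "AESTDY": study_day(ae_start, first_dose),
        # Duration counts both start and end day; missing if ongoing.
        "AEDUR": None if ae_end is None else (ae_end - ae_start).days + 1,
    }
```

Whatever rules are chosen, documenting them in the analysis variable metadata is what lets a reviewer trace AETRTEM or AEDUR back to the AE and exposure dates.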
Analysis Dataset Documentation
• Provide the link between the general description of the analysis (as found in the study protocol and SAP) and the source data
• The source of the analysis dataset should be clearly
documented, allowing the reviewer to trace back data
items to their source
• Documentation includes:
– Analysis dataset metadata
– Analysis variable metadata
– Analysis results metadata
– Other (SAS programs and/or other written documentation)
Analysis Dataset Metadata
• Should contain:
– Dataset name, Dataset description, Structure, Purpose, Keys,
Location, Documentation
Link to detailed
documentation
Analysis Variable Metadata
ADSL (example from CDISC guideline) / 1
• describes each variable in the analysis dataset
• provides details about where the variable came from in
the source data or how the variable was derived
ADSL / 2
ADSL / 3
Analysis Results Metadata
• Describes the major attributes of each important analysis result
– Analysis name: a unique identifier for the analysis
– Reason: reason for performing the analysis (pre-specified, exploratory, regulatory request)
– Dataset: name of the datasets / subset used in the analysis
Analysis name | Description | Reason | Dataset | Documentation
Table 5.1: Demographic data - full analysis set | Summary of demographic data for full analysis set | Analysis pre-specified in SAP | ADSL, select records with FULLSET=Y | SAP Section XX
Table 5.2: Demographic data - per-protocol set | Summary of demographic data for per-protocol set | Analysis pre-specified in SAP | ADSL, select records with PPROT=Y | SAP Section XX
Table 5.3: Demographic data - safety set | Summary of demographic data for safety set | Analysis pre-specified in SAP | ADSL, select records with SAFETY=Y | SAP Section XX
Table 5.4: Demographic data by country - full analysis set | Summary of demographic data by country for full analysis set | Analysis pre-specified in SAP | ADSL, select records with FULLSET=Y | SAP Section XX
Table 5.5: Demographic data by gender - full analysis set | Summary of demographic data by gender for full analysis set | Analysis pre-specified in SAP | ADSL, select records with FULLSET=Y | SAP Section XX
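One way to make such metadata actionable is to store each row as a record and apply its selection criterion programmatically. The Python sketch below is an illustration only: the `AnalysisResult` class, its `flag` field (the population flag that must equal 'Y') and `select_records` are hypothetical names, not part of the ADaM model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisResult:
    """One analysis results metadata row (hypothetical representation)."""
    name: str
    description: str
    reason: str
    dataset: str
    flag: str  # population flag that must equal 'Y' for a record to be used
    documentation: str


TABLE_5_1 = AnalysisResult(
    name="Table 5.1: Demographic data - full analysis set",
    description="Summary of demographic data for full analysis set",
    reason="Analysis pre-specified in SAP",
    dataset="ADSL",
    flag="FULLSET",
    documentation="SAP Section XX",
)


def select_records(result, datasets):
    """Return the records of the named dataset where the flag equals 'Y'."""
    return [row for row in datasets[result.dataset] if row.get(result.flag) == "Y"]
```

Driving the selection from the metadata keeps the documented subset and the subset actually analysed from drifting apart.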
Select a strategy for ADaM implementation
http://www.lexjansen.com/pharmasug/2005/fdacompliance/fc03.pdf
• Parallel method [diagram: CDMS, SDTM, ADaM]
• Linear method [diagram: CDMS, SDTM, ADaM]
• Hybrid method [diagram: CDMS, Draft SDTM, ADaM, SDTM]
• Other approaches
Implementation issues, Helsinn experience
• Key aspects discussed during implementation:
– Vertical vs horizontal structure
– Analysis ready and redundancy
– Clear link between SDTM and ADaM (AE → ADAE, VS → ADVS etc.): traceability
• Datasets
– Subject level: fully compliant with CDISC ADaM
– Defined a generation sequence
– One analysis dataset for each SDTM dataset (ADAE, ADIE, ADMH, ADPE, ADEX, ADCM, ADLB etc.)
– More than one dataset when needed (example: EG, ADEG for par and findings)
– Keep the vertical structure when possible (just add variables)
– Efficacy datasets: study specific, no specifications
– Additional datasets needed for the analysis may be created (example: to store totals/denominators to be used in the summaries)
• Variables
– Variables in SDTM SUPPQUAL merged back into the original domain (e.g. Race, other)
– Common set of variables in each dataset (age, gender, race, stratification variables, treatment planned/actual)
– Analysis population flag: added to each dataset
– Numeric variables: added as needed for the analysis (dates, numeric version of categorical variables)
– Add dataset-specific variables (analysis day, TE, change from baseline etc.)
Benefits
(even if you are not working on a submission)
• Minimized programming effort
• Reduced risk of programming errors
• Less validation effort
• Reuse of programs
• Reduced time needed for analysis dataset creation (more time can be spent on analysis)
• Integrated analysis made easier
ADaM Work in Progress
• Develop Implementation Guide
– The ADaM team is working on an implementation guide that will build on
the considerations discussed in the Analysis Data Model Version 2.0.
– This implementation guide will outline specific standards and
recommendations for the structure and content of analysis data sets
– will contain a library of examples of analysis data sets that would serve to
support specific statistical methodology used within clinical trials, such as
- Change from Baseline
- Time to Event
- Categorical Analysis
- Adverse Events
• Develop Training Course
• Cross-team activities including:
– SDS/ADaM Pilot project
– DEFINE.XML and analysis data
– Trial Design Model for 2-3 frequently used trial designs
– Controlled terminology to be used for analysis data
Questions
Analysis Dataset Creation documentation – back-up
• Descriptions for each dataset:
– the source datasets
– processing steps
– scientific decisions pertaining to creation
• Clearly distinguish:
– derivations & decision rules specified a priori
– decisions that were data-driven
• Key issues:
– derived variables documentation: algorithms
– handling of missing data
– data item specific derivations, i.e. a change to a data value for a specific observation
• Analysis dataset creation programs may be used as
documentation
Standardized process for analysis datasets creation – back-up
• ADSL should be created before other ADaM datasets
• Derivations should be performed only once (more
efficient and reduces the risk of discrepancies)
• Define the datasets creation order (depending on
existing relationships between ADaM datasets)
• Some SDTM variables may not be needed in the ADaM datasets
• The list of ADaM datasets may be shorter than the
SDTM (no suppqual datasets, efficacy data may be
combined in one dataset)
There is still a lot of freedom in the possible set-up of ADaM structure.
Define a standard approach!
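Defining the creation order from the relationships between datasets amounts to a topological sort. A minimal Python sketch, using the standard library `graphlib` module (Python 3.9 or later); the dependency map is hypothetical, with ADEFF standing in for a study-specific efficacy dataset.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: each dataset maps to the datasets it reads from.
DEPENDS_ON = {
    "ADSL": set(),
    "ADVS": {"ADSL"},
    "ADAE": {"ADSL"},
    "ADEFF": {"ADSL", "ADVS"},  # hypothetical efficacy dataset
}


def creation_order(depends_on):
    """Return dataset names so that every dataset follows its sources.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    return list(TopologicalSorter(depends_on).static_order())
```

With the map above, ADSL always comes first and ADEFF comes after ADVS; the relative order of ADVS and ADAE is not fixed because neither depends on the other.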
Analysis Results Metadata – back-up
• Describes the major attributes of each important analysis result
• Links statistical results to
– analysis datasets and programs used to generate the analysis
– metadata describing the analysis
– reason for performing the analysis
• Should contain
– ANALYSIS NAME: A unique identifier for this analysis. May include a table
number or other sponsor-specific reference.
– DESCRIPTION: A text description documenting the analysis performed.
– REASON: The reason for performing this analysis. Examples may include Pre-
specified, Exploratory, and Regulatory Request.
– DATASET: the name of the analysis dataset used for this analysis. The column
may also include specific selection criteria (e.g. where SAFETY=‘Y’)
– DOCUMENTATION: information about how the analysis was performed (text
description, link to another document or the analysis generation program)