ABSTRACT
Missing data is part of any clinical study, and deciding how to deal with it is always a priority
for the clinical team. Numerous approaches have been adopted in clinical data analysis to impute
missing data using SAS software. This paper presents, from the programmer's point of view, the
things that have to be taken care of while imputing missing data points using the last observation
carried forward (LOCF) approach. To carry forward previous records to impute a missing data
point, a clear approach, a list of rules and conditions, and a detailed algorithm design have to be
stated. This paper outlines a couple of real case scenarios and approaches for imputing missing
values in repeated-measures studies, cautions about a few of the pitfalls and how to avoid them,
and shows how to validate that the algorithm is working as desired. It also offers brief
recommendations on how to deal with other scenarios by adapting the approaches discussed here.
This paper targets mid-level programmers working on Windows/Unix platforms.
KEYWORDS
LOCF, Last observation carried forward, Windowing
INTRODUCTION
Missing data in clinical studies is pervasive. It is important to determine how to handle it
during the study design phase, before any analytical approach is adopted to report these data
points for statistical inference and decision making. Depending on the complexity of the study
design, its data collection schedule and the set of rules to be followed for imputing missing data,
LOCF can be a daunting and time-consuming task. It requires a robust algorithm and a careful
thought process for imputing missing data points while creating the analysis dataset for key
variables, which then feeds into statistical procedures for analysis, reporting and inference.
Before we do any reporting on data points, the data must first undergo thorough cleaning,
manipulation and transformation, with the rules and conditions applied as defined in the analysis
plan, while creating a reporting dataset. We limit the scope of this paper to visit windowing and
LOCF algorithms, which are key tasks in creating an analysis dataset that suffices for clinical
study reporting and ad hoc analysis of the data.
Laboratory tests have to be performed at each visit as per the study design, and each lab result
has to be collected on a particular day. Analysis of lab data has to be performed for each
scheduled visit in the study. Each scheduled visit is required to occur on a specific study day for
each patient, and no leeway is provided for these visits. Despite sites following GCP guidelines,
many patients did not visit the site on the day they should have, so data collection fell on
unscheduled days. Even though such results are not collected at scheduled visits in the study,
they are important from an analysis point of view and should not be excluded from analysis. At
the same time, data are missing for important time points in the study (scheduled visits) that are
needed for analysis. By carrying forward the non-missing results from the patient's previous
available visit, the analysis can still be performed.
NESUG 2009 Pharmaceuticals, Health Care, and Life Sciences
RAW DATA:
WINDOWING:
The following rule is applied in this case study to define the visit window.
In the above table, Screening, Baseline, Week1, Week2, Week3, Week4 and Week5 are scheduled
visits, and any data falling between these visits belong to unplanned visits.
In the data, it is possible that a few subjects have more than one record in one visit and no
record for the following visit. When windowing is applied based on the collection day, the extra
records may fall in the next scheduled visit or in an unplanned visit. If some records fall in
unplanned visits and the following scheduled visit is either absent from the data or present with a
missing value, then the lab value from the unplanned visit has to be carried forward to impute the
missing data. This LOCF algorithm has to be applied within each patient.
ALGORITHM:
Calculate the collection day for each patient's lab record from Day 1 (Baseline) by using the
following formula:
Based on this DIFFDAY and the windowing table, apply the visit window to each
record.
After applying windowing, the data will have the following visit numbers according to the visit schedule.
For each subject, add a record for each scheduled and unplanned visit that is missing
from the data.
If there are multiple records within a window for a scheduled visit for a patient,
apply an appropriate rule to obtain one record per patient per scheduled visit (in the
current example, we take the last non-missing record in the window).
If a scheduled visit value is missing, look for the last non-missing value from the patient's
previous visits. If this value is available, carry it forward to the next scheduled visit. Apply the
same algorithm for each scheduled visit within each subject.
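The algorithm steps above can be sketched as follows. This is a minimal illustration written in Python rather than SAS, and several details are assumptions not taken from the paper: the visit numbering (scheduled visits 1 to 7 for Screening through Week5, unplanned visits numbered between them, such as 3.5), the record layout, and the function name `locf_by_patient`.

```python
from collections import defaultdict

# Hypothetical visit numbering (an assumption, not from the paper): scheduled
# visits are whole numbers, unplanned visits sit between them (e.g. 3.5).
SCHEDULED_VISITS = [1, 2, 3, 4, 5, 6, 7]  # Screening, Baseline, Week1..Week5


def locf_by_patient(records, scheduled=SCHEDULED_VISITS):
    """records: iterable of (patient, visit, day, value); value may be None.

    Returns {patient: {scheduled_visit: (value, imputed_flag)}}.
    """
    by_patient = defaultdict(list)
    for patient, visit, day, value in records:
        by_patient[patient].append((visit, day, value))

    result = {}
    for patient, rows in by_patient.items():
        rows.sort(key=lambda r: (r[0], r[1]))
        # Last non-missing value per visit window (scheduled or unplanned).
        observed = {}
        for visit, _day, value in rows:
            if value is not None:
                observed[visit] = value
        # Walk all visits in order so unplanned values can be carried too.
        timeline = sorted(set(scheduled) | set(observed))
        carried = None  # reset for every patient: nothing crosses patients
        out = {}
        for visit in timeline:
            if visit in observed:
                carried = observed[visit]
            if visit in scheduled:
                imputed = visit not in observed and carried is not None
                out[visit] = (carried, imputed)
        result[patient] = out
    return result
```

Because the carried value is reset at the start of each patient, a subject with no Screening result keeps a missing Screening value instead of inheriting one from the preceding subject.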
1. Always apply LOCF logic only to records with non-missing values. Programmatically
exclude the records with missing values and then start carrying forward
the values.
2. Always check for more than one record per patient within a window. An appropriate
rule should be implemented in the program to handle this situation. Depending on the
study design and analysis, multiple records within a window can be handled by the following:
3. In the final analysis dataset, check the total number of records. Each patient must have
exactly one record for each scheduled visit in the study. If a subject is missing a record
for any scheduled visit in the raw data, a dummy record must be added before starting
the LOCF algorithm.
4. Check for any values that come from one patient and are carried forward to the next
patient. If a subject is missing results for Screening in the raw data, the value should
remain missing in the final analysis dataset, because Screening is the first visit for any
subject and cannot be imputed by LOCF logic. This check reveals whether any value has
been carried forward from the previous patient.
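Checks 3 and 4 above lend themselves to an automated test. The sketch below is written in Python rather than SAS purely for illustration; the function name `check_analysis_dataset`, the tuple layout and the visit labels are hypothetical.

```python
from collections import Counter


def check_analysis_dataset(final_rows, scheduled, screening, missing_screening):
    """Return a list of problem messages; an empty list means all checks pass.

    final_rows: (patient, visit, value) tuples from the analysis dataset.
    scheduled: scheduled visit identifiers; screening: the first of them.
    missing_screening: patients with no Screening result in the raw data.
    """
    problems = []
    counts = Counter((patient, visit) for patient, visit, _ in final_rows)
    patients = sorted({patient for patient, _, _ in final_rows})

    # Check 3: exactly one record per patient per scheduled visit.
    for patient in patients:
        for visit in scheduled:
            n = counts.get((patient, visit), 0)
            if n != 1:
                problems.append(f"{patient}: {n} records for visit {visit}")

    # Check 4: a Screening value that was missing in the raw data can only
    # stay missing; a filled value must have leaked from another patient.
    for patient, visit, value in final_rows:
        if (visit == screening and patient in missing_screening
                and value is not None):
            problems.append(f"{patient}: Screening imputed from another patient")
    return problems
```

Returning messages instead of raising on the first failure lets a reviewer see every affected patient in one run.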
During this longitudinal study, various safety and efficacy parameters were collected at
screening (Week -2), baseline (Week 0), follow-up visits (Week 1, 2, 4, 8, 12 and 16) and
unplanned visits as per the study design. For the scope of our discussion we picked the systolic
blood pressure variable for analysis. At each phase of the study, assessments were missed due to
screening failures, patient withdrawal from the study, or patients forgetting to show up for
follow-up visits. As a result, some patients had only partial information available for analysis and
reporting. To make full use of the data collected, imputation of data points based on LOCF was
incorporated for the list of key analysis variables. Table 1 represents the outline of the data
collected from the patients during the scheduled study visits.
The set of rules followed before any imputation of missing data points takes place is given in
the subsequent sections.
Before we apply the LOCF algorithm to the data points, we first incorporate the windowing of
derived study visits by means of calculating the difference in days between the collection date
and the first active dosing day. Then, based on the days with respect to the predefined range, we
assign the derived target visit to that particular record as mentioned in Table 2.

Table 2: Windowing Schema

Actual Visit Type   Lower Bound of Days   Upper Bound of Days   Target Week Derived as per Windowing
Week -2             <1                                          Not Applicable
Week 0              1                                           0
Week 1              2                     11                    1
Week 2              12                    18                    2
Week 4              19                    36                    4
Week 8              37                    64                    8
Week 12             65                    92                    12
Week 16             93                    120                   16
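The windowing schema in Table 2 can be expressed as a small lookup. The sketch below uses Python rather than SAS for illustration only. Two points are assumptions rather than statements from the paper: the collection day is taken as the date difference plus one, so that the first active dosing day is day 1, and pre-baseline records are labelled -2 even though Table 2 lists their target week as not applicable. The names `collection_day` and `derived_week` are hypothetical.

```python
from datetime import date

# (lower bound, upper bound, target week) for post-baseline windows, Table 2.
POST_BASELINE_WINDOWS = [
    (2, 11, 1), (12, 18, 2), (19, 36, 4),
    (37, 64, 8), (65, 92, 12), (93, 120, 16),
]


def collection_day(collection_date, first_dose_date):
    # Assumption: the first active dosing day counts as day 1.
    return (collection_date - first_dose_date).days + 1


def derived_week(day):
    """Map a collection day to a derived visit week, or None if out of window."""
    if day < 1:
        return -2  # label for screening / pre-baseline records (assumption)
    if day == 1:
        return 0   # baseline
    for lower, upper, week in POST_BASELINE_WINDOWS:
        if lower <= day <= upper:
            return week
    return None    # beyond day 120: outside the study window
```

For example, a sample collected 11 days after the first dose falls on collection day 12 under this convention and is assigned to Week 2.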
Table 3. Transformed data after applying windowing rules and LOCF algorithm
Missing data imputation rules based on Derived Visit Week and Parameter Value during
LOCF
o Baseline Visit Week has LOCF = Value
o If the Baseline Visit Week value is missing, then LOCF = the immediately preceding
non-missing screening value having collection day < 1
o If Visit Window is missing, then LOCF = missing
o If the non-Baseline Visit Week value is not missing, then LOCF = Value
o If the non-Baseline Visit Week value is missing, then LOCF = LOCF of the immediately
preceding Visit Week Value.
Refer to Table 3 for a pictorial representation of the above rules applied to the data points.
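The rules above can be sketched for a single patient as follows, in Python rather than SAS for illustration. The function names and data layout are hypothetical. The sketch also assumes, in line with the validation checks listed in the programming plan, that the baseline value is not carried into post-baseline weeks, so the post-baseline carry starts empty. Records whose visit window is missing are assumed to have been excluded beforehand.

```python
POST_BASELINE_WEEKS = [1, 2, 4, 8, 12, 16]


def baseline_locf(baseline_value, pre_baseline):
    """pre_baseline: list of (collection_day, value) with collection_day < 1."""
    if baseline_value is not None:
        return baseline_value
    candidates = [(day, v) for day, v in pre_baseline
                  if v is not None and day < 1]
    if not candidates:
        return None  # nothing to carry: baseline LOCF stays missing
    # Latest non-missing screening value.
    return max(candidates, key=lambda c: c[0])[1]


def post_baseline_locf(values_by_week, weeks=POST_BASELINE_WEEKS):
    """values_by_week: {derived week: value or None}. Returns {week: LOCF}."""
    locf = {}
    carried = None  # starts empty: baseline is not carried past Week 0
    for week in weeks:
        value = values_by_week.get(week)
        if value is not None:
            carried = value
        locf[week] = carried
    return locf
```

Weeks that precede the first non-missing post-baseline value keep a missing LOCF value, and every later gap is filled from the most recent non-missing week.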
PROGRAMMING PLAN
This section outlines the steps to be considered while windowing the data to derive the visit
weeks and then applying the rules set up for LOCF during programming.
Step 1:
Subset out any records that have a missing first active dosing date.
Derive the collection days for assignment of each visit week as per windowing, and split the
records into separate datasets for screening and baseline assessments, post baseline
assessments, and out-of-study-window assessments.
Step 2:
Remove any duplicate records by sorting by all variables in the dataset before doing any
manipulation to derive the baseline assessment record as per visit windowing.
Make sure you sort your records logically by Patient ID, Derived Visit Week and Scheduled
Collection Date. This is a very important validation step for having LOCF performed
correctly for the baseline assessment. Examine your records to check which one will be
picked up for the baseline assessment. If there is more than one assessment in the baseline
window, select the latest non-missing assessment from the records with collection days <= 1. If
the baseline assessment value at collection day = 1 is missing, select the latest non-missing
pre-baseline assessment from the record or records having collection day < 1.
Step 3:
Remove any duplicate records by sorting by _all_ before doing any manipulation to derive the
post baseline assessment records as per visit windowing. Make sure you sort your records
logically by Patient ID, Derived Visit Week and Scheduled Collection Date. This is a very
important validation step for having LOCF performed correctly for the post baseline
assessments. Examine your records to check which record will be picked up for each
derived post baseline assessment. If a derived post baseline visit week has more than one
assessment, select the latest non-missing assessment from the records in that window
(collection days > 1).
Step 4:
For each distinct patient in the analysis dataset, create a dummy record for each of the
visit weeks defined in the visit window schema in Table 2.
Step 5:
Merge the dummy derived visit weeks dataset with the actual post baseline assessment dataset
by Patient ID and Derived Visit Week. The resulting dataset will have an extra record for each
derived planned visit that a subject did not have based on the windowing schema.
Step 6:
After the resulting dataset from step 5 has been sorted by Patient ID and Derived Visit Week,
LOCF logic is implemented to carry forward any non-missing parameter value from a previous
visit week to a subsequent visit week whose parameter value is missing.
Step 7:
Arrange the related variables in the order in which they are to be displayed in the reporting
dataset, which is sorted by Patient ID and Derived Visit Week for the study under consideration.
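Steps 2 to 5 of the plan above (removing duplicates, choosing one record per window, and merging with a dummy visit grid) can be sketched as follows. The sketch is in Python rather than SAS for illustration; the function names `select_per_window` and `merge_with_skeleton` and the tuple layout are hypothetical.

```python
def select_per_window(rows):
    """rows: (patient, week, collection_day, value) tuples. Steps 2-3.

    Drops exact duplicates, then keeps the latest non-missing value per
    patient and derived visit week.
    """
    selected = {}
    # set() removes exact duplicate records; sorting puts the latest
    # collection day last within each patient and week.
    for patient, week, _day, value in sorted(set(rows), key=lambda r: r[:3]):
        if value is not None:
            selected[(patient, week)] = value
    return selected


def merge_with_skeleton(selected, patients, weeks):
    """Steps 4-5: one row per patient and week; absent visits get None."""
    return [(p, w, selected.get((p, w)))
            for p in sorted(patients) for w in weeks]
```

The merged result has exactly one row per patient and derived visit week, which is the shape the LOCF step expects.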
1. Check that the baseline value is never carried into the post baseline LOCF
assessment values, as per the study rule.
2. Check that if the baseline assessment at collection day 1 is missing and the
pre-baseline assessments for collection days < 1 are also missing, the LOCF
value at baseline is missing. The same validation rule applies to the post
baseline assessments, as elaborated with a set of conditions in points 3
and 4 below.
3. Check that if the first post baseline visit week parameter value (i.e. Visit Week =
2) is missing, the LOCF value is also set to missing.
4. Check that if a post baseline visit week parameter value (i.e. Visit Week = 2) is
missing and the parameter values of its subsequent post baseline visit weeks are also
missing, the LOCF value is missing for these records until a non-missing parameter
value is encountered in a subsequent follow-up visit, which is then carried forward to
the following derived visit weeks.
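Checks 1, 3 and 4 share one testable property: a post baseline week may have a filled LOCF value only if some post baseline value has already been observed for that patient. A minimal sketch of such a check, in Python rather than SAS and with a hypothetical function name and data layout, could look like this:

```python
def leading_missing_violations(values, locf, weeks):
    """Weeks where LOCF is filled although no post-baseline value exists yet.

    values / locf: {derived week: value or None} for one patient.
    """
    violations = []
    seen_value = False
    for week in weeks:
        if values.get(week) is not None:
            seen_value = True
        elif not seen_value and locf.get(week) is not None:
            # Nothing post-baseline to carry yet, so this value came from
            # baseline or from another patient.
            violations.append(week)
    return violations
```

An empty result means the leading missing weeks were left missing, as the study rule requires.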
CONCLUSION
As the title of this paper says, LOCF is not just carrying forward, and from the programmer's
point of view it is not just the use of the LAG function or the RETAIN statement on data values.
Windowing is one of the important aspects of applying LOCF logic across many study designs.
Handling of multiple records within a window is another major point that has to be taken care of,
and the programmer has to pay close attention to the rules for handling such cases specific to
the study design. There cannot be a generalized algorithm to handle the different cases where
LOCF is required. The logic has to change with the requirements, since it is completely based on
the set of rules and conditions set during the development of the Statistical Analysis Plan, as
defined under its Handling and Imputation of Missing Data section.
ACKNOWLEDGMENTS
The authors would like to acknowledge eClinical Solutions, a division of Eliassen Group, for
providing the opportunity to work on this paper.
CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the authors at:
Vikash Jain
Email: vjain@egistar.com or jainvikash77@yahoo.com
Niraj J. Pandya
Email: npandya@egistar.com or niraj_mech@yahoo.com
SAS and all other SAS Institute Inc. product or service names are registered trademarks or
trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.
Other brand and product names are trademarks of their respective companies.