NESUG 2009 Pharmaceuticals, Health Care, and Life Sciences

LOCF: It's More Than Just Carrying Forward


Vikash Jain, eClinical Solutions, A Division of Eliassen Group

Niraj J. Pandya, eClinical Solutions, A Division of Eliassen Group

ABSTRACT

Missing data is part of any clinical study, and deciding how to deal with it is always a priority
for the clinical team. Numerous approaches have been adopted in clinical data analysis to
impute missing data using SAS software. This paper presents, from the programmer's point of
view, the things that must be taken care of while imputing missing data points using the last
observation carried forward (LOCF) approach. To carry forward previous records to impute a
missing data point, a clear approach, a list of rules and conditions, and a detailed algorithm
design have to be stated. This paper outlines a couple of real case scenarios and approaches
for imputing missing values in repeated-measures studies, cautions about a few of the pitfalls
and how to avoid them, and shows how to validate that the algorithm is working as desired.
It also gives brief recommendations on how to deal with other scenarios by adapting the
approaches discussed here. This paper targets mid-level programmers working on
Windows/Unix platforms.

KEYWORDS
LOCF, Last observation carried forward, Windowing

INTRODUCTION

Missing data in clinical studies is pervasive. It is important to determine how to handle it
during the study design phase, before any analytical approach is adopted to report these data
points for statistical inference and decision making. Depending on the complexity of the study
design, its data collection schedule and the set of rules to be followed for imputing missing
data, LOCF can be a daunting and time-consuming task. It requires a robust, carefully
thought-out algorithm for imputing missing data points during the creation of the analysis
dataset for key variables, which then feed into statistical procedures for analysis, reporting
and inference.

Before we do any reporting on data points, the data must first undergo thorough cleaning,
manipulation and transformation, with the rules and conditions applied as defined in the
analysis plan, while creating a reporting dataset. We limit the scope of this paper to visit
windowing and LOCF algorithms, which are key tasks in creating an analysis dataset that
supports clinical study reporting and ad hoc analysis of the data.

CASE STUDY 1: OUTLINE ON STUDY AND DATA COLLECTION

Laboratory tests have to be performed at each visit as per the study design, and the lab result
has to be collected on a particular day. Analysis of lab data has to be performed for each
scheduled visit in the study. Each scheduled visit is required to occur on a specific study day
for each patient, and no window around that day is allowed for these visits. Despite sites
following GCP guidelines, many patients did not visit the site on the day they should have, so
data collection falls on unscheduled days. Even though such results are not for scheduled
visits in the study, they are important from an analysis point of view and should not be
excluded from analysis. At the same time, data is missing for important time points in the
study (scheduled visits). By carrying forward the non-missing results from the patient's
previous available visit, the analysis can be performed.

We are interested in a lab result of each subject at each scheduled visit.

DATA VARIABLES AND VALUES DEFINITION:

Variable Name Variable Description


PT_ID Patient ID
CDATE Collection date
VTYPE Visit Type
LVAL Actual Variable value of lab
FDATE First Active Dosing Date
DIFFDAYS Derived Days difference (CDATE - FDATE +1)
VWIN Derived Visit Windowing Numeric
VWINC Derived Visit Windowing Character
LOCF_VAL Derived value of the lab after applying conditions based on LOCF rules for case study 2
EOS End of Study is the variable value of VTYPE

RAW DATA:

We will use the following raw data in this case study.

PT_ID  VTYPE      LVAL  FDATE      CDATE
1001   Screening  1.9   13-Mar-07  5-Mar-07
1001   Screening  1.2   13-Mar-07  10-Mar-07
1001   Baseline   .     13-Mar-07  13-Mar-07
1001   Week1      2.5   13-Mar-07  16-Mar-07
1001   Week2      1.6   13-Mar-07  26-Mar-07
1001   Week4      1.9   13-Mar-07  14-Apr-07
1001   Week5      2.8   13-Mar-07  16-Apr-07
1002   Screening  1.6   13-Apr-07  5-Apr-07
1002   Baseline   1.7   13-Apr-07  13-Apr-07
1002   Week2      .     13-Apr-07  26-Apr-07
1002   Week3      1.6   13-Apr-07  1-May-07
1002   Week4      1.9   13-Apr-07  10-May-07
1002   Week5      2.6   13-Apr-07  17-May-07
1003   Screening  1.1   15-Apr-07  7-Apr-07
1003   Baseline   .     15-Apr-07  15-Apr-07
1003   Baseline   .     15-Apr-07  17-Apr-07
1003   Week2      2.5   15-Apr-07  28-Apr-07
1003   Week4      2.7   15-Apr-07  14-May-07
1004   Screening  1.2   12-May-07  4-May-07
1004   Week2      2.5   12-May-07  23-May-07
1004   Week5      2.4   12-May-07  15-Jun-07
1005   Baseline   2.2   14-May-07  16-May-07
1005   Week4      .     14-May-07  10-Jun-07
1005   Week5      1.6   14-May-07  18-Jun-07

(A period denotes a missing LVAL.)

WINDOWING:

The following rule is applied in this case study to define the visit window.

Days Window (C) Window (N)


-7 Screening 10
-6 to 0 UPL_Scr_Bsl 15
1 Baseline 20
2 to 6 UPL_Bsl_Wk1 25
7 Week1 30
8 to 13 UPL_Wk1_Wk2 35
14 Week2 40
15 to 20 UPL_Wk2_Wk3 45
21 Week3 50
22 to 27 UPL_Wk3_Wk4 55
28 Week4 60
29 to 34 UPL_Wk4_Wk5 65
35 Week5 70
>35 UPL_Aft_Wk5 75

In the above table, Screening, Baseline, Week1, Week2, Week3, Week4 and Week5 are
scheduled visits, and any data falling between these visits belongs to unplanned visits.

In the data, a subject may have more than one record in one visit and no record for the
following visit. When windowing is applied based on the collection day, the extra records may
fall into the next scheduled visit or into an unplanned visit. If records fall into an unplanned
visit and the following scheduled visit is either absent from the data or present with a missing
value, then the unplanned visit's lab value has to be carried forward to impute the missing
data. This LOCF algorithm has to be applied within each patient.

ALGORITHM:

- Calculate the collection day for each patient's lab record relative to Day 1 (Baseline) using
the following formula:

DIFFDAY = CDATE - FDATE + 1

- Based on this DIFFDAY and the windowing table, apply the visit window to each record.

After applying windowing, the data gets the following visit numbers according to the visit schedule.

PT_ID  VTYPE      LVAL  FDATE      CDATE      DIFFDAY  VWIN  VWIN_C
1001   Screening  1.9   13-Mar-07  5-Mar-07   -7       10    Screening
1001   Screening  1.2   13-Mar-07  10-Mar-07  -2       15    UPL_Scr_Bsl
1001   Baseline   .     13-Mar-07  13-Mar-07  1        20    Baseline
1001   Week1      2.5   13-Mar-07  16-Mar-07  4        25    UPL_Bsl_Wk1
1001   Week2      1.6   13-Mar-07  26-Mar-07  14       40    Week2
1001   Week4      1.9   13-Mar-07  14-Apr-07  33       65    UPL_Wk4_Wk5
1001   Week5      2.8   13-Mar-07  16-Apr-07  35       70    Week5
1002   Screening  1.6   13-Apr-07  5-Apr-07   -7       10    Screening
1002   Baseline   1.7   13-Apr-07  13-Apr-07  1        20    Baseline
1002   Week2      .     13-Apr-07  26-Apr-07  14       40    Week2
1002   Week3      1.6   13-Apr-07  1-May-07   19       45    UPL_Wk2_Wk3
1002   Week4      1.9   13-Apr-07  10-May-07  28       60    Week4
1002   Week5      2.6   13-Apr-07  17-May-07  35       70    Week5
1003   Screening  1.1   15-Apr-07  7-Apr-07   -7       10    Screening
1003   Baseline   .     15-Apr-07  15-Apr-07  1        20    Baseline
1003   Baseline   .     15-Apr-07  17-Apr-07  3        25    UPL_Bsl_Wk1
1003   Week2      2.5   15-Apr-07  28-Apr-07  14       40    Week2
1003   Week4      2.7   15-Apr-07  14-May-07  30       65    UPL_Wk4_Wk5
1004   Screening  1.2   12-May-07  4-May-07   -7       10    Screening
1004   Week2      2.5   12-May-07  23-May-07  12       35    UPL_Wk1_Wk2
1004   Week5      2.4   12-May-07  15-Jun-07  35       70    Week5
1005   Baseline   2.2   14-May-07  16-May-07  3        25    UPL_Bsl_Wk1
1005   Week4      .     14-May-07  10-Jun-07  28       60    Week4
1005   Week5      1.6   14-May-07  18-Jun-07  36       75    UPL_Aft_Wk5
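
The DIFFDAY derivation and window assignment above can be sketched as follows. This is a minimal Python illustration rather than the authors' SAS program; the names diffday and assign_window are hypothetical, and returning (None, None) for days earlier than -7 is an assumption, since the windowing table does not cover them.

```python
from datetime import date

# Scheduled visit days with their numeric and character window values,
# copied from the windowing table.
SCHEDULED = [(-7, 10, "Screening"), (1, 20, "Baseline"), (7, 30, "Week1"),
             (14, 40, "Week2"), (21, 50, "Week3"), (28, 60, "Week4"),
             (35, 70, "Week5")]

# Inclusive day ranges for the unplanned windows between scheduled visits.
UNPLANNED = [(-6, 0, 15, "UPL_Scr_Bsl"), (2, 6, 25, "UPL_Bsl_Wk1"),
             (8, 13, 35, "UPL_Wk1_Wk2"), (15, 20, 45, "UPL_Wk2_Wk3"),
             (22, 27, 55, "UPL_Wk3_Wk4"), (29, 34, 65, "UPL_Wk4_Wk5")]


def diffday(cdate, fdate):
    """DIFFDAY = CDATE - FDATE + 1, so the first dosing date is Day 1."""
    return (cdate - fdate).days + 1


def assign_window(day):
    """Return (VWIN, VWIN_C) for a collection day."""
    for target, vwin, label in SCHEDULED:
        if day == target:
            return vwin, label
    for low, high, vwin, label in UNPLANNED:
        if low <= day <= high:
            return vwin, label
    if day > 35:
        return 75, "UPL_Aft_Wk5"
    # Assumption: days before -7 are outside the table and get no window.
    return None, None
```

For example, patient 1001's Week4 record collected on 14-Apr-07 with a first dosing date of 13-Mar-07 gives DIFFDAY 33, which falls in the unplanned window UPL_Wk4_Wk5 (65).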

- For each subject, add a record for each scheduled and unplanned visit that is missing
from the data.

- If there are multiple records within a window for a scheduled visit for a patient, apply an
appropriate rule to get one record per patient per scheduled visit (in the current example,
we take the last non-missing record in the window).

- If a scheduled visit value is missing, look for the last non-missing value from the patient's
previous visits. If such a value is available, carry it forward to the scheduled visit. Apply
the same algorithm for each scheduled visit within each subject.
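
The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' SAS code: the function name locf_scheduled is hypothetical, records are assumed to be already windowed and ordered by collection date within patient, and a missing lab value is represented as None.

```python
# Numeric window codes of the scheduled visits (Screening through Week5).
SCHEDULED = [10, 20, 30, 40, 50, 60, 70]


def locf_scheduled(records):
    """records: iterable of (pt_id, vwin, lval) tuples, lval None if missing.

    Returns {pt_id: {scheduled_vwin: locf_value}} with one entry per
    scheduled visit; the value stays None until something can be carried.
    """
    records = list(records)

    # Keep the last non-missing value per patient and window; records with
    # missing values are dropped before anything is carried forward.
    observed = {}
    for pt_id, vwin, lval in records:
        if lval is not None:
            observed.setdefault(pt_id, {})[vwin] = lval

    result = {}
    for pt_id in dict.fromkeys(pt for pt, _, _ in records):
        seen = observed.get(pt_id, {})
        last = None  # reset per patient so nothing leaks across patients
        out = {}
        # Walk scheduled and observed unplanned windows in visit order.
        for vwin in sorted(set(SCHEDULED) | set(seen)):
            if vwin in seen:
                last = seen[vwin]
            if vwin in SCHEDULED:
                out[vwin] = last
        result[pt_id] = out
    return result
```

Resetting the carried value at the start of each patient is the part that a plain retained variable gets wrong when the patient boundary is not handled explicitly.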

DATA AFTER APPLYING LOCF:

PT_ID  VWIN  VWIN_C       LVAL  LOCF_VAL
1001   10    Screening    1.9   1.9
1001   15    UPL_Scr_Bsl  1.2   .
1001   20    Baseline     .     1.2
1001   25    UPL_Bsl_Wk1  2.5   .
1001   30    Week1        .     2.5
1001   40    Week2        1.6   1.6
1001   50    Week3        .     1.6
1001   60    Week4        .     1.6
1001   65    UPL_Wk4_Wk5  1.9   .
1001   70    Week5        2.8   2.8
1002   10    Screening    1.6   1.6
1002   20    Baseline     1.7   1.7
1002   30    Week1        .     1.7
1002   40    Week2        .     1.7
1002   45    UPL_Wk2_Wk3  1.6   .
1002   50    Week3        .     1.6
1002   60    Week4        1.9   1.9
1002   70    Week5        2.6   2.6
1003   10    Screening    1.1   1.1
1003   20    Baseline     .     1.1
1003   30    Week1        .     1.1
1003   40    Week2        2.5   2.5
1003   50    Week3        .     2.5
1003   60    Week4        .     2.5
1003   65    UPL_Wk4_Wk5  2.7   .
1003   70    Week5        .     2.7
1004   10    Screening    1.2   1.2
1004   20    Baseline     .     1.2
1004   30    Week1        .     1.2
1004   35    UPL_Wk1_Wk2  2.5   .
1004   40    Week2        .     2.5
1004   50    Week3        .     2.5
1004   60    Week4        .     2.5
1004   70    Week5        2.4   2.4
1005   10    Screening    .     .
1005   20    Baseline     .     .
1005   25    UPL_Bsl_Wk1  2.2   .
1005   30    Week1        .     2.2
1005   40    Week2        .     2.2
1005   50    Week3        .     2.2
1005   60    Week4        .     2.2
1005   70    Week5        .     2.2
1005   75    UPL_Aft_Wk5  1.6   .

(A period denotes a missing value.)

VALIDATION AND LOOK FOR PITFALLS

1. Always apply LOCF logic only to records with non-missing values. Programmatically
exclude the records with missing values and then start carrying the values forward.

2. Always check for more than one record for a patient within a window. An appropriate
rule should be implemented in the program to handle this situation. Depending on the
study design and analysis, multiple records within a window can be handled in one of the
following ways:

- Take the first record within the window (First.window)
- Take the last record within the window (Last.window)
- Take the average of all the records within the window
- Take the record closest to the target day within the window

3. In the final analysis dataset, check the total number of records. Each patient must have
exactly one record for each scheduled visit in the study. If a subject is missing a record
for any scheduled visit in the raw data, a dummy record must be added before starting
the LOCF algorithm.

4. Check for any values that come from one patient and are carried forward to the next
patient. If a subject is missing results for Screening in the raw data, the value should
remain missing in the final analysis dataset as well, because Screening is the first visit
for any subject and cannot be imputed by LOCF logic. This check will reveal whether
any value has been carried forward from the previous patient.
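
The four options listed under point 2 for handling multiple records in a window can be sketched as below. This is a hypothetical Python helper (pick_in_window is not from the paper), assuming the records for one patient and one window are already sorted by collection day and have had missing values removed. For the closest-to-target rule, ties are resolved in favour of the earlier record, which is an assumption; the analysis plan should state the tie-break.

```python
from statistics import mean


def pick_in_window(records, rule, target_day=None):
    """records: list of (day, value) for one patient in one window,
    sorted by day, with missing values already excluded."""
    if not records:
        return None
    if rule == "first":
        return records[0][1]
    if rule == "last":
        return records[-1][1]
    if rule == "mean":
        return mean(value for _, value in records)
    if rule == "closest":
        # min() keeps the first of equally close records, i.e. the earlier day.
        return min(records, key=lambda rec: abs(rec[0] - target_day))[1]
    raise ValueError(f"unknown rule: {rule}")
```

With illustrative records on days 12, 14 and 17 and a target day of 14, the "closest" rule returns the day-14 value, while "last" returns the day-17 value.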

CASE STUDY 2: OUTLINE ON STUDY AND DATA COLLECTION

During this longitudinal study, various safety and efficacy parameters were collected at
screening (Week -2), baseline (Week 0), follow-up visits (Weeks 1, 2, 4, 8, 12 and 16) and
unplanned visits, as per the study design. For the scope of our discussion we picked the
systolic blood pressure variable for analysis. At each phase of the study, assessments were
missed because of screening failures, patients withdrawing from the study, or patients
forgetting to show up for follow-up visits. As a result, some patients had only partial data for
analysis and reporting. To make full use of the data collected, missing data points were
imputed using LOCF for the list of key analysis variables. Table 1 outlines the data collected
from the patients during the scheduled study visits. The rules followed before any imputation
of missing data points takes place are described in the subsequent sections.

Table 1: Raw Data Presentation

WINDOWING OF DATA SCHEMA

Before we apply the LOCF algorithm to the data points, we first apply windowing to derive the
study visits: we calculate the difference in days between the collection date and the first
active dosing date, and then, based on where that day falls within the predefined ranges,
assign the derived target visit to the record as shown in Table 2.

Table 2: Windowing Schema

Actual Visit  Days (Lower Bound to Upper Bound)  Target Week Derived as per Windowing
Week -2       <1                                 Not Applicable
Week 0        1                                  0
Week 1        2 to 11                            1
Week 2        12 to 18                           2
Week 4        19 to 36                           4
Week 8        37 to 64                           8
Week 12       65 to 92                           12
Week 16       93 to 120                          16
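
A lookup for Table 2 can be sketched as follows. This is a Python illustration with a hypothetical function name (derived_week); returning None for days before Day 1 mirrors the "Not Applicable" entry in Table 2, and returning None beyond day 120 is an assumption that such records are outside the study window.

```python
# (lower bound, upper bound, target week) for post-baseline windows, from Table 2.
POST_BASELINE_WINDOWS = [
    (2, 11, 1), (12, 18, 2), (19, 36, 4),
    (37, 64, 8), (65, 92, 12), (93, 120, 16),
]


def derived_week(day):
    """Map a collection day to the target week derived as per windowing."""
    if day < 1:
        return None  # pre-baseline: no target week of its own in Table 2
    if day == 1:
        return 0     # baseline
    for low, high, week in POST_BASELINE_WINDOWS:
        if low <= day <= high:
            return week
    return None      # assumption: beyond day 120 is outside the study window
```

Pre-baseline records are not discarded by this mapping; they remain candidates when the baseline value is derived.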

Table 3. Transformed Data after applying windowing rules and LOCF algorithm

WINDOWING RULES AND RECORD SELECTION ALGORITHM FOR LOCF

- Collection Day (DIFFDAYS) is derived as the difference between Collection Date (CDATE)
and First Active Dosing Date (FDATE), that is, CDATE - FDATE + 1.
- Classification scheme of records based on the collection day for visit windowing:
o Pre-baseline assessments are not assigned an individual derived visit as per
windowing. Baseline represents the last assessment prior to first treatment. Thus,
the derived baseline visit as per windowing (VWEEK = 0) encompasses all study
days up to and including Day 1.
o The derived baseline visit record value will not be carried forward to impute any
post-baseline values.
o For multiple values in the same visit window, only the last non-missing value will
be assigned to that derived visit week. All other, earlier records in the window will
have a missing value assigned.
o Add a single record per subject for each missing derived week as per visit
windowing, with the value set to missing. For these records Patient ID and
Derived Visit Week will be non-missing, and the LOCF value is filled in by the
imputation rules.

- Missing data imputation rules based on Derived Visit Week and Parameter Value during
LOCF:
o If the Baseline Visit Week value is not missing, then LOCF = Value.
o If the Baseline Visit Week value is missing, then LOCF = the most recent
non-missing screening value with collection day < 1.
o If the Visit Window is missing, then LOCF = missing.
o If a non-Baseline Visit Week value is not missing, then LOCF = Value.
o If a non-Baseline Visit Week value is missing, then LOCF = LOCF of the
immediately preceding Visit Week.

Refer to Table 3 for a pictorial representation of the above rules applied to the data points.
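
The imputation rules above can be sketched as two small functions, one for the baseline value and one for the post-baseline weeks. This is a Python illustration, not the authors' program: the function names and the None-for-missing convention are assumptions, and the blood pressure values in the usage note are made up.

```python
POST_BASELINE_WEEKS = [1, 2, 4, 8, 12, 16]


def baseline_locf(assessments):
    """assessments: list of (collection_day, value) in collection order.

    Returns the latest non-missing value with collection day <= 1, so a
    missing Day 1 value falls back to the most recent screening value.
    """
    candidates = [(day, value) for day, value in assessments
                  if day <= 1 and value is not None]
    if not candidates:
        return None
    # sorted() is stable, so among records on the same day the later one wins.
    return sorted(candidates, key=lambda rec: rec[0])[-1][1]


def post_baseline_locf(value_by_week):
    """value_by_week: {derived_week: value or None} for one patient.

    Carries forward only post-baseline values; the baseline value is never
    used, so weeks before the first non-missing value stay missing.
    """
    locf = {}
    last = None
    for week in POST_BASELINE_WEEKS:
        value = value_by_week.get(week)
        if value is not None:
            last = value
        locf[week] = last
    return locf
```

For a patient with values 130 at Week 2 and 125 at Week 8, Week 1 stays missing, Weeks 2 and 4 get 130, and Weeks 8 through 16 get 125.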

PROGRAMMING PLAN

This section outlines the steps to consider when windowing the data to derive the visit weeks
and then applying the LOCF rules during programming.

Step 1:
Exclude any records with a missing first active dosing date.
Derive the collection day used to assign each visit week as per windowing, and split the
records into separate datasets for screening and baseline assessments, post-baseline
assessments, and assessments outside the study window.

Step 2:
Remove any duplicate records by sorting on all variables in the dataset before doing any
manipulation to derive the baseline assessment record as per visit windowing.
Make sure you sort your records logically by Patient ID, Derived Visit Week and Scheduled
Collection Date. This is a very important validation step for having LOCF performed correctly
for the baseline assessment. Review the sorted records to check which one will be picked up
as the baseline assessment. If there is more than one assessment in the baseline window,
select the latest non-missing assessment from the records with collection day <= 1. If the
baseline assessment value at collection day = 1 is missing, select the latest non-missing
pre-baseline assessment from the records with collection day < 1.

Step 3:
Remove any duplicate records by sorting by _all_ before doing any manipulation to derive the
post-baseline assessment records as per visit windowing. Make sure you sort your records
logically by Patient ID, Derived Visit Week and Scheduled Collection Date. This is a very
important validation step for having LOCF performed correctly for the post-baseline
assessments. Review the sorted records to check which record will be picked up for each
derived post-baseline assessment. If a derived post-baseline visit week has more than one
assessment, select the latest non-missing assessment within that visit window.

Step 4:
For all the distinct patients in the analysis dataset, create a dummy visit week record for
each of the weeks defined in the visit window schema in Table 2.

Step 5:
Merge the dummy derived visit weeks dataset with the actual post-baseline assessment
dataset by Patient ID and Derived Visit Week. The resulting dataset will have extra records
for each subject who did not have all the derived planned visits based on the windowing
schema.

Step 6:
After the resulting dataset from Step 5 has been sorted by Patient ID and Derived Visit Week,
LOCF logic is implemented to carry forward any non-missing parameter value from the
previous visit week to a subsequent visit week whose parameter value is missing.

Step 7:
Sequentially arrange the related variables to be displayed in the reporting dataset, which is
sorted by Patient ID and Derived Visit Week for the study under consideration.
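
Steps 4 to 6 can be sketched as below. This is a Python illustration rather than the referenced SAS program: build_locf_dataset and the row layout are hypothetical, the input is assumed to hold one selected value per patient and derived week (the outcome of Steps 1 to 3), and the values in the usage note are made up. Here the patient list is taken from the observed records; in practice it would come from the analysis population so that patients with no post-baseline data still get rows.

```python
WEEKS = [1, 2, 4, 8, 12, 16]  # post-baseline target weeks from Table 2


def build_locf_dataset(observed):
    """observed: {(pt_id, week): value} with one value per patient and week.

    Returns rows sorted by patient and week, each with value and locf.
    """
    patients = sorted({pt_id for pt_id, _ in observed})

    # Step 4: dummy shell with every derived week for every patient.
    shell = [(pt_id, week) for pt_id in patients for week in WEEKS]

    # Step 5: merge the shell with the observed values; weeks with no
    # assessment get a missing (None) value.
    rows = [{"pt_id": pt_id, "vweek": week, "value": observed.get((pt_id, week))}
            for pt_id, week in shell]

    # Step 6: carry forward within patient, resetting at each new patient.
    current_pt, last = None, None
    for row in rows:
        if row["pt_id"] != current_pt:
            current_pt, last = row["pt_id"], None
        if row["value"] is not None:
            last = row["value"]
        row["locf"] = last
    return rows
```

With three illustrative patients, where 102 has only a Week 1 value and 103 has only a Week 4 value, patient 103's Weeks 1 and 2 stay missing rather than inheriting patient 102's last value.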

VALIDATION AND LOOK FOR PITFALLS

1. Check that the baseline value is never carried into the post-baseline LOCF assessment
values, as per the study rule.

2. Check that if the baseline assessment value at collection day 1 is missing and the
pre-baseline assessments for collection days < 1 are also missing, the LOCF value at
baseline is missing. The same validation rule applies to the post-baseline assessments,
as elaborated by the conditions in points 3 and 4 below.

3. Check that if the first post-baseline visit week parameter value (i.e. Visit Week = 2) is
missing, then the LOCF value is also set to missing.

4. Check that if a post-baseline visit week parameter value (i.e. Visit Week = 2) is missing
and the parameter values of the subsequent post-baseline visit weeks are also missing,
the LOCF value remains missing for these records until a non-missing parameter value
is encountered in a subsequent follow-up visit, which is then carried forward to the
following derived visit weeks.
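
Checks 1, 3 and 4 lend themselves to an independent re-derivation of the expected LOCF value for each post-baseline row. The following is a Python sketch under assumptions: validate_locf and the row layout (pt_id, vweek, value, locf) are hypothetical, missing is represented as None, and the target weeks are taken from Table 2.

```python
WEEKS = [1, 2, 4, 8, 12, 16]  # post-baseline target weeks from Table 2


def validate_locf(rows):
    """rows: post-baseline rows as dicts with pt_id, vweek, value, locf.

    Returns a list of problem descriptions; an empty list means the rows
    pass the checks.
    """
    problems = []
    by_patient = {}
    for row in rows:
        by_patient.setdefault(row["pt_id"], []).append(row)

    for pt_id, recs in by_patient.items():
        recs = sorted(recs, key=lambda r: r["vweek"])
        if [r["vweek"] for r in recs] != WEEKS:
            problems.append(f"{pt_id}: expected exactly one record per derived week")

        # Nothing may be carried into the first post-baseline week, neither
        # the baseline value nor a value from the previous patient.
        expected = None
        for r in recs:
            if r["value"] is not None:
                expected = r["value"]
            if r["locf"] != expected:
                problems.append(
                    f"{pt_id} week {r['vweek']}: LOCF {r['locf']}, expected {expected}")
    return problems
```

Running such a check on the final dataset and on a few hand-built failing cases gives some assurance that the check itself can detect the pitfalls listed above.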

DATA SETS AND CODE FOR REFERENCE


Case study 1:
The reference dataset and the code to get the final result are provided as follows:
Dataset Name: Lab_LOCF.sas7bdat
Program Name: Lab_LOCF.sas
Case study 2:
Each of the steps defined in the programming plan section is marked again at the
corresponding data step in the SAS program, so that the code can be mapped to the
discussion.
Dataset Name: sysbp_anly.sas7bdat
Program Name: sysbp_locf.sas

CONCLUSION
As the title of this paper says, LOCF is not just carrying forward, and from the programmer's
point of view it is not just applying the LAG function or the RETAIN statement to data values.
Windowing is one of the important aspects of applying LOCF logic across many study
designs. Handling multiple records within a window is another major point that has to be
taken care of, and the programmer has to pay close attention to the study-specific rules for
such cases. There cannot be a generalized algorithm for all the cases where LOCF is
required; the logic has to change with the requirements, since it is completely based on the
rules and conditions set during the development of the Statistical Analysis Plan, in the
section on handling and imputation of missing data.

REFERENCES

Mark Keese: Handling Different Rules for the Imputation of Missing Data Using an LOCF
Approach; Proceedings of NESUG 2005

Venky Chakravarthy: The DOW (Not that DOW!!!) and the LOCF in Clinical Trials;
Proceedings of SUGI 2003

ACKNOWLEDGMENTS

The authors would like to acknowledge eClinical Solutions, a division of Eliassen Group, for
providing the opportunity to work on this paper.

CONTACT INFORMATION
Your comments and questions are valued and encouraged. Contact the authors at:

Vikash Jain
Email: vjain@egistar.com or jainvikash77@yahoo.com

Niraj J. Pandya
Email: npandya@egistar.com or niraj_mech@yahoo.com

SAS and all other SAS Institute Inc. product or service names are registered trademarks or
trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.
