
Available online at www.sciencedirect.com

Comprehensive Psychiatry 50 (2009) 257-262
www.elsevier.com/locate/comppsych

The Clinical Global Impressions scale: errors in understanding and use


Joan Busner a,b,*, Steven D. Targum b,c,d, David S. Miller b
a Department of Psychiatry, Penn State College of Medicine, Hershey, PA, USA
b United BioSource Corporation, Wayne, PA, USA
c Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
d Oxford BioScience Partners, Boston, MA

Abstract

Objective: The Clinical Global Impressions Severity and Improvement scales (CGI-S and CGI-I) are widely included as efficacy data in psychopharmacology new drug application submissions. This study was conducted to determine the extent to which clinical trials investigators included information unrelated to efficacy in their CGI ratings.

Method: Forty-five principal investigators provided CGI-S and CGI-I ratings of narratives of patients with major depressive disorder or generalized anxiety disorder. Investigators were blindly randomized to receive narratives that either did (experimental) or did not (control) contain indication-unrelated medical or psychiatric adverse events. Investigators then completed a survey assessing CGI-S and CGI-I rating patterns.

Results: CGI-S and CGI-I ratings were significantly more severe and less improved when the narratives contained medical and psychiatric adverse events unrelated to the diseases under study (major depressive disorder and generalized anxiety disorder) than when the narratives did not (Ps < .04). In response to the survey, 46% and 56% of investigators reported that a psychiatric adverse event unrelated to the disease under study would not affect their CGI-S and CGI-I ratings, respectively. Although 87% of investigators reported that their CGI-S and CGI-I ratings would not be affected by a medical adverse event, actual CGI-S ratings were significantly more severe when an unrelated medical adverse event was described as occurring than when it was not (P < .03).

Conclusion: Clinical trials investigators' inclusion of indication-irrelevant adverse events threatens the validity of the CGI as an efficacy measure and may contribute to failure to detect efficacy signals in psychopharmacology clinical trials.

© 2009 Elsevier Inc. All rights reserved.

1. Introduction

To gain Food and Drug Administration (FDA) approval to market a new drug for a given disease indication, pharmaceutical companies are required to submit both data that demonstrate safety of the drug and data that demonstrate efficacy of the drug for the indication under consideration [1]. Pharmaceutical sponsors identify in their study protocols the measures by which safety will be demonstrated and the separate measures by which efficacy will be demonstrated.
Portions of the data were previously presented as posters at the 47th Annual Meeting of the NCDEU, Boca Raton, FL, June 11 to 14, 2007, and the 20th Congress of the European College of Neuropsychopharmacology, October 13 to 17, 2007, Vienna, Austria.
The authors are all affiliated with United BioSource Corporation (Wayne, PA), which provides rater training services. Dr Targum is an equity holder of United BioSource Corporation.
* Corresponding author. United BioSource Corporation, Wayne, PA 19087, USA. Tel.: +1 610 225 5982.
E-mail address: joan.busner@unitedbiosource.com (J. Busner).
0010-440X/$ - see front matter © 2009 Elsevier Inc. All rights reserved.
doi:10.1016/j.comppsych.2008.08.005

Statistical analysis plans for the analysis of safety data and the analysis of efficacy data are submitted and approved. Two of the more widely used efficacy scales in central nervous system (CNS) trials are the Clinical Global Impressions Severity and Improvement scales (CGI-S and CGI-I) [2]. The CGI-S and CGI-I were first published as part of an assessment packet promulgated by the US government for the study of psychotropic drugs [3]. The CGI-S and CGI-I were designed to provide a basis, independent of ratings on a questionnaire, for the study clinician to make a global assessment of a study patient's condition before and then after the initiation of a study medication. In this manner, they provided a means of determining whether, in the view of an experienced clinician, the condition under study had improved, worsened, or stayed the same.

For the CGI-S item, researchers conducting psychopharmacology trials of a pharmacologic agent for the treatment of a defined condition were asked to evaluate the patient's condition before the initiation of the studied medication (ie, at baseline): "Considering your total clinical experience with this particular population, how mentally ill is the patient at this time?" An illness severity rating was then made on a scale of 1 to 7, with 1 being "normal, not at all mentally ill" and 7 being "among the most extremely ill patients." Subsequently, the patient's condition on the study drug (or placebo) was to be compared with the patient's condition before the initiation of the study drug (or placebo) (baseline) via additional CGI-S ratings or the CGI-I item. For the CGI-I, the investigator assessed whether the patient's condition was improved, worse, or the same; the scale ranged from 1, "very much improved," to 7, "very much worse," with 4 denoting "no change."

In 1985, an NIMH publication on assessments reminded raters that the CGI-S in its very early renditions (dates not given) used to read, "Considering your total clinical experience, how mentally ill is the patient at this time?" whereas the current version reads, "Considering your total clinical experience with this particular population, how mentally ill is the patient at this time?" As explained in the 1985 NIMH publication, this was done to make it very clear that the rating was designed to pertain only to the disease under study [4].

A third item, CGI-Therapeutic Efficacy, rarely used, was presented in the 1970 and 1976 manuals [12]; this item, which consists of a 4-by-4 matrix, specifically directed the researcher to plot drug-related improvement or worsening of the condition under study on the y-axis and drug-related adverse events or side effects on the x-axis. The intersecting point was interpreted as the risk/benefit ratio of efficacy to safety. This third measure was explicitly different from the CGI-S and CGI-I, in which improvement or worsening in condition was considered irrespective of whether the investigator believed it was drug-related and in which adverse events or side effects were not considered.
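As an illustration only, the two rating items described above can be encoded as a small data structure. The sketch below is not part of the CGI manuals; the names `CGI_S_ANCHORS`, `CGI_I_ANCHORS`, `validate_rating`, and `cgi_i_direction` are hypothetical, and only the anchor labels quoted in the text are included (the intermediate anchors are deliberately left out).

```python
# Anchor labels quoted in the text above; intermediate scale points are
# intentionally omitted because the text does not quote them.
CGI_S_ANCHORS = {
    1: "normal, not at all mentally ill",
    7: "among the most extremely ill patients",
}
CGI_I_ANCHORS = {
    1: "very much improved",
    4: "no change",
    7: "very much worse",
}


def validate_rating(value: int) -> int:
    """Return the rating unchanged if it is an integer from 1 to 7."""
    if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 7:
        raise ValueError("CGI ratings are integers from 1 to 7")
    return value


def cgi_i_direction(rating: int) -> str:
    """Classify a CGI-I rating relative to the 'no change' midpoint of 4."""
    rating = validate_rating(rating)
    if rating < 4:
        return "improved"
    if rating > 4:
        return "worse"
    return "no change"


if __name__ == "__main__":
    for r in (1, 4, 7):
        print(r, cgi_i_direction(r))
```

Note that on the CGI-I a numerically lower rating means greater improvement, which is easy to invert by mistake when analyzing the data.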
Thirty years after their publication, despite a variety of proposed revisions and modifications [5-8], the 1976 CGI-S and CGI-I continue to be widely included as efficacy data in FDA submissions. A March 2008 search of ClinicalTrials.gov, the online listing of clinical trials provided by the National Institutes of Health, identified 626 currently enrolling or recently completed studies that listed the CGI as an efficacy measure. In current usage, the CGI-S is often administered throughout the study, not just at baseline; furthermore, studies have expanded the importance of the CGI-S by requiring that a minimum CGI-S score (of 4, "moderate," for example) be present as a criterion for study entry. Thus, not only is the CGI currently used as an efficacy measure, it is also used to define the population in whom drug efficacy will be studied [9-12].

The importance of the CGI-S and CGI-I is not limited to the trials and approval process. Should a sponsor be allowed to market a drug for a specific indication, the CGI-S and CGI-I data that helped form the basis of approval are often then included as part of FDA-governed labeling claims for efficacy. Not infrequently, FDA-governed package inserts describe drug efficacy in terms of the percent of subjects on drug who were assigned a CGI-I rating of 1, "very much improved," or 2, "much improved." At present, CGI-S and CGI-I data are part of the package insert of all major classes of marketed psychotropics [13].

CGI-S and CGI-I data are also relied upon by the scientific community at large. An influential article in the Journal of the American Medical Association that examined published antidepressant drug and placebo response rates in publications across 2 decades characterized response as either a 50% baseline-to-end-point reduction on the Hamilton Depression Rating Scale or end point CGI-I ratings of 1, "very much improved," or 2, "much improved" [14]; this article further noted that such CGI-I classifications were routinely used to characterize treatment response.

Given the widespread use of the CGI-S and CGI-I as measures of efficacy of investigational agents for particular indications, as well as their apparent simplicity, we were often surprised to learn anecdotally that many active investigators were unclear as to whether they were to include in their efficacy ratings safety information and/or information concerning efficacy for other conditions. Based on discussions with many investigators, it seemed to us that these investigators did not understand the CGI and did not understand its role as an efficacy assessment. Instead, they seemed to be focusing on the term "global" and interpreting it to mean that all aspects of the subject's condition were to be considered in the rating, including those unrelated to efficacy of the drug for the condition under study.

Thus, for example, in a study of an investigational medication for the treatment of major depressive disorder, these investigators might assign a subject with a drug-related or drug-independent adverse event (side effect or physical illness, for example, upset stomach) a less favorable CGI-I rating than they would a subject with identical improvement in major depressive disorder who did not experience an adverse event. This is a misapplication of the efficacy measure and confounds efficacy with safety.
In this example, the FDA and, if marketed, the prescribing physician, would have an inaccurate picture of the actual efficacy of the agent for major depressive disorder. Furthermore, we suspected that many investigators included improvement or worsening of comorbid illnesses (that is, illnesses not under study) in their global ratings. For example, in a study to determine whether a given agent is efficacious for the indication of generalized anxiety disorder, such investigators might assign a subject with improvement in a comorbid condition, such as major depressive disorder, a more favorable generalized anxiety disorder CGI-I rating than they would a subject with identical generalized anxiety disorder improvement who did not have a comorbid illness that improved. Again, this would be a contamination of the process by which efficacy of the drug for the indication under study is determined. The problem did not seem to be limited to junior investigators. Even very senior CNS key opinion leaders with whom we spoke seemed to have widely discrepant views as to whether adverse events and non-indication illnesses belonged in CGI ratings.
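To make the confounding described in this Introduction concrete, the sketch below applies the two response definitions mentioned earlier (an end point CGI-I rating of 1 or 2, or a 50% baseline-to-end-point reduction on the Hamilton Depression Rating Scale) to two hypothetical patients. The function names and all patient values are invented for illustration and are not data from this study.

```python
def cgi_i_responder(cgi_i: int) -> bool:
    """Responder if the end point CGI-I rating is 1 or 2."""
    return cgi_i in (1, 2)


def hamd_responder(baseline: float, endpoint: float) -> bool:
    """Responder if the score fell by at least 50% from baseline to end point."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return (baseline - endpoint) / baseline >= 0.5


# Two hypothetical patients with identical improvement in the disease under
# study. The second rater also folded an unrelated adverse event (nausea)
# into the CGI-I and rated one point less favorably.
rating_indication_only = 2
rating_with_unrelated_ae = 3

print(cgi_i_responder(rating_indication_only))    # True
print(cgi_i_responder(rating_with_unrelated_ae))  # False
```

Under these assumptions, a single-point shift caused by indication-unrelated information is enough to move a patient from responder to nonresponder, even though the improvement in the studied condition is the same.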


Given the high rate of failure of many CNS investigational drugs to separate from placebo [15-18], the reduction of error in the measure of efficacy is a critical clinical trials priority. The present study was designed (a) to explore empirically whether experienced trials investigators were unclear about the information they were to consider when performing a CGI rating and (b) to explore empirically whether actual CGI ratings are affected by the presence of information unrelated to efficacy of the drug for the disease under study.

2. Method

2.1. Subjects

Potential subjects were 167 principal investigators actively engaged in industry-sponsored CNS clinical trials who had been trained on anxiety or depression efficacy scales by United BioSource Corporation (UBC, Wayne, Pa; formerly PharmaStar, a rater training company) within the past 4 years.

2.2. Procedure

Potential principal investigator subjects (PI subjects) were solicited by email for interest in participating in a CGI ratings project for which they would receive compensation. Consenting PI subjects were asked to provide CGI-S and CGI-I ratings for 3 detailed patient narratives. Two of the narratives described patients with major depressive disorder and the third described a patient with generalized anxiety disorder. The patients were described at a baseline visit and at a subsequent follow-up visit. The narratives described symptom scenarios typical of those seen and rated by investigators in actual baseline and follow-up visits with real patients. Symptoms of the disease under study (major depressive disorder or generalized anxiety disorder) were depicted as having improved in all 3 patients' follow-up visits. Principal investigator subjects were instructed to rate the CGI as they normally would were they to encounter such a patient in a clinical trial of an investigational agent for major depressive disorder or generalized anxiety disorder, respectively.
The narratives sent had been blindly randomized such that they either did (experimental) or did not (control) contain in the follow-up visits non-indication-relevant medical (nausea and dizziness) or psychiatric (compulsions in one and hallucinations in a second) adverse events in addition to the study-disease information. Lest narrative length affect severity or improvement ratings, narrative length across the 2 conditions was held constant via neutral filler words (such as next appointment date) in the control condition. Principal investigator subjects received only experimental or only control narratives.

Upon return of the scored narratives, PI subjects were sent a questionnaire that asked them to consider a typical placebo-controlled clinical trial of an investigational agent for major depressive disorder and indicate whether they typically would or would not include in their CGI-I and CGI-S ratings, respectively, a physical or psychiatric condition that was distinct from major depressive disorder and that in their judgment did not affect the patient's level of major depressive disorder symptoms. Principal investigator subjects were then asked to provide their estimated number of years' experience rating the CGI in anxiety and/or depression clinical trials, and their estimated number of CGI ratings made in the past year.

3. Results

Forty-five PI subjects (24 experimental and 21 control) returned the CGI narratives by the assigned deadline; 39 (87%) of the 45 then returned the follow-up survey. Principal investigator subjects were highly experienced CGI raters (overall self-reported mean years' experience conducting anxiety or depression CGI ratings = 11.59 years [SD = 6.67 years], overall self-reported mean number of anxiety or depression CGIs rated in the past 12 months = 349.74 ratings [SD = 521.68]). Mean years' self-reported CGI ratings experience and mean number of CGI ratings over the past year did not differ for control and experimental conditions, which is as expected given the blind randomization into conditions. Principal investigator subjects were physicians (MD or DO) in all but 1 case (PhD psychologist).

3.1. Survey: reported beliefs about CGI-I and CGI-S ratings

3.1.1. Unrelated psychiatric condition

As shown in the first 2 columns of Fig. 1, for the CGI-S and CGI-I, respectively, 23% and 28% of PI subjects responded that their ratings would be affected in a typical major depressive disorder trial by an unrelated psychiatric condition that did not affect the patient's level of major depressive disorder; 46% and 56%, respectively, said their ratings would not be affected; and 31% and 15%, respectively, said it would depend.
Write-in responses under the "depends" category indicated that the PI subjects were not sure or that the PI subjects would ask the study sponsor (pharmaceutical company).

3.1.2. Unrelated medical condition

As shown in the second 2 columns of Fig. 1, for CGI-S and CGI-I, respectively, 3% and 5% of PI subjects responded that their ratings would be affected in a typical major depressive disorder trial by an unrelated medical condition that did not affect the patient's level of major depressive disorder; 87% and 87%, respectively, said their ratings would not be affected; and 10% and 8%, respectively, said it would depend. Again, write-in responses under the "depends" category indicated that the PI subjects were unsure or that they would seek guidance from the study sponsor.

As shown by separate analyses of variance, "yes," "no," and "depends" responses did not differ for the physical or psychiatric CGI-S or CGI-I questions by PI subjects' years of CGI rating experience or number of CGI ratings over the past year.
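The survey percentages reported above can be converted back into approximate respondent counts as a quick consistency check. The sketch below assumes that all 39 survey respondents answered every question, which the text does not state explicitly; the names `N_RESPONDENTS`, `REPORTED_PERCENT`, and `implied_counts` are illustrative.

```python
N_RESPONDENTS = 39  # assumption: every returned survey answered every question

# Percentages as reported in sections 3.1.1 and 3.1.2.
REPORTED_PERCENT = {
    ("psychiatric", "CGI-S"): {"yes": 23, "no": 46, "depends": 31},
    ("psychiatric", "CGI-I"): {"yes": 28, "no": 56, "depends": 15},
    ("medical", "CGI-S"): {"yes": 3, "no": 87, "depends": 10},
    ("medical", "CGI-I"): {"yes": 5, "no": 87, "depends": 8},
}


def implied_counts(percentages: dict, n: int) -> dict:
    """Round each reported percentage to the nearest whole number of respondents."""
    return {answer: round(pct * n / 100) for answer, pct in percentages.items()}


if __name__ == "__main__":
    for question, percentages in REPORTED_PERCENT.items():
        counts = implied_counts(percentages, N_RESPONDENTS)
        print(question, counts, "total =", sum(counts.values()))
```

Under that assumption, the rounded counts for each of the four questions sum to 39, so the reported percentages are internally consistent with the number of returned surveys.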


Fig. 1. Principal investigator survey responses to the following question: would your CGI-S and CGI-I rating in a major depressive disorder trial be affected by an unrelated psychiatric or medical adverse event (AE)?

3.2. Clinical Global Impressions narratives: effect of adverse events on CGI-S and CGI-I efficacy ratings

Independent-samples t tests were used to compare the experimental and control CGI-S and CGI-I narrative ratings. Again, narratives were identical across conditions in depicting patients with improvement in the disease under study (major depressive disorder or generalized anxiety disorder) but differed in whether they did (experimental) or did not (control) add disease-unrelated adverse events.

As shown in Table 1, CGI-S ratings were significantly higher (more severe) when the described patient in a study of an investigational agent for major depressive disorder or generalized anxiety disorder was reported to have experienced an unrelated medical or psychiatric adverse event (experimental) than when the patient was not (control) (Ps < .03, .0001, .001, respectively). Clinical Global Impressions-Improvement ratings were statistically higher (less improved) when the described patient in a study of an investigational agent for generalized anxiety disorder was reported to have experienced an unrelated psychiatric adverse event (experimental) than when the patient was not (P < .04). Clinical Global Impressions-Improvement ratings were nonsignificantly higher (less improved) when the described patient in a study of an investigational agent for major depressive disorder was reported to have experienced an unrelated medical adverse event and tended to be higher (P < .10) when a second described patient in a study of an investigational agent for major depressive disorder was reported to have experienced an unrelated psychiatric adverse event.

Table 1
Effect of adverse events on CGI severity and improvement ratings

                                  Control (n = 21)   Experimental (n = 24)
                                  No AE added        AE added
                                  Mean (SD)          Mean (SD)               t (df)            P
CGI-S a
  Major depressive disorder
    Nausea/dizziness              2.19 (0.68)        2.71 (0.86)             t(42) = 2.255     .03
    New-onset compulsion          1.43 (0.60)        2.25 (0.53)             t(40) = 4.84      .0001
  Generalized anxiety disorder
    New-onset hallucination       1.8 (0.77)         2.9 (1.08)              t(37) = 4.18      .001
CGI-I b
  Major depressive disorder
    Nausea/dizziness              1.43 (0.60)        1.71 (0.86)             t(41) = 1.278     NS
    New-onset compulsion          1.09 (0.30)        1.29 (0.46)             t(39) = 1.692     .0978
  Generalized anxiety disorder
    New-onset hallucination       1.40 (0.60)        1.96 (1.04)             t(37) = 2.221     .04

a On a scale of 1 to 7, higher = more severe.
b On a scale of 1 to 7, higher = less improved.
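For readers who wish to see how an independent-samples t statistic is obtained from summary statistics such as those in Table 1, a minimal sketch follows. It assumes the classic pooled-variance (Student) form of the test, which the text does not specify, and the function name `pooled_t` is illustrative. Because the reported degrees of freedom (for example, 42) are smaller than 21 + 24 - 2 = 43, some ratings appear to have been missing; the per-comparison group sizes used below (21 and 23) are therefore an assumption, and the result should not be expected to reproduce the published value exactly, particularly as the published means and SDs are rounded.

```python
import math


def pooled_t(mean_control, sd_control, n_control, mean_exp, sd_exp, n_exp):
    """Return (t, df) for a two-sample Student t test from summary statistics."""
    df = n_control + n_exp - 2
    # Pooled variance weights each group's variance by its degrees of freedom.
    pooled_var = ((n_control - 1) * sd_control ** 2 + (n_exp - 1) * sd_exp ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n_control + 1 / n_exp))
    return (mean_exp - mean_control) / standard_error, df


if __name__ == "__main__":
    # First CGI-S row of Table 1 (nausea/dizziness in major depressive disorder).
    # Group sizes of 21 and 23 are assumed so that df matches the reported 42.
    t_stat, df = pooled_t(2.19, 0.68, 21, 2.71, 0.86, 23)
    print(f"t({df}) = {t_stat:.2f}")
```

A positive t here corresponds to higher (more severe) ratings in the experimental condition, which is the direction reported in the text.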

4. Discussion

The results of this study suggest that active clinical trials principal investigators are unclear about whether to include in CGI ratings safety information and efficacy information unrelated to the indication under study. Only about half of PI subjects felt certain that they would not include an unrelated psychiatric condition in their CGI-S and CGI-I ratings. Furthermore, the results suggest that the problem may actually change CGI ratings. In our analogue of the actual trials situation, efficacy ratings were statistically reduced when investigators were presented with side effects or fluctuations in non-indication-relevant diseases, which are routine clinical occurrences at clinical trials sites. Such ratings obscure efficacy signals and could result in incorrectly concluding that an efficacious drug is nonefficacious.

The findings suggest the need for better training of investigators and sponsors as to the task of the investigator when performing ratings on the CGI-S and CGI-I efficacy end points. It is incumbent on sponsors to ensure that investigators are fully trained and adhere to all aspects of protocols, and this is mandated by law for studies under FDA Investigational New Drug applications [19]. The FDA and ultimately the public may be misled by imprecise efficacy data that measure improvement or worsening in nonstudied conditions or that measure the tolerability of a drug rather than its efficacy. It is not appropriate and not valid for CGI ratings to include safety events or efficacy outcomes for other indications.

The findings also give cause for some concern with respect to the use of the CGI-S as an inclusion criterion. Clearly, investigators need to be educated as to what precisely they are and are not to consider when formulating the severity rating. The rating is designed to measure severity only of the disease under study and should not include other conditions. Doing otherwise changes the study population and makes it impossible to fully characterize who has been included in the trial and to whom the results will ultimately generalize. We found the severity of major depressive disorder and generalized anxiety disorder to be rated differently (more severely) when an unrelated psychiatric condition was presented as part of the clinical picture.

A limitation of the study is its small sample size, which raises the possibility of type II errors: some findings that were nonsignificant might have proved statistically significant had the sample been larger. Replicating the study with a larger sample is an important area of future attention and may identify additional effects of extraneous events on CGI ratings.

Despite the widespread use of the CGI-S and CGI-I, there is no written guidance for sponsors or investigators as to the rules surrounding these scales with respect to safety and efficacy for nonindication conditions; anecdotally, many investigators report repeatedly asking about this from trial to trial and being told widely discrepant information by sponsor medical leads, with the usual response being something along the lines of, "use your best clinical judgment." Clearly, this results in haphazard application of conventions. Sometimes CGI ratings may include disease other than that under study and sometimes they may be strictly confined to only the disease under study. Similarly, sometimes CGI ratings may include adverse events the investigator believes are caused by the drug under study, sometimes CGI ratings may include adverse events the investigator believes are unrelated to the drug under study, and sometimes CGI ratings may not include adverse events at all.

In clinical trials, patients routinely experience adverse events and/or fluctuations in comorbidities. In fact, these adverse event occurrences are the norm rather than the exception. It is our belief that the CGI ought only to reflect efficacy of the study drug in treating the symptoms of the disease under study. Regulatory bodies expect to see efficacy data distinct from adverse event data. There are 2 questions: does the drug work, and what side effects are associated with it? The CGI is a measure of whether the drug works for the condition under study. There are other mechanisms to determine if the drug is safe and other trials to determine the drug's efficacy in other conditions.
It is of interest that although 87% of our sample of investigators reported they would not include a physical adverse event in their overall CGI-S rating of major depressive disorder, we saw statistically significantly higher major depressive disorder severity ratings in the experimental manipulation portion of the study when the patient was described as having become dizzy and nauseated than when the patient was not. Thus, in some cases, knowledge of what to include in a CGI-S rating alone may not be sufficient. Further interventions such as specific training with well-defined scoring criteria and practice scoring of scenarios with feedback may be needed to assist some investigators in changing their ratings behavior.

Sponsors who submit CGI-S and CGI-I ratings as efficacy data must make explicit in their protocols the parameters the CGI-S and CGI-I are intended to cover and the means by which sponsors will ensure that investigators are properly collecting such data. To this end, we suggest that investigators be trained in the purposes and uses of the CGI, and, specifically, in understanding what should and what should not be included when investigators formulate a CGI rating.

References
[1] Food and Drug Administration. Guidance for industry: E9 statistical principles for clinical trials; 1998.
[2] Guy W. Clinical global impressions. In: Guy W, editor. ECDEU assessment manual for psychopharmacology (Revised). Rockville (Md): National Institute of Mental Health; 1976. p. 217-21.
[3] Guy W, Bonato RR. Clinical global impression. In: Guy W, Bonato RR, editors. Manual for the ECDEU Assessment Battery. Chevy Chase (Md): National Institute of Mental Health; 1970.
[4] Rapoport J, Conners CK. Clinical global impressions. In: Rapoport J, Conners CK, editors. Rating scales and assessment instruments for use in pediatric psychopharmacology research (special issue), 21(4). Psychopharmacol Bull; 1985. p. 839-43.
[5] Beneke M, Rasmus W. Clinical global impressions (ECDEU): some critical comments. Pharmacopsychiatry 1992;25:171-6.
[6] Haro JM, Kamath SA, Ochoa S, et al. The Clinical Global Impression-Schizophrenia scale: a simple instrument to measure the diversity of symptoms present in schizophrenia. Acta Psychiatr Scand 2003;107(Suppl 416):16-23.
[7] Kadouri A, Corruble E, Falissard B. The improved Clinical Global Impression Scale (iCGI): development and validation in depression. BMC Psychiatr 2007;7:7.
[8] Spearing MK, Post RM, Leverich GS, Brandt D, Nolen W. Modification of the Clinical Global Impressions scale for use in bipolar illness (BP): the CGI-BP. Psychiatry Res 1997;73:159-71.
[9] Schmidt ME, Fava M, Shuyu Z, et al. Treatment approaches to major depressive disorder relapse. Psychother Psychosom 2002;71(4):190-4.
[10] Green AI, Tohen MF, Hamer RM, et al. First episode schizophrenia-related psychosis and substance use disorders: acute response to olanzapine and haloperidol. Schizophr Res 2004;66:125-35.
[11] Lieberman JA, Tollefson G, Tohen M, et al. Comparative efficacy and safety of atypical and conventional antipsychotic drugs in first-episode psychosis: a randomized double-blind trial of olanzapine versus haloperidol. Am J Psychiatry 2003;160:1396-404.
[12] Kinrys G, Vasconcelos e Sa D, Nery F. Adjunctive zonisamide for treatment refractory anxiety. Int J Clin Pract 2007;61(6):1050-3.
[13] Physicians' Desk Reference 2007. Montvale (NJ): Thompson PDR; 2007.
[14] Walsh BT, Seidman SN, Sysko R, Gould M. Placebo response in studies of major depression: variable, substantial, and growing. J Am Med Assoc 2002;287:1840-7.
[15] Mallinckrodt CH, Meyers AL, Prakash A, Faries DE, Detke MJ. Simple options for improving signal detection in antidepressant clinical trials. Psychopharmacol Bull 2007;40(2):101-14.
[16] Khan A, Kolts RL, Thase ME, Krishnan KRR, Brown W. Research design features and patient characteristics associated with outcome of antidepressant clinical trials. Am J Psychiatry 2004;161:2045-9.
[17] Fava M, Evins AE, Dorer DJ, Schoenfeld DA. The problem of the placebo response in clinical trials for psychiatric disorders: culprits, possible remedies, and a novel study design approach. Psychother Psychosom 2003;72:115-27.
[18] Khan A, Schwartz K, Kolts RL, Ridgway D, Lineberry C. Relationship between depression severity entry criteria and antidepressant clinical trial outcomes. Biol Psychiat 2007;62:65-71.
[19] Code of Federal Regulations. Title 21, Food and Drug, Volume 5, Revised as of April 1, 2007, Part 312, Investigational new drug application, subpart D, responsibilities of sponsors and investigators, section 312.50, general responsibilities of sponsors. United States Federal Government.
