Stephen Marrin
Post-revision draft, 18 July 2011
Original draft submitted to Intelligence and National Security on 4 February 2011. Accepted for publication on 24 May 2011 pending minor revision.

Evaluating the Quality of Intelligence Analysis: By What (Mis) Measure?

Dr. Stephen Marrin is a Lecturer in the Centre for Intelligence and Security Studies at Brunel University in London. He previously served as an analyst with the Central Intelligence Agency and US Government Accountability Office. Dr. Marrin has written about many different aspects of intelligence analysis, including new analyst training at CIA's Sherman Kent School, the similarities and differences between intelligence analysis and medical diagnosis, and the professionalization of intelligence analysis. In 2004 the National Journal profiled him as one of the ten leading US experts on intelligence reform.

Abstract: Each of the criteria most frequently used to evaluate the quality of intelligence analysis has limitations and problems. When accuracy and surprise are employed as absolute standards, their use reflects unrealistic expectations of perfection and omniscience. Scholars have adjusted by exploring the use of a relative standard consisting of the ratio of success to failure, most frequently illustrated using the batting average analogy from baseball. Unfortunately even this relative standard is flawed in that there is no way to determine either what the batting average is or should be. Finally, a standard based on the decisionmaker's perspective is sometimes used to evaluate the analytic product's relevance and utility. But this metric, too, has significant limitations. In the end, there is no consensus as to which is the best criterion to use in evaluating analytic quality, reflecting the lack of consensus as to what the actual purpose of intelligence analysis is or should be.

Evaluating the quality of intelligence analysis is not a simple matter. Frequently quality is defined not by its presence but rather by its absence.
When what are popularly known as intelligence failures occur, sometimes attention focuses on flaws in intelligence analysis as a contributing factor to that failure. But a closer look at the intelligence studies scholarship reveals that rather than a single meaning of the term "failure" there are instead many meanings, each reflecting an implicit assumption regarding the purpose of intelligence analysis.

Some presume the purpose of intelligence analysis is providing assessments that are accurate, and as a result characterize inaccuracy as failure. An example of the accuracy criterion would be the inaccurate US intelligence estimates regarding Iraqi WMD, subsequently described as an intelligence failure. Others presume the purpose of intelligence analysis is preventing surprise, and characterize surprise as failure. An example of the surprise criterion would be the 1998 India nuclear tests, which were a surprise to US decisionmakers and subsequently described as an intelligence failure. Yet others presume that the purpose of intelligence analysis is to influence policy for the better, and characterize lack of influence as failure. Sometimes these failures are described as policy failures instead of intelligence failures. An example of the lack of influence criterion would be CIA's analysis during the Vietnam War, or perhaps CIA's assessments of post-2003 Iraq conflict scenarios after the conventional portion of the war was over, which may have been accurate but not sufficiently compelling to lead to a change in decision or policy.

So which criterion is most effective as a way to evaluate the quality of intelligence analysis? No one knows, precisely because it depends on how one defines the purpose of intelligence analysis, and as of yet no consensus has developed on that issue. Nonetheless, all three criteria--accuracy, preventing surprise, and influence on policy--are employed in retrospective evaluations of intelligence agency performance even though each has significant limitations and problems.[1]

Accuracy: The Unattainable Ideal

One way to evaluate intelligence analysis is according to an accuracy standard. This is an easy-to-understand, black-and-white absolute standard: the analysis is either accurate, or it is not. When the analysis is not accurate, that must mean that there has been an intelligence failure because the purpose of intelligence analysis is to be accurate and it has failed to achieve that objective. However, while using accuracy as an evaluative criterion is simple in theory, actually comparing the analysis to ground truth and determining whether the analysis was accurate or inaccurate can be very difficult in practice.

First, there is the presence of qualifiers in the analysis. Uncertainty is part of the intelligence production process. Intelligence collected rarely provides analysts with a complete picture of what is occurring in a foreign country. When a CIA analyst taps into all the various data streams the U.S. government funnels into its secure communication system, the first impression is of an overwhelming amount of information about all kinds of topics. When precise information is desired, such as the condition of a foreign country's weapons of mass destruction (WMD) program, a CIA analyst cobbles together bits and pieces of information to form a picture or story and frequently discovers many gaps in the data.
As a result, an intelligence analyst's judgment frequently rests on a rickety foundation of assumptions, inferences and educated guesses. Caveats and qualifiers are necessary in finished intelligence as a way to communicate analytic uncertainty. Intelligence agencies would be performing a disservice to policy-makers if their judgments communicated greater certainty than the analysts possessed. Not making every analytic call correctly is just part of the territory in the broader process of governmental learning, but accurately reflecting uncertainties when they exist is crucial.

Unfortunately, caveats also complicate assessments of intelligence accuracy. Words such as "probably," "likely" and "may" are scattered throughout intelligence publications and prevent easy assessment of accuracy. For example, if CIA analysts had said Iraq "probably" had weapons of mass destruction, was that analysis accurate or inaccurate? There is no way to tell, given the use of the word "probably," which qualified the statement to incorporate the analysts' uncertainty. Removing caveats for the sake of simplicity in assessing intelligence accuracy also unfairly removes the record of analytic uncertainty and, in the end, assesses something with which the analyst never would have agreed.

Timeframes raise a similar problem. For example, if an analyst says that a coup is likely to occur in a foreign country within six months, and the coup attempt happens 12 months later, would that analysis be accurate or inaccurate? There is no easy way to tell. It was accurate in that a coup occurred, but inaccurate on the timeframe. The determination of accuracy, then, may depend on whether one is most concerned about the occurrence of the event or the timeframe in which it took place. In addition, the analytic judgment could be considered neither completely accurate nor completely inaccurate; it is somewhere in between. It is for this reason that then-Director of Central Intelligence George Tenet said, "In the intelligence business, you are almost never completely wrong or completely right."[2] Therefore, the use of accuracy as a metric to evaluate intelligence analysis must be done very, very carefully.

In addition, even if accurate analysis were produced, a self-negating prophecy resulting from analysis produced within a decision cycle could occur. This means that intelligence analysis can help change what may happen in the future, making the analysis inaccurate. Since intelligence analysis can influence what decisionmakers decide to do, and what they do has the potential to prompt or preclude actions of other international actors, an accuracy yardstick would not effectively capture the quality of the analysis. For example, if an intelligence analyst warns that a terrorist bombing is imminent, policymakers implement security procedures based on this warning, and the terrorists are deterred, then the warning will be inaccurate even though it helped prevent the bombing. This causal dynamic exists for all intelligence issues, including political, economic, and scientific ones, due to the nature of the intelligence mission. Therefore, post-hoc assessment of intelligence accuracy may not provide a true sense of the accuracy of the intelligence.

It is precisely because of these practical difficulties of using accuracy as a criterion that neither the Office of the Director of National Intelligence's analytic integrity and standards staff nor the Undersecretary of Defense for Intelligence uses it as a metric for analytic quality.
While accuracy, or perhaps omniscience, is the desired goal of all intelligence analysts, using it as a criterion for reliably evaluating analytic quality is at this point not feasible.[3]

Preventing Surprise

Like accuracy, the prevention of decisionmaker surprise is an absolute standard for evaluating analytic quality.[4] By describing, explaining, evaluating, and forecasting the external environment, intelligence analysts facilitate decisionmaker understanding to the point that decisionmakers are not surprised by the events that take place. When decisionmakers are surprised, by definition there must have been an intelligence failure since it failed to achieve its objective: preventing surprise.

The problem with this expectation, of course, is that surprise is ever present in international relations.[5] Many surprises are the intentional result of adversaries who employ secrecy to hide their intentions. Secrecy in policy creation and implementation magnifies the effectiveness of power application internationally because, when done successfully, the intended target has little or no time to effectively counter the respective policy. In military terms, this power magnification is known as a "force multiplier," although the concept is applicable to the economic and political arenas as well. Secrecy has thus become a ubiquitous technique in the implementation of most international policies as a way to ensure policy success through surprise. Accordingly, preventing surprise has become just as necessary and is usually assigned to intelligence organizations because of their ability to uncover secrets.

Not all surprises, however, can be prevented by uncovering secrets. Sometimes international forces can produce spontaneous events that surprise everyone involved, such as the fall of the Berlin Wall. These are the "mysteries" emphasized in some writings on intelligence.[6]

The question is whether intelligence agencies can reasonably be expected to provide this kind of warning given the obstacles they face. According to Christopher Andrew, "Good intelligence diminishes surprise, but even the best cannot possibly prevent it altogether. Human behavior is not, and probably never will be, fully predictable."[7] Richard Betts, in his article on the inevitability of intelligence failure, suggests that policies should be implemented in such a way as to be able to withstand the inevitability of surprise, with "tolerance for disaster."[8]

Surprise, then, may be inevitable, but equating surprise with intelligence failure is problematic. Some surprises are due more to, as Betts puts it, "a lack of response to warning than to a failure to warn."[9] He goes on to say that intelligence services will be judged to have failed if they do not provide actionable tactical warning of the events, even if they provide ample strategic warning of their rising probability.[10] As a result, some surprises are failures of policy rather than failures of intelligence. Nonetheless, if a surprise occurs intelligence agencies will probably be held responsible regardless of whether the surprise was due to their inability to uncover and correctly interpret secrets, divine an unknowable mystery, or convince a decisionmaker to act.

But perhaps the problem is with the absolute standard in the first place. Perhaps the bifurcation of outcome into accurate versus inaccurate is both overly simplistic and unrealistic as a way to determine the success or failure of intelligence analysis.

Betts and Shifting the Standard from Absolute to Relative

Richard Betts has made the greatest contributions to shifting the evaluative metric from the unattainable ideal of accuracy to something more realistic with his argument that intelligence failures, consisting of either inaccuracy or surprise, are inevitable.
Betts's argument is a sophisticated one which acknowledges that (1) the analytic task is really difficult; and (2) anything done to fix or reform perceived problems will lead to other problems. Betts is primarily responding to earlier efforts to eliminate failure by identifying causes of inaccuracy or surprise and then trying to eliminate them one by one. Many causes of failure have been identified, including an individual analyst's cognitive limitations, and as a result analysis is subject to many pitfalls--biases, stereotypes, mirror-imaging, simplistic thinking, confusion between cause and effect, bureaucratic politics, group-think, and a host of other human failings, according to Ron Garst and Max Gross.[11] Many efforts to identify causes of failure then proceed to produce recommendations for ways to eliminate them.

Betts, on the other hand, does not believe failure can be eliminated. According to Betts, failure results from paradoxes of perception that include the impossibility of perfect warning, and the distortion of analysis due to motivated biases resulting from organizational or operational goals.[12] Another source of analytic inaccuracy, according to Betts, is a byproduct of the inherent ambiguity of information, which is related to the limitations of intelligence analysis due to the inductive fallacy which Klaus Knorr highlighted in 1964.[13] Finally, Betts also suggests that ambivalence of judgment, or the inherent uncertainty of the analyst and use of caveats, also complicates achievement of analytic accuracy.[14] According to Betts, the paradoxes of perception lead to unresolvable trade-offs and dilemmas that are essentially irreparable since fixing one kind of problem will lead to the development of a new one.

One way to explain Betts's argument to those unfamiliar with it is through the text of a 1964 children's book titled Fortunately.[15] It tells a story in an iterative way, and goes something like this: "Fortunately, Ned was invited to a surprise party. Unfortunately, the party was a thousand miles away. Fortunately, a friend loaned Ned an airplane. Unfortunately, the motor exploded. Fortunately, there was a parachute in the airplane. Unfortunately, there was a hole in the parachute. Fortunately there was a hay stack under him. Unfortunately there was a pitch fork in the hay. Fortunately he didn't land on the pitch fork. Unfortunately he didn't land on the hay!"

Betts's argument about intelligence failures is very similar to this story about Ned, where every good thing that can be done to prevent intelligence failures leads to a bad thing that causes a different kind of failure. There may always be an "unfortunately" to every "fortunately," and failure will be inevitable.

Betts's conclusion that failure will be inevitable has become the consensus among intelligence scholars, as a 1996 group study published by the Consortium for the Study of Intelligence noted.[16] If this consensus interpretation is accurate, then continuing to evaluate individual intelligence failures in order to understand their causes and implement reforms that prevent all future failures will be the equivalent of pursuing a will-o'-the-wisp; specific causes of failure may change, but failure itself will be ever-present.

Due to the pervasive presence of "unfortunately" and the argument that failure is inevitable, students sometimes interpret Betts as a pessimist.
But Betts sees himself as a realist heading towards an optimist precisely because his goal is to preempt disillusionment, backlash, and a drop in public support for intelligence activity that could result from dashed hopes if more boldly ambitious reform initiatives founder.[17] While his core argument is that failure may be inevitable, this can be reframed as "perfection is unattainable"--a truism if there ever was one--and that efforts to achieve perfection are likely to fail and therefore not worth pursuing. Yet at the same time Betts says that failure can become less frequent on the margins and also has recommendations for how policymakers can make the inevitable failures less costly or significant. In other words, Betts concludes his discussion of failure by suggesting, perhaps optimistically, that outcomes in the future can be better than they have been in the past. Betts does not believe that the presence of inaccuracy or surprise alone equates to failure of the enterprise, but rather that the creation of standards or expectations for performance should be based on a more sophisticated calculation of the ratio between success and failure.

The Batting Average Metaphor

Metaphors from baseball are frequently employed by scholars to frame the evaluation of intelligence performance precisely because many useful inferences can be derived from them.[18] For example, the difference between the fielding percentage, where most anything less than perfection is an error, and a batting average, which provides more room for error without condemnation,[19] highlights the importance of the standard used to evaluate relative performance.
In addition, the use of the batting average metric also makes it clear that success is relative to an opposing force in the context of a competition where the fates of the batters will, as Betts says, "depend heavily on the quality of the pitching they face."[20] The fact that relative success or failure is contingent on the skill of the opposition has clear parallels in the world of intelligence.

Most importantly, however, the batting average metaphor provides us with a notional sliding scale of proficiency. Glenn Hastedt suggests that while accuracy is an intuitively appealing standard, lending itself to an easy-to-convey scoreboard counting method to assess the performance of intelligence analysts, the use of accuracy as a standard also raises the questions: How accurate must one be? What is an acceptable batting average?[21] Betts says that while there is a limit to how high the intelligence batting average will get because perfection is unattainable, that is not to say that it cannot get significantly better.[22]

In 1964 Klaus Knorr suggested that rather than eliminate inaccuracy or surprise a more realistic goal would be to improve the "batting average"--say, from .275 to .301.[23] Betts further develops this idea when he suggests that a batter may improve his average by changing his stance or swing, and such changes are worth making even if the improvement is small. Raising a few players' averages from .275 to .290 is an incremental improvement, but it could turn out to make the difference in whether the team finishes the season in second place or in first.[24] But even if success rates do increase, Betts goes on to say that even a .900 average will eventually yield another big failure.

The inevitability of big failures can best be illustrated by extending the metaphor to discuss the relative importance or significance of each at bat. In baseball, each at bat in the spring and early summer is less important than it is in the fall. But nothing comes close to the significance of the World Series, particularly the last inning of the seventh game. Yet even then, while not desired, failure at those highly significant moments is still normal. It is possible to strike out in the last at bat of the World Series, and great batters have at times had great failures. Or, in the intelligence context, even the best intelligence systems will have big failures, according to Betts.[25]

That is not to say that striking out in the last at bat of the World Series is acceptable, though. The batting average metaphor also provides us with implicit normative standards that we can use to evaluate each individual success or failure.
As the late CIA officer Stanley Moskowitz pointed out, if the bases are loaded in the bottom of the ninth of the seventh game of the World Series, and our .300 hitter strikes out, you bet fans will say he failed--or worse.[26] As Betts suggests, a batter who strikes out is certainly at fault for failing to be smarter or quicker than the pitcher. Whether the batter should be judged incompetent depends on how often he strikes out against what quality of pitching. A batter with a .300 average should easily be forgiven for striking out occasionally, while one who hits .150 should be sent back to the minors.[27]

One major flaw in the use of the batting average metaphor, though, is that the pre-set normative expectations for performance drawn from baseball may not correspond well to the evaluation of intelligence analysis. In baseball, below .200 is bad and above .350 is very good. So what is the appropriate batting average for intelligence analysis? We do not know. According to Richard Posner, no one has any idea of what it actually is or should be.[28]

Perhaps another metaphor might be useful to introduce here. Another way to illustrate the ratio of success to failure without the pre-set expectation of the batting average analogy is through the use of the glass half-full/glass half-empty metaphor. There is a saying that an optimist sees a glass as half full whereas a pessimist might see the glass as half empty. If the full part of the glass represents intelligence accuracy (or success), Betts--on seeing that glass--would probably say that it can never be 100% full. Others who highlight analytic accuracy and its role in preventing surprise would suggest that even though the glass might not be full, it is far from empty. For illustrative purposes, this analogy is both more general and more universal than the batting average one, and perhaps easier to use when explaining the issues and problems in evaluating intelligence analysis over time.

But how full is the glass, or what is the ratio of success to failure? And how full should the glass be--or what should the ratio between success and failure be--given the difficulties of intelligence analysis and tradeoffs involved? A judicious exploration of the history of intelligence analysis over time should be able to provide us with answers to these questions.

Trying to Establish a Baseline from History

Intelligence analysis history should provide us with a baseline for evaluating performance over time, and for deriving standards or expectations about current and future performance. When we look to the past, however, we see primarily failure. Pearl Harbor. China's involvement in Korea. Vietnam. Yom Kippur. The fall of the Shah in Iran. The end of the Cold War. 9/11. Iraq WMD. Compared to these failures, the few successes that are referenced seem less important. Perhaps there is some truth to the saying that in intelligence, failures become public while successes are kept secret. If this is the case, how then do we evaluate the quality of intelligence analysis over time? Is it even possible?

Evaluating the literature on intelligence analysis history seems like it should be useful, but as soon as we look to history we discover conflicting interpretations of the analytic track record. For example, both former Deputy Director of Central Intelligence Richard Kerr and former CIA analyst Richard Russell evaluate CIA's analytic performance over time, but come to opposing conclusions about its relative accuracy or quality. Kerr looks at CIA history and provides us with a positive interpretation highlighting analytic success, while Russell looks at the same CIA history and provides us with a negative interpretation highlighting analytic failures.[29] When read side-by-side, these conflicting perspectives provide us with more questions than answers. Using the batting average analogy, one focuses on the hits and the other focuses on the outs.
Or to use the glass analogy, one argues the glass is partly full, which is true, and the other argues the glass is partly empty, which is also true. So both perspectives are to some degree correct. But which is more correct? Which is more useful for establishing a ratio of success to failure based on historical events?

In terms of the specific differences between Kerr's and Russell's interpretations, the key to understanding their different perspectives could very easily be their own professional histories. Kerr is a career-long CIA analyst who rose through the ranks to become Deputy Director of Central Intelligence and most likely believes in the value of the organization and its contribution to policy. He also has a vested interest in representing its track record in the most positive light possible. On the other hand, Russell is a former CIA analyst who left the CIA to pursue another career outside the Agency. Like John Gentry before him, Russell can be portrayed as a disgruntled idealist--a true believer--who encountered the real world in all of its messiness and imperfection, and rejected that world as not as good as it could or should be. The evaluations of intelligence agency performance from these kinds of disgruntled idealists are very critical and sometimes very bitter, not because they hate or dislike the organization but because they believe in it so much that they have difficulty accepting its imperfections and problems. So the fact that one observer looks to the past and sees success whereas another looks to the past and sees failure could be entirely derivative of the observer's background.

But the real issue at hand is bigger than this one difference of opinion between former CIA analysts regarding the CIA's analytical track record. Instead, more fundamentally, it is: what can we learn from history?
And how can we learn from history if different authorities come away from their study of history with different interpretations of the relative accuracy of analytic assessments? How do we adjudicate between different interpretations of the same history? Is it possible to evaluate history objectively? Or is our evaluation of history a social construct, shaped by our own individual histories, unique or idiosyncratic understandings, and biases? Is the historical track record of intelligence analysis good or bad? On what basis can someone make that normative determination?

And what is the value, if any, of studying this historical record? Are there lessons that can be drawn from the historical record to improve intelligence analysis in the future by preventing the mistakes or problems of the past? Or is that effort to prevent recurrence of failure doomed due to the tradeoffs that Betts mentions; that we might be able to avoid the specific mistakes of the past only to make different ones in the future? Will there be an "unfortunately" to every "fortunately"? In the end, what is the value of lessons from the past? Better understanding if not better performance? Maybe a more realistic expectation for what intelligence can do and what it cannot?

In 1998, Professors James Blight and David Welch made a stab at answering a series of questions very much like these.
They suggest that retrospective evaluation--though imperfect--can be useful if we have a particular understanding in mind of how we can benefit from it.[30] As they go on to point out, though, a combination of three temptations frequently prevents useful evaluation of historical performance: the temptation to focus on the specific and the spectacular (a form of selection bias); the temptation to privilege hindsight; and the temptation to try to evaluate performance in terms of a rate of success.[31] Blight and Welch argue that our study of history has a selection bias in the direction of failure; that the actual track record or ratio of success to failure is impossible to calculate because the large majority of cases remain classified; that hindsight bias shapes our subsequent evaluation in the direction of finding fault in mistaken analytic interpretations that may not have been preventable; and that the effort to determine a rate of success is in fact fatally flawed.[32]

Blight and Welch also cite Klaus Knorr's 1964 use of the batting average metaphor, but call its use misguided as well as meaningless and unworkable.[33] Should partial credit be given when the outcome deemed likely rather than certain occurs? Should the analyst lose credit if the outcome deemed unlikely but not impossible occurs? If the analysis contains a judgment of likelihood for an outcome that is conditional on something else occurring first, but the conditional event did not develop as expected, does that count, as Blight and Welch ask, as two at-bats - one hit and one out for a batting average of .500? Or was the second estimate simply part of the first - a swing-and-a-miss during the one at-bat of consequence, so to speak - for an average of 1.000?[34] What about situations where the analyst forecasts that an event will not take place? As Blight and Welch ask, how do we score this? Do we count one successful at-bat for every day that the (forecasted event doesn't take place)? Every week? Every month?
Every three months? Our decision will certainly have dramatic implications for (the resulting) batting average.[35] Blight and Welch conclude that there is no non-arbitrary, rigorous way to determine a historical batting average or intelligence performance in general, even in retrospect.[36]

CIA historian Woodrow Kuhns comes to the same conclusion as Blight and Welch when he highlights that no one knows what the actual ratio of success to failure in intelligence analysis is.[37] He points out that Abbot Smith, former chief of the Board of National Estimates, once noted that, 'It would seem reasonable to suppose that one could get a truly objective, statistical verdict on the accuracy of estimates. Go through the papers, tick off the right judgments and the wrong ones, and figure the batting average. I once thought that this could be done and I tried it, and it proved to be impossible.'[38] As Kuhns points out, Smith guessed that to judge the track record of just 20 years' worth of NIEs a researcher would need to check the accuracy of no fewer than 25,000 judgments.[39] As Kuhns concludes, the bottom line is that no one inside the intelligence community - or outside of it, for that matter - knows what the 'batting average' is when it comes to analysts' forecasts.[40]

Even Richard Betts, the most high-profile user of the batting average metaphor, agrees that it is quite impossible to compute an intelligence batting average in reality.[41] According to Betts, inability to compute a systematic average means that judgment about the success-to-failure ratio can only be subjective and utterly unreliable by the standards of a statistician.[42] So even if there is theoretical value in framing intelligence analysis performance using a relative success to failure ratio, actually determining either the real or optimal ratio will be impossible. Therefore, another set of evaluative criteria may be necessary.

Using Decisionmakers' Evaluative Framework

As a general criticism of the field, the conventional approach to thinking about the intersection of intelligence analysis and decisionmaking is inadequate. What has been labeled the standard model portrays intelligence analysis as a precursor to and foundation for policy decisions, perhaps because of the prevalence of the intelligence cycle in the literature.[43] In this standard model of the intersection between intelligence analysis and decisionmaking, intelligence analysts provide information to decisionmakers who then use that information in the course of deciding which policy option to pursue. The reality, though, is very different. Intelligence analysis is regularly ignored by decisionmakers, and frequently has limited to no impact on the decisions they make.
As a result, a new kind of theory or model more effectively explaining what happens at the intersection of intelligence analysis and decisionmaking has been developed. It conceptualizes the purpose of intelligence as to ensure that decisionmakers' power is used as effectively and efficiently as possible, with the purpose of intelligence analysis being to integrate and assess information as a delegated, subordinate, and duplicative step in a decisionmaking process.44 This conceptualization privileges the role of the decisionmaker in the assessment process over that of the analyst, thus turning the standard model on its head.

Unfortunately, this emphasis on the significance of the decisionmaker in evaluating the analytic product has not been universally embraced by scholars, practitioners, or the general public. Instead, more simplistic measures such as accuracy or surprise tend to dominate the discussion of intelligence performance as a way of characterizing failures. Yet evaluating intelligence analysis using the decisionmaker's perspective could be important since, as Kuhns suggests, the decisionmaker is the only person whose opinion really matters.45 If decisionmakers find the analysis informative, insightful, relevant or useful, then the intelligence analysis has succeeded, whereas if the decisionmakers are left unsatisfied then the analysis has failed.

Intelligence analysis can be evaluated based on the decisionmaker's perception of its relevance, influence, utility or impact. First, there is intelligence analysis that is relevant to decisionmakers. Relevance is usually determined by the conceptual frameworks possessed by decisionmakers; intelligence analysis that may be relevant to one decisionmaker may be deemed irrelevant to another even though they may have similar substantive responsibilities. However, even if intelligence analysis is considered to be relevant to the decisionmaker, that does not guarantee the analysis will either influence the decisionmaker's judgment or be found useful. As a result, relevance is necessary but not sufficient to indicate impact on decisionmaker judgment and subsequent decisionmaking.

Second, there is intelligence analysis that is influential in terms of shaping or influencing the decisionmaker's judgment on a particular issue. Analysis that is influential has to be relevant by definition, but not all relevant analysis is influential. Former DCI and current Secretary of Defense Robert Gates has observed that the influence and effectiveness of intelligence analysis depends on whether its assessments are heeded, whether its information is considered relevant and timely enough to be useful, and whether the (analysts) relationship with policymakers, from issue to issue and problem to problem, is supportive or adversarial.46

Third, there is intelligence analysis that is useful--which also has to be relevant by definition--and could indicate analysis that is either useful in the sense of improving judgment (i.e. influential) or useful in the sense of achieving policy outcomes, or both. For example, both cherry-picking and backstopping47 describe intelligence analysis that is relevant and useful, but not necessarily influential since the analysis is used to support policies already chosen rather than influence the choice. Well before any recent controversy, Harold Ford, a longtime CIA officer and former vice chairman of the National Intelligence Council, pointed out that 'Decision-makers think intelligence is great when it can be used to support or help sell their own particular positions. Conversely, they are free to ignore intelligence, or deem it no damn good when (as in the case of Vietnam) it does not support or, worse still - seems to question policy-making wisdom.'48 How this dynamic plays out in the real world is best illustrated with the quote from Paul Pillar that the Bush administration used intelligence not to inform decisionmaking, but to justify a decision already made. It went to war without requesting -- and evidently without being influenced by -- any strategic-level intelligence assessments on any aspect of Iraq.49

Asking for feedback from decisionmakers may be a way to evaluate the analysis's relevance, influence on judgment or utility, but doing so can be fraught with peril. Policymaker satisfaction with intelligence analysis is a notoriously fickle and idiosyncratic metric. Decisionmakers may not be satisfied with intelligence analysis if it conflicts with their own biases, assumptions, or policy preferences, or conveys information that indicates a policy may be failing. For example, during the Vietnam War US national security policymakers were not happy or satisfied with pessimistic CIA analysis of the efficacy of the war effort despite the general accuracy of such assessments.50 Therefore, the use of consumer satisfaction as a proxy metric for the value of intelligence analysis can lead to confusion between analytic quality and its confluence with policy agendas, and would require laborious sifting of data to ensure that the data represented intelligence quality instead of some other dynamic.

Finally, there is impact or influence on the policy itself. This is the most difficult to measure, since the influence of the intelligence analysis independent of all other possible influences on decisionmaking is very difficult to evaluate.
The difficulties of doing so lead to inherent uncertainty about the influence of the analytic product on the final decision, and as a result some have decided this criterion is not appropriate for evaluating intelligence analysis. As Sherman Kent once said: 'A certain amount of all this worrying we do about influence upon policy is off the mark. For in many cases, no matter what we tell the policymakers, and no matter how right we are and how convincing, he will upon occasion disregard the thrust of our findings for reasons beyond our ken.'51 Kent went on to suggest that CIA's analytic goal should be to produce intelligence that is relevant within the area of our competence, and above all...credible.52

Some contributions to the intelligence literature explore decisionmakers' perception of the value of intelligence analysis. In one article, Ford looked at the National Intelligence Estimates over time and observed that from the consumer's perspective strategic estimates were not as useful to decisionmaking as many might assume. As Ford puts it, the purpose of intelligence has long been based on the philosophy that intelligence, once carefully collected, analyzed, and disseminated, will give policy-makers a more accurate picture of the world and help them determine the most rewarding courses to pursue. That philosophy (rests on an assumption) that the making and execution of policy are largely matters of reason.53 Unfortunately, as Ford goes on to say, reason is not the only factor that drives policymaking. Instead, all kinds of forces go into their making of policy, not excluding timidity, ambition, hubris, misunderstandings, budgetary ploys, and regard for how this or that policy will play in Peoria.54 Ford's analysis highlights an important point: if the relative success or failure of intelligence analysis depends on the extent to which decisionmakers appreciate it, then intelligence analysis fails much more frequently than many would think.
Richard Immerman, too, has argued that intelligence analysis has had little impact on strategic decisionmaking to the point that instances of the formulation of national security policy, or grand strategy, hinging on intelligence collection and analysis are few and far between--if they exist at all.55

Clarifying Purpose and Improving on the Margins

In the end, there is no single metric or standard used to evaluate intelligence analysis, and different people use different standards. This highlights an even more significant issue: different standards are used because there is no consensus in either the practitioners' or scholars' camp regarding the purpose of intelligence analysis. Some believe that the purpose of intelligence analysis is to be accurate; others believe the purpose is to prevent surprise; while yet others believe the purpose is to be influential or useful. If the intelligence analysis does not meet any of these criteria, then failure is the descriptor that is frequently used. But the fact that different kinds of failures really represent different normative visions of what intelligence analysis is supposed to accomplish is not acknowledged by most participants.

Despite the fact that intelligence analysis has existed as a function of government for decades, both practitioners and scholars have failed to develop a consensus on or even acknowledge differences of opinion regarding exactly what it is supposed to do. If the failure is determined to be inaccuracy, is the implicit expectation perfection? If the failure is one of surprise, is the implicit expectation omniscience? If the failure is one of lack of influence, to what degree is that more of a policy failure than an intelligence failure? These are questions that both scholars and practitioners should make explicit when they discuss the quality of intelligence analysis and the causes of intelligence failure.
Perhaps the goal of policy should be to improve intelligence analysis across the board by improving accuracy, preventing surprise, and increasing the value of the product for the decisionmaker. But even this will not eliminate failures altogether. As Betts has said, echoing Knorr before him, we should focus on improving performance on the margins--raising the batting average by 50 points, or raising the level of liquid in the glass--not achieving perfection or omniscience. A model for how to do this may be found in medicine.56

In medicine similar limitations apply to the use of accuracy and other absolute metrics and evaluative criteria. In many cases accuracy would be difficult to establish even in hindsight, and in most cases a track record of success to failure would be difficult if not impossible to establish. As Jerome Groopman from the Harvard Medical School has suggested, no one can expect a physician to be infallible because medicine is, at its core, an uncertain science. Every doctor makes mistakes in diagnosis and treatment.57 But he then goes on to suggest that the frequency of those mistakes, and their severity, can be reduced by understanding how a doctor thinks and how he or she can think better. This emphasis on individual analytic cognition may provide a model for how to go about evaluating analytic quality on a case-by-case basis, just as it is done in medicine.58

Finally, understanding and improving intelligence analysis may also require clarifying what we believe the purpose of intelligence analysis is, or what the purposes of intelligence analysis are, and how to best achieve them. Rather than focusing on and studying failure, perhaps trying to achieve success will do more to improve the quality of intelligence analysis than trying to eliminate failure.

1 While the following evaluation of these criteria is drawn from and illustrated using cases from the United States experience with intelligence, the argument is intentionally generalized to all intelligence analysis across space and time.
2 George Tenet, DCI Remarks on Iraq's WMD Programs as Prepared for Delivery. February 5, 2004.
3 For more on the difficulties of using accuracy as an evaluative criterion, see: Richards J. Heuer, Jr., The Evolution of Structured Analytic Techniques. Presentation to the National Academy of Science, National Research Council Committee on Behavioral and Social Science Research to Improve Intelligence Analysis for National Security, Washington, DC, December 8, 2009. 4-5.
4 A portion of this discussion on preventing surprise is drawn from Stephen Marrin. Preventing Intelligence Failures By Learning From the Past. International Journal of Intelligence and Counterintelligence. Vol. 17. No. 4. (Oct-Dec 2004). 655-672.
5 For an in-depth discussion of the causes of surprise, see James J. Wirtz. Theory of Surprise. Paradoxes of Strategic Intelligence: Essays in Honor of Michael I. Handel. (Eds. Richard K. Betts and Thomas G. Mahnken) Frank Cass Publishers. 2003. 101-116.
6 See especially Gregory F. Treverton, Reshaping National Intelligence for an Age of Information (New York and Cambridge: Cambridge University Press, 2001). For an earlier version of the secrets versus mysteries discussion, see Joseph S. Nye Jr. Peering into the Future. Foreign Affairs. Vol. 73. No. 4. (July-August 1994). 82-93.
7 Christopher Andrew, For the President's Eyes Only: Secret Intelligence and the American Presidency from Washington to Bush. HarperPerennial: USA, 1995. 538.
8 Richard K. Betts. Analysis, War, and Decision: Why Intelligence Failures Are Inevitable. World Politics. Vol. 31, no. 2 (Oct. 1978). (61-89) 89.
9 Richard K. Betts. Enemies of Intelligence: Knowledge and Power in American National Security. Columbia University Press. 2007. 184.
10 Betts, Enemies of Intelligence, 185.
11 Ronald D. Garst and Max L. Gross, On Becoming an Intelligence Analyst. Defense Intelligence Journal Vol. 6, No. 2 (1997): 48.
12 Betts. Analysis, War, and Decision.



13 Klaus E. Knorr. Foreign Intelligence and the Social Sciences. Research Monograph No. 17. Center of International Studies. Woodrow Wilson School of Public and International Affairs. Princeton University. June 1, 1964. 23.
14 Betts. Analysis, War, and Decision.
15 Remy Charlip. Fortunately. Parents Magazine Press. 1964.
16 The Future of US Intelligence: Report Prepared for the Working Group on Intelligence Reform. Consortium for the Study of Intelligence. 1996. 29.
17 Betts, Enemies of Intelligence, 184.
18 For example, see: John Hedley, Learning from Intelligence Failures, International Journal of Intelligence and Counterintelligence, 18: 435-450, 2005; Paul R. Pillar, Intelligent Design? The Unending Saga of Intelligence Reform, Foreign Affairs, March/April 2008 (citing Betts); Robert Mandel. Distortions in the Intelligence Decision-Making Process. Intelligence and Intelligence Policy in a Democratic Society. (Ed. Stephen J. Cimbala). Transnational Publishers, Inc. Dobbs Ferry, NY. 1987. 69-83; Mark M. Lowenthal. The Burdensome Concept of Failure. Intelligence: Policy and Process (Eds. Alfred C. Maurer, Marion D. Tunstall, and James M. Keagle). Boulder, CO: Westview. 1985. 43-56; Mark M. Lowenthal. Towards a Reasonable Standard for Analysis: How Right, How Often on Which Issues? Intelligence and National Security. Vol. 23, No. 3 (Jun. 2008): 303-315.
19 As Robert Jervis points out, if we were right something like one time out of three we would be doing quite well. Robert Jervis, Improving the Intelligence Process: Informal Norms and Incentives. Intelligence: Policy and Process. Eds. Alfred C. Maurer, Marion D. Tunstall and James M. Keagle. Westview Press. Boulder and London. 1985. (113-124.) 113.
20 Betts, Enemies of Intelligence, 65.
21 Glenn Hastedt, (2009) 'Intelligence Estimates: NIEs vs. the Open Press in the 1958 China Straits Crisis', International Journal of Intelligence and CounterIntelligence, 23: 1, 104-132.
22 Richard K. Betts, Fixing Intelligence. Foreign Affairs. V. 81. No. 1 (Jan/Feb 2002). (43-59). 59.
23 Klaus E. Knorr, Failures in National Intelligence Estimates: The Case of the Cuban Missiles. World Politics, Vol. 16, No. 3 (Apr., 1964), 455-467. 460.
24 Betts, Enemies of Intelligence, 186.
25 Betts, Fixing Intelligence, 59.
26 Stanley Moskowitz, Intelligence in Recent Public Literature: Uncertain Shield: The U.S. Intelligence System in the Throes of Reform by Richard A. Posner. Studies in Intelligence. Vol. 50. No. 3. (Sept 2006). Accessible online at:
27 Betts, Enemies of Intelligence, 185-186.
28 Richard A. Posner, Preventing Surprise Attacks: Intelligence Reform in the Wake of 9/11. Rowman and Littlefield Publishers, Inc. 2005. p. 106. Kuhns made a similar point when he answered the question 'just how bad is it out there?' by saying 'the short answer is, no one knows.' Kuhns, 82.
29 Richard L. Russell, Sharpening Strategic Intelligence: Why the CIA Gets It Wrong and What Needs to Be Done to Get It Right. Cambridge University Press. 2007; Richard J. Kerr, The Track Record: CIA Analysis from 1950 to 2000. Analyzing Intelligence: Origins, Obstacles, and Innovations. (Eds. Roger Z. George and James B. Bruce) Georgetown University Press. Washington DC. 2008. 35-54.
30 James G. Blight and David A. Welch, The Cuban Missile Crisis and Intelligence Performance. Intelligence and the Cuban Missile Crisis. (Eds. James G. Blight and David A. Welch). Frank Cass, London. 1998. (173-217). 189.
31 Blight and Welch, 190.
32 Blight and Welch, 190-191.
33 Blight and Welch, 201 and 192.
34 Blight and Welch, 192.
35 Blight and Welch, 193.
36 Blight and Welch, 189, 193.
37 Woodrow J. Kuhns, Intelligence Failures: Forecasting and the Lessons of Epistemology. Paradoxes of Strategic Intelligence: Essays in Honor of Michael I. Handel. Ed. Richard K. Betts and Thomas G. Mahnken. Frank Cass Publishers, 2003. 82.
38 Abbot E. Smith, On the Accuracy of National Intelligence Estimates, Studies in Intelligence, vol. 13, no. 3 (Fall 1969), p. 25. As cited in Kuhns, page 82.
39 Kuhns, 82.
40 Kuhns, 83.
41 Betts, Enemies of Intelligence, 187.
42 Betts, Enemies of Intelligence, 188.



43 Stephen Marrin. Intelligence Analysis and Decisionmaking: Methodological Challenges. Intelligence Theory: Key Questions and Debates. (Eds. Peter Gill, Stephen Marrin, and Mark Phythian). Routledge. 2008. 131-150.
44 Stephen Marrin. Intelligence Analysis Theory: Explaining and Predicting Analytic Responsibilities. Intelligence and National Security. 22:6 (December 2007). 821-846. Also see Stephen Marrin. At Arm's Length or at the Elbow?: Explaining the Distance between Analysts and Decisionmakers. International Journal of Intelligence and Counterintelligence 20, no. 3 (Fall 2007): 401-414.
45 Kuhns, 82.
46 Robert M. Gates, The CIA and American Foreign Policy. Foreign Affairs. Vol. 66 No. 2. (Winter 1987/88). (215-230). 216.
47 Cherry-picking occurs when decisionmakers take data or analysis out of context to make decisions or justify decisions previously reached. A variant of cherry-picking is backstopping which involves, according to Roger Hilsman, the intelligence analyst's mechanical search for facts tending to support a policy decision that has already been made in order to protect the decisionmaker by supplying him with facts to defend his position. Hilsman goes on to observe that decisionmakers seemed most pleased with backstopping. See Roger Hilsman. Intelligence and Policy-Making in Foreign Affairs. World Politics. April 1953. (1-45). 6.
48 Ford, 51.
49 Paul R. Pillar. Intelligence, Policy, and the War in Iraq. Foreign Affairs. Vol. 85 No. 2. (Mar-Apr 2006). 15-27.
50 Ford, 39.
51 Sherman Kent. Estimates and Influence. Studies in Intelligence. Vol. 12, Issue 3. (Summer 1968). 11-21.
52 Kent, Estimates and Influence.
53 Harold P. Ford, The US Government's Experience with Intelligence Analysis: Pluses and Minuses. Intelligence Analysis and Assessment. (Eds. David Charters, A. Stuart Farson, and Glenn P. Hastedt). Frank Cass; London. 1996. (34-53), 50.
54 Ford, 50.
55 Richard H. Immerman. Intelligence and Strategy: Historicizing Psychology, Policy and Politics. Diplomatic History. Vol. 32. No. 1. (January 2008). (1-23). 7.
56 Medicine and medical diagnosis is frequently referenced in the intelligence literature as an analogous domain. See Stephen Marrin and Jonathan Clemente. Improving Intelligence Analysis by Looking to the Medical Profession. International Journal of Intelligence and Counterintelligence. Vol. 18. No. 4. (2005) 707-729; Stephen Marrin and Jonathan Clemente. Modeling an Intelligence Analysis Profession on Medicine. International Journal of Intelligence and Counterintelligence. Vol. 19. No. 4. (Winter 2006-2007). 642-665; and Josh Kerbel. Lost for Words: The Intelligence Community's Struggle to Find Its Voice. Parameters 38, no. 2 (Summer 2008): 102-112.
57 Jerome E. Groopman, How Doctors Think, New York, Houghton Mifflin Company: 2007. 7.
58 For an example of this kind of emphasis, see: Uri Bar-Joseph and Rose McDermott. Change the Analyst and Not the System: A Different Approach to Intelligence Reform. Foreign Policy Analysis. Volume 4, Issue 2. April 2008. 127-145.