A White Paper providing context, tips and use cases on the topic of analysis over large quantities of data.
Inside:
Apache Hadoop and Cloudera
Intelligence Community Use Cases
Context You Can Use Today
CTOlabs.com
Executive Summary
Intelligence analysis is all about dealing with Big Data: massive collections of unstructured information. The Intelligence Community already works with far more data than it can process, and it continues to collect more through new and evolving sensors, open-source intelligence, better information sharing, and continued human information gathering. More information is always better, but to make use of it, analysis needs to keep pace through innovations in data management.
combined with information such as coordinates and times, and subjects identified in videos. Hadoop could then effectively sort and search through the new multi-level and multi-media intelligence almost instantly despite the amount and type of material generated.
Hadoop's low cost and speed open up other intelligence capabilities. Hadoop clusters have often been used as data sandboxes to test new analytics cheaply and quickly, to see whether they yield results and can be widely implemented. For analysts, this would mean that they could test theories and algorithms even when they have a low probability of success, without much wasted time or resources, allowing them to be more creative and thorough. This in turn helps prevent the failures of imagination blamed for misreading the intelligence before the September 11 attacks and several subsequent plots.
Hadoop is also well suited for evolving analysis techniques such as Social Network Analysis and textual analysis, both of which are being aggressively developed by intelligence agencies and contractors. Social Network Analysis uses human interactions, such as phone calls, text messages, meetings, and emails, to construct and decipher a social network, identifying leaders and the key nodes for linking members, linking groups, getting exposure, and contacting other important members. Rarely are these members the figureheads in the media or even the stated leadership of terrorist and criminal organizations. Social Network Analysis helps identify high-value targets to exploit or eliminate, but for sizable organizations it must handle a tremendous amount of data: thousands of interactions of varying types among thousands of members and associates. That scale makes Hadoop an excellent platform. Some projects, such as Klout, which applies Social Network Analysis to social media to determine user influence, style, and role, already run on Hadoop.
Hadoop has also been proven as a platform for textual analysis.
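As a rough illustration of the Social Network Analysis idea described above, the following sketch builds a graph from interaction records and scores each member by betweenness centrality, one common measure of how often a member sits on the shortest path between others. It is a single-machine toy, not a Hadoop job and not any agency's method; the record layout, the names `build_graph` and `betweenness`, and the sample interactions are all assumptions made for the example.

```python
from collections import defaultdict, deque

def build_graph(interactions):
    """Build an undirected adjacency map from (person_a, person_b, kind) records."""
    graph = defaultdict(set)
    for a, b, _kind in interactions:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def betweenness(graph):
    """Betweenness centrality via Brandes' algorithm (unweighted, undirected)."""
    score = {node: 0.0 for node in graph}
    for source in graph:
        stack = []                                # nodes in order of discovery
        preds = {node: [] for node in graph}      # predecessors on shortest paths
        paths = {node: 0 for node in graph}       # number of shortest paths from source
        dist = {node: -1 for node in graph}
        paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:                              # breadth-first search
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        delta = {node: 0.0 for node in graph}
        while stack:                              # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints, so halve the totals.
    return {node: s / 2 for node, s in score.items()}

# Hypothetical interaction records: two tight groups joined only through C and D.
interactions = [
    ("A", "B", "call"), ("A", "C", "email"), ("B", "C", "text"),
    ("C", "D", "meeting"),
    ("D", "E", "call"), ("D", "F", "email"), ("E", "F", "text"),
]
scores = betweenness(build_graph(interactions))
ranked = sorted(scores, key=scores.get, reverse=True)
```

In this toy network, members C and D score highest because every path between the two groups passes through them, which is the kind of linking role the text describes. At the scale of thousands of members, the per-source searches are independent of one another, which is one reason such a computation could plausibly be spread across a cluster.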
Large text repositories such as chat rooms, newspapers, or email inboxes are Big Data, expansive and unstructured, and hence well suited for analysis using Hadoop. IBM's Watson, which beat human contestants on Jeopardy!, has been the most prominent example of the power of textual analysis, and ran on Hadoop. Watson was able to look through and interpret libraries of text to form the right question for the answers presented in the game, from history to science to pop culture, faster and more accurately than the human champions it faced. Textual analysis has value beyond game shows, however, as it can be used on forums and correspondence to analyze sentiment and find hidden connections to people, places, and topics.
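To make the idea of finding connections in text more concrete, here is a minimal sketch in the map, shuffle, and reduce style that Hadoop's MapReduce model uses. It counts how often pairs of tracked terms appear in the same message. Everything runs in one Python process purely for illustration; the function names, the tracked terms, and the sample messages are hypothetical, and a real Hadoop job would distribute the map and reduce steps across a cluster rather than call them directly.

```python
import re
from collections import defaultdict
from itertools import combinations

TOKEN = re.compile(r"[a-z']+")

def map_phase(message, tracked_terms):
    """Emit ((term_a, term_b), 1) for each pair of tracked terms in one message."""
    present = sorted(set(TOKEN.findall(message.lower())) & tracked_terms)
    for pair in combinations(present, 2):
        yield pair, 1

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Sum the counts emitted for one term pair."""
    return key, sum(values)

def cooccurrence(messages, tracked_terms):
    """Run the three phases in sequence and return {(term_a, term_b): count}."""
    tracked = {term.lower() for term in tracked_terms}
    emitted = (kv for message in messages for kv in map_phase(message, tracked))
    return dict(reduce_phase(k, v) for k, v in shuffle(emitted).items())

# Hypothetical messages and watch terms, invented for this example.
messages = [
    "Meet Omar at the harbor on Friday",
    "Omar said the harbor shipment is late",
    "Friday works for the market",
]
counts = cooccurrence(messages, {"omar", "harbor", "market", "friday"})
```

Here the pair `("harbor", "omar")` is counted twice, once for each of the first two messages. Because each message is mapped independently and each key is reduced independently, the same logic scales out naturally when the messages number in the millions.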
Sqoop: Enabling the import and export of SQL to Hadoop.
Flume: A distributed, reliable and available service for efficiently collecting, aggregating and moving large amounts of streaming data.
Oozie: A workflow engine to enhance management of data processing jobs for Hadoop. Manages dependencies of jobs between HDFS, Pig and MapReduce.
Zookeeper: A very high performance coordination service for distributed applications.
Hue: A browser-based desktop interface for interacting with Hadoop. It supports a file browser, job tracker interface, cluster health monitor and many other easy-to-use features.
More Reading
For more use cases for Hadoop in the intelligence community visit:
CTOvision.com - a blog for enterprise technologists with a special focus on Big Data.
CTOlabs.com - the repository for our research and reporting on all IT issues.
Cloudera.com - providing enterprise solutions around CDH plus training, services and support.