Introduction
Data has become a torrent flowing into every area of the global economy. Companies churn out a burgeoning volume of transactional data, capturing trillions of bytes of information about their customers, suppliers, and operations. Social media sites, smartphones, and other consumer devices, including PCs and laptops, have allowed billions of individuals around the world to contribute to the amount of big data available. Each second of high-definition video, for example, generates more than 2,000 times as many bytes as required to store a single page of text.
Big data is not defined in terms of being larger than a certain number of terabytes. As technology advances over time, the size of datasets that qualify as big data will also increase. The definition can also vary by sector, depending on what kinds of software tools are commonly available and what sizes of datasets are common in a particular industry. Big data in many sectors today ranges from a few dozen terabytes to multiple petabytes (thousands of terabytes).
Respondents were asked to choose up to two descriptions of how their organizations view big data from the choices above. Choices have been abbreviated, and selections have been normalized to equal 100%. Total respondents = 1,144. Source: IBM Analytics: The real-world use of big data, 2012.
1993: The W3Catalog, the World Wide Web's first primitive search engine, is released.
1995: Sun releases the Java platform; the Java language itself was first invented in 1991.
1997: A paper on visualization is published that discusses the challenges of working with datasets too large for the computing resources at hand, an early use of the term "big data."
1998: Carlo Strozzi develops an open-source relational database and calls it NoSQL. Google is founded.
2001: Tim Berners-Lee, inventor of the World Wide Web, coins the term "Semantic Web," a vision of machine-to-machine interaction in which computers become capable of analyzing all the data on the Web. Wikipedia is launched.
2002: In the wake of the Sept. 11, 2001, attacks, DARPA begins work on its Total Information Awareness system.
2003: The amount of digital information created by computers and other data systems in this single year surpasses the amount of information created in all of human history prior to 2003, according to IDC and EMC studies.
2005: Apache Hadoop, destined to become a foundation of government big data efforts, is created.
2008: The number of devices connected to the Internet exceeds the world's population.
2011: IBM's Watson scans and analyzes 4 terabytes (200 million pages) of data in seconds to defeat two human players on Jeopardy! Work begins on UnQL, a query language for NoSQL databases.
2012: The Obama administration announces the Big Data Research and Development Initiative, consisting of 84 programs across six departments. The National Science Foundation publishes "Core Techniques and Technologies for Advancing Big Data Science & Engineering." IDC and EMC estimate that 2.8 zettabytes of data will be created in 2012; the same report predicts that by 2020 the digital world will hold 40 zettabytes.
In 2012, big data sits at the Peak of Inflated Expectations stage of the Gartner Hype Cycle; by 2013, big data is expected to fall into the Trough of Disillusionment stage.
Data warehouse: A specialized database optimized for reporting, often used for storing large amounts of structured data. Data is uploaded using ETL (extract, transform, and load) tools.
Distributed system: Multiple computers, communicating through a network, used to solve a common computational problem.
Dynamo: A proprietary distributed data storage system developed by Amazon.
Hadoop: An open-source (free) software framework for processing huge datasets on certain kinds of problems on a distributed system. Its development was inspired by Google's MapReduce and Google File System.
HBase: An open-source (free), distributed, non-relational database modeled on Google's BigTable.
Mashup: An application that uses and combines data, presentation, or functionality from two or more sources to create new services.
Metadata: Data that describes the content and context of data files, e.g., means of creation, purpose, time and date of creation, and author.
Non-relational database: A database that does not store data in tables of rows and columns (in contrast to a relational database).
Relational database: A database made up of a collection of tables (relations), i.e., data is stored in rows and columns.
Semi-structured data: Data that does not conform to fixed fields but contains tags and other markers to separate data elements.
SQL: Originally an acronym for Structured Query Language, SQL is a computer language designed for managing data in relational databases.
Stream processing: Technologies designed to process large real-time streams of event data.
Visualization: Technologies for creating images, diagrams, or animations to communicate a message, often used to synthesize the results of big data analyses.
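The MapReduce model that inspired Hadoop can be illustrated with a minimal word-count sketch in plain Python. The map, shuffle, and reduce steps below are a simplified, single-machine stand-in for what a real cluster distributes across many nodes; the input lines are invented for illustration:

```python
from collections import defaultdict

# Hypothetical input: a few lines of text standing in for a large dataset.
lines = [
    "big data is big",
    "data flows in streams",
]

# Map step: emit a (word, 1) pair for every word in every line.
mapped = [(word, 1) for line in lines for word in line.split()]

# Shuffle step: group the emitted pairs by key (the word).
groups = defaultdict(list)
for word, count in mapped:
    groups[word].append(count)

# Reduce step: sum the counts for each word.
counts = {word: sum(vals) for word, vals in groups.items()}

print(counts)  # {'big': 2, 'data': 2, 'is': 1, 'flows': 1, 'in': 1, 'streams': 1}
```

In Hadoop proper, the map and reduce functions run on different machines and the shuffle moves data between them over the network; the logic, however, is the same.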
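To make the relational-database and SQL entries concrete, here is a small sketch using Python's built-in sqlite3 module. The customers table and its rows are hypothetical; the point is that data lives in rows and columns and SQL queries operate on those tables:

```python
import sqlite3

# An in-memory SQLite database stands in for a full relational database.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# A relational table: rows and columns with fixed fields.
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT)")
cur.executemany(
    "INSERT INTO customers (name, country) VALUES (?, ?)",
    [("Ada", "UK"), ("Grace", "US"), ("Alan", "UK")],
)
conn.commit()

# A simple SQL query: count customers per country.
rows = cur.execute(
    "SELECT country, COUNT(*) FROM customers GROUP BY country ORDER BY country"
).fetchall()
print(rows)  # [('UK', 2), ('US', 1)]
```

A non-relational database, by contrast, would store these records without a fixed table schema, for example as documents or key-value pairs.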
Tech start-ups/app developers: partnering for new revenue streams.
Finance: B2B supplier profiling, fraud detection, credit scoring.
30 billion pieces of content are shared on Facebook every month.
40% projected growth in global data generated per year vs. 5% growth in global IT spending.