
Big Data: A Revolution That Will Transform How We Live, Work and Think

By Viktor Mayer-Schönberger & Kenneth Cukier

The main ideas I retained after reading the book are:

CORRELATION: Knowing what, not why, is good enough.


An example: Amazon is interested in knowing which books are often bought together. Amazon's
algorithms may pick up that people who buy books by Hemingway also tend to buy books by Scott
Fitzgerald. Amazon doesn't care why people buy those books together; it is only interested in what
people are buying together. The reason why people tend to like both of these authors is unimportant;
just the fact that there is a correlation between the two is relevant.
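
As a minimal illustration of this "what, not why" style of analysis, the sketch below (hypothetical order data and my own function names, not Amazon's actual system) simply counts how often pairs of titles appear in the same order and surfaces the strongest co-purchase pairs:

```python
from collections import Counter
from itertools import combinations

# Hypothetical order histories: each order is the set of books bought together.
orders = [
    {"The Sun Also Rises", "The Great Gatsby"},
    {"The Sun Also Rises", "The Great Gatsby", "Tender Is the Night"},
    {"The Old Man and the Sea", "The Great Gatsby"},
    {"The Old Man and the Sea", "Moby-Dick"},
]

pair_counts = Counter()
for order in orders:
    # Count every unordered pair of titles that co-occur in one order.
    for pair in combinations(sorted(order), 2):
        pair_counts[pair] += 1

# The "what": which titles are bought together most often. No "why" is needed.
for pair, count in pair_counts.most_common(3):
    print(count, pair)
```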

Big Data is about what, not why. Knowing why might be pleasant, but it's unimportant (e.g. for
driving sales at Amazon). The correlations show what, not why, but often knowing what is good
enough.
=> Is this the death of causality? Will cause and effect lose their relevance? If knowing why is not
important anymore, what will that do to human curiosity? And without curiosity, how do we advance?

DATAFICATION: the difference between digitization and datafication. To datafy a phenomenon is
to put it in a quantified format so it can be tabulated and analysed. This is very different from
digitization, the process of converting analogue information into the zeros and ones of binary code
so computers can handle it. Measuring reality and recording data thrived because of a combination
of the tools and a receptive mind-set. That combination is the rich soil from which modern
datafication has grown.
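
A toy way to see the distinction (my own illustration, not from the book): digitization turns a page of text into bytes a computer can store, while datafication turns the same page into a quantified table that can be analysed:

```python
import string
from collections import Counter

page = "Data is not information. Information is not knowledge."

# Digitization: the analogue page becomes the zeros and ones a computer can store.
digitized = page.encode("utf-8")
print(digitized[:16])

# Datafication: the same content becomes a quantified, tabulated form we can analyse.
words = page.lower().translate(str.maketrans("", "", string.punctuation)).split()
print(Counter(words).most_common(3))
```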

Next, a detail about the value of data: unlike material things - the food we eat, a candle that burns -
data's value does not diminish when it is used: it can be processed again and again.

Since data is an asset that is hard to value, the discrepancies between book value and market value
will only increase. Take Facebook: its book value by accounting standards on the day of its IPO was
$6.3 billion, but the market valued Facebook that day at $104 billion. Most of the difference can be
explained by the data that Facebook has on its users.

BIG DATA VALUE CHAIN: the concept of the three-step Big Data value chain:
- Owning the data.
- Having the skills to work the data and do the analysis.
- Having the creative ideas for new uses of data. These people have the "Big Data mind-set".
Part Two

The main problem with Big Data is that it does not take big data seriously enough. Although the authors
have pedigree (Editor at the Economist; Professor at Oxford) this is not an academic text: it belongs to
that category of popular essays that attempt to stimulate debate. Anyone who works with data (e.g.
technologists, scientists, politicians, consultants) or questions what will be born of our age of data
affluence may have high expectations for this book; unfortunately, it falls short of providing any real
answers.

The book paints an impending revolution in mighty strokes. The authors claim the impact of data-
driven innovations will advance the march of humankind. What they end up presenting is a thin
collection of happy-ending business stories: flight-fare prediction, book recommendation, spell
checkers and improved vehicle maintenance. It's too bad the book's scientific champion, Google Flu
Trends, a tool which predicts flu rates through search queries, has proven so fallible. Last February it
forecast almost twice the number of cases reported by the official count of the Centers for Disease
Control and Prevention.

Big data will certainly affect many processes in a range of industries and environments; however, this
book gestures at an inevitable social revolution in knowledge making ("god is dead"), for which I do
not find coherent evidence.

The book correctly points out that data is rapidly becoming the raw material of business. Many
organisations will tap into the new data affluence, the outcome of a long historical process that
includes datafication (which I'll define later) and the diffusion of technologies that have tremendously
reduced the costs involved in data production, storage and processing.

So, where's the revolution? The book argues for three rather simplistic shifts.

The first shift: the new world is characterised by far more data. The authors say that, just as a
movie emerges from a series of photographs, increasing amounts of data matter because
quantitative changes bring about qualitative changes. The technical equivalent in big data is the ability
to survey a whole population instead of just sampling random portions of it.

The second shift is that looking at vastly more data also permits us to loosen up our desire for
exactitude. Apparently, in big data, with less error from sampling we can accept more measurement
error. According to the authors, science is obsessed with sampling and measurement error as a
consequence of coping in a small data world.

It would be amazing if the problems of sampling and measurement error really disappeared when
you're stuffed silly with data. But context matters, as Microsoft researcher Kate Crawford cogently
argues in her blog. It is easy to treat samples as "n=all" as data get closer to full coverage, yet
researchers still need to account for the representativeness of their sample. Consider how the digital
divide - some people are on the Internet, others are not - affects the data available to researchers.
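
A small simulation (my own, with made-up numbers) makes Crawford's point concrete: even a dataset covering most of a population can give a confidently wrong answer if the people who are missing - say, those not on the Internet - differ systematically from those who are included:

```python
import random

random.seed(0)

# Hypothetical population: 30% are offline, and (by assumption) offline people
# have the flu at a higher rate than online people.
population = (
    [{"online": True,  "has_flu": random.random() < 0.05} for _ in range(70_000)]
    + [{"online": False, "has_flu": random.random() < 0.15} for _ in range(30_000)]
)

true_rate = sum(p["has_flu"] for p in population) / len(population)

# "n = all" of the *observable* data still only covers the online users.
observed = [p for p in population if p["online"]]
observed_rate = sum(p["has_flu"] for p in observed) / len(observed)

print(f"true flu rate:    {true_rate:.3f}")     # around 0.08
print(f"'n=all' estimate: {observed_rate:.3f}") # around 0.05, biased despite the huge n
```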

While a missed prediction does not cause much damage if it is about book recommendations on
Amazon, a similar error when doing policy making through big data is potentially more serious.
Crawford reminds us that Google Flu Trends failed because of measurement error. In big data, data
are proxies of events, not the events themselves. Google Flu Trends cannot distinguish with certainty
people who have the flu from people who are just searching about it. Google may tune its predictions
on hundreds of millions of mathematical modelling exercises using billions of data points, but volume
is not enough. What matters is the nature of the data points, and Google has apples mixed with
oranges.

The third and most radical shift implies we won't have to be fixated on causality - the idea of
understanding the reasons behind all that happens. This is a straw man argument. The traditional
image of science the authors discuss (fixated with causality, paranoid about exactitude) conflates
principles with practices. Correlational thinking has been driving a lot of processes and institutional
behaviours in the real world. Nevertheless, "Felix, qui potuit rerum cognoscere causas" ("Fortunate is
he who was able to know the causes of things") - which happens to be the motto of the LSE - is still
bedrock in Western political life and philosophy. The authors cannot dismiss causation so cavalierly.

However, it appears that they do. Big data, they say, means that the social sciences have lost their
monopoly on making sense of empirical data, as big-data analysis replaces the highly skilled survey
specialists of the past. The new algorithmists will be experts in the areas of computer science,
mathematics, and statistics; and they would act as reviewers of big data analyses and predictions.
This is an odd claim given that the social sciences are thriving precisely because expert narratives are
a necessary component of how data becomes operational. This book is a shining example that big data
speaks the narrative that experts give it. What close observers know is that even at the most granular level
of practice, analytic understanding is necessary when managers attempt to implement these systems in
the world.

The book is blinded by its strongest assumption: that quantitative analysis is devoid of qualitative
assessment. For the authors, to datafy is merely to put a phenomenon in a quantified format so it can
be tabulated and analysed. Their argument - that mathematics gave new meaning to data, as it could
now be analysed, not just recorded and retrieved - implies that analysis begins only after phenomena
get reduced to quantifiable formats. Human judgement is just an inconvenience of a "small data"
world that has no role in the process of making data. This is why they warn that in the impending
world of big data, "there will be a special need to carve out a place for the human."

It is hard to see how imagination and practical context will suddenly cease to play a fundamental role
in innovation. But innovation could definitely be jeopardised if big data systems are not recognized
for what they are: tools for optimising resource management. Big data may not be an instrument of
discovery, but it is certainly a way of managing entities that are already known. Big data promises
to be financially valuable because it is primarily a managerial resource (e.g. pricing fares, finding
books, moving spare parts, etc.).

In the world according to Cukier and Mayer-Schönberger, all the challenges of knowledge-making are
about to evaporate. With big data affluence, sampling, exactitude, and the pursuit of causality will no
longer be issues. The most pressing question is the problem of data valuation. Now there is a problem
the authors are willing to discuss seriously: how can data be transformed into a stable financial asset
when most of its utility as a predictive resource is not predictable?

So eager are the authors to mark the potential value of big data for organisations (data can only be an
asset to a corporation) that they overlook the impact of these systems on other social actors. So what if
big data environments reconfigure social inequalities? While the citizen will earn new responsibilities
(like privacy management), only corporate entities will be able to systematically generate, own and
exploit big data sets.

Big data is serious. There will be winners and there will be losers. What the public need is a book that
explains the stakes so that they can be active participants in this revolution, rather than passive
recipients of corporate competition.

There have been a number of attempts to chronicle exactly what big data is and why anyone should
care. Last year's The Human Face of Big Data by Rick Smolan and Jennifer Erwitt focused on telling
the personal stories behind big data (and accompanied these stories with some great photographs).
The year before, James Gleick wrote The Information: A History, A Theory, A Flood, which
chronicled how information (and not just big data) has changed our world. The latest entrant is Big
Data: A Revolution That Will Transform How We Live, Work and Think by Viktor Mayer-
Schönberger and Kenneth Cukier, which focuses heavily on explaining some of the more interesting
impacts of living in a big data world. (Personally, I'm still not a fan of the term "big data" because 1)
the term scares off people who think this is equivalent to "Big Oil" and 2) the term underrepresents
the innovation happening around "small data". But since this is the term used in the book, I'll stick
with it for this review.)

The first part of this book provides a fairly compelling vision of how big data is changing how we use
data. Unlike some technology proponents who simply ignore the past, Mayer-Schönberger and Cukier
make a point to highlight that the use of data itself is not new, but that information technology (IT)
has made it possible to collect and analyze data on a scale not seen in the past. The authors explore
three main changes they see arising from big data. First, we will have significantly more data
available than in the past. This means that we will be able to approach N = all for some datasets rather
than just using population samples. Second, as we increasingly quantify the world, we will have more
measurement error in our data, but that is okay because with much larger datasets the messiness of
data becomes less important. Third, we will focus much less on understanding causation (why) and
more on understanding correlation (what). (For a detailed look at this last point, see Chris
Anderson's essay "The End of Theory".)

While these chapters are interesting, Mayer-Schönberger and Cukier are at their best later in the book
when they describe the economic consequences of big data, both in terms of how data is creating
economic value and how data is disrupting many industries. Unlike other economic resources, the
value from data is not exhausted after its initial use. Instead, data can be reused an unlimited number
of times, either directly or by combining it with additional information. In addition, "data exhaust"
that would have been discarded in the past can now be put to practical use, such as Google using the
typos entered by users in its search engine to create a better spell-check program.
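
The mechanics behind such a spell checker can be sketched very simply (an illustrative toy of my own, not Google's actual pipeline): if query logs show that users who type one string routinely re-type another immediately afterwards, the second string becomes a correction candidate for the first:

```python
from collections import Counter, defaultdict

# Hypothetical "data exhaust": pairs of (original query, immediately re-typed query).
query_log = [
    ("teh cat", "the cat"),
    ("teh cat", "the cat"),
    ("teh dog", "the dog"),
    ("recieve", "receive"),
    ("recieve", "receive"),
    ("recieve", "retrieve"),
]

corrections = defaultdict(Counter)
for typed, retyped in query_log:
    if typed != retyped:
        corrections[typed][retyped] += 1

def suggest(query: str) -> str:
    """Return the most common re-typed form seen for this query, if any."""
    seen = corrections.get(query)
    return seen.most_common(1)[0][0] if seen else query

print(suggest("recieve"))   # -> "receive"
print(suggest("teh cat"))   # -> "the cat"
```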

This is a crucial point. It is not always possible to know how data will be used when it is collected,
and even if some uses are identified, the value of big data comes from its reuse. Policymakers stuck in
the old way of thinking want to impose data minimization requirements, which would effectively
create a "use once" policy for data. Instead, to take advantage of data-driven economic value, we need
policies that allow and encourage responsible reuse of data.

Mayer-Schönberger and Cukier offer one of the best metaphors for the new type of thinking that we
need around data. Using a normal camera, a photographer must decide when taking a photo where to
focus the lens. In contrast, plenoptic cameras, like the new Lytro camera, capture light field
information and allow photographers to change the focus of a picture after the picture has been taken.
Like photographers, most data users have been stuck having to decide how to use data at the outset.
But with increasingly lower costs for collection, storage and processing, users are now free to explore
possible uses after collecting it.

The authors also discuss the new value chain created by companies involved in big data. They identify
three primary value propositions: those providing data, those providing the skills, such as the
technology and the analytics, and those providing business opportunities. One of their more
interesting insights is that new business models are being created to take advantage of data
opportunities that do not fit into existing organizations. For example, the health insurers formed the
non-profit Health Care Cost Institute to combine data sets for research that individually they could not
perform. Similarly, UPS spun off its internal data analytics unit because it could provide substantially
more value if it had access to data from UPS's competitors, but this would never happen if it remained
part of the parent company. The authors argue that most of the value will be in the data part of the
value chain, but that it isn't there now. Unfortunately, such an assertion is impossible to prove or
disprove. We are still in the early stages of assigning value to data, both at the macro-economic level
and the firm level. Government statistics agencies need to include more than just goods and services if
they want to accurately measure the data economy (Mike Mandel has written a thoughtful piece on
this exact point).

While the authors also carve out a chapter to explore the dark side of big data, including privacy
and misuse, they mostly avoid the overwrought handwringing that typically characterizes writing on
this subject. And they recognize that much of the big data revolution does not involve personal data.
With regards to personal data, my primary criticism is that they unfairly dismiss de-identification
techniques, mostly relying on the critiques leveled by Paul Ohm, while ignoring the shortcomings of
his work described by individuals such as Jane Yakowitz or the continued advancement of differential
privacy research. They also get wrapped up in a surprisingly lengthy discussion of the risk of criminal
profiling similar to what was seen in the movie Minority Report, where individuals were arrested for
crimes before they were actually committed. While perhaps an interesting thought experiment, the
authors provide little evidence that this is anything but a far-fetched science-fiction nightmare.

Other books on big data are:

Data Science for Business: What You Need to Know about Data Mining and Data-Analytic
Thinking: this book introduces the fundamental principles of data science, and walks you
through the "data-analytic thinking" necessary for extracting useful knowledge and
business value from the data you collect. This guide also helps you understand the many
data-mining techniques in use today.

Big Data: Using SMART Big Data, Analytics and Metrics To Make Better Decisions and
Improve Performance: the book,
- Discusses how companies need to clearly define what it is they need to know
- Outlines how companies can collect relevant data and measure the metrics that will help them
answer their most important business questions
- Addresses how the results of big data analytics can be visualised and communicated to ensure
key decision-makers understand them
- Includes many high-profile case studies from the author's work with some of the world's best
known brands
Part Three

There is no rigorous definition of big data. Initially the idea was that the volume of information had
grown so large that the quantity of data being examined no longer fit into the memory that computers
use for processing, so engineers needed to revamp the tools they used for analysing it all. This led to
new processing technologies such as Google's MapReduce and its open-source equivalent, Hadoop,
which came out of Yahoo. These let one manage far larger quantities of data than before, and the data
- importantly - need not be placed in tidy rows or classic database tables. At the same time, because
internet companies could collect vast troves of data and had a burning financial incentive to make
sense of them, they became the leading users of the latest processing technologies, superseding
offline companies that had, in some cases, decades more experience.
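
To make the MapReduce idea concrete, here is a minimal sketch of the programming model in plain Python - an illustration of the map-then-shuffle-then-reduce pattern, not the actual Google MapReduce or Hadoop API:

```python
from collections import defaultdict
from itertools import chain

documents = ["big data is big", "data about data"]

# Map phase: each document independently emits (word, 1) pairs,
# so this step can be spread across many machines.
def map_phase(doc):
    return [(word, 1) for word in doc.split()]

mapped = chain.from_iterable(map_phase(doc) for doc in documents)

# Shuffle: group the emitted pairs by key (the word).
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce phase: each key is summarised independently, again parallelisable.
word_counts = {word: sum(counts) for word, counts in grouped.items()}
print(word_counts)   # {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```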
Big data refers to things one can do at a large scale that cannot be done at a smaller one, to extract
new insights or create new forms of value, in ways that change markets, organizations, the
relationship between citizens and governments, and more. Big data marks the beginning of a major
transformation.

To appreciate the degree to which an information revolution is already underway, consider the
processing of astronomical quantities of data. When scientists first decoded the human genome in
2003, it took them a decade of intensive work to sequence the three billion base pairs. Now, a single
facility can sequence that much DNA in a day. In finance, about seven billion shares change hands
every day on U.S. equity markets, of which two-thirds are traded by computer algorithms based on
mathematical models that crunch mountains of data to predict gains while trying to reduce risk. From
science to healthcare, from banking to the Internet, the sectors may be diverse, yet together they tell a
similar story: the amount of data in the world is growing fast, outstripping not just our machines but
our imaginations.

Consider an analogy from nanotechnology, where things get smaller, not bigger. The principle behind
nanotechnology is that when you get to the molecular level, the physical properties can change.
Knowing those new characteristics means you can devise materials to do things that could not be
done before. At the nanoscale, for example, more flexible metals and stretchable ceramics are
possible. Conversely, when we increase the scale of the data that we work with, we can do new things
that weren't possible when we just worked with smaller amounts. Sometimes the constraints that we
live with, and presume are the same for everything, are really only functions of the scale at which we
operate. Another analogy comes from physics. For humans, the single most important physical law is
gravity: it reigns over all that we do. But for tiny insects, gravity is mostly immaterial. For some, like
water striders, the operative law of the physical universe is surface tension, which allows them to
walk across a pond without falling in.

With information, as with physics, size matters. Hence, Google is able to identify the prevalence of
the H1N1 flu just about as well as official data based on actual patient visits to the doctor. It can do
this by combing through hundreds of billions of search terms, and it can produce an answer in near
real time, far faster than official sources. Likewise, Etzioni's Farecast can predict the price volatility
of an airplane ticket and thus shift substantial economic power into the hands of the consumer. But
both can do so well only by analysing hundreds of billions of data points. These examples show the
scientific and societal importance of big data, as well as the degree to which big data can become a
source of economic value. They mark two ways in which the world of big data is poised to shake up
everything from businesses and the sciences to healthcare, government, education, economics, the
humanities, and every other aspect of society.

At its core, big data is about predictions. Though it is described as part of the branch of computer
science called artificial intelligence, and more specifically an area called machine learning, this
characterisation is misleading. Big data is not about trying to "teach" a computer to "think" like
humans. Instead, it is about applying math to huge quantities of data in order to infer probabilities: the
likelihood that an email message is spam; that the typed letters "teh" are supposed to be "the"; that the
trajectory and velocity of a person jaywalking mean he'll make it across the street in time, so the
self-driving car need only slow slightly. The key is that these systems perform well because they are
fed with lots of data on which to base their predictions. Moreover, the systems are built to improve
themselves over time, by keeping tabs on what the best signals and patterns to look for are as more
data is fed in.

In the future - sooner than we may think - many aspects of our world will be augmented or replaced
by computer systems that today are the sole preserve of human judgement. Not just driving or
matchmaking, but even more complex tasks. After all, Amazon can recommend an ideal book, Google
can rank the most relevant website, Facebook knows our likes, and LinkedIn divines whom we know.
The same technologies will be applied to diagnosing illnesses and recommending treatments. Just as
the internet radically changed the world by adding communication, so too will big data change
fundamental aspects of life by giving it a quantitative dimension it never had before.

And more, messy data is good enough. Because we can never have perfect information, our
predictions are inherently fallible. This doesn't mean they are wrong, only that they are always
incomplete. It doesn't negate the insights that big data offers, but it puts big data in its place: as a tool
that doesn't offer ultimate answers, just good-enough ones to help us now until better methods, and
hence better answers, come along. It also suggests that we use this tool with a generous degree of
humility and humanity.
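
As an illustration of "applying math to huge quantities of data in order to infer probabilities", the sketch below uses a tiny, hypothetical training set and a naive Bayes-style word count to score how likely a new message is to be spam (a toy of my own, not any production spam filter):

```python
from collections import Counter

# Hypothetical labelled messages.
spam = ["win money now", "cheap money win", "win a prize now"]
ham  = ["meeting at noon", "lunch money for the trip", "see you at the meeting"]

spam_words = Counter(w for msg in spam for w in msg.split())
ham_words  = Counter(w for msg in ham for w in msg.split())

def spam_probability(message: str) -> float:
    """Naive Bayes-style score: compare word frequencies under each class."""
    p_spam, p_ham = 1.0, 1.0
    for word in message.split():
        # Add-one smoothing so unseen words do not zero out the product.
        p_spam *= (spam_words[word] + 1) / (sum(spam_words.values()) + len(spam_words))
        p_ham  *= (ham_words[word] + 1) / (sum(ham_words.values()) + len(ham_words))
    return p_spam / (p_spam + p_ham)

print(round(spam_probability("win money"), 2))        # high
print(round(spam_probability("meeting at noon"), 2))  # low
```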
Some important facts from the book:

Google does it. Amazon does it. Walmart does it.

Google published a paper in Nature claiming that they could predict the spread of flu having
analysed 50m search terms and then run 450m different mathematical models. In 2009, their
model was more accurate and faster at predicting the spread than government statistics.

Oren Etzioni of Farecast took big data files of airline ticket prices relative to days before the flight
so it was able to calculate the optimum time for flight purchase. It crunches 200bn flight price
records to make its predictions, saving passengers an average of $50 a flight. Microsoft eventually
bought the company for $110m and integrated it into Bing.

Amazon uses customer data to give us recommendations based on our previous purchases. Google
uses our search data and other information it collects to sell ads and to fuel a host of other services
and products.

Why spread such a huge net in search of a handful of terrorist suspects? Why vacuum up data so
indiscriminately?

The new thinking is that people are the sum of their social relationships, online interactions and
connections with content. In order to fully investigate an individual, analysts need to look at the
widest possible penumbra of data that surrounds the person - not just whom they know, but
whom those people know too, and so on.

Big data analytics are revolutionizing the way we see and process the world

Data is growing incredibly fast - by one account, it is more than doubling every two years! As
storage costs plummet and algorithms improve, data-crunching techniques, once available only to
spy agencies, research labs and gigantic companies, are becoming increasingly democratized.

There has been a dramatic increase in the amount of data. Big data analysis has been made
possible by three technological advances: increased datafication of things, increased memory
storage capacity and increased processing power.
The advantage of n=all is that it shows up correlations that would not appear under normal
circumstances.

Correlation does not equal causation

Rather than relying on intuition and experience (which are often unconsciously influenced by
irrelevant factors), big data relies on amassing a lot of clean data that is uninfluenced by
emotions and prejudice.

Big data is often messy and incomplete. But the sheer scale of data compensates for this lack of
precision.

Big data will increasingly be used as the primary default mechanism for many decisions as it
increases accuracy and reduces irrelevant influences.

Datafication is the unearthing of data from seemingly undatafiable sources. The reality is that these
days almost anything can be datafied - from pressure points across a retail floor through to sleep
patterns measured via our mobile phones.

We are also seeing the datafication of people and their relationships. Facebook's "likes" have
datafied sentiment, but the rich data of all the personal interconnections provides a great source of
analysis.

One of the biggest opportunities is the use of data for secondary purposes.

There are three groups who are at the heart of the development of big data: The data owners; the
data analysts (who convert it into usable information) and finally the big data entrepreneurs (who
spot new uses that other people are blind to).

The essential point about big data is that a change of scale leads to a change in state.

Dark Side of Big Data: "The ability to capture personal data is often built deep into the tools we
use every day, from Web sites to smartphone apps," the authors write. And given the myriad ways
data can be reused, repurposed and sold to other companies, it's often impossible for users to give
informed consent to innovative secondary uses that haven't even been imagined when the data
was first collected.

Dark Side of Big Data: Big Data may bring about a situation in which judgments of culpability
are based on individualized predictions of future behaviour.

Big Data will employ predictive policing, crunching data to select what streets, groups and
individuals to subject to extra scrutiny, simply because an algorithm pointed to them as more
likely to commit crime.

Probabilities can negate the very idea of the presumption of innocence.

Too Much Data!

When scientists first decoded the human genome in 2003, it took them a decade of intensive work
to sequence the three billion base pairs. Now, a decade later, a single facility can sequence that
much DNA in a day.

Big data is not the magic elixir for everything.

Big data is already transforming many aspects of our lives and ways of thinking, forcing us to
reconsider basic principles on how to encourage its growth and mitigate its potential for harm.
However, unlike our forebears during and after the printing revolution, we don't have centuries to
adjust; perhaps just a few years.

Relevance of big data in today's business and management:

Weather
WeatherSignal works by repurposing the sensors in Android devices to map atmospheric readings.
Handsets such as the Samsung Galaxy S4 contain a barometer, hygrometer (humidity), ambient
thermometer and light meter.
Obviously, the prospect of millions of personal weather stations feeding into one machine that will
average out readings is exciting, and one that has the potential to improve forecasting.
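
A minimal sketch of the underlying idea - averaging many noisy phone readings by location to get a more stable estimate (illustrative only, with made-up readings; not WeatherSignal's actual pipeline):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical barometer readings (hPa) reported by handsets, tagged with a
# coarse grid cell derived from their location.
readings = [
    ("cell_51.5_-0.1", 1012.8),
    ("cell_51.5_-0.1", 1013.4),
    ("cell_51.5_-0.1", 1012.1),
    ("cell_48.8_2.3",  1009.9),
    ("cell_48.8_2.3",  1010.6),
]

by_cell = defaultdict(list)
for cell, pressure in readings:
    by_cell[cell].append(pressure)

# Averaging many noisy personal "weather stations" per cell smooths out
# individual sensor error.
for cell, values in by_cell.items():
    print(cell, round(mean(values), 1), "hPa from", len(values), "handsets")
```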

Heart Disease
IBM is predicting heart disease with big data. Analysis of electronic health record data could reveal
symptoms at earlier stages than was previously possible.
IBM uses the Apache Unstructured Information Management Architecture (UIMA) to extract the
known signs and symptoms of heart failure from available text.
There is no single strong indicator; only weak signals or co-morbidities, such as hypertension,
diabetes, associated medications, ECG and genomic data, can be analysed. Drawing out probabilities
from disparate databases of differing sizes is a task for big data analytics.
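
The general pattern - combining many weak signals into a single probability - can be sketched with a simple logistic score (purely illustrative features and weights of my own; not IBM's UIMA-based system):

```python
import math

# Hypothetical weak signals extracted from one patient record, each only
# slightly predictive on its own.
patient = {"hypertension": 1, "diabetes": 1, "beta_blocker_rx": 0, "abnormal_ecg": 1}

# Illustrative weights; a real model would learn these from many records.
weights = {"hypertension": 0.6, "diabetes": 0.5, "beta_blocker_rx": 0.4, "abnormal_ecg": 0.9}
bias = -2.5

score = bias + sum(weights[f] * value for f, value in patient.items())
risk = 1 / (1 + math.exp(-score))   # logistic function turns the score into a probability
print(f"estimated heart-failure risk: {risk:.2f}")
```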

Infectious diseases
Again from IBM: this VentureBeat article looks at a model and data from the World Health
Organization. IBM looked at local climate and temperature to find correlations with how malaria
spreads. This analysis is used to predict the locations of future outbreaks. The Spatiotemporal
Epidemiological Modeler (STEM) is free and open source.

Doctor performance
Crimson is a system that shows variables including complications, hospital readmissions and
measures of cost. It colour-codes signals as to how well a doctor is performing against his or her
peers.
The technology has reduced the average stay and average cost at Long Beach Memorial Hospital.
One physician was warned by a pharmacist that the data showed he was using Levaquin, an
antibiotic, at a far higher rate than his peers. Given concerns about generating drug-resistant bacteria,
the physician was encouraged to reduce his use of the antibiotic.
This particular medical group has used big data to make sure medication is correctly prescribed:
2012 data showed 76% of patients getting recommended shots, compared with 56% in 2010.
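
The comparison logic can be sketched very simply (hypothetical rates and thresholds of my own, not the actual Crimson product): compute each physician's rate for a metric and colour-code it against the peer distribution:

```python
from statistics import mean, stdev

# Hypothetical prescribing rates (prescriptions per 100 patients) for one drug.
rates = {"Dr A": 4.1, "Dr B": 3.8, "Dr C": 4.5, "Dr D": 9.7, "Dr E": 4.0}

mu, sigma = mean(rates.values()), stdev(rates.values())

def colour(rate: float) -> str:
    """Flag physicians well above their peers."""
    if rate > mu + 2 * sigma:
        return "red"
    if rate > mu + sigma:
        return "amber"
    return "green"

for doctor, rate in rates.items():
    print(doctor, rate, colour(rate))
```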

Travel and hospitality


Keeping customers happy is key to the travel and hotel industry, but customer satisfaction can be hard
to gauge, especially in a timely manner. Resorts and casinos, for example, have only a short window
of opportunity to turn around a customer experience that's going south fast. Big data analytics gives
these businesses the ability to collect customer data, apply analytics and immediately identify
potential problems before it's too late.

Government
Certain government agencies face a big challenge: tighten the budget without compromising quality
or productivity. This is particularly troublesome with law enforcement agencies, which are struggling
to keep crime rates down with relatively scarce resources. And that's why many agencies use big data
analytics; the technology streamlines operations while giving the agency a more holistic view of
criminal activity.

Retail
Customer service has evolved in the past several years, as savvier shoppers expect retailers to
understand exactly what they need, when they need it. Big data analytics technology helps retailers
meet those demands. Armed with endless amounts of data from customer loyalty programs, buying
habits and other sources, retailers not only have an in-depth understanding of their customers, they
can also predict trends, recommend new products and boost profitability.

Health care
Big data is a given in the health care industry. Patient records, health plans, insurance information and
other types of information can be difficult to manage but are full of key insights once analytics are
applied. That's why big data analytics technology is so important to health care. By quickly analyzing
large amounts of information - both structured and unstructured - health care providers can offer
lifesaving diagnoses or treatment options almost immediately.

Big data will help the management in:

Cost reduction. Big data technologies such as Hadoop and cloud-based analytics bring significant
cost advantages when it comes to storing large amounts of data - plus they can identify more
efficient ways of doing business.

Faster, better decision making. With the speed of Hadoop and in-memory analytics, combined
with the ability to analyse new sources of data, businesses are able to analyse information
immediately and make decisions based on what they've learned.

New products and services. With the ability to gauge customer needs and satisfaction through
analytics comes the power to give customers what they want. Davenport points out that with big
data analytics, more companies are creating new products to meet customers' needs.
