
Introduction to Data Visualization

“The greatest value of a picture is when it forces us to notice what we never expected to see.”

John W. Tukey
The vaccination debate was all the rage again. “Pro-vaxxers” were loudly proclaiming that everyone should get vaccinated and discussing the science behind it, and “anti-vaxxers” were casting their doubts and still refusing to get vaccinated for personal reasons. Around that time, The Wall Street Journal released a brilliant series of heat maps showing infection rates for various diseases over time, broken down by state.
The Broad Street cholera outbreak was a severe outbreak of cholera that occurred near Broad Street in the Soho
district of London, England in 1854 (in the area now known as Carnaby Street). This outbreak is best known for the
physician John Snow's study of the outbreak and his hypothesis that contaminated water, not air, spread cholera.
The base map is a simple road network, with few buildings named or depicted.

The study area is outlined along the relevant road centerlines.

Water pumps around the neighborhood are symbolized with points and bold, uppercase labels.

Cholera deaths are depicted along the road network in their correct locations by address, with quantities measured by parallel tick marks stacked next to the road. The symbology, while simple, is effective for a study of fatal disease. The symbology of the cholera deaths is reminiscent of large Plague events, where bodies are stacked next to the roadway for disposal.
The Swedish scientist Hans Rosling had been working with developmental data for
over 30 years. A great visualization and a 2007 TED talk helped him to share his
passion with the world.

3 Minutes 15 Seconds
In 1812, Napoleon marched to Moscow in order to conquer the city. 98% of his
soldiers died.

The Parisian engineer Charles Minard’s visualization still inspires those who see it to
ponder the true cost of war.
It displays six types of data in two dimensions:
1. the number of Napoleon's troops; 2. the distance traveled; 3. temperature;
4. latitude and longitude; 5. direction of travel; 6. location relative to specific dates
Florence Nightingale was a data nerd who provided insights

1855. The Crimea. Britain is fighting a battle against both Russia and disease. As a nurse, how do you convince an army to invest in hospitals and healthcare instead of guns and ammunition?
Florence Nightingale told her story with data by showing the staggering number of deaths due to preventable disease (shown in blue/grey).

After this visualization, sanitation became a major priority for the British Army.
http://livelovetravelwork.com/florence-nightingale-data-nerd/
https://understandinguncertainty.org/coxcombs
Assignment 1.
Chart of Biography – Joseph Priestley

Write two A4 pages on this:


1. What is it?
2. Why is it so popular?
3. What is it that fascinated you about chart of biography?
4. What did it help in?
5. How do you relate this to analytics and visualization?
International Number Ones
Agenda
• What is Visualization?
– Why Do Visualization?
– Why is Visualization Important?
– The Visualization Process
• Doing Visualization
– Data Sources
– Data Variables
– Representation Types
– Visualization Techniques
– Interactive or Batch?
– Data Types and Topologies
What is Data Visualization
• Data visualization is the process of converting raw data into easily understood
pictures of information that enable fast and effective decisions.

• Jacques Bertin, who wrote the classic work on graphical visualization, “Semiology of Graphics”, states that the “transformation from numbers to insight requires two stages.”

Data/Processes → (Algorithm) → Image → (Perception) → Insight
(Jacques Bertin)


Image Theory
• Visual Processing occurs in 3 steps.
1. Formation of the retinal image,
2. Decomposition of the retinal image information into an array of
specialized representations and
3. Reassembly of the information into object perception.
Image
• Bertin's key concept is the image, from which the theory
derives its name.
• Roughly speaking, an image is the fundamental perceptual unit
of a visualization.
– An ideal visualization will contain only a single image in order to optimize "efficiency," the speed with which an observer can extract the information.
• When you look at the screen in front of you, it seems as if you can see everything within a 180-degree angle, but in actuality you can see with full accuracy only those things that lie in a very narrow 2-degree field straight ahead of you.
• How is it then that we don’t see a blurry mass of things?
• Thanks to rapid eye movements called saccades, our eyes quickly dart around scenes to
create composite images from the aggregate information, thus creating the very believable
illusion that our eyes act like a 180-degree lens.

Source: Alberto Cairo, The Functional Art
• What does this have to do with information design? It’s useful
to know that our eyes don’t fixate on random points in a scene
or image but rather prioritize.
• They first detect basic features and focus on things that stand
out such as moving objects, bright-colored patches and
uncommon shapes.
• These basic features are also called pre-attentive features
Pre-attentive Attributes
• The difference between the background and foreground
• The higher the contrast between the elements, the easier it is for the brain to discern the difference
• The best data visualizations deliberately use shade differences to draw attention to certain key pieces of information
Pre-attentive Attributes
• To save time, the brain has also evolved to group similar objects together and quickly identify objects that are different.
Bertin’s 7 Visual Variables
Gestalt Principles of Data Visualization

A configuration or pattern of elements so unified as a whole that it cannot be described merely as a sum of its parts

Prepared by :

Anshika Bhatnagar | Roll No. 20 | Section A


What is gestalt, and what are the principles of gestalt?

• Gestalt refers to the patterns that you perceive when presented with a few graphical elements.
• Gestalt principles describe how our mind organizes individual visual elements into groups, to make sense of the entire visual.
Principles of gestalt

Let’s decode these images

• Proximity: We see three rows of dots instead of four columns of dots because they are closer horizontally than vertically.
• Similarity: We see similar-looking objects as part of the same group.
• Enclosure: We group the first four and last four dots as two rows instead of eight dots.
• Symmetry: We see three pairs of symmetrical brackets rather than six individual brackets.
principles of gestalt

• Closure: We automatically close the square and circle instead of seeing three disconnected paths.
• Continuity: We see one continuous path instead of three arbitrary ones.
• Connection: We group the connected dots as belonging to the same group.
• Figure & ground: We notice either the two faces or the vase. Whichever we notice becomes the figure, and the other the ground.
Which principle is in action?

Continuity

Closure

Figure & ground Closure


https://www.usertesting.com/blog/gestalt-principles/
Six Step Process
• A good data visualization should tell a good visual story: the whole story.
The Visual Process
1. Starting with who or what is involved,
2. The quantities in question,
3. The relative positions of those elements on the graph,
4. The sequence in which they interact,
5. The insight that surfaces.
The Visual Process

Some visual attributes require zero high-level processing to see and understand.
The orientation, length, width, or intersections of lines and the size, shape, color, and position of shapes and objects are immediately obvious to your audience’s visual system.
The Visual Process

Validate:
Which chart is suitable, based on the precognition concept in DV, in the example below?

A suitable chart shall require zero high-level processing to see and understand: the orientation, length, width, or intersections of lines; the size, shape, color, and position of shapes and objects.
Know Your Audience Well!

[Figure: two charts titled “Grades”, each plotting # Students (axis 0 to 160) for grades A to F from the same series]

Grades      A    B    C    D    E    F
# Students  150  150  125  50   75   50
The Visual Process
1) Who and what is involved. Give a visual summary of the people
and things you are going to be talking about.
2) How many are involved. Next, provide a quantitative measure (or
many measures) of the people or things. Changes in number
(trends) are particularly revealing.
3) Where the pieces are located. Present a map illustrating the
relative position of these people or things according to
geographical or conceptual coordinates.
4) When things occur. Show a timeline that illustrates the sequence
in which these people or things interact, or the steps required to
bring them into alignment.
5) How things impact each other. Provide a flowchart that adds
cause-and-effect influences superimposed on any (or all) of your
previous pictures; show the change and how you will achieve it.
6) Why this matters. Complete your visual story with a concluding
“visual equation” that summarizes the key learnings, takeaways,
or action items triggered by the previous visual insights.
Start thinking visually
• Consider the nature and purpose of your visualization. Ask these two questions:
NATURE: Is the information conceptual or data-driven?
PURPOSE: Am I declaring something or exploring something?

Is it conceptual or data-driven?
1. We will talk about the organizational structure of the firm.
2. We will talk about the past two years’ revenue trends.
Start thinking visually
• Consider the nature and purpose of your visualization. Ask these two questions:
PURPOSE: Am I declaring something or exploring something?

Is it declaration or exploration?

We will present the budget allocation to different departments.

We will present whether marketing investment contributes to higher profits.

We will present sales by location by product intelligence.


The Nature and Purpose 2×2

Using concept-based declaration: process diagrams, cycle diagrams, metaphors (trees, bridges), and simple design conventions (circles, hierarchies). Org charts and decision trees.

Using data-based declaration: simple low-volume graphs.

Using concept-based exploration: brainstorming, making sense of complex undefined problems, discovering information needs.

Using data-based exploration: simple bivariate or multivariate graphs to find trends, association, volatility, connections, etc.
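The four quadrants above can be read as a simple lookup from two answers (nature, purpose) to a family of visuals. The sketch below is only an illustration: the function name `quadrant` and the string labels "conceptual", "data-driven", "declarative", and "exploratory" are assumptions made for this example, and the suggested forms are abbreviated from the slide.

```python
# Lookup table for the Nature and Purpose 2x2.
# Keys are (nature, purpose); the label strings are illustrative assumptions.
QUADRANTS = {
    ("conceptual", "declarative"): "Concept-based declaration: process diagrams, org charts, decision trees",
    ("data-driven", "declarative"): "Data-based declaration: simple low-volume graphs",
    ("conceptual", "exploratory"): "Concept-based exploration: brainstorming, framing undefined problems",
    ("data-driven", "exploratory"): "Data-based exploration: bivariate or multivariate graphs",
}


def quadrant(nature: str, purpose: str) -> str:
    """Return the quadrant description for a (nature, purpose) pair."""
    key = (nature.lower(), purpose.lower())
    if key not in QUADRANTS:
        raise ValueError(f"unknown combination: {key}")
    return QUADRANTS[key]


# Example: a talk about the firm's organizational structure is conceptual,
# and presenting it (rather than investigating it) is declarative.
print(quadrant("conceptual", "declarative"))
```

Answering the two questions first, and only then picking a chart form, is the point of the 2×2; the code simply makes that order explicit.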
Which quadrant does this belong to?

Purpose: to show quarterly sales in a presentation

Purpose: To understand why the sales team’s performance has lagged lately?
The Dos and Don’ts of Data Visualization
• Time axis. When using time in charts, set it on the horizontal axis. Time should
run from left to right. Do not skip values (time periods), even if there are no
values.
• Proportional values. The numbers in a chart (displayed as bar, area, bubble, or
other physically measured element in the chart) should be directly proportional
to the numerical quantities presented.
• Data-Ink Ratio. Remove any excess information, lines, colors, and text from a
chart that does not add value.
• Sorting. For column and bar charts, to enable easier comparison, sort your data
in ascending or descending order by the value, not alphabetically. This applies
also to pie charts.
• Legend. You don’t need a legend if you have only one data category.
• Labels. Use labels directly on the line, column, bar, pie, etc., whenever possible,
to avoid indirect look-up.
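The time-axis rule above ("do not skip values, even if there are no values") can be applied before charting by filling in the missing periods explicitly. This is a minimal sketch, assuming yearly data held in a dictionary; the function name `fill_missing_periods` and the sales figures are hypothetical.

```python
from typing import Optional


def fill_missing_periods(values_by_year: dict[int, float]) -> list[tuple[int, Optional[float]]]:
    """Return (year, value) pairs for every year in the range, with None for gaps."""
    if not values_by_year:
        return []
    first, last = min(values_by_year), max(values_by_year)
    return [(year, values_by_year.get(year)) for year in range(first, last + 1)]


# Hypothetical sales figures with no entry for 2021.
sales = {2019: 120.0, 2020: 135.0, 2022: 150.0}
for year, value in fill_missing_periods(sales):
    print(year, "no data" if value is None else value)
```

Plotting the filled series keeps the spacing on the horizontal axis even, so a missing year shows up as a visible gap instead of silently compressing the timeline.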
The Dos and Don’ts of Data Visualization
• Inflation adjustment. When using monetary values in a long-term series, make sure
to adjust for inflation.
• Colors. In any chart, don’t use more than six colors.
• Colors. For comparing the same value at different time periods, use the same color
in a different intensity (from light to dark).
• Colors. For different categories, use different colors. The most widely used colors are
black, white, red, green, blue, and yellow.
• Colors. Keep the same color palette or style for all charts in the series, and same axes
and labels for similar charts to make your charts consistent and easy to compare.
• Colors. Check how your charts would look when printed out in grayscale. If you
cannot distinguish color differences, you should change hue and saturation of colors.
• Colors. Seven to 10 percent of men have color deficiency. Keep that in mind when
creating charts, ensuring they are readable for color-blind people. Use Vischeck to
test your images. Or, try to use color palettes that are friendly to color-blind people.
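The inflation-adjustment rule at the top of this list can be sketched with a price index: each nominal amount is multiplied by the ratio of the base-year index to that year's index. The function name `to_real_values` and all figures below are hypothetical and for illustration only; they are not real statistics.

```python
def to_real_values(nominal: dict[int, float], index: dict[int, float], base_year: int) -> dict[int, float]:
    """Express each year's nominal amount in base-year money."""
    base = index[base_year]
    return {year: amount * base / index[year] for year, amount in nominal.items()}


# Hypothetical figures for illustration only.
nominal_revenue = {2000: 100.0, 2010: 130.0, 2020: 160.0}
price_index = {2000: 100.0, 2010: 125.0, 2020: 160.0}

real_revenue = to_real_values(nominal_revenue, price_index, base_year=2020)
for year in sorted(real_revenue):
    print(year, round(real_revenue[year], 1))
```

In this made-up series the nominal values rise by 60%, while the adjusted values stay roughly flat, which is exactly the kind of difference an unadjusted long-term chart would hide.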
The Dos and Don’ts of Data Visualization
• Data Complexity. Don’t add too much information to a single chart. If necessary,
split data in two charts, use highlighting, simplify colors, or change chart type.
Rate this chart!
Ethics in visualization
Validate Dos and Don’ts

Honest: Lie factor = 1

Acceptable: 0.95 < Lie factor < 1.05

Calculate the lie factor. Can you point out the errors?

As drawn (baseline at 34):
                 Bar 1   Bar 2
Height of bar    35      39.6
Baseline         34      34
Difference       1       5.6
Size of effect in graph = (5.6 - 1)/1 = 4.6, or 460%

As in the data (baseline at 0):
                 Bar 1   Bar 2
Height of bar    35      39.6
Baseline         0       0
Difference       35      39.6
Size of effect in data = (39.6 - 35)/35 = 0.1314, or 13.14%

Lie factor = 460/13.14 ≈ 35
https://venngage.com/blog/misleading-graphs/
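The lie-factor calculation for the two bars above can be checked with a few lines of code. This is a sketch under the assumption that the lie factor is the relative change as drawn (measured from the chart's baseline) divided by the relative change in the data; the function names are made up for this example.

```python
def relative_change(first: float, second: float) -> float:
    """Change from first to second, as a fraction of first."""
    return (second - first) / first


def lie_factor(first: float, second: float, baseline: float) -> float:
    """Lie factor for two bars drawn from `baseline` instead of from zero."""
    effect_in_graph = relative_change(first - baseline, second - baseline)
    effect_in_data = relative_change(first, second)
    return effect_in_graph / effect_in_data


def is_acceptable(factor: float) -> bool:
    """Acceptable range quoted on the slide: 0.95 < lie factor < 1.05."""
    return 0.95 < factor < 1.05


truncated = lie_factor(35, 39.6, baseline=34)
honest = lie_factor(35, 39.6, baseline=0)
print(round(truncated, 1), is_acceptable(truncated))
print(round(honest, 1), is_acceptable(honest))
```

With the baseline at 34 the factor comes out at about 35, far outside the acceptable band; with the baseline at zero the same bars give a factor of 1.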
Wrong Scaling
Wrong Magnification. Why not use 3D?

To put it in the language of geometry: shouldn't the value be relative to the volume of the bag?
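The geometry behind this question is short enough to compute. If a 3D object is scaled by the same factor in width, height, and depth, its volume grows with the cube of that factor. The sketch below assumes uniform scaling; the function names are illustrative.

```python
def perceived_volume_ratio(linear_scale: float) -> float:
    """Volume ratio when width, height, and depth are all scaled by linear_scale."""
    return linear_scale ** 3


def linear_scale_for_volume(value_ratio: float) -> float:
    """Scale to apply to each dimension so that volume tracks the value ratio."""
    return value_ratio ** (1 / 3)


# A value that doubles, drawn as a bag twice as wide, tall, and deep:
print(perceived_volume_ratio(2.0))

# Scale per dimension needed for the volume itself to double:
print(round(linear_scale_for_volume(2.0), 2))
```

Doubling every dimension shows an eightfold volume for a twofold value, while an honest 3D drawing would grow each dimension by only about 1.26.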
The Psychological Lie Factor

• Gun deaths: the X axis represents years
• Is there a decrease in gun deaths in 2005?

Here the lie factor is one, but the chart still manages to fool the eye into seeing something that wasn't in the data. These charts are technically correct, but they lie because they do not match the way most people are used to seeing things.

Psy. Lie Factor = (what the effect looks like to the eye) / (actual size of effect in graph)
https://venngage.com/blog/misleading-graphs/
Precognition: Data ink ratio
Tufte refers to data-ink as the non-erasable ink used for
the presentation of data
Labeling
Why Bar is better than Pie

Pie: Even with labeled %, the pie slices for Person A, B and C appear to be the same size.
Bar: Even without the % label, the message is unambiguous.
https://venngage.com/blog/misleading-graphs/
Pie?
Wrong Choice of Axis
No. of Products   Sales   Market Share
14                11200   13
20                60000   23
18                14400   5

[Figure: charts of the table above drawn with different axis choices; axis scales run 0 to 70000 and 0 to 30]
Wrong Choice of Axis
            F2F  Phone  Text  IM  IRC  Mail  Blogs  Feeds  Twitter
Immediacy   40   40     20    30  30   10    5      10     10
Lifespan    3    3      10    10  10   30    40     10     10
Audience    3    3      3     10  20   20    40     30     30

Immediacy, Lifespan, and Audience scores were assigned arbitrarily in the range 0-40 to a bunch of communication modes.
Beautiful Vs Effective

What is a better idea?

A legend in the corner


Or
A crisp annotation on the object.
Idea to Handle Clutter

• Focus on the one which contributes to your story.
Table might bring more comprehensive information
For Colour Blinds

The data-viz rule: “Don't use red/green/brown/orange together”

If you must use red and green together,
• leverage light vs. dark
• offer alternate methods of distinguishing data

https://www.tableau.com/about/blog/2016/4/examining-data-viz-rules-dont-use-red-green-together-53463
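The "leverage light vs. dark" advice can be checked roughly by converting colors to a gray level and comparing them. The sketch below uses the common 0.299/0.587/0.114 luma weighting as one approximation of perceived brightness; the function names and the `min_gap` threshold of 50 are assumptions chosen for this example, not an established standard for charts.

```python
def gray_level(rgb: tuple[int, int, int]) -> float:
    """Approximate brightness (0-255) using the 0.299/0.587/0.114 luma weights."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def distinguishable_in_gray(a: tuple[int, int, int], b: tuple[int, int, int], min_gap: float = 50.0) -> bool:
    """True if the two colors differ by at least min_gap gray levels (assumed threshold)."""
    return abs(gray_level(a) - gray_level(b)) >= min_gap


pure_red, mid_green = (255, 0, 0), (0, 128, 0)
dark_red, light_green = (139, 0, 0), (144, 238, 144)

print(distinguishable_in_gray(pure_red, mid_green))     # similar brightness
print(distinguishable_in_gray(dark_red, light_green))   # clear light vs. dark gap
```

A pure red and a mid green land on almost the same gray level, so they rely on hue alone; a dark red paired with a light green stays distinguishable even when hue is lost, whether in grayscale print or for a color-blind reader.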
ACCENT Principles for effective graphical display
• Apprehension: Are you able to correctly perceive relations among
variables?
• Clarity: Are the most important elements or relations visually most
prominent?
• Consistency: Are the elements, symbol shapes and colors consistent with
their use in previous graphs?
• Efficiency: Is the graph easy to interpret?
• Necessity: Is the graph a more useful way to represent the data than
alternatives (table, text)?
• Truthfulness: Are the graph elements accurately positioned and scaled by their
magnitude, relative to the implicit or explicit scale?

http://www.datavis.ca/gallery/accent.php http://www.datavis.ca/gallery/index.php
Variables: Price, Gear Ratio, Turning Circle, etc.
Larger values represent "better" for all variables.
All variables are first scaled to a 0-1 range.
Variables are arranged around the circle by a multivariate effect ordering, according to their order on the largest discriminant dimension.
The error bars next to each radial axis show the smallest difference between means required for a (univariate) .05 significant difference.

Rating 1 to 5
Apprehension: Correctly perceive relations among variables
Clarity: Are the most important elements visually most
prominent
Consistency: Are the elements, symbol shapes and colors
consistent
Efficiency: Is the graph easy to interpret
Necessity: Is it a more useful way to represent the data than
table
Truthfulness: Lie-factor
The goal of the graphic was to present results of a poll of
happiness from the World Values Survey project of people
throughout the world in relation to economic status, as
measured by GNP per capita.

Annotation: Many countries, particularly those in Latin America, had higher marks for happiness than their economic situation would predict.

The main thing that is wrong here is the conclusion, based on the assumption that happiness should be linearly related to GNP.

Rating 1 to 5
Apprehension: Correctly perceive relations among variables
Clarity: Are the most important elements visually most
prominent
Consistency: Are the elements, symbol shapes and colors
consistent
Efficiency: Is the graph easy to interpret
Necessity: Is it a more useful way to represent the data than
table
Truthfulness: Lie-factor
ACCENT Principles for effective graphical display

http://www.datavis.ca/gallery/index.php

Visit this web resource and rate some Good and Bad graphs on 5 point rating scale on the following dimensions

Rating 1 to 5
Apprehension: Correctly perceive relations among variables
Clarity: Are the most important elements visually most
prominent
Consistency: Are the elements, symbol shapes and colors
consistent
Efficiency: Is the graph easy to interpret
Necessity: Is it a more useful way to represent the data than
table
Truthfulness: Lie-factor
Assignments and Websites
Based on your learning so far, create a presentation. Go through a website from the
list below or any other visualization learning website and pick at least one topic
which we haven’t covered in the class so far. It can even be a nice visualization
project.
You need to present that in the class so that others can also learn from it. This is
your third assignment as well. We’ll pick everybody to present at random starting
December 2nd class.
Have at least 2 questions for the class to be discussed as your first or last slide.
• http://www.storytellingwithdata.com/
• http://www.edwardtufte.com/tufte/
• http://www.visualcomplexity.com/
• http://www.webdesignerdepot.com/2009/06/50-great-examples-of-data-visualization/
ADDITIONAL CONTENT
Some Visualization Tools
Tableau, Guess, R, Many Eyes, NetworkX, Parallel Sets, Sigma.js, Gephi, Cobweb, Prefuse, GraphViz, d3, InfoViz, matplotlib, Force-Directed Graph, Cytoscape, Gnuplot, NodeXL, Pajek, Weka, Orange
(Labels on the slide: Interactive, GUI)
From the Memory Lanes
Types of Data | Reliability and Validity of a Measure | Data Cleaning | Assumption Testing | Inference

Nominal
Ordinal
Scale
Dimensions
Items
Factors
Questionnaire
Likert
Developing a Scale


Canon of research:
If something exists,
it can be measured
in numerals
Univariate outliers using box plot
Multivariate outliers using M-distance


Outliers: univariate and multivariate outliers
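For the univariate case, the box-plot check can be sketched with the conventional rule that flags points lying more than 1.5 times the interquartile range beyond the quartiles. The function name `boxplot_outliers` and the sample scores are hypothetical; the multivariate (M-distance) case needs a covariance matrix and is not shown here.

```python
import statistics


def boxplot_outliers(values: list[float], whisker: float = 1.5) -> list[float]:
    """Return values outside [Q1 - whisker*IQR, Q3 + whisker*IQR]."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical scores with one suspicious entry.
scores = [10, 12, 11, 13, 12, 11, 10, 95]
print(boxplot_outliers(scores))
```

Quartile conventions differ slightly between tools, so the exact fences can vary, but a value as extreme as 95 in this sample is flagged under any of them.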
Some Fun
Examples
• Baby Name Wizard
http://www.babynamewizard.com/voyager
https://gramener.com/faces/
• Origin of Species – Edits
http://benfry.com/traces/
• Netflix Queues
http://www.nytimes.com/interactive/2010/01/10/nyregion/20100110-netflix-map.html?ref=nyregion
• Unemployment Visualization (NYTimes)
http://www.nytimes.com/interactive/2009/11/06/business/economy/unemployment-lines.html
A helpful list of questions to ask yourself before publishing your visualization:
• Does this visualization answer all of your questions?
• Is the purpose of the visualization clearly explained in its title or
surrounding text?
• Can you understand the visualization in 30 seconds or less, without
additional information?
• Source: Tableau (2013)
