
Data Visualization in Data Science
Maloy Manna
biguru.wordpress.com

linkedin.com/in/maloy

twitter.com/itsmaloy

Synopsis
Having data is not enough. Adding context to data is essential to understand the
data, find patterns and engage audiences. Data visualization is a key element of data
science, the interdisciplinary field that deals with finding insights from data.
In this webinar, we explore the roles of data visualization at different stages of
the data science process, and why it is essential.
We also look at how data is encoded visually with shape, size, color and other
variables, and how the basic principles of visual encoding can be applied to build
better visualizations.
We cover narratives, types of bias and maps.
Finally, we look at how various tools, both open source and off-the-shelf
software, are used in data science to build effective data visualizations.

Speaker profile
Maloy Manna
Project Manager - Engineering
AXA Data Innovation Lab

Over 14 years of experience building data-driven products and services


Previous organizations: Thomson Reuters, Saama, Infosys, TCS

biguru.wordpress.com

linkedin.com/in/maloy

twitter.com/itsmaloy

Contents

Defining Data visualization


Data science process
Data visualization
Visual encoding of data
Narrative structures
Dataviz Technology & Tools

Defining Data visualization

Visual display of quantitative information


Mapping data to visual elements
Encoding data with size, shape, color...
Storytelling / narrative elements

Defining Data Visualization

Exploratory

Find insights
Conversation between data and you

Explanatory

Present insights

Data science project life-cycle

Acquire data
Prepare data
Analysis & Modeling
Evaluation & Interpretation
Deployment
Operations & Optimization

Data science process

EDA: Exploratory Data Analysis
Data Wrangling
Exploratory

Explanatory

Data Visualization

Source: Computational Information Design | Ben Fry

Exploratory data visualization

Data analysis approaches:


Classical:
Problem > Data > Model > Analysis > Conclusions

EDA: [Exploratory Data Analysis]


Problem > Data > Analysis > Model > Conclusions

Bayesian:
Problem > Data > Model > Prior distribution > Analysis > Conclusions

EDA = approach, not a set of techniques

Exploratory data visualization


Statistical approaches:

Quantitative

Hypothesis testing
Analysis of variance (ANOVA)
Point estimates and confidence intervals
Least squares regression

Graphical

Scatter plots
Histograms
Probability plots
Residual plots
Box plots
Block plots


Exploratory data visualization

Graphical analysis procedures:

Testing assumptions
Model selection
Model validation
Estimator selection
Relationship identification
Factor effect determination
Outlier detection

MUST USE for deriving insights from data
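As one illustration of the outlier detection item above, the whisker rule commonly used when drawing box plots can be sketched in a few lines. This is a minimal sketch, not a method taken from the slides: the function name `tukey_outliers`, the multiplier `k=1.5`, and the sample values are assumptions made for the example.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying outside the box-plot fences.

    The fences are Q1 - k*IQR and Q3 + k*IQR; k=1.5 is the conventional
    choice for box-plot whiskers (an assumption here, not from the slides).
    """
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Made-up sample with one obviously extreme value.
sample = [2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 30]
outliers = tukey_outliers(sample)

if __name__ == "__main__":
    print(outliers)
```

A box plot conveys the same information graphically: points beyond the whiskers are the ones this function returns.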

Exploratory data analysis

Anscombe's quartet
N=11
Mean of X = 9.0
Mean of Y = 7.5
Intercept = 3
Slope = 0.5
Residual standard deviation = 1.237
Correlation = 0.816
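The summary statistics above can be checked numerically. The sketch below hard-codes the four datasets as published by Anscombe (1973) and computes the same quantities with an ordinary least squares fit; the helper name `summarize` and the dictionary layout are choices made for this example.

```python
import math

# Anscombe's quartet: datasets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Compute the summary statistics listed on the slide for one dataset."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # least squares slope
    intercept = mean_y - slope * mean_x    # least squares intercept
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    return {
        "n": n,
        "mean_x": mean_x,
        "mean_y": mean_y,
        "intercept": intercept,
        "slope": slope,
        "residual_sd": math.sqrt(sse / (n - 2)),  # n - 2 degrees of freedom
        "correlation": sxy / math.sqrt(sxx * syy),
    }


SUMMARIES = {name: summarize(x, y) for name, (x, y) in QUARTET.items()}

if __name__ == "__main__":
    for name, stats in SUMMARIES.items():
        print(name, {key: round(value, 3) for key, value in stats.items()})
```

All four datasets produce nearly the same numbers, yet their scatter plots look completely different, which is why the summary alone is not enough.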

Exploratory data analysis

Explanatory data visualization

Design
Engineering
Journalism

Explanatory data visualization

Visualization is both an art and science

Harry Beck's subway map of London

Visual encoding of data


Data Types

Quantitative

Continuous, Discrete
Categorical

Nominal, Ordered, Interval

Visual encoding of data


Categorical scales and graph design

Visual encoding of data


Bandwidth of our senses: [Tor Norretranders]

Visual encoding of data

Data visual display elements

Position x
Position y
Retinal variables

Size, Orientation (ordered data)


Color Hue, Shape (nominal data)

Animation

Visual encoding of data

Ranking visual display elements (framework):


1. Position along a common scale, e.g. scatter plots
2. Position on identical but non-aligned scales, e.g. multiple scatter plots
3. Length, e.g. bar chart
4. Angle & slope, e.g. pie chart
5. Area, e.g. bubbles
6. Volume, density & color saturation, e.g. heat map
7. Color hue, e.g. highlights

Ref.: Graphical Perception and Graphical Methods for Analyzing Scientific Data,
William Cleveland & Robert McGill (1985)

Design principles

Choose the right type of chart

Trends / change over time: line charts
Distributions: histograms
Summary information: table
Relationships: scatter plots

Get it right in black & white (before adding color)


Prefer 2D to 3D for statistical charts
Use color to highlight
Avoid rainbow palette
Avoid chartjunk: less is more
Try to have a high data-ink ratio
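The "choose the right type of chart" pairings on this slide can be written down as a small lookup. This is only a sketch of the four pairings listed here; the keys, the function name `suggest_chart`, and the error handling are assumptions for illustration, and real chart selection involves more judgment than a table lookup.

```python
# Message-to-chart pairings as listed on the slide.
CHART_FOR_MESSAGE = {
    "trend": "line chart",            # trends / change over time
    "distribution": "histogram",
    "summary": "table",
    "relationship": "scatter plot",
}


def suggest_chart(message):
    """Return the chart type paired with a message type (hypothetical helper)."""
    key = message.strip().lower()
    if key not in CHART_FOR_MESSAGE:
        raise ValueError(f"no suggestion for message type: {message!r}")
    return CHART_FOR_MESSAGE[key]


if __name__ == "__main__":
    for message in CHART_FOR_MESSAGE:
        print(message, "->", suggest_chart(message))
```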

Design principles

Choose the right type of chart

Ranking

Time-series

Correlation

Nominal comparison

Deviation

Narrative structures

Data Journalism
Traditional journalism vs. data journalism:

Data around narrative vs. narrative around data

Linear flow vs. complex, often non-linear flow

Physical static media vs. online interactive media

Narrative structures
Bias (and ethics: Don't lie with data)

Bar charts must have a zero baseline

Present data in its context
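A quick calculation shows why the zero-baseline rule matters: a bar's drawn length is its value minus the axis baseline, so truncating the axis inflates the visible ratio between bars. The sketch below is illustrative only; the function name `apparent_ratio` and the two values are made up for the example.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar lengths for values a and b when the axis starts at `baseline`."""
    if min(a, b) <= baseline:
        raise ValueError("baseline must be below both values")
    return (b - baseline) / (a - baseline)


# Hypothetical values: b is about 13% larger than a.
a, b = 35.0, 39.6

honest = apparent_ratio(a, b)                    # axis starts at zero
truncated = apparent_ratio(a, b, baseline=34.0)  # axis starts at 34

if __name__ == "__main__":
    print(f"zero baseline: bar b looks {honest:.2f}x as long as bar a")
    print(f"baseline 34:   bar b looks {truncated:.2f}x as long as bar a")
```

With the axis cut at 34, a difference of roughly 13% is drawn as a bar more than five times as long.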

Narrative structures
Bias: Misleading with data

Selective presentation with line-charts

Author Bias
Data Bias
Reader Bias

Narrative structures
Bias and Errors (statistics):

Selection bias e.g. in sampling


Omitted-variable bias

Errors:

Hypothesis testing
Null hypothesis H0 = default / no-effect state

Reject H0 when H0 is valid: Type I error (false positive)
Reject H0 when H0 is invalid: correct inference (true positive)
Accept H0 when H0 is valid: correct inference (true negative)
Accept H0 when H0 is invalid: Type II error (false negative)
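The four outcomes of a hypothesis test can be expressed as a small function of two booleans. This is a sketch for illustration; the function name `classify_decision` and the returned labels are choices made here, following the terms used on the slide.

```python
def classify_decision(h0_is_valid, reject_h0):
    """Label the outcome of a test decision about the null hypothesis H0."""
    if reject_h0:
        if h0_is_valid:
            return "Type I error (false positive)"
        return "Correct inference (true positive)"
    if h0_is_valid:
        return "Correct inference (true negative)"
    return "Type II error (false negative)"


if __name__ == "__main__":
    for h0_is_valid in (True, False):
        for reject_h0 in (True, False):
            print(f"H0 valid={h0_is_valid}, reject={reject_h0}: "
                  f"{classify_decision(h0_is_valid, reject_h0)}")
```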

Narrative structures
Storytelling:

Visual narratives have moved from author-driven to viewer-driven with the use of highly interactive media for data visualization

Author-driven: strong ordering, heavy messaging, need for clarity and speed

Viewer-driven: exploratory, ability to ask questions, build own story

DataViz Technologies & Tools


Off-the-shelf: Tableau, QlikView

Tools (predefined charts): Raw, Chartio, Plotly, Google Fusion Tables, Excel, Gephi

Code & JavaScript libraries:

R: ggplot2, ggvis, rCharts + Shiny (interactive apps)
Python: matplotlib
JavaScript: D3.js, Dimple.js, Leaflet, Rickshaw (use JSON data)
Linux: gnuplot
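Since the JavaScript libraries above consume JSON data, a common preparation step is turning tabular rows into a JSON array of records. The sketch below uses only the Python standard library; the function name `rows_to_json`, the column names, and the sample rows are assumptions for illustration, and the exact shape a given library expects should be checked against its documentation.

```python
import json


def rows_to_json(header, rows):
    """Serialize tabular rows as a JSON array of objects keyed by column name."""
    records = [dict(zip(header, row)) for row in rows]
    return json.dumps(records, indent=2)


# Made-up sample data.
header = ["year", "sales"]
rows = [(2013, 120), (2014, 135), (2015, 160)]
payload = rows_to_json(header, rows)

if __name__ == "__main__":
    print(payload)
```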

DataViz Technologies & Tools


Tableau data viz

DataViz Technologies & Tools


Chart in R ggplot2

References
The Visual Display of Quantitative Information: Edward Tufte http://goo.gl/qb5ej
Exploratory Data Analysis: John Tukey http://goo.gl/tV57HP
Data Science Life Cycle: Maloy Manna
http://www.datasciencecentral.com/profiles/blogs/the-data-science-project-lifecycle
Selecting right graph for your message: Stephen Few
www.perceptualedge.com/articles/ie/the_right_graph.pdf
Practical rules for using color in charts: Stephen Few
www.perceptualedge.com/articles/visual.../rules_for_using_color.pdf
OpenIntro Statistics: https://www.openintro.org/stat/
Misleading with statistics: Eric Portelance
https://medium.com/i-data/misleading-with-statistics-c63780efa928
Computational Information Design: Ben Fry
http://benfry.com/phd/dissertation-050312b-acrobat.pdf
