
1/4/2019 Advanced Visualization for Data Scientists with Matplotlib

Advanced Visualization for Data Scientists with Matplotlib
Contents: Basic plots, 3D plots and widgets

Veekesh Dhununjoy


Mar 13 · 7 min read

A picture is worth a thousand words, but a good visualization is worth
millions.

Visualization plays a fundamental role in communicating results in
many fields in today’s world. Without proper visualizations, it is very
hard to reveal findings, understand complex relationships among
variables and describe trends in the data.

In this blog post, we’ll start with the basic plots in Matplotlib and
then drill down into some very useful advanced visualization
techniques such as the mplot3d toolkit (to generate 3D plots) and
widgets.

The Vancouver property tax report dataset has been used to explore
different types of plots in the Matplotlib library. The dataset contains
information on properties from BC Assessment (BCA) and City sources
including Property ID, Year Built, Zone Category, Current Land Value, etc.

A link to the code is provided at the bottom of this post.

. . .

https://medium.com/sfu-big-data/advanced-visualization-for-data-scientists-with-matplotlib-15c28863c41c 1/38

Matplotlib Basic Plots

Frequently used commands in the given examples:

plt.figure(): To create a new figure
plt.plot(): Plot y versus x as lines and/or markers
plt.xlabel(): Set the label for the x-axis
plt.ylabel(): Set the label for the y-axis
plt.title(): Set a title for the axes
plt.grid(): Configure the grid lines
plt.legend(): Place a legend on the axes
plt.savefig(): To save the current figure on the disk
plt.show(): Display a figure
plt.clf(): Clear the current figure (useful to plot multiple figures in the
same code)
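To see how these commands fit together, here is a minimal sketch. The function name `basic_line_figure`, the file name and the numbers are made up for illustration and are not part of the original post.

```python
import matplotlib.pyplot as plt


def basic_line_figure(path="basic_plot.png"):
    """Draw one labelled line, save it to `path`, and return the figure."""
    years = [1900, 1950, 2000, 2018]   # made-up x values
    counts = [120, 340, 910, 450]      # made-up y values

    fig = plt.figure(figsize=(8, 4))   # new figure, size in inches
    plt.plot(years, counts, color="dodgerblue", marker="o", label="Made-up series")
    plt.xlabel("YEAR")                 # x-axis label
    plt.ylabel("Count")                # y-axis label
    plt.title("pyplot basics")         # title of the axes
    plt.grid(True)                     # show grid lines
    plt.legend()                       # uses the label given to plt.plot()
    plt.savefig(path)                  # write the current figure to disk
    return fig
```

Calling `basic_line_figure()` followed by `plt.show()` displays the figure, and `plt.clf()` clears it before the next plot in the same script.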

1. Line Plot

A line plot is a basic chart that displays information as a series of data
points called markers connected by straight line segments.


# Line plot.

# Importing matplotlib to plot the graphs.
import matplotlib.pyplot as plt

# Importing pandas for using pandas dataframes.
import pandas as pd

# Reading the input file.
df = pd.read_csv("property_tax_report_2018.csv")

# Removing the null values in PROPERTY_POSTAL_CODE.
df = df[(df['PROPERTY_POSTAL_CODE'].notnull())]

# Grouping by YEAR_BUILT and aggregating based on PID to count the properties built each year.
# (This line is cut off in the source; the count/rename tail is assumed from the
# 'No_of_properties_built' column used below.)
df = df[['PID', 'YEAR_BUILT']].groupby('YEAR_BUILT', as_index = False).count().rename(columns = {'PID': 'No_of_properties_built'})

# Filtering YEAR_BUILT and keeping only the values between 1900 and 2018.
# (The upper bound is cut off in the source; 2018 is assumed from the dataset year.)
df = df[(df['YEAR_BUILT'] >= 1900) & (df['YEAR_BUILT'] <= 2018)]

# X-axis: YEAR_BUILT
x = df['YEAR_BUILT']

# Y-axis: Number of properties built.
y = df['No_of_properties_built']

# Change the size of the figure (in inches).
plt.figure(figsize=(17,6))

# Plotting the graph using x and y with 'dodgerblue' color.
# Different labels can be given to different lines in the same plot.
# Linewidth determines the width of the line (its value is cut off in the source, so the default is used here).
plt.plot(x, y, 'dodgerblue', label = 'Number of properties built')
# plt.plot(x2, y2, 'red', label = 'Line 2')

# X-axis label.
plt.xlabel('YEAR', fontsize = 16)


The above code snippet can be used to create a line graph. Here, a pandas
DataFrame is used to perform basic data manipulations. After
reading and processing the input dataset, plt.plot() is used to plot the
line graph with Year on the x-axis and the Number of properties built on
the y-axis.
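The dataset itself is not bundled with this post, so here is a rough sketch of the same group, count, filter and plot steps on a tiny made-up table. The `toy` DataFrame and the helper names `properties_per_year` and `plot_per_year` are illustrative assumptions; only the column names come from the snippet above.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up rows standing in for the property tax report.
toy = pd.DataFrame({
    "PID": ["a", "b", "c", "d", "e", "f"],
    "YEAR_BUILT": [1899, 1950, 1950, 2001, 2001, 2001],
})


def properties_per_year(df, first=1900, last=2018):
    """Count PIDs per YEAR_BUILT and keep the years in [first, last]."""
    counts = (df.groupby("YEAR_BUILT", as_index=False)["PID"]
                .count()
                .rename(columns={"PID": "No_of_properties_built"}))
    in_range = (counts["YEAR_BUILT"] >= first) & (counts["YEAR_BUILT"] <= last)
    return counts[in_range].reset_index(drop=True)


def plot_per_year(per_year):
    """Line plot with the year on the x-axis and the count on the y-axis."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(per_year["YEAR_BUILT"], per_year["No_of_properties_built"],
            color="dodgerblue", label="Number of properties built")
    ax.set_xlabel("YEAR")
    ax.set_ylabel("Number of properties built")
    ax.legend()
    return fig
```

With the toy data, `properties_per_year(toy)` drops the 1899 row and returns one row each for 1950 and 2001.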

2. Bar Plot

A bar graph displays categorical data with rectangular bars of heights or
lengths proportional to the values which they represent.


# Bar plot.

# Importing matplotlib to plot the graphs.
import matplotlib.pyplot as plt

# Importing pandas for using pandas dataframes.
import pandas as pd

# Reading the input file.
df = pd.read_csv("property_tax_report_2018.csv")

# Removing the null values in PROPERTY_POSTAL_CODE.
df = df[(df['PROPERTY_POSTAL_CODE'].notnull())]

# Grouping by YEAR_BUILT and aggregating based on PID to count the properties built each year.
# (This line is cut off in the source; the count/rename tail is assumed from the
# 'No_of_properties_built' column used below.)
df = df[['PID', 'YEAR_BUILT']].groupby('YEAR_BUILT', as_index = False).count().rename(columns = {'PID': 'No_of_properties_built'})

# Filtering YEAR_BUILT and keeping only the values between 1900 and 2018.
# (The upper bound is cut off in the source; 2018 is assumed from the dataset year.)
df = df[(df['YEAR_BUILT'] >= 1900) & (df['YEAR_BUILT'] <= 2018)]

# X-axis: YEAR_BUILT
x = df['YEAR_BUILT']

# Y-axis: Number of properties built.
y = df['No_of_properties_built']

# Change the size of the figure (in inches).
plt.figure(figsize=(17,6))

# Plotting the graph using x and y with 'dodgerblue' color.
# Different labels can be given to different bar plots in the same plot.
# The width argument sets the width of the bars (its value is cut off in the source, so the default is used here).
plt.bar(x, y, label = 'Number of properties built', color = 'dodgerblue')
# plt.bar(x2, y2, label = 'Bar 2', color = 'red')

# X-axis label.
plt.xlabel('YEAR', fontsize = 16)


The above code snippet can be used to create a Bar graph.

3. Histogram

A histogram is an accurate representation of the distribution of
numerical data. It is an estimate of the probability distribution of a
continuous variable.
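To make the "estimate of the probability distribution" idea concrete, the sketch below bins some random numbers twice: once as raw counts and once normalised so that the bar areas sum to 1. The helper names and the synthetic data are assumptions for illustration only.

```python
import numpy as np
import matplotlib.pyplot as plt


def histogram_summary(values, bins=50):
    """Return raw bin counts, the normalised density, and the bin edges."""
    counts, edges = np.histogram(values, bins=bins)
    density, _ = np.histogram(values, bins=bins, density=True)
    return counts, density, edges


def plot_histogram(values, bins=50):
    """Draw the same bins with plt-style arguments used in this post."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.hist(values, bins=bins, histtype="bar", rwidth=1.0,
            color="dodgerblue", alpha=0.8)
    ax.set_xlabel("Value")
    ax.set_ylabel("Frequency")
    return fig


# Synthetic sample: 1000 draws from a standard normal distribution.
sample = np.random.default_rng(0).normal(size=1000)
counts, density, edges = histogram_summary(sample)
```

The counts add up to the number of samples, while `density * np.diff(edges)` adds up to 1, which is what lets the histogram be read as a probability estimate.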


# Histogram

# Importing matplotlib to plot the graphs.
import matplotlib.pyplot as plt

# Importing pandas for using pandas dataframes.
import pandas as pd

# Reading the input file.
df = pd.read_csv("property_tax_report_2018.csv")

# Removing the null values in PROPERTY_POSTAL_CODE.
df = df[(df['PROPERTY_POSTAL_CODE'].notnull())]

# Grouping by YEAR_BUILT and aggregating based on PID to count the properties built each year.
# (This line is cut off in the source; the count/rename tail is assumed from the
# 'No_of_properties_built' column used below.)
df = df[['PID', 'YEAR_BUILT']].groupby('YEAR_BUILT', as_index = False).count().rename(columns = {'PID': 'No_of_properties_built'})

# Filtering YEAR_BUILT and keeping only the values between 1900 and 2018.
# (The upper bound is cut off in the source; 2018 is assumed from the axis label below.)
df = df[(df['YEAR_BUILT'] >= 1900) & (df['YEAR_BUILT'] <= 2018)]

# Change the size of the figure (in inches).
plt.figure(figsize=(17,6))

# X-axis: Number of properties built from 1900 to 2018.
# Y-axis: Frequency.
plt.hist(df['No_of_properties_built'],
         bins = 50,
         histtype='bar',
         rwidth = 1.0,
         color = 'dodgerblue',
         alpha = 0.8
         )

# X-axis label (the end of this line is cut off in the source).
plt.xlabel('Number of properties built from 1900 to 2018')


The above code snippet can be used to create a Histogram.

4. Pie Chart

A pie chart is a circular statistical graphic which is divided into slices to
illustrate numerical proportions. In a pie chart, the arc length of each
slice is proportional to the quantity it represents.
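As a small illustration of slices being proportional to quantities, the sketch below turns made-up category counts into percentage shares and draws the largest ones. The zone names, counts and helper names are hypothetical and do not come from the dataset.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical counts per category.
zone_counts = pd.Series({"Zone A": 50, "Zone B": 30, "Zone C": 15, "Zone D": 5})


def top_shares(counts, n=3):
    """Return the n largest categories as percentages of the overall total."""
    pct = counts / counts.sum() * 100
    return pct.nlargest(n)


def pie_of_shares(shares):
    """Draw one slice per category; pie() rescales the values to fill the circle."""
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.pie(shares.values, labels=list(shares.index),
           autopct="%1.1f%%", startangle=90)
    ax.axis("equal")   # keep the pie circular
    return fig


shares = top_shares(zone_counts, n=3)
```

Note that when only the top categories are drawn, the percentages printed on the slices are relative to the slices shown, not to the full dataset.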


# Pie-chart.

# Importing matplotlib to plot the graphs.
import matplotlib.pyplot as plt

# Importing pandas for using pandas dataframes.
import pandas as pd

# Reading the input file.
df = pd.read_csv("property_tax_report_2018.csv")

# Filtering out the null values in ZONE_CATEGORY
df = df[df['ZONE_CATEGORY'].notnull()]

# Several lines below are cut off in the source. Their endings are assumptions
# inferred from the column names and comments that are still visible.

# Grouping by ZONE_CATEGORY and aggregating based on PID to count the properties in each zone.
df_zone_properties = df.groupby('ZONE_CATEGORY', as_index = False)['PID'].count().rename(columns = {'PID': 'No_of_properties'})

# Counting the total number of properties.
total_properties = df_zone_properties['No_of_properties'].sum()

# Calculating the percentage share of each ZONE for the number of properties.
df_zone_properties['percentage_of_properties'] = ((df_zone_properties['No_of_properties'] / total_properties) * 100)

# Finding the ZONES with the top-5 percentage share.
df_top_10_zone_percentage = df_zone_properties.nlargest(5, 'percentage_of_properties')

# Change the size of the figure (in inches).
plt.figure(figsize=(8,6))

# Slices: percentage_of_properties.
slices = df_top_10_zone_percentage['percentage_of_properties']
# Categories: ZONE_CATEGORY.
categories = df_top_10_zone_percentage['ZONE_CATEGORY']
# For different colors: https://matplotlib.org/examples/
cols = ['purple', 'red', 'green', 'orange', 'dodgerblue']

# Plotting the pie-chart (arguments after `slices` are assumed).
plt.pie(slices,
        labels = categories,
        colors = cols)


The above code snippet can be used to create a Pie chart.

5. Scatter Plot


# Scatter plot.

# Importing matplotlib to plot the graphs.
import matplotlib.pyplot as plt

# Importing pandas for using pandas dataframes.
import pandas as pd

# Reading the input file.
df = pd.read_csv("property_tax_report_2018.csv")

# Removing the null values in PROPERTY_POSTAL_CODE.
df = df[(df['PROPERTY_POSTAL_CODE'].notnull())]

# Grouping by YEAR_BUILT and aggregating based on PID to count the properties built each year.
# (This line is cut off in the source; the count/rename tail is assumed from the
# 'No_of_properties_built' column used below.)
df = df[['PID', 'YEAR_BUILT']].groupby('YEAR_BUILT', as_index = False).count().rename(columns = {'PID': 'No_of_properties_built'})

# Filtering YEAR_BUILT and keeping only the values between 1900 and 2018.
# (The upper bound is cut off in the source; 2018 is assumed from the dataset year.)
df = df[(df['YEAR_BUILT'] >= 1900) & (df['YEAR_BUILT'] <= 2018)]

# X-axis: YEAR_BUILT
x = df['YEAR_BUILT']

# Y-axis: Number of properties built.
y = df['No_of_properties_built']

# Change the size of the figure (in inches).
plt.figure(figsize=(17,6))

# Plotting the scatter plot.
# For different types of markers: https://matplotlib.org
# (The marker size `s` and anything after `edgecolors` are cut off in the source, so defaults are used.)
plt.scatter(x, y, label = 'Number of properties built',
            alpha = 0.8, marker = '.', edgecolors = 'black')

# X-axis label.
plt.xlabel('YEAR', fontsize = 16)

# Y-axis label


The above code snippet can be used to create a Scatter plot.

6. Working with Images

Link to download the Lenna test image. (Source: Wikipedia)

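# Reading, displaying and saving an image.

# Importing matplotlib pyplot and image.
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

# Reading the image from the disk.
image = mpimg.imread('Lenna_test_image.png')

# Displaying the image.
# (The rest of the listing is cut off in the source; imshow() followed by show()
# is assumed here as the usual way to display an image array.)
plt.imshow(image)
plt.show()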


. . .

3D Plots using Matplotlib

3D plots play an important role in visualizing complex data in three or
more dimensions.

1. 3D Scatter Plot


'''
==============
3D scatterplot
==============

Demonstration of a basic scatterplot in 3D.
'''

# Import libraries
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import numpy as np
import pandas as pd

# Create figure object
fig = plt.figure()

# Create the 3D axes (fig.gca(projection='3d') is no longer accepted by recent Matplotlib releases).
ax = fig.add_subplot(111, projection='3d')

# Get the Property Tax Report dataset
# Dataset link: https://data.vancouver.ca/datacatalogue/
data = pd.read_csv('property_tax_report_2018.csv')

# Extract the columns and do some transformations.
# (This line is cut off in the source. Counting PIDs per YEAR_BUILT is inferred from
# the rest of the post; summing CURRENT_LAND_VALUE is an assumption.)
yearWiseAgg = data[['PID','CURRENT_LAND_VALUE']].groupby(data['YEAR_BUILT']).agg({'PID': 'count', 'CURRENT_LAND_VALUE': 'sum'})
yearWiseAgg = yearWiseAgg.reset_index().dropna()

# Define colors as red, green, blue
colors = ['r', 'g', 'b']

# Get only records which have more than 2000 properties
morethan2k = yearWiseAgg.query('PID>2000')

# Get shape of dataframe
dflen = morethan2k.shape[0]

# Fetch land values from dataframe
lanvalues = (morethan2k['CURRENT_LAND_VALUE']/2e6).tolist()

# Create a list of colors for each point corresponding to its land value
c_list = []
for i,value in enumerate(lanvalues):
    if value>0 and value<1900:
        ...  # the rest of the loop is cut off in the source

3D scatter plots are used to plot data points on three axes in an attempt
to show the relationship between three variables. Each row in the data
table is represented by a marker whose position depends on its values
in the columns set on the X, Y, and Z axes.
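Since the listing above is cut short, here is a minimal, self-contained sketch of a 3D scatter plot on random data. The function name `scatter_3d` and the data are made up; it is not the post's original code.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers '3d' on older Matplotlib)


def scatter_3d(n=50, seed=0):
    """Scatter n random points; each row (x, y, z) becomes one marker."""
    rng = np.random.default_rng(seed)
    x, y, z = rng.random((3, n))

    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    # Colour the markers by their z value so the third variable is easier to read.
    points = ax.scatter(x, y, z, c=z, cmap="viridis", marker="o")
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
    ax.set_zlabel("Z")
    return fig, ax, points
```

Calling `scatter_3d()` and then `plt.show()` opens a window in which the plot can be rotated with the mouse.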

2. 3D Line Plot


'''
==============
3D lineplot
==============

Demonstration of a basic lineplot in 3D.
'''

# Import libraries
import matplotlib as mpl
from mpl_toolkits.mplot3d import Axes3D
import numpy as np
import matplotlib.pyplot as plt

# Set the legend font size to 10
mpl.rcParams['legend.fontsize'] = 10

# Create figure object
fig = plt.figure()

# Create the 3D axes (fig.gca(projection='3d') is no longer accepted by recent Matplotlib releases).
ax = fig.add_subplot(111, projection='3d')

# Create data point to plot


3D Line Plots can be used in the cases when we have one variable that is
constantly increasing or decreasing. This variable can be placed on the
Z-axis while the change of the other two variables can be observed in
the X-axis and Y-axis with respect to the Z-axis. For example, with time series
data (such as planetary motion), time can be placed on the Z-axis and
the change in the other two variables can be observed from the
visualization.
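A minimal sketch of that idea, assuming a made-up circular motion: time goes on the Z-axis and the X and Y positions trace a circle, so the line becomes a helix. The name `helix_over_time` is hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers '3d' on older Matplotlib)


def helix_over_time(turns=2, samples=200):
    """Plot x(t), y(t) against a steadily increasing t on the Z-axis."""
    t = np.linspace(0, 2 * np.pi * turns, samples)  # the monotonic variable
    x = np.cos(t)
    y = np.sin(t)

    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    ax.plot(x, y, t, label="circular motion over time")
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
    ax.set_zlabel("time")
    ax.legend()
    return fig, ax
```

Looking straight down the Z-axis shows only the circle; rotating the view reveals how the position changes as time increases.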

3. 3D Plots as Subplots

'''
====================
3D plots as subplots
====================

Demonstrate including 3D plots as subplots.
'''

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.axes3d import Axes3D, get_test_data
from matplotlib import cm
import numpy as np


# set up a figure twice as wide as it is tall
fig = plt.figure(figsize=plt.figaspect(0.5))

#===============
# First subplot
#===============
# set up the axes for the first plot
ax = fig.add_subplot(1, 2, 1, projection='3d')

# plot a 3D surface like in the example mplot3d/surface3d_demo
# Get equally spaced numbers with interval of 0.25 from -5 to 5
X = np.arange(-5, 5, 0.25)
Y = np.arange(-5, 5, 0.25)
# Convert it into meshgrid for plotting purpose using x and y
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)


The above code snippet can be used to create multiple 3D plots as
subplots in the same figure. Both the plots can be analyzed
independently.

4. Contour Plot

'''
==============
Contour Plots
==============
Plot a contour plot that shows intensity
'''

# Import libraries
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
from matplotlib import cm

# Create figure object
fig = plt.figure()

# Create the 3D axes (fig.gca(projection='3d') is no longer accepted by recent Matplotlib releases).
ax = fig.add_subplot(111, projection='3d')

# Get test data


The above code snippet can be used to create contour plots. Contour
plots can be used for representing a 3D surface in a 2D format. Given a
value for the Z-axis, lines are drawn for connecting the (x,y) coordinates
where that particular z value occurs. Contour plots are generally used
for continuous variables rather than categorical data.
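To illustrate the "lines connecting equal z values" idea, the sketch below draws labelled contour lines of z = sin(sqrt(x² + y²)) on ordinary 2D axes. The surface is borrowed from the other examples in this post; the function name and the number of levels are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt


def contour_demo(levels=8):
    """Draw contour lines of z = sin(sqrt(x^2 + y^2)) on 2D axes."""
    x = np.arange(-5, 5, 0.25)
    X, Y = np.meshgrid(x, x)
    Z = np.sin(np.sqrt(X**2 + Y**2))

    fig, ax = plt.subplots()
    # Each line joins the (x, y) points that share one z value.
    cs = ax.contour(X, Y, Z, levels=levels, cmap="coolwarm")
    ax.clabel(cs, inline=True, fontsize=8)  # print the z value on each line
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
    return fig, cs
```

The returned `cs.levels` array holds the z values Matplotlib chose for the lines; passing an explicit list as `levels` fixes them instead.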

5. Contour Plot with Intensity


'''
==============
Contour Plots
==============
Plot a contour plot that shows intensity
'''
# Import libraries
from mpl_toolkits.mplot3d import axes3d
import matplotlib.pyplot as plt
from matplotlib import cm

# Create figure object
fig = plt.figure()

# Create the 3D axes (fig.gca(projection='3d') is no longer accepted by recent Matplotlib releases).
ax = fig.add_subplot(111, projection='3d')

# Get test data

The above code snippet can be used to create filled contour plots.
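Because the listing above stops before the plotting call, here is a short sketch of a filled contour plot with a colorbar, drawn on 2D axes with the same sin(sqrt(x² + y²)) surface used elsewhere in this post. The function name and the choice of 10 levels are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt


def filled_contour_demo(levels=10):
    """Fill the regions between contour levels and add a colour scale."""
    x = np.arange(-5, 5, 0.25)
    X, Y = np.meshgrid(x, x)
    Z = np.sin(np.sqrt(X**2 + Y**2))

    fig, ax = plt.subplots()
    cf = ax.contourf(X, Y, Z, levels=levels, cmap="coolwarm")
    fig.colorbar(cf, ax=ax, label="Z")  # maps colours back to z values
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
    return fig, cf
```

The colour of each band encodes the intensity of Z, which is what distinguishes this plot from the line-only contour plot.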

6. Surface Plot

"""
========================
Create 3d surface plots
========================
Plot a contoured surface plot
"""

# Import libraries
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
from matplotlib import cm
from matplotlib.ticker import LinearLocator, FormatStrFormatter
import numpy as np

# Create figures object
fig = plt.figure()

# Create the 3D axes (fig.gca(projection='3d') is no longer accepted by recent Matplotlib releases).
ax = fig.add_subplot(111, projection='3d')

# Make data.
X = np.arange(-5, 5, 0.25)
Y = np.arange(-5, 5, 0.25)
X, Y = np.meshgrid(X, Y)
R = np.sqrt(X**2 + Y**2)
Z = np.sin(R)


The above code snippet can be used to create surface plots, which are
used for plotting 3D data. They show a functional relationship between
a designated dependent variable (Z) and two independent variables (X
and Y) rather than showing the individual data points. A practical
application for the above plot would be to visualize how the Gradient
Descent algorithm converges.
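As a rough sketch of that application, the code below runs plain gradient descent on the bowl-shaped function f(x, y) = x² + y² and draws the visited points on top of the surface. The function, starting point, learning rate and helper names are all assumptions chosen for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers '3d' on older Matplotlib)


def f(x, y):
    """A simple convex bowl with its minimum at the origin."""
    return x**2 + y**2


def gradient_descent(start, lr=0.1, steps=25):
    """Return the (steps + 1, 2) array of points visited from `start`."""
    path = [np.array(start, dtype=float)]
    for _ in range(steps):
        p = path[-1]
        path.append(p - lr * 2 * p)  # the gradient of x^2 + y^2 is (2x, 2y)
    return np.array(path)


def plot_descent(path):
    """Draw the surface and overlay the descent path."""
    x = np.arange(-5, 5, 0.25)
    X, Y = np.meshgrid(x, x)
    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    ax.plot_surface(X, Y, f(X, Y), cmap=cm.coolwarm, alpha=0.6)
    ax.plot(path[:, 0], path[:, 1], f(path[:, 0], path[:, 1]), "k.-")
    return fig, ax


path = gradient_descent(start=(4.0, -3.0))
```

Each step shrinks the point towards the origin by a factor of 0.8, so the black markers bunch together as they approach the bottom of the bowl.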

7. Triangular Surface Plot


'''
======================
Triangular 3D surfaces
======================

Plot a 3D surface with a triangular mesh.
'''
# Import libraries
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import numpy as np

# Create figures object
fig = plt.figure()

# Create the 3D axes (fig.gca(projection='3d') is no longer accepted by recent Matplotlib releases).
ax = fig.add_subplot(111, projection='3d')

# Set parameters
n_radii = 8
n_angles = 36

# Make radii and angles spaces (radius r=0 omitted to eliminate duplication).
radii = np.linspace(0.125, 1.0, n_radii)
angles = np.linspace(0, 2*np.pi, n_angles, endpoint=False)


The above code snippet can be used to create a triangular surface plot.
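The listing stops after the radii and angles are defined, so the following sketch shows one way the remaining steps could look: convert the polar grid to flat x and y arrays, compute z, and let `plot_trisurf` triangulate the points. The surface z = sin(-xy) and the function name are assumptions, not necessarily what the original code used.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers '3d' on older Matplotlib)


def trisurf_demo(n_radii=8, n_angles=36):
    """Triangulate scattered (x, y) points on a disc and plot z over them."""
    radii = np.linspace(0.125, 1.0, n_radii)
    angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False)

    # Polar to Cartesian, flattened to 1D; the origin is added once by hand.
    x = np.append(0, (radii[:, None] * np.cos(angles)).ravel())
    y = np.append(0, (radii[:, None] * np.sin(angles)).ravel())
    z = np.sin(-x * y)

    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    ax.plot_trisurf(x, y, z, linewidth=0.2, antialiased=True)
    return fig, ax, x
```

Unlike `plot_surface`, which needs a regular 2D grid, `plot_trisurf` accepts flat arrays of points, which makes it handy for irregularly sampled data.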

8. Polygon Plot


'''
==============
Polygon Plots
==============
Plot a polygon plot
'''
# Import libraries
from mpl_toolkits.mplot3d import Axes3D
from matplotlib.collections import PolyCollection
import matplotlib.pyplot as plt
from matplotlib import colors as mcolors
import numpy as np

# Fixing random state for reproducibility
np.random.seed(19680801)

def cc(arg):
    '''
    Shorthand to convert 'named' colors to rgba format at 60% opacity.
    '''
    return mcolors.to_rgba(arg, alpha=0.6)


def polygon_under_graph(xlist, ylist):
    '''
    Construct the vertex list which defines the polygon filling the space under
    the (xlist, ylist) line graph. Assumes the xs are in ascending order.
    '''
    return [(xlist[0], 0.), *zip(xlist, ylist), (xlist[-1], 0.)]

# Create figure object
fig = plt.figure()

# Create the 3D axes (fig.gca(projection='3d') is no longer accepted by recent Matplotlib releases).
ax = fig.add_subplot(111, projection='3d')

# Make verts a list, verts[i] will be a list of (x,y) pairs defining polygon i
verts = []
39


The above code snippet can be used to create Polygon Plots.

9. Text Annotations in 3D


'''
======================
Text annotations in 3D
======================

Demonstrates the placement of text annotations on a 3D plot.

Functionality shown:
- Using the text function with three types of 'zdir' values: None,
  an axis name (ex. 'x'), or a direction tuple (ex. (1, 1, 0)).
- Using the text function with the color keyword.
- Using the text2D function to place text on a fixed position on the ax object.
'''
# Import libraries
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt

# Create figure object
fig = plt.figure()

# Create the 3D axes (fig.gca(projection='3d') is no longer accepted by recent Matplotlib releases).
ax = fig.add_subplot(111, projection='3d')

# Demo 1: zdir
zdirs = (None, 'x', 'y', 'z', (1, 1, 0), (1, 1, 1))
xs = (1, 4, 4, 9, 4, 1)
ys = (2, 5, 8, 10, 1, 2)
zs = (10, 3, 8, 9, 1, 8)

for zdir, x, y, z in zip(zdirs, xs, ys, zs):
    label = '(%d, %d, %d), dir=%s' % (x, y, z, zdir)
    ax.text(x, y, z, label, zdir)


The above code snippet can be used to create text annotations in 3D
plots. It is very useful when creating 3D plots as changing the angles of
the plot does not distort the readability of the text.

10. 2D Data in 3D Plot

"""
=======================
Plot 2D data on 3D plot
=======================

Demonstrates using ax.plot's zdir keyword to plot 2D data on
selective axes of a 3D plot.
"""

# Import libraries
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import numpy as np
import pandas as pd

# Create figure object
fig = plt.figure()

# Create the 3D axes (fig.gca(projection='3d') is no longer accepted by recent Matplotlib releases).
ax = fig.add_subplot(111, projection='3d')

# Get the Property Tax Report dataset
# Dataset link: https://data.vancouver.ca/datacatalogue/
data = pd.read_csv('property_tax_report_2018.csv')

# Extract the columns and do some transformations.
# (This line is cut off in the source. Counting PIDs per YEAR_BUILT follows from the
# code below; summing CURRENT_LAND_VALUE is an assumption.)
yearWiseAgg = data[['PID','CURRENT_LAND_VALUE']].groupby(data['YEAR_BUILT']).agg({'PID': 'count', 'CURRENT_LAND_VALUE': 'sum'})
yearWiseAgg = yearWiseAgg.reset_index().dropna()

# Where zs takes either an array of the same length as xs and ys or a single value,
# and zdir takes 'x', 'y' or 'z' as direction to use as z.
# (The zs and zdir values are cut off in the source; zs=0 and zdir='z' are assumed.)
ax.plot(yearWiseAgg['PID'],yearWiseAgg['YEAR_BUILT'], zs=0, zdir='z')

# Define colors as red, green, blue
colors = ['r', 'g', 'b']

# Get only records which have more than 2000 properties
morethan2k = yearWiseAgg.query('PID>2000')

# Get shape of dataframe
dflen = morethan2k.shape[0]

# Fetch land values from dataframe
lanvalues = (morethan2k['CURRENT_LAND_VALUE']/2e6).tolist()

# Create a list of colors for each point corresponding to its land value
c_list = []
for i,value in enumerate(lanvalues):
    ...  # the body of the loop is cut off in the source

The above code snippet can be used to plot 2D data in a 3D plot. It is
very useful as it allows you to compare multiple 2D plots in 3D.
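Here is a minimal, self-contained sketch of the `zs`/`zdir` mechanism on made-up sine curves: each curve is ordinary 2D data, and `zs=k, zdir="y"` places it in the plane y = k so the curves can be compared side by side. The function name and the data are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers '3d' on older Matplotlib)


def curves_in_planes(offsets=(0, 1, 2)):
    """Draw one 2D curve per plane y = k inside a single 3D axes."""
    x = np.linspace(0, 1, 100)
    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    for k in offsets:
        values = np.sin(2 * np.pi * (k + 1) * x)
        # With zdir="y" the second argument is drawn along the z-axis
        # and the whole curve is fixed at y = k.
        ax.plot(x, values, zs=k, zdir="y", label=f"plane y = {k}")
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
    ax.set_zlabel("Z")
    ax.legend()
    return fig, ax
```

Swapping `zdir="y"` for `zdir="x"` or `zdir="z"` stacks the same curves along a different axis.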

11. 2D Bar Plot in 3D

"""
========================================
Create 2D bar graphs in different planes
========================================

Demonstrates making a 3D plot which has 2D bar graphs projected onto
planes y=0, y=1, etc.
"""

# Import libraries
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import numpy as np
import pandas as pd

# Create figure object
fig = plt.figure()

# Create the 3D axes (fig.gca(projection='3d') is no longer accepted by recent Matplotlib releases).
ax = fig.add_subplot(111, projection='3d')

# Get the Property Tax Report dataset
# Dataset link: https://data.vancouver.ca/datacatalogue/
data = pd.read_csv('property_tax_report_2018.csv')

# Group by zone category and year built to separate the data for each zone.
# (The aggregation is cut off in the source; counting PIDs per group is an assumption.)
newdata = data.groupby(['YEAR_BUILT','ZONE_CATEGORY']).agg({'PID': 'count'})

# Create list of years that are found in all zones that we want to plot
years = [1995,2000,2005,2010,2015,2018]

# Create list of zone categories that we want to plot
# (the list is cut off in the source after the second entry)
categories = ['One Family Dwelling', 'Multiple Family Dwelling']

# Plot bar plot for each category


The above code snippet can be used to create multiple 2D bar plots in a
single 3D space to compare and analyze the differences.

. . .

Widgets in Matplotlib

So far we have been dealing with static plots, where the user can only
view the charts or graphs without any interaction. Widgets add
interactivity, letting the user visualize, filter and compare data more
effectively.

1. Checkbox widget


import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import CheckButtons
import pandas as pd

df = pd.read_csv("property_tax_report_2018.csv")

# filter properties built on or after 1900
df_valid_year_built = df.loc[df['YEAR_BUILT'] >= 1900]
# retrieve PID, YEAR_BUILT and ZONE_CATEGORY only
df1 = df_valid_year_built[['PID', 'YEAR_BUILT','ZONE_CATEGORY']]
# create 3 dataframes for 3 different zone categories
df_A = df1.loc[df1['ZONE_CATEGORY'] == 'Industrial']
df_B = df1.loc[df1['ZONE_CATEGORY'] == 'Commercial']
df_C = df1.loc[df1['ZONE_CATEGORY'] == 'Historic Area']
# retrieve the PID and YEAR_BUILT fields only
df_A = df_A[['PID','YEAR_BUILT']]
df_B = df_B[['PID','YEAR_BUILT']]
df_C = df_C[['PID','YEAR_BUILT']]
# Count the number of properties group by YEAR_BUILT
df2A = df_A.groupby(['YEAR_BUILT']).count()
df2B = df_B.groupby(['YEAR_BUILT']).count()
df2C = df_C.groupby(['YEAR_BUILT']).count()

# create line plots for each zone category
fig, ax = plt.subplots()
l0, = ax.plot(df2A, lw=2, color='k', label='Industrial')
l1, = ax.plot(df2B, lw=2, color='r', label='Commercial')
l2, = ax.plot(df2C, lw=2, color='g', label='Historic Area')
# Adjusting the space around the figure
plt.subplots_adjust(left=0.2)
# Adding title and labels
plt.title('Count of properties built by year')
plt.xlabel('Year Built')
plt.ylabel('Count of Properties Built')

# create a list for each zone category line plot


As you can see from the above graph, Matplotlib allows the user to
customize which graph to show with the help of checkboxes. This can
be particularly useful when there are many different categories making
comparisons difficult. Hence, widgets make it easier to isolate and
compare distinct graphs and reduce clutter.
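The listing above ends before the checkboxes are created, so here is a hedged, self-contained sketch of how the wiring could look with `CheckButtons`. It uses made-up sine curves instead of the property data, and the function name `checkbox_demo` and the widget position are assumptions.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import CheckButtons


def checkbox_demo():
    """Three lines plus a checkbox panel that toggles their visibility."""
    t = np.linspace(0, 1, 200)
    fig, ax = plt.subplots()
    fig.subplots_adjust(left=0.25)  # leave room for the checkbox panel

    lines = [ax.plot(t, np.sin(2 * np.pi * f * t), label=f"{f} Hz")[0]
             for f in (1, 2, 3)]
    labels = [line.get_label() for line in lines]
    visible = [line.get_visible() for line in lines]

    # Small axes on the left that host the widget.
    rax = fig.add_axes([0.02, 0.4, 0.15, 0.2])
    check = CheckButtons(rax, labels, visible)

    def toggle(label):
        """Flip the visibility of the line whose label was clicked."""
        line = lines[labels.index(label)]
        line.set_visible(not line.get_visible())
        fig.canvas.draw_idle()

    check.on_clicked(toggle)
    return fig, lines, check, toggle
```

Keep a reference to the returned `check` object for as long as the figure is open, otherwise the widget may stop responding once it is garbage-collected. Call `plt.show()` afterwards to interact with it.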

2. Slider widget to control the visual properties of plots


import numpy as np
import matplotlib.pyplot as plt
from matplotlib.widgets import Slider, Button, RadioButtons

# configure subplot
fig, ax = plt.subplots()
plt.subplots_adjust(left=0.25, bottom=0.25)
t = np.arange(0.0, 1.0, 0.001)

# set initial values of frequency and amplitude
a0 = 5
f0 = 3
delta_f = 5.0

# plot cosine curve
s = a0*np.cos(2*np.pi*f0*t)
l, = plt.plot(t, s, lw=2, color='red')

# set the axis limits
plt.axis([0, 1, -10, 10])

# configure axes for the sliders
axcolor = 'lightgoldenrodyellow'
axfreq = plt.axes([0.25, 0.1, 0.65, 0.03], facecolor=axcolor)
axamp = plt.axes([0.25, 0.15, 0.65, 0.03], facecolor=axcolor)

# add sliders for frequency and amplitude
# (the last argument of the first call is cut off in the source; valstep=delta_f
# is assumed because delta_f is otherwise unused)
sfreq = Slider(axfreq, 'Freq', 0.1, 30.0, valinit=f0, valstep=delta_f)
samp = Slider(axamp, 'Amp', 0.1, 10.0, valinit=a0)

# function to update the graph when frequency or amplitude is changed
def update(val):
    # get current amp value
    amp = samp.val
    # get current freq value
    freq = sfreq.val
    # plot cosine curve with updated values of amp and freq
    l.set_ydata(amp*np.cos(2*np.pi*freq*t))
    # redraw the figure
    fig.canvas.draw_idle()

# call update() when the frequency slider changes
sfreq.on_changed(update)
# call update() when the amplitude slider changes
samp.on_changed(update)

The Matplotlib slider is very useful to visualize variations of parameters in
graphs or mathematical equations. As you can see, the slider enables the
user to change the values of the variables/parameters and view the
change instantly.

. . .

Where to go from here?

If you are interested in exploring more interactive plots with modern
design aesthetics, we recommend checking out Dash by Plotly.

This is it, folks. I hope you find this post useful. The full code (Jupyter
Notebook and Python files) can be found here. Due to the limitations of
Jupyter Notebook, the interactive plots (3D and widget) do not work
properly. Hence, the 2D plots are provided in a Jupyter Notebook and
the 3D and widget plots are provided as .py files.

Feel free to leave your comments below.

Cheers!

Contributors:

Gaurav Prachchhak, Tommy Betz, Veekesh Dhununjoy, Mihir Gajjar.
