Chapter 7 - Go Beyond Plotly Express

Plotly Express is convenient and fast, but it can only take you to select destinations. To go where Plotly Express cannot reach, you can turn to Plotly, the foundational library that Plotly Express is built on.

Since Plotly Express does not provide a Pareto chart, let’s build one from scratch using Plotly.

The example uses the 2020 population of countries.

It demonstrates the use of subplots with a secondary Y axis.

It also demonstrates the use of color scales and how to hide the scale.

# As of this writing, Google Colab has Plotly version 4.4.1 pre-installed
# We need to upgrade it to the latest version

!pip install --upgrade plotly
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: plotly in /home/codespace/.local/lib/python3.8/site-packages (5.3.1)
Requirement already satisfied: six in /usr/lib/python3/dist-packages (from plotly) (1.14.0)
Requirement already satisfied: tenacity>=6.2.0 in /home/codespace/.local/lib/python3.8/site-packages (from plotly) (8.0.1)

Note that in this chapter we will not import the Plotly Express module. We will import Plotly’s graph_objects module instead.

import numpy as np                  # We use NumPy to generate some sample data for plotting
import plotly.graph_objects as go   # The graph_objects package is the core of Plotly
import plotly.io as pio

import plotly
plotly.__version__
'5.3.1'

7.1 Plotly Basics

Plotly uses the JavaScript Object Notation (JSON) format to describe how data are visualized. JSON is a standard format for web applications and data integrations. It is similar to Python’s dictionary object and uses key-value pairs to describe data and computing instructions.

A Plotly data visualization is represented by a Figure object. A figure has two components: Data and Layout.

The data component is a list of traces. A trace describes one chart, either of a predefined type such as a boxplot, bar chart, or scatter plot, or of a custom-coded type.

The layout component describes the overall characteristics of a figure such as its title, legend, and titles of the axes among many others.

A sophisticated visualization can be implemented by combining multiple traces, each representing a unique visual component, with a customized layout.

7.2 A “Hello World” Chart

This example uses the update_layout() method of the Figure class to add a title to the figure.

This simple chart has no data to display.

fig = go.Figure()
fig.update_layout(title="Hello World!")

# Alternatively,
# my_layout = go.Layout(title="Hello World!")
# fig = go.Figure(layout = my_layout)

fig.show()

Here, the output shows that the figure has no data, but its layout has a value for the title.

print(fig)
Figure({
    'data': [], 'layout': {'template': '...', 'title': {'text': 'Hello World!'}}
})

7.3 A Boxplot of Ages of Some Men

Here, we create a trace of type “box” (a boxplot) for a list of numbers and add it to the Figure object using the add_trace() method of the Figure class. A trace is described by key-value pairs, much like a Python dictionary. We use the Box class of the graph_objects module to create the trace. Alternatively, we can just create a Python dictionary. See the section on Best Practices for which option to choose.

Statistics Background

“A boxplot is a standardized way of displaying the distribution of data based on a five number summary (“minimum”, first quartile (Q1), median, third quartile (Q3), and “maximum”). It can tell you about your outliers and what their values are. It can also tell you if your data is symmetrical, how tightly your data is grouped, and if and how your data is skewed.”

https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51

Fences can be used to illustrate extreme values (outliers) in box plots. Sometimes you might see reference to “inner fences” and “outer fences”. These are defined as:

  • Lower inner fence: Q1 – (1.5 * IQR)

  • Upper inner fence: Q3 + (1.5 * IQR)

  • Lower outer fence: Q1 – (3 * IQR)

  • Upper outer fence: Q3 + (3 * IQR)

Points beyond the inner fences in either direction are mild outliers; points beyond the outer fences in either direction are extreme outliers.
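The fence definitions above can be sketched with NumPy as follows. The list of ages is a fixed illustrative sample, and the quartiles use NumPy's default linear interpolation, which is an assumption: Plotly may compute quartiles with a different method, so the values shown on hover can differ slightly.

```python
import numpy as np

# A fixed, illustrative sample of ages.
ages = np.array([3, 4, 61, 39, 91, 2, 61, 33, 36, 25,
                 22, 37, 28, 5, 25, 89, 12, 19, 68, 26])

# First and third quartiles (NumPy's default linear interpolation).
q1, q3 = np.percentile(ages, [25, 75])
iqr = q3 - q1

# Inner and outer fences as defined in the text.
lower_inner, upper_inner = q1 - 1.5 * iqr, q3 + 1.5 * iqr
lower_outer, upper_outer = q1 - 3 * iqr, q3 + 3 * iqr

beyond_inner = (ages < lower_inner) | (ages > upper_inner)
beyond_outer = (ages < lower_outer) | (ages > upper_outer)

# Mild outliers lie beyond the inner fences but not beyond the outer ones.
mild = sorted(ages[beyond_inner & ~beyond_outer].tolist())
extreme = sorted(ages[beyond_outer].tolist())

print("Q1, Q3, IQR:", q1, q3, iqr)
print("inner fences:", lower_inner, upper_inner)
print("outer fences:", lower_outer, upper_outer)
print("mild outliers:", mild)
print("extreme outliers:", extreme)
```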

In the chart below, we also add a title for the X axis.

# Here, we use Numpy to generate a list of random numbers to represent the ages for a group of men.

male_ages = np.random.randint(low=1, high=101, size=20)   # 20 random integers from 1 to 100; high=101 is exclusive.

print(male_ages)
[ 3  4 61 39 91  2 61 33 36 25 22 37 28  5 25 89 12 19 68 26]
# A trace is described by key-value pairs, much like a Python dict.
# The key "x" holds the values plotted along the X axis, given as a list or array.
# The key "type" sets the type of the chart: "box" for a boxplot, "scatter" for a scatter plot, etc.

trace_0 = go.Box(   # type of chart: Boxplot
    x=male_ages,
    name="Male"     # The name of the trace, used as a legend to distinguish multiple traces.
)

# Alternatively, just use the Python dictionary 
# trace_0 = {
#     "x":male_ages,
#     "type":"box",    # type of chart: Boxplot
#     "name":"Male"    # The name of the trace, used as a legend to distinguish multiple traces.
# }

fig = go.Figure()
fig.add_trace(trace_0)

# Alternatively,
# fig = go.Figure(data=[trace_0])

fig.update_layout(
    title="Boxplot of Ages of Some Men",
    xaxis={"title":"Age"}         # This is equivalent to xaxis_title="Age"
)

fig.show()

Since Plotly figures are interactive, you can move your mouse around to see the five summary statistics.

Here, the print() function shows that the figure has one trace of type box, along with its data points. The figure also has some custom layout properties, including its title and the title of the X axis.

print(fig)
Figure({
    'data': [{'name': 'Male',
              'type': 'box',
              'x': array([ 3,  4, 61, 39, 91,  2, 61, 33, 36, 25, 22, 37, 28,  5, 25, 89, 12, 19,
                          68, 26])}],
    'layout': {'template': '...', 'title': {'text': 'Boxplot of Ages of Some Men'}, 'xaxis': {'title': {'text': 'Age'}}}
})

7.4 A Boxplot of Ages of Some Men and Women

We add another trace representing the boxplot of ages of some women.

male_ages = np.random.randint(low=1, high=100, size=20)

trace_0 = go.Box(   
    x=male_ages,    
    name="Male"   
)

female_ages = np.random.randint(low=1, high=100, size=20)

trace_1 = go.Box(   
    x=female_ages,    
    name="Female"    
)

fig = go.Figure()
fig.add_trace(trace_0)
fig.add_trace(trace_1)

# Alternatively,
# fig = go.Figure(data=[trace_0, trace_1])

fig.update_layout(
    title="Boxplot of Ages of Some Men and Women",
    xaxis={"title":"Age"},
    showlegend=True             # The legend can be shown or hidden
)

fig.show()

Since we already have the labels “Male” and “Female” on the Y axis, the color legend on the upper right is not necessary. We can hide it by setting the showlegend property of the layout to False.

fig.update_layout(showlegend=False)

# Alternatively,
# fig.layout.showlegend = False

fig.show()

7.5 Plotly Flexibility

7.5.1 Different Ways to Create a Figure

We can create an empty Figure object and then add traces and update layout properties like this:

trace_0 = go.Box(   
    x=male_ages,    
    name="Male"   
)

fig = go.Figure()
fig.add_trace(trace_0)
fig.update_layout(title="A Boxplot")

Alternatively, we can first create the traces and a Layout object with the desired properties, and then pass the list of traces and the Layout object to the Figure constructor:

trace_0 = go.Box(   
    x=male_ages,    
    name="Male"   
)

my_layout = go.Layout(title="A Boxplot")
fig = go.Figure(data=[trace_0], layout=my_layout)

7.5.2 Different Ways to Create a Trace

We can use a specific class of the graph_objects module. Here we use the Box class to create a boxplot. The resulting object holds the same key-value pairs that a Python dictionary describing the boxplot would.

trace_0 = go.Box(
    x=[10, 3, -5, -35, 23, 8, 78, -65, 13, 31, 82],
    name="Trace Name"                   
)

Alternatively, we can use a Python dictionary object to represent a trace:

trace_0 = {
    "x": [10, 3, -5, -35, 23, 8, 78, -65, 13, 31, 82],
    "type": "box",
    "name": "Trace Name"
}

7.5.3 Different Ways to Specify a Layout Property

For example, to specify the title of the X axis, the following three methods work the same:

  • fig.update_layout(xaxis={"title":"Age"})

  • fig.update_layout(xaxis_title="Age")

  • fig.layout.xaxis.title = "Age"

Python is a flexible language and offers alternative ways to achieve the same outcome. In some cases, there are industry best practices. For example, the commonly used indentation is four spaces. In other cases, it is up to your personal preference. In the latter, you should try to pick one and use it consistently.

7.6 Steps to Create a Plotly Visualization

Here are the steps to create a Plotly chart:

  1. Create an instance of the Figure class.

  2. Create traces (one or more) each representing a plot.

  3. Add the traces to the Figure instance.

  4. Update the layout of the figure (title, legend, etc.).

  5. Display or export the figure.

7.7 Create a Pareto Chart

import pandas as pd

# Load the data; the code below relies on its "Country Name" and "value" columns.
df = pd.read_csv("wdi_data.csv")
df.head()

# A first bar chart of the values, in the order they appear in the file.
trace_0 = go.Bar(
    x=df["Country Name"],
    y=df["value"]
)

fig = go.Figure()

fig.add_trace(trace_0)

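fig.show()

# Sort the countries from the largest to the smallest value, as a Pareto chart requires.
df.sort_values(by="value", ascending=False, inplace=True)
df.head()

# Rebuild the bar trace from the sorted data and color each bar by its value.
trace_0 = go.Bar(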
    x=df["Country Name"],
    y=df["value"],
    marker=dict(color=df["value"], coloraxis="coloraxis")   
)

fig = go.Figure()

fig.add_trace(trace_0)

fig.update_layout(
    title="2020 Population by Country",
    yaxis={"title":"2020 Population"}
)

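fig.show()

# Cumulative share of the total, in percent, following the sorted order.
df["cumulative_%"] = 100 * df["value"].cumsum() / df["value"].sum()
df.head()

# Line trace for the cumulative percentage.
trace_1 = go.Scatter(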
    x=df["Country Name"],
    y=df["cumulative_%"],
    mode="markers+lines"
)

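from plotly.subplots import make_subplots

# A single subplot whose specs enable a secondary Y axis.
fig = make_subplots(specs=[[{"secondary_y": True}]])

fig.add_trace(trace_0)

# Draw the cumulative percentage against the secondary Y axis.
fig.add_trace(trace_1, secondary_y=True)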

fig.update_layout(
    title="2020 Population by Country",
    yaxis={"title":"2020 Population"},
    showlegend=False,
    coloraxis_showscale=False
)

# Alternatively,
# fig.update(layout_coloraxis_showscale=False)

fig.show()