Chapter 4 - Fun Ride on Plotly Express

Before we dive deep into the wonderland of Plotly data visualization, let’s enjoy a fun ride on Plotly Express, a simple yet powerful module for creating beautiful, interactive data visualizations, often in just one line of code.

One big plus of Plotly Express is its tight integration with the Pandas data frame. Pandas is the de facto tool for data preparation and analysis in the Python ecosystem and can be thought of as Microsoft Excel on steroids.

In Plotly Express, you specify the data source by passing a data frame, and you specify variables using the column names of that data frame.

Plotly Express comes with several built-in sample datasets in the form of Pandas data frames. We will use the gapminder dataset, which contains the population, GDP per capita, and life expectancy of countries from 1952 to 2007 at five-year intervals.

We will begin by examining a scatter plot to gain a comprehensive understanding of a Plotly Express visualization.

4.1 A Bare-minimal Scatter Plot

First, let’s create the simplest possible scatter plot, with Life Expectancy on the Y axis and GDP per Capita on the X axis, using data from 2007.

We are using Google Colab as our development environment. Colab is a free Jupyter Notebook environment hosted in the cloud and provided by Google. You need a Google account to use this free service.

At the time of this writing, Google Colab has an older version of Plotly (4.4.1) pre-installed, so we upgrade it to the latest version (5.3.1) by running the system command pip install --upgrade plotly.

Since the system command is run from within a Jupyter Notebook, we need to prefix it with an exclamation mark (!), like this: !pip install --upgrade plotly.

The --upgrade option removes the old version and installs the latest version.

# Upgrade Plotly library since Google Colab environment has an older version of Plotly

!pip install --upgrade plotly

Defaulting to user installation because normal site-packages is not writeable

Requirement already satisfied: plotly in /home/codespace/.local/lib/python3.8/site-packages (5.3.1)

Requirement already satisfied: six in /usr/lib/python3/dist-packages (from plotly) (1.14.0)
Requirement already satisfied: tenacity>=6.2.0 in /home/codespace/.local/lib/python3.8/site-packages (from plotly) (8.0.1)

# Display the Plotly version number

import plotly

print(plotly.__version__)

5.3.1

# Load the Plotly Express module
# Give it the short alias px for easy reference later

import plotly.express as px

# Use the built-in sample dataset gapminder

df = px.data.gapminder()

df.head(15)

	country	continent	year	lifeExp	pop	gdpPercap	iso_alpha	iso_num
0	Afghanistan	Asia	1952	28.801	8425333	779.445314	AFG	4
1	Afghanistan	Asia	1957	30.332	9240934	820.853030	AFG	4
2	Afghanistan	Asia	1962	31.997	10267083	853.100710	AFG	4
3	Afghanistan	Asia	1967	34.020	11537966	836.197138	AFG	4
4	Afghanistan	Asia	1972	36.088	13079460	739.981106	AFG	4
5	Afghanistan	Asia	1977	38.438	14880372	786.113360	AFG	4
6	Afghanistan	Asia	1982	39.854	12881816	978.011439	AFG	4
7	Afghanistan	Asia	1987	40.822	13867957	852.395945	AFG	4
8	Afghanistan	Asia	1992	41.674	16317921	649.341395	AFG	4
9	Afghanistan	Asia	1997	41.763	22227415	635.341351	AFG	4
10	Afghanistan	Asia	2002	42.129	25268405	726.734055	AFG	4
11	Afghanistan	Asia	2007	43.828	31889923	974.580338	AFG	4
12	Albania	Europe	1952	55.230	1282697	1601.056136	ALB	8
13	Albania	Europe	1957	59.280	1476505	1942.284244	ALB	8
14	Albania	Europe	1962	64.820	1728137	2312.888958	ALB	8

# Use Plotly Express's scatter function to create a bare-minimal scatter plot.
# Provide the data frame and specify the columns for the axes.

fig = px.scatter(
    df.query("year == 2007"),    # Only use 2007 data
    x="gdpPercap",               # gdpPercap is the column name for GDP per Capita
    y="lifeExp"                  # lifeExp is the column name for Life Expectancy
)

fig.show()

4.2 A Feature-rich Scatter Plot

Plotly Express’s scatter function provides many parameters for customization. In this example, we use the following features:

Use the country name as the hover name, so that hovering over a dot shows which country it represents.
Use the population to set the size (area) of each dot. Since the dots now look like bubbles, this kind of scatter plot is also known as a bubble plot.
Use the continent to set the color of each dot. Plotly Express adds a color legend, and you can click a legend entry to show or hide that continent.
Provide a title for the scatter plot. Since life expectancy is an indicator of health and GDP per capita is an indicator of wealth, we title the visualization “Health vs Wealth”.
Use the year as the animation frame, so we can play the animation to see changes over time. Since the data ranges for X and Y change from year to year, we fix the axis ranges using the minimum and maximum of X and Y over all years so that bubbles don’t move outside the visualization.

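fig = px.scatter(
    df,                          # Use all years, since we animate over them
    template="plotly_dark",      # Dark theme
    x="gdpPercap",
    y="lifeExp",
    color="continent",           # One color per continent
    hover_name="country",        # Show the country name on hover
    title="Health vs Wealth",
    height=600,
    animation_frame="year",      # One animation frame per year
    size="pop",                  # Bubble area reflects population
    size_max=55,
    log_x=True,                  # Logarithmic X axis
    range_x=(df["gdpPercap"].min(), df["gdpPercap"].max()),   # Fix the X range across all frames
    range_y=(df["lifeExp"].min(), df["lifeExp"].max())        # Fix the Y range across all frames
)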

fig.show()

4.3 Interact with the Visualization

Mouse hover
Crop
Using the Tool Bar

4.4 Customize the Visualization

Scene Template
Title
X Axis
- title
- ticks
Y Axis
- title
- ticks
Marker
- shape
- color
Legend

Data Visualization with Plotly Express
