Chapter 4 - Fun Ride on Plotly Express

Before we dive deep into the wonderland of Plotly data visualization, let’s enjoy a fun ride on Plotly Express, a simple yet powerful module for creating beautiful, interactive data visualizations, often with a single line of code.

One big plus of Plotly Express is its tight integration with Pandas data frames. Pandas is the de facto tool for data preparation and analysis in the Python ecosystem and can be thought of as Microsoft Excel on steroids.

In Plotly Express, you specify the data source with a data frame and specify the variables to plot with the column names of that data frame.

Plotly Express comes with several built-in sample datasets in the form of Pandas data frames. We will use the gapminder dataset, which contains the population, GDP per capita, and life expectancy of countries at five-year intervals from 1952 to 2007.

We will begin by examining a scatter plot to gain a comprehensive understanding of a Plotly Express visualization.

4.1 A Bare-minimal Scatter Plot

First, let’s create the simplest possible scatter plot, with life expectancy on the Y axis and GDP per capita on the X axis, using data from 2007.

We are using Google Colab as our development environment. Colab is a free Jupyter Notebook environment hosted in the cloud and provided by Google. You need a Google account to use this free service.

At the time of this writing, Google Colab has an older version of Plotly (4.4.1) pre-installed, so we upgrade it to the latest version (5.3.1) by running the system command pip install --upgrade plotly.

Because the system command is run from a Jupyter Notebook cell, we need to prefix it with an exclamation mark (!), like this: !pip install --upgrade plotly.

The --upgrade option replaces the installed version with the latest available version.

# Upgrade Plotly library since Google Colab environment has an older version of Plotly

!pip install --upgrade plotly
Defaulting to user installation because normal site-packages is not writeable
Requirement already satisfied: plotly in /home/codespace/.local/lib/python3.8/site-packages (5.3.1)
Requirement already satisfied: six in /usr/lib/python3/dist-packages (from plotly) (1.14.0)
Requirement already satisfied: tenacity>=6.2.0 in /home/codespace/.local/lib/python3.8/site-packages (from plotly) (8.0.1)
# Display the Plotly version number

import plotly

print(plotly.__version__)
5.3.1
# Load the Plotly Express module
# Give it the short alias px for easy reference later

import plotly.express as px
# Use the built-in sample dataset gapminder

df = px.data.gapminder()

df.head(15)
        country  continent  year  lifeExp       pop    gdpPercap  iso_alpha  iso_num
 0  Afghanistan       Asia  1952   28.801   8425333   779.445314        AFG        4
 1  Afghanistan       Asia  1957   30.332   9240934   820.853030        AFG        4
 2  Afghanistan       Asia  1962   31.997  10267083   853.100710        AFG        4
 3  Afghanistan       Asia  1967   34.020  11537966   836.197138        AFG        4
 4  Afghanistan       Asia  1972   36.088  13079460   739.981106        AFG        4
 5  Afghanistan       Asia  1977   38.438  14880372   786.113360        AFG        4
 6  Afghanistan       Asia  1982   39.854  12881816   978.011439        AFG        4
 7  Afghanistan       Asia  1987   40.822  13867957   852.395945        AFG        4
 8  Afghanistan       Asia  1992   41.674  16317921   649.341395        AFG        4
 9  Afghanistan       Asia  1997   41.763  22227415   635.341351        AFG        4
10  Afghanistan       Asia  2002   42.129  25268405   726.734055        AFG        4
11  Afghanistan       Asia  2007   43.828  31889923   974.580338        AFG        4
12      Albania     Europe  1952   55.230   1282697  1601.056136        ALB        8
13      Albania     Europe  1957   59.280   1476505  1942.284244        ALB        8
14      Albania     Europe  1962   64.820   1728137  2312.888958        ALB        8
# Use Plotly Express's scatter function to create a bare-minimal scatter plot.
# Provide the data frame and specify the columns for the axes.

fig = px.scatter(
    df.query("year == 2007"),    # Only use 2007 data
    x="gdpPercap",               # gdpPercap is the column name for GDP per Capita
    y="lifeExp"                  # lifeExp is the column name for Life Expectancy
)

fig.show()

4.2 A Feature-rich Scatter Plot

Plotly Express’s scatter function provides many parameters for customization. In this example, we use the following features:

  • Use the country name as the hover name, so that when we mouse over a dot we can see which country it represents.

  • Use the population to set the size (geometric area) of each dot. Since the dots now look like bubbles, this kind of scatter plot is also known as a bubble plot.

  • Use the continent to set the color of each dot. Plotly Express adds a color legend, and you can click its entries to select or deselect continents.

  • Provide a title for the scatter plot. Since life expectancy is an indicator of health while GDP per capita is an indicator of wealth, we give the visualization the title “Health vs Wealth”.

  • Use the year as the animation frame. We can play the animation to see the changes over time. Since the ranges of X and Y change from year to year, we fix them to the overall minimum and maximum of X and Y so that the bubbles don’t disappear outside of the visualization.

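fig = px.scatter(
    df,                          # All years, not just 2007
    template="plotly_dark",      # Dark scene template
    x="gdpPercap",
    y="lifeExp",
    color="continent",           # One color per continent, with a legend
    hover_name="country",        # Country name shown on mouse hover
    title="Health vs Wealth",
    height=600,
    animation_frame="year",      # One animation frame per year
    size="pop",                  # Bubble area reflects population
    size_max=55,
    log_x=True,                  # Logarithmic X axis
    range_x=(df["gdpPercap"].min(), df["gdpPercap"].max()),  # Fixed X range for all frames
    range_y=(df["lifeExp"].min(), df["lifeExp"].max())       # Fixed Y range for all frames
)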

fig.show()

4.3 Interact with the Visualization

  • Mouse hover

  • Crop

  • Using the Tool Bar

4.4 Customize the Visualization

  • Scene Template

  • Title

  • X Axis

    • title

    • ticks

  • Y Axis

    • title

    • ticks

  • Marker

    • shape

    • color

  • Legend

4.5 Download the Visualization