Chapter 3 - Get Ready for the Exciting Journey

This book introduces readers to Plotly Express, a Python library for interactive data visualization. Readers are expected to have some familiarity with Python and Pandas. Appendices A and B provide brief overviews of Python and Pandas.

This book is best read and run. Each chapter is available as a Jupyter Notebook and can be opened and run in Google Colab or downloaded and uploaded to any Jupyter Notebook environment of choice.

This book provides minimal coverage of data preparation, typically the step prior to visualization, in order to keep the focus on Plotly Express. Plenty of resources on data preparation with Pandas are available on the Internet, including videos, blog posts, and books.

This book is organized around hands-on examples from a variety of problem domains, from education to healthcare and from national culture to world development. Social researchers will find the topics interesting and relevant. Technical professionals such as data engineers and data scientists will benefit from the exposure to social issues.

3.1 Google Colab as Development Environment

Plotly means several related things. At its core, it refers to Plotly.js, a JavaScript library for data visualization. Plotly.js is itself based on another JavaScript library, D3.js. D3 stands for Data-Driven Documents. D3.js brings data to life by manipulating documents using HTML, SVG, CSS, and JavaScript.

In the context of this book, Plotly refers to Plotly Python, a Python library based on Plotly.js. Besides Python, Plotly.js has been made available for other programming languages including R, Julia, and MATLAB.

Plotly also refers to the Canadian company that developed the aforementioned Plotly Python library, of which Plotly Express is a part. The company also developed Dash and Chart Studio on top of Plotly Python. Dash provides a Python framework for developing interactive dashboards and web applications. Chart Studio is a web-based drag-and-drop tool for generating interactive visualizations without the need for coding.

3.2 GitHub for Project Management

There is no shortage of data visualization libraries in Python. Matplotlib, Seaborn, Plotly, Altair, and Bokeh are among the most popular ones.

Visualization libraries can be classified along two dimensions: static vs. interactive and low-level vs. high-level.

Static libraries produce only static visualizations, that is, still images without support for human interaction. Interactive libraries produce visualizations that are dynamic and allow users to interact via mouse movement and screen touch. Interactive visualizations are great tools for exploratory data analysis. They also support downloading the interactive charts as static images for embedding in publications.

Low-level libraries provide more options and flexibility for customization but are more complex and have a steeper learning curve. High-level libraries are easier to learn, simpler to use, and require fewer lines of code.

Table 1 provides a summary of Python data visualization libraries along these two dimensions.

Table 1. Python data visualization libraries

              Static        Interactive
Low-level     Matplotlib    Plotly
High-level    Seaborn       Plotly Express, Altair, Bokeh

This chapter illustrates the fundamental concepts of data visualization and Plotly.

3.3 Integration of Colab and GitHub

As depicted in Table 1, Plotly is a low-level interactive library and Plotly Express is a high-level interactive library. Plotly Express is built on top of Plotly and enjoys the best of both worlds: it is simple yet powerful. It provides dozens of built-in interactive charts, each available with just one line of code, while still offering advanced functionality and customization via access to the low-level functions of Plotly.

Plotly is also the name of a Montreal-based company that builds several related open-source products based on Plotly.js, a JavaScript library for interactive data visualization. These products include Plotly, Plotly Express, Dash, and Chart Studio.

Dash is a Python library for generating interactive dashboards using Plotly and Plotly Express. Interactive dashboards can be published as websites for sharing on the Internet.

Plotly Chart Studio provides registered users with an online environment for authoring Plotly visualizations using a drag-and-drop design tool, without writing code. The visualizations are hosted in the cloud and can be shared. Users can also write code using Plotly Express to save visualizations to Chart Studio’s cloud environment for sharing.

For more information about Plotly, Plotly Express, Dash, and Chart Studio, check out the company website: http://www.plotly.com

3.4 A Simple Project Example

Data can be broadly classified into two types:

  • Numerical

    • Interval

    • Ratio

  • Categorical

    • Ordinal

    • Nominal
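The classification above can be sketched in Pandas as follows. The columns and values are made up for illustration, and the pandas package is assumed to be installed. Marking a categorical column as ordered is what distinguishes ordinal from nominal data in Pandas; the distinction between interval and ratio data is not encoded in the dtype and is noted in comments only.

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    "temperature_c": [21.5, 18.0, 25.3],                 # numerical, interval (no true zero)
    "income_usd": [42000, 58000, 31000],                 # numerical, ratio (true zero)
    "education": ["Bachelor", "High school", "Master"],  # categorical, ordinal
    "country": ["Kenya", "Peru", "Norway"],              # categorical, nominal
})

# Ordinal: categories carry a meaningful order, declared explicitly.
education_levels = ["High school", "Bachelor", "Master"]
df["education"] = pd.Categorical(
    df["education"], categories=education_levels, ordered=True
)

# Nominal: categories are labels with no inherent order.
df["country"] = df["country"].astype("category")

# Order-based operations are meaningful only for the ordinal column.
lowest = df["education"].min()
below_master = df[df["education"] < "Master"]
```

Declaring the order up front also helps later when plotting, since category order can then follow the declared levels rather than alphabetical order.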