Drawing a Line Plot with Seaborn
Introduction
A line plot is a graph that connects data points with line segments, typically to show how a value changes along an ordered variable such as time. Many tools can plot and visualize data. In this tutorial, you will use Seaborn, a Python data visualization library built on top of matplotlib.
Requirements
For this tutorial, you need Python and Seaborn. You can install them separately on your computer, but Anaconda bundles them in one package. Anaconda is a free and open-source distribution of the Python and R programming languages for scientific computing (data science, machine learning applications, large-scale data processing, predictive analytics, etc.) that aims to simplify package management and deployment.
If you have not yet done so, download and install Anaconda on your machine. It includes all the libraries required for this exercise: seaborn, pandas and matplotlib. The procedure is simple, but if you run into difficulties, feel free to contact us.
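To confirm that the three libraries are available before going further, a quick check such as the following sketch can be run in a notebook cell. It only prints the installed versions; the numbers you see depend on your installation.

```python
import matplotlib
import pandas as pd
import seaborn as sbn

# Collect the version string reported by each library.
versions = {
    "seaborn": sbn.__version__,
    "pandas": pd.__version__,
    "matplotlib": matplotlib.__version__,
}

for name, version in versions.items():
    print(f"{name}: {version}")
```

If one of the imports fails with a ModuleNotFoundError, that library is missing from the active environment.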
How to create a single line plot with seaborn
You can write the code in any text editor (Notepad++, Sublime Text, etc.). This tutorial uses Jupyter Notebook, the notebook environment installed with Anaconda.
The syntax
The syntax to draw a single line plot with seaborn is:
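seaborn.lineplot(x=x, y=y, data=data)
where:
x = Data variable for the x-axis (usually a column name in data)
y = Data variable for the y-axis (usually a column name in data)
data = Object pointing to the entire data set or data values (for example, a pandas DataFrame)
Note: Pass x, y and data as keyword arguments, because recent seaborn releases no longer accept x and y positionally. Although this syntax uses only 3 parameters, the seaborn lineplot function has more than 25 parameters. (Refer to the seaborn documentation for more information)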
Data values can be created within the code or loaded from a dataset.
Example 1: Using data created within the code
Suppose that the profit made by a firm over 10 years (2009 to 2018) is as follows:
year = [2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018]
profit = [75.2, 76, 80.2, 86, 86.6, 90, 91.4, 85.4, 88, 90]
To plot how the profit evolved over these 10 years as a line plot with seaborn, proceed as follows:
Step 1: Import the relevant libraries (seaborn, pandas and matplotlib)
import pandas as pd
import seaborn as sbn
import matplotlib.pyplot as plt
Step 2: Create data values (both lists must have the same length)
year = [2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018]
profit = [75.2, 76, 80.2, 86, 86.6, 90, 91.4, 85.4, 88, 90]
Step 3: Create data frame (df)
df = pd.DataFrame({"Year": year, "Profit": profit})
Step 4: Draw line plot
sbn.lineplot(x = "Year", y = "Profit", data=df)
Step 5: Show line plot
plt.show()
Put all five steps in a single cell of your Jupyter notebook. Comments are optional, but it is good practice to use them if you are working with others.
Click the Run button of the Jupyter notebook or press Shift + Enter to run the code. If everything is OK, the output is a line plot showing the relationship between the two data variables, "Year" and "Profit".
Example 2: Using a dataset to draw a line plot
The data for the dataset comes from the table below, which shows the temperature in three major cities (Dallas, Berlin and Ottawa) over 7 days (fictitious data just for the exercise).
Day | Dallas | Berlin | Ottawa |
day1 | 33 | 74 | 40 |
day2 | 31 | 76 | 42 |
day3 | 29 | 74 | 44 |
day4 | 23 | 81 | 37 |
day5 | 31 | 82 | 29 |
day6 | 21 | 80 | 44 |
day7 | 26 | 76 | 46 |
Create a CSV file named "temperature.csv" from this table, or download it from the following link: https://github.com/JoeMaabo/seaborn.
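If you prefer to create the file yourself, one possible approach is to type the table into a DataFrame and let pandas write the CSV. The sketch below assumes the file should sit in the current working directory and use the table headers as column names.

```python
import pandas as pd

# Values copied from the table above.
temperature = pd.DataFrame({
    "Day": [f"day{i}" for i in range(1, 8)],
    "Dallas": [33, 31, 29, 23, 31, 21, 26],
    "Berlin": [74, 76, 74, 81, 82, 80, 76],
    "Ottawa": [40, 42, 44, 37, 29, 44, 46],
})

# index=False keeps the pandas row index out of the file.
temperature.to_csv("temperature.csv", index=False)

# Read the file back to confirm that it was written as expected.
check = pd.read_csv("temperature.csv")
print(check)
```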
Place the dataset in the same folder as the notebook containing your code, then proceed as follows in Jupyter Notebook:
Step 1: Import the relevant libraries (seaborn, pandas and matplotlib)
import pandas as pd
import seaborn as sbn
import matplotlib.pyplot as plt
Step 2: Get dataset and create the dataframe
df = pd.read_csv('temperature.csv')
Step 3: Draw line plot
sbn.lineplot(x = "Day", y = "Dallas", data = df)
Step 4: Show line plot
plt.show()
Note that we are interested only in the temperature of one city (Dallas, y = "Dallas"). After running the code (click the Run button in Jupyter or press Shift + Enter), the output is a line plot of the Dallas temperature over the 7 days.
Note: Until now, we have been using three parameters: x, y and data. Many other parameters can be used to improve the presentation of a line plot (refer to the seaborn documentation).
Drawing Multiple Line Plots
To plot multiple lines on the same graph, you might need to reshape your dataset from wide (like the one that we used in the above example) to long. The reshaping can be done manually or you can use the melt function of pandas. Let’s look at the two possibilities.
Reshaping the dataset manually
You can download the reshaped dataset from the following link: https://github.com/JoeMaabo/seaborn. (Get the file “reshaped_temp.csv”).
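If you would rather build the long-format file yourself, the reshaping can be sketched as stacking one block of rows per city. The column names Day, City and Temperature are assumed here because they are the names used in the plotting step; the layout of the downloadable file may differ.

```python
import pandas as pd

# Wide format: one row per day, one column per city.
wide = pd.DataFrame({
    "Day": [f"day{i}" for i in range(1, 8)],
    "Dallas": [33, 31, 29, 23, 31, 21, 26],
    "Berlin": [74, 76, 74, 81, 82, 80, 76],
    "Ottawa": [40, 42, 44, 37, 29, 44, 46],
})

# Long format: one row per (day, city) pair.
blocks = []
for city in ["Dallas", "Berlin", "Ottawa"]:
    block = pd.DataFrame({
        "Day": wide["Day"],
        "City": city,  # the city name is repeated on every row of the block
        "Temperature": wide[city],
    })
    blocks.append(block)

long_df = pd.concat(blocks, ignore_index=True)
long_df.to_csv("reshaped_temp.csv", index=False)
print(long_df.head())
```

The 7 rows and 3 city columns of the wide table become 21 rows and a single Temperature column in the long table.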
Below are the steps to draw multiple line plots on the same graph:
Step 1: Import the relevant libraries (seaborn, pandas and matplotlib)
import pandas as pd
import seaborn as sbn
import matplotlib.pyplot as plt
Step 2: Load the reshaped dataset
df = pd.read_csv('reshaped_temp.csv')
Note:
- To view the dataset use the code:
print(df)
- To view the columns use the code:
df.columns
Step 3: Draw line plots
sbn.lineplot(x = "Day", y = "Temperature", hue = "City", data = df)
Notice the parameter “hue” that has been added to the syntax to group data by city.
Step 4: Show line plots
plt.show()
Running the code draws one line per city on the same graph, each in its own color, with a legend identifying the cities.
Reshaping dataset using the melt function of pandas
The melt() function unpivots (reshapes) a DataFrame from wide format to long format. With this function, you do not need to modify the original dataset file.
Step 1: Import the relevant libraries (seaborn, pandas and matplotlib)
import pandas as pd
import seaborn as sbn
import matplotlib.pyplot as plt
Step 2: Load the dataset (original data that has not been reshaped)
df = pd.read_csv('temperature.csv')
Step 3: Reshape the dataset using the pandas melt function
df_data = pd.melt(df, id_vars=["Day"], var_name = "City", value_name = "Temperature" )
Note: id_vars, var_name and value_name are parameters of the melt function. (Refer to the pandas documentation for more information).
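To see what these three parameters do, it can help to run melt on a small piece of the table. The sketch below uses only the first two days, typed inline, and is meant as an illustration rather than a replacement for Step 3.

```python
import pandas as pd

# First two days of the temperature table, in wide format.
wide = pd.DataFrame({
    "Day": ["day1", "day2"],
    "Dallas": [33, 31],
    "Berlin": [74, 76],
    "Ottawa": [40, 42],
})

# id_vars: column(s) kept as they are (repeated for every city).
# var_name: name of the new column holding the old column headers.
# value_name: name of the new column holding the cell values.
long_df = pd.melt(wide, id_vars=["Day"], var_name="City", value_name="Temperature")
print(long_df)
```

Each of the 2 days now appears once per city, so the 2 x 3 block of temperatures becomes 6 rows with the columns Day, City and Temperature.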
Step 4: Draw the line plots
sbn.lineplot(x='Day', y='Temperature', hue='City', data=df_data)
Step 5: Show line plots
plt.show()
The output is the same graph as with the manually reshaped dataset: one line per city, with a legend identifying the cities.
It has been a long journey. We hope that you have understood the basics of line plot drawing with seaborn. If you want to dig further, we encourage you to consult the seaborn documentation.