A Pareto chart is an effective tool for identifying the most significant factors contributing to a problem or opportunity. It combines bars sorted in descending order with a line showing the cumulative percentage, and it is a visual representation of the Pareto Principle, which states that roughly 80% of the effects come from 20% of the causes.
Python is a popular programming language for data analysis and visualization. In this article, we show how to create a Pareto chart in Python with pandas and matplotlib, step by step.
Step 1: Create the Data
Imagine that we perform a survey and ask 100 different respondents to choose their preferred cereal brand from brands A, B, C, D, and E.
To store the survey’s findings, we can create the following pandas DataFrame:
import pandas as pd
# Create the data
data = {'Brand': ['A', 'B', 'C', 'D', 'E'],
        'Count': [23, 12, 30, 10, 25]}
# Create the DataFrame
df = pd.DataFrame(data)
# Print the DataFrame
print(df)
Output:
#   Brand  Count
# 0     A     23
# 1     B     12
# 2     C     30
# 3     D     10
# 4     E     25
In this example, we asked 100 people to identify their favorite cereal brand, and the results are stored in the “Count” column of the DataFrame. The “Brand” column contains the different cereal brands (A, B, C, D, and E). You can modify the “data” dictionary to fit your specific needs.
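If your survey results arrive as one answer per respondent rather than as ready-made totals, the counts can be derived with pandas. The sketch below is one possible way to do that; the `responses` list is hypothetical and is constructed only so that it reproduces the totals used in this article.

```python
import pandas as pd

# Hypothetical raw survey responses: one entry per respondent.
# (The article only gives the totals; this list is made up to match them.)
responses = ['A'] * 23 + ['B'] * 12 + ['C'] * 30 + ['D'] * 10 + ['E'] * 25

# Count how often each brand appears, ordered alphabetically by brand
counts = pd.Series(responses).value_counts().sort_index()

# Turn the counts into a DataFrame with the same columns as above
df = counts.rename_axis('Brand').reset_index(name='Count')

print(df)
```

The resulting DataFrame has the same “Brand” and “Count” columns as the one built from the dictionary, so the rest of the steps apply unchanged.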
Step 2: Create the Pareto Chart
We can generate the Pareto chart using the following code:
import pandas as pd
import matplotlib.pyplot as plt
# Create the data
data = {'Brand': ['A', 'B', 'C', 'D', 'E'],
        'Count': [23, 12, 30, 10, 25]}
# Create the DataFrame
df = pd.DataFrame(data)
# Sort the DataFrame by Count in descending order
df = df.sort_values(by='Count', ascending=False)
# Calculate the cumulative percentage
df['cumulative_percentage'] = 100 * df['Count'].cumsum() / df['Count'].sum()
# Create the Pareto Chart
fig, ax1 = plt.subplots()
# Create the bar plot
ax1.bar(df['Brand'], df['Count'], color='b')
ax1.set_xlabel('Brand')
ax1.set_ylabel('Count', color='b')
ax1.tick_params('y', colors='b')
# Create the line plot
ax2 = ax1.twinx()
ax2.plot(df['Brand'], df['cumulative_percentage'], color='r', marker='o')
ax2.set_ylabel('Cumulative Percentage (%)', color='r')
ax2.tick_params('y', colors='r')
# Show the plot
plt.show()
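Output: (figure not shown) a bar chart of the counts in blue with a red cumulative percentage line on a second y-axis.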

In this example, we first sorted the DataFrame by the “Count” column in descending order. Then, we calculated the cumulative percentage of the counts using the cumsum() and sum() methods. Finally, we used matplotlib to create a bar plot of the counts and, on a second y-axis created with twinx(), a line plot of the cumulative percentage.
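To make the cumulative percentage step concrete, here is a small sketch that isolates that calculation and prints the intermediate values. The variable names `running_total` and `grand_total` are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Brand': ['A', 'B', 'C', 'D', 'E'],
                   'Count': [23, 12, 30, 10, 25]})

# Largest count first, as a Pareto chart requires
df = df.sort_values(by='Count', ascending=False)

# Running total of the sorted counts: 30, 55, 78, 90, 100
running_total = df['Count'].cumsum()

# Total number of responses: 100
grand_total = df['Count'].sum()

# Share of all responses covered so far, in percent
df['cumulative_percentage'] = 100 * running_total / grand_total

print(df[['Brand', 'Count', 'cumulative_percentage']])
```

With this data the brands are ordered C, E, A, B, D and the cumulative percentages are 30, 55, 78, 90, and 100. Because the counts happen to sum to 100, the running totals and the percentages coincide here; with other data they would differ.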
The resulting Pareto Chart shows the counts of each cereal brand in descending order, and the cumulative percentage of the counts. The red line indicates the cumulative percentage, and the blue bars indicate the counts. The x-axis shows the different cereal brands (A, B, C, D, and E).
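A common way to read a Pareto chart is to look for the point where the cumulative line crosses a chosen threshold. The sketch below does the same lookup numerically. The 80% threshold is an assumption borrowed from the Pareto Principle, and the names `threshold`, `n_needed`, and `vital_few` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Brand': ['A', 'B', 'C', 'D', 'E'],
                   'Count': [23, 12, 30, 10, 25]})
df = df.sort_values(by='Count', ascending=False)
df['cumulative_percentage'] = 100 * df['Count'].cumsum() / df['Count'].sum()

# Assumed cut-off, taken from the 80/20 rule; adjust as needed
threshold = 80

# Position of the first brand whose cumulative share reaches the threshold
reached = (df['cumulative_percentage'] >= threshold).to_numpy()
n_needed = int(reached.argmax()) + 1

# Brands that together account for at least `threshold` percent of responses
vital_few = df['Brand'].iloc[:n_needed].tolist()

print(vital_few)
```

For this survey the result is four of the five brands (C, E, A, and B), which suggests the preferences are spread fairly evenly rather than following a strict 80/20 split.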
Step 3: Customize the Pareto Chart (Optional)
You can customize the appearance of the Pareto chart by altering the colors of the bars and the thickness of the cumulative percentage line.
For example, we could make the bars green and draw the line in light green with a slightly greater thickness:
import pandas as pd
import matplotlib.pyplot as plt
# Create the data
data = {'Brand': ['A', 'B', 'C', 'D', 'E'],
        'Count': [23, 12, 30, 10, 25]}
# Create the DataFrame
df = pd.DataFrame(data)
# Sort the DataFrame by Count in descending order
df = df.sort_values(by='Count', ascending=False)
# Calculate the cumulative percentage
df['cumulative_percentage'] = 100 * df['Count'].cumsum() / df['Count'].sum()
# Create the Pareto Chart
fig, ax1 = plt.subplots()
# Create the bar plot
ax1.bar(df['Brand'], df['Count'], color='green')
ax1.set_xlabel('Brand')
ax1.set_ylabel('Count', color='green')
ax1.tick_params('y', colors='green')
# Create the line plot
ax2 = ax1.twinx()
ax2.plot(df['Brand'], df['cumulative_percentage'], color='lightgreen', linewidth=2, marker='o')
ax2.set_ylabel('Cumulative Percentage (%)', color='lightgreen')
ax2.tick_params('y', colors='lightgreen')
# Show the plot
plt.show()
Output: (figure not shown) the same chart with green bars and a light green cumulative percentage line.

In this example, we changed the color of the bars to green using the color parameter of the bar function. We also changed the color of the left y-axis label to green using the color parameter of set_ylabel, and the color of its ticks using the colors parameter of tick_params.
We changed the color of the line to light green using the color parameter of the plot function and increased its thickness using the linewidth parameter. The right y-axis label and ticks were set to light green in the same way, with set_ylabel and tick_params.
Feel free to experiment with other customizations to make the Pareto Chart fit your specific needs.
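If you expect to try several color schemes, it may help to wrap the plotting code in a function. The following is a minimal sketch under that assumption: `pareto_chart` and its parameters are hypothetical names, not part of pandas or matplotlib. It also formats the right-hand axis with matplotlib's `PercentFormatter` and fixes its range, which is an optional extra beyond the steps above.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import PercentFormatter


def pareto_chart(df, category_col, value_col,
                 bar_color='green', line_color='lightgreen', linewidth=2):
    """Hypothetical helper: draw a Pareto chart, return the figure and both axes."""
    # Sort a copy so the caller's DataFrame is left untouched
    df = df.sort_values(by=value_col, ascending=False)
    cumulative = 100 * df[value_col].cumsum() / df[value_col].sum()

    # Bars on the left axis
    fig, ax1 = plt.subplots()
    ax1.bar(df[category_col], df[value_col], color=bar_color)
    ax1.set_xlabel(category_col)
    ax1.set_ylabel(value_col, color=bar_color)
    ax1.tick_params(axis='y', colors=bar_color)

    # Cumulative percentage line on the right axis
    ax2 = ax1.twinx()
    ax2.plot(df[category_col], cumulative, color=line_color,
             linewidth=linewidth, marker='o')
    ax2.set_ylim(0, 105)  # a little headroom above 100%
    ax2.yaxis.set_major_formatter(PercentFormatter())
    ax2.set_ylabel('Cumulative Percentage', color=line_color)
    ax2.tick_params(axis='y', colors=line_color)
    return fig, ax1, ax2


data = pd.DataFrame({'Brand': ['A', 'B', 'C', 'D', 'E'],
                     'Count': [23, 12, 30, 10, 25]})
fig, ax1, ax2 = pareto_chart(data, 'Brand', 'Count',
                             bar_color='steelblue', line_color='darkorange')
```

Call `plt.show()` afterwards to display the figure, or `fig.savefig(...)` to write it to a file. Returning the axes leaves room for further tweaks, such as adding a title with `ax1.set_title(...)`.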
Wrap up
To learn more about Pareto charts, check out the Wikipedia article:
https://en.wikipedia.org/wiki/Pareto_chart
Thanks for reading. Happy coding!