Population pyramids are visual representations of the age and gender distribution of a population. They are commonly used in demographic analysis to better understand the characteristics of a population. In this article, we will explore how to create population pyramids using Python.
Why use Python for population pyramids
Python is a versatile programming language with powerful libraries for data analysis and visualization. It is widely used by data scientists and analysts because it is easy to read and write. Libraries such as pandas and Matplotlib make it straightforward to reshape data and plot it, including as a population pyramid.
Consider the example below, which builds a pandas DataFrame of population counts by age group and plots it as a population pyramid:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
# Define the age groups and population data
age_groups = ['0-4', '5-9', '10-14', '15-19', '20-24', '25-29', '30-34', '35-39', '40-44', '45-49', '50-54', '55-59', '60-64', '65-69', '70-74', '75-79', '80-84', '85+']
male_pop = [133146, 124439, 128648, 142585, 157154, 171148, 157133, 138505, 138507, 144611, 126235, 106705, 78934, 52689, 30513, 13527, 4442, 1292]
female_pop = [126186, 117932, 123652, 135939, 147988, 165485, 157281, 140726, 145624, 149011, 132175, 118776, 94087, 69677, 40454, 18038, 5788, 1809]
# Create a pandas DataFrame with the population data
df = pd.DataFrame({'age_groups': age_groups, 'male_pop': male_pop, 'female_pop': female_pop})
# Calculate the total population for each age group
df['total_pop'] = df['male_pop'] + df['female_pop']
# Calculate the share (%) of males and females within each age group
df['male_pct'] = df['male_pop'] / df['total_pop'] * 100
df['female_pct'] = df['female_pop'] / df['total_pop'] * 100
# Create a horizontal bar chart: male bars use negated values so they extend to the left
fig, ax = plt.subplots(figsize=(10, 8))
ax.barh(df['age_groups'], -df['male_pop'], height=0.8, color='blue', alpha=0.6, label='Male')
ax.barh(df['age_groups'], df['female_pop'], height=0.8, color='red', alpha=0.6, label='Female')
# Make the x-axis symmetric, with extra room for the percentage labels
max_pop = max(df['male_pop'].max(), df['female_pop'].max())
ax.set_xlim(-max_pop * 1.25, max_pop * 1.25)
# Show absolute values on the x-axis so the male side is not labeled with negative numbers
ax.xaxis.set_major_formatter(FuncFormatter(lambda x, pos: f'{abs(x):,.0f}'))
ax.set_xlabel('Population')
ax.set_ylabel('Age group')
ax.set_title('Population Pyramid')
# Age groups are drawn bottom-up in list order, so '0-4' forms the base of the pyramid
# Add male and female percentage labels at the end of each bar
for i, row in enumerate(df.itertuples(index=False)):
    ax.text(-row.male_pop, i, f'{row.male_pct:.1f}%', ha='right', va='center', color='black', fontweight='bold')
    ax.text(row.female_pop, i, f'{row.female_pct:.1f}%', ha='left', va='center', color='black', fontweight='bold')
# Add legend
ax.legend()
plt.show()
This code creates a horizontal bar chart in which the male bars extend to the left (their values are negated) and the female bars extend to the right, producing a population pyramid. The age groups are shown on the y-axis and the population counts on the x-axis. Each bar is labeled with that sex's percentage of its age group, so the male and female labels in a row add up to 100%. A legend shows the colors used for the male and female bars. pandas stores the population data in a DataFrame for easier manipulation, and Matplotlib draws the chart.
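The percentage labels above describe the sex ratio within each age group. A common alternative is to scale every bar as a percentage of the whole population, which makes pyramids of differently sized populations directly comparable. The sketch below assumes the same column names as the example (`age_groups`, `male_pop`, `female_pop`); the function names `share_of_total` and `plot_share_pyramid` are hypothetical, and the demo numbers are made up for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt


def share_of_total(df: pd.DataFrame) -> pd.DataFrame:
    """Return each sex's count in every age group as a % of the whole population.

    Assumes the columns 'age_groups', 'male_pop' and 'female_pop'.
    """
    total = df['male_pop'].sum() + df['female_pop'].sum()
    out = df[['age_groups']].copy()
    out['male_share'] = df['male_pop'] / total * 100
    out['female_share'] = df['female_pop'] / total * 100
    return out


def plot_share_pyramid(shares: pd.DataFrame):
    """Draw a pyramid whose bar lengths are % of the total population."""
    fig, ax = plt.subplots(figsize=(10, 8))
    # Negate the male shares so those bars extend to the left of zero
    ax.barh(shares['age_groups'], -shares['male_share'], color='blue', alpha=0.6, label='Male')
    ax.barh(shares['age_groups'], shares['female_share'], color='red', alpha=0.6, label='Female')
    # Thin line at zero to separate the two halves
    ax.axvline(0, color='black', linewidth=0.8)
    ax.set_xlabel('Share of total population (%)')
    ax.set_ylabel('Age group')
    ax.legend()
    return fig, ax


if __name__ == '__main__':
    # A small made-up population, for illustration only
    demo = pd.DataFrame({
        'age_groups': ['0-14', '15-64', '65+'],
        'male_pop': [200, 500, 100],
        'female_pop': [180, 520, 140],
    })
    shares = share_of_total(demo)
    print(shares.round(2))
    fig, ax = plot_share_pyramid(shares)
    plt.show()
```

Because all the shares together sum to 100%, the area of the pyramid is the same for any population, and only its shape differs.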
Wrap up
In this article, we have shown how to create a population pyramid in Python from population counts by age group and sex. We used pandas to organize the data and Matplotlib to draw the population pyramid. With these tools, you can analyze the age and gender distribution of any population and create insightful visualizations.
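To apply the same chart to another population, note that demographic data is often published in long format, with one row per age group and sex, rather than the wide layout used in the example. The sketch below reshapes such data with pandas `pivot_table`, keeping the age groups in their original order (a plain sort would put '10-14' before '5-9'). The input column names `age_group`, `sex` and `population`, the labels 'Male'/'Female', and the function name `long_to_wide` are all assumptions for illustration; adjust them to match your source.

```python
import pandas as pd


def long_to_wide(long_df: pd.DataFrame) -> pd.DataFrame:
    """Reshape one-row-per-(age group, sex) data into the wide layout used above.

    Assumes hypothetical input columns 'age_group', 'sex' ('Male'/'Female')
    and 'population'.
    """
    # Remember the age groups in the order they first appear
    order = long_df['age_group'].unique()
    wide = long_df.pivot_table(index='age_group', columns='sex',
                               values='population', aggfunc='sum')
    # pivot_table sorts the index as strings, so restore the original order
    wide = wide.reindex(order)
    wide = wide.rename(columns={'Male': 'male_pop', 'Female': 'female_pop'})
    wide.columns.name = None
    wide = wide.reset_index().rename(columns={'age_group': 'age_groups'})
    return wide[['age_groups', 'male_pop', 'female_pop']]


if __name__ == '__main__':
    # Made-up long-format data, for illustration only
    long_df = pd.DataFrame({
        'age_group': ['0-4', '0-4', '5-9', '5-9', '10-14', '10-14'],
        'sex': ['Male', 'Female'] * 3,
        'population': [10, 9, 8, 7, 6, 5],
    })
    print(long_to_wide(long_df))
```

The resulting DataFrame has the same columns as the one built in the example, so the plotting code can be reused unchanged.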
To learn more, see the Altair gallery example of a US population pyramid over time:
https://altair-viz.github.io/gallery/us_population_pyramid_over_time.html
Thanks for reading. Happy coding!