As a data analyst or scientist, you work with data every day, and Pandas is one of the most widely used Python libraries for data manipulation and analysis. Adding a new column to an existing DataFrame is one of the most common tasks you will encounter when working with Pandas. In this article, we will walk you through several ways to add a new column to a Pandas DataFrame.
Creating a Sample Pandas DataFrame
Copy and paste the code below into your preferred code editor to follow along with this tutorial. If you have your own dataset, feel free to use it, but your results will differ.
Here’s an example:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
'Age': [18, 19, 17, 18, 19],
'Earnings': [8500, 7500, 9200, 6800, 9000]
}
df = pd.DataFrame(data)
print(df)
Output:
# Name Age Earnings
#0 Alice 18 8500
#1 Bob 19 7500
#2 Charlie 17 9200
#3 David 18 6800
#4 Emily 19 9000
The DataFrame above has three columns: 'Name', 'Age', and 'Earnings'. Now that we have a DataFrame, we can start adding columns!
How to Add a Column to a Pandas DataFrame with a Constant Value
This section demonstrates how to add a column containing a constant value to a Pandas DataFrame. The simplest method is to assign a value directly to a new column. Pandas then applies that value to every row of the new column.
Here’s an example:
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
'Age': [18, 19, 17, 18, 19],
'Earnings': [8500, 7500, 9200, 6800, 9000]
}
df = pd.DataFrame(data)
df['Company'] = 'softwareto'
print(df)
Output:
# Name Age Earnings Company
#0 Alice 18 8500 softwareto
#1 Bob 19 7500 softwareto
#2 Charlie 17 9200 softwareto
#3 David 18 6800 softwareto
#4 Emily 19 9000 softwareto
In the code block above, we assigned a single value (in this case, the string 'softwareto') to an entire DataFrame column.
Assigning a single constant value to a column is not a particularly common operation, since the column then holds the same, often redundant, information in every row.
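Direct assignment is not the only option. As a minimal sketch using the same sample data, the code below shows two other pandas methods: DataFrame.assign(), which returns a new DataFrame and leaves the original untouched, and DataFrame.insert(), which modifies the DataFrame in place and lets you choose the column's position. The position used here (loc=1) is just an illustrative choice.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [18, 19, 17, 18, 19],
    'Earnings': [8500, 7500, 9200, 6800, 9000]
}
df = pd.DataFrame(data)

# assign() returns a NEW DataFrame with the extra column; df itself is unchanged
df_with_company = df.assign(Company='softwareto')

# insert() modifies df in place; loc=1 places 'Company' directly after 'Name'
df.insert(1, 'Company', 'softwareto')

print(df_with_company.columns.tolist())
print(df.columns.tolist())
```

assign() is handy when you chain several operations together, while insert() is useful when column order matters.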
How to Add a Column to a Pandas DataFrame From a List
Assigning a list to a new column is an easy method to add a column to a Pandas DataFrame. This enables you to directly assign existing or new data to a new column.
Here’s an example:
# Add a New Column to a Pandas DataFrame from a List
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
'Age': [18, 19, 17, 18, 19],
'Earnings': [8500, 7500, 9200, 6800, 9000]
}
df = pd.DataFrame(data)
df['City'] = ['New York','Los Angeles','Chicago', 'Houston', 'Phoenix']
print(df)
Output:
# Name Age Earnings City
#0 Alice 18 8500 New York
#1 Bob 19 7500 Los Angeles
#2 Charlie 17 9200 Chicago
#3 David 18 6800 Houston
#4 Emily 19 9000 Phoenix
In the preceding code, a list was assigned to a new Pandas DataFrame column. Note that the length of the list must exactly match the number of rows in the DataFrame; if the lengths differ, Pandas raises a ValueError.
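To see the length check in action, here is a small sketch that assigns a list that is too short and catches the resulting ValueError. It also contrasts this with assigning a pandas Series: a Series is aligned on the DataFrame's index rather than by position, so rows whose index labels are missing from the Series receive NaN instead of raising an error. The index labels [0, 2] are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [18, 19, 17, 18, 19],
})

# A list with the wrong number of elements is rejected
length_error = None
try:
    df['City'] = ['New York', 'Los Angeles']
except ValueError as err:
    length_error = err
    print('ValueError:', err)

# A Series is matched by index label: rows 1, 3 and 4 have no label, so they get NaN
df['City'] = pd.Series(['New York', 'Chicago'], index=[0, 2])
print(df)
```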
How to Add a Column to a Pandas DataFrame From a Dictionary
Mapping values from a dictionary is an easy way to add a new column to a Pandas DataFrame based on an existing column. This method is especially useful when each value in an existing column corresponds to a known category.
Here’s an example:
# Add a New Column to a Pandas DataFrame from a Dictionary
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
'Age': [18, 19, 17, 18, 19],
'Earnings': [8500, 7500, 9200, 6800, 9000]
}
# create a dictionary that maps each name to an occupation
occupation_dict = {
'Alice': 'Engineer',
'Bob': 'Salesperson',
'Charlie': 'Student',
'David': 'Lawyer',
'Emily': 'Psychologist',
}
df = pd.DataFrame(data)
df['Occupation'] = df['Name'].map(occupation_dict)
print(df)
Output:
# Name Age Earnings Occupation
#0 Alice 18 8500 Engineer
#1 Bob 19 7500 Salesperson
#2 Charlie 17 9200 Student
#3 David 18 6800 Lawyer
#4 Emily 19 9000 Psychologist
In the preceding code snippet, the map() method was applied to the 'Name' column with a dictionary of values. For each name, map() looks up that name as a key in the dictionary and returns the corresponding value.
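One detail worth knowing: if a value in the column has no matching key in the dictionary, map() returns NaN for that row. The sketch below deliberately leaves 'Emily' out of the dictionary to show this, then uses fillna() to replace the missing value with a default label ('Unknown' is an arbitrary placeholder chosen for the example).

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
})

# 'Emily' is intentionally missing from this dictionary
occupation_dict = {
    'Alice': 'Engineer',
    'Bob': 'Salesperson',
    'Charlie': 'Student',
    'David': 'Lawyer',
}

# Names without a matching key are mapped to NaN
df['Occupation'] = df['Name'].map(occupation_dict)
print(df['Occupation'].isna().sum())  # 1

# fillna() replaces the NaN with a default label
df['Occupation'] = df['Occupation'].fillna('Unknown')
print(df)
```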
In the section that follows, you will discover how to add multiple columns to a Pandas DataFrame.
How to Add Multiple Columns to a Pandas DataFrame
You will often need to add multiple columns to a Pandas DataFrame. Any of the methods above will work. For instance, you can assign two columns at once by unpacking two lists of data.
Here’s an example:
# Add Multiple Columns to a Pandas DataFrame
import pandas as pd
data = {
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
'Age': [18, 19, 17, 18, 19],
'Earnings': [8500, 7500, 9200, 6800, 9000]
}
df = pd.DataFrame(data)
df['Column1'], df['Column2'] = [[1,2,3,4,5], [6,7,8,9,0]]
print(df)
Output:
# Name Age Earnings Column1 Column2
#0 Alice 18 8500 1 6
#1 Bob 19 7500 2 7
#2 Charlie 17 9200 3 8
#3 David 18 6800 4 9
#4 Emily 19 9000 5 0
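Another way to add several columns in one step is DataFrame.assign(), which accepts one keyword argument per new column. The sketch below reuses the Column1/Column2 values from the example above and shows that a dictionary of new columns can also be unpacked into assign() with **. Both calls return a new DataFrame and leave df unchanged.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Age': [18, 19, 17, 18, 19],
})

# One keyword argument per new column; returns a new DataFrame
df_assigned = df.assign(Column1=[1, 2, 3, 4, 5], Column2=[6, 7, 8, 9, 0])

# The same columns passed as a dictionary unpacked into assign()
new_columns = {'Column1': [1, 2, 3, 4, 5], 'Column2': [6, 7, 8, 9, 0]}
df_from_dict = df.assign(**new_columns)

print(df_assigned.columns.tolist())
print(df_assigned.equals(df_from_dict))  # True
```

The dictionary form is convenient when the column names are only known at runtime.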
How to Add a New Column Derived From Another Column of a Pandas DataFrame
You can also add a column whose values are calculated from another column's values. For example, you can multiply the values in one column by a constant to produce a new column, such as calculating sales tax from a single column of amounts.
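As a minimal sketch of a derived column, the code below applies a flat tax rate to the sample 'Earnings' column. TAX_RATE (8%) and the column names 'Tax' and 'Total' are hypothetical values chosen only for illustration. Column arithmetic in pandas is vectorized, so the multiplication is applied to every row at once without an explicit loop.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Emily'],
    'Earnings': [8500, 7500, 9200, 6800, 9000]
})

# Hypothetical flat rate used only for illustration
TAX_RATE = 0.08

# Multiply every value in 'Earnings' by the rate, rounded to 2 decimals
df['Tax'] = (df['Earnings'] * TAX_RATE).round(2)

# A derived column can also combine two existing columns
df['Total'] = df['Earnings'] + df['Tax']
print(df)
```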
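How to Add a Column to a Pandas DataFrame by Merging Two DataFrames
If the values for the new column are stored in another DataFrame, you can merge the two DataFrames on a shared key column with pd.merge(). The columns of the second DataFrame are added to the result.
Here’s an example: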
import pandas as pd
# create the first DataFrame
df1 = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Age': [25, 32, 18, 47],
})
# create the second DataFrame
df2 = pd.DataFrame({
'Name': ['Alice', 'Bob', 'Charlie', 'David'],
'Occupation': ['Engineer', 'Salesperson', 'Student', 'Lawyer'],
})
# merge the two DataFrames
df_merged = pd.merge(df1, df2, on='Name')
# view the updated DataFrame
print(df_merged)
Output:
# Name Age Occupation
#0 Alice 25 Engineer
#1 Bob 32 Salesperson
#2 Charlie 18 Student
#3 David 47 Lawyer
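By default, pd.merge() performs an inner join, so any row whose key is missing from the other DataFrame is dropped. The sketch below removes 'David' from the second DataFrame to illustrate this, and shows how how='left' keeps every row of the first DataFrame, filling the unmatched occupation with NaN. This is usually what you want when the goal is to add a column without losing rows.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 32, 18, 47],
})

# 'David' has no entry in the second DataFrame
df2 = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Occupation': ['Engineer', 'Salesperson', 'Student'],
})

# Default how='inner': only names present in both DataFrames are kept
inner = pd.merge(df1, df2, on='Name')

# how='left': every row of df1 is kept; missing matches become NaN
left = pd.merge(df1, df2, on='Name', how='left')

print(len(inner), len(left))  # 3 4
print(left)
```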
Wrap up
This tutorial showed you how to add a new column to an existing Pandas DataFrame. You first learned how to assign a constant value. Next, you learned how to add values from a list or a dictionary, and then how to add multiple columns at once. After that, you learned how to add columns derived from another column. Finally, you learned how to merge two DataFrames to add a column to a DataFrame.
For more details, see the official Pandas documentation:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pop.html
Thanks for reading. Happy coding!