Pandas is a powerful Python library for data analysis, providing data structures and functions needed to work with structured data seamlessly. One of the most common challenges in data analysis is dealing with missing data. In this comprehensive guide, we will dive into the fillna method and explore various strategies for tackling missing data in DataFrames using Pandas.

Understanding Missing Data and Its Impact

In Pandas, missing data is represented by NaN (Not a Number) or None. These values often arise from errors during data collection, data entry, or data processing. Before exploring the fillna method, it is essential to understand how to detect missing data in a DataFrame. Pandas provides several methods to identify missing values, such as isna(), its alias isnull(), and the inverse notna().

				
import pandas as pd

# Sample DataFrame with missing data
data = {'A': [1, 2, None, 4],
        'B': [None, 2, 3, 4],
        'C': [1, None, None, 4]}

df = pd.DataFrame(data)
print(df.isna())

				
			

Output:

				
					#        A      B      C
# 0  False   True  False
# 1  False  False   True
# 2   True  False   True
# 3  False  False  False
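
Before filling anything, it often helps to know how many values are missing in each column. The following is a minimal sketch that rebuilds the sample DataFrame above; chaining .sum() and .any() onto isna() is a common idiom, shown here as one possible approach.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None, 4],
                   'B': [None, 2, 3, 4],
                   'C': [1, None, None, 4]})

# isna() returns booleans; summing counts the True values per column
missing_per_column = df.isna().sum()
print(missing_per_column)

# Total number of missing cells in the whole DataFrame
total_missing = int(df.isna().sum().sum())
print(total_missing)

# Rows that contain at least one missing value
rows_with_missing = df[df.isna().any(axis=1)]
print(rows_with_missing)
```

Counting first makes it easier to decide whether a column should be filled at all or whether the gaps point to a problem upstream.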
				
			

Exploring the fillna Method

The fillna method in Pandas is designed to fill missing values in a DataFrame using a specified method or value. It has several parameters that give you flexibility and control over the filling process. Some common parameters include:

  • value: Scalar, dict, Series, or DataFrame used to fill missing values.
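  • method: Fill method to use (pad or ffill to propagate the last valid value forward, backfill or bfill to use the next valid value, or None). Deprecated since Pandas 2.1 in favor of the dedicated ffill() and bfill() methods.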
  • axis: Axis along which to fill missing values (0 or ‘index’, 1 or ‘columns’).
  • inplace: If True, fill in-place, otherwise, return a new object.
  • limit: Maximum number of consecutive missing values to fill.
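
To make the parameters above more concrete, here is a small sketch on a hypothetical Series named s (it is not part of the original examples). It illustrates that fillna() returns a new object by default and that limit= caps how many missing values are filled.

```python
import pandas as pd

# Hypothetical Series with three missing values
s = pd.Series([1.0, None, None, 4.0, None])

# value=: fill every missing entry with a scalar; the original is left untouched
filled = s.fillna(value=0)

# limit=: fill at most one missing entry, leaving the rest as NaN
filled_once = s.fillna(value=0, limit=1)

print(filled.tolist())
print(filled_once.tolist())
print(s.isna().sum())  # still 3, because inplace defaults to False
```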

Loading a Sample Pandas DataFrame

I’ve included a sample Pandas DataFrame below so that you can follow the tutorial line by line. Simply copy the code and paste it into your preferred code editor. Feel free to use your own DataFrame if you have one, although your results will differ.

				
import pandas as pd

# create a dictionary of lookup values and results
country_map = {'USA': 'United States', 'Canada': 'Canada', 'Australia': 'Australia', 'UK': 'United Kingdom'}

# create a dataframe
data = {'Name': ['John', 'Emma', 'Peter', 'Hannah'],
        'Age': [25, 30, 21, 35],
        'Country': ['USA', 'Canada', 'Australia', 'UK']}
df = pd.DataFrame(data)

# use .map() method to replace values in the 'Country' column
df['Country'] = df['Country'].map(country_map)

# print the updated dataframe
print(df)

				
			

Output:

				
#      Name  Age         Country
# 0    John   25   United States
# 1    Emma   30          Canada
# 2   Peter   21       Australia
# 3  Hannah   35  United Kingdom
				
			

This creates a Pandas DataFrame with three columns (‘Name’, ‘Age’, and ‘Country’) and four rows of sample data, then uses .map() to replace each country code with its full name. You can modify the data in the dictionary to create your own custom DataFrame.

Using Pandas fillna() To Fill with 0

To replace all missing values in a Pandas column with 0, pass 0 to the .fillna() method’s value= argument. Because NaN is stored as a float, a numeric column that contained missing values stays a float column, so the filled value appears as 0.0.

Here’s an example:

				
import pandas as pd

# Create a dictionary with sample data (including a missing value)
data = {
    'Name': ['John', 'Jane', 'Adam', 'Emily', 'Mark'],
    'Age': [25, 30, 21, None, 28],  # Add a missing value
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami'],
    'Country': ['USA', 'USA', 'USA', 'USA', 'USA']
}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)

# Fill missing value in 'Age' column with 0
df['Age'].fillna(0, inplace=True)  # Fill missing value with 0

# Print the DataFrame with missing value filled
print(df)

				
			

Output:

				
#     Name   Age         City Country
# 0   John  25.0     New York     USA
# 1   Jane  30.0  Los Angeles     USA
# 2   Adam  21.0      Chicago     USA
# 3  Emily   0.0      Houston     USA
# 4   Mark  28.0        Miami     USA
				
			

In this example, we first create a dictionary data with a missing value in the ‘Age’ column. Then we create a DataFrame df from this dictionary.

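Next, we use the fillna() method to fill the missing value in the ‘Age’ column with 0. The inplace=True argument asks Pandas to modify the data in place instead of returning a new object. Be aware that in recent Pandas versions, calling fillna(..., inplace=True) on a column selected with df['Age'] triggers a chained-assignment warning, and with Copy-on-Write enabled it no longer updates df; assigning the result back with df['Age'] = df['Age'].fillna(0) works in all versions.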

Finally, we print the updated DataFrame with the missing value filled using the print() function.
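
The same fill can also be written by assigning the result back to the column instead of using inplace=True. The sketch below assumes a trimmed version of the sample data; the final astype(int) step is an optional extra that is not part of the original example, and it restores whole-number ages once no NaN remains.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Adam', 'Emily', 'Mark'],
    'Age': [25, 30, 21, None, 28],
})

# fillna() returns a new Series; assign it back to update the column
df['Age'] = df['Age'].fillna(0)

# Optional: the column was float64 because NaN is a float; convert it back
df['Age'] = df['Age'].astype(int)

print(df)
```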

Using Pandas fillna() To Fill with a Constant Value

Any other constant works the same way as 0: pass it to the .fillna() method’s value= argument. In a numeric column, the value is stored using the column’s data type, so 99 appears as 99.0 in a float column.

Here’s an example:

				
import pandas as pd

# Create a dictionary with sample data (including a missing value)
data = {
    'Name': ['John', 'Jane', 'Adam', 'Emily', 'Mark'],
    'Age': [25, 30, 21, None, 28],  # Add a missing value
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami'],
    'Country': ['USA', 'USA', 'USA', 'USA', 'USA']
}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)

# Fill missing value in 'Age' column with a constant value of 99
df['Age'].fillna(99, inplace=True)  # Fill missing value with constant value of 99

# Print the DataFrame with missing value filled
print(df)

				
			

Output:

				
					#    Name   Age         City Country
# 0   John  25.0     New York     USA
# 1   Jane  30.0  Los Angeles     USA
# 2   Adam  21.0      Chicago     USA
# 3  Emily  99.0      Houston     USA
# 4   Mark  28.0        Miami     USA
				
			

In this example, we first create a dictionary data with a missing value in the ‘Age’ column. Then we create a DataFrame df from this dictionary.

Next, we use the fillna() method to fill the missing value in the ‘Age’ column with a constant value of 99. The inplace=True argument is used to modify the DataFrame in place instead of creating a copy.

Finally, we print the updated DataFrame with the missing value filled using the print() function.

Using Pandas fillna() To Fill with the Mean

To replace all missing values in a column with the column’s mean, you can utilize the .fillna() method along with the column’s mean value. Let’s explore how to use the Pandas .mean() method to substitute missing values with the mean.

Here’s an example:

				
import pandas as pd

# Create a dictionary with sample data (including a missing value)
data = {
    'Name': ['John', 'Jane', 'Adam', 'Emily', 'Mark'],
    'Age': [25, 30, 21, None, 28],  # Add a missing value
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami'],
    'Country': ['USA', 'USA', 'USA', 'USA', 'USA']
}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)

# Fill missing value in 'Age' column with the mean age
mean_age = df['Age'].mean()  # Calculate mean age
df['Age'].fillna(mean_age, inplace=True)  # Fill missing value with mean age

# Print the DataFrame with missing value filled
print(df)

				
			

Output:

				
					#    Name   Age         City Country
# 0   John  25.0     New York     USA
# 1   Jane  30.0  Los Angeles     USA
# 2   Adam  21.0      Chicago     USA
# 3  Emily  26.0      Houston     USA
# 4   Mark  28.0        Miami     USA
				
			

In this example, we first create a dictionary data with a missing value in the ‘Age’ column. Then we create a DataFrame df from this dictionary.

Next, we calculate the mean age of the non-missing values using the mean() method. We then use the fillna() method to fill the missing value in the ‘Age’ column with the mean age. The inplace=True argument is used to modify the DataFrame in place instead of creating a copy.

Finally, we print the updated DataFrame with the missing value filled using the print() function.

The advantage of this method is that it enables us to employ any other type of computed value, such as the median or the mode of a dataset.
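
As a rough illustration of that point, the sketch below fills a hypothetical Series of ages (made up for this example, with one repeated value so the mode is unambiguous) using the median and the mode instead of the mean.

```python
import pandas as pd

# Hypothetical ages with one missing entry and one repeated value
ages = pd.Series([25, 30, 21, None, 28, 30], name='Age')

# Median of the non-missing values
median_filled = ages.fillna(ages.median())

# mode() returns a Series (there can be ties), so take the first entry
mode_filled = ages.fillna(ages.mode().iloc[0])

print(median_filled.tolist())
print(mode_filled.tolist())
```

The median is often preferred over the mean when the column contains outliers, while the mode is a natural choice for categorical data.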

Using Pandas fillna() To Fill with a String

Likewise, we can provide a string to replace all missing values with the specified string. This operates in the same manner as supplying a constant value. Let’s see how we can use the string 'Unknown' to fill all missing values in the 'Country' column.

Here’s an example:

				
import pandas as pd

# Create a dictionary with sample data (including a missing value)
data = {
    'Name': ['John', 'Jane', 'Adam', 'Emily', 'Mark'],
    'Age': [25, 30, 21, None, 28],  # Add a missing value
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami'],
    'Country': ['USA', 'USA', 'USA', 'USA', None]  # Add a missing value
}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)

# Fill missing value in 'Country' column with a string 'Unknown'
df['Country'].fillna('Unknown', inplace=True)  # Fill missing value with 'Unknown'

# Print the DataFrame with missing value filled
print(df)

				
			

Output:

				
					#     Name   Age         City  Country
# 0   John  25.0     New York      USA
# 1   Jane  30.0  Los Angeles      USA
# 2   Adam  21.0      Chicago      USA
# 3  Emily   NaN      Houston      USA
# 4   Mark  28.0        Miami  Unknown
				
			

In this example, we first create a dictionary data with a missing value in the ‘Age’ column and a missing value in the ‘Country’ column. Then we create a DataFrame df from this dictionary.

Next, we use the fillna() method to fill the missing value in the ‘Country’ column with the string ‘Unknown’. The inplace=True argument is used to modify the DataFrame in place instead of creating a copy.

Finally, we print the updated DataFrame with the missing value filled using the print() function.

Using Pandas fillna() to Fill Missing Values in an Entire DataFrame

To fill missing values across an entire Pandas DataFrame, call .fillna() on the DataFrame itself and pass a fill value to the value= parameter. The method tries to preserve each column’s data type where feasible. Note that a numeric fill value such as 0 is also written into text columns, as the output below shows.

Here’s an example:

				
import pandas as pd

# Create a dictionary with sample data (including missing values)
data = {
    'Name': ['John', 'Jane', 'Adam', None, 'Mark'],
    'Age': [25, None, 21, None, 28],
    'City': ['New York', 'Los Angeles', None, 'Houston', 'Miami'],
    'Country': [None, 'USA', 'USA', 'USA', None]
}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)

# Fill all missing values in the DataFrame with 0
df.fillna(0, inplace=True)

# Print the updated DataFrame with missing values filled
print(df)

				
			

Output:

				
					#   Name   Age         City Country
# 0  John  25.0     New York       0
# 1  Jane   0.0  Los Angeles     USA
# 2  Adam  21.0            0     USA
# 3     0   0.0      Houston     USA
# 4  Mark  28.0        Miami       0
				
			

In this example, we first create a dictionary data with missing values in multiple columns. Then we create a DataFrame df from this dictionary.

Next, we use the fillna() method to fill all missing values in the DataFrame with 0. The inplace=True argument is used to modify the DataFrame in place instead of creating a copy.

Finally, we print the updated DataFrame with all missing values filled using the print() function.
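
Filling a whole DataFrame with 0 also places the number 0 in text columns such as ‘Name’, which is rarely what you want. One possible alternative, sketched below under the assumption that numeric and text columns should get different placeholders, is to fill each group of columns separately. The placeholder 'Unknown' and the variable names numeric_cols and text_cols are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Adam', None, 'Mark'],
    'Age': [25, None, 21, None, 28],
    'City': ['New York', 'Los Angeles', None, 'Houston', 'Miami'],
})

# Split the column names by dtype
numeric_cols = df.select_dtypes(include='number').columns
text_cols = df.columns.difference(numeric_cols)

# Fill each group with a placeholder that matches its type
df[numeric_cols] = df[numeric_cols].fillna(0)
df[text_cols] = df[text_cols].fillna('Unknown')

print(df)
```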

Using Pandas fillna() to Fill Missing Values in Specific DataFrame Columns

Up to this point, we’ve discussed filling missing data for either a single column or the entire DataFrame. Pandas enables you to input a dictionary of column-value pairs, which can be used to replace missing values in designated columns with specific values.

Here’s an example:

				
import pandas as pd

# Create a dictionary with sample data (including missing values)
data = {
    'Name': ['John', 'Jane', 'Adam', None, 'Mark'],
    'Age': [25, None, 21, None, 28],
    'City': ['New York', 'Los Angeles', None, 'Houston', 'Miami'],
    'Country': [None, 'USA', 'USA', 'USA', None]
}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)

# Fill missing values in the 'Age' and 'Country' columns with a constant value of 0
df['Age'].fillna(0, inplace=True)
df['Country'].fillna(0, inplace=True)

# Print the updated DataFrame with missing values filled
print(df)

				
			

Output:

				
					#   Name   Age         City Country
# 0  John  25.0     New York       0
# 1  Jane   0.0  Los Angeles     USA
# 2  Adam  21.0         None     USA
# 3  None   0.0      Houston     USA
# 4  Mark  28.0        Miami       0
				
			

In this example, we first create a dictionary data with missing values in multiple columns. Then we create a DataFrame df from this dictionary.

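Next, we use the fillna() method to fill missing values in the ‘Age’ and ‘Country’ columns with a constant value of 0. Here we do this by selecting each column and calling fillna() on it separately; the same result can be achieved in a single call by passing a dictionary such as {'Age': 0, 'Country': 0} to df.fillna(). The ‘Name’ and ‘City’ columns are left untouched, so their missing values remain.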

Finally, we print the updated DataFrame with the specified missing values filled using the print() function.
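
The dictionary form mentioned at the start of this section can be sketched as follows. It uses the same sample data; the choice of 'Unknown' for the ‘Country’ column is an assumption made for illustration and shows that each column can receive its own fill value.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Adam', None, 'Mark'],
    'Age': [25, None, 21, None, 28],
    'City': ['New York', 'Los Angeles', None, 'Houston', 'Miami'],
    'Country': [None, 'USA', 'USA', 'USA', None],
})

# Keys are column names, values are the fill values for those columns
df = df.fillna({'Age': 0, 'Country': 'Unknown'})

print(df)
```

Columns that are not listed in the dictionary, here ‘Name’ and ‘City’, keep their missing values.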

Using Pandas fillna() to Back Fill or Forward Fill Data

The Pandas .fillna() method also lets you fill gaps in your data using the previous or subsequent observations, a technique known as forward-filling or back-filling. Note that the method= argument used here is deprecated as of Pandas 2.1; newer code should call .ffill() or .bfill() directly.

Here’s an example:

				
import pandas as pd

# Create a dictionary with sample data (including missing values)
data = {
    'Name': ['John', 'Jane', 'Adam', None, 'Mark'],
    'Age': [25, None, 21, None, 28],
    'City': ['New York', 'Los Angeles', None, 'Houston', 'Miami'],
    'Country': [None, 'USA', 'USA', 'USA', None]
}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)

# Forward fill missing values in 'City' column
df['City'].fillna(method='ffill', inplace=True)

# Back fill missing values in 'Name' column
df['Name'].fillna(method='bfill', inplace=True)

# Print the updated DataFrame with missing values filled
print(df)

				
			

Output:

				
					#    Name   Age         City Country
# 0  John  25.0     New York    None
# 1  Jane   NaN  Los Angeles     USA
# 2  Adam  21.0  Los Angeles     USA
# 3  Mark   NaN      Houston     USA
# 4  Mark  28.0        Miami    None
				
			

In this example, we first create a dictionary data with missing values in multiple columns. Then we create a DataFrame df from this dictionary.

Next, we use the fillna() method to forward fill missing values in the ‘City’ column using the method='ffill' argument. This fills missing values with the previous non-missing value in the same column.

Similarly, we use the fillna() method to back fill missing values in the ‘Name’ column using the method='bfill' argument. This fills missing values with the next non-missing value in the same column.

Finally, we print the updated DataFrame with the forward filled and back filled missing values using the print() function.
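
For newer Pandas versions, where passing method= to fillna() is deprecated, the same fills can be expressed with the dedicated .ffill() and .bfill() methods. This is a minimal sketch on a trimmed version of the sample data, with results assigned back to the columns rather than using inplace=True.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Jane', 'Adam', None, 'Mark'],
    'City': ['New York', 'Los Angeles', None, 'Houston', 'Miami'],
})

# Forward fill: carry the previous non-missing value down
df['City'] = df['City'].ffill()

# Back fill: pull the next non-missing value up
df['Name'] = df['Name'].bfill()

print(df)
```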

Limiting the Number of Consecutive Missing Data Filled with Pandas fillna()

When employing the method= parameter of the .fillna() method, you might not want to fill an entire gap in your data. By using the limit= parameter, you can designate the maximum number of consecutive missing values to forward-fill or back-fill. 

Let’s explore how we can apply this parameter to constrain the number of values filled in a gap within our data:

				
import pandas as pd

# Create a dictionary with sample data (including missing values)
data = {
    'Name': ['John', None, None, None, 'Mark'],
    'Age': [25, None, None, None, 28],
    'City': ['New York', None, None, None, 'Miami'],
    'Country': [None, 'USA', 'USA', 'USA', None]
}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)

# Forward fill missing values in 'Name', 'Age', and 'City' columns, limiting to 2 consecutive missing values
df[['Name', 'Age', 'City']] = df[['Name', 'Age', 'City']].fillna(method='ffill', limit=2)

# Back fill missing values in 'Country' column, limiting to 1 consecutive missing value
df['Country'] = df['Country'].fillna(method='bfill', limit=1)

# Print the updated DataFrame with missing values filled
print(df)

				
			

Output:

				
					#    Name   Age      City Country
# 0  John  25.0  New York     USA
# 1  John  25.0  New York     USA
# 2  John  25.0  New York     USA
# 3  None   NaN      None     USA
# 4  Mark  28.0     Miami    None

				
			

In this example, we first create a dictionary data with missing values in multiple columns. Then we create a DataFrame df from this dictionary.

Next, we use the fillna() method to forward fill missing values in the ‘Name’, ‘Age’, and ‘City’ columns using the method='ffill' argument, limiting the fill to 2 consecutive missing values with limit=2. Each gap is filled with the previous non-missing value in the same column, but only for the first 2 missing values in the gap. Because the gap here is 3 values long, row 3 stays missing.

Similarly, we use the fillna() method to back fill missing values in the ‘Country’ column using the method='bfill' argument with limit=1. This fills at most 1 missing value per gap with the next non-missing value in the same column. Row 0 is filled with ‘USA’, while row 4 stays missing because no later value exists to fill it from.

Finally, we print the updated DataFrame with the limited consecutive missing values filled using the print() function.
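
The effect of limit= is easiest to see on a single gap. The sketch below uses a hypothetical Series with three consecutive missing values and the .ffill() and .bfill() methods, which accept the same limit= argument.

```python
import pandas as pd

# Hypothetical Series with a gap of three missing values
s = pd.Series([25.0, None, None, None, 28.0])

# Fill at most two consecutive missing values going forward
forward = s.ffill(limit=2)

# Fill at most one consecutive missing value going backward
backward = s.bfill(limit=1)

print(forward.tolist())
print(backward.tolist())
```

In both cases the gap is only partly filled: the forward fill leaves the last position of the gap missing, and the back fill leaves the first two.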

Using Pandas fillna() with groupby and transform

In this section, we’re going to explore using the Pandas .fillna() method to fill data across different categories. You can use this method in Pandas with groupby() and transform() to fill missing values within groups in a DataFrame.

Here’s an example:

				
import pandas as pd

# Create a dictionary with sample data (including missing values)
data = {
    'Name': ['John', 'Jane', 'Adam', None, 'Mark', 'Emily', None, 'Mike', 'Emma', 'David'],
    'Age': [25, 30, 21, None, 28, 32, None, 27, 24, None],
    'Gender': ['Male', 'Female', 'Male', 'Male', 'Male', 'Female', 'Female', 'Male', 'Female', 'Male'],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Miami', 'Chicago', 'Miami', 'Los Angeles', 'New York', 'Houston']
}

# Create a DataFrame from the dictionary
df = pd.DataFrame(data)

# Use groupby() and transform() to fill missing 'Age' values with the mean age of each group
df['Age'] = df.groupby('Gender')['Age'].transform(lambda x: x.fillna(x.mean()))

# Print the updated DataFrame with missing values filled
print(df)

				
			

Output:

				
#     Name        Age  Gender         City
# 0   John  25.000000    Male     New York
# 1   Jane  30.000000  Female  Los Angeles
# 2   Adam  21.000000    Male      Chicago
# 3   None  25.250000    Male      Houston
# 4   Mark  28.000000    Male        Miami
# 5  Emily  32.000000  Female      Chicago
# 6   None  28.666667  Female        Miami
# 7   Mike  27.000000    Male  Los Angeles
# 8   Emma  24.000000  Female     New York
# 9  David  25.250000    Male      Houston
				
			

In this example, we first create a dictionary data with missing values in the ‘Name’ and ‘Age’ columns. Then we create a DataFrame df from this dictionary.

Next, we use the groupby() method to group the DataFrame by the ‘Gender’ column. Then we use the transform() method to fill missing ‘Age’ values with the mean age of each group using the fillna() method. We do this by applying a lambda function that fills missing values with the mean age of the group using x.mean().

Finally, we print the updated DataFrame with the missing ‘Age’ values filled using the print() function.
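
An alternative that avoids the lambda is to compute the group means first and pass them to fillna() as a Series. This is a sketch on a smaller, hypothetical dataset; the variable name group_means is an illustrative choice.

```python
import pandas as pd

df = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male', 'Male', 'Female', 'Female'],
    'Age': [25, 30, 21, None, 32, None],
})

# Series of group means, aligned row by row with the original DataFrame
group_means = df.groupby('Gender')['Age'].transform('mean')

# fillna() accepts a Series and fills each missing entry from the matching index
df['Age'] = df['Age'].fillna(group_means)

print(df)
```

Passing the string 'mean' to transform() lets Pandas use its built-in aggregation, which may be faster than a Python lambda on large DataFrames.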

Wrap up

We covered how to use the fillna() method in Pandas to fill missing values in a DataFrame. We discussed various scenarios, including filling missing values in a single column, filling missing values in an entire DataFrame, filling missing values with a constant value or a string, and forward filling or back filling missing data.

We also covered how to limit the number of consecutive missing data filled and how to use fillna() with groupby() and transform() to fill missing values within groups in a DataFrame.

Overall, the fillna() method is a powerful tool that allows you to handle missing data in a flexible and customizable way using Pandas.

To learn more about the Pandas .fillna() method, check out the official documentation:
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.fillna.html


Thanks for reading. Happy coding!