VLOOKUP is a powerful function used extensively in spreadsheet software like Microsoft Excel to search for a value in a table and return the corresponding data from another column. In this article, we will discuss how to achieve similar functionality in Python using the popular data manipulation library, Pandas, with the help of .map() and .merge() methods.
Loading a Sample Pandas DataFrame
I’ve included a sample Pandas DataFrame below so that you can follow the instruction line-by-line. Simply copy the code and paste it in your preferred code editor. Although your results will undoubtedly differ, feel free to use your own DataFrame if you have one.
import pandas as pd
# create a dictionary of data
data = {'Name': ['John', 'Emma', 'Peter', 'Hannah'],
'Age': [25, 30, 21, 35],
'Country': ['USA', 'Canada', 'Australia', 'UK']}
# create a dataframe from the dictionary
df = pd.DataFrame(data)
# print the dataframe
print(df)
Output:
# Name Age Country
# 0 John 25 USA
# 1 Emma 30 Canada
# 2 Peter 21 Australia
# 3 Hannah 35 UK
In this example, the DataFrame has three columns: ‘name’, ‘age’, and ‘country, and four rows with the corresponding values. You can customize the data and column names to fit your need.
Using the .map() Method to Replicate VLOOKUP
The .map()
function in Pandas enables us to associate values with a Pandas series or a column within a DataFrame. To map values to a column in a Pandas DataFrame, we can use a dictionary. In this case, the dictionary’s key represents the value in the column, while the dictionary’s value is the new value we want to assign.
Here’s an example:
import pandas as pd
# create a dictionary of lookup values and results
country_map = {'USA': 'United States', 'Canada': 'Canada', 'Australia': 'Australia', 'UK': 'United Kingdom'}
# create a dataframe
data = {'Name': ['John', 'Emma', 'Peter', 'Hannah'],
'Age': [25, 30, 21, 35],
'Country': ['USA', 'Canada', 'Australia', 'UK']}
df = pd.DataFrame(data)
# use .map() method to replace values in the 'Country' column
df['Country'] = df['Country'].map(country_map)
# print the updated dataframe
print(df)
Output:
Name Age Country
0 John 25 United States
1 Emma 30 Canada
2 Peter 21 Australia
3 Hannah 35 United Kingdom
In this code, we first create a dictionary country_map
with the lookup values and their corresponding results. We then create a dataframe df
with the ‘Name’, ‘Age’, and ‘Country’ columns.
Next, we use the .map()
method to replace the values in the ‘Country’ column with their corresponding results from the country_map
dictionary. The updated values are then stored back in the ‘Country’ column of the dataframe.
Finally, we print the updated dataframe using the print()
function.
This method can be used to replace any set of values in a column with their corresponding results.
Using Pandas .merge() Method to Replicate VLOOKUP
There might be instances when you’re dealing with extensively normalized data tables that need to be combined. Thankfully, Pandas supplies a remarkably useful function called .merge(), which permits us to merge two DataFrames effectively.
Here’s an example:
import pandas as pd
# create a dataframe with lookup values and results
country_df = pd.DataFrame({'Country': ['USA', 'Canada', 'Australia', 'UK'],
'Country Name': ['United States', 'Canada', 'Australia', 'United Kingdom']})
# create a dataframe
data = {'Name': ['John', 'Emma', 'Peter', 'Hannah'],
'Age': [25, 30, 21, 35],
'Country': ['USA', 'Canada', 'Australia', 'UK']}
df = pd.DataFrame(data)
# merge the two dataframes based on the 'Country' column
merged_df = pd.merge(df, country_df, on='Country', how='left')
# print the merged dataframe
print(merged_df)
Output:
Name Age Country Country Name
0 John 25 USA United States
1 Emma 30 Canada Canada
2 Peter 21 Australia Australia
3 Hannah 35 UK United Kingdom
In this code, we first create a separate dataframe country_df
with the lookup values and their corresponding results. The dataframe has two columns – ‘Country’ and ‘Country Name’.
Next, we create the original dataframe df
with the ‘Name’, ‘Age’, and ‘Country’ columns.
Then, we use the .merge()
method to merge the two dataframes based on the ‘Country’ column. We use the on
parameter to specify the common key to merge on, and the how
parameter to specify the type of merge we want (in this case, a left join).
The merged dataframe is stored in the variable merged_df
. We can then print it using the print()
function.
This method can be used to merge any two dataframes based on a common key, and replace any set of values in a column with their corresponding results from the merged dataframe.
Wrap up
Pandas simplifies the process of replicating VLOOKUP-style functions, addressing many of the limitations VLOOKUP has, such as only merging on the leftmost column.
In this tutorial, you discovered how to utilize Python and Pandas to mimic the widely used Excel VLOOKUP function. The VLOOKUP function facilitates a left-join between two tables, enabling you to retrieve values from an additional table.
You discovered how to apply the Pandas .map()
method for associating a dictionary with another Pandas DataFrame column. Furthermore, you learned the usage of the Pandas merge()
function, which permits the combination of two DataFrames based on either a single key or multiple keys.
Both methods can be used to replace any set of values in a column with their corresponding results, and which method to choose depends on the specific use case and the structure of the data.
Here you can find the official documentation for .map() and .merge().
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.map.html
https://pandas.pydata.org/pandas-docs/version/0.23.4/generated/pandas.merge.html
Thanks for reading. Happy coding!