In the world of data analysis, it’s often necessary to restructure data to suit various analytical needs. One such operation is unpivoting, which involves converting wide-form data into a long-form format. The Pandas library in Python offers a powerful tool called melt function that can help you achieve this transformation efficiently. In this article, you will learn how to use the Pandas melt function to reshape wide data into a long data format.

Why Unpivot Data and the Benefits of Using Pandas Melt

The Pandas melt function is a valuable data reshaping tool that allows you to unpivot a DataFrame from a wide to long format. It can be particularly useful in cases where you need to analyze data in a more granular fashion or when you want to transform data for visualization purposes.

Unpivoting data, refers to the process of converting wide-format data into long-format data.
By doing this, you can:

  1. Make your data more readable, especially for large datasets.
  2. Simplify data analysis, as long-format data is more compatible with various analytical tools and methods.
  3. Enhance data consistency, as long-format data is easier to compare and manipulate.

The Pandas Melt function offers numerous advantages, such as:

  1. A user-friendly syntax that makes it easy to implement.
  2. Flexibility in handling various data types and formats.
  3. Seamless integration with other Pandas functions and tools.

Understanding the Pandas melt() Function

The melt() function takes a Pandas DataFrame and transforms it by melting multiple columns into a single column, while keeping the identifier variables intact. This can make the data easier to work with and is often a precursor to other data manipulation tasks.

Here’s a breakdown of the function’s syntax and its key parameters:

				
					pandas.melt(frame, id_vars=None, value_vars=None, var_name=None, value_name='value', col_level=None, ignore_index=True)

				
			
  • frame: The DataFrame you want to melt.
  • id_vars: A list of columns to use as identifier variables. These columns will remain unchanged during the melting process.
  • value_vars: A list of columns that you want to melt. If not specified, all columns not set as id_vars will be melted.
  • var_name: The name of the new column that will store the column headers from the melted columns. Defaults to None, in which case the column name will be set as “variable”.
  • value_name: The name of the new column that will store the values from the melted columns. Defaults to “value”.
  • col_level: If columns are a MultiIndex, this specifies which level(s) the id_vars and value_vars refer to. Defaults to None.
  • ignore_index: Whether to reset the index in the output DataFrame. Defaults to True.

Loading a Sample Pandas DataFrame

I’ve included a sample Pandas DataFrame below so that you can follow the instruction line-by-line. Simply copy the code and paste it in your preferred code editor. Although your results will undoubtedly differ, feel free to use your own DataFrame if you have one.

				
					import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 30, 35, 40],
        'gender': ['F', 'M', 'M', 'M']}

df = pd.DataFrame(data)
print(df)

				
			

Output:

				
					#        name  age gender
# 0     Alice   25      F
# 1       Bob   30      M
# 2   Charlie   35      M
# 3     David   40      M

				
			

In this example, the DataFrame has three columns: ‘name’, ‘age’, and ‘gender’, and four rows with the corresponding values. You can customize the data and column names to fit your need.

Using the Pandas melt() Function to Unpivot a DataFrame

In this part, we’ll concentrate on using the pandas melt function’s id_vars parameter to unpivot a DataFrame.
When it comes to melting a DataFrame, the term id_vars denotes the identifier variables, which are the columns in your DataFrame that remain unaltered. The other columns are transformed from a wide to long format. These identifier variables offer context for the resulting restructured DataFrame.

Here’s an example:

				
					import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 30, 35, 40],
        'gender': ['F', 'M', 'M', 'M']}

df = pd.DataFrame(data)

melted_df = pd.melt(df, id_vars=['name'], value_vars=['age', 'gender'],
                    var_name='variable', value_name='value')

print(melted_df)

				
			

Output:

				
					      name variable value
0    Alice      age    25
1      Bob      age    30
2  Charlie      age    35
3    David      age    40
4    Alice   gender     F
5      Bob   gender     M
6  Charlie   gender     M
7    David   gender     M
				
			

In this code block, we first define the data dictionary to create the initial DataFrame, and then use it to create the DataFrame object df.

Then we apply the melt() function to the df DataFrame in one line, and store the resulting DataFrame object in melted_df. We specify the same parameters as before, with id_vars as the ‘name’ column, value_vars as the ‘age’ and ‘gender’ columns, and var_name and value_name as the new column names.

Finally, we print the resulting DataFrame melted_df to verify the output.

Using Multiple Identifier Variables in Pandas melt()

A DataFrame can also be melted using numerous variables. We can give in a list of column labels to the Pandas melt function’s id_vars= in order to utilise multiple columns.

Here’s an example:

				
					import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 30, 35, 40],
        'gender': ['F', 'M', 'M', 'M']}

df = pd.DataFrame(data)
print(df)

melted_df = pd.melt(df, id_vars=['name', 'gender'], value_vars=['age'],
                    var_name='variable', value_name='value')

print(melted_df)

				
			

Output:

				
					      name  age gender
0    Alice   25      F
1      Bob   30      M
2  Charlie   35      M
3    David   40      M
      name gender variable  value
0    Alice      F      age     25
1      Bob      M      age     30
2  Charlie      M      age     35
3    David      M      age     40
				
			

In this example, we load the DataFrame using the dictionary approach and print it to verify the data.

We then use the melt() function to unpivot the ‘age’ column into rows, while using ‘name’ and ‘gender’ columns as identifier variables. We store the resulting DataFrame in melted_df, and print it to verify the output.

The resulting DataFrame melted_df has four columns – ‘name’, ‘gender’, ‘variable’, and ‘value’. The ‘name’ and ‘gender’ columns are identifier variables, while the ‘variable’ and ‘value’ columns contain the unpivoted data.

Using Value Variables in the Pandas melt() Function

By default, Pandas will use all remaining columns in the value_vars= parameter, meaning that all columns will be unpivoted. The value_vars parameter in the melt() function specifies the columns to be unpivoted. You can select one or more columns as value variables.

Here’s an example of using the ‘age’ and ‘gender’ columns as value variables:

				
					import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 30, 35, 40],
        'gender': ['F', 'M', 'M', 'M']}

df = pd.DataFrame(data)
print(df)

melted_df = pd.melt(df, id_vars=['name'], value_vars=['age', 'gender'],
                    var_name='variable', value_name='value')

print(melted_df)

				
			

Output:

				
					      name  age gender
0    Alice   25      F
1      Bob   30      M
2  Charlie   35      M
3    David   40      M
      name variable value
0    Alice      age    25
1      Bob      age    30
2  Charlie      age    35
3    David      age    40
4    Alice   gender     F
5      Bob   gender     M
6  Charlie   gender     M
7    David   gender     M
				
			

In this example, we load the DataFrame using the dictionary approach and print it to verify the data.

We then use the melt() function to unpivot the ‘age’ and ‘gender’ columns into rows, while using ‘name’ column as the identifier variable. We store the resulting DataFrame in melted_df, and print it to verify the output.

The resulting DataFrame melted_df has three columns – ‘name’, ‘variable’, and ‘value’. The ‘name’ column is the identifier variable, while the ‘variable’ column contains the names of the unpivoted columns, and the ‘value’ column contains the corresponding values.

Using Multiple Value Variables in Pandas melt()

 By passing multiple column labels into the value_vars= parameter of the Pandas melt() function, you can un-pivot multiple columns.

Here’s an example of using the ‘age’ and ‘gender’ columns as multiple value variables:

				
					import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 30, 35, 40],
        'gender': ['F', 'M', 'M', 'M']}

df = pd.DataFrame(data)
print(df)

melted_df = pd.melt(df, id_vars=['name'], value_vars=['age', 'gender'],
                    var_name='variable', value_name='value')

print(melted_df)

				
			

Output:

				
					      name  age gender
0    Alice   25      F
1      Bob   30      M
2  Charlie   35      M
3    David   40      M
      name variable value
0    Alice      age    25
1      Bob      age    30
2  Charlie      age    35
3    David      age    40
4    Alice   gender     F
5      Bob   gender     M
6  Charlie   gender     M
7    David   gender     M
				
			

n this example, we load the DataFrame using the dictionary approach and print it to verify the data.

We then use the melt() function to unpivot the ‘age’ and ‘gender’ columns into rows, while using ‘name’ column as the identifier variable. We store the resulting DataFrame in melted_df, and print it to verify the output.

The resulting DataFrame melted_df has three columns – ‘name’, ‘variable’, and ‘value’. The ‘name’ column is the identifier variable, while the ‘variable’ column contains the names of the unpivoted columns, and the ‘value’ column contains the corresponding values.

Customizing the Names of Value and Identifier Variables in Pandas melt()

By default, Pandas assigns the column names ‘variable’ and ‘value’ when melting a DataFrame. To change this default behavior, use the parameters var_name= and value_name= to provide custom column names.

Here’s an example of customizing the names of value and identifier variables:

				
					import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 30, 35, 40],
        'gender': ['F', 'M', 'M', 'M']}

df = pd.DataFrame(data)
print(df)

melted_df = pd.melt(df, id_vars=['name'], value_vars=['age', 'gender'],
                    var_name='attributes', value_name='data')

print(melted_df)

				
			

Output:

				
					      name  age gender
0    Alice   25      F
1      Bob   30      M
2  Charlie   35      M
3    David   40      M
      name attributes data
0    Alice        age   25
1      Bob        age   30
2  Charlie        age   35
3    David        age   40
4    Alice     gender    F
5      Bob     gender    M
6  Charlie     gender    M
7    David     gender    M
				
			

In this example, we load the DataFrame using the dictionary approach and print it to verify the data.

We then use the melt() function to unpivot the ‘age’ and ‘gender’ columns into rows, while using ‘name’ column as the identifier variable. We customize the names of the resulting variables by specifying ‘attributes’ and ‘data’ as the names of the var_name and value_name parameters, respectively. We store the resulting DataFrame in melted_df, and print it to verify the output.

The resulting DataFrame melted_df has three columns – ‘name’, ‘attributes’, and ‘data’. The ‘name’ column is the identifier variable, while the ‘attributes’ column contains the names of the unpivoted columns, and the ‘data’ column contains the corresponding values.

Retaining the Original DataFrame Index in Pandas melt()

By default, Pandas will replace the previous index. You can alter this default behavior by adjusting the ignore_index= parameter. It is set to True by default, but if you change it to False, the original index will be preserved.

Here’s an example:

				
					import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        'age': [25, 30, 35, 40],
        'gender': ['F', 'M', 'M', 'M']}

df = pd.DataFrame(data)
print(df)

melted_df = pd.melt(df, id_vars=['name'], value_vars=['age', 'gender'],
                    var_name='variable', value_name='value', ignore_index=False)

print(melted_df)

				
			

Output:

				
					      name  age gender
0    Alice   25      F
1      Bob   30      M
2  Charlie   35      M
3    David   40      M
      name variable value
0    Alice      age    25
1      Bob      age    30
2  Charlie      age    35
3    David      age    40
0    Alice   gender     F
1      Bob   gender     M
2  Charlie   gender     M
3    David   gender     M
				
			

In this example, we load the DataFrame using the dictionary approach and print it to verify the data.

We then use the melt() function to unpivot the ‘age’ and ‘gender’ columns into rows, while using ‘name’ column as the identifier variable. We customize the names of the resulting variables by specifying ‘variable’ and ‘value’ as the names of the var_name and value_name parameters, respectively. We set the ignore_index=False parameter to the melt() function to retain the original DataFrame index. We store the resulting DataFrame in melted_df, and print it to verify the output.

The resulting DataFrame melted_df has four columns – ‘index’, ‘name’, ‘variable’, and ‘value’. The ‘index’ column contains the original DataFrame index, while the other columns contain the unpivoted data.

Using col_level in the Pandas melt Function with Multi-Index Columns

The col_level parameter is useful when dealing with DataFrames that have MultiIndex columns. It allows you to specify which level of the column index should be utilized when melting the DataFrame.

Here’s an example:

				
					import pandas as pd

data = {'name': ['Alice', 'Bob', 'Charlie', 'David'],
        ('info', 'age'): [25, 30, 35, 40],
        ('info', 'gender'): ['F', 'M', 'M', 'M']}

df = pd.DataFrame(data)
print(df)

melted_df = pd.melt(df, id_vars=['name'], value_vars=[('info', 'age'), ('info', 'gender')],
                    var_name=['variable_1', 'variable_2'], value_name='value')

print(melted_df)

				
			

Output:

				
					       name info     
              age gender
0     Alice   25      F
1       Bob   30      M
2   Charlie   35      M
3     David   40      M

     name    variable_1   variable_2 value
0     Alice       info        age    25
1       Bob       info        age    30
2   Charlie       info        age    35
3     David       info        age    40
4     Alice       info     gender     F
5       Bob       info     gender     M
6   Charlie       info     gender     M
7     David       info     gender     M

				
			

In this example, we load the DataFrame using the dictionary approach and print it to verify the data.

We then use the melt() function to unpivot the ‘age’ and ‘gender’ columns into rows, while using ‘name’ column as the identifier variable. We use the value_vars parameter to specify the levels of the column index to be used as value variables, and the var_name parameter to specify the names of the resulting variables. We store the resulting DataFrame in melted_df, and print it to verify the output.

The resulting DataFrame melted_df has four columns – ‘name’, ‘variable_1’, ‘variable_2’, and ‘value’. The ‘name’ column is the identifier variable, while the ‘variable_1’ and ‘variable_2’ columns contain the names of the levels of the column index, and the ‘value’ column contains the corresponding values.

Wrap up

Well done on completing this extensive guide on the Pandas melt function! We’ve delved into a diverse range of topics, techniques, and case studies that showcased the numerous capabilities of this robust data handling instrument.

Using melt(), we can easily transform wide data into long data, which is useful for visualizations, statistical analysis, and machine learning models. Additionally, the function is capable of handling complex and multi-index data structures, making it a versatile tool for data manipulation.

Overall, the melt() function provides a simple and efficient solution for data reshaping in pandas, making it a valuable tool for data analysts and data scientists.

Find out more about the melt function by checking out its official documentation:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.melt.html


Thanks for reading. Happy coding!