In this article, we look at the describe() function of the pandas library in Python. We explain what it does and how to keep its output out of scientific notation, with examples for a single column and for multiple columns.

What is the "describe" function in Pandas?

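The describe() function generates descriptive statistics for a dataset. For numerical data it reports the count, mean, standard deviation, minimum, maximum, and quartiles. It can also be applied to categorical data, where it reports the count, the number of unique values, the most frequent value, and that value's frequency. When called on a dataframe, it summarizes only the numerical columns unless you pass the include parameter.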
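As a rough illustration of the two kinds of summary, the sketch below runs describe() on a small made-up dataframe. The column names price and color and their values are assumptions chosen for this example only.

```python
import pandas as pd

# Hypothetical sample data: one numerical column and one categorical column
df = pd.DataFrame({
    "price": [10.0, 20.0, 30.0, 40.0],
    "color": ["red", "blue", "red", "green"],
})

# Numerical column: count, mean, std, min, quartiles, max
numeric_summary = df["price"].describe()
print(numeric_summary)

# Categorical column: count, unique, top (most frequent value), freq
categorical_summary = df["color"].describe()
print(categorical_summary)

# include="all" summarizes every column in one table; statistics that do
# not apply to a column's type are shown as NaN
print(df.describe(include="all"))
```

Calling df.describe() without include would summarize only the price column here.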

By default, pandas displays large floating-point values in the describe() summary in scientific notation (for example, 2.000000e+09). In some cases, however, we want to view the values without scientific notation.
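To see the default behavior for yourself, a minimal sketch like the following can help. It assumes a recent pandas version with no custom display options set; the reset_option() call is there only to guarantee that starting point.

```python
import pandas as pd

# Make sure no custom float formatter is active (the pandas default)
pd.reset_option("display.float_format")

# Same kind of data as the examples in this article: a column of large numbers
df = pd.DataFrame({"large_numbers": [1000000000, 2000000000, 3000000000]})

# describe() returns floats, so the large statistics are shown
# with an exponent such as e+09
default_text = str(df["large_numbers"].describe())
print(default_text)
```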

Method 1: Suppress Scientific Notation When Using describe() with One Column

The describe() function is a quick way to summarize the statistical properties of a column. When the column holds large numbers, however, the statistics can be displayed in scientific notation, which makes them harder to read.

To suppress scientific notation when using describe() with one column, you can use the set_option() function from the pandas library to change the display options.

Here’s an example:

import pandas as pd

# Create a dataframe with a column of large numbers
df = pd.DataFrame({'large_numbers': [1000000000, 2000000000, 3000000000]})

# Set the display option to suppress scientific notation
pd.set_option('display.float_format', lambda x: '%.3f' % x)

# Use the describe() function to summarize the statistical properties of the column
print(df['large_numbers'].describe())

Output:

# count            3.000
# mean    2000000000.000
# std     1000000000.000
# min     1000000000.000
# 25%     1500000000.000
# 50%     2000000000.000
# 75%     2500000000.000
# max     3000000000.000
# Name: large_numbers, dtype: float64

In this example, we create a dataframe with a column of large numbers and then use the set_option() function to set the display.float_format option to a lambda function that formats the numbers to three decimal places. Finally, we use the describe() function to summarize the statistical properties of the column.

The output of this code snippet will display the summary statistics of the large_numbers column with the values formatted to three decimal places, without scientific notation.
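The display.float_format option accepts any function that takes one number and returns a string, so a lambda is not the only choice. As a variation (not part of the original example), the sketch below passes a str.format method with a comma in the format specification, which also adds thousands separators.

```python
import pandas as pd

df = pd.DataFrame({"large_numbers": [1000000000, 2000000000, 3000000000]})

# Any one-argument callable works; the "," in the format spec
# adds thousands separators, ".3f" keeps three decimal places
pd.set_option("display.float_format", "{:,.3f}".format)

formatted_text = str(df["large_numbers"].describe())
print(formatted_text)

# Restore the pandas default so later output is unaffected
pd.reset_option("display.float_format")
```

With this formatter the mean is shown as 2,000,000,000.000 rather than 2000000000.000.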

Method 2: Suppress Scientific Notation When Using describe() with Multiple Columns

When using the describe() function with multiple columns, you may encounter the same issue of large numbers being displayed in scientific notation. The fix is the same as for one column: display.float_format is a global display option, so setting it once applies to every float column in the dataframe.

Here’s an example code snippet:

import pandas as pd

# Create a dataframe with two columns of large numbers
df = pd.DataFrame({'large_numbers_1': [1000000000, 2000000000, 3000000000],
                   'large_numbers_2': [4000000000, 5000000000, 6000000000]})

# Set the display option to suppress scientific notation for all columns
pd.set_option('display.float_format', lambda x: '%.3f' % x)

# Use the describe() function to summarize the statistical properties of the columns
print(df.describe())

Output:

#        large_numbers_1  large_numbers_2
# count            3.000            3.000
# mean    2000000000.000   5000000000.000
# std     1000000000.000   1000000000.000
# min     1000000000.000   4000000000.000
# 25%     1500000000.000   4500000000.000
# 50%     2000000000.000   5000000000.000
# 75%     2500000000.000   5500000000.000
# max     3000000000.000   6000000000.000

In this example, we create a dataframe with two columns of large numbers and then use the set_option() function to set the display.float_format option to a lambda function that formats the numbers to three decimal places. Finally, we use the describe() function to summarize the statistical properties of the columns.

The output of this code snippet will display the summary statistics of both columns with the values formatted to three decimal places, without scientific notation.

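Note that display.float_format is a global option: it affects every float that pandas displays, not only the output of describe(), and it stays in effect until you change it. If you have columns with small values that you want to display in scientific notation, restore the default after running describe() with pd.reset_option('display.float_format').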
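If you would rather not change the setting for the whole session, one option is to scope it with pd.option_context(), which restores the previous value when the with block ends. The sketch below is a minimal illustration; the small_numbers column and its values are assumptions added for the example.

```python
import pandas as pd

df = pd.DataFrame({
    "large_numbers": [1000000000, 2000000000, 3000000000],
    "small_numbers": [0.0000001, 0.0000002, 0.0000003],  # hypothetical column
})

# Start from the pandas default formatting
pd.reset_option("display.float_format")

# The custom formatter is active only inside the with block
with pd.option_context("display.float_format", "{:.3f}".format):
    inside_text = str(df["large_numbers"].describe())
    print(inside_text)

# Outside the block the previous (default) setting is back, so the
# small values are displayed with pandas' default formatting again
after_value = pd.get_option("display.float_format")
print(df["small_numbers"].describe())
```

This keeps the formatting change local to the one summary you want to print.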

Wrap up

When using the describe() function in pandas to summarize the statistical properties of a dataset, large numbers can be displayed in scientific notation, which can be difficult to read and understand. However, you can suppress scientific notation in the output by using the set_option() function from the pandas library.

Set the display.float_format option to a function that formats numbers to the desired number of decimal places, such as a lambda. The approach is the same for one column or for multiple columns, because the option applies to every float value that pandas displays.

By using these methods, you can make the output of the describe() function more readable and easier to interpret, which can be especially useful when dealing with large datasets or when presenting results to others.

To learn more about the pandas describe() method, check out the official documentation:
https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.DataFrame.describe.html


Thanks for reading. Happy coding!