This article covers the describe() function in the pandas library for Python: what it does, and how to keep its output from being displayed in scientific notation. Each method comes with an example.
What is the "describe" function in Pandas?
The describe() function generates descriptive statistics for a dataset. For numerical data it reports the count, mean, standard deviation, minimum, maximum, and quartiles. It can also be applied to categorical data, where it reports the count, the number of unique values, the most frequent value, and that value's frequency.
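As a quick illustration, the sketch below calls describe() on a numeric column and on a text column. The column names and values are made up for this example.

```python
import pandas as pd

# Hypothetical data: one numeric column and one text column
df = pd.DataFrame({
    'price': [10.0, 20.0, 30.0, 40.0],
    'color': ['red', 'blue', 'red', 'green'],
})

# Numeric column: count, mean, std, min, quartiles, max
numeric_summary = df['price'].describe()
print(numeric_summary)

# Text column: count, unique, top (most frequent value), freq
categorical_summary = df['color'].describe()
print(categorical_summary)
```

Calling df.describe() on the whole dataframe summarizes only the numeric columns by default; passing include='all' covers both kinds.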
The describe() function returns its statistics as floating-point values, and with the default display settings pandas switches to scientific notation when those values are large. In some cases, however, we may want to view the values without scientific notation.
Method 1: Suppress Scientific Notation When Using describe() with One Column
The describe() function is a useful tool for quickly summarizing the statistical properties of a dataset. However, when you have large numbers, it can sometimes display the values in scientific notation, which can make it difficult to read and understand the data.
To suppress scientific notation when using describe() with one column, you can use the set_option() function from the pandas library to change the display options.
Here’s an example:
import pandas as pd
# Create a dataframe with a column of large numbers
df = pd.DataFrame({'large_numbers': [1000000000, 2000000000, 3000000000]})
# Set the display option to suppress scientific notation
pd.set_option('display.float_format', lambda x: '%.3f' % x)
# Use the describe() function to summarize the statistical properties of the column
print(df['large_numbers'].describe())
Output:
# count            3.000
# mean    2000000000.000
# std     1000000000.000
# min     1000000000.000
# 25%     1500000000.000
# 50%     2000000000.000
# 75%     2500000000.000
# max     3000000000.000
# Name: large_numbers, dtype: float64
In this example, we create a dataframe with a column of large numbers and then use the set_option() function to set the display.float_format option to a lambda function that formats the numbers to three decimal places. Finally, we use the describe() function to summarize the statistical properties of the column.
The output of this code snippet will display the summary statistics of the large_numbers column with the values formatted to three decimal places, without scientific notation.
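The formatter passed to display.float_format can be any callable that takes a float and returns a string, so other fixed-point layouts work the same way. As a sketch, the variant below uses str.format with a thousands separator and two decimal places; the choice of two decimals is arbitrary and not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({'large_numbers': [1000000000, 2000000000, 3000000000]})

# Fixed-point with thousands separators, e.g. 2,000,000,000.00
pd.set_option('display.float_format', '{:,.2f}'.format)

# Capture the printed form of the summary so it can be inspected
summary_text = str(df['large_numbers'].describe())
print(summary_text)

# Restore the default so later output is not affected
pd.reset_option('display.float_format')
```

The separators can make long values easier to scan, at the cost of output that is no longer directly copy-pasteable as numbers.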
Method 2: Suppress Scientific Notation When Using describe() with Multiple Columns
When using the describe() function with multiple columns, you may encounter the same issue of large numbers being displayed in scientific notation. The same fix applies: display.float_format is a global pandas display option, so a single set_option() call covers every float column in the output.
Here’s an example code snippet:
import pandas as pd
# Create a dataframe with two columns of large numbers
df = pd.DataFrame({'large_numbers_1': [1000000000, 2000000000, 3000000000],
                   'large_numbers_2': [4000000000, 5000000000, 6000000000]})
# Set the display option to suppress scientific notation for all columns
pd.set_option('display.float_format', lambda x: '%.3f' % x)
# Use the describe() function to summarize the statistical properties of the columns
print(df.describe())
Output:
#        large_numbers_1  large_numbers_2
# count            3.000            3.000
# mean    2000000000.000   5000000000.000
# std     1000000000.000   1000000000.000
# min     1000000000.000   4000000000.000
# 25%     1500000000.000   4500000000.000
# 50%     2000000000.000   5000000000.000
# 75%     2500000000.000   5500000000.000
# max     3000000000.000   6000000000.000
In this example, we create a dataframe with two columns of large numbers and then use the set_option() function to set the display.float_format option to a lambda function that formats the numbers to three decimal places. Finally, we use the describe() function to summarize the statistical properties of the columns.
The output of this code snippet will display the summary statistics of both columns with the values formatted to three decimal places, without scientific notation.
Note that display.float_format is a global setting: it affects every float that pandas prints afterwards, not only the output of describe(). If you want other values, such as very small numbers, to appear in scientific notation again, restore the default with pd.reset_option('display.float_format') after running describe().
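One way to avoid leaving the global option changed is to scope it. The sketch below reuses the two-column dataframe from this section and applies the format only inside a pd.option_context() block, then shows pd.reset_option() as the explicit alternative. The variable names are chosen for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'large_numbers_1': [1000000000, 2000000000, 3000000000],
                   'large_numbers_2': [4000000000, 5000000000, 6000000000]})

# Option 1: the format applies only inside the with-block
with pd.option_context('display.float_format', '{:.3f}'.format):
    scoped_text = str(df.describe())
    print(scoped_text)

# Outside the block, the previous setting is back in effect
option_after_block = pd.get_option('display.float_format')

# Option 2: set the option globally, then restore the default explicitly
pd.set_option('display.float_format', '{:.3f}'.format)
print(df.describe())
pd.reset_option('display.float_format')
option_after_reset = pd.get_option('display.float_format')
```

The context manager is usually the safer choice in shared code such as notebooks or library functions, because the previous setting is restored even if an exception is raised inside the block.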
Wrap up
When using the describe() function in pandas to summarize the statistical properties of a dataset, large numbers can be displayed in scientific notation, which can be difficult to read and understand. However, you can suppress scientific notation in the output by using the set_option() function from the pandas library.
In both cases, you set the display.float_format option to a function that formats the numbers to a desired number of decimal places. Because the option is global, the same call works whether you summarize one column or a whole dataframe.
By using these methods, you can make the output of the describe() function more readable and easier to interpret, which can be especially useful when dealing with large datasets or when presenting results to others.
To learn more about the pandas describe() method, check out the official documentation:
https://pandas.pydata.org/pandas-docs/dev/reference/api/pandas.DataFrame.describe.html
Thanks for reading. Happy coding!