In this article, we will show you how to calculate and plot a Cumulative Distribution Function (CDF) in Python. The CDF is an important statistical concept in data analysis, and knowing how to calculate and plot it in Python is useful for any data scientist or analyst.

Before we dive into the technicalities, let’s first understand what a CDF is.

What is a Cumulative Distribution Function?

A Cumulative Distribution Function (CDF) gives, for each value x, the probability that a random variable X takes a value less than or equal to x. In symbols, F(x) = P(X ≤ x).

For a continuous random variable, the CDF is the integral of the probability density function (PDF) from negative infinity up to x. A CDF never decreases as x grows, and its values always lie between 0 and 1.
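As a quick numerical check of the relationship between the PDF and the CDF, the sketch below integrates a PDF up to a point and compares the result with the CDF at that point. The standard normal distribution, the evaluation point x0 = 1.0, and the use of scipy.integrate.quad are assumptions chosen for illustration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

# Assumed evaluation point, chosen only for illustration
x0 = 1.0

# Numerically integrate the standard normal PDF from -infinity up to x0
area, abs_error = quad(norm.pdf, -np.inf, x0)

# The CDF at x0 should match the integrated area
cdf_value = norm.cdf(x0)

print(f"Integral of PDF up to {x0}: {area:.6f}")
print(f"CDF at {x0}:                {cdf_value:.6f}")
```

The two printed numbers should agree to within the numerical tolerance of the integration routine.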

Now, let’s move on to the technical aspects of how to calculate and plot a CDF in Python.

Example 1: CDF of a Random Distribution in Python

To plot the CDF of random data in Python, you can use the NumPy library to generate and sort the data and the Matplotlib library to draw the plot.

Here’s an example:

				
import numpy as np
import matplotlib.pyplot as plt

# Generate some random data
data = np.random.normal(size=1000)

# Sort the data in ascending order
sorted_data = np.sort(data)

# Generate evenly spaced percentiles (shown for reference; not used in the plot)
percentiles = np.linspace(0, 100, len(sorted_data))

# Calculate the empirical CDF: the i-th sorted value gets cumulative probability i/n
cdf = np.cumsum(np.ones_like(sorted_data))/len(sorted_data)

# Plot
plt.plot(sorted_data, cdf)
plt.xlabel('Data')
plt.ylabel('CDF')
plt.show()

				
			

Output:

[Figure: empirical CDF of the random data]

In this code, we first generate some random data using the NumPy library’s normal() function. We then sort the data in ascending order using np.sort().

The code also builds an array of evenly spaced percentiles with the linspace() function from NumPy, although the plot does not use it. To get the empirical cumulative distribution function, we use np.cumsum() to take the cumulative sum of a vector of ones the same length as the sorted data, then divide by the length of the sorted data. This assigns the i-th smallest value a cumulative probability of i/n.

Finally, we plot the CDF using Matplotlib’s plot() function, and add labels to the axes before showing the plot with show().
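The cumulative sum of ones in Example 1 produces the sequence 1/n, 2/n, ..., 1. The sketch below shows an equivalent and arguably more direct way to build the same values with np.arange(), and adds a small helper for evaluating the empirical CDF at any point. The helper name ecdf_at and the fixed random seed are assumptions introduced here for illustration; they are not part of the example above.

```python
import numpy as np

# Seeded generator so the run is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(seed=0)
data = rng.normal(size=1000)

sorted_data = np.sort(data)
n = len(sorted_data)

# Approach from Example 1: cumulative sum of ones, divided by n
cdf_cumsum = np.cumsum(np.ones_like(sorted_data)) / n

# Equivalent approach: ranks 1..n divided by n
cdf_arange = np.arange(1, n + 1) / n


def ecdf_at(x):
    """Return the fraction of observations less than or equal to x (hypothetical helper)."""
    # side="right" counts every observation that is <= x
    return np.searchsorted(sorted_data, x, side="right") / n


print("Both approaches agree:", np.allclose(cdf_cumsum, cdf_arange))
print("Empirical P(X <= 0):", ecdf_at(0.0))
```

Either array can be passed to plt.plot(sorted_data, ...) exactly as in Example 1.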

Example 2: CDF of a Normal Distribution

To plot the CDF of a normal distribution in Python, you can use the scipy.stats module to calculate the CDF values and the Matplotlib library to plot them.

Here’s an example:

				
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Define the mean and standard deviation of the normal distribution
mu, sigma = 0, 1

# Generate evenly spaced values for the x-axis
x = np.linspace(-5, 5, num=1000)

# Calculate the CDF values for the normal distribution
cdf = norm.cdf(x, mu, sigma)

# Plot the CDF
plt.plot(x, cdf)
plt.xlabel('Data')
plt.ylabel('CDF')
plt.show()

				
			

Output:

[Figure: CDF of the normal distribution]

In this code, we first define the mean and standard deviation of the normal distribution. We then generate evenly spaced values for the x-axis using np.linspace().

Next, we use the norm.cdf() function from the scipy.stats module to calculate the CDF values for the normal distribution with the given mean and standard deviation.

Finally, we plot the CDF using Matplotlib’s plot() function, and add labels to the axes before showing the plot with show().
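The two examples can be tied together by checking how closely an empirical CDF built from random normal data follows the theoretical curve from norm.cdf(). The sketch below is one way to do that under stated assumptions: the seed, the sample size of 1000, and the choice to measure the largest absolute gap are illustrative, not part of the examples above.

```python
import numpy as np
from scipy.stats import norm

# Same distribution parameters as in Example 2
mu, sigma = 0, 1

# Seeded generator so the run is reproducible (the seed value is arbitrary)
rng = np.random.default_rng(seed=42)
sample = np.sort(rng.normal(loc=mu, scale=sigma, size=1000))

# Empirical CDF at each sorted observation: i/n for the i-th smallest value
ecdf = np.arange(1, len(sample) + 1) / len(sample)

# Theoretical CDF evaluated at the same points
theoretical = norm.cdf(sample, loc=mu, scale=sigma)

# Largest absolute difference between the two curves at the sample points
max_gap = np.max(np.abs(ecdf - theoretical))
print(f"Largest gap between empirical and theoretical CDF: {max_gap:.3f}")
```

With a sample of this size the gap should be small, and it generally shrinks as the sample grows. Plotting both ecdf and theoretical against sample with plt.plot() would show the two curves nearly overlapping.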

Wrap up

To learn more about the Matplotlib library, check out its official site:
https://matplotlib.org/


Thanks for reading. Happy coding!