A Python Demo of the Central Limit Theorem

Praveen NG
Jan 7, 2019

The Central Limit Theorem (CLT) is a theorem in probability. It states that if you take a sufficiently large number of independent random numbers from the same distribution, their mean is approximately normally distributed (the random numbers have to satisfy certain conditions, such as having a finite variance).

Random numbers can follow different distributions. For example, if you ask someone to pick a number randomly between 1 and 100, the pick is a sample from a uniform distribution. This simply means that all numbers between 1 and 100 have an equal chance of being picked.

If you are instead interested in whether a picked number is divisible by 5, the two outcomes are not equally likely: only 20 of the 100 numbers are divisible by 5, while 80 are not.
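As a quick check of these two cases, the sketch below draws numbers uniformly from 1 to 100 with random.randint and estimates how often a pick is divisible by 5. The helper name fraction_divisible_by_5, the number of draws and the seed are illustrative choices, not part of the original demo.

```python
import random

def fraction_divisible_by_5(draws=100_000, seed=0):
    """Estimate the share of uniform picks from 1..100 that are divisible by 5."""
    rng = random.Random(seed)  # seeded only so that repeated runs match
    hits = sum(rng.randint(1, 100) % 5 == 0 for _ in range(draws))
    return hits / draws

# Exact probability: count the multiples of 5 among the 100 candidates.
exact = sum(k % 5 == 0 for k in range(1, 101)) / 100
estimate = fraction_divisible_by_5()
print(exact, estimate)
```

The exact value and the simulated estimate should agree closely, and the agreement improves as the number of draws grows.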

The CLT states that, irrespective of the underlying distribution, the mean of a large enough sample is approximately normally distributed (the normal distribution is also known as the Gaussian distribution).
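To see this with a distribution that is far from uniform, the sketch below averages n yes/no outcomes, where each outcome is 1 if a number picked from 1 to 100 is divisible by 5 and 0 otherwise. The function name divisible_avg, the seed and the sample sizes are assumptions made for illustration. For such averages, the mean should be close to 0.2 and the standard deviation close to sqrt(0.2 * 0.8 / n).

```python
import math
import random
import statistics

def divisible_avg(n, rng):
    """Average of n indicators: 1 if a pick from 1..100 is divisible by 5, else 0."""
    return sum(rng.randint(1, 100) % 5 == 0 for _ in range(n)) / n

rng = random.Random(42)  # seeded only so that repeated runs match
n = 100
averages = [divisible_avg(n, rng) for _ in range(2_000)]

observed_mean = statistics.fmean(averages)
observed_std = statistics.pstdev(averages)
expected_std = math.sqrt(0.2 * 0.8 / n)  # standard deviation of the mean of n indicators
print(observed_mean, observed_std, expected_std)
```

Plotting a histogram of averages would show a bell shape centered near 0.2, even though each individual outcome can only be 0 or 1.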

Python Demonstration

First, let us import all the libraries used in this demo. We use the Python module random to generate random numbers, and matplotlib and seaborn to plot the distributions.

import random
import matplotlib.pyplot as plt
import seaborn as sns
# Jupyter notebook magic; omit this line when running as a plain Python script
%matplotlib inline

Next, let’s define a function that picks n random numbers between 0 and 1 and returns their mean. Note that the random numbers generated by the Python function random.random() are uniformly distributed between 0 and 1.

def randomAvg(n=1):
    '''This function draws n random numbers between 0 and 1,
    and returns their average.'''
    avg = 0
    for i in range(n):
        avg += random.random()
    return avg/n
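The same idea can be written more compactly. The version below is a sketch that assumes Python 3.8 or later, where statistics.fmean is available; the name random_avg is hypothetical and is used only to keep it distinct from the function above.

```python
import random
from statistics import fmean

def random_avg(n=1):
    """Return the mean of n draws from random.random()."""
    return fmean(random.random() for _ in range(n))

random.seed(1)  # seeding is optional; used here so that repeated runs match
print(random_avg(1), random_avg(5), random_avg(50))
```

Both versions return a value between 0 and 1, and the result tends to sit closer to 0.5 as n grows.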

Next, we choose three values for n (labelled N below): 1, 5 and 50. This is the number of random numbers used to compute each average. For example, if N=5, we draw 5 random numbers and take their average. To get a distribution of the averages, we call the randomAvg() function 1,000,000 times for each value of N.

# Maps each sample size to its list of 1,000,000 averages
myDict = {}
for num in [1, 5, 50]:
    myAvgs = []
    for j in range(1000000):
        avg = randomAvg(num)
        myAvgs.append(avg)

    myDict[num] = myAvgs
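The loop above makes 56 million calls to random.random() in pure Python, so it can take a while to run. As a possible alternative, the sketch below computes the averages with NumPy, which is an assumption here since the original demo does not use it. The names rng, trials and averages are illustrative, and the number of trials is reduced to 100,000 to keep memory use modest.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded generator, so that repeated runs match
trials = 100_000  # fewer than the 1,000,000 used above, to limit memory use

averages = {}
for n in [1, 5, 50]:
    # One row per trial with n uniform draws in [0, 1); average across each row.
    averages[n] = rng.random((trials, n)).mean(axis=1)

for n, values in averages.items():
    print(n, values.mean(), values.std())
```

Each entry of averages is a one-dimensional array that can be passed to plt.hist in the same way as the lists in myDict.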

Note that for N=1, we get the distribution of random.random() itself (which is uniform). For N=5, we get the distribution of the average of 5 random numbers. Let’s plot the three distributions and see how they look.

plt.figure(figsize=(12,9))
sns.set(style="darkgrid")
plt.hist(myDict[1], bins=200)
plt.hist(myDict[5], bins=200, alpha=0.7)
plt.hist(myDict[50], bins=200, alpha=0.5)
plt.xlim([0,1])
plt.text(0.53, 17500, 'N = 50', fontsize=17)
plt.text(0.6, 12500, 'N = 5', fontsize=17)
plt.text(0.85, 6000, 'N = 1', fontsize=17)
plt.xticks(fontsize=16)
plt.yticks(fontsize=16)
plt.xlabel('Value', fontsize=18)
plt.ylabel('Count', fontsize=18)
plt.text(0.01, 19000, 'A demo of Central Limit Theorem', fontsize=20)
plt.show()

We see that for N=1, we get a nearly flat histogram. This means that the probability of getting a number between 0 and 0.1 is the same as that of getting one between 0.1 and 0.2.

If we take five random numbers and take their average, the distribution of the average is already close to a normal distribution. For N=50, the effect is even more pronounced: the distribution becomes narrower and is centered at 0.5.
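The narrowing can be checked numerically. A single uniform draw on the interval from 0 to 1 has standard deviation 1/sqrt(12), so the average of N independent draws has standard deviation 1/sqrt(12 * N). The sketch below compares that value with the spread observed in simulation; the names trials and results are illustrative, and the number of trials is reduced from 1,000,000 so that it runs quickly.

```python
import math
import random
import statistics

random.seed(0)  # seeded only so that repeated runs match
trials = 20_000  # smaller than the 1,000,000 used in the demo, for speed

results = {}
for n in [1, 5, 50]:
    averages = [sum(random.random() for _ in range(n)) / n for _ in range(trials)]
    observed = statistics.pstdev(averages)
    expected = 1 / math.sqrt(12 * n)  # theoretical standard deviation of the average
    results[n] = (statistics.fmean(averages), observed, expected)
    print(f"N={n:2d}  mean={results[n][0]:.3f}  std={observed:.4f}  expected std={expected:.4f}")
```

The observed means stay near 0.5 for every N, while the observed standard deviations shrink in line with the 1/sqrt(12 * N) prediction.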

Written by Praveen NG

A data science professional with a research background.