
Descriptive Statistics with scipy.stats

In [2]:
import scipy as sp
import numpy as np
import matplotlib as mpl
from matplotlib import pyplot as plt
In [3]:
s = np.random.randn(100)  # 100 random numbers drawn from a standard normal (Gaussian) distribution
In [6]:
s.mean() # mean
Out[6]:
0.066124461621295788
In [5]:
s.min()  # min
Out[5]:
-2.1234337330256787
In [7]:
s.max() # max
Out[7]:
2.5975342250993454
In [8]:
s.var()  # variance
Out[8]:
0.88824311241458376
In [9]:
s.std()  # standard deviation
Out[9]:
0.94246650466453386
In [11]:
np.median(s)  # median (NumPy arrays have no .median() method)
Out[11]:
0.073740361127278206
In [12]:
from scipy import stats
In [13]:
n, min_max, mean, var, skew, kurt = stats.describe(s)
In [14]:
n  # number of elements
Out[14]:
100
In [15]:
min_max[0]  # minimum
Out[15]:
-2.1234337330256787
In [16]:
min_max[1]  # maximum
Out[16]:
2.5975342250993454
In [17]:
mean   # mean
Out[17]:
0.066124461621295788
In [18]:
var  # variance (unbiased, ddof=1, so it differs slightly from s.var(), which uses ddof=0)
Out[18]:
0.89721526506523608
In [19]:
skew  # skewness
Out[19]:
0.08730528220732954
In [20]:
kurt  # kurtosis (Fisher's definition: 0 for a normal distribution)
Out[20]:
0.10301186201424972

Continuous Probability Distributions:

1. norm: Normal or Gaussian
2. chi2: Chi-squared
3. uniform: Uniform

Discrete Probability Distributions:

1. binom: Binomial
2. poisson: Poisson
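Each name in the lists above is a distribution object in scipy.stats, and calling it with parameters returns a "frozen" distribution. A minimal sketch comparing the mean and variance of each; the parameter values are illustrative choices, not taken from the notebook:

```python
from scipy import stats

# Parameter values are illustrative choices, not taken from the notebook
dists = {
    "norm": stats.norm(loc=0.0, scale=1.0),        # continuous
    "chi2": stats.chi2(df=3),                      # continuous
    "uniform": stats.uniform(loc=2.0, scale=4.0),  # continuous, on [2, 6]
    "binom": stats.binom(n=10, p=0.5),             # discrete
    "poisson": stats.poisson(mu=4.0),              # discrete
}

for name, dist in dists.items():
    print(f"{name:8s} mean = {dist.mean():.4f}  variance = {dist.var():.4f}")
```

The printed values should match the textbook formulas: chi2 has mean df and variance 2*df, a uniform on [a, b] has mean (a + b)/2 and variance (b - a)^2/12, a binomial has mean n*p and variance n*p*(1 - p), and a Poisson has mean and variance both equal to mu.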
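Let's create a Gaussian distribution with mean 4.5 and standard deviation 1.5. For stats.norm, loc sets the mean and scale sets the standard deviation; calling stats.norm(...) with them returns a "frozen" distribution object with those parameters fixed.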
In [21]:
n = stats.norm(loc=4.5, scale=1.5)
In [22]:
n.rvs()  # draw a random number from it
Out[22]:
5.1400980371851821
In [23]:
stats.norm.rvs(loc=4.5, scale=1.5)  # same thing, without creating a frozen distribution first
Out[23]:
6.6906484383497551
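rvs also accepts a size argument for drawing many variates at once and a random_state argument for reproducible draws. The sketch below uses arbitrary seeds (42 and 0), checks that the frozen and non-frozen forms produce identical draws for the same seed, and shows that a large sample's mean and standard deviation land near loc and scale.

```python
from scipy import stats

# size= draws many variates at once; random_state= seeds the draw
# (the seeds 42 and 0 are arbitrary choices for reproducibility)
a = stats.norm.rvs(loc=4.5, scale=1.5, size=5, random_state=42)
b = stats.norm(loc=4.5, scale=1.5).rvs(size=5, random_state=42)
print(a)
print((a == b).all())  # True: same seed and parameters give the same draws

# With many draws, the sample mean and std approach loc and scale
big = stats.norm.rvs(loc=4.5, scale=1.5, size=100_000, random_state=0)
print(big.mean(), big.std())
```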

Probability Density Function (for continuous distributions) and Probability Mass Function (for discrete distributions)

The PDF gives a density, not a probability: the probability that a variate falls within a small interval of width dx around a given value is approximately PDF(value) * dx.
The PMF gives the probability that a variate takes exactly the given value.
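To make the density-versus-probability distinction concrete, this sketch (with an assumed interval width dx = 0.01) compares the exact probability of a small interval, computed from the CDF, with PDF(x) * dx. It also checks that for a discrete distribution the PMF is an exact probability, equal to the jump in the CDF at that value.

```python
from scipy import stats

x, dx = 4.0, 0.01  # dx is an arbitrary small width
dist = stats.norm(loc=4.5, scale=1.5)

# Exact probability that the variate lands in [x - dx/2, x + dx/2]
p_interval = dist.cdf(x + dx / 2) - dist.cdf(x - dx / 2)
# First-order approximation: density times interval width
p_approx = dist.pdf(x) * dx
print(p_interval, p_approx)  # nearly equal

# Discrete case: PMF(5) is exactly P(X <= 5) - P(X <= 4)
p5 = stats.binom.pmf(5, 10, 0.5)
p5_from_cdf = stats.binom.cdf(5, 10, 0.5) - stats.binom.cdf(4, 10, 0.5)
print(p5, p5_from_cdf)
```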
In [25]:
stats.norm.pdf(4, loc=4.5, scale=1.5)     # 4 is the given value
Out[25]:
0.25158881846199549
In [27]:
stats.norm.pdf([2, 4, 4.5], loc=4.5, scale=1.5)    # you can get pdf for a list
Out[27]:
array([ 0.06631809,  0.25158882,  0.26596152])
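The PDF values above can be reproduced from the closed-form normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)). A small sketch; normal_pdf is a hypothetical helper written only for this comparison and is not part of scipy.stats:

```python
import numpy as np
from scipy import stats

def normal_pdf(x, mu, sigma):
    """Hypothetical helper: normal density from its closed-form formula."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

xs = [2, 4, 4.5]
manual = normal_pdf(xs, 4.5, 1.5)
from_scipy = stats.norm.pdf(xs, loc=4.5, scale=1.5)
print(manual)
print(np.allclose(manual, from_scipy))  # True
```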
In [28]:
k = range(11)  # possible numbers of successes: 0 to 10
stats.binom.pmf(k, 10, 0.5)     # P(k successes in 10 trials with p = 0.5)
Out[28]:
array([ 0.00097656,  0.00976563,  0.04394531,  0.1171875 ,  0.20507813,
        0.24609375,  0.20507813,  0.1171875 ,  0.04394531,  0.00976563,
        0.00097656])
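The array above can be checked against the binomial formula P(k) = C(n, k) p^k (1 - p)^(n - k). A sketch using Python's math.comb (Python 3.8+), which also confirms that the probabilities over k = 0..10 sum to 1:

```python
from math import comb

import numpy as np
from scipy import stats

n, p = 10, 0.5
k = np.arange(n + 1)

# Closed form: C(n, k) * p**k * (1 - p)**(n - k)
manual = np.array([comb(n, int(i)) * p**i * (1 - p) ** (n - i) for i in k])
scipy_pmf = stats.binom.pmf(k, n, p)

print(np.allclose(manual, scipy_pmf))  # True
print(scipy_pmf.sum())                 # probabilities over all k sum to 1
```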
In [30]:
def binom_pmf(n=4, p=0.5):
    """Plot the binomial PMF for n trials with success probability p."""
    # There are n+1 possible numbers of successes: 0 to n.
    x = range(n + 1)
    y = stats.binom.pmf(x, n, p)
    plt.plot(x, y, "o", color="black")

    # Pad the axes by 5-10% so the end points are not clipped
    plt.axis([-(max(x) - min(x)) * 0.05, max(x) * 1.05, -0.01, max(y) * 1.10])
    plt.xticks(x)
    plt.title("Binomial distribution PMF for n = {0} trials and p = {1}".format(n, p))
    plt.xlabel("Number of successes")
    plt.ylabel("Probability")

    plt.show()
In [31]:
binom_pmf()

Cumulative Distribution Function

The probability that a variate takes a value less than or equal to a given value.

In [32]:
stats.norm.cdf(0.0, loc=0.0, scale=1.0)  # half the probability lies at or below the mean
Out[32]:
0.5
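For a discrete distribution the CDF is the running sum of the PMF, and for any distribution P(a < X <= b) = CDF(b) - CDF(a). A short sketch of both, reusing the binomial example above and a standard normal:

```python
import numpy as np
from scipy import stats

# Discrete case: the CDF is the running sum of the PMF
k = np.arange(11)
cdf_from_pmf = np.cumsum(stats.binom.pmf(k, 10, 0.5))
print(np.allclose(cdf_from_pmf, stats.binom.cdf(k, 10, 0.5)))  # True

# Continuous case: P(-1 < X <= 1) for a standard normal
p_within_1sd = stats.norm.cdf(1.0) - stats.norm.cdf(-1.0)
print(p_within_1sd)  # about 0.6827
```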

Percent Point Function

You supply a probability and the function returns the value of the variate at which the CDF reaches that probability (the quantile). This makes it the inverse of the cumulative distribution function.

In [33]:
stats.norm.ppf(0.5, loc=0.2, scale=0.5)
Out[33]:
0.20000000000000001
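Because the PPF inverts the CDF, feeding its output back into the CDF should return the original probabilities. A sketch with the same N(0.2, 0.5) distribution and a few illustrative probabilities; the 0.975 quantile of the standard normal is the familiar 1.96:

```python
import numpy as np
from scipy import stats

dist = stats.norm(loc=0.2, scale=0.5)
q = np.array([0.025, 0.5, 0.975])  # illustrative probabilities

x = dist.ppf(q)  # quantiles
print(x)
print(np.allclose(dist.cdf(x), q))  # True: the CDF undoes the PPF

# Standard-normal 97.5% quantile, familiar from 95% intervals
print(stats.norm.ppf(0.975))  # about 1.96
```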

Survival Function

It gives the probability that a variate has a value greater than the given value, so it equals 1 - CDF.

In [34]:
stats.norm.sf(0.0, loc=1.0, scale=1.0)  # P(X > 0) for X ~ N(1, 1)
Out[34]:
0.84134474606854293
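The sf method and 1 - cdf agree in ordinary ranges, but far in the upper tail the subtraction rounds to zero in double precision while sf still returns the tiny tail probability. A sketch illustrating both; the evaluation points are arbitrary:

```python
import numpy as np
from scipy import stats

x = np.array([-1.0, 0.0, 1.5])  # arbitrary evaluation points
print(np.allclose(stats.norm.sf(x), 1 - stats.norm.cdf(x)))  # True

# Far in the tail, 1 - cdf loses everything to rounding, while sf does not
print(1 - stats.norm.cdf(10.0))  # 0.0
print(stats.norm.sf(10.0))       # about 7.6e-24
```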

Inverse Survival Function

You supply a probability and get the value of the variate whose survival function equals that probability. It is the inverse of the survival function.

In [36]:
stats.norm.isf(0.5, loc=0.0, scale=1.0)
Out[36]:
0.0
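Since the ISF inverts the survival function, isf(q) equals ppf(1 - q). A sketch checking both properties for a few illustrative probabilities, plus the one-sided 5% critical value of a standard normal (about 1.645):

```python
import numpy as np
from scipy import stats

q = np.array([0.05, 0.5, 0.95])  # illustrative probabilities
x = stats.norm.isf(q)
print(x)
print(np.allclose(stats.norm.sf(x), q))       # True: the SF undoes the ISF
print(np.allclose(x, stats.norm.ppf(1 - q)))  # True: ISF(q) = PPF(1 - q)

# Upper 5% critical value of a standard normal (one-sided test)
print(stats.norm.isf(0.05))  # about 1.645
```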

End for now.
