Chapter 20

Basic Statistics with NumPy

In data science, we need to collect, store, and analyze large amounts of
data. From this data we often need some basic statistics, such as:

  • Minimum value
  • Maximum value
  • Sum of all values
  • Mean (average) value
  • Median value
  • Variance
  • Standard deviation

Minimum value

  • np.min(a)
  • np.amin(a)
  • a.min()

Example

Python
import numpy as np
help(np.min)

Output

PowerShell
Help on function amin in module numpy:

amin(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
    Return the minimum of an array or minimum along an axis.

1-D array

Example

Python
a = np.array([10,5,20,3,25])
print(f"1-D array : {a}")
print(f"np.min(a) value : {np.min(a)}")
print(f"np.amin(a) value : {np.amin(a)}")
print(f"a.min() value : {a.min()}")

Output

PowerShell
1-D array : [10 5 20 3 25]
np.min(a) value : 3
np.amin(a) value : 3
a.min() value : 3

2-D array

  • axis=None (default) – The array is flattened to 1-D and the minimum of all elements is returned.
  • axis=0 – The minimum is taken down the rows, giving one value per column. For the 4×3 array below, the result has 3 elements.
  • axis=1 – The minimum is taken across the columns, giving one value per row. For the 4×3 array below, the result has 4 elements.
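The axis rules above can be checked by hand. The following is a minimal sketch, assuming a 4×3 array, that compares np.min with Python's built-in min applied column by column and row by row. The names col_mins and row_mins are illustrative only.

```python
import numpy as np

a = np.array([[100, 20, 30],
              [10, 50, 60],
              [25, 15, 18],
              [4, 5, 19]])   # shape (4, 3): 4 rows, 3 columns

# axis=0 collapses the rows: one minimum per column, so 3 values
col_mins = [int(min(a[:, j])) for j in range(a.shape[1])]
# axis=1 collapses the columns: one minimum per row, so 4 values
row_mins = [int(min(a[i, :])) for i in range(a.shape[0])]

print(col_mins, np.min(a, axis=0).tolist())   # [4, 5, 18] [4, 5, 18]
print(row_mins, np.min(a, axis=1).tolist())   # [20, 10, 15, 4] [20, 10, 15, 4]
```

The length of the result always matches the axis that was kept: 3 for axis=0 and 4 for axis=1.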

Example

Python
import numpy as np
a = np.array([[100,20,30],[10,50,60],[25,15,18],[4,5,19]])
print(f"array a : \n {a}")
print(f"Minimum value along axis=None : {np.min(a)}")
print(f"Minimum value along axis-0 : {np.min(a,axis=0)}")
print(f"Minimum value along axis-1 : {np.min(a,axis=1)}")

Output

PowerShell
array a :
 [[100 20 30]
 [ 10 50 60]
 [ 25 15 18]
 [  4  5 19]]
Minimum value along axis=None : 4
Minimum value along axis-0 : [ 4 5 18]
Minimum value along axis-1 : [20 10 15 4]

Example

Python
import numpy as np
a = np.arange(24).reshape(6,4)
print(f"array a : \n {a}")
print(f"Minimum value along axis=None : {np.min(a)}")
print(f"Minimum value along axis-0 : {np.min(a,axis=0)}")
print(f"Minimum value along axis-1 : {np.min(a,axis=1)}")

Output

PowerShell
array a :
 [[ 0 1 2 3]
 [ 4 5 6 7]
 [ 8 9 10 11]
 [12 13 14 15]
 [16 17 18 19]
 [20 21 22 23]]
Minimum value along axis=None : 0
Minimum value along axis-0 : [0 1 2 3]
Minimum value along axis-1 : [ 0 4 8 12 16 20]

Example

Python
import numpy as np
a = np.arange(24)
np.random.shuffle(a)
a = a.reshape(6,4)
print(f"array a : \n {a}")
print(f"Minimum value along axis=None : {np.min(a)}")
print(f"Minimum value along axis-0 : {np.min(a,axis=0)}")
print(f"Minimum value along axis-1 : {np.min(a,axis=1)}")

Output

PowerShell
array a :
 [[20 5 4 21]
 [ 1 10 6 14]
 [ 0 11 17 13]
 [ 3 2 22 23]
 [ 8 7 19 18]
 [ 9 12 15 16]]
Minimum value along axis=None : 0
Minimum value along axis-0 : [ 0 2 4 13]
Minimum value along axis-1 : [4 1 0 2 7 9]

Maximum value

  • np.max(a)
  • np.amax(a)
  • a.max()

Example

Python
import numpy as np
help(np.max)

Output

PowerShell
Help on function amax in module numpy:

amax(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
    Return the maximum of an array or maximum along an axis.

1-D array

Example

Python
a = np.array([10,5,20,3,25])
print(f"1-D array : {a}")
print(f"np.max(a) value : {np.max(a)}")
print(f"np.amax(a) value : {np.amax(a)}")
print(f"a.max() value : {a.max()}")

Output

PowerShell
1-D array : [10 5 20 3 25]
np.max(a) value : 25 
np.amax(a) value : 25
a.max() value : 25

2-D array

  • axis=None (default) – The array is flattened to 1-D and the maximum of all elements is returned.
  • axis=0 – The maximum is taken down the rows, giving one value per column. For the 4×3 array below, the result has 3 elements.
  • axis=1 – The maximum is taken across the columns, giving one value per row. For the 4×3 array below, the result has 4 elements.

Example

Python
import numpy as np
a = np.array([[100,20,30],[10,50,60],[25,15,18],[4,5,19]])
print(f"array a : \n {a}")
print(f"Maximum value along axis=None : {np.max(a)}")
print(f"Maximum value along axis-0 : {np.max(a,axis=0)}")
print(f"Maximum value along axis-1 : {np.max(a,axis=1)}")

Output

PowerShell
array a :
 [[100 20 30]
 [ 10 50 60]
 [ 25 15 18]
 [  4  5 19]]
Maximum value along axis=None : 100
Maximum value along axis-0 : [100 50 60]
Maximum value along axis-1 : [100 60 25 19]

Sum of the elements

  • np.sum(a)
  • a.sum()

Example

Python
import numpy as np
help(np.sum)

Output

PowerShell
Help on function sum in module numpy:

sum(a, axis=None, dtype=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
    Sum of array elements over a given axis.

1-D array

Example

Python
# sum of elements of 1-D array
a = np.arange(4)
print(f"The array a : {a}")
print(f"sum of elements using np.sum(a) :: {np.sum(a)}")
print(f"sum of elements using a.sum() :: {a.sum()}")

Output

PowerShell
The array a : [0 1 2 3]
sum of elements using np.sum(a) :: 6
sum of elements using a.sum() :: 6

2-D array

  • axis=None (default) – The array is flattened to 1-D and the sum of all elements is returned.
  • axis=0 – The elements are added down the rows, giving the sum of each column.
  • axis=1 – The elements are added across the columns, giving the sum of each row.

Example

Python
a = np.arange(9).reshape(3,3)
print(f"array a : \n {a}")
print(f"Sum along axis=None : {np.sum(a)}")
print(f"Sum along axis-0 : {np.sum(a,axis=0)}")
print(f"Sum along axis-1 : {np.sum(a,axis=1)}")

Output

PowerShell
array a :
 [[0 1 2]
 [3 4 5]
 [6 7 8]]
Sum along axis=None : 36
Sum along axis-0 : [ 9 12 15]
Sum along axis-1 : [ 3 12 21]

Mean value

  • np.mean(a)
  • a.mean()
  • Mean is the sum of elements along the specified axis divided by number of elements.
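The definition above can be verified directly. This is a minimal sketch, assuming a 3×3 array, that divides the sum by the number of elements and compares the result with np.mean. The names manual_mean and manual_col_mean are illustrative only.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

# Whole array: sum of all elements divided by the element count
manual_mean = np.sum(a) / a.size
print(manual_mean, np.mean(a))               # 4.0 4.0

# axis=0: column sums divided by the number of rows
manual_col_mean = np.sum(a, axis=0) / a.shape[0]
print(manual_col_mean, np.mean(a, axis=0))   # [3. 4. 5.] [3. 4. 5.]
```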

Example

Python
import numpy as np
help(np.mean)

Output

PowerShell
Help on function mean in module numpy:

mean(a, axis=None, dtype=None, out=None, keepdims=<no value>, *, where=<no value>)
    Compute the arithmetic mean along the specified axis.

1-D array

Python
a = np.arange(5)
print(f"1-D array : {a}")
print(f"np.mean(a) value : {np.mean(a)}")
print(f"a.mean() value : {a.mean()}")

Output

PowerShell
1-D array : [0 1 2 3 4]
np.mean(a) value : 2.0
a.mean() value : 2.0

2-D array

  • axis=None (default) – The array is flattened to 1-D and the mean (average) of all elements is returned.
  • axis=0 – The mean is taken down the rows, giving one value per column.
  • axis=1 – The mean is taken across the columns, giving one value per row.

Example

Python
# 2-D array mean
a = np.arange(9).reshape(3,3)
print(f"The original 2-D array : \n {a}")
print(f"Mean of the 2-D array along axis=None : {np.mean(a)}")
print(f"Mean of the 2-D array along axis=0 : {np.mean(a,axis=0)}")
print(f"Mean of the 2-D array along axis=1 : {np.mean(a,axis=1)}")

Output

PowerShell
The original 2-D array :
 [[0 1 2]
 [3 4 5]
 [6 7 8]]
Mean of the 2-D array along axis=None : 4.0
Mean of the 2-D array along axis=0 : [3. 4. 5.]
Mean of the 2-D array along axis=1 : [1. 4. 7.]

Median value

  • np.median(a)
  • The median is the middle element of the array in sorted form.
  • If the array contains an odd number of elements, the median is the middle element value.
  • If the array contains an even number of elements, the median is the average of the 2 middle element values.
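The rule can be written out by hand: sort the values, then take the middle element when the count is odd, or the average of the two middle elements when it is even. The sketch below assumes 1-D input. manual_median is a hypothetical helper written for illustration, not a NumPy function.

```python
import numpy as np

def manual_median(values):
    """Hypothetical helper: median of a 1-D sequence, computed by hand."""
    s = np.sort(np.asarray(values))   # work on a sorted copy
    n = s.size
    mid = n // 2
    if n % 2 == 1:
        # odd count: the single middle element
        return float(s[mid])
    # even count: average of the two middle elements
    return float((s[mid - 1] + s[mid]) / 2)

even = np.array([80, 20, 60, 40])
odd = np.array([80, 20, 60, 40, 100, 140, 120])
print(manual_median(even), np.median(even))   # 50.0 50.0
print(manual_median(odd), np.median(odd))     # 80.0 80.0
```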

Example

Python
import numpy as np
help(np.median)

Output

PowerShell
Help on function median in module numpy:

median(a, axis=None, out=None, overwrite_input=False, keepdims=False)
    Compute the median along the specified axis.
    
    Returns the median of the array elements.

1-D array

Example

Python
a = np.array([10,20,30,40])
b = np.array([10,20,30,40,50])
print(f"The array with even number of elements : {a}")
print(f"Median of the array with even number of elements : {np.median(a)}")
print()
print(f"The array with odd number of elements : {b}")
print(f"Median of the array with odd number of elements : {np.median(b)}")

Output

PowerShell
The array with even number of elements : [10 20 30 40]
Median of the array with even number of elements : 25.0

The array with odd number of elements : [10 20 30 40 50]
Median of the array with odd number of elements : 30.0

Example

Python
# unsorted array(even no of elements) will be converted to sorted array and then
#median is calculated
a = np.array([80,20,60,40])
print(f"The array with even number of elements(unsorted) : {a}")
print("*"*100)
print("This step is calculated internally ")
print(f"sorted form of given array : {np.sort(a)}")
print("*"*100)
print(f"Median of the array with even number of elements : {np.median(a)}")

Output

PowerShell
The array with even number of elements(unsorted) : [80 20 60 40]
*****************************************************************************
This step is calculated internally
sorted form of given array : [20 40 60 80]
****************************************************************************
Median of the array with even number of elements : 50.0

Example

Python
# unsorted array(odd no of elements) will be converted to sorted array and then
# median is calculated
a = np.array([80,20,60,40,100,140,120])
print(f"The array with odd number of elements(unsorted) : {a}")
print("*"*100)
print("This step is calculated internally ")
print(f"sorted form of given array : {np.sort(a)}")
print("*"*100)
print(f"Median of the array with odd number of elements : {np.median(a)}")

Output

PowerShell
The array with odd number of elements(unsorted) : [ 80 20 60 40 100 140 120]
*****************************************************************************
This step is calculated internally
sorted form of given array : [ 20 40 60 80 100 120 140]
*****************************************************************************
Median of the array with odd number of elements : 80.0

2-D array

  • axis=None (default) – The array is flattened to 1-D, sorted, and the median of all elements is returned.
  • axis=0 – The median is taken down the rows, giving one value per column.
  • axis=1 – The median is taken across the columns, giving one value per row.

Example

Python
# 2-D array median
a = np.arange(9).reshape(3,3)
print(f"The original 2-D array(already sorted) : \n {a}")
print(f"Median of the 2-D array along axis=None : {np.median(a)}")
print(f"Median of the 2-D array along axis=0 : {np.median(a,axis=0)}")
print(f"Median of the 2-D array along axis=1 : {np.median(a,axis=1)}")

Output

PowerShell
The original 2-D array(already sorted) :
 [[0 1 2]
 [3 4 5]
 [6 7 8]]
Median of the 2-D array along axis=None : 4.0
Median of the 2-D array along axis=0 : [3. 4. 5.]
Median of the 2-D array along axis=1 : [1. 4. 7.]

Example

Python
# 2-D array median ==> unsorted elements
a = np.array([[22,55,88],[11,44,55],[33,66,99]])
print(f"The original 2-D array(unsorted) : \n {a}")
print(f"Median of the 2-D array along axis=None : {np.median(a)}")
print(f"Median of the 2-D array along axis=0 : {np.median(a,axis=0)}")
print(f"Median of the 2-D array along axis=1 : {np.median(a,axis=1)}")

Output

PowerShell
The original 2-D array(unsorted) :
 [[22 55 88]
 [11 44 55]
 [33 66 99]]
Median of the 2-D array along axis=None : 55.0
Median of the 2-D array along axis=0 : [22. 55. 88.]
Median of the 2-D array along axis=1 : [55. 44. 66.]

Example

Python
# 2-D array median ==> unsorted elements using shuffle
a = np.arange(9)
np.random.shuffle(a)
a = a.reshape(3,3)
print(f"The original 2-D array(unsorted) : \n {a}")
print(f"Median of the 2-D array along axis=None : {np.median(a)}")
print(f"Median of the 2-D array along axis=0 : {np.median(a,axis=0)}")
print(f"Median of the 2-D array along axis=1 : {np.median(a,axis=1)}")

Output

PowerShell
The original 2-D array(unsorted) :
 [[6 8 4]
 [3 0 5]
 [2 1 7]]
Median of the 2-D array along axis=None : 4.0
Median of the 2-D array along axis=0 : [3. 1. 5.]
Median of the 2-D array along axis=1 : [6. 3. 2.]

Variance value

  • np.var(a)
  • a.var()

The variance is a measure of variability. It is calculated by taking the average of the squared deviations from the mean:

  • find each element's deviation from the mean,
  • square each deviation,
  • average the squares.
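The three parts of the definition can be computed one at a time. The following is a minimal sketch on a small 1-D array. The names deviations, squared and manual_var are illustrative only.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

deviations = a - np.mean(a)        # deviations from the mean: -2, -1, 0, 1, 2
squared = deviations ** 2          # squared deviations: 4, 1, 0, 1, 4
manual_var = np.mean(squared)      # average of the squares: 10 / 5

print(manual_var, np.var(a))       # 2.0 2.0
```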

Example

Python
import numpy as np
help(np.var)

Output

PowerShell
Help on function var in module numpy:

var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>, *, where=<no value>)
    Compute the variance along the specified axis.
    
    Returns the variance of the array elements, a measure of the spread of a distribution. The variance is computed for the flattened array by default, otherwise over the specified axis.
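The ddof parameter in the signature above changes the divisor from N to N - ddof. With the default ddof=0 the sum of squared deviations is divided by N. With ddof=1 it is divided by N - 1, which is commonly used for the variance of a sample. A small sketch, where n and sum_sq are illustrative names:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
n = a.size
sum_sq = np.sum((a - a.mean()) ** 2)   # sum of squared deviations = 10.0

print(np.var(a), sum_sq / n)                 # 2.0 2.0
print(np.var(a, ddof=1), sum_sq / (n - 1))   # 2.5 2.5
```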

1-D array

Python
a = np.array([1,2,3,4,5])
print(f"Original 1-D array : {a}")
print(f"Variance of 1-D array using np.var(a): {np.var(a)}")
print(f"Variance of 1-D array using a.var(): {a.var()}")

Output

PowerShell
Original 1-D array : [1 2 3 4 5]
Variance of 1-D array using np.var(a): 2.0
Variance of 1-D array using a.var(): 2.0

2-D array

  • axis=None (default) – The array is flattened to 1-D and the variance of all elements is returned. No sorting is involved.
  • axis=0 – The variance is taken down the rows, giving one value per column.
  • axis=1 – The variance is taken across the columns, giving one value per row.

Example

Python
a = np.arange(6).reshape(2,3)
print(f"Original 2-D array :\n {a}")
print(f"Variance of 2-D array using np.var(a) along axis=None: {np.var(a)}")
print(f"Variance of 2-D array using np.var(a) along axis=0: {np.var(a,axis=0)}")
print(f"Variance of 2-D array using np.var(a) along axis=1: {np.var(a,axis=1)}")

Output

PowerShell
Original 2-D array :
 [[0 1 2]
 [3 4 5]]
Variance of 2-D array using np.var(a) along axis=None: 2.9166666666666665
Variance of 2-D array using np.var(a) along axis=0: [2.25 2.25 2.25]
Variance of 2-D array using np.var(a) along axis=1: [0.66666667 0.66666667]

Standard Deviation value

  • np.std(a)
  • a.std()
  • Variance means the average of squares of deviations from the mean.
  • Standard deviation is the square root of the variance.
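The same relationship holds along an axis. math.sqrt accepts only a single number, so this sketch uses np.sqrt, which works element-wise on the per-column variances. The name std_from_var is illustrative only.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# Square root of each per-column variance
std_from_var = np.sqrt(np.var(a, axis=0))
print(std_from_var)                                   # [1.5 1.5 1.5]

# Matches np.std computed along the same axis
print(np.allclose(std_from_var, np.std(a, axis=0)))   # True
```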

1-D array

Python
import math
a = np.array([1,2,3,4,5])
print(f"Original 1-D array : {a}")
print(f"Variance of 1-D array using np.var(a): {np.var(a)}")
print(f"Standard Deviation of 1-D array using np.std(a): {np.std(a)}")
print(f"Square root of Variance : {math.sqrt(np.var(a))}")

Output

PowerShell
Original 1-D array : [1 2 3 4 5]
Variance of 1-D array using np.var(a): 2.0
Standard Deviation of 1-D array using np.std(a): 1.4142135623730951
Square root of Variance : 1.4142135623730951

2-D array

Python
import math
import numpy as np
a = np.arange(6).reshape(2,3)
print(f"Original 2-D array :\n {a}")
print("*"*100)
print(f"Variance of 2-D array using np.var(a) along axis=None: {np.var(a)}")
print(f"Standard Deviation of 2-D array using np.std(a) along axis=None: {np.std(a)}")
print(f"Square root of Variance : {math.sqrt(np.var(a))}")
print("*"*100)
print(f"Variance of 2-D array using np.var(a) along axis=0: {np.var(a,axis=0)}")
print(f"Standard Deviation of 2-D array using np.std(a) along axis=0: {np.std(a,axis=0)}")
print("*"*100)
print(f"Variance of 2-D array using np.var(a) along axis=1: {np.var(a,axis=1)}")
print(f"Standard Deviation of 2-D array using np.std(a) along axis=1: {np.std(a,axis=1)}")
print("*"*100)

Output

PowerShell
Original 2-D array :
 [[0 1 2]
 [3 4 5]]
*****************************************************************************
Variance of 2-D array using np.var(a) along axis=None: 2.9166666666666665
Standard Deviation of 2-D array using np.std(a) along axis=None: 1.707825127659933
Square root of Variance : 1.707825127659933
*****************************************************************************
Variance of 2-D array using np.var(a) along axis=0: [2.25 2.25 2.25]
Standard Deviation of 2-D array using np.std(a) along axis=0: [1.5 1.5 1.5]
*****************************************************************************
Variance of 2-D array using np.var(a) along axis=1: [0.66666667 0.66666667]
Standard Deviation of 2-D array using np.std(a) along axis=1: [0.81649658 0.81649658]
*****************************************************************************

Summary

  • np.min(a) / np.amin(a) / a.min() -> Returns the minimum value of the array.
  • np.max(a) / np.amax(a) / a.max() -> Returns the maximum value of the array.
  • np.sum(a) / a.sum() -> Returns the sum of the values of the array.
  • np.mean(a) / a.mean() -> Returns the arithmetic mean of the array.
  • np.median(a) -> Returns the median value of the array.
  • np.var(a) / a.var() -> Returns the variance of the values in the array.
  • np.std(a) / a.std() -> Returns the standard deviation of the values in the array.
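The functions in the summary can be gathered into one small helper. This is only a sketch: basic_stats is a hypothetical function written for illustration, not part of NumPy, and it simply forwards an optional axis argument to each call.

```python
import numpy as np

def basic_stats(a, axis=None):
    """Hypothetical helper: collect the statistics from this chapter in a dict."""
    return {
        "min": np.min(a, axis=axis),
        "max": np.max(a, axis=axis),
        "sum": np.sum(a, axis=axis),
        "mean": np.mean(a, axis=axis),
        "median": np.median(a, axis=axis),
        "var": np.var(a, axis=axis),
        "std": np.std(a, axis=axis),
    }

a = np.arange(9).reshape(3, 3)
stats = basic_stats(a)              # statistics of the whole array
col_stats = basic_stats(a, axis=0)  # one value per column

for name, value in stats.items():
    print(f"{name:>6} : {value}")
```

Passing axis=0 or axis=1 returns arrays instead of single values, following the same axis rules as the individual functions.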