Basic Statistics with NumPy
In the data science domain, we need to collect, store, and analyze large amounts of
data. From this data we often need to compute some basic statistics, such as:
- Minimum value
- Maximum value
- Average Value
- Sum of all values
- Mean value
- Median value
- Variance
- Standard deviation etc
Minimum value
- np.min(a)
- np.amin(a)
- a.min()
Example
Python
import numpy as np
help(np.min)
Output
PowerShell
Help on function amin in module numpy:
amin(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
Return the minimum of an array or minimum along an axis.
1-D array
Example
Python
a = np.array([10,5,20,3,25])
print(f"1-D array : {a}")
print(f"np.min(a) value : {np.min(a)}")
print(f"np.amin(a) value : {np.amin(a)}")
print(f"a.min() value : {a.min()}")
Output
PowerShell
1-D array : [10 5 20 3 25]
np.min(a) value : 3
np.amin(a) value : 3
a.min() value : 3
2-D array
- axis=None (default) – The array is flattened to 1-D and the minimum of all elements is returned.
- axis=0 – The minimum is taken down the rows, giving one value per column. For a 4×3 array the result has 3 elements.
- axis=1 – The minimum is taken across the columns, giving one value per row. For a 4×3 array the result has 4 elements.
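One way to see what the axis argument does is to compute the per-column and per-row minimums by hand and compare them with np.min. The sketch below assumes the same 4×3 array as the example in this section; the names col_mins and row_mins are only for illustration.

```python
import numpy as np

# Same 4x3 array as the 2-D example in this section.
a = np.array([[100, 20, 30], [10, 50, 60], [25, 15, 18], [4, 5, 19]])

# axis=0: visit each column and take its minimum (one value per column).
col_mins = np.array([a[:, j].min() for j in range(a.shape[1])])

# axis=1: visit each row and take its minimum (one value per row).
row_mins = np.array([a[i, :].min() for i in range(a.shape[0])])

print(col_mins, np.min(a, axis=0))
print(row_mins, np.min(a, axis=1))
```

The hand-written loops and the axis argument should agree; the loops are only there to make the direction of the reduction visible.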
Example
Python
import numpy as np
a = np.array([[100,20,30],[10,50,60],[25,15,18],[4,5,19]])
print(f"array a : \n {a}")
print(f"Minimum value along axis=None : {np.min(a)}")
print(f"Minimum value along axis-0 : {np.min(a,axis=0)}")
print(f"Minimum value along axis-1 : {np.min(a,axis=1)}")
Output
PowerShell
array a :
[[100 20 30]
[ 10 50 60]
[ 25 15 18]
[ 4 5 19]]
Minimum value along axis=None : 4
Minimum value along axis-0 : [ 4 5 18]
Minimum value along axis-1 : [20 10 15 4]
Example
Python
import numpy as np
a = np.arange(24).reshape(6,4)
print(f"array a : \n {a}")
print(f"Minimum value along axis=None : {np.min(a)}")
print(f"Minimum value along axis-0 : {np.min(a,axis=0)}")
print(f"Minimum value along axis-1 : {np.min(a,axis=1)}")
Output
PowerShell
array a :
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]
Minimum value along axis=None : 0
Minimum value along axis-0 : [0 1 2 3]
Minimum value along axis-1 : [ 0 4 8 12 16 20]
Example
Python
import numpy as np
a = np.arange(24)
np.random.shuffle(a)
a = a.reshape(6,4)
print(f"array a : \n {a}")
print(f"Minimum value along axis=None : {np.min(a)}")
print(f"Minimum value along axis-0 : {np.min(a,axis=0)}")
print(f"Minimum value along axis-1 : {np.min(a,axis=1)}")
Output
PowerShell
array a :
[[20 5 4 21]
[ 1 10 6 14]
[ 0 11 17 13]
[ 3 2 22 23]
[ 8 7 19 18]
[ 9 12 15 16]]
Minimum value along axis=None : 0
Minimum value along axis-0 : [ 0 2 4 13]
Minimum value along axis-1 : [4 1 0 2 7 9]
Maximum value
- np.max(a)
- np.amax(a)
- a.max()
Example
Python
import numpy as np
help(np.max)
Output
PowerShell
Help on function amax in module numpy:
amax(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
Return the maximum of an array or maximum along an axis.
1-D array
Example
Python
a = np.array([10,5,20,3,25])
print(f"1-D array : {a}")
print(f"np.max(a) value : {np.max(a)}")
print(f"np.amax(a) value : {np.amax(a)}")
print(f"a.max() value : {a.max()}")
Output
PowerShell
1-D array : [10 5 20 3 25]
np.max(a) value : 25
np.amax(a) value : 25
a.max() value : 25
2-D array
- axis=None (default) – The array is flattened to 1-D and the maximum of all elements is returned.
- axis=0 – The maximum is taken down the rows, giving one value per column. For a 4×3 array the result has 3 elements.
- axis=1 – The maximum is taken across the columns, giving one value per row. For a 4×3 array the result has 4 elements.
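The keepdims parameter listed in the help output controls whether the reduced axis is kept with length 1. The following is a small sketch, assuming the same 4×3 array as in this section, of how that affects the result shape and why it can be handy; the scaling step is only an illustration, not part of the original examples.

```python
import numpy as np

a = np.array([[100, 20, 30], [10, 50, 60], [25, 15, 18], [4, 5, 19]])

row_max = np.max(a, axis=1)                      # shape (4,)
row_max_kept = np.max(a, axis=1, keepdims=True)  # shape (4, 1)

# With keepdims=True the result broadcasts against the original array,
# so each row can be divided by its own maximum.
scaled = a / row_max_kept

print(row_max.shape, row_max_kept.shape)
print(scaled.max(axis=1))
```

After scaling, the largest value in every row is 1.0, which is a quick check that the per-row maximums were lined up with the correct rows.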
Example
Python
import numpy as np
a = np.array([[100,20,30],[10,50,60],[25,15,18],[4,5,19]])
print(f"array a : \n {a}")
print(f"Maximum value along axis=None : {np.max(a)}")
print(f"Maximum value along axis-0 : {np.max(a,axis=0)}")
print(f"Maximum value along axis-1 : {np.max(a,axis=1)}")
Output
PowerShell
array a :
[[100 20 30]
[ 10 50 60]
[ 25 15 18]
[ 4 5 19]]
Maximum value along axis=None : 100
Maximum value along axis-0 : [100 50 60]
Maximum value along axis-1 : [100 60 25 19]
Sum of the elements
- np.sum(a)
- a.sum()
Example
Python
import numpy as np
help(np.sum)
Output
PowerShell
Help on function sum in module numpy:
sum(a, axis=None, dtype=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
Sum of array elements over a given axis.
1-D array
Example
Python
# sum of elements of 1-D array
a = np.arange(4)
print(f"The array a : {a}")
print(f"sum of elements using np.sum(a) :: {np.sum(a)}")
print(f"sum of elements using a.sum() :: {a.sum()}")
Output
PowerShell
The array a : [0 1 2 3]
sum of elements using np.sum(a) :: 6
sum of elements using a.sum() :: 6
2-D array
- axis=None (default) – The array is flattened to 1-D and all elements are summed.
- axis=0 – Sums down the rows, giving one sum per column.
- axis=1 – Sums across the columns, giving one sum per row.
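A quick consistency check helps confirm the direction of each axis: adding up the per-column sums, or the per-row sums, should give the same total as summing the whole array. This is a minimal sketch using a 3×3 array from np.arange(9); the variable names are illustrative.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

total = np.sum(a)             # sum over every element
col_sums = np.sum(a, axis=0)  # one sum per column
row_sums = np.sum(a, axis=1)  # one sum per row

# Adding up the per-column (or per-row) sums gives the overall total.
print(total, col_sums.sum(), row_sums.sum())
```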
Example
Python
a = np.arange(9).reshape(3,3)
print(f"array a : \n {a}")
print(f"Sum along axis=None : {np.sum(a)}")
print(f"Sum along axis-0 : {np.sum(a,axis=0)}")
print(f"Sum along axis-1 : {np.sum(a,axis=1)}")
Output
PowerShell
array a :
[[0 1 2]
[3 4 5]
[6 7 8]]
Sum along axis=None : 36
Sum along axis-0 : [ 9 12 15]
Sum along axis-1 : [ 3 12 21]
Mean value
- np.mean(a)
- a.mean()
- Mean is the sum of elements along the specified axis divided by number of elements.
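The definition above can be checked directly by dividing a sum by an element count. This is a small sketch, not how NumPy implements the mean internally; manual_mean and manual_col_mean are illustrative names.

```python
import numpy as np

a = np.arange(5)                  # [0 1 2 3 4]
manual_mean = np.sum(a) / a.size  # sum divided by the number of elements

b = np.arange(9).reshape(3, 3)
# Along axis=0 the divisor is the number of rows, b.shape[0].
manual_col_mean = np.sum(b, axis=0) / b.shape[0]

print(manual_mean, np.mean(a))
print(manual_col_mean, np.mean(b, axis=0))
```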
Example
Python
import numpy as np
help(np.mean)
Output
PowerShell
Help on function mean in module numpy:
mean(a, axis=None, dtype=None, out=None, keepdims=<no value>, *, where=<no value>)
Compute the arithmetic mean along the specified axis.
1-D array
Python
a = np.arange(5)
print(f"1-D array : {a}")
print(f"np.mean(a) value : {np.mean(a)}")
print(f"a.mean() value : {a.mean()}")
Output
PowerShell
1-D array : [0 1 2 3 4]
np.mean(a) value : 2.0
a.mean() value : 2.0
2-D array
- axis=None (default) – The array is flattened to 1-D and the mean (average) of all elements is returned.
- axis=0 – Averages down the rows, giving one mean per column.
- axis=1 – Averages across the columns, giving one mean per row.
Example
Python
# 2-D array mean
a = np.arange(9).reshape(3,3)
print(f"The original 2-D array : \n {a}")
print(f"Mean of the 2-D array along axis=None : {np.mean(a)}")
print(f"Mean of the 2-D array along axis=0 : {np.mean(a,axis=0)}")
print(f"Mean of the 2-D array along axis=1 : {np.mean(a,axis=1)}")
Output
PowerShell
The original 2-D array :
[[0 1 2]
[3 4 5]
[6 7 8]]
Mean of the 2-D array along axis=None : 4.0
Mean of the 2-D array along axis=0 : [3. 4. 5.]
Mean of the 2-D array along axis=1 : [1. 4. 7.]
Median value
- np.median(a)
- The median is the middle element of the array in sorted order.
- If the array contains an odd number of elements, the median is the middle element.
- If the array contains an even number of elements, the median is the average of the two middle elements.
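The median rule can be sketched in a few lines of plain Python: sort the values, then take the middle element when the count is odd, or the average of the two middle elements when the count is even. The helper manual_median is hypothetical and exists only to compare against np.median.

```python
import numpy as np

def manual_median(values):
    """Hypothetical helper: median via sorting, for illustration only."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return float(s[mid])          # odd count: the middle element
    return (s[mid - 1] + s[mid]) / 2  # even count: mean of the two middle elements

even = [80, 20, 60, 40]
odd = [80, 20, 60, 40, 100, 140, 120]

print(manual_median(even), np.median(np.array(even)))
print(manual_median(odd), np.median(np.array(odd)))
```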
Example
Python
import numpy as np
help(np.median)
Output
PowerShell
Help on function median in module numpy:
median(a, axis=None, out=None, overwrite_input=False, keepdims=False)
Compute the median along the specified axis.
Returns the median of the array elements.
1-D array
Example
Python
a = np.array([10,20,30,40])
b = np.array([10,20,30,40,50])
print(f"The array with even number of elements : {a}")
print(f"Median of the array with even number of elements : {np.median(a)}")
print()
print(f"The array with odd number of elements : {b}")
print(f"Median of the array with odd number of elements : {np.median(b)}")
Output
PowerShell
The array with even number of elements : [10 20 30 40]
Median of the array with even number of elements : 25.0
The array with odd number of elements : [10 20 30 40 50]
Median of the array with odd number of elements : 30.0
Example
Python
# An unsorted array (even number of elements) is ordered internally and then
# the median is calculated
a = np.array([80,20,60,40])
print(f"The array with even number of elements(unsorted) : {a}")
print("*"*100)
print("This step is calculated internally ")
print(f"sorted form of given array : {np.sort(a)}")
print("*"*100)
print(f"Median of the array with even number of elements : {np.median(a)}")
Output
PowerShell
The array with even number of elements(unsorted) : [80 20 60 40]
*****************************************************************************
This step is calculated internally
sorted form of given array : [20 40 60 80]
****************************************************************************
Median of the array with even number of elements : 50.0
Example
Python
# An unsorted array (odd number of elements) is ordered internally and then
# the median is calculated
a = np.array([80,20,60,40,100,140,120])
print(f"The array with odd number of elements(unsorted) : {a}")
print("*"*100)
print("This step is calculated internally ")
print(f"sorted form of given array : {np.sort(a)}")
print("*"*100)
print(f"Median of the array with odd number of elements : {np.median(a)}")
Output
PowerShell
The array with odd number of elements(unsorted) : [ 80 20 60 40 100 140 120]
*****************************************************************************
This step is calculated internally
sorted form of given array : [ 20 40 60 80 100 120 140]
*****************************************************************************
Median of the array with odd number of elements : 80.0
2-D array
- axis=None (default) – The array is flattened to 1-D and the median of all elements is returned.
- axis=0 – Takes the median down the rows, giving one median per column.
- axis=1 – Takes the median across the columns, giving one median per row.
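To make the axis behaviour of the median concrete, the sketch below sorts a 3×3 array along each axis and picks the middle entry, then compares the result with np.median. It assumes an odd number of elements along each axis, so a single middle entry exists; the variable names are illustrative.

```python
import numpy as np

a = np.array([[22, 55, 88], [11, 44, 55], [33, 66, 99]])

# Sort each column independently; with 3 rows, the middle row (index 1)
# of the sorted copy holds the per-column medians.
col_sorted = np.sort(a, axis=0)
col_medians = col_sorted[1, :]

# Sort each row independently; the middle column holds the per-row medians.
row_sorted = np.sort(a, axis=1)
row_medians = row_sorted[:, 1]

print(col_medians, np.median(a, axis=0))
print(row_medians, np.median(a, axis=1))
```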
Example
Python
# 2-D array median
a = np.arange(9).reshape(3,3)
print(f"The original 2-D array(already sorted) : \n {a}")
print(f"Median of the 2-D array along axis=None : {np.median(a)}")
print(f"Median of the 2-D array along axis=0 : {np.median(a,axis=0)}")
print(f"Median of the 2-D array along axis=1 : {np.median(a,axis=1)}")
Output
PowerShell
The original 2-D array(already sorted) :
[[0 1 2]
[3 4 5]
[6 7 8]]
Median of the 2-D array along axis=None : 4.0
Median of the 2-D array along axis=0 : [3. 4. 5.]
Median of the 2-D array along axis=1 : [1. 4. 7.]
Example
Python
# 2-D array median ==> unsorted elements
a = np.array([[22,55,88],[11,44,55],[33,66,99]])
print(f"The original 2-D array(unsorted) : \n {a}")
print(f"Median of the 2-D array along axis=None : {np.median(a)}")
print(f"Median of the 2-D array along axis=0 : {np.median(a,axis=0)}")
print(f"Median of the 2-D array along axis=1 : {np.median(a,axis=1)}")
Output
PowerShell
The original 2-D array(unsorted) :
[[22 55 88]
[11 44 55]
[33 66 99]]
Median of the 2-D array along axis=None : 55.0
Median of the 2-D array along axis=0 : [22. 55. 88.]
Median of the 2-D array along axis=1 : [55. 44. 66.]
Example
Python
# 2-D array median ==> unsorted elements using shuffle
a = np.arange(9)
np.random.shuffle(a)
a = a.reshape(3,3)
print(f"The original 2-D array(unsorted) : \n {a}")
print(f"Median of the 2-D array along axis=None : {np.median(a)}")
print(f"Median of the 2-D array along axis=0 : {np.median(a,axis=0)}")
print(f"Median of the 2-D array along axis=1 : {np.median(a,axis=1)}")
Output
PowerShell
The original 2-D array(unsorted) :
[[6 8 4]
[3 0 5]
[2 1 7]]
Median of the 2-D array along axis=None : 4.0
Median of the 2-D array along axis=0 : [3. 1. 5.]
Median of the 2-D array along axis=1 : [6. 3. 2.]
Variance value
- np.var(a)
- a.var()
- The variance is a measure of variability. It is calculated by taking the average of squared deviations from the mean:
- find the deviation of each element from the mean,
- square each deviation,
- take the average of the squared deviations.
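The definition of variance can be followed step by step in NumPy. This is a minimal sketch of the calculation for a 1-D array; it also shows the ddof parameter that appears in the np.var signature, which changes the divisor from n to n - ddof. The variable names are illustrative.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

deviations = a - a.mean()    # deviations from the mean
squared = deviations ** 2    # squared deviations
manual_var = squared.mean()  # average of the squared deviations

print(manual_var, np.var(a))

# ddof=1 divides by (n - 1) instead of n, giving the sample variance.
sample_var = np.var(a, ddof=1)
print(sample_var)
```

The default ddof=0 matches the "average of squared deviations" definition; ddof=1 is the usual choice when the data is a sample from a larger population.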
Example
Python
import numpy as np
help(np.var)
Output
PowerShell
Help on function var in module numpy:
var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>, *, where=<no value>)
Compute the variance along the specified axis.
Returns the variance of the array elements, a measure of the spread of a distribution. The variance is computed for the flattened array by default, otherwise over the specified axis.
1-D array
Python
a = np.array([1,2,3,4,5])
print(f"Original 1-D array : {a}")
print(f"Variance of 1-D array using np.var(a): {np.var(a)}")
print(f"Variance of 1-D array using a.var(): {a.var()}")
Output
PowerShell
Original 1-D array : [1 2 3 4 5]
Variance of 1-D array using np.var(a): 2.0
Variance of 1-D array using a.var(): 2.0
2-D array
- axis=None (default) – The array is flattened to 1-D and the variance of all elements is returned.
- axis=0 – Computes the variance down the rows, giving one value per column.
- axis=1 – Computes the variance across the columns, giving one value per row.
Example
Python
a = np.arange(6).reshape(2,3)
print(f"Original 2-D array :\n {a}")
print(f"Variance of 2-D array using np.var(a) along axis=None: {np.var(a)}")
print(f"Variance of 2-D array using np.var(a) along axis=0: {np.var(a,axis=0)}")
print(f"Variance of 2-D array using np.var(a) along axis=1: {np.var(a,axis=1)}")
Output
PowerShell
Original 2-D array :
[[0 1 2]
[3 4 5]]
Variance of 2-D array using np.var(a) along axis=None: 2.9166666666666665
Variance of 2-D array using np.var(a) along axis=0: [2.25 2.25 2.25]
Variance of 2-D array using np.var(a) along axis=1: [0.66666667 0.66666667]
Standard Deviation value
- np.std(a)
- a.std()
- Variance means the average of squares of deviations from the mean.
- Standard deviation is the square root of the variance.
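The relationship between the two statistics can be checked for whole arrays as well as single numbers. The sketch below uses np.sqrt, which works element-wise and therefore also handles the array results returned when an axis is given; math.sqrt expects a single number. The variable names are illustrative.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

# Square root of the variance, for the whole array and along each axis.
std_from_var_all = np.sqrt(np.var(a))
std_from_var_cols = np.sqrt(np.var(a, axis=0))
std_from_var_rows = np.sqrt(np.var(a, axis=1))

print(std_from_var_all, np.std(a))
print(std_from_var_cols, np.std(a, axis=0))
print(std_from_var_rows, np.std(a, axis=1))
```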
1-D array
Python
import math
a = np.array([1,2,3,4,5])
print(f"Original 1-D array : {a}")
print(f"Variance of 1-D array using np.var(a): {np.var(a)}")
print(f"Standard Deviation of 1-D array using np.std(a): {np.std(a)}")
print(f"Square root of Variance : {math.sqrt(np.var(a))}")
Output
PowerShell
Original 1-D array : [1 2 3 4 5]
Variance of 1-D array using np.var(a): 2.0
Standard Deviation of 1-D array using np.std(a): 1.4142135623730951
Square root of Variance : 1.4142135623730951
2-D array
Python
import math
a = np.arange(6).reshape(2,3)
print(f"Original 2-D array :\n {a}")
print("*"*100)
print(f"Variance of 2-D array using np.var(a) along axis=None: {np.var(a)}")
print(f"Standard Deviation of 2-D array using np.std(a) along axis=None: {np.std(a)}")
print(f"Square root of Variance : {math.sqrt(np.var(a))}")
print("*"*100)
print(f"Variance of 2-D array using np.var(a) along axis=0: {np.var(a,axis=0)}")
print(f"Standard Deviation of 2-D array using np.std(a) along axis=0: {np.std(a,axis=0)}")
print("*"*100)
print(f"Variance of 2-D array using np.var(a) along axis=1: {np.var(a,axis=1)}")
print(f"Standard Deviation of 2-D array using np.std(a) along axis=1: {np.std(a,axis=1)}")
print("*"*100)
Output
PowerShell
Original 2-D array :
[[0 1 2]
[3 4 5]]
*****************************************************************************
Variance of 2-D array using np.var(a) along axis=None: 2.9166666666666665
Standard Deviation of 2-D array using np.std(a) along axis=None: 1.707825127659933
Square root of Variance : 1.707825127659933
*****************************************************************************
Variance of 2-D array using np.var(a) along axis=0: [2.25 2.25 2.25]
Standard Deviation of 2-D array using np.std(a) along axis=0: [1.5 1.5 1.5]
*****************************************************************************
Variance of 2-D array using np.var(a) along axis=1: [0.66666667 0.66666667]
Standard Deviation of 2-D array using np.std(a) along axis=1: [0.81649658 0.81649658]
*****************************************************************************
Summary
- np.min(a) / np.amin(a) / a.min(): Returns the minimum value of the array.
- np.max(a) / np.amax(a) / a.max(): Returns the maximum value of the array.
- np.sum(a) / a.sum(): Returns the sum of the values in the array.
- np.mean(a) / a.mean(): Returns the arithmetic mean of the array.
- np.median(a): Returns the median value of the array.
- np.var(a) / a.var(): Returns the variance of the values in the array.
- np.std(a) / a.std(): Returns the standard deviation of the values in the array.
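As a recap, the functions listed above can be gathered into one small helper. The function basic_stats is hypothetical and not part of NumPy; it is only a sketch of how the calls fit together, with the axis argument passed through unchanged.

```python
import numpy as np

def basic_stats(a, axis=None):
    """Hypothetical helper: collect the statistics from this section in one dict."""
    return {
        "min": np.min(a, axis=axis),
        "max": np.max(a, axis=axis),
        "sum": np.sum(a, axis=axis),
        "mean": np.mean(a, axis=axis),
        "median": np.median(a, axis=axis),
        "var": np.var(a, axis=axis),
        "std": np.std(a, axis=axis),
    }

a = np.arange(9).reshape(3, 3)
stats = basic_stats(a)
for name, value in stats.items():
    print(f"{name:>6} : {value}")
```

Calling basic_stats(a, axis=0) or basic_stats(a, axis=1) returns the same statistics per column or per row.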