Basic Statistics with NumPy
In data science, we need to collect, store, and analyze large amounts of
data. From this data we often need to compute basic statistics such as:
- Minimum value
- Maximum value
- Sum of all values
- Mean (average) value
- Median value
- Variance
- Standard deviation
Minimum value
- np.min(a)
- np.amin(a)
- a.min()
Example
Python
import numpy as np
help(np.min)
Output
PowerShell
Help on function amin in module numpy:
amin(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
Return the minimum of an array or minimum along an axis.
1-D array
Example
Python
a = np.array([10,5,20,3,25])
print(f"1-D array : {a}")
print(f"np.min(a) value : {np.min(a)}")
print(f"np.amin(a) value : {np.amin(a)}")
print(f"a.min() value : {a.min()}")
Output
PowerShell
1-D array : [10 5 20 3 25]
np.min(a) value : 3
np.amin(a) value : 3
a.min() value : 3
2-D array
- axis=None (default) – The array is flattened to 1-D and the minimum of all elements is returned.
- axis=0 – The minimum is taken down the rows, giving one value per column. For a 4x3 array the result has 3 elements.
- axis=1 – The minimum is taken across the columns, giving one value per row. For a 4x3 array the result has 4 elements.
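One way to keep the axis rule straight is to look at the shape of the result: the axis you pass is the one that disappears. The following is a minimal sketch using a 4x3 array; the variable names (col_min, row_min, manual_col_min) are only illustrative.

```python
import numpy as np

# 4 rows x 3 columns
a = np.array([[100, 20, 30], [10, 50, 60], [25, 15, 18], [4, 5, 19]])

col_min = np.min(a, axis=0)   # collapses the 4 rows: one value per column
row_min = np.min(a, axis=1)   # collapses the 3 columns: one value per row

print(a.shape)        # (4, 3)
print(col_min.shape)  # (3,)
print(row_min.shape)  # (4,)

# Plain-Python view of axis=0: take the minimum of each column in turn
manual_col_min = [int(a[:, j].min()) for j in range(a.shape[1])]
print(manual_col_min == col_min.tolist())  # True
```

The same reasoning applies to every function in this section that accepts an axis argument.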
Example
Python
import numpy as np
a = np.array([[100,20,30],[10,50,60],[25,15,18],[4,5,19]])
print(f"array a : \n {a}")
print(f"Minimum value along axis=None : {np.min(a)}")
print(f"Minimum value along axis-0 : {np.min(a,axis=0)}")
print(f"Minimum value along axis-1 : {np.min(a,axis=1)}")
Output
PowerShell
array a :
[[100 20 30]
[ 10 50 60]
[ 25 15 18]
[ 4 5 19]]
Minimum value along axis=None : 4
Minimum value along axis-0 : [ 4 5 18]
Minimum value along axis-1 : [20 10 15 4]
Example
Python
import numpy as np
a = np.arange(24).reshape(6,4)
print(f"array a : \n {a}")
print(f"Minimum value along axis=None : {np.min(a)}")
print(f"Minimum value along axis-0 : {np.min(a,axis=0)}")
print(f"Minimum value along axis-1 : {np.min(a,axis=1)}")
Output
PowerShell
array a :
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]
Minimum value along axis=None : 0
Minimum value along axis-0 : [0 1 2 3]
Minimum value along axis-1 : [ 0 4 8 12 16 20]
Example
Python
import numpy as np
a = np.arange(24)
np.random.shuffle(a)
a = a.reshape(6,4)
print(f"array a : \n {a}")
print(f"Minimum value along axis=None : {np.min(a)}")
print(f"Minimum value along axis-0 : {np.min(a,axis=0)}")
print(f"Minimum value along axis-1 : {np.min(a,axis=1)}")
Output
PowerShell
array a :
[[20 5 4 21]
[ 1 10 6 14]
[ 0 11 17 13]
[ 3 2 22 23]
[ 8 7 19 18]
[ 9 12 15 16]]
Minimum value along axis=None : 0
Minimum value along axis-0 : [ 0 2 4 13]
Minimum value along axis-1 : [4 1 0 2 7 9]
Maximum value
- np.max(a)
- np.amax(a)
- a.max()
Example
Python
import numpy as np
help(np.max)
Output
PowerShell
Help on function amax in module numpy:
amax(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
Return the maximum of an array or maximum along an axis.
1-D array
Example
Python
a = np.array([10,5,20,3,25])
print(f"1-D array : {a}")
print(f"np.max(a) value : {np.max(a)}")
print(f"np.amax(a) value : {np.amax(a)}")
print(f"a.max() value : {a.max()}")
Output
PowerShell
1-D array : [10 5 20 3 25]
np.max(a) value : 25
np.amax(a) value : 25
a.max() value : 25
2-D array
- axis=None (default) – The array is flattened to 1-D and the maximum of all elements is returned.
- axis=0 – The maximum is taken down the rows, giving one value per column. For a 4x3 array the result has 3 elements.
- axis=1 – The maximum is taken across the columns, giving one value per row. For a 4x3 array the result has 4 elements.
Example
Python
import numpy as np
a = np.array([[100,20,30],[10,50,60],[25,15,18],[4,5,19]])
print(f"array a : \n {a}")
print(f"Maximum value along axis=None : {np.max(a)}")
print(f"Maximum value along axis-0 : {np.max(a,axis=0)}")
print(f"Maximum value along axis-1 : {np.max(a,axis=1)}")
Output
PowerShell
array a :
[[100 20 30]
[ 10 50 60]
[ 25 15 18]
[ 4 5 19]]
Maximum value along axis=None : 100
Maximum value along axis-0 : [100 50 60]
Maximum value along axis-1 : [100 60 25 19]
Sum of the elements
- np.sum(a)
- a.sum()
Example
Python
import numpy as np
help(np.sum)
Output
PowerShell
Help on function sum in module numpy:
sum(a, axis=None, dtype=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
Sum of array elements over a given axis.
1-D array
Example
Python
# sum of elements of 1-D array
a = np.arange(4)
print(f"The array a : {a}")
print(f"sum of elements using np.sum(a) :: {np.sum(a)}")
print(f"sum of elements using a.sum() :: {a.sum()}")
Output
PowerShell
The array a : [0 1 2 3]
sum of elements using np.sum(a) :: 6
sum of elements using a.sum() :: 6
2-D array
- axis=None (default) – The array is flattened to 1-D and the sum of all elements is returned.
- axis=0 – The sum is taken down the rows, giving the sum of each column.
- axis=1 – The sum is taken across the columns, giving the sum of each row.
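A quick sanity check on the axis behaviour is that the per-column sums and the per-row sums must each add up to the overall total. This is a small sketch; the variable names are illustrative.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

total = np.sum(a)             # sum of all elements
col_sums = np.sum(a, axis=0)  # one sum per column
row_sums = np.sum(a, axis=1)  # one sum per row

# Adding up either set of partial sums gives back the overall total
print(total, col_sums.sum(), row_sums.sum())  # 36 36 36
```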
Example
Python
a = np.arange(9).reshape(3,3)
print(f"array a : \n {a}")
print(f"Sum along axis=None : {np.sum(a)}")
print(f"Sum along axis-0 : {np.sum(a,axis=0)}")
print(f"Sum along axis-1 : {np.sum(a,axis=1)}")
Output
PowerShell
array a :
[[0 1 2]
[3 4 5]
[6 7 8]]
Sum along axis=None : 36
Sum along axis-0 : [ 9 12 15]
Sum along axis-1 : [ 3 12 21]
Mean value
- np.mean(a)
- a.mean()
- Mean is the sum of elements along the specified axis divided by number of elements.
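The definition above can be checked by hand: divide the sum by the number of elements that were added together. The sketch below does this for the whole array and for axis=0; the names manual_mean and manual_col_mean are illustrative.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

# Overall mean: total divided by the number of elements
manual_mean = a.sum() / a.size

# Column means (axis=0): each column sum divided by the number of rows
manual_col_mean = a.sum(axis=0) / a.shape[0]

print(manual_mean, np.mean(a))
print(manual_col_mean, np.mean(a, axis=0))
```

Note that the result is a float even when the input array holds integers.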
Example
Python
import numpy as np
help(np.mean)
Output
PowerShell
Help on function mean in module numpy:
mean(a, axis=None, dtype=None, out=None, keepdims=<no value>, *, where=<no value>)
Compute the arithmetic mean along the specified axis.
1-D array
Python
a = np.arange(5)
print(f"1-D array : {a}")
print(f"np.mean(a) value : {np.mean(a)}")
print(f"a.mean() value : {a.mean()}")
Output
PowerShell
1-D array : [0 1 2 3 4]
np.mean(a) value : 2.0
a.mean() value : 2.0
2-D array
- axis=None (default) – The array is flattened to 1-D and the mean (average) of all elements is returned.
- axis=0 – The mean is taken down the rows, giving the mean of each column.
- axis=1 – The mean is taken across the columns, giving the mean of each row.
Example
Python
# 2-D array mean
a = np.arange(9).reshape(3,3)
print(f"The original 2-D array : \n {a}")
print(f"Mean of the 2-D array along axis=None : {np.mean(a)}")
print(f"Mean of the 2-D array along axis=0 : {np.mean(a,axis=0)}")
print(f"Mean of the 2-D array along axis=1 : {np.mean(a,axis=1)}")
Output
PowerShell
The original 2-D array :
[[0 1 2]
[3 4 5]
[6 7 8]]
Mean of the 2-D array along axis=None : 4.0
Mean of the 2-D array along axis=0 : [3. 4. 5.]
Mean of the 2-D array along axis=1 : [1. 4. 7.]
Median value
- np.median(a)
- The median is the middle element of the array in sorted form.
- If the array contains an odd number of elements, the median is the middle element value.
- If the array contains an even number of elements, the median is the average of the 2 middle element values.
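The sort-then-pick-the-middle procedure can be sketched in a few lines. The helper manual_median below is hypothetical (it is not part of NumPy) and assumes a non-empty 1-D input; it is only meant to make the odd and even cases explicit.

```python
import numpy as np

def manual_median(values):
    """Hypothetical helper: median of a non-empty 1-D sequence via sorting."""
    s = np.sort(np.asarray(values))
    n = s.size
    mid = n // 2
    if n % 2 == 1:
        # odd count: the single middle element
        return float(s[mid])
    # even count: average of the two middle elements
    return float((s[mid - 1] + s[mid]) / 2)

print(manual_median([80, 20, 60, 40]))        # 50.0
print(manual_median([10, 20, 30, 40, 50]))    # 30.0
```

In practice np.median(a) should be preferred; the helper only mirrors the definition.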
Example
Python
import numpy as np
help(np.median)
Output
PowerShell
Help on function median in module numpy:
median(a, axis=None, out=None, overwrite_input=False, keepdims=False)
Compute the median along the specified axis.
Returns the median of the array elements.
1-D array
Example
Python
a = np.array([10,20,30,40])
b = np.array([10,20,30,40,50])
print(f"The array with even number of elements : {a}")
print(f"Median of the array with even number of elements : {np.median(a)}")
print()
print(f"The array with odd number of elements : {b}")
print(f"Median of the array with odd number of elements : {np.median(b)}")
Output
PowerShell
The array with even number of elements : [10 20 30 40]
Median of the array with even number of elements : 25.0
The array with odd number of elements : [10 20 30 40 50]
Median of the array with odd number of elements : 30.0
Example
Python
# unsorted array(even no of elements) will be converted to sorted array and then
#median is calculated
a = np.array([80,20,60,40])
print(f"The array with even number of elements(unsorted) : {a}")
print("*"*100)
print("This step is calculated internally ")
print(f"sorted form of given array : {np.sort(a)}")
print("*"*100)
print(f"Median of the array with even number of elements : {np.median(a)}")
Output
PowerShell
The array with even number of elements(unsorted) : [80 20 60 40]
*****************************************************************************
This step is calculated internally
sorted form of given array : [20 40 60 80]
****************************************************************************
Median of the array with even number of elements : 50.0
Example
Python
# An unsorted array (odd number of elements) is sorted internally and then
# the median is calculated
a = np.array([80,20,60,40,100,140,120])
print(f"The array with odd number of elements(unsorted) : {a}")
print("*"*100)
print("This step is calculated internally ")
print(f"sorted form of given array : {np.sort(a)}")
print("*"*100)
print(f"Median of the array with odd number of elements : {np.median(a)}")
Output
PowerShell
The array with odd number of elements(unsorted) : [ 80 20 60 40 100 140 120]
*****************************************************************************
This step is calculated internally
sorted form of given array : [ 20 40 60 80 100 120 140]
*****************************************************************************
Median of the array with odd number of elements : 80.0
2-D array
- axis=None (default) – The array is flattened to 1-D and the median of all elements (in sorted order) is returned.
- axis=0 – The median is taken down the rows, giving the median of each column.
- axis=1 – The median is taken across the columns, giving the median of each row.
Example
Python
# 2-D array median
a = np.arange(9).reshape(3,3)
print(f"The original 2-D array(already sorted) : \n {a}")
print(f"Median of the 2-D array along axis=None : {np.median(a)}")
print(f"Median of the 2-D array along axis=0 : {np.median(a,axis=0)}")
print(f"Median of the 2-D array along axis=1 : {np.median(a,axis=1)}")
Output
PowerShell
The original 2-D array(already sorted) :
[[0 1 2]
[3 4 5]
[6 7 8]]
Median of the 2-D array along axis=None : 4.0
Median of the 2-D array along axis=0 : [3. 4. 5.]
Median of the 2-D array along axis=1 : [1. 4. 7.]
Example
Python
# 2-D array median ==> unsorted elements
a = np.array([[22,55,88],[11,44,55],[33,66,99]])
print(f"The original 2-D array(unsorted) : \n {a}")
print(f"Median of the 2-D array along axis=None : {np.median(a)}")
print(f"Median of the 2-D array along axis=0 : {np.median(a,axis=0)}")
print(f"Median of the 2-D array along axis=1 : {np.median(a,axis=1)}")
Output
PowerShell
The original 2-D array(unsorted) :
[[22 55 88]
[11 44 55]
[33 66 99]]
Median of the 2-D array along axis=None : 55.0
Median of the 2-D array along axis=0 : [22. 55. 88.]
Median of the 2-D array along axis=1 : [55. 44. 66.]
Example
Python
# 2-D array median ==> unsorted elements using shuffle
a = np.arange(9)
np.random.shuffle(a)
a = a.reshape(3,3)
print(f"The original 2-D array(unsorted) : \n {a}")
print(f"Median of the 2-D array along axis=None : {np.median(a)}")
print(f"Median of the 2-D array along axis=0 : {np.median(a,axis=0)}")
print(f"Median of the 2-D array along axis=1 : {np.median(a,axis=1)}")
Output
PowerShell
The original 2-D array(unsorted) :
[[6 8 4]
[3 0 5]
[2 1 7]]
Median of the 2-D array along axis=None : 4.0
Median of the 2-D array along axis=0 : [3. 1. 5.]
Median of the 2-D array along axis=1 : [6. 3. 2.]
Variance value
- np.var(a)
- a.var()
- The variance is a measure of variability. It is calculated by taking the average of the squared deviations from the mean.
- In other words: compute each element's deviation from the mean, square the deviations, and average the squares.
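The "average of squared deviations from the mean" recipe can be followed step by step. The sketch below uses illustrative variable names; it also shows the ddof parameter of np.var, which defaults to 0 (divide by N) and can be set to 1 to divide by N - 1 instead.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])

mean = a.mean()              # 3.0
deviations = a - mean        # [-2. -1.  0.  1.  2.]
squared = deviations ** 2    # [4. 1. 0. 1. 4.]
manual_var = squared.mean()  # 2.0

print(manual_var, np.var(a))  # 2.0 2.0

# With ddof=1 the sum of squares is divided by N - 1 instead of N
print(np.var(a, ddof=1))      # 2.5
```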
Example
Python
import numpy as np
help(np.var)
Output
PowerShell
Help on function var in module numpy:
var(a, axis=None, dtype=None, out=None, ddof=0, keepdims=<no value>, *, where=<no value>)
Compute the variance along the specified axis.
Returns the variance of the array elements, a measure of the spread of a distribution. The variance is computed for the flattened array by default, otherwise over the specified axis.
1-D array
Python
a = np.array([1,2,3,4,5])
print(f"Original 1-D array : {a}")
print(f"Variance of 1-D array using np.var(a): {np.var(a)}")
print(f"Variance of 1-D array using a.var(): {a.var()}")
Output
PowerShell
Original 1-D array : [1 2 3 4 5]
Variance of 1-D array using np.var(a): 2.0
Variance of 1-D array using a.var(): 2.0
2-D array
- axis=None (default) – The array is flattened to 1-D and the variance of all elements is returned.
- axis=0 – The variance is taken down the rows, giving the variance of each column.
- axis=1 – The variance is taken across the columns, giving the variance of each row.
Example
Python
a = np.arange(6).reshape(2,3)
print(f"Original 2-D array :\n {a}")
print(f"Variance of 2-D array using np.var(a) along axis=None: {np.var(a)}")
print(f"Variance of 2-D array using np.var(a) along axis=0: {np.var(a,axis=0)}")
print(f"Variance of 2-D array using np.var(a) along axis=1: {np.var(a,axis=1)}")
Output
PowerShell
Original 2-D array :
[[0 1 2]
[3 4 5]]
Variance of 2-D array using np.var(a) along axis=None: 2.9166666666666665
Variance of 2-D array using np.var(a) along axis=0: [2.25 2.25 2.25]
Variance of 2-D array using np.var(a) along axis=1: [0.66666667 0.66666667]
Standard Deviation value
- np.std(a)
- a.std()
- Variance means the average of squares of deviations from the mean.
- Standard deviation is the square root of the variance.
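The square-root relationship also holds along an axis, where the variance is an array rather than a single number. A minimal sketch, using np.sqrt because it works element-wise on arrays:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

for axis in (None, 0, 1):
    var = np.var(a, axis=axis)
    std = np.std(a, axis=axis)
    # np.sqrt is applied element-wise, so it handles the per-axis arrays too
    print(axis, np.allclose(std, np.sqrt(var)))
```

Each line should print True, since np.std and the square root of np.var agree for every axis.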
1-D array
Python
import math
import numpy as np
a = np.array([1,2,3,4,5])
print(f"Original 1-D array : {a}")
print(f"Variance of 1-D array using np.var(a): {np.var(a)}")
print(f"Standard Deviation of 1-D array using np.std(a): {np.std(a)}")
print(f"Square root of Variance : {math.sqrt(np.var(a))}")
Output
PowerShell
Original 1-D array : [1 2 3 4 5]
Variance of 1-D array using np.var(a): 2.0
Standard Deviation of 1-D array using np.std(a): 1.4142135623730951
Square root of Variance : 1.4142135623730951
2-D array
Python
import math
import numpy as np
a = np.arange(6).reshape(2,3)
print(f"Original 2-D array :\n {a}")
print("*"*100)
print(f"Variance of 2-D array using np.var(a) along axis=None: {np.var(a)}")
print(f"Standard Deviation of 2-D array using np.std(a) along axis=None: {np.std(a)}")
print(f"Square root of Variance : {math.sqrt(np.var(a))}")
print("*"*100)
print(f"Variance of 2-D array using np.var(a) along axis=0: {np.var(a,axis=0)}")
print(f"Standard Deviation of 2-D array using np.std(a) along axis=0: {np.std(a,axis=0)}")
print("*"*100)
print(f"Variance of 2-D array using np.var(a) along axis=1: {np.var(a,axis=1)}")
print(f"Standard Deviation of 2-D array using np.std(a) along axis=1: {np.std(a,axis=1)}")
print("*"*100)
Output
PowerShell
Original 2-D array :
[[0 1 2]
[3 4 5]]
*****************************************************************************
Variance of 2-D array using np.var(a) along axis=None: 2.9166666666666665
Standard Deviation of 2-D array using np.std(a) along axis=None: 1.707825127659933
Square root of Variance : 1.707825127659933
*****************************************************************************
Variance of 2-D array using np.var(a) along axis=0: [2.25 2.25 2.25]
Standard Deviation of 2-D array using np.std(a) along axis=0: [1.5 1.5 1.5]
*****************************************************************************
Variance of 2-D array using np.var(a) along axis=1: [0.66666667 0.66666667]
Standard Deviation of 2-D array using np.std(a) along axis=1: [0.81649658 0.81649658]
*****************************************************************************
Summary
- np.min(a)/np.amin(a)/a.min() -> Returns the minimum value of the array.
- np.max(a)/np.amax(a)/a.max() -> Returns the maximum value of the array.
- np.sum(a)/a.sum() -> Returns the sum of the values of the array.
- np.mean(a)/a.mean() -> Returns the arithmetic mean of the array.
- np.median(a) -> Returns the median value of the array.
- np.var(a)/a.var() -> Returns the variance of the values in the array.
- np.std(a)/a.std() -> Returns the standard deviation of the values in the array.
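As a recap, the functions above can be gathered into one small helper. The function describe below is hypothetical (it is not a NumPy function); it is only a sketch that calls each statistic from this section with the same axis argument.

```python
import numpy as np

def describe(a, axis=None):
    """Hypothetical helper collecting the statistics covered in this section."""
    return {
        "min": np.min(a, axis=axis),
        "max": np.max(a, axis=axis),
        "sum": np.sum(a, axis=axis),
        "mean": np.mean(a, axis=axis),
        "median": np.median(a, axis=axis),
        "var": np.var(a, axis=axis),
        "std": np.std(a, axis=axis),
    }

a = np.arange(9).reshape(3, 3)
stats = describe(a)
for name, value in stats.items():
    print(f"{name:>6}: {value}")
```

Passing axis=0 or axis=1 returns per-column or per-row arrays for every entry instead of single values.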