Basic Statistics with NumPy
In data science, we often need to collect, store, and analyze large amounts of
data. From this data we may need to compute some basic statistics, such as:
- Minimum value
- Maximum value
- Average Value
- Sum of all values
- Mean value
- Median value
- Variance
- Standard deviation
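As a quick orientation, the statistics listed above can all be computed with a single NumPy call each. The following is a minimal sketch, assuming a small hypothetical sample array; the dictionary name `summary` is only for illustration, and `np.var` and `np.std` use the population formulas (ddof=0) by default.

```python
import numpy as np

# Hypothetical sample data, chosen only for illustration
data = np.array([10, 5, 20, 3, 25])

summary = {
    "min": np.min(data),
    "max": np.max(data),
    "sum": np.sum(data),
    "mean": np.mean(data),
    "median": np.median(data),
    "variance": np.var(data),   # population variance (ddof=0 by default)
    "std": np.std(data),        # population standard deviation (ddof=0)
}

for name, value in summary.items():
    print(f"{name:>8} : {value}")
```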
Minimum value
- np.min(a)
- np.amin(a)
- a.min()
Example
Python
In [462]:
import numpy as np
help(np.min)
Output
PowerShell
Help on function amin in module numpy:
amin(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
Return the minimum of an array or minimum along an axis.
1-D array
Example
Python
In [463]:
a = np.array([10,5,20,3,25])
print(f"1-D array : {a}")
print(f"np.min(a) value : {np.min(a)}")
print(f"np.amin(a) value : {np.amin(a)}")
print(f"a.min() value : {a.min()}")
Output
PowerShell
1-D array : [10 5 20 3 25]
np.min(a) value : 3
np.amin(a) value : 3
a.min() value : 3
2-D array
- axis=None (default) – the array is flattened to 1-D and the minimum of all elements is returned
- axis=0 – the minimum is taken down the rows, giving one value per column (3 values for a 4x3 array)
- axis=1 – the minimum is taken across the columns, giving one value per row (4 values for a 4x3 array)
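One way to check the meaning of the axis argument is to compare NumPy's result with an explicit loop over columns or rows. This is only an illustrative sketch; the names `col_min_manual` and `row_min_manual` are hypothetical helpers, not part of NumPy.

```python
import numpy as np

a = np.array([[100, 20, 30],
              [10, 50, 60],
              [25, 15, 18],
              [4, 5, 19]])          # shape (4, 3): 4 rows, 3 columns

# axis=0 collapses the rows: one minimum per column
col_min = np.min(a, axis=0)
col_min_manual = np.array([a[:, j].min() for j in range(a.shape[1])])

# axis=1 collapses the columns: one minimum per row
row_min = np.min(a, axis=1)
row_min_manual = np.array([a[i, :].min() for i in range(a.shape[0])])

print(col_min.shape, col_min)   # shape (3,), values 4, 5, 18
print(row_min.shape, row_min)   # shape (4,), values 20, 10, 15, 4
```

The axis that is passed is the one that disappears from the result's shape.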
Example
Python
In [464]:
import numpy as np
a = np.array([[100,20,30],[10,50,60],[25,15,18],[4,5,19]])
print(f"array a : \n {a}")
print(f"Minimum value along axis=None : {np.min(a)}")
print(f"Minimum value along axis-0 : {np.min(a,axis=0)}")
print(f"Minimum value along axis-1 : {np.min(a,axis=1)}")
Output
PowerShell
array a :
[[100 20 30]
[ 10 50 60]
[ 25 15 18]
[ 4 5 19]]
Minimum value along axis=None : 4
Minimum value along axis-0 : [ 4 5 18]
Minimum value along axis-1 : [20 10 15 4]
Example
Python
In [465]:
import numpy as np
a = np.arange(24).reshape(6,4)
print(f"array a : \n {a}")
print(f"Minimum value along axis=None : {np.min(a)}")
print(f"Minimum value along axis-0 : {np.min(a,axis=0)}")
print(f"Minimum value along axis-1 : {np.min(a,axis=1)}")
Output
PowerShell
array a :
[[ 0 1 2 3]
[ 4 5 6 7]
[ 8 9 10 11]
[12 13 14 15]
[16 17 18 19]
[20 21 22 23]]
Minimum value along axis=None : 0
Minimum value along axis-0 : [0 1 2 3]
Minimum value along axis-1 : [ 0 4 8 12 16 20]
Example
Python
In [466]:
import numpy as np
a = np.arange(24)
np.random.shuffle(a)
a = a.reshape(6,4)
print(f"array a : \n {a}")
print(f"Minimum value along axis=None : {np.min(a)}")
print(f"Minimum value along axis-0 : {np.min(a,axis=0)}")
print(f"Minimum value along axis-1 : {np.min(a,axis=1)}")
Output
PowerShell
array a :
[[20 5 4 21]
[ 1 10 6 14]
[ 0 11 17 13]
[ 3 2 22 23]
[ 8 7 19 18]
[ 9 12 15 16]]
Minimum value along axis=None : 0
Minimum value along axis-0 : [ 0 2 4 13]
Minimum value along axis-1 : [4 1 0 2 7 9]
Maximum value
- np.max(a)
- np.amax(a)
- a.max()
Example
Python
In [467]:
import numpy as np
help(np.max)
Output
PowerShell
Help on function amax in module numpy:
amax(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
Return the maximum of an array or maximum along an axis.
1-D array
Example
Python
In [468]:
a = np.array([10,5,20,3,25])
print(f"1-D array : {a}")
print(f"np.max(a) value : {np.max(a)}")
print(f"np.amax(a) value : {np.amax(a)}")
print(f"a.max() value : {a.max()}")
Output
PowerShell
1-D array : [10 5 20 3 25]
np.max(a) value : 25
np.amax(a) value : 25
a.max() value : 25
2-D array
- axis=None (default) – the array is flattened to 1-D and the maximum of all elements is returned
- axis=0 – the maximum is taken down the rows, giving one value per column (3 values for a 4x3 array)
- axis=1 – the maximum is taken across the columns, giving one value per row (4 values for a 4x3 array)
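The method form a.max() accepts the same axis argument as np.max(a), and the keepdims parameter shown in the help text keeps the reduced axis with length 1. The sketch below illustrates both; the variable names are chosen only for this example.

```python
import numpy as np

a = np.array([[100, 20, 30],
              [10, 50, 60],
              [25, 15, 18],
              [4, 5, 19]])

col_max = a.max(axis=0)                          # one maximum per column
row_max = a.max(axis=1)                          # one maximum per row
row_max_2d = np.max(a, axis=1, keepdims=True)    # reduced axis kept with length 1

print(col_max)            # values 100, 50, 60
print(row_max)            # values 100, 60, 25, 19
print(row_max_2d.shape)   # (4, 1)
```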
Example
Python
In [469]:
import numpy as np
a = np.array([[100,20,30],[10,50,60],[25,15,18],[4,5,19]])
print(f"array a : \n {a}")
print(f"Maximum value along axis=None : {np.max(a)}")
print(f"Maximum value along axis-0 : {np.max(a,axis=0)}")
print(f"Maximum value along axis-1 : {np.max(a,axis=1)}")
Output
PowerShell
array a :
[[100 20 30]
[ 10 50 60]
[ 25 15 18]
[ 4 5 19]]
Maximum value along axis=None : 100
Maximum value along axis-0 : [100 50 60]
Maximum value along axis-1 : [100 60 25 19]
Sum of the elements
- np.sum(a)
- a.sum()
Example
Python
In [470]:
import numpy as np
help(np.sum)
Output
PowerShell
Help on function sum in module numpy:
sum(a, axis=None, dtype=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
Sum of array elements over a given axis.
1-D array
Example
Python
In [471]:
# sum of elements of 1-D array
a = np.arange(4)
print(f"The array a : {a}")
print(f"sum of elements using np.sum(a) :: {np.sum(a)}")
print(f"sum of elements using a.sum() :: {a.sum()}")
Output
PowerShell
The array a : [0 1 2 3]
sum of elements using np.sum(a) :: 6
sum of elements using a.sum() :: 6
2-D array
- axis=None (default) – the array is flattened to 1-D and the sum of all elements is returned
- axis=0 – the rows are added together, giving the sum of each column
- axis=1 – the columns are added together, giving the sum of each row
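The axis rules for the sum can be checked by adding the rows or columns by hand. This is a small sketch on a 3x3 array, not a description of how NumPy computes the sum internally.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

# axis=0: add the rows together element by element -> one total per column
col_sum = np.sum(a, axis=0)
assert np.array_equal(col_sum, a[0] + a[1] + a[2])

# axis=1: add the columns together element by element -> one total per row
row_sum = np.sum(a, axis=1)
assert np.array_equal(row_sum, a[:, 0] + a[:, 1] + a[:, 2])

# axis=None: the grand total equals the sum of either set of partial sums
total = np.sum(a)
assert total == col_sum.sum() == row_sum.sum()
```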
Example
Python
In [472]:
a = np.arange(9).reshape(3,3)
print(f"array a : \n {a}")
print(f"Sum along axis=None : {np.sum(a)}")
print(f"Sum along axis-0 : {np.sum(a,axis=0)}")
print(f"Sum along axis-1 : {np.sum(a,axis=1)}")
Output
PowerShell
array a :
[[0 1 2]
[3 4 5]
[6 7 8]]
Sum along axis=None : 36
Sum along axis-0 : [ 9 12 15]
Sum along axis-1 : [ 3 12 21]
Mean value
- np.mean(a)
- a.mean()
- The mean is the sum of the elements along the specified axis divided by the number of elements along that axis.
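The definition of the mean can be verified directly by dividing a sum by the matching element count. The following sketch assumes a 3x3 array; the names `mean_all`, `mean_cols`, and `mean_rows` are illustrative only.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

mean_all = np.sum(a) / a.size                  # total divided by the number of elements
mean_cols = np.sum(a, axis=0) / a.shape[0]     # column sums divided by the number of rows
mean_rows = np.sum(a, axis=1) / a.shape[1]     # row sums divided by the number of columns

# Each manual result matches np.mean with the same axis
assert np.isclose(mean_all, np.mean(a))
assert np.allclose(mean_cols, np.mean(a, axis=0))
assert np.allclose(mean_rows, np.mean(a, axis=1))
```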
Example
Python
In [473]:
import numpy as np
help(np.mean)
Output
PowerShell
Help on function mean in module numpy:
mean(a, axis=None, dtype=None, out=None, keepdims=<no value>, *, where=<no value>)
Compute the arithmetic mean along the specified axis.
1-D array
Example
Python
In [474]:
a = np.arange(5)
print(f"1-D array : {a}")
print(f"np.mean(a) value : {np.mean(a)}")
print(f"a.mean() value : {a.mean()}")
Output
PowerShell
1-D array : [0 1 2 3 4]
np.mean(a) value : 2.0
a.mean() value : 2.0
2-D array
- axis=None (default) – the array is flattened to 1-D and the mean (average) of all elements is returned
- axis=0 – the mean is taken down the rows, giving one average per column
- axis=1 – the mean is taken across the columns, giving one average per row
Example
Python
In [475]:
# 2-D array mean
a = np.arange(9).reshape(3,3)
print(f"The original 2-D array : \n {a}")
print(f"Mean of the 2-D array along axis=None : {np.mean(a)}")
print(f"Mean of the 2-D array along axis=0 : {np.mean(a,axis=0)}")
print(f"Mean of the 2-D array along axis=1 : {np.mean(a,axis=1)}")
Output
PowerShell
The original 2-D array :
[[0 1 2]
[3 4 5]
[6 7 8]]
Mean of the 2-D array along axis=None : 4.0
Mean of the 2-D array along axis=0 : [3. 4. 5.]
Mean of the 2-D array along axis=1 : [1. 4. 7.]
Median value
- np.median(a)
- The median is the middle element of the array in sorted order.
- If the array contains an odd number of elements, the median is the middle
element.
- If the array contains an even number of elements, the median is the average of the 2
middle elements.
Example
Python
In [476]:
import numpy as np
help(np.median)
Output
PowerShell
Help on function median in module numpy:
median(a, axis=None, out=None, overwrite_input=False, keepdims=False)
Compute the median along the specified axis.
Returns the median of the array elements.
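A short sketch of the two cases: with an odd number of elements np.median returns the middle value of the sorted data, and with an even number it returns the average of the two middle values. The arrays `odd` and `even` are hypothetical sample data.

```python
import numpy as np

odd = np.array([10, 5, 20, 3, 25])        # 5 elements
even = np.array([10, 5, 20, 3, 25, 7])    # 6 elements

print(np.sort(odd))       # sorted values: 3, 5, 10, 20, 25
print(np.median(odd))     # 10.0, the middle element

print(np.sort(even))      # sorted values: 3, 5, 7, 10, 20, 25
print(np.median(even))    # 8.5, the average of 7 and 10
```

np.median does not require the input to be sorted beforehand; the sorting shown here is only to make the middle elements visible.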