IKH

Chapter-20

Basic Statistics with Numpy

In the data science domain, we are required to collect, store, and analyze huge amounts
of data. From this data, we may need to find some basic statistics, such as:

  • Minimum value
  • Maximum value
  • Sum of all values
  • Mean (average) value
  • Median value
  • Variance
  • Standard deviation, etc.
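As a quick preview, the sketch below computes each of these statistics for one small 1-D array. The array values are arbitrary, and np.var / np.std are shown with NumPy's default population formulas (ddof=0); later sections cover each function in detail.

```python
import numpy as np

# Small sample array, chosen only for illustration
a = np.array([10, 5, 20, 3, 25])

print("min    :", np.min(a))      # smallest element
print("max    :", np.max(a))      # largest element
print("sum    :", np.sum(a))      # total of all elements
print("mean   :", np.mean(a))     # sum / number of elements
print("median :", np.median(a))   # middle value of the sorted data
print("var    :", np.var(a))      # population variance (ddof=0 by default)
print("std    :", np.std(a))      # square root of the variance
```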

Minimum value

  • np.min(a)
  • np.amin(a)
  • a.min()

Example

Python
In [462]: 
import numpy as np 
help(np.min) 

Output

Help on function amin in module numpy: 
 
amin(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
    Return the minimum of an array or minimum along an axis. 
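The signature above also lists the optional initial and where parameters. The following is a minimal sketch of how they might be used; the sample array and the mask a > 5 are made up for illustration.

```python
import numpy as np

a = np.array([10, 5, 20, 3, 25])

# where= restricts the reduction to elements where the mask is True;
# for min/max an initial value is required so an all-False mask still has a result
print(np.min(a, where=a > 5, initial=100))   # considers only 10, 20, 25

# An empty array has no minimum, so np.min raises ValueError...
empty = np.array([])
try:
    np.min(empty)
except ValueError as exc:
    print("ValueError:", exc)

# ...unless an initial value is supplied
print(np.min(empty, initial=np.inf))
```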

1-D array

Example

Python
In [463]: 
a = np.array([10,5,20,3,25]) 
print(f"1-D array : {a}") 
print(f"np.min(a) value : {np.min(a)}") 
print(f"np.amin(a) value : {np.amin(a)}") 
print(f"a.min() value : {a.min()}") 

Output

1-D array : [10  5 20  3 25] 
np.min(a) value : 3 
np.amin(a) value : 3 
a.min() value : 3 

2-D array

  • axis=None (default) – the array is flattened to a 1-D array and the single minimum
    value is returned
  • axis=0 – the minimum is taken down the rows, giving one value per column (3 values
    for the 4×3 array below)
  • axis=1 – the minimum is taken across the columns, giving one value per row (4 values
    for the 4×3 array below)
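One way to check which axis is being reduced is to look at the shape of the result: the reduced axis disappears. The sketch below uses the same 4×3 array as the next example and compares np.min with an explicit loop over columns and rows.

```python
import numpy as np

a = np.array([[100, 20, 30], [10, 50, 60], [25, 15, 18], [4, 5, 19]])
print(a.shape)                  # 4 rows, 3 columns

col_min = np.min(a, axis=0)     # one value per column
row_min = np.min(a, axis=1)     # one value per row
print(col_min.shape, row_min.shape)

# The same results computed column by column and row by row
manual_col = np.array([a[:, j].min() for j in range(a.shape[1])])
manual_row = np.array([a[i, :].min() for i in range(a.shape[0])])
print(np.array_equal(col_min, manual_col), np.array_equal(row_min, manual_row))
```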

Example

Python
In [464]: 
import numpy as np 
a = np.array([[100,20,30],[10,50,60],[25,15,18],[4,5,19]]) 
print(f"array a : \n {a}") 
print(f"Minimum value along axis=None : {np.min(a)}") 
print(f"Minimum value along axis-0 : {np.min(a,axis=0)}") 
print(f"Minimum value along axis-1 : {np.min(a,axis=1)}") 

Output

array a :  
 [[100  20  30] 
 [ 10  50  60] 
 [ 25  15  18] 
 [  4   5  19]] 
Minimum value along axis=None : 4 
Minimum value along axis-0 : [ 4  5 18] 
Minimum value along axis-1 : [20 10 15  4]

Example

Python
In [465]: 
import numpy as np 
a = np.arange(24).reshape(6,4) 
print(f"array a : \n {a}") 
print(f"Minimum value along axis=None : {np.min(a)}") 
print(f"Minimum value along axis-0 : {np.min(a,axis=0)}") 
print(f"Minimum value along axis-1 : {np.min(a,axis=1)}") 

Output

array a :  
 [[ 0  1  2  3] 
 [ 4  5  6  7] 
 [ 8  9 10 11] 
 [12 13 14 15] 
 [16 17 18 19] 
 [20 21 22 23]] 
Minimum value along axis=None : 0 
Minimum value along axis-0 : [0 1 2 3] 
Minimum value along axis-1 : [ 0  4  8 12 16 20]

Example

Python
In [466]: 
import numpy as np 
a = np.arange(24) 
np.random.shuffle(a) 
a = a.reshape(6,4)
print(f"array a : \n {a}") 
print(f"Minimum value along axis=None : {np.min(a)}") 
print(f"Minimum value along axis-0 : {np.min(a,axis=0)}") 
print(f"Minimum value along axis-1 : {np.min(a,axis=1)}")

Output

array a :  
 [[20  5  4 21] 
 [ 1 10  6 14] 
 [ 0 11 17 13] 
 [ 3  2 22 23] 
 [ 8  7 19 18] 
 [ 9 12 15 16]] 
Minimum value along axis=None : 0 
Minimum value along axis-0 : [ 0  2  4 13] 
Minimum value along axis-1 : [4 1 0 2 7 9] 
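Because np.random.shuffle reorders the array randomly, the printed output above will differ from run to run. If a reproducible array is wanted, one option (an alternative to the original code, not what it uses) is to draw the permutation from a seeded Generator; the seed value 42 is arbitrary.

```python
import numpy as np

# A seeded Generator produces the same permutation every run
rng = np.random.default_rng(seed=42)   # the seed value 42 is arbitrary
a = rng.permutation(24).reshape(6, 4)
print(a)
print(np.min(a, axis=0))
print(np.min(a, axis=1))

# Re-creating the generator with the same seed gives an identical array
b = np.random.default_rng(seed=42).permutation(24).reshape(6, 4)
print(np.array_equal(a, b))
```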

Maximum value

  • np.max(a)
  • np.amax(a)
  • a.max()

Example

Python
In [467]: 
import numpy as np 
help(np.max)

Output

Help on function amax in module numpy: 
 
amax(a, axis=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
    Return the maximum of an array or maximum along an axis.

1-D array

Example

Python
In [468]: 
a = np.array([10,5,20,3,25]) 
print(f"1-D array : {a}") 
print(f"np.max(a) value : {np.max(a)}") 
print(f"np.amax(a) value : {np.amax(a)}") 
print(f"a.max() value : {a.max()}")

Output

1-D array : [10  5 20  3 25] 
np.max(a) value : 25
np.amax(a) value : 25 
a.max() value : 25 

2-D array

  • axis=None (default) – the array is flattened to a 1-D array and the single maximum
    value is returned
  • axis=0 – the maximum is taken down the rows, giving one value per column (3 values
    for the 4×3 array below)
  • axis=1 – the maximum is taken across the columns, giving one value per row (4 values
    for the 4×3 array below)

Example

Python
In [469]: 
import numpy as np 
a = np.array([[100,20,30],[10,50,60],[25,15,18],[4,5,19]]) 
print(f"array a : \n {a}") 
print(f"Maximum value along axis=None : {np.max(a)}") 
print(f"Maximum value along axis-0 : {np.max(a,axis=0)}") 
print(f"Maximum value along axis-1 : {np.max(a,axis=1)}")

Output

array a :  
 [[100  20  30] 
 [ 10  50  60] 
 [ 25  15  18] 
 [  4   5  19]] 
Maximum value along axis=None : 100 
Maximum value along axis-0 : [100  50  60] 
Maximum value along axis-1 : [100  60  25  19] 
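The keepdims parameter from the signature keeps the reduced axis as a dimension of length 1, which lets the result broadcast against the original array. A sketch with the same array as above; scaling each row by its maximum is just one illustrative use.

```python
import numpy as np

a = np.array([[100, 20, 30], [10, 50, 60], [25, 15, 18], [4, 5, 19]])

row_max = np.max(a, axis=1)                     # shape (4,)
row_max_kd = np.max(a, axis=1, keepdims=True)   # shape (4, 1)
print(row_max.shape, row_max_kd.shape)

# keepdims=True lets the result broadcast against the original array,
# e.g. to scale every row so that its largest value becomes 1.0
scaled = a / row_max_kd
print(scaled)
```

Without keepdims, a / row_max would raise an error, because shapes (4, 3) and (4,) do not broadcast together.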

Sum of the elements

  • np.sum(a)
  • a.sum()

Example

Python
In [470]: 
import numpy as np 
help(np.sum)

Output

Help on function sum in module numpy: 
 
sum(a, axis=None, dtype=None, out=None, keepdims=<no value>, initial=<no value>, where=<no value>)
    Sum of array elements over a given axis.
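The dtype parameter in the signature controls the type of the accumulator and of the result. The following sketch, using a made-up uint8 array, shows why this matters for small integer types.

```python
import numpy as np

small = np.array([200, 100], dtype=np.uint8)

# By default, small integer types are summed using a platform-sized integer,
# so the total 300 is represented exactly
print(np.sum(small))

# Forcing an 8-bit accumulator makes the result wrap around modulo 256
print(np.sum(small, dtype=np.uint8))   # 300 % 256 == 44

# A floating-point dtype gives a float result
print(np.sum(small, dtype=np.float64))
```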

1-D array

Example

Python
In [471]: 
# sum of elements of 1-D array 
a = np.arange(4) 
print(f"The array a : {a}") 
print(f"sum of elements using np.sum(a) :: {np.sum(a)}") 
print(f"sum of elements using a.sum() :: {a.sum()}")

Output

The array a : [0 1 2 3] 
sum of elements using np.sum(a) :: 6 
sum of elements using a.sum() :: 6 

2-D array

  • axis=None (default) – the array is flattened to a 1-D array and the sum of all
    elements is calculated
  • axis=0 – the sum is taken down the rows, giving the sum of each column
  • axis=1 – the sum is taken across the columns, giving the sum of each row
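Put differently, summing along axis=0 adds the rows together element by element, while axis=1 adds the columns together. A small check with the same 3×3 array as the example that follows:

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

# axis=0: add the rows together element by element -> one total per column
print(np.array_equal(np.sum(a, axis=0), a[0] + a[1] + a[2]))

# axis=1: add the columns together element by element -> one total per row
print(np.array_equal(np.sum(a, axis=1), a[:, 0] + a[:, 1] + a[:, 2]))
```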

Example

Python
In [472]: 
a = np.arange(9).reshape(3,3) 
print(f"array a : \n {a}") 
print(f"Sum along axis=None : {np.sum(a)}") 
print(f"Sum along axis-0 : {np.sum(a,axis=0)}") 
print(f"Sum along axis-1 : {np.sum(a,axis=1)}") 

Output

array a :  
 [[0 1 2] 
 [3 4 5] 
 [6 7 8]] 
Sum along axis=None : 36 
Sum along axis-0 : [ 9 12 15] 
Sum along axis-1 : [ 3 12 21]

Mean value

  • np.mean(a)
  • a.mean()
  • The mean is the sum of the elements along the specified axis divided by the number
    of those elements.
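The definition above can be checked directly: dividing a sum by the matching element count reproduces np.mean, both for the whole array and along each axis. A sketch with an arbitrary 3×3 array:

```python
import numpy as np

a = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

# Overall mean: total divided by the total number of elements
print(np.sum(a) / a.size, np.mean(a))

# Mean along an axis: sum along that axis divided by that axis's length
print(np.sum(a, axis=0) / a.shape[0], np.mean(a, axis=0))
print(np.sum(a, axis=1) / a.shape[1], np.mean(a, axis=1))
```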

Example

Python
In [473]: 
import numpy as np 
help(np.mean) 

Output

Help on function mean in module numpy: 
 
mean(a, axis=None, dtype=None, out=None, keepdims=<no value>, *, where=<no value>)
    Compute the arithmetic mean along the specified axis.
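The where parameter in the signature above (available for np.mean since NumPy 1.20) restricts the average to the elements selected by a boolean mask. A minimal sketch; the mask a >= 1 is arbitrary and only meant to illustrate the idea.

```python
import numpy as np

a = np.arange(9).reshape(3, 3)

# Average only the elements that are >= 1 (i.e. skip the 0)
mask = a >= 1
print(np.mean(a, where=mask))

# Equivalent computation by selecting the elements first
print(a[mask].mean())
```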

1-D array

Example

Python
In [474]: 
a = np.arange(5) 
print(f"1-D array : {a}") 
print(f"np.mean(a) value : {np.mean(a)}") 
print(f"a.mean() value : {a.mean()}") 

Output

1-D array : [0 1 2 3 4] 
np.mean(a) value : 2.0 
a.mean() value : 2.0 

2-D array

  • axis=None (default) – the array is flattened to a 1-D array and the mean (average)
    of all elements is calculated
  • axis=0 – the average is taken down the rows, giving the mean of each column
  • axis=1 – the average is taken across the columns, giving the mean of each row

Example

Python
In [475]: 
# 2-D array mean 
a = np.arange(9).reshape(3,3) 
print(f"The original 2-D array : \n {a}") 
print(f"Mean of the 2-D array along axis=None : {np.mean(a)}") 
print(f"Mean of the 2-D array along axis=0 : {np.mean(a,axis=0)}") 
print(f"Mean of the 2-D array along axis=1 : {np.mean(a,axis=1)}")

Output

The original 2-D array :  
 [[0 1 2] 
 [3 4 5] 
 [6 7 8]] 
Mean of the 2-D array along axis=None : 4.0 
Mean of the 2-D array along axis=0 : [3. 4. 5.] 
Mean of the 2-D array along axis=1 : [1. 4. 7.] 

Median value

  • np.median(a)
  • The median is the middle element of the array in sorted order
  • If the array contains an odd number of elements, the median is the middle
    element value
  • If the array contains an even number of elements, the median is the average of
    the two middle element values
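The odd and even cases can be seen with two tiny arrays, whose values are chosen only for illustration. np.median returns a float in both cases, and the even case matches averaging the two middle elements of the sorted array by hand.

```python
import numpy as np

odd = np.array([7, 1, 5])        # sorted: [1, 5, 7]
even = np.array([7, 1, 5, 3])    # sorted: [1, 3, 5, 7]

# Odd count: the single middle element of the sorted array
print(np.median(odd))    # 5.0

# Even count: the average of the two middle elements (3 and 5)
print(np.median(even))   # 4.0

# The same even-count result computed by hand from the sorted array
s = np.sort(even)
n = s.size
print((s[n // 2 - 1] + s[n // 2]) / 2)
```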

Example

Python
In [476]: 
import numpy as np 
help(np.median)

Output

Help on function median in module numpy: 
 
median(a, axis=None, out=None, overwrite_input=False, keepdims=False) 
    Compute the median along the specified axis. 
     
    Returns the median of the array elements.

1-D array

Example