Vectorial Representation of Data

To understand how PCA works, you first need some essential linear algebra concepts, such as matrices, vectors and their associated operations. Let’s take a look at the following lecture as you go through a checklist of the linear algebra you should know before moving on to PCA.
To summarise what you’re going to learn in this segment, here’s a handy checklist:

Vectors and their properties
Vector operations (addition, scaling, linear combination and dot product)
Matrices
Matrix operations (matrix multiplication and matrix inverses)

Let’s start by understanding the dataset as a matrix of vectors in the following lecture.

Note: In the video at 1:48, the graphic mistakenly shows $\begin{bmatrix} 165 \\ 65 \end{bmatrix}$ instead of $\begin{bmatrix} 165 \\ 55 \end{bmatrix}$.

As mentioned in the video, consider the following data set containing the height and weight of five patients:

The height and weight information can be represented in the form of a matrix as follows.

with each row representing a particular patient’s data and each column representing the original variable. Geometrically, these patients can be represented as shown in the following image:

[Note: The point P5 is slightly off from its actual position in the graph given above and in the video.]
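
The row-and-column layout described above can be sketched in NumPy. This is only an illustration: the first row uses the first patient's values from this segment (165, 55), while the other four rows are made-up placeholders and not the values from the course table. The variable names (`patients`, `first_patient`, `heights`) are likewise assumptions chosen for readability.

```python
import numpy as np

# Row 0 is the first patient from the text (height 165, weight 55).
# Rows 1-4 are made-up placeholder values, NOT the course's actual data.
patients = np.array([
    [165, 55],
    [155, 70],
    [160, 62],
    [170, 80],
    [175, 68],
])

# Five patients (rows) and two original variables (columns).
n_patients, n_variables = patients.shape

# One row is one patient's vector.
first_patient = patients[0]

# One column is one original variable across all patients.
heights = patients[:, 0]
weights = patients[:, 1]

print(n_patients, n_variables)
print(first_patient)
```

Indexing a row gives you a single patient's vector, and indexing a column gives you all the values of one variable, which matches the description of rows and columns above.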

Vector Representation

The vector associated with the first patient is given by the values (165, 55). This vector can be written in the following ways:

1. As a column containing the values along the rows. This is also known as the column-vector representation.
$\begin{bmatrix} 165 \\ 55 \end{bmatrix}$
2. As a transpose of the above form. Essentially, it is the same column vector, but now written as the transpose of a row vector.
$\begin{bmatrix} 165 & 55 \end{bmatrix}^T$
[Note: Transpose is something you must have learnt in your Python for DS module. If you need some brushing up on this topic, you can take a look at this link]
3. In terms of the basis vectors
This is something which you’ll learn in detail in later segments. To give a brief idea, the vector (165, 55) can also be written as 165i + 55j, where i and j are the unit vectors along X and Y respectively and are the basis vectors used to represent all vectors in the 2-D space.
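
As a rough check that the three forms above describe the same vector, here is a minimal NumPy sketch. The names `column`, `row`, `i_hat` and `j_hat` are assumptions made for this example and do not come from the lecture.

```python
import numpy as np

# 1. Column-vector form: two rows, one column.
column = np.array([[165], [55]])

# 2. Transpose of a row vector: start with one row, two columns, then transpose.
row = np.array([[165, 55]])
as_transpose = row.T

# 3. Basis-vector form: 165*i + 55*j, with i and j the unit vectors along X and Y.
i_hat = np.array([1, 0])
j_hat = np.array([0, 1])
combo = 165 * i_hat + 55 * j_hat

# All three forms hold the same two values.
same_column = np.array_equal(column, as_transpose)
same_vector = np.array_equal(combo, column.ravel())

print(column.shape, as_transpose.shape)
print(same_column, same_vector)
```

The first two forms differ only in how they are written down, and the third expresses the same values as a linear combination of the basis vectors.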

Vector Representation for n-dimensional data

Each vector will contain values representing all the dimensions or variables in the data. For example, if there was an age variable also included in the above dataset and the first patient had an age of 22 years, then the vector representing him would be written as (165, 55, 22). Similarly, if the dataset had 10 variables, there would be 10 dimensions in the vector representation. In the same way, you can extend this to n dimensions or variables.
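
The extension to more variables can be sketched as follows, assuming NumPy arrays are used to hold the vectors. The 10-variable record is a zero-filled placeholder used only to show the shape; it is not real data.

```python
import numpy as np

# First patient with two variables: height and weight.
patient = np.array([165, 55])

# Adding the age variable (22 years) gives a 3-dimensional vector.
patient_with_age = np.append(patient, 22)
n_dims = patient_with_age.shape[0]

# A dataset with 10 variables gives 10-dimensional vectors.
# Zero-filled placeholder, purely to illustrate the dimension count.
record_10 = np.zeros(10)

print(patient_with_age, n_dims)
print(record_10.shape)
```

Each additional variable in the dataset adds one more entry to every patient's vector.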

Now, these vectors have certain properties and operations associated with them. Let’s go ahead and learn them in the next segment. Before that, you can attempt the following question to test your understanding so far.
