Redshift Architecture

Redshift requires computing resources to be provisioned as clusters, each of which is a collection of one or more nodes. Each node has its own CPU, RAM and disk storage. A leader node compiles queries and sends them to the compute nodes, which execute them.

Let’s summarise the above video into some key points:

Redshift cluster: Redshift uses a cluster of nodes as the core component of its infrastructure. Generally, a cluster has one leader node and several compute nodes. In a single-node cluster, the same node acts as both the leader and the compute node, so there is no separate leader node.

Compute nodes: Every compute node has its own CPU, memory and disk storage. Client applications are unaware of the compute nodes and never have to deal with them directly.

Leader node: The leader node handles all client communications. It also parses queries, develops execution plans and coordinates the compute nodes. After receiving a query, the leader node generates the execution plan and assigns the compiled code to the compute nodes. Every compute node is allocated a portion of the data and runs the code against that portion. The leader node then performs the final aggregation of the results.
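The division of work between the leader node and the compute nodes can be sketched in a few lines of Python. This is only an illustration of the idea and not how Redshift is implemented: the function names and the sample sales rows are hypothetical, and the "nodes" are plain lists.

```python
from collections import defaultdict

def compute_node_partial_sum(rows):
    """Work done on one compute node: aggregate only its own portion of the data."""
    partial = defaultdict(int)
    for region, amount in rows:
        partial[region] += amount
    return dict(partial)

def leader_node_total(node_data):
    """Leader node: hand the work to every compute node, then merge the partial results."""
    partials = [compute_node_partial_sum(rows) for rows in node_data]
    final = defaultdict(int)
    for partial in partials:
        for region, subtotal in partial.items():
            final[region] += subtotal
    return dict(final)

# Hypothetical (region, amount) rows, already spread over three compute nodes.
node_data = [
    [("north", 10), ("south", 5)],
    [("north", 7), ("east", 3)],
    [("south", 8), ("east", 2)],
]

totals = leader_node_total(node_data)
print(totals)
```

Each compute node only ever sees its own rows, and the client only ever sees the merged result returned by the leader.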

As shown in the image below, on each node, data is stored in the form of chunks called slices. Redshift uses columnar storage, which means that each block of data contains values from a single column across a number of rows, instead of a single row with values from multiple columns.
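To make the difference between row storage and columnar storage concrete, here is a minimal Python sketch. The table, its column names and its values are made up for illustration, and plain lists stand in for data blocks.

```python
# Hypothetical orders table, written row by row (row-oriented layout).
column_names = ("order_id", "customer", "amount")
rows = [
    (1, "Asha", 250),
    (2, "Ravi", 400),
    (3, "Meera", 150),
]

# Columnar layout: one list per column, each holding that column's values across all rows.
columns = {
    name: [row[i] for row in rows]
    for i, name in enumerate(column_names)
}

# A query such as "total of amount" only needs to read the amount column,
# not the order_id or customer values stored alongside it in each row.
total_amount = sum(columns["amount"])
print(columns["amount"], total_amount)
```

Because a query that touches one column reads only that column's values, the columnar layout avoids scanning data the query does not need.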

Now that you have learnt about the architecture of Redshift, let’s understand how Redshift stores data in slices and blocks, using the example below.

Imagine that a residential society is a Redshift cluster, with its buildings as the compute nodes. The floors in each building are slices, and each flat is a block. The bigger the flats (blocks), the more people each one can accommodate.

The following are some points about slices and blocks:

Slices

Each compute node is partitioned into slices. The number of slices per node depends on the node type and typically ranges from 2 to 16.

A slice can be considered a virtual compute node (within a compute node); each slice is allocated a portion of the node’s memory and disk space.

Table rows (the actual data) are distributed among slices.
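The points above can be sketched as a small Python simulation of rows being spread over slices. It assumes a hypothetical cluster of 2 nodes with 2 slices each and shows two simple strategies: round-robin placement and placement by the hash of a key column. The `zlib.crc32` hash is only a stand-in; the way Redshift actually places rows depends on the table's distribution style and is not reproduced here.

```python
import zlib

NODES = 2            # hypothetical cluster size
SLICES_PER_NODE = 2  # assumption; the real value depends on the node type
TOTAL_SLICES = NODES * SLICES_PER_NODE

def even_distribution(rows):
    """Round-robin: row i goes to slice i modulo the number of slices."""
    slices = [[] for _ in range(TOTAL_SLICES)]
    for i, row in enumerate(rows):
        slices[i % TOTAL_SLICES].append(row)
    return slices

def key_distribution(rows, key_index):
    """Hash of a key column picks the slice, so equal keys land on the same slice."""
    slices = [[] for _ in range(TOTAL_SLICES)]
    for row in rows:
        key_bytes = str(row[key_index]).encode("utf-8")
        slices[zlib.crc32(key_bytes) % TOTAL_SLICES].append(row)
    return slices

# Hypothetical rows: (order_id, customer)
rows = [(order_id, f"customer_{order_id % 3}") for order_id in range(8)]

even = even_distribution(rows)
by_key = key_distribution(rows, key_index=1)

for slice_id, slice_rows in enumerate(even):
    print("slice", slice_id, "on node", slice_id // SLICES_PER_NODE, slice_rows)
```

Round-robin placement keeps the slices evenly loaded, while key-based placement keeps all rows for one key together at the cost of possibly uneven slices.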

Blocks

Slices are further divided into blocks (1 MB each).

A full block can contain millions of values.

Each block can be encoded/compressed with one of the 13 encodings available.
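A quick calculation shows why encoding matters for how many values fit in a block. The sketch below is a simplification: it ignores any per-block overhead, assumes 8-byte uncompressed values, and uses a toy run-length encoder written for illustration (run-length is one of the available encodings, but this is not Redshift's on-disk format).

```python
from itertools import groupby

BLOCK_SIZE_BYTES = 1024 * 1024  # 1 MB block

def values_per_block(bytes_per_value):
    """How many fixed-width values fit in one block, ignoring any overhead."""
    return BLOCK_SIZE_BYTES // bytes_per_value

def run_length_encode(values):
    """Collapse each run of repeated values into a (value, run_length) pair."""
    return [(value, len(list(group))) for value, group in groupby(values)]

def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, count in pairs for _ in range(count)]

# Uncompressed 8-byte values: 1,048,576 / 8 values per block.
raw_capacity = values_per_block(8)

# Hypothetical country-code column with long runs of repeated values.
column = ["IN"] * 5 + ["US"] * 3 + ["IN"] * 2
encoded = run_length_encode(column)

print(raw_capacity)
print(encoded)
```

Ten column values shrink to three pairs here. On a column with long runs of repeated values, the same effect is what lets a single 1 MB block represent far more values than its uncompressed capacity.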
