Redshift Spectrum

In the previous session, you learnt about the features of Redshift and its query optimisation techniques. You also got hands-on practice querying data with Redshift.

With Redshift Spectrum, an analyst can run SQL queries directly on data stored in Amazon S3 buckets. This saves time, as it removes the need to load data from storage into a database before querying it. Redshift Spectrum also widens the range of data an application can reach: a single query can span large volumes of data, from files in an S3 data lake to the tables already stored on a customer's Redshift cluster nodes.
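To make the idea of querying S3 data in place more concrete, here is a minimal sketch that only assembles the SQL an analyst might run; it does not connect to any cluster. The schema, database, table, column, bucket and IAM role names are all hypothetical placeholders, and the statements assume the AWS Glue Data Catalog is used as the external catalog.

```python
# Sketch only: builds the SQL text for a typical Redshift Spectrum setup.
# Every name passed in below (schema, database, table, bucket, role ARN)
# is a hypothetical placeholder, not a value from this course.

def spectrum_setup_sql(schema, catalog_db, iam_role_arn, table, s3_location):
    """Return three statements: create schema, create external table, query."""
    create_schema = (
        f"CREATE EXTERNAL SCHEMA {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )
    # The table definition only describes files that already sit in S3;
    # no data is copied into the cluster.
    create_table = (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n"
        f"    sale_id INTEGER,\n"
        f"    region VARCHAR(32),\n"
        f"    amount DECIMAL(10,2)\n"
        f")\n"
        f"STORED AS PARQUET\n"
        f"LOCATION '{s3_location}';"
    )
    # Once defined, the external table is queried with ordinary SQL.
    query = (
        f"SELECT region, SUM(amount) AS total_amount\n"
        f"FROM {schema}.{table}\n"
        f"WHERE amount > 100\n"
        f"GROUP BY region;"
    )
    return [create_schema, create_table, query]


statements = spectrum_setup_sql(
    schema="spectrum_demo",
    catalog_db="demo_catalog_db",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleSpectrumRole",
    table="sales",
    s3_location="s3://example-bucket/sales/",
)

for statement in statements:
    print(statement, end="\n\n")
```

The printed statements could then be run from any SQL client connected to the Redshift cluster, provided the IAM role has read access to the bucket.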

In the next video, let’s hear from our expert as he explains how Amazon Redshift Spectrum impacts Redshift efficiency.

Redshift Spectrum breaks the user query into filtered subsets that run simultaneously. These requests are spread across thousands of AWS-managed nodes to maintain query speed and ensure consistency in performance. Redshift Spectrum can scale to run a query across more than an exabyte of data. Once the data in S3 is aggregated, it is returned to a local Redshift cluster for final processing.
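The flow described above (split, filter and aggregate in parallel, then finish on the cluster) can be pictured with a small toy model. This is only a conceptual sketch in plain Python, not how Redshift Spectrum is implemented: the in-memory chunks stand in for S3 files, the thread pool stands in for the Spectrum nodes, and all function names are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_chunk(chunk, min_amount):
    """Stand-in for one Spectrum node: filter a chunk, then pre-aggregate it."""
    partial = {}
    for region, amount in chunk:
        if amount >= min_amount:  # the filter is applied close to the data
            partial[region] = partial.get(region, 0) + amount
    return partial


def merge_partials(partials):
    """Stand-in for the Redshift cluster: combine the small partial results."""
    totals = {}
    for partial in partials:
        for region, subtotal in partial.items():
            totals[region] = totals.get(region, 0) + subtotal
    return totals


def run_query(chunks, min_amount, workers=4):
    """Scan all chunks in parallel, then do the final aggregation once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda c: scan_chunk(c, min_amount), chunks))
    return merge_partials(partials)


# Hypothetical data: each inner list plays the role of one file in S3.
chunks = [
    [("north", 120), ("south", 40)],
    [("north", 80), ("east", 200)],
    [("south", 150), ("east", 30)],
]

result = run_query(chunks, min_amount=50)
print(result)
```

The point of the sketch is that only the small per-chunk summaries reach the final merge step, which is why the cluster itself has little work left to do.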

The key points from the video are summarised below:

Redshift Spectrum enables customers to use a lake house approach.

It supports multiple open-source data formats, such as CSV, Avro and Parquet.

It supports on-demand pricing: you pay per query, based on the amount of data scanned.

Redshift Spectrum utilises a fleet of dedicated Amazon Redshift servers, which are independent of your cluster.

Filtering and aggregation are performed in Redshift Spectrum, reducing the load on the Redshift cluster.
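The pay-per-data-scanned point in the summary above can be turned into a rough cost estimate. The sketch below assumes a rate of 5 USD per terabyte scanned, billing rounded up to the next megabyte, and a 10 MB minimum per query; these figures are assumptions for illustration and should be checked against the current AWS pricing page for your region. The function name is hypothetical.

```python
import math

MB = 1024 ** 2
TB = 1024 ** 4


def estimate_query_cost(bytes_scanned, usd_per_tb=5.0, min_mb=10):
    """Rough Spectrum cost estimate in USD for one query.

    usd_per_tb and min_mb are assumed values, not figures from this course.
    """
    # Round up to whole megabytes, then apply the per-query minimum.
    billed_mb = max(math.ceil(bytes_scanned / MB), min_mb)
    return billed_mb * MB / TB * usd_per_tb


# Hypothetical comparison: scanning a full 1 TB dataset versus scanning
# only 100 GB of it, for example because a columnar format such as
# Parquet lets the query read just the columns it needs.
full_scan_cost = estimate_query_cost(1 * TB)
partial_scan_cost = estimate_query_cost(100 * 1024 ** 3)

print(f"Full scan:    ${full_scan_cost:.2f}")
print(f"Partial scan: ${partial_scan_cost:.2f}")
```

Because the charge follows the bytes scanned, anything that reduces the scan (columnar formats, compression, partitioning) lowers the cost of the same query.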
