In the previous segment, you learnt how to configure your own Redshift cluster. In this segment, you will learn about the various considerations for creating a Redshift cluster. Let’s watch the next video where our SME will discuss this in detail.
Note:
By default, the port number for a Redshift cluster is 5439.
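Since clients connect to the cluster endpoint over this port, a small helper that builds a connection URL makes the default explicit. The sketch below is illustrative only: the endpoint, database, and user names are placeholders, not real AWS resources.

```python
# Build a Redshift connection URL; 5439 is the default port unless
# you override it when creating the cluster.
DEFAULT_REDSHIFT_PORT = 5439

def redshift_url(host, database, user, password, port=DEFAULT_REDSHIFT_PORT):
    """Return a SQLAlchemy-style connection URL for Amazon Redshift."""
    return f"redshift+psycopg2://{user}:{password}@{host}:{port}/{database}"

# Hypothetical cluster endpoint, for illustration only.
url = redshift_url("examplecluster.abc123.us-east-1.redshift.amazonaws.com",
                   "dev", "awsuser", "secret")
print(url)
```

If you change the port when launching the cluster, pass the new value via the `port` argument; everything else stays the same.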
Note:
There are mispronunciations in the video at 2:03 and 2:16. The SME meant to say “DC2.8xlarge” and “DS2.8xlarge”, respectively.
Note:
You may come across a new node type ra3.xlplus while working with Redshift in AWS. This is a recent offering from AWS and more details on it can be found here.
In this video, our SME explained different types of nodes. Some key points from the video are summarised in the table below:
Note:
The pricing and specifications of these instance types, at the time of recording, are as follows.
| Node Type | Highlights |
| --- | --- |
| Amazon Redshift RA3 | Redshift Managed Storage (RMS): solid-state disks + Amazon S3. RA3.xlplus: $1.086 per hour; RA3.4xlarge: $3.26 per hour; RA3.16xlarge: $13.04 per hour |
| Dense Compute (DC2) | For compute-intensive data warehouses with solid-state disks. DC2.large: $0.25 per hour; DC2.8xlarge: $4.80 per hour |
| Dense Storage (DS2) | For large data warehouses with HDD (magnetic) disks. DS2.xlarge: $0.85 per hour; DS2.8xlarge: $6.80 per hour |
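With the on-demand rates above, a rough monthly bill is simply nodes × hourly rate × hours in the month. The sketch below hard-codes the rates from the table; actual prices vary by region and change over time, so treat it as an estimate only.

```python
# On-demand hourly rates (USD) from the table above, at the time of recording.
HOURLY_RATE = {
    "ra3.xlplus": 1.086,
    "ra3.4xlarge": 3.26,
    "ra3.16xlarge": 13.04,
    "dc2.large": 0.25,
    "dc2.8xlarge": 4.80,
    "ds2.xlarge": 0.85,
    "ds2.8xlarge": 6.80,
}

def monthly_cost(node_type, node_count, hours=730):
    """Estimate on-demand monthly cost; 730 hours is roughly one month."""
    return round(HOURLY_RATE[node_type] * node_count * hours, 2)

# For example, a 2-node dc2.large cluster:
print(monthly_cost("dc2.large", 2))  # 365.0
```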
You can refer to this page for more information about Redshift pricing policies.
The hardware specifications for the various node types are provided in the table below.
| Instance | Disk Type | Size | Memory | CPUs | Slices |
| --- | --- | --- | --- | --- | --- |
| RA3.4xlarge (new) | RMS | Scales to 16 TB | 96 GB | 12 | 4 |
| RA3.16xlarge (new) | RMS | Scales to 64 TB | 384 GB | 48 | 16 |
| DC2.large | SSD | 160 GB | 16 GB | 2 | 2 |
| DC2.8xlarge | SSD | 2.56 TB | 244 GB | 32 | 16 |
| DS2.xlarge | Magnetic | 2 TB | 32 GB | 4 | 2 |
| DS2.8xlarge | Magnetic | 16 TB | 244 GB | 36 | 16 |
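Because every node contributes its own disks and slices, total cluster capacity and parallelism scale linearly with node count. The sketch below copies the per-node figures from the table (sizes converted to GB) to total them up for a multi-node cluster.

```python
# Per-node specs from the table above: (disk GB, memory GB, CPUs, slices).
NODE_SPECS = {
    "dc2.large":   (160,    16,  2,  2),
    "dc2.8xlarge": (2560,  244, 32, 16),
    "ds2.xlarge":  (2048,   32,  4,  2),
    "ds2.8xlarge": (16384, 244, 36, 16),
}

def cluster_totals(node_type, node_count):
    """Return (total disk GB, total slices) for a cluster of identical nodes."""
    disk_gb, _mem, _cpus, slices = NODE_SPECS[node_type]
    return disk_gb * node_count, slices * node_count

# For example, a 4-node dc2.8xlarge cluster:
print(cluster_totals("dc2.8xlarge", 4))  # (10240, 64)
```

The slice count matters for loading: Redshift parallelises a COPY across slices, so splitting input files into a multiple of the total slice count keeps all of them busy.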
Amazon Redshift turbocharges query performance with machine learning-based automatic optimisations, as discussed in the video below.
VACUUM
- VACUUM removes rows that are marked as ‘deleted’ and globally sorts tables.
- For the majority of workloads, AUTO VACUUM DELETE reclaims space and AUTO TABLE SORT sorts the needed portions of the table.
- In cases where you know your workload, VACUUM can be run manually.
- Run VACUUM BOOST at off-peak times (it blocks concurrent deletes); it is about as quick as a deep copy.
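Since a manual VACUUM takes an optional mode and a BOOST flag, a small helper that assembles the statement makes scheduled maintenance scripts less error-prone. This is a sketch: the table name is illustrative, and the string follows the Redshift VACUUM syntax (FULL, SORT ONLY, DELETE ONLY, REINDEX, optional BOOST).

```python
def vacuum_sql(table=None, mode="FULL", boost=False):
    """Compose a Redshift VACUUM statement.

    mode: 'FULL' (default), 'SORT ONLY', 'DELETE ONLY', or 'REINDEX'.
    boost=True appends BOOST, which runs faster but blocks concurrent
    deletes, so schedule it off-peak.
    """
    parts = ["VACUUM", mode]
    if table:
        parts.append(table)
    if boost:
        parts.append("BOOST")
    return " ".join(parts) + ";"

print(vacuum_sql("sales", mode="DELETE ONLY"))  # VACUUM DELETE ONLY sales;
print(vacuum_sql("sales", boost=True))          # VACUUM FULL sales BOOST;
```

You would then execute the returned string through your usual database driver against the cluster.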
ANALYZE
- The ANALYZE process collects table statistics for optimal query planning.
- In the vast majority of cases, AUTO ANALYZE automatically handles statistics gathering.
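To see what "collecting table statistics for the planner" means in practice without a live Redshift cluster, the stand-in below uses SQLite's ANALYZE from the Python standard library: it is an analogy, not Redshift itself, but the idea is the same — ANALYZE records row counts and value distributions that the query planner reads.

```python
import sqlite3

# In-memory database: create an indexed table, load rows, then ANALYZE.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.execute("CREATE INDEX idx_orders_id ON orders (id)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(i, i * 1.5) for i in range(100)])

# ANALYZE gathers per-index statistics into sqlite_stat1, analogous
# to the table statistics Redshift's ANALYZE maintains for its planner.
conn.execute("ANALYZE")
stats = conn.execute("SELECT tbl, stat FROM sqlite_stat1").fetchall()
print(stats)  # e.g. [('orders', '100 1')]
```

On Redshift itself you would simply issue `ANALYZE table_name;` for the tables you load manually, and rely on AUTO ANALYZE for the rest.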