Data Solutions: Redshift Query performance tuning

Redshift Query performance tuning

Redshift Query performance tuning tips

Clustered vs Serverless Redshift:

In a provisioned cluster, the resources allocated are predefined which can sometimes become bottleneck when running heavy workloads. However, Serverless Redshift can scale up dynamically according to the workload which can increase efficiency and saves money

Order of Joins in SQL Query:

Always try to start joining tables with less data first and then move to the tables with alot of data

CTE vs Temp table:

Both CTE and temp table are good but try to use CTE's when you are dealing with less data. Use temp tables in your query when you are dealing with a lot of data.

Filter the dataset:

When data is huge, try if you can filter it out using a "where" clause based on a datetime/id column

Number of columns in Query matters:

Since redshift is a column database, try selecting columns that you need. Remove unnecessary columns from your query as more columns will result is slow performance.

Distkey and Sortkey:

Try adding a distkey column to Redshift table. it should be a column that you use while joining with other tables.
Add a sortkey column to Redshift table. The columns that you use in the "where" clause should be the sortkey. It can be a combination of multiple columns as well

Use of Distinct:

Try to not use the distinct keyword when dealing with a lot of data

Use of Explain keyword:

It is always a good idea to run the query plan (using explain keyword) to analyze your query and tell you the cost of each and individual step within query

Use of subqueries:

Try to limit using Subqueries as they can decrease performance.

Use Compression encoding:

It is always good idea to use compression encoding as it will reduce disk space and memory. Try using automatic encoding so that Redshift can automatically encode the columns for best performance
Generally, you should use the AZ64 encoding for numeric columns, the BYTEDICT encoding for text columns with a limited number of distinct values, and ZSTD encoding for other cases.

No comments:

Post a Comment

Subscribe to: Post Comments (Atom)