Greenplum configurations
Performance Optimizations
Tables in Greenplum have powerful optimization configurations to improve query performance:
- distribution
- column orientation
- compression
appendonly
toggle- partitions
Supplying these values as model-level configurations apply the corresponding settings in the generated CREATE TABLE
(except partitions). Note that these settings will have no effect for models set to view
.
Distribution
In Greenplum, you can choose a distribution key, that will be used to sort data by segments. Joining on the partition will become more performant after specifying distribution.
By default dbt-greenplum distributes data RANDOMLY
. To implement a distribution key you need to specify the distributed_by
parameter in model's config:
{{
config(
...
distributed_by='<field_name>'
...
)
}}
select ...
Also you can choose DISTRIBUTED REPLICATED
option:
{{
config(
...
distributed_replicated=true
...
)
}}
select ...
Column orientation
Greenpum supports two type of orientation row and column:
{{
config(
...
orientation='column'
...
)
}}
select ...
Compression
Compression allows reducing read-write time. Greenplum suggest several algorithms algotihms to compress append-optimized tables:
- RLE_TYPE(only for column oriented table)
- ZLIB
- ZSTD
- QUICKLZ
{{
config(
...
appendonly='true',
compresstype='ZLIB',
compresslevel=3,
blocksize=32768
...
)
}}
select ...
As you can see, you can also specify compresslevel
and blocksize
.
Partition
Greenplum does not support partitions with create table as
construction, so you need to build model in two steps
- create table schema
- insert data
To implement partitions into your dbt-model you need to specify the following config parameters:
fields_string
- definition of columns name, type and constraintsraw_partition
- partition specification
{% set fields_string %}
some_filed int4 null,
date_field timestamp NULL
{% endset %}
{% set raw_partition %}
PARTITION BY RANGE (date_field)
(
START ('2021-01-01'::timestamp) INCLUSIVE
END ('2023-01-01'::timestamp) EXCLUSIVE
EVERY (INTERVAL '1 day'),
DEFAULT PARTITION default_part
);
{% endset %}
{{
config(
...
fields_string=fields_string,
raw_partition=raw_partition,
...
)
}}
select *