About data tests property
- Models
- Sources
- Seeds
- Snapshots
- Analyses
version: 2
models:
- name: <model_name>
tests:
- <test_name>:
<argument_name>: <argument_value>
config:
<test_config>: <config-value>
columns:
- name: <column_name>
tests:
- <test_name>
- <test_name>:
<argument_name>: <argument_value>
config:
<test_config>: <config-value>
version: 2
sources:
- name: <source_name>
tables:
- name: <table_name>
tests:
- <test_name>
- <test_name>:
<argument_name>: <argument_value>
config:
<test_config>: <config-value>
columns:
- name: <column_name>
tests:
- <test_name>
- <test_name>:
<argument_name>: <argument_value>
config:
<test_config>: <config-value>
version: 2
seeds:
- name: <seed_name>
tests:
- <test_name>
- <test_name>:
<argument_name>: <argument_value>
config:
<test_config>: <config-value>
columns:
- name: <column_name>
tests:
- <test_name>
- <test_name>:
<argument_name>: <argument_value>
config:
<test_config>: <config-value>
version: 2
snapshots:
- name: <snapshot_name>
tests:
- <test_name>
- <test_name>:
<argument_name>: <argument_value>
config:
<test_config>: <config-value>
columns:
- name: <column_name>
tests:
- <test_name>
- <test_name>:
<argument_name>: <argument_value>
config:
<test_config>: <config-value>
This feature is not implemented for analyses.
Related documentation
Description
The data tests
property defines assertions about a column, table, or view. The property contains a list of generic tests, referenced by name, which can include the four built-in generic tests available in dbt. For example, you can add tests that ensure a column contains no duplicates and zero null values. Any arguments or configurations passed to those tests should be nested below the test name.
Once these tests are defined, you can validate their correctness by running dbt test
.
Out-of-the-box data tests
There are four generic data tests that are available out of the box, for everyone using dbt.
not_null
This test validates that there are no null
values present in a column.
version: 2
models:
- name: orders
columns:
- name: order_id
tests:
- not_null
unique
This test validates that there are no duplicate values present in a field.
The config and where clause are optional.
version: 2
models:
- name: orders
columns:
- name: order_id
tests:
- unique:
config:
where: "order_id > 21"
accepted_values
This test validates that all of the values in a column are present in a supplied list of values
. If any values other than those provided in the list are present, then the test will fail.
The accepted_values
test supports an optional quote
parameter which, by default, will single-quote the list of accepted values in the test query. To test non-strings (like integers or boolean values) explicitly set the quote
config to false
.
version: 2
models:
- name: orders
columns:
- name: status
tests:
- accepted_values:
values: ['placed', 'shipped', 'completed', 'returned']
- name: status_id
tests:
- accepted_values:
values: [1, 2, 3, 4]
quote: false
relationships
This test validates that all of the records in a child table have a corresponding record in a parent table. This property is referred to as "referential integrity".
The following example tests that every order's customer_id
maps back to a valid customer
.
version: 2
models:
- name: orders
columns:
- name: customer_id
tests:
- relationships:
to: ref('customers')
field: id
The to
argument accepts a Relation – this means you can pass it a ref
to a model (e.g. ref('customers')
), or a source
(e.g. source('jaffle_shop', 'customers')
).
Additional examples
Test an expression
Some data tests require multiple columns, so it doesn't make sense to nest them under the columns:
key. In this case, you can apply the data test to the model (or source, seed, or snapshot) instead:
version: 2
models:
- name: orders
tests:
- unique:
column_name: "country_code || '-' || order_id"
Use custom generic test
If you've defined your own custom generic test, you can use that as the test_name
:
version: 2
models:
- name: orders
columns:
- name: order_id
tests:
- primary_key # name of my custom generic test
Check out the guide on writing a custom generic test for more information.
Custom data test name
By default, dbt will synthesize a name for your generic test by concatenating:
- test name (
not_null
,unique
, etc) - model name (or source/seed/snapshot)
- column name (if relevant)
- arguments (if relevant, e.g.
values
foraccepted_values
)
It does not include any configurations for the test. If the concatenated name is too long, dbt will use a truncated and hashed version instead. The goal is to preserve unique identifiers for all resources in your project, including tests.
You may also define your own name for a specific test, via the name
property.
When might you want this? dbt's default approach can result in some wonky (and ugly) test names. By defining a custom name, you get full control over how the test will appear in log messages and metadata artifacts. You'll also be able to select the test by that name.
version: 2
models:
- name: orders
columns:
- name: status
tests:
- accepted_values:
name: unexpected_order_status_today
values: ['placed', 'shipped', 'completed', 'returned']
config:
where: "order_date = current_date"
$ dbt test --select unexpected_order_status_today
12:43:41 Running with dbt=1.1.0
12:43:41 Found 1 model, 1 test, 0 snapshots, 0 analyses, 167 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics
12:43:41
12:43:41 Concurrency: 5 threads (target='dev')
12:43:41
12:43:41 1 of 1 START test unexpected_order_status_today ................................ [RUN]
12:43:41 1 of 1 PASS unexpected_order_status_today ...................................... [PASS in 0.03s]
12:43:41
12:43:41 Finished running 1 test in 0.13s.
12:43:41
12:43:41 Completed successfully
12:43:41
12:43:41 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
A test's name must be unique for all tests defined on a given model-column combination. If you give the same name to tests defined on several different columns, or across several different models, then dbt test --select <repeated_custom_name>
will select them all.
When might you need this? In cases where you have defined the same test twice, with only a difference in configuration, dbt will consider these tests to be duplicates:
version: 2
models:
- name: orders
columns:
- name: status
tests:
- accepted_values:
values: ['placed', 'shipped', 'completed', 'returned']
config:
where: "order_date = current_date"
- accepted_values:
values: ['placed', 'shipped', 'completed', 'returned']
config:
# only difference is in the 'where' config
where: "order_date = (current_date - interval '1 day')" # PostgreSQL syntax
Compilation Error
dbt found two tests with the name "accepted_values_orders_status__placed__shipped__completed__returned" defined on column "status" in "models.orders".
Since these resources have the same name, dbt will be unable to find the correct resource
when running tests.
To fix this, change the name of one of these resources:
- test.testy.accepted_values_orders_status__placed__shipped__completed__returned.69dce9e5d5 (models/one_file.yml)
- test.testy.accepted_values_orders_status__placed__shipped__completed__returned.69dce9e5d5 (models/one_file.yml)
By providing a custom name, you help dbt differentiate tests:
version: 2
models:
- name: orders
columns:
- name: status
tests:
- accepted_values:
name: unexpected_order_status_today
values: ['placed', 'shipped', 'completed', 'returned']
config:
where: "order_date = current_date"
- accepted_values:
name: unexpected_order_status_yesterday
values: ['placed', 'shipped', 'completed', 'returned']
config:
where: "order_date = (current_date - interval '1 day')" # PostgreSQL
$ dbt test
12:48:03 Running with dbt=1.1.0-b1
12:48:04 Found 1 model, 2 tests, 0 snapshots, 0 analyses, 167 macros, 0 operations, 1 seed file, 0 sources, 0 exposures, 0 metrics
12:48:04
12:48:04 Concurrency: 5 threads (target='dev')
12:48:04
12:48:04 1 of 2 START test unexpected_order_status_today ................................ [RUN]
12:48:04 2 of 2 START test unexpected_order_status_yesterday ............................ [RUN]
12:48:04 1 of 2 PASS unexpected_order_status_today ...................................... [PASS in 0.04s]
12:48:04 2 of 2 PASS unexpected_order_status_yesterday .................................. [PASS in 0.04s]
12:48:04
12:48:04 Finished running 2 tests in 0.21s.
12:48:04
12:48:04 Completed successfully
12:48:04
12:48:04 Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
If using store_failures
: dbt uses each data test's name as the name of the table in which to store any failing records. If you have defined a custom name for one test, that custom name will also be used for its table of failures. You may optionally configure an alias
for the test, to separately control both the name of the test (for metadata) and the name of its database table (for storing failures).
Alternative format for defining tests
When defining a generic data test with several arguments and configurations, the YAML can look and feel unwieldy. If you find it easier, you can define the same test properties as top-level keys of a single dictionary, by providing the test name as test_name
instead. It's totally up to you.
This example is identical to the one above:
version: 2
models:
- name: orders
columns:
- name: status
tests:
- name: unexpected_order_status_today
test_name: accepted_values # name of the generic test to apply
values:
- placed
- shipped
- completed
- returned
config:
where: "order_date = current_date"