Let's try: dbt part 7 - tests

we can add tests on important fields in dbt

Posted Dec 14, 2025

12 min read

In this series

Come to one of important parts of the data flow, we can add tests on important fields in dbt and here we are going to see how.

dbt tests

In dbt, we can perform tests by 2 types:

Generic tests
- This type of tests is predefined tests that we can simply call it like macros.
- Perform tests on a specific field of a model.
- We can configure generic tests in YAML files of many paths such as models, sources, seeds, and snapshots.
Singular tests
- This type is custom queries that we can design in a complex and flexible way.
- Perform tests either on fields or cross-models.
- We can write singular tests under tests/ directory.

We can execute tests by running commands in 2 options.

dbt test: execute tests only.
dbt build: execute tests and build models when tests passed.

In this blog, we only test on models.

Okay. Let’s begin.

Generic test

Generic tests can be configured with these 4 out-of-the-box patterns; unique, not_null, accepted_values, relationships under data_tests key.

So here we are looking into each of them plus crafting our own custom generic tests.

Prepare sources and models

Let’s say we have 2 source tables: student_grades and subjects where is referenced by subject field in student_grades as below.

  
sources:
  - name: students
    schema: raw
    tables:
      - name: student_grades
        columns:
          - name: id
          - name: name
          - name: subject
          - name: grade
          - name: updated_at
  - name: subjects
    schema: raw
    tables:
      - name: subjects
        columns:
          - name: id
          - name: names

We could spot that the table student_grades has some strange values:

id of the first row is null.
There are 2 “reed” names.
There is the subject “literature” which does not exist in table subjects.
The last row has grade “5” while others have 1-4.

Then we create a simple view model named calc_grades by just deriving everything from student_grades.

  
select * from {{ source("students", "student_grades") }}

  
models:
  - name: calc_grades
    config:
      materialized: view
      schema: transform
    columns:
      - name: id
      - name: name
      - name: subject
      - name: grade
      - name: updated_at

We should see the similar outputs after dbt run.

54:02  Running with dbt=1.10.13
54:03  Registered adapter: bigquery=1.10.2
54:04  Found 7 models, 14 seeds, 8 data tests, 4 snapshots, 1 analysis, 5 sources, 510 macros
54:04  
54:04  Concurrency: 1 threads (target='dev')
54:04  
54:09  1 of 1 START sql view model transform.calc_grades .............................. [RUN]
54:11  1 of 1 OK created sql view model transform.calc_grades ......................... [CREATE VIEW (0 processed) in 1.45s]
54:11  
54:11  Finished running 1 view model in 0 hours 0 minutes and 7.15 seconds (7.15s).
54:11  
54:11  Completed successfully
54:11  
54:11  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 NO-OP=0 TOTAL=1

Now we’re gonna add generic tests into our view model.

test: `not_null`

not_null targets whether a specific field contains null values.

  
models:
  - name: <model_name>
    columns:
      - name: <column_name>
        data_tests:
          - not_null

  
models:
  - name: calc_grades
    config:
      materialized: view
      schema: transform
    columns:
      - name: id
        data_tests:
          - not_null
      - name: name
      - name: subject
      - name: grade
      - name: updated_at

56:32  1 of 1 START test not_null_calc_grades_id ...................................... [RUN]
56:34  1 of 1 FAIL 1 not_null_calc_grades_id .......................................... [FAIL 1 in 1.47s]
56:34  
56:34  Finished running 1 test in 0 hours 0 minutes and 4.10 seconds (4.10s).
56:34  
56:34  Completed with 1 error, 0 partial successes, and 0 warnings:
56:34  
56:34  Failure in test not_null_calc_grades_id (models/students/calc_grades.yml)
56:34    Got 1 result, configured to fail if != 0
56:34  
56:34    compiled code at target/compiled/dbt_proj01/models/students/calc_grades.yml/not_null_calc_grades_id.sql
56:34  
56:34  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 NO-OP=0 TOTAL=1

  
select id
from `bluebirz-playground`.`transform`.`calc_grades`
where id is null

test: `unique`

unique focuses if there is duplicate values in a specific field.

  
models:
  - name: <model_name>
    columns:
      - name: <column_name>
        data_tests:
          - unique

  
models:
  - name: calc_grades
    config:
      materialized: view
      schema: transform
    columns:
      - name: id
      - name: name
        data_tests:
          - unique
      - name: subject
      - name: grade
      - name: updated_at

57:41  1 of 1 START test unique_calc_grades_name ...................................... [RUN]
57:42  1 of 1 FAIL 1 unique_calc_grades_name .......................................... [FAIL 1 in 1.43s]
57:42  
57:42  Finished running 1 test in 0 hours 0 minutes and 4.15 seconds (4.15s).
57:42  
57:42  Completed with 1 error, 0 partial successes, and 0 warnings:
57:42  
57:42  Failure in test unique_calc_grades_name (models/students/calc_grades.yml)
57:42    Got 1 result, configured to fail if != 0
57:42  
57:42    compiled code at target/compiled/dbt_proj01/models/students/calc_grades.yml/unique_calc_grades_name.sql
57:42  
57:42  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 NO-OP=0 TOTAL=1

  
with dbt_test__target as (

  select name as unique_field
  from `bluebirz-playground`.`transform`.`calc_grades`
  where name is not null

)

select
    unique_field,
    count(*) as n_records

from dbt_test__target
group by unique_field
having count(*) > 1

test: `relationships`

relationships refers to referential values across models that every values in a model must be exist in a reference model.

  
models:
  - name: <model_name>
    columns:
      - name: <column_name>
        data_tests:
          - relationships:
              arguments:
                to: <referent table using source() or ref()>
                field: <field_name of referent table>

  
models:
  - name: calc_grades
    config:
      materialized: view
      schema: transform
    columns:
      - name: id
      - name: name
      - name: subject
        data_tests:
          - relationships:
              arguments:
                field: name
                to: source('subjects', 'subjects')
      - name: grade
      - name: updated_at

58:34  1 of 1 START test relationships_calc_grades_subject__name__source_subjects_subjects_  [RUN]
58:36  1 of 1 FAIL 1 relationships_calc_grades_subject__name__source_subjects_subjects_  [FAIL 1 in 1.40s]
58:36  
58:36  Finished running 1 test in 0 hours 0 minutes and 3.90 seconds (3.90s).
58:36  
58:36  Completed with 1 error, 0 partial successes, and 0 warnings:
58:36  
58:36  Failure in test relationships_calc_grades_subject__name__source_subjects_subjects_ (models/students/calc_grades.yml)
58:36    Got 1 result, configured to fail if != 0
58:36  
58:36    compiled code at target/compiled/dbt_proj01/models/students/calc_grades.yml/relationships_calc_grades_98611d65b006dbb31a361fa7d65fecc0.sql
58:36  
58:36  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 NO-OP=0 TOTAL=1

  
with child as (
    select subject as from_field
    from `bluebirz-playground`.`transform`.`calc_grades`
    where subject is not null
),

parent as (
    select name as to_field
    from `bluebirz-playground`.`raw`.`subjects`
)

select
    from_field

from child
left join parent
    on child.from_field = parent.to_field

where parent.to_field is null

test: `accepted_values`

accepted_values is a list of values that the specific field should have.

  
models:
  - name: <model_name>
    columns:
      - name: <column_name>
        data_tests:
          - accepted_values:
              arguments:
                values: <list of accepted values>
                quote: <true|false if `values` is a string>

  
models:
  - name: calc_grades
    config:
      materialized: view
      schema: transform
    columns:
      - name: id
      - name: name
      - name: subject
      - name: grade
        data_tests:
          - accepted_values:
              arguments:
                values:
                  - 1
                  - 2
                  - 3
                  - 4
                quote: false
      - name: updated_at

59:30  1 of 1 START test accepted_values_calc_grades_grade__False__1__2__3__4 ......... [RUN]
59:32  1 of 1 FAIL 1 accepted_values_calc_grades_grade__False__1__2__3__4 ............. [FAIL 1 in 1.38s]
59:32  
59:32  Finished running 1 test in 0 hours 0 minutes and 3.65 seconds (3.65s).
59:32  
59:32  Completed with 1 error, 0 partial successes, and 0 warnings:
59:32  
59:32  Failure in test accepted_values_calc_grades_grade__False__1__2__3__4 (models/students/calc_grades.yml)
59:32    Got 1 result, configured to fail if != 0
59:32  
59:32    compiled code at target/compiled/dbt_proj01/models/students/calc_grades.yml/accepted_values_calc_grades_grade__False__1__2__3__4.sql
59:32  
59:32  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 NO-OP=0 TOTAL=1

  
with all_values as (

    select
        grade as value_field,
        count(*) as n_records

    from `bluebirz-playground`.`transform`.`calc_grades`
    group by grade

)

select *
from all_values
where value_field not in (
    1,2,3,4
)

custom generic tests

We can craft our own generic tests¹ by writing test macros in test/generic/ and call it in YAML files.

This time we are testing and marking failed if there are any records having updated_at within date 5th of a month.

  
{% test <test_macro_name>(model, column_name) %}

select {{ column_name }} as target_field 
from {{ model }}
{# add where/join/group/etc. logic to fail the test #}

{% endtest %}

  
models:
  - name: <model_name>
    columns:
      - name: <column_name>
        data_tests:
          - <test macro name>

  
{% test no_update_first_5_days(model, column_name) %}

    with
        validation as (select {{ column_name }} as datetime_field from {{ model }}),
        validation_error as (
            select datetime_field
            from validation
            where extract(day from datetime_field) <= 5
        )

    select *
    from validation_error

{% endtest %}

  
models:
  - name: calc_grades
    config:
      materialized: view
      schema: transform
    columns:
      - name: id
      - name: name
      - name: subject
      - name: grade
      - name: updated_at
        data_tests:
          - no_update_first_5_days

59:30  1 of 1 START test no_update_first_5_days_calc_grades_updated_at ................ [RUN]
59:33  1 of 1 FAIL 1 no_update_first_5_days_calc_grades_updated_at .................... [FAIL 1 in 2.86s]
59:33  
59:33  Finished running 1 test in 0 hours 0 minutes and 7.03 seconds (7.03s).
59:33  
59:33  Completed with 1 error, 0 partial successes, and 0 warnings:
59:33  
59:33  Failure in test no_update_first_5_days_calc_grades_updated_at (models/students/calc_grades.yml)
59:33    Got 1 result, configured to fail if != 0
59:33  
59:33    compiled code at target/compiled/dbt_proj01/models/students/calc_grades.yml/no_update_first_5_days_calc_grades_updated_at.sql
59:33  
59:33  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 NO-OP=0 TOTAL=1

  
    with
        validation as (select updated_at as datetime_field from `bluebirz-playground`.`transform`.`calc_grades`),
        validation_error as (
            select datetime_field
            from validation
            where extract(day from datetime_field) <= 5
        )

    select *
    from validation_error

Singular tests

Singular tests² are custom queries to verify data in our own way.

This time we are testing if there are any grade in a subject is 1 or below then display id and name of every students who have grades in the subjects.

We are writing our query stem_under_grade_2.sql in tests/ directory and execute dbt test.

  
select id, name
from {{ ref("calc_grades") }}
where subject in ('biology', 'physics', 'computer', 'math')
qualify min(grade) over (partition by subject) <= 1

18:55  1 of 1 START test stem_under_grade_2 ........................................... [RUN]
18:58  1 of 1 FAIL 3 stem_under_grade_2 ............................................... [FAIL 3 in 3.01s]
18:58  
18:58  Finished running 1 test in 0 hours 0 minutes and 6.81 seconds (6.81s).
18:58  
18:58  Completed with 1 error, 0 partial successes, and 0 warnings:
18:58  
18:58  Failure in test stem_under_grade_2 (tests/stem_under_grade_2.sql)
18:58    Got 3 results, configured to fail if != 0
18:58  
18:58    compiled code at target/compiled/dbt_proj01/tests/stem_under_grade_2.sql
18:58  
18:58  Done. PASS=0 WARN=0 ERROR=1 SKIP=0 NO-OP=0 TOTAL=1

  
    select
      count(*) as failures,
      count(*) != 0 as should_warn,
      count(*) != 0 as should_error
    from (
      
    
  select id, name
from `bluebirz-playground`.`transform`.`calc_grades`
where subject in ('biology', 'physics', 'computer', 'math')
qualify min(grade) over (partition by subject) <= 1
  
  
      
    ) dbt_internal_test

Wrap up

commands:
- dbt test to execute tests.
- dbt build to execute tests and build models.
generic tests:
- configure in YAML files; models, sources, seeds, and snapshots.
- not_null to see if there are null values.
- unique to find duplicates.
- relationships to verify if value references are missing.
- accepted_values to compare if values are in the specific list.
- create custom generic tests by a new query file in tests/generic/ directory with the macro syntax {% test <test_macro_name>(model, column_name) %}...{% endtest %}.
singular tests:
- create new query files in tests/ directory.

References

data, data engineering

This post is licensed under CC BY 4.0 by the author.