Let's try: dbt part 5 - from Jinja to macros and hooks
One of core features is controlling dbt flows with Jinja and write macros and hooks
Not only SQL that we can execute in dbt, one of the core features is Jinja and in this blog we are controlling the flow by writing macros and configure hooks.
Jinja
I have the blog Let’s try: Jinja2 about Jinja and Python. Let’s recap one more time right here.
Comments
A comment must be written in {# ... #}.
1
{# this is a comment #}
Statements
A statement controls program flow and is written in {% ... %}.
Variables
Use set to declare and initiate a variable.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
{% set var_a = 1 %}
{% set str = "abcd" %}
{% set is_empty = false %}
{% set arr = ['a', 'b', 'c'] %}
{% set tuple = ('a', 'b') %}
{% set dictionary = dict(a=1, b=2, c=3, d=[4, 5]) %}
{# or {'a': 1, 'b': 2, 'c': 3, 'd': [4, 5]} #}
{% set dict_arr = [dict(a=1), dict(b=2), dict(c=3), dict(d=[4, 5])] %}
{# or [{'a' : 1}, {'b': 2}, {'c': 3}, {'d': [4, 5]}] #}
{% set multiline_text %}
Lorem ipsum dolor sit amet consectetur adipiscing elit.
Quisque faucibus ex sapien vitae pellentesque sem placerat.
In id cursus mi pretium tellus duis convallis.
Tempus leo eu aenean sed diam urna tempor.
{% endset %}
If-else structure
Simply put if-elif-else and finish with endif.
1
2
3
4
5
6
7
{% if <statement> %}
...
{% elif <statement> %}
...
{% else %}
...
{% endif %}
For-loop
Simply put for-endfor as well.
1
2
3
{% for element in <iterator> %}
...
{% endfor %}
When iterate over dictionary, we can use .items() to extract key-value.
1
2
3
4
5
{% for dict in <dict_iterator> %}
{% for key, value in dict.items() %}
...
{% endfor %}
{% endfor %}
Expressions
An expression is an output. We refer them by {{ ... }}.
1
{{ var_a }}
Macros
Macros act like functions that we can call them to execute specific tasks such as querying latest run date from reference tables or inserting some records into log table.
We have written one macro generate_schema_name() before in Let’s try: dbt part 3 - seed and source for creating schema with plain input name instead of concatenation as below.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
{% macro generate_schema_name(custom_schema_name, node) -%}
{%- set default_schema = target.schema -%}
{%- if custom_schema_name is none -%}
{{ default_schema }}
{%- else -%}
{{ custom_schema_name | trim }}
{%- endif -%}
{%- endmacro %}
The macro above is an overriding by just building the same macro name and add our desired behavior. The built-in macros in dbt can be found here that I came across the Stackoverflow forum.
Develop a macro
To develop a macro, we start a Jinja structure below.
1
2
3
4
5
{% macro <macro_name>(<argument_1>, <argument_2>, ... ) -%}
{# code stub #}
...
{{ return(<variable>) }} {# `return` if needed #}
{%- endmacro %}
And we also can define the property file1 like this.
1
2
3
4
5
6
7
macros:
- name: <macro_name>
description: <macro_description>
arguments:
- name: <argument_name>
type: <argument_type>
description: <argument_description>
Use a macro
We can simply call a macro with an expression ({{ ... }}) or defining a new variable for the return with a statement ({% ... %}) as examples below.
1
2
3
4
5
{# call a macro with expression #}
{{ my_macro(param01, param02) }}
{# call a macro with statement and get return value #}
{% set some_var = my_compute_macro(param01, param02, param03) %}
Example macro
Let’s say that we have this log table.
And we want a macro to find the data in the reference table as of the last date in the log table.
Then we write this macro property file.
1
2
3
4
5
6
7
macros:
- name: get_last_date_from_log
description: get last date from log table
arguments:
- name: ref_table
type: string
description: reference table to be searched in this log table
And develop the macro script as follows.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
{% macro get_last_date_from_log(ref_table) %}
{% set log_table = "raw.logs" %}
{% set query %}
SELECT MAX(process_date) as last_process_date
FROM `{{ log_table }}`
WHERE true
AND process_name = "{{ ref_table }}"
AND is_successful IS TRUE
{% endset %}
{% if execute %}
{{ print("query = " + query) }}
{% set query_result = run_query(query) %}
{{ log(query_result.columns["last_process_date"].values()) }}
{{ return(query_result.columns[0].values()[0]) }}
{% endif %}
{% endmacro %}
- line 1: start from
{% macro <macro_name>(<params>) %}whereparamsis justref_table. - line 2: I want to have a variable
log_tableto be the name of log table. - line 3: define
queryto be a statement to query log table with{% set query %}. - line 5,7: substitute
{{ log_table }}and{{ ref_table }}in thequery. - line 9: complete defining
querywith{% endset %}. - line 11: apply
{% if execute %}to be run when it is execute2 mode which means we’re building/compiling/running the model, not the steps generating the DAGs. - line 12:
{{ print() }}to print3 out to console. - line 13: query with
run_query4 and save result intoquery_result. - line 14:
query_resultis typeTable5 and we canlog()6 to write it into the log file to see the value of columnlast_process_date. - line 15: finally return the date with
{{ return() }}. - line 17: end the macro with
{% endmacro %}.
Now we can make use of this macro as below:
1
2
3
4
{%- set last_date = get_last_date_from_log(this.table) -%}
select *
from {{ source("delhi_climate", "daily_delhi_climate") }}
where date = "{{ last_date }}"
We are just call the macro and receive its returned value by a statement {% set var = macro(params) %}. And this.table means the table name of this model, in this case it’s the file name climate-stat-from-log.
this7 is also a dbt Jinja function.
And we can try compiling to see result.
When we
dbt compilewe can see the query that weprint("query = " ...)and the compiled model as below.1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
22:42:51 Running with dbt=1.10.13 22:42:53 Registered adapter: bigquery=1.10.2 22:42:53 Found 6 models, 7 seeds, 4 data tests, 2 sources, 510 macros 22:42:53 22:42:53 Concurrency: 1 threads (target='dev') 22:42:53 query = SELECT MAX(process_date) as last_process_date FROM `raw.logs` WHERE true AND process_name = "climate-stat-from-log" AND is_successful IS TRUE Compiled node 'climate-stat-from-log' is: select * from `bluebirz-playground`.`raw`.`daily_delhi_climate` where date = "2016-11-10"When we check the log at
logs/dbt.log, We can also see what weprint()and the date that welog()at the second to last line.1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
23:42:55.769840 [debug] [Thread-1 (]: Began compiling node model.dbt_proj01.climate-stat-from-log 23:42:55.780263 [info ] [Thread-1 (]: query = SELECT MAX(process_date) as last_process_date FROM `raw.logs` WHERE true AND process_name = "climate-stat-from-log" AND is_successful IS TRUE 23:42:55.789551 [debug] [Thread-1 (]: On model.dbt_proj01.climate-stat-from-log: /* {"app": "dbt", "dbt_version": "1.10.13", "profile_name": "dbt_proj01", "target_name": "dev", "node_id": "model.dbt_proj01.climate-stat-from-log"} */ SELECT MAX(process_date) as last_process_date FROM `raw.logs` WHERE true AND process_name = "climate-stat-from-log" AND is_successful IS TRUE 23:42:55.790241 [debug] [Thread-1 (]: Opening a new connection, currently in state closed 23:42:56.422744 [debug] [Thread-1 (]: BigQuery adapter: https://console.cloud.google.com/bigquery?project=<project>&j=<job_id>&page=queryresults 23:42:57.036284 [debug] [Thread-1 (]: (datetime.date(2016, 11, 10),) 23:42:57.038868 [debug] [Thread-1 (]: Writing injected SQL for node "model.dbt_proj01.climate-stat-from-log"
These functions above are parts of dbt Jinja functions8.
Hooks
Hooks9 are commands automatically executing before or after running models. They’re frequently used for chain-execution e.g. stamping the record and time of which model is about to run or run successfully.
pre-hook: executed before building models, seeds, or snapshots.post-hook: executed after building models, seeds, or snapshots.on-run-start: executed at the start ofdbtcommands.on-run-end: executed at the end ofdbtcommands.
Define hooks
We can define pre-hook and post-hook in either dbt_project.yml, model files, or properties.yml. But on-run-start and on-run-end need to be configured in dbt_project.yml.
1 2 3 4 5 6
dbt_directory ├── dbt_project.yml └── models └─── <model_name> ├── property.yml └── model_file.sql1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
models: <project_name>: +pre-hook: - "<pre-hook statement for models>" +post-hook: - "<post-hook statement for models>" seeds: <project_name>: +pre-hook: - "<pre-hook statement for seeds>" +post-hook: - "<post-hook statement for seeds>" snapshots: <project_name>: +pre-hook: - "<pre-hook statement for snapshots>" +post-hook: - "<post-hook statement for snapshots>" on-run-start: - "<on-run-start statement>" on-run-end: - "<on-run-end statement>"
1 2 3 4 5 6 7
{{- config( pre_hook=["<statement>"], post_hook=["<statement>"], ) -}} SELECT ...
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
models: - name: [<model_name>] config: pre_hook: - "<pre-hook statement for models" post_hook: - "<post-hook statement for models" seeds: - name: [<seed_name>] config: pre_hook: - "<pre-hook statement for seeds" post_hook: - "<post-hook statement for seeds" snapshots: - name: [<snapshot_name>] config: pre_hook: - "<pre-hook statement for snapshots" post_hook: - "<post-hook statement for snapshots"
Example hooks
We just want to print out each hook is executed and we can see the result.
Just print out which is executed.
1 2 3 4 5
on-run-start: - "{{ print('on-run-start executed') }}" on-run-end: - "{{ print('on-run-end executed') }}"
I reuse the model from macro section above to add hooks right here to print out which hook is executed.
1 2 3 4 5 6 7 8 9 10
{{- config( pre_hook=["{{ print('pre-hook executed') }}"], post_hook=["{{ print('post-hook executed') }}"], ) -}} {%- set last_date = get_last_date_from_log(this.table) -%} select * from {{ source("delhi_climate", "daily_delhi_climate") }} where date = "{{ last_date }}"
When
dbt run, we can see these texts respectively:on-run-start executedpre-hook executedpost-hook executedon-run-end executed
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
22:19:40 Running with dbt=1.10.13 22:19:42 Registered adapter: bigquery=1.10.2 22:19:42 Found 6 models, 7 seeds, 2 operations, 4 data tests, 2 sources, 510 macros 22:19:42 22:19:42 Concurrency: 1 threads (target='dev') 22:19:42 on-run-start executed 22:19:45 1 of 1 START hook: dbt_proj01.on-run-start.0 ................................... [RUN] 22:19:45 1 of 1 OK hook: dbt_proj01.on-run-start.0 ...................................... [OK in 0.02s] 22:19:45 22:19:45 1 of 1 START sql table model dbt_test_dataset.climate-stat-from-log ............ [RUN] query = SELECT MAX(process_date) as last_process_date FROM `raw.logs` WHERE true AND process_name = "climate-stat-from-log" AND is_successful IS TRUE pre-hook executed post-hook executed 22:19:49 1 of 1 OK created sql table model dbt_test_dataset.climate-stat-from-log ....... [CREATE TABLE (1.0 rows, 2.4 KiB processed) in 4.87s] 22:19:49 on-run-end executed 22:19:49 1 of 1 START hook: dbt_proj01.on-run-end.0 ..................................... [RUN] 22:19:49 1 of 1 OK hook: dbt_proj01.on-run-end.0 ........................................ [OK in 0.00s] 22:19:49 22:19:49 Finished running 2 project hooks, 1 table model in 0 hours 0 minutes and 7.31 seconds (7.31s). 22:19:50 22:19:50 Completed successfully 22:19:50 22:19:50 Done. PASS=3 WARN=0 ERROR=0 SKIP=0 NO-OP=0 TOTAL=3
Wrap up
- Jinja
- comment:
{# ... #} - variable:
{% set var = ... %}or{% set var %} ... {% endset %} - if-else:
{% if bool %} ... {% elif bool %} ... {% else %} ... {% endif %} - for-loop:
{% for element in iterator %} ... {% endfor %} - for-loop over dictionary:
{% for key, value in dict.items() %} ... {% endfor %} - expression:
{{ ... }}
- comment:
- Macros
- define macro:
{% macro <name>(params) %} ... {% endmacro %} - use macro by
{{ macro(params) }}or{% set var = macro(params) %} executeis true when it’s execute mode that is to build/compile/run the models.{{ print() }}to print out to console.{{ run_query() }}to execute the query with the project adapter.- After
run_query(), theTableobject will be returned and we can unpack the values. {{ log() }}to write into log file atlogs/dbt.log.{{ return() }}to return valuethisrefers to the current model andthis.tableis the table name of the current model.
- define macro:
- Hooks
- set
on-run-startoron-run-endindbt_project.ymlto trigger whendbtcommands start/end. - set
pre_hookorpost_hookto trigger before/after the specific model/seed/snapshot is built.
- set


