Let's try: dbt part 6 - snapshots and analyses
Now we can create snapshots to track data history, and analyses to run ad-hoc queries.
Snapshots
Snapshots let us "freeze" data at a specific time. They are useful for showing what the data looked like at a point in time, how it has changed over time, or simply as a backup.
There are several methods and tools to do so. For example, BigQuery allows us to:

- query data at a point in time with `FOR SYSTEM_TIME AS OF` [1] like this:

```sql
SELECT * FROM `project.dataset.table` FOR SYSTEM_TIME AS OF "2025-01-01"
```

- create BigQuery table snapshots [2], i.e. new tables holding the target table's data as of a specific `SYSTEM_TIME`.
- or even restore deleted BigQuery tables using time travel [3] like this:

```shell
epoch=$(date -d '2025-01-01 00:00:00' +%s000)
bq --project_id=my_project cp dataset.table@$epoch dataset.restored_table
```
And now we are using dbt snapshots to create a new table with SCD type 2 [4].
Prepare data source
Let’s say that we have this source definition of salary data:
```yaml
version: 2

sources:
  - name: salary
    schema: raw
    tables:
      - name: salary
        columns:
          - name: id
          - name: name
          - name: salary
          - name: updated_at
```
The data at the initial load and after the change are as below:

- We have 15 rows of people's salaries.
- We have updated the salaries of rows #13–#15, all with the same `updated_at` date.
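For reference, the raw table behind this source definition might look like the following. This is a hypothetical BigQuery DDL; the column types are assumptions inferred from the data shown later.

```sql
-- Hypothetical shape of the raw.salary source table; types are assumed.
create table if not exists `raw`.`salary` (
  id int64,
  name string,
  salary int64,
  updated_at timestamp
);
```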
Configurations
To create snapshots [5] with dbt, we can choose between two strategies.

- `strategy='timestamp'`
  This strategy is recommended for tables that have an updated-timestamp column (or similar). dbt compares that column to determine which rows have been updated.
- `strategy='check'`
  This strategy checks specific columns (or all columns) to find which rows have changed. It is useful when there is no updated-timestamp field, with the trade-off of comparing multiple columns to find differences.
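Conceptually, the `check` strategy flags a row as changed when any of the compared columns differ between the source and the current snapshot row. A sketch of that comparison (not dbt's actual generated SQL; the snapshot table name is a placeholder):

```sql
-- Sketch: rows the 'check' strategy would flag as changed.
-- 'salary_snapshot' is a hypothetical snapshot table name.
select src.id
from raw.salary src
join snapshots.salary_snapshot snap
  on snap.id = src.id
  and snap.dbt_valid_to is null  -- compare against the current version only
where src.name != snap.name
   or src.salary != snap.salary
```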
Then we can start writing configurations:
- dbt version 1.8 or earlier: refer to the legacy [6] configurations in Jinja blocks (not covered in this blog).
- dbt version 1.9 or later: we write YAML files in this structure.
```yaml
snapshots:
  - name: <snapshot_name>
    relation: <source() or ref()>
    description: <description>
    config:
      strategy: <timestamp|check>
      unique_key: <unique_key_field>
      check_cols: <'all'|[column_names], when strategy='check'>
      updated_at: <updated timestamp field, when strategy='timestamp'>
```
Execution
We run the command below to build snapshots.
```shell
dbt snapshot
```
Execute strategy: timestamp
First we can try building snapshots with timestamp strategy.
```yaml
snapshots:
  - name: snapshot-salary-timestamp
    relation: source('salary', 'salary')
    config:
      target_schema: snapshots
      alias: salary_ts
      strategy: timestamp
      unique_key: id
      updated_at: updated_at
```
We are using the `timestamp` strategy on the source `salary`, which has `id` as a unique key and `updated_at` as an updated-timestamp field.

The first time we run `dbt snapshot`, we can see a new table that looks like the source with additional fields:

- `dbt_scd_id` is a dbt-internal unique key.
- `dbt_updated_at` derives from the original `updated_at` field and is used internally by dbt.
- `dbt_valid_from` derives from `updated_at` as well, implying "this row has been valid since this time".
- `dbt_valid_to` is `null`, implying "this row is still valid".
After the update, we can see that the last 3 rows have values in `dbt_valid_to`, meaning those rows are no longer valid. Then look at the first 3 rows, which:

- have the same `id` as the rows whose salaries changed.
- have `dbt_valid_from` reflecting the value of `dbt_updated_at` and `updated_at`.
- have `dbt_valid_to` as `null`, stating that they are the valid rows.
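To read only the current version of each row, we can filter the snapshot table on `dbt_valid_to`. A sketch, assuming the `snapshots.salary_ts` table and the project name used elsewhere in this post:

```sql
-- Current rows only: dbt_valid_to is null marks the latest version of each id.
select id, name, salary, updated_at
from `bluebirz-playground`.`snapshots`.`salary_ts`
where dbt_valid_to is null
order by id
```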
Execute strategy: check
Now let's try the `check` strategy for snapshots.
```yaml
snapshots:
  - name: snapshot-salary-check
    relation: source('salary', 'salary')
    config:
      target_schema: snapshots
      alias: salary_chk
      strategy: check
      unique_key: id
      check_cols: all
```
Implementing the `check` strategy requires a `unique_key` like the `timestamp` strategy, plus `check_cols` set to either `all` to check all columns, or a list of columns when we don't need to check every column.

The first time we run this snapshot, we see additional fields similar to the `timestamp` strategy, except that:

- `dbt_updated_at` is the execution time, because dbt assumes there is no updated timestamp when we use the `check` strategy.
- `dbt_valid_from` is also the execution time.

After the update, the behavior is the same as the `timestamp` strategy, except that `dbt_updated_at`, `dbt_valid_from`, and `dbt_valid_to` are execution times.
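With either strategy, the resulting snapshot table supports point-in-time queries over the validity windows. A sketch, assuming the `snapshots.salary_chk` table above and a hypothetical as-of date:

```sql
-- Rows valid as of 2025-02-01: the validity window covers that date.
select id, name, salary
from `bluebirz-playground`.`snapshots`.`salary_chk`
where dbt_valid_from <= timestamp('2025-02-01')
  and (dbt_valid_to is null or dbt_valid_to > timestamp('2025-02-01'))
```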
Analyses
Analyses [7] are ad-hoc query scripts, for instance when we want to find data as of a specific date and time or observe some insights. We don't want to make them models because we just want to check something, so we create analyses instead.

Developing an analysis is just creating a SQL file inside the `analyses` directory, like this:
```sql
select *
from {{ source("salary", "salary") }} s
where s.salary >= 5000
```
Then run `dbt compile` to see what we get.
```
Compiled node 'verify-data' is:
select *
from `bluebirz-playground`.`raw`.`salary` s
where s.salary >= 5000
```
After that, we can copy-paste the compiled statement to run in databases.
In BigQuery, we can execute `bq query` directly on the compiled file located in `target/compiled/` as below.
```shell
$ bq query --use_legacy_sql=false \
"$(cat target/compiled/<dbt_project>/analyses/verify-data.sql)"
+-----+---------+--------+---------------------+
| id  | name    | salary | updated_at          |
+-----+---------+--------+---------------------+
| 210 | Louis   | 5796   | 2025-01-03 05:05:31 |
| 214 | Kristen | 5850   | 2025-01-14 13:29:28 |
| 212 | Angela  | 5063   | 2025-01-18 16:50:52 |
| 209 | Tucker  | 5946   | 2025-01-25 23:04:29 |
| 205 | Hillary | 5959   | 2025-01-31 12:35:46 |
| 211 | Wallace | 6090   | 2025-02-01 09:00:00 |
+-----+---------+--------+---------------------+
```
Wrap up
- Snapshots
  - are used to track change history.
  - are defined in YAML files on dbt version 1.9 or newer; otherwise written as Jinja blocks.
  - the `timestamp` strategy is recommended since it checks only the `updated_at` field; use the `check` strategy if there is no `updated_at` field in the table.
  - supply the target table in `relation` and its `unique_key` to build the snapshot on.
  - run `dbt snapshot` to build snapshot tables.
- Analyses
  - are used to run ad-hoc queries such as data observation or one-time analytics.
  - are created as SQL files in the `analyses` directory.
  - run `dbt compile` and find the compiled statements in `target/compiled/<dbt_project>/analyses/` to run in databases.