Let's try: dbt part 6 - snapshots and analyses
Now we can create snapshots to track data history, and analyses to run ad-hoc queries.
Snapshots
Snapshots let us "freeze" data at a specific time. They are useful for showing what the data looked like at a point in time, how it has changed over time, or simply as a backup.
There are several methods and tools to do so. For example, BigQuery allows us to:

- query data at a point in time with `FOR SYSTEM_TIME AS OF` [1] like this:

```sql
SELECT * FROM `project.dataset.table` FOR SYSTEM_TIME AS OF "2025-01-01"
```

- create BigQuery table snapshots [2], i.e. new tables holding the target table's data as of a specific `SYSTEM_TIME`.
- or even restore deleted BigQuery tables using time travel [3] like this:

```shell
epoch=$(date -d '2025-01-01 00:00:00' +%s000)
bq --project_id=my_project cp dataset.table@$epoch dataset.restored_table
```
And now we are using dbt snapshots to create a new table with SCD type 2 [4].
Prepare data source
Let’s say that we have this source definition of salary data:
```yaml
version: 2

sources:
  - name: salary
    schema: raw
    tables:
      - name: salary
        columns:
          - name: id
          - name: name
          - name: salary
          - name: updated_at
```
The data at the initial load and after the change are as below:

- We have 15 rows of people's salaries.
- We have updated the salaries of rows #13–#15, all with the same `updated_at` date.
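For reference, the raw table behind this source definition might look like the following. This is a hypothetical BigQuery DDL; the column types are assumptions inferred from the data shown later.

```sql
-- Hypothetical shape of the raw.salary source table; types are assumed.
create table if not exists `raw`.`salary` (
  id int64,
  name string,
  salary int64,
  updated_at timestamp
);
```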
Configurations
To create snapshots [5] with dbt, we can choose between two strategies.

- `strategy='timestamp'`
  This strategy is recommended for tables that have an updated-timestamp column (or similar). dbt compares that column to determine which rows have been updated.
- `strategy='check'`
  This strategy checks specific columns (or all columns) to find which rows have changed. It is useful when there is no updated-timestamp field, with the trade-off of comparing multiple columns to find differences.
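Conceptually, the `check` strategy flags a row as changed when any of the compared columns differ between the source and the current snapshot row. A sketch of that comparison (not dbt's actual generated SQL; the snapshot table name is a placeholder):

```sql
-- Sketch: rows the 'check' strategy would flag as changed.
-- 'salary_snapshot' is a hypothetical snapshot table name.
select src.id
from raw.salary src
join snapshots.salary_snapshot snap
  on snap.id = src.id
  and snap.dbt_valid_to is null  -- compare against the current version only
where src.name != snap.name
   or src.salary != snap.salary
```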
Then we can start writing configurations:
- dbt version 1.8 or earlier: refer to the legacy [6] configurations in Jinja blocks (not covered in this blog).
- dbt version 1.9 or later: we write YAML files in this structure.
```yaml
snapshots:
  - name: <snapshot_name>
    relation: <source() or ref()>
    description: <description>
    config:
      strategy: <timestamp|check>
      unique_key: <unique_key_field>
      check_cols: <'all'|[column_names], when strategy='check'>
      updated_at: <updated timestamp field, when strategy='timestamp'>
```
Execution
We run the command below to build snapshots.
```shell
dbt snapshot
```
Execute strategy: timestamp
First we can try building snapshots with timestamp strategy.
```yaml
snapshots:
  - name: snapshot-salary-timestamp
    relation: source('salary', 'salary')
    config:
      target_schema: snapshots
      alias: salary_ts
      strategy: timestamp
      unique_key: id
      updated_at: updated_at
```
We are using the `timestamp` strategy on the source `salary`, which has `id` as a unique key and `updated_at` as an updated-timestamp field.

The first time we run `dbt snapshot`, we can see a new table that looks like the source with additional fields:

- `dbt_scd_id` is a dbt-internal unique key.
- `dbt_updated_at` derives from the original `updated_at` field and is used internally by dbt.
- `dbt_valid_from` derives from `updated_at` as well, implying "this row has been valid since this time".
- `dbt_valid_to` is `null`, implying "this row is still valid".
After the update, we can see that the last 3 rows have values in `dbt_valid_to`, meaning those rows are no longer valid. Then look at the first 3 rows, which:

- have the same `id` as the rows whose salaries changed.
- have `dbt_valid_from` reflecting the value of `dbt_updated_at` and `updated_at`.
- have `dbt_valid_to` as `null`, stating that they are the valid rows.
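To read only the current version of each row, we can filter the snapshot table on `dbt_valid_to`. A sketch, assuming the `snapshots.salary_ts` table and the project name used elsewhere in this post:

```sql
-- Current rows only: dbt_valid_to is null marks the latest version of each id.
select id, name, salary, updated_at
from `bluebirz-playground`.`snapshots`.`salary_ts`
where dbt_valid_to is null
order by id
```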
Execute strategy: check
Now let's try the `check` strategy for snapshots.
```yaml
snapshots:
  - name: snapshot-salary-check
    relation: source('salary', 'salary')
    config:
      target_schema: snapshots
      alias: salary_chk
      strategy: check
      unique_key: id
      check_cols: all
```
Implementing the `check` strategy requires a `unique_key` like the `timestamp` strategy, plus `check_cols` set to either `all` to check all columns, or a list of columns when we don't need to check every column.

The first time we run this snapshot, we see additional fields similar to the `timestamp` strategy, except that:

- `dbt_updated_at` is the execution time, because dbt assumes there is no updated timestamp when we use the `check` strategy.
- `dbt_valid_from` is also the execution time.

After the update, the behavior is the same as the `timestamp` strategy, except that `dbt_updated_at`, `dbt_valid_from`, and `dbt_valid_to` are execution times.
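With either strategy, the resulting snapshot table supports point-in-time queries over the validity windows. A sketch, assuming the `snapshots.salary_chk` table above and a hypothetical as-of date:

```sql
-- Rows valid as of 2025-02-01: the validity window covers that date.
select id, name, salary
from `bluebirz-playground`.`snapshots`.`salary_chk`
where dbt_valid_from <= timestamp('2025-02-01')
  and (dbt_valid_to is null or dbt_valid_to > timestamp('2025-02-01'))
```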
Analyses
Analyses [7] are ad-hoc query scripts, for instance when we want to find data as of a specific date and time or observe some insights. We don't want to make them models because we just want to check something, so we create analyses instead.

Developing an analysis is just creating a SQL file inside the `analyses` directory, like this:
```sql
select *
from {{ source("salary", "salary") }} s
where s.salary >= 5000
```
Then run `dbt compile` to see what we get.
```
Compiled node 'verify-data' is:
select *
from `bluebirz-playground`.`raw`.`salary` s
where s.salary >= 5000
```
After that, we can copy-paste the compiled statement to run in databases.
In BigQuery, we can execute `bq query` directly on the compiled file located in `target/compiled/` as below.
```shell
$ bq query --use_legacy_sql=false \
"$(cat target/compiled/<dbt_project>/analyses/verify-data.sql)"
+-----+---------+--------+---------------------+
| id  | name    | salary | updated_at          |
+-----+---------+--------+---------------------+
| 210 | Louis   | 5796   | 2025-01-03 05:05:31 |
| 214 | Kristen | 5850   | 2025-01-14 13:29:28 |
| 212 | Angela  | 5063   | 2025-01-18 16:50:52 |
| 209 | Tucker  | 5946   | 2025-01-25 23:04:29 |
| 205 | Hillary | 5959   | 2025-01-31 12:35:46 |
| 211 | Wallace | 6090   | 2025-02-01 09:00:00 |
+-----+---------+--------+---------------------+
```
Wrap up
- Snapshots
  - are used to track change history.
  - are defined in YAML files on dbt version 1.9 or newer; otherwise written as Jinja blocks.
  - the `timestamp` strategy is recommended since it checks only the `updated_at` field; use the `check` strategy if there is no `updated_at` field in the table.
  - supply the target table in `relation` and its `unique_key` to build the snapshot on.
  - run `dbt snapshot` to build snapshot tables.
- Analyses
  - are used to run ad-hoc queries such as data observation or one-time analytics.
  - are created as SQL files in the `analyses` directory.
  - run `dbt compile` and find the compiled statements in `target/compiled/<dbt_project>/analyses/` to run in databases.