Let's try: dbt part 1 - prologue
With dbt, we can manage data transformation streamline.
dbt is a tool known very well for data engineers and other data expertises. Then why we have to know this?
What is dbt?
dbt stands for “data build tool”. This tool works great for data transformation in the modern stack.
flowchart LR
src("data source") e1@--> e(("extract")) e2@--> t(("transform")) e3@--> l(("load")) e4@--> dest[("data warehouse")]
e1@{ animation: fast }
e2@{ animation: fast }
e3@{ animation: fast }
e4@{ animation: fast }
style t fill:orange,color:black
dbt provides functionality to control the transformation flow into streamlines. Every tables and views can refer and can be referred to each other. We can make it through SQL and Jinja syntax.
Below is the homepage of dbt.

Advantages
dbt allows us to manage the pipelines by the following features:
- writing just SQL and Jinja is enough to run dbt.
- supports many data warehouses such as Snowflake, BigQuery, Redshift, Postgres, DuckDB and many more.
- organize scripts in a well structure with references in-use so maintaining data streamline is simply comprehensible.
- build documentation so easily.
- able to implement functions —
macros
— to process complex logics and routines. - external libraries are available to install.
- easy to build and run tests.
- open-source and self-host option (dbt core), while dbt cloud is also available.
- great developer community to support.
Concepts
There are several concepts in dbt we should know then we can make use of dbt in full capacity.
Seeds
Seeds are files in CSV format. We can use seeds to load small static data into the data warehouse.
Sources
Sources are the tables or views in the data warehouse. We can use sources to refer in our models.
We can define sources by referring:
database
(equivalent toproject
in BigQuery)schema
(equivalent todataset
in BigQuery)table
(refer to the table or view name)
Models
This is one of core concepts of dbt.
Models are SQL select statements files. One model means one table or one view, by default it’s a view and the file name is used as the table or view name.
A model can configure source, materializations, macros, and hooks into a single file.
Materializations
Materializations are strategies to maintain models. There are 5 types:
- View
- Default materialization of a model.
- This view type is like a masquerade to run a given query over the real tables or other views.
- Best for insignificant transformation such as columns selection, renaming, filtering.
- Table
- Physical table to store data.
- This table type will rebuild the table every time the model runs.
- Best for long-running transformation or heavy computation.
- Incremental
- Data insertion or update to the existing table.
- Needs extra configurations to handle the insert/update logics.
- Best for transactional data or optimizing the processing time.
- Ephemeral
- Functions like CTE (Common Table Expression) in SQL.
- This type will not create any table or view so we can’t select directly from this.
- Best for simple - intermediate transformation or data preparation.
- Materialized views
- Combination of a view and a table.
- This type is supported in some data warehouses such as BigQuery, Redshift, Snowflake.
- Materialized views store the result of a query having joins, aggregations, and windows functions beforehand and can be refreshed periodically.
- Best for optimizing the performance of complex queries.
Snapshots
Snapshots are models to keep historical records of a table. We can use snapshots to track changes in the source data over time.
Macros
macros are functions written in Jinja. We can use macros to handle complex logics or routines.
For example:
- A macro to compute latest timestamp of a table in order to run incremental models.
- A macro to add run logs to a logging table.
Hooks
Hooks are SQL statements that run before or after models run. We can use hooks to prepare the environment or clean up the resources.
Analyses
Analyses are SQL select statements files like models but they are not part of the transformation flow. We can use analyses for ad-hoc queries or data exploration.
Tests
dbt offers two ways of tests:
- Generic tests
- May be known as schema tests.
- Predefined tests such as
unique
,not_null
,accepted_values
,relationships
and more. - We can configure these tests in a YAML file.
- Singular tests
- May be known as data tests.
- Custom SQL select statements that return zero rows to pass the tests.
- We can create singular tests in SQL file inside the
tests
directory.
Jinja
dbt runs on Python environment, uses SQL to run queries for data transformation, and uses Jinja to control the flow, define variables, calling macros, etc.
Read my blog about Jinja below:
Packages
Packages are external libraries to extend the functionality of dbt. There are many packages available in dbt Hub.
So this is all for a first chapter of my new dbt series. Stay tuned for the new one.