dbt - Documentation


profiles.yml defines how dbt connects to your data warehouse(s). By default, dbt expects this file to be located at ~/.dbt/profiles.yml, but this location can be changed on the command line with the --profiles-dir option.

Your profile is specific to you, and contains sensitive information about your warehouse. Never check your profile into your dbt project's git repository, or share your profile with others.

# This configuration file specifies information about connections to
# your data warehouse(s). The file contains a series of "profiles."
# Profiles specify database credentials and connection information
# By default, dbt looks for this file in ~/.dbt/profiles.yml. That option
# can be configured when dbt is invoked with the --profiles-dir option:
#  $ dbt run --profiles-dir /opt/dbt/

# Profiles should adhere to the structure defined below:
# [profile-name]:
#   outputs:
#     [target-name]:
#       type: {redshift, postgres}
#       threads: [1 - 8]
#       host: [host ip or fully qualified domain name]
#       port: [port]
#       user: [user]
#       pass: [password]
#       dbname: [dbname]
#       schema: [schema name]
#   target: [target-name]
# Commonly, it's helpful to define multiple targets for a profile. For example,
# these targets might be `dev` and `prod`. Whereas the `dev` target points to
# a development schema (eg. dbt_dev), the `prod` schema should point to the
# prod schema (eg. analytics). Analytical/BI tools should point to the
# prod schema so that local development does not interfere with analysis.
# In practice, a profile might look like this:

        dev:                      # specify the dev connection
            type: redshift
            threads: 1
            port: 5439
            user: elliot
            pass: pa55word
            dbname: warehouse
            schema: dbt_elliot    # use the dev schema
        prod:                     # specify the prod connection
            type: redshift
            threads: 1
            port: 5439
            user: elliot
            pass: pa55word
            dbname: warehouse
            schema: analytics     # use the prod schema instead
        snowflake:                # specify the snowflake connection
            type: snowflake
            threads: 1
            account: evilcorp     # the url prefix for your snowflake connection,
                                  # i.e. evilcorp.snowflakecomputing.com
            user: elliot
            password: pa55word
            role: SYSADMIN        # optional, the snowflake role you want to use
                                  # when connecting
            database: db
            warehouse: warehouse
            schema: analytics     # use the prod schema instead

    target: dev                   # default target is dev unless changed at run time

        dev:                      # specify the dev connection
            type: postgres
            threads: 1
            port: 5439
            user: mr_robot
            pass: password1
            dbname: warehouse
            schema: dbt_mr_robot  # use the dev schema
        prod:                     # specify the prod connection
            type: postgres
            threads: 1
            port: 5439
            user: mr_robot
            pass: password1
            dbname: warehouse
            schema: analytics     # use the prod schema instead
    target: dev                   # default target is dev unless changed at run time

# You can switch between profiles and targets on the command line. All of the
# following are valid ways to invoke dbt run/test/compile/etc
# $ dbt run --profile evil-corp
# $ dbt run --profile evil-corp --target dev
# $ dbt run --profile evil-corp --target prod
# $ dbt run --profile mr-robot
# $ dbt run --profile mr-robot --target dev
# $ dbt run --profile mr-robot --target prod

When you create a dbt project with dbt init, dbt creates a file in the root of the project called dbt_project.yml. This file contains the configuration for the project. You can see an exhaustive list of the fields available below.

# This configuration file specifies information about your package
# that dbt needs in order to build your models. This file is _required_
# For more information, consult:
# http://dbt.readthedocs.io/en/master/guide/configuring-models/

# Package Configuration

# name: Required. This is the name used to reference your package in the configs
# below. Package names must only contain letters and underscores
name: 'your_package_name'

# version: Required. This indicates the current version of your package and
# should conform to semantic versioning. The field is currently unused
version: '0.0.1'

# File Path Configurations

# The following configs specify directories that dbt uses to operate. All
# paths are specified relative to the path of dbt_project.yml

# source-paths: Required. Specify which path(s) dbt should look in to find
# models. Models are text files with a .sql file extension.
source-paths: ["models"]

# analysis-paths: Optional. Specify which path(s) dbt should look in to find
# analytical queries. These queries are compiled, but not executed, by dbt.
analysis-paths: ["analysis"]

# target-path: Required. Specify which path dbt should write compiled SQL to.
target-path: "target"

# test-paths: Optional. Specify which path(s) dbt should look in to find data
# test definitions.
test-paths: ["test"]

# data-paths: Optional. Specify which path(s) dbt should look in to find CSV
# files. Running `dbt seed` will load these CSVs as tables in your warehouse
data-paths: ["data"]

# macro-paths: Optional. Specify which path(s) dbt should look in to find
# macros. These macros will be globally available to all models in your project
macro-paths: ['macros']

# log-path: Optional. Specify which path dbt should write debug logs to.
log-path: "logs"

# clean-targets: Optional. Specify which path(s) should be removed by the
# clean task. Run `dbt clean` to delete these directories
clean-targets: ["target", "dbt_modules"]

# Connection Configurations

# profile: Required. This config specifies which profile dbt should use to
# connect to your data warehouse. The provided value must exist in the
# profiles.yml file.
profile: "evil-corp"

# Model Configurations

# The following section contains configurations that define how your models are
# instantiated by dbt. The `models` config below can specify options for both
# your package, and any packages included as dependencies.
# Options are specified on a per-package, per-directory, and per-model basis.
# The configs are inherited, so configs applied to a package can be overridden
# for directories and models contained within that package.
# The configuration structure within a package should mirror the directory
# structure within that package. The example configs provided below are based
# on the following directory structure.
# dbt_project.yml
# models/
# ├── adwords
# │   └── adwords_ads.sql
# └── snowplow
#     ├── base
#     │   └── snowplow_events.sql
#     └── snowplow_sessions.sql

    enabled: true        # configs defined here are applied to _all_ packages
    materialized: view   # but can be overridden in more specific configs below

    # pre- and post- hooks can be defined anywhere within the model hierarchy.
    # when defined at the root level (as they are here), they apply to all models
    # in all packages. These hooks are compiled into the model SQL and run either
    # before or after the model is materializated.
        - "insert into audit (model, state, time) values ('{{this.name}}', 'start', getdate())"

        - "grant select on {{this}} to user_1"
        - "insert into audit (model, state, time) values ('{{this.name}}', 'end', getdate())"

    # This is the configuration for the models in your local package. The name
    # `your_package_name` is defined above.
        # Applies to all SQL files found under ./models/adwords/
            enabled: false
            # Applies to the specific model ./models/adwords/adwords_ads.sql
                enabled: true
                materialized: table

        # Applies to all SQL files found under ./models/snowplow/
            # Applies to all SQL files found under ./models/snowplow/base
                # Applies to model ./models/snowplow/base/snowplow_events.sql
                    materialized: table
                    sort: ['timestamp', 'userid']
                    sort_type: interleaved
                    dist: 'userid'

            # Applies to model ./models/snowplow/snowplow_sessions.sql
                materialized: incremental
                sql_where: "created_at > (select max(created_at) from {{ this }})"
                unique_key: "id"
                sort: "timestamp"
                dist: "user_id"

    # This is the configuration for the "quickbooks" open source package
    # which is included in the `repositories` section below.
    # These configs override those defined within the quickbooks package
            materialized: ephemeral
            materialized: view

# Run Start & Complete Hooks

# Like the pre- and post- hooks above, the on-run-start and on-run-end configs
# make it possible to run SQL at the very beginning, and very end of a dbt run.
    - "create table if not exists audit (model text, state text, time timestamp)"

    - 'grant usage on schema "{{ target.schema }}" to db_reader'
    - 'grant select on all tables in schema "{{ target.schema }}" to db_reader'

# Package configurations

# repositories: Optional. Contains a list of packages (git repositories) that
# should be downloaded and included in this project. When a package is
# included, its models are added to the dbt model graph and executed with
# `dbt run`. These configs can be specified above (as with quickbooks).
# Each repository should be a git-cloneable url. Branches and Tags can be
# specified by adding @branch-or-version to the end of the url, ex:
# - https://github.com/fishtown-analytics/quickbooks
# - https://github.com/fishtown-analytics/quickbooks@v0.1.0
# - https://github.com/fishtown-analytics/quickbooks@master

  - https://github.com/fishtown-analytics/quickbooks

# Archival

# Data archival is a powerful feature intended for advanced dbt users. For more
# information, consult: https://dbt.readme.io/docs/archival

dbt run executes compiled sql model files against the current target database. dbt connects to the target database and runs the relevant SQL required to materialize all data models using the specified materialization strategies. Models are run in the order defined by the dependency graph generated during compilation. Intelligent multi-threading is used to minimize execution time without violating dependencies.

Deploying new models frequently involves destroying prior versions of these models. In these cases, dbt run minimizes the amount of time in which a model is unavailable by first building each model with a temporary name, then dropping the existing model, then renaming the model to its correct name. The drop and rename happen within a single database transaction for database adapters that support transactions.

Specifying models to run

By default, dbt run will execute all of the models in the dependency graph. During development (and deployment), it is useful to specify only a subset of models to run. Use the --models flag with dbt run to select a subset of models to run. Note that the following arguments (--models and --exclude) also apply to dbt test!

The --models flag accepts one or more arguments. Each argument can be one of:

  1. a package name
  2. a model name
  3. a path hierarchy to a models directory


dbt run --models my_dbt_project_name   # runs all models in your project
dbt run --models my_dbt_model          # runs a specific model
dbt run --models path.to.my.models     # runs all models in a specific directory
dbt run --models my_package.some_model # run a specific model in a specific package

# multiple arguments can be provided to --models
dbt run --models my_project other_project

# these arguments can be projects, models, or directory paths
dbt run --models my_project path.to.models a_specific_model

Additionally, arguments to --models can be modified with the + and * operators. If placed at the front of the model specifier, + will select all parents of the selected model(s). If placed at the end, + will select all children of the selected models. The * operator matches all models within a package or directory.

dbt run --models my_model+          # select my_model and all children
dbt run --models +my_model          # select my_model and all parents

dbt run --models my_package.*       # select all models in my_package
dbt run --models path.to.models.*   # select all models in path/to/models

dbt run --models my_package.*+      # select all models in my_package and their children
dbt run --models +some_model+       # select some_model and all parents and children

Finally, dbt provides an --exclude flag with the same semantics as --models. Models specified with the --exclude flag will be removed from the set of models selected with --models

dbt run --models my_package.*+ --exclude my_package.a_big_model+

Run dbt non-destructively

If you provide the --non-destructive argument to dbt run, dbt will minimize the amount of destructive changes it runs against your database. Specifically, dbt

  1. Ignore models materialized as views
  2. Truncate tables and re-insert data instead of dropping and re-creating these tables

This flag is useful for recurring jobs which only need to update table models and incremental models. DBT will not create, drop, or modify views if the --non-destructive flag is provided.

dbt run --non-destructive

Refresh incremental models

If you provide the --full-refresh argument to dbt run, dbt will treat incremental models as table models. This is useful when

  1. The schema of an incremental model changes and you need to recreate it
  2. You want to reprocess the entirety of the incremental model because of new logic in the model code
dbt run --full-refresh

In model code, this flag will be reflected as flags.FULL_REFRESH, for example:

select * from all_events

-- if the table already exists and `--full-refresh` is
-- not set, then only add new records. otherwise, select
-- all records.
{% if already_exists(this.schema, this.name) 
      and not flags.FULL_REFRESH %}
   where collector_tstamp > (
     select coalesce(max(max_tstamp), '0001-01-01') from {{ this }}
{% endif %}

dbt test runs tests on data in deployed models. There are two types of tests:

  • schema validations, declared in a schema.yml file.
  • custom data tests, written as SQL SELECT statements.

dbt test runs both types of test and reports the results to the console.

The tests to run can be selected using the --models flag discussed here.

dbt test --models one_specific_model  # run tests for one_specific_model
dbt test --models some_package.*      # run tests for all models in package

For more information on writing tests, see the Testing Reference.

dbt deps pulls the most recent version of the dependencies listed in your dbt_project.yml from git. See Adding a Package for more information.


dbt --version is a utility function to check the version of your installed dbt client.


Commonly, analysts need to "look back in time" at some previous state of data in their mutable tables. While some systems are built in a way that makes accessing historical data possible, this is often not the case. dbt provides a mechanism, dbt archive, to record the changes made to a mutable table over time.

To use dbt archive, declare the tables you want to archive in your dbt_project.yml file:

  - source_schema: production_data         # schema to look for tables in
    target_schema: dbt_archive             # where to archive the data to
    tables:                                # list of tables to archive
      - source_table: users                # table to archive (production_data.users)
        target_table: users_archived       # table to insert archived data into
        updated_at: updated_at             # used to determine when data has changed
        unique_key: id                     # used to generate archival query
      - source_table: other_table          # production_data.some_other_table
        target_table: other_table_archive # dbt_archive.some_other_table_archive
        updated_at: "updatedAt"
        unique_key: "expressions || work || LOWER(too)"
  - source_schema: some_other_schema

The archived tables will mirror the schema of the source tables they are generated from. In addition, three fields are added to the archive table:

  1. valid_from: The timestamp when this archived row was inserted (and first considered valid)
  2. valid_to: The timestamp when this archived row became invalidated. The first archived record for a given unique_key has valid_to = NULL. When newer data is archived for that unique_key, the valid_to field of the old record is set to the valid_from field of the new record.
  3. scd_id: A unique key generated for each archive record. scd = Slowly Changing Dimension.

dbt models can be built on top of these archived tables. The most recent record for a given unique_key is the one where valid_to is null.

To run this archive process, use the command dbt archive. After testing and confirming that the archival works, you should schedule this process to run on a recurring basis.

dbt init [project_name] performs several actions necessary to create a new dbt project:

  • creates a ~/.dbt/profiles.yml file if one does not already exist
  • creates a new folder called [project_name]
  • generates directories and sample files necessary to get started with dbt

dbt clean is a utility function that deletes all folders specified in the clean-targets list specified in dbt_project.yml. This is useful for deleting the dbt_modules and target directories.

dbt debug is a utility function to show debug information.

The --config-dir option to dbt debug will show you where your .dbt configuration directory is located:

$ dbt debug --config-dir
To view your profiles.yml file, run:

open /Users/alice/.dbt

dbt seed loads data from csv files into your data warehouse. Because these csv files are located in your dbt repository, they are version controlled and code reviewable. Thus, dbt seed is appropriate for loading static data which changes infrequently.

The seed command will load csv files located in the data-paths directory of your dbt project into your data warehouse. You can configure the data-paths directory by adding the following line to your dbt_project.yml file:

data-paths: ["data"] # default is './data'

Assuming you have a csv file that looks like this:

US,United States
GB,United Kingdom

Running dbt seed with the above csv located at data/country_codes.csv will create a table in your data warehouse with two columns: country_code and country_name.

In addition to the standard --profile and --target arguments, dbt seed also accepts the --drop-existing argument. If provided, dbt will drop and re-create the specified table instead of truncating and inserting new data. This is useful if the schema of the csv file changes in a way which is incompatible with the existing table.


dbt compile generates executable SQL from source model, test, and analysis files. You can find these compiled SQL files in the target/ directory of your dbt project.

The compile command is useful for:

  1. Visually inspecting the compiled output of model files. This is useful for validating complex jinja logic or macro usage.
  2. Manually running compiled SQL. While debugging a model or schema test, it's often useful to execute the underlying select statement to find the source of the bug.
  3. Compiling analysisfiles. Read more about analysis files here.

adapter is a wrapper around the internal database adapter used by dbt. It allows users to make calls to the database in their dbt models.

The following functions are available on adapter:

class DatabaseWrapper(object):
    Wrapper for runtime database interaction. Should only call adapter

    def already_exists(self, schema, table):
        Returns true if a relation named like `table` exists in schema
        `schema`, false otherwise.

    def get_columns_in_table(self, schema_name, table_name):
        Returns a list of Columns in a table. Useful for writing 
        `INSERT ... SELECT` queries.

    def get_missing_columns(self, from_schema, from_table,
                            to_schema, to_table):
        Returns the set of Columns that is the difference of the 
        columns in the `from_table` and the columns in the `to_table`,
        i.e. (`set(from_table.columns) - set(to_table.columns)`). 
        Useful for detecting new columns in a source table.
Suggest Edits




  • schema: The schema to test
  • table: The relation to look for

Returns true if a relation named like table exists in schema schema, false otherwise.

(Source on Github)

select * from {{ref('raw_table')}}

{% if adapter.already_exists(this.schema, this.name) %}
  where id > (select max(id) from {{this}})
{% endif %}
Suggest Edits




  • schema_name: The schema to test
  • table_name: The relation from which to select columns

Returns a list of Columns in a table. Useful for writing INSERT ... SELECT queries.

(Source on Github)

{% set dest_columns = adapter.get_columns_in_table(schema, identifier) %}
{% set dest_cols_csv = dest_columns | map(attribute='quoted') | join(', ') %}

insert into {{ this }} ({{ dest_cols_csv }}) (
  select {{ dest_cols_csv }}
  from {{ref('another_table')}}
Suggest Edits




  • from_schema: The schema for the from_table
  • from_table: The from_table to check for differences
  • to_schema: The schema for the to_table
  • to_table: The to_table to check for differences

Returns the set of Columns that is the difference of the columns in the from_table
and the columns in the to_table, i.e. (set(from_table.columns) - set(to_table.columns)).
Useful for detecting new columns in a source table.

(Source on Github)

{% for col in adapter.get_missing_columns(this.schema, 'source_table', this.schema, this.name) %}
  alter table {{this}} add column "{{col.name}}" {{col.data_type}};
{% endfor %}
Suggest Edits



invocation_id outputs a UUID generated for this dbt run (useful for auditing)

Suggest Edits



run_started_at outputs the timestamp that this run started, e.g. 2017-04-21 01:23:45.678.

this makes available schema information about the currently executing model. It's is useful in any context in which you need to write code that references the current model, for example when defining a sql_where clause for an incremental model and for writing pre- and post-model hooks that operate on the model in some way. Developers have options for how to use this:

DBT Model Syntax






table or table__dbt_tmp, depending on whether or not dbt created a temporary table for this model.



Here's an example of how to use this in dbt_project.yml to write a sql_where clause that only inserts the new records since the most recent timestamp found in the table:

  materialized: incremental
  sql_where: "session_end_tstamp > (select max(session_end_tstamp) from {{this}})"

Here's an example of how to use this in dbt_project.yml to grant select rights on a table to a different db user.

      - "grant select on {{ this }} to db_reader"

target makes available information about which database and schema this dbt run is executing against.

DBT Model Syntax


Name of the active target, i.e. target


Database name specified in active target


Host specified in active target


Schema specified in active target


Database type (postgres / redshift / snowflake / ...) specified in active target


User specified in active target


Port specified in active target


Number of threads specified in active target

The most important function in dbt is ref(); it's impossible to build even moderately complex models without it. ref() is how you reference one model within another. This is a very common behavior, as typically models are built to be "stacked" on top of one another. Here is how this looks in practice:

select *
from public.raw_data
select *
from {{ref('model_a')}}

ref() is, under the hood, actually doing two important things. First, it is interpolating the schema into your model file to allow you to change your deployment schema via configuration. Second, it is using these references between models to automatically build the dependency graph. This will enable dbt to deploy models in the correct order when using dbt run.

Advanced ref usage

There is also a two-argument variant of the ref function. With this variant, you can pass both a package name and model name to ref to avoid ambiguity. This functionality is not commonly required for typical dbt usage.

select * from {{ ref('package_name', 'model_name') }}

Variables can be passed from your dbt_project.yml file into models during compilation.
These variables are useful for configuring packages for deployment in multiple environments, or defining values that should be used across multiple models within a package.

To add a variable to a model, use the var() function:

select * from events where event_type = '{{ var("event_type") }}'

If you try to run this model without supplying an event_type variable, you'll receive
a compilation error that looks like this:

Encountered an error:
! Compilation error while compiling model package_name.my_model:
! Required var 'event_type' not found in config:
Vars supplied to package_name.my_model = {

To supply a variable to a given model, add one or more vars dictionaries to the models
config in your dbt_project.yml. These vars are in-scope for all models at or below
where they are defined, so place them where they make the most sense. Below are three different
placements of the vars dict, all of which will make the my_model model compile.

# 1) scoped at the model level
      materialized: view
        event_type: activation

# 2) scoped at the package level
      event_type: activation
      materialized: view

# 3) scoped globally
    event_type: activation
      materialized: view

The env_var function can be used to incorporate Environment Variables from the system into your dbt project. This env_var function can be used 1) in the profiles.yml file or 2) in model sql.

When used in a profiles.yml file (to avoid putting credentials on a server), it can be used like this:

  target: prod
      type: postgres
      # IMPORTANT: Make sure to quote the entire Jinja string here
      user: "{{ env_var('DBT_USER') }}"
      pass: "{{ env_var('DBT_PASSWORD') }}"

If the DBT_USER and DBT_PASSWORD environment variables are present when dbt is invoked, then these variables will be pulled into the profile as expected. If any environment variables are not set, then dbt will raise a compilation error.

Quoting, Curly Brackets, & You

Be sure to quote the entire jinja string (as shown above), or else the yaml parser will be confused by the Jinja curly brackets.

Suggest Edits

The dbt API


While dbt is typically invoked from the command line, it is possible to import and invoke dbt as a Python module. This API is presently undocumented and is liable to change in the future. Please contact us if you choose to run dbt in this manner!

Suggest Edits

Exit Codes


When dbt exits, it will return an exit code of either 0, 1, or 2.

Exit Code


The dbt invocation completed without error.


The dbt invocation completed with at least one handled error (eg. model syntax error, bad permissions, etc). The run was completed, but some models may have been skipped.


The dbt invocation completed with an unhandled error (eg. ctrl-c, network interruption, etc).

While these exit codes may change in the future, a zero exit code will always imply success whereas a nonzero exit code will always imply failure.