Unit Testing SQL in dbt #4455

MichelleArk · 2020-04-23T15:55:27Z

MichelleArk
Apr 23, 2020
Maintainer

Describe the feature

In addition to the existing data test support dbt provides, it would be great if users had the capability to write unit tests to assert model behaviour generally and in edge cases. These would validate expected behaviour of models for data that isn't yet observed in production.

To do this, dbt would need to provide the ability to run models on a set of static inputs which could either be created at query-time or ahead of time.

We prototyped a solution where users encode static input in CSV files, configure a 'tests.yml' file that provides mappings between source/ref models and CSV files, as well as specifying an expected output (also encoded as a CSV file). Our framework then generated a query that created a CTE for each static input, a CTE that represented the model being tested (replacing source/ref macros with the static input CTE names), and lastly ran a diff between the expected model and the model generated using static inputs. This generated query was then fed to dbt test - if the diff returned 0 results, the test would pass.

Feedback from data scientists was that encoding static inputs in CSV files was cumbersome, readability of tests was poor because of the many disparate files representing a test case, and flexibility to programmatically encode static inputs and write custom expectations beyond equality was also desired.

Wondering if other dbt users have tried to achieve something similar, and how the community feels it's best to approach unit testing in dbt.

Describe alternatives you've considered

We have considered running dbt built-in data tests and running them on a small sample of production data locally. However, creating a representative sample of data for all edge cases for all downstream models is a challenging task and also bad practice - unit tests should have a single reason to fail. Creating many small tables representing individual test cases could be done to counter this but our main concern was where/how these static datasets were encoded - if they are in separate (let's say CSV) files, this creates a readability issue where reviewers / users have to jump between multiple files to understand a test case.

Another more general issue with this approach is that writing assertions for unit tests feels quite unnatural in SQL - its tricky even to get the right semantics for an equality check.

Additional context

There are definitely aspects of this that are database-specific. For example, in BigQuery, we can create static inputs as CTEs using ARRAYs of STRUCT types. For other databases, a different syntax or more preferred method of creating static data for testing. In addition, to create static inputs in BigQuery as ARRAYs of SRUCTs the data type of each column needs to be specified.

Who will this benefit?

I think all dbt users would benefit, especially large organizations where there will be frequent updates and many collaborators for a single model. Unit testing will give users more confidence that the changes they are making will not break existing behaviour.

bashyroger · 2020-09-16T12:46:00Z

bashyroger
Sep 16, 2020

This indeed would be useful to have for all orgs where a focus on data quality is of the utmost importance. AFAIK this is a hard problem to solve on the people / processes side of things (as you mention) and not something that has been done before for DATA unit testing.

0 replies

boxysean · 2020-09-28T22:24:12Z

boxysean
Sep 28, 2020
Collaborator

Happy to find this issue! This is also an enhancement that would be useful to my team. I think this type of testing falls outside of the two existing kinds of dbt tests: schema tests and data tests.

I've implemented a form of unit testing in my company's codebase. It currently executes via pytest to test pl/pgsql transformations on Postgres, but I think the technique could be adapted to other databases and dbt.

Implementation sketch

My test suite folder looks like this:

tests
  -> orders
      -> 000_basic_order                     # the name of the unit test
          README.md                          # further details about the unit test
          -> input                           # setup pre-conditions with SQL and data from YAML files
              ddl.sql                        # creates tables
              source_order_1000.yml          # data to populate the tables with
              source_order_2000.yml          # more data to populate the tables with
          -> expected_output                 # assertions to make after executing dbt run
              reporting.dim_order.yml        # records that must exist in table reporting.dim_order
              reporting.dim_order_item.yml   # records that must exist in table reporting.dim_order_item
       -> 001_reprocess_orders               # another unit test
           ...
       -> 002_customer_pickup_order
           ...
       ...

(selected sample files)

The algorithm looks as follows:

For each unit test case folder,
a. Run SQL files under the input directory
b. Load static data from YAML files under the input directory
c. Execute SQL transforms
d. For each YAML file under the expected_output directory (which must be named after a table in the database), assert that each record exists and matches the YAML record.

The approach is probably similar to what @MichelleArk has reported with CSVs. These tests are cumbersome to setup, and haven't been able to convince my team to do this kind of testing yet. :-)

I see that Dataform has unit testing. I guess one advantage of their implementation is that they are generating the test dataset in the database. Since I am defining the data in YAML, there could be issues translating data types from YAML into the database under test.

0 replies

dm03514 · 2021-02-22T00:21:28Z

dm03514
Feb 22, 2021

Hello! I would love to see this feature as part of dbt core. I created a small
MVP that shows an approach that allows end users to write their tests in python using pythons built in unit testing framework.

https://discourse.getdbt.com/t/dbt-model-think-unit-tests-poc/2160

One of the core design constraints was the ability to exercise models one at a time. This means that The framework needed to provide some mechanism for stubbing our ref/source.

the approach I took with the mvp listed above was to namespace the stubbed tables with a prefix, which is set as an environmental variable. The following describes the logical steps the mvp test harness takes to stub out ref/source and provide test defined data:

model test package provides overridden ref/source macros
dbt project installs the model test package
Dbt project defines A standard test_*.py unit test file
Dbt project subclasses a model test class And defines a standard python test function
Test creates a pandas dataframe for each ref/source that needs to be stubbed
Test calls into the model test function which will create the tables from the dataframe
The test then invokes the dbt model toggling on the mode test framework which will rewrite the ref/source to look at the tables generated from the pandas dataframe
The model is effectively executed against the stub data

this allows very focused model (“unit”) tests. Tests configure a couple of rows of stub data, exercise the model, and then assert on the output using a python dataframe. This allows for targeted, fast testing of model transformation code.

I’m most likely going to move forward with this approach at dayjob.

If anyone is interesting it should be relatively easy to convert this python approach to a “configuration” yaml approach.

I would love to hear your thoughts.

0 replies

jmriego · 2021-03-01T15:01:15Z

jmriego
Mar 1, 2021

Hi!

I wanted continuing the conversation from #2740
I had not time to work on this but we are prioritizing this again. I'll paste the same comment I wrote over there:

I have been playing around with a way to automate this and I have a working concept here: https://github.com/jmriego/dbt-bdd

The tests are run with behave which is a library for BDD testing which in my opinion is a great fit for DBT as it makes the tests easy to understand by analysts the same way it already does for ELT.
Here's an example unit test for a model call fom that counts number of days in a month:

Scenario: run a sample unit test
Given calendar is loaded with this data
| dt | first_day_of_month |
| 2020-01-01 | 2020-01-01 |
| 2020-01-02 | 2020-01-01 |
| 2020-02-02 | 2020-02-01 |
When we run the load for fom
Then the results of the model are
| first_day_of_month | count_days |
| 2020-01-01 | 2 |
| 2020-02-01 | 1 |
The main trick is that the phrase "calendar is loaded with this data" will generate a seed file with that info and with a name specific to this test that it generates automatically. It does that with a scenario id prefix (i.e. abcd124_calendar).

Then, it will replace all ref to calendar with abcd124_calendar. This is really the main concept and I didn't find a better solution, but it does so by passing to dbt a var with the following key and value: {calendar: abcd124}. The code that detects the reference is here: https://github.com/jmriego/dbt-bdd/blob/master/macros/ref.sql
I played with the idea of just renaming the alias of the models, but then there's a conflict with the seeds created while setting up the test case

I'm seeing @dm03514 you also created something similar but with pytest

0 replies

reubster · 2021-05-13T18:56:00Z

reubster
May 13, 2021

We've been experimenting with unit tests. We've decided to (probably) use SQL mocks rather than seeds because they're faster.

For any given model we have a __source.sql/s and a __expected.sql.
We run equality / other tests against the __expected
The source for the real model is controlled by a customised ref function.. which is driven by variables / environment settings.
Selector tags group these into a set for easy invocation.

It took me a while to come to realise that there's a fundamental paradox; either I have to deploy the model and change it's sources... or I have to have different "versions" of the model itself pointing at different sources... because the source needs to be instantiated and the model deployed before it can be tested. Ideally, though this would be easier to control with config.

0 replies

jmriego · 2021-05-19T19:45:54Z

jmriego
May 19, 2021

Hi @reubster! How are you creating those SQL mocks? Do you mean that people writing the tests need to create fake source models and expected values with a SQL query similar to this?

SELECT 1 id, 'A' name, 100 value UNION ALL
SELECT 2 id, 'B' name, 200 value

0 replies

Zatte · 2021-06-15T11:11:55Z

Zatte
Jun 15, 2021

Getting (static) test data into the database is something that will be test dependent; sometimes you want small test data and then writing SQL that mocks the data is doable (select ... union all select...), medium data sets fit well inside yaml files and large files can be provisioned with regular dbt tooling practices.

Since all of them have different approaches / tooling I would rather see a solution to unit testing where we can get a model's sql code (parsed) but where refs and sources (potentially variables as well) can be overridden in a test local scope. Assume such a macro exists (where we can get the compiled sql code) and that it is called cte then a unit test could look like this: (note how it requires no changes to the model code and there are no new concepts are introduced to allow testing).

Contrived example
models/addder.sql

    SELECT 
      *,
      a+b AS sum
    FROM {{ ref('some_other_model') }}

/tests/adder_001.sql

{% set model_to_test = cte('model.addder')
      | replace(ref('some_other_model'), "mocked_src1") 
%}

WITH
mocked_src1 AS (
  SELECT
    *
  FROM
  -- bigquery construct; same effect as union all select ....
  UNNEST([
     struct(1 as a,  2 as b,  3 as expectedSum),
     struct(2 as a,  2 as b,  4 as expectedSum),
     struct(0 as a,  0 as b,  0 as expectedSum),
     struct(1 as a, -2 as b, -1 as expectedSum),
   ])
),

final AS (
  {{ model_to_test }}
)

-- Some SQL assertion based on the specific mocked data.
SELECT * FROM final WHERE sum != expectedSum

This example doesn't have ref/source overrides but rather does a simple string replace but you get the general idea. How the data is sourced in mocked_src1 is up to the test designer to decide.

_ This is mostly a copy from this thread we're I've tried to get some feedback on this approach : https://discourse.getdbt.com/t/testing-with-fixed-data-set/564/9 _

0 replies

jmriego · 2021-06-17T15:59:27Z

jmriego
Jun 17, 2021

hi @Zatte ,

I really like that approach. It definitely feels more DBT-onic than what I was proposing. As you say, there might be multiple ways of filling data for testing depending on the size of the tests. Nothing stops the yaml I was proposing to generate these test sqls automatically so it's not even like these two approaches are exclusive.
What you are describing seems like a good way of getting the number one requirement for getting any type of unit test: a way of overriding a ref/source. Getting the compiled SQL in a macro seems like a good way.
It could even be enhanced to have a parameter with the overrides you want to make instead of doing the replace separately

0 replies

noel · 2021-06-25T11:47:32Z

noel
Jun 25, 2021

I see the preference is to have the mock data in some file in the repo. I am curious as to why not have a different database / schema for the mock data e.g. raw_mock_data and replace sources with that maybe using a var and adding a tag to these tests so you can include/exclude them on a given run

0 replies

Zatte · 2021-06-25T17:59:44Z

Zatte
Jun 25, 2021

/../ maybe using a var and adding a tag to these tests so you can include/exclude them on a given run /.../

I personally would like all tests to be able to run using just dbt test, at most it cli-args can be used to separate unit/integration tests just because they are orders of magnitude slower/faster to run.
The second you have (different) cli-args to run (different) tests you end up needing tooling (e.g. make files) to ensure all test suites are run; otherwise you will risk missing a test(s). To me this seems like an anti pattern. Ideally I would like dbt test to contain just about everything you need to run tests so no additional tooling is needed.

/../ not have a different database / schema for the mock data e.g. raw_mock_data and replace sources with that /.../

I think this approach can work in many situations but not all. If you can only swap out the schema then you are limited to swapping 1:1 between production/mock data. What if you want to test a model using different mocks and/or which depends on 2 or more tables (let's call them A, B); Testing with mocks A1, A2, B1, B2, B3 and combinations of these would be difficult.

0 replies

noel · 2021-06-25T18:13:03Z

noel
Jun 25, 2021

makes sense, thanks for clarifying.

0 replies

cdiniz · 2021-10-04T19:07:16Z

cdiniz
Oct 4, 2021

Hi

I'm doing unit tests in dbt with a couple of custom macro helpers, with a couple of trade-offs and not practical things.
A test looks like this:

{% set model_to_test = ref('covid_19_cases_per_day') %}

{% set input %}
insert into covid_19_ingestion(day, payload) values ('2021-05-05', '[{}]');
insert into covid_19_ingestion(day, payload) values ('2021-05-06','[{"newCases": 20}]');
{% endset %}

{% set expectations %}
select '2021-05-05'::timestamp as day, 0 as cases
union all
select '2021-05-06'::timestamp as day, 20 as cases
{% endset %}

{{ unit_test(model_to_test, input, expectations) }}

It looks good:

the data setup is a set of inserts into a table used by the model_to_test.
the model_to_test needs to be a view when running the tests (so I don't have to do a dbt run in the middle of the testing macro to update the model with the inserted data).

But it has a couple of flaws:

Imagine the model_to_test is using a model in a model chain. I need to insert the data on the root table and rely on the logic of the models in the middle till it reaches the model_to_test, because I can't mock properly what is being used on the model_to_test. @Zatte's pr is trying to achieve this mocking capability with another workaround by having the sql for the model.
Every time that I change the model_to_test, I need to re-run dbt run before dbt test to update the models.

So it's far from a perfect setup.

I was having a look at Dataform and they have the concept of unit tests as a feature. In Dataform, for each unit test, we need to defined the model that we want to test (as in my approach) and we need to always provide the input data for each model used by the model_to_test.

Rewriting my initial test in Dataform looks like this:

config {
  type: "test",
  dataset: "covid19_cases_per_day_dataform"
}

input "covid19" {
  SELECT '2021-05-05' as day, '[{}]' as payload UNION ALL
  SELECT '2021-05-06' as day, '[{"newCases": 20}]' as payload
}

select cast('2021-05-05' as timestamp) as day, 0 as cases
union all
select cast('2021-05-06' as timestamp) as day, 20 as cases

That's neat imo.
The test setup is close to test expectations and it's possible and mandatory to mock the underlying models.
Also the neatest part of this approach is the test execution, it doesn't use the database models, so there is no need to update the models before running the tests.

That being said, I was impressed with the Dataform approach, and I think an approach like that is the way to go for dbt. I think this is slightly hard to put on a PR by an 'outsider'. Could someone from the dbt team please share what is the road-map for unit tests?

0 replies

jtcohen6 · 2021-10-07T15:50:30Z

jtcohen6
Oct 7, 2021
Maintainer

Just chiming in to link a solid slack thread, prompted by the comment above. This is a topic I'm very interested in — and would be interested in revisiting, in earnest, next year

0 replies

cdiniz · 2021-10-19T15:34:09Z

cdiniz
Oct 19, 2021

Hi.
Following up my last comment, the intent of this update is to share our journey on unit tests, which can be valuable for others. I've been working with @psousa50 to improve the previous approach, and we reached a better approach.
In our current approach a unit tests looks like this:

{{
    config(
        tags=['unit-test'],
        model_under_test='covid19_cases_per_day'
    )
}}

{% set inputs %}
covid19_raw as (
select cast('2021-05-05' as date) as day, '[{}]' as payload
UNION ALL
select cast('2021-05-06' as date) as day, '[{"newCases":20}]' as payload
{% endset %}

{% set expectations %}
select cast('2021-05-05' as Date) as day, 0 as cases
UNION ALL
select cast('2021-05-06' as Date) as day, 20 as cases
{% endset %}
 
{{ unit_test(inputs, expectations) }}

The unit test is composed by 4 separated parts:

Configuration: using tags to mark the test as a unit-test, so we can execute it in a slightly different way
Inputs: a set of with clauses where the name of the clause needs to match the name of the source that we are 'mocking' and the name of the columns should match the actual columns.
Expectations: a set of rows defined by selects with the column names matching the model column names
Boilerplate: we need to execute the custom macro

Under the hood the unit_test macro constructs a big sql query which doesn't depend on models, just depends on the inputs. That being said we don't need to make dbt run, to refresh the models each time we want to test a new change, so the feedback loop is seconds.

We solved our main problem which was mocking the sources of a model, also we improved the feedback loop, anyway, we still have a couple of ideas to improve based on our customers feedback:

Right now there is the need to have all the columns when defining an input, and if the column is not important for test we can have a 'null' as column_x, but it's not doable/maintainable when we have a big number of columns. So we want to make easier to define the inputs and we'll try to find a solution to create the non defined columns as null automatically.

We'll share the custom macros in Equal Experts GitHub.

3 replies

cdiniz Dec 9, 2021

Hi everyone.
@psousa50 and I have been improving the approach described on the last message.
We ended up changing a lot our last approach for practical aspects like having a description for a test, or having multiple tests in one file.
Here is an example of our last approach:

{{
    config(
        tags=['unit-test']
    )
}}

{% call test('covid19_cases_per_day', 'empty payload') %}
  {% call mock_source('covid19_stg', 'covid19_stg') %}
    select CAST('2021-05-05' as date) as day, '[{}]' as payload
  {% endcall %}

  {% call expect() %}
    select cast('2021-05-05' as Date) as day, 0 as cases
  {% endcall %}
{% endcall %}
 
UNION ALL

{% call test('covid19_cases_per_day', 'extracting cases from payload') %}
  {% call mock_source('covid19_stg', 'covid19_stg') %}
    select CAST('2021-05-06' as date) as day, '[{"newCases": 20}]' as payload
  {% endcall %}

  {% call expect() %}
    select cast('2021-05-06' as Date) as day, 20 as cases
  {% endcall %}
{% endcall %}

There is more boilerplate but we improved the practicability.
Here is a blog post that covers the changes: https://www.equalexperts.com/blog/our-thinking/writing-unit-tests-for-dbt-with-tdd/
Here is a github repo with the macros: https://github.com/EqualExperts/dbt-unit-test-demo

At the moment the macros are implemented to BigQuery but they can be easily changed to another db.

DrValani Dec 17, 2021

Hello,

I've recently started using this macro, and can see huge potential. It has a long way to go and I can see lots of features and improvements but this is an amazing start.

I am actually doing test driving development with SQL 🤯. This is something I thought would never be possible, and a huge step in the way we work with SQL. Thank you.

Hopefully I can contribute too where possible.

louis-vines Feb 12, 2022

@cdiniz I came across your two blog posts last week and they seem to be the best approach I've found so far to fill the unit testing gap in dbt. Thanks a lot for sharing this! I haven't got as far as implementing the full approach for unit testing models you show in your blogs but have been inspired to take this approach for testing my macros.

One thing I've found useful is creating a from_records macro that converts a jinja list of dicts into sql using the snowflake flatten function under the hood. (the name from_records is the same as the class method in pandas used for creating dataframes from data with this structure). Like your approach this allows us to define our test data in our file (I also agree that defining test inputs in seeds is a not a good developer experience) but allows for defining our data in a way that looks more like a traditional python unit test setup.

Below is a pseudo-codey example of how writing these tests look. I'd be keen to hear what you think

{% set actual %}
    with input as (
        {{
            from_records([
                {'id': 1, 'value': 'foo'},
                {'id': 2, 'value': 'bar'},
            ])
        }}
    )
    select
        *,
        {{ fizz_buzz_macro('value') }}
    from input
{% endset %}

{% set expected %}
    {{
        from_records([
            {'id': 1, 'value': 'foo', 'new_column': 'fizz'},
            {'id': 2, 'value': 'bar', 'new_column': 'buzz'},
        ])
    }}
{% endset %}

{{ assert_equal(actual, expected) }}

I too would be keen to contribute more to this area if there is any way I can - this stuff looks really promising!

jtcohen6 · 2022-02-23T13:29:16Z

jtcohen6
Feb 23, 2022
Maintainer

Very glad to see the activity in this discussion continue! I've been thinking about this on and off for a while now. Thanks to @tommyh for taking the initiative by pulling some more thoughts out of me. This started as a DM; I figure it's better here.

Update on current state:

I haven't prioritized this work for dbt-core in the near future
I am in favor of dbt-core offering a standard, built-in way to achieve this — thus enabling end users to standardize on how SQL business logic is tested
I'm very open to new ideas + approaches, as well as refinements of the proposals presented above + below

Premises:

Unit testing should be fast, for an individual model / sub-DAG / whole DAG
Unit testing should prioritize SQL where possible. I'm not strictly opposed to Jinja/Python magic, especially if it offers needed glue, but we should maximize the logic that is reproducible and accessible to end users. I tend to prefer defining fixtures as database tables (seeds), or reusable SQL select statements with literal inputs, rather than Jinja dictionaries
Setting up unit testing on a model should seek to minimize both (a) duplication of code, and (b) scattering of logic across multiple files
Unit testing may want to be a new kind of workflow, not just an additional generic test defined on a model that executes during dbt test

I sense that a trade-off exists today between:

Compactness: the ability to define fixture inputs + outputs (expectations) for one model, right within that model's .sql file
Reusability: acknowledging that dbt models exist in dependency relation to one another, finding ways to reuse fixture inputs + outputs between models that ref one another
Flexibility: Does every model want to be unit-tested? Or just a subset of highly important ones (exposed to downstream tools, particularly complex transformation logic)
Speed: Can I run just the model I want to test? Or do I need to run other models in the DAG?
Modularity: Is this really unit-testing one model?

I've seen three basic approaches presented, here and elsewhere:

Unit test one model, with inputs/outputs defined as in-file static inputs (SQL unions or Jinja dictionaries) or seeds. This is what I see most folks proposing above. It does a good job of optimizing for most of the qualities above, but with a serious loss of reusability. I'm not making use of the DAG at all, and as soon as I want to unit-test two adjacent models, I'm defining the same basic fixtures over again.
Mock sources, run the DAG as normal. Point sources to fixture inputs (via vars, custom source macro, etc). For specific models, define expected results as a fixture. Run every model in DAG order (like normal). Define tests that check equality between each "final" model output, and fixture of expected results. This approach optimizes for reusability + flexibility (which models you test), at the expense of compactness + speed + modularity. It's not really unit testing, more like the CI testing pattern where you use sampled or phony data instead of the full source data used in production.
Mock outputs for your whole DAG. Define fixture data (seeds) representing the expected input sources, and the expected output of every single model. Use a custom ref macro that repoints each downstream model's references to instead select from the "expected output" fixture for each upstream model. Run all your fixtures (seeds), and then run every single model in parallel, with lots of threads. (Maybe these queries could even be batched, into one bigger query?) The point is, the models don't need to be run in DAG order, because they only depend on fixtures, and not on each other.

I find myself drawn to option 3, even though it has some serious shortcomings. This approach optimizes for reusability, modularity, and speed. Given the need for seeds, it loses out on compactness—maybe we can find a way to stick both a seed + a model in the same file? It's also not very flexible, if it requires you to define fixture expectations for every single model—not to mention, the significant work of maintaining those fixtures going forward—that is, unless we could support a cleverer custom ref(), that somehow checks for the existence of a fixture at parse time, uses it if available, and otherwise switches back to the real upstream model (including DAG dependency) if one isn't defined.

Those approaches are all (or mostly) possible today with dbt as it exists today. Even if the implementation isn't ideal, I think it's enough to bake them out, and decide which is worthy of standardization. When it's time to build this into dbt-core proper, we'll be able to rewrite gnarly Jinja "glue" with Python, and ensure that the task/materialization logic can refine each of the desired qualities.

One last thing: To reiterate what I said in #4707 (comment), I think unit testing model logic (SQL) and macro logic (DDL/DML/functional behavior) ought to be two different endeavors. (That's not a perfect division: there are many macros that do little more than DRY up "model logic," and fall under the purview of this initiative. I'm talking about the macros you look at and say, "This is Jinja?") I believe folks building products on top of dbt, such as package + plugin maintainers, need access to a more rigorous testing framework. I also believe that framework cold be the very same as the one we're (re)building for ourselves, to better test dbt-core and the adapter plugins that we maintain. It may end up looking like a suite of pytest plugins with dbt-specific utilities.

0 replies

cdiniz · 2022-02-23T16:45:32Z

cdiniz
Feb 23, 2022

Hi everyone!

We've been improving our previous mentioned approach and we've transformed our sample project into a dbt package that can be reused across multiple projects: https://github.com/EqualExperts/dbt-unit-testing

We've received feedback on the dbt slack and we introduced a couple of changes:

Every test is now self-contained in a CTE: it means that you don't need to have models defined in the database to test them. You can just change the model, change the test and run dbt test. For us this was crucial, and it allow us to have a much faster feedback loop.
You can now define your input as SQL or as a CSV format if you will (allows to reduce the boilerplate):

{% call dbt_unit_testing.test('covid19_cases_per_day') %}
  {% call dbt_unit_testing.mock_source('dbt_unit_testing_staging', 'covid19_stg', {"input_format": "csv"}) %}
    day::Date, payload
    '2021-05-05', '[{}]'
    '2021-05-06', '[{"newCases": 20}]'
  {% endcall %}

  {% call dbt_unit_testing.expect({"input_format": "csv"}) %}
    day::Date, cases
    '2021-05-05', 0
    '2021-05-06', 20
  {% endcall %}
{% endcall %}

@jtcohen6 Thank you for your insightful feedback.

Unit test one model, with inputs/outputs defined as in-file static inputs (SQL unions or Jinja dictionaries) or seeds. This is what I see most folks proposing above. It does a good job of optimizing for most of the qualities above, but with a serious loss of reusability. I'm not making use of the DAG at all, and as soon as I want to unit-test two adjacent models, I'm defining the same basic fixtures over again.

We are definitely in this space. I understand your concern about reusability. Although, since we are doing unit tests, even if you are testing adjacent models you will probably test different behaviours, and for different behaviours, the test setup might be different.
So I'm not sure if reusability is a big factor due to all the benefits of this approach. Anyway, before all this journey we're using approach 3, and it was really hard to do proper unit tests and to maintain, basically, we were not doing unit tests, we were doing integration tests, which is not a good scenario to test the model logic.

I believe folks building products on top of dbt, such as package + plugin maintainers, need access to a more rigorous testing framework. I also believe that framework cold be the very same as the one we're (re)building for ourselves, to better test dbt-core and the adapter plugins that we maintain. It may end up looking like a suite of pytest plugins with dbt-specific utilities

Absolutely, really appreciate it!

0 replies

louis-vines · 2022-02-24T12:45:15Z

louis-vines
Feb 24, 2022

I have to agree with @cdiniz here... Taking apporach 1 from above here seems to be the option that will provide the best unit testing capabilities. Again, in agreement with @cdiniz, the other two approaches aren't really unit tests - I think the clue is in the fact that approach 1 is the only one that doesn't need the DAG to run, i.e. you are testing one model as if it were a function in isolation.

If I understand correctly, approach 3 will result in you only being allowed one mock output per model to act as an input to downstream tests? This means that all of our tests that share an upstream model become coupled together. Thinking about this conversely it means you have to maintain one master test dataset for each model that will cover all tests for all children of this model. This feels brittle.

For example... I've always been told when you discover a bug you should write a test to ellicit the buggy behaviour and then fix the source code. In this scenario if I find a bug in model A's logic, I now have to tweak the input fixtures to model A in such a way as to ellicit the buggy behaviour. This could then break any number of tests to models downstream of model A's parents (when the changes were to elicit behaviour in model A, which the other breaking tests may not even be dependant on). This sounds like a very high blast radius on my test suite for a bug fix.

Furthermore, It's very common to write many self contained tests for the same class/function with slightly different inputs to isolate the behaviour for each individual test case. This to me is a fundamental requirement for writing properly focused/isolated unit tests and only seems possible with approach 1.

Perhaps some sort of hybrid of approach 1 and 3 could be thought about? Make it straight forward to write proper unit tests with fully isolated inputs and the ability to define many inputs/expectations in a single file, but also have a clean and easy mechanism to share fixtures between test files (e.g. like conftest.py files in pytest).

0 replies

jtcohen6 · 2022-05-17T10:12:08Z

jtcohen6
May 17, 2022
Maintainer

We released our pytest framework for dbt-core in v1.1. There are docs here, which specifically address the use case of adapter maintainers: https://docs.getdbt.com/docs/contributing/testing-a-new-adapter

You may notice, in reading those docs, that the entire first section is just about mocking dbt projects and sequences; it isn't specific to adapter plugins at all. It's totally possible to use this framework outside the context of developing an adapter. Really, you can use it anywhere you can install dbt-core and pytest together, and run some pytest commands. (So, unfortunately, not dbt Cloud today—but not never, I think.)

The appeal of the framework is it's very easy to define multiple fixtures (mocked project resources) in one file, to reuse them across files / test cases, and to run test cases against multiple database adapters. We just did this for macros over in dbt-utils: dbt-labs/dbt-utils#577

So, what about unit testing models? A dbt SQL model is a kind of function: its inputs are refs + sources, its output is a tabular dataset. If all three of those can be mocked, then we can write a "unit" test. (It still requires running dbt, and a database connection—but if you've written your model to be cross-database compatible, you could conceivably run its tests against Postgres / DuckDB / SQLite instead of your actual database.)

Example code

Imagine I've got a "complex" model, selecting from another model and a source table, and then unioning them together:

-- models/complex_model.sql
select count(*) as num from {{ source('population', 'persons') }}
union all
select count(*) as num from {{ ref('stg_persons') }}

I want to, given some mocked versions of the stg_persons model and the population.persons source, validate that the dataset returned by complex_model matches my expectations of that model.

The first two files below are just one-time setup, reusable by each unit test case. The final file is where I mock and run complex_model in particular.

### tests/conftest.py
import pytest

# Import the standard functional fixtures as a plugin
pytest_plugins = ["dbt.tests.fixtures.project"]

# The profile dictionary, used to write out profiles.yml
@pytest.fixture(scope="class")
def dbt_profile_target():
    return {
        'type': 'postgres',
        'threads': 1,
        'host': "localhost",
        'port': 5432,
        'user': ...,
        'pass': ...,
        'dbname': ...,
    }

### tests/base_unit.py

import pytest
from dbt.tests.util import run_dbt, check_relations_equal

# This is pretty tricky Jinja, but the idea is just to "override" ref/source by repointing
# to the mocked seeds/models defined in the test case. The mapping is handled by
# 'mock_ref()' and 'mock_source()' methods defined on the test case

mock_ref_source = """
{{% macro ref(ref_name) %}}
    {{% set mock_ref = {} %}}
    {{% set mock_name = mock_ref.get('ref_name', ref_name) %}}
    {{% do return(builtins.ref(mock_name)) %}}
{{% endmacro %}}

{{% macro source(source_name, table_name) %}}
    {{% set lookup_name = source_name ~ '__' ~ table_name %}}
    {{% set mock_src = {} %}}
    {{% set mock_name = mock_src[lookup_name] %}}
    {{% do return(builtins.ref(mock_name)) %}}
{{% endmacro %}}
"""

# this isn't a test itself, it's just the "base case" for actual tests to inherit
class BaseUnitTestModel:
    def actual(self):
        return "actual"
    
    def expected(self):
        return "expected"
    
    def mock_ref(self):
        return {}

    def mock_source(self):
        return {}
    
    @pytest.fixture(scope="class")
    def macros(self):
        return {
            "overrides.sql": mock_ref_source.format(str(self.mock_ref()), str(self.mock_source()))
        }

    # The actual sequence of dbt commands and assertions
    # pytest will take care of all "setup" + "teardown"
    def test_mock_run_and_check(self, project):
        run_dbt(["build"])
        # this runs a pretty fancy query to validate: same columns, same types, same row values
        check_relations_equal(project.adapter, [self.actual(), self.expected()])

Now that the setup is done, I can define just the fixtures I need for unit-testing my "complex" model:

I'll read the code for the complex_model from its file location in my project (models/complex_model.sql)
Everything else is a Python f-string, written in either CSV format (→ seed) or as a SQL query (→ model)
If any of these fixtures is shared with another model's unit test, I could define it in a separate file and import / reuse it across both places. For instance, the mocked CSV input standing in for stg_persons could also be the expected output, when unit testing the stg_persons model.
Finally, I define a test class that inherits from BaseUnitTestModel, and reimplement just the pieces I need

### tests/test_unit_test_complex_model.py

import pytest
from dbt.tests.util import read_file
from tests.base_unit import BaseUnitTestModel

# Define mocks via CSV (seeds) or SQL (models)
mock_stg_persons_csv = """id,name,some_date
1,Easton,1981-05-20T06:46:51
2,Lillian,1978-09-03T18:10:33
""".lstrip()

mock_source_population_persons = """
select 1 as id, 'Easton' as name, '1981-05-20T06:46:51' as some_date
union all
select 2 as id, 'Lillian' as name, '1978-09-03T18:10:33' as some_date
"""

expected_csv = """num
2
2
""".lstrip()

actual = read_file('models/complex_model.sql')

class TestUnitTestComplexModel(BaseUnitTestModel):

    # everything that goes in the "seeds" directory (= CSV format)
    @pytest.fixture(scope="class")
    def seeds(self):
        return {
            "stg_persons.csv": mock_stg_persons_csv,
            "expected.csv": expected_csv,
        }

    # everything that goes in the "models" directory (= SQL)
    @pytest.fixture(scope="class")
    def models(self):
        return {
            "source_population_persons.sql": mock_source_population_persons,
            "actual.sql": actual,
        }

    # repoint 'source()' calls to mocks (seeds or models)
    def mock_source(self):
        return {
            "population__persons": "source_population_persons",
        }

    # not necessary, since the mocked model has the same name, but here for illustration
    def mock_ref(self):
        return {
            "stg_persons": "stg_persons",
        }

And:

$ python3 -m pytest tests/test_unit_test_specific_model.py
======================================== test session starts ========================================
platform darwin -- Python 3.9.12, pytest-7.0.1, pluggy-1.0.0
rootdir: /Users/jerco/dev/scratch/testy, configfile: pytest.ini
plugins: xdist-2.5.0, forked-1.4.0, csv-3.0.0, logbook-1.2.0, flaky-3.7.0, dotenv-0.5.2, cov-3.0.0
collected 1 item

tests/test_unit_test_specific_model.py .                                                      [100%]

========================================= 1 passed in 1.18s =========================================

Is this the thing?

I don't know. It feels close, but the need to run pytest instead of native dbt commands does feel like a limitation in terms of who can use this. Some folks in this thread have gone to great (and quite impressive) ends to build an all-in-Jinja approach to achieve very similar aims: https://github.com/EqualExperts/dbt-unit-testing

I wrote a bit about this in the Core roadmap a few days ago:

Is the new pytest-based framework actually the dbt feature called "unit testing" in disguise? Funny you asked—no, but I do think it's a step on the way there. It's possible to use this functional testing framework to quickly spin up "projects," with fixture inputs and outputs, that check the behavior of a macro. We can test that macro for consistency across many databases. (Check this out in that most complex project of them all, dbt-labs/dbt-utils#588.) The dream is indeed to expose a dbt-code-only version of this workflow, along the lines of what people have been cooking up in #4455.

4 replies

louis-vines May 18, 2022

@jtcohen6 thanks for the indepth response - I will take a look at this when I have time over the coming days...

jtcohen6 May 30, 2022
Maintainer

The obvious piece missing from the example shown above: the ability to unit-test models that call macros defined in the project (or in packages). Those could be defined as fixtures (like the input data and expected output), or read in from project files (macros/my_complex_macro.sql), in the same way that the SQL for complex_model is read in from the project file. I'm sure there's a way to do this automatically, and load the full macros/ directory + all packages each time, if desired.

An alternative approach would be to compile the model as normal, then load the compiled SQL and swap out the compiled ref/source for the fixtures instead. I don't like that approach nearly as much.

There's a trade-off here (as everywhere) between a unit testing framework that "just works" by loading in full project context, and one that limits the set of things actually being tested, by requiring that those things be defined as explicit inputs each time.

louis-vines May 30, 2022

@jtcohen6 coincidentally I was playing around with your suggestion above just as you posted your latest reply. I have a few bits of feedback from trying ou the method you are proposing:

Running run_dbt(["build"]) for every test is pretty slow (> 5 seconds per test). I couldn't work out a way to cache the dag parsing between tests: we get Unable to do partial parsing because profile has changed for each test and this is creating a pretty high overhead.
When a test does fail we get this message:
AssertionError: Got 1 different rows between DEV_CEREBRO.test16539256090777326094_test_complex_model.actual and DEV_CEREBRO.test16539256090777326094_test_complex_model.expected which isn't informative enough to workout what's different between the expected and actual
The captured output for each test is V noisy! I'm getting a line:
DEBUG configured_file:functions.py:235 15:46:49.092257 [debug] [MainThread]: Parsing macros/catalog.sql
for every macro in my project on every test. I'm sure there will be a way of tweaking my log level to not show this but it's still quite intimidating at the moment

But I can definitely see potential in this...

boxysean May 31, 2022
Collaborator

One pro I see with the pytest based approach, compared to most other approaches mentioned in this discussion, is the ability to use more assertion statements easily expressed with python, pytest, and other pytest extensions -- as compared to SQL and dbt tests. Not every test must be an equality test!

bashyroger · 2022-05-18T10:42:31Z

bashyroger
May 18, 2022

Using @Zatte 's approach as a start and then extending it, this is how I could see unit tests being implemented @jtcohen6.
Why like this?
As you mention, 'the need to run pytest instead of native dbt commands does feel like a limitation in terms of who can use this' .

IMHO, by basically 'borrowing' functionality from dbt macro's, seeds and singular tests. we should be able to get quite far.
Foremost, I would like to have the flexibility to either implement the unit test in the model itself OR to put it in a 'child' model as in @Zatte's example. Also, it is possible to enable / disable unit tests this way, only add them to models that require it and only run them instead of the model

My modified version of his example then boils down to 'something like this':

models/foobar.sql

with foo as (
    select * from {{ ref('model_foo') }}
    where 1=1
)
with foo as (
    select * from {{ ref('model_bar') }}
    where 1=1
)
select 
    foo.col1 as foo_col1,
    foo.col2 as foo_col2,
    foo.col3 as foo_col3,
    bar.col2 as bar_col2
  from foo
  inner join bar using (col1)

--A unit test could be defined 'somewhat like a macro'
{% unittest my_foobar_test %}

    --a unit test could accept config flags like singular tests do + two new param to enable / disable unit testing and choosing to run then as pre or post hook
    {{
        config(
	    unit_test="enabled"    --or disabled
	    run_as="post-hook",   --or pre-hook
            tags=["sometag"],
            error_if = '>0'
        )
    }}
    
    --either create unittest data as a dbt seed
    {% set testdata ref('model_foo') %}
        col1, col2, col3
        1, 10, 'like'
        2, 20, 'a'
    `	3, 30 , 'dbt seed'
    {% endset %}

     --and / or create unittest data by running any SELECT query
    {% set testdata ref('model_bar') %}
        SELECT  1 as col1, 10 as col2
        UNION ALL
        SELECT 2, 20
        UNION ALL
        SELECT 3, 10
        UNION ALL
        SELECT 3, 20
    {% endset %}
	
   --the assertion would then be like a writing a singular test, by embedding it in the model itself {{this}} can be used
   --if the test is being put into a 'child' model then {{ref}} should be used instead of {{this}}
    {% set testassert %}
	SELECT sum(foo_col2),sum(bar_col2) FROM {{this}} 
	WHERE 1=1 
	HAVING sum(foo_col2)!=sum(bar_col2)
    {% endset %}

--and a command to actually run the test _(though this might actually not be needed)_
{{ run_unittest(my_foobar_test) }}

{% endunittest %}

0 replies

JCZuurmond · 2022-06-01T12:15:41Z

JCZuurmond
Jun 1, 2022

Hi all 👋 ,

I released a pytest plugin to unit test dbt macros. It's on PyPi: pip install pytest-dbt-core. Here two examples on how to use it:

import pytest
from dbt.clients.jinja import MacroGenerator
from pyspark.sql import SparkSession

@pytest.mark.parametrize(
    "macro_generator", ["macro.spark_utils.get_tables"], indirect=True
)
def test_create_table(
    spark_session: SparkSession, macro_generator: MacroGenerator
) -> None:
    expected_table = "default.example"
    spark_session.sql(f"CREATE TABLE {expected_table} (id int) USING parquet")
    tables = macro_generator()
    assert tables == [expected_table]

And:

import pytest
from dbt.clients.jinja import MacroGenerator
from pyspark.sql import SparkSession

@pytest.mark.parametrize(
    "macro_generator",
    ["macro.my_project.to_cents"],
    indirect=True,
)
def test_dollar_to_cents(
    spark_session: SparkSession, macro_generator: MacroGenerator
) -> None:
    expected = spark_session.createDataFrame([{"cents": 1000}])
    to_cents = macro_generator("price")
    out = spark_session.sql(
        "with data AS (SELECT 10 AS price) "
        f"SELECT cast({to_cents} AS bigint) AS cents FROM data"
    )
    assert out.collect() == expected.collect()

The examples use dbt-spark but the package is adapter agnostic.

Some useful links:

This approach expects you to know Python and pytest, as mentioned above, this is a limitation. I would like to see how more technical users would like to unit tests their dbt SQL. Maybe some patterns become apparent that could be used for designing a SQL + jinja approach to unit testing.

To limit the scope of the project, you can only unit tests macros for now. I felt that macros where the simplest to start with. There is an issue on unit testing models. If you have input on this, please add it to that issue!

I am curious to see if some of you find this package useful! If you have feedback, you can create an issue - or PR - on the project. Also, you find me in the dbt slack as "Cor (GoDataDriven)".

0 replies

louis-vines · 2022-09-15T09:57:29Z

louis-vines
Sep 15, 2022

@jtcohen6 I'm just checking it to ask if there have been any developments in this area and whether adding unit-testing capabilities is on the roadmap??

3 replies

jtcohen6 Sep 20, 2022
Maintainer

Hey @louis-vines! The answer is no & yes. No in the sense that this isn't an explicit item on our roadmap, but yes in the sense that it's still very much on my mind, and it comes up in lots of other items we've been working on.

Python models

Our priority over the past several months has been preparing dbt Python models for launch. I expect users will be even-more-eager for unit testing capabilities when they're writing transformation via Python functions.

Niall started a good thread about this early in the beta period. If you write your transformation code as pure Python functions or pandas operations, it would be super easy + fast to also write + run unit tests for them locally that import those functions and run them over fixture inputs.

There isn't a native dbt command to do this yet, though, because dbt doesn't actually execute your Python code at all—it actually ships it up to run in Snowflake, Databricks, Dataproc, etc—which is understandably slower (even slower than shipping up SQL to run remotely). I've come to appreciate that one of the essential requirements for unit testing is that it's fast, so it really needs to run locally.

Perhaps DuckDB could be an answer here? You'd need to be able to trust that the same functions will run locally and remotely. So long as you're writing pure Python functions, pandas API operations, or using a cross-engine framework like ibis, which rigorously tests backend compatibility on your behalf so that you can focus on unit-testing your actual business logic. Related discussion here: #5738

Refactoring dbt-core internals -> faster test suites?

You called out some really fair points a few months ago, when thinking about whether dbt-core's new functional testing framework (built on pytest) could have a role to play here: #4455 (reply in thread)

Again, I think the biggest limitation is around speed. (We can reword error messages to be more verbose, and think about ways to filter down logs in test environments.) There are certain limits of physics here—functional tests against Postgres or DuckDB running locally will always feel faster than queries that need to be shipped off to a remote data warehouse—but there are also things we can definitely fix. Today, dbt-core requires file access for everything it does: it reads dbt_project.yml, profiles.yml, other config files, and your project files. Even though the pytest framework has you defining credentials in a Python dictionary, project config in a function, and project files as strings, what actually happens is these contents are written out to temporary files and then loaded back into dbt-core. That adds up, especially when repeated hundreds or thousands of times.

We're picking up work now to "API-ify" internals of dbt-core, and provide cleaner interfaces into some high-level pieces of its functionality. I expect that work to ultimately enable us to refactor how we set up test fixtures. In the medium term, this should speed up tests (including our own!). In the longer run, it could also make it easier to instantiate or mock dbt internal classes (ParsedModel, ParsedMacro, etc). Skip parsing entirely, jump straight to the querying, and find out even more quickly if your model or macro is returning the results you expect.

louis-vines Oct 5, 2022

Thanks for getting back to me and sorry for the slow response myself! I think we're agreed that the main issue here is the speed side of things. From playing around dbt-core's functional testing framework (#4455 (reply in thread)) I was struck by the fact that every test invocation did a full dbt run, which means it parses the whole dag from scratch for each test.

In the longer run, it could also make it easier to instantiate or mock dbt internal classes (ParsedModel, ParsedMacro, etc). Skip parsing entirely, jump straight to the querying, and find out even more quickly if your model or macro is returning the results you expect.

👏 this is exactly what is needed - essentially I feel like the overhead of parsing the dag from scratch for every test is a killer and if this could be bypassed then writing tests against (mocked) models could be workable in this way.

louis-vines Oct 21, 2022

@jtcohen6 I recently came across the dbt-rpc functionality. I'm still trying to wrap my head around exactly how it works, but... will this likely help me in my quest to be able to run smaller parts of the overall dag for testing purposes without having to recompile every time?

dgokeeffe · 2022-12-23T23:59:07Z

dgokeeffe
Dec 23, 2022

Hey everyone,

I've been trying to integrate dbt with Pytest but I'm struggling to get it working. Have we got any new material or examples on how to do this now?

Thanks,

4 replies

JCZuurmond Jan 18, 2023

Hi @dgokeeffe, did you read my post above? If you are using the pytest-dbt-core package, I can maybe help; please let me know what material and examples you are looking for. You can find me on the dbt slack or create an issue on the pytest-dbt-core project.

dgokeeffe Feb 7, 2023

Hey @JCZuurmond - yes I did, but unfortunately I couldn't get it to work out the box like I desired.

The main idea is to simply seed sources using generated data defined as code, then run quick unit tests on only a fraction of the transformations.

JCZuurmond Mar 1, 2023

Unfortunately, you did not get it to work out of the box. Could you create an issue on the GitHub project for any problem you encountered?

The "philosophy" behind pytest-dbt-core is to support unit testing for your dbt project. It currently supports unit tests for macros only because that was the easiest to implement, and unit testing macros is similar to unit testing functions. It might be extended to unit testing for models, though I am looking for feedback from the community first.

If you want to test based on data from seeds, I guess another tool is a better option for you.

louis-vines Mar 3, 2023

The issue with all of this dbt doesn't allow for defining input data into your models for tests on the fly which is a fundamental requirement for unit tests. I.e. what we want to set up is: "when I pass data x into my model the output is y" - in this statement there is no ability to define x on the fly to pass in to a model.

Furthermore, the way dbt is setup, you definitely don't want to have to re-compile the graph for each test to change the data as this is expensive! I was mulling over ways of acheiving this last year which will likely involve dbt server and/or dbt-osmosis. Unfortunately I started writing some code to acheive this but then got pulled into a machine learning project and haven't been working with dbt since so got side tracked.

I am back in dbt land now though so am going to start thinking about this again over the coming weeks I'm sure

darist · 2023-03-16T18:08:27Z

darist
Mar 16, 2023

I've been using EqualExperts/dbt-unit-testing (thanks! @cdiniz 🏅) to great delight. I find it to be a very good approach for testing models.

(Sadly) I've been writing a lot of business logic in Jinja macros. The desire to unit test macros and enable TTD took me down this path, resulting in this proposal to add macro_test and mock_macro to dbt-unit-testing: EqualExperts/dbt-unit-testing#123

Writing unit tests for macros could feel something like this, completely within dbt, no Python, doable from dbt Cloud.

{% call(t) dbt_unit_testing.macro_test('test title') %}
    {% do t.mock(dbt_unit_testing, 'sanitize', return_value='mocked value') %}
    {{ t.assert_equal(
        macro_that_depends_on_sanitize('foo'), 'sanitized returned: mocked value') }}
{% endcall %}

UNION ALL

{% call(t) dbt_unit_testing.macro_test_with_t('another test') %}
...

Working examples and implementation ideas in EqualExperts/dbt-unit-testing#122

0 replies

eliasgeee · 2023-05-05T16:15:40Z

eliasgeee
May 5, 2023

We decided to work based on the Shopify example: https://www.youtube.com/watch?v=dlFYP7EJiUU&t=29s. Previously I was also just doing a select from some dummy data while developing and when finished replacing it with the actual ref so that made the most sense.

In the transition from mainly C# to Dbt I kind of missed the way that simple software unit tests work where you can test cases that don't exist in the data yet. And in essence an sql query is nothing more then a function that can be mocked.

For me going the Python way outside of Dbt is the closest thing to readable tests. If we were to implement a system with macros or a thousand csv files just for testing it would get out of hand real quick and convoluted. Also the whole adapter thingy felt a little bit too much.

An starting point of this here: (implemented for Snowflake, using pytest). Might need some additions for other databases or specific use cases I have not encountered yet. Main goal is readability of tests and the options to use things like Faker to create fake data easily.
https://github.com/eliasgeee/dbt-unit-test

`
def test_double_events_are_filtered_out():
email_id = 'Email to Jef'

# Snowflake variants: dictionary maps to object and array to array
event_content = { 'correlation_id': 1, 'properties' : [ { 'data_origin': 'Service.Mail', 'email_provider': 'SendGrid' } ] }

email_events_table_mock = MockedTable.from_dict('stg_email_events',
        SnowflakeRowMock(email_status = 'sent', event_timestamp = datetime.datetime.now() - datetime.timedelta(hours=8), email_id=email_id,
                         event_content=event_content),
        SnowflakeRowMock(email_status = 'sent', event_timestamp = datetime.datetime.now() - datetime.timedelta(hours=8), email_id=email_id,
                         event_content=event_content),
        SnowflakeRowMock(email_status = 'answered', event_timestamp = datetime.datetime.now() - datetime.timedelta(hours=8), email_id=email_id,
                         event_content=event_content)
)

actual = MockedTable.from_mocks('fct_email_events', DbtRunType.FULLREFRESH, email_events_table_mock)

# Assert that the actual result has the expected number of rows
assert len(actual.df) == 2

# Check that only these events are in the event_types
assert set(actual.df['event_type']) == {'send', 'answered'}

# Check that both events for first mail and reminder are generated
assert 'SendGrid' in actual.df['event_content']
assert 'Service.Mail' in actual.df['event_content']`

0 replies

MichelleArk · 2023-08-01T23:34:59Z

MichelleArk
Aug 1, 2023
Maintainer Author

Hello, it’s me again! It may have taken a few years, but I’m so excited to be revisiting this discussion and share some updated thinking on the problem on behalf of the dbt-core team. We’re going to start tackling this problem in the next month (!), and have opened a new discussion here in order to keep feedback more organized and digestible.

The new discussion outlines our proposal for a unit testing framework native to dbt, and we’d love to hear feedback or suggestions on it from all the testing enthusiasts who have participated in this discussion, and in the broader discussions about unit testing in the modern data stack over the past few years.

0 replies

Unit Testing SQL in dbt #4455

MichelleArk Apr 23, 2020 Maintainer

Describe the feature

Describe alternatives you've considered

Additional context

Who will this benefit?

Replies: 25 comments · 14 replies

boxysean Sep 28, 2020 Collaborator

Implementation sketch

jtcohen6 Oct 7, 2021 Maintainer

jtcohen6 Feb 23, 2022 Maintainer

jtcohen6 May 17, 2022 Maintainer

Example code

Is this the thing?

jtcohen6 May 30, 2022 Maintainer

boxysean May 31, 2022 Collaborator

jtcohen6 Sep 20, 2022 Maintainer

Python models

Refactoring dbt-core internals -> faster test suites?

MichelleArk Aug 1, 2023 Maintainer Author

MichelleArk
Apr 23, 2020
Maintainer

Replies: 25 comments 14 replies

boxysean
Sep 28, 2020
Collaborator

jtcohen6
Oct 7, 2021
Maintainer

jtcohen6
Feb 23, 2022
Maintainer

jtcohen6
May 17, 2022
Maintainer

jtcohen6 May 30, 2022
Maintainer

boxysean May 31, 2022
Collaborator

jtcohen6 Sep 20, 2022
Maintainer

MichelleArk
Aug 1, 2023
Maintainer Author