Overview
In the previous tutorials we learned how SDF can help enforce metadata-based data contracts, which are defined against SDF’s information schema. In this tutorial, we will add tests against the data in our warehouse, creating an additional layer of data quality validation throughout the data warehouse. SDF provides a standard, open-source testing library. The functions provided in the library are included natively in SDF.
Prerequisites
Completion of the previous tutorial.
Guide
1
What Are SDF Tests?
There are three types of built-in tests:
- Scalar Column Tests - validate expectations of individual values in a column (e.g. x > 100)
- Aggregate Column Tests - validate expectations of all values in a column (e.g. sum(x) > 100)
- Table Tests - validate expectations for all columns in a table (e.g. columns a and b are unique)
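To make the difference between the first two types concrete, here is a minimal Python sketch using the standard-library sqlite3 module. It does not use SDF or its tests library; the table t, the column x, the sample values, and both helper functions are hypothetical and only illustrate what a scalar check (x > 100 for every row) and an aggregate check (sum(x) > 100 over the whole column) evaluate.

```python
import sqlite3

def scalar_violations(conn, table, condition):
    """Count rows for which a per-row condition is false.

    Rows where the condition evaluates to NULL are not counted.
    """
    query = f"SELECT COUNT(*) FROM {table} WHERE NOT ({condition})"
    return conn.execute(query).fetchone()[0]

def aggregate_holds(conn, table, condition):
    """Evaluate a condition over an aggregate of the whole column."""
    query = f"SELECT {condition} FROM {table}"
    return bool(conn.execute(query).fetchone()[0])

# Hypothetical sample data, not taken from the tutorial's warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t (x) VALUES (?)", [(150,), (40,), (200,)])

failing_rows = scalar_violations(conn, "t", "x > 100")
total_ok = aggregate_holds(conn, "t", "sum(x) > 100")

print(f"rows failing 'x > 100': {failing_rows}")
print(f"'sum(x) > 100' holds: {total_ok}")
```

In this sample one row fails the scalar check while the aggregate check still passes, which is why the two types are not interchangeable.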
2
Setup
Since SDF’s tests library leverages Jinja, we also need to add Jinja as a preprocessor.
In the workspace.sdf.yml file, uncomment the following:
workspace.sdf.yml
For the sake of this tutorial, we will add tests on the inapp_events staging model located in models/staging/inapp_events.
To add tests to the model, we need to create a YML file to hold the model’s metadata.
Under metadata/staging, create a new file called inapp_events.sdf.yml containing the following definition:
metadata/staging/inapp_events.sdf.yml
3
Scalar Column Tests
Let’s say we want to verify there are no negative orders, meaning that event_value, which represents the total order amount in USD, is never negative.
We can use the scalar test valid_scalar(condition) where condition = event_value >= 0.
Add the following test to the newly created YML file:
metadata/staging/inapp_events.sdf.yml
A few things to note:
- This is a column-level test, so we need to add a column definition
- To specify the condition, we need to wrap it in triple quotes: """condition"""
- We can define a severity level for the test: either error or warning
[Pass] Test moms_flower_shop.staging.test_inapp_events
4
Aggregate Column Tests
We can write a similar test as an aggregate column test. Instead of validating each value individually, we can just look at the minimum value of event_value and assert that it is not negative.
Let’s add the new test to the YML file:
metadata/staging/inapp_events.sdf.yml
Let’s run the test again:
[Pass] Test moms_flower_shop.staging.test_inapp_events
5
Table Tests
On a table level, we want to make sure that our unique key is indeed unique. In this case, the table key is event_id. However, in other cases, it could be a combination of multiple columns. SDF supports both cases.
Let’s add this table-level test to the YML file:
metadata/staging/inapp_events.sdf.yml
Notice that we wrap the column name in quotation marks: "col_name".
We can run the tests again:
[Pass] Test moms_flower_shop.staging.test_inapp_events