0.3.0 and above, workspace 1.3 is required. As a first step, please update your workspace block to the following:
Breaking Changes
Providers -> Integrations
SDF can read metadata and data from a variety of sources, including databases, data lakes, and data catalogs. In the past, we referred to databases as providers, and had confusing methods for configuring data lakes and data catalogs.
Now, all of these external relationships are managed in a single, unified integrations block.
This replaces the providers block in the workspace configuration.
If you have a workspace with a provider block like this:
The type field is now required for all integrations. This is to differentiate between database, data lake, and data catalog integrations.
Goodbye Compute
Previously, the compute property was required in the defaults block to tell SDF where to run the query, i.e., locally or remotely.
SDF now infers this from the integrations block, so the compute property is unnecessary and should be removed from the defaults block.
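As a rough illustration of this cleanup, the sketch below drops compute from a workspace configuration that has already been parsed into a Python dictionary. The function name drop_compute and the assumption that defaults sits at the top level of that dictionary are hypothetical; adjust the lookup to match how your workspace file is actually nested.

```python
import copy


def drop_compute(workspace: dict) -> dict:
    """Return a copy of a parsed workspace config without defaults.compute.

    Assumes (hypothetically) that `defaults` is a top-level key of the dict.
    """
    updated = copy.deepcopy(workspace)  # leave the caller's dict untouched
    defaults = updated.get("defaults")
    if isinstance(defaults, dict):
        # pop with a default so configs that never set compute pass through
        defaults.pop("compute", None)
    return updated
```

Any other keys in the defaults block are left as they were, and a configuration without a defaults block is returned unchanged.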
Information Schema V2
The information schema has been updated to include more metadata about your tables, and has been re-architected to enable performant queries on larger workspaces. If you have any checks or reports, these will need to be rewritten to use the new information schema. For example, if you have a check that looks like this: CONTAINS_ARRAY_VARCHAR.
For a full reference on the new information schema, see the information schema documentation.
Goodbye .sdfcache, hello sdftarget
Based on some quality user feedback, the .sdfcache directory has been renamed to sdftarget. This is where SDF will store all of its metadata and cache files.
This shouldn’t break anything as is, but if you have any scripts or processes that rely on the .sdfcache directory, you’ll need to update them to use sdftarget instead.
Furthermore, .gitignore files should be updated to ignore sdftarget instead of .sdfcache.
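One way to make the .gitignore change across several repositories is a small script. The sketch below rewrites only those lines that consist of the .sdfcache entry (optionally with leading or trailing slashes) and leaves comments and other patterns alone. The helper name update_gitignore is hypothetical and not part of SDF.

```python
def update_gitignore(text: str) -> str:
    """Replace .sdfcache ignore entries with sdftarget in .gitignore content."""
    lines = []
    for line in text.splitlines():
        # Match entries such as ".sdfcache", ".sdfcache/" or "/.sdfcache/"
        if line.strip().strip("/") == ".sdfcache":
            line = line.replace(".sdfcache", "sdftarget")
        lines.append(line)
    # Preserve the trailing newline if the original content had one
    return "\n".join(lines) + ("\n" if text.endswith("\n") else "")
```

To apply it to a file, read the content with pathlib.Path(".gitignore").read_text(), pass it through update_gitignore, and write the result back with write_text.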
Support for trino syntax
We’ve enabled trino to be used as an alias for presto for the workspace’s dialect property.
To see all accepted dialects, see the dialect documentation.
seeded: true -> cycle-cut-point: true
Due to confusion between configuring seed models and breaking cycles with the seeded property, we’ve renamed this field to cycle-cut-point. This now explicitly marks the table as the first table to be processed in a cycle.
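For workspaces with many table definitions, the rename can be scripted. The following is a minimal sketch, assuming each table definition has been parsed into a flat Python dictionary; the function name rename_seeded and the dictionary shape are illustrative assumptions, not SDF's own API.

```python
def rename_seeded(table: dict) -> dict:
    """Return a copy of a table definition with `seeded` renamed to `cycle-cut-point`."""
    updated = dict(table)  # shallow copy so the input is not modified
    if "seeded" in updated:
        value = updated.pop("seeded")
        # Keep an existing cycle-cut-point value if one was already set
        updated.setdefault("cycle-cut-point", value)
    return updated
```

Definitions that never used seeded are returned unchanged, so the function can be run over every table in a workspace.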