Overview
In the previous tutorial, we discoveredapp_installs was replaced by app_installs_v2.
This tutorial will walk through the deprecation of app_installs and demonstrate
how SDF can help overcome the two big hurdles in deprecating a model:
- Preventing prod from breaking by tracking and resolving downstream dependencies
- Setting up guardrails to prevent future mistakes
Prerequisites
Completion of the previous tutorial.Setup
Reference to Metadata Files
SDF’s semantic understanding is based on metadata files, defining your custom business logic, and smart propagation of values. To get started, we need to reference the metadata directory by simply including the relevant paths in our workspace configuration file,workspace.sdf.yml.
Open the file workspace.sdf.yml and uncomment the following:
workspace.sdf.yml
Note
- The classifications folder contains one YAML file for table classifiers, and one for column classifiers.
- The metadata folder is structured based on our models folder.
- The reports folder is currently empty.
Guide
1
Before Deprecation
Before deprecating a model, we can run an impact analysis on the its
downstream dependencies using SDF’s lineage command. We simply run:
Table: moms_flower_shop.staging.app_installsmoms_flower_shop.staging.app_installs.campaign_idmoms_flower_shop.staging.app_installs.campaign_name
│
│ copy
└──────┐
moms_flower_shop.analytics.agg_installs_and_campaigns.campaign_namemoms_flower_shop.staging.app_installs.campaign_typemoms_flower_shop.staging.app_installs.customer_id
│
│ mod
└──────┐
moms_flower_shop.analytics.agg_installs_and_campaigns.distinct_installsmoms_flower_shop.staging.app_installs.event_idmoms_flower_shop.staging.app_installs.install_time
│
│ mod
└──────┐
moms_flower_shop.analytics.agg_installs_and_campaigns.install_datemoms_flower_shop.staging.app_installs.platform
│
│ copy
└──────┐
moms_flower_shop.analytics.agg_installs_and_campaigns.platform2
Classify the Model as Deprecated
The file Let’s compile and see how it looks:
classifications/table_classifiers.sdf.yml holds definitions of
a table level classifier called TABLE_STATUS. You can see that one of the
table status options is deprecated.To add the classifier to our table, navigate to the table’s metadata file
metadata/staging/app_installs.sdf.yml. Remove the commented
section assigning the TABLE_STATUS.deprecated classifier to the table:metadata/staging/app_installs.sdf.yml
Schema moms_flower_shop.staging.app_installs
This table is a staging table which adds campaign information to app install events
[TABLE_STATUS.deprecated]
┌───────────────┬───────────┬────────────┬──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ column_name ┆ data_type ┆ classifier ┆ description │
╞═══════════════╪═══════════╪════════════╪══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ event_id ┆ bigint ┆ ┆ │
│ customer_id ┆ bigint ┆ ┆ The identifier of the customer that performed the event │
│ install_time ┆ timestamp ┆ ┆ │
│ platform ┆ varchar ┆ ┆ iOS or Android │
│ campaign_id ┆ bigint ┆ ┆ The identifier of the campaign that is associated with the event │
│ campaign_name ┆ varchar ┆ ┆ The campaign name associated with the campaign_id │
│ campaign_type ┆ varchar ┆ ┆ A substring of the campaign name contain the campaign type. Supported types - instagram_ads, friends_referrals, facebook_ads, google_search │
│ ┆ ┆ ┆ │
└───────────────┴───────────┴────────────┴──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘3
Create a Report of References to Deprecated Tables
We can create an SDF report to receive an instant snapshot of
all models referencing deprecated tables throughout the entire data warehouse.Create a new SQL file called Thanks to SDF’s lineage tracking capabilities, we already know that It works! Now, we should fix
deprecated_table_reference.sql in the
reports/ directory. Copy the following query to your new file:reports/deprecated_table_reference.sql
agg_installs_and_campaigns
is referencing the old table. It’s a good test to check -
we expect it to fail.Let’s check it out (pun intended):Report moms_flower_shop.pub.deprecated_table_reference
┌───────────────────────────────────────────────────────┬───────────────────────────────────────┐
│ table_id ┆ upstream_deprecated_table_id │
╞═══════════════════════════════════════════════════════╪═══════════════════════════════════════╡
│ moms_flower_shop.analytics.agg_installs_and_campaigns ┆ moms_flower_shop.staging.app_installs │
└───────────────────────────────────────────────────────┴───────────────────────────────────────┘
1 rows.agg_installs_and_campaigns.4
Resolve Downstream Dependencies
To make sure we won’t break prod if we delete our table, we
can choose to resolve the dependency in one of two ways:Let’s run our report again:All clear!
- If the model is not being used, we can just delete it altogether.
- Otherwise, we can update the
FROMstatement ofmodels/analytics/agg_installs_and_campaigns.sqllike we did for our previous model:
models/analytics/agg_installs_and_campaigns.sql
Report moms_flower_shop.pub.deprecated_table_reference
┌──────────┬──────────────────────────────┐
│ table_id ┆ upstream_deprecated_table_id │
╞══════════╪══════════════════════════════╡
└──────────┴──────────────────────────────┘
0 rows.