Overview
In this guide, we’ll materialize our first tables and views in Snowflake with SDF. We’ll be using some datasets from our partner, Cybersyn, to bootstrap some example data through Snowflake’s data marketplace.

Prerequisites
This guide should be followed after completing the Getting Started with Snowflake and SDF guide.
- A Snowflake account with this Cybersyn dataset installed.
- Valid Snowflake username / password credentials with write access to at least one database we can materialize tables to.
- Credentials instantiated in the previous guide.
When installing the Cybersyn dataset, make sure to grant the Snowflake role you’ll use with SDF read access to
TECH__INNOVATION_ESSENTIALS (i.e. the Cybersyn database).

Guide
1
Create a New SDF Project from the Cybersyn Tech Innovation Sample
Create a new SDF project using the Cybersyn Tech Innovation sample. This will create a new project in your current working directory with the sample project files.
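The project is scaffolded with the SDF CLI’s `sdf new` command. This is a sketch: the `--sample` flag is part of the CLI, but the exact sample name below is an assumption; check `sdf new --help` for the samples bundled with your SDF version.

```shell
# Scaffold a new SDF project from a sample into the current directory.
# The sample name below is an assumption; run `sdf new --help` to see
# the samples available in your SDF version.
sdf new --sample cybersyn_tech_innovation
```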
2
Compile to Test Credentials
To ensure your credentials are working and have read access to the new Cybersyn database, let’s try compiling one of the models.
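For example, compiling the funder_aggregates model by path (the path comes from the compile output below; `sdf compile` with no arguments compiles the whole workspace):

```shell
# Compile a single model; SDF fetches upstream table schemas from
# Snowflake as needed, which also verifies your credentials.
sdf compile models/sdf_snowflake/cybersyn_tech_innovation/funder_aggregates.sql
```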
Working set 3 model files, 1 .sdf file
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.OPENALEX_FUNDERS_INDEX (schema)
Compiling sdf_snowflake.cybersyn_tech_innovation.funder_aggregates (./models/sdf_snowflake/cybersyn_tech_innovation/funder_aggregates.sql)
Finished 2 models [2 succeeded] in 1.985 secs

Schema sdf_snowflake.cybersyn_tech_innovation.FUNDER_AGGREGATES
┌──────────────────────┬────────────────┬────────────┬─────────────┐
│ column_name          ┆ data_type      ┆ classifier ┆ description │
╞══════════════════════╪════════════════╪════════════╪═════════════╡
│ FUNDER_NAME          ┆ varchar        ┆            ┆             │
│ TOTAL_FUNDERS        ┆ decimal(38, 0) ┆            ┆             │
│ TOTAL_WORKS_COUNT    ┆ decimal(38, 0) ┆            ┆             │
│ TOTAL_CITATIONS      ┆ decimal(38, 0) ┆            ┆             │
│ TOTAL_GRANTS         ┆ decimal(38, 0) ┆            ┆             │
│ UNIQUE_COUNTRY_COUNT ┆ decimal(38, 0) ┆            ┆             │
│ FIRST_CREATED_DATE   ┆ timestamp      ┆            ┆             │
│ LAST_UPDATED_DATE    ┆ timestamp      ┆            ┆             │
└──────────────────────┴────────────────┴────────────┴─────────────┘

3
Materialize Tables in Snowflake
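For reference while reading this step, here is a sketch of the relevant parts of workspace.sdf.yml. The overall shape follows SDF’s workspace schema, but treat exact keys and pattern values as assumptions to check against the file the sample generated.

```yaml
# workspace.sdf.yml (sketch, not the complete file)
workspace:
  defaults:
    dialect: snowflake      # parse SQL with Snowflake syntax and functions
    preprocessor: jinja     # enable Jinja templating in models
    materialization: view   # materialize models as views unless overridden
  includes:
    - path: models
      index: catalog-schema-table-name   # infer database/schema from file path
  integrations:
    - provider: snowflake
      type: database
      sources:
        - pattern: tech__innovation_essentials.cybersyn.*   # hydrate schemas from here
      targets:
        - pattern: sdf_snowflake.*.*                        # materialize tables here
```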
If you compiled successfully, great! That means SDF was able to read table schemas from the Cybersyn database. Now, let’s materialize in a separate database called sdf_snowflake.

You’ll notice the workspace.sdf.yml file specifies a few defaults. Let’s go through these one by one:

- dialect: snowflake specifies that the SQL we’re working with uses Snowflake’s syntax and functions.
- preprocessor: jinja specifies that we’re using Jinja templating in our SQL models (this will come in handy later).
- materialization: view specifies that we’re materializing our models as views by default in Snowflake. Note this can be overwritten on a per-model or per-includes-path basis.

The integrations block tells SDF to hydrate missing table schemas from the tech__innovation_essentials.cybersyn database and schema and to materialize tables to the sdf_snowflake database. For more information on integration configuration, see the integration documentation.

Now, let’s materialize the tables in the sdf_snowflake database.

This is great, but what if we want to see the data our views produced? Let’s add a small flag to sdf run that tells SDF to pull down a sample of the data from the views we just materialized.

Congratulations! You’ve just materialized your first tables in Snowflake with SDF, and even got a sneak peek at the data they produced.

4
Materialize a New Table
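As a preview of the model this step creates, here is a sketch of models/sdf_snowflake/staging/latest_repo_name.sql. Only repo_name and repo_id are confirmed by the compiled schema; the created_at column name and the exact 24-hour filter are assumptions based on the prose.

```sql
-- models/sdf_snowflake/staging/latest_repo_name.sql (sketch)
-- Selects repo_name and repo_id for repos created in the last 24 hours.
-- The created_at column name is an assumption about the Cybersyn schema.
SELECT
    repo_name,
    repo_id
FROM tech__innovation_essentials.cybersyn.github_repos
WHERE created_at > DATEADD(HOUR, -24, CURRENT_TIMESTAMP())
```

To materialize it as a table rather than the workspace’s default view, this step also adds a table block with materialization: table to workspace.sdf.yml.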
Let’s say we want to materialize a new table in the sdf_snowflake database, but in a different schema. You’ll notice the workspace.sdf.yml file specifies an includes block with an index. The index here means we’ll interpret the database and schema name from the file path, like so: models/<database>/<schema>/<table>.sql.

Since we want to add a new table, let’s create a new file in the models directory called sdf_snowflake/staging/latest_repo_name.sql. This table will select the latest repo_name and repo_id from github_repos created in the last 24 hours.

Now let’s compile this new table to ensure the SQL is valid and selects columns that actually exist:

Working set 4 model files, 1 .sdf file
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.GITHUB_REPOS (schema)
Compiling sdf_snowflake.staging.latest_repo_name (./models/sdf_snowflake/staging/latest_repo_name.sql)
Finished 2 models [2 succeeded] in 1.750 secs

Schema sdf_snowflake.staging.LATEST_REPO_NAME
┌─────────────┬────────────────┬────────────┬─────────────┐
│ column_name ┆ data_type      ┆ classifier ┆ description │
╞═════════════╪════════════════╪════════════╪═════════════╡
│ REPO_NAME   ┆ varchar        ┆            ┆             │
│ REPO_ID     ┆ decimal(38, 0) ┆            ┆             │
└─────────────┴────────────────┴────────────┴─────────────┘

Great! Everything compiled. Now, let’s make sure this table is materialized as a table instead of a view in the sdf_snowflake database. We’re doing this now since later on we’ll want to use this table as an upstream dependency in our DAG.

To accomplish this, we’ll need to overwrite our materialization default in the workspace.sdf.yml file and specify the materialization as table for this specific model, by adding a table block for it to our workspace.sdf.yml.

Now, let’s run the model to materialize the table in the sdf_snowflake database.

Working set 4 model files, 1 .sdf file
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.GITHUB_REPOS (schema)
Running sdf_snowflake.staging.latest_repo_name (./models/sdf_snowflake/staging/latest_repo_name.sql)
Finished 2 models [2 succeeded] in 5.496 secs

Table sdf_snowflake.staging.LATEST_REPO_NAME
┌─────────────────────────────────┬───────────┐
│ REPO_NAME                       ┆ REPO_ID   │
╞═════════════════════════════════╪═══════════╡
│ 0-Anonymous/Secure-SMTP-Server  ┆ 926399969 │
│ 0-robinson-1/pizzeria           ┆ 926565957 │
│ 00-berni/proj_0                 ┆ 537524710 │
│ 00-berni/proj_3                 ┆ 538682956 │
│ 00-kat/ghostty                  ┆ 926370638 │
│ 00-kat/wraith                   ┆ 926401364 │
│ 0000yuyu/youtube                ┆ 926296773 │
│ 0001Moksh/Deva_voice_assistant  ┆ 926354200 │
│ 0001Moksh/Deva_voice_assistant- ┆ 926350659 │
│ 0001Moksh/c_-_cpp_language      ┆ 926330641 │
└─────────────────────────────────┴───────────┘
10 rows.

Congratulations! You’ve just materialized your first table in Snowflake with SDF. In the next step, we’ll add one more table and a downstream view to create a DAG.

5
Create a DAG
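As a preview of where this step lands, here is a sketch of the downstream view it builds, models/sdf_snowflake/cybersyn_tech_innovation/events_by_repo.sql. The join on repo_id is implied by the two staging schemas; the aggregation itself is an assumption.

```sql
-- models/sdf_snowflake/cybersyn_tech_innovation/events_by_repo.sql (sketch)
-- Downstream view joining the two staging tables, forming the DAG:
--   push_events + latest_repo_name -> events_by_repo
-- The join key (repo_id) matches both staging schemas; the count is an assumption.
SELECT
    r.repo_name,
    COUNT(e.id) AS event_count
FROM sdf_snowflake.staging.push_events AS e
JOIN sdf_snowflake.staging.latest_repo_name AS r
    ON e.repo_id = r.repo_id
GROUP BY r.repo_name
```

Because the workspace’s default materialization is view, this model needs no extra configuration in workspace.sdf.yml.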
Now that we’ve materialized a new table, let’s create a DAG by adding a downstream view that depends on the table we just created and one more table we’ll materialize now. The DAG will look like this: the two staging tables (latest_repo_name and push_events) feed a single downstream view (events_by_repo).

Let’s first create a new file in the models directory called sdf_snowflake/staging/push_events.sql. This table will select all push events from github_events created in the last 24 hours.

Now let’s compile this new table to ensure the SQL is valid and selects columns that actually exist and are properly typed:

Working set 5 model files, 1 .sdf file
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.GITHUB_EVENTS (schema)
Compiling sdf_snowflake.staging.push_events (./models/sdf_snowflake/staging/push_events.sql)
Finished 2 models [2 succeeded] in 1.732 secs

Schema sdf_snowflake.staging.PUSH_EVENTS
┌─────────────┬────────────────┬────────────┬─────────────┐
│ column_name ┆ data_type      ┆ classifier ┆ description │
╞═════════════╪════════════════╪════════════╪═════════════╡
│ ID          ┆ varchar        ┆            ┆             │
│ REPO_ID     ┆ decimal(38, 0) ┆            ┆             │
└─────────────┴────────────────┴────────────┴─────────────┘

Great! Everything compiled. Now, let’s make sure this table is also materialized as a table instead of a view in the sdf_snowflake database. Let’s add another table block to our workspace.sdf.yml file.

Now, let’s run the model to materialize the table in the sdf_snowflake database.

Great! Now that we have our two tables, let’s finish up our DAG by creating a downstream view that depends on these two tables. Create a new file in the models directory called sdf_snowflake/cybersyn_tech_innovation/events_by_repo.sql. Since our workspace’s default materialization is view, we don’t need to specify the materialization for this model.

Now, let’s compile again to ensure our entire workspace is still in a good state:

Working set 6 model files, 1 .sdf file
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.GITHUB_STARS (schema)
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.OPENALEX_FUNDERS_INDEX (schema)
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.USPTO_PATENT_INDEX (schema)
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.GITHUB_EVENTS (schema)
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.USPTO_CONTRIBUTOR_INDEX (schema)
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.USPTO_PATENT_CONTRIBUTOR_RELATIONSHIPS (schema)
Downloading TECH__INNOVATION_ESSENTIALS.CYBERSYN.GITHUB_REPOS (schema)
Compiling sdf_snowflake.staging.push_events (./models/sdf_snowflake/staging/push_events.sql)
Compiling sdf_snowflake.cybersyn_tech_innovation.funder_aggregates (./models/sdf_snowflake/cybersyn_tech_innovation/funder_aggregates.sql)
Compiling sdf_snowflake.cybersyn_tech_innovation.all_nvidia_patents (./models/sdf_snowflake/cybersyn_tech_innovation/all_nvidia_patents.sql)
Compiling sdf_snowflake.cybersyn_tech_innovation.most_starred_repos (./models/sdf_snowflake/cybersyn_tech_innovation/most_starred_repos.sql)
Compiling sdf_snowflake.staging.latest_repo_name (./models/sdf_snowflake/staging/latest_repo_name.sql)
Compiling sdf_snowflake.cybersyn_tech_innovation.events_by_repo (./models/sdf_snowflake/cybersyn_tech_innovation/events_by_repo.sql)
Finished 13 models [13 succeeded] in 2.054 secs

Woohoo! Another successful compile. Now, let’s try one final run of everything in this workspace to materialize our new view and run the entire DAG.

Congratulations! You’ve just created your first DAG in Snowflake with SDF.