Quickstart for dbt Cloud and Teradata

Introduction

In this quickstart guide, you'll learn how to use dbt Cloud with Teradata Vantage. It will show you how to:

Create a new Teradata ClearScape instance.
Load sample data into your Teradata database.
Connect dbt Cloud to Teradata.
Take a sample query and turn it into a model in your dbt project. A model in dbt is a select statement.
Add tests to your models.
Document your models.
Schedule a job to run.

Videos for you

You can check out dbt Fundamentals for free if you're interested in course learning with videos.

Prerequisites

You have a dbt Cloud account.
You have access to a Teradata Vantage instance. You can provision one for free at https://clearscape.teradata.com. See the ClearScape Analytics Experience guide for details.

Related content

Learn more with dbt Learn courses
How we provision a Teradata ClearScape Vantage instance
CI jobs
Deploy jobs
Job notifications
Source freshness

Load data

The following steps show how to expose the sample data, stored as CSV files in a public Google Cloud Storage bucket, as foreign tables in your database.

SQL IDE

If you created your Teradata Vantage database instance at https://clearscape.teradata.com and you don't have an SQL IDE handy, use the JupyterLab bundled with your database to execute SQL:

Navigate to the ClearScape Analytics Experience dashboard and click the Run Demos button. The demo will launch JupyterLab.
In JupyterLab, go to Launcher by clicking the blue + icon in the top left corner. Find the Notebooks section and click Teradata SQL.
In the notebook's first cell, connect to the database using the %connect magic. You will be prompted to enter your database password when you execute it:
```
%connect local
```
Use additional cells to type and run SQL statements.

Use your preferred SQL IDE to create the database, jaffle_shop:
```
CREATE DATABASE jaffle_shop AS PERM = 1e9;
```

In the jaffle_shop database, create three foreign tables that reference the respective CSV files located in object storage:

CREATE FOREIGN TABLE jaffle_shop.customers (
    id integer,
    first_name varchar (100),
    last_name varchar (100),
    email varchar (100)
)
USING (
    LOCATION ('/gs/storage.googleapis.com/clearscape_analytics_demo_data/dbt/raw_customers.csv')
)
NO PRIMARY INDEX;

CREATE FOREIGN TABLE jaffle_shop.orders (
    id integer,
    user_id integer,
    order_date date,
    status varchar(100)
)
USING (
    LOCATION ('/gs/storage.googleapis.com/clearscape_analytics_demo_data/dbt/raw_orders.csv')
)
NO PRIMARY INDEX;

CREATE FOREIGN TABLE jaffle_shop.payments (
    id integer,
    orderid integer,
    paymentmethod varchar (100),
    amount integer
)
USING (
    LOCATION ('/gs/storage.googleapis.com/clearscape_analytics_demo_data/dbt/raw_payments.csv')
)
NO PRIMARY INDEX;
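
Before moving on, it can help to confirm that the foreign tables actually resolve to the CSV files. The queries below are a minimal sanity check, assuming the three CREATE FOREIGN TABLE statements above ran without errors. The rows and counts returned depend on the sample files, so no expected output is shown.

```sql
-- Preview a few rows from each foreign table (TOP limits the rows returned).
SELECT TOP 5 * FROM jaffle_shop.customers;
SELECT TOP 5 * FROM jaffle_shop.orders;
SELECT TOP 5 * FROM jaffle_shop.payments;

-- Count the rows read from each CSV file.
SELECT COUNT(*) AS customer_rows FROM jaffle_shop.customers;
SELECT COUNT(*) AS order_rows FROM jaffle_shop.orders;
SELECT COUNT(*) AS payment_rows FROM jaffle_shop.payments;
```

If one of these queries fails, recheck the LOCATION path of the corresponding table before connecting dbt Cloud.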

Connect dbt Cloud to Teradata

Create a new project in dbt Cloud. From Account settings (using the gear menu in the top right corner), click New Project.
Enter a project name and click Continue.
In Configure your development environment, click Add new connection.
Select Teradata, fill in all the required details in the Settings section, and test the connection.

dbt Cloud - Choose Teradata Connection

dbt Cloud - Teradata Account Settings

Enter your Development Credentials for Teradata with:
- Username: The username for your Teradata database.
- Password: The password for your Teradata database.
- Schema: The default database to use.
dbt Cloud - Teradata Development Credentials
Click Test Connection to verify that dbt Cloud can access your Teradata Vantage instance.
If the connection test succeeds, click Next. If it fails, check your Teradata settings and credentials.

Set up a dbt Cloud managed repository

When you develop in dbt Cloud, you can leverage Git to version control your code.

To connect to a repository, you can either set up a dbt Cloud-hosted managed repository or directly connect to a supported git provider. Managed repositories are a great way to trial dbt without needing to create a new repository. In the long run, it's better to connect to a supported git provider to use features like automation and continuous integration.

To set up a managed repository:

Under "Setup a repository", select Managed.
Type a name for your repo, such as bbaggins-dbt-quickstart.
Click Create. It will take a few seconds for your repository to be created and imported.
Once you see the "Successfully imported repository" message, click Continue.

Initialize your dbt project and start developing

Now that you have a repository configured, you can initialize your project and start development in dbt Cloud:

Click Start developing in the IDE. It might take a few minutes for your project to spin up for the first time as it establishes your git connection, clones your repo, and tests the connection to the warehouse.
Above the file tree to the left, click Initialize your project to build out your folder structure with example models.
Make your initial commit by clicking Commit and sync. Use the commit message initial commit to create the first commit to your managed repo. Once you’ve created the commit, you can open a branch to add new dbt code.

Delete the example models

You can now delete the files that dbt created when you initialized the project:

Delete the models/example/ directory.

Delete the example: key from your dbt_project.yml file, and any configurations that are listed under it.

dbt_project.yml

# before
models:
  my_new_project:
    +materialized: table
    example:
      +materialized: view

dbt_project.yml

# after
models:
  my_new_project:
    +materialized: table

Save your changes.
Commit your changes and merge to the main branch.

Build your first model

You have two options for working with files in the dbt Cloud IDE:

Create a new branch (recommended): Create a new branch to edit and commit your changes. Navigate to Version Control on the left sidebar and click Create branch.
Edit in the protected primary branch: Choose this option if you prefer to edit, format, or lint files, or execute dbt commands, directly in your primary git branch. The dbt Cloud IDE prevents commits to the protected branch, so you will receive a prompt to commit your changes to a new branch.

Name the new branch add-customers-model.

Click the ... next to the models directory, then select Create file.
Name the file bi_customers.sql, then click Create.
Copy the following query into the file and click Save.

with customers as (

   select
       id as customer_id,
       first_name,
       last_name

   from jaffle_shop.customers

),

orders as (

   select
       id as order_id,
       user_id as customer_id,
       order_date,
       status

   from jaffle_shop.orders

),

customer_orders as (

   select
       customer_id,

       min(order_date) as first_order_date,
       max(order_date) as most_recent_order_date,
       count(order_id) as number_of_orders

   from orders

   group by 1

),

final as (

   select
       customers.customer_id,
       customers.first_name,
       customers.last_name,
       customer_orders.first_order_date,
       customer_orders.most_recent_order_date,
       coalesce(customer_orders.number_of_orders, 0) as number_of_orders

   from customers

   left join customer_orders on customers.customer_id = customer_orders.customer_id

)

select * from final

Enter dbt run in the command prompt at the bottom of the screen. You should get a successful run and see the bi_customers model.

You can connect your business intelligence (BI) tools to these views and tables so that they read only cleaned-up data rather than raw data.
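
To see what the model produced, you can query it like any other view. This is a sketch only: my_dev_db is a hypothetical placeholder for the Schema value you entered in your Development Credentials, so substitute your own database name.

```sql
-- my_dev_db is a placeholder; replace it with your development schema.
-- Show the customers with the most orders first.
SELECT TOP 10
    customer_id,
    first_name,
    last_name,
    first_order_date,
    most_recent_order_date,
    number_of_orders
FROM my_dev_db.bi_customers
ORDER BY number_of_orders DESC;

-- Customers without any orders are kept by the left join, and coalesce
-- turns their missing order count into 0.
SELECT COUNT(*) AS customers_without_orders
FROM my_dev_db.bi_customers
WHERE number_of_orders = 0;
```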

Change the way your model is materialized

One of the most powerful features of dbt is that you can change the way a model is materialized in your warehouse, simply by changing a configuration value. You can change things between tables and views by changing a keyword rather than writing the data definition language (DDL) to do this behind the scenes.

By default, everything gets created as a view. You can override that at the directory level so everything in that directory uses a different materialization.

Edit your dbt_project.yml file.
- Update your project name to:
  dbt_project.yml
  name: 'jaffle_shop'
- Configure jaffle_shop so everything in it will be materialized as a table. Update your models config block to:
  dbt_project.yml
  models:
    jaffle_shop:
      +materialized: table
- Click Save.
Enter the dbt run command. Your bi_customers model should now be built as a table!

info
To do this, dbt had to first run a drop view statement, then a create table as statement.
Edit models/bi_customers.sql to override the dbt_project.yml setting for the bi_customers model only by adding the following snippet to the top, and click Save:
models/bi_customers.sql
```
{{
  config(
    materialized='view'
  )
}}

with customers as (

    select
        id as customer_id
        ...

)
```
Enter the dbt run command. Your model, bi_customers, should now build as a view.
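
One way to confirm how a model is currently materialized is to look it up in the Teradata data dictionary. The query below is a sketch that assumes your models were built in a database called my_dev_db (a hypothetical placeholder for your development schema) and that your user can read the DBC.TablesV dictionary view.

```sql
-- my_dev_db is a placeholder; replace it with your development schema.
-- TableKind 'V' indicates a view; codes such as 'T' or 'O' indicate a table.
SELECT TableName, TableKind
FROM DBC.TablesV
WHERE DatabaseName = 'my_dev_db'
  AND TableName = 'bi_customers';
```

Running this after each dbt run in this section should show the TableKind change as you switch the materialization.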

Build models on top of other models

As a best practice in SQL, you should separate logic that cleans up your data from logic that transforms your data. You have already started doing this in the existing query by using common table expressions (CTEs).

Now you can experiment by separating the logic out into separate models and using the ref function to build models on top of other models:

The DAG we want for our dbt project

Create a new SQL file, models/stg_customers.sql, with the SQL from the customers CTE in your original query.
models/stg_customers.sql
```
select
   id as customer_id,
   first_name,
   last_name

from jaffle_shop.customers
```
Create a second new SQL file, models/stg_orders.sql, with the SQL from the orders CTE in your original query.
models/stg_orders.sql
```
select
   id as order_id,
   user_id as customer_id,
   order_date,
   status

from jaffle_shop.orders
```

Edit the SQL in your models/bi_customers.sql file as follows:

models/bi_customers.sql

with customers as (

   select * from {{ ref('stg_customers') }}

),

orders as (

   select * from {{ ref('stg_orders') }}

),

customer_orders as (

   select
       customer_id,

       min(order_date) as first_order_date,
       max(order_date) as most_recent_order_date,
       count(order_id) as number_of_orders

   from orders

   group by 1

),

final as (

   select
       customers.customer_id,
       customers.first_name,
       customers.last_name,
       customer_orders.first_order_date,
       customer_orders.most_recent_order_date,
       coalesce(customer_orders.number_of_orders, 0) as number_of_orders

   from customers

   left join customer_orders on customers.customer_id = customer_orders.customer_id

)

select * from final

Execute dbt run.

This time, when you performed a dbt run, it created separate views/tables for stg_customers, stg_orders, and bi_customers. dbt inferred the order in which these models should run. Because bi_customers depends on stg_customers and stg_orders, dbt builds bi_customers last. You don’t need to define these dependencies explicitly.
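
To make the ref function less abstract, the sketch below shows roughly what a ref turns into when dbt compiles the bi_customers model. The database name my_dev_db is a hypothetical placeholder for your development schema, and the exact quoting in the compiled SQL may differ, so treat this as an illustration and check the compiled file under target/compiled in your project for the real output.

```sql
-- What you write in models/bi_customers.sql:
--   select * from {{ ref('stg_customers') }}
--
-- Roughly what is sent to Teradata after compilation
-- (my_dev_db is a placeholder for your development schema):
select * from "my_dev_db"."stg_customers"
```

Because the relation name is resolved at compile time, the same model code can build in a development database and a production database without edits.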

Build models on top of sources

Sources make it possible to name and describe the data loaded into your warehouse by your extract and load tools. By declaring these tables as sources in dbt, you can:

Select from source tables in your models using the {{ source() }} function, helping define the lineage of your data
Test your assumptions about your source data
Calculate the freshness of your source data

Create a new YML file, models/sources.yml.

Declare the sources by copying the following into the file and clicking Save.

models/sources.yml

version: 2

sources:
   - name: jaffle_shop
     description: This is a replica of the Postgres database used by the app
     database: raw
     schema: jaffle_shop
     tables:
         - name: customers
           description: One record per customer.
         - name: orders
           description: One record per order. Includes canceled and deleted orders.

Edit the models/stg_customers.sql file to select from the customers table in the jaffle_shop source.
models/stg_customers.sql
```
select
   id as customer_id,
   first_name,
   last_name

from {{ source('jaffle_shop', 'customers') }}
```

Edit the models/stg_orders.sql file to select from the orders table in the jaffle_shop source.

models/stg_orders.sql

select
   id as order_id,
   user_id as customer_id,
   order_date,
   status

from {{ source('jaffle_shop', 'orders') }}

Execute dbt run.

Your dbt run results will be the same as those in the previous step. Your stg_customers and stg_orders models will still query from the same raw data source in Teradata. By using source, you can test and document your raw data and also understand the lineage of your sources.

Add tests to your models

Adding tests to a project helps validate that your models are working correctly.

To add tests to your project:

Create a new YAML file in the models directory, named models/schema.yml.

Add the following contents to the file:

models/schema.yml

version: 2

models:
  - name: bi_customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null

  - name: stg_customers
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null

  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('stg_customers')
              field: customer_id

Run dbt test, and confirm that all your tests passed.

When you run dbt test, dbt iterates through your YAML files, and constructs a query for each test. Each query will return the number of records that fail the test. If this number is 0, then the test is successful.
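
The queries below sketch the kind of SQL each generic test boils down to. They are hand-written approximations, not the exact statements dbt generates, and my_dev_db is a hypothetical placeholder for your development schema. Each query returns the offending rows, so an empty result corresponds to a passing test.

```sql
-- unique: customer_id values that appear more than once.
SELECT customer_id, COUNT(*) AS n_records
FROM my_dev_db.stg_customers
WHERE customer_id IS NOT NULL
GROUP BY customer_id
HAVING COUNT(*) > 1;

-- not_null: rows where customer_id is missing.
SELECT customer_id
FROM my_dev_db.stg_customers
WHERE customer_id IS NULL;

-- accepted_values: status values outside the allowed list.
SELECT status, COUNT(*) AS n_records
FROM my_dev_db.stg_orders
WHERE status NOT IN ('placed', 'shipped', 'completed', 'return_pending', 'returned')
GROUP BY status;

-- relationships: orders whose customer_id has no match in stg_customers.
SELECT o.customer_id
FROM my_dev_db.stg_orders AS o
LEFT JOIN my_dev_db.stg_customers AS c
    ON o.customer_id = c.customer_id
WHERE o.customer_id IS NOT NULL
  AND c.customer_id IS NULL;
```

Running one of these by hand is a quick way to find the specific records behind a failing test.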

Document your models

Adding documentation to your project allows you to describe your models in rich detail, and share that information with your team. Here, we're going to add some basic documentation to our project.

Update your models/schema.yml file to include some descriptions, such as those below.

models/schema.yml

version: 2

models:
  - name: bi_customers
    description: One record per customer
    columns:
      - name: customer_id
        description: Primary key
        tests:
          - unique
          - not_null
      - name: first_order_date
        description: NULL when a customer has not yet placed an order.

  - name: stg_customers
    description: This model cleans up customer data
    columns:
      - name: customer_id
        description: Primary key
        tests:
          - unique
          - not_null

  - name: stg_orders
    description: This model cleans up order data
    columns:
      - name: order_id
        description: Primary key
        tests:
          - unique
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'return_pending', 'returned']
      - name: customer_id
        tests:
          - not_null
          - relationships:
              to: ref('stg_customers')
              field: customer_id

Run dbt docs generate to generate the documentation for your project. dbt introspects your project and your warehouse to generate a JSON file with rich documentation about your project.
Click the book icon in the Develop interface to launch documentation in a new tab.

Commit your changes

Now that you've built your customer model, you need to commit the changes you made to the project so that the repository has your latest code.

If you edited directly in the protected primary branch:

Click the Commit and sync git button. This action prepares your changes for commit.
A modal titled Commit to a new branch will appear.
In the modal window, name your new branch add-customers-model. This branches off from your primary branch with your new changes.
Add a commit message, such as "Add customers model, tests, docs" and commit your changes.
Click Merge this branch to main to add these changes to the main branch on your repo.

If you created a new branch before editing:

Since you already branched out of the primary protected branch, go to Version Control on the left.
Click Commit and sync to add a message.
Add a commit message, such as "Add customers model, tests, docs."
Click Merge this branch to main to add these changes to the main branch on your repo.

Deploy dbt

Use dbt Cloud's Scheduler to deploy your production jobs confidently and build observability into your processes. You'll learn to create a deployment environment and run a job in the following steps.

Create a deployment environment

In the upper left, select Deploy, then click Environments.
Click Create Environment.
In the Name field, write the name of your deployment environment. For example, "Production."
In the dbt Version field, select the latest version from the dropdown.
Under Deployment connection, enter the name of the database you want to use as the target, such as jaffle_shop_prod. This will allow dbt to build and work with that database.
Click Save.

Create and run a job

A job is a set of dbt commands that you want to run on a schedule, for example, dbt build.

As the jaffle_shop business gains more customers, and those customers create more orders, you will see more records added to your source data. When a model such as bi_customers is materialized as a table, you need to periodically rebuild it to ensure that the data stays up to date. This update will happen when you run a job.

After creating your deployment environment, you should be directed to the page for a new environment. If not, select Deploy in the upper left, then click Jobs.
Click + Create job and then select Deploy job. Provide a name, for example, "Production run", and link it to the Environment you just created.
Scroll down to the Execution Settings section.
Under Commands, add this command as part of your job if you don't see it:
- dbt build
Select the Generate docs on run checkbox to automatically generate updated project docs each time your job runs.
For this exercise, do not set a schedule for your project to run — while your organization's project should run regularly, there's no need to run this example project on a schedule. Scheduling a job is sometimes referred to as deploying a project.
Select Save, then click Run now to run your job.
Click the run and watch its progress under "Run history."
Once the run is complete, click View Documentation to see the docs for your project.
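
After the job finishes, you can also check the result directly in Teradata. The query below is a sketch that assumes you used jaffle_shop_prod as the deployment target, as in the example above; replace it with the name you chose.

```sql
-- jaffle_shop_prod is the example target name from the environment setup;
-- replace it if you chose a different database.
SELECT
    COUNT(*) AS customer_count,
    MAX(most_recent_order_date) AS latest_order_date
FROM jaffle_shop_prod.bi_customers;
```

Comparing latest_order_date between runs is a simple way to see whether a scheduled job is picking up new source data.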

Congratulations 🎉! You've just deployed your first dbt project!
