Making dbt model audits easier

Limitations of dbt docs

dbt (data build tool) is an open-source tool for transforming, testing, and documenting data, and it integrates easily with cloud data warehouses. It is currently used in many ETL/ELT data stacks, including ours.

dbt ships with out-of-the-box documentation, built with dbt docs generate. Its free-text search, DAG view, and documentation features are great, but they have limitations. For example, as of v1.0.1, there is no easy way to search for models that share the same primary key.

Solution

We added a dbt ls --output json step (see the dbt documentation for the ls command) to our pipeline runs. The output is stored in cloud storage and surfaced through a simple web app. Ta da! Models and tests, along with any fields defined on them such as unique_on, are now filterable. 🎉
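As a minimal sketch of the kind of query this makes possible, the TypeScript below parses dbt ls --output json output (one JSON object per line) and groups models that declare the same unique_on value, which is the "models sharing the same primary key" search that dbt docs lacks. It assumes unique_on is set as a custom config key and therefore appears under each node's config object; the function names and sample models are hypothetical, and this is an illustration rather than the code behind our web app.

```typescript
// One line of `dbt ls --output json`. Only the keys used below are typed.
// ASSUMPTION: unique_on is a custom config key, so it shows up under `config`.
interface DbtNode {
  name: string;
  resource_type: string;
  unique_id: string;
  config?: { unique_on?: string | string[]; [key: string]: unknown };
}

// Parse newline-delimited JSON, skipping blank lines and any line that
// does not start with "{" or fails to parse (e.g. plain-text log output).
function parseDbtLs(output: string): DbtNode[] {
  const nodes: DbtNode[] = [];
  for (const line of output.split("\n")) {
    const trimmed = line.trim();
    if (!trimmed.startsWith("{")) continue;
    try {
      nodes.push(JSON.parse(trimmed) as DbtNode);
    } catch {
      // Not valid JSON: ignore the line.
    }
  }
  return nodes;
}

// Normalise unique_on to one comparable string: lower-cased, sorted columns,
// so ["b_id", "a_id"] and ["A_ID", "b_id"] produce the same key.
function keyOf(uniqueOn: string | string[]): string {
  const cols = Array.isArray(uniqueOn) ? uniqueOn : [uniqueOn];
  return cols.map((c) => c.toLowerCase()).sort().join(",");
}

// Group models by their unique_on key and keep only keys used by 2+ models.
function modelsSharingKey(nodes: DbtNode[]): Map<string, string[]> {
  const groups = new Map<string, string[]>();
  for (const node of nodes) {
    const uniqueOn = node.config?.unique_on;
    if (node.resource_type !== "model" || uniqueOn === undefined) continue;
    const key = keyOf(uniqueOn);
    const names = groups.get(key) ?? [];
    names.push(node.name);
    groups.set(key, names);
  }
  const shared = new Map<string, string[]>();
  groups.forEach((names, key) => {
    if (names.length > 1) shared.set(key, names);
  });
  return shared;
}

// Hypothetical sample: two models keyed on order_id, one on customer_id.
const sample = [
  '{"name": "orders", "resource_type": "model", "unique_id": "model.shop.orders", "config": {"unique_on": "order_id"}}',
  '{"name": "orders_enriched", "resource_type": "model", "unique_id": "model.shop.orders_enriched", "config": {"unique_on": "order_id"}}',
  '{"name": "customers", "resource_type": "model", "unique_id": "model.shop.customers", "config": {"unique_on": "customer_id"}}',
].join("\n");

console.log(JSON.stringify(Array.from(modelsSharingKey(parseDbtLs(sample)).entries())));
// prints [["order_id",["orders","orders_enriched"]]]
```

Normalising the key before grouping is a design choice: it treats a composite key as an unordered set of columns, which is usually what an audit for overlapping grains wants.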

Searching through dbt ls output

The tool makes model audits easier for the data team and speeds up iterative development.

Tools: Node.js, the Azure Storage SDK for JavaScript, and Azure App Service with Azure AD authentication.