Deploying dbt projects¶
dbt projects can be deployed to run on the omega-ml runtime as an ad-hoc or a scheduled job as follows. dbt projects are essentially a collection of files that make up a SQL-based workflow. This means we can easily deploy dbt projects to omega-ml as a script, and run them on schedule or on-demand.
Package the dbt project(s) and deploy to omega-ml
Schedule a job to run the dbt project(s)
Serve the dbt project(s) documentation via omegaml’s apphub (or browse locally)
Here’s a quick schematic overview of the components involved:
How to package a dbt project¶
To deploy a dbt project, we first need to package the project into a pip-installable package. This can be done by the following steps:
Mark the dbt project as a python package (by creating a __init__.py file)
Create the setup.py and MANIFEST.in files in the dbt directory
Store the dbt project in om.scripts
Create a job (notebook) to run the dbt project on a schedule
Let’s go through these steps in detail.
Create a dbt project¶
We assume a dbt project exists already. If not, you can create a new dbt project by following these steps. To create a dbt project, we can use the dbt CLI. First, we need to install dbt:
$ pip install dbt-core
Then, we can create a new dbt project by running the following command:
$ dbt init myproject
Mark your dbt project as a python package¶
To deploy a dbt project to omega-ml, we need to package the project as a python package so it can be installed in the runtime. This includes essentially two files:
__init__.py to mark the dbt project as a python package
The dbt profiles.yml to specify the dbt profile (without any secrets)
Mark the dbt project as a python package by creating a __init__.py file in the db project directory. We will also include a profiles.yml that however does not include any secrets such as passwords or API keys. Instead, we will update the dbt profile at runtime with omega-ml defaults.
In __init__.py we will add a function to update the dbt profile with omega-ml defaults. This function is used at run-time to replace {PLACEHOLDER}-style variables, e.g. to specify an SQL connection details such as server hostnames and passwords which are provided as part of the omega-ml qualifier context in which the dbt project will run.
The __init__.py file should look like this:
# /path/to/dbt/myproject/__init__.py
def update_dbt_profile(fn=None, mod=None, om=None, **vars):
"""
update dbt profiles.yaml with omegaml defaults
Usage:
import omegaml as om
mod = om.scripts.get('dbt/foo', install=True)
update_dbt_profile(mod=mod, om=om)
"""
from pathlib import Path
default_fn = Path(getattr(mod, '__file__', __file__)).parent / 'profiles.yml'
fn = Path(fn) if fn else default_fn
if not fn.exists():
raise FileNotFoundError(f'dbt profiles.yml not found at {fn}')
vars.update(**om.defaults) if om else None
with open(fn, 'r') as f:
profiles = f.read()
with open(fn, 'w') as f:
profiles = profiles.format(**vars)
f.write(profiles)
return fn.parent
Copy your profiles.yml from $HOME/.dbt/profiles.yml, and replace the values that are provided by your omega-ml qualifier context with their respective {OMEGA_VARIABLE} placeholder. Later we will use the update_dbt_profile function to replace these placeholders with the actual values provided by omega-ml.
The profiles.yml file should look something like this (adopt to the specific variables used in your omega-ml qualifier context):
myproject:
target: prod
outputs:
prod:
type: sqlserver
driver: '{OMEGA_SQL_SERVER_DRIVER}' # (The ODBC Driver installed on your system)
server: {OMEGA_SQL_SERVER_HOST}
port: 1433
database: {OMEGA_SQL_SERVER_DB}
schema: schema_name
user: {OMEGA_SQL_SERVER_USER}
password: {OMEGA_SQL_SERVER_PASSWORD}
Create the setup.py and MANIFEST.in files¶
The dbt project needs a setup.py file to be packaged as a python package. The setup.py file should look like this:
# /path/to/dbt/setup.py
from setuptools import setup, find_packages
setup(
name='myproject',
version='0.1',
packages=find_packages(),
include_package_data=True,
install_requires=[
'dbt-core',
]
)
We also need a MANIFEST.in file to include the dbt project files (models, macros, etc.) in the package:
# myproject/MANIFEST.in
include *.in
recursive-include myproject *
Store the dbt project in om.scripts¶
Then, we can package the dbt project by running the following command:
# in /path/to/setup.py
$ om scripts put . myproject dbt/myproject
Schedule a job (notebook)¶
To run the dbt project on a schedule, we need to create a job (notebook) that runs the dbt project. The notebook should look as follows and be stored in om.jobs. The notebook essentially has three parts:
Import the dbt project and update the dbt profile with omega-ml defaults
Run the dbt project
Generate and save the dbt docs, so it is available for later inspection or serving via omegaml’s apphub
[1] # cron: 0 0 * * 1
# comment: run every Monday at midnight
[2] # (1) import dbt project and prepare dbt profile
import omegaml as om
project = 'dbt/foo'
dbt_mod = om.scripts.get(project, install=True)
project_dir = dbt_mod.update_dbt_profile(om=om, OMEGA_BUCKET='main')
[3] # (2) run dbt project
!dbt run --profiles-dir $project_dir --project-dir $project_dir
[4] # (3) generate docs and save to om.datasets as dbt/<project>/report.zip
# generate docs
!dbt docs generate --profiles-dir $project_dir --project-dir $project_dir --target-path report
!python -m zipfile -c report.zip $project_dir/report
!om datasets put ./report.zip $project/report.zip
Browse the dbt project documentation¶
The dbt project documentation can be browsed locally by running the following command:
# in /path/to/dbt/myproject
$ om datasets get dbt/myproject/report.zip -o report.zip
$ unzip report.zip
$ dbt docs serve --target-path report
Alternatively, the dbt project documentation can be served via omegaml’s apphub. To do this, we need to create a a small flask app that serves the dbt project documentation, created and saved by the job (notebook) above. The app should look as follows:
# add this to path/to/dbt/myproject/__init__.py
def create_app(server=None, uri=None, **kwargs):
import os
import uuid
from functools import lru_cache
from flask import Flask, abort
from flask import Blueprint
from zipfile import ZipFile
import omegaml as om
server = server or Flask(__name__)
server.config.setdefault('SECRET_KEY', os.environ.get('SECRET_KEY') or uuid.uuid4().hex)
app = Blueprint('foo', __name__,
url_prefix=uri,
template_folder='templates')
file_cache = lru_cache(maxsize=100)
om = om.setup()
@app.route('/')
def index():
# present a list of project reports stored in om.datasets
# -- each project report is stored as dbt/<project>/report.zip
href = "<a href='{uri}/{project}/index'>{project}</a><br>"
projects = [href.format(project=os.path.basename(os.path.dirname(project)),
uri=uri or '') for project in om.datasets.list('dbt/*')]
text = "<p>select a project to view its dbt report</p>"
return text + "\n".join(projects) if projects else "No projects found"
@app.route('/<project>/index')
def project(project):
# open the project report's index.html
_send_report_file.cache_clear()
project_dir = f'dbt/{project}'
return _send_report_file(project_dir, 'index.html')
@app.route('/<project>/<path:path>')
def static_file(project, path):
# open a static file from the project report
project_dir = f'dbt/{project}'
return _send_report_file(project_dir, path)
@app.errorhandler(404)
def handle_exception(e):
return {
"code": e.code,
"description": e.description,
"exception": str(e),
}, 404
@file_cache
def _send_report_file(project_dir, filename):
report_fn = f'{project_dir}/report.zip'
try:
with om.datasets.get(report_fn) as f:
zipfile = ZipFile(f)
data = zipfile.read(f'report/{filename}')
zipfile.close()
except Exception as e:
abort(404, str(e))
return data
server.register_blueprint(app)
return server
To run this app locally, we can use the following command:
$ FLASK_APP=myproject:create_app flask run
To serve the app via omegaml’s apphub, we need to package the app as a python package and store it in om.scripts.
$ om scripts put . myproject apps/myproject $ om runtime restart app myproject
Working with multiple dbt projects¶
If you have multiple dbt projects, you can either follow the same steps as above, applied to each project, or you can create a single “dbtdeploy” application, that can be used to run multiple dbt projects.
Here’s how to create the “dbtdeploy” application:
Create a dbtdeploy application:
# in /path/to/dbt $ mkdir dbtdeploy $ touch dbtdeploy/__init__.py $ touch dbtdeploy/setup.py $ touch dbtdeploy/MANIFEST.in
Update the setup.py and MANIFEST.in files:
# dbtdeploy/setup.py from setuptools import setup, find_packages
- setup(
name=’dbtdeploy’, version=’0.1’, packages=find_packages(), include_package_data=True, install_requires=[
‘dbt-core’,
]
)
# dbtdeploy/MANIFEST.in include *.in recursive-include dbtdeploy * recursive-include myproject * <.. include other dbt projects here ..>
Update the __init__.py file for the dbtdeploy application:
# include the update_dbt_profile function, from above def update_dbt_profile(fn=None, mod=None, om=None, **vars): <... copy from above verbabtim ...> # include the create_app function, from above <... copy from above verbabtim ...>
Now we can package the dbtdeploy application, including all dbt projects into a single dbtdeploy application.
Link each dbt project into the directory of the dbtdeploy application:
$ ln -s /path/to/dbt/myproject dbtdeploy/myproject
Package the dbtdeploy application:
$ om scripts put . dbtdeploy dbt/dbtdeploy
Finally, we can use the dbtdeploy application to run multiple dbt projects.
Create a job (notebook) to run the dbtdeploy application:
# in om.jobs [1] # cron: 0 0 * * 1 # comment: run every Monday at midnight [2] # (1) import dbt project and prepare dbt profile dbtdeploy = om.scripts.get('dbt/dbtdeploy', install=True) dbt_dir = Path(dbtdeploy.__file__).parent dbtdeploy.update_dbt_profile(f"{dbt_dir}/profiles.yml", om=om) [3] # (2) run dbt projects (repeat (2) and (3) for each dbt project) project_dir = dbt_dir / 'foo` !dbt run --profiles-dir $dbt_dir --project-dir $project_dir [4] # (3) generate docs and save to om.datasets as dbt/<project>/report.zip # generate docs !dbt docs generate --profiles-dir $dbt_dir --project-dir $project_dir --target-path report !python -m zipfile -c report.zip $project_dir/report !om datasets put ./report.zip $project/report.zip
Serve the dbt project documentation via omegaml’s apphub:
# package the app $ om scripts put . dbtdeploy apps/dbtdeploy $ om runtime restart app dbtdeploy Alterantively, you can serve the app locally by running the following command: $ FLASK_APP=dbtdeploy:create_app flask run