DAG Development

Some notes for myself or my colleagues, whom I may or may not be working with, from my first foray into Airflow DAG development.
Python code formatting, PEP8 and Black
Python has well-defined code style guidelines, described in PEP 8 and enforced by tools like Black. There is a Black formatter extension for VS Code that you should use to handle code formatting for you.
Connecting to SQL Server on your host machine from docker
If you're running Airflow locally, perhaps with mwaa-local-runner, and you want to connect to a MS SQL Server instance installed on your host OS, you can use the host.docker.internal DNS name to route to your host machine. If you're connecting to a named SQL instance, you'll want something like host.docker.internal\SQL2019 as the hostname.
You'll probably need Mixed Mode/SQL Auth enabled and a SQL Login with appropriate permissions.
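To sanity-check connectivity from inside the container, a minimal sketch along these lines should work (this assumes pyodbc and the Microsoft ODBC driver are installed in the Airflow image; the login, password and database names below are placeholders):

import pyodbc

# Connect from inside the container to a named SQL instance on the host OS.
# "airflow_user", "s3cret" and "StagingDb" are placeholder values.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    r"SERVER=host.docker.internal\SQL2019;"
    "DATABASE=StagingDb;"
    "UID=airflow_user;"
    "PWD=s3cret;"
)
cursor = conn.cursor()
cursor.execute("SELECT @@SERVERNAME")
print(cursor.fetchone()[0])
conn.close()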
Sharing common code between DAGs
Parameterized DAGs
It's possible to write a DAG that does a bunch of heavy lifting and invoke that DAG from other DAGs with a set of parameters provided at runtime. To invoke a DAG in this way you need to use the TriggerDagRunOperator, like so:
from datetime import datetime

from airflow import DAG
from airflow.decorators import task
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

with DAG(
    dag_id="trigger_dag",
    start_date=datetime(2025, 5, 16),
    schedule="0 0 */1 * *",
) as dag:

    @task
    def build_conf():
        return {
            "product_type": "Fish"
        }

    config = build_conf()

    trigger = TriggerDagRunOperator(
        task_id="trigger_dag_op",
        trigger_dag_id="the_target_dag",
        # The XComArg returned by build_conf() is rendered into conf at runtime.
        conf=config,
        wait_for_completion=True,
    )

    config >> trigger
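Note that conf is a templated field on TriggerDagRunOperator in recent Airflow versions, so passing the XComArg returned by build_conf() should set up the upstream dependency on its own; the explicit config >> trigger just makes the ordering obvious.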
Then, in the target DAG being invoked, you can retrieve those runtime parameters from the dag_run's conf like so:
from airflow.decorators import task

@task
def read_config(**kwargs):
    # Parameters passed by the triggering DAG arrive on the dag_run's conf.
    conf = kwargs["dag_run"].conf or {}
    product_type = conf.get("product_type", "NOTE")
    # ...
    return {
        "product_type": product_type,
        # ...
    }
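For completeness, here's a minimal sketch of how the target DAG might wire that task up; the dag_id matches the trigger_dag_id above, schedule=None means it only runs when triggered, and the process task is a made-up placeholder for the real heavy lifting:

from datetime import datetime

from airflow import DAG
from airflow.decorators import task

with DAG(
    dag_id="the_target_dag",
    start_date=datetime(2025, 5, 16),
    schedule=None,  # runs only when triggered by another DAG
) as dag:

    @task
    def read_config(**kwargs):
        conf = kwargs["dag_run"].conf or {}
        return {"product_type": conf.get("product_type", "NOTE")}

    @task
    def process(config: dict):
        # Placeholder for the real heavy lifting.
        print(f"Processing product type: {config['product_type']}")

    process(read_config())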