DAG Development

Apache Airflow

Some notes for myself or my colleagues, whom I may or may not be working with, from my first foray into Airflow DAG development.

Python code formatting, PEP 8 and Black

Python has a well-defined code style, described in PEP 8 and enforced by tools like Black. There's a Black formatter extension for VS Code that you should use to handle code formatting for you.
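
As a minimal sketch, assuming you've installed the ms-python.black-formatter extension, workspace settings like these make Black the default Python formatter and run it on save:

// .vscode/settings.json
{
    "[python]": {
        "editor.defaultFormatter": "ms-python.black-formatter",
        "editor.formatOnSave": true
    }
}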

Connecting to SQL Server on your host machine from Docker

If you're running Airflow locally, perhaps with mwaa-local-runner, and you want to connect to a Microsoft SQL Server instance installed on your host OS, you can use the host.docker.internal DNS name to route to your host machine. If you're connecting to a named SQL Server instance, you'll want something like host.docker.internal\SQL2019 as the hostname.

You'll probably need Mixed Mode authentication (SQL Auth) enabled and a SQL login with appropriate permissions.
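
A quick sketch of querying the instance from a task, assuming the Microsoft SQL Server provider package is installed and you've created an Airflow connection (the id mssql_local here is hypothetical) pointing at the host:

from airflow.decorators import task
from airflow.providers.microsoft.mssql.hooks.mssql import MsSqlHook

@task
def check_sql_server():
    # "mssql_local" is a hypothetical connection id; configure it in the
    # Airflow UI with host host.docker.internal (or
    # host.docker.internal\SQL2019 for a named instance) and a SQL login.
    hook = MsSqlHook(mssql_conn_id="mssql_local")
    return hook.get_first("SELECT @@VERSION")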

Sharing common code between DAGs

Read the modules management docs: https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/modules_management.html
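
The short version: Airflow puts the dags folder itself on sys.path, so shared helpers can live in a subpackage under it. A minimal sketch, with hypothetical names:

# dags/common/__init__.py  (empty file, marks the package)
# dags/common/helpers.py
def product_label(product_type: str) -> str:
    return f"product:{product_type.lower()}"

# dags/my_dag.py
from common.helpers import product_label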

Parameterized DAGs

It's possible to write a DAG that does a bunch of heavy lifting and then invoke it from other DAGs with a set of parameters provided at runtime. To invoke a DAG this way, use the TriggerDagRunOperator like so:

from datetime import datetime

from airflow import DAG
from airflow.decorators import task
from airflow.operators.trigger_dagrun import TriggerDagRunOperator

with DAG(
    dag_id="trigger_dag",
    start_date=datetime(2025, 5, 16),
    schedule="0 0 */1 * *",
) as dag:
    @task
    def build_conf():
        # Runtime parameters handed to the target DAG.
        return {
            "product_type": "Fish"
        }

    config = build_conf()

    trigger = TriggerDagRunOperator(
        task_id="trigger_dag_op",
        trigger_dag_id="the_target_dag",
        conf=config,
        wait_for_completion=True,
    )

    config >> trigger
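
Two details worth knowing: passing the build_conf XComArg as conf already wires up the config >> trigger dependency, so the explicit >> is belt and braces; and wait_for_completion=True means trigger_dag_op keeps polling (and occupying a worker slot) until the target DAG run finishes.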

Then, in the target DAG being invoked, you can retrieve those runtime parameters from the DagRun's conf like so:

from airflow.decorators import task

@task
def read_config(**kwargs):
    # The conf dict passed by the triggering DAG is available on the DagRun.
    conf = kwargs["dag_run"].conf or {}
    product_type = conf.get("product_type", "NOTE")
    # ...
    return {
        "product_type": product_type,
        # ...
    }
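
For completeness, a minimal sketch of the target DAG wrapping that task; the dag_id must match the trigger_dag_id used above:

from datetime import datetime

from airflow import DAG

with DAG(
    dag_id="the_target_dag",
    start_date=datetime(2025, 5, 16),
    schedule=None,  # only runs when triggered by another DAG
) as dag:
    read_config()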