Your first Pro DAG (≈20 min)

This is the end-to-end Pro path: take one DAG from source to a running task on a control plane. Where the 2-minute Lite loop hides the artifact boundary so you can iterate, Pro makes it explicit, because that boundary is what your CI pipeline automates later.

A DAG is an immutable artifact: a dag.json plus a container image, versioned together (ADR 0003). Every step below moves the DAG across one boundary:

flowchart LR
  A[dag.py + leoflow.yaml] --> B[compile → dag.json]
  B --> C[build DAG image<br/>FROM leoflow-runtime]
  C --> D[push image → registry]
  D --> E[register dag.json → control plane]
  E --> F[trigger → runs in a pod]

The DAG image is built FROM the published Leoflow task base (ghcr.io/neochaotic/leoflow-runtime:py3.11), which bundles the leoflow-agent (PID 1, talks gRPC to the control plane) and the leoflow_runtime Python helper. You never build the base yourself: pull it from GHCR, where it is published multi-arch and signed.

Prerequisites

  • docker (or podman/nerdctl; pass --builder).
  • The leoflow CLI and Python 3.11+ on your machine (see Python on the runner).
  • A reachable Leoflow control plane (LEOFLOW_SERVER) and a push token (LEOFLOW_TOKEN). For a throwaway target, the Helm chart's Pro deployment brings one up; a local registry (docker run -d -p 5000:5000 registry:2) is enough to push DAG images to.

Step 1: a project (dag.py + leoflow.yaml + Dockerfile)

dag.py
from airflow.sdk import DAG, task

@task
def extract() -> dict:
    return {"rows": 42}

@task
def load(data: dict) -> None:
    print(f"loaded {data['rows']} rows")

with DAG("first_pro_dag", schedule=None) as dag:
    load(extract())

leoflow.yaml
dag_id: first_pro_dag
python_version: "3.11"
dependencies:
  - requests==2.32.3

Dockerfile
FROM ghcr.io/neochaotic/leoflow-runtime:py3.11
RUN pip install --no-cache-dir requests==2.32.3
COPY dag.py /home/leoflow/dag.py
ENV PYTHONPATH=/home/leoflow

The Dockerfile is boilerplate

It always layers the same way: FROM the base, install your deps, COPY the DAG, set PYTHONPATH. The Lite loop generates this for you; in Pro you keep it in the repo so the image is fully reproducible in CI.
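
To make the four-layer pattern concrete, here is a minimal Python sketch that renders the same Dockerfile from a list of pinned dependencies. It is an illustration only, not the generator that the Lite loop actually uses; the function name render_dockerfile and the BASE_IMAGE constant are hypothetical.

```python
from typing import Sequence

# Base image named in this guide.
BASE_IMAGE = "ghcr.io/neochaotic/leoflow-runtime:py3.11"


def render_dockerfile(dependencies: Sequence[str], dag_file: str = "dag.py") -> str:
    """Render Dockerfile text: FROM the base, install deps, COPY the DAG, set PYTHONPATH.

    Hypothetical helper for illustration; not part of the leoflow CLI.
    """
    lines = [f"FROM {BASE_IMAGE}"]
    if dependencies:
        # A single RUN layer installs every pinned dependency.
        lines.append("RUN pip install --no-cache-dir " + " ".join(dependencies))
    lines.append(f"COPY {dag_file} /home/leoflow/{dag_file}")
    lines.append("ENV PYTHONPATH=/home/leoflow")
    return "\n".join(lines) + "\n"
```

Calling render_dockerfile(["requests==2.32.3"]) reproduces the Dockerfile shown in Step 1, which is why keeping the file in the repo costs little and keeps the CI build reproducible.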

Step 2: compile, build, and push

leoflow setup                 # once per machine: extracts the parser
leoflow compile . --build --push \
  --image localhost:5000/first-pro-dag:v1.0.0 \
  --dag-version v1.0.0

This single command crosses three boundaries:

  1. compile: parses dag.py, overlays leoflow.yaml, runs the guardrails (unknown task_id, unsupported operator, duplicate keys), and writes dag.json with --image recorded inside it.
  2. build: builds the image from your Dockerfile.
  3. push: pushes it to your registry.

Because the --image you pass is written into dag.json, the registered artifact and the pushed image can never drift.
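
A CI job could still assert that invariant before registering. The sketch below is illustrative and rests on an assumption this guide does not confirm: that dag.json stores the image reference under a top-level "image" key. The helper name image_matches is hypothetical; adjust the key to the real schema.

```python
import json


def image_matches(dag_json_text: str, expected_image: str, key: str = "image") -> bool:
    """Return True if the image recorded in dag.json equals the image that was pushed.

    ASSUMPTION: the image reference lives under a top-level `key`
    (default "image"). The actual dag.json schema may differ.
    """
    document = json.loads(dag_json_text)
    return document.get(key) == expected_image
```

In a pipeline you would read dag.json from disk, pass its text together with the same value given to --image, and fail the job when the function returns False.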

The guardrails are your CI gate

The checks that only warn in leoflow lite fail the build here, so a bad binding or an unsupported operator never reaches the control plane.

Step 3: register the artifact

leoflow push dag.json --server "$LEOFLOW_SERVER" --token "$LEOFLOW_TOKEN"

The control plane now knows the DAG, its version, and which image to pull.

Step 4: trigger and watch it run

Trigger from the Airflow UI (Trigger DAG) or the API:

curl -X POST "$LEOFLOW_SERVER/api/v2/dags/first_pro_dag/dagRuns" \
  -H "Authorization: Bearer $LEOFLOW_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"logical_date": "2026-01-01T00:00:00Z"}'

The scheduler pulls your image, runs each task in its own pod, and the UI shows state and logs at the next refresh. That is the whole Pro lifecycle.
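
For scripted triggers, the curl call above could be sketched in Python with only the standard library. The endpoint, headers, and payload mirror the curl example; the function names build_trigger_request and trigger are hypothetical, and error handling is left out.

```python
import json
import urllib.request


def build_trigger_request(
    server: str, token: str, dag_id: str, logical_date: str
) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    url = f"{server.rstrip('/')}/api/v2/dags/{dag_id}/dagRuns"
    body = json.dumps({"logical_date": logical_date}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def trigger(server: str, token: str, dag_id: str, logical_date: str) -> int:
    """Send the request and return the HTTP status code (raises on HTTP errors)."""
    request = build_trigger_request(server, token, dag_id, logical_date)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Typical use would read LEOFLOW_SERVER and LEOFLOW_TOKEN from the environment and call trigger(server, token, "first_pro_dag", "2026-01-01T00:00:00Z"). Separating the build step from the send step keeps the request easy to inspect in tests.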

From here

  • Automate it in CI. Steps 2–3 are exactly what a pipeline runs on every push; see CI/CD & deploy examples for GitHub Actions / GitLab / Cloud Build recipes (and the Python-on-the-runner notes).
  • Add credentials. Declare a connection in the UI; it is delivered to the pod as AIRFLOW_CONN_*. See Variables & Connections.
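
As a rough illustration of the credentials bullet, the sketch below reads one AIRFLOW_CONN_* variable inside a task and splits it into fields. It assumes the value uses Airflow's URI form (Airflow also accepts a JSON form, and this guide does not say which one Leoflow delivers). The connection id my_db and the helper read_connection are hypothetical.

```python
import os
from typing import Mapping, Optional
from urllib.parse import unquote, urlsplit


def read_connection(conn_id: str, environ: Optional[Mapping[str, str]] = None) -> dict:
    """Parse AIRFLOW_CONN_<CONN_ID> as a connection URI.

    ASSUMPTION: the variable holds a URI such as
    postgresql://user:secret@host:5432/dbname, not the JSON form.
    """
    env = os.environ if environ is None else environ
    raw = env[f"AIRFLOW_CONN_{conn_id.upper()}"]
    parts = urlsplit(raw)
    return {
        "conn_type": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        # Credentials are percent-encoded in the URI, so decode them.
        "login": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "schema": parts.path.lstrip("/") or None,
    }
```

Inside a task, read_connection("my_db") would look up AIRFLOW_CONN_MY_DB in the pod's environment. Passing a mapping as environ makes the helper testable without touching real credentials.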