ADR 0023: DAG Authoring β Config Binding and Override Layers¶
Status: Accepted Date: 2026-05-24 Deciders: Project founder
Context¶
A DAG author writes two files: a dag.py (real Apache Airflow SDK 3.2.x code,
imported by the Python parser via DagBag) and a leoflow.yaml (Leoflow's
deployment config). leoflow compile runs the parser to produce an immutable
dag.json, then overlays Leoflow-specific config onto it, builds the image, and
leoflow push registers the artifact (ADR 0003: one image per DAG; DAGs are
immutable artifacts).
Two questions arose while designing the authoring experience:
-
Where do Leoflow-specific knobs live, and how are they bound to a DAG and to individual tasks? Things like per-task
resources,retries,env, and the staging volume (ADR 0022) are not Airflow operator attributes. You cannot invent kwargs on an Airflow operator (PythonOperator(my_kwarg=...)raisesTypeErrorbecause the parser imports the real Airflow), so these knobs need a home outside the operator call. -
How do defaults and overrides layer? An author wants a default for the whole DAG and the ability to override per task; an operator wants per-cluster defaults (e.g. the RWX
storage_class, which differs between GKE / EKS / k3d) without re-baking the portable artifact.
The Airflow dialect we accept (and its limits)¶
The parser imports the real Airflow SDK, so any dag.py Airflow can import,
imports. The constraint is in the translation to dag.json, not the parse:
- Task types are detected by operator class name:
Python(incl. TaskFlow@task),Bash,Http. Any other operator β compile error. - Trigger rules:
all_success,all_failed,all_done,one_success,one_failed. Others β compile error. - XCom: TaskFlow data-flow (
transform(extract())) is resolved intoxcom_input; manualxcom_pullis not detected. - Schedule: string/cron/preset only.
- Not translated: branching (
BranchPythonOperatoris silently treated as plainpythontoday β see Consequences), dynamic task mapping (.expand), sensors, KubernetesPodOperator, datasets/assets, Jinja templating, per-taskdefault_argsfromdag.py(onlyleoflow.yamldefaults are honored).
Decision¶
1. Binding¶
- YAML β DAG binds by
dag_id(already the case:leoflow.yamldeclaresdag_id, the parser loads that DAG). - YAML β task binds by
task_id, via atasks:map keyed bytask_id:
dag_id: my_etl
staging:
enabled: true # DAG-level
tasks:
transform: # binds to task_id "transform"
retries: 5
env: { TZ: "UTC" }
resources:
requests: { cpu: "2", memory: 4Gi }
- YAML-only for now. Inline Python (
executor_config={"leoflow": {...}}, a native Airflow per-task escape hatch) is a viable future evolution but is not implemented, to keep a single source of truth. If added later it would be the most-specific layer (see precedence) and is the only sanctioned way to attach Leoflow config insidedag.py.
2. Override layers and precedence¶
Three layers, most specific wins:
L2 task override (leoflow.yaml tasks.<id>)
> L1 DAG default (leoflow.yaml defaults / default_args)
> L0 platform default (server config, applied at dispatch)
- L1 + L2 are merged at compile time and baked into
dag.json. The artifact carries the author's explicit intent and stays self-describing. - L0 is applied at dispatch time by the control plane, filling only the gaps
the artifact left empty. It is never baked into
dag.json, because per-cluster values (storage class, default resources) would make the artifact non-portable (violating ADR 0003 immutability/portability). L0 lives inexecutor.defaultsserver config.
Precedence falls out naturally: L2/L1 set explicit fields on the TaskSpec;
applyDefaultRetries and the dispatcher's gap-fill only act when a field is
unset.
3. Overridable-per-level matrix¶
| Knob | L2 task | L1 DAG | L0 platform | Notes |
|---|---|---|---|---|
retries, retry_delay_seconds, execution_timeout_seconds |
β | β
(defaults) |
β | |
env |
β (merged) | β | β | merged over compiled env |
resources |
β | β | β | L0 fills when unset |
execution (node selector, tolerations, SA, pull policy) |
β | β | β | |
staging (size, storage_class) |
β | β | β | DAG-level only: one RWX volume is shared atomically by the whole run (ADR 0022); per-task staging would break that semantic. L0 defaults size/class per cluster. |
4. Guardrails (fail closed, never silently drop)¶
- A
tasks:entry naming atask_idabsent from the compiled DAG β compile error naming the unknown id and listing the DAG's task ids. - A duplicate
task_idkey in the YAML β parse error (yaml.v3 rejects duplicate mapping keys). - A duplicate
dag_idacross projects in a monorepo is a CI-gate concern, not a single-project compile error: within one compile thedag_idis unique by construction, and re-pushing the samedag_idis the intended re-deploy path (dagsisUNIQUE (tenant_id, dag_id)withON CONFLICT DO UPDATE). The CI pattern (below) must detect cross-project collisions before push.
5. Dev loop vs CI artifact¶
- Dev: best local experience is a fast editβrun loop that skips the slow image
build (subprocess executor, hot reload), running the same parser + overlay +
guardrails in memory. (A
leoflow devwatch command is planned separately.) - CI (the authoritative path): on push,
leoflow compileruns the parser + overlay + guardrails as a gate, builds + pushes the image (tag = git sha), andleoflow pushregisters the immutable artifact. The same guardrails that warn the dev locally fail the CI build β a duplicatedag_id/task_idor unknown task binding never reaches production.
Consequences¶
- Authors get a clean separation:
dag.pyis pure Airflow logic;leoflow.yamlcarries deployment knobs, bound explicitly bydag_id/task_id. - The artifact stays portable: only the author's explicit config is baked; per-cluster defaults are layered in at dispatch.
- Typos fail loudly at compile, not silently in production.
- Known sharp edge to address later:
BranchPythonOperator/@task.branchmatch thePythonname check and are translated as plainpython, losing branch semantics with no error. A follow-up should make unsupported control-flow operators a hard compile error rather than a silent mistranslation. - The full authoring guide (
docs/dag-authoring.md) and its documentation format are tracked separately; this ADR records the decisions, not the end-user guide.