Skip to content

Leoflow

The workflow orchestrator that ate Apache Airflow's lunch.
Same UI. Same vocabulary. A Go control plane instead of Python's. Zero of the pain.
Native map-reduce for ML/AI โ€” fan-out + reduce as a Python list comprehension.

Get started DAG authoring GitHub

Leoflow Lite โ€” localhost:8088 Leoflow Lite, the ETL graph (extract, transform, load) running on a local cluster

A Go control plane that keeps Airflow's proven pod-per-task model and its UI, and throws away the Python control plane that makes Airflow slow.

Author a DAG, ship a container

A DAG is a leoflow.yaml (config, bindings, packaging) plus a dag.py (Airflow SDK). They compile to one immutable artifact โ€” dag.json + a container image.

schema_version: "1.0"
dag_id: etl_daily
description: Daily ETL โ€” extract, transform, load.
owner: data-eng
tags: [etl]
python_version: "3.12"
dependencies:            # pip packages baked into the DAG's own image
  - pandas==2.2.2
"""etl_daily โ€” a three-task ETL on the Airflow SDK."""
from airflow.sdk import DAG, task

@task
def extract() -> list[int]:
    return list(range(100))

@task
def transform(rows: list[int]) -> int:
    return sum(rows)

@task
def load(total: int) -> None:
    print("loaded:", total)

with DAG("etl_daily", schedule="0 6 * * *", catchup=False, tags=["etl"]):
    load(transform(extract()))
  • DAGs are immutable artifacts

    A dag.json + a container image, versioned together. No re-parsing /dags, no drift. Concepts โ†’

  • One image per DAG

    Each DAG carries its own dependencies. No shared filesystem, no dependency hell. Architecture โ†’

  • A real dev loop

    leoflow lite โ€” isolated cluster, hot reload, silver Lite badge. Edit, save, see it run. Operating modes โ†’

  • Airflow-compatible API & UI

    /api/v2/* and /ui/*, pinned to Airflow 3.2.x. HTTP API โ†’

  • Native map-reduce for ML/AI

    Fan-out + reduce as a Python list comprehension. No XCom plumbing, no broker, no special operator. Map-reduce for ML โ†’

Map-reduce in two lines of Python

Every parallel ML workload โ€” hyperparameter search, k-fold CV, ensemble training, batch inference, Monte Carlo โ€” is the same pattern. Leoflow expresses it as a list comprehension:

from airflow.sdk import DAG, task

@task
def trial(lr: float) -> dict:
    return train_one(lr)                            # map

@task
def select_best(trials: list[dict]) -> dict:
    return max(trials, key=lambda r: r["score"])    # reduce

with DAG("hparam_search", schedule=None):
    select_best([trial(lr) for lr in [0.001, 0.01, 0.05, 0.1, 0.5]])

The parser captures the list shape at compile time; the runtime assembles the upstream XComs in declaration order and delivers them as a real Python list โ€” with per-trial isolation, per-trial retry, and a null slot for any upstream that legitimately produced no result. See Map-reduce for ML โ†’ for the guarantees, limits, and the dag.json shape.

The dev loop

leoflow lite provision            # check + provision host deps (dev-only)
leoflow init dags/my_dag     # scaffold a project
leoflow lite dags/my_dag      # hot-reload at http://localhost:8088 (Lite edition)

Lite ships today (pre-alpha, local on a trusted network); Pro (the Kubernetes edition) is a near-term goal โ€” see the roadmap and Editions for the split.