# leoflow
Leoflow control plane: a GitOps-first, container-native workflow orchestrator (Airflow 3.2.x UI/API compatible).
Deploys the Leoflow control plane (`leoflow-server`) into Kubernetes for a
production-like install, distinct from the host-run `test/e2e/e2e.sh` smoke test.
Homepage: https://github.com/neochaotic/leoflow
## What it installs
| Resource | Purpose |
|---|---|
| Deployment | leoflow-server (HTTP 8080, metrics 9090, agent gRPC 9091) |
| Service | ClusterIP exposing http / metrics / grpc |
| ServiceAccount + Role/RoleBinding | lets the control plane create/get/list/watch/delete task pods and read their logs in `taskNamespace` |
| Secret | holds inline DB/Redis/JWT/bootstrap credentials and the `secretKey` (not created when every credential comes from an existing Secret) |
| Job (hook) | runs golang-migrate before install/upgrade |
| Ingress | optional |
## Quick start
```shell
kubectl create namespace leoflow
helm install lf ./helm/leoflow -n leoflow \
  --set database.url='postgres://user:pass@postgres:5432/leoflow?sslmode=disable' \
  --set redis.url='redis://redis:6379/0' \
  --set auth.jwtSecret='change-me' \
  --set bootstrap.password='admin'
```
Task pods are created in taskNamespace (default leoflow, which must match the
namespace the server targets). Agents dial the control plane gRPC at the
in-cluster Service DNS automatically (override with
config.agentControlPlaneAddr).
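
The quick start uses placeholder credentials (`change-me`, `admin`) and leaves `secretKey` unset, which disables Connection management. Below is a minimal sketch for generating real values with `openssl`. It assumes `openssl` and `base64` are on your PATH. The shell variable names `JWT_SECRET` and `SECRET_KEY` are illustrative only; they are not chart settings.

```shell
# auth.jwtSecret: 64 random bytes, base64-encoded (as the values table suggests).
# openssl wraps base64 output at 64 columns, so strip newlines for a one-line value.
JWT_SECRET="$(openssl rand -base64 64 | tr -d '\n')"

# secretKey: 32 random bytes in the accepted 64-character hex form.
SECRET_KEY="$(openssl rand -hex 32)"

# Then pass them to the quick-start install, for example:
#   helm install lf ./helm/leoflow -n leoflow \
#     --set database.url='postgres://user:pass@postgres:5432/leoflow?sslmode=disable' \
#     --set redis.url='redis://redis:6379/0' \
#     --set auth.jwtSecret="$JWT_SECRET" \
#     --set secretKey="$SECRET_KEY" \
#     --set bootstrap.password='admin'
```

Neither value contains a comma, so Helm's `--set` parser takes each one as a single string.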
## Bringing your own secrets
Instead of inline values, reference existing Secrets:
```shell
--set database.existingSecret=my-db      # key: databaseUrl
--set redis.existingSecret=my-redis      # key: redisUrl
--set auth.existingSecret=my-jwt         # key: jwtSecret
--set bootstrap.existingSecret=my-boot   # key: bootstrapPassword
--set secretKeyExistingSecret=my-aes     # key: secretKey
```
When all credentials come from existing Secrets, the chart creates no Secret of its own.
**Upgrades + secret rotation.** The pod template carries a `checksum/secret` annotation hashed over the chart-rendered Secret (#316). As a result, `helm upgrade` rolls the pod whenever an inline value backing that Secret changes (`database.url`, `redis.url`, `auth.jwtSecret`, `secretKey`, `bootstrap.password`). Credentials wired via `*.existingSecret` or `secretKeyExistingSecret` are outside the chart's visibility. Rotating them requires a manual `kubectl rollout restart deployment/leoflow` for the change to take effect; adjust the name to your release's Deployment.
## Verified TLS to managed Postgres (#315)
Managed Postgres (Cloud SQL, RDS, Azure DB) presents a server cert signed by
a provider / per-instance CA that is not in the container's system roots.
Without the CA bundle mounted, the strongest TLS posture you can pin is
`sslmode=require`. The connection is encrypted, but the server cert is not verified
(MITM-vulnerable). To upgrade to `sslmode=verify-full`:
- Create a ConfigMap holding the CA bundle under the key `ca.crt` (the command below assumes it is named `managed-pg-ca`).
- Reference it from `database.caConfigMap` and point the DSN at the mounted path:
```shell
helm upgrade --install leoflow ./helm/leoflow -n leoflow \
  --set database.caConfigMap=managed-pg-ca \
  --set database.url='postgres://user:pass@cloudsql-private-ip:5432/leoflow?sslmode=verify-full&sslrootcert=/etc/leoflow/db-ca/ca.crt'
```
The chart mounts the ConfigMap read-only at `/etc/leoflow/db-ca/ca.crt`. pgx
reads `sslrootcert` from the DSN natively, so no extra Go-side configuration
is required.
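
The DSN is a URL. A password containing `@`, `:`, `/`, `?` or `%` must therefore be percent-encoded, or the host and query get split in the wrong place. Below is a minimal bash sketch for assembling the verify-full DSN. `urlencode` and `pg_verify_full_dsn` are hypothetical helpers, not part of the chart, and the encoder assumes ASCII credentials.

```shell
# Percent-encode every byte outside the RFC 3986 unreserved set.
# LC_ALL=C makes ${#s} and ${s:i:1} operate on bytes rather than characters.
urlencode() {
  local LC_ALL=C
  local s="$1" out="" c hex i
  for (( i = 0; i < ${#s}; i++ )); do
    c="${s:i:1}"
    case "$c" in
      [A-Za-z0-9._~-]) out+="$c" ;;
      *) printf -v hex '%%%02X' "'$c"; out+="$hex" ;;
    esac
  done
  printf '%s\n' "$out"
}

# Build a verify-full DSN that trusts the CA bundle the chart mounts.
pg_verify_full_dsn() {
  local user="$1" pass="$2" host="$3" port="$4" db="$5"
  printf 'postgres://%s:%s@%s:%s/%s?sslmode=verify-full&sslrootcert=%s\n' \
    "$(urlencode "$user")" "$(urlencode "$pass")" "$host" "$port" "$db" \
    /etc/leoflow/db-ca/ca.crt
}
```

The encoder also escapes `,` as `%2C`, which Helm's `--set` would otherwise treat as a value separator. The output can be passed as `--set database.url="$(pg_verify_full_dsn ...)"` or stored under `databaseUrl` in the Secret referenced by `database.existingSecret`.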
**Rotation note.** Kubernetes auto-updates the mounted file when the ConfigMap changes, but the pgx pool keeps its existing connections until they cycle. A cert rotation that invalidates the old chain may break in-flight connections. Rolling the pod (`kubectl rollout restart deploy/leoflow`) guarantees a clean cutover.

**Redis sibling (#312).** The same pattern applies to Redis. The `redis.caConfigMap` knob mounts the bundle, and the Go client overrides its `TLSConfig.RootCAs` from the file; go-redis does not read CA paths from the URL the way pgx does. See *Verified TLS to managed Redis (#312)* below.
## Migrations
The pre-install/pre-upgrade Job runs migrate -path <path> -database <url> up
using the default image ghcr.io/neochaotic/leoflow-migrate (built from
deploy/Dockerfile.migrate, published by .github/workflows/release.yaml on
every tag), which bundles the Leoflow migrations/ at migrations.path.
Override migrations.image to use your own, or set migrations.enabled=false
to migrate out of band.
## Datastore compatibility
The chart talks to external Postgres + Redis you provision. Versions we test against in CI and the minimums we support:
| Datastore | Tested in CI | Recommended minimum | Notes |
|---|---|---|---|
| PostgreSQL | 16.x (`postgres:16-alpine`) | PostgreSQL 13 | Uses JSONB, `ON CONFLICT`, advisory locks, generated columns. Versions below 13 may work but are untested; CI doesn't validate them. |
| Redis | 7.x (`redis:7-alpine`) | Redis 6.0 | ACL is used for the per-tenant auth model; Redis 5 and below lack ACL. Streams aren't used today but may be in the future. |
The PoC recipe pins Bitnami chart majors that bundle the tested versions (postgres chart 16 → PG 16; redis chart 20 → Redis 7). When pointing at managed cloud datastores (RDS, ElastiCache, Cloud SQL, Memorystore), check that the engine version is at or above the recommended minimum.
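
Below is a minimal sketch for comparing a reported engine version against the minimums in the table. It assumes `sort -V` (version sort, as in GNU coreutils) is available. `version_at_least` is a hypothetical helper. The commented lines show one possible way to read the versions with `psql` and `redis-cli`; managed offerings may restrict some commands.

```shell
# Succeeds when dotted version $1 is >= minimum $2 (e.g. "16.4" vs "13").
version_at_least() {
  local version="$1" minimum="$2"
  [[ "$(printf '%s\n%s\n' "$minimum" "$version" | sort -V | head -n 1)" == "$minimum" ]]
}

# Example inputs (run against your own endpoints):
#   pg_version="$(psql "$DATABASE_URL" -tAc 'SHOW server_version' | awk '{print $1}')"
#   redis_version="$(redis-cli -u "$REDIS_URL" INFO server | tr -d '\r' \
#       | awk -F: '/^redis_version:/ {print $2}')"
#   version_at_least "$pg_version" 13 || echo "PostgreSQL $pg_version is below 13"
#   version_at_least "$redis_version" 6.0 || echo "Redis $redis_version is below 6.0"
```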
If you need to run against an older engine, please file an issue with the version and symptoms. We don't bench-test older majors, but we will investigate specific incompatibilities.
## Verified TLS to managed Redis (#312)
Managed Redis offerings (Memorystore `SERVER_AUTHENTICATION`, ElastiCache
in-transit encryption, Azure Cache for Redis) sign their TLS server cert
with a provider or per-instance CA that is not in the container's
system roots. Without telling the client which CA to trust, only two
postures work:

- Plaintext `redis://`, which is unacceptable across the internet.
- Skipping certificate verification. We don't expose that knob: it keeps
  the traffic encrypted but lets anyone on the path impersonate the server.
Set `redis.caConfigMap` to a ConfigMap holding the provider CA under the key
`ca.crt` (here `leoflow-redis-ca`):
```shell
kubectl create configmap leoflow-redis-ca \
  --from-file=ca.crt=./memorystore-server-ca.pem -n leoflow
```
The chart then:
- Mounts the ConfigMap read-only at `/etc/leoflow/redis-ca/ca.crt`.
- Sets `LEOFLOW_REDIS_CA_FILE=/etc/leoflow/redis-ca/ca.crt`, so the server overrides `tls.Config.RootCAs` instead of falling back to system roots.
- Makes the server refuse to boot if the file is missing or malformed. You get a clear error instead of a confusing `x509: certificate signed by unknown authority` at the first Ping.
Leave caConfigMap empty for plaintext redis:// or for managed
TLS that uses a public CA (rare).
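
The server refuses to boot on a missing or malformed CA file, so it can pay to check the bundle before creating the ConfigMap. Below is a minimal sketch using `openssl x509`, which parses the first certificate in a PEM file. `check_ca_bundle` is a hypothetical helper, and passing it does not prove the bundle actually signs your provider's server cert. The same check applies to the Postgres bundle used with `database.caConfigMap`.

```shell
# Fails with a message unless $1 is readable and holds a parseable PEM certificate.
check_ca_bundle() {
  local f="$1"
  if [[ ! -r "$f" ]]; then
    echo "CA bundle not readable: $f" >&2
    return 1
  fi
  if ! openssl x509 -in "$f" -noout 2>/dev/null; then
    echo "not a PEM certificate: $f" >&2
    return 1
  fi
  # Print subject and expiry to help spot a stale bundle during rotation.
  openssl x509 -in "$f" -noout -subject -enddate
}

# Usage:
#   check_ca_bundle ./memorystore-server-ca.pem &&
#     kubectl create configmap leoflow-redis-ca \
#       --from-file=ca.crt=./memorystore-server-ca.pem -n leoflow
```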
## Evaluating without a managed Postgres + Redis
For a one-cluster evaluation (kind, minikube, k3d, scratch namespace), the
chart deliberately won't fall back to embedded datastores. That's Lite's
job, not Pro's (see `templates/deployment.yaml:8-13`). The supported PoC
path is to install Bitnami's Postgres + Redis charts alongside Leoflow:
- Recipe: `helm/leoflow/examples/README.md`
- Matching values file: `helm/leoflow/examples/poc.yaml`
Three helm installs in total. Not for production โ see the recipe for
the production-shaped command.
## Validate
```shell
helm lint ./helm/leoflow
helm template lf ./helm/leoflow -n leoflow \
  --set database.url=postgres://x \
  --set redis.url=redis://r/0 \
  --set auth.jwtSecret=s \
  --set "secretKey=$(openssl rand -hex 16)"
bash scripts/helm-template-checks.sh  # contract assertions (env wiring, Job hardening, fixture lengths)
```
## Maintainers
| Name | Url |
|---|---|
| Leoflow |  |
## Source Code
## Values
The table below is auto-generated by helm-docs from values.yaml. To
update: edit values.yaml (use # -- <description> comments above each key
that warrants documentation), then run helm-docs -c helm/leoflow from the
repo root. CI fails (helm-ci.yaml lint job) if the regenerated README would
differ from what's committed.
| Key | Type | Default | Description |
|---|---|---|---|
| affinity | object | `{}` | Pod affinity rules (standard K8s: co-locate or anti-affinity). |
| agentTLS.caConfigMap | string | `""` | Name of a ConfigMap with key ca.crt. Mounted into task pods so the agent verifies the server cert. Typically a cert-manager trust-bundle. |
| agentTLS.enabled | bool | `true` | Enable TLS on the gRPC channel between the agent and the control plane (#58). Default ON for the Pro alpha: the chart marks this deployment as the production edition, and the server refuses to boot with LEOFLOW_AGENT_ALLOW_INSECURE_SECRETS=true so secrets cannot accidentally travel a plaintext channel. Disable only for an isolated, in-cluster test where the threat model accepts plaintext. |
| agentTLS.serverCertSecret | string | `""` | Name of a kubernetes.io/tls Secret (with tls.crt/tls.key) for the gRPC server. Typically produced by a cert-manager Certificate. Required when enabled: true. |
| auth.existingSecret | string | `""` | Name of a Secret with key jwtSecret (takes precedence over jwtSecret). |
| auth.jwtSecret | string | `""` | HMAC secret signing API + agent JWTs. Set inline OR reference an existing Secret via existingSecret. Generate with openssl rand -base64 64. |
| auth.tokenTtlSeconds | int | `3600` | API + agent JWT lifetime in seconds. Default 1h; raise for longer agent sessions. |
| autoscaling.behavior | object | `{}` | HPA scaling behavior (scale-up/scale-down policies). See K8s docs for autoscaling/v2 behavior schema. |
| autoscaling.enabled | bool | `false` | Enable HPA for the leoflow-server Deployment. Requires metrics-server. |
| autoscaling.maxReplicas | int | `6` | Maximum replicas. HPA never scales above this. |
| autoscaling.minReplicas | int | `2` | Minimum replicas. HPA never scales below this. |
| autoscaling.targetCPUUtilizationPercentage | int | `70` | Target average CPU utilization across replicas (percent). HPA scales out when exceeded. |
| autoscaling.targetMemoryUtilizationPercentage | string | `""` | Target average memory utilization (percent). Empty = not used. Add only if your workload is memory-bound (rare for a control plane). |
| bootstrap.existingSecret | string | `""` | Name of a Secret with key bootstrapPassword (takes precedence over password). |
| bootstrap.password | string | `""` | Initial admin password (first install only). Leave empty to skip the bootstrap; the operator then creates the first admin out-of-band. |
| config.agentControlPlaneAddr | string | `""` | gRPC address task pods dial back to reach the control plane. Defaults to the in-cluster Service DNS on ports.grpc when empty. Override for cross-cluster or external task pods. |
| config.cors.allowedOrigins | list | `["*"]` | CORS allowed origins for the API. ["*"] is fine for Pro behind an authenticated ingress; tighten for public-facing deploys. |
| config.logsDir | string | `"/var/log/leoflow"` | Directory inside the pod where task logs are written. Mounted from logs.persistence (a PVC by default) so logs survive pod restarts. Set logs.persistence.enabled: false to fall back to an ephemeral emptyDir (dev only). |
| config.scheduler.enabled | bool | `true` | Run the scheduler loop. Disable only for read-only API-only replicas (rare). |
| config.scheduler.loopIntervalMs | int | `1000` | Scheduler loop interval in milliseconds. Lower = faster reactivity, higher CPU. 1000ms is the production-tested default. |
| database.caConfigMap | string | `""` | Name of a ConfigMap with key ca.crt holding the managed-Postgres CA bundle (#315). When set, the chart mounts it at /etc/leoflow/db-ca/ca.crt so the operator can pin sslmode=verify-full&sslrootcert=/etc/leoflow/db-ca/ca.crt in the DSN. Empty (default) means TLS still works via sslmode=require, but the server cert is NOT verified: the connection is encrypted but MITM-vulnerable, the standard managed-DB posture before this knob. |
| database.existingSecret | string | `""` | Name of a Secret with key databaseUrl (takes precedence over url). |
| database.maxIdleConns | int | `5` | Max idle DB connections kept in the pool. Should be ≤ maxOpenConns. |
| database.maxOpenConns | int | `20` | Max concurrent open DB connections (Postgres-side load gate). Increase for high-throughput Pro deployments. |
| database.url | string | `""` | External Postgres DSN. Required for Pro (the embedded datastore is Lite-only); the chart fails the install if neither this nor existingSecret is set. Example: postgres://user:pass@host:5432/leoflow?sslmode=disable. |
| image.pullPolicy | string | `"IfNotPresent"` |  |
| image.repository | string | `"ghcr.io/neochaotic/leoflow-server"` | Control-plane image. Published by GoReleaser on every tag, signed with cosign. |
| image.tag | string | `""` | Image tag. Defaults to .Chart.appVersion when empty; pre-alpha installs should pin --set image.tag=v0.0.1-prealpha.N (the v-prefix and no-v forms are both published and resolve to the same digest, so either works). |
| imagePullSecrets | list | `[]` |  |
| ingress.annotations | object | `{}` | Ingress annotations (controller-specific: rewrites, TLS, auth, etc.). |
| ingress.className | string | `""` | Ingress class name (e.g. nginx). Leave empty to use the cluster default. |
| ingress.enabled | bool | `false` | Enable an Ingress resource exposing the control plane via HTTP/HTTPS. Requires an Ingress controller (nginx/traefik/etc.) in the cluster. |
| ingress.hosts | list | `[{"host":"leoflow.local","paths":[{"path":"/","pathType":"Prefix"}]}]` | Host + path rules. Each host maps to one or more path entries routed to the leoflow-server's http port. |
| ingress.tls | list | `[]` | TLS configuration. Each entry maps hosts to a TLS Secret (typically a cert-manager Certificate Secret). |
| logs.persistence.accessMode | string | `"ReadWriteOnce"` | PVC access mode. ReadWriteOnce (default) is fine for single-replica deployments; ReadWriteMany is required when replicaCount > 1. |
| logs.persistence.enabled | bool | `true` | Persist control-plane logs in a PVC (default ON). Disable for ephemeral emptyDir (dev only; logs are lost on pod restart). |
| logs.persistence.size | string | `"50Gi"` | PVC size for control-plane logs. ~1 GB/day per ~1000 active task runs is a sane starting point. |
| logs.persistence.storageClass | string | `""` | StorageClass for the PVC. Empty uses the cluster default. Specify an RWX class when accessMode: ReadWriteMany. |
| metrics.serviceMonitor.additionalLabels | object | `{}` | Extra labels on the ServiceMonitor. Required when the Prometheus instance has a serviceMonitorSelector filter (e.g. {release: kube-prometheus-stack}). |
| metrics.serviceMonitor.enabled | bool | `false` | Enable ServiceMonitor for Prometheus scraping. Requires kube-prometheus-stack CRDs. |
| metrics.serviceMonitor.interval | string | `"30s"` | Prometheus scrape interval. |
| metrics.serviceMonitor.namespace | string | `""` | Namespace for the ServiceMonitor resource. Defaults to the release namespace; override when Prometheus expects ServiceMonitors in a dedicated namespace. |
| metrics.serviceMonitor.scrapeTimeout | string | `"10s"` | Prometheus scrape timeout (must be ≤ interval). |
| migrations.enabled | bool | `true` |  |
| migrations.image.pullPolicy | string | `"IfNotPresent"` |  |
| migrations.image.repository | string | `"ghcr.io/neochaotic/leoflow-migrate"` | leoflow-migrate image bundling Leoflow SQL migrations on top of migrate/migrate. Published per release by release.yaml, signed with cosign, multi-arch (amd64 + arm64). |
| migrations.image.tag | string | `""` | Migration image tag. Defaults to .Chart.appVersion when empty. Pin to the same tag as image.tag (the server and migrate images both publish v-prefix and no-v forms that resolve to the same digest, so use whichever convention you prefer): --set migrations.image.tag=v0.0.1-prealpha.N. |
| migrations.path | string | `"/migrations"` | Path inside the migrate image where the SQL files live. Must match the COPY destination in deploy/Dockerfile.migrate. |
| migrations.podSecurityContext.fsGroup | int | `65532` |  |
| migrations.podSecurityContext.runAsGroup | int | `65532` |  |
| migrations.podSecurityContext.runAsNonRoot | bool | `true` |  |
| migrations.podSecurityContext.runAsUser | int | `65532` |  |
| migrations.securityContext.allowPrivilegeEscalation | bool | `false` |  |
| migrations.securityContext.capabilities.drop[0] | string | `"ALL"` |  |
| migrations.securityContext.readOnlyRootFilesystem | bool | `true` |  |
| migrations.securityContext.runAsNonRoot | bool | `true` |  |
| migrations.securityContext.runAsUser | int | `65532` |  |
| networkPolicy.egress | list | `[]` | Explicit egress rules. Empty = allow-all (DNS is ALWAYS allowed regardless). Lock down to your DB/Redis/kube-apiserver endpoints in regulated environments. |
| networkPolicy.enabled | bool | `false` | Enable NetworkPolicy gating ingress + egress on the control-plane pods. Requires a CNI that enforces policies (Calico/Cilium/etc.). |
| networkPolicy.ingressFrom | list | `[]` | NetworkPolicy from rules for HTTP + gRPC ingress (task pods dial back). Empty = allow from any pod in any namespace. Tighten with e.g. [{podSelector: {}}] for same-namespace only; note that [{namespaceSelector: {}}] matches pods in every namespace. |
| networkPolicy.metricsFrom | list | `[]` | NetworkPolicy from rules for the metrics port (Prometheus scrape). Empty = no separate rule; the metrics port is reachable from wherever ingressFrom allows. Set e.g. [{namespaceSelector: {matchLabels: {kubernetes.io/metadata.name: monitoring}}}] to restrict to a Prometheus namespace. |
| nodeSelector | object | `{}` | Pod nodeSelector (standard K8s scheduling label match). |
| observability.logFormat | string | `"json"` | Log format: json (production / log aggregators) or console (dev / human-readable). |
| observability.logLevel | string | `"info"` | Log level: debug, info, warn, error. Production default is info. |
| observability.otel.enabled | bool | `false` | Export OpenTelemetry traces. When false, internal spans are no-ops. |
| observability.otel.endpoint | string | `""` | OTLP/gRPC endpoint URL, e.g. otel-collector:4317. Required when otel.enabled: true. |
| podAnnotations | object | `{}` |  |
| podDisruptionBudget.enabled | bool | `false` | Enable PDB for the leoflow-server Deployment. Pair with replicaCount > 1. |
| podDisruptionBudget.maxUnavailable | string | `""` | Maximum replicas allowed unavailable during voluntary disruption. Set only ONE of minAvailable / maxUnavailable. |
| podDisruptionBudget.minAvailable | int | `1` | Minimum replicas that must remain up during voluntary disruption. Set only ONE of minAvailable / maxUnavailable. |
| podSecurityContext.fsGroup | int | `65532` |  |
| podSecurityContext.runAsGroup | int | `65532` |  |
| podSecurityContext.runAsNonRoot | bool | `true` |  |
| podSecurityContext.runAsUser | int | `65532` |  |
| ports | object | `{"grpc":9091,"http":8080,"metrics":9090}` | Ports the leoflow-server listens on. http: API + UI, metrics: Prometheus /metrics, grpc: the agent-to-control-plane channel (task pods dial back here). |
| rbac.create | bool | `true` | Create the Role + RoleBinding granting the control plane create/get/list/watch/delete on pods + get on pods/log in taskNamespace. Required for the pod-per-task executor. |
| redis.caConfigMap | string | `""` | Name of a ConfigMap with a ca.crt key containing the PEM CA bundle the client trusts when negotiating TLS to a rediss:// URL (#312). Required when the managed-Redis server cert is signed by a provider / per-instance CA that is not in the system roots: Memorystore SERVER_AUTHENTICATION, ElastiCache in-transit encryption, Azure Cache for Redis. Mounted read-only at /etc/leoflow/redis-ca and exposed to the server via LEOFLOW_REDIS_CA_FILE. Leave empty when Redis uses a public CA or no TLS. |
| redis.existingSecret | string | `""` | Name of a Secret with key redisUrl (takes precedence over url). |
| redis.url | string | `""` | External Redis URI. Required for Pro (the embedded XCom is Lite-only). Example: redis://host:6379/0, or rediss://host:6380/0 for TLS. |
| replicaCount | int | `1` | Number of control-plane replicas. The scheduler leader-elects (ADR 0009), so >1 is HA-safe (active-passive scheduler, active-active API). |
| resources | object | `{"limits":{"cpu":"1","memory":"512Mi"},"requests":{"cpu":"100m","memory":"128Mi"}}` | Resource requests + limits for the leoflow-server container. Defaults sized for a small Pro (50 to 500 DAGs); bump CPU+memory for larger deployments. The scheduler's main load is DB polling, not in-process compute. |
| secretKey | string | `""` | AES-256 key encrypting Connection passwords + Extra at rest (ADR 0019). MUST be exactly 32 raw bytes OR 64-char hex OR base64-of-32-bytes. Without it, Connection management is disabled (Variables still work). |
| secretKeyExistingSecret | string | `""` | Name of a Secret with key secretKey (takes precedence over secretKey). |
| securityContext.allowPrivilegeEscalation | bool | `false` |  |
| securityContext.capabilities.drop[0] | string | `"ALL"` |  |
| securityContext.readOnlyRootFilesystem | bool | `false` |  |
| securityContext.runAsNonRoot | bool | `true` |  |
| securityContext.runAsUser | int | `65532` |  |
| service.annotations | object | `{}` | Service annotations (e.g. cloud LB controller hints, ExternalDNS). |
| service.type | string | `"ClusterIP"` | Service type. ClusterIP for internal-only; LoadBalancer to expose externally; NodePort for k3d/kind. |
| serviceAccount.annotations | object | `{}` | ServiceAccount annotations (e.g. AWS IAM role: eks.amazonaws.com/role-arn). |
| serviceAccount.create | bool | `true` | Create a dedicated ServiceAccount for the leoflow-server. Set false only if you bring your own via name. |
| serviceAccount.name | string | `""` | Override the ServiceAccount name. Defaults to the chart fullname when empty. |
| taskNamespace | string | `"leoflow"` | Namespace where the control plane creates task pods. MUST match the namespace the server expects (server code currently targets leoflow). The chart grants the control plane RBAC to manage pods here; if you override this, the RBAC follows but the server still looks at leoflow. |
| taskSecret.mountPath | string | `"/etc/leoflow/secrets"` | Read-only mount path in the task pod. A connection references files here, e.g. /etc/leoflow/secrets/key.json. |
| taskSecret.name | string | `""` | Name of an existing Kubernetes Secret to mount into task pods. Empty = none. |
| taskServiceAccount.annotations | object | `{}` | Annotations. GKE Workload Identity: iam.gke.io/gcp-service-account: GSA@PROJECT.iam.gserviceaccount.com. EKS IRSA: eks.amazonaws.com/role-arn: .... |
| taskServiceAccount.create | bool | `false` | Create a ServiceAccount in taskNamespace for task pods to run as. |
| taskServiceAccount.name | string | `"leoflow-task"` | Name of the task ServiceAccount (use this as execution.service_account). |
| tolerations | list | `[]` | Pod tolerations (standard K8s: allow scheduling on tainted nodes). |
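
The `secretKey` row accepts three encodings, and they differ in length: 32 characters for raw bytes, 64 for hex, and 44 for padded base64. Below is a minimal sketch for checking a candidate value before passing it with `--set secretKey=...` or storing it in the Secret named by `secretKeyExistingSecret`. `valid_secret_key` is a hypothetical helper that only mirrors the documented rule; the server's own parser is authoritative. It assumes standard, padded base64.

```shell
# Succeeds when $1 is 32 raw bytes, 64 hex characters, or padded base64 of 32 bytes.
valid_secret_key() {
  local LC_ALL=C          # count bytes, not characters
  local k="$1"
  case ${#k} in
    32) return 0 ;;                                   # raw-bytes form
    64) [[ "$k" =~ ^[0-9A-Fa-f]{64}$ ]] ;;            # hex form
    44) [[ "$k" =~ ^[A-Za-z0-9+/]{43}=$ ]] &&         # base64 form
          (( $(printf '%s' "$k" | base64 -d 2>/dev/null | wc -c) == 32 )) ;;
    *)  return 1 ;;
  esac
}
```

For a new key, `openssl rand -hex 32` or `openssl rand -base64 32` gives 256 random bits. The `openssl rand -hex 16` in the Validate section produces a 32-character string, which passes as the raw-bytes form but carries only 128 random bits. That is adequate for `helm template` fixtures, but weaker than a production AES-256 key should be.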