Compared honestly · against frameworks we respect

You already have options.
Here's where Cleat fits.

LangGraph, Temporal, and plain Python with Pydantic are all reasonable choices for building stateful AI backends. Each makes different bets. Cleat makes a different bet too—that auditability, effect tracking, and orchestration belong in the type system, not in your prayers.

This page shows the same problem solved four ways, then where each tool wins. No strawmen. No marketing math.

[ 01 / VS LANGGRAPH ]

vs LangGraph

LangGraph is the dominant choice for stateful agent workflows—graph nodes, shared state, checkpoints, human-in-the-loop. It's a Python library running inside your interpreter, and effects are whatever your nodes happen to do.

Library
LangGraph
Python framework, runtime-discovered effects, observability via LangSmith.
Language
Cleat
Compiled, declared effects, provenance and timeouts as syntax.
analyze_graph.py python
from typing import TypedDict
from langgraph.graph import StateGraph, END
import httpx

class State(TypedDict):
    repo: str
    pr: int
    result: dict | None

async def analyze(state: State) -> dict:
    # network call — but who knows? caller doesn't.
    url = f"https://api/{state['repo']}/pr/{state['pr']}"
    try:
        async with httpx.AsyncClient() as c:
            r = await c.get(url, timeout=30)
            r.raise_for_status()
            return {"result": r.json()}
    except Exception:
        # retry? you're writing it.
        return {"result": None}

graph = StateGraph(State)
graph.add_node("analyze", analyze)
graph.set_entry_point("analyze")
graph.add_edge("analyze", END)
app = graph.compile()

# mocking? monkeypatch httpx, hope nothing else uses it.
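That last comment is the whole problem in miniature. A monkeypatch replaces a *name* on a module, not every reference captured before the patch was applied. Here is a stdlib-only sketch of the failure mode; `fake_httpx` is a stand-in module so the sketch runs without httpx, but the dynamics are identical:

```python
# Why "monkeypatch and hope" is fragile: patch.object replaces a name on a
# module, not every reference captured before the patch was applied.
from unittest.mock import patch
import types

# Stand-in module so this sketch runs without httpx; same dynamics apply.
fake_httpx = types.ModuleType("fake_httpx")
fake_httpx.get = lambda url: "REAL NETWORK CALL"

def node_late_bound(url):
    # Looks the function up on the module at call time -- patchable.
    return fake_httpx.get(url)

early_ref = fake_httpx.get  # somewhere, someone captured a direct reference...

def node_early_bound(url):
    # ...and that reference bypasses any later patch entirely.
    return early_ref(url)

with patch.object(fake_httpx, "get", lambda url: "MOCKED"):
    patched = node_late_bound("x")
    bypassed = node_early_bound("x")

print(patched)   # MOCKED
print(bypassed)  # REAL NETWORK CALL
```

Any helper that did `from httpx import get`-style binding, or cached a client at import time, silently keeps hitting the network under your mock.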
analyze.cleat cleat
import "std/http"
import "std/json"

type Analysis = struct {
    files_changed: int,
    summary: string,
}

tool analyze(repo: string, pr: int)
    -> Result[Analysis, string]
    needs { net }      // declared, enforced
    timeout: 30s         // syntax, not config
    retry: 2             // syntax, not glue
{
    let url = "https://api/${repo}/pr/${pr}"
    let resp = http.get(url)?
    let data: Analysis = json.decode(resp.body)?
    return Ok(data)
}

test "analyze can be mocked" {
    using analyze = fn(r, p) {
        return Ok(Analysis { ... })
    }      // scoped, typed, no monkeypatch
}
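For a sense of what `needs { net }` buys, here is a hypothetical runtime approximation in plain Python: a decorator that whitelists effects per tool. Everything here (`needs`, `perform`) is invented for illustration. Note the limitation: violations only surface when the offending code path actually runs, whereas Cleat's claim is that the compiler rejects them before deployment.

```python
# A hypothetical runtime approximation of Cleat's `needs { net }`.
# Violations surface at runtime, on the paths you happen to exercise.
import functools

_allowed: list[set[str]] = []  # stack of effect sets for tools in progress

def needs(*effects: str):
    """Declare the effects a tool is allowed to perform."""
    def wrap(fn):
        @functools.wraps(fn)
        def inner(*args, **kwargs):
            _allowed.append(set(effects))
            try:
                return fn(*args, **kwargs)
            finally:
                _allowed.pop()
        return inner
    return wrap

def perform(effect: str) -> None:
    """Called by effectful helpers (HTTP clients, file writers, ...)."""
    if not _allowed or effect not in _allowed[-1]:
        raise PermissionError(f"undeclared effect: {effect}")

@needs("net")
def analyze(repo: str, pr: int) -> str:
    perform("net")          # fine: declared
    return f"analyzed {repo}#{pr}"

@needs("net")
def sneaky(path: str) -> None:
    perform("fs")           # undeclared -- raises, but only at runtime

print(analyze("a", 1))      # analyzed a#1
try:
    sneaky("/tmp/x")
except PermissionError as e:
    print(e)                # undeclared effect: fs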
LangGraph wins when
  • You're already deep in the LangChain ecosystem and want LangSmith tracing
  • You need cyclic graph execution with rich human-in-the-loop checkpoints
  • Your team thinks in Python and the orchestration layer should match
  • You want a vast pre-built library of agent patterns and integrations
→ Cleat wins when
  • Effects matter—you want the compiler to refuse a tool that lies about side effects
  • Provenance is a requirement, not a feature you'll build later
  • You'd rather ship a single binary than manage a Python deployment
  • Mocking should be a language feature, not a third-party framework
[ 02 / VS TEMPORAL ]

vs Temporal

Temporal is the gold standard for durable execution—workflows that survive crashes and run for months. It's also a separate service you operate, with deterministic constraints that make general-purpose languages awkward to use correctly.

Platform
Temporal
Durable execution service · multi-language SDKs · operate the cluster.
Language
Cleat
Compile to native binary · zero infrastructure · auditability built-in.
workflow.py · Temporal python
from datetime import timedelta
from temporalio import workflow, activity
from temporalio.common import RetryPolicy

@activity.defn
async def fetch_pr(repo: str, pr: int) -> dict:
    # effect is here — but workflow can't call it directly,
    # must use execute_activity. determinism rule.
    ...

@workflow.defn
class AnalyzeWorkflow:
    @workflow.run
    async def run(self, repo: str, pr: int):
        return await workflow.execute_activity(
            fetch_pr, args=[repo, pr],
            start_to_close_timeout=timedelta(seconds=30),
            retry_policy=RetryPolicy(maximum_attempts=3),
        )

# plus: a Worker process, a Temporal Service running
# somewhere, and the discipline to never call random()
# or datetime.now() directly inside a workflow.
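That discipline exists because Temporal recovers a workflow by re-executing its code against recorded event history, so every decision must replay identically. A toy, stdlib-only sketch of the replay model (not the Temporal SDK, just the idea behind it):

```python
# Durable execution in miniature: on recovery, workflow code is re-executed
# against recorded history, and completed activities are not run again.
def run_workflow(call_activity):
    a = call_activity("fetch_pr", 42)
    b = call_activity("summarize", a)
    return b

def execute(history: list):
    """Run the workflow; replay recorded results, execute only what's new."""
    cursor = 0
    def call_activity(name, arg):
        nonlocal cursor
        if cursor < len(history):     # replaying: reuse the recorded result
            result = history[cursor]
        else:                         # live: actually do the work, record it
            result = f"{name}({arg})"
            history.append(result)
        cursor += 1
        return result
    return run_workflow(call_activity)

history: list = []
first = execute(history)    # live run, history now filled
second = execute(history)   # simulated crash + replay: nothing re-executes
print(first == second)      # True -- only because run_workflow is deterministic
```

Sprinkle a `random()` into `run_workflow` and replay diverges from history, which is exactly the bug class the Temporal SDK rules guard against.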
analyze.cleat cleat
import "std/http"

tool fetch_pr(repo: string, pr: int)
    -> Result[http.Response, string]
    needs { net }
    timeout: 30s
    retry: 3
{
    let url = "https://api/${repo}/pr/${pr}"
    return http.get(url)
}

// for long-running, durable state, attach a chain:
chain PRAuditTrail {
    signing: ed25519
    @retention(7y)
    record Step {
        action: string,
        result: string,
        timestamp: time,
    }
}

// no Workers. no Service. one binary.
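For intuition, a chain like `PRAuditTrail` is an append-only log in which each record is bound to its predecessor and signed, so any edit anywhere breaks verification from that point on. Cleat declares ed25519; this stdlib sketch substitutes HMAC-SHA256 so it runs anywhere, and is illustrative only, not Cleat's implementation:

```python
# An append-only, tamper-evident log: each record binds to the previous
# record's signature. HMAC-SHA256 stands in for a real asymmetric signature.
import hashlib, hmac, json

KEY = b"demo-signing-key"  # hypothetical; a real chain would use a keypair

def append(chain: list[dict], action: str, result: str) -> None:
    prev = chain[-1]["sig"] if chain else "genesis"
    record = {"action": action, "result": result, "prev": prev}
    payload = json.dumps(record, sort_keys=True).encode()
    record["sig"] = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
    chain.append(record)

def verify(chain: list[dict]) -> bool:
    prev = "genesis"
    for rec in chain:
        body = {k: v for k, v in rec.items() if k != "sig"}
        if body["prev"] != prev:
            return False
        payload = json.dumps(body, sort_keys=True).encode()
        expected = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, rec["sig"]):
            return False
        prev = rec["sig"]
    return True

trail: list[dict] = []
append(trail, "fetch_pr", "ok")
append(trail, "summarize", "ok")
print(verify(trail))                 # True
trail[0]["result"] = "tampered"      # any edit breaks the chain
print(verify(trail))                 # False
```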
Temporal wins when
  • You need true durable execution—workflows that resume after a server reboot weeks later
  • Polyglot teams where workflows span Go, Java, TypeScript, Python, and .NET
  • You already operate distributed infrastructure and a Temporal Service is no extra burden
  • Mission-critical orchestration where automatic state replay justifies the operational cost
→ Cleat wins when
  • Workflows are seconds-to-minutes long, not weeks—you don't need a workflow service
  • Auditability via signed provenance chains is a first-class need
  • You want determinism enforced by the language, not by SDK conventions you can violate
  • Single-binary deployment matters more than multi-language interop
[ 03 / VS PYTHON + PYDANTIC ]

vs Python + Pydantic

The default. You reach for FastAPI, Pydantic, httpx, and write the orchestration yourself. Maximum flexibility, maximum surface area, runtime everything.

Stack
Python + Pydantic
Glue your own. Runtime validation. Mypy if you remember to run it.
Language
Cleat
Static types. Compile-time effects. Built-in test runner. One binary.
analyze.py python
from pydantic import BaseModel
import httpx
from tenacity import retry, stop_after_attempt

class Analysis(BaseModel):
    files_changed: int
    summary: str

@retry(stop=stop_after_attempt(3))
async def analyze(repo: str, pr: int) -> Analysis:
    # effects: ¯\_(ツ)_/¯
    url = f"https://api/{repo}/pr/{pr}"
    async with httpx.AsyncClient(timeout=30) as c:
        r = await c.get(url)
        r.raise_for_status()
        return Analysis.model_validate(r.json())

# tests: pytest + pytest-asyncio + AsyncMock + a prayer
import pytest
from unittest.mock import AsyncMock, patch

@pytest.mark.asyncio
async def test_analyze():
    with patch("httpx.AsyncClient.get",
               new_callable=AsyncMock) as m:
        m.return_value.json.return_value = {...}
        result = await analyze("a", 1)
        assert result.files_changed == 3
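"Glue your own" is concrete: the behaviour Cleat spells as `retry: 3` becomes, minus tenacity, a hand-rolled helper like this (hypothetical, stdlib-only, and the kind of code every team rewrites slightly differently):

```python
# DIY retry with exponential backoff -- the glue tenacity replaces.
import time

def with_retry(fn, attempts=3, base_delay=0.0):
    last_err = None
    for i in range(attempts):
        try:
            return fn()
        except Exception as e:
            last_err = e
            time.sleep(base_delay * (2 ** i))  # exponential backoff
    raise last_err

# A flaky callable that succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(with_retry(flaky))  # ok
print(calls["n"])         # 3
```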
analyze.cleat cleat
import "std/http"
import "std/json"

type Analysis = struct {
    files_changed: int,
    summary: string,
}

tool analyze(repo: string, pr: int)
    -> Result[Analysis, string]
    needs { net }
    timeout: 30s
    retry: 3
{
    let url = "https://api/${repo}/pr/${pr}"
    let resp = http.get(url)?
    let data: Analysis = json.decode(resp.body)?
    return Ok(data)
}

test "analyze can be mocked" {
    using analyze = fn(r, p) {
        return Ok(Analysis {
            files_changed: 3, summary: "ok"
        })
    }
    assert(analyze("a", 1).IsOk())
}

// no pytest. no tenacity. no AsyncMock.
Python wins when
  • You need the entire Python ML and data ecosystem in the same process
  • Your team's existing skills are Python and switching costs outweigh language wins
  • You're prototyping fast and orchestration discipline can come later (or never)
  • You need a library that exists in PyPI and nowhere else
→ Cleat wins when
  • You want a single binary, no virtualenv, no dependency hell
  • Effect tracking and mocking should be free, not 4 dependencies
  • Compile-time errors beat runtime AttributeErrors at 3am
  • You're building a backend whose correctness you'll have to defend in an audit
[ 04 / FEATURE MATRIX ]

The full matrix.

No tool wins on every row. This is what each one actually ships.

Feature                                                        LangGraph      Temporal       Python+Pydantic  Cleat
Static type checking (errors at compile, not runtime)          via mypy       SDK-level      via mypy         native
Compile-time effect tracking (needs { net, fs } enforced)      no             no             no               enforced
First-class tools (timeout, retry as syntax)                   decorators     activities     DIY              keyword
First-class streams (backpressure, typed yields)               yes            via signals    DIY              keyword
State machines with reachability checks                        graph nodes    SDK pattern    DIY              keyword
Cryptographic provenance (signed audit trail, native)          via LangSmith  event history  DIY              keyword
Durable execution across crashes (multi-day workflows resume)  checkpoints    flagship       no               roadmap
Built-in test runner (no external framework)                   pytest         SDK testkits   pytest           cleat test
Built-in formatter                                             ruff/black     per-language   ruff/black       cleat fmt
Compiles to native binary                                      interpreter    per SDK        interpreter      via Go
External infrastructure required                               optional       cluster        none             none
Ecosystem maturity                                             large          large          vast             alpha

Last updated April 2026 · corrections welcome on GitHub

[ 05 / HONESTY ]

When not to use Cleat.

A new language is a real cost. If your problem is shaped like one of these, pick something else and we won't be offended.

Reach for Cleat when…

Your backend coordinates AI tools and you care about what they actually did. Especially if "what they did" might one day need to be defended in an audit, a postmortem, or a regulatory review.

  • You're building a multi-step AI pipeline and side effects keep biting you in production: tools that hit the wrong API, retries that didn't fire, mocks left in by accident
  • Provenance, signing, and audit trails are requirements, not "nice to have": regulated industries, security tooling, financial workflows, healthcare
  • You want one statically-typed binary, not a Python service plus a workflow cluster plus an observability stack: small teams, edge deployment, anyone tired of YAML
  • You're starting a new project and the team has appetite for a new language: greenfield is much easier than retrofitting; go all-in or don't bother

Stay where you are when…

Cleat is alpha software. The compiler works, the stdlib ships, real programs run—but it's a young language with a small ecosystem. Be honest about your constraints.

  • You need a specific Python or npm library that has no equivalent: scientific computing, ML training loops, niche SDKs. Use the language with the library
  • Your workflows must survive multi-day server crashes: Temporal solved this. Use Temporal. Cleat's durable execution story is on the roadmap, not in the box
  • You're shipping next quarter and risk-averse stakeholders need a 1.0 release: Cleat is alpha. Wait for a stable tag, or pilot it on a non-critical service first
  • Your team has zero appetite to learn a new language: that's a real constraint. Forcing a language on an unwilling team produces worse code than the language you replaced

Still curious?

Read the intro, browse the 29 example programs, or clone the repo and compile something in five minutes.