Limits

Angl is production-interesting, not production-ready. This page states the known limits plainly, because the project's credibility depends on them being owned rather than hidden.

Current limits

No sandboxing in the dev judge. Generated code runs as a normal subprocess with your user privileges. A malicious .angl file's behavior text could steer the model into generating code that reads secrets, and nothing in the dev path stops it from running. Do not run someone else's .angl file without reading the generated artifact first. The distributed runtime has a validated sandbox boundary; the dev judge does not yet run on it.
Contract completeness is the ceiling. Untested behavior is unspecified, the same way C frames undefined behavior. No solution is planned; it is a known, owned limit.
Case authoring is still real work. Angl makes the checked cases the source of truth; it does not remove the need to state important examples.
Regeneration is not a proof of correctness. It is a way to ask a model for a new artifact, then reject it unless the checked contract still passes.
Non-Python targets are prototype-grade. Go, Rust, TypeScript, bundle, and assembly targets exist, but they run through subprocess calls, Docker, or generated adapter plumbing.
Fixture coverage is small. http_fixture and postgres_fixture exist; richer systems need richer fixtures.
Some behavior resists black-box contracts. Rich stateful UIs, real-time systems, and intricate concurrency are hard to judge over JSON I/O, and therefore hard to express in Angl.
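
To make the black-box idea concrete, here is a minimal sketch of judging an artifact over JSON I/O and rejecting it unless every checked case passes. It is not Angl's judge: the names run_case, judge, and DEMO_ARTIFACT, the case shape ("input" and "expect" keys), and the stdin/stdout protocol are all assumptions made for illustration. Like the dev judge described above, it runs the artifact as an ordinary subprocess with your user privileges; the timeout limits runaway code but is not a sandbox.

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for a generated artifact: reads JSON on stdin,
# writes JSON on stdout. Real generated code would come from a model.
DEMO_ARTIFACT = """\
import json, sys
data = json.load(sys.stdin)
json.dump({"sum": data["a"] + data["b"]}, sys.stdout)
"""


def run_case(artifact, case_input, timeout=5.0):
    """Run the artifact as a plain subprocess (no isolation) for one input."""
    proc = subprocess.run(
        [sys.executable, str(artifact)],
        input=json.dumps(case_input),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode != 0:
        raise RuntimeError(proc.stderr.strip() or "artifact exited non-zero")
    return json.loads(proc.stdout)


def judge(artifact, cases):
    """Return a list of failures; an empty list means every case passed."""
    failures = []
    for case in cases:
        try:
            actual = run_case(artifact, case["input"])
        except (RuntimeError, subprocess.TimeoutExpired, ValueError) as exc:
            # Crashes, timeouts, and non-JSON output all count as failures.
            failures.append({"input": case["input"], "error": str(exc)})
            continue
        if actual != case["expect"]:
            failures.append(
                {"input": case["input"], "expect": case["expect"], "actual": actual}
            )
    return failures


if __name__ == "__main__":
    cases = [{"input": {"a": 2, "b": 3}, "expect": {"sum": 5}}]
    with tempfile.TemporaryDirectory() as tmp:
        artifact = Path(tmp) / "artifact.py"
        artifact.write_text(DEMO_ARTIFACT)
        failures = judge(artifact, cases)
        print("accepted" if not failures else f"rejected: {failures}")
```

In this sketch a regenerated artifact would be accepted only when judge returns an empty list. Passing says nothing about inputs the cases never exercise, which is the contract-completeness limit in practice.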

What is proven locally

Reliability proof points from the repo, all run locally:

mixed-language generated artifacts behind black-box shims
the repair loop with failure feedback
Docker-backed Go, Rust, and TypeScript execution
a Redis/Celery/Postgres distributed demo: 60/60 queued jobs passed through API, Redis, Celery, runner, and Postgres, with 359 persisted jobs and 0 failures
the distributed runtime's sandbox boundary validated 9/9: non-root containers; no docker.sock; a runner that mounts only the generated build directory, cannot resolve the database or queue, and executes generated Rust with pre-fetched crates and no runtime egress
contract-strength mutation tests
real Postgres fixtures in the judge, including a DB-backed chapter compiled by a live model and verified against temporary Postgres containers
core suite: 94/94 tests passing with strict spec lint
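
As an illustration of what a contract-strength mutation test measures, the sketch below checks whether a set of cases can tell a reference behavior apart from deliberately broken variants. This is an assumption-laden toy, not the repo's test suite: the clamp example, the MUTANTS table, and the helper names passes and surviving_mutants are all hypothetical. A mutant that survives marks behavior the contract leaves unspecified.

```python
def clamp(x, lo, hi):
    """Reference behavior the contract is supposed to pin down."""
    return max(lo, min(x, hi))


# Hypothetical mutants: each one breaks exactly one aspect of clamp.
MUTANTS = {
    "no_upper_bound": lambda x, lo, hi: max(lo, x),
    "no_lower_bound": lambda x, lo, hi: min(x, hi),
    "off_by_one_upper": lambda x, lo, hi: max(lo, min(x, hi - 1)),
}

# A weak contract: one case in the middle of the range.
WEAK_CASES = [{"input": (5, 0, 10), "expect": 5}]

# A stronger contract: adds both sides and the upper boundary itself.
STRONG_CASES = WEAK_CASES + [
    {"input": (15, 0, 10), "expect": 10},
    {"input": (-3, 0, 10), "expect": 0},
    {"input": (10, 0, 10), "expect": 10},
]


def passes(impl, cases):
    """True if the implementation satisfies every checked case."""
    return all(impl(*case["input"]) == case["expect"] for case in cases)


def surviving_mutants(mutants, cases):
    """Names of mutants the cases fail to reject, in sorted order."""
    return sorted(name for name, impl in mutants.items() if passes(impl, cases))


if __name__ == "__main__":
    print("weak contract lets through:", surviving_mutants(MUTANTS, WEAK_CASES))
    print("strong contract lets through:", surviving_mutants(MUTANTS, STRONG_CASES))
```

With the weak case list all three mutants survive; with the stronger list none do. The count of survivors is only as informative as the mutants chosen, so it is a lower bound on what the contract misses rather than a measure of completeness.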

The honest assessment

The core idea is strong because the durable asset is the contract, not the generated source. The weak point is not the LLM. The weak point is contract completeness and side-effect coverage. The project becomes much more serious once every external dependency that matters has a fixture.
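
One way to picture side-effect coverage is a fixture that both serves canned responses and records what the code under test asked for, so the outbound calls become something a case can assert on. The sketch below uses only the Python standard library and is not Angl's http_fixture; the class name RecordingFixture, its routes argument, and the /rate path are hypothetical.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class RecordingFixture:
    """Local HTTP stand-in: serves canned JSON and records each request."""

    def __init__(self, routes):
        self.routes = routes    # maps path -> JSON-serializable body
        self.requests = []      # (method, path) pairs, in arrival order
        fixture = self

        class Handler(BaseHTTPRequestHandler):
            def do_GET(self):
                fixture.requests.append(("GET", self.path))
                found = self.path in fixture.routes
                body = fixture.routes[self.path] if found else {"error": "not found"}
                payload = json.dumps(body).encode("utf-8")
                self.send_response(200 if found else 404)
                self.send_header("Content-Type", "application/json")
                self.send_header("Content-Length", str(len(payload)))
                self.end_headers()
                self.wfile.write(payload)

            def log_message(self, *args):
                pass  # keep judge output quiet

        # Port 0 asks the OS for any free port on the loopback interface.
        self._server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
        self._thread = threading.Thread(
            target=self._server.serve_forever, daemon=True
        )

    @property
    def url(self):
        host, port = self._server.server_address
        return f"http://{host}:{port}"

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc_info):
        self._server.shutdown()
        self._server.server_close()


if __name__ == "__main__":
    with RecordingFixture({"/rate": {"usd_eur": 0.5}}) as fx:
        with urllib.request.urlopen(fx.url + "/rate") as resp:
            print(json.load(resp))
        print(fx.requests)
```

A case built on a fixture like this can check both the returned value and the recorded requests, which is the kind of coverage the paragraph above says is still missing for most external dependencies.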
