Limits
The honest assessment. Production-interesting, not production-ready.
Angl is production-interesting, not production-ready. This page states the known limits plainly, because the project's credibility depends on them being owned rather than hidden.
Current limits
- No sandboxing in the dev judge. Generated code runs as a normal subprocess with your user privileges. A malicious .angl file's behavior text could steer the model into generating code that reads secrets, and nothing in the dev path stops it from running. Do not run someone else's .angl file without reading the generated artifact first. The distributed runtime has a validated sandbox boundary; the dev judge does not yet run on it.
- Contract completeness is the ceiling. Untested behavior is unspecified, the same way C frames undefined behavior. No solution is planned; it is a known, owned limit.
- Case authoring is still real work. Angl makes the checked cases the source of truth; it does not remove the need to state important examples.
- Regeneration is not a proof of correctness. It is a way to ask a model for a new artifact, then reject it unless the checked contract still passes.
- Non-Python targets are prototype-grade. Go, Rust, TypeScript, bundle, and assembly exist but run through subprocess, Docker, or generated adapter plumbing.
- Fixture coverage is small. http_fixture and postgres_fixture exist; richer systems need richer fixtures.
- Some behavior resists black-box contracts. Rich stateful UIs, real-time systems, and intricate concurrency are hard to judge over JSON I/O, and thus hard to Angl.
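Several of the limits above can be seen in a small sketch of black-box judging. This is an illustration only, not Angl's actual judge. The names Case, run_artifact, judge, and demo are hypothetical, and it assumes an artifact is a Python script that reads one JSON value on stdin and writes one JSON value on stdout. Like the dev judge described above, it runs the artifact as an ordinary subprocess with no sandbox.

```python
import json
import os
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Case:
    """One checked example (hypothetical shape): JSON input and expected JSON output."""
    input: Any
    expected: Any


def run_artifact(path: str, payload: Any, timeout: float = 10.0) -> Any:
    """Run the artifact as a plain subprocess with the caller's privileges (no sandbox)."""
    proc = subprocess.run(
        [sys.executable, path],
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode != 0:
        raise RuntimeError(f"artifact exited with {proc.returncode}: {proc.stderr.strip()}")
    return json.loads(proc.stdout)


def judge(path: str, cases: list[Case]) -> list[str]:
    """Return failure messages; an empty list means every checked case passed."""
    failures = []
    for i, case in enumerate(cases):
        try:
            actual = run_artifact(path, case.input)
        except Exception as exc:  # crash, timeout, or non-JSON output all count as failures
            failures.append(f"case {i}: {exc}")
            continue
        if actual != case.expected:
            failures.append(f"case {i}: expected {case.expected!r}, got {actual!r}")
    return failures


def demo() -> tuple[list[str], list[str]]:
    """Judge a correct and a subtly wrong toy artifact against the same two cases."""
    cases = [Case({"n": 2}, {"double": 4}), Case({"n": 0}, {"double": 0})]
    good = (
        "import json, sys\n"
        'n = json.load(sys.stdin)["n"]\n'
        'print(json.dumps({"double": n * 2}))\n'
    )
    bad = good.replace("n * 2", "n + 2")  # agrees with the contract only when n == 2
    results = []
    with tempfile.TemporaryDirectory() as tmp:
        for name, source in (("good.py", good), ("bad.py", bad)):
            path = os.path.join(tmp, name)
            with open(path, "w", encoding="utf-8") as fh:
                fh.write(source)
            results.append(judge(path, cases))
    return results[0], results[1]
```

In the demo, the wrong artifact passes the first case and is rejected only by the second. With just the first case in the contract it would have been accepted, which is the contract-completeness ceiling in miniature: a regenerated artifact is rejected only for behavior that some checked case actually pins down.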
What is proven locally
Reliability proof points from the repo, all run locally:
- mixed-language generated artifacts behind black-box shims
- the repair loop with failure feedback
- Docker-backed Go, Rust, and TypeScript execution
- a Redis/Celery/Postgres distributed demo: 60/60 queued jobs passed through API, Redis, Celery, runner, and Postgres, with 359 persisted jobs and 0 failures
- the distributed runtime's sandbox boundary validated 9/9: non-root containers, no docker.sock, the runner mounts only the generated build directory, cannot resolve the database or queue, and executes generated Rust with pre-fetched crates and no runtime egress
- contract-strength mutation tests
- real Postgres fixtures in the judge, including a DB-backed chapter compiled by a live model and verified against temporary Postgres containers
- core suite: 94/94 tests passing with strict spec lint
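As a rough illustration of what a contract-strength mutation test measures, the sketch below checks which deliberately broken implementations a set of cases fails to reject. It shows the general mutation-testing idea under simple assumptions (in-process Python callables, exact-equality comparison) and is not the repo's implementation. The names Example, passes, and surviving_mutants, and the clamp example, are hypothetical.

```python
from typing import Any, Callable

Example = tuple[Any, Any]  # (input, expected output), a hypothetical case shape
Impl = Callable[[Any], Any]


def passes(impl: Impl, cases: list[Example]) -> bool:
    """True if the implementation matches every case; an exception counts as a failure."""
    for given, expected in cases:
        try:
            if impl(given) != expected:
                return False
        except Exception:
            return False
    return True


def surviving_mutants(mutants: dict[str, Impl], cases: list[Example]) -> list[str]:
    """Names of broken variants that the contract fails to reject, sorted for stable output."""
    return sorted(name for name, impl in mutants.items() if passes(impl, cases))


# Intended behavior: clamp x into the range [0, 10].
mutants: dict[str, Impl] = {
    "no_upper_bound": lambda x: max(0, x),
    "no_lower_bound": lambda x: min(10, x),
    "identity": lambda x: x,
}

weak_contract: list[Example] = [(5, 5)]
strong_contract: list[Example] = [(5, 5), (-3, 0), (42, 10)]

weak_survivors = surviving_mutants(mutants, weak_contract)
strong_survivors = surviving_mutants(mutants, strong_contract)
```

Under the weak contract all three mutants survive, so that contract would accept a wrong artifact. The strong contract rejects every mutant. Surviving mutants point at behavior the cases leave unspecified.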
The honest assessment
The core idea is strong because the durable asset is the contract, not the generated source. The weak point is not the LLM. The weak point is contract completeness and side-effect coverage. The project becomes much more serious once every external dependency that matters has a fixture.