Reliability around an unreliable model
The model is nondeterministic; the system around it must not be. I use schema-validated outputs, retry budgets, verifier passes, and evaluation harnesses so that malformed or drifting output never reaches the next stage.
I build the reliability layer around AI: deterministic pipelines, evaluation and verification harnesses, and reproducible, cost-efficient systems. Everything below runs, is tested, and is open for you to re-run yourself. I make no claim that isn't demonstrated.
AI's biggest problems right now are simple to name: it makes things up, it costs too much, and you can't reproduce what it did. Here is what I built for each, with a number you can check and a tool you can run right now, in your browser, no signup.
Made-up output is the number-one reason AI stays out of production. I don't promise zero; a promise of zero is the red flag. I ground answers in your data, flag every unsupported claim, and verify before anything ships. The Hardener does the flagging: paste a document and it marks what isn't backed by a source.
Measure your rate → · Try the Hardener →
Every token is money, and most prompts ship far more than the model needs. I cut the input: the Token Minimizer strips a prompt to what matters, and the dimensional API answers a localized question over a huge nested structure by reading only the slice it needs.
Try the Token Minimizer live →
Same input, different output - so you can't audit it, evaluate it, or cache it. bfx-ingest turns a codebase into deterministic context: the same input yields the same root hash every run, byte-identical for prompt caching and reproducible for evals.
See bfx-ingest, with a live demo →
No single source of truth means the same value drifts in five places and the model has five things to get wrong. The Dimensional Linter measures structural duplication in your code so you can collapse it to one source - the rule, not the copies.
Try the Dimensional Linter live →
Beyond the AI layer, the same engineering depth, all live:
I get reliability from stock models by directing them, not hoping. Explicit operating directives, hard constraints, and a verification pass at each step keep a model on track and out of drift. Reduced to one number, that discipline took a verifier's hallucinated output from about 39% to 0% on a tested task while keeping the answer rate high. Not a smarter model, a cheaper and more reliable one. I focus my instance of it; the provider's model is untouched.
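As a rough illustration of a verification pass with a retry budget (a minimal sketch, not the code behind the figure above), the snippet below checks each model reply against a small schema and gives up once the budget is spent. The field names, `call_with_budget`, and the stand-in model function are all hypothetical.

```python
import json
from typing import Callable

# Hypothetical schema: field name -> required Python type.
REQUIRED_FIELDS = {"answer": str, "sources": list}


class RetryBudgetExceeded(Exception):
    """Raised when no valid output arrives within the allowed attempts."""


def validate(raw: str) -> dict:
    """Parse raw model output and check it against REQUIRED_FIELDS."""
    data = json.loads(raw)  # JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict):
        raise ValueError("top-level value must be an object")
    for name, expected in REQUIRED_FIELDS.items():
        if not isinstance(data.get(name), expected):
            raise ValueError(f"field {name!r} missing or not {expected.__name__}")
    return data


def call_with_budget(call_model: Callable[[str], str], prompt: str,
                     max_attempts: int = 3) -> dict:
    """Return the first reply that validates; never pass invalid output on."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return validate(call_model(prompt))
        except ValueError as exc:
            last_error = exc  # malformed reply: spend one attempt and retry
    raise RetryBudgetExceeded(
        f"no valid output in {max_attempts} attempts: {last_error}"
    )


# Stand-in for a real model: two bad replies, then a valid one.
replies = iter([
    "not json",
    '{"answer": "42"}',
    '{"answer": "42", "sources": ["doc.md"]}',
])
result = call_with_budget(lambda prompt: next(replies), "What is the answer?")
```

The next stage only ever sees `result` or an explicit `RetryBudgetExceeded`, so a malformed reply cannot leak downstream.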
Since August 2025, on my own initiative, I have gone beyond using AI to studying how it works and how to optimize it, and I'm building an original framework: dimensional programming, which represents data as derivable geometry so a model reads only what it needs. Demonstrated, not just claimed: a dependency-free API measures a roughly 99.7% token reduction answering localized questions over a large nested structure. The ideas are mine; every claim is published with a label for how well it's supported.
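The general idea of reading only a slice can be sketched with a plain path lookup. This is not the dimensional API and does not reproduce the 99.7% figure; the `catalog` data, `read_slice`, and the use of serialized character count as a stand-in for tokens are all assumptions made for illustration.

```python
import json
from functools import reduce


def read_slice(structure, path):
    """Follow the keys in `path` and return only that subtree."""
    return reduce(lambda node, key: node[key], path, structure)


def serialized_size(value) -> int:
    """Character count of the compact JSON form (a crude proxy for tokens)."""
    return len(json.dumps(value, separators=(",", ":")))


# Hypothetical nested structure: 10 regions x 20 stores.
catalog = {
    f"region_{r}": {
        f"store_{s}": {"stock": list(range(50)), "manager": f"m{r}_{s}"}
        for s in range(20)
    }
    for r in range(10)
}

# A localized question ("who manages store 7 in region 3?") needs one leaf.
answer_context = read_slice(catalog, ["region_3", "store_7", "manager"])

full_size = serialized_size(catalog)
slice_size = serialized_size(answer_context)
print(f"full: {full_size} chars, slice: {slice_size} chars")
```

Only `answer_context` would be placed in the prompt; the rest of `catalog` is never serialized for the model.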
LLM integration, RAG, evals, structured outputs, tool/function calling, model routing, prompt caching, token budgeting; self-hosted local models.
Node.js, Python, C#/.NET, Java, REST, WebSockets, PostgreSQL, SQL Server, SQLite, Linux, nginx, Docker, CI/CD, zero-downtime deploys.
Route cheap calls to cheap models, cache byte-identical context, budget tokens, so frontier spend goes only where frontier reasoning is required.
Content-addressed, hash-verified, deterministic replay, so you can prove exactly what a system did and recreate it on any machine.
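One way to get a content-addressed, order-independent root hash over a directory is sketched below. It is an illustration under stated assumptions, not how bfx-ingest computes its hash: the `root_hash` function, the choice of SHA-256, and the length-prefixed path encoding are all mine.

```python
import hashlib
import tempfile
from pathlib import Path


def root_hash(root: Path) -> str:
    """Hash every file under `root` in sorted path order into one digest."""
    root = Path(root)
    digest = hashlib.sha256()
    files = sorted(
        (p for p in root.rglob("*") if p.is_file()),
        key=lambda p: p.relative_to(root).as_posix(),
    )
    for path in files:
        rel = path.relative_to(root).as_posix().encode("utf-8")
        # Length-prefix the path so path and content bytes cannot blur together.
        digest.update(len(rel).to_bytes(8, "big"))
        digest.update(rel)
        digest.update(hashlib.sha256(path.read_bytes()).digest())
    return digest.hexdigest()


def write_tree(base: Path, files: dict) -> None:
    """Create the given relative-path -> bytes mapping under `base`."""
    for rel, content in files.items():
        target = base / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)


with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
    # Same content, written in a different order, in two separate locations.
    write_tree(Path(a), {"src/main.py": b"print('hi')\n", "README.md": b"# demo\n"})
    write_tree(Path(b), {"README.md": b"# demo\n", "src/main.py": b"print('hi')\n"})
    hash_a, hash_b = root_hash(Path(a)), root_hash(Path(b))

    # Change one byte and the root hash changes with it.
    (Path(b) / "README.md").write_bytes(b"# Demo\n")
    hash_changed = root_hash(Path(b))
```

Sorting by relative path removes any dependence on filesystem listing order, and hashing only paths and bytes (no timestamps or absolute locations) is what lets two machines agree on the result.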
Every claim is demonstrated in tested code and labeled for how well it's supported. The bold parts stay bold; the whole stays honest.
Everything I build runs on one idea: small inputs, large effects. A tiny rule is derived into vast structure. Each site below is one instance of it, all under the Butterflyfx banner.
More in the family (Kenetic Arts, an immersive 3D art gallery; the z = x·y geometry showcase) are in progress.
Seeking a full-time AI engineering or backend / platform role, open to hybrid and remote. The projects here are personal portfolio pieces, built end to end to demonstrate the work above, not commercial products for sale.