Inference cost · Published April 9, 2026
Batch Inference
Why batch inference can cut costs sharply on AI workloads that do not need real-time latency.
Value often starts with better cost architecture, not with a bigger model.
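To make the cost claim concrete, here is a minimal back-of-the-envelope sketch. Several providers discount batch (non-real-time) processing; the prices, discount rate, and workload volume below are all assumptions for illustration, not actual vendor pricing.

```python
# Hypothetical pricing sketch: batch endpoints trade latency for a discount.
# Every number below is an assumption chosen for illustration.

REALTIME_PRICE_PER_MTOK = 3.00   # assumed $/million tokens, real-time
BATCH_DISCOUNT = 0.50            # assumed fraction off for batch processing


def monthly_cost(tokens_per_month: float, price_per_mtok: float) -> float:
    """Dollar cost for a given monthly token volume at a given rate."""
    return tokens_per_month / 1_000_000 * price_per_mtok


volume = 2_000_000_000  # assumed workload: 2B tokens/month
realtime = monthly_cost(volume, REALTIME_PRICE_PER_MTOK)
batch = monthly_cost(volume, REALTIME_PRICE_PER_MTOK * (1 - BATCH_DISCOUNT))

print(f"real-time: ${realtime:,.0f}/mo  "
      f"batch: ${batch:,.0f}/mo  "
      f"saved: ${realtime - batch:,.0f}/mo")
# → real-time: $6,000/mo  batch: $3,000/mo  saved: $3,000/mo
```

The point is architectural, not model-dependent: if a workload tolerates hours of latency, routing it to a batch path halves the bill here without touching model choice or quality.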
Lab
Articles, repos, and working notes, all openly available.
The current public Claude-Book release shows how I orchestrate multiple agents, state, and multi-pass workflows around a writing system.
How to design agentic workflows that go beyond a wrapper or a linear chatbot.
A repo for comparing retrieval strategies and documenting when embeddings actually add value.
Reducing RAG stack complexity often improves delivery speed, maintenance, and total cost.