The Claude Agent Stack: Everything Anthropic Built For Production In 2025

Nobody at Anthropic is building chat anymore.

Every announcement they shipped in 2025 was for agent engineering. Not demo agents. Production agents that run for hours, process millions of tokens, cost real money, and do actual work.

Most people still think of Claude as a better chat model. That is missing the point entirely. Over the last twelve months they have quietly shipped the only complete production stack for building agents that don't fall over after 10 turns.

Prompt caching is the foundation of all production agents

This is the single most important thing Anthropic published all year. Almost no one outside the small group of people running production agents talks about this.

If you are building an agent and you are not designing every part of your architecture around prompt caching, you are wasting 70-90% of your budget. And you will get destroyed on latency.

Claude Code runs their entire oncall around cache hit rate. They declare SEVs if it drops below target. That is not a nice to have. That is table stakes.

The rules are non negotiable:

Static content always comes first. Dynamic content last. One single timestamp in your system prompt will invalidate every cache entry for every user.
Never add or remove tools mid conversation. Ever. Not even for plan mode. Not even when you think the user only needs 2 tools right now. Changing the tool list breaks the cache at the first token.
Never run compaction with a different system prompt. That one mistake will make your 100k token compaction call cost 100x more than it should. Use cache safe forking.
Switching models mid conversation is almost never worth it. A 100k token cache miss on Haiku costs more than just running the query on Opus.

This is not prompt engineering. This is agent operations.

The five multi agent patterns you will actually use

Anthropic did not publish theoretical patterns. These are the exact patterns they use inside Claude Code.

Stop building agent swarms. Stop building recursive self improving anything. Every production multi agent system in use today uses one of these five, in this order of adoption:

Generator / Verifier. 80% of production multi agent systems stop here. One writes, one checks. This pattern solves 9 out of 10 quality problems. It is boring, it works, and everyone forgets to add the maximum iteration limit until their agent loops for 3 hours burning $200.
Orchestrator / Subagent. This is what Claude Code uses. One main agent, disposable subagents for isolated work. This pattern works until the orchestrator becomes an information bottleneck. That happens right around 4 subagents.
Agent Teams. Persistent workers that retain context across tasks. Use this when you have work that runs for days, not minutes.
Message Bus. Use this once you have more than 10 agent types and your orchestrator code is 3000 lines of if statements.
Shared State. This is the only pattern that scales past 20 agents. It is also the hardest one to get right. You will spend 90% of your time debugging reactive loops that burn tokens forever.

Start with generator verifier. Do not skip steps. Everyone tries to jump straight to shared state. Everyone regrets it.

Batch API changes the unit economics of offline work

The Message Batches API is not just a discount. It rewrites what is economically viable to build with LLMs.

50% off all tokens. 10,000 requests per batch. 24 hour SLA.

This is the first LLM API that makes processing entire corporate document repositories, backfilling embeddings, grading millions of user inputs, or running nightly agent jobs actually affordable.

Quora already uses this for all non user facing summarization. Before this API they were running 200 parallel workers and managing their own rate limit backoff. Now they submit one batch and walk away.

You should use this for every single request that does not need a response within 60 seconds. Most teams could move 70% of their LLM traffic to batch today and cut their bill in half. Almost no one does this yet.

Context is an artifact, not a prompt

Brendan MacLean's work on the Skyline codebase is the single most important case study on agent onboarding that exists right now.

He did not fine tune a model. He did not write 1000 line system prompts. He did exactly what he does when onboarding a new human developer.

He built a context layer.

700,000 lines of C#. 17 years of history. No human remembers all of it. Claude did not remember it either, until he wrote it down in a format Claude could load.

This is the insight almost everyone misses. Agents do not learn. They read.

You do not train an agent on your codebase. You onboard it. You write down the rules, the conventions, the history, the mistakes people made before. You version that document. You maintain it. And then every agent gets to start there.

This is not magic. This is exactly what good engineering teams have always done for humans. It works exactly the same way for agents.

The pwiz-ai repository he built is now a permanent project asset. It will outlast every human developer currently working on Skyline.

Memory is not for remembering your cat's name

Everyone misunderstood the memory launch. This is not a consumer feature.

Claude memory is designed for teams. Every project gets its own isolated memory. There is no global memory. It is auditable. Editable. Administrators can turn it off entirely.

This is solving one specific very painful enterprise problem: every single conversation with Claude currently starts with the user pasting 3 paragraphs of context about their team, their project, their client, and their deadlines.

That is 80% of all tokens typed by enterprise users today. Memory eliminates that.

This is also the first feature that creates real lock in. Once Claude has 6 months of context about your team's work you will never switch models.

Skills are the missing standard for agent capabilities

Skills are not plugins.

Plugins were HTTP endpoints you called. Skills are repeatable workflows that Claude knows how to execute, when to execute, and how to chain together.

Most importantly, they are an open standard. The same skill will work in Claude Chat, Claude Code, Claude Cowork, and via the API. Anthropic is explicitly pushing this standard to work on every other LLM platform too.

This is the first serious attempt to build a common interface for agent capabilities. If it works it will do for agents what SQL did for databases.

As of December 2025 every major SaaS vendor has shipped an official skill. Adoption is moving faster than any previous LLM standard.

Domain fine tuning works when you do it correctly

The Diode Computers partnership shows exactly how model improvement actually happens right now.

Anthropic did not collect a million generic electrical engineering papers. They did not run general pretraining.

They got access to a very specific, very well defined task: generate reference designs in Zener from chip datasheets. They got a testbench that could grade outputs correctly. They trained Sonnet 4.5 on that task.

The result: Diode engineers preferred the output 8 out of 10 times.

This is how model improvement will work for the next five years. Not general purpose superintelligence. Domain specific, task focused fine tuning done in partnership with teams that actually do the work.

If you run a production agent with clear success metrics, Anthropic will work with you to improve the base model for your use case. Almost no teams know this program exists.

The tiered usage model is working

The Max plan launch settled the pricing argument.

Users do not want per token pricing for end user products. They want a subscription that lets them work without counting tokens.

20x higher limits. Priority access. Users are paying for it. This is the first pricing tier that actually matches how power users interact with LLMs.

For developers this means you can stop building usage counters and token limit warnings into your products. Users will pay for headroom.

The gap no one is talking about

All of these pieces fit together.

Prompt caching for long running agents. Multi agent patterns for scaling work. Batch API for offline processing. Context layers for onboarding. Memory for persistence. Skills for capabilities.

None of this is accidental. Anthropic is no longer competing on model benchmark scores. They are competing on the platform required to run agents in production.

Every other LLM provider is still selling you a model. Anthropic is selling you everything you need to actually run it.

That is the gap almost no one is talking about right now.

If you are building agent systems today, this is the stack you will be using twelve months from now.

The Claude Agent Stack: Everything Anthropic Built For Production In 2025

Prompt caching is the foundation of all production agents ​

The five multi agent patterns you will actually use ​

Batch API changes the unit economics of offline work ​

Context is an artifact, not a prompt ​

Memory is not for remembering your cat's name ​

Skills are the missing standard for agent capabilities ​

Domain fine tuning works when you do it correctly ​

The tiered usage model is working ​

The gap no one is talking about ​