Chapter 4: The Agentic AI Stack
Understanding who provides what — and where your effort should focus.
Building an agentic system involves three distinct layers. Clarity on these layers helps you make good "build vs buy" decisions and focus your effort where it creates the most value.
Layer 1: The Model (The Brain)
The large language model provides the core reasoning capability — the ability to understand language, think through problems, and generate responses.
Examples: Claude (Anthropic), GPT-4 (OpenAI), Gemini (Google), Llama (Meta)
You don't build this. You access it via API, paying per use. The model providers have invested billions in training these systems. Your job is to use them effectively, not recreate them.
Layer 2: The Framework/Platform (The Nervous System)
The framework handles everything that turns a model into a functioning agent:
- The agent loop (receive input → reason → act → observe → repeat)
- Tool definitions and execution
- Memory management
- Integration with external systems
- Error handling and retries
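To make the second item concrete: a tool definition is just a schema the model reads to decide when and how to call your code. A hedged sketch in the JSON-schema style used by the major model APIs; the exact envelope varies by provider, and the `get_order_status` tool and its fields are hypothetical:

```python
# A tool definition in the JSON-schema style used by major model APIs.
# The exact wrapper differs by provider (OpenAI nests this in a "function"
# object; Anthropic uses an "input_schema" key), but the shape is the same:
# a name, a description the model reads, and a parameter schema.
get_order_status_tool = {
    "name": "get_order_status",  # hypothetical tool name
    "description": "Look up the current status of a customer order by ID.",
    "parameters": {
        "type": "object",
        "properties": {
            "order_id": {
                "type": "string",
                "description": "The unique order identifier, e.g. 'ORD-1042'.",
            },
        },
        "required": ["order_id"],
    },
}

print(get_order_status_tool["name"])
```

The framework's job is to pass definitions like this to the model, detect when the model requests the tool, execute your implementation, and return the result.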
Examples of existing platforms:
- Azure AI Foundry (Microsoft)
- Amazon Bedrock Agents (AWS)
- Google Vertex AI Agents
- Mastra (open-source TypeScript)
- LangChain/LangGraph (open-source Python)
- CrewAI (multi-agent focused)
Or you can build your own. Some organisations write custom code to handle the agent loop, tool execution, and memory. This gives maximum control but requires significant development and maintenance effort.
If you choose to build your own framework — or want to understand what's inside the platforms you're evaluating — the logical architecture typically includes four layers: Interface (how users interact), Orchestration (the agent loop), Core Services (model, tools, memory), and External Systems (what the agent connects to).
→ See Addendum: Agentic AI Framework Architecture
Layer 3: The DNA (The Identity)
The DNA — instructions plus knowledge — is what transforms a generic agent into a specialist. This is where your unique value lives.
This is what you design and configure. Regardless of which model or platform you use, the DNA is yours. It's the IP that makes your agent valuable for your specific business context.
Where Should You Focus?
For most organisations, the answer is clear:
| Layer | Approach | Rationale |
|---|---|---|
| Layer 1: Model | Rent | Use a commercial model via API |
| Layer 2: Platform | Buy | Use an existing platform or framework |
| Layer 3: DNA | Build | Invest your effort in designing great DNA |
The model providers are spending billions on Layer 1. Platform providers are competing fiercely on Layer 2. Neither of these is where you'll differentiate.
Note: the framework architecture described in this chapter — and illustrated in the addendum — reflects a single-agent baseline. When you introduce multiple specialist agents collaborating on a task, Layer 2 expands significantly. The Orchestration Layer grows to include confidence aggregation and inter-agent coordination; the Memory Service expands to include shared state accessible across the agent team. These multi-agent extensions are covered in Part 2, particularly Chapters 9 (Multi-Agent Patterns) and 11 (Memory Patterns).
Your competitive advantage comes from Layer 3 — understanding your domain deeply enough to craft instructions and curate knowledge that makes your agent genuinely useful. This is where the Pragmatix Digital Transformation Framework provides guidance.
Runtime Architectures
Not all agents need the same runtime environment. A customer service advisor answering questions requires a very different infrastructure to a software delivery agent that clones repositories, writes code, runs builds, and deploys artifacts. Understanding the runtime options helps you make better infrastructure decisions and avoid over- or under-engineering your solution.
There are three broad runtime architectures for agentic systems, each suited to different use cases.
Model 1: Managed Platform Runtime
Managed platforms like Azure AI Foundry, Amazon Bedrock Agents, and Google Vertex AI Agents handle the runtime for you. Your agent logic runs within the platform's managed infrastructure. You configure the agent's DNA (instructions and knowledge), define tool connections, and the platform manages compute, scaling, and the agent loop.
Best suited for: Advisory and conversational agents, business process automation, customer-facing assistants, and any agent whose primary activity is reasoning and tool calls rather than direct system manipulation. These agents receive a prompt, reason through the problem, possibly call some APIs or retrieve information, and return a response. They don't need file systems, development tooling, or long-running compute.
Trade-offs: Fastest path to production with minimal infrastructure management. However, you're constrained by what the platform supports and locked into that provider's ecosystem. Customisation of the agent loop, memory management, and orchestration patterns is limited to what the platform exposes.
Example: The Pragmatix Advisory Portal runs specialist advisors on Azure AI Foundry. Each advisor is a stateless reasoning engine with tool access — the managed platform handles all runtime concerns.
Model 2: Shared Infrastructure Runtime
You build your own Layer 2 framework and deploy it on shared infrastructure — a server, a container, or a set of containers. All agents share the same runtime environment, with the framework managing the agent loop, tool execution, and memory.
At its simplest, an agent in this model is:
- a server that receives a user message
- code that sends that message to an LLM API along with system instructions and tool definitions
- code that handles the response, executing tool calls and passing results back to the model
- a loop that repeats until the model produces a final answer
Your code is the framework — the glue that manages this back-and-forth, handles errors, manages memory, and exposes it through an API or interface.
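That glue code fits in a few dozen lines. A minimal, illustrative sketch with a stubbed model call standing in for a real LLM API; the message format and the names `call_model` and `agent_loop` are assumptions, not any specific provider's SDK:

```python
# Minimal agent loop: send messages to a model, execute any tool calls it
# requests, feed the results back, and stop when it returns a final answer.

TOOLS = {
    "lookup_asset": lambda name: f"Asset '{name}' found in register.",
}

def call_model(messages):
    """Stub standing in for an LLM API call: requests a tool on the
    first turn, then produces a final answer."""
    if not any(m["role"] == "tool" for m in messages):
        return {"type": "tool_call", "tool": "lookup_asset", "args": {"name": "CRM"}}
    return {"type": "final", "text": "The CRM asset is already registered."}

def agent_loop(user_message, system_prompt="You are an asset-register assistant."):
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]
    while True:  # the agent loop: reason -> act -> observe -> repeat
        response = call_model(messages)
        if response["type"] == "final":
            return response["text"]
        # Execute the requested tool and pass the observation back.
        result = TOOLS[response["tool"]](**response["args"])
        messages.append({"role": "tool", "content": result})

print(agent_loop("Is the CRM registered?"))
```

A production framework wraps this loop with error handling, retries, memory persistence, and an HTTP interface, but the core back-and-forth is exactly this shape.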
Best suited for: Organisations that need more control over the agent loop, custom orchestration patterns, or portability across cloud providers. Also appropriate when you need to integrate with systems or protocols that managed platforms don't support, or when you want to avoid vendor lock-in.
Trade-offs: Full control and portability, but you own the infrastructure, scaling, and operational burden. Agents share resources, so a misbehaving agent could affect others. Simpler to operate than container-per-agent models, but less isolated.
Example: A custom agentic framework running on Azure Container Apps, with multiple specialist agents sharing the same deployment. The framework uses PostgreSQL for memory, Redis for caching, and calls Claude or GPT-4o for reasoning.
Many organisations are already building agentic capabilities without realising it. Consider a Privacy Information Management System (PIMS) deployed as a native Azure application with a front end, Azure Function Apps, and a SQL database.
One feature is an AI-powered asset register. When a user uploads a file or screenshot containing details about an application — say, a SaaS product's about page or a vendor datasheet — an AI agent extracts and interprets the content, maps it to the asset register schema, and suggests which fields to populate when creating the record. The user reviews the suggestions, and on approval, the agent creates the record in the database.
This is a textbook example of shared infrastructure runtime. The layers map cleanly:
- Layer 1 (Model): GPT-4o via Azure OpenAI
- Layer 2 (Framework): The Function App code — it receives the upload, includes the asset register schema in the system prompt (prompt stuffing), sends it to the model, parses the response, presents suggestions to the user, and executes the database write on approval
- Layer 3 (DNA): The system prompt containing the table schema, field definitions, valid values, and mapping instructions
There's no separate agent platform. No managed orchestration service. The application code is the agent framework — it handles the loop of receiving input, calling the model, interpreting the response, and taking action.
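The prompt-stuffing step can be sketched in a few lines. The schema, field descriptions, and helper name below are hypothetical; the point is that Layer 3 here is nothing more than a carefully constructed system prompt:

```python
import json

# Hypothetical asset-register schema; in the real system this reflects the
# SQL table. It is the "knowledge" half of the agent's DNA.
ASSET_SCHEMA = {
    "name": "string, the product's official name",
    "vendor": "string, the supplying company",
    "data_classification": "one of: public, internal, confidential",
}

def build_system_prompt(schema):
    """The prompt-stuffing step: embed the schema and mapping instructions
    (Layer 3, the DNA) into the system prompt sent to the model (Layer 1)."""
    return (
        "You are an asset-register assistant. Extract details of the "
        "application from the uploaded content and map them to this schema. "
        "Return JSON with only these fields, leaving unknown fields null.\n\n"
        f"Schema:\n{json.dumps(schema, indent=2)}"
    )

print(build_system_prompt(ASSET_SCHEMA))
```

Changing the agent's behaviour means changing this prompt, not redeploying infrastructure, which is precisely why the DNA is where your effort compounds.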
In terms of the patterns from Part 2, this agent combines Tool Use (Pattern 2) with Approval Gates (Pattern 11), operating at Level 2 (Assistive) on the Autonomy Spectrum — the agent drafts, the human approves.
The graduation path is clear: move to Confidence-Based Escalation (Pattern 12) where the agent creates records automatically for high-confidence extractions and only escalates uncertain ones to the user. That's a step toward Level 3 (Active) — and it requires nothing more than updating the DNA and adding a confidence threshold to the existing code.
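That graduation step amounts to a few lines of routing code. A minimal sketch, assuming the model can be prompted to return a confidence score alongside its extraction; the threshold value and function names are illustrative:

```python
CONFIDENCE_THRESHOLD = 0.9  # illustrative; tune against real extraction data

def route_extraction(extraction, confidence, create_record, ask_user):
    """Confidence-based escalation (Pattern 12): auto-create records for
    high-confidence extractions, escalate uncertain ones to the user."""
    if confidence >= CONFIDENCE_THRESHOLD:
        return create_record(extraction)    # Level 3: act autonomously
    return ask_user(extraction, confidence)  # Level 2: draft, human approves

# Usage with stand-in callbacks:
result = route_extraction(
    {"name": "Acme CRM"}, 0.95,
    create_record=lambda e: ("created", e),
    ask_user=lambda e, c: ("escalated", e),
)
print(result)
```

The hard part is not the branch itself but calibrating the threshold against observed extraction accuracy, which is DNA and measurement work, not infrastructure work.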
Model 3: Ephemeral Agent Runtimes (Container-per-Agent)
In this model, each agent instance spawns in its own isolated, ephemeral runtime environment — typically a container or managed development environment. An orchestrator decides what agents are needed, provisions the runtime, the agent does its work, and the environment is destroyed when the task is complete.
Platforms like Daytona, GitHub Codespaces, and Gitpod provide the infrastructure for this pattern. They offer API-driven provisioning of standardised environments with full development tooling — precisely what code-writing agents need.
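The provision-work-destroy lifecycle can be expressed independently of any particular provider. A minimal sketch with an in-memory fake runtime standing in for a real provisioning API; Daytona, Codespaces, and Gitpod each have their own SDKs, and the class and method names here are assumptions for illustration:

```python
# The container-per-agent lifecycle: provision an isolated runtime, run the
# agent's task inside it, and destroy the environment when the task is done.

class FakeRuntime:
    """In-memory stand-in for an ephemeral environment (container, codespace)."""
    def __init__(self, image):
        self.image = image
        self.alive = True

    def run(self, task):
        return f"ran '{task}' in isolated {self.image} environment"

    def destroy(self):
        self.alive = False

def run_agent_task(task, image="agent-dev:latest"):
    """Orchestrator step: provision, execute, always tear down."""
    runtime = FakeRuntime(image)   # provision (real APIs: an HTTP/SDK call)
    try:
        return runtime.run(task)   # the agent does its work
    finally:
        runtime.destroy()          # ephemeral by design: always destroyed

print(run_agent_task("clone repo and run tests"))
```

The `try`/`finally` teardown is the essence of the pattern: no state survives the task, so every agent starts from a clean, reproducible environment.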
Best suited for: Software delivery agents, code generation teams, testing and QA automation, data pipeline execution, or any use case where agents need to manipulate files, execute code, run builds, or operate with full development tooling. Particularly valuable for multi-agent systems where parallel execution and strict isolation between agents are important.
Trade-offs: Maximum isolation and scalability. Each agent gets a clean environment, preventing interference between tasks. Resources scale with workload, and you only pay for compute when agents are active. However, this is the most complex model to build and operate. Environment provisioning adds latency, and coordinating state across ephemeral containers requires careful design.
Example: An agentic software delivery platform where a supervisor agent receives a feature request, then spawns specialist agents in isolated Daytona environments — a discovery agent analyses requirements, an architecture agent designs the solution, coding agents write and test the code in parallel containers, and a deployment agent handles release. Each agent has its own workspace with git, compilers, and testing frameworks available.
Choosing the Right Runtime
The right runtime architecture depends on what your agents actually do, not on what sounds most sophisticated. Match the architecture to the use case.
| Dimension | Managed Platform | Shared Infrastructure | Ephemeral Runtimes |
|---|---|---|---|
| Setup effort | Low — platform handles it | Medium — you build the framework | High — orchestrator plus runtime provisioning |
| Control | Limited to platform capabilities | Full control over agent loop and integration | Full control plus environment isolation |
| Isolation | Logical (platform-managed) | Shared resources between agents | Full container isolation per agent |
| Scalability | Platform-managed auto-scaling | Manual scaling of shared infrastructure | Scales with workload; pay only when active |
| Cost model | Pay per use (platform fees plus model costs) | Fixed infrastructure plus model costs | Per-environment compute plus model costs |
| Portability | Locked to provider | Portable across clouds | Portable (container-based) |
| Dev tooling access | Not available | Limited (shared environment) | Full (git, compilers, runtimes, etc.) |
| Ideal agent types | Advisory, conversational, business process | Custom automation, multi-agent coordination | Software delivery, code execution, data pipelines |
The Pragmatic Approach
As with all technology decisions, start with the simplest architecture that meets your needs:
- If your agents reason and call tools — a managed platform is likely sufficient. Don't build infrastructure you don't need.
- If you need control or portability — build your own framework on shared infrastructure. This gives you flexibility without the complexity of managing ephemeral environments.
- If your agents execute code or manipulate systems — consider ephemeral runtimes. The isolation and clean-state benefits justify the additional complexity.
Many organisations will use more than one model. Advisory agents might run on a managed platform while software delivery agents use ephemeral runtimes — all coordinated through a shared orchestration layer. The key is matching the runtime to the workload, not defaulting to the most complex option.
Your runtime architecture will evolve. Start with a managed platform for your first agents. As you encounter limitations or need greater control, graduate to shared infrastructure. Reserve ephemeral runtimes for use cases that genuinely need them. This is the Game of Inches applied to infrastructure — add complexity only when the use case demands it.
