Part 1: Foundations

Chapter 4: The Agentic AI Stack

Understanding who provides what — and where your effort should focus.

Building an agentic system involves three distinct layers. Clarity on these layers helps you make good "build vs buy" decisions and focus your effort where it creates the most value.

Layer 1: The Model (The Brain)

The large language model provides the core reasoning capability — the ability to understand language, think through problems, and generate responses.

Examples: Claude (Anthropic), GPT-4 (OpenAI), Gemini (Google), Llama (Meta)

You don't build this. You access it via API, paying per use. The model providers have invested billions in training these systems. Your job is to use them effectively, not recreate them.

Layer 2: The Framework/Platform (The Nervous System)

The framework handles everything that turns a model into a functioning agent: the agent loop, tool execution, and memory.

Examples of existing platforms: Azure AI Foundry (Microsoft), Amazon Bedrock Agents (AWS), Google Vertex AI Agents.

Or you can build your own. Some organisations write custom code to handle the agent loop, tool execution, and memory. This gives maximum control but requires significant development and maintenance effort.

If you choose to build your own framework — or want to understand what's inside the platforms you're evaluating — the logical architecture typically includes four layers: Interface (how users interact), Orchestration (the agent loop), Core Services (model, tools, memory), and External Systems (what the agent connects to).

→ See Addendum: Agentic AI Framework Architecture

Layer 3: The DNA (The Identity)

The DNA — instructions plus knowledge — is what transforms a generic agent into a specialist. This is where your unique value lives.

This is what you design and configure. Regardless of which model or platform you use, the DNA is yours. It's the IP that makes your agent valuable for your specific business context.

Where Should You Focus?

For most organisations, the answer is clear:

| Layer | Approach | Rationale |
|---|---|---|
| Layer 1: Model | Rent | Use a commercial model via API |
| Layer 2: Platform | Buy | Use an existing platform or framework |
| Layer 3: DNA | Build | Invest your effort in designing great DNA |

The model providers are spending billions on Layer 1. Platform providers are competing fiercely on Layer 2. Neither of these is where you'll differentiate.

Note: the framework architecture described in this chapter — and illustrated in the addendum — reflects a single-agent baseline. When you introduce multiple specialist agents collaborating on a task, Layer 2 expands significantly. The Orchestration Layer grows to include confidence aggregation and inter-agent coordination; the Memory Service expands to include shared state accessible across the agent team. These multi-agent extensions are covered in Part 2, particularly Chapters 9 (Multi-Agent Patterns) and 11 (Memory Patterns).

Key Insight

Your competitive advantage comes from Layer 3 — understanding your domain deeply enough to craft instructions and curate knowledge that makes your agent genuinely useful. This is where the Pragmatix Digital Transformation Framework provides guidance.

Runtime Architectures

Not all agents need the same runtime environment. A customer service advisor answering questions requires a very different infrastructure to a software delivery agent that clones repositories, writes code, runs builds, and deploys artifacts. Understanding the runtime options helps you make better infrastructure decisions and avoid over- or under-engineering your solution.

There are three broad runtime architectures for agentic systems, each suited to different use cases.

Model 1: Managed Platform Runtime

Managed platforms like Azure AI Foundry, Amazon Bedrock Agents, and Google Vertex AI Agents handle the runtime for you. Your agent logic runs within the platform's managed infrastructure. You configure the agent's DNA (instructions and knowledge), define tool connections, and the platform manages compute, scaling, and the agent loop.

Best suited for: Advisory and conversational agents, business process automation, customer-facing assistants, and any agent whose primary activity is reasoning and tool calls rather than direct system manipulation. These agents receive a prompt, reason through the problem, possibly call some APIs or retrieve information, and return a response. They don't need file systems, development tooling, or long-running compute.

Trade-offs: Fastest path to production with minimal infrastructure management. However, you're constrained by what the platform supports and locked into that provider's ecosystem. Customisation of the agent loop, memory management, and orchestration patterns is limited to what the platform exposes.

Example: The Pragmatix Advisory Portal runs specialist advisors on Azure AI Foundry. Each advisor is a stateless reasoning engine with tool access — the managed platform handles all runtime concerns.

Model 2: Shared Infrastructure Runtime

You build your own Layer 2 framework and deploy it on shared infrastructure — a server, a container, or a set of containers. All agents share the same runtime environment, with the framework managing the agent loop, tool execution, and memory.

At its simplest, an agent in this model is: a server that receives a user message, code that sends that message to an LLM API along with system instructions and tool definitions, code that handles the response (executing tool calls and passing results back to the model), and a loop that repeats until the model produces a final answer. Your code is the framework — the glue that manages this back-and-forth, handles errors, manages memory, and exposes it through an API or interface.
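That back-and-forth can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's API: `call_model` stands in for whatever provider endpoint you use, and the message format is simplified.

```python
import json

def run_agent(user_message, system_prompt, tools, call_model, max_turns=10):
    """Minimal agent loop: send the conversation to the model, execute any
    tool call it requests, and repeat until it returns a final answer."""
    messages = [{"role": "system", "content": system_prompt},
                {"role": "user", "content": user_message}]
    for _ in range(max_turns):
        reply = call_model(messages)              # provider-specific API call
        if reply.get("tool_call"):
            name = reply["tool_call"]["name"]
            args = reply["tool_call"]["arguments"]
            result = tools[name](**args)          # execute the requested tool
            messages.append({"role": "assistant", "content": json.dumps(reply)})
            messages.append({"role": "tool", "name": name,
                             "content": json.dumps(result)})
        else:
            return reply["content"]               # final answer: loop ends
    raise RuntimeError("Agent exceeded max_turns without a final answer")
```

In production, `call_model` wraps the provider's chat API and `tools` maps tool names to your own functions. The structure stays the same whichever model you rent.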

Best suited for: Organisations that need more control over the agent loop, custom orchestration patterns, or portability across cloud providers. Also appropriate when you need to integrate with systems or protocols that managed platforms don't support, or when you want to avoid vendor lock-in.

Trade-offs: Full control and portability, but you own the infrastructure, scaling, and operational burden. Agents share resources, so a misbehaving agent could affect others. Simpler to operate than container-per-agent models, but less isolated.

Example: A custom agentic framework running on Azure Container Apps, with multiple specialist agents sharing the same deployment. The framework uses PostgreSQL for memory, Redis for caching, and calls Claude or GPT-4o for reasoning.

Case Study: You Might Already Be Building Agents

Many organisations are already building agentic capabilities without realising it. Consider a Privacy Information Management System (PIMS) deployed as a native Azure application with a front end, Azure Function Apps, and a SQL database.

One feature is an AI-powered asset register. When a user uploads a file or screenshot containing details about an application — say, a SaaS product's about page or a vendor datasheet — an AI agent extracts and interprets the content, maps it to the asset register schema, and suggests which fields to populate when creating the record. The user reviews the suggestions, and on approval, the agent creates the record in the database.

This is a textbook example of shared infrastructure runtime. The layers map cleanly:

  • Layer 1 (Model): GPT-4o via Azure OpenAI
  • Layer 2 (Framework): The Function App code — it receives the upload, includes the asset register schema in the system prompt (prompt stuffing), sends it to the model, parses the response, presents suggestions to the user, and executes the database write on approval
  • Layer 3 (DNA): The system prompt containing the table schema, field definitions, valid values, and mapping instructions

There's no separate agent platform. No managed orchestration service. The application code is the agent framework — it handles the loop of receiving input, calling the model, interpreting the response, and taking action.
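A hedged sketch of that flow, with the schema stuffed into the system prompt. The field names and `call_model` stand-in are illustrative assumptions, not the actual PIMS implementation:

```python
import json

# Illustrative subset of an asset register schema (assumed field names)
ASSET_SCHEMA = {
    "name": "string",
    "vendor": "string",
    "data_classification": ["Public", "Internal", "Confidential"],
}

def suggest_asset_fields(document_text, call_model):
    """Prompt stuffing: include the schema in the system prompt and ask the
    model to map the uploaded content onto it."""
    system_prompt = (
        "Map the document to this asset register schema. Return JSON "
        "containing only fields you can justify from the text:\n"
        + json.dumps(ASSET_SCHEMA)
    )
    reply = call_model(system_prompt, document_text)
    suggestions = json.loads(reply)
    # Keep only fields that exist in the schema; discard anything else
    return {k: v for k, v in suggestions.items() if k in ASSET_SCHEMA}
```

The suggestions go to the user for review; the database write happens only after approval, which is what keeps this at the Assistive level.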

In terms of the patterns from Part 2, this agent combines Tool Use (Pattern 2) with Approval Gates (Pattern 11), operating at Level 2 (Assistive) on the Autonomy Spectrum — the agent drafts, the human approves.

The graduation path is clear: move to Confidence-Based Escalation (Pattern 12) where the agent creates records automatically for high-confidence extractions and only escalates uncertain ones to the user. That's a step toward Level 3 (Active) — and it requires nothing more than updating the DNA and adding a confidence threshold to the existing code.
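That graduation can be as small as a routing function like this sketch; the threshold value is an assumption you would tune against your own data:

```python
def route_record(suggestions, confidence, threshold=0.85):
    """Confidence-based escalation (Pattern 12): auto-create high-confidence
    records, escalate uncertain ones for human review."""
    if confidence >= threshold:
        return ("auto_create", suggestions)    # Level 3: agent acts directly
    return ("escalate_to_user", suggestions)   # Level 2: human approves
```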

Model 3: Ephemeral Agent Runtimes (Container-per-Agent)

In this model, each agent instance spawns in its own isolated, ephemeral runtime environment — typically a container or managed development environment. An orchestrator decides which agents are needed, provisions the runtime, the agent does its work, and the environment is destroyed when the task is complete.

Platforms like Daytona, GitHub Codespaces, and Gitpod provide the infrastructure for this pattern. They offer API-driven provisioning of standardised environments with full development tooling — precisely what code-writing agents need.

Best suited for: Software delivery agents, code generation teams, testing and QA automation, data pipeline execution, or any use case where agents need to manipulate files, execute code, run builds, or operate with full development tooling. Particularly valuable for multi-agent systems where parallel execution and strict isolation between agents are important.

Trade-offs: Maximum isolation and scalability. Each agent gets a clean environment, preventing interference between tasks. Resources scale with workload, and you only pay for compute when agents are active. However, this is the most complex model to build and operate. Environment provisioning adds latency, and coordinating state across ephemeral containers requires careful design.
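The provision-work-destroy lifecycle can be sketched as follows. Temporary directories stand in here for the containers or cloud environments a platform like Daytona would provision; the orchestration shape is the point, not the isolation mechanism:

```python
import pathlib
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor

def run_in_ephemeral_workspace(agent_fn, task):
    """Provision an isolated workspace, run the agent, destroy the workspace."""
    workspace = pathlib.Path(tempfile.mkdtemp(prefix="agent-"))
    try:
        return agent_fn(task, workspace)   # agent works only in its own workspace
    finally:
        shutil.rmtree(workspace)           # environment destroyed after the task

def run_team(agent_fn, tasks):
    """Orchestrator: spawn one isolated runtime per task, in parallel."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda t: run_in_ephemeral_workspace(agent_fn, t),
                             tasks))
```

Nothing survives a task except what the agent explicitly returns or commits elsewhere, which is exactly why coordinating state across ephemeral runtimes needs careful design.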

Example: An agentic software delivery platform where a supervisor agent receives a feature request, then spawns specialist agents in isolated Daytona environments — a discovery agent analyses requirements, an architecture agent designs the solution, coding agents write and test the code in parallel containers, and a deployment agent handles release. Each agent has its own workspace with git, compilers, and testing frameworks available.

Choosing the Right Runtime

The right runtime architecture depends on what your agents actually do, not on what sounds most sophisticated. Match the architecture to the use case.

| Dimension | Managed Platform | Shared Infrastructure | Ephemeral Runtimes |
|---|---|---|---|
| Setup effort | Low — platform handles it | Medium — you build the framework | High — orchestrator plus runtime provisioning |
| Control | Limited to platform capabilities | Full control over agent loop and integration | Full control plus environment isolation |
| Isolation | Logical (platform-managed) | Shared resources between agents | Full container isolation per agent |
| Scalability | Platform-managed auto-scaling | Manual scaling of shared infrastructure | Scales with workload; pay only when active |
| Cost model | Pay per use (platform fees plus model costs) | Fixed infrastructure plus model costs | Per-environment compute plus model costs |
| Portability | Locked to provider | Portable across clouds | Portable (container-based) |
| Dev tooling access | Not available | Limited (shared environment) | Full (git, compilers, runtimes, etc.) |
| Ideal agent types | Advisory, conversational, business process | Custom automation, multi-agent coordination | Software delivery, code execution, data pipelines |
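Distilled into a first-pass decision helper — a deliberate simplification of the dimensions above, where the boolean inputs are assumptions you would refine for your own context:

```python
def choose_runtime(needs_dev_tooling, needs_custom_loop, needs_isolation):
    """Rough first-pass runtime choice, distilled from the comparison table."""
    if needs_dev_tooling or needs_isolation:
        return "ephemeral"              # code execution, parallel isolated agents
    if needs_custom_loop:
        return "shared_infrastructure"  # full control, portable across clouds
    return "managed_platform"           # simplest path for advisory agents
```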

The Pragmatic Approach

As with all technology decisions, start with the simplest architecture that meets your needs:

  • A managed platform runtime for advisory and conversational agents
  • Shared infrastructure when you need more control, custom orchestration, or portability
  • Ephemeral runtimes only for agents that genuinely need isolated environments and development tooling

Many organisations will use more than one model. Advisory agents might run on a managed platform while software delivery agents use ephemeral runtimes — all coordinated through a shared orchestration layer. The key is matching the runtime to the workload, not defaulting to the most complex option.

Practical Advice

Your runtime architecture will evolve. Start with a managed platform for your first agents. As you encounter limitations or need greater control, graduate to shared infrastructure. Reserve ephemeral runtimes for use cases that genuinely need them. This is the Game of Inches applied to infrastructure — add complexity only when the use case demands it.
