Agent Research — April 2026
Research into agent architecture for the pilot: how to build the agent host, what tools are needed, how Claude Code works under the hood.
Claude Code Architecture
Claude Code's agency is NOT model fine-tuning. It is:
- A system prompt — extensive instructions shaping behavior, tool usage, safety rules, conventions. This is the primary "agency" mechanism.
- A simple while loop — think → act → observe → repeat. Single-threaded. No complex orchestration.
- Real tool implementations — the model's power comes from what it can DO.
- Context management — automatic compaction, persistent memory, subagent isolation.
- Permission/safety layers — sandboxing, permission classification, hook-based interception.
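The "simple while loop" above can be sketched in a few lines. This is a minimal illustration, not Claude Code's actual implementation: `Step`, `Think`, `Act`, `agentLoop`, and the `maxTurns` cap are all hypothetical names introduced here, and the model call and tool execution are injected as plain functions.

```typescript
// One decision from the model: either a final answer or a tool request.
type Step =
  | { kind: "final"; text: string }
  | { kind: "act"; tool: string; input: unknown };

// Hypothetical injection points (assumptions, not Claude Code APIs).
type Think = (transcript: string[]) => Promise<Step>;
type Act = (tool: string, input: unknown) => Promise<string>;

// Single-threaded loop: think, act, observe, repeat until a final answer.
async function agentLoop(
  task: string,
  think: Think,
  act: Act,
  maxTurns = 25, // assumed safety cap so a confused model cannot loop forever
): Promise<string> {
  const transcript: string[] = [`user: ${task}`];
  let turns = 0;
  while (turns < maxTurns) {
    turns += 1;
    const step = await think(transcript); // think
    if (step.kind === "final") return step.text;

    let observation: string;
    try {
      observation = await act(step.tool, step.input); // act
    } catch (err) {
      // Failures are fed back as observations so the model can recover.
      observation = `error: ${err instanceof Error ? err.message : String(err)}`;
    }
    transcript.push(`tool ${step.tool}: ${observation}`); // observe
  }
  throw new Error(`no final answer after ${maxTurns} turns`);
}
```

Everything else in the list (system prompt, context management, permissions) would plug in around `think` and `act` without changing the shape of the loop.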
Anthropic API Tool Call Format
Proprietary format, NOT MCP. Clean and structured.
Request — tool definitions:
{
"name": "get_weather",
"description": "Get current weather",
"input_schema": {
"type": "object",
"properties": { "location": { "type": "string" } },
"required": ["location"]
}
}
Response (tool call):
{
"stop_reason": "tool_use",
"content": [
{ "type": "text", "text": "Let me check." },
{
"type": "tool_use",
"id": "toolu_01A09q90qw90lq917835lq9",
"name": "get_weather",
"input": { "location": "San Francisco, CA" }
}
]
}
Sending result back (in user message):
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_01A09q90qw90lq917835lq9",
"content": "68F, partly cloudy"
}
]
}
- stop_reason: "tool_use" signals the model wants tools executed
- Multiple tool_use blocks in one response = parallel tool calls
- All results go in a single user message
- Error results: is_error: true on the tool_result
- No "role": "tool"; results are always inside "role": "user" messages
MCP bridge: MCP uses inputSchema (camelCase), Anthropic uses input_schema (snake_case). MCP uses arguments, Anthropic uses input. Bridging is a minor field rename. Anthropic also has a beta MCP connector for server-side bridging.
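The mechanics in this section can be sketched as plain functions over the JSON shapes shown above. This is a rough sketch: the types are reduced to the fields that appear in this document, and `Handlers`, `buildToolResults`, `mcpToolToAnthropic`, and `toolUseToMcpCall` are hypothetical names, not SDK exports.

```typescript
type TextBlock = { type: "text"; text: string };
type ToolUseBlock = { type: "tool_use"; id: string; name: string; input: Record<string, unknown> };
type ToolResultBlock = { type: "tool_result"; tool_use_id: string; content: string; is_error?: true };
type AssistantTurn = { stop_reason: string; content: Array<TextBlock | ToolUseBlock> };
type UserMessage = { role: "user"; content: ToolResultBlock[] };

// Hypothetical local tool table: tool name -> synchronous handler.
type Handlers = Record<string, (input: Record<string, unknown>) => string>;

// Execute every tool_use block and return ONE user message with all results.
function buildToolResults(turn: AssistantTurn, handlers: Handlers): UserMessage | null {
  if (turn.stop_reason !== "tool_use") return null; // nothing to execute
  const results: ToolResultBlock[] = [];
  for (const block of turn.content) {
    if (block.type !== "tool_use") continue; // skip text blocks
    const handler = handlers[block.name];
    try {
      if (!handler) throw new Error(`unknown tool: ${block.name}`);
      results.push({ type: "tool_result", tool_use_id: block.id, content: handler(block.input) });
    } catch (err) {
      // Failures are reported to the model via is_error, not thrown.
      const message = err instanceof Error ? err.message : String(err);
      results.push({ type: "tool_result", tool_use_id: block.id, content: message, is_error: true });
    }
  }
  return { role: "user", content: results };
}

// MCP bridge: the two field renames described above.
type McpTool = { name: string; description?: string; inputSchema: Record<string, unknown> };
type AnthropicTool = { name: string; description?: string; input_schema: Record<string, unknown> };
type McpCallParams = { name: string; arguments: Record<string, unknown> };

function mcpToolToAnthropic(tool: McpTool): AnthropicTool {
  return { name: tool.name, description: tool.description, input_schema: tool.inputSchema };
}

function toolUseToMcpCall(block: ToolUseBlock): McpCallParams {
  return { name: block.name, arguments: block.input };
}
```

Parallel tool calls need no special handling here: each `tool_use` block yields one `tool_result` with the matching `tool_use_id`, and all of them travel back in the same user message.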
Claude Code — Complete Tool Reference
File Operations
- Read: read file contents. Params: file_path, offset, limit, pages (PDF). Returns cat -n format. Supports images, PDFs, notebooks.
- Write: create/overwrite file. Params: file_path, content. Enforces read-before-write.
- Edit: exact string replacement. Params: file_path, old_string, new_string, replace_all. Requires prior Read.
- Glob: file pattern matching. Params: pattern, path. Returns paths sorted by mtime.
- Grep: content search (ripgrep). Params: pattern, path, output_mode, glob, type, context flags, multiline, head_limit, offset.
Execution
- Bash: execute shell commands. Params: command, description, timeout (max 600s), run_in_background. Returns stdout/stderr.
- BashOutput: retrieve output from background shells.
- KillShell: terminate a background shell.
Web
- WebSearch: web search. Params: query, allowed_domains, blocked_domains. Returns results with URLs and snippets.
- WebFetch: fetch URL, convert to markdown, process with fast model. Params: url, prompt (what to extract). 15-min cache.
Agent / Sub-agents
- Agent: launch autonomous sub-agent with its own context window.
  - Params: prompt, description, subagent_type (Explore/Plan/general-purpose/custom), run_in_background, model, resume
  - Sub-agent gets its own context window (completely separate from parent)
  - Gets a system prompt (built-in or from .claude/agents/*.md)
  - Cannot spawn further sub-agents (no nesting)
  - Up to ~7 parallel
  - Returns final text report to parent
  - Built-in types: Explore (read-only, haiku), Plan (read-only), general-purpose (all tools)
Task Management
- TodoWrite: structured task lists. Items have content, status (pending/in_progress/completed), activeForm.
Planning
- ExitPlanMode — exit planning mode with implementation plan.
User Interaction
- AskUserQuestion — structured multiple-choice questions to user.
Tool Discovery
- ToolSearch — fetch schemas for deferred/lazy-loaded tools. MCP tools and rarely-used tools load on demand.
Skills
- Skill: execute named skills/slash commands from .claude/skills/.
Worktrees
- EnterWorktree / ExitWorktree — isolated git worktrees for parallel branch work.
IDE-specific
- getDiagnostics — VS Code language server diagnostics.
- executeCode — execute Python in Jupyter kernel.
MCP Resources
- ListMcpResourcesTool / ReadMcpResourceTool — access MCP server resources.
Computer Use
- Computer — screenshot, mouse, keyboard for graphical interfaces.
Total: ~24 built-in tools + unlimited via MCP servers.
Claude Code Lifecycle Hooks
25 lifecycle events. Key ones:
| Event | Data | Can Control |
|---|---|---|
| UserPromptSubmit | full prompt text | inject context |
| PreToolUse | tool name, full args, tool_use_id | block, modify, allow, escalate |
| PostToolUse | tool name, args, full result | inject context, block further |
| PostToolUseFailure | same as PostToolUse | same |
| Stop | stop_hook_active (no response text!) | — |
| SessionStart/End | source, model | — |
| SubagentStart/Stop | agent_id, transcript_path | — |
Gap: Stop event does not include Claude's response text. Must read from JSONL transcript file.
Hook types: command (shell), http (POST), prompt (LLM eval), agent (multi-turn subagent).
Agent SDK
The Claude Agent SDK packages the Claude Code loop as an embeddable library (TypeScript and Python):
- Built-in tool implementations (same as Claude Code)
- Hooks as programmatic callbacks (not shell commands)
- Direct response streaming (no transcript file workaround)
- Permission model, session management, context compaction
- Designed for Claude Code-like experiences — brings its own tools and context management
Open Source Agent Frameworks
Best fit for OpenLight (thinnest, don't own state)
- Mirascope — "LLM Anti-Framework." Explicit manual loop. No state ownership. Multi-provider. Python.
- Raw Anthropic SDK — zero abstraction, total control. You build everything.
- LiteLLM — provider abstraction only (100+ providers). Not a framework. Good utility layer.
- Vercel AI SDK: TypeScript. prepareStep callback for inter-step logic. No built-in persistence.
Moderate fit (some opinions, configurable)
- Pydantic AI: dependency injection for custom state. Agent.iter() for step-by-step control. Python.
- Magentic: decorator-based. Thin. Less loop control.
Poor fit (own too much)
- Claude Agent SDK — designed for Claude Code-like experiences, owns context
- LangGraph — graph-based, owns state and persistence
- smolagents — code-as-action, owns execution context
- OpenAI Agents SDK — provider lock-in
- CrewAI, Google ADK, Microsoft Agent Framework — heavy multi-agent orchestration
WebSearch — Server-Side, Zero Implementation
WebSearch is a built-in server-side capability of the Anthropic Messages API. It is not a tool you implement: you add {"type": "web_search_20250305", "name": "web_search"} to the tools array and the API handles everything.
- The model composes the search query itself
- Search executes server-side within the API call
- Response contains both the query (server_tool_use) and results (web_search_tool_result)
- $10/1k searches + standard token costs
- Domain allowlists/blocklists supported
- Also available: web_search_20260209 with dynamic filtering (model writes code to filter results)
No implementation needed for the pilot. Just include it in the tool definitions.
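As a rough illustration of "just include it in the tool definitions", the entry can be built with a small helper. The `type` and `name` values come from the notes above. The optional fields `max_uses`, `allowed_domains`, and `blocked_domains`, and the rule that allowlist and blocklist are mutually exclusive, reflect my reading of the web search tool documentation and should be verified before use. `WebSearchOptions`, `webSearchTool`, and `searchCostUsd` are hypothetical names.

```typescript
// Server-side web search tool entry for the tools array.
// Optional field names are an assumption to check against current API docs.
type WebSearchTool = {
  type: "web_search_20250305";
  name: "web_search";
  max_uses?: number;
  allowed_domains?: string[];
  blocked_domains?: string[];
};

// Hypothetical helper options (camelCase names are ours, not the API's).
type WebSearchOptions = {
  maxUses?: number;
  allowedDomains?: string[];
  blockedDomains?: string[];
};

function webSearchTool(opts: WebSearchOptions = {}): WebSearchTool {
  if (opts.allowedDomains && opts.blockedDomains) {
    throw new Error("pass an allowlist or a blocklist, not both");
  }
  const tool: WebSearchTool = { type: "web_search_20250305", name: "web_search" };
  if (opts.maxUses !== undefined) tool.max_uses = opts.maxUses;
  if (opts.allowedDomains) tool.allowed_domains = opts.allowedDomains;
  if (opts.blockedDomains) tool.blocked_domains = opts.blockedDomains;
  return tool;
}

// Search spend at $10 per 1,000 searches; token costs are not included.
function searchCostUsd(searches: number): number {
  return (searches * 10) / 1000;
}
```

The returned object sits in the same `tools` array as client-side tool definitions. No handler is registered for it because the search runs inside the API call.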
Brave Search is a separate, unrelated option available via MCP. Free tier: 2k queries/month. Useful as a fallback or for vendor independence.
VM Containment
Lightweight Linux VM — not Docker, not process-level sandboxing. Docker and sandbox-runtime (srt) share the host kernel — container escapes are real. For running AI agents with tool execution on a machine with sensitive data, hardware-level isolation with a separate kernel is required.
The decision: OrbStack or Lima on macOS (Apple Virtualization.framework), Firecracker on Linux. Real VMs, separate kernels, hardware isolation.
- Project directory mounted via virtio-fs (read/write)
- Public internet allowed (for API calls)
- Private/local network blocked at the VM's virtual network level
- Everything runs inside the VM: SvelteKit server, substrate, invocables
- User interacts via browser on the host (the UI is a web page served from the VM's exposed port)
Researched alternatives (rejected):
- sandbox-runtime (srt): process-level sandboxing via sandbox-exec/bwrap. Shared kernel, so not a real security boundary.
- Docker: on macOS runs a hidden Linux VM anyway. Shared kernel on Linux. Container escapes documented.
- Firecracker: Linux-only, requires KVM. Good for Linux production.
- nsjail: Linux-only, heavier config, still shared kernel.
CLI Tools and the Anthropic API
CLI tools (executables receiving JSON on stdin) map cleanly to the Anthropic tool format:
- Tool definition: JSON schema in input_schema
- Model returns: tool_use with structured input matching the schema
- You execute: pipe input to the CLI, capture stdout
- You send back: tool_result with the output
No MCP overhead needed for our own tools. MCP is useful for consuming external tool servers but not required.
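The four steps above can be sketched with Node's `child_process`. This is one possible shape, not a settled design: `runCliTool`, its parameters, and the 60-second timeout are assumptions made for illustration. A non-zero exit code or spawn failure is mapped to `is_error: true`.

```typescript
import { spawnSync } from "node:child_process";

type ToolResult = { type: "tool_result"; tool_use_id: string; content: string; is_error?: true };

// Run a CLI tool: the tool_use input goes in as JSON on stdin,
// and stdout becomes the tool_result content.
function runCliTool(toolUseId: string, command: string, args: string[], input: unknown): ToolResult {
  const proc = spawnSync(command, args, {
    input: JSON.stringify(input),
    encoding: "utf8",
    timeout: 60_000, // assumed cap; tune per tool
  });
  if (proc.error || proc.status !== 0) {
    // Spawn failure, timeout, or non-zero exit: report as an error result.
    const detail = proc.error ? proc.error.message : proc.stderr || `exit code ${proc.status}`;
    return { type: "tool_result", tool_use_id: toolUseId, content: detail, is_error: true };
  }
  return { type: "tool_result", tool_use_id: toolUseId, content: proc.stdout };
}
```

A synchronous spawn keeps the sketch short. A real host would likely use the async `spawn` so parallel tool calls do not block each other.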
Resolved Questions
- WebSearch: Server-side API capability. Zero implementation.
- CLI alignment: Direct mapping to Anthropic API format.
- VM containment: sandbox-runtime for the pilot. Program-agnostic, cross-platform.
Decisions Made
- Agent runtime: Anthropic TypeScript SDK, raw API calls. Own loop, own tools, own context assembly. No framework.
- Sub-agent lifecycle: Dispatch chunk tracks status (running/completed/failed), PID, depth. Invocable manages process.
- Sub-agent nesting: Allowed with depth limit (3 for pilot). Substrate supports it natively.
- Containment: sandbox-runtime (srt) for bounded filesystem and network.
- WebSearch: Server-side API capability, zero implementation.
- Tool definitions: Live in invocable code for the pilot. The invocable knows its own capabilities.
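The sub-agent lifecycle and nesting decisions could be modelled roughly as follows. The notes specify only what a dispatch chunk tracks (status, PID, depth) and the pilot limit of 3. The `DispatchChunk` field names, the `id` field, the convention that a top-level sub-agent has depth 1, and both function names are assumptions made for this sketch.

```typescript
type DispatchStatus = "running" | "completed" | "failed";

// Hypothetical shape for a dispatch chunk; field names are assumptions.
type DispatchChunk = { id: string; status: DispatchStatus; pid: number; depth: number };

const MAX_DEPTH = 3; // pilot depth limit from the decision above

// Create the record for a new sub-agent, refusing to exceed the depth limit.
// Assumes a sub-agent launched by the top-level agent has depth 1.
function dispatchChild(parent: DispatchChunk | null, id: string, pid: number): DispatchChunk {
  const depth = parent === null ? 1 : parent.depth + 1;
  if (depth > MAX_DEPTH) {
    throw new Error(`sub-agent depth ${depth} exceeds limit ${MAX_DEPTH}`);
  }
  return { id, status: "running", pid, depth };
}

// Record the outcome once the invocable's process has exited.
function finish(chunk: DispatchChunk, exitCode: number): DispatchChunk {
  return { ...chunk, status: exitCode === 0 ? "completed" : "failed" };
}
```

Checking the limit at dispatch time means a refused spawn never creates a process, so there is nothing to clean up afterwards.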