# Core Concepts
This chapter covers the five pillars of Harness Engineering. Each one is a deep topic on its own — we introduce the fundamentals here and link to deeper dives.
## Context Management
Context is the information fed to the model on each turn. Managing it well is the difference between an agent that feels magical and one that feels broken.
### The Context Window Problem
Every model has a finite context window (8K–2M tokens). Your harness must decide:
- What goes in — system prompt, conversation history, file contents, tool results
- What gets dropped — older messages, redundant information, resolved threads
- In what order — priority ranking of context sources
### Strategies
| Strategy | How It Works | Trade-off |
|---|---|---|
| Sliding window | Keep last N messages | Loses early context |
| Summarization | Compress old context into summaries | Lossy but compact |
| Retrieval (RAG) | Fetch relevant context on demand | Requires indexing |
| Hierarchical | Multi-level: hot (recent) + warm (session) + cold (archive) | Complex but effective |
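The first two strategies above combine naturally: keep the newest messages verbatim until the budget is spent, then compress everything older into one summary message. A minimal sketch, assuming a crude 4-characters-per-token estimate and a placeholder `summarize` stub where a real harness would call the model:

```python
def estimate_tokens(text: str) -> int:
    # Rough heuristic (~4 chars per token); a real harness uses its tokenizer.
    return max(1, len(text) // 4)

def summarize(messages: list[str]) -> str:
    # Placeholder: a real harness would ask the model to compress these.
    return "[summary of %d earlier messages]" % len(messages)

def build_context(messages: list[str], budget: int) -> list[str]:
    """Sliding window + summarization: keep the newest messages verbatim
    until the token budget is spent, then summarize everything older."""
    kept, used = [], 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()
    dropped = messages[: len(messages) - len(kept)]
    if dropped:
        kept.insert(0, summarize(dropped))
    return kept
```

Lossiness is confined to the summary slot: if the budget covers the whole history, nothing is dropped and the context passes through unchanged.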
### How Context Flows Through a Harness

```mermaid
graph TD
    A[System Prompt] --> E[Context Assembler]
    B[Conversation History] --> E
    C[Retrieved Documents / RAG] --> E
    D[Tool Results] --> E
    E --> F{Fits Token Limit?}
    F -->|Yes| G[Send to LLM]
    F -->|No| H[Prioritize & Trim]
    H --> E
```
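The "Prioritize & Trim" loop in the diagram can be sketched as repeatedly dropping the lowest-priority source until the assembled context fits. The priority numbering and the token estimate are illustrative assumptions, not a fixed scheme:

```python
def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)  # rough heuristic, not a real tokenizer

def assemble_context(sources: list[tuple[int, str]], limit: int) -> list[str]:
    """sources: (priority, text) pairs; lower number = more important.
    Drop the least important sources until the total fits the limit."""
    candidates = sorted(sources, key=lambda s: s[0])
    while candidates and sum(estimate_tokens(t) for _, t in candidates) > limit:
        candidates.pop()  # trim the lowest-priority source
    return [text for _, text in candidates]
```

The system prompt would normally carry the highest priority (lowest number) so it is the last thing ever trimmed.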
### Key Design Decisions
- How do you handle context overflow?
- Do you summarize automatically or let the model decide?
- How do you prioritize competing context sources?
## Memory & Persistence
Memory is context that survives across sessions. Without it, your agent wakes up with amnesia every time.
### Memory Layers

```
┌─ Working Memory ──────── Current conversation context
├─ Session Memory ──────── Survives within a session (temp files, state)
├─ Long-term Memory ────── Persists across sessions (MEMORY.md, vector DB)
└─ Shared Memory ───────── Accessible across agents (team knowledge base)
```
### The AGENTS.md / MEMORY.md Pattern
A file-based memory pattern popularized by OpenClaw and Claude Code:
- AGENTS.md — Agent configuration, personality, rules (read every session)
- MEMORY.md — Curated long-term memories (read + updated by agent)
- memory/YYYY-MM-DD.md — Daily raw logs (append-only)
This pattern is simple, portable, and version-controlled — the agent's memory lives in plain text files that humans can read and edit.
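A session boot under this pattern is just file reads in a fixed order. A minimal sketch, assuming the directory layout above (the `root` argument and join order are illustrative choices):

```python
from datetime import date
from pathlib import Path

def load_memory(root: Path) -> str:
    """Assemble session context from the file-based memory layout:
    AGENTS.md and MEMORY.md every session, plus today's daily log if present."""
    parts = []
    for name in ("AGENTS.md", "MEMORY.md"):
        f = root / name
        if f.exists():
            parts.append(f.read_text())
    daily = root / "memory" / f"{date.today():%Y-%m-%d}.md"
    if daily.exists():
        parts.append(daily.read_text())
    return "\n\n".join(parts)
```

Because everything is plain text, the same files the harness loads are the files a human edits to correct the agent's memory.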
### Memory Ownership
Who owns the agent's memory?
This is one of the most consequential questions in Harness Engineering:
- User-owned memory (open harness): stored locally, exportable, portable
- Platform-owned memory (closed harness): stored on vendor servers, locked in
## Skill / Tool Orchestration
Skills (also called tools, plugins, or capabilities) extend what an agent can do beyond text generation.
### Skill Architecture

```
Agent receives task
  → Harness determines which skills are needed
  → Skills are invoked (API calls, file ops, code execution)
  → Results are fed back to the agent
  → Agent synthesizes and responds
```
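The invoke-and-feed-back step can be sketched as a registry dispatch that returns errors as text rather than raising, so failures flow back to the agent as observations it can reason about. The registry contents here are hypothetical stand-ins:

```python
from typing import Callable

# Hypothetical skill registry: name -> callable (stand-ins, not a real API)
SKILLS: dict[str, Callable[..., str]] = {
    "search": lambda query: f"results for {query!r}",
    "read_file": lambda path: f"contents of {path}",
}

def run_skill(name: str, **kwargs) -> str:
    """Invoke a skill by name; return errors as text the agent can see,
    instead of crashing the loop (graceful failure handling)."""
    skill = SKILLS.get(name)
    if skill is None:
        return f"error: unknown skill {name!r}"
    try:
        return skill(**kwargs)
    except Exception as exc:
        return f"error: {name} failed: {exc}"
```

Returning the error string, rather than raising, is what lets the agent retry with different arguments or pick another skill.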
### Design Patterns
| Pattern | Description | Example |
|---|---|---|
| Thin harness + thick skills | Harness is minimal; skills carry complexity | OpenClaw skills |
| Thick harness + thin tools | Harness has built-in logic; tools are simple | Claude Code built-in tools |
| Plugin marketplace | Community-contributed skills | OpenClaw Skill Gallery |
### Key Considerations
- How does the agent discover available skills?
- How are permissions managed per skill?
- How do you handle skill failures gracefully?
- Can skills be composed (skill A calls skill B)?
## Agent Lifecycle

An agent goes through distinct phases. The harness manages transitions between them.

```
Boot → Initialize → Active → [Paused] → Shutdown
 │        │           │          │
 │        │           │          └── Heartbeat / wake
 │        │           └── Handle messages, run tasks
 │        └── Load memory, read config, check permissions
 └── Start runtime, validate environment
```
### Key Lifecycle Events
- Cold start — First boot, no prior state
- Warm start — Resuming with existing memory
- Heartbeat — Periodic check-in (proactive tasks)
- Graceful shutdown — Save state, flush memory
- Crash recovery — Restore from last known good state
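The cold-start / warm-start distinction often reduces to a single check for persisted state. A minimal sketch, where the state-file location is an assumption:

```python
from pathlib import Path

def boot(state_file: Path) -> str:
    """Return the start mode: 'warm' if prior state exists, else 'cold'.
    A cold start initializes empty state so the next boot is warm."""
    if state_file.exists():
        return "warm"
    state_file.write_text("{}")  # initialize empty state
    return "cold"
```

Crash recovery extends the same idea: instead of only checking existence, the harness validates the state file and falls back to the last known good copy if it is corrupt.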
## Multi-Agent Coordination
When multiple agents work together, the harness becomes an orchestrator.
### Coordination Patterns
| Pattern | Description | When to Use |
|---|---|---|
| Hub-and-spoke | One coordinator delegates to specialist agents | Task decomposition |
| Peer-to-peer | Agents communicate directly | Collaborative editing |
| Pipeline | Output of agent A feeds into agent B | Sequential processing |
| Swarm | Many agents work independently on sub-tasks | Parallel exploration |
### Hub-and-Spoke Example

```mermaid
graph TD
    User[User Request] --> Coordinator[Coordinator Agent]
    Coordinator --> A[Research Agent]
    Coordinator --> B[Code Agent]
    Coordinator --> C[Review Agent]
    A -->|findings| Coordinator
    B -->|code| Coordinator
    C -->|feedback| B
    Coordinator --> User
```
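At its simplest, the coordinator fans the request out to each specialist and collects the results keyed by agent name. The specialist functions below are stand-ins for real agent calls:

```python
from typing import Callable

def coordinate(request: str, agents: dict[str, Callable[[str], str]]) -> dict[str, str]:
    """Hub-and-spoke: delegate the request to each specialist agent
    and gather their outputs for synthesis by the coordinator."""
    return {name: agent(request) for name, agent in agents.items()}

# Hypothetical specialists (a real harness would invoke sub-agents here)
specialists = {
    "research": lambda req: f"findings on {req}",
    "code": lambda req: f"patch for {req}",
    "review": lambda req: f"feedback on {req}",
}
```

A production coordinator would also sequence dependencies (e.g., run review only after code), which this flat fan-out omits.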
### Challenges
- State sharing — How do agents share context?
- Conflict resolution — What if two agents edit the same file?
- Resource management — Token budgets, API rate limits
- Observability — Who did what, when, and why?
## The Fundamental Equation

As Huang Jia puts it simply: **Agent = Model + Harness**. The model provides the brain; the harness provides the body.
## Harness Core Components (Huang Jia's Framework)
A useful decomposition into six modules:
- Agentic Loop — The heart. Accept input → execute tools → iterate → return result. Directly descended from the ReAct (Reasoning + Acting) pattern.
- Tool System — The hands. Extends LLM capabilities beyond language into real-world actions.
- Memory & Context — The long-term brain. Provides continuity across sessions. (See Memory chapter.)
- Guardrails — The reins. Allow / Deny / Ask permission controls.
- Hooks — The guards. Pre/post-execution checks (e.g., preventing secret leaks).
- Session — The continuity layer. Runtime state management across interactions.
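The Agentic Loop and Guardrails modules can be sketched together: accept input, ask the model for an action, run only policy-approved tools, and iterate under a step budget. The action-tuple format, the scripted model, and the policy function are all illustrative assumptions:

```python
def agentic_loop(task, model, tools, policy, max_steps=8):
    """ReAct-style loop: observe -> decide -> act -> observe again.
    model(observation) returns ("final", text) or ("tool", name, arg).
    policy(name) is the guardrail: only "allow" lets a tool run."""
    observation = task
    for _ in range(max_steps):
        action = model(observation)
        if action[0] == "final":
            return action[1]
        _, name, arg = action
        if policy(name) != "allow":     # guardrail: Allow / Deny / Ask
            observation = f"denied: {name}"
            continue
        observation = tools[name](arg)  # result feeds the next turn
    return "stopped: step budget exhausted"
```

The `max_steps` cap is the loop budget mentioned below: it guarantees termination even when the model never converges on a final answer.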
## Five Production Problems a Harness Solves
When moving from prototype to production, agents hit predictable walls. A well-designed harness addresses each:
| Problem | Symptom | Harness Solution |
|---|---|---|
| Infinite loops | Agent keeps calling tools without converging | Loop budgets, step limits, convergence detection |
| Context explosion | Token usage balloons, quality degrades | Context compaction, summarization, priority queues |
| Loss of permission control | Agent executes dangerous operations | Guardrails (Allow/Deny/Ask), sandbox isolation |
| Quality unpredictability | Output varies wildly between runs | Quality gates, self-review loops, structured output |
| Cost opacity | Bills spike without visibility | Token accounting, cost caps, usage dashboards |
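The cost-opacity row reduces to token accounting with a hard cap. A minimal sketch (the cap value and the charge-before-call protocol are illustrative choices):

```python
class TokenBudget:
    """Track token spend per run and refuse calls past a hard cap."""

    def __init__(self, cap_tokens: int):
        self.cap = cap_tokens
        self.used = 0

    def charge(self, tokens: int) -> bool:
        """Record usage; return False (skip the call) once the cap would be
        exceeded, leaving the running total unchanged."""
        if self.used + tokens > self.cap:
            return False
        self.used += tokens
        return True
```

The harness checks `charge()` before every model call; the same counter feeds usage dashboards, so cost visibility and cost enforcement come from one mechanism.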
Next: Architecture Patterns →