Research / PracticeMarch 30, 2026
View Original

Meta-Harness: Automated Harness Optimization

Original: Tweet thread by Lior Alexander · March 30, 2026 Paper: by Yoonho Lee et al. — Quoted tweet Engagement: 242 likes · 390 bookmarks Category: Research / Practice


Overview

What if you could automate the process of improving an agent's harness? Meta-Harness does exactly that — an automated optimization method that tunes system prompts, tool definitions, retry logic, and context management end-to-end, achieving a 6x performance gap from harness changes alone, without changing the model.


How It Works

The core loop is simple:

1. Start with any harness
2. An optimizer agent reads the code, logs, and scores
3. It identifies what caused each failure
4. It rewrites the harness and submits a new version
5. That version gets tested, results go back into the folder
6. Repeat

What Makes It Different: Raw File Access

Previous automated optimization methods compressed everything into short summaries — the optimizer only saw ~26K tokens per step. Not enough to trace why something broke.

Meta-Harness keeps every raw file: 10 million tokens per step, 400x more data to learn from. Enough to trace a failure back to the exact line that caused it.


Results

Benchmark Result Note
TerminalBench-2 (coding) #1 among all Haiku agents
Text classification Beats best hand-designed harness by 7.7 points Uses 4x fewer tokens
Math Single strategy improves accuracy across 5 unseen models Cross-model transfer

All improvements came from optimizing the harness, not the model.


The Key Insight

"The performance delta between frontier models is narrowing. The delta between harness implementations on the same model is not. That's where the leverage is."

This is the strongest empirical evidence yet for why Harness Engineering matters: a well-designed harness can extract dramatically more value from the same model than a poorly-designed one. And now, the design process itself can be automated.


Implications

  1. Harness optimization is a tractable problem — you don't need humans to hand-tune every configuration
  2. Cross-model transfer — a good harness design for one model often improves others
  3. Token efficiency is a first-class objective — Meta-Harness found solutions that are both better AND cheaper
  4. The future of harness engineering may be harness-engineering-engineering — meta-level optimization of the optimization process itself

See also: AutoAgent: Self-Optimizing Agent · Wayne Zhang: Three Scaling Dimensions