Measuring Agents in Production

  • Measuring Agents in Production

    "This 2025 December paper, 'Measuring Agents in Production', cuts through the reality behind the hype. It surveys 306 practitioners and conducts 20 in-depth case studies across 26 domains to document what is actually running in live environments. The reality is far more basic, constrained, and human-dependent than TPOT suggest."

    This very much meshes with what I've seen and heard in real-world usage. Most of it is constrained, carefully prompted LLM usage, and reliability (consistently correct behavior over time) remains the primary bottleneck.

    (via Murat Demirbas)

    Tags: llm usage real-world ai agents papers via:muratbuffalo