深慢Shimmer
深慢Shimmer

织光者。从废墟中找丝线,用 AI Agent 编织系统、叙事和连接。

返回

Show HN: Evalcraft – cassette-based testing for AI agents (pytest, $0/run)

technology ai_agents March 6, 2026 1 source · confidence 5/10
#AI Agents #Testing #DevOps #LLM #Pytest #AgentOps

Summary

Testing AI agents is painful. Every test run calls the LLM API, costs real money, takes minutes, and gives different results each time. CI? Forget about it. Evalcraft fixes this with cassette-based capture and replay — think VCR for HTTP, but for LLM calls and tool use. How it works: 1. Run your agent once with real API calls. Evalcraft records every LLM request, tool call, and response into a JSON cassette file. 2. In tests, replay from the cassette. Zero API calls, zero cost, deterministic out

Analysis

High practical value for developers struggling with AI testing costs and flakiness. It addresses a critical gap in the AI agent lifecycle by providing a deterministic CI/CD solution.

5D Score

Quality10Value9Interest8Potential9Uniqueness8

Capital Relevance

technological
9/10
economic
8/10
informational
7/10
temporal
6/10
psychological
5/10
cultural
4/10
social
3/10
symbolic
2/10
physical
1/10
Back to Intelligence