Add tool-evals.ts covering glob, grep, create-file, edit-file, tool-search, reasoning, convertUnit, web-fetch-http, web-search-tavily, sub-agent, and cross-tool tests. Add setup/teardown infrastructure to agent.eval.ts for file-dependent tests. Add time-accuracy scoring mode to hybrid scorer. Fix SpaceX eval expected value for better LLM fallback scoring.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

| Name |
|---|
| evals |
| src |
| tests |
| .gitignore |
| package-lock.json |
| package.json |
| README.md |
| tsconfig.json |
| vitest.config.ts |
Severin Agent
Evals
To run evals:
```bash
# First export API Keys
export BRAINTRUST_API_KEY="YOUR_API_KEY"
# Run the evals and create an experiment in Braintrust
npx braintrust eval evals/agent.eval.ts
# Run the evals without sending any data
npx braintrust eval --no-send-logs evals/agent.eval.ts
```
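The commit notes mention a time-accuracy scoring mode, but this README does not show how it is implemented. As a rough illustration only, the sketch below shows one way such a scorer could work: it compares two timestamps and gives partial credit based on how far apart they are. The names `timeAccuracy`, `ScorerArgs`, `FULL_CREDIT_MS`, and `ZERO_CREDIT_MS`, as well as the tolerance values, are hypothetical and are not taken from this repository. The argument and return shapes are assumed to resemble a custom scorer that receives `output` and `expected` and returns a named score between 0 and 1.

```typescript
// Hypothetical argument shape: the task output and the expected value,
// both given as timestamp strings that Date.parse can read.
interface ScorerArgs {
  output: string;
  expected: string;
}

// Hypothetical result shape: a named score between 0 and 1.
interface Score {
  name: string;
  score: number;
}

// Assumed tolerances (not from the repository): full credit within
// 5 minutes, no credit at 60 minutes or more, linear decay in between.
const FULL_CREDIT_MS = 5 * 60 * 1000;
const ZERO_CREDIT_MS = 60 * 60 * 1000;

function timeAccuracy({ output, expected }: ScorerArgs): Score {
  const name = "time_accuracy";
  const actualMs = Date.parse(output);
  const expectedMs = Date.parse(expected);

  // Unparseable timestamps score 0 rather than throwing.
  if (Number.isNaN(actualMs) || Number.isNaN(expectedMs)) {
    return { name, score: 0 };
  }

  const diffMs = Math.abs(actualMs - expectedMs);
  if (diffMs <= FULL_CREDIT_MS) return { name, score: 1 };
  if (diffMs >= ZERO_CREDIT_MS) return { name, score: 0 };

  // Linear interpolation between the two thresholds.
  const fraction = (diffMs - FULL_CREDIT_MS) / (ZERO_CREDIT_MS - FULL_CREDIT_MS);
  return { name, score: 1 - fraction };
}
```

A function of this shape could, under the assumptions above, be listed among the scorers of an eval defined in `evals/agent.eval.ts`. The linear decay is a design choice for the sketch; a step function or a stricter tolerance may suit time-sensitive answers better.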