Commit ae5a4e6e46 by James Peret, 2026-05-04 23:04:23 -03:00
feat: add 21 new eval tests for untested tools and improve scoring
Add tool-evals.ts covering glob, grep, create-file, edit-file, tool-search,
reasoning, convertUnit, web-fetch-http, web-search-tavily, sub-agent, and
cross-tool tests. Add setup/teardown infrastructure to agent.eval.ts for
file-dependent tests. Add time-accuracy scoring mode to hybrid scorer.
Fix SpaceX eval expected value for better LLM fallback scoring.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Severin Agent

Evals

To run the evals:

# First, export the required API keys, for example:
export BRAINTRUST_API_KEY="YOUR_API_KEY"
# Run the evals and create an experiment in Braintrust
npx braintrust eval evals/agent.eval.ts
# Run the evals without sending logs to Braintrust
npx braintrust eval --no-send-logs evals/agent.eval.ts
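The commands above execute `evals/agent.eval.ts`, whose contents are not reproduced in this README. The following is a hedged illustration only. A Braintrust eval generally pairs a list of `{ input, expected }` cases with a task under test and one or more scorers. A custom scorer is a plain function that receives `{ input, output, expected }` and returns a `{ name, score }` object, with `score` between 0 and 1. The self-contained TypeScript sketch below mimics that shape without importing the Braintrust SDK. `EvalCase`, `exactMatch`, `task`, and `runLocally` are hypothetical names, not code from this repository.

```typescript
// Hypothetical types mirroring the { input, output, expected } shape that
// Braintrust passes to custom scorers; NOT imported from the real SDK.
interface EvalCase {
  input: string;
  expected: string;
}

interface ScoreResult {
  name: string;
  score: number; // 0 = fail, 1 = pass
}

type Scorer = (args: { input: string; output: string; expected: string }) => ScoreResult;

// Hypothetical scorer: 1 if the trimmed, lower-cased output equals the expected value.
const exactMatch: Scorer = ({ output, expected }) => ({
  name: "ExactMatch",
  score: output.trim().toLowerCase() === expected.trim().toLowerCase() ? 1 : 0,
});

// Hypothetical stand-in for the agent under test; a real eval would call the agent here.
async function task(input: string): Promise<string> {
  return input === "2 + 2" ? "4" : "unknown";
}

// Hypothetical local runner: runs every case through the task and the scorer,
// then returns the mean score (0 for an empty case list).
async function runLocally(cases: EvalCase[], scorer: Scorer): Promise<number> {
  let total = 0;
  for (const c of cases) {
    const output = await task(c.input);
    total += scorer({ input: c.input, output, expected: c.expected }).score;
  }
  return cases.length === 0 ? 0 : total / cases.length;
}

const cases: EvalCase[] = [
  { input: "2 + 2", expected: "4" },
  { input: "capital of France", expected: "Paris" },
];

runLocally(cases, exactMatch).then((mean) => {
  console.log(`mean score: ${mean}`); // prints "mean score: 0.5"
});
```

In an actual eval file, a scorer like `exactMatch` would be listed in the `scores` array passed to `Eval` from the `braintrust` package. The CLI, not a hand-written loop like `runLocally`, would then run the cases and aggregate the results.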