TypeScript 100%

James Peret ae5a4e6e46	2026-05-04 23:04:23 -03:00
feat: add 21 new eval tests for untested tools and improve scoring

Add tool-evals.ts covering glob, grep, create-file, edit-file, tool-search, reasoning, convertUnit, web-fetch-http, web-search-tavily, sub-agent, and cross-tool tests.
Add setup/teardown infrastructure to agent.eval.ts for file-dependent tests.
Add time-accuracy scoring mode to hybrid scorer.
Fix SpaceX eval expected value for better LLM fallback scoring.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
evals	feat: add 21 new eval tests for untested tools and improve scoring	2026-05-04 23:04:23 -03:00
src	feat: wire messageGateway into background-jobs plugin config	2026-05-03 01:10:58 -03:00
tests	Integrate Braintrust tracing with experimental_telemetry support	2025-09-18 21:51:26 -03:00
.gitignore	Updated .gitignore	2026-04-28 23:51:07 -03:00
package-lock.json	feat: wire remote-nodes-plugin and remote-bash-tool into Severin	2026-04-17 22:14:03 -03:00
package.json	feat: wire remote-nodes-plugin and remote-bash-tool into Severin	2026-04-17 22:14:03 -03:00
README.md	Add comprehensive plugin and tool integration to Severin agent	2025-09-13 17:09:37 -03:00
tsconfig.json	Update Severin autonomous agent implementation	2025-09-11 13:53:42 -03:00
vitest.config.ts	Add comprehensive plugin and tool integration to Severin agent	2025-09-13 17:09:37 -03:00

README.md

Severin Agent

Evals

To run evals:

# First, export your Braintrust API key
export BRAINTRUST_API_KEY="YOUR_API_KEY"
# Run the evals and create an experiment in Braintrust
npx braintrust eval evals/agent.eval.ts
# Run the evals locally without sending any data to Braintrust
npx braintrust eval --no-send-logs evals/agent.eval.ts
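
The two commands above differ only in the `--no-send-logs` flag, so a wrapper script could pick between them based on whether an API key is present. The following is a minimal sketch of that idea, not code from this repository: `buildEvalCommand` and the `Env` type are hypothetical names, and the fallback to `--no-send-logs` when no key is set is an assumed policy.

```typescript
// Hypothetical helper (not part of the repository): builds one of the two
// README commands depending on whether BRAINTRUST_API_KEY is available.
type Env = Record<string, string | undefined>;

function buildEvalCommand(
  env: Env,
  evalFile: string = "evals/agent.eval.ts",
): string[] {
  // Treat a missing, empty, or whitespace-only key as "no key".
  const hasKey = (env.BRAINTRUST_API_KEY ?? "").trim().length > 0;

  const command = ["npx", "braintrust", "eval"];
  if (!hasKey) {
    // Assumed policy: without a key, run locally and send nothing.
    command.push("--no-send-logs");
  }
  command.push(evalFile);
  return command;
}
```

In a Node script, the result of `buildEvalCommand(process.env)` could be joined with spaces for logging, or split into a program name and argument list for a child process call. Keeping the function pure (the environment is passed in rather than read globally) makes it easy to test both branches.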