browser-tool
A stateful, session-isolated browser control tool for Fractal Synapse agents. Exposes a single webBrowser tool powered by raw Playwright (playwright-core) that lets an agent navigate the web, interact with page elements, extract text and HTML, take screenshots, manage tabs, execute JavaScript, and handle iframes -- all through a multi-action interface. Each agent instance gets its own independent browser session, preventing interference between sub-agents.
Installation
# Install the package (from the project root)
cd packages/tools/browser-tool
npm install
# Install the Chromium browser binary (required by playwright-core)
npx playwright install chromium
playwright-core does not auto-download browsers. The consuming application is responsible for installing the browser binary separately.
Actions
The tool accepts an action parameter plus optional context-specific parameters. All 20 actions:
| Action | Parameters | Description |
|---|---|---|
| navigate | url (required), waitUntil (optional) | Go to a URL. Supports waitUntil: "load", "networkidle", "domcontentloaded" (default: "load"). |
| go_back | -- | Navigate back in browser history. |
| go_forward | -- | Navigate forward in browser history. |
| close | -- | Close the current tab. If it is the last tab, the entire browser session is closed. |
| click | selector (required) | Click an element on the page. |
| fill | selector (required), value (required) | Type text into an input field. |
| select | selector (required), value (required) | Choose an option from a <select> dropdown by value or label. |
| hover | selector (required) | Hover the mouse over an element. |
| drag | selector (required), targetSelector (required) | Drag an element to a target element. |
| press_key | key (required), selector (optional) | Press a keyboard key. If selector is provided, the key is sent to that element. |
| delete | selector (optional) | Press the Delete key on an element or globally. |
| get_visible_text | selector (optional) | Get readable text from the page or a specific element. Truncated at 10,000 characters. |
| get_html | selector (optional), depth (optional, default 4) | Get cleaned, depth-limited HTML with <script>, <style>, <svg>, <noscript> stripped. For full raw HTML use evaluate with document.documentElement.outerHTML. |
| screenshot | -- | Take a screenshot of the current page. Saved via storage plugin if available, otherwise returned as base64. |
| evaluate | value (required) | Execute JavaScript code in the page context. The value parameter is the JS code string. |
| console_logs | -- | Get browser console output collected since the last call. Clears the log buffer after returning. |
| iframe_click | frameSelector (required), selector (required) | Click an element inside an iframe. |
| iframe_fill | frameSelector (required), selector (required), value (required) | Fill an input field inside an iframe. |
| upload_file | selector (required), filePath (required) | Upload a file via a <input type="file"> element. |
| click_and_switch_tab | selector (required) | Click a link that opens a new tab and automatically switch to it. |
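The required parameters in the table can be captured as a lookup, which is handy for checking an input before it reaches the tool. The following is a minimal sketch only: the names ActionName, REQUIRED_PARAMS, ToolInput, and missingParams are illustrative and are not exports of this package, and the tool's own validation may differ.

```typescript
// Action names transcribed from the Actions table.
type ActionName =
  | 'navigate' | 'go_back' | 'go_forward' | 'close'
  | 'click' | 'fill' | 'select' | 'hover' | 'drag'
  | 'press_key' | 'delete' | 'get_visible_text' | 'get_html'
  | 'screenshot' | 'evaluate' | 'console_logs'
  | 'iframe_click' | 'iframe_fill' | 'upload_file'
  | 'click_and_switch_tab';

// Required parameters per action (optional ones are omitted).
const REQUIRED_PARAMS: Record<ActionName, readonly string[]> = {
  navigate: ['url'],
  go_back: [],
  go_forward: [],
  close: [],
  click: ['selector'],
  fill: ['selector', 'value'],
  select: ['selector', 'value'],
  hover: ['selector'],
  drag: ['selector', 'targetSelector'],
  press_key: ['key'],
  delete: [],
  get_visible_text: [],
  get_html: [],
  screenshot: [],
  evaluate: ['value'],
  console_logs: [],
  iframe_click: ['frameSelector', 'selector'],
  iframe_fill: ['frameSelector', 'selector', 'value'],
  upload_file: ['selector', 'filePath'],
  click_and_switch_tab: ['selector'],
};

// Hypothetical input shape: an action plus free-form parameters.
interface ToolInput {
  action: ActionName;
  [param: string]: unknown;
}

// Returns the names of required parameters that are absent from the input.
function missingParams(input: ToolInput): string[] {
  return REQUIRED_PARAMS[input.action].filter(
    (name) => input[name] === undefined || input[name] === null,
  );
}
```

For example, missingParams({ action: 'iframe_fill', selector: '#card' }) reports frameSelector and value as missing, while a navigate input with a url reports nothing.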
Selector Formats
The selector parameter supports multiple Playwright selector engines:
CSS: "#login-btn", ".nav-item", "input[name=email]", "button.primary"
Text: "text=Sign in", "text=Create account"
ARIA: "role=button[name='Submit']", "role=link[name='Docs']"
Label: "label=Email address", "label=Password"
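To make the prefix convention concrete, here is a small sketch that classifies a selector string by the four formats listed above. The function name classifySelector is hypothetical, and this README does not say how the tool itself dispatches selectors, so treat this as a reading aid and not as the tool's logic.

```typescript
type SelectorKind = 'text' | 'role' | 'label' | 'css';

// Classify a selector by its leading "engine=" prefix.
// Anything without one of the listed prefixes is treated as CSS,
// including attribute selectors such as input[name=email].
function classifySelector(selector: string): SelectorKind {
  if (selector.startsWith('text=')) return 'text';
  if (selector.startsWith('role=')) return 'role';
  if (selector.startsWith('label=')) return 'label';
  return 'css';
}
```

Only the start of the string is inspected, so an equals sign later in a CSS selector does not change the result.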
Usage
Register the tool in your agent's tool registry:
import { webBrowserToolDefinition } from 'browser-tool';
// Register with the tool registry
toolRegistry.registerTool('webBrowser', webBrowserToolDefinition);
// Add to the agent's tool list
const agent = createAgent({
tools: ['webBrowser', /* ...other tools */],
// ...
});
Session Isolation
Each agent instance gets its own isolated browser session. Sessions are keyed by experimental_context.agentId, so multiple sub-agents running concurrently cannot interfere with each other's browser state. Tab management uses a per-session page stack -- click_and_switch_tab pushes new pages, close pops them, and closing the last tab tears down the entire browser instance for that agent.
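The per-agent page stack described above can be sketched as follows. This is an illustration under assumptions, not the package's implementation: PageLike stands in for a Playwright page so the code runs without a browser, the class and method names are invented for the example, and real Playwright calls would be asynchronous.

```typescript
// Minimal stand-in for a Playwright page.
interface PageLike {
  close(): void;
}

// One session per agent: a stack of open tabs, the top one being active.
class BrowserSession {
  private readonly pages: PageLike[];

  constructor(firstPage: PageLike) {
    this.pages = [firstPage];
  }

  get tabCount(): number {
    return this.pages.length;
  }

  get activePage(): PageLike | undefined {
    return this.pages[this.pages.length - 1];
  }

  // click_and_switch_tab: the newly opened page becomes the active tab.
  pushTab(page: PageLike): void {
    this.pages.push(page);
  }

  // close: pop the active tab; returns true when no tabs remain.
  closeActiveTab(): boolean {
    this.pages.pop()?.close();
    return this.pages.length === 0;
  }
}

// Sessions are keyed by agent ID, so agents never share a page stack.
class SessionRegistry {
  private readonly sessions = new Map<string, BrowserSession>();

  getOrCreate(agentId: string, openPage: () => PageLike): BrowserSession {
    let session = this.sessions.get(agentId);
    if (!session) {
      session = new BrowserSession(openPage());
      this.sessions.set(agentId, session);
    }
    return session;
  }

  closeTab(agentId: string): void {
    const session = this.sessions.get(agentId);
    if (session && session.closeActiveTab()) {
      // Last tab closed: drop the whole session for this agent.
      this.sessions.delete(agentId);
    }
  }

  has(agentId: string): boolean {
    return this.sessions.has(agentId);
  }
}
```

Keying the map by agent ID is what keeps sub-agents apart: each lookup returns only that agent's stack, and removing the entry on the last close means the next call starts from a fresh session.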
Screenshots and Storage
The screenshot action attempts to save the image via a storage plugin (storage-fs or storage-mock) accessed through experimental_context.plugins. If a storage plugin is available, the screenshot is saved to screenshots/ within the storage directory and the file path is returned. If no storage plugin is configured, the screenshot is returned as a base64-encoded string.
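The fallback between storage and base64 can be sketched like this. The StoragePlugin interface and its saveFile method are assumptions made for the example, since the real plugin API is not shown here. The base64 encoder is written out by hand only to keep the sketch free of runtime dependencies; in Node, Buffer would normally do this.

```typescript
// Hypothetical storage plugin shape; returns the path the file was saved to.
interface StoragePlugin {
  saveFile(relativePath: string, data: Uint8Array): string;
}

type ScreenshotResult =
  | { kind: 'file'; path: string }
  | { kind: 'base64'; data: string };

const B64 = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/';

// Standard base64 encoding with '=' padding.
function toBase64(bytes: Uint8Array): string {
  let out = '';
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    out += B64[b0 >> 2];
    out += B64[((b0 & 3) << 4) | (b1 >> 4)];
    out += i + 1 < bytes.length ? B64[((b1 & 15) << 2) | (b2 >> 6)] : '=';
    out += i + 2 < bytes.length ? B64[b2 & 63] : '=';
  }
  return out;
}

// Save under screenshots/ when a storage plugin exists, else return base64.
function deliverScreenshot(
  png: Uint8Array,
  fileName: string,
  storage?: StoragePlugin,
): ScreenshotResult {
  if (storage) {
    return { kind: 'file', path: storage.saveFile(`screenshots/${fileName}`, png) };
  }
  return { kind: 'base64', data: toBase64(png) };
}
```

Returning a tagged result lets the caller tell a file path from inline image data without inspecting the string.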