Interactive browser control — navigate, click, fill forms, extract content, screenshots, tab management

browser-tool

A stateful, session-isolated browser control tool for Fractal Synapse agents. Exposes a single webBrowser tool powered by raw Playwright (playwright-core) that lets an agent navigate the web, interact with page elements, extract text and HTML, take screenshots, manage tabs, execute JavaScript, and handle iframes -- all through a multi-action interface. Each agent instance gets its own independent browser session, preventing interference between sub-agents.

Installation

# Install the package (from the project root)
cd packages/tools/browser-tool
npm install

# Install the Chromium browser binary (required by playwright-core)
npx playwright install chromium

playwright-core does not auto-download browsers. The consuming application is responsible for installing the browser binary separately.

Actions

The tool takes an action parameter plus action-specific parameters, each required or optional as listed. The 20 actions are:

| Action | Parameters | Description |
| --- | --- | --- |
| navigate | url (required), waitUntil (optional) | Go to a URL. waitUntil accepts "load", "networkidle", or "domcontentloaded" (default: "load"). |
| go_back | -- | Navigate back in browser history. |
| go_forward | -- | Navigate forward in browser history. |
| close | -- | Close the current tab. If it is the last tab, the entire browser session is closed. |
| click | selector (required) | Click an element on the page. |
| fill | selector (required), value (required) | Type text into an input field. |
| select | selector (required), value (required) | Choose an option from a <select> dropdown by value or label. |
| hover | selector (required) | Hover the mouse over an element. |
| drag | selector (required), targetSelector (required) | Drag an element to a target element. |
| press_key | key (required), selector (optional) | Press a keyboard key. If selector is provided, the key is sent to that element. |
| delete | selector (optional) | Press the Delete key on an element or globally. |
| get_visible_text | selector (optional) | Get readable text from the page or a specific element. Truncated at 10,000 characters. |
| get_html | selector (optional), depth (optional, default 4) | Get cleaned, depth-limited HTML with <script>, <style>, <svg>, and <noscript> stripped. For full raw HTML, use evaluate with document.documentElement.outerHTML. |
| screenshot | -- | Take a screenshot of the current page. Saved via the storage plugin if available, otherwise returned as base64. |
| evaluate | value (required) | Execute JavaScript code in the page context. The value parameter is the JS code string. |
| console_logs | -- | Get browser console output collected since the last call. Clears the log buffer after returning. |
| iframe_click | frameSelector (required), selector (required) | Click an element inside an iframe. |
| iframe_fill | frameSelector (required), selector (required), value (required) | Fill an input field inside an iframe. |
| upload_file | selector (required), filePath (required) | Upload a file via an <input type="file"> element. |
| click_and_switch_tab | selector (required) | Click a link that opens a new tab and automatically switch to it. |
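
The required parameters listed in the table above can be checked before a call is sent. The following is a minimal sketch, not the tool's own validation: the names REQUIRED_PARAMS, ToolCall, and missingParams are hypothetical, and the flat argument-object shape is an assumption.

```typescript
// Hypothetical sketch: required parameters per action, transcribed from the
// actions table. The tool's real schema may be stricter or shaped differently.
type ActionName =
  | "navigate" | "go_back" | "go_forward" | "close"
  | "click" | "fill" | "select" | "hover" | "drag"
  | "press_key" | "delete" | "get_visible_text" | "get_html"
  | "screenshot" | "evaluate" | "console_logs"
  | "iframe_click" | "iframe_fill" | "upload_file" | "click_and_switch_tab";

const REQUIRED_PARAMS: Record<ActionName, readonly string[]> = {
  navigate: ["url"],
  go_back: [],
  go_forward: [],
  close: [],
  click: ["selector"],
  fill: ["selector", "value"],
  select: ["selector", "value"],
  hover: ["selector"],
  drag: ["selector", "targetSelector"],
  press_key: ["key"],
  delete: [],
  get_visible_text: [],
  get_html: [],
  screenshot: [],
  evaluate: ["value"],
  console_logs: [],
  iframe_click: ["frameSelector", "selector"],
  iframe_fill: ["frameSelector", "selector", "value"],
  upload_file: ["selector", "filePath"],
  click_and_switch_tab: ["selector"],
};

// Assumed call shape: the action name plus a flat bag of parameters.
interface ToolCall {
  action: ActionName;
  [param: string]: unknown;
}

// Returns the names of required parameters that are absent or empty.
function missingParams(call: ToolCall): string[] {
  return REQUIRED_PARAMS[call.action].filter(
    (name) => call[name] === undefined || call[name] === "",
  );
}
```

For example, missingParams({ action: "fill", selector: "#email" }) reports that value is missing, while a go_back call needs no parameters at all.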

Selector Formats

The selector parameter supports multiple Playwright selector engines:

CSS:        "#login-btn", ".nav-item", "input[name=email]", "button.primary"
Text:       "text=Sign in", "text=Create account"
ARIA:       "role=button[name='Submit']", "role=link[name='Docs']"
Label:      "label=Email address", "label=Password"
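
The formats above differ only in their prefix: anything without a text=, role=, or label= prefix is treated here as CSS. The sketch below is purely illustrative and assumes that simple rule; selectorKind is a hypothetical helper, and the tool itself may hand the string straight to Playwright without classifying it.

```typescript
// Hypothetical helper: classify a selector string by its engine prefix.
type SelectorKind = "text" | "role" | "label" | "css";

function selectorKind(selector: string): SelectorKind {
  // Only a prefix at the very start counts, so "input[name=email]" stays CSS.
  const match = /^(text|role|label)=/.exec(selector);
  return match ? (match[1] as SelectorKind) : "css";
}

const examples = [
  "#login-btn",
  "text=Sign in",
  "role=button[name='Submit']",
  "label=Email address",
  "input[name=email]",
];

// Map each example selector to the engine it would be classified under.
const kinds = examples.map((s) => `${s} -> ${selectorKind(s)}`);
console.log(kinds.join("\n"));
```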

Usage

Register the tool in your agent's tool registry:

import { webBrowserToolDefinition } from 'browser-tool';

// Register with the tool registry
toolRegistry.registerTool('webBrowser', webBrowserToolDefinition);

// Add to the agent's tool list
const agent = createAgent({
  tools: ['webBrowser', /* ...other tools */],
  // ...
});
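
Once registered, the tool is driven one action per call. As a rough illustration, a login flow might be expressed as the sequence below. The WebBrowserCall interface, the flat argument shape, the URL, and the credentials are all assumptions made for this sketch, not part of the package.

```typescript
// Assumed argument shape for a webBrowser call; the real schema may differ.
interface WebBrowserCall {
  action: string;
  url?: string;
  selector?: string;
  value?: string;
  waitUntil?: "load" | "networkidle" | "domcontentloaded";
}

// Placeholder URL and credentials, for illustration only.
const loginFlow: WebBrowserCall[] = [
  { action: "navigate", url: "https://example.com/login", waitUntil: "domcontentloaded" },
  { action: "fill", selector: "label=Email address", value: "user@example.com" },
  { action: "fill", selector: "label=Password", value: "not-a-real-password" },
  { action: "click", selector: "role=button[name='Submit']" },
  { action: "get_visible_text" }, // read the resulting page to confirm the login
];

// Print the sequence of actions an agent would emit.
console.log(loginFlow.map((call) => call.action).join(" -> "));
```

Because the session is stateful, each call operates on the page left behind by the previous one, so no page handle needs to be passed between steps.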

Session Isolation

Each agent instance gets its own isolated browser session. Sessions are keyed by experimental_context.agentId, so multiple sub-agents running concurrently cannot interfere with each other's browser state. Tab management uses a per-session page stack -- click_and_switch_tab pushes new pages, close pops them, and closing the last tab tears down the entire browser instance for that agent.
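
The per-session page stack described above can be modelled with a map from agent ID to a stack of pages. This is a simplified sketch under stated assumptions: SessionRegistry and FakePage are hypothetical names, FakePage stands in for a Playwright page, and no real browser is launched or closed.

```typescript
// Stand-in for a Playwright page; only an id is needed for this sketch.
interface FakePage {
  id: number;
}

class SessionRegistry {
  private readonly stacks = new Map<string, FakePage[]>();
  private nextId = 1;

  /** Returns the active page for an agent, creating a session on first use. */
  current(agentId: string): FakePage {
    const stack = this.stacks.get(agentId) ?? [];
    const top = stack[stack.length - 1];
    if (top) return top;
    const page = { id: this.nextId++ };
    this.stacks.set(agentId, [page]);
    return page;
  }

  /** Models click_and_switch_tab: push the new page so it becomes active. */
  pushTab(agentId: string): FakePage {
    const page = { id: this.nextId++ };
    const stack = this.stacks.get(agentId);
    if (stack) stack.push(page);
    else this.stacks.set(agentId, [page]);
    return page;
  }

  /** Models close: pop the active page; returns false once the session is gone. */
  closeTab(agentId: string): boolean {
    const stack = this.stacks.get(agentId);
    if (!stack) return false;
    stack.pop();
    if (stack.length === 0) {
      this.stacks.delete(agentId); // last tab closed: drop the whole session
      return false;
    }
    return true;
  }

  has(agentId: string): boolean {
    return this.stacks.has(agentId);
  }
}
```

Keying every lookup by agent ID means one agent closing its last tab removes only its own entry; other agents' stacks are untouched.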

Screenshots and Storage

The screenshot action attempts to save the image via a storage plugin (storage-fs or storage-mock) accessed through experimental_context.plugins. If a storage plugin is available, the screenshot is saved to screenshots/ within the storage directory and the file path is returned. If no storage plugin is configured, the screenshot is returned as a base64-encoded string.
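
The fallback logic can be sketched as follows. The StoragePlugin interface, its save method, the file naming, and deliverScreenshot are hypothetical; the actual storage-fs and storage-mock plugin APIs are not documented here and may differ. The sketch assumes a Node.js runtime for Buffer.

```typescript
// Hypothetical plugin shape; the real storage plugin interface may differ.
interface StoragePlugin {
  // Stores the bytes and resolves to the path they were saved under.
  save(path: string, data: Uint8Array): Promise<string>;
}

type ScreenshotResult =
  | { kind: "file"; path: string }
  | { kind: "base64"; data: string };

// Save via the storage plugin when one is present, else fall back to base64.
async function deliverScreenshot(
  png: Uint8Array,
  storage?: StoragePlugin,
): Promise<ScreenshotResult> {
  if (storage) {
    // Assumed naming scheme: a timestamped file under screenshots/.
    const path = await storage.save(`screenshots/${Date.now()}.png`, png);
    return { kind: "file", path };
  }
  return { kind: "base64", data: Buffer.from(png).toString("base64") };
}
```

Returning a tagged result lets the caller distinguish a file path from inline image data without inspecting the string itself.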