Interactive browser control — navigate, click, fill forms, extract content, screenshots, tab management

James Peret 16a95c702f Initial commit		2026-03-30 06:30:06 -03:00

browser-tool

A stateful, session-isolated browser control tool for Fractal Synapse agents. It exposes a single webBrowser tool, built on raw Playwright (playwright-core), that lets an agent navigate the web, interact with page elements, extract text and HTML, take screenshots, manage tabs, execute JavaScript, and handle iframes. Every operation goes through the same multi-action interface, selected by an action parameter. Each agent instance gets its own independent browser session, so sub-agents cannot interfere with one another.

Installation

# Install the package (from the project root)
cd packages/tools/browser-tool
npm install

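# Install the Chromium browser binary (required by playwright-core)
npx playwright install chromium
# If only playwright-core is installed, its own CLI installs the browser build matching that version:
# npx playwright-core install chromium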

playwright-core does not auto-download browsers. The consuming application is responsible for installing the browser binary separately.

Actions

The tool accepts an action parameter plus action-specific parameters, some required and some optional. The 20 actions are:

Action	Parameters	Description
`navigate`	`url` (required), `waitUntil` (optional)	Go to a URL. Supports `waitUntil`: `"load"`, `"networkidle"`, `"domcontentloaded"` (default: `"load"`).
`go_back`	--	Navigate back in browser history.
`go_forward`	--	Navigate forward in browser history.
`close`	--	Close the current tab. If it is the last tab, the entire browser session is closed.
`click`	`selector` (required)	Click an element on the page.
`fill`	`selector` (required), `value` (required)	Type text into an input field.
`select`	`selector` (required), `value` (required)	Choose an option from a `<select>` dropdown by value or label.
`hover`	`selector` (required)	Hover the mouse over an element.
`drag`	`selector` (required), `targetSelector` (required)	Drag an element to a target element.
`press_key`	`key` (required), `selector` (optional)	Press a keyboard key. If `selector` is provided, the key is sent to that element.
`delete`	`selector` (optional)	Press the Delete key on an element or globally.
`get_visible_text`	`selector` (optional)	Get readable text from the page or a specific element. Truncated at 10,000 characters.
`get_html`	`selector` (optional), `depth` (optional, default 4)	Get cleaned, depth-limited HTML with `<script>`, `<style>`, `<svg>`, `<noscript>` stripped. For full raw HTML use `evaluate` with `document.documentElement.outerHTML`.
`screenshot`	--	Take a screenshot of the current page. Saved via storage plugin if available, otherwise returned as base64.
`evaluate`	`value` (required)	Execute JavaScript code in the page context. The `value` parameter is the JS code string.
`console_logs`	--	Get browser console output collected since the last call. Clears the log buffer after returning.
`iframe_click`	`frameSelector` (required), `selector` (required)	Click an element inside an iframe.
`iframe_fill`	`frameSelector` (required), `selector` (required), `value` (required)	Fill an input field inside an iframe.
`upload_file`	`selector` (required), `filePath` (required)	Upload a file via an `<input type="file">` element.
`click_and_switch_tab`	`selector` (required)	Click a link that opens a new tab and automatically switch to it.
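
The required and optional parameters in the table can be captured as a typed input shape plus a small pre-flight check. The sketch below is illustrative only: `WebBrowserInput`, `REQUIRED_PARAMS`, and `missingParams` are hypothetical names, not exports of browser-tool, and the real tool may validate its input differently.

```typescript
// Required parameters per action, transcribed from the table above.
// Hypothetical helper data; not part of the browser-tool package.
const REQUIRED_PARAMS = {
  navigate: ['url'],
  go_back: [],
  go_forward: [],
  close: [],
  click: ['selector'],
  fill: ['selector', 'value'],
  select: ['selector', 'value'],
  hover: ['selector'],
  drag: ['selector', 'targetSelector'],
  press_key: ['key'],
  delete: [],
  get_visible_text: [],
  get_html: [],
  screenshot: [],
  evaluate: ['value'],
  console_logs: [],
  iframe_click: ['frameSelector', 'selector'],
  iframe_fill: ['frameSelector', 'selector', 'value'],
  upload_file: ['selector', 'filePath'],
  click_and_switch_tab: ['selector'],
} as const;

type Action = keyof typeof REQUIRED_PARAMS;

// Assumed input shape: one action plus the parameters named in the table.
interface WebBrowserInput {
  action: Action;
  url?: string;
  waitUntil?: 'load' | 'networkidle' | 'domcontentloaded';
  selector?: string;
  targetSelector?: string;
  frameSelector?: string;
  value?: string;
  key?: string;
  filePath?: string;
  depth?: number;
}

// Returns the names of required parameters that are absent from the input.
function missingParams(input: WebBrowserInput): string[] {
  const required: readonly string[] = REQUIRED_PARAMS[input.action];
  return required.filter(
    (name) => input[name as keyof WebBrowserInput] === undefined,
  );
}

// Example: a login flow expressed as one action per call.
const loginFlow: WebBrowserInput[] = [
  { action: 'navigate', url: 'https://example.com/login', waitUntil: 'domcontentloaded' },
  { action: 'fill', selector: 'label=Email address', value: 'user@example.com' },
  { action: 'click', selector: "role=button[name='Submit']" },
  { action: 'get_visible_text' },
];
```

Checking inputs before dispatch lets a caller report a missing `url` or `selector` as a plain message instead of surfacing a Playwright error.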

Selector Formats

The selector parameter supports multiple Playwright selector engines:

CSS:        "#login-btn", ".nav-item", "input[name=email]", "button.primary"
Text:       "text=Sign in", "text=Create account"
ARIA:       "role=button[name='Submit']", "role=link[name='Docs']"
Label:      "label=Email address", "label=Password"
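
To make the prefix convention in the list above concrete, here is a minimal sketch that labels a selector string by its leading `text=`, `role=`, or `label=` prefix and treats everything else as CSS. `selectorKind` is a hypothetical helper written for illustration; it does not describe how the tool or Playwright actually parses selectors.

```typescript
type SelectorKind = 'css' | 'text' | 'role' | 'label';

// Classifies a selector by the prefixes shown in the list above.
// Anything without a recognised "engine=" prefix is treated as CSS.
function selectorKind(selector: string): SelectorKind {
  for (const prefix of ['text', 'role', 'label'] as const) {
    if (selector.startsWith(prefix + '=')) return prefix;
  }
  return 'css';
}

// One example per format, taken from the list above.
const examples = ['#login-btn', 'text=Sign in', "role=button[name='Submit']", 'label=Password'];
const kinds = examples.map(selectorKind);
```

Note that a CSS attribute selector such as `input[name=email]` contains `=` but no leading engine prefix, so it still falls through to the CSS case.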

Usage

import { webBrowserToolDefinition } from 'browser-tool';

// Register with the tool registry
toolRegistry.registerTool('webBrowser', webBrowserToolDefinition);

// Add to the agent's tool list
const agent = createAgent({
  tools: ['webBrowser', /* ...other tools */],
  // ...
});

Session Isolation

Each agent instance gets its own isolated browser session. Sessions are keyed by experimental_context.agentId, so multiple sub-agents running concurrently cannot interfere with each other's browser state. Tab management uses a per-session page stack: click_and_switch_tab pushes the newly opened page, close pops the current one, and closing the last tab tears down the entire browser instance for that agent.
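
The per-agent page stack can be pictured with the following sketch. It is an assumption-laden model, not the package's implementation: `SessionRegistry` and `PageLike` are hypothetical names, pages are closed synchronously for brevity (Playwright's `page.close()` returns a promise), and browser launch and teardown are reduced to adding and deleting a map entry.

```typescript
// Minimal stand-in for a Playwright Page; only what this sketch needs.
interface PageLike {
  close(): unknown;
}

// Hypothetical registry: one page stack per agentId.
class SessionRegistry<P extends PageLike> {
  private readonly stacks = new Map<string, P[]>();

  // click_and_switch_tab: the newly opened page becomes the active tab.
  pushPage(agentId: string, page: P): void {
    const stack = this.stacks.get(agentId) ?? [];
    stack.push(page);
    this.stacks.set(agentId, stack);
  }

  // The active tab is the top of the agent's stack, if a session exists.
  activePage(agentId: string): P | undefined {
    const stack = this.stacks.get(agentId);
    return stack ? stack[stack.length - 1] : undefined;
  }

  // close: pops and closes the active tab. Returns true when that was the
  // last tab and the agent's session has been discarded.
  closeActive(agentId: string): boolean {
    const stack = this.stacks.get(agentId);
    const page = stack?.pop();
    if (!stack || !page) return false; // no session for this agent
    page.close();
    if (stack.length === 0) {
      this.stacks.delete(agentId); // real code would also close the browser
      return true;
    }
    return false;
  }
}
```

Because every lookup goes through `agentId`, closing tabs for one agent never touches another agent's stack.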

Screenshots and Storage

The screenshot action attempts to save the image via a storage plugin (storage-fs or storage-mock) accessed through experimental_context.plugins. If a storage plugin is available, the screenshot is saved to screenshots/ within the storage directory and the file path is returned. If no storage plugin is configured, the screenshot is returned as a base64-encoded string.
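
The fallback described above amounts to a single branch. The sketch below assumes a hypothetical, synchronous `StoragePlugin.save` method and a hypothetical `storeScreenshot` helper; the real storage-fs and storage-mock plugins may expose a different (likely asynchronous) API. It uses the global `btoa`, available in browsers and in Node.js 16 and later.

```typescript
// Hypothetical plugin shape; the actual storage plugin API may differ.
interface StoragePlugin {
  // Saves bytes under a path relative to the storage directory
  // and returns the resulting file path.
  save(relativePath: string, data: Uint8Array): string;
}

type ScreenshotResult =
  | { type: 'file'; path: string }
  | { type: 'base64'; data: string };

// Saves via the storage plugin when one is configured,
// otherwise returns the image as a base64-encoded string.
function storeScreenshot(
  png: Uint8Array,
  fileName: string,
  storage?: StoragePlugin,
): ScreenshotResult {
  if (storage) {
    return { type: 'file', path: storage.save(`screenshots/${fileName}`, png) };
  }
  let binary = '';
  for (let i = 0; i < png.length; i++) {
    binary += String.fromCharCode(png[i]);
  }
  return { type: 'base64', data: btoa(binary) };
}
```

Returning a tagged result lets the caller tell a file path from inline image data without inspecting the string.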