Web Fetch HTTP Tool
HTTP-based web page fetching and content extraction tool for the Fractal Synapse agent system.
Overview
The Web Fetch HTTP Tool is a lightweight alternative to browser-based web scraping. It fetches web pages directly over HTTP and uses an OpenAI model (GPT-4o-mini) to extract and structure the content.
Features
- HTTP-based fetching - No browser dependencies, faster and more lightweight
- AI-powered extraction - Uses OpenAI GPT-4o-mini to intelligently parse and extract content
- Context window handling - Smart preprocessing and chunking for large pages (e.g., Wikipedia)
- Multiple extraction modes - Text extraction, CSS selector targeting, or structured data
- Robust error handling - Comprehensive error reporting with structured error objects
- Same interface as Stagehand - Drop-in replacement for browser-based web-fetch tools
Installation
npm install
npm run build
Usage
Parameters
- url (required) - The URL to fetch content from
- selector (optional) - CSS selector to extract specific content
- extractText (optional, default: true) - Whether to extract text content
Extraction Modes
- Text Mode (default, extractText: true)
  await webFetchHttp('https://example.com')
  // Returns: { title, content, summary, url, timestamp }
- Selector Mode (when selector is provided)
  await webFetchHttp('https://example.com', '.article-content')
  // Returns: { title, content, url, timestamp }
- Structured Mode (extractText: false)
  await webFetchHttp('https://example.com', undefined, false)
  // Returns: { title, content, links, images, url, timestamp }
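The three modes are selected by the `selector` and `extractText` parameters. A minimal sketch of that selection logic follows. `resolveExtractionMode` and `ExtractionMode` are hypothetical names used only for illustration, and giving `selector` precedence over `extractText: false` is an assumption, since this README does not say what happens when both are supplied.

```typescript
// Hypothetical names for illustration; not part of the tool's documented API.
type ExtractionMode = "text" | "selector" | "structured";

// Pick a mode from the documented parameters.
// Assumption: a non-empty selector takes precedence over extractText.
function resolveExtractionMode(
  selector?: string,
  extractText: boolean = true
): ExtractionMode {
  if (selector !== undefined && selector.trim() !== "") {
    return "selector";
  }
  return extractText ? "text" : "structured";
}

console.log(resolveExtractionMode());                   // text
console.log(resolveExtractionMode(".article-content")); // selector
console.log(resolveExtractionMode(undefined, false));   // structured
```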
Example Results
Text Mode:
{
"url": "https://example.com",
"timestamp": "2025-09-12T10:30:00.000Z",
"title": "Example Page",
"content": "Main text content of the page...",
"summary": "Brief summary of the page content"
}
Structured Mode:
{
"url": "https://example.com",
"timestamp": "2025-09-12T10:30:00.000Z",
"title": "Example Page",
"content": "Main text content...",
"links": ["https://example.com/page1", "https://example.com/page2"],
"images": ["https://example.com/image1.jpg", "https://example.com/image2.png"]
}
Error Handling
The tool returns structured error objects for all failure scenarios:
{
"error": true,
"message": "Human-readable error message",
"details": "Technical details about the error",
"timestamp": "2025-09-12T10:30:00.000Z",
"toolName": "web-fetch-http",
"url": "https://failed-url.com"
}
Common error scenarios:
- Invalid URL format
- Network connectivity issues
- HTTP errors (404, 500, etc.)
- AI content extraction failures
- Malformed HTML content
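Because failures come back as objects rather than thrown exceptions, callers need to check for the error shape before reading content fields. The sketch below models the documented error object as a TypeScript interface and adds two hypothetical helpers, `isToolError` and `invalidUrlError`. Neither name comes from the tool itself, and the way the real tool validates URLs may differ.

```typescript
// Shape copied from the error object documented above.
interface ToolError {
  error: true;
  message: string;
  details: string;
  timestamp: string;
  toolName: string;
  url: string;
}

// Hypothetical helper: narrow an unknown tool result to ToolError.
function isToolError(result: unknown): result is ToolError {
  return (
    typeof result === "object" &&
    result !== null &&
    (result as { error?: unknown }).error === true
  );
}

// Hypothetical helper: return the documented error shape when the URL
// cannot be parsed, or null when it is well-formed.
function invalidUrlError(url: string): ToolError | null {
  try {
    new URL(url); // throws a TypeError on malformed input
    return null;
  } catch (err) {
    return {
      error: true,
      message: "Invalid URL format",
      details: err instanceof Error ? err.message : String(err),
      timestamp: new Date().toISOString(),
      toolName: "web-fetch-http",
      url,
    };
  }
}

const failure = invalidUrlError("not a url");
if (isToolError(failure)) {
  console.error(`${failure.toolName} failed for ${failure.url}: ${failure.message}`);
}
```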
Context Window Handling
The tool automatically handles large web pages that would exceed AI model context windows:
- HTML Preprocessing - Removes unnecessary tags (scripts, styles, navigation, ads)
- CSS Selector Early Application - Reduces content size before AI processing
- Semantic Chunking - Splits large content at natural boundaries (sections, articles)
- Token Estimation - Monitors content size and applies chunking when needed (>15,000 tokens)
- Result Combination - Merges chunked results while preserving structure
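The preprocessing and token-estimation steps can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: the regex-based `stripNoise` only handles `script`, `style`, and `nav` elements and HTML comments (a real implementation would more likely use an HTML parser), and the figure of roughly 4 characters per token is a common rule of thumb, not a value from this README. Only the 15,000-token threshold comes from the list above.

```typescript
// Assumption: about 4 characters per token for English text.
const CHARS_PER_TOKEN = 4;
// Threshold stated in the list above.
const CHUNK_THRESHOLD_TOKENS = 15000;

// Rough token estimate based on character count.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Hypothetical, simplified preprocessing: drop script/style/nav elements
// and comments, then collapse runs of whitespace.
function stripNoise(html: string): string {
  return html
    .replace(/<(script|style|nav)\b[^>]*>[\s\S]*?<\/\1\s*>/gi, "")
    .replace(/<!--[\s\S]*?-->/g, "")
    .replace(/\s+/g, " ")
    .trim();
}

// Decide whether the cleaned page is large enough to require chunking.
function needsChunking(html: string): boolean {
  return estimateTokens(stripNoise(html)) > CHUNK_THRESHOLD_TOKENS;
}

console.log(stripNoise("<nav>Menu</nav><p>Hello</p>")); // <p>Hello</p>
console.log(needsChunking("<p>Hello</p>"));             // false
```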
Example with Wikipedia Moon page (>240,000 tokens):
// Automatically chunks and processes without context window errors
const result = await webFetchHttp('https://en.wikipedia.org/wiki/Moon')
// Returns combined content from all chunks
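Chunking and result combination might look roughly like the sketch below. It assumes the page has already been reduced to plain text, uses blank-line paragraph breaks as a stand-in for the section and article boundaries mentioned above, and reuses the 4-characters-per-token estimate. `chunkByParagraph`, `combineResults`, and `ChunkResult` are hypothetical names, and the merge rule (first non-empty title, content joined in order) is an assumption.

```typescript
// Assumption: about 4 characters per token.
function approxTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Greedily pack paragraphs into chunks that stay within maxTokens.
// A single paragraph larger than the budget becomes its own oversized chunk.
function chunkByParagraph(text: string, maxTokens: number): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const paragraph of paragraphs) {
    const candidate = current === "" ? paragraph : current + "\n\n" + paragraph;
    if (current !== "" && approxTokens(candidate) > maxTokens) {
      chunks.push(current); // close the current chunk at a paragraph boundary
      current = paragraph;
    } else {
      current = candidate;
    }
  }
  if (current !== "") {
    chunks.push(current);
  }
  return chunks;
}

interface ChunkResult {
  title: string;
  content: string;
}

// Merge per-chunk results: first non-empty title wins, content keeps its order.
function combineResults(results: ChunkResult[]): ChunkResult {
  let title = "";
  const parts: string[] = [];
  for (const result of results) {
    if (title === "" && result.title !== "") {
      title = result.title;
    }
    parts.push(result.content);
  }
  return { title, content: parts.join("\n\n") };
}
```

In a pipeline of this kind, each chunk would be sent to the model separately and the per-chunk outputs passed to `combineResults`.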
Testing
Unit Tests
npm run test:unit # Run unit tests only
npm run test:run # Run all tests including integration
npm run test:watch # Watch mode for development
Integration Tests
npm run test:integration # Run integration tests (skips without API key)
npm run test:integration:expensive # Run expensive tests (requires API key)
The tests cover:
- HTML preprocessing and chunking logic
- Context window handling
- CSS selector extraction
- Error handling scenarios
- Wikipedia Moon page integration test
- Agent-core integration patterns
Requirements
- Node.js environment
- OPENAI_API_KEY environment variable for AI content extraction
- Internet connectivity for fetching web pages
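A startup check for the API key requirement could be sketched as below. `missingRequirements` is a hypothetical helper, not something the package is known to export. It takes the environment as an argument so it can be exercised without touching the real process environment.

```typescript
// Hypothetical helper: list required environment variables that are absent or blank.
function missingRequirements(env: Record<string, string | undefined>): string[] {
  const missing: string[] = [];
  const key = env["OPENAI_API_KEY"];
  if (key === undefined || key.trim() === "") {
    missing.push("OPENAI_API_KEY");
  }
  return missing;
}

// In a Node.js program this would typically be called as missingRequirements(process.env).
console.log(missingRequirements({ OPENAI_API_KEY: "sk-example" }).length); // 0
console.log(missingRequirements({}).length);                               // 1
```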
Integration
To integrate with Fractal Synapse agent applications:
- Add to package.json dependencies:
  {
    "dependencies": {
      "web-fetch-http-tool": "file:../../packages/tools/web-fetch-http-tool"
    }
  }
- Import and register:
  import { webFetchHttpToolDefinition } from 'web-fetch-http-tool';
  toolRegistry.registerTool('webFetchHttp', webFetchHttpToolDefinition);
- Add to AgentDefinition:
  const agentDefinition = new AgentDefinition(
    'My Agent',
    'Description',
    'System prompt',
    ['webFetchHttp'], // Include tool name
    'openai-gpt-4o'
  );
Comparison with Stagehand Tools
| Feature | Web Fetch HTTP | Stagehand Web Fetch |
|---|---|---|
| Speed | ⚡ Fast | 🐌 Slower |
| Resources | 💡 Lightweight | 🔋 Heavy (browser) |
| JavaScript Support | ❌ No | ✅ Yes |
| Complex Interactions | ❌ No | ✅ Yes |
| Setup Complexity | ✅ Simple | ❌ Complex |
| Reliability | ✅ High | ⚠️ Browser dependencies |
License
ISC
Author
James Peret