nomic-embeddings-plugin

Local text embedding plugin for the Fractal Synapse agent framework. Runs nomic-ai/nomic-embed-text-v1.5 entirely on-device using ONNX Runtime — no API key, no network calls, no data leaving the machine after the one-time model download.

Features

  • Fully local — model runs in-process via @huggingface/transformers and ONNX Runtime
  • Four task-specific embedding modes — document storage, query retrieval, clustering, classification
  • 8192-token context window — handles long documents without silent truncation
  • Matryoshka dimensions — choose 64 to 768 dimensions to trade storage size against quality
  • Process-level singleton pipeline — model loads once and is reused across all embed() calls
  • Electron-ready — supports a local ONNX model path for bundling inside an app package

Model

Property               Value
Model                  nomic-ai/nomic-embed-text-v1.5
ONNX variant           model_q4f16.onnx (Q4+FP16 hybrid quantization)
Model size             111 MB
Max dimensions         768
Max tokens             8192
Output                 L2-normalized float vector
MTEB score (768-dim)   62.28

The Q4+FP16 variant is used by default. It is the smallest available ONNX file and produces results that are virtually identical to the full-precision model for retrieval tasks.

Installation

npm install @fractal-synapse/nomic-embeddings-plugin

Then download the model (one time only, ~111 MB saved to the HuggingFace cache):

node node_modules/@fractal-synapse/nomic-embeddings-plugin/dist/scripts/download-model.js

The model is cached at ~/.cache/huggingface on Linux/macOS and %USERPROFILE%\.cache\huggingface on Windows.

Quick Start

import { NomicEmbeddingsPlugin } from '@fractal-synapse/nomic-embeddings-plugin';

const embeddings = new NomicEmbeddingsPlugin();

// Store a document
const docVector = await embeddings.embed(
  'fractal-synapse is a modular TypeScript framework for building stateful AI agents',
  'document'
);

// Search for it later
const queryVector = await embeddings.embed(
  'what is the fractal-synapse framework?',
  'query'
);

console.log(docVector.length); // 768

Registering as an Agent Plugin

The plugin implements both AgentPlugin and EmbeddingsInterface. Register it with an agent and it becomes available to any other plugin that depends on embeddings (such as a memory plugin):

import { Agent } from '@fractal-synapse/agent-core';
import { NomicEmbeddingsPlugin } from '@fractal-synapse/nomic-embeddings-plugin';

const embeddings = new NomicEmbeddingsPlugin();
const agent = new Agent({ plugins: [embeddings] });

Other plugins access it through the agent's EmbeddingsInterface:

// Inside another plugin
const vector = await agent.embeddings.embed(text, 'document');
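
As a fuller sketch, a hypothetical consumer plugin might look like this (the class shape and method names are illustrative only, not part of the framework API):

// Hypothetical consumer plugin; structure and names are illustrative only
class TinyMemoryPlugin {
  private store: { text: string; vector: number[] }[] = [];

  // Embed a piece of text in document mode and keep it for later retrieval
  async remember(agent: Agent, text: string): Promise<void> {
    const vector = await agent.embeddings.embed(text, 'document');
    this.store.push({ text, vector });
  }
}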

The Four Embedding Modes

This is the most important concept to understand when using this plugin. The nomic-embed-text model is trained with task-specific input prefixes that shift the embedding space to optimize for a particular use. The prefix is prepended automatically — you only need to pass the correct mode.

Using the wrong mode produces measurably worse results. The mode is not a hint; it is part of how the model works.
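
For intuition, the prefixing step amounts to something like the following sketch (simplified; the actual prefix strings appear in the Mode summary table below):

// Simplified sketch of the automatic prefixing. You never do this yourself;
// the plugin applies the prefix before tokenization.
const prefixes = {
  document: 'search_document: ',
  query: 'search_query: ',
  clustering: 'clustering: ',
  classification: 'classification: ',
} as const;

type EmbedMode = keyof typeof prefixes;

function applyPrefix(text: string, mode: EmbedMode): string {
  // embed('how do generics work?', 'query')
  // becomes the model input 'search_query: how do generics work?'
  return prefixes[mode] + text;
}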

document — Storing content for retrieval

Use this mode when embedding text that will be stored in a vector index. It optimizes the vector to be found by a query-mode search.

const vector = await embeddings.embed(
  'TypeScript generics let you write type-safe functions that work across multiple types',
  'document'
);
// Store vector in your vector DB alongside the source text

When to use:

  • Saving agent memory episodes to a vector store
  • Indexing documents, notes, or knowledge base entries
  • Storing tool outputs that should be retrievable later

query — Searching for stored content

Use this mode when embedding a search query that will be compared against document-mode vectors. The two modes are trained as an asymmetric pair — a query vector is optimized to match document vectors, not other query vectors.

const queryVector = await embeddings.embed(
  'how do TypeScript generics work?',
  'query'
);

// Compare against stored document vectors
const scores = storedDocs.map(doc => cosineSimilarity(queryVector, doc.vector));
const topResult = storedDocs[scores.indexOf(Math.max(...scores))];

When to use:

  • Embedding a user's question before searching agent memory
  • Any lookup against a corpus of document-mode vectors
  • Semantic search over stored knowledge

Important: Always embed stored content as document and search queries as query. Embedding a query in document mode — or vice versa — produces lower-quality retrieval.


clustering — Grouping and deduplication

Use this mode when computing symmetric similarity between items that are being grouped or compared without a query/document distinction. Both sides of the comparison are embedded in the same mode, so similarity is mutual.

const [vectorA, vectorB] = await Promise.all([
  embeddings.embed('the agent saves memories to a SQLite database', 'clustering'),
  embeddings.embed('memories are stored in SQLite by the agent', 'clustering'),
]);

const similarity = cosineSimilarity(vectorA, vectorB);
// similarity ≈ 0.97 → near-duplicate, flag for deduplication

When to use:

  • Detecting near-duplicate memories before saving a new one
  • Grouping a set of memories or documents into topic clusters
  • Finding related items in a collection where there is no external query

classification — Categorizing content

Use this mode when assigning content to pre-defined categories. It maximizes separation between categories, so same-category items cluster tightly while different-category items are pushed apart.

// Pre-embed your category labels once
const categories = {
  bug:      await embeddings.embed('software bug, error, crash, exception', 'classification'),
  feature:  await embeddings.embed('new feature, enhancement, improvement, request', 'classification'),
  question: await embeddings.embed('question, how to, help, explain', 'classification'),
};

// Classify new content
const incoming = await embeddings.embed(
  'the agent crashes when the tool returns null',
  'classification'
);

const scores = Object.entries(categories).map(([label, vec]) => ({
  label,
  score: cosineSimilarity(incoming, vec),
}));

const predicted = scores.sort((a, b) => b.score - a.score)[0].label;
// predicted → 'bug'

When to use:

  • Tagging memory episodes by type (fact, event, preference, error)
  • Routing incoming messages to different handlers
  • Labelling content without training a separate classifier

Mode summary

Mode             Prefix applied     Use when
document         search_document:   Embedding content for storage in a vector index
query            search_query:      Embedding a search query to retrieve stored documents
clustering       clustering:        Symmetric grouping, deduplication, or topic clustering
classification   classification:    Assigning content to pre-defined categories

Matryoshka Dimensions

The model uses Matryoshka Representation Learning (MRL), which means the first N dimensions of a full 768-dim vector are themselves a valid, well-calibrated embedding at size N. You can reduce the output size to lower storage cost and vector search time with minimal quality loss.

// Full quality, 768-dim vectors (default)
const plugin768 = new NomicEmbeddingsPlugin();

// Smaller vectors — still outperforms OpenAI text-embedding-3-small on MTEB
const plugin512 = new NomicEmbeddingsPlugin({ dimensions: 512 });

// Compact vectors for memory-constrained environments
const plugin256 = new NomicEmbeddingsPlugin({ dimensions: 256 });

Dimensions   MTEB score   vs OpenAI text-embedding-3-small
768          62.28        Better
512          61.96        Better
256          61.04        Comparable
128          59.34        Slightly below
64           56.10        Below

Warning: Changing dimensions after data has been stored requires re-embedding all vectors and recreating the vector store tables. Treat the dimension setting as a deployment-time decision, not a runtime one.

Truncation is done correctly using the Matryoshka pipeline: layer normalization → slice to target dims → L2 re-normalization. A plain array slice would produce incorrect results.
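
For illustration, the three steps look roughly like this (a minimal sketch, not the plugin's actual code; the layer normalization is simplified to plain standardization without learned scale and bias):

// Illustrative sketch of Matryoshka truncation: layer norm, slice, re-normalize
function truncateMatryoshka(full: number[], dims: number): number[] {
  // 1. Layer-normalize the full 768-dim vector
  const mean = full.reduce((s, v) => s + v, 0) / full.length;
  const variance = full.reduce((s, v) => s + (v - mean) ** 2, 0) / full.length;
  const normed = full.map(v => (v - mean) / Math.sqrt(variance + 1e-5));

  // 2. Slice to the target dimension count
  const sliced = normed.slice(0, dims);

  // 3. L2 re-normalize so cosine similarity is a plain dot product again
  const norm = Math.sqrt(sliced.reduce((s, v) => s + v * v, 0));
  return sliced.map(v => v / norm);
}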

Configuration

interface NomicEmbeddingsPluginConfig {
  // Output vector size. Default: 768.
  dimensions?: 64 | 128 | 256 | 512 | 768;

  // Absolute path to a local ONNX model file.
  // Use this in Electron apps to point to the bundled model:
  //   path.join(process.resourcesPath, 'models', 'model_q4f16.onnx')
  // If omitted, loads from the HuggingFace cache directory.
  modelPath?: string;

  // Optional logger compatible with the Fractal Synapse LoggingInterface.
  logger?: LoggingInterface;
}

Electron / bundled app

Point modelPath to the ONNX file bundled inside the app package so the plugin does not depend on the user's HuggingFace cache:

import path from 'path';

const embeddings = new NomicEmbeddingsPlugin({
  modelPath: path.join(process.resourcesPath, 'models', 'model_q4f16.onnx'),
  dimensions: 512, // smaller vectors — good default for desktop apps
});

Cosine Similarity

All output vectors are L2-normalized (magnitude = 1.0), so cosine similarity reduces to a dot product:

function cosineSimilarity(a: number[], b: number[]): number {
  return a.reduce((sum, v, i) => sum + v * b[i], 0);
}

The result is in the range [-1, 1], where 1 means identical, 0 means unrelated, and negative values indicate opposing meaning. For typical retrieval tasks, scores above 0.8 indicate strong semantic similarity.
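
Combining the helper with that guideline, a retrieval filter might look like this (storedDocs and queryVector are the hypothetical values from the earlier examples; 0.8 is a rule of thumb, not a hard cutoff):

// Rank stored documents and keep only strong matches
const MIN_SIMILARITY = 0.8;
const matches = storedDocs
  .map(doc => ({ doc, score: cosineSimilarity(queryVector, doc.vector) }))
  .filter(r => r.score >= MIN_SIMILARITY)
  .sort((a, b) => b.score - a.score);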

Pipeline Lifecycle

The ONNX model is loaded as a process-level singleton — it is initialized on the first embed() call and reused for all subsequent calls. Creating multiple NomicEmbeddingsPlugin instances shares the same underlying pipeline.
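
For example, two instances still share one loaded model, so only the first embed() call pays the load cost:

// Both instances reuse the same in-process pipeline
const a = new NomicEmbeddingsPlugin();
const b = new NomicEmbeddingsPlugin();

await a.embed('first call loads the model', 'document');  // slow: one-time load
await b.embed('subsequent calls reuse it', 'document');   // fast: pipeline reused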

Call cleanup() when the application shuts down to release the ONNX Runtime resources:

// When the app is closing
await embeddings.cleanup();

When registered as an agent plugin, cleanup() is called automatically by the agent's shutdown sequence.

Testing

# Unit tests only (no model required)
npm run test:run -- tests/unit

# All tests including integration (requires model download)
npm run test:run

# Watch mode during development
npm run test:watch

Integration tests are automatically skipped when the model has not been downloaded. Download it first with the script described in the Installation section.