nomic-embeddings-plugin

Local text embedding plugin for the Fractal Synapse agent framework. Runs nomic-ai/nomic-embed-text-v1.5 entirely on-device using ONNX Runtime — no API key, no network calls, no data leaving the machine after the one-time model download.

Features

  • Fully local — model runs in-process via @huggingface/transformers and ONNX Runtime
  • Four task-specific embedding modes — document storage, query retrieval, clustering, classification
  • 8192-token context window — handles long documents without silent truncation
  • Matryoshka dimensions — choose 64 to 768 dimensions to trade storage size against quality
  • Process-level singleton pipeline — model loads once and is reused across all embed() calls
  • Electron-ready — supports a local ONNX model path for bundling inside an app package

Model

Property               Value
Model                  nomic-ai/nomic-embed-text-v1.5
ONNX variant           model_q4f16.onnx (Q4+FP16 hybrid quantization)
Model size             111 MB
Max dimensions         768
Max tokens             8192
Output                 L2-normalized float vector
MTEB score (768-dim)   62.28

The Q4+FP16 variant is used by default. It is the smallest available ONNX file and produces results that are virtually identical to the full-precision model for retrieval tasks.

Installation

npm install @fractal-synapse/nomic-embeddings-plugin

Then download the model (one time only, ~111 MB saved to the HuggingFace cache):

node node_modules/@fractal-synapse/nomic-embeddings-plugin/dist/scripts/download-model.js

The model is cached at ~/.cache/huggingface on Linux/macOS and %USERPROFILE%\.cache\huggingface on Windows.

Quick Start

import { NomicEmbeddingsPlugin } from '@fractal-synapse/nomic-embeddings-plugin';

const embeddings = new NomicEmbeddingsPlugin();

// Store a document
const docVector = await embeddings.embed(
  'fractal-synapse is a modular TypeScript framework for building stateful AI agents',
  'document'
);

// Search for it later
const queryVector = await embeddings.embed(
  'what is the fractal-synapse framework?',
  'query'
);

console.log(docVector.length); // 768

Registering as an Agent Plugin

The plugin implements both AgentPlugin and EmbeddingsInterface. Register it with an agent and it becomes available to any other plugin that depends on embeddings (such as a memory plugin):

import { Agent } from '@fractal-synapse/agent-core';
import { NomicEmbeddingsPlugin } from '@fractal-synapse/nomic-embeddings-plugin';

const embeddings = new NomicEmbeddingsPlugin();
const agent = new Agent({ plugins: [embeddings] });

Other plugins access it through the agent's EmbeddingsInterface:

// Inside another plugin
const vector = await agent.embeddings.embed(text, 'document');
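
As a fuller sketch, a hypothetical consumer plugin might look like this (the class shape and method names are illustrative only, not part of the framework API):

// Hypothetical consumer plugin; structure and names are illustrative only
class TinyMemoryPlugin {
  private store: { text: string; vector: number[] }[] = [];

  // Embed a piece of text in document mode and keep it for later retrieval
  async remember(agent: Agent, text: string): Promise<void> {
    const vector = await agent.embeddings.embed(text, 'document');
    this.store.push({ text, vector });
  }
}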

The Four Embedding Modes

This is the most important concept to understand when using this plugin. The nomic-embed-text model is trained with task-specific input prefixes that shift the embedding space to optimize for a particular use. The prefix is prepended automatically — you only need to pass the correct mode.

Using the wrong mode produces measurably worse results. The mode is not a hint; it is part of how the model works.
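
For intuition, the prefixing step amounts to something like the following sketch (simplified; the actual prefix strings appear in the Mode summary table below):

// Simplified sketch of the automatic prefixing. You never do this yourself;
// the plugin applies the prefix before tokenization.
const prefixes = {
  document: 'search_document: ',
  query: 'search_query: ',
  clustering: 'clustering: ',
  classification: 'classification: ',
} as const;

type EmbedMode = keyof typeof prefixes;

function applyPrefix(text: string, mode: EmbedMode): string {
  // embed('how do generics work?', 'query')
  // becomes the model input 'search_query: how do generics work?'
  return prefixes[mode] + text;
}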

document — Storing content for retrieval

Use this mode when embedding text that will be stored in a vector index. It optimizes the vector to be found by a query-mode search.

const vector = await embeddings.embed(
  'TypeScript generics let you write type-safe functions that work across multiple types',
  'document'
);
// Store vector in your vector DB alongside the source text

When to use:

  • Saving agent memory episodes to a vector store
  • Indexing documents, notes, or knowledge base entries
  • Storing tool outputs that should be retrievable later

query — Searching for stored content

Use this mode when embedding a search query that will be compared against document-mode vectors. The two modes are trained as an asymmetric pair — a query vector is optimized to match document vectors, not other query vectors.

const queryVector = await embeddings.embed(
  'how do TypeScript generics work?',
  'query'
);

// Compare against stored document vectors
const scores = storedDocs.map(doc => cosineSimilarity(queryVector, doc.vector));
const topResult = storedDocs[scores.indexOf(Math.max(...scores))];

When to use:

  • Embedding a user's question before searching agent memory
  • Any lookup against a corpus of document-mode vectors
  • Semantic search over stored knowledge

Important: Always embed stored content as document and search queries as query. Embedding a query in document mode — or vice versa — produces lower-quality retrieval.


clustering — Grouping and deduplication

Use this mode when computing symmetric similarity between items that are being grouped or compared without a query/document distinction. Both sides of the comparison are embedded in the same mode, so similarity is mutual.

const [vectorA, vectorB] = await Promise.all([
  embeddings.embed('the agent saves memories to a SQLite database', 'clustering'),
  embeddings.embed('memories are stored in SQLite by the agent', 'clustering'),
]);

const similarity = cosineSimilarity(vectorA, vectorB);
// similarity ≈ 0.97 → near-duplicate, flag for deduplication

When to use:

  • Detecting near-duplicate memories before saving a new one
  • Grouping a set of memories or documents into topic clusters
  • Finding related items in a collection where there is no external query

classification — Categorizing content

Use this mode when assigning content to pre-defined categories. It maximizes separation between categories, so same-category items cluster tightly while different-category items are pushed apart.

// Pre-embed your category labels once
const categories = {
  bug:      await embeddings.embed('software bug, error, crash, exception', 'classification'),
  feature:  await embeddings.embed('new feature, enhancement, improvement, request', 'classification'),
  question: await embeddings.embed('question, how to, help, explain', 'classification'),
};

// Classify new content
const incoming = await embeddings.embed(
  'the agent crashes when the tool returns null',
  'classification'
);

const scores = Object.entries(categories).map(([label, vec]) => ({
  label,
  score: cosineSimilarity(incoming, vec),
}));

const predicted = scores.sort((a, b) => b.score - a.score)[0].label;
// predicted → 'bug'

When to use:

  • Tagging memory episodes by type (fact, event, preference, error)
  • Routing incoming messages to different handlers
  • Labelling content without training a separate classifier

Mode summary

Mode             Prefix applied     Use when
document         search_document:   Embedding content for storage in a vector index
query            search_query:      Embedding a search query to retrieve stored documents
clustering       clustering:        Symmetric grouping, deduplication, or topic clustering
classification   classification:    Assigning content to pre-defined categories

Matryoshka Dimensions

The model uses Matryoshka Representation Learning (MRL), which means the first N dimensions of a full 768-dim vector are themselves a valid, well-calibrated embedding at size N. You can reduce the output size to lower storage cost and vector search time with minimal quality loss.

// Full quality, 768-dim vectors (default)
const plugin768 = new NomicEmbeddingsPlugin();

// Smaller vectors — still outperforms OpenAI text-embedding-3-small on MTEB
const plugin512 = new NomicEmbeddingsPlugin({ dimensions: 512 });

// Compact vectors for memory-constrained environments
const plugin256 = new NomicEmbeddingsPlugin({ dimensions: 256 });

Dimensions   MTEB score   vs OpenAI text-embedding-3-small
768          62.28        Better
512          61.96        Better
256          61.04        Comparable
128          59.34        Slightly below
64           56.10        Below

Warning: Changing dimensions after data has been stored requires re-embedding all vectors and recreating the vector store tables. Treat the dimension setting as a deployment-time decision, not a runtime one.

Truncation is done correctly using the Matryoshka pipeline: layer normalization → slice to target dims → L2 re-normalization. A plain array slice would produce incorrect results.
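
For illustration, the three steps look roughly like this (a minimal sketch, not the plugin's actual code; the layer normalization is simplified to plain standardization without learned scale and bias):

// Illustrative sketch of Matryoshka truncation: layer norm, slice, re-normalize
function truncateMatryoshka(full: number[], dims: number): number[] {
  // 1. Layer-normalize the full 768-dim vector
  const mean = full.reduce((s, v) => s + v, 0) / full.length;
  const variance = full.reduce((s, v) => s + (v - mean) ** 2, 0) / full.length;
  const normed = full.map(v => (v - mean) / Math.sqrt(variance + 1e-5));

  // 2. Slice to the target dimension count
  const sliced = normed.slice(0, dims);

  // 3. L2 re-normalize so cosine similarity is a plain dot product again
  const norm = Math.sqrt(sliced.reduce((s, v) => s + v * v, 0));
  return sliced.map(v => v / norm);
}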

Configuration

interface NomicEmbeddingsPluginConfig {
  // Output vector size. Default: 768.
  dimensions?: 64 | 128 | 256 | 512 | 768;

  // Absolute path to a local ONNX model file.
  // Use this in Electron apps to point to the bundled model:
  //   path.join(process.resourcesPath, 'models', 'model_q4f16.onnx')
  // If omitted, loads from the HuggingFace cache directory.
  modelPath?: string;

  // Optional logger compatible with the Fractal Synapse LoggingInterface.
  logger?: LoggingInterface;
}

Electron / bundled app

Point modelPath to the ONNX file bundled inside the app package so the plugin does not depend on the user's HuggingFace cache:

import path from 'path';

const embeddings = new NomicEmbeddingsPlugin({
  modelPath: path.join(process.resourcesPath, 'models', 'model_q4f16.onnx'),
  dimensions: 512, // smaller vectors — good default for desktop apps
});

Cosine Similarity

All output vectors are L2-normalized (magnitude = 1.0), so cosine similarity reduces to a dot product:

function cosineSimilarity(a: number[], b: number[]): number {
  return a.reduce((sum, v, i) => sum + v * b[i], 0);
}

The result is in the range [-1, 1], where 1 means identical, 0 means unrelated, and negative values indicate opposing meaning. For typical retrieval tasks, scores above 0.8 indicate strong semantic similarity.
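
Combining the helper with that guideline, a retrieval filter might look like this (storedDocs and queryVector are the hypothetical values from the earlier examples; 0.8 is a rule of thumb, not a hard cutoff):

// Rank stored documents and keep only strong matches
const MIN_SIMILARITY = 0.8;
const matches = storedDocs
  .map(doc => ({ doc, score: cosineSimilarity(queryVector, doc.vector) }))
  .filter(r => r.score >= MIN_SIMILARITY)
  .sort((a, b) => b.score - a.score);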

Pipeline Lifecycle

The ONNX model is loaded as a process-level singleton — it is initialized on the first embed() call and reused for all subsequent calls. Creating multiple NomicEmbeddingsPlugin instances shares the same underlying pipeline.
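
For example, two instances still share one loaded model, so only the first embed() call pays the load cost:

// Both instances reuse the same in-process pipeline
const a = new NomicEmbeddingsPlugin();
const b = new NomicEmbeddingsPlugin();

await a.embed('first call loads the model', 'document');  // slow: one-time load
await b.embed('subsequent calls reuse it', 'document');   // fast: pipeline reused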

Call cleanup() when the application shuts down to release the ONNX Runtime resources:

// When the app is closing
await embeddings.cleanup();

When registered as an agent plugin, cleanup() is called automatically by the agent's shutdown sequence.

Testing

# Unit tests only (no model required)
npm run test:run -- tests/unit

# All tests including integration (requires model download)
npm run test:run

# Watch mode during development
npm run test:watch

Integration tests are automatically skipped when the model has not been downloaded. Download it first with the script described in the Installation section.