Node.js + pgvector in 2026: AI Semantic Search on Postgres
Every product team in 2026 is shipping AI-powered search, "smart" docs, recommendation feeds, and chat-with-your-data experiences — and almost all of them need a vector store. The default answer for years was "spin up Pinecone or Qdrant" and pay per million vectors. In 2026, the much quieter winner has become Postgres + pgvector: it lets a Node.js team keep one database, one transaction model, one backup story — and still hit single-digit-millisecond approximate nearest neighbour search at production scale.
This guide is the playbook our engineers use when shipping pgvector in production with Node.js. We cover index choice (HNSW vs IVFFlat), schema design, the embedding pipeline, hybrid search with BM25, query latency, monitoring, and where pgvector finally hits a wall and you should reach for a dedicated vector DB. By the end you will have copy-pasteable code and a clear sense of when pgvector is the right call for your stack.
Why pgvector Quietly Won the Vector Database War
pgvector is a Postgres extension that adds a vector column type and ANN (approximate nearest neighbour) indexes — HNSW and IVFFlat — directly inside Postgres. You install it with one CREATE EXTENSION call, store embeddings alongside your relational data, and run vector similarity searches in the same query as your normal filters and joins.
What changed on the road to 2026 is performance. pgvector 0.5 added HNSW indexes, and 0.7 added halfvec and binary quantisation, closing most of the gap with dedicated stores. For corpora up to ~10M vectors, latency and recall are good enough that the operational overhead of a separate vector DB stops being worth it.
No second database, no dual-writes, no eventual-consistency bugs
The single biggest reason teams pick pgvector is operational. With a separate vector store you need to dual-write every record, reconcile failures, handle backfills, and keep two systems backed up. With pgvector, embeddings are just another column and join. Most senior Node.js backend developers will tell you that the lowest-bug database architecture is the one with the fewest databases.
Built-in transactions across documents and embeddings
Because pgvector lives inside Postgres, you can update a document and its embedding in the same transaction. If the embedding write fails, the document write rolls back. Pinecone and Qdrant cannot give you that guarantee — and shipping reliable AI features without it is harder than it needs to be.
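Here is what that looks like in practice. A minimal sketch, assuming a pg.Pool plus the documents/document_chunks schema defined later in this post, with the embedding already computed:
// Minimal sketch: write a document and its first chunk embedding atomically.
// Assumes `pool`, `title`, `content`, `hash`, and `vec` (the formatted
// embedding string literal) are already in scope.
const client = await pool.connect();
try {
  await client.query('BEGIN');
  const { rows } = await client.query(
    `INSERT INTO documents (source, title, content)
     VALUES ($1, $2, $3) RETURNING id`,
    ['docs', title, content]
  );
  await client.query(
    `INSERT INTO document_chunks
       (document_id, chunk_index, content, content_hash, embedding)
     VALUES ($1, 0, $2, $3, $4::vector)`,
    [rows[0].id, content, hash, vec]
  );
  await client.query('COMMIT'); // both rows commit together
} catch (err) {
  await client.query('ROLLBACK'); // a failed embedding write rolls back the document too
  throw err;
} finally {
  client.release();
}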

The Reference Node.js + pgvector Architecture
Production semantic search is more than "embed query, run cosine, return rows". The architecture below is what we ship for clients building chat-with-docs, code search, and recommendation systems on top of pgvector. The same shape works for SaaS products and internal tools.
Ingest path: queue, embed, upsert
On the write side, every new document goes through a queue (BullMQ or similar) before embedding. This decouples your API from the embeddings provider — when OpenAI rate-limits or your model swap takes 30 seconds, your POST endpoint should not feel it. The worker embeds in batches of 100, then upserts into the document_chunks table with both metadata and the 1536-dim embedding.
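A minimal sketch of that worker with BullMQ. The queue name and job payload shape are our own assumptions, and it reuses the openai, pool, and sha256 helpers from the client section below:
import { Worker } from 'bullmq';

// Hypothetical payload shape: each job carries { documentId, chunks: string[] }
const worker = new Worker(
  'embed-documents',
  async (job) => {
    const { documentId, chunks } = job.data;
    for (let i = 0; i < chunks.length; i += 100) {
      const batch = chunks.slice(i, i + 100);
      // One API call per 100 chunks: `input` accepts an array of strings
      const { data } = await openai.embeddings.create({
        model: 'text-embedding-3-small',
        input: batch,
      });
      for (const item of data) {
        const text = batch[item.index];
        const vec = `[${item.embedding.join(',')}]`;
        await pool.query(
          `INSERT INTO document_chunks
             (document_id, chunk_index, content, content_hash, embedding)
           VALUES ($1, $2, $3, $4, $5::vector)
           ON CONFLICT (document_id, chunk_index) DO UPDATE
             SET content = EXCLUDED.content,
                 content_hash = EXCLUDED.content_hash,
                 embedding = EXCLUDED.embedding`,
          [documentId, i + item.index, text, sha256(text), vec]
        );
      }
    }
  },
  { connection: { host: 'localhost', port: 6379 }, concurrency: 2 }
);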
Read path: cache, embed, search, re-rank
On the read side, hash the query and hit Redis first — most semantic search workloads have a long tail of repeat queries. On a miss, embed the query (cache embeddings too — they are tiny and immutable for a given model), run the pgvector ANN search with k=20, then re-rank with BM25 or a cross-encoder before returning the top 10.
Always pre-compute a content hash before embedding
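A minimal sketch of the query-embedding cache with node-redis. The key prefix and 24-hour TTL are our own choices; sha256 and openai come from the client section below:
import { createClient } from 'redis';

const redis = createClient({ url: process.env.REDIS_URL });
await redis.connect();

// Cache query embeddings by model + text hash; they are immutable per model
export async function embedQueryCached(query, model = 'text-embedding-3-small') {
  const key = `emb:${model}:${sha256(query)}`;
  const hit = await redis.get(key);
  if (hit) return JSON.parse(hit);

  const { data } = await openai.embeddings.create({ model, input: query });
  const embedding = data[0].embedding;
  // 24h TTL is arbitrary: embeddings never go stale, but keys should expire
  await redis.set(key, JSON.stringify(embedding), { EX: 60 * 60 * 24 });
  return embedding;
}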
Schema Design: Get It Right Once
The schema below is what we use as a starting template. It separates the source document from its chunks (because you almost never embed an entire document) and stores the embedding model name with each row so you can hot-swap models without losing recall.
Two tables, separated by chunking
A documents table holds the source row (article, ticket, code file). A document_chunks table holds 200–800 token chunks, each with its embedding column. The chunk table has the HNSW index. The documents table is left untouched. This separation matters when you re-chunk: you blow away document_chunks and rebuild it, but the source data stays intact.
-- Enable extension once per database
CREATE EXTENSION IF NOT EXISTS vector;
-- Source documents
CREATE TABLE documents (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
source text NOT NULL,
title text NOT NULL,
content text NOT NULL,
metadata jsonb DEFAULT '{}'::jsonb,
created_at timestamptz NOT NULL DEFAULT now()
);
-- Chunks with embeddings
CREATE TABLE document_chunks (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
document_id uuid NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
chunk_index int NOT NULL,
content text NOT NULL,
content_hash text NOT NULL,
embedding vector(1536) NOT NULL,
embedding_model text NOT NULL DEFAULT 'text-embedding-3-small',
token_count int,
UNIQUE (document_id, chunk_index)
);
-- HNSW index — m=16, ef_construction=64 is the sweet spot
CREATE INDEX idx_chunks_embedding ON document_chunks
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
-- Filter index for hybrid metadata + vector queries
CREATE INDEX idx_chunks_doc_id ON document_chunks (document_id);
CREATE INDEX idx_chunks_meta ON documents USING gin (metadata);
Why m=16, ef_construction=64?
Across multiple production datasets we see m=16, ef_construction=64 land at ~98% recall@10 with reasonable build times (~12 min per million vectors on a modest box). Lower m saves disk and build time at the cost of recall. Higher m gives diminishing returns past 24 — index size grows, query latency creeps up, build time explodes.

The Node.js Client: Embedding, Indexing, Searching
Below is the minimal client code for the three operations you will call constantly: embed, index, search. We use the standard pg (node-postgres) client with no ORM here, hand-formatting pgvector string literals, but if you prefer Drizzle, Kysely, or Prisma there are first-class pgvector helpers in 2026.
Embedding with retries and content hashing
import OpenAI from 'openai';
import crypto from 'node:crypto';
import pg from 'pg';
const openai = new OpenAI();
const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });
const sha256 = (s) => crypto.createHash('sha256').update(s).digest('hex');
export async function embedAndStore(documentId, chunkIndex, content) {
const hash = sha256(content);
// Skip if we already embedded this exact text
const existing = await pool.query(
`SELECT id FROM document_chunks WHERE content_hash = $1 LIMIT 1`,
[hash]
);
if (existing.rowCount) return existing.rows[0].id;
const { data, usage } = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: content,
});
// pgvector accepts arrays via the "[1,2,3]" string literal
const vec = `[${data[0].embedding.join(',')}]`;
const { rows } = await pool.query(
`INSERT INTO document_chunks
(document_id, chunk_index, content, content_hash, embedding, token_count)
VALUES ($1, $2, $3, $4, $5::vector, $6)
RETURNING id`,
[documentId, chunkIndex, content, hash, vec, usage?.total_tokens ?? null]
);
return rows[0].id;
}
Querying with pgvector — one query, two filters
export async function semanticSearch(query, { tenantId, limit = 10 }) {
const { data } = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: query,
});
const vec = `[${data[0].embedding.join(',')}]`;
  // hnsw.ef_search tunes recall vs speed: higher means better recall, slower queries.
  // SET LOCAL only lasts for the current transaction on the same connection,
  // so pin a client instead of firing two separate pool.query calls.
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    await client.query('SET LOCAL hnsw.ef_search = 80');
    const { rows } = await client.query(
      `SELECT c.id, c.content, d.title,
              1 - (c.embedding <=> $1::vector) AS score
       FROM document_chunks c
       JOIN documents d ON d.id = c.document_id
       WHERE d.metadata->>'tenant_id' = $2
       ORDER BY c.embedding <=> $1::vector
       LIMIT $3`,
      [vec, tenantId, limit]
    );
    await client.query('COMMIT');
    return rows;
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}
Hybrid Search: Why Pure Vector Search Loses to BM25 + Vector
Pure vector search is great for fuzzy intent ("show me articles about login") but bad at exact matching ("error code 42883"). Pure BM25 is the reverse. The state of the art in 2026 is hybrid search: run both, then combine with reciprocal rank fusion (RRF) or a learned re-ranker.
Reciprocal Rank Fusion in one query
Postgres has had full-text search via tsvector for years. In a single CTE you can run a tsvector match and a vector ANN search, then fuse them by rank. RRF is one line of math. The win is significant: most teams see 15-30% relevance lift from going hybrid versus vector-only.
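Here is a minimal sketch of that fused query from Node. The RRF constant k=60 is the conventional default; computing to_tsvector on the fly is fine for modest tables, but in production you would add a generated tsvector column with a GIN index:
export async function hybridSearch(query, queryEmbedding, limit = 10) {
  const vec = `[${queryEmbedding.join(',')}]`;
  const { rows } = await pool.query(
    `WITH vec AS (
       SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> $1::vector) AS r
       FROM document_chunks
       ORDER BY embedding <=> $1::vector
       LIMIT 20
     ),
     fts AS (
       SELECT id, ROW_NUMBER() OVER (
         ORDER BY ts_rank_cd(to_tsvector('english', content),
                             plainto_tsquery('english', $2)) DESC) AS r
       FROM document_chunks
       WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $2)
       ORDER BY r
       LIMIT 20
     )
     SELECT c.id, c.content,
            -- Reciprocal Rank Fusion: sum of 1/(k + rank) per ranker, k = 60
            COALESCE(1.0 / (60 + vec.r), 0) + COALESCE(1.0 / (60 + fts.r), 0) AS rrf_score
     FROM vec
     FULL OUTER JOIN fts ON vec.id = fts.id
     JOIN document_chunks c ON c.id = COALESCE(vec.id, fts.id)
     ORDER BY rrf_score DESC
     LIMIT $3`,
    [vec, query, limit]
  );
  return rows;
}
Feed it the cached query embedding from the read path, and tune the per-ranker LIMIT (20 here) against your re-ranker budget.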
When to add a learned re-ranker
If RRF is not enough, route the top 50 hybrid candidates through a cross-encoder re-ranker (Cohere Rerank, jina-reranker, or a self-hosted bge-reranker). Latency cost is 50-150ms but quality jumps are large for ambiguous queries. Production-ready AI engineering teams typically gate the re-ranker behind an A/B flag and only enable it for query intents where vector alone underperforms.
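A hedged sketch of the Cohere option over plain fetch. The endpoint shape and model name reflect Cohere's v1 rerank REST API as we understand it; verify both against current docs before shipping:
// Hedged sketch: re-rank hybrid candidates with Cohere Rerank.
// Endpoint URL, model name, and response shape are assumptions based on
// Cohere's v1 REST API; confirm against their current documentation.
export async function rerank(query, candidates, topN = 10) {
  const res = await fetch('https://api.cohere.ai/v1/rerank', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.COHERE_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      model: 'rerank-english-v3.0',
      query,
      documents: candidates.map((c) => c.content),
      top_n: topN,
    }),
  });
  const { results } = await res.json();
  // Each result carries an index into the candidates array plus a score
  return results.map((r) => ({ ...candidates[r.index], rerankScore: r.relevance_score }));
}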
Operating pgvector at Scale: Connections, Vacuum, and Index Maintenance
pgvector inherits Postgres operational behaviours, which is mostly good news — except for two pitfalls that cost teams real outages in their first 6 months.
Connection pooling is non-negotiable
Embedding queries hold their connection for the full ANN traversal. Without PgBouncer or a tight Node.js connection pool, a busy search endpoint will saturate Postgres connections in minutes. Run PgBouncer in transaction-pooling mode, set max_client_conn high, and keep the upstream Postgres connection limit modest. Cap the pg pool at 10–20 connections per application instance.
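Concretely, a conservative pool setup looks like this (the limits are our suggested starting points, not library defaults):
const pool = new pg.Pool({
  connectionString: process.env.DATABASE_URL, // point at PgBouncer, not Postgres directly
  max: 10,                        // per-instance cap; scale out via more instances
  idleTimeoutMillis: 30_000,      // recycle idle connections
  connectionTimeoutMillis: 5_000, // fail fast instead of queueing forever
});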
VACUUM, autovacuum, and re-indexing
HNSW indexes accumulate dead entries on update-heavy workloads. Schedule a weekly REINDEX during a low-traffic window, monitor pg_stat_user_indexes for bloat, and tune autovacuum to be more aggressive on the chunks table. If your embedding model changes (a hot topic in 2026 with text-embedding-3-large vs voyage-3 vs bge-large), you will be re-embedding millions of rows — plan for it.
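A sketch of that weekly job, using the idx_chunks_embedding index name from the schema above; the autovacuum thresholds are our suggested starting points:
// Weekly maintenance: REINDEX CONCURRENTLY keeps reads online but cannot
// run inside a transaction, so issue each statement on its own.
export async function weeklyIndexMaintenance() {
  // Rebuild the HNSW index to shed dead entries left by updates
  await pool.query('REINDEX INDEX CONCURRENTLY idx_chunks_embedding');

  // Make autovacuum more aggressive on the hot chunks table
  await pool.query(`
    ALTER TABLE document_chunks SET (
      autovacuum_vacuum_scale_factor = 0.02,
      autovacuum_analyze_scale_factor = 0.01
    )`);
}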
A well-instrumented pgvector deployment ships OpenTelemetry traces around every embed and search call, plus Postgres-level metrics piped into Prometheus. Hire DevOps engineers who already know what good looks like: wiring this from scratch is two weeks of yak-shaving most product teams cannot afford.
When pgvector Stops Being the Right Answer
An honest counter-position: pgvector is not the answer for every workload. Past ~50M vectors, sustained write QPS in the thousands, or sub-5ms p99 SLAs, dedicated vector stores still pull ahead. The signs you have outgrown pgvector usually look like this:
- Index build time exceeds your maintenance window: you can no longer REINDEX overnight in a six-hour window.
- p95 query latency creeps past 30ms even with HNSW tuned.
- The connection pool saturates during embedding-heavy batch loads.
- Disk usage doubles in 12 months and storage cost dominates the Postgres bill.
At that point, the migration story is straightforward: keep Postgres as the system of record, put Qdrant or Pinecone in front for vector reads, and dual-write embeddings via your queue. Most teams who have done this say they wish they had stayed on pgvector longer.
Hire Expert Node.js + AI Developers — Ready in 48 Hours
Building production AI search is only half the battle — you need engineers who have actually shipped pgvector, embeddings pipelines, and observability for it. HireNodeJS.com specialises exclusively in Node.js talent: every developer is pre-vetted on real-world projects, API design, embeddings pipelines, and production deployments.
Unlike generalist platforms, our curated pool means you speak only to engineers who live and breathe Node.js, Postgres, and modern AI infrastructure. Most clients have their first developer working within 48 hours of getting in touch. Engagements start as short-term contracts and can convert to full-time hires with zero placement fee.
The Bottom Line
In 2026, pgvector is the default vector database for any team that can fit their corpus inside Postgres — which, with HNSW and quantisation, is now most teams up to 10M+ vectors. The wins compound: one database, one transaction story, one backup, one observability stack, and engineers who already know how to operate it. Reach for a dedicated vector store when you actually outgrow pgvector, not before.
Start with the schema in this post, ship the queue + cache + re-rank pattern from day one, and tune HNSW with m=16, ef_construction=64 unless you have a reason not to. That single playbook covers 90% of the AI search workloads we see in production.
Frequently Asked Questions
Is pgvector production-ready in 2026?
Yes. With pgvector 0.7+, HNSW indexes deliver 98%+ recall at single-digit-millisecond p95 latency on corpora up to 10M vectors, and every major managed Postgres provider (Supabase, Neon, AWS RDS, Google Cloud SQL) ships the extension out of the box.
When should I use Pinecone or Qdrant instead of pgvector?
Reach for a dedicated vector DB when you exceed roughly 50M vectors, need sub-5ms p99, or have sustained write QPS in the thousands. Below those thresholds the operational simplicity of one Postgres usually wins.
HNSW or IVFFlat — which pgvector index should I pick?
HNSW for almost every read-heavy workload — it gives better recall at lower latency. Use IVFFlat only when index build time is your bottleneck or memory pressure makes HNSW unaffordable.
How do I do hybrid keyword + vector search in Postgres?
Run a tsvector full-text query and a pgvector ANN query in the same CTE, then fuse the rankings with Reciprocal Rank Fusion. Most teams see 15-30% relevance lift versus vector-only search.
Does pgvector support metadata filtering?
Yes — because it is regular Postgres. You can WHERE on jsonb columns, joins, and tenant IDs alongside the vector ANN search. Add indexes on the filter columns and the query planner handles the rest.
How much do Node.js + pgvector engineers cost in 2026?
Senior Node.js engineers with pgvector and embeddings experience typically range from $80–$150 per hour for contract work, depending on region. HireNodeJS connects you with pre-vetted talent in this band within 48 hours.
Vivek Singh is the founder of Witarist and HireNodeJS.com — a platform connecting companies with pre-vetted Node.js developers. With years of experience scaling engineering teams, Vivek shares insights on hiring, tech talent, and building with Node.js.
