13 May 2026

Postgres + pgvector: notes from shipping retrieval

Why we often start vectors in Postgres, what breaks first at scale, and when to split search out - without rewriting everything overnight.

Hosted vector databases are great marketing. For many products, though, pgvector inside Postgres is the fastest path to working RAG: one backup story, one permissions model, and transactional consistency next to the rest of your app data.

What we like

Joint queries: filter by tenant, plan, or freshness in the same statement as the vector search - no awkward dual-store joins.
Ops you already run: migrations, replicas, PITR - the team already knows how to babysit Postgres.
Good-enough latency from MVP through product-market fit, provided index settings and embedding dimensions stay sane.
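
To make the joint-query point concrete, here is a minimal sketch of a tenant- and freshness-filtered similarity search built as a parameterized statement. The `documents` table and its `tenant_id`, `updated_at`, `body`, and `embedding` columns are hypothetical, as are the helper names; `<=>` is pgvector's cosine-distance operator, and the `%s` placeholders assume a psycopg-style driver.

```python
from datetime import datetime, timezone

# Hypothetical schema: documents(id, tenant_id, body, updated_at, embedding vector(N)).
# The relational filters and the vector ordering live in one statement.
FILTERED_SEARCH_SQL = """
SELECT id, body, embedding <=> %s::vector AS cosine_distance
FROM documents
WHERE tenant_id = %s
  AND updated_at >= %s
ORDER BY embedding <=> %s::vector
LIMIT %s
"""


def to_vector_literal(values):
    """Render floats in pgvector's text input format, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def build_filtered_search(query_embedding, tenant_id, updated_since, limit=10):
    """Return (sql, params) ready for a psycopg-style cursor.execute call."""
    vec = to_vector_literal(query_embedding)
    # The vector appears twice: once in the SELECT list, once in ORDER BY.
    return FILTERED_SEARCH_SQL, (vec, tenant_id, updated_since, vec, limit)


sql, params = build_filtered_search(
    [0.1, 0.2, 0.3],
    tenant_id=42,
    updated_since=datetime(2026, 1, 1, tzinfo=timezone.utc),
    limit=5,
)
```

With a live connection this would run as `cur.execute(sql, params)`. One caveat worth checking against the pgvector docs for your version: with an approximate index, filters may be applied after the index scan, so a very selective filter can return fewer rows than the `LIMIT` asks for.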

Where it hurts

Write-heavy embedding churn (constantly re-embedding whole corpora) can starve OLTP traffic when ingest and the app share one cluster with no isolation.
Recall tuning (IVFFlat lists, probes, dimensions) needs discipline - generic defaults rarely survive prod traffic.
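
As a sketch of what that discipline can look like: pick starting values for `lists` and `probes`, then measure recall against exact search before trusting them. The heuristics below (about rows / 1000 lists up to a million rows, about sqrt(rows) beyond that, and probes near sqrt(lists)) follow the starting points suggested in the pgvector README as we understand them; treat them as assumptions to validate, not targets. The function names and the `documents` / `embedding` identifiers are hypothetical.

```python
import math


def ivfflat_starting_params(row_count):
    """Suggest starting (lists, probes); validate against measured recall."""
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = max(1, round(math.sqrt(row_count)))
    probes = max(1, round(math.sqrt(lists)))
    return lists, probes


def index_ddl(lists, table="documents", column="embedding"):
    """Build the IVFFlat index statement for cosine distance."""
    return (
        f"CREATE INDEX ON {table} USING ivfflat ({column} vector_cosine_ops) "
        f"WITH (lists = {lists});"
    )


def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the exact top-k that the approximate search also returned."""
    exact = set(exact_ids[:k])
    if not exact:
        return 0.0
    return len(exact & set(approx_ids[:k])) / len(exact)


lists, probes = ivfflat_starting_params(500_000)
ddl = index_ddl(lists)
# At query time the probe count is a session setting: SET ivfflat.probes = <probes>;
```

In practice the exact IDs would come from running the same query with the index bypassed (for example inside a transaction with `SET LOCAL enable_indexscan = off`), and `recall_at_k` would be averaged over a sample of real production queries rather than a single one.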

When we peel search out

Separate retrieval services make sense when embedding ingest dominates database CPU, you need region-isolated search, or you’re doing hybrid lexical + vector at a scale where a dedicated engine earns its keep.

Until then: Postgres stays the truth store - and vectors live beside the rows they annotate.