Postgres + pgvector: notes from shipping retrieval
Why we often start with vectors in Postgres, what breaks first at scale, and when to split search out, without rewriting everything overnight.
Hosted vector databases are great marketing. For many products, pgvector inside Postgres is the fastest path to working RAG: one backup story, one permissions model, and transactional consistency with the rest of your app data.
What we like
- Joint queries: filter by tenant, plan, or freshness before the vector hop - no awkward dual-store joins.
- Ops you already run: migrations, replicas, PITR - the team already knows how to babysit Postgres.
- Good-enough latency from MVP to product-market fit, as long as indexes and embedding dimensions are sane.
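To make the joint-query point concrete, here is a minimal sketch of a filter-then-rank query, built as a parameterized SQL string in Python. The table and column names (documents, tenant_id, updated_at, embedding, content) and both helper functions are hypothetical, not taken from any real schema. The `<=>` operator is pgvector's cosine distance, and the `%(name)s` placeholders follow the psycopg named-parameter style.

```python
def to_vector_literal(values):
    """Format a sequence of numbers as pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


def build_filtered_knn_query(table="documents", k=5):
    """Build a parameterized query that filters rows first, then ranks by distance.

    Hypothetical schema: tenant_id, updated_at, content, embedding (vector).
    Values are passed as named parameters; only the table name and LIMIT
    are interpolated, and both are validated here.
    """
    if not table.isidentifier():
        raise ValueError("table must be a plain identifier")
    if not isinstance(k, int) or k <= 0:
        raise ValueError("k must be a positive integer")
    return (
        "SELECT id, content, embedding <=> %(query_vec)s::vector AS distance\n"
        f"FROM {table}\n"
        "WHERE tenant_id = %(tenant_id)s\n"
        "  AND updated_at >= %(min_updated_at)s\n"
        "ORDER BY embedding <=> %(query_vec)s::vector\n"
        f"LIMIT {k}"
    )


if __name__ == "__main__":
    params = {
        "query_vec": to_vector_literal([0.1, 0.2, 0.3]),
        "tenant_id": 42,
        "min_updated_at": "2024-01-01",
    }
    print(build_filtered_knn_query("documents", k=3))
    print(params)
```

With a driver such as psycopg, the string and the params dict would go to cursor.execute together. Tenant, freshness, and similarity all resolve in one statement against one store, which is the whole appeal.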
Where it hurts
- Write-heavy embedding churn (constantly re-embedding whole corpora) can starve OLTP if you blindly share one cluster.
- Recall tuning (IVFFlat lists, probes, dimensions) needs discipline - generic defaults rarely survive prod traffic.
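As a starting point for the recall-tuning discipline above, the sketch below encodes the rule of thumb from the pgvector README: lists of roughly rows / 1000 up to 1M rows and sqrt(rows) beyond that, with probes starting near sqrt(lists). The function names and the documents/embedding identifiers are our own illustration. Treat the numbers as a first guess to measure against your own recall and latency targets, not as settings to trust.

```python
import math


def suggest_ivfflat_params(row_count):
    """Return (lists, probes) starting values per the pgvector README heuristic."""
    if row_count <= 0:
        raise ValueError("row_count must be positive")
    if row_count <= 1_000_000:
        lists = max(1, row_count // 1000)
    else:
        lists = round(math.sqrt(row_count))
    # More probes means better recall and slower queries; sqrt(lists) is only a start.
    probes = max(1, round(math.sqrt(lists)))
    return lists, probes


def ivfflat_statements(table, column, row_count):
    """Render the index DDL and session setting for a hypothetical table/column."""
    lists, probes = suggest_ivfflat_params(row_count)
    return [
        f"CREATE INDEX ON {table} USING ivfflat ({column} vector_cosine_ops) "
        f"WITH (lists = {lists});",
        f"SET ivfflat.probes = {probes};",
    ]


if __name__ == "__main__":
    for statement in ivfflat_statements("documents", "embedding", 200_000):
        print(statement)
```

IVFFlat clusters are computed from the rows present at build time, so it usually pays to create the index after the table has data, and to revisit lists and probes as the corpus grows.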
When we peel search out
Separate retrieval services make sense when embedding ingest dominates database CPU, you need region-isolated search, or you’re doing hybrid lexical + vector at a scale where a dedicated engine earns its keep.
Until then: Postgres stays the truth store - and vectors live beside the rows they annotate.