A production-style semantic search REST API. Ingest text documents (or URLs), chunk and embed them with OpenAI, store vectors in PostgreSQL + pgvector, and query with natural language.
```bash
cp .env.example .env
# Set OPENAI_API_KEY and API_KEYS in .env, then:
docker-compose up
```

The API will be available at http://localhost:3000.
Edit .env before starting:
```
OPENAI_API_KEY=sk-...          # Required: your OpenAI API key
API_KEYS=key-dev-123,key-prod  # Required: comma-separated bearer tokens
PORT=3000
RATE_LIMIT_RPM=60
CHUNK_SIZE=400
CHUNK_OVERLAP=80
```

All requests (except GET /health) require the header:

```
Authorization: Bearer <your-api-key>
```
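The examples below use curl; from TypeScript (assuming Node 18+ with global `fetch`), an equivalent authenticated call might look like this sketch. The query parameters mirror the /search examples later in this README; note that the JSON `filters` value must be URL-encoded, which `URLSearchParams` handles for you.

```typescript
// Build an authenticated /search request URL. URLSearchParams takes care
// of URL-encoding the JSON `filters` value.
const params = new URLSearchParams({
  q: "rate limiting",
  namespace: "docs",
  topK: "3",
  filters: JSON.stringify({ source: "confluence" }),
});
const url = `http://localhost:3000/search?${params}`;

// Against a running instance:
// const res = await fetch(url, {
//   headers: { Authorization: "Bearer key-dev-123" },
// });
// const body = await res.json();
```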
Chunk, embed, and store a text document.
```bash
curl -X POST http://localhost:3000/ingest \
  -H "Authorization: Bearer key-dev-123" \
  -H "Content-Type: application/json" \
  -d '{
    "content": "Rate limiting is a technique used to control the rate of requests...",
    "namespace": "docs",
    "metadata": { "source": "confluence", "author": "ayush" },
    "chunkSize": 400,
    "chunkOverlap": 80
  }'
```

Response:

```json
{ "inserted": 3, "namespace": "docs", "ids": ["uuid1", "uuid2", "uuid3"] }
```

Fetch a URL, extract its text, and ingest it.
```bash
curl -X POST http://localhost:3000/ingest/url \
  -H "Authorization: Bearer key-dev-123" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/article",
    "namespace": "web",
    "metadata": { "tag": "research" }
  }'
```

Semantic similarity search.
```bash
curl "http://localhost:3000/search?q=how+to+handle+rate+limiting&namespace=docs&topK=5" \
  -H "Authorization: Bearer key-dev-123"
```

With metadata filter:

```bash
curl "http://localhost:3000/search?q=rate+limiting&namespace=docs&topK=3&filters=%7B%22source%22%3A%22confluence%22%7D&minScore=0.7" \
  -H "Authorization: Bearer key-dev-123"
```

Response:
```json
{
  "results": [
    {
      "id": "uuid",
      "content": "chunk text...",
      "score": 0.89,
      "metadata": { "source": "confluence" },
      "namespace": "docs",
      "createdAt": "2025-11-01T10:00:00Z"
    }
  ],
  "cached": false,
  "query": "how to handle rate limiting"
}
```

Delete a specific document chunk by UUID.
```bash
curl -X DELETE http://localhost:3000/documents/550e8400-e29b-41d4-a716-446655440000 \
  -H "Authorization: Bearer key-dev-123"
```

List all namespaces with document counts.
```bash
curl http://localhost:3000/namespaces \
  -H "Authorization: Bearer key-dev-123"
```

Stats for a single namespace.
```bash
curl http://localhost:3000/namespaces/docs/stats \
  -H "Authorization: Bearer key-dev-123"
```

No auth required.
```bash
curl http://localhost:3000/health
```

Documents are split into overlapping fixed-size character windows. Given `chunkSize=400` and `chunkOverlap=80`, the sliding window advances by 400 - 80 = 320 characters on each step, so consecutive chunks share 80 characters of context.
Why overlapping chunks improve retrieval quality:
Semantic search works by comparing the embedding of your query to the embedding of each stored chunk. A long document contains many ideas; a single embedding for the whole document averages them all together and loses specificity. Chunking gives each idea its own embedding, making nearest-neighbour search much more precise.
The overlap matters because a sentence that straddles a chunk boundary would be cut in half without it — losing meaning. By carrying the tail of the previous chunk into the start of the next, we ensure every sentence appears whole in at least one chunk, and that the contextual "lead-in" is preserved. This reduces retrieval failures caused purely by where the chunk boundary happened to fall.
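The windowing arithmetic above can be sketched as follows. This is illustrative only, not the service's actual implementation; `chunkText` is a hypothetical name.

```typescript
// Fixed-size character windows with overlap: each window is `chunkSize`
// characters, and the window start advances by chunkSize - chunkOverlap.
function chunkText(text: string, chunkSize = 400, chunkOverlap = 80): string[] {
  if (chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be smaller than chunkSize");
  }
  const step = chunkSize - chunkOverlap; // 320 with the defaults
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}
```

With the defaults, a 1,000-character document yields three chunks (starting at offsets 0, 320, and 640), and each consecutive pair shares 80 characters.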
pgvector supports two approximate nearest-neighbour (ANN) index types:
| | HNSW | IVFFlat |
|---|---|---|
| Build time | Slower | Faster |
| Query speed | Fast, consistent | Fast, varies |
| Recall at low K | Very high | Good |
| Memory usage | Higher | Lower |
| Requires training | No | Yes (needs VACUUM ANALYZE + enough rows) |
| Supports concurrent inserts | Yes | Yes |
Why HNSW was chosen here:

- No training step. IVFFlat requires a list count tuned to the number of vectors present at index creation time (`lists = sqrt(rows)`). An empty or small table produces a poor IVFFlat index that must be rebuilt later. HNSW works correctly from the first insert.
- Better recall. HNSW navigates a multilevel proximity graph, achieving higher recall (fewer missed true nearest neighbours) at equivalent query latency. For a search API where result quality is the primary goal, this matters.
- Simpler operationally. IVFFlat requires periodic `VACUUM ANALYZE` and potential index rebuilds as data grows. HNSW self-organises incrementally.
The trade-off is memory: HNSW uses more RAM per vector. For very large collections (>10M vectors) and memory-constrained deployments, IVFFlat is the right choice.
The configured parameters `m=16`, `ef_construction=64` are pgvector's recommended defaults — a balanced starting point that works well for most datasets. Increase `ef_construction` (e.g. to 128) for higher recall at the cost of slower index builds.
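As a sketch, an HNSW index with these parameters would be declared like this in pgvector. The `documents` table and `embedding` column names here are assumptions, not necessarily the project's actual schema.

```sql
-- Illustrative DDL, not the project's actual migration: an HNSW index
-- over cosine distance with the build parameters discussed above.
CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
```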