What You Get
- Document ingestion pipeline
- Smart chunking with overlap strategy
- Vector embedding and indexing
- Semantic search with hybrid retrieval
- Cited answer generation
Step by Step
1. Set up PostgreSQL with pgvector
Install the pgvector extension. Create two tables: documents (id, title, file_type, created_at) and chunks (id, document_id, content, chunk_index, embedding vector(1536)), where chunks.document_id references documents.id. Then create an HNSW index on the embedding column for fast approximate nearest-neighbor search.
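This is a minimal sketch of the step 1 schema. It assumes pgvector 0.5.0 or later (the first release with HNSW indexes) and cosine distance for search. The UUID and bigserial key types, `ON DELETE CASCADE`, the unique constraint, and the index name `chunks_embedding_hnsw` are assumptions, not part of the recipe. The SQL is wrapped in a TypeScript string so any Postgres client can execute it.

```typescript
// Step 1 schema as a SQL string; run it once with any Postgres client,
// e.g. `await client.query(schemaSql)` with node-postgres.
// Assumes pgvector >= 0.5.0 (HNSW support) and cosine distance for search.
export const schemaSql = `
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE IF NOT EXISTS documents (
  id         uuid PRIMARY KEY,
  title      text NOT NULL,
  file_type  text NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);

CREATE TABLE IF NOT EXISTS chunks (
  id          bigserial PRIMARY KEY,
  document_id uuid NOT NULL REFERENCES documents (id) ON DELETE CASCADE,
  content     text NOT NULL,
  chunk_index integer NOT NULL,
  embedding   vector(1536), -- nullable until step 4 fills it in
  UNIQUE (document_id, chunk_index)
);

-- HNSW index for cosine distance (the <=> operator used at query time).
CREATE INDEX IF NOT EXISTS chunks_embedding_hnsw
  ON chunks USING hnsw (embedding vector_cosine_ops);
`;
```

The `embedding` column is nullable here so a chunk row can be written before its embedding exists, which fits the batching in step 4. The index uses `vector_cosine_ops` because the search in step 5 orders by cosine distance. pgvector only uses an index whose operator class matches the query's distance operator.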
2. Build the document ingestion pipeline
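Accept uploads of PDF, DOCX, TXT, and Markdown files. Extract text with a parser suited to each format: pdf-parse for PDF and mammoth for DOCX, while TXT and Markdown can be read as plain text. Generate a unique document ID and store the document's metadata in the documents table.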
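Here is one way to structure step 2, sketched under some assumptions. The two parsers are passed in as functions rather than imported, so the example does not pin a particular pdf-parse or mammoth version. The helper names (`detectFileType`, `ingestDocument`, `Extractor`) are hypothetical, and TXT and Markdown uploads are assumed to be UTF-8.

```typescript
import { randomUUID } from "node:crypto";

export type FileType = "pdf" | "docx" | "txt" | "md";

// Parsers are injected so the sketch does not depend on one library version.
// Possible wiring (check against the versions you install):
//   pdf:  (buf) => pdfParse(Buffer.from(buf)).then((r) => r.text)                          // pdf-parse
//   docx: (buf) => mammoth.extractRawText({ buffer: Buffer.from(buf) }).then((r) => r.value) // mammoth
export type Extractor = (file: Uint8Array) => Promise<string>;

export interface IngestedDocument {
  id: string;
  title: string;
  fileType: FileType;
  createdAt: Date;
  text: string;
}

// Map a file extension to a supported type; reject everything else.
export function detectFileType(filename: string): FileType {
  const ext = filename.slice(filename.lastIndexOf(".") + 1).toLowerCase();
  switch (ext) {
    case "pdf":
      return "pdf";
    case "docx":
      return "docx";
    case "txt":
      return "txt";
    case "md":
    case "markdown":
      return "md";
    default:
      throw new Error(`Unsupported file type: ${filename}`);
  }
}

// Extract text with the matching parser and build the documents-table metadata.
export async function ingestDocument(
  filename: string,
  file: Uint8Array,
  extractors: Record<"pdf" | "docx", Extractor>,
): Promise<IngestedDocument> {
  const fileType = detectFileType(filename);
  let text: string;
  if (fileType === "pdf" || fileType === "docx") {
    text = await extractors[fileType](file);
  } else {
    // TXT and Markdown are read as plain UTF-8 text (assumed encoding).
    text = Buffer.from(file).toString("utf8");
  }
  return {
    id: randomUUID(), // unique document ID
    title: filename.replace(/\.[^.]+$/, ""), // filename without its extension
    fileType,
    createdAt: new Date(),
    text,
  };
}
```

The returned `text` feeds the chunker in step 3. The `id`, `title`, `fileType`, and `createdAt` fields map onto the documents table from step 1.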
3. Implement smart chunking
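Use recursive character text splitting with a chunk size of 1,000 characters and a 200-character overlap, so that text near a chunk boundary appears in both neighboring chunks. For each chunk, store the document_id, chunk_index, and content. Handle edge cases such as tables, code blocks, and headers, which a plain character splitter can cut in half.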
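Below is a minimal recursive character splitter that uses the recipe's 1,000/200 settings. It is written from scratch rather than taken from a library, though LangChain ships a similar `RecursiveCharacterTextSplitter`. The splitter tries paragraph breaks first, then line breaks, then spaces, and finally single characters. It carries up to `chunkOverlap` characters into the next chunk. The separator list and function names are illustrative assumptions, and the sketch does not yet treat tables, code blocks, or headers specially.

```typescript
// Separators tried in order: paragraphs, lines, words, then single characters.
const DEFAULT_SEPARATORS = ["\n\n", "\n", " ", ""];

export interface ChunkOptions {
  chunkSize: number;    // maximum characters per chunk
  chunkOverlap: number; // maximum characters carried over from the previous chunk
  separators?: string[];
}

// Recursively break text into pieces of at most chunkSize characters, using the
// coarsest separator that works. Separators stay attached to the preceding piece,
// so joining the pieces reproduces the original text.
function splitRecursive(text: string, chunkSize: number, separators: string[]): string[] {
  if (text.length <= chunkSize) return [text];
  const [sep, ...rest] = separators.length > 0 ? separators : [""];
  if (sep === "") return Array.from(text); // last resort: individual characters
  const parts = text.split(sep);
  const pieces: string[] = [];
  parts.forEach((part, i) => {
    const piece = i < parts.length - 1 ? part + sep : part;
    if (piece.length === 0) return;
    if (piece.length <= chunkSize) pieces.push(piece);
    else for (const sub of splitRecursive(piece, chunkSize, rest)) pieces.push(sub);
  });
  return pieces;
}

// Greedily pack pieces into chunks; when a chunk is full, keep its trailing
// pieces (at most chunkOverlap characters) as the start of the next chunk.
export function chunkText(text: string, options: ChunkOptions): string[] {
  const { chunkSize, chunkOverlap, separators = DEFAULT_SEPARATORS } = options;
  if (chunkSize < 1 || chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("require chunkSize >= 1 and 0 <= chunkOverlap < chunkSize");
  }
  const chunks: string[] = [];
  let current: string[] = [];
  let currentLen = 0;
  for (const piece of splitRecursive(text, chunkSize, separators)) {
    if (current.length > 0 && currentLen + piece.length > chunkSize) {
      chunks.push(current.join(""));
      // Drop leading pieces until the remainder fits the overlap budget
      // and leaves room for the next piece.
      while (current.length > 0 && (currentLen > chunkOverlap || currentLen + piece.length > chunkSize)) {
        currentLen -= current[0].length;
        current.shift();
      }
    }
    current.push(piece);
    currentLen += piece.length;
  }
  if (current.length > 0) chunks.push(current.join(""));
  return chunks.map((c) => c.trim()).filter((c) => c.length > 0);
}
```

Each chunk's array index can be stored as `chunk_index`. Overlap is only as fine-grained as the pieces: two 600-character paragraphs become two chunks with no shared text. Keeping tables and fenced code intact would need format-aware handling, for example treating a fenced code block as a single piece.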
4. Generate and store embeddings
Generate an embedding for each chunk with OpenAI's text-embedding-3-small model, which returns 1,536-dimensional vectors to match the vector(1536) column. Process chunks in batches of 20 to stay within API rate limits, and store each embedding in the chunk's embedding column.
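Here is a sketch of the batching loop. It assumes the embedding call is wrapped in a hypothetical `EmbedFn`; the comment shows how that function might wrap `embeddings.create` from the openai Node SDK. Batches are sent one after another, so only one request is in flight at a time. Retries and backoff on rate-limit errors are left out. `toPgVector` formats a vector as the `[x,y,z]` text literal that pgvector accepts.

```typescript
// Split an array into consecutive batches of at most `size` items.
export function toBatches<T>(items: T[], size: number): T[][] {
  if (!Number.isInteger(size) || size < 1) throw new Error("size must be a positive integer");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) batches.push(items.slice(i, i + size));
  return batches;
}

// pgvector accepts vectors as text literals such as '[0.1,0.2,0.3]'.
export function toPgVector(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}

// Hypothetical embedder: one vector per input string, in input order. With the
// openai Node SDK it could wrap
//   client.embeddings.create({ model: "text-embedding-3-small", input: texts })
// and return res.data.map((d) => d.embedding).
export type EmbedFn = (texts: string[]) => Promise<number[][]>;

export interface ChunkRow {
  id: number;
  content: string;
}

// Embed chunks batchSize at a time, sequentially, and return
// [chunkId, pgvector literal] pairs ready to be written to the database.
export async function embedChunks(
  chunks: ChunkRow[],
  embed: EmbedFn,
  batchSize = 20,
  dims = 1536,
): Promise<Array<[number, string]>> {
  const results: Array<[number, string]> = [];
  for (const batch of toBatches(chunks, batchSize)) {
    const vectors = await embed(batch.map((c) => c.content));
    if (vectors.length !== batch.length) throw new Error("embedding count mismatch");
    batch.forEach((chunk, i) => {
      if (vectors[i].length !== dims) {
        throw new Error(`expected ${dims} dimensions, got ${vectors[i].length}`);
      }
      results.push([chunk.id, toPgVector(vectors[i])]);
    });
  }
  return results;
}
```

Each returned pair could then be written with a parameterized statement such as `UPDATE chunks SET embedding = $2::vector WHERE id = $1`.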
5. Build the search API
Create a query endpoint that embeds the query with the same model used for the chunks, runs a cosine similarity search via pgvector, optionally adds a BM25 keyword search as a fallback, and returns the top 5 chunks with their source documents and relevance scores.
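Here is a sketch of the search side under some assumptions. The vector query uses pgvector's `<=>` cosine-distance operator, so `1 - distance` is the cosine similarity. The optional keyword results are merged with reciprocal rank fusion (RRF). RRF is swapped in here because the recipe does not say how to combine the two result lists. The keyword ranker itself is left out. Note that Postgres's built-in `ts_rank` is not BM25, so true BM25 needs an extension or an external index. Names such as `vectorSearchSql` and `reciprocalRankFusion` are hypothetical.

```typescript
export interface ScoredChunk {
  chunkId: number;
  documentTitle: string;
  content: string;
  score: number;
}

// Vector search: $1 is the query embedding as a pgvector literal, $2 the limit.
// Ordering by the <=> expression lets the HNSW cosine index serve the query.
export const vectorSearchSql = `
SELECT c.id AS chunk_id, d.title AS document_title, c.content,
       1 - (c.embedding <=> $1::vector) AS score
FROM chunks c
JOIN documents d ON d.id = c.document_id
ORDER BY c.embedding <=> $1::vector
LIMIT $2`;

// Reciprocal rank fusion: every ranked list adds 1 / (k + rank) for each chunk it
// contains (rank starting at 1); k = 60 is a commonly used default.
export function reciprocalRankFusion(lists: ScoredChunk[][], topK = 5, k = 60): ScoredChunk[] {
  const fused = new Map<number, ScoredChunk>();
  for (const list of lists) {
    list.forEach((chunk, index) => {
      const contribution = 1 / (k + index + 1);
      const existing = fused.get(chunk.chunkId);
      if (existing) existing.score += contribution;
      else fused.set(chunk.chunkId, { ...chunk, score: contribution });
    });
  }
  return Array.from(fused.values())
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```

RRF replaces the cosine scores with fused scores. If the API should still report cosine relevance, keep the scores from the vector query alongside the fused ranking.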
6. Implement answer generation
Use an OpenAI model to generate an answer from the retrieved chunks. The prompt includes the question, the retrieved chunks labeled with their source citations, and instructions to cite those sources. Return the answer together with its references.
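Here is a sketch of the prompt assembly. It assumes the OpenAI chat format: a `messages` array of `{ role, content }` objects, as accepted by `chat.completions.create` in the openai Node SDK. Chunks are numbered so the model can cite them as [1], [2], and so on. The instruction wording and the helper names are illustrative, not the recipe's own prompt.

```typescript
export interface RetrievedChunk {
  documentTitle: string;
  content: string;
}

export interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Number each chunk so the model can cite it, and return the matching reference list.
export function buildAnswerPrompt(
  question: string,
  chunks: RetrievedChunk[],
): { messages: ChatMessage[]; references: string[] } {
  const references = chunks.map((c, i) => `[${i + 1}] ${c.documentTitle}`);
  const context = chunks
    .map((c, i) => `[${i + 1}] (source: ${c.documentTitle})\n${c.content}`)
    .join("\n\n");
  const system = [
    "Answer the question using only the numbered sources provided.",
    "Cite sources inline by number, e.g. [1] or [2][3].",
    "If the sources do not contain the answer, say so instead of guessing.",
  ].join(" ");
  return {
    messages: [
      { role: "system", content: system },
      { role: "user", content: `Sources:\n${context}\n\nQuestion: ${question}` },
    ],
    references,
  };
}

// Keep only the references that the generated answer actually cites.
export function citedReferences(answer: string, references: string[]): string[] {
  const cited = new Set<number>();
  const pattern = /\[(\d+)\]/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) cited.add(Number(match[1]));
  return references.filter((_, i) => cited.has(i + 1));
}
```

Passing the answer through `citedReferences` keeps only the sources the model actually cited. Those are what the collapsible citations in step 7 would display.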
7. Build the chat UI and admin panel
Create a chat interface with message history and source references displayed as collapsible citations, plus an admin panel for uploading and managing documents and viewing their ingestion status.
Common Failure Modes
- Poor chunking strategy for different document types
- Embedding costs at scale
- Retrieving irrelevant chunks
- Hallucination from weak retrieval
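For the embedding-cost failure mode, a back-of-the-envelope estimate is often enough. This sketch assumes roughly 4 characters per token, a common rule of thumb for English; a tokenizer such as tiktoken gives exact counts. It takes the per-million-token price as an input, since pricing changes. It also accounts for overlap. With 1,000-character chunks and 200 characters of overlap, each chunk adds 800 new characters, so about 1,000 / 800 = 1.25 times the corpus gets embedded. The function name is hypothetical.

```typescript
// Rough embedding-cost estimate for a corpus (all inputs are assumptions you supply).
export function estimateEmbeddingCost(
  corpusChars: number,
  pricePerMillionTokens: number,
  chunkSize = 1000,
  chunkOverlap = 200,
  charsPerToken = 4,
): { embeddedChars: number; tokens: number; usd: number } {
  // Each chunk advances by (chunkSize - chunkOverlap) new characters, so overlap
  // inflates the embedded text by chunkSize / (chunkSize - chunkOverlap).
  const embeddedChars = corpusChars * (chunkSize / (chunkSize - chunkOverlap));
  const tokens = embeddedChars / charsPerToken;
  return { embeddedChars, tokens, usd: (tokens / 1e6) * pricePerMillionTokens };
}
```

For example, an 8,000,000-character corpus comes out at 10,000,000 embedded characters and about 2,500,000 tokens; multiply by the current price per million tokens. Re-ingesting changed documents repeats the cost for their chunks.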
Implementation Notes
Start with 10-20 documents to tune chunking. Monitor embedding API costs. Test retrieval quality before building the UI.
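To test retrieval quality before the UI exists, a small labeled set of questions is enough for a first check. Each question is paired with the chunk ids that should answer it. This sketch computes hit rate@k: the share of questions where at least one expected chunk appears in the top k results. `SearchFn`, `EvalCase`, and `hitRateAtK` are hypothetical names, and the search function would wrap the endpoint from step 5.

```typescript
export interface EvalCase {
  question: string;
  expectedChunkIds: number[];
}

// Hypothetical wrapper around the search endpoint: returns ranked chunk ids.
export type SearchFn = (question: string, k: number) => Promise<number[]>;

// Share of questions with at least one expected chunk in the top k results,
// plus the questions that missed, for manual inspection.
export async function hitRateAtK(
  cases: EvalCase[],
  search: SearchFn,
  k = 5,
): Promise<{ hitRate: number; misses: string[] }> {
  const misses: string[] = [];
  for (const c of cases) {
    const retrieved = new Set((await search(c.question, k)).slice(0, k));
    if (!c.expectedChunkIds.some((id) => retrieved.has(id))) misses.push(c.question);
  }
  const hitRate = cases.length === 0 ? 0 : (cases.length - misses.length) / cases.length;
  return { hitRate, misses };
}
```

The `misses` list is a natural place to start when tuning chunk size or deciding whether to add keyword search.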
Want RAG over company documents running in your business?
4M Labs can deploy RAG over company documents as a production workflow:
- Connected to your tools and data sources
- Secured for your team with proper access controls
- Deployed with monitoring and error handling
- Documented for handoff and future maintenance