RAG Over Company Documents: AI Recipe

What You Get

-Document ingestion pipeline
-Smart chunking with overlap strategy
-Vector embedding and indexing
-Semantic search with hybrid retrieval
-Cited answer generation

Step by Step

1. Set up PostgreSQL with pgvector

Install pgvector extension. Create tables: documents (id, title, file_type, created_at), chunks (id, document_id, content, chunk_index, embedding vector(1536)), and set up a HNSW index on the embedding column for fast similarity search.

2. Build the document ingestion pipeline

Accept uploads of PDF, DOCX, TXT, and Markdown files. Extract text using appropriate parsers (pdf-parse for PDF, mammoth for DOCX). Generate a unique document ID and store metadata.

3. Implement smart chunking

Use recursive character text splitting: chunk size of 1000 characters with 200-character overlap. For each chunk, store the document_id, chunk_index, and content. Handle edge cases: tables, code blocks, headers.

4. Generate and store embeddings

For each chunk, generate an embedding using text-embedding-3-small (1536 dimensions). Batch process chunks (20 at a time) to respect API rate limits. Store embeddings in pgvector.

5. Build the search API

Create a query endpoint that: generates an embedding for the query, performs cosine similarity search via pgvector, optionally adds keyword BM25 fallback, and returns top 5 chunks with document source and relevance scores.

6. Implement answer generation

Use OpenAI to generate answers from retrieved chunks. Prompt includes: the question, the retrieved chunks with source citations, and instructions to cite sources. Return answer with references.

7. Build the chat UI and admin panel

Create a chat interface with message history, source references displayed as collapsible citations, and an admin panel to upload/manage documents and view ingestion status.

Stack

PostgreSQL + pgvectorOpenAI embeddingsOpenAINext.jsLangChain or custom pipeline

Build This

Copy this prompt and paste it into Claude Code, OpenCode, Codex, or Cursor to build this recipe.

Build me a RAG system over company documents. It should: 1) Accept uploads of PDF, DOCX, TXT, and Markdown files. 2) Chunk documents using recursive character splitting with 1000-character chunks and 200-character overlap. 3) Generate embeddings using text-embedding-3-small and store in PostgreSQL with pgvector. 4) On query, perform hybrid search (semantic similarity + keyword BM25) and retrieve top 5 relevant chunks. 5) Generate an answer using OpenAI that cites the source document and chunk position. 6) Include a chat UI with source references displayed alongside answers. 7) Add an admin panel to manage documents and view ingestion status.

Common Failure Modes

!Poor chunking strategy for different document types
!Embedding costs at scale
!Retrieving irrelevant chunks
!Hallucination from weak retrieval

Implementation Notes

Start with 10-20 documents to tune chunking. Monitor embedding API costs. Test retrieval quality before building the UI.

Related skill: rag document ingestion

Want rag over company documents running in your business?

4M Labs can deploy rag over company documents as a production workflow:

Connected to your tools and data sources
Secured for your team with proper access controls
Deployed with monitoring and error handling
Documented for handoff and future maintenance

Book an Implementation Sprint