---
name: rag-document-ingestion
description: Gives an agent the ability to ingest documents, chunk them, generate embeddings, and build a searchable knowledge base.
inputs:
  - documents: Files (PDF, DOCX, TXT, MD) or URLs to process
  - chunk_strategy: Chunk size and overlap configuration
  - embedding_model: Which embedding model to use
outputs:
  - vector_index: Searchable embeddings in pgvector
  - chunk_summary: Overview of processed chunks and their sources
  - stats: Document count, chunk count, storage used
tools:
  - pgvector: Vector storage and similarity search
  - openai_embeddings: Text embedding generation
  - pdf_parser: Document text extraction
  - postgresql: Metadata and chunk storage
safety:
  - Review documents for PII before ingestion
  - Set access controls on the knowledge base
  - Monitor embedding API costs
  - Do not ingest sensitive credentials or secrets
---

# RAG Document Ingestion Skill

Ingest documents, chunk them, generate embeddings, and build a searchable knowledge base for RAG applications.

## When to Use

- You have company docs you want to make searchable
- You want to build a Q&A system over your knowledge base
- You need cited answers from internal documentation
- You're setting up a RAG pipeline

## How It Works

1. **Upload**: Accept PDF, DOCX, TXT, and Markdown files, or URLs to fetch
2. **Extract**: Parse text content from each document
3. **Chunk**: Split into overlapping chunks (1000 characters each, 200-character overlap)
4. **Embed**: Generate vector embeddings for each chunk
5. **Index**: Store in pgvector with an HNSW index for fast search
6. **Verify**: Test retrieval quality with sample queries
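
As a rough illustration of the Index step, the sketch below builds the SQL that a pgvector setup might use. The table name `doc_chunks`, its columns, and the 1536-dimension default are assumptions made for this example, not part of the skill's definition; the dimension has to match whichever embedding model is configured. The `%s` placeholders assume a psycopg-style driver, and the statements are only returned as strings here, not executed.

```python
from typing import List, Sequence

TABLE = "doc_chunks"  # hypothetical table name, chosen for this sketch


def schema_statements(dim: int = 1536) -> List[str]:
    """Return SQL that creates the chunk table and an HNSW index.

    `dim` is an assumption: it must equal the embedding model's output size.
    """
    if dim <= 0:
        raise ValueError("dim must be a positive integer")
    return [
        # Enable the pgvector extension (no-op if already installed).
        "CREATE EXTENSION IF NOT EXISTS vector;",
        # One row per chunk: source document, chunk text, and its embedding.
        f"CREATE TABLE IF NOT EXISTS {TABLE} ("
        "id bigserial PRIMARY KEY, "
        "source text NOT NULL, "
        "content text NOT NULL, "
        f"embedding vector({dim}) NOT NULL);",
        # HNSW index using cosine distance, matching the <=> operator below.
        f"CREATE INDEX IF NOT EXISTS {TABLE}_embedding_hnsw "
        f"ON {TABLE} USING hnsw (embedding vector_cosine_ops);",
    ]


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format a list of floats as pgvector's text form, e.g. '[0.5,1.0]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


# Nearest-neighbour query by cosine distance (<=>).
# Parameters, in order: query vector, query vector again, result limit.
SEARCH_SQL = (
    f"SELECT id, source, content, embedding <=> %s::vector AS distance "
    f"FROM {TABLE} ORDER BY embedding <=> %s::vector LIMIT %s;"
)
```

With a live connection, each string from `schema_statements()` would be executed once, and `SEARCH_SQL` would be run with `(to_vector_literal(q), to_vector_literal(q), k)` as parameters.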

## Chunk Strategy

- Size: 1000 characters per chunk
- Overlap: 200 characters between chunks
- Split on: Paragraph breaks first, then sentence boundaries, then raw character offsets as a last resort
- Preserve: Headers and section context
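
The strategy above can be sketched as a small sliding-window splitter. This is a minimal illustration, not the skill's actual implementation: the function names are hypothetical, sentence boundaries are approximated by a period followed by a space, and header preservation is not covered.

```python
from typing import List

# Preferred cut points, in priority order: paragraph break, then sentence end.
# ". " is a simplifying assumption for "sentence boundary".
SEPARATORS = ("\n\n", ". ")


def _best_cut(text: str, lo: int, hi: int) -> int:
    """Return the latest separator-aligned cut in text[lo:hi], else hi."""
    for sep in SEPARATORS:
        pos = text.rfind(sep, lo, hi)
        if pos != -1:
            return pos + len(sep)  # cut just after the separator
    return hi  # no natural boundary found: fall back to a hard character cut


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> List[str]:
    """Split text into chunks of at most `size` chars sharing `overlap` chars."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    chunks: List[str] = []
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        if end < len(text):
            # Only search past the overlap region so every step makes progress.
            end = _best_cut(text, start + overlap + 1, end)
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # next chunk re-reads the last `overlap` chars
    return chunks
```

Each chunk after the first begins with the final 200 characters of the previous one, so a sentence that straddles a cut still appears whole in at least one chunk when it is shorter than the overlap.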

## Example Prompt

"Ingest these 10 PDF documents into a RAG knowledge base. Chunk them with 1000-char chunks and 200-char overlap. Generate embeddings and store in pgvector. Then test retrieval with 3 sample queries."

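The retrieval test requested at the end of the prompt could be scored along the following lines. This is a hedged sketch using an in-memory index and plain cosine similarity rather than pgvector; the function names and the idea of pairing each sample query with one expected chunk ID are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Vector, index: Dict[str, Vector], k: int = 3) -> List[str]:
    """Return the IDs of the k chunks most similar to the query vector."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in ranked[:k]]


def hit_rate(
    cases: List[Tuple[Vector, str]], index: Dict[str, Vector], k: int = 3
) -> float:
    """Fraction of sample queries whose expected chunk appears in the top k."""
    if not cases:
        raise ValueError("at least one sample query is required")
    hits = sum(1 for query, expected in cases if expected in top_k(query, index, k))
    return hits / len(cases)
```

In practice the query vectors would come from the same embedding model used at ingestion, and a low hit rate on the sample queries would suggest revisiting chunk size or overlap.
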
## Related

- Recipe: /recipes/rag-company-docs
