Upload your policy documents, compliance manuals, or product guides. Ask questions in plain English and get accurate, cited answers with minimal risk of hallucination, because every answer is grounded in your actual documents.
Check that Python 3 is installed: python3 --version.
Get one at platform.openai.com/api-keys. We use OpenAI for embeddings (text-embedding-3-small is cheap and excellent) and GPT-4o for answers. Alternatively, you can use Anthropic for answers and a local embedding model; swapping providers only takes a few lines.
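If you want to try the alternative setup, a possible configuration looks like the sketch below. It assumes you have installed the llama-index-llms-anthropic and llama-index-embeddings-huggingface integration packages and set ANTHROPIC_API_KEY; the model names are examples, not requirements.

```python
# Sketch: Anthropic for answers, a local HuggingFace model for embeddings.
# Requires: pip install llama-index-llms-anthropic llama-index-embeddings-huggingface
import os
from llama_index.core import Settings
from llama_index.llms.anthropic import Anthropic
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# Local embedding model — runs on your machine, no embedding API cost
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Anthropic model for answer generation (reads ANTHROPIC_API_KEY)
Settings.llm = Anthropic(
    model="claude-3-5-sonnet-latest",
    api_key=os.getenv("ANTHROPIC_API_KEY"),
)
```

Because both ingest.py and agent.py configure models through the shared Settings object, this is the only part you need to change in each file.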
Gather the PDFs, Word docs, or text files you want the agent to know about. Put them in a folder called documents/. This example works with PDF and plain text. For Word documents, pip install python-docx and add a loader for it.
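python-docx is the convenient route for Word files, but to show what such a loader boils down to, here is a dependency-free sketch: a .docx file is just a zip archive containing word/document.xml, so the standard library can extract the paragraph text. The function name is my own; wrap its output in a llama_index Document before indexing.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def load_docx_text(path: str) -> str:
    """Extract paragraph text from a .docx file (a zip of XML parts)."""
    with zipfile.ZipFile(path) as zf:
        xml_bytes = zf.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{W_NS}p"):           # each w:p is a paragraph
        text = "".join(node.text or "" for node in para.iter(f"{W_NS}t"))
        if text.strip():
            paragraphs.append(text)
    return "\n".join(paragraphs)
```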
mkdir doc-qa-agent && cd doc-qa-agent
python3 -m venv venv
source venv/bin/activate
pip install openai llama-index llama-index-embeddings-openai \
llama-index-llms-openai pypdf python-dotenv
# Create the documents directory and add your files
mkdir documents
# Copy your PDFs or text files into the documents/ folder
OPENAI_API_KEY=sk-...your-key...
import os
from dotenv import load_dotenv
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
load_dotenv()
# Configure embedding model and LLM
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_key=os.getenv("OPENAI_API_KEY")
)
Settings.llm = OpenAI(
    model="gpt-4o",
    api_key=os.getenv("OPENAI_API_KEY")
)
print("Loading documents from ./documents/ ...")
documents = SimpleDirectoryReader(
    input_dir="./documents",
    recursive=True,
    required_exts=[".pdf", ".txt", ".md"]
).load_data()
print(f"Loaded {len(documents)} document chunks")
print("Building vector index (this may take a minute)...")
index = VectorStoreIndex.from_documents(
    documents,
    show_progress=True
)
# Save the index to disk so we don't rebuild every time
index.storage_context.persist(persist_dir="./index_storage")
print("✓ Index saved to ./index_storage")
print("\nDocuments indexed:")
sources = set(d.metadata.get('file_name', 'unknown') for d in documents)
for s in sorted(sources):
    print(f"  - {s}")
Run it: python ingest.py. For 10–20 PDF pages, this takes about 30 seconds and costs a few cents in OpenAI embedding API calls. Re-run whenever you add or update documents.
import os
from dotenv import load_dotenv
from llama_index.core import StorageContext, load_index_from_storage, Settings
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.retrievers import VectorIndexRetriever
from llama_index.core.response_synthesizers import get_response_synthesizer
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
load_dotenv()
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    api_key=os.getenv("OPENAI_API_KEY")
)
Settings.llm = OpenAI(
    model="gpt-4o",
    api_key=os.getenv("OPENAI_API_KEY"),
    system_prompt="""You are a helpful document assistant. Answer questions
based ONLY on the provided context. If the answer is not in the context,
say "I couldn't find information about this in the documents."
Always cite the source document and section when answering.
Format citations as: [Source: filename, page X]"""
)
# Load the pre-built index from disk
print("Loading document index...")
storage_context = StorageContext.from_defaults(persist_dir="./index_storage")
index = load_index_from_storage(storage_context)
# Configure retriever — top_k=5 means retrieve 5 most relevant chunks
retriever = VectorIndexRetriever(index=index, similarity_top_k=5)
# Response synthesizer formats and cites the answer
response_synthesizer = get_response_synthesizer(
    response_mode="compact",
    verbose=False
)
query_engine = RetrieverQueryEngine(
    retriever=retriever,
    response_synthesizer=response_synthesizer,
)
def ask(question: str) -> str:
    """Query the document index and return an answer with citations."""
    response = query_engine.query(question)
    # Extract source citations
    sources = []
    if hasattr(response, 'source_nodes'):
        for node in response.source_nodes:
            filename = node.metadata.get('file_name', 'unknown')
            page = node.metadata.get('page_label', node.metadata.get('page', '?'))
            score = round(node.score, 3) if node.score else None
            source = f"{filename} (page {page})"
            if score:
                source += f" [relevance: {score}]"
            if source not in sources:
                sources.append(source)
    answer = str(response)
    if sources:
        answer += f"\n\n📎 Sources: {' | '.join(sources)}"
    return answer
def main():
    print("\n" + "=" * 60)
    print("Document Q&A Agent — Ready")
    print("=" * 60)
    print("Ask questions about your documents.")
    print("Type 'quit' to exit.\n")
    while True:
        question = input("Question: ").strip()
        if question.lower() in ('quit', 'exit', 'q'):
            break
        if not question:
            continue
        print("\nSearching documents...")
        answer = ask(question)
        print(f"\nAnswer: {answer}\n")
        print("-" * 40 + "\n")

if __name__ == "__main__":
    main()
# Step 1: Ingest documents (run once)
python ingest.py
# Step 2: Start the Q&A agent
python agent.py
Example interaction with an insurance policy document:
Question: What is the deductible for collision coverage?
Searching documents...
Answer: According to the policy documents, the collision coverage
deductible is $500 for standard auto policies. However, if you
selected the "low deductible" option at enrollment, it may be
reduced to $250.
📎 Sources: auto-policy-2024.pdf (page 12) [relevance: 0.891] |
coverage-summary.pdf (page 3) [relevance: 0.743]
----------------------------------------
Question: Does the policy cover rental car costs after an accident?
Searching documents...
Answer: Yes, rental reimbursement coverage is included if you
added the "Transportation Expense" endorsement. The policy covers
up to $30/day and $900 total per claim while your vehicle is being
repaired.
📎 Sources: auto-policy-2024.pdf (page 18) [relevance: 0.912]
----------------------------------------
Question: What is the maximum age to purchase a life insurance policy?
Searching documents...
Answer: I couldn't find information about this in the documents.
Your documents appear to cover auto and home insurance — this
question may be about a product not covered in the indexed files.
Pro tip: The last example shows the agent correctly saying "I don't know" instead of hallucinating. This is the key benefit of RAG — the agent only answers from what's actually in your documents.
Run pip install streamlit and create an app.py with a simple chat UI. Run with streamlit run app.py and share the URL with your team. It takes about 20 lines of code.
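One possible shape for that app.py, reusing the ask() function from agent.py (the import assumes both files live in the same folder; agent.py already guards its REPL loop behind if __name__ == "__main__", so importing it only loads the index):

```python
# app.py — minimal Streamlit chat UI around the existing query pipeline.
import streamlit as st
from agent import ask  # assumes agent.py is importable from the same directory

st.title("Document Q&A Agent")

# Keep the conversation across Streamlit's script reruns
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay prior turns
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

# Handle a new question
if question := st.chat_input("Ask about your documents..."):
    st.session_state.messages.append({"role": "user", "content": question})
    with st.chat_message("user"):
        st.markdown(question)
    with st.spinner("Searching documents..."):
        answer = ask(question)
    st.session_state.messages.append({"role": "assistant", "content": answer})
    with st.chat_message("assistant"):
        st.markdown(answer)
```

This is a sketch, not a hardened app: each user session reloads nothing (the index loads once at import), but there is no authentication or per-user history.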
Add a file watcher using watchdog that re-runs ingest.py automatically when any file in the documents/ folder is added, changed, or deleted.
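watchdog gives you real OS-level file events and is the right tool for production. Purely to illustrate the idea, here is a dependency-free polling sketch that compares modification-time snapshots and fires a callback; in practice the callback would be something like subprocess.run(["python", "ingest.py"]). The function names and the max_polls escape hatch are my own.

```python
import os
import time
from typing import Callable, Optional

def snapshot(folder: str) -> dict:
    """Map every file under `folder` to its last-modified time."""
    return {
        os.path.join(root, name): os.path.getmtime(os.path.join(root, name))
        for root, _, names in os.walk(folder)
        for name in names
    }

def watch(folder: str, on_change: Callable[[], None],
          interval: float = 2.0, max_polls: Optional[int] = None) -> None:
    """Poll `folder`; call `on_change` when files are added, modified, or deleted."""
    before = snapshot(folder)
    polls = 0
    while max_polls is None or polls < max_polls:
        time.sleep(interval)
        after = snapshot(folder)
        if after != before:
            on_change()  # e.g. subprocess.run(["python", "ingest.py"])
            before = after
        polls += 1
```

Polling wastes a little CPU and can miss sub-interval churn, which is exactly what watchdog's inotify/FSEvents backends avoid.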
Tag each document with metadata (e.g. "product: home-insurance", "year: 2024") and add filters to the retriever so users can ask "according to the 2024 policy..." and only search the relevant subset.
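A sketch of how that could look with LlamaIndex's metadata filters; the tag_file convention (inferring tags from filenames) is a placeholder for whatever scheme fits your documents:

```python
# Sketch: attach metadata at ingest time, then filter at query time.
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.vector_stores import MetadataFilters, ExactMatchFilter

def tag_file(path: str) -> dict:
    # Hypothetical convention: infer tags from the file path
    return {
        "product": "home-insurance" if "home" in path else "auto-insurance",
        "year": "2024" if "2024" in path else "unknown",
    }

# file_metadata lets SimpleDirectoryReader tag each file as it loads
documents = SimpleDirectoryReader("./documents", file_metadata=tag_file).load_data()
index = VectorStoreIndex.from_documents(documents)

# Only search chunks tagged year=2024
retriever = index.as_retriever(
    similarity_top_k=5,
    filters=MetadataFilters(filters=[ExactMatchFilter(key="year", value="2024")]),
)
```

You would then map phrases like "according to the 2024 policy" to the appropriate filter before querying, either with simple keyword rules or by letting the LLM pick the filter.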
Replace the basic query engine with LlamaIndex's CondensePlusContextChatEngine to support follow-up questions in a conversation: "What's the deductible?" → "How do I lower it?" (it remembers you were asking about deductibles).
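With the index already loaded, the switch is roughly the following sketch, using the as_chat_engine shortcut (exact keyword support may vary across LlamaIndex versions):

```python
# Sketch: conversational engine that condenses follow-ups into standalone queries
# against the chat history before retrieving.
chat_engine = index.as_chat_engine(
    chat_mode="condense_plus_context",
    similarity_top_k=5,
)

print(chat_engine.chat("What's the deductible for collision coverage?"))
print(chat_engine.chat("How do I lower it?"))  # resolved against the previous turn
```

Under the hood, "How do I lower it?" is rewritten into something like "How do I lower the collision coverage deductible?" before retrieval, so the vector search still gets a self-contained query.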