
Agentic AI – LLM with RAG – Beginner Bootcamp
Welcome to our Beginner Bootcamp on Agentic AI! In this post, learn to Build Your Own Document-Reading AI Assistant — an AI tool that can read and understand documents like PDFs, Word files, Excel sheets, HTML pages, JSON files, and more using OpenAI’s powerful LLMs paired with a cutting-edge method called RAG (Retrieval-Augmented Generation).
New to Agentic AI?
Start with our Agentic AI: The Dawn of Proactive AI Assistants before diving into this hands-on guide.
Want to go deeper with multi-agent systems? Check out our advanced tutorial on Beginner Bootcamp: Create Agentic AI Apps with Python.
🛠️ GitHub Repository: 👉 View the Code on GitHub
( https://github.com/debabratapruseth/AI-agent-with-LLM-and-RAG )
Why Learn RAG?
In the real world, organizations store their most valuable knowledge in documents — policy handbooks, manuals, technical reports, HR guidelines, etc. Standard language models like GPT-4 can’t access this data unless we feed it to them explicitly.
That’s where RAG comes in. RAG = Retrieval + LLM Generation
It retrieves the relevant content from your data, then feeds it to the LLM (like GPT-4) so it can answer your questions — accurately and in context.
What You Will Learn
By the end of this bootcamp, you’ll be able to:
- Load content from various document types (PDF, DOCX, XLSX, HTML, JSON, CSV).
- Split content into manageable chunks for efficient processing.
- Generate vector embeddings using OpenAI.
- Store those embeddings in a vector database like FAISS.
- Use GPT-4 to generate accurate answers based on retrieved data.
How It Works

- Load: Choose the right loader based on your file type.
- Chunk: Break the text into smaller pieces (because LLMs have limited memory).
- Embed: Convert text chunks into vector representations.
- Store: Save those vectors in a searchable vector database.
- Query: When you ask a question, retrieve the most relevant chunks.
- Answer: Feed those chunks to GPT-4 for a final, context-aware response.
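The six steps above can be sketched end to end in plain Python. This is a toy illustration only: it fakes “embeddings” with bag-of-words word counts and stops just before the LLM call (the notebook uses real OpenAI embeddings, FAISS, and GPT-4 instead).

```python
import math
from collections import Counter

# 1. Load — here the "document" is just a string.
document = (
    "Employees accrue 1.5 vacation days per month. "
    "Unused vacation days expire at the end of the calendar year. "
    "Sick leave requires a doctor's note after three consecutive days."
)

# 2. Chunk — split on sentence boundaries (real pipelines use a text splitter).
chunks = [s.strip().rstrip(".") + "." for s in document.split(". ") if s.strip()]

# 3. Embed — toy bag-of-words vectors instead of real OpenAI embeddings.
def embed(text):
    return Counter(text.lower().split())

# 4. Store — a plain list stands in for the vector database.
store = [(chunk, embed(chunk)) for chunk in chunks]

# 5. Query — rank every chunk by cosine similarity to the question.
def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(question, k=1):
    q = embed(question)
    return [c for c, _ in sorted(store, key=lambda cv: -cosine(q, cv[1]))[:k]]

# 6. Answer — assemble the prompt that GPT-4 would receive.
question = "How many vacation days do I get?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

Swap the toy `embed` for OpenAI embeddings and send `prompt` to GPT-4, and you have the real pipeline.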
Choosing the Right Document Loader
| Content Type | Recommended Loader | Why Use It? |
|---|---|---|
| Simple PDFs | PyPDFLoader | Fast, clean text extraction |
| PDFs with tables | PDFPlumberLoader | Can extract table content |
| Scanned/image-based PDFs | UnstructuredPDFLoader | Supports OCR + layout parsing |
| Word / Excel | UnstructuredWordDocumentLoader, UnstructuredExcelLoader | Preserves formatting and structure |
| CSV / JSON | PandasCSVLoader, JSONLoader | Great for structured and tabular data |
| Web / HTML | WebBaseLoader, UnstructuredHTMLLoader | Useful for scraping public pages |
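Picking the loader can be automated by file extension. The helper below is a hypothetical convenience (it is not part of LangChain) that just returns the recommended loader *name* from the table above, so it runs without any installs:

```python
import os

# Hypothetical helper: map a file extension to the loader recommended
# in the table above. Returns the class name as a string.
LOADER_BY_EXTENSION = {
    ".pdf": "PyPDFLoader",   # switch to PDFPlumberLoader for table-heavy PDFs
    ".docx": "UnstructuredWordDocumentLoader",
    ".xlsx": "UnstructuredExcelLoader",
    ".csv": "PandasCSVLoader",
    ".json": "JSONLoader",
    ".html": "UnstructuredHTMLLoader",
}

def pick_loader(path: str) -> str:
    ext = os.path.splitext(path)[1].lower()
    return LOADER_BY_EXTENSION.get(ext, "UnstructuredFileLoader")
```

For example, `pick_loader("policy.pdf")` returns `"PyPDFLoader"`; unknown extensions fall back to a generic loader.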
Why Chunking Matters
LLMs like GPT-4 can only process a limited number of tokens at once (e.g., 8K–32K tokens). To get around this, we split documents into smaller parts:
Chunking Benefits:
- Enables semantic search
- Improves accuracy of answers
- Supports large documents
Common Splitters (Chunking Tools):
| Splitter Class | Description | Ideal For |
|---|---|---|
| RecursiveCharacterTextSplitter | Splits by paragraph → sentence → word | Most general-purpose docs (PDFs, books) |
| TokenTextSplitter | Splits by token count | Fine control for GPT-3.5/4 |
| MarkdownHeaderTextSplitter | Splits based on Markdown headers | Blog posts, tech docs |
| HTMLHeaderTextSplitter | Splits based on HTML tags | Web scraping |
| Language splitters | Splits by function/class in code | Source code, Jupyter notebooks |
Chunk Size Best Practices:
chunk_size # Number of characters per chunk
chunk_overlap # Characters shared between sequential chunks. Overlap helps maintain context across chunk boundaries
| Document Type | Chunk Size | Chunk Overlap |
|---|---|---|
| Emails, Chats | 300–500 characters | ~50–100 |
| Policies, Manuals | 500–1000 characters | ~100–200 |
| Scientific Papers | 1000–1500 characters | ~150–200 |
| Source Code (Python) | ~30 lines | ~10 lines |
| For GPT-4 Turbo (128k) | ≤ 3000 tokens | ~200 tokens |
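To make `chunk_size` and `chunk_overlap` concrete, here is a minimal character-based splitter in plain Python. The notebook uses LangChain’s `RecursiveCharacterTextSplitter`, which does the same thing but prefers paragraph and sentence boundaries over hard cuts:

```python
def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 100):
    """Slide a fixed-size window over the text, stepping by chunk_size - chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

# 1200 characters with step 500 - 100 = 400 -> windows start at 0, 400, 800.
chunks = split_text("a" * 1200, chunk_size=500, chunk_overlap=100)
```

The last 100 characters of each chunk repeat as the first 100 of the next, which is exactly the overlap that keeps context from being cut mid-thought.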
Vector Stores Explained
Once you have text embeddings, you’ll need a vector store to save and search them.
What is a Vector Store?
- Stores numerical representations (embeddings) of your text
- Allows semantic search
- Returns relevant content for any user question
Vector Store Comparison:
| Name | Type | Open Source | Use Case | Notes |
|---|---|---|---|---|
| FAISS | In-memory | ✅ Yes | Local prototyping | Fast, lightweight |
| Chroma | Embedded | ✅ Yes | Small-scale RAG systems | LangChain default |
| Pinecone | Cloud-hosted | ❌ No | Production RAG at scale | High performance, enterprise ready |
| Qdrant | Cloud-native | ✅ Yes | Filtering + metadata + hybrid | Excellent for structured docs |
| Weaviate | Cloud-native | ✅ Yes | NLP + hybrid search | GraphQL API, advanced filters |
| Milvus | On-prem/cloud | ✅ Yes | AI for image/video/audio | Supports GPU acceleration |
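Underneath every product in the table sits the same core idea: store vectors, then return the ones nearest to a query vector. A minimal in-memory version of that idea (FAISS and the others do the same job with far better indexing and speed):

```python
import math

class TinyVectorStore:
    """Toy in-memory vector store: keeps (text, vector) pairs,
    searches by cosine similarity. For illustration only."""

    def __init__(self):
        self._items = []

    def add(self, text, vector):
        self._items.append((text, vector))

    def search(self, query, k=3):
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0
        ranked = sorted(self._items, key=lambda item: -cosine(query, item[1]))
        return [text for text, _ in ranked[:k]]

# Toy 2-d "embeddings" stand in for OpenAI's high-dimensional vectors.
store = TinyVectorStore()
store.add("vacation policy", [1.0, 0.0])
store.add("sick leave policy", [0.0, 1.0])
store.add("holiday schedule", [0.9, 0.1])
```

A query vector close to `[1.0, 0.0]` returns the vacation and holiday texts first; that nearest-neighbor lookup is what “semantic search” means in practice.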
Educational Goals for Students
By completing this project, you will:
- Understand how LLMs can work with private data
- Build a working pipeline from document → vector → answer
- Learn to choose the right tools for the job
- Practice Python + AI in an applied real-world scenario
- Gain confidence to build your own AI-powered Q&A system
What’s Next?
Once you’ve mastered the basics:
- Add a UI using Gradio or Streamlit
- Support live web pages or email parsing
- Store your FAISS vector DB to disk for persistence
- Scale using Pinecone or Qdrant
Ready to Build Your First Agent?
Open Google Colab (or any Python environment of your choice). Upload the notebook from the GitHub repo. Run the code and see the agent in action. Use any AI assistant (e.g., Gemini, Copilot, ChatGPT) for real-time debugging or customizations.