Agentic AI – LLM with RAG – Beginner Bootcamp

Welcome to our Beginner Bootcamp on Agentic AI! In this post, learn to Build Your Own Document-Reading AI Assistant — an AI tool that can read and understand documents like PDFs, Word files, Excel sheets, HTML pages, JSON files, and more using OpenAI’s powerful LLMs paired with a cutting-edge method called RAG (Retrieval-Augmented Generation).

New to Agentic AI?
Start with our Agentic AI: The Dawn of Proactive AI Assistants before diving into this hands-on guide.
Want to go deeper with multi-agent systems? Check out our advanced tutorial on Beginner Bootcamp: Create Agentic AI Apps with Python.

Why Learn RAG?

In the real world, organizations store their most valuable knowledge in documents — policy handbooks, manuals, technical reports, HR guidelines, etc. Standard language models like GPT-4 can’t access this data unless we feed it to them explicitly.

That’s where RAG comes in. RAG = Retrieval + LLM Generation
It retrieves the relevant content from your data, then feeds it to the LLM (like GPT-4) so it can answer your questions — accurately and in context.

What You Will Learn

By the end of this bootcamp, you’ll be able to:

  1. Load content from various document types (PDF, DOCX, XLSX, HTML, JSON, CSV).
  2. Split content into manageable chunks for efficient processing.
  3. Generate vector embeddings using OpenAI.
  4. Store those embeddings in a vector database like FAISS.
  5. Use GPT-4 to generate accurate answers based on retrieved data.

How It Works

[Figure: RAG flow diagram. Source: https://miro.medium.com/v2/resize:fit:1400/1*YLrQl5CM7NjQPcfTCrf-sQ.png]
  1. Load: Choose the right loader based on your file type.
  2. Chunk: Break the text into smaller pieces (because LLMs have limited memory).
  3. Embed: Convert text chunks into vector representations.
  4. Store: Save those vectors in a searchable vector database.
  5. Query: When you ask a question, retrieve the most relevant chunks.
  6. Answer: Feed those chunks to GPT-4 for a final, context-aware response.
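The six steps above can be sketched end to end in a few lines of plain Python. This is a toy: a word-count vector stands in for OpenAI embeddings, a Python list stands in for FAISS, and instead of calling GPT-4 we just build the prompt that would be sent. The document text and all names here are illustrative only.

```python
from collections import Counter
from math import sqrt

def chunk(text, size=80, overlap=20):
    """Step 2: split text into overlapping character chunks."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text):
    """Step 3: toy 'embedding' -- a word-count vector (stand-in for OpenAI embeddings)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

document = ("Employees accrue 20 days of paid leave per year. "
            "Unused leave may be carried over to the next year. "
            "Remote work requires manager approval in advance.")

# Steps 1-4: load (inlined above), chunk, embed, store.
store = [(c, embed(c)) for c in chunk(document)]

# Step 5: retrieve the chunk most similar to the question.
question = "How many days of paid leave do employees get?"
best_chunk, _ = max(store, key=lambda item: cosine(embed(question), item[1]))

# Step 6: build the context-aware prompt you would send to GPT-4.
prompt = f"Answer using only this context:\n{best_chunk}\n\nQuestion: {question}"
print(best_chunk)
```

In the real pipeline, `embed` becomes a call to OpenAI’s embeddings API, `store` becomes FAISS, and the final prompt goes to GPT-4 — but the data flow is exactly this.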

Choosing the Right Document Loader

| Content Type | Recommended Loader | Why Use It? |
|---|---|---|
| Simple PDFs | PyPDFLoader | Fast, clean text extraction |
| PDFs with tables | PDFPlumberLoader | Can extract table content |
| Scanned/image-based PDFs | UnstructuredPDFLoader | Supports OCR + layout parsing |
| Word / Excel | UnstructuredWordDocumentLoader, UnstructuredExcelLoader | Preserves formatting and structure |
| CSV / JSON | PandasCSVLoader, JSONLoader | Great for structured and tabular data |
| Web / HTML | WebBaseLoader, UnstructuredHTMLLoader | Useful for scraping public pages |
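One simple way to wire the table above into code is a small extension-to-loader map. The sketch below stores the loader names as strings so it runs without LangChain installed; in a real pipeline you would import the corresponding classes (e.g. from `langchain_community.document_loaders`) and instantiate them instead. The file names are hypothetical.

```python
from pathlib import Path

# Map file extensions to the loader classes from the table above
# (kept as strings so this sketch runs without LangChain installed).
LOADER_BY_EXT = {
    ".pdf": "PyPDFLoader",   # swap for PDFPlumberLoader if the PDF contains tables
    ".docx": "UnstructuredWordDocumentLoader",
    ".xlsx": "UnstructuredExcelLoader",
    ".csv": "PandasCSVLoader",
    ".json": "JSONLoader",
    ".html": "UnstructuredHTMLLoader",
}

def pick_loader(path: str) -> str:
    """Return the loader name for a file, based on its extension."""
    ext = Path(path).suffix.lower()
    try:
        return LOADER_BY_EXT[ext]
    except KeyError:
        raise ValueError(f"No loader configured for '{ext}' files")

print(pick_loader("hr_policy.pdf"))  # -> PyPDFLoader
```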

Why Chunking Matters

LLMs like GPT-4 can only process a limited number of tokens at once (e.g., 8K–32K tokens). To get around this, we split documents into smaller parts:

Chunking Benefits:

  1. Enables semantic search
  2. Improves accuracy of answers
  3. Supports large documents

Common Splitters (Chunking Tools):

| Splitter Class | Description | Ideal For |
|---|---|---|
| RecursiveCharacterTextSplitter | Splits by paragraph → sentence → word | Most general-purpose docs (PDFs, books) |
| TokenTextSplitter | Splits by token count | Fine control for GPT-3.5/4 |
| MarkdownHeaderTextSplitter | Splits based on Markdown headers | Blog posts, tech docs |
| HTMLHeaderTextSplitter | Splits based on HTML tags | Web scraping |
| Language splitters | Split by function/class in code | Source code, Jupyter notebooks |

Chunk Size Best Practices:
chunk_size # Number of characters per chunk
chunk_overlap # Characters shared between consecutive chunks; overlap helps maintain context across chunk boundaries

| Document Type | Chunk Size | Chunk Overlap |
|---|---|---|
| Emails, chats | 300–500 characters | ~50–100 |
| Policies, manuals | 500–1000 characters | ~100–200 |
| Scientific papers | 1000–1500 characters | ~150–200 |
| Source code (Python) | ~30 lines | ~10 lines |
| For GPT-4 Turbo (128K) | ≤ 3000 tokens | ~200 tokens |
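To see what `chunk_size` and `chunk_overlap` actually do, here is a minimal character splitter in plain Python — a simplified stand-in for RecursiveCharacterTextSplitter, which additionally prefers to break at paragraph and sentence boundaries:

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Slice `text` into chunks of `chunk_size` characters; consecutive
    chunks share `chunk_overlap` characters so context carries across."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Note how each chunk repeats the last three characters of the previous one — that repetition is the overlap, and it is what stops a sentence from being cut in half with no shared context between the two pieces.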

Vector Stores Explained

Once you have text embeddings, you’ll need a vector store to save and search them.

What is a Vector Store?

  1. Stores numerical representations (embeddings) of your text
  2. Allows semantic search
  3. Returns relevant content for any user question

Vector Store Comparison:

| Name | Type | Open Source | Use Case | Notes |
|---|---|---|---|---|
| FAISS | In-memory | ✅ Yes | Local prototyping | Fast, lightweight |
| Chroma | Embedded | ✅ Yes | Small-scale RAG systems | LangChain default |
| Pinecone | Cloud-hosted | ❌ No | Production RAG at scale | High performance, enterprise ready |
| Qdrant | Cloud-native | ✅ Yes | Filtering + metadata + hybrid | Excellent for structured docs |
| Weaviate | Cloud-native | ✅ Yes | NLP + hybrid search | GraphQL API, advanced filters |
| Milvus | On-prem/cloud | ✅ Yes | AI for image/video/audio | Supports GPU acceleration |
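Under the hood, the simplest vector stores — like a flat FAISS index — just keep an array of vectors and scan all of them for the nearest neighbours. Here is a minimal sketch using cosine similarity, with hand-written 3-dimensional vectors standing in for real embeddings (real OpenAI embeddings have 1536+ dimensions; the store and its data are illustrative):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

class TinyVectorStore:
    """A flat (exhaustive-scan) store: conceptually what a flat FAISS
    index does, minus the optimized search."""
    def __init__(self):
        self.entries = []  # list of (vector, text) pairs

    def add(self, vector, text):
        self.entries.append((vector, text))

    def search(self, query, k=2):
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]

store = TinyVectorStore()
store.add([1.0, 0.0, 0.1], "Leave policy: 20 days per year")
store.add([0.0, 1.0, 0.1], "Expense policy: submit within 30 days")
store.add([0.9, 0.1, 0.0], "Carry-over: unused leave rolls to next year")

results = store.search([1.0, 0.0, 0.0], k=2)
print(results)
```

Exhaustive scan is fine for a few thousand chunks; the hosted stores in the table exist because at millions of vectors you need approximate-nearest-neighbour indexes, filtering, and persistence.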

Educational Goals for Students

By completing this project, you will:

  1. Understand how LLMs can work with private data
  2. Build a working pipeline from document → vector → answer
  3. Learn to choose the right tools for the job
  4. Practice Python + AI in an applied real-world scenario
  5. Gain confidence to build your own AI-powered Q&A system

What’s Next?

Once you’ve mastered the basics:

  • Add a UI using Gradio or Streamlit
  • Support live web pages or email parsing
  • Store your FAISS vector DB to disk for persistence
  • Scale using Pinecone or Qdrant

Ready to Build Your First Agent?

Open Google Colab (or any Python environment of your choice). Upload the notebook from the GitHub repo. Run the code and see the agent in action. Use any AI assistant (e.g., Gemini, Copilot, ChatGPT) for real-time debugging or customizations.


Want to learn more about GenAI and Prompt Engineering?


Discover more from Debabrata Pruseth
