Agentic AI – LLM with RAG – Beginner Bootcamp

Welcome to our Beginner Bootcamp on Agentic AI! In this post, learn to Build Your Own Document-Reading AI Assistant — an AI tool that can read and understand documents like PDFs, Word files, Excel sheets, HTML pages, JSON files, and more using OpenAI’s powerful LLMs paired with a cutting-edge method called RAG (Retrieval-Augmented Generation).

New to Agentic AI?
Start with our Agentic AI: The Dawn of Proactive AI Assistants before diving into this hands-on guide.
Want to go deeper with multi-agent systems? Check out our advanced tutorial on Beginner Bootcamp: Create Agentic AI Apps with Python.

Why Learn RAG?

In the real world, organizations store their most valuable knowledge in documents — policy handbooks, manuals, technical reports, HR guidelines, etc. Standard language models like GPT-4 can’t access this data unless we feed it to them explicitly.

That’s where RAG comes in. RAG = Retrieval + LLM Generation
It retrieves the relevant content from your data, then feeds it to the LLM (like GPT-4) so it can answer your questions — accurately and in context.

What You Will Learn

By the end of this bootcamp, you’ll be able to:

  1. Load content from various document types (PDF, DOCX, XLSX, HTML, JSON, CSV).
  2. Split content into manageable chunks for efficient processing.
  3. Generate vector embeddings using OpenAI.
  4. Store those embeddings in a vector database like FAISS.
  5. Use GPT-4 to generate accurate answers based on retrieved data.

How It Works

[Figure: RAG flow diagram. Source: https://miro.medium.com/v2/resize:fit:1400/1*YLrQl5CM7NjQPcfTCrf-sQ.png]
  1. Load: Choose the right loader based on your file type.
  2. Chunk: Break the text into smaller pieces (because LLMs have limited memory).
  3. Embed: Convert text chunks into vector representations.
  4. Store: Save those vectors in a searchable vector database.
  5. Query: When you ask a question, retrieve the most relevant chunks.
  6. Answer: Feed those chunks to GPT-4 for a final, context-aware response.
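The six steps above can be sketched end to end in a few lines of plain Python. This is a toy: a word-count vector stands in for OpenAI embeddings, a Python list stands in for FAISS, and instead of calling GPT-4 we just build the prompt that would be sent. The document text and all names here are illustrative only.

```python
from collections import Counter
from math import sqrt

def chunk(text, size=80, overlap=20):
    """Step 2: split text into overlapping character chunks."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text):
    """Step 3: toy 'embedding' -- a word-count vector (stand-in for OpenAI embeddings)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

document = ("Employees accrue 20 days of paid leave per year. "
            "Unused leave may be carried over to the next year. "
            "Remote work requires manager approval in advance.")

# Steps 1-4: load (inlined above), chunk, embed, store.
store = [(c, embed(c)) for c in chunk(document)]

# Step 5: retrieve the chunk most similar to the question.
question = "How many days of paid leave do employees get?"
best_chunk, _ = max(store, key=lambda item: cosine(embed(question), item[1]))

# Step 6: build the context-aware prompt you would send to GPT-4.
prompt = f"Answer using only this context:\n{best_chunk}\n\nQuestion: {question}"
print(best_chunk)
```

In the real pipeline, `embed` becomes a call to OpenAI’s embeddings API, `store` becomes FAISS, and the final prompt goes to GPT-4 — but the data flow is exactly this.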

Choosing the Right Document Loader

| Content Type | Recommended Loader | Why Use It? |
|---|---|---|
| Simple PDFs | PyPDFLoader | Fast, clean text extraction |
| PDFs with tables | PDFPlumberLoader | Can extract table content |
| Scanned/image-based PDFs | UnstructuredPDFLoader | Supports OCR + layout parsing |
| Word / Excel | UnstructuredWordDocumentLoader, UnstructuredExcelLoader | Preserves formatting and structure |
| CSV / JSON | PandasCSVLoader, JSONLoader | Great for structured and tabular data |
| Web / HTML | WebBaseLoader, UnstructuredHTMLLoader | Useful for scraping public pages |
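One simple way to wire the table above into code is a small extension-to-loader map. The sketch below stores the loader names as strings so it runs without LangChain installed; in a real pipeline you would import the corresponding classes (e.g. from `langchain_community.document_loaders`) and instantiate them instead. The file names are hypothetical.

```python
from pathlib import Path

# Map file extensions to the loader classes from the table above
# (kept as strings so this sketch runs without LangChain installed).
LOADER_BY_EXT = {
    ".pdf": "PyPDFLoader",   # swap for PDFPlumberLoader if the PDF contains tables
    ".docx": "UnstructuredWordDocumentLoader",
    ".xlsx": "UnstructuredExcelLoader",
    ".csv": "PandasCSVLoader",
    ".json": "JSONLoader",
    ".html": "UnstructuredHTMLLoader",
}

def pick_loader(path: str) -> str:
    """Return the loader name for a file, based on its extension."""
    ext = Path(path).suffix.lower()
    try:
        return LOADER_BY_EXT[ext]
    except KeyError:
        raise ValueError(f"No loader configured for '{ext}' files")

print(pick_loader("hr_policy.pdf"))  # -> PyPDFLoader
```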

Why Chunking Matters

LLMs like GPT-4 can only process a limited number of tokens at once (e.g., 8K–32K tokens). To get around this, we split documents into smaller parts:

Chunking Benefits:

  1. Enables semantic search
  2. Improves accuracy of answers
  3. Supports large documents

Common Splitters (Chunking Tools):

| Splitter Class | Description | Ideal For |
|---|---|---|
| RecursiveCharacterTextSplitter | Splits by paragraph → sentence → word | Most general-purpose docs (PDFs, books) |
| TokenTextSplitter | Splits by token count | Fine control for GPT-3.5/4 |
| MarkdownHeaderTextSplitter | Splits based on Markdown headers | Blog posts, tech docs |
| HTMLHeaderTextSplitter | Splits based on HTML tags | Web scraping |
| Language splitters | Split by function/class in code | Source code, Jupyter notebooks |

Chunk Size Best Practices:
chunk_size # Number of characters per chunk
chunk_overlap # Characters shared between consecutive chunks; overlap helps maintain context across chunk boundaries

| Document Type | Chunk Size | Chunk Overlap |
|---|---|---|
| Emails, chats | 300–500 characters | ~50–100 |
| Policies, manuals | 500–1000 characters | ~100–200 |
| Scientific papers | 1000–1500 characters | ~150–200 |
| Source code (Python) | ~30 lines | ~10 lines |
| For GPT-4 Turbo (128K) | ≤ 3000 tokens | ~200 tokens |
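To see what `chunk_size` and `chunk_overlap` actually do, here is a minimal character splitter in plain Python — a simplified stand-in for RecursiveCharacterTextSplitter, which additionally prefers to break at paragraph and sentence boundaries:

```python
def split_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Slice `text` into chunks of `chunk_size` characters; consecutive
    chunks share `chunk_overlap` characters so context carries across."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = split_text("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=3)
print(chunks)  # ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Note how each chunk repeats the last three characters of the previous one — that repetition is the overlap, and it is what stops a sentence from being cut in half with no shared context between the two pieces.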

Vector Stores Explained

Once you have text embeddings, you’ll need a vector store to save and search them.

What is a Vector Store?

  1. Stores numerical representations (embeddings) of your text
  2. Allows semantic search
  3. Returns relevant content for any user question

Vector Store Comparison:

| Name | Type | Open Source | Use Case | Notes |
|---|---|---|---|---|
| FAISS | In-memory | ✅ Yes | Local prototyping | Fast, lightweight |
| Chroma | Embedded | ✅ Yes | Small-scale RAG systems | LangChain default |
| Pinecone | Cloud-hosted | ❌ No | Production RAG at scale | High performance, enterprise ready |
| Qdrant | Cloud-native | ✅ Yes | Filtering + metadata + hybrid | Excellent for structured docs |
| Weaviate | Cloud-native | ✅ Yes | NLP + hybrid search | GraphQL API, advanced filters |
| Milvus | On-prem/cloud | ✅ Yes | AI for image/video/audio | Supports GPU acceleration |
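Under the hood, the simplest vector stores — like a flat FAISS index — just keep an array of vectors and scan all of them for the nearest neighbours. Here is a minimal sketch using cosine similarity, with hand-written 3-dimensional vectors standing in for real embeddings (real OpenAI embeddings have 1536+ dimensions; the store and its data are illustrative):

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

class TinyVectorStore:
    """A flat (exhaustive-scan) store: conceptually what a flat FAISS
    index does, minus the optimized search."""
    def __init__(self):
        self.entries = []  # list of (vector, text) pairs

    def add(self, vector, text):
        self.entries.append((vector, text))

    def search(self, query, k=2):
        ranked = sorted(self.entries, key=lambda e: cosine(query, e[0]), reverse=True)
        return [text for _, text in ranked[:k]]

store = TinyVectorStore()
store.add([1.0, 0.0, 0.1], "Leave policy: 20 days per year")
store.add([0.0, 1.0, 0.1], "Expense policy: submit within 30 days")
store.add([0.9, 0.1, 0.0], "Carry-over: unused leave rolls to next year")

results = store.search([1.0, 0.0, 0.0], k=2)
print(results)
```

Exhaustive scan is fine for a few thousand chunks; the hosted stores in the table exist because at millions of vectors you need approximate-nearest-neighbour indexes, filtering, and persistence.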

Educational Goals for Students

By completing this project, you will:

  1. Understand how LLMs can work with private data
  2. Build a working pipeline from document → vector → answer
  3. Learn to choose the right tools for the job
  4. Practice Python + AI in an applied real-world scenario
  5. Gain confidence to build your own AI-powered Q&A system

What’s Next?

Once you’ve mastered the basics:

  • Add a UI using Gradio or Streamlit
  • Support live web pages or email parsing
  • Store your FAISS vector DB to disk for persistence
  • Scale using Pinecone or Qdrant

Ready to Build Your First Agent?

Open Google Colab (or any Python environment of your choice). Upload the notebook from the GitHub repo. Run the code and see the agent in action. Use any AI assistant (e.g., Gemini, Copilot, ChatGPT) for real-time debugging or customizations.


Want to learn more about GenAI and Prompt Engineering?


Discover more from Debabrata Pruseth
