AI Chatbot

Retrieval-Augmented Generation (RAG) Chatbot

A RAG-powered chatbot that answers questions over ingested PDFs using an automated OCR and embedding pipeline.

Project Overview

I developed this RAG (Retrieval-Augmented Generation) chatbot to empower Clinical Research Monitors, particularly Principal Investigators (PIs), to interactively query key information extracted from technical and regulatory PDF documents.

The pipeline fetches PDFs via an internal API, then runs them through OCR (Optical Character Recognition) to extract their text. The text is split into chunks, and each chunk is converted into a vector embedding: a numerical representation that captures the chunk's semantic meaning.
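The chunking step can be illustrated with a minimal sketch. The project description does not state how chunks are formed, so the word-based splitting, the `chunk_size` and `overlap` defaults, and the function name `chunk_text` are all assumptions chosen for illustration. Overlapping chunks are a common way to avoid cutting a sentence's context in half at a chunk boundary.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into word-based chunks that share `overlap` words.

    Hypothetical helper: the real pipeline's chunking strategy is not
    documented, so sizes and the word-level split are assumptions.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    ocr_text = "Subjects must sign the informed consent form before any screening procedure."
    for chunk in chunk_text(ocr_text, chunk_size=6, overlap=2):
        print(chunk)
```

Each resulting chunk would then be sent to the embedding model and stored alongside its vector.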

Embeddings enable efficient semantic search: when a user submits a question, the system performs a similarity search over the embedding store to retrieve the most relevant document sections. These retrieved chunks are then passed to a Large Language Model (LLM) to generate precise, context-aware answers—this combination of retrieval and generation is known as RAG.
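The retrieve-then-generate flow described above can be sketched as follows. This is a self-contained illustration, not the project's implementation: `toy_embed` is a hashed bag-of-words stand-in for the real embedding model, the in-memory list stands in for the Elasticsearch vector store, and the names `retrieve` and `build_prompt` as well as the prompt wording are hypothetical.

```python
import hashlib
import math


def toy_embed(text: str, dim: int = 256) -> list[float]:
    """Stand-in embedding: hashed bag-of-words counts (an assumption,
    used only so the example runs without an embedding API)."""
    vec = [0.0] * dim
    for token in text.lower().split():
        token = token.strip(".,;:!?()\"'")
        if token:
            digest = hashlib.md5(token.encode("utf-8")).hexdigest()
            vec[int(digest, 16) % dim] += 1.0
    return vec


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the question."""
    query_vec = toy_embed(question)
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vec, toy_embed(chunk)),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Assemble the text that would be sent to the LLM (hypothetical wording)."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(context_chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    chunks = [
        "The informed consent form must be signed before screening.",
        "Adverse events are reported within 24 hours.",
        "Study drug is stored at 2 to 8 degrees.",
    ]
    question = "When are adverse events reported?"
    print(build_prompt(question, retrieve(question, chunks, top_k=1)))
```

In a production setup the chunk vectors would be computed once at ingestion time and queried through the vector store's nearest-neighbour search, rather than re-embedded on every question as this sketch does.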

Key Features

Automated PDF ingestion with OCR-based text extraction
Semantic retrieval via vector embeddings
RAG-powered responses for contextually accurate answers

Technologies Used

Python
FastAPI
OpenAI API
OpenVision OCR
Elasticsearch (vector database)

Project Gallery

Retrieval & Generation Diagram

Project Details

Client

Clinical Research Monitors

Timeline

Jan 2025 - May 2025

Role

AI Engineer
