System Architecture
Understanding how NotebookLLM's frontend, backend, and AI services work together.
High-Level Architecture

Technology Stack
Backend
- FastAPI - Web framework
- PostgreSQL - Database
- Qdrant - Vector store
- Supabase - Auth & Storage
- LlamaIndex - RAG framework
- Procrastinate - Task queue
Frontend
- Next.js 14 - React framework
- TypeScript - Type safety
- Tailwind CSS - Styling
- shadcn/ui - Components
- React Query - Server state
- next-themes - Theming
AI/ML
- Google Gemini - LLM
- Cohere - Reranking
- Sentence Transformers - Embeddings
- Kokoro TTS - Audio
- Langfuse - Observability
Data Flow
Document Ingestion Flow
1. User uploads file via frontend
2. File stored in Supabase (private bucket)
3. Background worker processes document
4. Text extracted, chunked, and embedded
5. Vectors stored in Qdrant
6. Document status updated to COMPLETED
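The chunking step splits extracted text into overlapping windows before embedding. A minimal sketch of that splitter; the chunk size and overlap values here are illustrative, not NotebookLLM's actual settings:

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size characters.

    The overlap keeps context across boundaries, so a sentence cut at a
    chunk edge is still fully present in at least one chunk.
    """
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be larger than overlap")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# 1000 characters, 400-char chunks, 100-char overlap -> 3 chunks
chunks = chunk_text("".join(str(i % 10) for i in range(1000)),
                    chunk_size=400, overlap=100)
```

Each resulting chunk is then embedded and upserted into the vector store along with its document ID.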
Chat Query Flow
1. User sends message via chat interface
2. Backend retrieves relevant document chunks (hybrid search)
3. Chunks reranked using Cohere
4. LLM generates response with citations
5. Response streamed to frontend
6. Message and citations saved to database
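Hybrid search merges a semantic (vector) ranking with a keyword ranking. One common way to combine them, shown here as an illustration rather than LlamaIndex's internal method, is reciprocal rank fusion:

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of chunk IDs into one ordering.

    Each chunk scores 1 / (k + rank) per list it appears in; summing
    across lists rewards chunks that rank well under both search modes.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["c3", "c1", "c7"]  # vector-search order
keyword = ["c1", "c9", "c3"]   # keyword-search order
fused = reciprocal_rank_fusion([semantic, keyword])
```

The fused list (here `c1` first, since it ranks highly in both modes) is what gets passed to the Cohere reranker in step 3.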
Content Generation Flow
1. User requests content (podcast/quiz/flashcards)
2. LLM generates content using document context
3. For podcasts: TTS converts script to audio
4. Generated content stored in database
5. Audio files uploaded to public bucket
6. Content available in Studio panel
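Structured outputs keep generation reliable: the LLM is asked to emit JSON matching a fixed schema, which the backend validates before saving. A hedged sketch of what a quiz schema and its validation could look like (field names are illustrative, not NotebookLLM's actual schema):

```python
import json
from dataclasses import dataclass

@dataclass
class QuizQuestion:
    question: str
    choices: list[str]
    answer_index: int

    @classmethod
    def from_json(cls, raw: str) -> "QuizQuestion":
        """Parse one LLM-emitted question and reject malformed output."""
        data = json.loads(raw)
        q = cls(data["question"], data["choices"], data["answer_index"])
        if not 0 <= q.answer_index < len(q.choices):
            raise ValueError("answer_index out of range")
        return q

raw = ('{"question": "Which store holds the vectors?", '
       '"choices": ["PostgreSQL", "Qdrant"], "answer_index": 1}')
question = QuizQuestion.from_json(raw)
```

Validating before the database write means a malformed generation fails loudly instead of storing a broken quiz.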
Key Components
RAG Pipeline
Retrieval-Augmented Generation powered by LlamaIndex. Combines semantic and keyword search with reranking for accurate, cited responses.
Content Generation
AI-powered generation of podcasts, quizzes, flashcards, and mind maps using structured outputs for reliable results.
Background Processing
Procrastinate-based task queue handles long-running operations like document processing and content generation.
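In this pattern the web request only enqueues a job and returns; a separate worker process pulls the job and runs it. A minimal in-memory sketch of that defer-and-process pattern (not Procrastinate's real API, which stores jobs in PostgreSQL):

```python
import queue

tasks = {}            # registered task name -> function
jobs = queue.Queue()  # stands in for the PostgreSQL-backed job table

def task(fn):
    """Register a function so it can be deferred by name."""
    tasks[fn.__name__] = fn
    return fn

def defer(name: str, **kwargs) -> None:
    """Enqueue a job; the caller (the web request) returns immediately."""
    jobs.put((name, kwargs))

def run_worker_once():
    """One iteration of the worker loop: pull a job and execute it."""
    name, kwargs = jobs.get_nowait()
    return tasks[name](**kwargs)

@task
def process_document(document_id: str) -> str:
    # extract, chunk, embed ... then mark the row COMPLETED
    return f"{document_id}: COMPLETED"

defer("process_document", document_id="doc-42")
result = run_worker_once()
```

Because the queue lives in PostgreSQL in the real system, jobs survive restarts and can be retried if a worker dies mid-task.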
Authentication
Supabase Auth with JWT validation. Users provisioned automatically on first login.
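JWT validation can happen entirely server-side with the project's signing secret. The exact algorithm and claims checked depend on the Supabase project configuration; this self-contained HS256 sketch is only illustrative:

```python
import base64
import hashlib
import hmac
import json

def b64url_decode(part: str) -> bytes:
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))

def b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def verify_hs256_jwt(token: str, secret: bytes) -> dict:
    """Check an HS256 JWT signature and return its claims, or raise."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    signing_input = f"{header_b64}.{payload_b64}".encode()
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(payload_b64))

# Build a demo token with a throwaway secret, then verify it
secret = b"demo-secret"
header = b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
payload = b64url_encode(json.dumps({"sub": "user-1"}).encode())
sig = b64url_encode(hmac.new(secret, f"{header}.{payload}".encode(),
                             hashlib.sha256).digest())
claims = verify_hs256_jwt(f"{header}.{payload}.{sig}", secret)
```

Once the signature checks out, the `sub` claim identifies the user; if no matching row exists yet, the backend creates one, which is the "provisioned on first login" behavior described above.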