DocChat App: Demo

Jan 18, 2026 · Estevan Fisk, Data Scientist
· 1 min read

Stack: Python 3.10+ · Gemini 1.5 (primary LLM) · OpenAI (backup LLM) · LangGraph (agents) · Modal (compute) · Chroma (vector DB) · Docling (parsing) · Gemini (embeddings)

⚖️ Deployment & Performance

This application is deployed on a serverless architecture (Modal).

  • Cost Efficiency: When idle, the infrastructure scales to zero, so you are billed only for the seconds the code actually runs.

  • The “Cold Start” Trade-off: On the first request, the environment takes ~30-60 seconds to spin up a container and load the LLM client and vector index into memory. Subsequent requests respond in under a second.

  • Why this approach? For a portfolio project, this demonstrates the ability to deploy production-grade AI without the high monthly cost of an “always-on” GPU server.
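The cold/warm behavior described above comes down to lazy, cached resource loading: expensive setup runs once, on the first request, and every later request reuses the in-memory result. A minimal sketch of that pattern (the sleep, names, and return values are placeholders, not the app's actual code):

```python
import time
from functools import lru_cache


@lru_cache(maxsize=1)
def load_resources() -> dict:
    """Stand-in for cold-start work: loading the LLM client and
    vector index into memory. The real app takes ~30-60 s here;
    a short sleep simulates that cost."""
    time.sleep(0.5)  # placeholder for model/index loading
    return {"llm": "gemini-1.5", "index": "chroma"}


def handle_request(query: str) -> str:
    resources = load_resources()  # slow on first call, cached afterwards
    return f"answered {query!r} with {resources['llm']}"


# First request pays the cold-start cost...
t0 = time.perf_counter()
handle_request("first")
cold = time.perf_counter() - t0

# ...subsequent requests hit the warm cache.
t0 = time.perf_counter()
handle_request("second")
warm = time.perf_counter() - t0
```

On a serverless platform the same principle applies per container: once Modal tears an idle container down, the next request starts cold again.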

(Note: The app runs on a cold-start architecture to save costs, so please allow ~30-60 seconds for the first request while the container spins up. Processing the example DeepSeek-R1 Technical Report takes ~45-60 seconds; the Google 2024 Environmental Report takes ~2-3 minutes due to its larger PDF file size.)


Open App Externally in New Tab ↗