LLM Spreads: Demo

Feb 14, 2026 · Estevan Fisk, Data Scientist · 1 min read

Stack: Python 3.10+ · Gemini 1.5 (primary LLM) · OpenAI (backup LLM) · LangGraph (agents) · Modal (compute) · Docling (parsing) · Gemini (embeddings) · Streamlit

⚖️ Deployment & Performance

This application is deployed on a serverless architecture using Modal.

  • Cost Efficiency: The infrastructure scales to zero when not in use, so you are billed only for the seconds the code is actually running.

  • The “Cold Start” Trade-off: On the first request, the environment takes ~30-60 seconds to spin up the container and load the LLM client and vector index into memory. Subsequent requests are sub-second (see the sketch after this list).

  • Why this approach? For a portfolio project, this demonstrates the ability to deploy production-grade AI without the high monthly cost of an “always-on” GPU server.
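
To make the trade-off concrete, here is a minimal sketch of the cold-start pattern on Modal. It is illustrative, not the app's actual code: the app name is hypothetical, and the in-memory dict stands in for the real LLM client and vector index, which the `@modal.enter()` hook would load once per container.

```python
import modal

app = modal.App("llm-spreads-demo")  # hypothetical app name

@app.cls()
class Engine:
    @modal.enter()
    def warm_up(self) -> None:
        # Runs once per container cold start: this is where the
        # ~30-60 s cost of loading the model/index is paid.
        self.index = {"fastenal": "parsed report text"}  # stand-in for a real vector index

    @modal.method()
    def ask(self, question: str) -> str:
        # Warm-container path: the index is already in memory,
        # so this returns with sub-second compute overhead.
        return self.index.get(question.lower(), "not found")

@app.local_entrypoint()
def main():
    # The first call pays the cold start; repeat calls reuse the warm container.
    print(Engine().ask.remote("Fastenal"))
```

Because the container scales to zero after an idle window, the first request following a quiet period pays the `warm_up` cost again; everything after that hits the warm path.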

(Note: The app runs on a cold-start architecture to save costs, so please allow ~30-60 seconds for the first request while the container spins up. Processing the example PDF, the Fastenal report, takes ~45-60 seconds, as Docling can take a while to parse the file.)
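
For context on why that first parse is slow, here is a minimal Docling sketch. The file name is a stand-in, and in the deployed app this step runs inside the Modal container rather than locally.

```python
from docling.document_converter import DocumentConverter

# Parse the PDF into a structured document. This conversion is the
# slow step behind the ~45-60 s runtime noted above.
converter = DocumentConverter()
result = converter.convert("fastenal_report.pdf")  # hypothetical local path

# Export to Markdown for downstream chunking and embedding.
print(result.document.export_to_markdown())
```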


Open App Externally in New Tab ↗