LLM Spreads: Demo

Feb 14, 2026 · Estevan Fisk, Data Scientist · 1 min read

Stack: Python 3.10+ · Gemini 1.5 (primary LLM) · OpenAI (backup LLM) · LangGraph (agents) · Modal (compute) · Docling (parsing) · Gemini (embeddings) · Streamlit

⚖️ Deployment & Performance

This application is deployed on a serverless architecture using Modal.

  • Cost Efficiency: The infrastructure scales to zero when not in use, so you are billed only for the seconds the code is actually running.

  • The “Cold Start” Trade-off: On the first request, the environment takes ~30-60 seconds to spin up the container and load the LLM client and vector index into memory. Subsequent requests are sub-second (see the sketch after this list).

  • Why this approach? For a portfolio project, this demonstrates the ability to deploy production-grade AI without the high monthly cost of an “always-on” GPU server.
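
To make the trade-off concrete, here is a minimal sketch of the cold-start pattern on Modal. It is illustrative, not the app's actual code: the app name is hypothetical, and the in-memory dict stands in for the real LLM client and vector index, which the `@modal.enter()` hook would load once per container.

```python
import modal

app = modal.App("llm-spreads-demo")  # hypothetical app name

@app.cls()
class Engine:
    @modal.enter()
    def warm_up(self) -> None:
        # Runs once per container cold start: this is where the
        # ~30-60 s cost of loading the model/index is paid.
        self.index = {"fastenal": "parsed report text"}  # stand-in for a real vector index

    @modal.method()
    def ask(self, question: str) -> str:
        # Warm-container path: the index is already in memory,
        # so this returns with sub-second compute overhead.
        return self.index.get(question.lower(), "not found")

@app.local_entrypoint()
def main():
    # The first call pays the cold start; repeat calls reuse the warm container.
    print(Engine().ask.remote("Fastenal"))
```

Because the container scales to zero after an idle window, the first request following a quiet period pays the `warm_up` cost again; everything after that hits the warm path.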

(Note: The app runs on a cold-start architecture to save costs, so please allow ~30-60 seconds for the first request while the container spins up. Processing the example PDF, the Fastenal report, takes ~45-60 seconds, as Docling can take a while to parse the file.)
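
For context on why that first parse is slow, here is a minimal Docling sketch. The file name is a stand-in, and in the deployed app this step runs inside the Modal container rather than locally.

```python
from docling.document_converter import DocumentConverter

# Parse the PDF into a structured document. This conversion is the
# slow step behind the ~45-60 s runtime noted above.
converter = DocumentConverter()
result = converter.convert("fastenal_report.pdf")  # hypothetical local path

# Export to Markdown for downstream chunking and embedding.
print(result.document.export_to_markdown())
```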


Open App Externally in New Tab ↗