DocChat App: Demo
⚖️ Deployment & Performance
This application is deployed on a serverless architecture (Modal).
Cost Efficiency: The infrastructure scales to zero when not in use, so you are billed only for the seconds the code is actually running.
The “Cold Start” Trade-off: On the first request, the environment takes ~30-60 seconds to spin up a container and load the LLM and vector index into memory (sketched below). Subsequent requests are sub-second.
Why this approach? For a portfolio project, this demonstrates the ability to deploy production-grade AI without the high monthly cost of an “always-on” GPU server.
(Note: The app runs on a cold-start architecture to save costs, so please allow ~30-60 seconds for the first request while the container spins up. End-to-end run time also scales with document size: the example DeepSeek-R1 Technical Report takes ~45-60 seconds, while the Google 2024 Environmental Report takes ~2-3 minutes due to its larger PDF file size.)
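For readers curious what this scale-to-zero pattern looks like in code, here is a minimal, illustrative Modal sketch. The app name, GPU type, model, and package list are placeholders, not the actual DocChat internals, and the idle-timeout parameter has shifted names across Modal versions (newer releases call `container_idle_timeout` `scaledown_window`):

```python
import modal

# Illustrative names only; the real DocChat app is not shown in this README.
app = modal.App("docchat-demo")

# Bake inference dependencies into the container image (placeholder packages).
image = modal.Image.debian_slim().pip_install("sentence-transformers", "faiss-cpu")

@app.cls(
    image=image,
    gpu="T4",                    # example GPU type; billed only while a container runs
    container_idle_timeout=300,  # scale to zero after 5 idle minutes
)
class DocChat:
    @modal.enter()
    def load_model(self):
        # Runs once per cold start: this is the ~30-60 s window in which the
        # embedding model (and, in the real app, the LLM and vector index)
        # is pulled into memory.
        from sentence_transformers import SentenceTransformer
        self.embedder = SentenceTransformer("all-MiniLM-L6-v2")

    @modal.method()
    def embed(self, question: str) -> list[float]:
        # Warm requests reuse self.embedder, so they return in well under a second.
        return self.embedder.encode(question).tolist()
```

With this pattern, deploying is a single `modal deploy` command, and Modal handles container lifecycle: nothing runs (and nothing bills) until a request arrives.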