I completely agree with your point about "production-readiness" and the hidden technical debt. Building an ML pipeline is just the start; ensuring qualities like scalability, maintainability, and transparency is what truly makes it production-ready.
In my LLM-based applications, I use the following tools for most of my projects (quick sketches of each follow the list):
- LangSmith or LangFuse for tracing, evaluations, and prompt management.
- Prometheus and Grafana for observability of non-LLM-related metrics.
- GitHub Actions and Coolify for automated testing and CI/CD.
- LangSmith's built-in dataset features or DVC for dataset management.
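
For the tracing piece, here's roughly how I wire things up with LangSmith's Python SDK. This is a minimal sketch: the `summarize` function, model, and prompt are placeholders, and it assumes `LANGSMITH_TRACING=true` and `LANGSMITH_API_KEY` (older SDK versions use the `LANGCHAIN_*` variants), plus `OPENAI_API_KEY`, are set in the environment.

```python
# Minimal LangSmith tracing sketch; function, model, and prompt are
# placeholders. Assumes LANGSMITH_TRACING=true, LANGSMITH_API_KEY,
# and OPENAI_API_KEY are exported in the environment.
from langsmith import traceable
from openai import OpenAI

client = OpenAI()

@traceable(name="summarize")  # each call shows up as a run in LangSmith
def summarize(text: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Summarize the text in one sentence."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content
```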
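On the observability side, the non-LLM metrics come from the standard `prometheus_client` library, with Grafana dashboards built on top of whatever Prometheus scrapes. A sketch, where the metric names and port are arbitrary choices of mine:

```python
# Expose app-level metrics for Prometheus to scrape; the metric names
# and port are arbitrary, and Grafana queries the same Prometheus data.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Total requests handled")
LATENCY = Histogram("app_request_latency_seconds", "Request latency in seconds")

@LATENCY.time()  # records the duration of every call
def handle_request() -> None:
    REQUESTS.inc()
    time.sleep(random.uniform(0.01, 0.1))  # stand-in for real work

if __name__ == "__main__":
    start_http_server(8000)  # metrics at http://localhost:8000/metrics
    while True:
        handle_request()
```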
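For the automated-testing part, GitHub Actions mostly just runs a pytest suite before Coolify deploys. A sketch of the kind of cheap check I put in that suite; `app` and `summarize` are the hypothetical module and traced function from the first sketch, not a real project layout:

```python
# Sketch of a CI check run by GitHub Actions before deployment.
# `app.summarize` is the hypothetical traced function from above.
import pytest

from app import summarize


@pytest.mark.parametrize("text", [
    "LLMOps covers tracing, evaluation, and deployment.",
    "Observability is more than dashboards.",
])
def test_summary_is_one_short_sentence(text: str) -> None:
    summary = summarize(text)
    assert summary, "model returned an empty summary"
    assert summary.count(".") <= 2  # rough single-sentence heuristic
```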
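And when a dataset lives in DVC rather than LangSmith, I pull versioned files through DVC's Python API. The path, repo URL, and tag below are placeholders for illustration:

```python
# Read a DVC-tracked dataset pinned to a specific git revision.
# The path, repo URL, and rev are placeholders, not a real project.
import json

import dvc.api

with dvc.api.open(
    "data/eval_cases.jsonl",                        # hypothetical dataset
    repo="https://github.com/example/llm-project",  # hypothetical repo
    rev="v1.2.0",                                   # tag pinning the version
) as f:
    cases = [json.loads(line) for line in f]

print(f"Loaded {len(cases)} evaluation cases")
```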
However, beyond these, much of the operationalization is done on a project-by-project basis, often requiring custom solutions tailored to specific problems.
While it's true that focusing on the problem first and then selecting the right frameworks is important, the tools I've mentioned cover the majority of the LLMOps lifecycle. When something isn't covered, I look for an existing framework or build my own solution on the fly.