This is a great article; some of these techniques are pretty useful!
I have a few questions, if you don't mind:
1. Will this pipeline still work with small models like Mistral? The agent logic requires the LLM to answer in a specific format and make decisions on its own, which in my experience is not a good idea with small models: they often give wrong answers, have small context windows, hallucinate, sometimes don't follow the prompt completely, and even mix up the context with the prompt. Have you encountered these issues, and what would be your advice? (I've sketched the kind of fragile decision step I mean below this list.)
2. How would you add logic for deciding whether the assistant needs data from the database at all? For example, if the user asks to summarize the dialog, it should probably skip RAG and just use the dialog history. (A sketch of what I have in mind also follows below.)
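
To make question 1 concrete, here is roughly the decision step I mean, as a minimal sketch. `call_llm` is a hypothetical placeholder for whatever model you actually run, and the `{"action": ...}` schema is just an example. With a small model, the JSON validation below fails often enough that retries and a hard fallback feel mandatory:

```python
import json


def call_llm(prompt: str) -> str:
    """Hypothetical helper -- wire this up to whatever model you actually run."""
    raise NotImplementedError


def decide_action(question: str, max_retries: int = 2) -> dict:
    """Ask the model for a structured decision; retry when the output is malformed."""
    prompt = (
        "Decide how to handle the user question. Reply with ONLY a JSON object, "
        'either {"action": "search"} or {"action": "answer"}.\n'
        f"Question: {question}"
    )
    for _ in range(max_retries + 1):
        raw = call_llm(prompt)
        try:
            decision = json.loads(raw)
            # Validate both the type and the allowed values before trusting it.
            if isinstance(decision, dict) and decision.get("action") in ("search", "answer"):
                return decision
        except json.JSONDecodeError:
            pass  # small models often wrap the JSON in prose; just retry
    return {"action": "search"}  # safe fallback when the model won't comply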
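And for question 2, is something like this single-word router the pattern you'd recommend, or is there something more robust? (Same assumptions as above; `vector_store` and `dialog_history` are also hypothetical names.)

```python
def needs_retrieval(question: str) -> bool:
    """Cheap pre-step: ask the model whether external documents are needed.
    Meta-requests like 'summarize our dialog' should come back HISTORY."""
    prompt = (
        "Does answering this question require looking up external documents, "
        "or can it be answered from the conversation history alone? "
        "Reply with exactly one word: RETRIEVE or HISTORY.\n"
        f"Question: {question}"
    )
    return call_llm(prompt).strip().upper().startswith("RETRIEVE")


# Usage sketch: only hit the database when the router says so.
# if needs_retrieval(user_question):
#     context = vector_store.search(user_question)  # hypothetical retriever
# else:
#     context = dialog_history
```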