I’m becoming increasingly convinced that the conversational AI future is a mixture of general (foundational) large language models (LLMs) that can provide a high-level diagnosis of a situation or question, and which then delegate to specialized LLMs for deeper reasoning. The general LLM processes generic language to orchestrate calls to specialized services and LLMs with deep domain knowledge, and then potentially summarises and synthesises the results back into a general form for the end-user.

I think this architecture has the advantage that the specialized LLMs can be (cheaply) trained on highly curated data sets to ensure accuracy when answering questions about very narrow domains. I don’t typically ask a cardiologist for advice on how to repair my car, or vice versa! The intuition is that while it may be possible to train a single AI to handle questions about both heart conditions and Ford Galaxy engine fault codes, it is likely much more expensive to combine both sources of knowledge in a single deep neural network, and accuracy may well suffer.

The orchestrating LLM has to be great at understanding the concepts behind a question: delegating to the right specialized LLMs, formulating the right questions to ask, interpreting responses, and probing with follow-up questions or requesting clarifications. To be effective, the orchestrating LLM needs elements of language translation, data transformation, common-sense reasoning, and spatial awareness.
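To make the shape of this concrete, here is a minimal sketch of that delegation loop. It is purely illustrative: the names (`Specialist`, `classify_domain`, `orchestrate`) are hypothetical, the keyword-based router stands in for the general LLM’s high-level diagnosis, and the specialists are stubs where domain-tuned models or services would sit.

```python
# Hypothetical sketch of the orchestrator pattern described above.
# classify_domain() is a stand-in for the general LLM's diagnosis step;
# each Specialist is a stub for a narrow, domain-tuned LLM or service.

from dataclasses import dataclass
from typing import Callable


@dataclass
class Specialist:
    """A narrow, domain-tuned LLM behind a simple ask() interface."""
    domain: str
    ask: Callable[[str], str]


# Stub specialists; in practice each would wrap a fine-tuned model or API.
SPECIALISTS = {
    "cardiology": Specialist("cardiology", lambda q: f"[cardiology model: {q}]"),
    "automotive": Specialist("automotive", lambda q: f"[automotive model: {q}]"),
}


def classify_domain(question: str) -> str:
    """Placeholder for the general LLM's high-level diagnosis.
    A real orchestrator would prompt a foundation model to pick a domain."""
    if "engine" in question.lower() or "fault code" in question.lower():
        return "automotive"
    return "cardiology"


def orchestrate(question: str) -> str:
    domain = classify_domain(question)      # 1. diagnose the domain
    specialist = SPECIALISTS[domain]        # 2. delegate to the right specialist
    raw_answer = specialist.ask(question)   # 3. formulate and send the question
    # 4. synthesise the response back into a general form for the end-user
    return f"({domain}) {raw_answer}"


if __name__ == "__main__":
    print(orchestrate("What does Ford Galaxy fault code P0300 mean?"))
```

A real system would, of course, replace the keyword router with a prompted foundation model, and would loop steps 3–4 to support follow-up questions and clarifications, but the division of labour is the same.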

Perhaps I am just describing something akin to the OpenAI Plugins architecture, but with a more intentional mechanism for forming a cluster of conversational LLMs that co-operate via an orchestrator to answer a question or complete a task?

What do you think will be the future architecture? Monolith? Hub-and-spoke? Network-of-networks? Peer-to-peer? Something else?