Multi-Agent Interactions
Something neat I saw in an otherwise cringe-worthy demo from Microsoft was a multi-agent toolchain used to do more precise reasoning. In this context, a problem that requires several different actions is split across “agents”, some requiring LLMs and some not, and a “conductor” agent incorporates the input from the various agents and decides whether the conclusion reached satisfies the problem requirements - if it doesn’t, the agents keep working until the problem is solved.
In the demo, they built an application that verifies some of the functionality Microsoft provides by creating three agents: one that answers the user’s original question with a link, one that verifies the answer is correct using retrieval-augmented generation, and one that verifies the generated link is an actual working link. With those three outputs, a final “conductor” agent parses the output from each agent and decides whether the problem has been adequately answered - if not, the conductor instructs all the agents to try again.
This seems like a novel (if a bit heavy-handed) approach to guarding LLM output against hallucinations, and in fact it mimics the flow a lot of LLM testing tools take. Microsoft and other open-source tools often rely on other, more powerful LLMs to verify the output of RAG pipelines or smaller LLMs, rating the generated answer on a scale of 1 to 5 along metrics like coherence, groundedness, and relevance.
image from https://learn.microsoft.com/en-us/azure/ai-studio/how-to/evaluate-flow-results
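As a rough illustration (this is a hypothetical prompt I wrote, not Microsoft’s actual evaluator template), that kind of LLM-as-judge check usually boils down to handing a stronger model a grading prompt along these lines:

// Hypothetical grader prompt, loosely modeled on the 1-to-5 metric scoring
// described above; Azure AI Studio's built-in evaluators use their own templates.
private const string GroundednessJudgeInstructions =
    """
    You are an evaluator. Given a QUESTION, the retrieved CONTEXT, and an ANSWER,
    rate how well the ANSWER is supported by the CONTEXT on a scale of 1 to 5,
    where 1 is entirely unsupported and 5 is fully supported.
    Respond with only the integer score.
    """;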
Furthermore, Microsoft has started to bake framework support for this multi-agent flow into Semantic Kernel. It’s still very early and likely subject to change, but it’s encouraging to see the tooling for these interactions being made available this early on. They have a few nice examples of how to work with the new AgentChat concept on GitHub - let’s explore the Step03_Chat.cs file.
One of the first things you can see in the file is a few instructional (system) prompts:
private const string ReviewerName = "ArtDirector";
private const string ReviewerInstructions =
"""
You are an art director who has opinions about copywriting born of a love for David Ogilvy.
The goal is to determine if the given copy is acceptable to print.
If so, state that it is approved.
If not, provide insight on how to refine suggested copy without example.
""";
private const string CopyWriterName = "CopyWriter";
private const string CopyWriterInstructions =
"""
You are a copywriter with ten years of experience and are known for brevity and a dry humor.
The goal is to refine and decide on the single best copy as an expert in the field.
Only provide a single proposal per response.
You're laser focused on the goal at hand.
Don't waste time with chit chat.
Consider suggestions when refining an idea.
""";
These outline both a name (important later) and instructions for two separate agents - one responsible for reviewing and approving copy, and another for generating proposals.
I always like reading others’ system prompts to see how they address things like ‘chattiness’ and constraining an LLM’s output - statements like ‘you’re laser focused’ and ‘don’t waste time with chit chat’ are interesting ways of accomplishing this, and ones I haven’t tried myself. Other LLMs may respond differently to these cues; these examples are geared to work with OpenAI’s ChatGPT models.
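For reference, wiring those constants into actual agents further down in the file looks roughly like this - a sketch assuming the Semantic Kernel ChatCompletionAgent type the example uses, with the kernel setup omitted:

// Sketch: build the two agents from the names/instructions above.
// Assumes "kernel" is a Kernel already configured with a chat completion service.
ChatCompletionAgent agentReviewer =
    new()
    {
        Instructions = ReviewerInstructions,
        Name = ReviewerName,
        Kernel = kernel,
    };

ChatCompletionAgent agentWriter =
    new()
    {
        Instructions = CopyWriterInstructions,
        Name = CopyWriterName,
        Kernel = kernel,
    };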
Further down the file, after we create our two agents with those names and instructions, we see the creation of an AgentGroupChat object. This is the “chat context” these agents perform their operations in, exchanging generated content back and forth until they hit a termination condition. The termination condition is defined within this AgentGroupChat object as such:
TerminationStrategy =
    new ApprovalTerminationStrategy()
    {
        // Only the art-director may approve.
        Agents = [agentReviewer],
        // Limit total number of turns
        MaximumIterations = 10
    }
The agentReviewer is given the power to end the chat, and we also specify the maximum number of turns within the AgentGroupChat we’re willing to entertain. This part is important, as a chat like this could potentially go on forever if the termination condition is never met, consuming all your OpenAI API credits in the meantime :)
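For context, that strategy hangs off the group chat’s execution settings; the full wiring is roughly the following (a sketch based on the example - the exact property nesting in this preview API may have changed since):

// Sketch: the two agents join the chat, and the termination strategy above
// is supplied through the chat's execution settings.
AgentGroupChat chat =
    new(agentWriter, agentReviewer)
    {
        ExecutionSettings =
            new()
            {
                TerminationStrategy =
                    new ApprovalTerminationStrategy()
                    {
                        Agents = [agentReviewer],
                        MaximumIterations = 10
                    }
            }
    };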
It’s worth noting that ApprovalTerminationStrategy is actually a class deriving from TerminationStrategy that overrides a single ShouldAgentTerminateAsync method. The method returns true or false based on a condition - in the example’s case, it simply checks whether the reviewer’s most recent output contains the string “approve”.
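Paraphrasing the example’s implementation (again a sketch - the override’s exact signature may differ in newer Semantic Kernel builds):

// Sketch: terminate once the reviewer's latest message contains "approve".
private sealed class ApprovalTerminationStrategy : TerminationStrategy
{
    protected override Task<bool> ShouldAgentTerminateAsync(
        Agent agent,
        IReadOnlyList<ChatMessageContent> history,
        CancellationToken cancellationToken) =>
        Task.FromResult(
            history[history.Count - 1].Content?.Contains("approve", StringComparison.OrdinalIgnoreCase) ?? false);
}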
With that set up, we throw a chat message into the chat, and let the agents do their thing!
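Kicking it off is just a couple of calls - sketched here from the same example file, so method names may have shifted in newer builds:

// Sketch: seed the chat with the user's concept, then stream each agent turn.
chat.AddChatMessage(new ChatMessageContent(AuthorRole.User, "concept: maps made out of egg cartons."));

await foreach (ChatMessageContent response in chat.InvokeAsync())
{
    Console.WriteLine($"# {response.Role} - {response.AuthorName ?? "*"}: '{response.Content}'");
}

Console.WriteLine($"[IS COMPLETED: {chat.IsComplete}]");

That produces a transcript along these lines: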
# user: concept: maps made out of egg cartons.
# Assistant - CopyWriter: "Discover new terrain with our egg carton maps: where every bump is a mountain, and every divot is a valley—scrambled, but never lost."
# Assistant - ArtDirector: This copy needs refinement. While the creative concept is intriguing, the message should be clearer and more focused on the product's unique value proposition. It should emphasize the innovative use of egg cartons while conveying a sense of adventure or discovery. Avoid overly whimsical language that might dilute the main idea. Aim for a balance of creativity and clarity, ensuring that the benefits of these maps are evident to the reader.
# Assistant - CopyWriter: "Explore uncharted territories with our egg carton maps: sustainable, tactile, and oddly geographic—because adventure shouldn’t come in a standard shape."
# Assistant - ArtDirector: This copy is close, but it could be improved. While it introduces sustainability and the unique aspect of the product, the phrase "oddly geographic" may confuse readers instead of enticing them. Focus on more evocative language that highlights the tactile and imaginative nature of the maps without straying into ambiguity. Strive for a compelling call to action that makes the reader eager to experience these maps.
# Assistant - CopyWriter: "Navigate the unexpected with our egg carton maps: a sustainable twist on exploration, where every contour invites a new adventure."
# Assistant - ArtDirector: Approved. This copy effectively communicates the unique and sustainable aspect of the product while inspiring a sense of adventure. The language is inviting and aligns well with the creative concept.
[IS COMPLETED: True]
It’s interesting to watch the two agents interact as if they’re responding to a user’s input in a normal chat-completion context. It also further reinforces some of my findings from the Generative AI Game Master project: small, specific prompts produce more consistent, higher-quality results than trying to achieve multiple things in a single prompt.
The downside of this approach, of course, is the potentially large number of calls to pay-per-use APIs it requires. You could run smaller LLMs locally or on a beefier system, but it’s worth remembering that each agent turn includes all prior chat history, so as the agents do their thing each individual API call gets more and more expensive (monetarily, if not computationally).