Semantic Kernel as a Prompt Wrapper

Let’s be honest - there isn’t a lot of tooling out there for C# developers when it comes to generative AI. The entire space seems to revolve around Python (and with good reason): all the cool tooling for fine-tuning, running models locally, and response filtering seems to require Python if you want to get into the weeds. The sole exception is Semantic Kernel, Microsoft’s answer to “how do we use generative AI in C# code?” I haven’t used it extensively, but I have started to play around with it, and one feature immediately caught my eye: its handling of chat completions.

I’ve used generative AI in my C# applications primarily through either the Amazon Bedrock SDK (for which Amazon provides a .NET wrapper) or the OpenAI API via plain HTTP requests. There’s now an official OpenAI SDK for .NET, but before that rolled out, Microsoft had been pushing the Semantic Kernel library as the primary way to interface with generative AI in your C# applications.

The library is very “.NET”, supporting dependency injection and configuration out of the box, and that’s one of the strengths I’d like to focus on today. For the last year or so I’ve primarily been using the Amazon Bedrock API to invoke generative models with on-demand hosting, which has worked pretty well. With the exception of a few arguments, you generally pass the same request body into the API, tell it which model to invoke via a model identifier string, and get your response back.

Where the issue has always lain for me is that different generative models require slightly different prompt formats. As an example, specifying system, assistant, and user messages with some few-shot examples for Amazon Titan might look like this:

Respond to each user message with a rhyme.

User: Hello
Bot: Bellow
User: Goodbye
Bot: I cry
User: Farewell
Bot: 

The exact same prompt for Llama 3, on the other hand, looks like this:

<|begin_of_text|>
<|start_header_id|>system<|end_header_id|>
Respond to each user message with a rhyme.<|eot_id|>
<|start_header_id|>user<|end_header_id|>
Hello<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
Bellow<|eot_id|>
<|start_header_id|>user<|end_header_id|>
Goodbye<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>
I cry<|eot_id|>
<|start_header_id|>user<|end_header_id|>
Farewell<|eot_id|>
<|start_header_id|>assistant<|end_header_id|>

There’s nothing inherent in the Amazon Bedrock SDK to help me manage the differences between these two formats; I either have to handle them manually through prompt templating, or consider writing my own library (which I almost did!).
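To make the pain concrete, here’s a rough sketch of the sort of templating layer I’d otherwise have to maintain myself. The Message record and the two builder functions are hypothetical, not part of any SDK:

using System.Collections.Generic;
using System.Text;

// Hypothetical neutral message shape; role is "system", "user", or "assistant".
record Message(string Role, string Content);

// Render messages in the Llama 3 instruct format shown above.
static string BuildLlama3Prompt(IEnumerable<Message> messages)
{
    var sb = new StringBuilder("<|begin_of_text|>\n");
    foreach (var m in messages)
    {
        sb.Append($"<|start_header_id|>{m.Role}<|end_header_id|>\n{m.Content}<|eot_id|>\n");
    }
    // Leave an open assistant header so the model generates the reply.
    sb.Append("<|start_header_id|>assistant<|end_header_id|>\n");
    return sb.ToString();
}

// Render the same messages in the Titan-style format shown above.
static string BuildTitanPrompt(IEnumerable<Message> messages)
{
    var sb = new StringBuilder();
    foreach (var m in messages)
    {
        sb.AppendLine(m.Role switch
        {
            "system" => m.Content + "\n",
            "user" => "User: " + m.Content,
            _ => "Bot: " + m.Content,
        });
    }
    sb.Append("Bot: "); // open turn for the model to complete
    return sb.ToString();
}

And that’s just two formats; every new model family means another builder function to write and test.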

Enter Semantic Kernel, and more specifically its ChatHistory object. The point of this object is to maintain the history of a chat conversation with a text or multimodal generative AI model, and to that end it provides methods such as .AddSystemMessage(string), .AddAssistantMessage(string), and .AddUserMessage(string). The prompt templates above now become code, like so:

using Microsoft.SemanticKernel.ChatCompletion;

// The same few-shot conversation, expressed without any model-specific format.
var history = new ChatHistory();
history.AddSystemMessage("Respond to each user message with a rhyme.");
history.AddUserMessage("Hello");
history.AddAssistantMessage("Bellow");
history.AddUserMessage("Goodbye");
history.AddAssistantMessage("I cry");
history.AddUserMessage("Farewell");

Now we have a model-agnostic way of managing these conversations, which is particularly useful with few-shot prompting or when doing A/B testing to determine which model serves your needs best.
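And because ChatHistory is (as far as I can tell) just a list of role-tagged ChatMessageContent items, you can inspect or log a conversation without touching any model-specific wire format:

// Dump the conversation in a neutral, model-agnostic form.
foreach (var message in history)
{
    Console.WriteLine($"{message.Role}: {message.Content}");
}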

As an aside, it’s a bit unfortunate that all of these APIs are named around the concept of a chat. While semantically correct, it’s not very obvious that these ‘chat completions’ are the main way to interface with text-based or multimodal generative AI models, or that they don’t have to be used for a genuine back-and-forth conversation with a user.
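For instance, a one-shot task like summarization still flows through the same ‘chat’ objects even though no real conversation is happening; the prompt text below is purely illustrative:

// One request, one response: no back-and-forth, but still a "chat".
var oneShot = new ChatHistory();
oneShot.AddSystemMessage("Summarize the user's text in one sentence.");
oneShot.AddUserMessage("Semantic Kernel is Microsoft's library for calling generative AI models from .NET...");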

We can throw our history object towards a generative model of our choosing like so:

using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;
using OpenAI;

// Register an OpenAI-backed chat completion service with the kernel.
var builder = Kernel.CreateBuilder();
builder.AddOpenAIChatCompletion("gpt-4o-mini",
    new OpenAIClient(new System.ClientModel.ApiKeyCredential("insert OpenAI API key here")));
var kernel = builder.Build();

// Resolve the service and hand it our model-agnostic history.
var chatCompletionService = kernel.Services.GetRequiredService<IChatCompletionService>();
var response = await chatCompletionService.GetChatMessageContentAsync(history);
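
The response comes back as a ChatMessageContent, so continuing the conversation is just a matter of folding it back into the history; a small sketch:

// Print the reply, then append it to the history for a follow-up turn.
Console.WriteLine(response.Content);
history.Add(response); // ChatHistory is a list of ChatMessageContent
history.AddUserMessage("So long");
var nextResponse = await chatCompletionService.GetChatMessageContentAsync(history);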

As you can imagine, you could replace the gpt-4o-mini string with any of the models hosted by the OpenAI API, or replace OpenAIClient altogether with a client for another hosted service such as Hugging Face or Azure OpenAI. They’re even working on a connector for Amazon Bedrock! The beauty is that if I change the model, I don’t have to rewrite my prompt to a new matching format; the library takes care of that for me.
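For example, pointing the same history at an Azure OpenAI deployment should just be a different registration call; the deployment name, endpoint, and key below are placeholders:

// Swap the backend: Azure OpenAI instead of OpenAI, same history object.
// Requires the Microsoft.SemanticKernel.Connectors.AzureOpenAI package.
var azureBuilder = Kernel.CreateBuilder();
azureBuilder.AddAzureOpenAIChatCompletion(
    "my-gpt-4o-deployment",                  // placeholder deployment name
    "https://my-resource.openai.azure.com",  // placeholder endpoint
    "insert Azure OpenAI API key here");     // placeholder key
var azureKernel = azureBuilder.Build();

var azureService = azureKernel.Services.GetRequiredService<IChatCompletionService>();
var azureResponse = await azureService.GetChatMessageContentAsync(history);

Not a single prompt template in sight, and the few-shot examples carry over untouched.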