Generative API Assistant - API-agnostic agent

8 Nov 2024

Extending some of our findings in conversational AI prompting, I’ve made a simple app that uses an OpenAPI spec file to walk an end-user through making an API call based on the available routes and request body defined within the specification. With this, we could theoretically support any well-defined API request!

OpenAPI is a specification that allows you to describe RESTful endpoints. It’s usually in YAML or JSON format, and looks a little something like this:

openapi: 3.0.1
info:
  title: PetStore API
  version: 1.0.0
servers:
  - url: http://localhost:5183
paths:
  /pets:
    get:
      description: Get all pets
      responses:
        '200':
          description: A list of pets
          content:
            application/json:
              schema:
                type: array
                items:
                  $ref: '#/components/schemas/Pet'
components:
  schemas:
    Pet:
      type: object
      properties:
        name:
          type: string
        category:
          type: string
        age:
          type: number
        color:
          type: string
        size:
          type: string

As you can see, the format allows you to specify a base URL, routes, verbs, and request & response types. This format is supported by a number of applications including popular developer tools like Postman/Newman, allowing you to read in the file and create web requests based on it.

Using this common file format, I thought it’d be neat to try creating an AI assistant that walks you through selecting and executing an API call! Here’s what a sample interaction looks like:

I can help you with two operations related to pets. You can either:

1. Retrieve a list of all pets available.
2. Add a new pet to the system.

Please let me know which operation you would like to perform!
> let's add a new pet

Great! Let's start by providing the details of your new pet.

First, what is the name of your pet?
> Molly

Thank you! Now, please tell me the category of your pet. It can be one of the following: dog, cat, fish, bird, reptile, rodent, or other.
> she is a dog

Got it! Molly is a dog.

Next, can you tell me her age in years?
> she'll be turning 12 in December

Great! So, Molly is 11 years old right now.

Now, what is her primary color? It can be black, white, yellow, brown, grey, green, blue, or other.
> she's mostly brown with a white underbelly

Thank you for that detail! We'll use "brown" for her primary color since that's the dominant color.

Finally, what is her size? Please choose from small, medium, or large.
> she's about 40 lbs

The API call to add your new pet was successful, and the server responded with an OK status. Your pet has been added! If you need any further assistance or have any other requests, feel free to let me know.

Step One - What routes are available?

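To start with, I build a list of the routes available to the user by parsing the OpenAPI file. Luckily, Microsoft publishes a .NET library that reads and parses the OpenAPI format (the OpenApiStreamReader used below lives in the Microsoft.OpenApi.Readers package), so I didn’t have to write any custom parsing code.

// OpenApiStreamReader comes from the Microsoft.OpenApi.Readers NuGet package.
using var stream = new FileStream("file.yaml", FileMode.Open, FileAccess.Read);
var reader = new OpenApiStreamReader();
// ReadAsync returns a ReadResult, which wraps the parsed OpenApiDocument and any diagnostics.
var document = await reader.ReadAsync(stream);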

This gives us a nice, easy-to-use result whose OpenApiDocument property itemizes all the paths, operations, and schema definitions within the file. Using this, I was able to build a system prompt outlining the operations that are available to the user:

// Collect one line per verb and path, e.g. "- Get /pets: Get all pets".
var routeList = new StringBuilder();
foreach (var path in document.OpenApiDocument.Paths)
{
    foreach (var operation in path.Value.Operations)
    {
        routeList.AppendLine($"- {operation.Key} {path.Key}: {operation.Value.Description}");
    }
}

// Wrap the route list in the instructions for the model.
var systemPrompt = $"""
    You are a helpful AI assistant, helping the user perform an API call. Here is the list of endpoints available for use:
    {routeList}
    Give the user a summary of the operations you have access to, and ask them which operation they would like to perform. Do not expose the raw API route. Do not use markdown in your replies.
    """;

Step Two - What route did the user choose?

Based on the user’s input, we ask the generative AI model to determine which of those available routes the user would like to execute.

Identify which of the following API routes the user would like to use based on the operation they want to perform.
Here is a list of the available API routes.
{routes}
Identify which route the user would like to use by returning a JSON object with two properties, the endpoint and the verb.  Do not include markdown.
User: {input}

I do this with a new ChatHistory object so the generative model doesn’t get confused about what its directive is, and I re-use the route list built in step one. I’m also relying on OpenAI’s ChatResponseFormat.Json setting to coerce the model output into a parsable JSON object.

Note that despite specifying ChatResponseFormat.Json, the response often still contained regular text, or at the very least came wrapped in a triple-backtick Markdown code fence. I had to add the explicit ‘Do not include markdown’ instruction to the prompt to avoid that.
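Since the prompt instruction is only a request to the model, the reply could also be cleaned up in code before parsing. Here’s a minimal sketch of that idea, not a description of what the app does. StripMarkdownFence and TryReadRoute are hypothetical names, and the sketch uses System.Text.Json rather than the Json.NET call used in this post. The endpoint and verb property names match the ones requested in the prompt above.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: removes an opening code-fence line (with an optional
// language tag such as "json") and the closing fence, if the reply has them.
static string StripMarkdownFence(string modelOutput)
{
    var fence = new string('`', 3);
    var text = modelOutput.Trim();
    if (!text.StartsWith(fence, StringComparison.Ordinal)) return text;

    var firstNewline = text.IndexOf('\n');
    if (firstNewline < 0) return text;
    text = text[(firstNewline + 1)..];

    var closingFence = text.LastIndexOf(fence, StringComparison.Ordinal);
    if (closingFence >= 0) text = text[..closingFence];
    return text.Trim();
}

// Hypothetical helper: returns false instead of throwing when the reply is
// not a JSON object with string "endpoint" and "verb" properties.
static bool TryReadRoute(string modelOutput, out string endpoint, out string verb)
{
    endpoint = "";
    verb = "";
    try
    {
        using var json = JsonDocument.Parse(StripMarkdownFence(modelOutput));
        var root = json.RootElement;
        if (root.ValueKind != JsonValueKind.Object) return false;
        if (!root.TryGetProperty("endpoint", out var endpointValue)
            || endpointValue.ValueKind != JsonValueKind.String) return false;
        if (!root.TryGetProperty("verb", out var verbValue)
            || verbValue.ValueKind != JsonValueKind.String) return false;

        endpoint = endpointValue.GetString()!;
        verb = verbValue.GetString()!;
        return true;
    }
    catch (JsonException)
    {
        // Plain prose or truncated JSON ends up here.
        return false;
    }
}

// Example: a reply wrapped in a code fence, as described above.
var sampleFence = new string('`', 3);
var sampleReply = $"{sampleFence}json\n{{ \"endpoint\": \"/pets\", \"verb\": \"Post\" }}\n{sampleFence}";
Console.WriteLine(TryReadRoute(sampleReply, out var chosenEndpoint, out var chosenVerb)
    ? $"{chosenVerb} {chosenEndpoint}"
    : "Could not read a route from the reply.");
```

When TryReadRoute returns false, one option is to ask the model again rather than fail, since the same prompt often produces clean JSON on a second attempt.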

At this point we have a JSON object generated by the model representing the route and verb that the user chose, which should allow us to select the appropriate part of the OpenAPI spec to continue with.

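dynamic jsonResponse = JsonConvert.DeserializeObject(findRoute.ToString());
string endpoint = jsonResponse.endpoint.ToString();
string verb = jsonResponse.verb.ToString();
var selectedPath = document.OpenApiDocument.Paths.FirstOrDefault(p => p.Key == endpoint);
// Compare verbs case-insensitively: OperationType prints as "Get" or "Post", but the model may answer "GET" or "POST".
var selectedOperation = selectedPath.Value.Operations.FirstOrDefault(o => string.Equals(o.Key.ToString(), verb, StringComparison.OrdinalIgnoreCase));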

Step Three - Fill out the request body

Once we have that selectedOperation object, we can determine whether there’s a request body that needs to be filled out before we make the request. GET requests won’t need anything, but PUT, POST, PATCH, and other requests likely will. When there is a request body, we can use the generative AI model to ask the user which values they want to include in the request.

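// Look up the schema that the selected operation's JSON request body points at.
var reference = selectedOperation.Value.RequestBody.Content["application/json"].Schema.Reference;
var schema = document.OpenApiDocument.ResolveReference(reference) as OpenApiSchema;

// Describe each property of the schema on its own line.
var propertyBuilder = new StringBuilder();
foreach (var property in schema.Properties)
{
    propertyBuilder.AppendLine($"- {property.Key}, type {property.Value.Type}, description: {property.Value.Description}");
}

// Start a fresh chat history whose system message lists the properties to collect.
var propertyHistory = new ChatHistory($"""
    You are a helpful AI assistant, assisting the user to make an API call. The API route they have selected requires a number of properties in the request body. The required properties are:
    {propertyBuilder}
    Ask the user for a value for each of these properties, one question at a time.
    """);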

The assembled prompt walks the user through a conversation, getting an answer for each property required in the request body. I run this in a do...while loop, checking after each user input whether we’ve gathered enough information yet.

do
{
    var nextQuestion = await chatCompletionService.GetChatMessageContentAsync(propertyHistory);
    Console.WriteLine(nextQuestion);
    propertyHistory.AddAssistantMessage(nextQuestion.ToString());
    var input = Console.ReadLine();
    propertyHistory.AddUserMessage(input);
} while (!await HaveCompletedAllProperties(propertyHistory, propertyBuilder.ToString(), chatCompletionService));

public async Task<bool> HaveCompletedAllProperties(ChatHistory chatHistory, string propertyList, IChatCompletionService chatCompletionService)
{
    var localChatHistory = new ChatHistory(chatHistory);
    localChatHistory.AddSystemMessage($"""
        The required properties are:
        {propertyList}
        Referring to the user's input, have we got a satisfactory answer for all of the required properties?  Respond with only Yes or No.
        """);
    var response = await chatCompletionService.GetChatMessageContentAsync(localChatHistory);
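    // Trim first and ignore case so replies like " Yes." or "YES" still count.
    return response.ToString().Trim().StartsWith("yes", StringComparison.OrdinalIgnoreCase);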
}

The cool thing here, just like in my earlier conversational AI post, is that because we’re in the context of a conversation, the generative model can handle natural language and corrections.

Thank you! Next, what category does Molly belong to? The options are: dog, cat, fish, bird, reptile, rodent, or other.
> dog

Got it! Now, how old is Molly? Please provide her age in years.
> she's 11 years old

Thank you! Next, what is Molly's primary color? The options are: black, white, yellow, brown, grey, green, blue, or other.
> oh, sorry, she's actually a cat

No problem! So, Molly is a cat.

Let's move on to her age: you mentioned she is 11 years old.

Now, what is her primary color? The options are: black, white, yellow, brown, grey, green, blue, or other.

Finally, what is the dog's size? Please choose from small, medium, or large.
> she's about 40 lbs
# this ends up picking 'medium' without any other prompting

The ability to do things like this without any additional coding is one of the best things about taking advantage of generative AI!

Step Four - Make the request

Once we have all the required fields filled out, we can generate a request body and make the web request! I’m using the ChatResponseFormat.Json option here as well.

Return a json object with the property names and their values from the user's responses.  Only return the JSON object, and do not include markdown.
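With the JSON body in hand, the request itself could be assembled along these lines. This is a minimal sketch rather than the app’s actual code: BuildRequest is a hypothetical helper, the server URL and path come from the PetStore example at the top of the post, and the sample body stands in for whatever the model returns.

```csharp
using System;
using System.Net.Http;
using System.Text;

// Hypothetical helper: combines the spec's server URL, the selected path and
// verb, and the model-generated JSON body into a request message.
static HttpRequestMessage BuildRequest(string serverUrl, string path, string verb, string? jsonBody)
{
    // Avoid doubled or missing slashes between the base URL and the path.
    var uri = new Uri(serverUrl.TrimEnd('/') + "/" + path.TrimStart('/'));

    // HTTP method names are upper case, while the spec parser prints "Get" or "Post".
    var message = new HttpRequestMessage(new HttpMethod(verb.ToUpperInvariant()), uri);

    // Only attach a body when there is one (GET requests have none).
    if (!string.IsNullOrWhiteSpace(jsonBody))
    {
        message.Content = new StringContent(jsonBody, Encoding.UTF8, "application/json");
    }
    return message;
}

// Sample values based on the conversation earlier in the post.
using var sampleRequest = BuildRequest(
    "http://localhost:5183",
    "/pets",
    "Post",
    """{"name":"Molly","category":"dog","age":11,"color":"brown","size":"medium"}""");
Console.WriteLine($"{sampleRequest.Method} {sampleRequest.RequestUri}");

// Sending is left commented out because it needs the API to be running:
// using var client = new HttpClient();
// using var response = await client.SendAsync(sampleRequest);
// Console.WriteLine((int)response.StatusCode);
```

Keeping request construction separate from sending makes it easy to show the user what is about to be called, or to log it, before anything goes over the wire.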

Finally, I ask the generative AI to write a friendly message for the user based on the response from the API.

You are a helpful AI assistant who just made an API call on behalf of the user.
They were trying to {userOperation}.
The HTTP status code for the response we got back from the server was {response.StatusCode}.
Let the user know what happened.

In theory, we could throw any OpenAPI spec file at this program and have it work using the same algorithm! There might be some difficulty with more complicated request bodies, or with properties that require certain values (e.g. category in this pet example is more likely to be categoryId in a real-world API, where ‘dog’ corresponds to ‘1’), but a bit of extra effort should handle most of those cases.
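As a rough illustration of that extra effort, the categoryId case could be handled by translating the friendly label into an id just before the request is sent. This is only a sketch under assumptions: ReplaceCategoryWithId and the lookup table are hypothetical, and only the dog = 1 pairing comes from the example above. A real API would define its own ids, possibly behind another endpoint.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json.Nodes;

// Hypothetical lookup table. Only dog = 1 comes from the example in the text.
var categoryIds = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase)
{
    ["dog"] = 1,
};

// Hypothetical helper: swaps the friendly "category" label in the
// model-generated body for a numeric "categoryId".
static string ReplaceCategoryWithId(string jsonBody, IReadOnlyDictionary<string, int> ids)
{
    var body = JsonNode.Parse(jsonBody)?.AsObject()
        ?? throw new ArgumentException("Expected a JSON object.", nameof(jsonBody));

    var label = body["category"]?.GetValue<string>();
    if (label is null || !ids.TryGetValue(label, out var id))
    {
        // An unknown label is a cue to go back and ask the user again.
        throw new ArgumentException($"No id is known for category '{label}'.");
    }

    body.Remove("category");
    body["categoryId"] = id;
    return body.ToJsonString();
}

Console.WriteLine(ReplaceCategoryWithId("""{"name":"Molly","category":"Dog"}""", categoryIds));
```

The model still talks to the user in terms of ‘dog’ and ‘cat’, and the mapping stays in ordinary code where it can be checked.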