
LLMInspect API User Guide

About This Guide

Welcome to the LLMInspect API User Guide. This guide is designed to help users of all levels interact seamlessly with various language models using the LLMInspect API. With LLMInspect API, you can connect to multiple LLM providers, including OpenAI, Gemini, and your locally deployed InspectGPT (Local LLM). It is compatible with the OpenAI API format, making it easy for users familiar with OpenAI's API to get started quickly.

Key Features

  • Multi-LLM Support: Connect to multiple language model providers such as OpenAI, Gemini, and Local LLMs like InspectGPT.
  • OpenAI API Compatibility: LLMInspect follows the OpenAI Chat Completions format, allowing for a smooth transition if you're already using OpenAI's API.
  • Supports Chat Completions and Image Generation: Interact with language models for chat completions and generate images using supported models.

By default, all requests sent through the LLMInspect API are directed to OpenAI. To interact with Gemini or InspectGPT, you can specify the desired provider by using the appropriate headers, which will be explained in detail later in this guide.

Figure: LLMInspect API

Authentication

To use the LLMInspect API, proper authentication is required. Your API access can be authenticated in two different ways, as explained in the diagram below:

Figure: LLMInspect API Auth

1. Using Your Own Subscription Key

You can use your own subscription key issued by public model providers (e.g., OpenAI, Gemini, etc.). Include the key in the HTTP request headers using the following format:

Authorization: Bearer {MODEL_KEY}

Obtaining an OpenAI API Key

To obtain an OpenAI API key, visit the OpenAI API Keys page. The OpenAI API key typically has the following format:

sk-***********************

Obtaining a Gemini API Key

To obtain a Gemini API key, visit the Gemini API documentation. The Gemini API key usually has the following format:

AIzaSy************************

Local LLM Key

For accessing a local LLM like InspectGPT, you may need a specific key depending on your deployment configuration. Please contact your system administrator for details on obtaining your local LLM key and its format.

2. Using LLMInspect API Token

Alternatively, you can use an API token issued by the LLMInspect authentication service to access both public and private model providers. Include the token in the HTTP request headers using the following format:

Authorization: Bearer {ID_TOKEN}

Using the LLMInspect API token allows for secure interaction with the API across various models without needing individual keys from each provider.
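Whichever credential you use, the request headers have the same shape. Here is a minimal Python sketch of assembling them (the helper name `auth_headers` is hypothetical; the header names come from this guide):

```python
def auth_headers(token, provider=None):
    """Build request headers for the LLMInspect API.

    `token` is either a provider subscription key (e.g. an OpenAI
    sk-... key) or an LLMInspect API token; the Authorization header
    format is identical in both cases.
    """
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }
    if provider is not None:
        # Optional routing header, e.g. "OpenAI", "Gemini", "InspectGPT"
        headers["X-Client-Id"] = provider
    return headers
```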

For Admin: Obtaining LLMInspect API Token

Admins can generate LLMInspect API tokens and distribute them to employees, giving them seamless access to the API across all models.

Use the following curl command to request a token, and replace the placeholder values with your organization's credentials:

curl -X POST "https://your_domain/realms/InspectChat/protocol/openid-connect/token" \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "client_id=your_client_id" \
-d "client_secret=your_client_secret" \
-d "username=your_username" \
-d "password=your_password" \
-d "grant_type=password"

On success, the server returns a JSON response containing an access_token, along with other important fields:

{
  "access_token": "x.x.x",
  "expires_in": 300,
  "refresh_expires_in": 1800,
  "refresh_token": "x.x.x",
  "token_type": "Bearer",
  "scope": "profile email"
}

Explanation of Key Fields

  • access_token: The main token used for authenticating API requests.
  • expires_in: The duration (in seconds) until the access_token expires. In this example, the token is valid for 300 seconds (5 minutes).
  • refresh_expires_in: The duration (in seconds) until the refresh_token expires, allowing token renewal without reauthentication.
  • refresh_token: Used to renew the access_token, avoiding the need for a full reauthentication.
  • token_type: Indicates the type of token, generally Bearer.
  • scope: Lists the authorized scopes for this token, such as profile and email access.

Note for Admins: The refresh_token can be used to renew the access_token before expiration.
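Token renewal follows the standard OAuth 2.0 / OpenID Connect refresh flow. A hedged sketch of building the refresh request (the endpoint path is taken from the curl example above; the field names follow the OAuth 2.0 token endpoint convention, so verify them against your identity provider):

```python
from urllib.parse import urlencode

def build_refresh_request(domain, client_id, client_secret, refresh_token):
    """Return (url, body) for renewing an access_token via the
    refresh_token grant. The body is sent with
    Content-Type: application/x-www-form-urlencoded, exactly like
    the password-grant request above."""
    url = f"https://{domain}/realms/InspectChat/protocol/openid-connect/token"
    body = urlencode({
        "client_id": client_id,
        "client_secret": client_secret,
        "grant_type": "refresh_token",
        "refresh_token": refresh_token,
    })
    return url, body
```

On success, the token endpoint returns a fresh `access_token` (and usually a new `refresh_token`) in the same JSON shape shown above.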


API Usage

To perform requests with the LLMInspect API, use the following base URL:

https://{llminspect_domain}/v1/chat/completions

By default, all requests are routed to OpenAI. You can, however, switch between different models by changing the headers or model key. Ensure that you specify the correct model in the request body, as shown in the following example.

Accessing Private / Local LLMs

To access locally deployed models (e.g. EUNOMATIX’s InspectGPT), set the X-Client-Id header to InspectGPT. Use the appropriate model and key based on your local deployment. For example, if you are using Mistral as your local LLM, the model would be mistral-tiny.

Example header for local LLM requests:

X-Client-Id: 'InspectGPT'
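As a sketch, the headers and body for such a request could be assembled like this (the helper name is hypothetical, the key is a placeholder, and whether your deployment accepts `mistral-tiny` depends on your local configuration):

```python
import json

def local_llm_request(prompt, model="mistral-tiny"):
    """Build headers and body for a chat request routed to a local
    LLM. The X-Client-Id value "InspectGPT" is taken from this guide;
    the model name depends on your local deployment."""
    headers = {
        "Content-Type": "application/json",
        "Authorization": "Bearer {YOUR_LOCAL_LLM_KEY}",  # placeholder key
        "X-Client-Id": "InspectGPT",
    }
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    })
    return headers, body
```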

Accessing OpenAI Models

By default, all requests to LLMInspect are routed to OpenAI, but you can set the X-Client-Id header to OpenAI to be explicit.

X-Client-Id: 'OpenAI'

You can interact with any text-compatible OpenAI model such as:

  • gpt-4o
  • gpt-4o-mini
  • gpt-4-turbo
  • gpt-4
  • gpt-3.5-turbo

Here's an example of a request body for making a chat completion request:

{
  "messages": [
    {
      "role": "user",
      "content": "Hi, how are you?"
    }
  ],
  "stream": true,
  "model": "gpt-3.5-turbo",
  "temperature": 0.5,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "top_p": 1
}
  • Stream Mode: If stream is set to true, the API will return the response in chunks.
  • Non-Stream Mode: If stream is set to false, you will receive the full response at once.
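If LLMInspect mirrors OpenAI's server-sent-events framing for streamed responses (an assumption; verify against your deployment), each chunk arrives as a `data: {json}` line and the stream ends with `data: [DONE]`. The text deltas can be collected like this:

```python
import json

def extract_stream_text(raw_sse):
    """Collect the text deltas from an OpenAI-style streamed response.

    Assumes the SSE framing used by the OpenAI Chat Completions API:
    lines of the form `data: {json}`, terminated by `data: [DONE]`.
    """
    parts = []
    for line in raw_sse.splitlines():
        line = line.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        # Chunks that carry no text (e.g. the initial role chunk)
        # have no "content" key in the delta.
        delta = json.loads(payload)["choices"][0]["delta"]
        parts.append(delta.get("content", ""))
    return "".join(parts)
```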

Note: For more information on request formatting, you can refer to the OpenAI API documentation.

Accessing OpenAI Image Models

By default, all requests to LLMInspect are routed to OpenAI, but you can set the X-Client-Id header to OpenAI to be explicit.

X-Client-Id: 'OpenAI'

To create images, send requests to the following endpoint:

https://{llminspect_domain}/v1/images/generations

You can access the OpenAI-supported image models, such as:

  • dall-e-3
  • dall-e-2

Here's an example of a request body for making an image generation request:

{
  "model": "dall-e-3",
  "prompt": "A cute baby sea otter",
  "n": 1,
  "size": "1024x1024"
}
  • Size: The size of the generated images. Must be one of 256x256, 512x512, or 1024x1024 for dall-e-2. Must be one of 1024x1024, 1792x1024, or 1024x1792 for dall-e-3 models.
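The size constraint can be checked client-side before sending a request; a small hypothetical helper based on the values listed above:

```python
# Valid sizes per model, as listed in this guide.
VALID_SIZES = {
    "dall-e-2": {"256x256", "512x512", "1024x1024"},
    "dall-e-3": {"1024x1024", "1792x1024", "1024x1792"},
}

def check_size(model, size):
    """Return True if `size` is accepted for `model`."""
    return size in VALID_SIZES.get(model, set())
```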

Note: For more information on request formatting, you can refer to the OpenAI Image Generation API Docs.

Accessing Google Gemini Models

To interact with Gemini models, modify the request by setting the X-Client-Id header to Gemini and use the correct Gemini model in the request body. If you are using a model-specific key, update the Authorization header accordingly.

Example header for Gemini requests:

X-Client-Id: 'Gemini'

Supported Gemini models include:

  • gemini-1.5-flash
  • gemini-1.5-flash-8b
  • gemini-1.5-pro
  • gemini-1.0-pro

Note: Vision models are not yet supported via the API.


Example Usage

Generating images with Dall-E 3

curl -X POST "https://{llminspect_domain}/v1/images/generations" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {YOUR_ACCESS_TOKEN}" \
-H "X-Client-Id: 'OpenAI'" \
-d '{
"model": "dall-e-3",
"prompt": "A cute baby sea otter",
"n": 1,
"size": "1024x1024"
}'

Explanation:

  • https://{llminspect_domain}/v1/images/generations: Replace {llminspect_domain} with the actual domain for your LLMInspect API.
  • Authorization Header: Replace {YOUR_ACCESS_TOKEN} with your valid OpenAI Key or LLMInspect API Token for API authentication.
  • X-Client-Id Header: Specifies OpenAI as the provider.
  • Request Body: The JSON body contains the model, prompt, n, and size fields for image generation.

This command sends a request to generate one 1024x1024 image based on the given prompt. The response will provide the URL of the generated image.
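Assuming the response follows the OpenAI Images API shape (a `data` array of objects with a `url` field; verify against your deployment), the URL can be extracted like this:

```python
import json

def first_image_url(response_body):
    """Extract the URL of the first generated image.

    Assumes the OpenAI Images API response shape:
    {"created": ..., "data": [{"url": "..."}]}.
    """
    return json.loads(response_body)["data"][0]["url"]
```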

Generating Code With GPT-4o

To generate code with the gpt-4o model via the LLMInspect API, use the curl command below. It includes the X-Client-Id header to specify OpenAI, the Bearer token for authorization, and a structured prompt suited to code generation.

curl -X POST "https://{llminspect_domain}/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer {YOUR_ACCESS_TOKEN}" \
-H "X-Client-Id: 'OpenAI'" \
-d '{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "system",
      "content": "You are an AI coding assistant. Generate clean, efficient, and well-commented code as requested by the user."
    },
    {
      "role": "user",
      "content": "Write a Python function to find the factorial of a number using recursion."
    }
  ],
  "temperature": 0.3,
  "presence_penalty": 0,
  "frequency_penalty": 0,
  "top_p": 1
}'

Explanation:

  • https://{llminspect_domain}/v1/chat/completions: Replace {llminspect_domain} with the actual domain for your LLMInspect API.
  • Authorization Header: Replace {YOUR_ACCESS_TOKEN} with your OpenAI Key or LLMInspect API Token.
  • X-Client-Id Header: Specifies OpenAI as the provider.
  • Request Body:
    • System Message: Sets the context for the assistant to produce code-oriented responses.
    • User Message: Specifies the task, here asking for a Python function using recursion to calculate factorials.
    • Model Parameters: temperature, presence_penalty, frequency_penalty, and top_p values are set to generate a balanced, consistent response.

This command will send a request to generate Python code based on the user's prompt, ensuring clarity and efficiency in the generated code.
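Assuming the non-streamed response follows the OpenAI Chat Completions shape (an assumption; check your deployment's output), the generated code can be read from `choices[0].message.content`:

```python
import json

def completion_text(response_body):
    """Extract the assistant's reply from a non-streamed chat
    completion, assuming the OpenAI Chat Completions response shape:
    {"choices": [{"message": {"role": ..., "content": ...}}], ...}."""
    return json.loads(response_body)["choices"][0]["message"]["content"]
```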


API Access and LLMInspect Guardrails

LLMInspect offers powerful guardrails to ensure safe and secure interactions with language models. Depending on the configuration set by your admin, your API requests are subject to these guardrails. If a request violates any of them, it may be blocked.

Supported Guardrails

  • DetectSecrets: This guardrail identifies and prevents the sharing of sensitive information, such as secrets or API keys, within a request.
  • DetectPII: Blocks the transmission of personally identifiable information (PII).
  • Sentiment Analysis: Monitors the sentiment or tone of messages and can flag or block negative or harmful content.
  • DetectUnusualPrompt: Flags any unusual or potentially harmful prompt requests that could lead to dangerous or unwanted outputs from the models.

Important: These guardrails are configured by your organization’s admin, and certain requests may be blocked based on these settings. For more information on the guardrails, see the Security Guide.
