Custom LLMs

Anam supports integration with custom Large Language Models (LLMs), allowing you to use your own models while benefiting from Anam’s persona, voice, and streaming infrastructure.
Custom LLM requests are made directly from Anam’s servers, reducing latency and simplifying your integration. All API credentials you provide are encrypted at rest using AES-256.

Other Ways to Use Custom LLMs

This page covers server-side custom LLMs where Anam handles the LLM calls for you. There are other integration patterns:
  • Custom LLM (client-side) — Handle LLM calls yourself in your client code and stream responses to the persona
  • ElevenLabs Agents — Use ElevenLabs Conversational AI as your LLM + TTS provider with an Anam avatar
  • LiveKit — Use Anam avatars as a face layer in your existing LiveKit agent pipeline

How Custom LLMs Work

When you create a custom LLM configuration in Anam:
  1. Model Registration: You register your LLM details with Anam, including the model endpoint and authentication credentials
  2. Server-Side Processing: Anam handles all LLM calls from our servers, reducing latency and complexity
  3. Secure Storage: Your API keys and credentials are encrypted and securely stored
  4. Integration: Use your custom LLM ID in place of Anam’s built-in models

Creating a Custom LLM

To create a custom LLM, you’ll need to:
  1. Register your LLM configuration through the Anam API or dashboard
  2. Provide the necessary connection details (endpoint, API keys, model parameters)
  3. Receive a unique LLM ID for your custom model
  4. Use this ID when creating session tokens
Custom LLM creation API endpoints and dashboard features are coming soon. Contact support@anam.ai for early access.

Supported LLM Specifications

Anam supports custom LLMs that comply with one of the following API specifications:
  • OpenAI API Specification - Compatible with OpenAI’s chat completion endpoints
  • Azure OpenAI API Specification - Compatible with Azure’s OpenAI service endpoints
  • Gemini API Specification - Compatible with Google’s Gemini API endpoints
  • Groq OpenAI API Specification - Compatible with Groq’s API endpoints
Your custom LLM must support streaming responses. Non-streaming LLMs will not work with Anam’s real-time persona interactions.

Specifying Multiple Endpoints

Anam allows you to specify multiple endpoints per LLM. The Anam backend automatically routes to the fastest available endpoint from the data centre where the Anam engine is running, and falls back to other endpoints if a request fails.
To keep routing accurate, Anam may occasionally send small probe prompts to your configured endpoints. Probes occur only while sessions are active for that LLM, are lightweight (around 1500 tokens), and run infrequently (a few times per hour at most). They have no effect on active conversations and exist solely to maintain reliable performance.
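This routing happens inside Anam’s backend, but the fallback idea can be sketched in a few lines. This is purely illustrative: `callWithFallback` is a hypothetical helper, and the latency-based ranking Anam performs is omitted here.

```typescript
// Conceptual sketch of endpoint fallback: try each endpoint in order
// and return the first successful response. Anam's backend additionally
// ranks endpoints by observed latency; that ranking is not shown here.
type EndpointCall<T> = (endpoint: string) => Promise<T>;

async function callWithFallback<T>(
  endpoints: string[],
  call: EndpointCall<T>
): Promise<T> {
  let lastError: unknown;
  for (const endpoint of endpoints) {
    try {
      return await call(endpoint);
    } catch (err) {
      lastError = err; // remember the failure and try the next endpoint
    }
  }
  throw lastError ?? new Error('No endpoints configured');
}
```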

Technical Requirements

1. API Compatibility

Your LLM server must implement one of the supported API specifications mentioned above. This includes:
  • Matching the request/response format
  • Supporting the same authentication methods
  • Implementing compatible endpoint paths
2. Streaming Support

Enable streaming responses in your LLM implementation:
  • Return responses with stream: true support
  • Use Server-Sent Events (SSE) for streaming chunks
  • Include proper content types and formatting
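As a reference for what SSE streaming looks like in practice, here is a minimal sketch of parsing OpenAI-style streaming events, where each event is a `data: {...}` line and the stream ends with the `data: [DONE]` sentinel. The `extractDeltas` helper is illustrative only, not part of any SDK.

```typescript
// Extract the streamed text deltas from an OpenAI-style SSE payload.
// Each event is a line of the form `data: {...}`; the stream ends
// with the sentinel `data: [DONE]`.
function extractDeltas(ssePayload: string): string[] {
  const deltas: string[] = [];
  for (const line of ssePayload.split('\n')) {
    const trimmed = line.trim();
    if (!trimmed.startsWith('data:')) continue;
    const data = trimmed.slice('data:'.length).trim();
    if (data === '[DONE]') break;
    const chunk = JSON.parse(data);
    const content = chunk.choices?.[0]?.delta?.content;
    if (typeof content === 'string') deltas.push(content);
  }
  return deltas;
}
```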
3. Validation Testing

When you add your LLM in the Anam Lab, automatic tests verify:
  • API specification compliance
  • Streaming functionality
  • Response format compatibility
  • Authentication mechanisms
The Lab will provide feedback if your LLM doesn’t meet the requirements, helping you identify what needs to be fixed.
Testing Tip: We recommend using curl commands to compare your custom LLM’s raw HTTP responses with those from the actual providers (OpenAI, Azure OpenAI, or Gemini). Client libraries like the OpenAI SDK often transform responses and extract specific values, which can mask differences in the actual HTTP response format. Your custom implementation must match the raw HTTP response structure, not the transformed output from client libraries.
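When comparing raw responses, one simple check is to diff the key sets of the two JSON bodies. A hypothetical helper (`missingKeys` is not part of any SDK) that reports top-level keys present in the provider’s response but absent from yours:

```typescript
// Compare the top-level keys of two parsed JSON responses and report
// which keys are missing from the custom implementation. This checks
// only the first level; nested fields (e.g. choices[].delta) should
// be compared the same way.
function missingKeys(
  reference: Record<string, unknown>,
  candidate: Record<string, unknown>
): string[] {
  return Object.keys(reference).filter((key) => !(key in candidate));
}
```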

Example Custom LLM Endpoints

If you’re building your own LLM server, ensure your endpoints match one of these patterns:
POST /v1/chat/completions
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY

{
  "model": "your-model-name",
  "messages": [...],
  "stream": true
}

Using Custom LLMs

Once you have your custom LLM ID, use it when requesting session tokens:
const response = await fetch('https://api.anam.ai/v1/auth/session-token', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${process.env.ANAM_API_KEY}`
  },
  body: JSON.stringify({
    personaConfig: {
      name: 'Sebastian',
      avatarId: '30fa96d0-26c4-4e55-94a0-517025942e18',
      voiceId: '6bfbe25a-979d-40f3-a92b-5394170af54b',
      llmId: 'your-custom-llm-id', // Your custom LLM ID
      systemPrompt: 'You are a helpful customer service representative.',
    },
  })
});

const { sessionToken } = await response.json();

Security Considerations

Encryption at Rest: All API keys and credentials are encrypted using AES-256 before storage.
Secure Transmission: Credentials are transmitted over TLS 1.3 and never exposed in logs or responses.
Access Control: Only your account can use your custom LLM configurations.

Benefits of Server-Side Processing

By processing custom LLM requests on Anam’s servers:
  1. Reduced Latency: Direct server-to-server communication eliminates client-side round trips
  2. Simpler Client Code: No need to manage LLM connections in your client application
  3. Integrated Streaming: Your custom LLM works with Anam’s voice and video streaming
  4. Credential Security: API keys stay on the server, never exposed to client-side code
  5. Automatic Scaling: Anam handles load balancing and scaling

Using LLMs with reasoning

LLMs with reasoning enabled produce separate reasoning messages in addition to the spoken text messages from the persona. These messages contain the reasoning the LLM used when forming its response.
Currently, only OpenAI-spec LLMs support reasoning messages (e.g. OpenAI, Azure OpenAI, and Groq OpenAI). For best performance, we suggest using the reasoning models provided by Groq.

How Reasoning Messages Work

1. User makes a request

User: “Show me the pricing page”
2. LLM produces reasoning response prior to main response

{
  ...
  "reasoning": "The user has requested to see the pricing page, I need to call the pricing page tool and respond to the user"
  ...
}
3. SDK emits event

The Anam SDK emits a REASONING_HISTORY_UPDATED event that your application can handle.
4. Your app handles the event

Each REASONING_HISTORY_UPDATED event contains the full history of reasoning messages. Alternatively, you can listen for REASONING_STREAM_EVENT_RECEIVED which streams updates in chunks, but you will need to handle aggregating the messages yourself.
import { AnamEvent, ReasoningMessage } from '@anam-ai/js-sdk';

// Option 1: Full history on each update
client.addListener(
  AnamEvent.REASONING_HISTORY_UPDATED,
  (messages: ReasoningMessage[]) => {
    updateReasoningMessageHistory(messages);
  }
);
import { AnamEvent, ReasoningStreamEvent } from '@anam-ai/js-sdk';

// Option 2: Streaming updates (requires manual aggregation)
client.addListener(
  AnamEvent.REASONING_STREAM_EVENT_RECEIVED,
  (event: ReasoningStreamEvent) => {
    setReasoningHistory((previousMessages) => {
      const lastMessage = previousMessages[previousMessages.length - 1];

      // Handle streamed thoughts - append to existing message
      if (lastMessage && lastMessage.id === event.id) {
        const updatedMessages = [...previousMessages];
        updatedMessages[updatedMessages.length - 1] = {
          ...lastMessage,
          content: lastMessage.content + ' ' + event.content,
        };
        return updatedMessages;
      }

      // Handle new messages
      return [
        ...previousMessages,
        {
          content: event.content,
          id: event.id,
        },
      ];
    });
  }
);
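The chunk-handling logic in Option 2 can also be factored into a pure helper, which makes it straightforward to unit test. The `Msg` shape and the space-joined concatenation below mirror the listener above and are assumptions about the event payload:

```typescript
// Pure version of the chunk aggregation: append a streamed chunk to the
// last message when the ids match, otherwise start a new message.
interface Msg {
  id: string;
  content: string;
}

function appendReasoningChunk(messages: Msg[], event: Msg): Msg[] {
  const last = messages[messages.length - 1];
  if (last && last.id === event.id) {
    return [
      ...messages.slice(0, -1),
      { ...last, content: last.content + ' ' + event.content },
    ];
  }
  return [...messages, { id: event.id, content: event.content }];
}
```

A pure helper like this can be driven directly from the `REASONING_STREAM_EVENT_RECEIVED` listener while keeping the aggregation logic testable in isolation.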

Next Steps

Cookbook: Custom LLM (Client-Side)

Tutorial for integrating your own LLM on the client side

Cookbook: Python BYO LLM

Bring your own LLM using the Python SDK

Personas with Custom LLMs

Learn how personas work with custom language models

Setup in Anam Lab

Configure a custom LLM in the Anam Lab