Custom LLMs
Anam supports integration with custom Large Language Models (LLMs), allowing you to use your own models while benefiting from Anam’s persona, voice, and streaming infrastructure.
Custom LLMs are processed directly from Anam’s servers, reducing latency and simplifying your integration. All API credentials you provide are encrypted at rest using AES-256.
Other Ways to Use Custom LLMs
This page covers server-side custom LLMs, where Anam handles the LLM calls for you. There are other integration patterns:
- Custom LLM (client-side) — Handle LLM calls yourself in your client code and stream responses to the persona
- ElevenLabs Agents — Use ElevenLabs Conversational AI as your LLM + TTS provider with an Anam avatar
- LiveKit — Use Anam avatars as a face layer in your existing LiveKit agent pipeline
How Custom LLMs Work
When you create a custom LLM configuration in Anam:
- Model Registration: You register your LLM details with Anam, including the model endpoint and authentication credentials
- Server-Side Processing: Anam handles all LLM calls from our servers, reducing latency and complexity
- Secure Storage: Your API keys and credentials are encrypted and securely stored
- Integration: Use your custom LLM ID in place of Anam’s built-in models
Creating a Custom LLM
To create a custom LLM, you’ll need to:
- Register your LLM configuration through the Anam API or dashboard
- Provide the necessary connection details (endpoint, API keys, model parameters)
- Receive a unique LLM ID for your custom model
- Use this ID when creating session tokens
Custom LLM creation API endpoints and dashboard features are coming soon.
Contact support@anam.ai for early access.
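Since the creation API is not yet public, no schema is final. Purely as a hypothetical sketch, a registration payload for an OpenAI-compatible model might look along these lines (every field name and value here is illustrative):

```json
{
  "name": "my-custom-llm",
  "apiSpecification": "OPENAI",
  "model": "my-model-name",
  "endpoints": [
    { "baseUrl": "https://llm.example.com/v1", "apiKey": "YOUR_PROVIDER_API_KEY" }
  ]
}
```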
Supported LLM Specifications
Anam supports custom LLMs that comply with one of the following API specifications:
- OpenAI API Specification - Compatible with OpenAI’s chat completion endpoints
- Azure OpenAI API Specification - Compatible with Azure’s OpenAI service endpoints
- Gemini API Specification - Compatible with Google’s Gemini API endpoints
- Groq OpenAI API Specification - Compatible with Groq’s API endpoints
Specifying Multiple Endpoints
Anam allows you to specify multiple endpoints per LLM. The Anam backend automatically routes to the fastest available endpoint from the data centre where the Anam engine is running, and falls back to other endpoints in the case of an error.
To ensure routing selects the fastest available endpoint, Anam may occasionally send small probe prompts to your configured endpoints. These only occur while sessions are active for that LLM, and are lightweight (around 1500 tokens in size). Probes are infrequent (a few times per hour at most), have no effect on active conversations, and exist solely to maintain reliable performance.
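As a hypothetical sketch (the creation API is not yet public, so this shape is illustrative only), a configuration listing two endpoints gives routing and failover more than one target:

```json
{
  "endpoints": [
    { "baseUrl": "https://us-east.llm.example.com/v1", "apiKey": "YOUR_PROVIDER_API_KEY" },
    { "baseUrl": "https://eu-west.llm.example.com/v1", "apiKey": "YOUR_PROVIDER_API_KEY" }
  ]
}
```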
Technical Requirements
API Compatibility
Your LLM server must implement one of the supported API specifications mentioned above. This includes:
- Matching the request/response format
- Supporting the same authentication methods
- Implementing compatible endpoint paths
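For example, if you target the OpenAI specification, a non-streaming chat completion response from your server should match OpenAI’s documented wire format (the model name below is a placeholder):

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1700000000,
  "model": "my-model-name",
  "choices": [
    {
      "index": 0,
      "message": { "role": "assistant", "content": "Hello! How can I help?" },
      "finish_reason": "stop"
    }
  ],
  "usage": { "prompt_tokens": 9, "completion_tokens": 7, "total_tokens": 16 }
}
```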
Streaming Support
Enable streaming responses in your LLM implementation:
- Return responses with stream: true support
- Use Server-Sent Events (SSE) for streaming chunks
- Include proper content types and formatting
Testing Tip: We recommend using curl commands to compare your custom LLM’s raw HTTP responses with those from the actual providers (OpenAI, Azure OpenAI, or Gemini). Client libraries like the OpenAI SDK often transform responses and extract specific values, which can mask differences in the actual HTTP response format. Your custom implementation must match the raw HTTP response structure, not the transformed output from client libraries.
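To see the raw stream, call your endpoint with curl and stream: true in the request body. An OpenAI-compatible streaming response is a sequence of SSE data: lines ending with a [DONE] sentinel, abridged here (the model name is a placeholder):

```
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1700000000,"model":"my-model-name","choices":[{"index":0,"delta":{"role":"assistant","content":"Hel"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1700000000,"model":"my-model-name","choices":[{"index":0,"delta":{"content":"lo!"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1700000000,"model":"my-model-name","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
```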
Example Custom LLM Endpoints
If you’re building your own LLM server, ensure your endpoints match one of these patterns:
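The path shapes below follow the respective providers’ public APIs; the hostnames, the deployment and model placeholders, and the Azure api-version value are examples:

```
# OpenAI-compatible
POST https://your-llm-server.example.com/v1/chat/completions

# Azure OpenAI-compatible
POST https://your-resource.openai.azure.com/openai/deployments/{deployment-id}/chat/completions?api-version=2024-02-01

# Gemini-compatible
POST https://your-llm-server.example.com/v1beta/models/{model}:streamGenerateContent?alt=sse
```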
Using Custom LLMs
Once you have your custom LLM ID, use it when requesting session tokens:
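A minimal server-side sketch in TypeScript, assuming Anam’s session-token endpoint at https://api.anam.ai/v1/auth/session-token and a personaConfig.llmId field; verify both names against the current API reference before relying on them:

```typescript
// Build the session-token request body; the custom LLM ID goes where a
// built-in model ID would otherwise go. Field names are assumptions to
// verify against the API reference.
function buildSessionTokenRequest(customLlmId: string) {
  return {
    personaConfig: {
      name: "assistant",
      avatarId: "YOUR_AVATAR_ID",
      voiceId: "YOUR_VOICE_ID",
      llmId: customLlmId, // your custom LLM ID from registration
      systemPrompt: "You are a helpful assistant.",
    },
  };
}

// Exchange your Anam API key for a short-lived session token. Run this
// server-side only, so the API key is never shipped to the browser.
async function getSessionToken(apiKey: string, customLlmId: string): Promise<string> {
  const res = await fetch("https://api.anam.ai/v1/auth/session-token", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(buildSessionTokenRequest(customLlmId)),
  });
  if (!res.ok) throw new Error(`Session token request failed: ${res.status}`);
  const { sessionToken } = await res.json();
  return sessionToken;
}
```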
Security Considerations
Encryption at Rest: All API keys and credentials are encrypted using AES-256 before storage.
Secure Transmission: Credentials are transmitted over TLS 1.3 and never
exposed in logs or responses.
Access Control: Only your account can use your custom LLM configurations.
Benefits of Server-Side Processing
By processing custom LLMs from Anam’s servers:
- Reduced Latency: Direct server-to-server communication eliminates client-side round trips
- Simpler Client Code: No need to manage LLM connections in your client application
- Integrated Streaming: Your custom LLM works with Anam’s voice and video streaming
- Credential Security: API keys stay on the server, never exposed to client-side code
- Automatic Scaling: Anam handles load balancing and scaling
Using LLMs with reasoning
LLMs with reasoning enabled produce separate reasoning messages in addition to the spoken text messages from the persona. These messages contain the reasoning the LLM used when forming its response.
How Reasoning Messages Work
SDK emits event
The Anam SDK emits a REASONING_HISTORY_UPDATED event that your application can handle.
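A minimal sketch of consuming this event in a web client. The SDK wiring is shown in comments because the exact import path, client factory, and payload type are assumptions to check against the SDK reference; the helper below only formats a reasoning history for display:

```typescript
// Illustrative shape of one reasoning history entry (an assumption;
// check the SDK's exported types for the real payload).
interface ReasoningMessage {
  id: string;
  content: string; // the model's reasoning text, not spoken by the persona
}

// Pure helper: order the reasoning history newest-first for display.
function formatReasoning(history: ReasoningMessage[]): string[] {
  return [...history].reverse().map((m) => m.content);
}

// Wiring it up with the Anam SDK (sketch; names are assumptions):
//
//   anamClient.addListener(AnamEvent.REASONING_HISTORY_UPDATED, (history) => {
//     renderDebugPanel(formatReasoning(history));
//   });
```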
