Settings
Generative AI Models & GPT Models
You can choose from multiple generative AI models. For most use cases we recommend Azure GPT-4.1 Mini. If you need higher-quality responses, you can choose Azure GPT-4.1.
| Title | Region | Price Level | Price In (per 1M tokens) | Price Out (per 1M tokens) |
|---|---|---|---|---|
Custom API Keys
When using a large language model, you can enter your own API key to route the API calls to your account.
When using an Azure model, you can also add custom endpoint information. In this case you need to enter not only an API key, but also the base URL and deployment names. See the article Azure GPT for details on how to set up your Azure GPT model endpoints.
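For illustration, here is a minimal sketch of what calling an Azure GPT deployment with this endpoint information looks like, using the official openai Python SDK. The endpoint URL, API version, and deployment name are placeholder assumptions; use the values from your own Azure resource.

```python
# Minimal sketch: calling an Azure OpenAI deployment directly.
# The endpoint, API version, and deployment name are placeholders.
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://your-resource.openai.azure.com",  # base URL (assumption)
    api_key="YOUR_AZURE_API_KEY",
    api_version="2024-06-01",  # API version (assumption)
)

response = client.chat.completions.create(
    model="your-gpt-4-1-mini-deployment",  # your deployment name, not the model family
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```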
AI Rate Limiting
To prevent abuse and control costs, you can restrict the number of AI requests a user can send within a certain timeframe.
📍 Where to find it: Go to the Knowledge > Settings tab and scroll down to the AI Settings section.
Configuration: You can set a rate limit per IP address (per hour) or per Tenant ID (per minute).
- Available Options: 30, 60, 360, or 3,600 requests per hour.
- Behavior: Once the limit is reached, the user will see a "Too many requests" message in the chat header (Error Code 429) and cannot send further messages until the timer resets.
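If you send chat requests programmatically, your client should treat a 429 response as a signal to back off before retrying. A minimal sketch in Python; the endpoint URL and payload are placeholders for your own integration, not a documented LoyJoy API:

```python
# Minimal sketch: backing off when the rate limit (HTTP 429) is hit.
# The endpoint URL and payload are placeholders, not the LoyJoy API.
import time
import requests

def send_with_backoff(url, payload, max_retries=5):
    """POST payload, retrying with exponential backoff on HTTP 429."""
    for attempt in range(max_retries):
        response = requests.post(url, json=payload, timeout=30)
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        # Honor Retry-After if the server sends it, else back off exponentially.
        wait = int(response.headers.get("Retry-After", 2 ** attempt))
        time.sleep(wait)
    raise RuntimeError("Rate limit still active after retries")
```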
💡 Recommendation: For standard agents, lower limits are often sufficient. However, if you are using the AI Prompt Module or performing intensive testing, we recommend increasing the limit to 360 requests per hour or higher to avoid interruptions.
Monitoring Costs: You can view the number of tokens used and the resulting costs in the Knowledge > Message tab.
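Costs follow directly from the per-1M-token prices in the model table above. A minimal sketch of the arithmetic, with hypothetical prices:

```python
# Minimal sketch of the token-cost arithmetic; the prices are hypothetical,
# see the per-1M-token prices in the model table above for real values.
def request_cost(tokens_in, tokens_out, price_in_per_1m, price_out_per_1m):
    return (tokens_in / 1_000_000 * price_in_per_1m
            + tokens_out / 1_000_000 * price_out_per_1m)

# e.g. 1,200 input and 350 output tokens at hypothetical prices
# of 0.40 in and 1.60 out per 1M tokens:
print(f"{request_cost(1_200, 350, 0.40, 1.60):.6f}")  # 0.001040
```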
Self-Hosted Models
If you want to host a model yourself, you can connect LoyJoy to it via Ollama or vLLM.
Ollama
Ollama is an open source platform for running LLMs on your own infrastructure. You can use Ollama to run your own model and connect to it. You can find more information on the Ollama website.
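As a sketch of what connecting to a running Ollama instance looks like, here is a request against its REST API. The default port 11434 and the model name llama3 are assumptions about your setup:

```python
# Minimal sketch: querying a local Ollama instance over its REST API.
# Port 11434 is Ollama's default; "llama3" assumes the model has been
# pulled beforehand (e.g. with `ollama pull llama3`).
import requests

response = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Hello!", "stream": False},
    timeout=120,
)
print(response.json()["response"])
```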
vLLM
vLLM is an open source library for LLM inference and serving. vLLM includes an OpenAI-compatible API that can be integrated with LoyJoy by selecting the OpenAI-compatible model and entering your vLLM endpoint URL. You can find more information in the vLLM repository.
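Because the API is OpenAI-compatible, any OpenAI client can talk to a vLLM server. A minimal sketch, assuming a vLLM server at http://localhost:8000/v1 serving a Llama-3 model (both assumptions about your deployment):

```python
# Minimal sketch: talking to a vLLM server through its OpenAI-compatible API.
# The base URL and model name are assumptions about your deployment;
# vLLM does not validate the API key by default, so any string works.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-70B-Instruct",  # the model your server loads
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```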
Llama-3 70b: Generative AI 100% Made in Germany 🇩🇪
The integration of Meta's open-weight model Llama-3 70b into the LoyJoy Platform adds an LLM hosted by our German partner primeLine AI (part of the primeLine Group) in Limburg an der Lahn, Germany, ensuring that everything is 100% made in Germany.
Key Features
- Hosted in an ISO 27001-certified data center of partimus GmbH (part of the primeLine Group) in Limburg an der Lahn
- Privacy focused: Data stays in Germany
- No Azure OpenAI hosting required
- 100% Made in Germany
- As always, full GDPR compliance
Interested? Contact us for activation.