

Contact our team to unlock self-hosted locations on your account before starting.

Prerequisites

| Requirement | Minimum version |
| --- | --- |
| Kubernetes | 1.20 |
| Helm | 3.12 |
| Ingress controller | NGINX Ingress Controller or equivalent |
| Storage class | For persistent volumes (Redis, PostgreSQL) |
| TLS | cert-manager, or manually managed certificates |
You also need an environment set up in the Xpander Console to get your organizationId, environmentId, and deploymentManagerApiKey. Create one at app.xpander.ai/environments.

Architecture

The Helm chart deploys eight application services plus two data stores:
| Component | Role |
| --- | --- |
| Agent Controller | Main API endpoint, orchestrates agent execution |
| AI Gateway | Routes LLM provider requests (OpenAI, Anthropic, etc.) |
| Agent Worker | Task execution runtime that invokes tools and processes steps |
| MCP | Model Context Protocol server for tool exposure |
| Chat | Web chat UI backed by Chainlit |
| Code Runner | Sandboxed environment for code execution tools |
| AWS Operator | Manages AWS-specific integrations |
| API | Public REST API surface |
| Redis | Cache and session state |
| PostgreSQL | Persistent data store for agents, threads, tasks, and memory |
In addition, the chart enables an in-cluster Docker registry and a metrics-server by default. Disable them with dockerRegistry.enabled=false and metricsServer.enabled=false if your cluster already provides equivalents.

When you set a domain and enable ingress, the chart creates a hostname for each application service:
  • agent-controller.{domain} (SDK and REST calls)
  • ai-gateway.{domain} (LLM provider routing)
  • mcp.{domain}, chat.{domain}, agent-worker.{domain}, code-runner.{domain}, aws-operator.{domain}, api.{domain} (internal traffic; you typically only call the first two)
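The full hostname set for a given domain can be sketched with a short loop (the domain value below is an example):

```shell
# Print the hostnames the chart creates for each application service
# when ingress is enabled. The domain here is an example value.
DOMAIN="xpander.my-company.com"
for svc in agent-controller ai-gateway mcp chat agent-worker code-runner aws-operator api; do
  echo "${svc}.${DOMAIN}"
done
```

In practice this is usually covered by a single wildcard DNS record (*.xpander.my-company.com) pointing at the ingress controller's external IP.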

Networking: outbound only

Your cluster makes outbound connections to Xpander. Xpander never initiates inbound connections to your cluster. Outbound destinations (HTTPS, port 443):
  • 166.117.85.46
  • 15.197.85.80
Configure your firewall to allow egress from the cluster to these IPs on port 443. All data stays inside your infrastructure.
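If you enforce egress with Kubernetes NetworkPolicy rather than an external firewall, the allow rule can be sketched as below. This is a sketch, not a drop-in policy: the name is illustrative, and because an Egress policy denies everything it does not list, the sketch also explicitly allows in-namespace traffic (Redis, PostgreSQL) and DNS. Any direct calls to external LLM providers would need additional egress rules.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-xpander-egress   # illustrative name
  namespace: xpander
spec:
  podSelector: {}              # applies to all pods in the namespace
  policyTypes:
    - Egress
  egress:
    # Xpander control-plane IPs, HTTPS only
    - to:
        - ipBlock:
            cidr: 166.117.85.46/32
        - ipBlock:
            cidr: 15.197.85.80/32
      ports:
        - protocol: TCP
          port: 443
    # Traffic between pods in this namespace (Redis, PostgreSQL, etc.)
    - to:
        - podSelector: {}
    # DNS resolution (kube-dns lives outside this namespace)
    - ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
    # Note: egress to external LLM provider APIs (e.g. OpenAI,
    # Anthropic) is not covered above and needs its own rule.
```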

Install with Helm

Add the Helm repository:
helm repo add xpander https://charts.xpander.ai
helm repo update
Install with the IDs from your Xpander Console environment:
helm upgrade --install xpander xpander/xpander \
  --namespace xpander --create-namespace \
  --set global.organizationId=<your-org-id> \
  --set global.environmentId=<your-env-id> \
  --set secrets.static.deploymentManagerApiKey=<your-deployment-api-key>
To expose the deployment via ingress, add ingress.enabled=true and a domain:
helm upgrade --install xpander xpander/xpander \
  --namespace xpander --create-namespace \
  --set ingress.enabled=true \
  --set domain=xpander.my-company.com \
  --set global.organizationId=<your-org-id> \
  --set global.environmentId=<your-env-id> \
  --set secrets.static.deploymentManagerApiKey=<your-deployment-api-key>

Production install with LLM keys

For production, add your LLM provider keys and use a values file for cert-manager, storage, and resource limits:
# xpander-values.yaml
domain: "xpander.production.com"

global:
  organizationId: "<your-org-id>"
  environmentId: "<your-env-id>"
  env:
    LOG_LEVEL: "info"
    ENVIRONMENT: "production"

secrets:
  static:
    deploymentManagerApiKey: "<your-deployment-api-key>"

agent-worker:
  env:
    AGENTS_OPENAI_API_KEY: "sk-..."
    ANTHROPIC_API_KEY: "sk-ant-..."

ingress:
  enabled: true
  tls:
    enabled: true
    source: "cert-manager"
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"

resources:
  agentController:
    limits:
      cpu: "1000m"
      memory: "1Gi"

redis:
  storage:
    size: "32Gi"
    storageClass: "fast-ssd"
helm upgrade --install xpander xpander/xpander \
  --namespace xpander --create-namespace \
  --values xpander-values.yaml
Store LLM keys as Kubernetes secrets in production. Create a secret with kubectl create secret generic ai-service-keys --from-literal=openai-api-key=... --from-literal=anthropic-api-key=..., then reference it in the values file using envFromSecretKeys.
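A sketch of that pattern follows. The secret key names match the kubectl command above, but the exact envFromSecretKeys shape is an assumption — verify it against the chart's schema with helm show values xpander/xpander before using it.

```yaml
# Secret created out-of-band, e.g.:
#   kubectl -n xpander create secret generic ai-service-keys \
#     --from-literal=openai-api-key=sk-... \
#     --from-literal=anthropic-api-key=sk-ant-...

# xpander-values.yaml excerpt (field shape assumed; confirm
# against the chart's default values)
agent-worker:
  envFromSecretKeys:
    AGENTS_OPENAI_API_KEY:
      secretName: ai-service-keys
      key: openai-api-key
    ANTHROPIC_API_KEY:
      secretName: ai-service-keys
      key: anthropic-api-key
```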

Configuration reference

Required parameters

| Parameter | Description |
| --- | --- |
| global.organizationId | Your Xpander organization ID |
| global.environmentId | Your Xpander environment ID |
| secrets.static.deploymentManagerApiKey | Deployment manager API key |

Common optional parameters

| Parameter | Default | Description |
| --- | --- | --- |
| domain | "" (ingress disabled) | Base domain for ingress hostnames |
| ingress.enabled | false | Expose services via ingress |
| ingress.tls.enabled | false | Enable TLS on ingress |
| ingress.tls.source | self-signed | self-signed, cert-manager, or external |
| agent-worker.env.AGENTS_OPENAI_API_KEY | "" | OpenAI API key |
| agent-worker.env.ANTHROPIC_API_KEY | "" | Anthropic API key |
| redis.storage.size | 8Gi | Redis PVC size |
| redis.storage.storageClass | cluster default | Storage class for Redis PVC |

Verify the deployment

Check pod status:
kubectl -n xpander get pods
You should see one pod for each application service (agent-controller, ai-gateway, agent-worker, mcp, chat, code-runner, aws-operator, api) plus redis and postgres StatefulSet pods. With default settings the in-cluster docker-registry and metrics-server pods also run.

Test the public health endpoints through port-forwarding:
kubectl -n xpander port-forward service/xpander-agent-controller 9016:9016 &
kubectl -n xpander port-forward service/xpander-ai-gateway 9018:9018 &

curl http://localhost:9016/health
curl http://localhost:9018/health
If ingress is enabled, test through the hostnames:
curl https://agent-controller.xpander.my-company.com/health
curl https://ai-gateway.xpander.my-company.com/health
Then return to app.xpander.ai/environments, click Complete setup on your environment, and confirm the status changes to Connected.

Connect the SDK to your deployment

Point the SDK at your Agent Controller hostname and use the Agent Controller API key (not your Xpander cloud key):
import asyncio

from xpander_sdk import Backend, Configuration
from agno.agent import Agent

config = Configuration(
    api_key="<agent-controller-api-key>",
    organization_id="<your-org-id>",
    base_url="https://agent-controller.xpander.my-company.com",
)

backend = Backend(configuration=config)
agent = Agent(**backend.get_args(agent_id="<agent-id>"))

# arun is a coroutine, so it must run inside an event loop
result = asyncio.run(agent.arun(input="What can you help me with?"))
print(result)
Use the Agent Controller hostname, not the root domain. The SDK needs base_url=https://agent-controller.{domain}. The root domain will not resolve to the API.
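A tiny helper makes the distinction explicit (the function name is illustrative, not part of the SDK):

```python
def agent_controller_url(domain: str) -> str:
    """Build the SDK base_url from the deployment's base domain.

    The SDK must target the agent-controller hostname, not the
    root domain, so the subdomain is always prepended.
    """
    return f"https://agent-controller.{domain}"


# Example: the base domain set in the Helm values
print(agent_controller_url("xpander.my-company.com"))
# → https://agent-controller.xpander.my-company.com
```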

Upgrade

Pull the latest chart and upgrade in place, reusing your existing values:
helm repo update
helm upgrade xpander xpander/xpander \
  --namespace xpander \
  --reuse-values
To change configuration during an upgrade, use the values file:
helm upgrade xpander xpander/xpander \
  --namespace xpander \
  --values xpander-values.yaml

Troubleshoot

Pods stuck in Pending: this is usually a storage class or PVC issue. Check PVC status:
kubectl -n xpander get pvc
kubectl -n xpander describe pvc
Confirm your cluster has a default storage class, or set redis.storage.storageClass and the PostgreSQL equivalent explicitly.
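For example, pinning the storage class in the values file. The Redis path matches the configuration reference above; the PostgreSQL key path is an assumption — confirm it with helm show values xpander/xpander before relying on it.

```yaml
redis:
  storage:
    storageClass: "fast-ssd"   # example class name

# Assumed analogous knob for PostgreSQL; verify the exact
# key path in the chart's default values before using it.
postgresql:
  storage:
    storageClass: "fast-ssd"
```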
Ingress hostnames unreachable: check that the ingress resource was created and points at the services:
kubectl -n xpander describe ingress
Verify your DNS records point to the ingress controller’s external IP and that the controller itself is running (it usually lives in its own namespace, e.g. ingress-nginx).
Health checks failing or environment not reaching Connected: check component logs:
kubectl -n xpander logs deployment/xpander-agent-controller
kubectl -n xpander logs deployment/xpander-ai-gateway
kubectl -n xpander logs deployment/xpander-agent-worker
The Agent Controller needs to reach Xpander’s outbound IPs on port 443. If egress is blocked, health checks fail.
LLM calls failing: verify the provider keys are reaching the agent-worker pod:
kubectl -n xpander exec deployment/xpander-agent-worker -- env | grep API_KEY
If missing, re-run the upgrade with --set agent-worker.env.AGENTS_OPENAI_API_KEY=... or check that your secret is correctly mounted through envFromSecretKeys.

Uninstall

helm uninstall xpander --namespace xpander
kubectl delete namespace xpander
Deleting the namespace also removes PVCs and permanently deletes stored data. Back up PostgreSQL first if needed.

What’s next

Monitor Runs

Trace execution, debug failures, and review AI performance

SDK Configuration

Full SDK configuration reference for self-hosted deployments.