Cluster and Pod Issues
Pods stuck in Init
The init containers wait for agent-controller to be ready, so start by checking its logs.
Common cause: cannot reach the deployment manager. If using PrivateLink, verify the DNS and security groups are configured correctly; see Configure PrivateLink.
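To inspect agent-controller while pods are stuck in Init, a minimal sketch along these lines may help. The function name is hypothetical, the namespace is whatever your release uses, and treating agent-controller as a Deployment is an assumption.

```shell
# Sketch: list pods and show recent agent-controller logs.
# Assumption: agent-controller runs as a Deployment; adjust the resource kind if not.
check_agent_controller() {
  ns="${1:?usage: check_agent_controller <namespace>}"
  # List pods to confirm which ones are stuck in Init.
  kubectl get pods -n "$ns"
  # Show the most recent agent-controller log lines.
  kubectl logs -n "$ns" deployment/agent-controller --tail=100
}
```

Call it with the namespace of your xpander release, for example: check_agent_controller <NAMESPACE>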
Pods stuck in Pending
Cause: Storage class or PVC issues.
Check that a default StorageClass exists and that the EBS CSI driver is running. If using EKS, ensure the node role has AmazonEBSCSIDriverPolicy attached.
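The storage checks for pods stuck in Pending can be sketched as follows. The function names are hypothetical, and the kube-system namespace and pod label are assumptions based on a standard EBS CSI driver installation, so adjust them if your cluster differs.

```shell
# Sketch: check StorageClass, EBS CSI driver pods, and PVC status.
check_storage() {
  ns="${1:?usage: check_storage <namespace>}"
  # A default StorageClass is marked "(default)" in this listing.
  kubectl get storageclass
  # Assumption: the driver runs in kube-system with its standard label.
  kubectl get pods -n kube-system -l app.kubernetes.io/name=aws-ebs-csi-driver
  # PVCs that stay Pending point to a provisioning problem.
  kubectl get pvc -n "$ns"
}

# Sketch: list managed policies attached to the EKS node role.
check_node_role_policies() {
  role="${1:?usage: check_node_role_policies <node-role-name>}"
  aws iam list-attached-role-policies --role-name "$role" \
    --query 'AttachedPolicies[].PolicyName' --output text
}
```

On EKS, AmazonEBSCSIDriverPolicy should appear in the output of the second function.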
exec format error
This error means the pods were scheduled on ARM/Graviton nodes. xpander images are amd64 only. Switch to x86 instance types (t3, m5, c5, etc.).
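To confirm which architecture each node uses, a short sketch like this may be enough. The function name is hypothetical; kubernetes.io/arch is the standard node label.

```shell
# Sketch: show each node with its CPU architecture.
show_node_arch() {
  # -L adds a column holding the value of the kubernetes.io/arch node label.
  kubectl get nodes -L kubernetes.io/arch
}
```

Nodes reporting arm64 in the added column cannot run amd64-only images.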
Insufficient CPU
The agent-worker pod requests 2 CPU by default. Options:
- Add more nodes or use larger instances
- For non-production environments only: reduce the agent-worker CPU request
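Before choosing between these options, it can help to confirm how much CPU the nodes offer and whether the scheduler reported a CPU shortage. The following is a sketch with a hypothetical function name, not a command from the product documentation.

```shell
# Sketch: compare allocatable CPU with scheduling failures.
show_cpu_capacity() {
  ns="${1:?usage: show_cpu_capacity <namespace>}"
  # Allocatable CPU per node.
  kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.allocatable.cpu
  # Scheduling failures in the namespace; messages mention "Insufficient cpu".
  kubectl get events -n "$ns" --field-selector reason=FailedScheduling
}
```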
Health checks failing
Check the application logs of the failing pod.
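A minimal sketch for gathering that information is shown below. The function name is hypothetical and the pod name is whichever pod reports failing health checks.

```shell
# Sketch: show probe-related events and recent logs for one pod.
check_pod_health() {
  ns="${1:?usage: check_pod_health <namespace> <pod>}"
  pod="${2:?usage: check_pod_health <namespace> <pod>}"
  # The Events section at the end lists readiness and liveness probe failures.
  kubectl describe pod -n "$ns" "$pod"
  # Recent application output from the pod.
  kubectl logs -n "$ns" "$pod" --tail=100
}
```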
PrivateLink Issues
PrivateLink InvalidServiceName (cross-region)
When creating a VPC endpoint to the xpander service from any region other than us-west-2, you must include --service-region us-west-2. Without --service-region, AWS looks for the service in your local region and fails with InvalidServiceName.
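A cross-region endpoint creation could look roughly like this. The function name is hypothetical, and the VPC, service name, subnet, and security group are placeholders you supply; the xpander service name itself is not reproduced here.

```shell
# Sketch: create an interface endpoint to a service hosted in us-west-2.
create_xpander_endpoint() {
  vpc_id="${1:?usage: create_xpander_endpoint <vpc-id> <service-name> <subnet-id> <sg-id>}"
  service_name="${2:?missing service name}"
  subnet_id="${3:?missing subnet id}"
  sg_id="${4:?missing security group id}"
  # --service-region tells AWS to look up the service in us-west-2
  # instead of the region the command runs in.
  aws ec2 create-vpc-endpoint \
    --vpc-endpoint-type Interface \
    --vpc-id "$vpc_id" \
    --service-name "$service_name" \
    --service-region us-west-2 \
    --subnet-ids "$subnet_id" \
    --security-group-ids "$sg_id"
}
```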
PrivateLink connection timeout (HTTP 000)
Check that the security group on the VPC endpoint allows inbound TCP 443 from your VPC CIDR.
Also verify that the private DNS hosted zone and alias record were created correctly; see Configure PrivateLink.
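The security group check, and the rule to add if it is missing, can be sketched as follows. Function names are hypothetical; the security group ID and CIDR are values from your own VPC.

```shell
# Sketch: print the inbound rules of the endpoint's security group.
show_endpoint_sg_rules() {
  sg_id="${1:?usage: show_endpoint_sg_rules <security-group-id>}"
  aws ec2 describe-security-groups --group-ids "$sg_id" \
    --query 'SecurityGroups[0].IpPermissions'
}

# Sketch: allow inbound HTTPS from the VPC CIDR.
allow_https_from_vpc() {
  sg_id="${1:?usage: allow_https_from_vpc <security-group-id> <vpc-cidr>}"
  cidr="${2:?usage: allow_https_from_vpc <security-group-id> <vpc-cidr>}"
  aws ec2 authorize-security-group-ingress \
    --group-id "$sg_id" --protocol tcp --port 443 --cidr "$cidr"
}
```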
Ingress and Networking Issues
Ingress not accessible
Verify the ingress configuration, then check that the NLB was provisioned and that the DNS CNAME records point to it.
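One way to run both checks is sketched below. The function name is hypothetical, and whether the NLB hostname appears on an Ingress or on a LoadBalancer Service depends on your setup, so both are listed.

```shell
# Sketch: list ingress resources, load balancer services, and the DNS CNAME.
check_ingress() {
  ns="${1:?usage: check_ingress <namespace> <hostname>}"
  host="${2:?usage: check_ingress <namespace> <hostname>}"
  # The ADDRESS column shows the load balancer hostname once provisioned.
  kubectl get ingress -n "$ns"
  # For LoadBalancer services the hostname appears under EXTERNAL-IP.
  kubectl get svc -n "$ns"
  # The CNAME target should match the load balancer hostname above.
  dig +short CNAME "$host"
}
```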
Load Balancer not provisioning
If using EKS Auto Mode, ensure the cluster role trust policy includes sts:TagSession.
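As an illustration, a trust policy that allows both sts:AssumeRole and sts:TagSession for the EKS service could be written and applied like this. The function names are hypothetical, and the policy shown is a generic sketch rather than the exact one from this guide. Note that update-assume-role-policy replaces the whole trust policy, so merge in any statements your role already has.

```shell
# Sketch: write a trust policy that includes sts:TagSession to a file.
write_eks_trust_policy() {
  out_file="${1:?usage: write_eks_trust_policy <output-file>}"
  cat > "$out_file" <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "eks.amazonaws.com" },
      "Action": ["sts:AssumeRole", "sts:TagSession"]
    }
  ]
}
EOF
}

# Sketch: show the current trust policy, then replace it with the file's contents.
apply_trust_policy() {
  role="${1:?usage: apply_trust_policy <cluster-role-name> <policy-file>}"
  policy_file="${2:?usage: apply_trust_policy <cluster-role-name> <policy-file>}"
  aws iam get-role --role-name "$role" --query 'Role.AssumeRolePolicyDocument'
  aws iam update-assume-role-policy --role-name "$role" \
    --policy-document "file://$policy_file"
}
```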
SSL certificate errors on chat URLs
The ACM certificate must include *.chat.<DOMAIN> as a subject alternative name. The chat UI generates per-thread subdomains (e.g., moccasin-prawn.chat.<DOMAIN>) that are not covered by *.<DOMAIN>. If the current certificate lacks this name, request a new certificate that includes it.
API Key Issues
API key not being picked up after helm upgrade
The xpander-static secret has a Helm resource keep policy, so set the key directly on the secret rather than through helm upgrade. See the secret field name mapping for all key names.
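Setting a key directly on the secret might look like the sketch below. The function name is hypothetical, and the field name is a placeholder to be taken from the secret field name mapping; no real field names are assumed here.

```shell
# Sketch: set one field on the xpander-static secret.
set_secret_field() {
  ns="${1:?usage: set_secret_field <namespace> <field> <value>}"
  field="${2:?usage: set_secret_field <namespace> <field> <value>}"
  value="${3:?usage: set_secret_field <namespace> <field> <value>}"
  # stringData accepts plain text; the API server stores it base64-encoded.
  # Values containing double quotes or backslashes need JSON escaping first.
  kubectl patch secret xpander-static -n "$ns" --type merge \
    -p "{\"stringData\":{\"$field\":\"$value\"}}"
}
```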
API key configuration — checking current values
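Assuming the values of interest live in the xpander-static secret, the current contents can be inspected with something like the following. The function names are hypothetical and the field name is a placeholder.

```shell
# Sketch: list the fields in the secret with their sizes, without printing values.
show_secret_fields() {
  ns="${1:?usage: show_secret_fields <namespace>}"
  kubectl describe secret xpander-static -n "$ns"
}

# Sketch: print the decoded value of one field.
# Field names containing dots need escaping in the jsonpath expression.
read_secret_field() {
  ns="${1:?usage: read_secret_field <namespace> <field>}"
  field="${2:?usage: read_secret_field <namespace> <field>}"
  kubectl get secret xpander-static -n "$ns" -o "jsonpath={.data.$field}" \
    | base64 --decode
}
```

The second function prints the secret value in clear text, so avoid running it in shared terminals or logged sessions.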

