Document Management - xpander.ai

A knowledge base is a document collection an agent can query as part of its reasoning. The Workbench has a drag-and-drop UI for managing documents, but for anything programmatic (syncing from a CMS, batch-uploading from S3, refreshing a doc set on a schedule) you’ll want the SDK.

Knowledge base retrieval is wired in automatically only for Agno. For LangChain, OpenAI Agents SDK, and AWS Strands, you need to add the retriever as a tool manually. See the framework pages for details.

Prerequisites

Complete the Quickstart so the CLI, SDK, and xpander login are already set up.
Python 3.12+ for the local handler.

1. List your knowledge bases

Knowledge bases live at the organization level, not on a specific agent. You attach one or more KBs to an agent, either in the Workbench or through the SDK (step 7). For Agno agents the retriever is then wired in automatically; for other frameworks you add it as a tool yourself.

from xpander_sdk import KnowledgeBases

kbs = KnowledgeBases()
for kb in kbs.list():
    print(kb.name, kb.id, kb.total_documents)

Each KnowledgeBase object has these fields:

Field	Type	What it’s for
`id`	`str`	Stable identifier. Use this to attach the KB to an agent or look it up later.
`name`	`str`	Human-readable label.
`description`	`str`	Shown to the agent as context for when to query this KB.
`type`	`str`	`managed` (xpander handles embeddings and storage) or `external` (your own vector store, enterprise-only).
`total_documents`	`int`	Count of documents currently indexed.
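Because `id` is the stable handle, scripts often need to resolve a human-readable name to a KB object first. A minimal sketch, assuming only the `name` and `id` fields from the table above; `find_kb_by_name` is a hypothetical helper, not part of the SDK, and the `SimpleNamespace` objects stand in for what `kbs.list()` returns.

```python
from types import SimpleNamespace


def find_kb_by_name(knowledge_bases, name):
    """Return the first KB whose name matches exactly, or None if absent."""
    for kb in knowledge_bases:
        if kb.name == name:
            return kb
    return None


# Stand-in objects for illustration; in real use pass KnowledgeBases().list().
sample = [
    SimpleNamespace(id="kb_1", name="Engineering Runbooks", total_documents=12),
    SimpleNamespace(id="kb_2", name="HR Policies", total_documents=3),
]

match = find_kb_by_name(sample, "Engineering Runbooks")
print(match.id if match else "not found")
```

Names are not guaranteed to be unique here, so store the `id` once you have resolved it rather than looking up by name on every run.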

2. Create a KB

kb = kbs.create(
    name="Engineering Runbooks",
    description="Internal incident response and on-call documentation",
)

Some notes:

Knowledge bases are xpander-managed, which means chunking, embedding, vector storage, and search are configured automatically.
The agent reads description to decide when to query this knowledge base.

To bring your own self-managed knowledge base, contact sales.

3. Add documents

Documents are referenced by URL, not uploaded as bytes. The KB fetches each URL, parses it, chunks it, embeds the chunks, and stores them.

docs = kb.add_documents(
    document_urls=[
        "https://internal.acme.com/runbooks/payment-failures.pdf",
        "https://internal.acme.com/runbooks/database-restore.md",
    ],
    sync=True,
)

for d in docs:
    print(d.id, d.document_url)

Some notes:

sync=True waits for processing before returning. Use it when you need to search the documents in the same script that uploads them. Use sync=False (the default) for batch jobs where you don’t need to block on completion.
URLs must be reachable from xpander’s infrastructure. For internal documents not on the public web, host them somewhere xpander can reach: S3 with a presigned URL, an internal HTTPS endpoint with IP whitelisting, or a public bucket. There’s no in-memory-blob entry point; if a document only exists in memory (a generated report, a transient export), upload it to storage first.
Documents behind auth need a credentialless URL. The platform’s fetcher can’t carry your session. Use S3 presigned URLs that include a short-lived token, or make a temporary public link.
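The presigned-URL route mentioned above could look roughly like this. It is a sketch under assumptions: the documents live in an S3 bucket you control, and `s3_client` is a boto3 S3 client (for example `boto3.client("s3")`). `presign_for_kb`, the bucket name, and the object keys are hypothetical.

```python
def presign_for_kb(s3_client, bucket, keys, expires_in=3600):
    """Return one time-limited GET URL per object key.

    s3_client is expected to behave like a boto3 S3 client, whose
    generate_presigned_url(ClientMethod, Params=..., ExpiresIn=...) signs
    the request locally without contacting S3.
    """
    return [
        s3_client.generate_presigned_url(
            "get_object",
            Params={"Bucket": bucket, "Key": key},
            ExpiresIn=expires_in,
        )
        for key in keys
    ]


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   urls = presign_for_kb(boto3.client("s3"), "acme-runbooks",
#                         ["payment-failures.pdf", "database-restore.md"])
#   kb.add_documents(document_urls=urls, sync=True)
```

Pick an expiry that comfortably outlasts processing. With `sync=False` the fetch may happen some time after the call returns, so a very short-lived URL could expire before the document is read.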

Supported formats: PDF, Markdown, plain text, HTML, CSV, JSON, and common Office formats.

4. List documents in a KB

documents = kb.list_documents()
for d in documents:
    print(d.id, d.document_url, d.status)

status is most useful when you’ve added documents asynchronously and want to know which ones finished processing. Failed documents stay in the list with their error captured, so you can find and re-add the URLs after fixing whatever was wrong.
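One way to act on `status` is to bucket the listing and pull out the URLs that need another attempt. This sketch assumes `status` is a plain string such as `"ready"` or `"failed"` (the two values this page mentions); `group_by_status` is a hypothetical helper and the sample objects stand in for `kb.list_documents()`.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_status(documents):
    """Map each status value to the list of documents currently in it."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc.status].append(doc)
    return dict(groups)


# Stand-in objects for illustration; in real use pass kb.list_documents().
listing = [
    SimpleNamespace(id="d1", document_url="https://example.com/a.pdf", status="ready"),
    SimpleNamespace(id="d2", document_url="https://example.com/b.pdf", status="failed"),
]

groups = group_by_status(listing)
# URLs to fix and re-add once the underlying problem is resolved.
retry_urls = [d.document_url for d in groups.get("failed", [])]
print(retry_urls)
```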

5. Remove documents

# `documents` is the listing returned by kb.list_documents() in step 4.
to_remove = [d.id for d in documents if "deprecated" in d.document_url]
kb.delete_multiple_documents(document_ids=to_remove)

Deletes are immediate and final. Re-adding a URL re-processes the file from scratch with a new document ID.
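Since deletes cannot be undone, it can help to separate selecting documents from deleting them, so the selection can be reviewed or logged first. A minimal sketch; `plan_removal` is a hypothetical helper that only reads `id` and `document_url` and never calls the SDK.

```python
from types import SimpleNamespace


def plan_removal(documents, should_remove):
    """Return (ids, urls) for documents matching the predicate; deletes nothing."""
    selected = [d for d in documents if should_remove(d)]
    return [d.id for d in selected], [d.document_url for d in selected]


# Stand-in objects for illustration; in real use pass kb.list_documents().
listing = [
    SimpleNamespace(id="d1", document_url="https://example.com/deprecated/old.md"),
    SimpleNamespace(id="d2", document_url="https://example.com/current/new.md"),
]

ids, urls = plan_removal(listing, lambda d: "deprecated" in d.document_url)
print("Would delete:", urls)

# After reviewing the output, apply it with the call shown above:
#   kb.delete_multiple_documents(document_ids=ids)
```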

6. Delete a KB

kb.delete()

This wipes the KB and every document in it. There’s no soft-delete or retention window. Detach it from any agents that reference it before calling this. Otherwise those agents will be referencing a KB that no longer exists.
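With no retention window, a small guard in your own tooling can prevent deleting the wrong KB. The sketch below is one possible convention, not an SDK feature: `delete_kb_confirmed` is hypothetical and requires the caller to repeat the KB's exact name before `kb.delete()` runs.

```python
def delete_kb_confirmed(kb, typed_name):
    """Delete the KB only if typed_name matches kb.name exactly."""
    if typed_name != kb.name:
        raise ValueError(
            f"Refusing to delete: expected {kb.name!r}, got {typed_name!r}"
        )
    kb.delete()


# Hypothetical usage, after detaching the KB from every agent:
#   delete_kb_confirmed(kb, "Engineering Runbooks")
```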

7. Attach a KB to an agent

from xpander_sdk import Agents

agent = Agents().get(agent_id="agt_01H...")
agent.attach_knowledge_base(knowledge_base_id=kb.id)

Once attached to an agent, the knowledge base is available in all future sessions.
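The create call from step 2 and the attach call above compose naturally into one provisioning step, for example when each customer gets a dedicated KB. A sketch assuming only those two calls as shown on this page; `provision_tenant_kb` and the naming scheme are hypothetical.

```python
def provision_tenant_kb(kbs, agent, tenant_name):
    """Create a KB for one tenant and attach it to that tenant's agent."""
    kb = kbs.create(
        name=f"{tenant_name} knowledge base",
        description=f"Documents belonging to {tenant_name}",
    )
    agent.attach_knowledge_base(knowledge_base_id=kb.id)
    return kb


# Hypothetical usage:
#   from xpander_sdk import Agents, KnowledgeBases
#   agent = Agents().get(agent_id=...)  # the tenant's agent
#   kb = provision_tenant_kb(KnowledgeBases(), agent, "Acme Corp")
```

Write the description with the agent in mind, since that is the text it reads when deciding whether to query the KB.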

Sync patterns

Two patterns come up repeatedly in production:

Keeping a KB in sync with a source of truth elsewhere. Run a scheduled job that fetches the latest URL list from your CMS, diffs it against kb.list_documents(), removes documents no longer in the source, and adds new ones. Because the job only acts on the difference, re-running it is idempotent; sync=True on the add makes each run block until the new documents are processed.
Per-tenant KBs in a multi-tenant system. One KB per customer, attached to a customer-specific agent. The setup overhead is one kbs.create(name=...) call when you onboard a customer, plus an agent.attach_knowledge_base(...) call to link it.
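The diff step of the sync job can be sketched as a pure function, which keeps it easy to test separately from the SDK calls. `plan_sync` is a hypothetical helper; it assumes documents are matched on their exact `document_url`.

```python
from types import SimpleNamespace


def plan_sync(source_urls, indexed_documents):
    """Return (urls_to_add, ids_to_remove) that bring the KB in line with the source."""
    wanted = set(source_urls)
    indexed_urls = {d.document_url for d in indexed_documents}
    to_add = sorted(wanted - indexed_urls)
    to_remove = [d.id for d in indexed_documents if d.document_url not in wanted]
    return to_add, to_remove


# Stand-in objects for illustration; in real use pass kb.list_documents().
indexed = [
    SimpleNamespace(id="d1", document_url="https://example.com/old.md"),
    SimpleNamespace(id="d2", document_url="https://example.com/kept.md"),
]
source = ["https://example.com/kept.md", "https://example.com/new.md"]

to_add, to_remove = plan_sync(source, indexed)
print(to_add, to_remove)

# Applying the plan with the calls from steps 3 and 5:
#   if to_remove:
#       kb.delete_multiple_documents(document_ids=to_remove)
#   if to_add:
#       kb.add_documents(document_urls=to_add, sync=True)
```

Running the plan twice against an unchanged source produces two empty lists the second time. If the URLs are presigned, their query strings change between runs, so compare on a stable key (such as the path) instead of the full URL.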

Troubleshooting

Document status is `failed` after `add_documents`

The most common causes are: the URL isn’t reachable from xpander’s infrastructure (private network, missing auth), the file format isn’t supported, or the file is malformed. Check the status field on the document object; it carries the error message. Fix the URL or file and re-add.

`sync=True` times out on large files

Large PDFs and Office documents can take a while to chunk and embed. Switch to sync=False and poll kb.list_documents() until the document’s status is ready. Or break the upload into smaller batches.
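The polling approach might be written like this. It is a sketch under assumptions: `status` is a plain string, `"ready"` and `"failed"` are the terminal values, and the documents returned by `add_documents(..., sync=False)` carry the `id` values to wait on. `wait_until_processed` is a hypothetical helper.

```python
import time


def wait_until_processed(kb, document_ids, timeout_s=600, interval_s=5,
                         sleep=time.sleep, clock=time.monotonic):
    """Poll kb.list_documents() until every given ID is 'ready' or 'failed'.

    Returns a dict of document ID to final status. Raises TimeoutError if
    some documents are still processing when the timeout elapses.
    """
    pending = set(document_ids)
    results = {}
    deadline = clock() + timeout_s
    while pending:
        for doc in kb.list_documents():
            if doc.id in pending and doc.status in ("ready", "failed"):
                results[doc.id] = doc.status
                pending.discard(doc.id)
        if not pending:
            break
        if clock() >= deadline:
            raise TimeoutError(f"Still processing: {sorted(pending)}")
        sleep(interval_s)
    return results


# Hypothetical usage:
#   docs = kb.add_documents(document_urls=urls, sync=False)
#   statuses = wait_until_processed(kb, [d.id for d in docs])
```

The `sleep` and `clock` parameters exist only so the loop can be exercised in tests without real waiting.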

Agent doesn't use the KB in its reasoning

Check that the KB is attached to the agent (visible in the Workbench under the agent’s KB tab) and that the agent has been published after attachment. For frameworks other than Agno, retrieval is not wired in automatically: you need to add the retriever as a tool yourself. See the framework pages.

Deleting a KB breaks an attached agent

Detach the KB from all agents before deleting it. In the Workbench, open each agent’s KB tab and remove the reference. Then delete the KB. An agent referencing a deleted KB will silently get no results from KB queries.

Next steps

Semantic search

Query a KB directly from code, outside the agent’s reasoning loop.

Agent KB integration

How attached KBs reach the agent’s reasoning loop.

KB SDK reference

Full method-level docs.

Output Response Filtering

Trim large KB responses before they reach the LLM.