Tool hooks are decorators that fire around every tool invocation an agent makes. They give you a single place to plug in logging, metrics, alerting, payload redaction, custom guardrails, and per-tool observability without touching the tools themselves. Hooks are framework-agnostic: they run at the SDK level, so the same hook fires whether the agent is built on Agno, OpenAI Agents SDK, LangChain, or AWS Strands, and whether the tool is a connector, a custom @register_tool function, or an MCP-server tool. There are three decorators:
  • @on_tool_before runs before each tool invocation.
  • @on_tool_after runs after a successful invocation, with the result.
  • @on_tool_error runs when a tool invocation raises.

Prerequisites

  • Complete the Quickstart so the CLI, SDK, and xpander login are already set up.
  • An agent with at least one tool attached. Connectors selected in Agent Studio, @register_tool functions, or MCP-server tools all work.
  • Python 3.12+ for the local handler.

1. Log every tool call

The smallest useful hook is a logger that records each tool the agent reaches for. Drop the three decorators in a module that’s imported from your handler and they auto-register at import time:
hooks.py
from typing import Any, Dict, Optional
from loguru import logger
from xpander_sdk import on_tool_before, on_tool_after, on_tool_error, Tool

@on_tool_before
def log_invocation(tool: Tool, payload: Any,
                   payload_extension: Optional[Dict[str, Any]] = None,
                   tool_call_id: Optional[str] = None,
                   agent_version: Optional[str] = None):
    logger.info(f"-> {tool.name} called with payload {payload}")

@on_tool_after
def log_success(tool: Tool, payload: Any,
                payload_extension: Optional[Dict[str, Any]] = None,
                tool_call_id: Optional[str] = None,
                agent_version: Optional[str] = None,
                result: Any = None):
    logger.info(f"<- {tool.name} returned {type(result).__name__}")

@on_tool_error
def log_failure(tool: Tool, payload: Any,
                payload_extension: Optional[Dict[str, Any]] = None,
                tool_call_id: Optional[str] = None,
                agent_version: Optional[str] = None,
                error: Optional[Exception] = None):
    logger.error(f"x {tool.name} failed: {error}")
What this means in practice:
  1. @on_tool_before runs immediately before the tool body executes. The Tool object exposes tool.name, tool.id, tool.is_local (true for @register_tool functions, false for connectors), and tool.description.
  2. @on_tool_after only runs on success and adds a result parameter carrying whatever the tool returned. For connector tools, that’s the raw response body. For local tools, it’s whatever your function returned.
  3. @on_tool_error runs in place of the after-hook when the tool raises. The error parameter is the original exception. The agent’s framework still sees the failure; the hook is for your side effects (logs, alerts, traces).
  4. tool_call_id is unique per invocation. Use it as the correlation key to pair before-hooks with their matching after-hooks or error-hooks.
  5. Both sync and async hooks work. The SDK detects coroutine functions and awaits them automatically, so you can await an HTTP client or a DB write inside an async hook without extra wiring.
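Point 5 in practice: a minimal sketch of an async after-hook. The HTTP shipping is simulated with `asyncio.sleep`, and the try/except import fallback is only there so the sketch runs standalone without the SDK installed:

```python
import asyncio
from typing import Any, Dict, Optional

try:
    from xpander_sdk import on_tool_after
except ImportError:  # no-op stand-in so the sketch runs without the SDK
    def on_tool_after(fn):
        return fn

@on_tool_after
async def ship_result(tool, payload: Any,
                      payload_extension: Optional[Dict[str, Any]] = None,
                      tool_call_id: Optional[str] = None,
                      agent_version: Optional[str] = None,
                      result: Any = None):
    # The SDK detects that this is a coroutine function and awaits it,
    # so an async HTTP client or DB write needs no extra wiring.
    await asyncio.sleep(0)  # stand-in for e.g. `await http_client.post(...)`
```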
Every hook signature shares the same parameters, whichever decorator you use:
  • tool (Tool): The tool being invoked. Read tool.name, tool.id, tool.is_local, tool.description. Always set.
  • payload (Any): The arguments the LLM produced for the call. For connectors, typically {"body_params": {...}, "path_params": {...}, "query_params": {...}}. For local tools, whatever your function expects. Always set.
  • payload_extension (Optional[Dict]): The deep-merged extension you passed via tool_call_payload_extension on the task or payload_extension= on agent.ainvoke_tool. None if you didn’t set one.
  • tool_call_id (Optional[str]): Stable identifier for one invocation. Pair before/after hooks with this key.
  • agent_version (Optional[str]): The deployed version of the agent that issued the call. Useful for filtering metrics by rollout.
  • result (Any, after hooks only): The value the tool returned on success.
  • error (Optional[Exception], error hooks only): The exception raised by the tool body.

2. Time and instrument every tool call

Once you have logging, the next thing most teams want is timing and counter metrics per tool. The before/after pair is the natural fit, with tool_call_id as the correlation key:
hooks.py
import time
from typing import Any, Dict, Optional
from xpander_sdk import on_tool_before, on_tool_after, Tool

starts: dict[str, float] = {}

@on_tool_before
def record_start(tool: Tool, payload: Any,
                 payload_extension: Optional[Dict[str, Any]] = None,
                 tool_call_id: Optional[str] = None,
                 agent_version: Optional[str] = None):
    if tool_call_id:
        starts[tool_call_id] = time.time()

@on_tool_after
def record_duration(tool: Tool, payload: Any,
                    payload_extension: Optional[Dict[str, Any]] = None,
                    tool_call_id: Optional[str] = None,
                    agent_version: Optional[str] = None,
                    result: Any = None):
    started = starts.pop(tool_call_id, None) if tool_call_id else None
    if started is not None:
        metrics_client.timing(f"tool.{tool.name}.duration_ms", (time.time() - started) * 1000)
        metrics_client.increment(f"tool.{tool.name}.calls")
What this means in practice:
  1. tool_call_id is the correlation key. Concurrent tool calls run on the same agent, so a single global timestamp would get clobbered. The id stays stable from the before hook to the matching after or error hook.
  2. Pop, don’t peek. Removing the entry in the after hook keeps memory bounded across long-running containers.
  3. Embed tool.name in the metric name. Per-tool dashboards fall out of this naming scheme with no per-tool boilerplate.
Mirror the increment in @on_tool_error so success and failure counters add up to the total call count.

3. Redact payloads and add custom guardrails

Hooks are observe-only by design. The SDK calls them, but ignores any return value, so you cannot mutate the payload or rewrite the result from a hook. What you can do is:
  • Redact at the sink. Strip secrets from the copy of the payload you log or send to a tracing backend.
  • Detect and alert. Match the payload against a guardrail policy and emit an alert or a metric when it trips.
  • Raise to fail loud. A hook that raises has its exception caught and logged by the SDK; the tool itself still runs, so pair the raise with an explicit push to your error-tracking system if the failure needs to reach it.
hooks.py
import copy
from typing import Any, Dict, Optional
from xpander_sdk import on_tool_before, Tool

SENSITIVE_KEYS = {"api_key", "password", "ssn", "credit_card"}

@on_tool_before
def redact_and_log(tool: Tool, payload: Any,
                   payload_extension: Optional[Dict[str, Any]] = None,
                   tool_call_id: Optional[str] = None,
                   agent_version: Optional[str] = None):
    # Deep copy so we never touch the live payload the tool will receive.
    safe = copy.deepcopy(payload) if isinstance(payload, dict) else payload
    if isinstance(safe, dict):
        for key in list(safe.get("body_params", {})):
            if key.lower() in SENSITIVE_KEYS:
                safe["body_params"][key] = "***"
    audit_log.write({"tool": tool.name, "tool_call_id": tool_call_id, "payload": safe})
What this means in practice:
  1. copy.deepcopy(payload) is the safety net. Even though hook return values are ignored, mutating a shared dict in place could affect other observers reading the same object. Copy first, redact the copy.
  2. SENSITIVE_KEYS is your project’s policy. Extend it with whatever your security team flags.
  3. audit_log.write(...) is a stand-in for whatever sink you ship to (S3, Datadog, OpenTelemetry). Hooks are the right place for this work because they fire on every tool, not just the ones you remember to instrument.
To enforce a policy that should block a call, do it inside the tool function itself. Hooks fire before the tool body runs, but raising from a hook only logs the exception; it doesn’t cancel the invocation.
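A hedged sketch of that pattern: the policy check lives in the tool body, so raising actually cancels the call. The function, the threshold, and the bare @register_tool form are illustrative assumptions, and the import fallback only makes the sketch runnable without the SDK:

```python
try:
    from xpander_sdk import register_tool
except ImportError:  # no-op stand-in so the sketch runs without the SDK
    def register_tool(fn):
        return fn

@register_tool
def transfer_funds(amount: float, account: str) -> str:
    # Enforce the blocking policy here, in the tool body: unlike a hook,
    # raising from inside the tool really does stop the call.
    if amount > 10_000:
        raise ValueError("policy: transfers above 10,000 need manual approval")
    return f"transferred {amount} to {account}"  # stand-in for the real work
```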

4. Alert on failures of business-critical tools

Most tool errors are noise: an LLM produced an invalid payload, a connector returned a 4xx, the agent retries. The few that should page someone (a charge that didn’t go through, an auth check that broke) deserve their own hook with a name allowlist:
hooks.py
from xpander_sdk import on_tool_error, Tool

CRITICAL = {"payment_processor", "auth_service", "fraud_check"}

@on_tool_error
async def alert_on_failure(tool: Tool, payload, payload_extension=None,
                           tool_call_id=None, agent_version=None, error=None):
    if tool.name not in CRITICAL:
        return
    await alert_service.send(
        title=f"Critical tool failure: {tool.name}",
        message=f"Error: {error}\nCall ID: {tool_call_id}\nAgent: {agent_version}",
        severity="critical",
    )
What this means in practice:
  1. The name allowlist is what keeps alert volume sane. Without it, every transient connector failure pages you.
  2. agent_version is included in the alert so you can correlate a spike of errors with the rollout that introduced it.
  3. The hook is async, so it can await an HTTP call to PagerDuty or Slack without spinning up a background thread.

5. Attribute cost and usage per tenant

Tool hooks are how you build per-customer billing or per-team cost dashboards on top of agent activity. Combine tool_call_payload_extension with an @on_tool_after hook that reads the tenant ID off the extension and increments a counter:
hooks.py
from xpander_sdk import on_tool_after, Tool

@on_tool_after
async def attribute_cost(tool: Tool, payload, payload_extension=None,
                         tool_call_id=None, agent_version=None, result=None):
    tenant_id = (payload_extension or {}).get("body_params", {}).get("tenant_id")
    if tenant_id:
        await billing.increment(tenant_id, tool=tool.name, count=1)
What this means in practice:
  1. payload_extension is the same dict you set when creating the task with tool_call_payload_extension={"body_params": {"tenant_id": "acme-corp"}}. Every tool call inside that task carries it through to the hook.
  2. The hook fires on every successful invocation, so the counter reflects real usage, not LLM intentions.
  3. It works uniformly across tool types. Connector calls, custom @register_tool calls, and MCP tools all hit this hook with the same extension.
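The task-side half of that loop might look like the sketch below. Only tool_call_payload_extension is the documented parameter here; the prompt argument and the stub agent class are assumptions, included so the snippet runs on its own:

```python
import asyncio

class _StubAgent:
    # Stand-in for an xpander Agent; the real call is
    # `await agent.acreate_task(...)` on an SDK agent instance.
    async def acreate_task(self, **kwargs):
        return kwargs

async def main():
    agent = _StubAgent()
    # Every tool call inside this task carries the extension through
    # to the payload_extension parameter in your hooks.
    return await agent.acreate_task(
        prompt="Generate the monthly usage report",  # illustrative
        tool_call_payload_extension={"body_params": {"tenant_id": "acme-corp"}},
    )

task = asyncio.run(main())
```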

6. Where hooks fit in your project

Register hooks at module level so they’re set up before any task is processed. The cleanest pattern is a hooks.py imported from your handler:
xpander_handler.py
import hooks  # registers logging, metrics, and audit hooks at import time

from xpander_sdk import on_task, Task, Backend
from agno.agent import Agent

@on_task
async def handler(task: Task) -> Task:
    backend = Backend(configuration=task.configuration)
    agno_agent = Agent(**(await backend.aget_args(task=task)))
    result = await agno_agent.arun(input=task.to_message())
    task.result = result.content
    return task
What this means in practice:
  1. The import hooks line is enough. Each @on_tool_before / @on_tool_after / @on_tool_error decorator registers itself in a process-global registry on import. There’s no register_hooks(...) call.
  2. Hooks compose with @on_boot. Use a boot handler to construct the metrics client, alerting client, or audit-log writer that your hooks reach for, so they exist before the first tool fires.
  3. Hooks coexist with framework-level callbacks. Agno’s tool_hooks arg, OpenAI Agents SDK’s run hooks, and LangChain callbacks all keep working. xpander’s hooks fire at the SDK’s tool-invocation layer, so they run alongside (not instead of) any framework callback you’ve already wired up.
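Composing with @on_boot might look like this sketch. The clients dict and the constructors are illustrative, the bare-decorator form of @on_boot is an assumption, and the import fallback only makes the sketch runnable without the SDK:

```python
try:
    from xpander_sdk import on_boot
except ImportError:  # no-op stand-in so the sketch runs without the SDK
    def on_boot(fn):
        return fn

clients: dict = {}  # hooks read their dependencies from here

@on_boot
def build_clients():
    # Construct the sinks once, before the first tool call fires.
    clients["metrics"] = object()  # e.g. a StatsD or OTel metrics client
    clients["audit"] = object()    # e.g. an S3 or Datadog audit writer
```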

How hooks fire

The SDK runs hooks synchronously around the tool body. The order is fixed:
  1. Schema validation runs first if the tool has a Pydantic schema.
  2. All @on_tool_before hooks run, in registration order.
  3. The tool body executes (the connector HTTP call, the local @register_tool function, or the MCP server call).
  4. On success, every @on_tool_after hook runs, in registration order, with the result.
  5. On failure, every @on_tool_error hook runs, in registration order, with the exception.
  6. Activity reporting to Agent Studio happens after hooks return, so your hooks see the call before the platform’s metrics view does.
A few non-obvious properties:
  • Exceptions inside a hook are caught by the SDK and logged. They don’t prevent the tool from running, don’t cancel sibling hooks, and don’t propagate to the agent loop. This makes hooks safe for instrumentation, but it means you can’t use them to block a call.
  • Hooks observe; they don’t mutate. The SDK calls each hook and ignores its return value. Mutate the local copy you log, but don’t expect hook returns to alter the live payload or rewrite the result.
  • Order matters when hooks share state. If two @on_tool_after hooks both read a dict populated by a @on_tool_before hook, register them in the order the after-hooks need to run.

Troubleshooting

The decorator only registers the hook when the module that defines it is imported. If hooks.py lives next to xpander_handler.py but nothing ever imports it, the decorators never run. Add import hooks at the top of xpander_handler.py (or wherever your @on_task lives) so registration happens at boot.
Hooks register globally, so importing hooks.py from two different modules registers each decorator twice. Pick one import site (the handler) and remove the others. Re-running xpander agent dev reloads the registry from a fresh process, which is the easiest way to confirm.
The SDK detects coroutine functions and awaits them; sync hooks run inline. If you wrote a sync hook that calls asyncio.run(...) or blocks on a sync HTTP client inside an async handler, you’ll stall the event loop. Either declare the hook async def and await an async client, or keep it sync and use a non-blocking client.
Hook exceptions are caught and logged by the SDK; the tool still runs. If you need a hook failure to be loud, push the exception to your error tracker yourself (sentry_sdk.capture_exception(e)) inside a try/except. Don’t rely on the exception bubbling up to the agent loop, because it won’t.
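A sketch of that try/except pattern inside a hook. The captured list stands in for sentry_sdk.capture_exception, the failing side effect is simulated, and the import fallback only makes the sketch runnable without the SDK:

```python
try:
    from xpander_sdk import on_tool_before
except ImportError:  # no-op stand-in so the sketch runs without the SDK
    def on_tool_before(fn):
        return fn

captured: list = []  # stand-in for sentry_sdk.capture_exception

def flaky_sink(payload):
    raise RuntimeError("sink unavailable")  # simulate instrumentation failing

@on_tool_before
def loud_hook(tool, payload, payload_extension=None,
              tool_call_id=None, agent_version=None):
    try:
        flaky_sink(payload)
    except Exception as exc:
        # Push to your error tracker here; don't rely on re-raising,
        # because the SDK catches hook exceptions before the agent loop.
        captured.append(exc)
```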
tool_call_payload_extension is a per-task setting passed to agent.acreate_task(...). If you’re invoking a tool by hand with agent.ainvoke_tool(...) and didn’t pass payload_extension=..., the hook receives None. Either set the extension on the task, or pass it to ainvoke_tool directly.
Hooks are observe-only by design: they see the call but can’t change it, and the SDK ignores whatever a hook returns. To shape the payload that reaches a tool, use input schema overrides on the tool’s Advanced tab in Agent Studio. To shape the result the LLM sees, use Output Response Filtering or filter inside your @register_tool function before returning.

Next steps

Pre-built connectors

The other tool surface hooks observe, including tool_call_payload_extension for per-tenant context.

Custom tools

Wrap your own Python functions with @register_tool. Hooks fire for these too.

Output Response Filtering

How large tool responses get filtered before reaching the LLM.

Lifecycle hooks

@on_boot and @on_shutdown for setting up the clients your tool hooks reach for.

Frameworks

How tool calls flow through Agno, OpenAI Agents SDK, LangChain, and AWS Strands.

Core Concepts

The SDK class names mapped onto agents, tasks, threads, and tools.