[bot] Add Llama Stack Client Python SDK integration for inference, agents, and responses execution instrumentation

## Summary

The Llama Stack Client Python SDK (`llama-stack-client`) is Meta's official Python client for [Llama Stack](https://github.com/meta-llama/llama-stack) — a production framework for building LLM-powered applications using Llama models. Latest stable release: **v0.2.x / 0.6.0** (March 2026), actively maintained by the llamastack organization. This repository has zero instrumentation for any `llama-stack-client` execution surface — no integration directory, no wrapper, no patcher, no `auto_instrument()` support.

Llama Stack provides standardized inference, safety, agents, and retrieval APIs that work uniformly across local backends (Ollama, vLLM, TGI), cloud providers (Together AI, Fireworks, SambaNova), and Meta's own managed endpoints. Its `LlamaStackClient` and `AsyncLlamaStackClient` expose a distinct execution API (`client.inference.chat_completion()`, `client.agents.*`, `client.responses.create()`) that differs from the OpenAI SDK shape and cannot be covered by `wrap_openai()`.

Comparable provider and agent framework integrations in this repo: `openai`, `anthropic`, `mistral`, `cohere`, `huggingface-hub`, `openai-agents`, `claude-agent-sdk`, `google-adk`.

## What needs to be instrumented

The `llama-stack-client` package exposes these execution surfaces, none of which are instrumented:

### Inference API (highest priority)

| SDK Method | Description | Return type |
|---|---|---|
| `client.inference.chat_completion(model_id, messages, ...)` | Chat completions with Llama models; supports tool calling, streaming, and structured output | `ChatCompletionResponse`, or an iterator of `ChatCompletionStreamChunk` when streaming (an async iterator on `AsyncLlamaStackClient`) |
| `client.inference.completion(model_id, content, ...)` | Text completions from a prompt | `CompletionResponse` |
| `client.inference.embeddings(model_id, contents, ...)` | Text embeddings generation | `EmbeddingsResponse` |

`ChatCompletionResponse` includes `completion_message` (content + tool calls), `logprobs`, and usage metadata (`prompt_tokens`, `completion_tokens`, `total_tokens`). Streaming yields `ChatCompletionStreamChunk` objects with delta content.

### Responses API (agent execution)

| SDK Method | Description | Return type |
|---|---|---|
| `client.responses.create(input, model, tools=..., ...)` | Agentic response generation — automatically handles tool calls, knowledge bases, and conversation state | `OpenAIResponseObject` |

The Responses API is Llama Stack's high-level agent execution surface introduced in early 2026. It accepts a conversation thread, configured tools (MCP servers, function calls, web search), and returns a completed response after autonomously handling tool invocations in a loop.

### Agents API (lower-level agent runs)

| SDK Method | Description | Return type |
|---|---|---|
| `client.agents.create(agent_config)` | Create an agent with a specific system prompt, tools, and model | `AgentCreateResponse` |
| `client.agents.sessions.create(agent_id, ...)` | Create a session for multi-turn conversations | `Session` |
| `client.agents.turns.create(agent_id, session_id, messages, ...)` | Execute an agent turn, yielding events (tool calls, completions, etc.) | `AgentTurnResponseStreamChunk` iterator |

The Agents API provides a lower-level interface for multi-turn agent execution with explicit turn management. Each turn involves an LLM call, optional tool execution, and a final completion.

### Async variants

All methods have async equivalents on `AsyncLlamaStackClient` with identical signatures.
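
Because the sync and async clients share signatures, a single tracing decorator could plausibly cover both. The following is a minimal sketch under that assumption, not the integration's actual design: `traced`, `RECORDS`, and the two `fake_*` functions are hypothetical stand-ins, and the in-memory list takes the place of a real span logger.

```python
import asyncio
import functools
import inspect
import time
from typing import Any, Callable, Dict, List

# Hypothetical in-memory sink standing in for a real span logger.
RECORDS: List[Dict[str, Any]] = []


def traced(name: str) -> Callable:
    """Wrap a sync or async callable and record its kwargs, result, and duration."""

    def decorator(fn: Callable) -> Callable:
        def log(kwargs: Dict[str, Any], result: Any, start: float) -> None:
            RECORDS.append(
                {
                    "name": name,
                    "input": kwargs,
                    "output": result,
                    "duration": time.perf_counter() - start,
                }
            )

        if inspect.iscoroutinefunction(fn):
            # Async path: await the call before logging the result.
            @functools.wraps(fn)
            async def async_wrapper(*args: Any, **kwargs: Any) -> Any:
                start = time.perf_counter()
                result = await fn(*args, **kwargs)
                log(kwargs, result, start)
                return result

            return async_wrapper

        # Sync path: same bookkeeping without awaiting.
        @functools.wraps(fn)
        def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            log(kwargs, result, start)
            return result

        return sync_wrapper

    return decorator


# Stand-ins for the sync and async SDK methods (not the real client).
@traced("inference.chat_completion")
def fake_chat_completion(*, model_id: str, messages: list) -> str:
    return f"sync reply from {model_id}"


@traced("inference.chat_completion")
async def fake_async_chat_completion(*, model_id: str, messages: list) -> str:
    return f"async reply from {model_id}"


sync_reply = fake_chat_completion(model_id="example-model", messages=[])
async_reply = asyncio.run(
    fake_async_chat_completion(model_id="example-model", messages=[])
)
```

Both calls land in `RECORDS` under the same span name, so downstream handling would not need to distinguish the two client classes.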

## Implementation notes

**Distinct client from OpenAI:** `LlamaStackClient` is a Stainless-generated client (like Groq and Together) with a Llama Stack-specific API shape. It is not a subclass of `openai.OpenAI`, so `wrap_openai()` cannot wrap it.
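
One possible shape for a dedicated wrapper is sketched below. It patches the execution methods named in this issue in place and skips any the installed SDK version does not expose. The name `wrap_llama_stack` follows the example name used later in this issue; `CALLS`, `_trace`, and the stand-in client are hypothetical. The sketch assumes resource objects such as `client.inference` are cached on the client, so a patched attribute persists across accesses.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List

# Dotted paths taken from this issue; availability depends on the SDK version.
TARGETS = (
    "inference.chat_completion",
    "inference.completion",
    "inference.embeddings",
    "responses.create",
    "agents.turns.create",
)

# Hypothetical in-memory sink standing in for real spans.
CALLS: List[Dict[str, Any]] = []


def _trace(path: str, fn: Callable[..., Any]) -> Callable[..., Any]:
    """Return a function that calls fn and records the call under `path`."""

    def wrapper(*args: Any, **kwargs: Any) -> Any:
        result = fn(*args, **kwargs)
        CALLS.append({"name": path, "input": kwargs, "output": result})
        return result

    return wrapper


def wrap_llama_stack(client: Any) -> Any:
    """Patch known execution methods in place; skip paths the client lacks."""
    for path in TARGETS:
        *parents, method = path.split(".")
        owner = client
        for attr in parents:
            owner = getattr(owner, attr, None)
            if owner is None:
                break
        if owner is None:
            continue
        fn = getattr(owner, method, None)
        if callable(fn):
            setattr(owner, method, _trace(path, fn))
    return client


# Usage with a stand-in client (a real LlamaStackClient is not constructed here).
fake_client = SimpleNamespace(
    inference=SimpleNamespace(
        chat_completion=lambda **kw: {"model_id": kw["model_id"], "content": "hi"}
    )
)
wrap_llama_stack(fake_client)
reply = fake_client.inference.chat_completion(model_id="example-model", messages=[])
```

Patching by attribute lookup rather than by subclass check keeps the wrapper independent of the OpenAI class hierarchy, which is the point of this note.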

**Provider-agnostic:** Llama Stack routes inference to any configured backend (Ollama, Together, Fireworks, SambaNova, Meta's managed endpoints). The integration should capture the `model_id` and, where available, the backend provider from response metadata.

**Streaming:** Both `inference.chat_completion()` and `agents.turns.create()` support streaming. The integration must handle streaming span lifecycle (start span on call, accumulate chunks/events, finalize on exhaustion).
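
The lifecycle described above (start on call, accumulate, finalize on exhaustion) can be sketched with a generator wrapper. This is an illustration under stated assumptions rather than the integration's implementation: `traced_streaming_call`, `FINISHED`, and `fake_stream` are hypothetical names, and chunks are plain strings instead of SDK chunk objects.

```python
import time
from typing import Any, Callable, Dict, Iterator, List, Optional

# Hypothetical in-memory sink standing in for finalized spans.
FINISHED: List[Dict[str, Any]] = []


def traced_streaming_call(fn: Callable[..., Any], **kwargs: Any) -> Iterator[Any]:
    """Call a streaming function and finalize one record when the stream ends."""
    # The "span" starts here, when the call is made, not on first iteration.
    start = time.perf_counter()
    stream = fn(**kwargs)

    def iterate() -> Iterator[Any]:
        chunks: List[Any] = []
        error: Optional[BaseException] = None
        try:
            for chunk in stream:
                chunks.append(chunk)
                yield chunk
        except Exception as exc:
            error = exc
            raise
        finally:
            # Runs on exhaustion, on error, and when the consumer closes early.
            FINISHED.append(
                {
                    "input": kwargs,
                    "chunks": chunks,
                    "error": error,
                    "duration": time.perf_counter() - start,
                }
            )

    return iterate()


def fake_stream(**kwargs: Any) -> Iterator[str]:
    """Stand-in for a streaming SDK call; yields text deltas."""
    yield from ["Hel", "lo", "!"]


# Full consumption finalizes the record with every chunk.
full = list(traced_streaming_call(fake_stream, model_id="example-model"))

# Early close still finalizes, with only the chunks seen so far.
partial = traced_streaming_call(fake_stream, model_id="example-model")
first = next(partial)
partial.close()
```

One limitation of this shape: if the caller never iterates the returned generator, the `finally` block never runs, so a real integration would likely need a separate guard for abandoned streams.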

**Tool call tracing:** When agents invoke tools, the turn response stream includes `AgentTurnResponseStepStartPayload` and `AgentTurnResponseStepCompletePayload` events for each tool call. These should be captured as child spans of the agent turn span.
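
A rough sketch of pairing step start and complete events into child spans follows. Events are modeled as plain dicts, and the field names (`event_type`, `step_id`, `step_type`, `step_details`, `turn`) and event type strings are assumptions about the payload shape that should be checked against the SDK types. `Span` and `build_turn_span` are hypothetical helpers.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class Span:
    """Hypothetical in-memory span; a real integration would use the tracing SDK."""

    name: str
    output: Any = None
    closed: bool = False
    children: List["Span"] = field(default_factory=list)


def build_turn_span(events: Iterable[Dict[str, Any]]) -> Span:
    """Fold a turn's event stream into one parent span with one child per step."""
    turn = Span(name="agents.turns.create")
    open_steps: Dict[Any, Span] = {}
    for event in events:
        kind = event.get("event_type")
        step_id = event.get("step_id")
        if kind == "step_start":
            child = Span(name=event.get("step_type", "step"))
            open_steps[step_id] = child
            turn.children.append(child)
        elif kind == "step_complete":
            # Tolerate a completion that arrives without a matching start.
            child = open_steps.pop(step_id, None)
            if child is None:
                child = Span(name=event.get("step_type", "step"))
                turn.children.append(child)
            child.output = event.get("step_details")
            child.closed = True
        elif kind == "turn_complete":
            turn.output = event.get("turn")
            turn.closed = True
    return turn


# Stand-in event stream: one inference step, one tool step, then turn completion.
events = [
    {"event_type": "step_start", "step_id": "s1", "step_type": "inference"},
    {"event_type": "step_complete", "step_id": "s1", "step_type": "inference",
     "step_details": {"text": "call the search tool"}},
    {"event_type": "step_start", "step_id": "s2", "step_type": "tool_execution"},
    {"event_type": "step_complete", "step_id": "s2", "step_type": "tool_execution",
     "step_details": {"tool": "search", "result": "ok"}},
    {"event_type": "turn_complete", "turn": {"output": "done"}},
]
turn_span = build_turn_span(events)
```

Keying open steps by `step_id` lets interleaved or out-of-order events resolve to the right child span, assuming step IDs are unique within a turn.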

**Token usage:** `ChatCompletionResponse` includes a `usage` field with `prompt_tokens`, `completion_tokens`, `total_tokens` — directly usable for span metrics.
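
Extracting these fields defensively might look like the sketch below. It assumes the response exposes a `usage` attribute with integer fields as described above, and it maps `total_tokens` to a `tokens` metric key, which is an assumption about the target metric naming. `usage_metrics` is a hypothetical helper, and the response object is a stand-in.

```python
from types import SimpleNamespace
from typing import Any, Dict

# Source field on `usage` -> metric key on the span (target names are assumed).
USAGE_FIELDS = {
    "prompt_tokens": "prompt_tokens",
    "completion_tokens": "completion_tokens",
    "total_tokens": "tokens",
}


def usage_metrics(response: Any) -> Dict[str, int]:
    """Return token metrics from a response, or an empty dict if none are present."""
    usage = getattr(response, "usage", None)
    if usage is None:
        return {}
    metrics: Dict[str, int] = {}
    for source, target in USAGE_FIELDS.items():
        value = getattr(usage, source, None)
        # Keep only real integers; bool is an int subclass, so exclude it.
        if isinstance(value, int) and not isinstance(value, bool):
            metrics[target] = value
    return metrics


# Stand-in response objects (not real SDK types).
with_usage = SimpleNamespace(
    usage=SimpleNamespace(prompt_tokens=12, completion_tokens=30, total_tokens=42)
)
without_usage = SimpleNamespace(completion_message="hi")

metrics = usage_metrics(with_usage)
empty = usage_metrics(without_usage)
```

Returning an empty dict when usage is missing keeps the span valid for backends that do not report token counts.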

## No coverage in any instrumentation layer

- No integration directory (`py/src/braintrust/integrations/llama_stack/`)
- No wrapper function (e.g. `wrap_llama_stack()`)
- No patcher in any existing integration
- No nox test session (`test_llama_stack`)
- No version entry in `py/src/braintrust/integrations/versioning.py`
- No mention in `py/src/braintrust/integrations/__init__.py`
- No entry in `[tool.braintrust.matrix]` in `py/pyproject.toml`

A grep for `llama_stack`, `llama-stack`, `llamastack` across `py/src/braintrust/` returns zero matches.

## Braintrust docs status

`not_found` — `llama-stack-client` is not listed on the [Braintrust integrations directory](https://www.braintrust.dev/docs/integrations) or the [tracing guide](https://www.braintrust.dev/docs/guides/tracing). No `auto_instrument()` reference and no `wrap_llama_stack()` function are documented anywhere in Braintrust docs.

## Upstream references

- llama-stack-client on PyPI: https://pypi.org/project/llama-stack-client/
- llama-stack-client-python on GitHub: https://github.com/llamastack/llama-stack-client-python
- Llama Stack main project: https://github.com/meta-llama/llama-stack
- Llama Stack inference API docs: https://llamastack.github.io/docs/references/python-sdk-reference/inference
- Llama Stack agents API docs: https://llamastack.github.io/docs/references/python-sdk-reference/agents
- Responses API article: https://developers.redhat.com/articles/2026/03/09/automate-ai-agents-responses-api-llama-stack
- DataCamp Llama Stack guide: https://www.datacamp.com/tutorial/llama-stack

## Local repo files inspected

- `py/src/braintrust/integrations/` — no `llama_stack/` directory on `main`
- `py/src/braintrust/wrappers/` — no Llama Stack wrapper
- `py/noxfile.py` — no `test_llama_stack` session
- `py/pyproject.toml` `[tool.braintrust.matrix]` — no llama-stack-client entry
- `py/src/braintrust/integrations/__init__.py` — Llama Stack not listed
- `py/src/braintrust/integrations/versioning.py` — no Llama Stack version matrix
- Full repo grep for `llama_stack`, `llama-stack`, `llamastack` — zero matches in SDK source
