What are the five OpenTelemetry GenAI agent spans?

The spec defines create_agent, invoke_agent_client, invoke_agent_internal, invoke_workflow, and execute_tool. Each carries a required gen_ai.operation.name attribute set to that operation value. They model agent creation, the caller side, the agent's internal reasoning loop, workflow sub-steps, and tool execution respectively.

Is the OpenTelemetry GenAI semantic conventions spec stable?

No. As of June 2026 the agent-spans page is still labeled Development under release v1.41.1, so span and attribute names can still change. Gate experimental attributes behind OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental and pin your semconv library version.

Which gen_ai attributes should every model span carry?

At minimum: gen_ai.provider.name, gen_ai.request.model, gen_ai.usage.input_tokens, gen_ai.usage.output_tokens, and gen_ai.conversation.id. Every major backend projects these into queryable columns for cost and latency analysis.

Does Langfuse accept raw OpenTelemetry traces?

Yes. Langfuse exposes an OTLP ingestion endpoint that accepts OTel-shaped spans as-is, so an agent instrumented with a vanilla OTLP exporter ships to Langfuse with no Langfuse-specific code. After the January 2026 ClickHouse acquisition, those spans land in a ClickHouse OLAP store.

OpenTelemetry GenAI Conventions: Instrument AI Agents

The OpenTelemetry project now has a versioned, citable answer to the question "what does a correct AI agent trace look like?" As of June 17, 2026, the GenAI agent-spans specification sits at release v1.41.1, tagged 2026-05-11, and it defines exactly five agent span operations.

It is also still labeled Development, which means the names can change under you. Both facts matter, and this guide treats them as a single engineering problem.

If you are building agent observability with the OpenTelemetry GenAI semantic conventions, the durable move is to emit those five spans now and insulate your code from the churn. The vendor side is converging fast: ClickHouse acquired Langfuse in January 2026, and Cisco completed its acquisition of Galileo in May 2026.

Both consolidated platforms ingest OTel-shaped traces as their primary input. Picking a non-OTel wire format in 2026 means opting out of the two biggest acquisitions in the category.

TL;DR. The OTel GenAI spec defines five agent spans (create_agent, invoke_agent_client, invoke_agent_internal, invoke_workflow, execute_tool) plus a gen_ai.* attribute vocabulary. The spec is still Development, so shim, pin, dual-emit, and test. Your traces land in warehouse-first backends like Langfuse on ClickHouse or Splunk via Galileo.

Key takeaways

The agent spec defines five named span operations, every one carrying gen_ai.operation.name.
Five attributes do most of the work: provider, model, input tokens, output tokens, conversation ID.
The spec is Development as of v1.41.1 (2026-05-11); names can still break.
gen_ai.usage.reasoning.output_tokens (added v1.41.0) is the new cost trap for o-series and extended-thinking models.
Warehouse-first storage (Postgres for state, ClickHouse for OLAP) is now the reference architecture.

What are the OpenTelemetry GenAI semantic conventions?

The OpenTelemetry GenAI semantic conventions are a standard vocabulary of span names and attributes for tracing generative-AI systems, including a dedicated page for AI agents that defines five span operations and a gen_ai.* attribute set for models, tools, and token usage.

The agent-spans page lists the five operations in this order:

Span operation	Kind	What it wraps
`create_agent`	client	Instantiating an agent (a LangGraph graph, an AutoGen agent, a hand-rolled class)
`invoke_agent_client`	client	The caller side, treating the agent as a remote service
`invoke_agent_internal`	internal	The agent's top-level reasoning loop; parent of model and tool calls
`invoke_workflow`	client	A discrete workflow step or sub-graph node
`execute_tool`	client	The agent running a tool: a function, retrieval, code interpreter, shell

There are two execute_tool definitions in the wider namespace, and the distinction trips people up. The LLM provider asking for a function is a gen_ai.chat span with a gen_ai.tool.* event.

The agent runtime actually running that function is the execute_tool operation on the agent page. The generative-AI spans page covers the non-agent operations like chat, embeddings, and generate_content.

The gen-ai.* attribute table you actually need

The full registry is large. In practice, five attributes are the ones every backend (Langfuse, ClickHouse ClickStack, Splunk, Honeycomb) projects into a queryable column, and the rest is gravy.

Attribute	Required for	Notes
`gen_ai.operation.name`	all agent spans	One of the five values above
`gen_ai.provider.name`	all agent spans	Renamed from `gen_ai.system` in v1.37
`gen_ai.request.model`	model invocations	Required
`gen_ai.usage.input_tokens`	model invocations	Conditionally Required when provider returns counts
`gen_ai.usage.output_tokens`	model invocations	Conditionally Required
`gen_ai.conversation.id`	session-scoped spans	Your primary grouping key
`gen_ai.usage.reasoning.output_tokens`	reasoning models	Opt-In, added v1.41.0
`gen_ai.tool.name` / `gen_ai.tool.call.id`	`execute_tool`	Required on tool spans
`error.type`	failed spans	Stable

The gen_ai.usage.* attributes are the cost-tracking spine. Input and output tokens are Conditionally Required on model spans, meaning you set them whenever the provider returns counts and omit them otherwise.

The new one to watch is gen_ai.usage.reasoning.output_tokens, added in v1.41.0 (April 2026) per the semantic-conventions CHANGELOG. Reasoning models such as OpenAI's o-series and Anthropic extended-thinking can multiply per-call cost by 5x to 20x against the base input rate.

A cost pipeline that ignores this attribute will systematically under-report. Treat it as Opt-In until v1.42.0 confirms the convention.

OTel also standardizes a metric, gen_ai.client.operation.duration, a histogram in seconds keyed on provider, operation, and model. Its dimensions are stable even when span attribute names churn, which makes it the right foundation for latency SLOs.

The canonical multi-step agent loop

The shape the spec expects is a parent-child tree. The caller opens an invoke_agent_client span; the agent opens invoke_agent_internal; each model call is a gen_ai.chat child; each tool the model requests becomes an execute_tool child.

invoke_agent_client          (CLIENT)
 └── invoke_agent_internal   (INTERNAL)
      ├── gen_ai.chat        (model call #1 → requests a tool)
      ├── execute_tool       (the tool runs)
      └── gen_ai.chat        (model call #2 with the tool result → final answer)

Here is the loop in Python, with the attributes that earn their keep:

python

import os
from opentelemetry import trace

# Opt into the latest experimental GenAI conventions. Leave OFF to keep v1.36 names.
os.environ.setdefault("OTEL_SEMCONV_STABILITY_OPT_IN", "gen_ai_latest_experimental")
tracer = trace.get_tracer("com.example.agent", "1.0.0")

def invoke_model(messages, model="gpt-4.1"):
    with tracer.start_as_current_span("gen_ai.chat") as span:
        span.set_attribute("gen_ai.operation.name", "chat")
        span.set_attribute("gen_ai.provider.name", "openai")
        span.set_attribute("gen_ai.request.model", model)
        span.set_attribute("gen_ai.conversation.id", messages.session_id)
        reply = call_provider(messages, model)
        span.set_attribute("gen_ai.usage.input_tokens", reply.usage.input_tokens)
        span.set_attribute("gen_ai.usage.output_tokens", reply.usage.output_tokens)
        return reply

def run_agent(user_message):
    with tracer.start_as_current_span("invoke_agent_client") as client:
        client.set_attribute("gen_ai.operation.name", "invoke_agent_client")
        client.set_attribute("gen_ai.agent.name", "research-assistant")
        client.set_attribute("gen_ai.agent.id", AGENT_ID)
        with tracer.start_as_current_span("invoke_agent_internal") as agent:
            agent.set_attribute("gen_ai.operation.name", "invoke_agent_internal")
            for _ in range(MAX_STEPS):
                reply = invoke_model(messages)
                if reply.tool_call:
                    execute_tool(reply.tool_call.name, reply.tool_call)
                else:
                    return reply.text

The TypeScript and Go SDKs follow the same shape. In Node, wrap each step in tracer.startActiveSpan(...), set error.type and SpanStatusCode.ERROR in the catch block, and end the span in finally.

In Go, tracer.Start(ctx, "gen_ai.chat") with defer span.End() and span.SetAttributes(...) is the idiom. The attribute keys are identical across all three languages, which is the whole point of a shared convention.

Attributing latency and cost per span

Per-span latency is just end_time − start_time from the OTLP fields. For the loop, sum the durations of every child gen_ai.chat and execute_tool span; the gap between that sum and the parent's duration is the agent's own overhead, including prompt assembly, tool selection, and any LLM-based routing the framework does outside the model API call.

Token cost is a join with a model-price table at query time:

cost = input_tokens * price_in(model)
     + output_tokens * price_out(model)
     + reasoning_output_tokens * price_reasoning(model)   -- v1.41.0+
     + cached_input_tokens * price_cached(model)

Langfuse surfaces this pre-aggregated as cost-per-trace because it stores usage_details on every observation. A "which conversations are burning the budget" query in ClickHouse groups by conversation_id and agent_id over a recent window and orders by cost_usd, no price-table join needed at query time because Langfuse stores the joined cost on ingest.

Where the spans land: the 2026 vendor consolidation

Two acquisitions reshaped the storage layer this year, and both point the same direction.

Langfuse joined ClickHouse. On 2026-01-16, ClickHouse announced a $400M Series D at a $15B valuation and the acquisition of Langfuse in the same release. Langfuse stayed MIT-licensed and self-hostable. CEO Marc Klingen framed it plainly: "LLM observability and evaluation is fundamentally a data problem... We moved our data layer to ClickHouse, and that technical decision turned into a real partnership."

The 2026 Langfuse architecture is hybrid: PostgreSQL for transactional state (sessions, projects, accounts), ClickHouse for the OLAP store (traces, observations, scores), Redis or Valkey for cache and queue, and S3 or MinIO for object payloads. For OTel users, the consequence is direct: Langfuse's OTLP ingestion endpoint accepts your spans as-is, projects the gen_ai.* attributes into ClickHouse columns, and exposes them as filtering dimensions in the UI.

No Langfuse-specific code required.

Galileo joined Cisco. Cisco announced intent to acquire Galileo on 2026-04-09 and updated the post on 2026-05-22 to confirm completion; the Cisco acquisitions list records it as closed. Galileo is being folded into Splunk Observability Cloud's AI Agent Monitoring, not rebranded standalone. Its docs recommend OpenTelemetry and OpenInference integration paths, so the same gen_ai.* attributes flow into its evaluation surface today.

Read together, these moves say OTel is the wire format and warehouse-first storage is winning. A 2026 observability stack without a columnar OLAP layer behind the trace store is structurally more expensive to run at the volume agentic systems produce.

How do you survive an unstable spec?

This is the honest caveat, paired with the workaround. The conventions page itself warns that instrumentation libraries "should NOT change the version of the GenAI conventions that they emit by default" while the spec is Development.

In roughly 11 months from v1.36.0 to the in-development v1.42.0, the GenAI namespace shipped multiple material versions and breaking renames, including gen_ai.system to gen_ai.provider.name in v1.37.

The defensive posture is the same one that worked for OTel's database and HTTP conventions:

Use a shim. Traceloop's OpenLLMetry or Arize's OpenInference own the attribute vocabulary, so a spec change becomes a shim-version bump, not an application edit.
Pin the conventions library. Pin opentelemetry-semantic-conventions (and its Go/JS/Java siblings) at the version matching the spec page you code against. Treat upgrades as explicit events.
Gate experimental attributes. Set OTEL_SEMCONV_STABILITY_OPT_IN=gen_ai_latest_experimental only where you want the next version. Keep production on the default and dual-emit in staging to see the diff first.
Centralize keys. Put every attribute name in one constants module. When gen_ai.system became gen_ai.provider.name, a well-organized codebase changed one file.
Lock the spec in tests. A golden span exporter that asserts exact attribute keys catches a churn-induced rename in CI before it reaches production.
Prefer stable metrics. Dashboards reading gen_ai.client.operation.duration survive renames; dashboards reading span-level attribute keys do not.
Read the CHANGELOG on every release. The GitHub issue cadence on the genai subdirectory is the cleanest early-warning signal.

OpenTelemetry graduated within CNCF on 2026-05-21, which underwrites the bet even while the GenAI subspec matures.

What this means for you

Emit the five agent operations explicitly, parent every model and tool call under invoke_agent_internal, and put gen_ai.operation.name on every span. Cover provider, model, both token counts, and conversation ID on each model span, and add reasoning tokens when the model produces them.

Choose OTel as the wire format so your backend stays swappable. If your fleet produces more than roughly 100M observations a month, the ClickHouse-backed Langfuse path scales with storage cost and lets you query spans without exporting them. If volume is small, Langfuse Cloud, Arize Phoenix, and Honeycomb all consume the same traces.

Then shim, pin, dual-emit, and test, because the spec will move again before the year is out. The workflow you build around the five spans will still be correct after the next rename. The exact attribute strings might not be, and that is precisely why you keep them in one file.

Agent Observability with the OpenTelemetry GenAI Conventions