This document describes how multimodal content (text, images, audio) is represented in OpenInference spans.
When a message contains multiple content items (e.g., text and images), the content is represented using the message.contents array structure with flattened attributes:
llm.input_messages.<messageIndex>.message.contents.<contentIndex>.message_content.<attribute>
Where:
- <messageIndex> is the zero-based index of the message
- <contentIndex> is the zero-based index of the content item within the message
- <attribute> is the specific content attribute

Each content item has a type attribute that identifies its kind:
- "text" - Text content
- "image" - Image content (URL or base64)
- "audio" - Audio content (URL or base64)

For example, a text item followed by an image item referenced by URL:
llm.input_messages.0.message.contents.0.message_content.type = "text"
llm.input_messages.0.message.contents.0.message_content.text = "What is in this image?"
llm.input_messages.0.message.contents.1.message_content.type = "image"
llm.input_messages.0.message.contents.1.message_content.image.image.url = "https://example.com/image.jpg"
For base64-encoded images:
llm.input_messages.0.message.contents.1.message_content.type = "image"
llm.input_messages.0.message.contents.1.message_content.image.image.url = "data:image/png;base64,iVBORw0KGgo..."
For audio content:
llm.input_messages.0.message.contents.2.message_content.type = "audio"
llm.input_messages.0.message.contents.2.message_content.audio.audio.url = "https://example.com/audio.mp3"
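The flattening pattern shown in these examples can be sketched in a few lines of Python. This is an illustrative sketch only, not the OpenInference implementation: the function name `flatten_message_contents` and the input shape (dicts with `type`, `text`, and `url` keys) are assumptions made for the example.

```python
from typing import Dict, List


def flatten_message_contents(
    message_index: int, contents: List[Dict[str, str]]
) -> Dict[str, str]:
    """Flatten content items into span attribute keys.

    The input shape (dicts with "type", "text", "url") is hypothetical;
    only the generated attribute keys follow the pattern in this document.
    """
    prefix = f"llm.input_messages.{message_index}.message.contents"
    attributes: Dict[str, str] = {}
    for content_index, item in enumerate(contents):
        base = f"{prefix}.{content_index}.message_content"
        kind = item["type"]
        attributes[f"{base}.type"] = kind
        if kind == "text":
            attributes[f"{base}.text"] = item["text"]
        elif kind == "image":
            # Both plain URLs and base64 data URLs go in the same attribute.
            attributes[f"{base}.image.image.url"] = item["url"]
        elif kind == "audio":
            attributes[f"{base}.audio.audio.url"] = item["url"]
        else:
            raise ValueError(f"unsupported content type: {kind!r}")
    return attributes


attrs = flatten_message_contents(
    0,
    [
        {"type": "text", "text": "What is in this image?"},
        {"type": "image", "url": "https://example.com/image.jpg"},
        {"type": "audio", "url": "https://example.com/audio.mp3"},
    ],
)
for key, value in attrs.items():
    print(f"{key} = {value!r}")
```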
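When OPENINFERENCE_HIDE_INPUT_IMAGES is set to true:
- Image URLs are replaced with "__REDACTED__"
When OPENINFERENCE_BASE64_IMAGE_MAX_LENGTH is set (default: 32000):
- The limit applies to base64-encoded image data (e.g., URLs beginning with data:image/png;base64,)
When OPENINFERENCE_HIDE_INPUT_TEXT is set to true:
- Text content is replaced with "__REDACTED__"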
A user message with both text and image content:
{
"llm.input_messages.0.message.role": "user",
"llm.input_messages.0.message.contents.0.message_content.type": "text",
"llm.input_messages.0.message.contents.0.message_content.text": "What objects do you see in this image?",
"llm.input_messages.0.message.contents.1.message_content.type": "image",
"llm.input_messages.0.message.contents.1.message_content.image.image.url": "https://example.com/photo.jpg"
}
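Going the other direction, a consumer of spans may want to regroup the flattened keys into per-message content items. The sketch below does this with a regular expression over the key pattern described in this document; the function name `group_content_items` and the output shape are assumptions made for illustration.

```python
import re
from typing import Dict, List

# Matches llm.input_messages.<messageIndex>.message.contents.<contentIndex>.message_content.<attribute>
_CONTENT_KEY = re.compile(
    r"^llm\.input_messages\.(\d+)\.message\.contents\.(\d+)\.message_content\.(.+)$"
)


def group_content_items(attributes: Dict[str, str]) -> Dict[int, List[Dict[str, str]]]:
    """Group flattened attributes into ordered content items per message index."""
    grouped: Dict[int, Dict[int, Dict[str, str]]] = {}
    for key, value in attributes.items():
        match = _CONTENT_KEY.match(key)
        if match is None:
            continue  # e.g. message.role or message.content; not a content item
        message_index = int(match.group(1))
        content_index = int(match.group(2))
        item = grouped.setdefault(message_index, {}).setdefault(content_index, {})
        item[match.group(3)] = value
    # Sort numerically so that index 10 comes after index 9.
    return {
        message_index: [items[i] for i in sorted(items)]
        for message_index, items in sorted(grouped.items())
    }


span_attributes = {
    "llm.input_messages.0.message.role": "user",
    "llm.input_messages.0.message.contents.0.message_content.type": "text",
    "llm.input_messages.0.message.contents.0.message_content.text": "What objects do you see in this image?",
    "llm.input_messages.0.message.contents.1.message_content.type": "image",
    "llm.input_messages.0.message.contents.1.message_content.image.image.url": "https://example.com/photo.jpg",
}
messages = group_content_items(span_attributes)
```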
When a message contains only text content (no multimodal content), it can use the simpler format:
{
"llm.input_messages.0.message.role": "user",
"llm.input_messages.0.message.content": "Hello, how are you?"
}
The message.content attribute is used for simple text-only messages, while message.contents is used for multimodal messages.
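One way to apply this rule in code is to branch on the shape of the message content: a plain string produces message.content, and a list of items produces message.contents. The following is a sketch under assumptions; `message_attributes` and the item shape (dicts with `type` and `value` keys) are hypothetical names chosen for the example.

```python
from typing import Dict, List, Union

# Attribute suffix that holds the payload for each content type.
_VALUE_SUFFIX = {
    "text": "text",
    "image": "image.image.url",
    "audio": "audio.audio.url",
}


def message_attributes(
    message_index: int,
    role: str,
    content: Union[str, List[Dict[str, str]]],
) -> Dict[str, str]:
    """Build span attributes for one input message."""
    prefix = f"llm.input_messages.{message_index}.message"
    attributes = {f"{prefix}.role": role}
    if isinstance(content, str):
        # Text-only message: use the simpler message.content attribute.
        attributes[f"{prefix}.content"] = content
        return attributes
    # Multimodal message: one flattened entry per content item.
    for content_index, item in enumerate(content):
        base = f"{prefix}.contents.{content_index}.message_content"
        attributes[f"{base}.type"] = item["type"]
        attributes[f"{base}.{_VALUE_SUFFIX[item['type']]}"] = item["value"]
    return attributes


simple = message_attributes(0, "user", "Hello, how are you?")
multimodal = message_attributes(
    0,
    "user",
    [
        {"type": "text", "value": "What objects do you see in this image?"},
        {"type": "image", "value": "https://example.com/photo.jpg"},
    ],
)
```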