Multimodal Attributes

This document describes how multimodal content (text, images, audio) is represented in OpenInference spans.

Message Content Arrays

When a message contains multiple content items (e.g., text and images), the content is represented as a message.contents array whose items are flattened into indexed span attributes.

Attribute Pattern

llm.input_messages.<messageIndex>.message.contents.<contentIndex>.message_content.<attribute>

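Where:

- <messageIndex> is the zero-based index of the message in llm.input_messages
- <contentIndex> is the zero-based index of the content item within that message
- <attribute> is the content attribute being set (for example type, text, or image.image.url)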
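
As a rough illustration of the attribute pattern, the key for one content item can be assembled from the two indices and the attribute name. The helper name content_attribute_key is hypothetical and is not part of any OpenInference package; this is only a sketch of the string layout.

```python
def content_attribute_key(message_index: int, content_index: int, attribute: str) -> str:
    """Build the flattened span attribute key for one content item.

    Hypothetical helper for illustration; not an OpenInference API.
    """
    if message_index < 0 or content_index < 0:
        raise ValueError("indices must be non-negative")
    return (
        f"llm.input_messages.{message_index}"
        f".message.contents.{content_index}"
        f".message_content.{attribute}"
    )


# Second content item (index 1) of the first message (index 0).
image_url_key = content_attribute_key(0, 1, "image.image.url")
print(image_url_key)
```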

Content Type Attributes

Each content item has a type attribute that identifies its kind:

Text Content

llm.input_messages.0.message.contents.0.message_content.type = "text"
llm.input_messages.0.message.contents.0.message_content.text = "What is in this image?"

Image Content

llm.input_messages.0.message.contents.1.message_content.type = "image"
llm.input_messages.0.message.contents.1.message_content.image.image.url = "https://example.com/image.jpg"

For base64-encoded images:

llm.input_messages.0.message.contents.1.message_content.type = "image"
llm.input_messages.0.message.contents.1.message_content.image.image.url = "data:image/png;base64,iVBORw0KGgo..."
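
A base64 image value of this shape is a standard data URL. The sketch below shows one way to build such a value from raw bytes with Python's standard library; the function name to_data_url and the default MIME type are assumptions made for the example.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a data URL (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The 8-byte PNG file signature; a real image would contain many more bytes.
png_signature = b"\x89PNG\r\n\x1a\n"
data_url = to_data_url(png_signature)
print(data_url)
```

Every PNG file begins with the same signature bytes, which is why base64-encoded PNG data starts with "iVBORw0KGgo".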

Audio Content

llm.input_messages.0.message.contents.2.message_content.type = "audio"
llm.input_messages.0.message.contents.2.message_content.audio.audio.url = "https://example.com/audio.mp3"
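
On the reading side, a consumer can group the flattened attributes by message and content index and then dispatch on the type attribute. The following is a minimal sketch under that assumption; group_content_items and describe are hypothetical names, and the attribute keys are the ones shown in the examples above.

```python
import re
from collections import defaultdict

# Captures messageIndex, contentIndex, and the remaining attribute path.
_CONTENT_KEY = re.compile(
    r"^llm\.input_messages\.(\d+)\.message\.contents\.(\d+)\.message_content\.(.+)$"
)


def group_content_items(attributes):
    """Return {message_index: {content_index: {attribute: value}}}."""
    grouped = defaultdict(lambda: defaultdict(dict))
    for key, value in attributes.items():
        match = _CONTENT_KEY.match(key)
        if match is None:
            continue  # not a message content attribute
        message_index, content_index, attribute = match.groups()
        grouped[int(message_index)][int(content_index)][attribute] = value
    return grouped


def describe(item):
    """Summarize one content item based on its type attribute."""
    kind = item.get("type")
    if kind == "text":
        return f"text: {item.get('text')}"
    if kind == "image":
        return f"image: {item.get('image.image.url')}"
    if kind == "audio":
        return f"audio: {item.get('audio.audio.url')}"
    return f"unknown type: {kind}"


span_attributes = {
    "llm.input_messages.0.message.contents.0.message_content.type": "text",
    "llm.input_messages.0.message.contents.0.message_content.text": "What is in this image?",
    "llm.input_messages.0.message.contents.1.message_content.type": "image",
    "llm.input_messages.0.message.contents.1.message_content.image.image.url": "https://example.com/image.jpg",
    "llm.input_messages.0.message.contents.2.message_content.type": "audio",
    "llm.input_messages.0.message.contents.2.message_content.audio.audio.url": "https://example.com/audio.mp3",
}

items = group_content_items(span_attributes)[0]
summaries = [describe(items[index]) for index in sorted(items)]
print(summaries)
```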

Privacy Considerations

Hiding Images

When OPENINFERENCE_HIDE_INPUT_IMAGES is set to true:

Base64 Image Truncation

When OPENINFERENCE_BASE64_IMAGE_MAX_LENGTH is set (default: 32000):

Hiding Text Content

When OPENINFERENCE_HIDE_INPUT_TEXT is set to true:

Example: Multimodal Message

A user message with both text and image content:

{
  "llm.input_messages.0.message.role": "user",
  "llm.input_messages.0.message.contents.0.message_content.type": "text",
  "llm.input_messages.0.message.contents.0.message_content.text": "What objects do you see in this image?",
  "llm.input_messages.0.message.contents.1.message_content.type": "image",
  "llm.input_messages.0.message.contents.1.message_content.image.image.url": "https://example.com/photo.jpg"
}
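
The attributes in this example can be produced by a small flattening routine. The sketch below assumes a hypothetical in-memory shape for content items (dicts with a "type" key plus "text" or "url"); neither that shape nor the function name flatten_message comes from OpenInference itself.

```python
def flatten_message(message_index, role, contents):
    """Flatten one multimodal message into span attributes (illustrative only)."""
    prefix = f"llm.input_messages.{message_index}.message"
    attributes = {f"{prefix}.role": role}
    for content_index, item in enumerate(contents):
        item_prefix = f"{prefix}.contents.{content_index}.message_content"
        attributes[f"{item_prefix}.type"] = item["type"]
        if item["type"] == "text":
            attributes[f"{item_prefix}.text"] = item["text"]
        elif item["type"] == "image":
            attributes[f"{item_prefix}.image.image.url"] = item["url"]
        elif item["type"] == "audio":
            attributes[f"{item_prefix}.audio.audio.url"] = item["url"]
    return attributes


flattened = flatten_message(
    0,
    "user",
    [
        {"type": "text", "text": "What objects do you see in this image?"},
        {"type": "image", "url": "https://example.com/photo.jpg"},
    ],
)
for key, value in flattened.items():
    print(f"{key} = {value!r}")
```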

Fallback for Simple Messages

When a message contains only text content (no multimodal content), it can use the simpler format:

{
  "llm.input_messages.0.message.role": "user",
  "llm.input_messages.0.message.content": "Hello, how are you?"
}

The message.content attribute is used for simple text-only messages, while message.contents is used for multimodal messages.
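
Because a message may use either form, code that reads spans may need to handle both. A minimal sketch, assuming content indices are contiguous and start at 0; the helper name message_text and the choice to join text items with newlines are illustrative, not prescribed by the specification.

```python
def message_text(attributes, message_index):
    """Return the text of a message recorded in either format (hypothetical helper)."""
    prefix = f"llm.input_messages.{message_index}.message"

    # Simple format: a single message.content string.
    simple = attributes.get(f"{prefix}.content")
    if simple is not None:
        return simple

    # Multimodal format: walk message.contents and keep only the text items.
    parts = []
    content_index = 0
    while True:
        item_prefix = f"{prefix}.contents.{content_index}.message_content"
        kind = attributes.get(f"{item_prefix}.type")
        if kind is None:
            break  # no more content items
        if kind == "text":
            parts.append(attributes.get(f"{item_prefix}.text", ""))
        content_index += 1
    return "\n".join(parts)


simple_span = {
    "llm.input_messages.0.message.role": "user",
    "llm.input_messages.0.message.content": "Hello, how are you?",
}
multimodal_span = {
    "llm.input_messages.0.message.role": "user",
    "llm.input_messages.0.message.contents.0.message_content.type": "text",
    "llm.input_messages.0.message.contents.0.message_content.text": "What objects do you see in this image?",
    "llm.input_messages.0.message.contents.1.message_content.type": "image",
    "llm.input_messages.0.message.contents.1.message_content.image.image.url": "https://example.com/photo.jpg",
}

simple_text = message_text(simple_span, 0)
multimodal_text = message_text(multimodal_span, 0)
print(simple_text)
print(multimodal_text)
```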