open-inference-spec

Overview

The OpenInference specification defines a set of columns that capture production inference logs. Each column maps semantically to a segment of a model’s inference, and the columns can be used on top of many file formats.

Naming Convention

The column names in OpenInference encode semantics via a well-formed prefix, in which colons (:) encapsulate machine-parsable information. Parsers of the OpenInference specification should use the : as a delimiter to extract the ontological information about the column. The anatomy of a column name is as follows:

:<category>.<data_type>.<[identifier]>:<name>

The category MUST be provided. Whether the data_type and identifier MUST be provided depends on the category. The name is optional ONLY if the category is a reserved singleton category for the row (e.g. :id:).

In the specification, category, data_type, and identifier will be referred to as parts.

Between the colons, the parts are separated by a period (.). The following is an example of an integer column named age:

:feature.int:age
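
The anatomy above can be sketched as a small parser. This is a minimal illustration and not part of the specification: the names ColumnName and parse_column_name are hypothetical, and the sketch assumes that parts which are absent are simply reported as None rather than validated per category.

```python
from typing import List, NamedTuple, Optional


class ColumnName(NamedTuple):
    """Hypothetical container for the parts of a column name."""
    category: str
    data_type: Optional[str]
    identifier: Optional[str]
    name: Optional[str]


def parse_column_name(column: str) -> ColumnName:
    """Split a column name of the form :<parts>:<name> into its pieces."""
    if not column.startswith(":"):
        raise ValueError(f"missing leading ':' in {column!r}")
    # The prefix sits between the first two colons; the rest is the name.
    prefix, closing, name = column[1:].partition(":")
    if not closing:
        raise ValueError(f"missing closing ':' in {column!r}")
    parts: List[Optional[str]] = list(prefix.split("."))
    if not parts[0]:
        raise ValueError(f"category is required in {column!r}")
    if len(parts) > 3:
        raise ValueError(f"too many parts in {column!r}")
    # Pad missing parts with None (assumption: absence is not an error here).
    parts += [None] * (3 - len(parts))
    category, data_type, identifier = parts
    return ColumnName(str(category), data_type, identifier, name or None)
```

With this sketch, parse_column_name(":feature.int:age") gives the category feature, the data_type int, no identifier, and the name age, while parse_column_name(":id:") gives the category id with every other field empty.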

Data Types

OpenInference is designed to be transport and file format agnostic. As such, it relies on the underlying file format to define the primitive types. However, not all file formats are created equal, and a superset of data types is required to fully capture the data (for example, JSON has a single number type and no separate concept of float). For this reason, we reserve the second part of the prefix for the data_type. The following is a list of data types that are supported by OpenInference:

primitive data types
- int: an integer
- float: a floating point number
- bool: a boolean
- str: a string, typically a label that can be enumerated
high-level data types
- text: a string that is a sentence or paragraph
- id: a str or int that is a unique identifier
- url: a string that is a URL. MUST start with http:// or https://
temporal data types
- iso_8601: a string that is an ISO 8601 timestamp
- milliseconds: a numeric value that is the number of milliseconds
- seconds: a numeric value that is the number of seconds
- minutes: a numeric value that is the number of minutes
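
To illustrate how a reader of the data might check values against these data types, here is a minimal sketch in Python. The names VALIDATORS and is_valid are hypothetical, and the mapping from each data type to a Python check is an assumption rather than something the specification prescribes. The iso_8601 type is left out on purpose, since complete ISO 8601 parsing is beyond a short sketch.

```python
from numbers import Real
from typing import Any, Callable, Dict


def _is_int(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_number(value: Any) -> bool:
    # Any real number (int or float), again excluding bool.
    return isinstance(value, Real) and not isinstance(value, bool)


def _is_url(value: Any) -> bool:
    # The url type MUST start with http:// or https://.
    return isinstance(value, str) and value.startswith(("http://", "https://"))


# Hypothetical mapping from data type name to a value check.
VALIDATORS: Dict[str, Callable[[Any], bool]] = {
    "int": _is_int,
    "float": lambda v: isinstance(v, float),
    "bool": lambda v: isinstance(v, bool),
    "str": lambda v: isinstance(v, str),
    "text": lambda v: isinstance(v, str),
    "id": lambda v: isinstance(v, str) or _is_int(v),
    "url": _is_url,
    "milliseconds": _is_number,
    "seconds": _is_number,
    "minutes": _is_number,
}


def is_valid(data_type: str, value: Any) -> bool:
    """Return True if value satisfies the check registered for data_type."""
    return VALIDATORS[data_type](value)
```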

List Types

The above data types can be used to define a list of values by wrapping the data type in []s. For example, a column that captures a list of ids can be defined as :feature.[id]:document_ids.
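
A parser might detect the list wrapper roughly as follows. This is a sketch only; the function name split_list_type is hypothetical, and rejecting an empty [] is an assumption the text does not address.

```python
from typing import Tuple


def split_list_type(data_type: str) -> Tuple[str, bool]:
    """Return (element_type, is_list) for a data_type part such as "[id]"."""
    if data_type.startswith("[") and data_type.endswith("]"):
        element_type = data_type[1:-1]
        if not element_type:
            # Assumption: a list must name its element type.
            raise ValueError("list type is missing its element type")
        return element_type, True
    return data_type, False
```

For the column :feature.[id]:document_ids, the data_type part is [id], which this sketch splits into the element type id and a flag marking it as a list.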

Specifiers

Specifiers assign a reserved semantic meaning to a column, capturing specific reserved information about it. The following is a list of specifiers that are supported by OpenInference:

- score: the score of the prediction. This is a numeric value.
- label: the label of the prediction. This is a string or bool.
- importance: the importance of the feature. This is a numeric value.
- embedding: the embedding of the feature. This is always a list of numeric values.
- retrieved_document_ids: the list of document IDs that were retrieved using an embedding. This is always a list of id types. See features for more information.
- retrieved_document_scores: the list of document scores that were retrieved using an embedding. See features for more information.
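
The value constraints listed for each specifier could be checked along these lines. This is a hedged sketch: check_specifier_value and its helpers are hypothetical names, and treating retrieved_document_scores as a list of numeric values is an assumption, since the text only describes it as a list of document scores.

```python
from numbers import Real
from typing import Any, Callable


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, Real) and not isinstance(value, bool)


def _is_id(value: Any) -> bool:
    # An id is a str or an int.
    is_int = isinstance(value, int) and not isinstance(value, bool)
    return isinstance(value, str) or is_int


def _is_list_of(check: Callable[[Any], bool], value: Any) -> bool:
    return isinstance(value, list) and all(check(item) for item in value)


def check_specifier_value(specifier: str, value: Any) -> bool:
    """Return True if value has the shape described for the specifier."""
    if specifier in ("score", "importance"):
        return _is_number(value)
    if specifier == "label":
        return isinstance(value, (str, bool))
    if specifier == "embedding":
        return _is_list_of(_is_number, value)
    if specifier == "retrieved_document_ids":
        return _is_list_of(_is_id, value)
    if specifier == "retrieved_document_scores":
        # Assumption: scores are numeric; the text does not state their type.
        return _is_list_of(_is_number, value)
    raise ValueError(f"unknown specifier: {specifier!r}")
```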

For full details on each of the columns, consult the sub-sections below.

Columns
