The OpenInference specification defines a set of columns that semantically map to segments of a model’s inference. These columns capture production inference logs and can be used on top of many file formats.
The column names in OpenInference encode semantics via a well-formed prefix, where a set of `:`s are used to encapsulate machine-parsable information. Parsers of the OpenInference specification should use the `:` as a delimiter to extract the ontological information about the column. The anatomy of a column name is as follows:
:<category>.<data_type>.<[identifier]>:<name>
Where `category` MUST be provided. The `data_type` and `identifier` MUST be provided depending on the `category`. The `name` is optional ONLY if the `category` is a reserved singleton category for the row (e.g. `:id:`).
In the specification, `category`, `data_type`, and `identifier` will be referred to as parts.
Between the `:`s, the parts are separated by a `.`. The following is an example of an integer column named `age`:
:feature.int:age
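The anatomy above can be sketched as a small parser. This is a minimal illustration, not a reference implementation: the `ColumnName` type and the `parse_column_name` function are hypothetical names introduced here, and the sketch assumes the prefix is everything between the first two `:` characters.

```python
from typing import NamedTuple, Optional


class ColumnName(NamedTuple):
    """Hypothetical container for the parts of an OpenInference column name."""

    category: str
    data_type: Optional[str]
    identifier: Optional[str]
    name: Optional[str]


def parse_column_name(column: str) -> ColumnName:
    """Split ':<category>.<data_type>.<identifier>:<name>' into its parts."""
    if not column.startswith(":"):
        raise ValueError(f"missing leading ':' in {column!r}")
    # The prefix sits between the first two ':'; the remainder is the name.
    prefix, closing, name = column[1:].partition(":")
    if not closing:
        raise ValueError(f"missing closing ':' in {column!r}")
    # Inside the prefix, the parts are separated by '.'.
    parts = prefix.split(".")
    if not parts[0]:
        raise ValueError(f"category MUST be provided in {column!r}")
    if len(parts) > 3:
        raise ValueError(f"too many parts in prefix of {column!r}")
    # Pad the parts that were not supplied with None.
    padded = parts + [None] * (3 - len(parts))
    category, data_type, identifier = padded
    return ColumnName(category, data_type, identifier, name or None)


print(parse_column_name(":feature.int:age"))
print(parse_column_name(":id:"))
```

The sketch only checks the shape of the name. Whether `data_type` and `identifier` are required depends on the category, so a fuller validator would need the per-category rules.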
A single row or inference record is composed of a set of columns that capture the following information:
In the specification, the above information will be referred to as categories. Each category is captured as the first part of the prefix in the naming convention. The following is a list of the prefixes used to capture the categories:
:id:
:timestamp:
:version:
:feature:
:prediction:
:actual:
:tag:
The above prefixes are used to capture the semantic category of the column. For example, a column named `:feature.int:age` would be a column that captures the age of the user and that is used as an input to the model. A column named `:prediction.float.score:` would be a column that captures the score of the prediction.
The features, predictions, actuals, and tags categories will be referred to in this specification as dimensions.
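As an illustration of how a consumer might use the category prefix, the sketch below groups the columns of a record by category and picks out the dimensions. The helper names (`category_of`, `group_by_category`, `dimension_columns`) and the `:tag.str:region` column are assumptions made for this example, and the code assumes every column name is well formed.

```python
from collections import defaultdict

# Category prefixes listed in this specification.
CATEGORIES = {"id", "timestamp", "version", "feature", "prediction", "actual", "tag"}
# Categories that this specification refers to as dimensions.
DIMENSIONS = {"feature", "prediction", "actual", "tag"}


def category_of(column: str) -> str:
    """Return the first part of the prefix, e.g. 'feature' for ':feature.int:age'."""
    prefix = column[1:].split(":", 1)[0]
    return prefix.split(".", 1)[0]


def group_by_category(columns):
    """Map each category to the columns that carry it, preserving input order."""
    groups = defaultdict(list)
    for column in columns:
        category = category_of(column)
        if category not in CATEGORIES:
            raise ValueError(f"unknown category {category!r} in {column!r}")
        groups[category].append(column)
    return dict(groups)


def dimension_columns(columns):
    """Keep only the feature, prediction, actual, and tag columns."""
    return [column for column in columns if category_of(column) in DIMENSIONS]


columns = [
    ":id:",
    ":timestamp:",
    ":feature.int:age",
    ":prediction.float.score:",
    ":tag.str:region",  # hypothetical tag column, for illustration only
]
print(group_by_category(columns))
print(dimension_columns(columns))
```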
OpenInference is designed to be transport and file format agnostic. As such, it relies on the underlying file format to define the primitive types. However, not all file formats are created equal, and a superset of data types is required to fully capture the data (for example, JSON has no concept of `float`). For this reason, we reserve the second part of the prefix for the `data_type`. The following is a list of data types that are supported by OpenInference:
primitive data types
high-level data types
`str` or `int` that is a unique identifier
`http://` or `https://`
temporal data types
The above data types can be used to define a list of values by wrapping the data type in `[]`s. For example, a column that captures a list of `id`s can be defined as `:feature.[id]:document_ids`.
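To show why the `data_type` part is useful, the following sketch reads a JSON record and uses the second part of the prefix to recover the intended type, including the `[]` list wrapper. It is a hedged example: the `CASTS` table covers only `int`, `float`, and `str`, the helper names and the `price` and `embedding` columns are hypothetical, and any other data type is passed through unchanged.

```python
import json

# Assumption: only these data types are cast; all others pass through unchanged.
CASTS = {"int": int, "float": float, "str": str}


def split_list_type(data_type: str):
    """Return (element_type, is_list) for 'float' or '[float]'."""
    if data_type.startswith("[") and data_type.endswith("]"):
        return data_type[1:-1], True
    return data_type, False


def coerce(column: str, value):
    """Cast a decoded JSON value according to the column's data_type part."""
    prefix = column[1:].split(":", 1)[0]
    parts = prefix.split(".")
    if len(parts) < 2:
        return value  # no data_type part, e.g. ':id:'
    element_type, is_list = split_list_type(parts[1])
    cast = CASTS.get(element_type)
    if cast is None:
        return value
    return [cast(item) for item in value] if is_list else cast(value)


# JSON has a single number type, so 10 is decoded as a Python int here.
record = json.loads('{":feature.float:price": 10, ":feature.[float]:embedding": [1, 2]}')
typed = {column: coerce(column, value) for column, value in record.items()}
print(typed)
```

Keeping the type in the column name means the same record can be restored faithfully regardless of which file format carried it.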
Specifiers assign a reserved semantic meaning to the column. The following is a list of specifiers that are supported by OpenInference:
`id` types. See features for more information.
For full details on each of the columns, consult the sub-sections below.