The arguments for creating the tool response handling evaluator.
Optional parameters:

- choices?: ClassificationChoicesMap
- inputMapping?: ObjectMapping<RecordType> - the mapping of the input to evaluate to the shape that the evaluator expects
- name?: string
- optimizationDirection?: OptimizationDirection
- promptTemplate?: PromptTemplate
- telemetry?: TelemetryConfig

Returns an evaluator function that takes a ToolResponseHandlingEvaluationRecord and returns a classification result indicating whether the tool response handling is correct or incorrect.
const evaluator = createToolResponseHandlingEvaluator({ model: openai("gpt-4o-mini") });

// Example: Correct extraction from tool result
const result = await evaluator.evaluate({
  input: "What's the weather in Seattle?",
  toolCall: 'get_weather(location="Seattle")',
  toolResult: JSON.stringify({
    temperature: 58,
    unit: "fahrenheit",
    conditions: "partly cloudy",
  }),
  output: "The weather in Seattle is 58°F and partly cloudy.",
});
console.log(result.label); // "correct"

// Example: Hallucinated data (incorrect)
const resultHallucinated = await evaluator.evaluate({
  input: "What restaurants are nearby?",
  toolCall: 'search_restaurants(location="downtown")',
  toolResult: JSON.stringify({
    results: [{ name: "Cafe Luna", rating: 4.2 }],
  }),
  output: "I found Cafe Luna (4.2 stars) and Mario's Italian (4.8 stars) nearby.",
});
console.log(resultHallucinated.label); // "incorrect" - Mario's was hallucinated
Creates a tool response handling evaluator function. The returned evaluator determines whether an AI agent properly handled a tool's response, including error handling, data extraction, transformation, and safe information disclosure.
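The error-handling dimension can be exercised with the same evaluate call shown above. A sketch, assuming the same record shape; the tool name, error payload, and classification outcome here are hypothetical, not taken from the library's documentation:

```typescript
// Hypothetical example: the tool call failed, and the agent surfaced the
// failure to the user instead of fabricating a balance. An evaluator of this
// kind would be expected to classify this as proper response handling.
const resultError = await evaluator.evaluate({
  input: "What's my account balance?",
  toolCall: 'get_balance(account_id="12345")',
  toolResult: JSON.stringify({ error: "Service unavailable, try again later" }),
  output:
    "I couldn't retrieve your balance because the service is temporarily unavailable. Please try again in a few minutes.",
});
console.log(resultError.label);
```

Contrast this with an agent that replies "Your balance is $0" on the same error payload: the output discloses data the tool never returned, which is the failure mode the hallucination example above also targets.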