Arize Phoenix TS
    • Creates a tool response handling evaluator function.

      This function returns an evaluator that determines whether an AI agent properly handled a tool's response, including error handling, data extraction, transformation, and safe information disclosure.

      Type Parameters

      Parameters

      Returns ClassificationEvaluator<RecordType>

      An evaluator function that takes a ToolResponseHandlingEvaluationRecord and returns a classification result indicating whether the tool response handling is correct or incorrect.
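
      As a rough sketch of the record shape, inferred only from the usage example on this page: the four fields appear to be plain strings, so a structured tool result would need to be serialized first. The `ExampleRecord` type and `buildRecord` helper are hypothetical names for illustration and are not part of the library; the real `ToolResponseHandlingEvaluationRecord` type may differ.

```typescript
// Hypothetical shape, inferred from the usage example on this page.
// The library's actual ToolResponseHandlingEvaluationRecord may differ.
type ExampleRecord = {
  input: string; // the user's request
  toolCall: string; // the tool invocation the agent made
  toolResult: string; // what the tool returned, as a string
  output: string; // the agent's final answer
};

// Hypothetical helper: builds a record, serializing a structured tool
// result to JSON and passing an already-stringified result through as-is.
function buildRecord(
  input: string,
  toolCall: string,
  toolResult: unknown,
  output: string
): ExampleRecord {
  return {
    input,
    toolCall,
    toolResult:
      typeof toolResult === "string" ? toolResult : JSON.stringify(toolResult),
    output,
  };
}

const record = buildRecord(
  "What's the weather in Seattle?",
  'get_weather(location="Seattle")',
  { temperature: 58, unit: "fahrenheit", conditions: "partly cloudy" },
  "The weather in Seattle is 58°F and partly cloudy."
);
```

      Keeping serialization in one place avoids passing a raw object where the example passes a JSON string.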

      // Import paths are assumed here; adjust them to match your installation.
      import { openai } from "@ai-sdk/openai";
      import { createToolResponseHandlingEvaluator } from "@arizeai/phoenix-evals";

      const evaluator = createToolResponseHandlingEvaluator({ model: openai("gpt-4o-mini") });

      // Example: Correct extraction from tool result
      const result = await evaluator.evaluate({
        input: "What's the weather in Seattle?",
        toolCall: 'get_weather(location="Seattle")',
        toolResult: JSON.stringify({
          temperature: 58,
          unit: "fahrenheit",
          conditions: "partly cloudy"
        }),
        output: "The weather in Seattle is 58°F and partly cloudy."
      });
      console.log(result.label); // "correct"

      // Example: Hallucinated data (incorrect)
      const resultHallucinated = await evaluator.evaluate({
        input: "What restaurants are nearby?",
        toolCall: 'search_restaurants(location="downtown")',
        toolResult: JSON.stringify({
          results: [{ name: "Cafe Luna", rating: 4.2 }]
        }),
        output: "I found Cafe Luna (4.2 stars) and Mario's Italian (4.8 stars) nearby."
      });
      console.log(resultHallucinated.label); // "incorrect" - Mario's Italian is not in the tool result
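
      When scoring more than one record, the labels can be tallied in a single pass. The following is a minimal sketch, not library code: `RecordEvaluator`, `tallyLabels`, and `stubEvaluator` are hypothetical names, and the interface assumes only what the example above shows, namely an `evaluate` method whose result carries a `label`. The stub stands in for the LLM-backed evaluator so the sketch runs without a model or API key.

```typescript
// Hypothetical minimal interface: only the parts of the evaluator that the
// example on this page relies on (evaluate() resolving to a labeled result).
type LabeledResult = { label?: string | null };

interface RecordEvaluator<R> {
  evaluate(record: R): Promise<LabeledResult>;
}

// Evaluates every record and counts how often each label occurs.
// Results without a label are counted under "unlabeled".
async function tallyLabels<R>(
  evaluator: RecordEvaluator<R>,
  records: R[]
): Promise<Map<string, number>> {
  // All evaluations are started at once; a real LLM-backed evaluator may
  // need batching or rate limiting instead.
  const results = await Promise.all(records.map((r) => evaluator.evaluate(r)));
  const counts = new Map<string, number>();
  for (const { label } of results) {
    const key = label ?? "unlabeled";
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}

// Stand-in evaluator for illustration only. It is a string check, not an
// LLM judge, and exists so the sketch is runnable on its own.
const stubEvaluator: RecordEvaluator<{ output: string }> = {
  async evaluate(record) {
    return { label: record.output.includes("Mario") ? "incorrect" : "correct" };
  },
};
```

      In practice the evaluator returned by `createToolResponseHandlingEvaluator` would take the place of `stubEvaluator`, provided its `evaluate` method matches the assumed interface.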