@arizeai/phoenix-evals

This package provides a TypeScript evaluation library. It is vendor-agnostic and can be used independently of any framework or platform. The package is still under active development and is subject to change.

    # or yarn, pnpm, bun, etc...
    npm install @arizeai/phoenix-evals

The library provides a createClassifier function that lets you create custom LLM-based evaluators for tasks such as hallucination detection, relevance scoring, or any binary or multi-class classification.

    import { createClassifier } from "@arizeai/phoenix-evals/llm";
    import { openai } from "@ai-sdk/openai";

    const model = openai("gpt-4o-mini");

    const promptTemplate = `
    In this task, you will be presented with a query, a reference text and an answer. The answer is
    generated to the question based on the reference text. The answer may contain false information. You
    must use the reference text to determine if the answer to the question contains false information,
    if the answer is a hallucination of facts. Your objective is to determine whether the answer text
    contains factual information and is not a hallucination. A 'hallucination' refers to
    an answer that is not based on the reference text or assumes information that is not available in
    the reference text. Your response should be a single word: either "factual" or "hallucinated", and
    it should not include any other text or characters.

    [BEGIN DATA]
    ************
    [Query]: {{input}}
    ************
    [Reference text]: {{reference}}
    ************
    [Answer]: {{output}}
    ************
    [END DATA]

    Is the answer above factual or hallucinated based on the query and reference text?
    `;

// Create the classifier
const evaluator = await createClassifier({
  model,
  choices: { factual: 1, hallucinated: 0 },
  promptTemplate: promptTemplate,
});

// Use the classifier
const result = await evaluator({
  output: "Arize is not open source.",
  input: "Is Arize Phoenix Open Source?",
  reference:
    "Arize Phoenix is a platform for building and deploying AI applications. It is open source.",
});

console.log(result);
// Output: { label: "hallucinated", score: 0 }

    See the complete example in examples/classifier_example.ts.
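The {{input}}, {{reference}}, and {{output}} placeholders in the prompt template are filled from the fields passed to the evaluator. As an illustration only, a minimal mustache-style substitution might look like the sketch below; the library's own template/applyTemplate module does the real work.

```typescript
// Minimal mustache-style substitution sketch (illustrative only);
// the library's template/applyTemplate handles this internally.
function fillTemplate(
  template: string,
  variables: Record<string, string>
): string {
  return template.replace(
    /\{\{(\w+)\}\}/g,
    // Leave unknown placeholders intact instead of dropping them
    (match: string, name: string) => variables[name] ?? match
  );
}

const filled = fillTemplate("[Query]: {{input}}\n[Answer]: {{output}}", {
  input: "Is Arize Phoenix Open Source?",
  output: "Arize is not open source.",
});
console.log(filled);
// [Query]: Is Arize Phoenix Open Source?
// [Answer]: Arize is not open source.
```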

    The library includes several pre-built evaluators for common evaluation tasks. These evaluators come with optimized prompts and can be used directly with any AI SDK model.

import { createFaithfulnessEvaluator } from "@arizeai/phoenix-evals/llm";
import { openai } from "@ai-sdk/openai";

const model = openai("gpt-4o-mini");

// Faithfulness Detection
const faithfulnessEvaluator = createFaithfulnessEvaluator({
  model,
});

// Use the evaluator
const result = await faithfulnessEvaluator({
  input: "What is the capital of France?",
  context: "France is a country in Europe. Paris is its capital city.",
  output: "The capital of France is London.",
});

console.log(result);
// Output: { label: "unfaithful", score: 0, explanation: "..." }
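Judging from the outputs above, evaluator results share a small shape: a label, a numeric score, and (for some evaluators) an explanation. A hypothetical type modeling that shape is sketched below; the package exports its own result types under types/evals.

```typescript
// Hypothetical type modeling the result shape seen in the examples above;
// the package's actual types live under types/evals.
type EvaluationResult = {
  label: string;
  score: number;
  explanation?: string;
};

const passing: EvaluationResult = { label: "faithful", score: 1 };
const failing: EvaluationResult = {
  label: "unfaithful",
  score: 0,
  explanation: "The answer contradicts the context.",
};
console.log(passing.label, failing.label); // faithful unfaithful
```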

    When your data structure doesn't match what an evaluator expects, use bindEvaluator to map your fields to the evaluator's expected input format:

import {
  bindEvaluator,
  createFaithfulnessEvaluator,
} from "@arizeai/phoenix-evals";
import { openai } from "@ai-sdk/openai";

const model = openai("gpt-4o-mini");

type ExampleType = {
  question: string;
  context: string;
  answer: string;
};

const evaluator = bindEvaluator<ExampleType>(
  createFaithfulnessEvaluator({ model }),
  {
    inputMapping: {
      input: "question", // Map "input" from "question"
      context: "context", // Map "context" from "context"
      output: "answer", // Map "output" from "answer"
    },
  }
);

const result = await evaluator.evaluate({
  question: "Is Arize Phoenix Open Source?",
  context:
    "Arize Phoenix is a platform for building and deploying AI applications. It is open source.",
  answer: "Arize is not open source.",
});

    Mapping supports simple properties ("fieldName"), dot notation ("user.profile.name"), array access ("items[0].id"), JSONPath expressions ("$.items[*].id"), and function extractors ((data) => data.customField).
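As an illustration only, a simple resolver for the first three mapping forms might look like the sketch below. This is a hypothetical helper, not the library's API; the real resolution logic lives in utils/objectMappingUtils, and JSONPath is omitted from this sketch.

```typescript
// Hypothetical sketch of how each mapping form could resolve against a record;
// the real logic lives inside bindEvaluator / objectMappingUtils.
type Extractor<T> = string | ((data: T) => unknown);

function resolve<T>(data: T, extractor: Extractor<T>): unknown {
  if (typeof extractor === "function") {
    return extractor(data); // function extractor
  }
  // Dot notation plus simple [index] access, e.g. "items[0].id"
  return extractor
    .replace(/\[(\d+)\]/g, ".$1")
    .split(".")
    .reduce<unknown>((acc, key) => (acc as any)?.[key], data);
}

const data = { user: { profile: { name: "Ada" } }, items: [{ id: 7 }] };
console.log(resolve(data, "user.profile.name")); // "Ada"
console.log(resolve(data, "items[0].id")); // 7
console.log(resolve(data, (d) => d.items.length)); // 1
```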

    See the complete example in examples/bind_evaluator_example.ts.

    This package works seamlessly with @arizeai/phoenix-client to enable experimentation workflows. You can create datasets, run experiments, and trace evaluation calls for analysis and debugging.

To run experiments with your evaluations, install @arizeai/phoenix-client:

    npm install @arizeai/phoenix-client
    
import { createFaithfulnessEvaluator } from "@arizeai/phoenix-evals/llm";
import { openai } from "@ai-sdk/openai";
import { createDataset } from "@arizeai/phoenix-client/datasets";
import {
  asExperimentEvaluator,
  runExperiment,
} from "@arizeai/phoenix-client/experiments";

// Create your evaluator
const faithfulnessEvaluator = createFaithfulnessEvaluator({
  model: openai("gpt-4o-mini"),
});

// Create a dataset for your experiment
const dataset = await createDataset({
  name: "faithfulness-eval",
  description: "Evaluate the faithfulness of the model",
  examples: [
    {
      input: {
        question: "Is Phoenix Open-Source?",
        context: "Phoenix is Open-Source.",
      },
    },
    // ... more examples
  ],
});

// Define your experimental task
const task = async (example) => {
  // Your AI system's response to the question
  return "Phoenix is not Open-Source";
};

// Create a custom evaluator to validate results
const faithfulnessCheck = asExperimentEvaluator({
  name: "faithfulness",
  kind: "LLM",
  evaluate: async ({ input, output }) => {
    // Use the faithfulness evaluator from phoenix-evals
    const result = await faithfulnessEvaluator({
      input: input.question,
      context: input.context,
      output: output,
    });

    return result; // Return the evaluation result
  },
});

// Run the experiment with automatic tracing
await runExperiment({
  experimentName: "faithfulness-eval",
  experimentDescription: "Evaluate the faithfulness of the model",
  dataset: dataset,
  task,
  evaluators: [faithfulnessCheck],
});

    To run examples, install dependencies using pnpm and run:

    pnpm install
    pnpx tsx examples/classifier_example.ts
    # change the file name to run other examples

Join our community to connect with thousands of AI builders.

    Modules

    __generated__/default_templates
    __generated__/default_templates/CONCISENESS_CLASSIFICATION_EVALUATOR_CONFIG
    __generated__/default_templates/CORRECTNESS_CLASSIFICATION_EVALUATOR_CONFIG
    __generated__/default_templates/DOCUMENT_RELEVANCE_CLASSIFICATION_EVALUATOR_CONFIG
    __generated__/default_templates/FAITHFULNESS_CLASSIFICATION_EVALUATOR_CONFIG
    __generated__/default_templates/HALLUCINATION_CLASSIFICATION_EVALUATOR_CONFIG
    __generated__/default_templates/TOOL_INVOCATION_CLASSIFICATION_EVALUATOR_CONFIG
    __generated__/default_templates/TOOL_RESPONSE_HANDLING_CLASSIFICATION_EVALUATOR_CONFIG
    __generated__/default_templates/TOOL_SELECTION_CLASSIFICATION_EVALUATOR_CONFIG
    __generated__/types
    core/EvaluatorBase
    core/FunctionEvaluator
    helpers
    helpers/asEvaluatorFn
    helpers/createEvaluator
    helpers/toEvaluationResult
    index
    llm
    llm/ClassificationEvaluator
    llm/createClassificationEvaluator
    llm/createClassifierFn
    llm/createConcisenessEvaluator
    llm/createCorrectnessEvaluator
    llm/createDocumentRelevanceEvaluator
    llm/createFaithfulnessEvaluator
    llm/createHallucinationEvaluator
    llm/createToolInvocationEvaluator
    llm/createToolResponseHandlingEvaluator
    llm/createToolSelectionEvaluator
    llm/generateClassification
    llm/LLMEvaluator
    telemetry
    template
    template/applyTemplate
    template/createTemplateVariablesProxy
    template/getTemplateVariables
    types
    types/base
    types/data
    types/evals
    types/otel
    types/prompts
    types/templating
    utils
    utils/bindEvaluator
    utils/objectMappingUtils
    utils/typeUtils