Evaluation Sample

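An evaluation sample is a single structured data instance used to assess and measure how your LLM application performs in a specific scenario. It represents one unit of interaction, or one specific use case, that the AI application is expected to handle. In Ragas, evaluation samples are represented by the SingleTurnSample and MultiTurnSample classes.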

SingleTurnSample

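SingleTurnSample represents a single-turn interaction between a user and an LLM, together with the expected results used for evaluation. It suits evaluations built around a single question-and-answer pair, optionally accompanied by additional context or reference information.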

Example

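The following example shows how to create a SingleTurnSample instance to evaluate a single-turn interaction in a RAG-based application. In this scenario, a user asks a question and the AI provides an answer. We create a SingleTurnSample instance to represent this interaction, including any retrieved contexts, the reference answer, and the evaluation rubric.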

from ragas import SingleTurnSample

# User's question
user_input = "What is the capital of France?"

# Retrieved contexts (e.g., from a knowledge base or search engine)
retrieved_contexts = ["Paris is the capital and most populous city of France."]

# AI's response
response = "The capital of France is Paris."

# Reference answer (ground truth)
reference = "Paris"

# Evaluation rubric
rubric = {
    "accuracy": "Correct",
    "completeness": "High",
    "fluency": "Excellent"
}

# Create the SingleTurnSample instance
sample = SingleTurnSample(
    user_input=user_input,
    retrieved_contexts=retrieved_contexts,
    response=response,
    reference=reference,
    rubric=rubric
)
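
Before building a sample from raw application logs, it can help to check that the fields have the expected shape. The sketch below is a minimal illustration using only the standard library. The helper `check_single_turn_fields` is hypothetical and not part of Ragas; it only mirrors the field names used in the example above (`user_input`, `response`, `retrieved_contexts`, `reference`) and does not reproduce any validation Ragas itself may perform.

```python
from typing import Dict, List, Optional


def check_single_turn_fields(
    user_input: str,
    response: str,
    retrieved_contexts: Optional[List[str]] = None,
    reference: Optional[str] = None,
) -> Dict[str, object]:
    """Hypothetical helper (not a Ragas API): validate raw fields and
    return them as keyword arguments for a single-turn sample."""
    if not isinstance(user_input, str) or not user_input.strip():
        raise ValueError("user_input must be a non-empty string")
    if not isinstance(response, str) or not response.strip():
        raise ValueError("response must be a non-empty string")
    if retrieved_contexts is not None:
        # A bare string is a common mistake; contexts should be a list of strings.
        if isinstance(retrieved_contexts, str) or not all(
            isinstance(context, str) for context in retrieved_contexts
        ):
            raise ValueError("retrieved_contexts must be a list of strings")

    # Only include optional fields that were actually provided.
    kwargs: Dict[str, object] = {"user_input": user_input, "response": response}
    if retrieved_contexts is not None:
        kwargs["retrieved_contexts"] = list(retrieved_contexts)
    if reference is not None:
        kwargs["reference"] = reference
    return kwargs


fields = check_single_turn_fields(
    user_input="What is the capital of France?",
    response="The capital of France is Paris.",
    retrieved_contexts=["Paris is the capital and most populous city of France."],
    reference="Paris",
)
print(sorted(fields))
```

Assuming the field names match, the returned dictionary could then be unpacked into the constructor, for example `SingleTurnSample(**fields)`.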

MultiTurnSample

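MultiTurnSample represents a multi-turn interaction between a human, an AI, and optionally a tool, together with the expected results used for evaluation. It is suited to representing conversational agents in more complex interactions so that they can be evaluated. In MultiTurnSample, the user_input attribute holds a sequence of messages that together make up the multi-turn conversation between the human user and the AI system. These messages are instances of the HumanMessage, AIMessage, and ToolMessage classes.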

Example

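The following example shows how to create a MultiTurnSample instance to evaluate a multi-turn interaction. In this scenario, a user wants to know the current weather in New York City. The AI assistant uses a weather API tool to fetch the information and then responds to the user.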

from ragas.messages import HumanMessage, AIMessage, ToolMessage, ToolCall

# User asks about the weather in New York City
user_message = HumanMessage(content="What's the weather like in New York City today?")

# AI decides to use a weather API tool to fetch the information
ai_initial_response = AIMessage(
    content="Let me check the current weather in New York City for you.",
    tool_calls=[ToolCall(name="WeatherAPI", args={"location": "New York City"})]
)

# Tool provides the weather information
tool_response = ToolMessage(content="It's sunny with a temperature of 75°F in New York City.")

# AI delivers the final response to the user
ai_final_response = AIMessage(content="It's sunny and 75 degrees Fahrenheit in New York City today.")

# Combine all messages into a list to represent the conversation
conversation = [
    user_message,
    ai_initial_response,
    tool_response,
    ai_final_response
]

Now use the conversation to create a MultiTurnSample object, together with any reference response and evaluation criteria.

from ragas import MultiTurnSample

# Reference response for evaluation purposes
reference_response = "Provide the current weather in New York City to the user."

# Create the MultiTurnSample instance
sample = MultiTurnSample(
    user_input=conversation,
    reference=reference_response,
)
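
The message sequence in the weather example follows a natural ordering: the tool output appears only after an AI message that requested a tool call. The sketch below illustrates one way to check that ordering before constructing a sample. It uses stand-in dataclasses (`Human`, `AI`, `Tool`) and a hypothetical function `tool_messages_are_anchored`; none of these are Ragas classes or APIs, and Ragas may apply its own, different validation.

```python
from dataclasses import dataclass, field
from typing import List


# Stand-in message types for illustration only; the real classes are
# HumanMessage, AIMessage, and ToolMessage in ragas.messages.
@dataclass
class Human:
    content: str


@dataclass
class AI:
    content: str
    tool_calls: List[dict] = field(default_factory=list)


@dataclass
class Tool:
    content: str


def tool_messages_are_anchored(messages: list) -> bool:
    """Return True if every Tool message follows an AI message that made
    tool calls, either directly or via other Tool messages."""
    pending_tool_call = False
    for message in messages:
        if isinstance(message, AI):
            pending_tool_call = bool(message.tool_calls)
        elif isinstance(message, Tool):
            if not pending_tool_call:
                return False
        else:
            # A Human message ends any pending tool exchange.
            pending_tool_call = False
    return True


sketch_conversation = [
    Human("What's the weather like in New York City today?"),
    AI(
        "Let me check the current weather in New York City for you.",
        tool_calls=[{"name": "WeatherAPI", "args": {"location": "New York City"}}],
    ),
    Tool("It's sunny with a temperature of 75°F in New York City."),
    AI("It's sunny and 75 degrees Fahrenheit in New York City today."),
]
print(tool_messages_are_anchored(sketch_conversation))
```

A check like this can catch malformed logs early, for instance a tool output that was recorded without the AI turn that triggered it.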