Task Metrics
Summarization Score
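The summarization score metric measures how well a summary (response) captures the important information in the reference_contexts. The intuition behind it is that a good summary should contain all the important information present in the context.
We first extract a set of important keyphrases from the context. These keyphrases are then used to generate a set of questions whose answers are always yes (1) with respect to the context. We then ask the same questions of the summary and compute the summarization score as the ratio of correctly answered questions to the total number of questions.
Concretely, the answers form a list of 1s and 0s, and the QA score is the ratio of correctly answered questions (answer = 1) to the total number of questions.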
\[ \text{QA score} = \frac{|\text{correctly answered questions}|}{|\text{total questions}|} \]
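The ratio above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the library's implementation; the function name `qa_score` and the sample `answers` list are hypothetical.

```python
def qa_score(answers: list[int]) -> float:
    """Fraction of questions the summary answered correctly.

    `answers` holds one entry per generated question:
    1 if the summary answered it correctly, 0 otherwise.
    """
    if not answers:
        # Assumption for this sketch: an empty list is treated as an error
        raise ValueError("answers must contain at least one entry")
    return sum(answers) / len(answers)


# Hypothetical verdicts for four questions: three answered correctly
answers = [1, 1, 0, 1]
print(qa_score(answers))  # 0.75
```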
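We also provide an option to penalize longer summaries through a conciseness score. If this option is enabled, the final score is computed as a weighted average of the QA score and the conciseness score. The conciseness score ensures that summaries that merely copy the text do not receive a high score, since they would obviously answer every question correctly.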
\[ \text{conciseness score} = 1 - \frac{\min(\text{summary length}, \text{context length})}{\text{context length} + 10^{-10}} \]
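As a rough sketch of the conciseness formula, the snippet below measures length in characters. That unit is an assumption made for illustration (the text above does not say how length is measured), and the function name `conciseness_score` is hypothetical.

```python
def conciseness_score(summary: str, context: str) -> float:
    """1 minus the (capped) ratio of summary length to context length."""
    # Assumption: lengths are character counts
    summary_len = len(summary)
    context_len = len(context)
    # The small constant in the denominator avoids division by zero
    return 1 - min(summary_len, context_len) / (context_len + 1e-10)


context = "x" * 100        # stand-in for a 100-character context
short_summary = "x" * 25   # a summary one quarter as long
copied_summary = context   # a "summary" that copies the whole context

print(round(conciseness_score(short_summary, context), 6))   # 0.75
print(round(conciseness_score(copied_summary, context), 6))  # 0.0
```

A summary that copies the context scores close to zero here, which is what lets the weighted average penalize it even though it answers every question.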
A coefficient coeff (default 0.5) controls the weight given to each score.
The final summarization score is computed as:
\[ \text{summarization score} = \text{QA score} \times (1 - \text{coeff}) + \text{conciseness score} \times \text{coeff} \]
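To make the weighting concrete, here is a minimal sketch of the final combination. The function name `summary_score` and the input values are hypothetical; only the formula and the default coeff of 0.5 come from the text above.

```python
def summary_score(qa: float, conciseness: float, coeff: float = 0.5) -> float:
    """Weighted average of the QA score and the conciseness score."""
    return qa * (1 - coeff) + conciseness * coeff


# Hypothetical inputs: 75% of questions answered, conciseness of 0.25
print(summary_score(0.75, 0.25))             # 0.5
# With coeff=0 the conciseness score is ignored entirely
print(summary_score(0.75, 0.25, coeff=0.0))  # 0.75
```

A larger coeff shifts weight toward brevity, so a long summary that answers every question loses more of its score.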
Example
from openai import AsyncOpenAI
from ragas.llms import llm_factory
from ragas.metrics.collections import SummaryScore
# Setup LLM
client = AsyncOpenAI()
llm = llm_factory("gpt-4o-mini", client=client)
# Create metric
scorer = SummaryScore(llm=llm)
# Evaluate
result = await scorer.ascore(
    reference_contexts=[
        "A company is launching a new product, a smartphone app designed to help users track their fitness goals. The app allows users to set daily exercise targets, log their meals, and track their water intake. It also provides personalized workout recommendations and sends motivational reminders throughout the day."
    ],
    response="A company is launching a fitness tracking app that helps users set exercise goals, log meals, and track water intake, with personalized workout suggestions and motivational reminders."
)
print(f"Summary Score: {result.value}")
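Output
Synchronous usage
If you prefer synchronous code, you can use the .score() method instead of .ascore().
Legacy Metrics API
The following example uses the legacy metrics API pattern. For new projects, we recommend the collections-based API shown above.
Deprecation timeline
This API will be deprecated in version 0.4 and removed in version 1.0. Please migrate to the collections-based API shown above.
Example with SingleTurnSample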
from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import SummarizationScore
sample = SingleTurnSample(
    response="A company is launching a fitness tracking app that helps users set exercise goals, log meals, and track water intake, with personalized workout suggestions and motivational reminders.",
    reference_contexts=[
        "A company is launching a new product, a smartphone app designed to help users track their fitness goals. The app allows users to set daily exercise targets, log their meals, and track their water intake. It also provides personalized workout recommendations and sends motivational reminders throughout the day."
    ]
)
# evaluator_llm is not defined in this snippet; it is assumed to be an evaluator LLM you have already configured
scorer = SummarizationScore(llm=evaluator_llm)
await scorer.single_turn_ascore(sample)
Output