Faithfulness

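The Faithfulness metric measures how factually consistent a response is with the retrieved context. It ranges from 0 to 1, with higher scores indicating better consistency.

A response is considered faithful if all of its claims can be supported by the retrieved context.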

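The score is calculated as follows:
1. Identify all the claims in the response.
2. Check whether each claim can be inferred from the retrieved context.
3. Compute the faithfulness score using the following formula: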

\[ \text{Faithfulness Score} = \frac{\text{Number of claims in the response supported by the retrieved context}}{\text{Total number of claims in the response}} \]
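As a rough illustration of the formula, the ratio can be sketched in a few lines of Python. The function name `faithfulness_score` and the choice to return NaN when there are no claims are assumptions of this sketch, not part of the Ragas API.

```python
import math
from typing import Sequence


def faithfulness_score(verdicts: Sequence[bool]) -> float:
    """Return the fraction of claims marked as supported by the context.

    `verdicts` holds one boolean per claim (True = supported).
    """
    if not verdicts:
        # Assumption of this sketch: with no claims the ratio is undefined.
        return math.nan
    return sum(verdicts) / len(verdicts)


# Four claims, three of them supported by the retrieved context.
score = faithfulness_score([True, True, False, True])
print(score)  # 0.75
```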

Example

from openai import AsyncOpenAI
from ragas.llms import llm_factory
from ragas.metrics.collections import Faithfulness

# Setup LLM
client = AsyncOpenAI()
llm = llm_factory("gpt-4o-mini", client=client)

# Create metric
scorer = Faithfulness(llm=llm)

# Evaluate
result = await scorer.ascore(
    user_input="When was the first super bowl?",
    response="The first superbowl was held on Jan 15, 1967",
    retrieved_contexts=[
        "The First AFL–NFL World Championship Game was an American football game played on January 15, 1967, at the Los Angeles Memorial Coliseum in Los Angeles."
    ]
)
print(f"Faithfulness Score: {result.value}")

Output

Faithfulness Score: 1.0
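The snippet above uses a bare `await`, which generally works only in environments that already run an event loop, such as a notebook. In a plain script, one option is to wrap the call in a coroutine and run it with `asyncio.run`. The sketch below shows only that pattern: `StubScorer` and `StubResult` are hypothetical stand-ins so the example runs without an API key, and the hard-coded 1.0 is not a real evaluation result.

```python
import asyncio
from dataclasses import dataclass
from typing import List


@dataclass
class StubResult:
    """Hypothetical result object exposing a `.value` attribute."""
    value: float


class StubScorer:
    """Hypothetical stand-in that mimics the call shape of `scorer.ascore()`."""

    async def ascore(
        self, user_input: str, response: str, retrieved_contexts: List[str]
    ) -> StubResult:
        await asyncio.sleep(0)  # placeholder for the real LLM call
        return StubResult(value=1.0)  # hard-coded, not a real score


async def main() -> float:
    scorer = StubScorer()  # replace with the real scorer in practice
    result = await scorer.ascore(
        user_input="When was the first super bowl?",
        response="The first superbowl was held on Jan 15, 1967",
        retrieved_contexts=["(retrieved context goes here)"],
    )
    return result.value


# asyncio.run creates an event loop, runs main() to completion, and closes the loop.
score = asyncio.run(main())
print(f"Faithfulness Score: {score}")
```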

Synchronous usage

If you prefer synchronous code, you can use the .score() method instead of .ascore():

result = scorer.score(
    user_input="When was the first super bowl?",
    response="The first superbowl was held on Jan 15, 1967",
    retrieved_contexts=[...]
)

How it's calculated

Example

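Question: Where and when was Einstein born?

Context: Albert Einstein (born 14 March 1879) was a German-born theoretical physicist, widely held to be one of the greatest and most influential scientists of all time.

High faithfulness answer: Einstein was born in Germany on 14th March 1879.

Low faithfulness answer: Einstein was born in Germany on 20th March 1879.

Let's look at how faithfulness is calculated for the low faithfulness answer.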

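Step 1: Break the generated answer into individual statements.
- Statements
  - Statement 1: "Einstein was born in Germany."
  - Statement 2: "Einstein was born on 20th March 1879."
Step 2: For each generated statement, verify whether it can be inferred from the given context.
- Statement 1: Yes
- Statement 2: No
Step 3: Use the formula above to calculate faithfulness.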

\[ \text{Faithfulness} = \frac{1}{2} = 0.5 \]
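The three steps can be sketched as a small pipeline. This is a minimal illustration, not the library's implementation: in Ragas both the decomposition and the verification are delegated to a model, whereas here `decompose_stub` and `verify_stub` are hypothetical stand-ins that simply hard-code the outcome of the Einstein example.

```python
from typing import Callable, List


def faithfulness(
    response: str,
    context: str,
    decompose: Callable[[str], List[str]],
    verify: Callable[[str, str], bool],
) -> float:
    """Run the three steps; assumes `decompose` yields at least one statement."""
    statements = decompose(response)                       # Step 1
    verdicts = [verify(s, context) for s in statements]    # Step 2
    return sum(verdicts) / len(verdicts)                   # Step 3


CONTEXT = (
    "Albert Einstein (born 14 March 1879) was a German-born theoretical "
    "physicist, widely held to be one of the greatest and most influential "
    "scientists of all time."
)
LOW_FAITHFULNESS_ANSWER = "Einstein was born in Germany on 20th March 1879."

# Hard-coded verdicts from the worked example (stand-in for a model's judgment).
EXAMPLE_VERDICTS = {
    "Einstein was born in Germany.": True,
    "Einstein was born on 20th March 1879.": False,
}


def decompose_stub(response: str) -> List[str]:
    """Hypothetical stand-in: returns the two statements from the example."""
    return list(EXAMPLE_VERDICTS)


def verify_stub(statement: str, context: str) -> bool:
    """Hypothetical stand-in: looks up the verdict from the example."""
    return EXAMPLE_VERDICTS[statement]


score = faithfulness(LOW_FAITHFULNESS_ANSWER, CONTEXT, decompose_stub, verify_stub)
print(score)  # 0.5
```

Passing the two steps in as callables keeps the scoring logic separate from whichever model performs the decomposition and verification.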

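Legacy Metrics API

The following examples use the legacy metrics API pattern. For new projects, we recommend the collections-based API shown above.

Deprecation timeline

This API will be deprecated in version 0.4 and removed in version 1.0. Please migrate to the collections-based API shown above.

Example with SingleTurnSample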

from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import Faithfulness

sample = SingleTurnSample(
    user_input="When was the first super bowl?",
    response="The first superbowl was held on Jan 15, 1967",
    retrieved_contexts=[
        "The First AFL–NFL World Championship Game was an American football game played on January 15, 1967, at the Los Angeles Memorial Coliseum in Los Angeles."
    ]
)
# evaluator_llm is not defined in this snippet; it is assumed to be an evaluator LLM configured elsewhere.
scorer = Faithfulness(llm=evaluator_llm)
await scorer.single_turn_ascore(sample)

Output

1.0

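Faithfulness with HHEM-2.1-Open

Vectara's HHEM-2.1-Open is a classifier model (T5) trained to detect hallucinations in LLM-generated text. It can be used in the second step of the faithfulness calculation, where each claim is cross-checked against the given context to determine whether it can be inferred from that context. The model is free, small, and open source, which makes it efficient for production use cases.

To use the model to calculate faithfulness: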

from ragas.dataset_schema import SingleTurnSample
from ragas.metrics import FaithfulnesswithHHEM


sample = SingleTurnSample(
    user_input="When was the first super bowl?",
    response="The first superbowl was held on Jan 15, 1967",
    retrieved_contexts=[
        "The First AFL–NFL World Championship Game was an American football game played on January 15, 1967, at the Los Angeles Memorial Coliseum in Los Angeles."
    ]
)
# evaluator_llm is not defined in this snippet; it is assumed to be an evaluator LLM configured elsewhere.
scorer = FaithfulnesswithHHEM(llm=evaluator_llm)
await scorer.single_turn_ascore(sample)

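You can load the model onto a specific device by setting the device parameter, and adjust the inference batch size with the batch_size parameter. By default, the model is loaded on the CPU with a batch size of 10.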

my_device = "cuda:0"
my_batch_size = 10

scorer = FaithfulnesswithHHEM(device=my_device, batch_size=my_batch_size)
await scorer.single_turn_ascore(sample)
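To give a feel for what the batch size controls, the sketch below pairs a context with each statement and splits the pairs into chunks of at most `batch_size`. It assumes the classifier scores (context, statement) pairs in batches; the helper names `make_pairs` and `iter_batches` are hypothetical and are not taken from the Ragas source.

```python
from typing import Iterator, List, Sequence, Tuple

Pair = Tuple[str, str]  # (context, statement)


def make_pairs(context: str, statements: Sequence[str]) -> List[Pair]:
    """Pair the context with every statement to be checked against it."""
    return [(context, statement) for statement in statements]


def iter_batches(pairs: Sequence[Pair], batch_size: int) -> Iterator[List[Pair]]:
    """Yield consecutive chunks of at most `batch_size` pairs."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(pairs), batch_size):
        yield list(pairs[start:start + batch_size])


# 25 statements with a batch size of 10 give two full batches and one partial batch.
pairs = make_pairs("some context", [f"statement {i}" for i in range(25)])
batch_sizes = [len(batch) for batch in iter_batches(pairs, batch_size=10)]
print(batch_sizes)  # [10, 10, 5]
```

A larger batch size means fewer forward passes but more memory per pass, which is why it is usually tuned together with the device.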