
Modifying prompts in metrics

In Ragas, every metric that uses an LLM also relies on one or more prompts to generate the intermediate results from which the score is computed. When working with LLM-based metrics, prompts can be treated as hyperparameters: a prompt optimized for your domain and use case can improve the accuracy of an LLM-based metric by 10-20%. Because the best prompt depends on the LLM in use, you may need to tune the prompts used by each metric.

Every prompt in Ragas is written using the [BasePrompt][ragas.prompt.metrics.base_prompt.BasePrompt] or PydanticPrompt class. Before continuing, make sure you have read the Prompt Object documentation.
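As a rough mental model, a prompt object bundles an instruction, optional few-shot examples, and a renderer that turns a typed input into the final prompt string. The following is a simplified, self-contained sketch of that pattern — it is not the actual ragas implementation, and the class and field names here are illustrative only:

```python
from dataclasses import dataclass


@dataclass
class ExampleInput:
    """Illustrative input type; real metrics define their own input models."""
    response: str
    context: str


class SketchPrompt:
    """Simplified sketch of the prompt-object pattern:
    an instruction, few-shot examples, and a string renderer."""

    instruction = "Judge whether the response is supported by the context."
    examples: list = []  # list of (input, expected_verdict) pairs

    def to_string(self, data: ExampleInput) -> str:
        parts = [self.instruction]
        # Few-shot examples are rendered before the actual query.
        for inp, verdict in self.examples:
            parts.append(
                f"Response: {inp.response}\nContext: {inp.context}\nVerdict: {verdict}"
            )
        parts.append(f"Response: {data.response}\nContext: {data.context}")
        return "\n\n".join(parts)


prompt = SketchPrompt()
print(prompt.to_string(ExampleInput(
    response="Paris is in France.",
    context="Paris is the capital of France.",
)))
```

Because the instruction and examples are plain class attributes, customizing a prompt amounts to subclassing and overriding them, which is exactly the workflow the sections below walk through.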

Understanding your metric's prompts

For metrics that support prompt customization, Ragas exposes the underlying prompt object through the metric instance. Let's look at how to access and modify the prompt of the Faithfulness metric.

from ragas.metrics.collections import Faithfulness
from openai import AsyncOpenAI
from ragas.llms import llm_factory

# Setup dependencies
client = AsyncOpenAI()
llm = llm_factory("gpt-4o-mini", client=client)

# Create metric instance
scorer = Faithfulness(llm=llm)

# Access the internal prompt (implementation specific to each metric)
# Most modern metrics have prompts initialized in __init__
print(scorer.prompt)
Output
<ragas.metrics.collections.faithfulness.util.FaithfulnessPrompt at 0x7f8c41410970>

Generating and viewing the prompt string

Let's look at the prompt that will be sent to the LLM.

# For metrics with BasePrompt, you can generate the prompt string
# Here's an example with Faithfulness metric
from ragas.metrics.collections.faithfulness.util import FaithfulnessInput

# Create sample input
sample_input = FaithfulnessInput(
    response="The Eiffel Tower is located in Paris.",
    context="The Eiffel Tower is an iconic iron lattice tower located in Paris, France."
)

# Generate the prompt string
prompt_string = scorer.prompt.to_string(sample_input)
print(prompt_string)
Output
Your task is to judge the faithfulness of a series of statements based on a given context.
For each statement you must return verdict as 1 if the statement can be directly inferred
based on the context or 0 if the statement can not be directly inferred based on the context.

[Example statements and context shown here...]

Modifying prompts in modern metrics

Modern metrics in Ragas use modular BasePrompt classes that live in each metric's util.py module. To customize a prompt:

  1. Access the prompt: the prompt is available as an attribute on the metric instance (usually self.prompt).
  2. Modify the prompt class: extend or subclass the prompt to customize its instruction or examples.
  3. Update the metric: pass your custom prompt during metric initialization, or modify it afterwards.

Here is an example using the FactualCorrectness metric.

from ragas.metrics.collections import FactualCorrectness
from ragas.metrics.collections.factual_correctness.util import (
    ClaimDecompositionInput,
    ClaimDecompositionOutput,
    ClaimDecompositionPrompt,
)

# Create a custom prompt by subclassing
class CustomClaimDecompositionPrompt(ClaimDecompositionPrompt):
    @property
    def instruction(self):
        return """You are an expert at breaking down complex statements into atomic claims.
Break down the input text into clear, verifiable claims.
Only output valid JSON with a "claims" array."""

# Create metric instance and replace prompt
scorer = FactualCorrectness(llm=llm)
scorer.prompt = CustomClaimDecompositionPrompt()

# Now the metric will use the custom prompt
result = await scorer.ascore(
    response="The Eiffel Tower is in Paris and was built in 1889.",
    reference="The Eiffel Tower is located in Paris. It was completed in 1889."
)

Modifying the examples in a default prompt

Few-shot examples can strongly influence an LLM's output, and the examples in the default prompts may not reflect your specific domain or use case. Here is how to modify them:

from ragas.metrics.collections import Faithfulness
from ragas.metrics.collections.faithfulness.util import (
    FaithfulnessInput,
    FaithfulnessOutput,
    FaithfulnessPrompt,
    StatementFaithfulnessAnswer,
)

# Create custom prompt with domain-specific examples
class DomainSpecificFaithfulnessPrompt(FaithfulnessPrompt):
    examples = [
        (
            FaithfulnessInput(
                response="Machine learning is a subset of AI that uses statistical techniques.",
                context="Machine learning is a field within artificial intelligence that enables systems to learn from data without being explicitly programmed.",
            ),
            FaithfulnessOutput(
                statements=[
                    StatementFaithfulnessAnswer(
                        statement="Machine learning is a subset of AI.",
                        reason="This statement is supported by the context which mentions ML as a field within AI.",
                        verdict=1
                    ),
                    StatementFaithfulnessAnswer(
                        statement="Machine learning uses statistical techniques.",
                        reason="While related, the context doesn't explicitly mention statistical techniques.",
                        verdict=0
                    ),
                ]
            ),
        ),
        # Add more examples for your specific domain
    ]

# Update the metric with custom prompt
scorer = Faithfulness(llm=llm)
scorer.prompt = DomainSpecificFaithfulnessPrompt()

# Now evaluate with domain-specific prompts
result = await scorer.ascore(
    response="Neural networks are inspired by biological neurons.",
    context="Artificial neural networks are computing systems loosely inspired by biological neural networks found in animal brains."
)

This approach ensures the LLM sees examples that better reflect your domain and evaluation criteria.
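When swapping a prompt in like this, it can be worth a quick sanity check that the replacement is still a drop-in substitute for the default and actually changes what you intended. The sketch below illustrates the idea with plain Python stand-ins rather than the real ragas classes (DefaultPrompt and CustomPrompt here are hypothetical names):

```python
class DefaultPrompt:
    """Stand-in for a metric's built-in prompt class."""
    instruction = "Judge the faithfulness of statements against the context."


class CustomPrompt(DefaultPrompt):
    """Stand-in for a domain-specific subclass that overrides the instruction."""
    instruction = (
        "You are a medical-domain expert. Judge the faithfulness of "
        "statements against the clinical context."
    )


prompt = CustomPrompt()

# Subclassing keeps the custom prompt a drop-in replacement for the default,
# while the override should actually change the instruction text.
assert isinstance(prompt, DefaultPrompt)
assert prompt.instruction != DefaultPrompt.instruction
```

The same pattern applies to overridden examples: confirm that your domain-specific content is present before spending LLM calls on a full evaluation run.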

A complete prompt customization example

Here is a complete example showing how to verify your custom setup.

# Create sample input to test the prompt
sample_input = FaithfulnessInput(
    response="The capital of France is Paris.",
    context="Paris is the capital and most populous city of France."
)

# Generate and view the full prompt string
full_prompt = scorer.prompt.to_string(sample_input)
print("Full Prompt:")
print(full_prompt)
print("\n" + "="*80 + "\n")

# Now use it for evaluation
result = await scorer.ascore(
    response="The capital of France is Paris.",
    context="Paris is the capital and most populous city of France."
)
print(f"Faithfulness Score: {result.value}")