# Modifying Prompts in Metrics
In Ragas, every LLM-based metric uses one or more prompts to generate the intermediate results from which its score is computed. When working with LLM-based metrics, the prompt can be treated as a hyperparameter: a prompt optimized for your domain and use case can improve an LLM-based metric's accuracy by 10-20%. Because the best prompt depends on the LLM being used, you may need to tune the prompt each metric uses.
Every prompt in Ragas is written using the [BasePrompt][ragas.prompt.metrics.base_prompt.BasePrompt] or `PydanticPrompt` class. Before continuing, make sure you understand the prompt object documentation.
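To build intuition for what a prompt object does before working with the real classes, here is a deliberately simplified, stdlib-only sketch of the pattern: an instruction, a list of few-shot examples, and a `to_string` method that renders everything plus the current input into the final string sent to the LLM. This is an illustration of the concept only, not the actual `BasePrompt` implementation; the class and field names here are invented for the sketch.

```python
from dataclasses import dataclass

@dataclass
class FaithInput:
    """Simplified input model (stands in for a real input class)."""
    response: str
    context: str

class MiniPrompt:
    """Toy prompt object: instruction + few-shot examples + renderer."""
    instruction = "Judge whether the response is supported by the context."
    examples = [
        (FaithInput("Paris is in France.",
                    "Paris is the capital of France."), "1"),
    ]

    def to_string(self, data: FaithInput) -> str:
        # Render instruction, then each example, then the current input.
        parts = [self.instruction]
        for inp, verdict in self.examples:
            parts.append(
                f"Response: {inp.response}\nContext: {inp.context}\nVerdict: {verdict}"
            )
        parts.append(f"Response: {data.response}\nContext: {data.context}\nVerdict:")
        return "\n\n".join(parts)
```

Customizing a prompt then amounts to changing the instruction or the examples while keeping the rendering contract intact, which is exactly what the subclassing examples below do with the real classes.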
## Understanding Your Metric's Prompts
For metrics that support prompt customization, Ragas exposes the underlying prompt object through the metric instance. Let's look at how to access and modify the prompt on the Faithfulness metric.
```python
from ragas.metrics.collections import Faithfulness
from openai import AsyncOpenAI
from ragas.llms import llm_factory

# Setup dependencies
client = AsyncOpenAI()
llm = llm_factory("gpt-4o-mini", client=client)

# Create metric instance
scorer = Faithfulness(llm=llm)

# Access the internal prompt (implementation specific to each metric)
# Most modern metrics have prompts initialized in __init__
print(scorer.prompt)
```
## Generating and Viewing the Prompt String
Let's look at the prompt string that will be sent to the LLM.
```python
# For metrics with BasePrompt, you can generate the prompt string
# Here's an example with the Faithfulness metric
from ragas.metrics.collections.faithfulness.util import FaithfulnessInput

# Create sample input
sample_input = FaithfulnessInput(
    response="The Eiffel Tower is located in Paris.",
    context="The Eiffel Tower is an iconic iron lattice tower located in Paris, France."
)

# Generate the prompt string
prompt_string = scorer.prompt.to_string(sample_input)
print(prompt_string)
```
```
Your task is to judge the faithfulness of a series of statements based on a given context.
For each statement you must return verdict as 1 if the statement can be directly inferred
based on the context or 0 if the statement can not be directly inferred based on the context.

[Example statements and context shown here...]
```
## Modifying Prompts in Modern Metrics
Modern metrics in Ragas use modular `BasePrompt` classes that live in each metric's `util.py` module. To customize a prompt:

1. **Access the prompt**: the prompt is available as an attribute of the metric instance (usually `self.prompt`).
2. **Modify the prompt class**: extend or subclass the prompt to customize its instructions or examples.
3. **Update the metric**: pass your custom prompt during metric initialization, or replace it afterwards.
Here is an example using the FactualCorrectness metric.
```python
from ragas.metrics.collections import FactualCorrectness
from ragas.metrics.collections.factual_correctness.util import (
    ClaimDecompositionInput,
    ClaimDecompositionOutput,
    ClaimDecompositionPrompt,
)

# Create a custom prompt by subclassing
class CustomClaimDecompositionPrompt(ClaimDecompositionPrompt):
    @property
    def instruction(self):
        return """You are an expert at breaking down complex statements into atomic claims.
Break down the input text into clear, verifiable claims.
Only output valid JSON with a "claims" array."""

# Create metric instance and replace prompt
scorer = FactualCorrectness(llm=llm)
scorer.prompt = CustomClaimDecompositionPrompt()

# Now the metric will use the custom prompt
result = await scorer.ascore(
    response="The Eiffel Tower is in Paris and was built in 1889.",
    reference="The Eiffel Tower is located in Paris. It was completed in 1889."
)
```
## Modifying Examples in the Default Prompt
Few-shot examples can strongly influence an LLM's output, and the examples in the default prompt may not reflect your specific domain or use case. Here is how to modify them:
```python
from ragas.metrics.collections import Faithfulness
from ragas.metrics.collections.faithfulness.util import (
    FaithfulnessInput,
    FaithfulnessOutput,
    FaithfulnessPrompt,
    StatementFaithfulnessAnswer,
)

# Create custom prompt with domain-specific examples
class DomainSpecificFaithfulnessPrompt(FaithfulnessPrompt):
    examples = [
        (
            FaithfulnessInput(
                response="Machine learning is a subset of AI that uses statistical techniques.",
                context="Machine learning is a field within artificial intelligence that enables systems to learn from data without being explicitly programmed.",
            ),
            FaithfulnessOutput(
                statements=[
                    StatementFaithfulnessAnswer(
                        statement="Machine learning is a subset of AI.",
                        reason="This statement is supported by the context which mentions ML as a field within AI.",
                        verdict=1
                    ),
                    StatementFaithfulnessAnswer(
                        statement="Machine learning uses statistical techniques.",
                        reason="While related, the context doesn't explicitly mention statistical techniques.",
                        verdict=0
                    ),
                ]
            ),
        ),
        # Add more examples for your specific domain
    ]

# Update the metric with custom prompt
scorer = Faithfulness(llm=llm)
scorer.prompt = DomainSpecificFaithfulnessPrompt()

# Now evaluate with domain-specific prompts
result = await scorer.ascore(
    response="Neural networks are inspired by biological neurons.",
    context="Artificial neural networks are computing systems loosely inspired by biological neural networks found in animal brains."
)
```
This approach ensures the LLM sees examples that better reflect your domain and evaluation criteria.
## Complete Prompt Customization Example
Here is a complete example showing how to verify your custom setup.
```python
# Create sample input to test the prompt
sample_input = FaithfulnessInput(
    response="The capital of France is Paris.",
    context="Paris is the capital and most populous city of France."
)

# Generate and view the full prompt string
full_prompt = scorer.prompt.to_string(sample_input)
print("Full Prompt:")
print(full_prompt)
print("\n" + "="*80 + "\n")

# Now use it for evaluation
result = await scorer.ascore(
    response="The capital of France is Paris.",
    context="Paris is the capital and most populous city of France."
)
print(f"Faithfulness Score: {result.value}")
```
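Since the best prompt depends on the LLM and your domain, a practical workflow is to treat prompt variants as hyperparameters: score each candidate on a small labeled sample set and keep the one with the best mean score. Below is a minimal stdlib-only sketch of that selection loop. The `score_with_prompt` function is a hypothetical stand-in (not a Ragas API) for evaluating one sample with a metric whose `prompt` has been set to the candidate; its placeholder body exists only to make the sketch self-contained.

```python
# Sketch: pick the best prompt variant by mean score on a labeled set.

def score_with_prompt(prompt: str, sample: dict) -> float:
    # Placeholder stand-in. In practice this would set scorer.prompt to the
    # candidate and call scorer.ascore(...) on the sample, returning the
    # metric's value. The arbitrary formula below just makes the sketch run.
    return float(len(prompt) % 3) / 2

def pick_best_prompt(candidates: list[str], samples: list[dict]) -> str:
    best, best_mean = None, float("-inf")
    for prompt in candidates:
        mean = sum(score_with_prompt(prompt, s) for s in samples) / len(samples)
        if mean > best_mean:
            best, best_mean = prompt, mean
    return best
```

Keeping the labeled set fixed across candidates is what makes the comparison meaningful; changing the prompt and the evaluation data at the same time makes the 10-20% accuracy gains mentioned earlier impossible to attribute.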