Generating a Synthetic Test Set for a RAG-Based Q&A System with Ragas
Overview
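In this tutorial, we explore the test set generation module in Ragas to create a synthetic test set for a question-answering bot based on Retrieval-Augmented Generation (RAG). Our goal is to design a Ragas Airline Assistant that can answer customer queries on a range of topics, including:
- Flight booking
- Flight changes and cancellations
- Baggage policies
- Viewing reservations
- Flight delays
- In-flight services
- Special assistance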
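To make the synthetic dataset as realistic and diverse as possible, we create several customer personas. Each persona represents a distinct type of traveler and behavior, which helps us build a comprehensive, representative test set for evaluating how effective and robust the RAG system is.
Let's get started!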
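Downloading and Loading the Documents
Download the dummy Ragas Airline dataset into a local `ragas-airline-dataset` directory, then load its Markdown documents with LangChain: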
```python
from langchain_community.document_loaders import DirectoryLoader

# Load every Markdown file in the dataset directory, including subdirectories
path = "ragas-airline-dataset"
loader = DirectoryLoader(path, glob="**/*.md")
docs = loader.load()
```
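To see roughly what the loader returns without installing LangChain, the following standard-library sketch walks the same directory with the same recursive `**/*.md` pattern and builds one record per file. The function `load_markdown_docs` and its dict records are hypothetical stand-ins for LangChain's `Document` objects (`page_content` plus `metadata`). `DirectoryLoader` delegates parsing of each file to a separate loader class, so its `page_content` may differ from the raw file text read here.

```python
from pathlib import Path


def load_markdown_docs(root: str) -> list[dict]:
    """Hypothetical stdlib stand-in for DirectoryLoader(root, glob="**/*.md").

    Returns one dict per Markdown file, shaped like a LangChain Document:
    the raw file text under "page_content" and the path under "metadata".
    """
    docs = []
    # rglob("*.md") searches the tree recursively, like the "**/*.md" glob
    for md_file in sorted(Path(root).rglob("*.md")):
        docs.append(
            {
                "page_content": md_file.read_text(encoding="utf-8"),
                "metadata": {"source": str(md_file)},
            }
        )
    return docs
```

For example, `load_markdown_docs("ragas-airline-dataset")` would return one record per Markdown file in the dataset directory.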
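Setting Up the LLM and Embedding Model
The generator needs an LLM and an embedding model. Here, LangChain's OpenAI chat and embedding models are wrapped so that Ragas can use them; both read the API key from the `OPENAI_API_KEY` environment variable.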
```python
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from ragas.embeddings import LangchainEmbeddingsWrapper
from ragas.llms import LangchainLLMWrapper

# Wrap the LangChain models so Ragas can call them during test set generation
generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))
generator_embeddings = LangchainEmbeddingsWrapper(
    OpenAIEmbeddings(model="text-embedding-3-small")
)
```
Creating the Knowledge Graph
Create a base knowledge graph from the documents, with one node per document:
```python
from ragas.testset.graph import KnowledgeGraph, Node, NodeType

kg = KnowledgeGraph()
# Add each loaded document as a DOCUMENT node; no relationships are created here
for doc in docs:
    kg.nodes.append(
        Node(
            type=NodeType.DOCUMENT,
            properties={
                "page_content": doc.page_content,
                "document_metadata": doc.metadata,
            },
        )
    )
kg
```
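Setting Up the Transforms
In this tutorial, we create a single-hop query dataset from a knowledge graph built solely from nodes. To enrich the graph and improve query generation, we apply three key transforms:
- Headline extraction: uses an LLM to extract clear section headlines from each document (for example, "Airline Initiated Cancellations" from flight cancellations.md). These headlines isolate specific topics and give direct context for generating focused questions.
- Headline splitting: splits each document into manageable subsections at the extracted headlines. This increases the number of nodes and leads to finer-grained, more context-specific queries.
- Keyphrase extraction: identifies core topical keyphrases (such as key seating information) that serve as semantic seed points, improving the diversity and relevance of the generated queries.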
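To make headline splitting concrete, here is a minimal sketch of the idea: cut a document wherever one of its extracted headlines begins, so that each chunk starts with its own headline. `split_by_headlines` is a hypothetical helper, not Ragas's `HeadlineSplitter`; it assumes each headline appears verbatim in the text, matches only its first occurrence, and ignores the token limits the real transform accepts (such as `max_tokens`).

```python
def split_by_headlines(text: str, headlines: list[str]) -> list[str]:
    """Hypothetical simplification of headline-based splitting."""
    # Start offset of each headline found in the text (first occurrence only)
    starts = sorted({text.find(h) for h in headlines if h in text})
    if not starts:
        return [text]
    # Keep any text before the first headline as its own leading chunk
    if starts[0] != 0:
        starts.insert(0, 0)
    # Each chunk runs from one start offset to the next (or to the end)
    bounds = starts + [len(text)]
    chunks = [text[start:end].strip() for start, end in zip(bounds, bounds[1:])]
    # Drop chunks that are empty after stripping whitespace
    return [chunk for chunk in chunks if chunk]
```

For a document with some introductory text followed by two headlined sections, it returns three chunks: the introduction, then one chunk per headline. Because a headline is matched as a plain substring, a headline phrase that also appears earlier in running text would produce a wrong cut; a real splitter has to handle that.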
```python
from ragas.testset.transforms import (
    HeadlineSplitter,
    HeadlinesExtractor,
    KeyphrasesExtractor,
    apply_transforms,
)

# Extract section headlines with the LLM (max_num=20)
headline_extractor = HeadlinesExtractor(llm=generator_llm, max_num=20)
# Split documents into sections at the extracted headlines (max_tokens=1500)
headline_splitter = HeadlineSplitter(max_tokens=1500)
# Extract keyphrases from the graph's nodes with the LLM
keyphrase_extractor = KeyphrasesExtractor(llm=generator_llm)

transforms = [headline_extractor, headline_splitter, keyphrase_extractor]
apply_transforms(kg, transforms=transforms)
```
```text
Applying HeadlinesExtractor: 100%|██████████| 8/8 [00:00<?, ?it/s]
Applying HeadlineSplitter: 100%|██████████| 8/8 [00:00<?, ?it/s]
Applying KeyphrasesExtractor: 100%|██████████| 25/25 [00:00<?, ?it/s]
```
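Configuring Personas for Query Generation
Personas provide context and perspective so that the generated queries are natural, user-specific, and diverse. Tailoring queries to different user viewpoints lets the test set cover a wide range of scenarios:
- First Time Flier: generates queries that ask for detailed, step-by-step guidance, as a newcomer who needs clear instructions would.
- Frequent Flier: generates concise, efficiency-focused queries typical of experienced travelers.
- Angry Business Class Flier: generates critical, urgent queries that reflect high expectations and a demand for immediate resolution.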
```python
from ragas.testset.persona import Persona

persona_first_time_flier = Persona(
    name="First Time Flier",
    role_description="Is flying for the first time and may feel anxious. Needs clear guidance on flight procedures, safety protocols, and what to expect throughout the journey.",
)

persona_frequent_flier = Persona(
    name="Frequent Flier",
    role_description="Travels regularly and values efficiency and comfort. Interested in loyalty programs, express services, and a seamless travel experience.",
)

persona_angry_business_flier = Persona(
    name="Angry Business Class Flier",
    role_description="Demands top-tier service and is easily irritated by any delays or issues. Expects immediate resolutions and is quick to express frustration if standards are not met.",
)

personas = [persona_first_time_flier, persona_frequent_flier, persona_angry_business_flier]
```
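Generating Queries with Synthesizers
Synthesizers turn the enriched nodes and personas into queries. A synthesizer selects a node property (for example, "entities" or "keyphrases"), pairs it with a persona, a query style, and a query length, and then uses an LLM to generate a query-answer pair from the node's content.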
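The pairing step can be sketched as enumerating every combination of node-property value, persona, query style, and query length, then drawing as many distinct scenarios as needed. `sample_scenarios` and the style and length labels below are hypothetical, chosen for illustration only. The LLM call that turns each scenario plus the node's content into a query-answer pair is left out.

```python
import itertools
import random


def sample_scenarios(terms, personas, styles, lengths, n, seed=0):
    """Hypothetical sketch: combine property values, personas, styles and
    lengths, then draw up to n distinct scenarios."""
    # Every possible (term, persona, style, length) combination
    combos = list(itertools.product(terms, personas, styles, lengths))
    rng = random.Random(seed)
    # Sample without replacement, capped at the number of combinations available
    picked = rng.sample(combos, k=min(n, len(combos)))
    return [
        {"term": term, "persona": persona, "style": style, "length": length}
        for term, persona, style, length in picked
    ]


# Illustrative inputs: two keyphrases and the three persona names
scenarios = sample_scenarios(
    terms=["baggage allowance", "flight delays"],
    personas=["First Time Flier", "Frequent Flier", "Angry Business Class Flier"],
    styles=["misspelled", "perfect grammar"],
    lengths=["short", "long"],
    n=5,
)
for scenario in scenarios:
    print(scenario)
```

Sampling without replacement keeps scenarios from repeating, which is one simple way to push a small test set toward variety.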
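Two instances of SingleHopSpecificQuerySynthesizer define the query distribution:
- Headline-based synthesizer: generates queries from the extracted document headlines, producing structured questions that refer to specific sections.
- Keyphrase-based synthesizer: builds queries around key concepts, producing broader, topical questions.
The two synthesizers are weighted equally (0.5 each), giving a balanced mix of specific and conceptual queries and increasing the diversity of the test set.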
```python
from ragas.testset.synthesizers.single_hop.specific import (
    SingleHopSpecificQuerySynthesizer,
)

# (synthesizer, weight) pairs; the weights sum to 1
query_distibution = [
    (
        SingleHopSpecificQuerySynthesizer(llm=generator_llm, property_name="headlines"),
        0.5,
    ),
    (
        SingleHopSpecificQuerySynthesizer(llm=generator_llm, property_name="keyphrases"),
        0.5,
    ),
]
```
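With `testset_size=10` (used when generating the test set below) and two weights of 0.5, you would expect about five queries from each synthesizer. The following sketch shows one plausible way to turn weights into whole-number counts that add up to the requested size, using largest-remainder rounding. `allocate_samples` is a hypothetical helper; Ragas's own rounding may differ.

```python
import math


def allocate_samples(weights: list[float], testset_size: int) -> list[int]:
    """Hypothetical sketch: convert synthesizer weights into per-synthesizer
    sample counts that always sum to testset_size."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weights should sum to 1")
    raw = [w * testset_size for w in weights]
    counts = [math.floor(r) for r in raw]
    # Give leftover samples to the entries with the largest fractional parts;
    # sorted() is stable, so ties go to the earlier entry
    leftover = testset_size - sum(counts)
    order = sorted(range(len(raw)), key=lambda i: raw[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts
```

`allocate_samples([0.5, 0.5], 10)` returns `[5, 5]`. With uneven weights such as `[0.25, 0.25, 0.5]`, the single leftover sample goes to the first of the tied entries, giving `[3, 2, 5]`.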
Generating the Test Set
```python
from ragas.testset import TestsetGenerator

generator = TestsetGenerator(
    llm=generator_llm,
    embedding_model=generator_embeddings,
    knowledge_graph=kg,
    persona_list=personas,
)
```
Now we can generate the test set:
```python
# Generate 10 samples using the weighted query distribution
testset = generator.generate(testset_size=10, query_distribution=query_distibution)
testset.to_pandas()
```
```text
Generating Scenarios: 100%|██████████| 2/2 [00:00<?, ?it/s]
Generating Samples: 100%|██████████| 10/10 [00:00<?, ?it/s]
```
Row | user_input | reference_contexts | reference | synthesizer_name |
---|---|---|---|---|
0 | Wut do I do if my baggage is Delayed, Lost, or... | [Baggage Policies\n\nThis section provides a d... | If your baggage is delayed, lost, or damaged, ... | single_hop_specifc_query_synthesizer |
1 | Wht asistance is provided by the airline durin... | [Flight Delays\n\nFlight delays can be caused ... | Depending on the length of the delay, Ragas Ai... | single_hop_specifc_query_synthesizer |
2 | What is Step 1: Check Fare Rules in the contex... | [Flight Cancellations\n\nFlight cancellations ... | Step 1: Check Fare Rules involves logging into... | single_hop_specifc_query_synthesizer |
3 | How can I access my booking online with Ragas ... | [Managing Reservations\n\nManaging your reserv... | To access your booking online with Ragas Airli... | single_hop_specifc_query_synthesizer |
4 | What assistance does Ragas Airlines provide fo... | [Special Assistance\n\nRagas Airlines provides... | Ragas Airlines provides special assistance ser... | single_hop_specifc_query_synthesizer |
5 | What steps should I take if my baggage is dela... | [Baggage Policies This section provides a deta... | If your baggage is delayed, lost, or damaged w... | single_hop_specifc_query_synthesizer |
6 | How can I resubmit the claim for my baggage is... | [Potential Issues and Resolutions for Baggage ... | To resubmit the claim for your baggage issue, ... | single_hop_specifc_query_synthesizer |
7 | Wut are the main causes of flight delays and h... | [Flight Delays Flight delays can be caused by ... | Flight delays can be caused by weather conditi... | single_hop_specifc_query_synthesizer |
8 | How can I request reimbursement for additional... | [2. Additional Expenses Incurred Due to Delay ... | To request reimbursement for additional expens... | single_hop_specifc_query_synthesizer |
9 | What are passenger-initiated cancelations? | [Flight Cancellations Flight cancellations can... | Passenger-initiated cancellations occur when a... | single_hop_specifc_query_synthesizer |
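Generated queries can overlap in topic: rows 0 and 5 above, for example, both ask about delayed, lost, or damaged baggage. A quick overlap check before using a synthetic set for evaluation can therefore be worthwhile. The sketch below uses `difflib.SequenceMatcher` from the standard library, a crude character-level measure that only catches near-verbatim repeats; paraphrases such as rows 0 and 5 would need a semantic comparison, for example with embeddings. `near_duplicate_pairs` and its 0.85 threshold are hypothetical choices.

```python
from difflib import SequenceMatcher
from itertools import combinations


def near_duplicate_pairs(queries, threshold=0.85):
    """Return (i, j, ratio) for query pairs whose case-insensitive,
    character-level similarity ratio is at least `threshold`."""
    pairs = []
    # Compare every unordered pair of queries exactly once
    for (i, a), (j, b) in combinations(enumerate(queries), 2):
        ratio = SequenceMatcher(None, a.lower(), b.lower()).ratio()
        if ratio >= threshold:
            pairs.append((i, j, round(ratio, 3)))
    return pairs
```

The input would be the list of user queries taken from `testset.to_pandas()`.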
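Final Thoughts
In this tutorial, we explored test set generation with the Ragas library, focusing on single-hop queries. An upcoming tutorial will cover multi-hop queries, extending these ideas to richer test set scenarios.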