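Generating a Synthetic Test Set for RAG-Based Question Answering with Ragas

Overview

In this tutorial, we explore the test set generation module in Ragas to create a synthetic test set for a question-answering bot based on Retrieval-Augmented Generation (RAG). Our goal is to design a Ragas Airline assistant that can answer customer queries on a range of topics, including: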

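Flight booking
Flight changes and cancellations
Baggage policies
Viewing reservations
Flight delays
In-flight services
Special assistance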

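To make the synthetic dataset as realistic and diverse as possible, we create several customer personas. Each persona represents a distinct traveler type and behavior, which helps us build a comprehensive and representative test set. This lets us evaluate both the effectiveness and the robustness of the RAG model.

Let's get started!

Download and Load Documents

Run the commands below to download the dummy Ragas Airline dataset and load the documents using LangChain.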

! git clone https://hugging-face.cn/datasets/explodinggradients/ragas-airline-dataset

from langchain_community.document_loaders import DirectoryLoader

path = "ragas-airline-dataset"
loader = DirectoryLoader(path, glob="**/*.md")
docs = loader.load()

Set Up the LLM and Embedding Model

from ragas.llms import LangchainLLMWrapper
from ragas.embeddings import LangchainEmbeddingsWrapper
from langchain_openai import ChatOpenAI, OpenAIEmbeddings


generator_llm = LangchainLLMWrapper(ChatOpenAI(model="gpt-4o-mini"))
generator_embeddings = LangchainEmbeddingsWrapper(OpenAIEmbeddings(model="text-embedding-3-small"))

Create a Knowledge Graph

Create a base knowledge graph from the documents, with one DOCUMENT node per loaded file.

from ragas.testset.graph import KnowledgeGraph
from ragas.testset.graph import Node, NodeType


kg = KnowledgeGraph()

for doc in docs:
    kg.nodes.append(
        Node(
            type=NodeType.DOCUMENT,
            properties={"page_content": doc.page_content, "document_metadata": doc.metadata}
        )
    )

kg

Output

KnowledgeGraph(nodes: 8, relationships: 0)

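Set Up the Transforms

In this tutorial, we create a single-hop query dataset using a knowledge graph built solely from nodes, with no relationships between them. To enrich the graph and improve query generation, we apply three key transformations:

Headline Extraction: Uses a language model to extract clear section titles from each document (for example, "Airline-Initiated Cancellations" from flight cancellations.md). These titles isolate specific topics and provide direct context for generating focused questions.
Headline Splitting: Splits the documents into manageable subsections based on the extracted headlines. This increases the number of nodes and yields more granular, context-specific queries.
Keyphrase Extraction: Identifies core thematic keyphrases (such as key seating information) that serve as semantic seed points, enriching the diversity and relevance of the generated queries.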

from ragas.testset.transforms import apply_transforms
from ragas.testset.transforms import HeadlinesExtractor, HeadlineSplitter, KeyphrasesExtractor

headline_extractor = HeadlinesExtractor(llm=generator_llm, max_num=20)
headline_splitter = HeadlineSplitter(max_tokens=1500)
keyphrase_extractor = KeyphrasesExtractor(llm=generator_llm)

transforms = [
    headline_extractor,
    headline_splitter,
    keyphrase_extractor
]

apply_transforms(kg, transforms=transforms)

Applying HeadlinesExtractor: 100%|██████████| 8/8 [00:00<?, ?it/s]
Applying HeadlineSplitter: 100%|██████████| 8/8 [00:00<?, ?it/s]
Applying KeyphrasesExtractor: 100%|██████████| 25/25 [00:00<?, ?it/s]
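
To make the headline-splitting step more concrete, here is a minimal plain-Python sketch of the idea: cut a document at each extracted headline so that every section becomes its own chunk. This is an illustration only, not how HeadlineSplitter is implemented; the function name split_by_headlines and the sample text are made up for this example, and the sketch ignores the max_tokens limit.

```python
def split_by_headlines(text: str, headlines: list[str]) -> list[str]:
    """Split text into chunks, each starting at one of the given headlines.

    Hypothetical helper for illustration; not part of Ragas.
    """
    # Locate each headline; skip any that do not occur in the text.
    positions = sorted(
        pos for pos in (text.find(h) for h in headlines) if pos != -1
    )
    if not positions:
        return [text]

    chunks = []
    # Keep any preamble that precedes the first headline.
    if positions[0] > 0:
        chunks.append(text[: positions[0]].strip())
    # Each chunk runs from one headline to the next (or to the end of the text).
    for start, end in zip(positions, positions[1:] + [len(text)]):
        chunks.append(text[start:end].strip())
    return [c for c in chunks if c]


# Made-up sample text, loosely modeled on the airline documents.
doc = (
    "Flight Cancellations\n"
    "Cancellations can be initiated by the passenger or the airline.\n"
    "Passenger-Initiated Cancellations\n"
    "Check the fare rules before cancelling.\n"
    "Airline-Initiated Cancellations\n"
    "The airline rebooks or refunds affected passengers.\n"
)
headlines = ["Passenger-Initiated Cancellations", "Airline-Initiated Cancellations"]

sections = split_by_headlines(doc, headlines)
for section in sections:
    # Print the first line of every chunk to show where the cuts fell.
    print(section.splitlines()[0])
```

Each resulting chunk would correspond to one smaller node in the graph, which is consistent with the keyphrase extractor in the log above processing more items (25) than the 8 original documents.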

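Configure Personas for Query Generation

Personas provide context and perspective, so that the generated queries are natural, user-specific, and diverse. By tailoring queries to different user viewpoints, the test set covers a wide range of scenarios:

First Time Flier: Generates queries that ask for detailed, step-by-step guidance, tailored to newcomers who need clear instructions.
Frequent Flier: Generates concise, efficiency-focused queries for seasoned travelers.
Angry Business Class Flier: Generates queries with a critical, urgent tone to reflect high expectations and the need for immediate resolution.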

from ragas.testset.persona import Persona

persona_first_time_flier = Persona(
    name="First Time Flier",
    role_description="Is flying for the first time and may feel anxious. Needs clear guidance on flight procedures, safety protocols, and what to expect throughout the journey.",
)

persona_frequent_flier = Persona(
    name="Frequent Flier",
    role_description="Travels regularly and values efficiency and comfort. Interested in loyalty programs, express services, and a seamless travel experience.",
)

persona_angry_business_flier = Persona(
    name="Angry Business Class Flier",
    role_description="Demands top-tier service and is easily irritated by any delays or issues. Expects immediate resolutions and is quick to express frustration if standards are not met.",
)

personas = [persona_first_time_flier, persona_frequent_flier, persona_angry_business_flier]

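Query Generation Using Synthesizers

Synthesizers are responsible for converting enriched nodes and personas into queries. They do this by selecting a node property (for example, "entities" or "keyphrases"), pairing it with a persona, a style, and a query length, and then using an LLM to generate a query-answer pair based on the content of the node.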
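
The pairing step described above can be pictured with a small plain-Python sketch. It is a conceptual illustration under stated assumptions, not the Ragas implementation: the Scenario dataclass, the build_scenarios helper, and the style and length values are all hypothetical, and no LLM call is made. Each sampled combination is the kind of "scenario" that would then be handed to an LLM together with the node content.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One hypothetical combination handed to the LLM (illustrative only)."""
    term: str      # a value taken from a node property, e.g. a keyphrase
    persona: str   # who is asking
    style: str     # how the query is written
    length: str    # how long the query should be


def build_scenarios(terms, personas, styles, lengths, n, seed=0):
    """Sample n scenarios by pairing a term with a persona, style, and length."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [
        Scenario(
            term=rng.choice(terms),
            persona=rng.choice(personas),
            style=rng.choice(styles),
            length=rng.choice(lengths),
        )
        for _ in range(n)
    ]


# Assumed example values; the real property values come from the knowledge graph.
terms = ["baggage claim", "flight delays", "fare rules"]
personas = ["First Time Flier", "Frequent Flier", "Angry Business Class Flier"]
styles = ["misspelled", "perfect grammar", "web-search-like"]
lengths = ["short", "medium", "long"]

scenarios = build_scenarios(terms, personas, styles, lengths, n=4)
for s in scenarios:
    print(f"{s.persona} asks about '{s.term}' ({s.style}, {s.length})")
```

Varying the style in this way is one plausible reason a generated test set can contain deliberately informal or misspelled queries alongside well-formed ones.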

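We use two instances of SingleHopSpecificQuerySynthesizer to define the query distribution:

Headline-based synthesizer: Generates queries from the extracted document headlines, producing structured questions that reference specific sections.
Keyphrase-based synthesizer: Forms queries around key concepts, producing broader, thematic questions.

Both synthesizers are weighted equally (0.5 each), which gives a balanced mix of specific and conceptual queries and increases the diversity of the test set.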

from ragas.testset.synthesizers.single_hop.specific import (
    SingleHopSpecificQuerySynthesizer,
)

query_distibution = [
    (
        SingleHopSpecificQuerySynthesizer(llm=generator_llm, property_name="headlines"),
        0.5,
    ),
    (
        SingleHopSpecificQuerySynthesizer(
            llm=generator_llm, property_name="keyphrases"
        ),
        0.5,
    ),
]
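
As a rough illustration of what the weights mean, the sketch below turns a list of weights into per-synthesizer sample counts using largest-remainder rounding. The helper allocate_samples is hypothetical and written only for this example; Ragas may allocate samples differently internally.

```python
def allocate_samples(weights: list[float], testset_size: int) -> list[int]:
    """Split testset_size across synthesizers in proportion to their weights.

    Hypothetical helper using largest-remainder rounding; not a Ragas API.
    """
    total = sum(weights)
    # Ideal (possibly fractional) share for each synthesizer.
    exact = [w / total * testset_size for w in weights]
    # Start from the rounded-down shares.
    counts = [int(x) for x in exact]
    # Hand the remaining samples to the largest fractional remainders.
    leftover = testset_size - sum(counts)
    order = sorted(
        range(len(weights)), key=lambda i: exact[i] - counts[i], reverse=True
    )
    for i in order[:leftover]:
        counts[i] += 1
    return counts


# Two synthesizers weighted 0.5 each, with a test set of 10 samples.
print(allocate_samples([0.5, 0.5], 10))  # [5, 5]
```

With equal weights and a test set of 10, each synthesizer would contribute about five samples under this scheme.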

Testset Generation

from ragas.testset import TestsetGenerator

generator = TestsetGenerator(
    llm=generator_llm,
    embedding_model=generator_embeddings,
    knowledge_graph=kg,
    persona_list=personas,
)

Now we can generate the test set.

testset = generator.generate(testset_size=10, query_distribution=query_distibution)
testset.to_pandas()

Generating Scenarios: 100%|██████████| 2/2 [00:00<?, ?it/s]
Generating Samples: 100%|██████████| 10/10 [00:00<?, ?it/s]

Output

	user_input	reference_contexts	reference	synthesizer_name
0	Wut do I do if my baggage is Delayed, Lost, or...	[Baggage Policies\n\nThis section provides a d...	If your baggage is delayed, lost, or damaged, ...	single_hop_specifc_query_synthesizer
1	Wht asistance is provided by the airline durin...	[Flight Delays\n\nFlight delays can be caused ...	Depending on the length of the delay, Ragas Ai...	single_hop_specifc_query_synthesizer
2	What is Step 1: Check Fare Rules in the contex...	[Flight Cancellations\n\nFlight cancellations ...	Step 1: Check Fare Rules involves logging into...	single_hop_specifc_query_synthesizer
3	How can I access my booking online with Ragas ...	[Managing Reservations\n\nManaging your reserv...	To access your booking online with Ragas Airli...	single_hop_specifc_query_synthesizer
4	What assistance does Ragas Airlines provide fo...	[Special Assistance\n\nRagas Airlines provides...	Ragas Airlines provides special assistance ser...	single_hop_specifc_query_synthesizer
5	What steps should I take if my baggage is dela...	[Baggage Policies This section provides a deta...	If your baggage is delayed, lost, or damaged w...	single_hop_specifc_query_synthesizer
6	How can I resubmit the claim for my baggage is...	[Potential Issues and Resolutions for Baggage ...	To resubmit the claim for your baggage issue, ...	single_hop_specifc_query_synthesizer
7	Wut are the main causes of flight delays and h...	[Flight Delays Flight delays can be caused by ...	Flight delays can be caused by weather conditi...	single_hop_specifc_query_synthesizer
8	How can I request reimbursement for additional...	[2. Additional Expenses Incurred Due to Delay ...	To request reimbursement for additional expens...	single_hop_specifc_query_synthesizer
9	What are passenger-initiated cancelations?	[Flight Cancellations Flight cancellations can...	Passenger-initiated cancellations occur when a...	single_hop_specifc_query_synthesizer

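Final Thoughts

In this tutorial, we explored test set generation with the Ragas library, focusing on single-hop queries. In an upcoming tutorial, we will dive into multi-hop queries, extending these concepts to richer test set scenarios.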