R2R Integration
R2R is an all-in-one solution for AI Retrieval-Augmented Generation (RAG), with production-ready features including multimodal content ingestion, hybrid search, user/document management, and more.
Overview
In this tutorial, we will:
- Use R2R's `/rag` endpoint to perform Retrieval-Augmented Generation (RAG) over a small dataset.
- Evaluate the generated responses.
- Analyze the traces of the evaluation process.
Setting up R2R
Install dependencies
First, install the necessary packages:
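A minimal sketch of the install step; the exact package list (`r2r`, `ragas`, and `langchain-openai` for the evaluator LLM used later) is an assumption based on the imports in this tutorial:

pip install r2r ragas langchain-openai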
Set up your local environment
Configure `R2R_API_KEY`, `OPENAI_API_KEY`, and (optionally) `RAGAS_APP_TOKEN`.
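One way to do this is to set the keys as environment variables from within the session; a minimal sketch, assuming you paste the keys interactively via `getpass`:

import os
from getpass import getpass

# Required by the R2R client and the OpenAI-backed evaluator LLM
os.environ["R2R_API_KEY"] = getpass("Enter your R2R API key: ")
os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key: ")

# Optional: only needed to upload evaluation traces to app.ragas.io
# os.environ["RAGAS_APP_TOKEN"] = getpass("Enter your Ragas app token: ")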
Get the data
dataset = [
"OpenAI is one of the most recognized names in the large language model space, known for its GPT series of models. These models excel at generating human-like text and performing tasks like creative writing, answering questions, and summarizing content. GPT-4, their latest release, has set benchmarks in understanding context and delivering detailed responses.",
"Anthropic is well-known for its Claude series of language models, designed with a strong focus on safety and ethical AI behavior. Claude is particularly praised for its ability to follow complex instructions and generate text that aligns closely with user intent.",
"DeepMind, a division of Google, is recognized for its cutting-edge Gemini models, which are integrated into various Google products like Bard and Workspace tools. These models are renowned for their conversational abilities and their capacity to handle complex, multi-turn dialogues.",
"Meta AI is best known for its LLaMA (Large Language Model Meta AI) series, which has been made open-source for researchers and developers. LLaMA models are praised for their ability to support innovation and experimentation due to their accessibility and strong performance.",
"Meta AI with it's LLaMA models aims to democratize AI development by making high-quality models available for free, fostering collaboration across industries. Their open-source approach has been a game-changer for researchers without access to expensive resources.",
"Microsoft’s Azure AI platform is famous for integrating OpenAI’s GPT models, enabling businesses to use these advanced models in a scalable and secure cloud environment. Azure AI powers applications like Copilot in Office 365, helping users draft emails, generate summaries, and more.",
"Amazon’s Bedrock platform is recognized for providing access to various language models, including its own models and third-party ones like Anthropic’s Claude and AI21’s Jurassic. Bedrock is especially valued for its flexibility, allowing users to choose models based on their specific needs.",
"Cohere is well-known for its language models tailored for business use, excelling in tasks like search, summarization, and customer support. Their models are recognized for being efficient, cost-effective, and easy to integrate into workflows.",
"AI21 Labs is famous for its Jurassic series of language models, which are highly versatile and capable of handling tasks like content creation and code generation. The Jurassic models stand out for their natural language understanding and ability to generate detailed and coherent responses.",
"In the rapidly advancing field of artificial intelligence, several companies have made significant contributions with their large language models. Notable players include OpenAI, known for its GPT Series (including GPT-4); Anthropic, which offers the Claude Series; Google DeepMind with its Gemini Models; Meta AI, recognized for its LLaMA Series; Microsoft Azure AI, which integrates OpenAI’s GPT Models; Amazon AWS (Bedrock), providing access to various models including Claude (Anthropic) and Jurassic (AI21 Labs); Cohere, which offers its own models tailored for business use; and AI21 Labs, known for its Jurassic Series. These companies are shaping the landscape of AI by providing powerful models with diverse capabilities.",
]
Setting up the R2R client
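A minimal sketch of creating the client; the base URL below (a local R2R deployment on port 7272) is an assumption and should be replaced with the address of your own deployment:

from r2r import R2RClient

# Connect to a running R2R instance
client = R2RClient("http://localhost:7272")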
Ingesting the data
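A sketch of ingesting the snippets one document at a time; the `client.documents.create(raw_text=...)` call is an assumption based on the R2R Python SDK, so check the ingestion method exposed by your SDK version:

# Ingest each text snippet as a separate document
for text in dataset:
    client.documents.create(raw_text=text)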
Using the `/rag` endpoint
The `/rag` endpoint performs Retrieval-Augmented Generation by combining search results with language model output. The generation step can be customized via the `rag_generation_config` parameter, while retrieval is configured through `search_settings`.
query = "What makes Meta AI’s LLaMA models stand out?"

search_settings = {
    "limit": 2,
    "graph_settings": {"enabled": False, "limit": 2},
}

response = client.retrieval.rag(
    query=query,
    search_settings=search_settings
)
print(response.results.generated_answer)
Meta AI’s LLaMA models stand out due to their open-source nature, which supports innovation and experimentation by making high-quality models accessible to researchers and developers [1]. This approach democratizes AI development, fostering collaboration across industries and enabling researchers without access to expensive resources to work with advanced AI models [2].
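The generation side can be customized as well. The sketch below assumes the `rag_generation_config` fields shown (a model name in R2R's provider/model format and a temperature) are supported by your deployment:

# Same query, but with an explicit generation configuration (assumed fields)
response = client.retrieval.rag(
    query=query,
    search_settings=search_settings,
    rag_generation_config={"model": "openai/gpt-4o-mini", "temperature": 0.0},
)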
Evaluation
Evaluating the R2R Client with Ragas
Once the R2R Client is set up, we can evaluate it using Ragas's `r2r` integration. This involves the following key components:
1. R2R client and configuration: the `R2RClient` and `/rag` configuration that specify the RAG settings.
2. Evaluation dataset: a Ragas `EvaluationDataset` containing all the inputs required by the Ragas metrics.
3. Ragas metrics: Ragas provides a variety of metrics for assessing different aspects of a RAG pipeline, such as faithfulness, answer relevancy, and context recall. See the Ragas documentation for the full list of available metrics.
Constructing a Ragas EvaluationDataset
The `EvaluationDataset` is the data type Ragas uses to represent evaluation samples. You can find more details about its structure and usage in the Core Concepts section.
We will use the `transform_to_ragas_dataset` function from ragas to build an `EvaluationDataset` from our data.
questions = [
    "Who are the major players in the large language model space?",
    "What is Microsoft’s Azure AI platform known for?",
    "What kind of models does Cohere provide?",
]

references = [
    "The major players include OpenAI (GPT Series), Anthropic (Claude Series), Google DeepMind (Gemini Models), Meta AI (LLaMA Series), Microsoft Azure AI (integrating GPT Models), Amazon AWS (Bedrock with Claude and Jurassic), Cohere (business-focused models), and AI21 Labs (Jurassic Series).",
    "Microsoft’s Azure AI platform is known for integrating OpenAI’s GPT models, enabling businesses to use these models in a scalable and secure cloud environment.",
    "Cohere provides language models tailored for business use, excelling in tasks like search, summarization, and customer support.",
]
r2r_responses = []

search_settings = {
    "limit": 2,
    "graph_settings": {"enabled": False, "limit": 2},
}

for que in questions:
    response = client.retrieval.rag(query=que, search_settings=search_settings)
    r2r_responses.append(response)
from ragas.integrations.r2r import transform_to_ragas_dataset
ragas_eval_dataset = transform_to_ragas_dataset(
    user_inputs=questions, r2r_responses=r2r_responses, references=references
)
Choosing the metrics
To evaluate our RAG endpoint, we will use the following metrics:
from ragas.metrics import AnswerRelevancy, ContextPrecision, Faithfulness
from ragas import evaluate
from langchain_openai import ChatOpenAI
from ragas.llms import LangchainLLMWrapper
llm = ChatOpenAI(model="gpt-4o-mini")
evaluator_llm = LangchainLLMWrapper(llm)  # wrapper so Ragas can use the LLM as evaluator

ragas_metrics = [AnswerRelevancy(llm=evaluator_llm), ContextPrecision(llm=evaluator_llm), Faithfulness(llm=evaluator_llm)]

results = evaluate(dataset=ragas_eval_dataset, metrics=ragas_metrics)
Querying Client: 100%|██████████| 3/3 [00:00<?, ?it/s]
Evaluating: 100%|██████████| 9/9 [00:00<?, ?it/s]
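The per-sample scores can then be inspected as a dataframe; a short sketch assuming the standard `to_pandas()` helper on the Ragas result object:

# View the evaluation results, one row per sample
results.to_pandas()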
| | user_input | retrieved_contexts | response | reference | answer_relevancy | context_precision | faithfulness |
|---|---|---|---|---|---|---|---|
| 0 | Who are the major players in the large langua... | [In the rapidly advancing field of artificial ... | The major players in the large language model ... | The major players include OpenAI (GPT Series),... | 1.000000 | 1.0 | 1.000000 |
| 1 | What is Microsoft’s Azure AI platform known for? | [Microsoft’s Azure AI platform is famous for i... | Microsoft’s Azure AI platform is known for int... | Microsoft’s Azure AI platform is known for int... | 0.948908 | 1.0 | 0.833333 |
| 2 | What kind of models does Cohere provide? | [Cohere is well-known for its language models ... | Cohere provides language models tailored for b... | Cohere provides language models tailored for b... | 0.903765 | 1.0 | 1.000000 |
Tracing the evaluation process
To better understand the evaluation scores, we can use the code below to retrieve the traces and the reasons behind each verdict.
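A minimal sketch, assuming the optional `RAGAS_APP_TOKEN` configured earlier is set; `upload()` sends the results to app.ragas.io, where the traces and the per-sample reasoning behind each score can be inspected:

# Upload the evaluation results to app.ragas.io to inspect traces and reasons
results.upload()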

Happy coding!