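Evaluate a Simple RAG System

This guide demonstrates a simple workflow for testing and evaluating a RAG system with ragas. It assumes only minimal knowledge of building RAG systems and of evaluation. See our installation instructions to install ragas.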

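Basic Setup

We will use langchain_openai to set up the LLM and the embedding model for our simple RAG system. You can use any other LLM and embedding model instead; see the guide on customizing models in langchain for details.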

from langchain_openai import ChatOpenAI
from langchain_openai import OpenAIEmbeddings
llm = ChatOpenAI(model="gpt-4o")
embeddings = OpenAIEmbeddings()
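
The clients above need an OpenAI API key, which langchain_openai reads from the OPENAI_API_KEY environment variable by default. As a minimal sketch, a fail-fast check like the one below can surface a missing key before any request is made. Note that require_env is a hypothetical helper written for this guide, not part of langchain or ragas.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise if it is unset or empty.

    Hypothetical helper for illustration; not a langchain or ragas function.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set.")
    return value
```

Calling require_env("OPENAI_API_KEY") before constructing ChatOpenAI and OpenAIEmbeddings turns a late authentication failure into an immediate, readable error.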

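Build a Simple RAG System

To build a simple RAG system, we need to define the following components:

A method to vectorize our documents
A method to retrieve the relevant documents
A method to generate a response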


import numpy as np

class RAG:
    def __init__(self, model="gpt-4o"):
        self.llm = ChatOpenAI(model=model)
        self.embeddings = OpenAIEmbeddings()
        self.doc_embeddings = None
        self.docs = None

    def load_documents(self, documents):
        """Load documents and compute their embeddings."""
        self.docs = documents
        self.doc_embeddings = self.embeddings.embed_documents(documents)

    def get_most_relevant_docs(self, query):
        """Find the most relevant document for a given query."""
        if not self.docs or not self.doc_embeddings:
            raise ValueError("Documents and their embeddings are not loaded.")

        query_embedding = self.embeddings.embed_query(query)
        similarities = [
            np.dot(query_embedding, doc_emb)
            / (np.linalg.norm(query_embedding) * np.linalg.norm(doc_emb))
            for doc_emb in self.doc_embeddings
        ]
        most_relevant_doc_index = np.argmax(similarities)
        return [self.docs[most_relevant_doc_index]]

    def generate_answer(self, query, relevant_doc):
        """Generate an answer for a given query based on the most relevant document."""
        prompt = f"question: {query}\n\nDocuments: {relevant_doc}"
        messages = [
            ("system", "You are a helpful assistant that answers questions based on given documents only."),
            ("human", prompt),
        ]
        ai_msg = self.llm.invoke(messages)
        return ai_msg.content
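
The retrieval step in the class above ranks documents by cosine similarity and returns only the single best match. The sketch below isolates that computation on toy vectors so it can be run without an API key, and extends it to return the top k matches. It is an illustration under stated assumptions, not part of the RAG class: cosine_similarity and top_k are hypothetical helper names, the three-dimensional vectors stand in for real embeddings, and it uses only the standard library instead of numpy.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between vectors a and b (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]], k: int = 1
) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs for the k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for document and query embeddings (assumed values).
doc_vecs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query_vec = [0.9, 0.1, 0.0]

ranked = top_k(query_vec, doc_vecs, k=2)
print([index for index, _ in ranked])  # documents 0 and 2 rank highest
```

With k=1 this matches the behavior of get_most_relevant_docs; a larger k passes more context to the generator at the cost of a longer prompt.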

Load Documents

Now, let's load some documents and test our RAG system.

sample_docs = [
    "Albert Einstein proposed the theory of relativity, which transformed our understanding of time, space, and gravity.",
    "Marie Curie was a physicist and chemist who conducted pioneering research on radioactivity and won two Nobel Prizes.",
    "Isaac Newton formulated the laws of motion and universal gravitation, laying the foundation for classical mechanics.",
    "Charles Darwin introduced the theory of evolution by natural selection in his book 'On the Origin of Species'.",
    "Ada Lovelace is regarded as the first computer programmer for her work on Charles Babbage's early mechanical computer, the Analytical Engine."
]

# Initialize RAG instance
rag = RAG()

# Load documents
rag.load_documents(sample_docs)

# Query and retrieve the most relevant document
query = "Who introduced the theory of relativity?"
relevant_doc = rag.get_most_relevant_docs(query)

# Generate an answer
answer = rag.generate_answer(query, relevant_doc)

print(f"Query: {query}")
print(f"Relevant Document: {relevant_doc}")
print(f"Answer: {answer}")

Output

Query: Who introduced the theory of relativity?
Relevant Document: ['Albert Einstein proposed the theory of relativity, which transformed our understanding of time, space, and gravity.']
Answer: Albert Einstein introduced the theory of relativity.

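Collect Evaluation Data

To collect evaluation data, we first need a set of queries to run through the RAG system. For each query, we record the response and the retrieved_contexts. Optionally, you can also prepare a golden answer for each query to evaluate the system's performance against.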

sample_queries = [
    "Who introduced the theory of relativity?",
    "Who was the first computer programmer?",
    "What did Isaac Newton contribute to science?",
    "Who won two Nobel Prizes for research on radioactivity?",
    "What is the theory of evolution by natural selection?"
]

expected_responses = [
    "Albert Einstein proposed the theory of relativity, which transformed our understanding of time, space, and gravity.",
    "Ada Lovelace is regarded as the first computer programmer for her work on Charles Babbage's early mechanical computer, the Analytical Engine.",
    "Isaac Newton formulated the laws of motion and universal gravitation, laying the foundation for classical mechanics.",
    "Marie Curie was a physicist and chemist who conducted pioneering research on radioactivity and won two Nobel Prizes.",
    "Charles Darwin introduced the theory of evolution by natural selection in his book 'On the Origin of Species'."
]

dataset = []

for query, reference in zip(sample_queries, expected_responses):
    relevant_docs = rag.get_most_relevant_docs(query)
    response = rag.generate_answer(query, relevant_docs)
    dataset.append(
        {
            "user_input": query,
            "retrieved_contexts": relevant_docs,
            "response": response,
            "reference": reference,
        }
    )

Now, load the dataset into an EvaluationDataset object.

from ragas import EvaluationDataset
evaluation_dataset = EvaluationDataset.from_list(dataset)
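
Before handing records to EvaluationDataset.from_list, it can help to confirm that each one has the fields used in this guide. The sketch below is one possible check, assuming the four keys built in the loop above and that retrieved_contexts is a list of strings. find_record_problems and REQUIRED_KEYS are hypothetical names, not part of ragas, and ragas performs its own validation independently of this.

```python
from typing import Any, Dict, List

# Keys used by the records in this guide (an assumption of this sketch).
REQUIRED_KEYS = ("user_input", "retrieved_contexts", "response", "reference")


def find_record_problems(records: List[Dict[str, Any]]) -> List[str]:
    """Return human-readable descriptions of malformed records (empty list if none)."""
    problems = []
    for i, record in enumerate(records):
        for key in REQUIRED_KEYS:
            if key not in record:
                problems.append(f"record {i}: missing key '{key}'")
        contexts = record.get("retrieved_contexts")
        if contexts is not None and not (
            isinstance(contexts, list) and all(isinstance(c, str) for c in contexts)
        ):
            problems.append(f"record {i}: 'retrieved_contexts' must be a list of strings")
    return problems
```

Running find_record_problems(dataset) and checking for an empty list catches common slips, such as passing a single string instead of a list of contexts.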

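Evaluate

We have successfully collected the evaluation data. We can now evaluate our RAG system on the collected dataset using a set of commonly used RAG evaluation metrics. You can use any model as the evaluator LLM.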

from ragas import evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness

# Wrap the LangChain chat model so ragas can use it as the evaluator LLM
evaluator_llm = LangchainLLMWrapper(llm)

result = evaluate(
    dataset=evaluation_dataset,
    metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness()],
    llm=evaluator_llm,
)
result

Output

{'context_recall': 1.0000, 'faithfulness': 0.8571, 'factual_correctness': 0.7280}
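
To help read these scores: context recall and faithfulness are both, roughly, fractions of claims that the retrieved contexts support. Context recall counts claims taken from the reference, and faithfulness counts claims taken from the response. In ragas an LLM extracts and judges the claims; the sketch below skips that step and takes the verdicts as given booleans, so it only illustrates the arithmetic. supported_fraction is a hypothetical name, and the verdicts are made-up values, not results from the run above.

```python
from typing import Sequence


def supported_fraction(verdicts: Sequence[bool]) -> float:
    """Fraction of claims judged as supported by the retrieved contexts."""
    if not verdicts:
        raise ValueError("At least one claim verdict is required.")
    return sum(verdicts) / len(verdicts)


# Assumed verdicts for four claims extracted from one response:
# three are supported by the retrieved contexts, one is not.
response_claim_verdicts = [True, True, True, False]
score = supported_fraction(response_claim_verdicts)
print(score)  # 0.75
```

The values reported by evaluate are aggregated over all samples in the dataset, so a faithfulness below 1.0 means that some response claims were judged as unsupported by the retrieved contexts.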

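Want to improve your AI application through evaluation?

Over the past two years, we have seen and helped many AI applications improve through evaluation.

We are condensing that knowledge into a product that replaces "vibe checks" with evaluation loops, so that you can focus on building great AI applications.

If you would like help improving and scaling your AI application through evaluation, book a slot or email us at founders@explodinggradients.com.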

Next Steps

Generate test data for evaluating RAG