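Evaluate a simple RAG system
This guide walks through a simple workflow for testing and evaluating a RAG system with ragas. It assumes only minimal familiarity with building RAG systems and with evaluation. See our installation instructions to install ragas.
Basic setup
We will use langchain_openai to set up the LLM and the embedding model for our simple RAG. You can use any other LLM and embedding model you prefer; see Customizing models in langchain for how to do this.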
from langchain_openai import ChatOpenAI
from ragas.embeddings import OpenAIEmbeddings
import openai
llm = ChatOpenAI(model="gpt-4o")
openai_client = openai.OpenAI()
embeddings = OpenAIEmbeddings(client=openai_client)
OpenAI Embeddings API
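ragas.embeddings.OpenAIEmbeddings exposes embed_text (single text) and embed_texts (batch) methods, rather than the embed_query/embed_documents methods found on some LangChain wrappers. The example below uses embed_texts for documents and embed_text for queries. See the OpenAI embeddings implementation.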
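To make the two method names concrete, here is a minimal offline sketch. ToyEmbeddings is a hypothetical stand-in, not part of ragas: it hashes words into a fixed-size bag-of-words vector instead of calling the OpenAI API, but it exposes the same embed_text / embed_texts call shape, which can be handy for exercising retrieval logic without network access.

```python
import re
import zlib
from typing import List


class ToyEmbeddings:
    """Hypothetical offline stand-in mirroring the embed_text / embed_texts
    method names described above. It is NOT part of ragas."""

    def __init__(self, dim: int = 64):
        self.dim = dim

    def embed_text(self, text: str) -> List[float]:
        # Hash each lowercase word into one of `dim` buckets (bag of words).
        # zlib.crc32 is deterministic across runs, unlike Python's str hash().
        vec = [0.0] * self.dim
        for token in re.findall(r"[a-z0-9']+", text.lower()):
            vec[zlib.crc32(token.encode("utf-8")) % self.dim] += 1.0
        return vec

    def embed_texts(self, texts: List[str]) -> List[List[float]]:
        # Batch version: one vector per input text, in the same order.
        return [self.embed_text(t) for t in texts]


emb = ToyEmbeddings()
doc_vectors = emb.embed_texts(["Einstein proposed relativity.", "Curie studied radioactivity."])
query_vector = emb.embed_text("Who proposed relativity?")
print(len(doc_vectors), len(query_vector))  # 2 64
```

Swapping a real embeddings object in for ToyEmbeddings should not require changing the calling code, as long as it offers the same two methods.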
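Build a simple RAG system
To build a simple RAG system, we need to define the following components:
- A method to vectorize our documents
- A method to retrieve the relevant documents
- A method to generate a response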
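The retrieval component comes down to scoring each document embedding against the query embedding and keeping the best match. The following pure-Python sketch shows that step with cosine similarity; the helper names cosine_similarity and retrieve_top1 and the hand-made three-dimensional vectors are illustrative assumptions, not part of ragas or langchain.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top1(query_vec: Sequence[float],
                  doc_vecs: List[Sequence[float]],
                  docs: List[str]) -> List[str]:
    """Return the single best-matching document wrapped in a list
    (the same shape that get_most_relevant_docs returns in this guide)."""
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return [docs[best]]


# Hand-made 3-d "embeddings", purely for illustration.
docs = ["relativity doc", "radioactivity doc", "evolution doc"]
doc_vecs = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.2], [0.1, 0.0, 1.0]]
query_vec = [0.9, 0.2, 0.0]
print(retrieve_top1(query_vec, doc_vecs, docs))  # ['relativity doc']
```

Like np.argmax, max with a key returns the first index on ties, so the earliest document wins when scores are equal.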
import numpy as np
import openai
from langchain_openai import ChatOpenAI
from ragas.embeddings import OpenAIEmbeddings


class RAG:
    def __init__(self, model="gpt-4o"):
        self.llm = ChatOpenAI(model=model)
        openai_client = openai.OpenAI()
        self.embeddings = OpenAIEmbeddings(client=openai_client)
        self.doc_embeddings = None
        self.docs = None

    def load_documents(self, documents):
        """Load documents and compute their embeddings."""
        self.docs = documents
        # Batch call: one embedding per document, in the same order.
        self.doc_embeddings = self.embeddings.embed_texts(documents)

    def get_most_relevant_docs(self, query):
        """Find the most relevant document for a given query."""
        if not self.docs or self.doc_embeddings is None:
            raise ValueError("Documents and their embeddings are not loaded.")
        query_embedding = self.embeddings.embed_text(query)
        # Cosine similarity between the query and each document embedding.
        similarities = [
            np.dot(query_embedding, doc_emb)
            / (np.linalg.norm(query_embedding) * np.linalg.norm(doc_emb))
            for doc_emb in self.doc_embeddings
        ]
        most_relevant_doc_index = int(np.argmax(similarities))
        return [self.docs[most_relevant_doc_index]]

    def generate_answer(self, query, relevant_doc):
        """Generate an answer for a given query based on the most relevant document."""
        prompt = f"question: {query}\n\nDocuments: {relevant_doc}"
        messages = [
            ("system", "You are a helpful assistant that answers questions based on given documents only."),
            ("human", prompt),
        ]
        ai_msg = self.llm.invoke(messages)
        return ai_msg.content
Load documents
Now let's load some documents and test our RAG system.
sample_docs = [
"Albert Einstein proposed the theory of relativity, which transformed our understanding of time, space, and gravity.",
"Marie Curie was a physicist and chemist who conducted pioneering research on radioactivity and won two Nobel Prizes.",
"Isaac Newton formulated the laws of motion and universal gravitation, laying the foundation for classical mechanics.",
"Charles Darwin introduced the theory of evolution by natural selection in his book 'On the Origin of Species'.",
"Ada Lovelace is regarded as the first computer programmer for her work on Charles Babbage's early mechanical computer, the Analytical Engine."
]
# Initialize RAG instance
rag = RAG()
# Load documents
rag.load_documents(sample_docs)
# Query and retrieve the most relevant document
query = "Who introduced the theory of relativity?"
relevant_doc = rag.get_most_relevant_docs(query)
# Generate an answer
answer = rag.generate_answer(query, relevant_doc)
print(f"Query: {query}")
print(f"Relevant Document: {relevant_doc}")
print(f"Answer: {answer}")
Output
Query: Who introduced the theory of relativity?
Relevant Document: ['Albert Einstein proposed the theory of relativity, which transformed our understanding of time, space, and gravity.']
Answer: Albert Einstein introduced the theory of relativity.
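Collect evaluation data
To collect evaluation data, we first need a set of queries to run against our RAG system. We run these queries through the RAG system and collect the response and retrieved_contexts for each one. You can also prepare a golden answer (stored under the reference key) for each query to evaluate the system's performance; reference-based metrics such as LLMContextRecall and FactualCorrectness require it.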
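As a sketch of how each query turns into one evaluation record, the snippet below uses StubRAG, a hypothetical offline stand-in with the same get_most_relevant_docs / generate_answer methods as the RAG class in this guide. It ranks documents by word overlap and echoes the retrieved context instead of calling an LLM. The helper collect_records is also hypothetical; the record keys user_input, retrieved_contexts, response and reference are the names this guide uses.

```python
from typing import Dict, List


class StubRAG:
    """Hypothetical offline stand-in for the RAG class: keyword overlap
    instead of embeddings, and an echo instead of an LLM call."""

    def __init__(self, docs: List[str]):
        self.docs = docs

    def get_most_relevant_docs(self, query: str) -> List[str]:
        q = set(query.lower().split())
        best = max(self.docs, key=lambda d: len(q & set(d.lower().split())))
        return [best]

    def generate_answer(self, query: str, relevant_docs: List[str]) -> str:
        # A real system would call the LLM here; the stub echoes the context.
        return relevant_docs[0]


def collect_records(rag, queries: List[str], references: List[str]) -> List[Dict]:
    """Build one evaluation record per (query, reference) pair."""
    if len(queries) != len(references):
        raise ValueError("queries and references must have the same length")
    records = []
    for query, reference in zip(queries, references):
        contexts = rag.get_most_relevant_docs(query)
        records.append({
            "user_input": query,
            "retrieved_contexts": contexts,
            "response": rag.generate_answer(query, contexts),
            "reference": reference,
        })
    return records


docs = [
    "Albert Einstein proposed the theory of relativity.",
    "Marie Curie won two Nobel Prizes for research on radioactivity.",
]
records = collect_records(
    StubRAG(docs),
    ["Who proposed the theory of relativity?"],
    ["Albert Einstein proposed the theory of relativity."],
)
print(records[0]["retrieved_contexts"])
```

The explicit length check guards against zip silently dropping queries when the two lists differ in length.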
sample_queries = [
"Who introduced the theory of relativity?",
"Who was the first computer programmer?",
"What did Isaac Newton contribute to science?",
"Who won two Nobel Prizes for research on radioactivity?",
"What is the theory of evolution by natural selection?"
]
expected_responses = [
"Albert Einstein proposed the theory of relativity, which transformed our understanding of time, space, and gravity.",
"Ada Lovelace is regarded as the first computer programmer for her work on Charles Babbage's early mechanical computer, the Analytical Engine.",
"Isaac Newton formulated the laws of motion and universal gravitation, laying the foundation for classical mechanics.",
"Marie Curie was a physicist and chemist who conducted pioneering research on radioactivity and won two Nobel Prizes.",
"Charles Darwin introduced the theory of evolution by natural selection in his book 'On the Origin of Species'."
]
dataset = []

for query, reference in zip(sample_queries, expected_responses):
    relevant_docs = rag.get_most_relevant_docs(query)
    response = rag.generate_answer(query, relevant_docs)
    dataset.append(
        {
            "user_input": query,
            "retrieved_contexts": relevant_docs,
            "response": response,
            "reference": reference,
        }
    )
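Now load the dataset into an EvaluationDataset object.
from ragas import EvaluationDataset
evaluation_dataset = EvaluationDataset.from_list(dataset)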
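Evaluation
We have now collected the evaluation data. Next, we evaluate our RAG system on the collected dataset using a set of commonly used RAG evaluation metrics. You can use any model as the evaluator LLM.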
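Conceptually, these metrics reduce to simple ratios once claims have been extracted and judged, which in ragas is the evaluator LLM's job. The sketch below shows only that final arithmetic on made-up verdicts: Faithfulness is the fraction of response claims supported by the retrieved contexts, LLMContextRecall the fraction of reference claims attributable to the retrieved contexts, and FactualCorrectness (in an F1-style mode) compares response claims against reference claims. The function names ratio_supported and claim_f1 are hypothetical, and ragas' exact edge-case handling may differ.

```python
from typing import List


def ratio_supported(verdicts: List[bool]) -> float:
    """Fraction of claims judged supported.
    Faithfulness-style: response claims checked against retrieved contexts.
    Context-recall-style: reference claims checked against retrieved contexts."""
    if not verdicts:
        return 0.0  # illustrative choice for "no claims"; ragas may differ
    return sum(verdicts) / len(verdicts)


def claim_f1(tp: int, fp: int, fn: int) -> float:
    """F1 over claims: tp = response claims backed by the reference,
    fp = response claims not backed, fn = reference claims the response misses."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up verdicts and counts, for illustration only.
print(ratio_supported([True, True, False, True]))  # 0.75
print(round(claim_f1(tp=3, fp=1, fn=2), 4))        # 0.6667
```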
from ragas import evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness

evaluator_llm = LangchainLLMWrapper(llm)

result = evaluate(
    dataset=evaluation_dataset,
    metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness()],
    llm=evaluator_llm,
)
result
Output
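Need help using evals to improve your AI application?
Over the past two years, we have seen and helped many AI applications improve through evals.
We are condensing that experience into a product that replaces vibe checks with eval loops, so you can focus on building great AI applications.
If you would like help using evals to improve and scale your AI application:
🔗 Book a slot or write to us at founders@vibrantlabs.com.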
