跳转到内容

LangChain 集成

本教程演示了如何使用 Ragas 评估一个基于 LangChain 构建的 RAG 问答应用程序。此外,我们还将探讨 Ragas App 如何帮助分析和提升该应用程序的性能。

构建一个简单的问答(Q&A)应用程序

为了构建一个问答系统,我们首先创建一个小数据集,并使用其嵌入(embeddings)在向量数据库中进行索引。

import os
from dotenv import load_dotenv
from langchain_core.documents import Document

load_dotenv()

content_list = [
    "Andrew Ng is the CEO of Landing AI and is known for his pioneering work in deep learning. He is also widely recognized for democratizing AI education through platforms like Coursera.",
    "Sam Altman is the CEO of OpenAI and has played a key role in advancing AI research and development. He is a strong advocate for creating safe and beneficial AI technologies.",
    "Demis Hassabis is the CEO of DeepMind and is celebrated for his innovative approach to artificial intelligence. He gained prominence for developing systems that can master complex games like AlphaGo.",
    "Sundar Pichai is the CEO of Google and Alphabet Inc., and he is praised for leading innovation across Google's vast product ecosystem. His leadership has significantly enhanced user experiences on a global scale.",
    "Arvind Krishna is the CEO of IBM and is recognized for transforming the company towards cloud computing and AI solutions. He focuses on providing cutting-edge technologies to address modern business challenges.",
]

langchain_documents = []

for content in content_list:
    langchain_documents.append(
        Document(
            page_content=content,
        )
    )
from ragas.embeddings import OpenAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore
import openai

openai_client = openai.OpenAI()
embeddings = OpenAIEmbeddings(client=openai_client, model="text-embedding-3-small")
vector_store = InMemoryVectorStore(embeddings)

_ = vector_store.add_documents(langchain_documents)

我们现在将构建一个基于 RAG 的系统,该系统将检索器(retriever)、大语言模型(LLM)和提示(prompt)集成到一个检索问答链(Retrieval QA Chain)中。检索器从知识库中获取相关文档。LLM 将根据检索到的文档生成回答,而提示将指导模型的响应,帮助它理解上下文并生成相关且连贯的基于语言的输出。

在 LangChain 中,我们可以通过使用向量存储的 .as_retriever 方法来创建一个检索器。更多详情,请参阅 LangChain 关于向量存储检索器的文档

retriever = vector_store.as_retriever(search_kwargs={"k": 1})
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

我们将定义一个链(Chain),它处理用户查询和检索到的相关数据,并将其传递给一个结构化提示中的模型。然后,模型的输出被解析,以生成最终的字符串格式的回答。

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser


template = """Answer the question based only on the following context:
{context}

Question: {query}
"""
prompt = ChatPromptTemplate.from_template(template)

qa_chain = prompt | llm | StrOutputParser()

def format_docs(relevant_docs):
    return "\n".join(doc.page_content for doc in relevant_docs)


query = "Who is the CEO of OpenAI?"

relevant_docs = retriever.invoke(query)
qa_chain.invoke({"context": format_docs(relevant_docs), "query": query})
输出
'The CEO of OpenAI is Sam Altman.'

评估

sample_queries = [
    "Which CEO is widely recognized for democratizing AI education through platforms like Coursera?",
    "Who is Sam Altman?",
    "Who is Demis Hassabis and how did he gained prominence?",
    "Who is the CEO of Google and Alphabet Inc., praised for leading innovation across Google's product ecosystem?",
    "How did Arvind Krishna transformed IBM?",
]

expected_responses = [
    "Andrew Ng is the CEO of Landing AI and is widely recognized for democratizing AI education through platforms like Coursera.",
    "Sam Altman is the CEO of OpenAI and has played a key role in advancing AI research and development. He strongly advocates for creating safe and beneficial AI technologies.",
    "Demis Hassabis is the CEO of DeepMind and is celebrated for his innovative approach to artificial intelligence. He gained prominence for developing systems like AlphaGo that can master complex games.",
    "Sundar Pichai is the CEO of Google and Alphabet Inc., praised for leading innovation across Google's vast product ecosystem. His leadership has significantly enhanced user experiences globally.",
    "Arvind Krishna is the CEO of IBM and has transformed the company towards cloud computing and AI solutions. He focuses on delivering cutting-edge technologies to address modern business challenges.",
]

要评估问答系统,我们需要将查询、预期回答以及其他特定于指标的要求构建成 EvaluationDataset

from ragas import EvaluationDataset


dataset = []

for query, reference in zip(sample_queries, expected_responses):
    relevant_docs = retriever.invoke(query)
    response = qa_chain.invoke({"context": format_docs(relevant_docs), "query": query})
    dataset.append(
        {
            "user_input": query,
            "retrieved_contexts": [rdoc.page_content for rdoc in relevant_docs],
            "response": response,
            "reference": reference,
        }
    )

evaluation_dataset = EvaluationDataset.from_list(dataset)

为了评估我们的问答应用程序,我们将使用以下指标。

  • LLMContextRecall:评估检索到的上下文与参考答案中的声明的一致程度,无需手动标注参考上下文即可估算召回率。
  • Faithfulness:评估生成答案中的所有声明是否都可以直接从提供的上下文中推断出来。
  • Factual Correctness:通过与参考答案进行比较,使用基于声明的评估和自然语言推断来检查生成回答的事实准确性。

有关这些指标以及如何应用于评估 RAG 系统的更多详细信息,请访问 Ragas 指标文档

from ragas import evaluate
from ragas.llms import LangchainLLMWrapper
from ragas.metrics import LLMContextRecall, Faithfulness, FactualCorrectness

evaluator_llm = LangchainLLMWrapper(llm)

result = evaluate(
    dataset=evaluation_dataset,
    metrics=[LLMContextRecall(), Faithfulness(), FactualCorrectness()],
    llm=evaluator_llm,
)

result

输出

{'context_recall': 1.0000, 'faithfulness': 0.9000, 'factual_correctness': 0.9260}