
Create and evaluate an Amazon Bedrock agent integrated with Amazon Bedrock Knowledge Bases and action groups

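In this notebook, you will learn how to evaluate an Amazon Bedrock agent. The agent we evaluate is a restaurant agent that gives customers information about the adult and children's menus and manages a table booking system. It is adapted, with minor changes, from a feature example notebook for Amazon Bedrock Agents, which describes the agent creation process in more detail.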

The architecture is shown in the figure below:

architecture image

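The steps covered in this notebook are:

  • Import the necessary libraries
  • Create the agent
  • Define the Ragas metrics
  • Evaluate the agent
  • Clean up the created resources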

Import the required libraries

The first step is to install the prerequisite packages:

%pip install --upgrade -q boto3 opensearch-py botocore awscli retrying ragas langchain-aws

The following command clones the repository containing the helper files needed for this tutorial.

! git clone https://hugging-face.cn/datasets/vibrantlabsai/booking_agent_utils
import os
import time
import boto3
import logging
import pprint
import json

from booking_agent_utils.knowledge_base import BedrockKnowledgeBase
from booking_agent_utils.agent import (
    create_agent_role_and_policies,
    create_lambda_role,
    delete_agent_roles_and_policies,
    create_dynamodb,
    create_lambda,
    clean_up_resources,
)
# Clients
s3_client = boto3.client("s3")
sts_client = boto3.client("sts")
session = boto3.session.Session()
region = session.region_name
account_id = sts_client.get_caller_identity()["Account"]
bedrock_agent_client = boto3.client("bedrock-agent")
bedrock_agent_runtime_client = boto3.client("bedrock-agent-runtime")
logging.basicConfig(
    format="[%(asctime)s] p%(process)s {%(filename)s:%(lineno)d} %(levelname)s - %(message)s",
    level=logging.INFO,
)
logger = logging.getLogger(__name__)
region, account_id
suffix = f"{region}-{account_id}"
agent_name = "booking-agent"
knowledge_base_name = f"{agent_name}-kb"
knowledge_base_description = (
    "Knowledge Base containing the restaurant menu's collection"
)
agent_alias_name = "booking-agent-alias"
bucket_name = f"{agent_name}-{suffix}"
agent_bedrock_allow_policy_name = f"{agent_name}-ba"
agent_role_name = f"AmazonBedrockExecutionRoleForAgents_{agent_name}"
agent_foundation_model = "amazon.nova-pro-v1:0"

agent_description = "Agent in charge of a restaurants table bookings"
agent_instruction = """
You are a restaurant agent responsible for managing clients’ bookings (retrieving, creating, or canceling reservations) and assisting with menu inquiries. When handling menu requests, provide detailed information about the requested items. Offer recommendations only when:

1. The customer explicitly asks for a recommendation, even if the item is available (include complementary dishes).
2. The requested item is unavailable—inform the customer and suggest suitable alternatives.
3. For general menu inquiries, provide the full menu and add a recommendation only if the customer asks for one.

In all cases, ensure that any recommended items are present in the menu.

Ensure all responses are clear, contextually relevant, and enhance the customer's experience.
"""

agent_action_group_description = """
Actions for getting table booking information, create a new booking or delete an existing booking"""

agent_action_group_name = "TableBookingsActionGroup"

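Set up the agent

Create a knowledge base for Amazon Bedrock

Let's start by creating a knowledge base for Amazon Bedrock to store the restaurant menus. In this example, the knowledge base is integrated with Amazon OpenSearch Serverless.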

knowledge_base = BedrockKnowledgeBase(
    kb_name=knowledge_base_name,
    kb_description=knowledge_base_description,
    data_bucket_name=bucket_name,
)

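Upload the dataset to Amazon S3

Now that the knowledge base exists, let's populate it with the restaurant menu dataset. In this example, we use the boto3 abstraction of the API through our helper class.

First, upload the menu data available in the dataset folder to Amazon S3.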

def upload_directory(path, bucket_name):
    for root, dirs, files in os.walk(path):
        for file in files:
            file_to_upload = os.path.join(root, file)
            print(f"uploading file {file_to_upload} to {bucket_name}")
            s3_client.upload_file(file_to_upload, bucket_name, file)


upload_directory("booking_agent_utils/dataset", bucket_name)
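Note that `upload_directory` uses the bare file name as the object key, so two files with the same name in different subfolders would overwrite each other. As a hedged sketch of one way around that, the helper below derives each key from the path relative to the upload root. `iter_upload_keys` is a hypothetical name, not part of `booking_agent_utils`, and the demonstration uses a temporary directory rather than the real dataset.

```python
import os
import tempfile


def iter_upload_keys(path):
    """Yield (local_file, s3_key) pairs, using the path relative to `path` as the key."""
    for root, _dirs, files in os.walk(path):
        for file in files:
            local_file = os.path.join(root, file)
            # Use forward slashes so keys look the same on every operating system.
            key = os.path.relpath(local_file, path).replace(os.sep, "/")
            yield local_file, key


# Offline demonstration with a temporary directory instead of a real dataset.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "kids"))
    for rel in ("menu.txt", os.path.join("kids", "menu.txt")):
        with open(os.path.join(tmp, rel), "w") as fh:
            fh.write("placeholder")
    keys = sorted(key for _local, key in iter_upload_keys(tmp))

print(keys)
```

Each pair could then be passed to `s3_client.upload_file(local_file, bucket_name, key)` in place of the file name.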

Now we start the ingestion job:

# ensure that the kb is available
time.sleep(30)
# sync knowledge base
knowledge_base.start_ingestion_job()
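The fixed 30-second sleep above is a simple way to wait for the knowledge base. As a hedged alternative, the sketch below polls a status callable until it reports readiness or a timeout elapses. `wait_until` and its `check` argument are hypothetical names, not part of `booking_agent_utils` or boto3; how the status is obtained is left as an assumption, and the demonstration uses a stand-in status source.

```python
import time


def wait_until(check, timeout=60.0, interval=1.0, sleep=time.sleep, clock=time.monotonic):
    """Poll `check()` until it returns True or `timeout` seconds pass.

    Returns True if the condition was met, False on timeout.
    `sleep` and `clock` are injectable so the helper can be exercised offline.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Offline demonstration: a stand-in status source that becomes ready on the third poll.
_statuses = iter(["CREATING", "CREATING", "ACTIVE"])
ready = wait_until(lambda: next(_statuses) == "ACTIVE", timeout=5, interval=0)
print(ready)
```

Polling trades a little extra code for shorter waits when the resource is ready early and a clear failure signal when it never becomes ready.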

Finally, we collect the knowledge base ID so we can integrate it with our agent later.

kb_id = knowledge_base.get_knowledge_base_id()

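Test the knowledge base with the RetrieveAndGenerate API

First, let's query the knowledge base through the RetrieveAndGenerate API to confirm that it works.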

response = bedrock_agent_runtime_client.retrieve_and_generate(
    input={"text": "Which are the mains available in the childrens menu?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": kb_id,
            "modelArn": "arn:aws:bedrock:{}::foundation-model/{}".format(
                region, agent_foundation_model
            ),
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {"numberOfResults": 5}
            },
        },
    },
)

print(response["output"]["text"], end="\n" * 2)
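Beyond the generated text, it can help to see which passages the answer was grounded in. The sketch below walks a response dictionary and lists the retrieved references. The `citations` and `retrievedReferences` keys reflect my understanding of the RetrieveAndGenerate response shape and should be checked against the boto3 documentation; `list_citations` is a hypothetical helper and `fake_response` is hand-written, with invented values.

```python
def list_citations(response):
    """Return (snippet, location) pairs for every retrieved reference in the response."""
    pairs = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            text = ref.get("content", {}).get("text", "")
            pairs.append((text[:60], ref.get("location", {})))
    return pairs


# Hand-written stand-in for a response; the values are invented for illustration.
fake_response = {
    "output": {"text": "Placeholder answer text"},
    "citations": [
        {
            "retrievedReferences": [
                {
                    "content": {"text": "CHICKEN NUGGETS - Crispy chicken nuggets"},
                    "location": {"type": "S3", "s3Location": {"uri": "s3://example-bucket/menu.pdf"}},
                }
            ]
        }
    ],
}
for snippet, location in list_citations(fake_response):
    print(snippet, "<-", location)
```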

Create the DynamoDB table

We will create a DynamoDB table that holds the restaurant booking information.

table_name = "restaurant_bookings"
create_dynamodb(table_name)

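Create the Lambda function

We will now create a Lambda function that interacts with the DynamoDB table.

Create the function code

Create the Lambda function that implements the get_booking_details, create_booking, and delete_booking functions.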

%%writefile lambda_function.py
import json
import uuid
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('restaurant_bookings')

def get_named_parameter(event, name):
    """
    Get a parameter from the lambda event
    """
    return next(item for item in event['parameters'] if item['name'] == name)['value']


def get_booking_details(booking_id):
    """
    Retrieve details of a restaurant booking

    Args:
        booking_id (string): The ID of the booking to retrieve
    """
    try:
        response = table.get_item(Key={'booking_id': booking_id})
        if 'Item' in response:
            return response['Item']
        else:
            return {'message': f'No booking found with ID {booking_id}'}
    except Exception as e:
        return {'error': str(e)}


def create_booking(date, name, hour, num_guests):
    """
    Create a new restaurant booking

    Args:
        date (string): The date of the booking
        name (string): Name to identify your reservation
        hour (string): The hour of the booking
        num_guests (integer): The number of guests for the booking
    """
    try:
        booking_id = str(uuid.uuid4())[:8]
        table.put_item(
            Item={
                'booking_id': booking_id,
                'date': date,
                'name': name,
                'hour': hour,
                'num_guests': num_guests
            }
        )
        return {'booking_id': booking_id}
    except Exception as e:
        return {'error': str(e)}


def delete_booking(booking_id):
    """
    Delete an existing restaurant booking

    Args:
        booking_id (str): The ID of the booking to delete
    """
    try:
        response = table.delete_item(Key={'booking_id': booking_id})
        if response['ResponseMetadata']['HTTPStatusCode'] == 200:
            return {'message': f'Booking with ID {booking_id} deleted successfully'}
        else:
            return {'message': f'Failed to delete booking with ID {booking_id}'}
    except Exception as e:
        return {'error': str(e)}


def lambda_handler(event, context):
    # get the action group used during the invocation of the lambda function
    actionGroup = event.get('actionGroup', '')

    # name of the function that should be invoked
    function = event.get('function', '')

    # parameters to invoke function with
    parameters = event.get('parameters', [])

    if function == 'get_booking_details':
        booking_id = get_named_parameter(event, "booking_id")
        if booking_id:
            response = str(get_booking_details(booking_id))
            responseBody = {'TEXT': {'body': json.dumps(response)}}
        else:
            responseBody = {'TEXT': {'body': 'Missing booking_id parameter'}}

    elif function == 'create_booking':
        date = get_named_parameter(event, "date")
        name = get_named_parameter(event, "name")
        hour = get_named_parameter(event, "hour")
        num_guests = get_named_parameter(event, "num_guests")

        if date and name and hour and num_guests:
            response = str(create_booking(date, name, hour, num_guests))
            responseBody = {'TEXT': {'body': json.dumps(response)}}
        else:
            responseBody = {'TEXT': {'body': 'Missing required parameters'}}

    elif function == 'delete_booking':
        booking_id = get_named_parameter(event, "booking_id")
        if booking_id:
            response = str(delete_booking(booking_id))
            responseBody = {'TEXT': {'body': json.dumps(response)}}
        else:
            responseBody = {'TEXT': {'body': 'Missing booking_id parameter'}}

    else:
        responseBody = {'TEXT': {'body': 'Invalid function'}}

    action_response = {
        'actionGroup': actionGroup,
        'function': function,
        'functionResponse': {
            'responseBody': responseBody
        }
    }

    function_response = {'response': action_response, 'messageVersion': event['messageVersion']}
    print("Response: {}".format(function_response))

    return function_response
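To make the handler's input and output shapes concrete, here is a minimal offline sketch that parses a sample event and builds the same response envelope, without touching DynamoDB. The event is hypothetical and limited to the fields the handler reads; the `"1.0"` message version and the string-typed parameter values are assumptions, not taken from the original.

```python
import json


def get_named_parameter(event, name):
    """Same lookup as in lambda_function.py: find a parameter by name."""
    return next(item for item in event["parameters"] if item["name"] == name)["value"]


# Hypothetical event, limited to the fields the handler above reads.
sample_event = {
    "messageVersion": "1.0",  # assumed value
    "actionGroup": "TableBookingsActionGroup",
    "function": "create_booking",
    "parameters": [
        {"name": "date", "value": "2025-05-05"},
        {"name": "name", "value": "John"},
        {"name": "hour", "value": "19:00"},
        {"name": "num_guests", "value": "2"},
    ],
}

booking = {p: get_named_parameter(sample_event, p) for p in ("date", "name", "hour", "num_guests")}

# Response envelope in the same shape the handler returns.
function_response = {
    "messageVersion": sample_event["messageVersion"],
    "response": {
        "actionGroup": sample_event["actionGroup"],
        "function": sample_event["function"],
        "functionResponse": {
            "responseBody": {"TEXT": {"body": json.dumps(booking)}}
        },
    },
}
print(json.dumps(function_response, indent=2))
```

Since `get_named_parameter` uses `next` without a default, a missing parameter raises `StopIteration` before the handler's "Missing ... parameter" branches are reached.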

Create the required permissions

lambda_iam_role = create_lambda_role(agent_name, table_name)

Create the function

lambda_function_name = f"{agent_name}-lambda"
lambda_function = create_lambda(lambda_function_name, lambda_iam_role)

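Create the required IAM policies for the agent

Now that we have created the knowledge base, the DynamoDB table, and the Lambda function that executes the agent's tasks, let's start creating the agent.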

agent_role = create_agent_role_and_policies(
    agent_name, agent_foundation_model, kb_id=kb_id
)

Create the agent

With the necessary IAM role in place, we can use the boto3 create_agent API to create a new agent.

response = bedrock_agent_client.create_agent(
    agentName=agent_name,
    agentResourceRoleArn=agent_role["Role"]["Arn"],
    description=agent_description,
    idleSessionTTLInSeconds=1800,
    foundationModel=agent_foundation_model,
    instruction=agent_instruction,
)

Let's retrieve the agent ID. It is needed for the operations we perform on the agent in the following steps.

agent_id = response["agent"]["agentId"]
print("The agent id is:", agent_id)

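Create the agent action group

We will now create an agent action group that uses the Lambda function created earlier. To tell the agent what the action group can do, we provide a description of its functionality.

To define the functions with a function schema, you provide a name, a description, and the parameters for each function.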

agent_functions = [
    {
        "name": "get_booking_details",
        "description": "Retrieve details of a restaurant booking",
        "parameters": {
            "booking_id": {
                "description": "The ID of the booking to retrieve",
                "required": True,
                "type": "string",
            }
        },
    },
    {
        "name": "create_booking",
        "description": "Create a new restaurant booking",
        "parameters": {
            "date": {
                "description": "The date of the booking",
                "required": True,
                "type": "string",
            },
            "name": {
                "description": "Name to identify your reservation",
                "required": True,
                "type": "string",
            },
            "hour": {
                "description": "The hour of the booking",
                "required": True,
                "type": "string",
            },
            "num_guests": {
                "description": "The number of guests for the booking",
                "required": True,
                "type": "integer",
            },
        },
    },
    {
        "name": "delete_booking",
        "description": "Delete an existing restaurant booking",
        "parameters": {
            "booking_id": {
                "description": "The ID of the booking to delete",
                "required": True,
                "type": "string",
            }
        },
    },
]
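A quick shape check before calling the API can catch typos in a hand-written schema. The sketch below is a hypothetical validator, not part of boto3 or `booking_agent_utils`; in particular, the set of allowed parameter types is my assumption and should be confirmed against the Bedrock documentation.

```python
ALLOWED_TYPES = {"string", "number", "integer", "boolean", "array"}  # assumed set


def validate_functions(functions):
    """Return a list of human-readable problems; an empty list means the shape looks fine."""
    problems = []
    seen = set()
    for fn in functions:
        name = fn.get("name")
        if not name:
            problems.append("function without a name")
            continue
        if name in seen:
            problems.append(f"duplicate function name: {name}")
        seen.add(name)
        if not fn.get("description"):
            problems.append(f"{name}: missing description")
        for param, spec in fn.get("parameters", {}).items():
            if spec.get("type") not in ALLOWED_TYPES:
                problems.append(f"{name}.{param}: unexpected type {spec.get('type')!r}")
            if not isinstance(spec.get("required"), bool):
                problems.append(f"{name}.{param}: 'required' should be a bool")
    return problems


example_functions = [
    {
        "name": "get_booking_details",
        "description": "Retrieve details of a restaurant booking",
        "parameters": {
            "booking_id": {
                "description": "The ID of the booking to retrieve",
                "required": True,
                "type": "string",
            }
        },
    },
    # Deliberately malformed entry to show what the checker reports.
    {"name": "broken", "description": "", "parameters": {"x": {"type": "str", "required": "yes"}}},
]
for problem in validate_functions(example_functions):
    print(problem)
```

In the notebook, the same check could be run as `validate_functions(agent_functions)`.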

Now we create the agent action group from the function schema with the create_agent_action_group API:

# Pause to make sure agent is created
time.sleep(30)

# Now, we can configure and create an action group here:
agent_action_group_response = bedrock_agent_client.create_agent_action_group(
    agentId=agent_id,
    agentVersion="DRAFT",
    actionGroupExecutor={"lambda": lambda_function["FunctionArn"]},
    actionGroupName=agent_action_group_name,
    functionSchema={"functions": agent_functions},
    description=agent_action_group_description,
)

Allow the agent to invoke the action group Lambda

# Create allow to invoke permission on lambda
lambda_client = boto3.client("lambda")
response = lambda_client.add_permission(
    FunctionName=lambda_function_name,
    StatementId="allow_bedrock",
    Action="lambda:InvokeFunction",
    Principal="bedrock.amazonaws.com",
    SourceArn=f"arn:aws:bedrock:{region}:{account_id}:agent/{agent_id}",
)

Associate the knowledge base with the agent

response = bedrock_agent_client.associate_agent_knowledge_base(
    agentId=agent_id,
    agentVersion="DRAFT",
    description="Access the knowledge base when customers ask about the plates in the menu.",
    knowledgeBaseId=kb_id,
    knowledgeBaseState="ENABLED",
)

Prepare the agent and create an alias

Let's create a DRAFT version of the agent that can be used for internal testing.

response = bedrock_agent_client.prepare_agent(agentId=agent_id)
print(response)
# Pause to make sure agent is prepared
time.sleep(30)
response = bedrock_agent_client.create_agent_alias(
    agentAliasName="TestAlias",
    agentId=agent_id,
    description="Test alias",
)

alias_id = response["agentAlias"]["agentAliasId"]
print("The Agent alias is:", alias_id)
time.sleep(30)

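The invokeAgent function sends the user query to the Bedrock agent and returns the agent's response together with the trace data. It processes the event stream and captures the trace information used for evaluation.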

def invokeAgent(query, session_id, session_state=None):
    """Send a query to the agent and return (answer_text, traces)."""
    # Avoid a mutable default argument; fall back to an empty session state.
    if session_state is None:
        session_state = {}

    # invoke the agent API
    agentResponse = bedrock_agent_runtime_client.invoke_agent(
        inputText=query,
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=session_id,
        enableTrace=True,
        endSession=False,
        sessionState=session_state,
    )

    agent_answer = ""
    traces = []
    # The completion is an event stream of "chunk" (answer bytes) and "trace" events.
    for event in agentResponse["completion"]:
        if "chunk" in event:
            # Accumulate chunks in case the answer is split across several events.
            agent_answer += event["chunk"]["bytes"].decode("utf8")
        elif "trace" in event:
            traces.append(event["trace"])
        else:
            raise Exception("unexpected event.", event)
    return agent_answer, traces
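The event-stream handling can be tried offline with a hand-made stream. The sketch below is a standalone illustration, not the notebook's function: `collect_events` is a hypothetical name and the events are invented, with real trace payloads being much richer.

```python
def collect_events(event_stream):
    """Split an iterable of events into (answer_text, traces)."""
    answer_parts, traces = [], []
    for event in event_stream:
        if "chunk" in event:
            answer_parts.append(event["chunk"]["bytes"].decode("utf8"))
        elif "trace" in event:
            traces.append(event["trace"])
        else:
            raise ValueError(f"unexpected event: {event!r}")
    return "".join(answer_parts), traces


# Made-up events for illustration only.
fake_stream = [
    {"trace": {"step": "orchestration"}},
    {"chunk": {"bytes": "Your booking ".encode("utf8")}},
    {"chunk": {"bytes": "is confirmed.".encode("utf8")}},
]
answer, traces = collect_events(fake_stream)
print(answer)
print(len(traces))
```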

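Define the Ragas metrics

Evaluating an agent differs from testing traditional software, where you only verify that the output matches the expected result. Agents perform complex tasks that often have several valid solutions.

Given this inherent autonomy, evaluating agents is essential to make sure they behave as intended.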

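Choose what to evaluate in the agent

The choice of evaluation metrics depends entirely on your use case. A good rule of thumb is to pick metrics tied directly to user needs or metrics that clearly drive business value. For the restaurant agent above, we want the agent to fulfill the user's requests without unnecessary repetition, offer helpful recommendations where appropriate to improve the customer experience, and stay consistent with the brand tone.

We will define metrics that evaluate these priorities. Ragas provides several user-defined metrics for this kind of evaluation.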

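When defining evaluation criteria, focus on binary decisions or discrete categorical scores rather than vague numeric scores. Binary or clearly categorical scores force you to define the success criteria explicitly. Avoid metrics that produce a score between 0 and 100 without a clear interpretation, because close scores such as 87 and 91 are hard to tell apart, especially when each evaluation is run independently.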
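One practical benefit of binary verdicts is that they aggregate into a number with an obvious meaning. The sketch below computes a pass rate from 0/1 scores; `pass_rate` is a hypothetical helper and the verdicts are made up for illustration.

```python
def pass_rate(scores):
    """Fraction of samples scored 1 on a binary (0/1) metric."""
    if not scores:
        raise ValueError("no scores to aggregate")
    if any(s not in (0, 1) for s in scores):
        raise ValueError("binary metrics should only produce 0 or 1")
    return sum(scores) / len(scores)


# Made-up verdicts for five conversations, purely for illustration.
completeness = [1, 1, 0, 1, 1]
print(f"{pass_rate(completeness):.0%} of conversations fully met the request")
```

A pass rate of this kind reads directly as "how often did the agent succeed", which is harder to say about an average of 0 to 100 scores.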

Ragas includes metrics suited to this kind of evaluation, and we will explore some of them in practice:

from langchain_aws import ChatBedrock
from ragas.llms import LangchainLLMWrapper

model_id = "us.amazon.nova-pro-v1:0"   # Choose your desired model
region_name = "us-east-1"              # Choose your desired AWS region

bedrock_llm = ChatBedrock(model_id=model_id, region_name=region_name)
evaluator_llm = LangchainLLMWrapper(bedrock_llm)
from ragas.metrics import AspectCritic, RubricsScore
from ragas.dataset_schema import SingleTurnSample, MultiTurnSample, EvaluationDataset
from ragas import evaluate

rubrics = {
    "score-1_description": (
        "The item requested by the customer is not present in the menu and no recommendations were made."
    ),
    "score0_description": (
        "Either the item requested by the customer is present in the menu, or the conversation does not include any food or menu inquiry (e.g., booking, cancellation). This score applies regardless of whether any recommendation was provided."
    ),
    "score1_description": (
        "The item requested by the customer is not present in the menu and a recommendation was provided."
    ),
}

recommendations = RubricsScore(rubrics=rubrics, llm=evaluator_llm, name="Recommendations")


# Metric to evaluate if the AI fulfills all human requests completely.
request_completeness = AspectCritic(
    name="Request Completeness",
    llm=evaluator_llm,
    definition=(
        "Return 1 if the agent completely fulfills all the user requests with no omissions; "
        "otherwise, return 0."
    ),
)

# Metric to assess if the AI's communication aligns with the desired brand voice.
brand_tone = AspectCritic(
    name="Brand Voice Metric",
    llm=evaluator_llm,
    definition=(
        "Return 1 if the AI's communication is friendly, approachable, helpful, clear, and concise; "
        "otherwise, return 0."
    ),
)

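Evaluate the agent with Ragas

To evaluate with Ragas, the Amazon Bedrock agent trace data must first be converted into a format Ragas recognizes. Ragas provides the function convert_to_ragas_messages (ragas.integrations.amazon_bedrock.convert_to_ragas_messages) for this purpose: it converts Amazon Bedrock messages into the format Ragas expects. The Ragas documentation covers it in more detail.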

%%time
import uuid
session_id:str = str(uuid.uuid1())
query = "If you have children food then book a table for 2 people at 7pm on the 5th of May 2025."
agent_answer, traces_1 = invokeAgent(query, session_id)

print(agent_answer)
Output
Your booking for 2 people at 7pm on the 5th of May 2025 has been successfully created. Your booking ID is ca2fab70.

query = "Can you check my previous booking? Can you please delete the booking?"
agent_answer, traces_2 = invokeAgent(query, session_id)

print(agent_answer)
Output
Your reservation was found and has been successfully canceled.

from ragas.integrations.amazon_bedrock import convert_to_ragas_messages

# Convert Amazon Bedrock traces to messages accepted by Ragas.
# The convert_to_ragas_messages function transforms Bedrock-specific trace data 
# into a format that Ragas can process as conversation messages.
ragas_messages_trace_1 = convert_to_ragas_messages(traces_1)
ragas_messages_trace_2 = convert_to_ragas_messages(traces_2)

# Initialize MultiTurnSample objects.
# MultiTurnSample is a data type defined in Ragas that encapsulates conversation
# data for multi-turn evaluation. This conversion is necessary to perform evaluations.
sample_1 = MultiTurnSample(user_input=ragas_messages_trace_1)
sample_2 = MultiTurnSample(user_input=ragas_messages_trace_2)

result = evaluate(
    # Create an evaluation dataset from the multi-turn samples
    dataset=EvaluationDataset(samples=[sample_1, sample_2]),
    metrics=[request_completeness, brand_tone],
)

result.to_pandas()
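Output
Evaluating: 100%|██████████| 4/4 [00:00<?, ?it/s]

   user_input                                            Request Completeness  Brand Voice Metric
0  [{'content': '[{text=If you have children food...'}]  1                     1
1  [{'content': '[{text=If you have children food...'}]  1                     1

Both conversations receive a score of 1: the agent fully met every user request with no omissions (request completeness) and communicated in a friendly, approachable, helpful, clear, and concise manner (brand voice).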

%%time
import uuid

session_id:str = str(uuid.uuid1())
query = "Do you serve Chicken Wings?"

agent_answer, traces_3 = invokeAgent(query, session_id)
print(agent_answer)
Output
Yes, we serve Chicken Wings. Here are the details:
- **Buffalo Chicken Wings**: Classic buffalo wings served with celery sticks and blue cheese dressing. Allergens: Dairy (in blue cheese dressing), Gluten (in the coating), possible Soy (in the sauce).

%%time
session_id:str = str(uuid.uuid1())
query = "For desserts, do you have chocolate truffle cake?"
agent_answer, traces_4 = invokeAgent(query, session_id)
print(agent_answer)
Output
I'm sorry, but we do not have chocolate truffle cake on our dessert menu. However, we have several delicious alternatives you might enjoy:

1. **Classic New York Cheesecake** - Creamy cheesecake with a graham cracker crust, topped with a choice of fruit compote or chocolate ganache.
2. **Apple Pie à la Mode** - Warm apple pie with a flaky crust, served with a scoop of vanilla ice cream and a drizzle of caramel sauce.
3. **Chocolate Lava Cake** - Rich and gooey chocolate cake with a molten center, dusted with powdered sugar and served with a scoop of raspberry sorbet.
4. **Pecan Pie Bars** - Buttery shortbread crust topped with a gooey pecan filling, cut into bars for easy serving.
5. **Banana Pudding Parfait** - Layers of vanilla pudding, sliced bananas, and vanilla wafers, topped with whipped cream and a sprinkle of crushed nuts.

May I recommend the **Chocolate Lava Cake** for a decadent treat?

%%time
from datetime import datetime
today = datetime.today().strftime('%b-%d-%Y')

session_id:str = str(uuid.uuid1())
query = "Do you have indian food?"
session_state = {
    "promptSessionAttributes": {
        "name": "John",
        "today": today
    }
}

agent_answer, traces_5 = invokeAgent(query, session_id, session_state=session_state)
print(agent_answer)
Output
I could not find Indian food on our menu. However, we offer a variety of other cuisines including American, Italian, and vegetarian options. Would you like to know more about these options? 

from ragas.integrations.amazon_bedrock import convert_to_ragas_messages

ragas_messages_trace_3 = convert_to_ragas_messages(traces_3)
ragas_messages_trace_4 = convert_to_ragas_messages(traces_4)
ragas_messages_trace_5 = convert_to_ragas_messages(traces_5)

sample_3 = MultiTurnSample(user_input=ragas_messages_trace_3)
sample_4 = MultiTurnSample(user_input=ragas_messages_trace_4)
sample_5 = MultiTurnSample(user_input=ragas_messages_trace_5)

result = evaluate(
    dataset=EvaluationDataset(samples=[sample_3, sample_4, sample_5]),
    metrics=[recommendations],
)

result.to_pandas()
Evaluating: 100%|██████████| 3/3 [00:00<?, ?it/s]

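   user_input                                             Recommendations
0  [{'content': '[{text=Do you serve Chicken Wings...'}]  0
1  [{'content': '[{text=For desserts, do you have...'}]   1
2  [{'content': '[{text=Do you have indian food?}...'}]   1

For the Recommendations metric, the chicken wings query scores 0 because the item is available. The chocolate truffle cake and Indian food queries both score 1 because the requested items are not on the menu and the agent offered alternatives.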
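The three rubric descriptions can be restated as a small decision function, which makes the intended scoring easy to check by hand. This is only a deterministic paraphrase of the rubric text; in the notebook the score is produced by the evaluator LLM, not by code like this, and `recommendation_rubric` is a hypothetical name.

```python
def recommendation_rubric(is_menu_inquiry, item_on_menu, recommendation_made):
    """Deterministic restatement of the three rubric descriptions."""
    if not is_menu_inquiry or item_on_menu:
        return 0  # item available, or no food/menu inquiry at all
    return 1 if recommendation_made else -1


cases = {
    "chicken wings (on menu)": recommendation_rubric(True, True, False),
    "chocolate truffle cake (not on menu, alternatives offered)": recommendation_rubric(True, False, True),
    "unavailable item, no alternative": recommendation_rubric(True, False, False),
    "booking only": recommendation_rubric(False, False, False),
}
for label, score in cases.items():
    print(f"{score:>2}  {label}")
```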

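To evaluate how well the agent uses the information retrieved from the knowledge base, we use the RAG evaluation metrics provided by Ragas. The Ragas documentation describes these metrics in more detail.

In this tutorial, we use the following RAG metrics: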

from ragas.metrics import ContextRelevance, Faithfulness, ResponseGroundedness

metrics = [
    ContextRelevance(llm=evaluator_llm),
    Faithfulness(llm=evaluator_llm),
    ResponseGroundedness(llm=evaluator_llm),
]
from ragas.integrations.amazon_bedrock import extract_kb_trace

kb_trace_3 = extract_kb_trace(traces_3)
kb_trace_4 = extract_kb_trace(traces_4)

trace_3_single_turn_sample = SingleTurnSample(
    user_input=kb_trace_3[0].get("user_input"),
    retrieved_contexts=kb_trace_3[0].get("retrieved_contexts"),
    response=kb_trace_3[0].get("response"),
    reference="Yes, we do serve chicken wings prepared in Buffalo style, chicken wing that’s typically deep-fried and then tossed in a tangy, spicy Buffalo sauce.",
)

trace_4_single_turn_sample = SingleTurnSample(
    user_input=kb_trace_4[0].get("user_input"),
    retrieved_contexts=kb_trace_4[0].get("retrieved_contexts"),
    response=kb_trace_4[0].get("response"),
    reference="The desserts on the adult menu are:\n1. Classic New York Cheesecake\n2. Apple Pie à la Mode\n3. Chocolate Lava Cake\n4. Pecan Pie Bars\n5. Banana Pudding Parfait",
)

single_turn_samples = [trace_3_single_turn_sample, trace_4_single_turn_sample]

dataset = EvaluationDataset(samples=single_turn_samples)

kb_results = evaluate(dataset=dataset, metrics=metrics)
kb_results.to_pandas()
Evaluating: 100%|██████████| 6/6 [00:00<?, ?it/s]

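   user_input              retrieved_contexts                                      response                                                 reference                                                    nv_context_relevance  faithfulness  nv_response_groundedness
0  Chicken Wings           [The Regrettable Experience -- Dinner Menu Entrees...]  Yes, we serve Chicken Wings. Here are the details...     Yes, we do serve chicken wings prepared in Buffalo style...  1.0                   1.00          1.0
1  Chocolate truffle cake  [Allergens: Gluten (in the breadcrumbs). 3. B...]       I'm sorry, but we do not have chocolate truffle cake...  The desserts on the adult menu are:\n1. Classic...           0.0                   0.75          0.5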
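The sample construction above indexes `kb_trace[0]` directly, which would raise an `IndexError` if a trace held no knowledge base record. The sketch below wraps that step with an explicit check. It assumes, based only on how the result is used above, that `extract_kb_trace` returns a list of dictionaries with `user_input`, `retrieved_contexts`, and `response` keys; `kb_sample_fields` is a hypothetical helper and the record is invented.

```python
def kb_sample_fields(kb_trace, reference):
    """Build keyword arguments for a single-turn sample from the first KB record."""
    if not kb_trace:
        raise ValueError("no knowledge base lookup found in this trace")
    record = kb_trace[0]
    return {
        "user_input": record.get("user_input"),
        "retrieved_contexts": record.get("retrieved_contexts"),
        "response": record.get("response"),
        "reference": reference,
    }


# Invented record standing in for the output of extract_kb_trace.
fake_kb_trace = [
    {
        "user_input": "Chicken Wings",
        "retrieved_contexts": ["Buffalo Chicken Wings: classic buffalo wings"],
        "response": "Yes, we serve Chicken Wings.",
    }
]
fields = kb_sample_fields(fake_kb_trace, reference="Yes, we serve Buffalo-style chicken wings.")
print(sorted(fields))
```

The resulting dictionary could be unpacked into the sample constructor, for example `SingleTurnSample(**fields)`.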

To evaluate whether the agent achieves its goal, we can use the following metrics:

from ragas.metrics import (
    AgentGoalAccuracyWithoutReference,
    AgentGoalAccuracyWithReference,
)

goal_accuracy_with_reference = AgentGoalAccuracyWithReference(llm=evaluator_llm)
goal_accuracy_without_reference = AgentGoalAccuracyWithoutReference(llm=evaluator_llm)

%%time
import uuid

session_id:str = str(uuid.uuid1())
query = "What entrees do you have for children?"

agent_answer, traces_6 = invokeAgent(query, session_id)
print(agent_answer)
Output
Here are the entrees available for children:
1. CHICKEN NUGGETS - Crispy chicken nuggets served with a side of ketchup or ranch dressing. Allergens: Gluten (in the coating), possible Soy. Suitable for Vegetarians: No
2. MACARONI AND CHEESE - Classic macaroni pasta smothered in creamy cheese sauce. Allergens: Dairy, Gluten. Suitable for Vegetarians: Yes
3. MINI CHEESE QUESADILLAS - Small flour tortillas filled with melted cheese, served with a mild salsa. Allergens: Dairy, Gluten. Suitable for Vegetarians: Yes
4. PEANUT BUTTER AND BANANA SANDWICH - Peanut butter and banana slices on whole wheat bread. Allergens: Nuts (peanut), Gluten. Suitable for Vegetarians: Yes (if using vegetarian peanut butter)
5. VEGGIE PITA POCKETS - Mini whole wheat pita pockets filled with hummus, cucumber, and cherry tomatoes. Allergens: Gluten, possible Soy. Suitable for Vegetarians: Yes

from ragas.integrations.amazon_bedrock import convert_to_ragas_messages

ragas_messages_trace_6 = convert_to_ragas_messages(traces_6)

sample_6 = MultiTurnSample(
    user_input=ragas_messages_trace_6,
    reference="Response contains entrees food items for the children.",
)

result = evaluate(
    dataset=EvaluationDataset(samples=[sample_6]),
    metrics=[goal_accuracy_with_reference],
)

result.to_pandas()
Evaluating: 100%|██████████| 1/1 [00:00<?, ?it/s]

   user_input                                           reference                                             agent_goal_accuracy
0  [{'content': '[{text=What entrees do you have...'}]  The final outcome provides child-friendly entrees...  1.0

sample_6 = MultiTurnSample(user_input=ragas_messages_trace_6)

result = evaluate(
    dataset=EvaluationDataset(samples=[sample_6]),
    metrics=[goal_accuracy_without_reference],
)

result.to_pandas()
Evaluating: 100%|██████████| 1/1 [00:00<?, ?it/s]

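   user_input                                           agent_goal_accuracy
0  [{'content': '[{text=What entrees do you have...'}]  1.0

In both cases the agent scores 1 because it presented all available options in full, specifically listing every children's entree.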

Clean up

Let's delete all the associated resources we created to avoid unnecessary costs.

clean_up_resources(
    table_name,
    lambda_function,
    lambda_function_name,
    agent_action_group_response,
    agent_functions,
    agent_id,
    kb_id,
    alias_id,
)
# Delete the agent roles and policies
delete_agent_roles_and_policies(agent_name)
# delete KB
knowledge_base.delete_kb(delete_s3_bucket=True, delete_iam_roles_and_policies=True)
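If one cleanup call fails, the calls after it never run and some resources may be left behind. As a hedged sketch, the runner below executes every step and reports the failures at the end. `run_cleanup` is a hypothetical helper, and the steps shown are stand-ins rather than the real cleanup functions.

```python
def run_cleanup(steps):
    """Run every (label, callable) step; return the labels that failed with their errors."""
    failures = []
    for label, step in steps:
        try:
            step()
        except Exception as exc:  # keep going so later resources are still removed
            failures.append((label, str(exc)))
    return failures


def _fail():
    raise RuntimeError("resource not found")


# Stand-in steps; in the notebook these would wrap the three cleanup calls above.
failures = run_cleanup([
    ("agent resources", lambda: None),
    ("agent roles", _fail),
    ("knowledge base", lambda: None),
])
print(failures)
```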