Create and Evaluate an Amazon Bedrock Agent Integrated with Amazon Bedrock Knowledge Bases and Action Groups

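In this notebook, you will learn how to evaluate an Amazon Bedrock agent. The agent we will evaluate is a restaurant agent that gives customers information about the adult and children's menus and manages the table booking system. It is inspired by one of the feature example notebooks for Amazon Bedrock Agents, with a few minor changes. You can learn more about the agent creation process here.

The architecture is illustrated below

The steps covered in this notebook include:

Importing the necessary libraries
Creating the agent
Defining the Ragas metrics
Evaluating the agent
Cleaning up the created resources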


Import the required libraries

The first step is to install the prerequisite packages

%pip install --upgrade -q boto3 opensearch-py botocore awscli retrying ragas langchain-aws

This command clones the repository that contains the helper files needed for this tutorial.

! git clone https://hugging-face.cn/datasets/explodinggradients/booking_agent_utils

import os
import time
import boto3
import logging
import pprint
import json

from booking_agent_utils.knowledge_base import BedrockKnowledgeBase
from booking_agent_utils.agent import (
    create_agent_role_and_policies,
    create_lambda_role,
    delete_agent_roles_and_policies,
    create_dynamodb,
    create_lambda,
    clean_up_resources,
)

# Clients
s3_client = boto3.client("s3")
sts_client = boto3.client("sts")
session = boto3.session.Session()
region = session.region_name
account_id = sts_client.get_caller_identity()["Account"]
bedrock_agent_client = boto3.client("bedrock-agent")
bedrock_agent_runtime_client = boto3.client("bedrock-agent-runtime")
logging.basicConfig(
    format="[%(asctime)s] p%(process)s {%(filename)s:%(lineno)d} %(levelname)s - %(message)s",
    level=logging.INFO,
)
logger = logging.getLogger(__name__)
region, account_id

suffix = f"{region}-{account_id}"
agent_name = "booking-agent"
knowledge_base_name = f"{agent_name}-kb"
knowledge_base_description = (
    "Knowledge Base containing the restaurant menu's collection"
)
agent_alias_name = "booking-agent-alias"
bucket_name = f"{agent_name}-{suffix}"
agent_bedrock_allow_policy_name = f"{agent_name}-ba"
agent_role_name = f"AmazonBedrockExecutionRoleForAgents_{agent_name}"
agent_foundation_model = "amazon.nova-pro-v1:0"

agent_description = "Agent in charge of a restaurants table bookings"
agent_instruction = """
You are a restaurant agent responsible for managing clients’ bookings (retrieving, creating, or canceling reservations) and assisting with menu inquiries. When handling menu requests, provide detailed information about the requested items. Offer recommendations only when:

1. The customer explicitly asks for a recommendation, even if the item is available (include complementary dishes).
2. The requested item is unavailable—inform the customer and suggest suitable alternatives.
3. For general menu inquiries, provide the full menu and add a recommendation only if the customer asks for one.

In all cases, ensure that any recommended items are present in the menu.

Ensure all responses are clear, contextually relevant, and enhance the customer's experience.
"""

agent_action_group_description = """
Actions for getting table booking information, create a new booking or delete an existing booking"""

agent_action_group_name = "TableBookingsActionGroup"

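Setting up the agent

Create a knowledge base for Amazon Bedrock

Let's start by creating a knowledge base for Amazon Bedrock to store the restaurant menus. In this example, we integrate the knowledge base with Amazon OpenSearch Serverless.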

knowledge_base = BedrockKnowledgeBase(
    kb_name=knowledge_base_name,
    kb_description=knowledge_base_description,
    data_bucket_name=bucket_name,
)

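Upload the dataset to Amazon S3

Now that we have created the knowledge base, let's populate it with the restaurant menu dataset. In this example, we use the boto3 abstraction of the API through our helper class.

First, we upload the menu data available in the dataset folder to Amazon S3.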

def upload_directory(path, bucket_name):
    for root, dirs, files in os.walk(path):
        for file in files:
            file_to_upload = os.path.join(root, file)
            print(f"uploading file {file_to_upload} to {bucket_name}")
            s3_client.upload_file(file_to_upload, bucket_name, file)


upload_directory("booking_agent_utils/dataset", bucket_name)
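The helper above uses the bare file name as the S3 object key, so two files with the same name in different subfolders would overwrite each other. Below is a minimal sketch of one way around this, assuming you want to preserve the folder layout. `iter_upload_keys` is a hypothetical helper, not part of booking_agent_utils, and it only computes the keys without calling S3.

```python
import os
import tempfile


def iter_upload_keys(path):
    """Yield (local_path, s3_key) pairs, with keys relative to `path`.

    Hypothetical helper for illustration; it does not talk to S3.
    """
    for root, _dirs, files in os.walk(path):
        for file in sorted(files):
            local_path = os.path.join(root, file)
            # Key is the path relative to the dataset root, with forward
            # slashes, so files in different subfolders get distinct keys.
            key = os.path.relpath(local_path, path).replace(os.sep, "/")
            yield local_path, key


# Demonstration on a throwaway directory tree (file contents are placeholders)
with tempfile.TemporaryDirectory() as tmp:
    for sub in ("adults", "kids"):
        os.makedirs(os.path.join(tmp, sub))
        with open(os.path.join(tmp, sub, "menu.txt"), "w") as f:
            f.write("placeholder")
    keys = sorted(key for _path, key in iter_upload_keys(tmp))

print(keys)
```

Each pair could then be passed to `s3_client.upload_file(local_path, bucket_name, key)`.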

Now we start the ingestion job

# ensure that the kb is available
time.sleep(30)
# sync knowledge base
knowledge_base.start_ingestion_job()

Finally, we collect the knowledge base ID so that we can integrate it with our agent later on.

kb_id = knowledge_base.get_knowledge_base_id()

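Testing the knowledge base with the Retrieve and Generate API

First, let's test the knowledge base with the Retrieve and Generate API to make sure it is working correctly.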

response = bedrock_agent_runtime_client.retrieve_and_generate(
    input={"text": "Which are the mains available in the childrens menu?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": kb_id,
            "modelArn": "arn:aws:bedrock:{}::foundation-model/{}".format(
                region, agent_foundation_model
            ),
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {"numberOfResults": 5}
            },
        },
    },
)

print(response["output"]["text"], end="\n" * 2)
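Beyond the generated text, the response also carries citations that show which retrieved chunks backed the answer, which is a quick way to confirm the menu files were ingested. The sketch below walks that structure; `list_citation_sources` is a hypothetical helper, and the sample response is hand-written with made-up values so the block runs without AWS access.

```python
def list_citation_sources(response, snippet_length=60):
    """Return (s3_uri, snippet) pairs for every retrieved reference."""
    sources = []
    for citation in response.get("citations", []):
        for ref in citation.get("retrievedReferences", []):
            # The URI is None when the reference does not come from S3
            uri = ref.get("location", {}).get("s3Location", {}).get("uri")
            text = ref.get("content", {}).get("text", "")
            sources.append((uri, text[:snippet_length]))
    return sources


# Hand-written stand-in for a real response (all values are made up)
fake_response = {
    "output": {"text": "The children's mains are ..."},
    "citations": [
        {
            "retrievedReferences": [
                {
                    "content": {"text": "Children's menu mains: ..."},
                    "location": {
                        "type": "S3",
                        "s3Location": {"uri": "s3://example-bucket/kids_menu.docx"},
                    },
                }
            ]
        }
    ],
}

sources = list_citation_sources(fake_response)
for uri, snippet in sources:
    print(uri, "->", snippet)
```

With the real call above, `list_citation_sources(response)` would list the source documents behind the answer.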

Create the DynamoDB table

We will create a DynamoDB table that holds the restaurant booking information.

table_name = "restaurant_bookings"
create_dynamodb(table_name)

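Create the Lambda function

We will now create a Lambda function that interacts with the DynamoDB table.

Create the function code

Create the Lambda function that implements the get_booking_details, create_booking, and delete_booking functions.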

%%writefile lambda_function.py
import json
import uuid
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('restaurant_bookings')

def get_named_parameter(event, name):
    """
    Get a parameter from the lambda event
    """
    return next(item for item in event['parameters'] if item['name'] == name)['value']


def get_booking_details(booking_id):
    """
    Retrieve details of a restaurant booking

    Args:
        booking_id (string): The ID of the booking to retrieve
    """
    try:
        response = table.get_item(Key={'booking_id': booking_id})
        if 'Item' in response:
            return response['Item']
        else:
            return {'message': f'No booking found with ID {booking_id}'}
    except Exception as e:
        return {'error': str(e)}


def create_booking(date, name, hour, num_guests):
    """
    Create a new restaurant booking

    Args:
        date (string): The date of the booking
        name (string): Name to identify your reservation
        hour (string): The hour of the booking
        num_guests (integer): The number of guests for the booking
    """
    try:
        booking_id = str(uuid.uuid4())[:8]
        table.put_item(
            Item={
                'booking_id': booking_id,
                'date': date,
                'name': name,
                'hour': hour,
                'num_guests': num_guests
            }
        )
        return {'booking_id': booking_id}
    except Exception as e:
        return {'error': str(e)}


def delete_booking(booking_id):
    """
    Delete an existing restaurant booking

    Args:
        booking_id (str): The ID of the booking to delete
    """
    try:
        response = table.delete_item(Key={'booking_id': booking_id})
        if response['ResponseMetadata']['HTTPStatusCode'] == 200:
            return {'message': f'Booking with ID {booking_id} deleted successfully'}
        else:
            return {'message': f'Failed to delete booking with ID {booking_id}'}
    except Exception as e:
        return {'error': str(e)}


def lambda_handler(event, context):
    # get the action group used during the invocation of the lambda function
    actionGroup = event.get('actionGroup', '')

    # name of the function that should be invoked
    function = event.get('function', '')

    # parameters to invoke function with
    parameters = event.get('parameters', [])

    if function == 'get_booking_details':
        booking_id = get_named_parameter(event, "booking_id")
        if booking_id:
            response = str(get_booking_details(booking_id))
            responseBody = {'TEXT': {'body': json.dumps(response)}}
        else:
            responseBody = {'TEXT': {'body': 'Missing booking_id parameter'}}

    elif function == 'create_booking':
        date = get_named_parameter(event, "date")
        name = get_named_parameter(event, "name")
        hour = get_named_parameter(event, "hour")
        num_guests = get_named_parameter(event, "num_guests")

        if date and name and hour and num_guests:
            response = str(create_booking(date, name, hour, num_guests))
            responseBody = {'TEXT': {'body': json.dumps(response)}}
        else:
            responseBody = {'TEXT': {'body': 'Missing required parameters'}}

    elif function == 'delete_booking':
        booking_id = get_named_parameter(event, "booking_id")
        if booking_id:
            response = str(delete_booking(booking_id))
            responseBody = {'TEXT': {'body': json.dumps(response)}}
        else:
            responseBody = {'TEXT': {'body': 'Missing booking_id parameter'}}

    else:
        responseBody = {'TEXT': {'body': 'Invalid function'}}

    action_response = {
        'actionGroup': actionGroup,
        'function': function,
        'functionResponse': {
            'responseBody': responseBody
        }
    }

    function_response = {'response': action_response, 'messageVersion': event['messageVersion']}
    print("Response: {}".format(function_response))

    return function_response
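The handler above reads `actionGroup`, `function`, `parameters`, and `messageVersion` from the incoming event. Note that `get_named_parameter` calls `next()` without a default, so a missing parameter raises `StopIteration` before the "Missing ... parameter" branches can run. The sketch below shows a tolerant lookup on a hand-written sample event. `get_named_parameter_or_default` is a hypothetical variant, and the event values are made up.

```python
def get_named_parameter_or_default(event, name, default=None):
    """Like get_named_parameter, but return `default` when `name` is absent."""
    return next(
        (item["value"] for item in event.get("parameters", []) if item["name"] == name),
        default,
    )


# Hand-written sample event with only the fields the handler reads
sample_event = {
    "messageVersion": "1.0",
    "actionGroup": "TableBookingsActionGroup",
    "function": "get_booking_details",
    "parameters": [{"name": "booking_id", "type": "string", "value": "ca2fab70"}],
}

found = get_named_parameter_or_default(sample_event, "booking_id")
missing = get_named_parameter_or_default(sample_event, "num_guests")
print(found, missing)
```

A sample event like this can also be used to exercise `lambda_handler` locally, provided the DynamoDB table is reachable or mocked.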

Create the required permissions

lambda_iam_role = create_lambda_role(agent_name, table_name)

Create the function

lambda_function_name = f"{agent_name}-lambda"
lambda_function = create_lambda(lambda_function_name, lambda_iam_role)

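Create the IAM policies needed for the agent

Now that we have created the knowledge base, the DynamoDB table, and the Lambda function that executes the agent's tasks, let's start creating our agent.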

agent_role = create_agent_role_and_policies(
    agent_name, agent_foundation_model, kb_id=kb_id
)

Create the agent

Now that we have created the necessary IAM role, we can use the boto3 create_agent API to create a new agent.

response = bedrock_agent_client.create_agent(
    agentName=agent_name,
    agentResourceRoleArn=agent_role["Role"]["Arn"],
    description=agent_description,
    idleSessionTTLInSeconds=1800,
    foundationModel=agent_foundation_model,
    instruction=agent_instruction,
)

Let's get the agent ID. It is needed for every subsequent operation on our agent.

agent_id = response["agent"]["agentId"]
print("The agent id is:", agent_id)

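Create the agent action group

We will now create an agent action group that uses the Lambda function created earlier. To inform the agent about what the action group can do, we provide a description that outlines its functionality.

To define the functions using a function schema, you need to provide a name, a description, and the parameters for each function.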

agent_functions = [
    {
        "name": "get_booking_details",
        "description": "Retrieve details of a restaurant booking",
        "parameters": {
            "booking_id": {
                "description": "The ID of the booking to retrieve",
                "required": True,
                "type": "string",
            }
        },
    },
    {
        "name": "create_booking",
        "description": "Create a new restaurant booking",
        "parameters": {
            "date": {
                "description": "The date of the booking",
                "required": True,
                "type": "string",
            },
            "name": {
                "description": "Name to identify your reservation",
                "required": True,
                "type": "string",
            },
            "hour": {
                "description": "The hour of the booking",
                "required": True,
                "type": "string",
            },
            "num_guests": {
                "description": "The number of guests for the booking",
                "required": True,
                "type": "integer",
            },
        },
    },
    {
        "name": "delete_booking",
        "description": "Delete an existing restaurant booking",
        "parameters": {
            "booking_id": {
                "description": "The ID of the booking to delete",
                "required": True,
                "type": "string",
            }
        },
    },
]
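Since each function needs a name, a description, and typed parameters, a quick local check can catch schema slips before the action group is created. The following is a minimal sketch, not an official validator: `check_function_schema` is a hypothetical helper, and the sample list includes a deliberately broken entry for illustration.

```python
# Parameter types accepted in a Bedrock function schema
ALLOWED_TYPES = {"string", "number", "integer", "boolean", "array"}


def check_function_schema(functions):
    """Return a list of human-readable problems; empty means no issue found."""
    problems = []
    for fn in functions:
        label = fn.get("name", "<unnamed>")
        for key in ("name", "description", "parameters"):
            if key not in fn:
                problems.append(f"{label}: missing '{key}'")
        for param, spec in fn.get("parameters", {}).items():
            if spec.get("type") not in ALLOWED_TYPES:
                problems.append(f"{label}.{param}: unsupported type {spec.get('type')!r}")
            if not isinstance(spec.get("required"), bool):
                problems.append(f"{label}.{param}: 'required' should be a bool")
    return problems


sample_functions = [
    {
        "name": "delete_booking",
        "description": "Delete an existing restaurant booking",
        "parameters": {
            "booking_id": {
                "description": "The ID of the booking to delete",
                "required": True,
                "type": "string",
            }
        },
    },
    {
        # Deliberately broken: no description, bad type, non-boolean 'required'
        "name": "broken_example",
        "parameters": {
            "num_guests": {"description": "Guests", "required": "yes", "type": "int"}
        },
    },
]

problems = check_function_schema(sample_functions)
for problem in problems:
    print(problem)
```

Calling `check_function_schema(agent_functions)` on the list defined above should return an empty list.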

We now use the function schema and the create_agent_action_group API to create the agent action group.

# Pause to make sure agent is created
time.sleep(30)

# Now, we can configure and create an action group here:
agent_action_group_response = bedrock_agent_client.create_agent_action_group(
    agentId=agent_id,
    agentVersion="DRAFT",
    actionGroupExecutor={"lambda": lambda_function["FunctionArn"]},
    actionGroupName=agent_action_group_name,
    functionSchema={"functions": agent_functions},
    description=agent_action_group_description,
)

Allow the agent to invoke the action group Lambda

# Create allow to invoke permission on lambda
lambda_client = boto3.client("lambda")
response = lambda_client.add_permission(
    FunctionName=lambda_function_name,
    StatementId="allow_bedrock",
    Action="lambda:InvokeFunction",
    Principal="bedrock.amazonaws.com",
    SourceArn=f"arn:aws:bedrock:{region}:{account_id}:agent/{agent_id}",
)

Associate the knowledge base with the agent

response = bedrock_agent_client.associate_agent_knowledge_base(
    agentId=agent_id,
    agentVersion="DRAFT",
    description="Access the knowledge base when customers ask about the plates in the menu.",
    knowledgeBaseId=kb_id,
    knowledgeBaseState="ENABLED",
)

Prepare the agent and create an alias

Let's create a DRAFT version of the agent that can be used for internal testing.

response = bedrock_agent_client.prepare_agent(agentId=agent_id)
print(response)
# Pause to make sure agent is prepared
time.sleep(30)

response = bedrock_agent_client.create_agent_alias(
    agentAliasName="TestAlias",
    agentId=agent_id,
    description="Test alias",
)

alias_id = response["agentAlias"]["agentAliasId"]
print("The Agent alias is:", alias_id)
time.sleep(30)
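The fixed `time.sleep(30)` pauses above assume the agent is ready within 30 seconds. A polling loop is a more robust alternative. The sketch below is generic: `wait_for_status` is a hypothetical helper that takes any zero-argument status callable, and a fake status sequence stands in for the real API so the block runs without AWS access.

```python
import time


def wait_for_status(get_status, target, failed=("FAILED",), timeout=120.0, interval=5.0):
    """Poll `get_status()` until it returns `target`.

    Raises RuntimeError on a failure status and TimeoutError when `timeout`
    seconds pass without reaching `target`.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == target:
            return status
        if status in failed:
            raise RuntimeError(f"resource entered status {status}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status} after {timeout} seconds")
        time.sleep(interval)


# Stand-in for a real status call, so the sketch runs offline
fake_statuses = iter(["PREPARING", "PREPARING", "PREPARED"])
final_status = wait_for_status(lambda: next(fake_statuses), "PREPARED", interval=0.0)
print(final_status)
```

With the real client, the status callable could be `lambda: bedrock_agent_client.get_agent(agentId=agent_id)["agent"]["agentStatus"]`, waiting for `"PREPARED"` after `prepare_agent`.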

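The invokeAgent function sends a user query to the Bedrock agent and returns both the agent's response and the trace data. It processes the event stream and captures trace information for evaluation purposes.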

def invokeAgent(query, session_id, session_state=None):
    # Use a fresh dict per call instead of a shared mutable default argument
    session_state = session_state or {}
    end_session: bool = False

    # invoke the agent API
    agentResponse = bedrock_agent_runtime_client.invoke_agent(
        inputText=query,
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=session_id,
        enableTrace=True,
        endSession=end_session,
        sessionState=session_state,
    )

    event_stream = agentResponse["completion"]
    agent_answer = ""
    traces = []
    try:
        for event in event_stream:
            if "chunk" in event:
                # The answer may arrive in several chunks; concatenate them
                agent_answer += event["chunk"]["bytes"].decode("utf8")
            elif "trace" in event:
                # Keep every trace event for the later Ragas conversion
                traces.append(event["trace"])
            else:
                raise Exception("unexpected event.", event)
        return agent_answer, traces
    except Exception as e:
        raise Exception("unexpected event.", e)

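Defining the Ragas metrics

Evaluating agents is different from testing traditional software, where you can simply verify whether the output matches the expected result. These agents perform complex tasks that often have several valid solutions.

Given their inherent autonomy, evaluating agents is essential to ensure they function properly.

Choosing what to evaluate in your agent

Selecting evaluation metrics depends entirely on your use case. A good rule of thumb is to select metrics directly tied to user needs or metrics that clearly drive business value. In the restaurant agent example above, we want the agent to fulfill user requests without unnecessary repetition, provide helpful recommendations when appropriate to enhance the customer experience, and stay consistent with the brand tone.

We will define metrics to evaluate these priorities. Ragas provides several user-defined metrics for evaluation.

When defining evaluation criteria, focus on binary decisions or discrete classification scores rather than ambiguous scores. Binary or clear classifications force you to define success criteria explicitly. Avoid metrics that yield scores between 0 and 100 without a clear interpretation, because distinguishing between close scores such as 87 and 91 can be difficult, especially when evaluations are made independently.

Ragas includes metrics suited to this kind of evaluation, and we will explore some of them in action:

Aspect Critic metric (AspectCritic): evaluates whether a submission follows user-defined criteria by using LLM judgments to yield a binary outcome.
Rubric Score metric (RubricsScore): assesses responses against detailed, user-defined rubrics to consistently assign scores that reflect quality.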

from langchain_aws import ChatBedrock
from ragas.llms import LangchainLLMWrapper

model_id = "us.amazon.nova-pro-v1:0"   # Choose your desired model
region_name = "us-east-1"              # Choose your desired AWS region

bedrock_llm = ChatBedrock(model_id=model_id, region_name=region_name)
evaluator_llm = LangchainLLMWrapper(bedrock_llm)

from ragas.metrics import AspectCritic, RubricsScore
from ragas.dataset_schema import SingleTurnSample, MultiTurnSample, EvaluationDataset
from ragas import evaluate

rubrics = {
    "score-1_description": (
        "The item requested by the customer is not present in the menu and no recommendations were made."
    ),
    "score0_description": (
        "Either the item requested by the customer is present in the menu, or the conversation does not include any food or menu inquiry (e.g., booking, cancellation). This score applies regardless of whether any recommendation was provided."
    ),
    "score1_description": (
        "The item requested by the customer is not present in the menu and a recommendation was provided."
    ),
}

recommendations = RubricsScore(rubrics=rubrics, llm=evaluator_llm, name="Recommendations")
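The rubric above is applied by the evaluator LLM, but its three cases can also be written down as a small deterministic function, which can help when hand-labelling the scores you expect for test conversations. This is only a sketch of the rubric's logic under that assumption: `expected_recommendation_score` and its boolean inputs are hypothetical and are not used by Ragas.

```python
def expected_recommendation_score(is_menu_inquiry, item_on_menu, recommendation_made):
    """Mirror the three rubric cases as a deterministic rule.

     0: item is on the menu, or the conversation is not a menu inquiry
     1: item is not on the menu and a recommendation was provided
    -1: item is not on the menu and no recommendation was made
    """
    if not is_menu_inquiry or item_on_menu:
        return 0
    return 1 if recommendation_made else -1


# Hand-labelled cases (inputs are illustrative)
cases = {
    "item available": expected_recommendation_score(True, True, False),
    "unavailable, alternatives offered": expected_recommendation_score(True, False, True),
    "unavailable, nothing offered": expected_recommendation_score(True, False, False),
    "booking only": expected_recommendation_score(False, False, False),
}
for label, score in cases.items():
    print(f"{label}: {score}")
```

Comparing such hand labels with the scores the LLM judge returns is one way to spot-check the rubric wording.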


# Metric to evaluate if the AI fulfills all human requests completely.
request_completeness = AspectCritic(
    name="Request Completeness",
    llm=evaluator_llm,
    definition=(
        "Return 1 if the agent completely fulfills all the user requests with no omissions; "
        "otherwise, return 0."
    ),
)

# Metric to assess if the AI's communication aligns with the desired brand voice.
brand_tone = AspectCritic(
    name="Brand Voice Metric",
    llm=evaluator_llm,
    definition=(
        "Return 1 if the AI's communication is friendly, approachable, helpful, clear, and concise; "
        "otherwise, return 0."
    ),
)

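Evaluating the agent with Ragas

To evaluate with Ragas, the traces must first be converted into a format that Ragas recognizes. For this, Ragas provides the function convert_to_ragas_messages (ragas.integrations.amazon_bedrock.convert_to_ragas_messages), which converts Amazon Bedrock agent traces into the message format Ragas expects. You can learn more about it here.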

%%time
import uuid
session_id:str = str(uuid.uuid1())
query = "If you have children food then book a table for 2 people at 7pm on the 5th of May 2025."
agent_answer, traces_1 = invokeAgent(query, session_id)

print(agent_answer)

Output

Your booking for 2 people at 7pm on the 5th of May 2025 has been successfully created. Your booking ID is ca2fab70.

query = "Can you check my previous booking? Can you please delete the booking?"
agent_answer, traces_2 = invokeAgent(query, session_id)

print(agent_answer)

Output

Your reservation was found and has been successfully canceled.

from ragas.integrations.amazon_bedrock import convert_to_ragas_messages

# Convert Amazon Bedrock traces to messages accepted by RAGAS.
# The convert_to_ragas_messages function transforms Bedrock-specific trace data 
# into a format that RAGAS can process as conversation messages.
ragas_messages_trace_1 = convert_to_ragas_messages(traces_1)
ragas_messages_trace_2 = convert_to_ragas_messages(traces_2)

# Initialize MultiTurnSample objects.
# MultiTurnSample is a data type defined in RAGAS that encapsulates conversation
# data for multi-turn evaluation. This conversion is necessary to perform evaluations.
sample_1 = MultiTurnSample(user_input=ragas_messages_trace_1)
sample_2 = MultiTurnSample(user_input=ragas_messages_trace_2)

result = evaluate(
    # Create an evaluation dataset from the multi-turn samples
    dataset=EvaluationDataset(samples=[sample_1, sample_2]),
    metrics=[request_completeness, brand_tone],
)

result.to_pandas()

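Output

Evaluating: 100%|██████████| 4/4 [00:00<?, ?it/s]

	user_input	Request Completeness	Brand Voice Metric
0	[{'content': '[{text=If you have children food...	1	1
1	[{'content': '[{text=If you have children food...	1	1

Both samples score 1 because the agent fully met every user request with no omissions (completeness) and communicated in a friendly, approachable, helpful, clear, and concise way (brand voice) in both conversations.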

%%time
import uuid

session_id:str = str(uuid.uuid1())
query = "Do you serve Chicken Wings?"

agent_answer, traces_3 = invokeAgent(query, session_id)
print(agent_answer)

Output

Yes, we serve Chicken Wings. Here are the details:
- **Buffalo Chicken Wings**: Classic buffalo wings served with celery sticks and blue cheese dressing. Allergens: Dairy (in blue cheese dressing), Gluten (in the coating), possible Soy (in the sauce).

%%time
session_id:str = str(uuid.uuid1())
query = "For desserts, do you have chocolate truffle cake?"
agent_answer, traces_4 = invokeAgent(query, session_id)
print(agent_answer)

Output

I'm sorry, but we do not have chocolate truffle cake on our dessert menu. However, we have several delicious alternatives you might enjoy:

1. **Classic New York Cheesecake** - Creamy cheesecake with a graham cracker crust, topped with a choice of fruit compote or chocolate ganache.
2. **Apple Pie à la Mode** - Warm apple pie with a flaky crust, served with a scoop of vanilla ice cream and a drizzle of caramel sauce.
3. **Chocolate Lava Cake** - Rich and gooey chocolate cake with a molten center, dusted with powdered sugar and served with a scoop of raspberry sorbet.
4. **Pecan Pie Bars** - Buttery shortbread crust topped with a gooey pecan filling, cut into bars for easy serving.
5. **Banana Pudding Parfait** - Layers of vanilla pudding, sliced bananas, and vanilla wafers, topped with whipped cream and a sprinkle of crushed nuts.

May I recommend the **Chocolate Lava Cake** for a decadent treat?

%%time
from datetime import datetime
today = datetime.today().strftime('%b-%d-%Y')

session_id:str = str(uuid.uuid1())
query = "Do you have indian food?"
session_state = {
    "promptSessionAttributes": {
        "name": "John",
        "today": today
    }
}

agent_answer, traces_5 = invokeAgent(query, session_id, session_state=session_state)
print(agent_answer)

Output

I could not find Indian food on our menu. However, we offer a variety of other cuisines including American, Italian, and vegetarian options. Would you like to know more about these options?

from ragas.integrations.amazon_bedrock import convert_to_ragas_messages

ragas_messages_trace_3 = convert_to_ragas_messages(traces_3)
ragas_messages_trace_4 = convert_to_ragas_messages(traces_4)
ragas_messages_trace_5 = convert_to_ragas_messages(traces_5)

sample_3 = MultiTurnSample(user_input=ragas_messages_trace_3)
sample_4 = MultiTurnSample(user_input=ragas_messages_trace_4)
sample_5 = MultiTurnSample(user_input=ragas_messages_trace_5)

result = evaluate(
    dataset=EvaluationDataset(samples=[sample_3, sample_4, sample_5]),
    metrics=[recommendations],
)

result.to_pandas()

Evaluating: 100%|██████████| 3/3 [00:00<?, ?it/s]

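	user_input	Recommendations
0	[{'content': '[{text=Do you serve Chicken Wing...	0
1	[{'content': '[{text=For desserts, do you have...	1
2	[{'content': '[{text=Do you have indian food?}...	1

For the Recommendations metric, the chicken wings query scored 0 because the item was available. The chocolate truffle cake and Indian food queries both scored 1 because the requested items were not on the menu and alternative recommendations were provided.

To evaluate how well our agent uses the information retrieved from the knowledge base, we use the RAG evaluation metrics provided by Ragas. You can learn more about these metrics here.

In this tutorial, we will use the following RAG metrics:

ContextRelevance: measures how well the retrieved contexts align with the user's query by evaluating their pertinence through dual LLM judgments.
Faithfulness: assesses the response's factual consistency by determining whether all of its claims can be supported by the provided retrieved contexts.
ResponseGroundedness: determines the extent to which each claim in the response is directly supported by, or "grounded" in, the provided contexts.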
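To make the Faithfulness definition concrete: the score is the fraction of claims in the response that the retrieved contexts support. In Ragas, an LLM extracts the claims and judges each one. The sketch below covers only the final arithmetic, using hand-assigned verdicts; `faithfulness_ratio` is a hypothetical name and the verdicts are made up.

```python
def faithfulness_ratio(claim_verdicts):
    """Fraction of claims judged as supported by the retrieved contexts.

    `claim_verdicts` holds one boolean per claim found in the response.
    """
    if not claim_verdicts:
        raise ValueError("at least one claim is required")
    supported = sum(1 for verdict in claim_verdicts if verdict)
    return supported / len(claim_verdicts)


# Made-up verdicts: four claims, one of them not backed by the contexts
verdicts = [True, True, True, False]
score = faithfulness_ratio(verdicts)
print(score)
```

A score below 1.0 therefore means that at least one statement in the response could not be traced back to the retrieved menu text.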

from ragas.metrics import ContextRelevance, Faithfulness,  ResponseGroundedness

metrics = [
    ContextRelevance(llm=evaluator_llm),
    Faithfulness(llm=evaluator_llm),
    ResponseGroundedness(llm=evaluator_llm),
]

from ragas.integrations.amazon_bedrock import extract_kb_trace

kb_trace_3 = extract_kb_trace(traces_3)
kb_trace_4 = extract_kb_trace(traces_4)

trace_3_single_turn_sample = SingleTurnSample(
    user_input=kb_trace_3[0].get("user_input"),
    retrieved_contexts=kb_trace_3[0].get("retrieved_contexts"),
    response=kb_trace_3[0].get("response"),
    reference="Yes, we do serve chicken wings prepared in Buffalo style, chicken wing that’s typically deep-fried and then tossed in a tangy, spicy Buffalo sauce.",
)

trace_4_single_turn_sample = SingleTurnSample(
    user_input=kb_trace_4[0].get("user_input"),
    retrieved_contexts=kb_trace_4[0].get("retrieved_contexts"),
    response=kb_trace_4[0].get("response"),
    reference="The desserts on the adult menu are:\n1. Classic New York Cheesecake\n2. Apple Pie à la Mode\n3. Chocolate Lava Cake\n4. Pecan Pie Bars\n5. Banana Pudding Parfait",
)

single_turn_samples = [trace_3_single_turn_sample, trace_4_single_turn_sample]

dataset = EvaluationDataset(samples=single_turn_samples)

kb_results = evaluate(dataset=dataset, metrics=metrics)
kb_results.to_pandas()

Evaluating: 100%|██████████| 6/6 [00:00<?, ?it/s]

	user_input	retrieved_contexts	response	reference	nv_context_relevance	faithfulness	nv_response_groundedness
0	Chicken Wings	[The Regrettable Experience -- Dinner Menu Ent...	Yes, we serve Chicken Wings. Here are the deta...	Yes, we do serve chicken wings prepared in Buf...	1.0	1.00	1.0
1	chocolate truffle cake	[Allergens: Gluten (in the breading). 3. B...	I'm sorry, but we do not have chocolate truffl...	The desserts on the adult menu are:\n1. Classi...	0.0	0.75	0.5


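To evaluate whether the agent is able to achieve its goal, we can use the following metrics:

AgentGoalAccuracyWithReference: determines whether the AI achieved the user's goal by comparing its final outcome against an annotated ideal outcome, yielding a binary result.
AgentGoalAccuracyWithoutReference: infers whether the AI met the user's goal solely from the conversational interaction, providing a binary success indicator without an explicit reference.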

from ragas.metrics import (
    AgentGoalAccuracyWithoutReference,
    AgentGoalAccuracyWithReference,
)

goal_accuracy_with_reference = AgentGoalAccuracyWithReference(llm=evaluator_llm)
goal_accuracy_without_reference = AgentGoalAccuracyWithoutReference(llm=evaluator_llm)

%%time
import uuid

session_id:str = str(uuid.uuid1())
query = "What entrees do you have for children?"

agent_answer, traces_6 = invokeAgent(query, session_id)
print(agent_answer)

Output

Here are the entrees available for children:
1. CHICKEN NUGGETS - Crispy chicken nuggets served with a side of ketchup or ranch dressing. Allergens: Gluten (in the coating), possible Soy. Suitable for Vegetarians: No
2. MACARONI AND CHEESE - Classic macaroni pasta smothered in creamy cheese sauce. Allergens: Dairy, Gluten. Suitable for Vegetarians: Yes
3. MINI CHEESE QUESADILLAS - Small flour tortillas filled with melted cheese, served with a mild salsa. Allergens: Dairy, Gluten. Suitable for Vegetarians: Yes
4. PEANUT BUTTER AND BANANA SANDWICH - Peanut butter and banana slices on whole wheat bread. Allergens: Nuts (peanut), Gluten. Suitable for Vegetarians: Yes (if using vegetarian peanut butter)
5. VEGGIE PITA POCKETS - Mini whole wheat pita pockets filled with hummus, cucumber, and cherry tomatoes. Allergens: Gluten, possible Soy. Suitable for Vegetarians: Yes

from ragas.integrations.amazon_bedrock import convert_to_ragas_messages

ragas_messages_trace_6 = convert_to_ragas_messages(traces_6)

sample_6 = MultiTurnSample(
    user_input=ragas_messages_trace_6,
    reference="Response contains entrees food items for the children.",
)

result = evaluate(
    dataset=EvaluationDataset(samples=[sample_6]),
    metrics=[goal_accuracy_with_reference],
)

result.to_pandas()

Evaluating: 100%|██████████| 1/1 [00:00<?, ?it/s]

	user_input	reference	agent_goal_accuracy
0	[{'content': '[{text=What entrees do you have ...	The final outcome provides child-friendly entr...	1.0

sample_6 = MultiTurnSample(user_input=ragas_messages_trace_6)

result = evaluate(
    dataset=EvaluationDataset(samples=[sample_6]),
    metrics=[goal_accuracy_without_reference],
)

result.to_pandas()

Evaluating: 100%|██████████| 1/1 [00:00<?, ?it/s]

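	user_input	agent_goal_accuracy
0	[{'content': '[{text=What entrees do you have ...	1.0

In both scenarios, the agent earned a score of 1 by comprehensively providing all available options, specifically by listing every children's entree.

Clean-up

Let's delete all the associated resources we created, to avoid unnecessary costs.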

clean_up_resources(
    table_name,
    lambda_function,
    lambda_function_name,
    agent_action_group_response,
    agent_functions,
    agent_id,
    kb_id,
    alias_id,
)

# Delete the agent roles and policies
delete_agent_roles_and_policies(agent_name)

# delete KB
knowledge_base.delete_kb(delete_s3_bucket=True, delete_iam_roles_and_policies=True)