
Transforms

BaseGraphTransformation dataclass

BaseGraphTransformation(name: str = '', filter_nodes: Callable[[Node], bool] = (lambda: default_filter)())

Bases: ABC

Abstract base class for graph transformations on a KnowledgeGraph.

transform abstractmethod async

transform(kg: KnowledgeGraph) -> Any

Abstract method to transform the KnowledgeGraph. Transformations should be idempotent, meaning that applying the transformation multiple times should yield the same result as applying it once.

Parameters

kg : KnowledgeGraph
    The knowledge graph to be transformed. Required.

Returns

Any
    The transformed knowledge graph.

Source code in src/ragas/testset/transforms/base.py
@abstractmethod
async def transform(self, kg: KnowledgeGraph) -> t.Any:
    """
    Abstract method to transform the KnowledgeGraph. Transformations should be
    idempotent, meaning that applying the transformation multiple times should
    yield the same result as applying it once.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.Any
        The transformed knowledge graph.
    """
    pass
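The idempotency contract can be illustrated with a self-contained sketch; `ToyNode` and `WordCountTransform` below are hypothetical stand-ins, not ragas classes:

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class ToyNode:
    properties: dict = field(default_factory=dict)

class WordCountTransform:
    """Idempotent: the written value depends only on existing node state."""
    async def transform(self, nodes):
        for node in nodes:
            text = node.properties.get("text", "")
            node.properties["word_count"] = len(text.split())
        return nodes

nodes = [ToyNode({"text": "hello knowledge graph"})]
tf = WordCountTransform()
asyncio.run(tf.transform(nodes))
once = [dict(n.properties) for n in nodes]
asyncio.run(tf.transform(nodes))  # second application changes nothing
twice = [dict(n.properties) for n in nodes]
assert once == twice
```

Writing a property derived purely from existing state is idempotent; a transform that, say, appended to a list property on every call would not be.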

filter

Filters the KnowledgeGraph and returns the filtered graph.

Parameters

kg : KnowledgeGraph
    The knowledge graph to be filtered. Required.

Returns

KnowledgeGraph
    The filtered knowledge graph.

Source code in src/ragas/testset/transforms/base.py
def filter(self, kg: KnowledgeGraph) -> KnowledgeGraph:
    """
    Filters the KnowledgeGraph and returns the filtered graph.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be filtered.

    Returns
    -------
    KnowledgeGraph
        The filtered knowledge graph.
    """
    logger.debug("Filtering KnowledgeGraph with %s", self.filter_nodes.__name__)
    filtered_nodes = [node for node in kg.nodes if self.filter_nodes(node)]
    node_ids = {node.id for node in filtered_nodes}
    filtered_relationships = [
        rel
        for rel in kg.relationships
        if (rel.source.id in node_ids) and (rel.target.id in node_ids)
    ]
    logger.debug(
        "Filter reduced KnowledgeGraph by %d/%d nodes and %d/%d relationships",
        len(kg.nodes) - len(filtered_nodes),
        len(kg.nodes),
        len(kg.relationships) - len(filtered_relationships),
        len(kg.relationships),
    )
    return KnowledgeGraph(
        nodes=filtered_nodes,
        relationships=filtered_relationships,
    )
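The same node-then-relationship filtering can be sketched without ragas; plain tuples stand in for Node and Relationship here:

```python
# Nodes are (id, n_tokens); relationships are (source_id, target_id).
nodes = [(1, 50), (2, 800), (3, 1200)]
relationships = [(1, 2), (2, 3), (1, 3)]

keep = lambda node: node[1] > 500  # predicate playing the role of filter_nodes

filtered_nodes = [n for n in nodes if keep(n)]
node_ids = {n[0] for n in filtered_nodes}
# A relationship survives only if BOTH of its endpoints survive.
filtered_rels = [(s, t) for (s, t) in relationships
                 if s in node_ids and t in node_ids]

assert filtered_nodes == [(2, 800), (3, 1200)]
assert filtered_rels == [(2, 3)]
```

Dropping dangling relationships alongside their removed endpoints is what keeps the filtered graph internally consistent.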

generate_execution_plan abstractmethod

generate_execution_plan(kg: KnowledgeGraph) -> Sequence[Coroutine]

Generates a sequence of coroutines to be executed in sequence by the Executor. Upon execution, these coroutines write the transformation into the KnowledgeGraph.

Parameters

kg : KnowledgeGraph
    The knowledge graph to be transformed. Required.

Returns

Sequence[Coroutine]
    A sequence of coroutines to be executed in parallel.

Source code in src/ragas/testset/transforms/base.py
@abstractmethod
def generate_execution_plan(self, kg: KnowledgeGraph) -> t.Sequence[t.Coroutine]:
    """
    Generates a sequence of coroutines to be executed in sequence by the Executor. This
    coroutine will, upon execution, write the transformation into the KnowledgeGraph.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.Sequence[t.Coroutine]
        A sequence of coroutines to be executed in parallel.
    """
    pass

Extractor dataclass

Extractor(name: str = '', filter_nodes: Callable[[Node], bool] = (lambda: default_filter)())

Bases: BaseGraphTransformation

Abstract base class for extractors that transform a KnowledgeGraph by extracting specific properties from its nodes.

Methods

transform
    Transforms the KnowledgeGraph by extracting properties from its nodes.

extract
    Abstract method to extract a specific property from a node.

transform async

transform(kg: KnowledgeGraph) -> List[Tuple[Node, Tuple[str, Any]]]

Transforms the KnowledgeGraph by extracting properties from its nodes. Uses the filter method to filter the graph and the extract method to extract properties from each node.

Parameters

kg : KnowledgeGraph
    The knowledge graph to be transformed. Required.

Returns

List[Tuple[Node, Tuple[str, Any]]]
    A list of tuples where each tuple contains a node and the extracted property.

Examples

>>> kg = KnowledgeGraph(nodes=[Node(id=1, properties={"name": "Node1"}), Node(id=2, properties={"name": "Node2"})])
>>> extractor = SomeConcreteExtractor()
>>> extractor.transform(kg)
[(Node(id=1, properties={"name": "Node1"}), ("property_name", "extracted_value")),
 (Node(id=2, properties={"name": "Node2"}), ("property_name", "extracted_value"))]
Source code in src/ragas/testset/transforms/base.py
async def transform(
    self, kg: KnowledgeGraph
) -> t.List[t.Tuple[Node, t.Tuple[str, t.Any]]]:
    """
    Transforms the KnowledgeGraph by extracting properties from its nodes. Uses
    the `filter` method to filter the graph and the `extract` method to extract
    properties from each node.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.List[t.Tuple[Node, t.Tuple[str, t.Any]]]
        A list of tuples where each tuple contains a node and the extracted
        property.

    Examples
    --------
    >>> kg = KnowledgeGraph(nodes=[Node(id=1, properties={"name": "Node1"}), Node(id=2, properties={"name": "Node2"})])
    >>> extractor = SomeConcreteExtractor()
    >>> extractor.transform(kg)
    [(Node(id=1, properties={"name": "Node1"}), ("property_name", "extracted_value")),
     (Node(id=2, properties={"name": "Node2"}), ("property_name", "extracted_value"))]
    """
    filtered = self.filter(kg)
    return [(node, await self.extract(node)) for node in filtered.nodes]

extract abstractmethod async

extract(node: Node) -> Tuple[str, Any]

Abstract method to extract a specific property from a node.

Parameters

node : Node
    The node from which to extract the property. Required.

Returns

Tuple[str, Any]
    A tuple containing the property name and the extracted value.

Source code in src/ragas/testset/transforms/base.py
@abstractmethod
async def extract(self, node: Node) -> t.Tuple[str, t.Any]:
    """
    Abstract method to extract a specific property from a node.

    Parameters
    ----------
    node : Node
        The node from which to extract the property.

    Returns
    -------
    t.Tuple[str, t.Any]
        A tuple containing the property name and the extracted value.
    """
    pass
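A minimal concrete extractor might look like the sketch below; `WordCountExtractor` is hypothetical (real ragas extractors typically call an LLM or embedding model), and only the `(property_name, value)` return shape mirrors the contract above:

```python
import asyncio
from dataclasses import dataclass, field

@dataclass
class ToyNode:
    properties: dict = field(default_factory=dict)
    def get_property(self, name):
        return self.properties.get(name)

class WordCountExtractor:
    property_name = "word_count"
    async def extract(self, node):
        # Return the property name together with the extracted value.
        text = node.get_property("page_content") or ""
        return self.property_name, len(text.split())

node = ToyNode({"page_content": "ragas builds knowledge graphs"})
name, value = asyncio.run(WordCountExtractor().extract(node))
assert (name, value) == ("word_count", 4)
```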

generate_execution_plan

generate_execution_plan(kg: KnowledgeGraph) -> Sequence[Coroutine]

Generates a sequence of coroutines to be executed in parallel by the Executor.

Parameters

kg : KnowledgeGraph
    The knowledge graph to be transformed. Required.

Returns

Sequence[Coroutine]
    A sequence of coroutines to be executed in parallel.

Source code in src/ragas/testset/transforms/base.py
def generate_execution_plan(self, kg: KnowledgeGraph) -> t.Sequence[t.Coroutine]:
    """
    Generates a sequence of coroutines to be executed in parallel by the Executor.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.Sequence[t.Coroutine]
        A sequence of coroutines to be executed in parallel.
    """

    async def apply_extract(node: Node):
        property_name, property_value = await self.extract(node)
        if node.get_property(property_name) is None:
            node.add_property(property_name, property_value)
        else:
            logger.warning(
                "Property '%s' already exists in node '%.6s'. Skipping!",
                property_name,
                node.id,
            )

    filtered = self.filter(kg)
    plan = [apply_extract(node) for node in filtered.nodes]
    logger.debug(
        "Created %d coroutines for %s",
        len(plan),
        self.__class__.__name__,
    )
    return plan

NodeFilter dataclass

NodeFilter(name: str = '', filter_nodes: Callable[[Node], bool] = (lambda: default_filter)())

Bases: BaseGraphTransformation

custom_filter abstractmethod async

custom_filter(node: Node, kg: KnowledgeGraph) -> bool

Abstract method to filter a node based on a prompt.

Parameters

node : Node
    The node to be filtered. Required.

Returns

bool
    A boolean indicating whether the node should be filtered.

Source code in src/ragas/testset/transforms/base.py
@abstractmethod
async def custom_filter(self, node: Node, kg: KnowledgeGraph) -> bool:
    """
    Abstract method to filter a node based on a prompt.

    Parameters
    ----------
    node : Node
        The node to be filtered.

    Returns
    -------
    bool
        A boolean indicating whether the node should be filtered.
    """
    pass

generate_execution_plan

generate_execution_plan(kg: KnowledgeGraph) -> Sequence[Coroutine]

Generates a sequence of coroutines to be executed.

Source code in src/ragas/testset/transforms/base.py
def generate_execution_plan(self, kg: KnowledgeGraph) -> t.Sequence[t.Coroutine]:
    """
    Generates a sequence of coroutines to be executed
    """

    async def apply_filter(node: Node):
        if await self.custom_filter(node, kg):
            kg.remove_node(node)

    filtered = self.filter(kg)
    plan = [apply_filter(node) for node in filtered.nodes]
    logger.debug(
        "Created %d coroutines for %s",
        len(plan),
        self.__class__.__name__,
    )
    return plan

RelationshipBuilder dataclass

RelationshipBuilder(name: str = '', filter_nodes: Callable[[Node], bool] = (lambda: default_filter)())

Bases: BaseGraphTransformation

Abstract base class for building relationships in a KnowledgeGraph.

Methods

transform
    Transforms the KnowledgeGraph by building relationships.

transform abstractmethod async

transform(kg: KnowledgeGraph) -> List[Relationship]

Transforms the KnowledgeGraph by building relationships.

Parameters

kg : KnowledgeGraph
    The knowledge graph to be transformed. Required.

Returns

List[Relationship]
    A list of new relationships.

Source code in src/ragas/testset/transforms/base.py
@abstractmethod
async def transform(self, kg: KnowledgeGraph) -> t.List[Relationship]:
    """
    Transforms the KnowledgeGraph by building relationships.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.List[Relationship]
        A list of new relationships.
    """
    pass

generate_execution_plan

generate_execution_plan(kg: KnowledgeGraph) -> Sequence[Coroutine]

Generates a sequence of coroutines to be executed in parallel by the Executor.

Parameters

kg : KnowledgeGraph
    The knowledge graph to be transformed. Required.

Returns

Sequence[Coroutine]
    A sequence of coroutines to be executed in parallel.

Source code in src/ragas/testset/transforms/base.py
def generate_execution_plan(self, kg: KnowledgeGraph) -> t.Sequence[t.Coroutine]:
    """
    Generates a sequence of coroutines to be executed in parallel by the Executor.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.Sequence[t.Coroutine]
        A sequence of coroutines to be executed in parallel.
    """

    async def apply_build_relationships(
        filtered_kg: KnowledgeGraph, original_kg: KnowledgeGraph
    ):
        relationships = await self.transform(filtered_kg)
        original_kg.relationships.extend(relationships)

    filtered_kg = self.filter(kg)
    plan = [apply_build_relationships(filtered_kg=filtered_kg, original_kg=kg)]
    logger.debug(
        "Created %d coroutines for %s",
        len(plan),
        self.__class__.__name__,
    )
    return plan

Splitter dataclass

Splitter(name: str = '', filter_nodes: Callable[[Node], bool] = (lambda: default_filter)())

Bases: BaseGraphTransformation

Abstract base class for splitters that transform a KnowledgeGraph by splitting its nodes into smaller chunks.

Methods

transform
    Transforms the KnowledgeGraph by splitting its nodes into smaller chunks.

split
    Abstract method to split a node into smaller chunks.

transform async

transform(kg: KnowledgeGraph) -> Tuple[List[Node], List[Relationship]]

Transforms the KnowledgeGraph by splitting its nodes into smaller chunks.

Parameters

kg : KnowledgeGraph
    The knowledge graph to be transformed. Required.

Returns

Tuple[List[Node], List[Relationship]]
    A tuple containing a list of new nodes and a list of new relationships.

Source code in src/ragas/testset/transforms/base.py
async def transform(
    self, kg: KnowledgeGraph
) -> t.Tuple[t.List[Node], t.List[Relationship]]:
    """
    Transforms the KnowledgeGraph by splitting its nodes into smaller chunks.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.Tuple[t.List[Node], t.List[Relationship]]
        A tuple containing a list of new nodes and a list of new relationships.
    """
    filtered = self.filter(kg)

    all_nodes = []
    all_relationships = []
    for node in filtered.nodes:
        nodes, relationships = await self.split(node)
        all_nodes.extend(nodes)
        all_relationships.extend(relationships)

    return all_nodes, all_relationships

split abstractmethod async

split(node: Node) -> Tuple[List[Node], List[Relationship]]

Abstract method to split a node into smaller chunks.

Parameters

node : Node
    The node to be split. Required.

Returns

Tuple[List[Node], List[Relationship]]
    A tuple containing a list of new nodes and a list of new relationships.

Source code in src/ragas/testset/transforms/base.py
@abstractmethod
async def split(self, node: Node) -> t.Tuple[t.List[Node], t.List[Relationship]]:
    """
    Abstract method to split a node into smaller chunks.

    Parameters
    ----------
    node : Node
        The node to be split.

    Returns
    -------
    t.Tuple[t.List[Node], t.List[Relationship]]
        A tuple containing a list of new nodes and a list of new relationships.
    """
    pass
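A concrete split implementation could chunk a node's text and link each chunk back to its parent. The sketch below uses plain dicts and an illustrative "child" relationship type; the function name and data shapes are assumptions, not the ragas API:

```python
import asyncio

async def split(node):
    """Split a node dict into per-sentence child nodes plus parent-to-child links."""
    sentences = [s.strip() for s in node["page_content"].split(".") if s.strip()]
    new_nodes = [{"id": f"{node['id']}-{i}", "page_content": s}
                 for i, s in enumerate(sentences)]
    new_rels = [{"source": node["id"], "target": n["id"], "type": "child"}
                for n in new_nodes]
    return new_nodes, new_rels

doc = {"id": "doc1", "page_content": "First sentence. Second sentence."}
nodes, rels = asyncio.run(split(doc))
assert len(nodes) == 2 and len(rels) == 2
assert rels[0] == {"source": "doc1", "target": "doc1-0", "type": "child"}
```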

generate_execution_plan

generate_execution_plan(kg: KnowledgeGraph) -> Sequence[Coroutine]

Generates a sequence of coroutines to be executed in parallel by the Executor.

Parameters

kg : KnowledgeGraph
    The knowledge graph to be transformed. Required.

Returns

Sequence[Coroutine]
    A sequence of coroutines to be executed in parallel.

Source code in src/ragas/testset/transforms/base.py
def generate_execution_plan(self, kg: KnowledgeGraph) -> t.Sequence[t.Coroutine]:
    """
    Generates a sequence of coroutines to be executed in parallel by the Executor.

    Parameters
    ----------
    kg : KnowledgeGraph
        The knowledge graph to be transformed.

    Returns
    -------
    t.Sequence[t.Coroutine]
        A sequence of coroutines to be executed in parallel.
    """

    async def apply_split(node: Node):
        nodes, relationships = await self.split(node)
        kg.nodes.extend(nodes)
        kg.relationships.extend(relationships)

    filtered = self.filter(kg)
    plan = [apply_split(node) for node in filtered.nodes]
    logger.debug(
        "Created %d coroutines for %s",
        len(plan),
        self.__class__.__name__,
    )
    return plan

Parallel

Parallel(*transformations: Union[BaseGraphTransformation, 'Parallel'])

A collection of transformations to be applied in parallel.

Examples

>>> Parallel(HeadlinesExtractor(), SummaryExtractor())
Source code in src/ragas/testset/transforms/engine.py
def __init__(self, *transformations: t.Union[BaseGraphTransformation, "Parallel"]):
    self.transformations = list(transformations)

EmbeddingExtractor dataclass

EmbeddingExtractor(name: str = '', filter_nodes: Callable[[Node], bool] = (lambda: default_filter)(), property_name: str = 'embedding', embed_property_name: str = 'page_content', embedding_model: Union[BaseRagasEmbeddings, BaseRagasEmbedding] = embedding_factory())

Bases: Extractor

A class for extracting embeddings from nodes in a knowledge graph.

Attributes

property_name : str
    The name of the property under which the embedding is stored.

embed_property_name : str
    The name of the property containing the text to embed.

embedding_model : Union[BaseRagasEmbeddings, BaseRagasEmbedding]
    The embedding model used to generate the embedding.

extract async

extract(node: Node) -> Tuple[str, Any]

Extracts the embedding for a given node.

Raises

ValueError
    If the property to be embedded is not a string.

Source code in src/ragas/testset/transforms/extractors/embeddings.py
async def extract(self, node: Node) -> t.Tuple[str, t.Any]:
    """
    Extracts the embedding for a given node.

    Raises
    ------
    ValueError
        If the property to be embedded is not a string.
    """
    text = node.get_property(self.embed_property_name)
    if not isinstance(text, str):
        raise ValueError(
            f"node.property('{self.embed_property_name}') must be a string, found '{type(text)}'"
        )

    # Handle both modern (BaseRagasEmbedding) and legacy (BaseRagasEmbeddings) interfaces
    if hasattr(self.embedding_model, "aembed_text"):
        # Modern interface (BaseRagasEmbedding)
        # Check if the client supports async operations by checking if is_async exists and is True
        if hasattr(self.embedding_model, "is_async") and getattr(
            self.embedding_model, "is_async", False
        ):
            embedding = await self.embedding_model.aembed_text(text)  # type: ignore[attr-defined]
        else:
            # For sync clients, use the sync method wrapped in thread executor to avoid blocking
            warnings.warn(
                f"Using sync embedding model {self.embedding_model.__class__.__name__} "
                f"in async context. This may impact performance. "
                f"Consider using an async-compatible embedding model for better performance.",
                UserWarning,
                stacklevel=2,
            )
            embedding = await run_sync_in_async(
                self.embedding_model.embed_text, text
            )  # type: ignore[attr-defined]
    else:
        # Legacy interface (BaseRagasEmbeddings)
        embedding = await self.embedding_model.embed_text(text)  # type: ignore[misc]

    return self.property_name, embedding
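The modern/legacy dispatch above can be mimicked with toy models; `ModernAsyncModel` and `LegacyModel` are hypothetical stand-ins, and only the `hasattr`/`is_async` checks mirror the real code:

```python
import asyncio

class ModernAsyncModel:
    is_async = True
    async def aembed_text(self, text):
        return [float(len(text))]  # stand-in embedding, not a real model

class LegacyModel:
    async def embed_text(self, text):
        return [float(len(text)) * 2]

async def extract_embedding(model, text):
    # Prefer the modern async interface when available, else fall back.
    if hasattr(model, "aembed_text") and getattr(model, "is_async", False):
        return await model.aembed_text(text)
    return await model.embed_text(text)

modern = asyncio.run(extract_embedding(ModernAsyncModel(), "abcd"))
legacy = asyncio.run(extract_embedding(LegacyModel(), "abcd"))
assert modern == [4.0]
assert legacy == [8.0]
```

The real implementation additionally wraps synchronous clients in a thread executor (with a warning) so they do not block the event loop.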

HeadlinesExtractor dataclass

HeadlinesExtractor(name: str = '', filter_nodes: Callable[[Node], bool] = (lambda: default_filter)(), llm: Union[BaseRagasLLM, InstructorBaseRagasLLM] = _default_llm_factory(), merge_if_possible: bool = True, max_token_limit: int = 32000, tokenizer: Encoding = DEFAULT_TOKENIZER, property_name: str = 'headlines', prompt: HeadlinesExtractorPrompt = HeadlinesExtractorPrompt(), max_num: int = 5)

Bases: LLMBasedExtractor

Extracts headlines from the given text.

Attributes

property_name : str
    The name of the property to extract.

prompt : HeadlinesExtractorPrompt
    The prompt used for extraction.

KeyphrasesExtractor dataclass

KeyphrasesExtractor(name: str = '', filter_nodes: Callable[[Node], bool] = (lambda: default_filter)(), llm: Union[BaseRagasLLM, InstructorBaseRagasLLM] = _default_llm_factory(), merge_if_possible: bool = True, max_token_limit: int = 32000, tokenizer: Encoding = DEFAULT_TOKENIZER, property_name: str = 'keyphrases', prompt: KeyphrasesExtractorPrompt = KeyphrasesExtractorPrompt(), max_num: int = 5)

Bases: LLMBasedExtractor

Extracts top keyphrases from the given text.

Attributes

property_name : str
    The name of the property to extract.

prompt : KeyphrasesExtractorPrompt
    The prompt used for extraction.

SummaryExtractor dataclass

SummaryExtractor(name: str = '', filter_nodes: Callable[[Node], bool] = (lambda: default_filter)(), llm: Union[BaseRagasLLM, InstructorBaseRagasLLM] = _default_llm_factory(), merge_if_possible: bool = True, max_token_limit: int = 32000, tokenizer: Encoding = DEFAULT_TOKENIZER, property_name: str = 'summary', prompt: SummaryExtractorPrompt = SummaryExtractorPrompt())

Bases: LLMBasedExtractor

Extracts a summary from the given text.

Attributes

property_name : str
    The name of the property to extract.

prompt : SummaryExtractorPrompt
    The prompt used for extraction.

TitleExtractor dataclass

TitleExtractor(name: str = '', filter_nodes: Callable[[Node], bool] = (lambda: default_filter)(), llm: Union[BaseRagasLLM, InstructorBaseRagasLLM] = _default_llm_factory(), merge_if_possible: bool = True, max_token_limit: int = 32000, tokenizer: Encoding = DEFAULT_TOKENIZER, property_name: str = 'title', prompt: TitleExtractorPrompt = TitleExtractorPrompt())

Bases: LLMBasedExtractor

Extracts the title from the given text.

Attributes

property_name : str
    The name of the property to extract.

prompt : TitleExtractorPrompt
    The prompt used for extraction.

CustomNodeFilter dataclass

CustomNodeFilter(name: str = '', filter_nodes: Callable[[Node], bool] = (lambda: default_filter)(), llm: Union[BaseRagasLLM, InstructorBaseRagasLLM] = _default_llm_factory(), scoring_prompt: PydanticPrompt = QuestionPotentialPrompt(), min_score: int = 2, rubrics: Dict[str, str] = (lambda: DEFAULT_RUBRICS)())

Bases: LLMBasedNodeFilter

Returns True if the score is less than min_score.

CosineSimilarityBuilder dataclass

CosineSimilarityBuilder(name: str = '', filter_nodes: Callable[[Node], bool] = (lambda: default_filter)(), property_name: str = 'embedding', new_property_name: str = 'cosine_similarity', threshold: float = 0.9, block_size: int = 1024)

Bases: RelationshipBuilder

generate_execution_plan

generate_execution_plan(kg: KnowledgeGraph) -> List[Coroutine]

Generates a coroutine task for finding similar embedding pairs, which can be scheduled/executed by an Executor.

Source code in src/ragas/testset/transforms/relationship_builders/cosine.py
def generate_execution_plan(self, kg: KnowledgeGraph) -> t.List[t.Coroutine]:
    """
    Generates a coroutine task for finding similar embedding pairs, which can be scheduled/executed by an Executor.
    """
    filtered_kg = self.filter(kg)

    embeddings = []
    for node in filtered_kg.nodes:
        embedding = node.get_property(self.property_name)
        if embedding is None:
            raise ValueError(f"Node {node.id} has no {self.property_name}")
        embeddings.append(embedding)
    self._validate_embedding_shapes(embeddings)

    async def find_and_add_relationships():
        similar_pairs = self._find_similar_embedding_pairs(
            np.array(embeddings), self.threshold
        )
        for i, j, similarity_float in similar_pairs:
            rel = Relationship(
                source=filtered_kg.nodes[i],
                target=filtered_kg.nodes[j],
                type=self.new_property_name,
                properties={self.new_property_name: similarity_float},
                bidirectional=True,
            )
            kg.relationships.append(rel)

    return [find_and_add_relationships()]
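`_find_similar_embedding_pairs` is internal to ragas; a pure-Python approximation of the thresholding it performs (function names here are illustrative):

```python
from math import sqrt

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def similar_pairs(embeddings, threshold):
    """All index pairs (i, j), i < j, whose cosine similarity meets threshold."""
    return [(i, j, cosine(embeddings[i], embeddings[j]))
            for i in range(len(embeddings))
            for j in range(i + 1, len(embeddings))
            if cosine(embeddings[i], embeddings[j]) >= threshold]

embs = [[1.0, 0.0], [0.99, 0.14], [0.0, 1.0]]
pairs = similar_pairs(embs, threshold=0.9)
assert [(i, j) for i, j, _ in pairs] == [(0, 1)]
```

Each surviving pair becomes one bidirectional Relationship carrying the similarity score, which is why the builder returns a single coroutine rather than one per node.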

SummaryCosineSimilarityBuilder dataclass

SummaryCosineSimilarityBuilder(name: str = '', filter_nodes: Callable[[Node], bool] = (lambda: default_filter)(), property_name: str = 'summary_embedding', new_property_name: str = 'summary_cosine_similarity', threshold: float = 0.1, block_size: int = 1024)

JaccardSimilarityBuilder dataclass

JaccardSimilarityBuilder(name: str = '', filter_nodes: Callable[[Node], bool] = (lambda: default_filter)(), property_name: str = 'entities', key_name: Optional[str] = None, new_property_name: str = 'jaccard_similarity', threshold: float = 0.5)

Bases: RelationshipBuilder

generate_execution_plan

generate_execution_plan(kg: KnowledgeGraph) -> List[Coroutine]

Generates a coroutine task for finding similar pairs, which can be scheduled/executed by an Executor.

Source code in src/ragas/testset/transforms/relationship_builders/traditional.py
def generate_execution_plan(self, kg: KnowledgeGraph) -> t.List[t.Coroutine]:
    """
    Generates a coroutine task for finding similar pairs, which can be scheduled/executed by an Executor.
    """

    async def find_and_add_relationships():
        similar_pairs = self._find_similar_embedding_pairs(kg)
        for i, j, similarity_float in similar_pairs:
            rel = Relationship(
                source=kg.nodes[i],
                target=kg.nodes[j],
                type=self.new_property_name,
                properties={self.new_property_name: similarity_float},
                bidirectional=True,
            )
            kg.relationships.append(rel)

    return [find_and_add_relationships()]

default_transforms

default_transforms(documents: List[Document], llm: Union[BaseRagasLLM, 'InstructorBaseRagasLLM'], embedding_model: BaseRagasEmbeddings) -> 'Transforms'

Creates and returns a default set of transforms for processing a knowledge graph.

This function defines a series of transformation steps to be applied to a knowledge graph, including extracting summaries, keyphrases, titles, headlines, and embeddings, as well as building similarity relationships between nodes.

Returns

Transforms
    A list of transformation steps to be applied to the knowledge graph.

Source code in src/ragas/testset/transforms/default.py
def default_transforms(
    documents: t.List[LCDocument],
    llm: t.Union[BaseRagasLLM, "InstructorBaseRagasLLM"],
    embedding_model: BaseRagasEmbeddings,
) -> "Transforms":
    """
    Creates and returns a default set of transforms for processing a knowledge graph.

    This function defines a series of transformation steps to be applied to a
    knowledge graph, including extracting summaries, keyphrases, titles,
    headlines, and embeddings, as well as building similarity relationships
    between nodes.



    Returns
    -------
    Transforms
        A list of transformation steps to be applied to the knowledge graph.

    """

    def count_doc_length_bins(documents, bin_ranges):
        data = [num_tokens_from_string(doc.page_content) for doc in documents]
        bins = {f"{start}-{end}": 0 for start, end in bin_ranges}

        for num in data:
            for start, end in bin_ranges:
                if start <= num <= end:
                    bins[f"{start}-{end}"] += 1
                    break  # Move to the next number once it’s placed in a bin

        return bins

    def filter_doc_with_num_tokens(node, min_num_tokens=500):
        return (
            node.type == NodeType.DOCUMENT
            and num_tokens_from_string(node.properties["page_content"]) > min_num_tokens
        )

    def filter_docs(node):
        return node.type == NodeType.DOCUMENT

    def filter_chunks(node):
        return node.type == NodeType.CHUNK

    bin_ranges = [(0, 100), (101, 500), (501, float("inf"))]
    result = count_doc_length_bins(documents, bin_ranges)
    result = {k: v / len(documents) for k, v in result.items()}

    transforms = []

    if result["501-inf"] >= 0.25:
        headline_extractor = HeadlinesExtractor(
            llm=llm, filter_nodes=lambda node: filter_doc_with_num_tokens(node)
        )
        splitter = HeadlineSplitter(min_tokens=500)
        summary_extractor = SummaryExtractor(
            llm=llm, filter_nodes=lambda node: filter_doc_with_num_tokens(node)
        )

        theme_extractor = ThemesExtractor(
            llm=llm, filter_nodes=lambda node: filter_chunks(node)
        )
        ner_extractor = NERExtractor(
            llm=llm, filter_nodes=lambda node: filter_chunks(node)
        )

        summary_emb_extractor = EmbeddingExtractor(
            embedding_model=embedding_model,
            property_name="summary_embedding",
            embed_property_name="summary",
            filter_nodes=lambda node: filter_doc_with_num_tokens(node),
        )

        cosine_sim_builder = CosineSimilarityBuilder(
            property_name="summary_embedding",
            new_property_name="summary_similarity",
            threshold=0.7,
            filter_nodes=lambda node: filter_doc_with_num_tokens(node),
        )

        ner_overlap_sim = OverlapScoreBuilder(
            threshold=0.01, filter_nodes=lambda node: filter_chunks(node)
        )

        node_filter = CustomNodeFilter(
            llm=llm, filter_nodes=lambda node: filter_chunks(node)
        )
        transforms = [
            headline_extractor,
            splitter,
            summary_extractor,
            node_filter,
            Parallel(summary_emb_extractor, theme_extractor, ner_extractor),
            Parallel(cosine_sim_builder, ner_overlap_sim),
        ]
    elif result["101-500"] >= 0.25:
        summary_extractor = SummaryExtractor(
            llm=llm, filter_nodes=lambda node: filter_doc_with_num_tokens(node, 100)
        )
        summary_emb_extractor = EmbeddingExtractor(
            embedding_model=embedding_model,
            property_name="summary_embedding",
            embed_property_name="summary",
            filter_nodes=lambda node: filter_doc_with_num_tokens(node, 100),
        )

        cosine_sim_builder = CosineSimilarityBuilder(
            property_name="summary_embedding",
            new_property_name="summary_similarity",
            threshold=0.5,
            filter_nodes=lambda node: filter_doc_with_num_tokens(node, 100),
        )

        ner_extractor = NERExtractor(llm=llm)
        ner_overlap_sim = OverlapScoreBuilder(threshold=0.01)
        theme_extractor = ThemesExtractor(
            llm=llm, filter_nodes=lambda node: filter_docs(node)
        )
        node_filter = CustomNodeFilter(llm=llm)

        transforms = [
            summary_extractor,
            node_filter,
            Parallel(summary_emb_extractor, theme_extractor, ner_extractor),
            Parallel(cosine_sim_builder, ner_overlap_sim),
        ]
    else:
        raise ValueError(
            "Documents appears to be too short (ie 100 tokens or less). Please provide longer documents."
        )

    return transforms
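The routing above keys on the fraction of documents falling into each token-length bin; a self-contained sketch of that bin computation (raw token counts stand in for `num_tokens_from_string` output):

```python
def count_length_bins(token_counts, bin_ranges):
    """Count how many documents fall into each inclusive token-length bin."""
    bins = {f"{start}-{end}": 0 for start, end in bin_ranges}
    for n in token_counts:
        for start, end in bin_ranges:
            if start <= n <= end:
                bins[f"{start}-{end}"] += 1
                break  # each document lands in exactly one bin
    return bins

bin_ranges = [(0, 100), (101, 500), (501, float("inf"))]
counts = count_length_bins([80, 300, 900, 1200], bin_ranges)
fractions = {k: v / 4 for k, v in counts.items()}
assert fractions["501-inf"] == 0.5  # >= 0.25, so the long-document path is taken
```

With at least a quarter of documents over 500 tokens, the headline-splitting pipeline is chosen; otherwise the 101-500 pipeline, and anything shorter raises a ValueError.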

apply_transforms

apply_transforms(kg: KnowledgeGraph, transforms: Transforms, run_config: RunConfig = RunConfig(), callbacks: Optional[Callbacks] = None)

Recursively apply transformations to a knowledge graph in place.

Source code in src/ragas/testset/transforms/engine.py
def apply_transforms(
    kg: KnowledgeGraph,
    transforms: Transforms,
    run_config: RunConfig = RunConfig(),
    callbacks: t.Optional[Callbacks] = None,
):
    """
    Recursively apply transformations to a knowledge graph in place.
    """
    # apply nest_asyncio to fix the event loop issue in jupyter
    apply_nest_asyncio()

    max_workers = getattr(run_config, "max_workers", -1)

    if isinstance(transforms, t.Sequence):
        for transform in transforms:
            apply_transforms(kg, transform, run_config, callbacks)
    elif isinstance(transforms, Parallel):
        apply_transforms(kg, transforms.transformations, run_config, callbacks)
    elif isinstance(transforms, BaseGraphTransformation):
        logger.debug(
            f"Generating execution plan for transformation {transforms.__class__.__name__}"
        )
        coros = transforms.generate_execution_plan(kg)
        desc = get_desc(transforms)
        run_async_tasks(
            coros,
            batch_size=None,
            show_progress=True,
            progress_bar_desc=desc,
            max_workers=max_workers,
        )
    else:
        raise ValueError(
            f"Invalid transforms type: {type(transforms)}. Expects a sequence of BaseGraphTransformations or a Parallel instance."
        )
    logger.debug("All transformations applied successfully.")
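The recursive dispatch can be mimicked in miniature; `ToyParallel` is a hypothetical stand-in and strings play the role of leaf transformations (real execution goes through run_async_tasks):

```python
class ToyParallel:
    def __init__(self, *transformations):
        self.transformations = list(transformations)

def apply(transforms, applied):
    """Walk sequences and ToyParallel containers, 'running' each leaf."""
    if isinstance(transforms, list):
        for t in transforms:
            apply(t, applied)
    elif isinstance(transforms, ToyParallel):
        apply(transforms.transformations, applied)
    elif isinstance(transforms, str):  # a leaf transformation
        applied.append(transforms)
    else:
        raise ValueError(f"Invalid transforms type: {type(transforms)}")

applied = []
apply(["extract", ToyParallel("embed", "ner"), "build_rels"], applied)
assert applied == ["extract", "embed", "ner", "build_rels"]
```

Note that in the real engine, members of a Parallel group have their coroutines batched together, while top-level sequence entries run strictly one after another.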

rollback_transforms

rollback_transforms(kg: KnowledgeGraph, transforms: Transforms)

Rollback a sequence of transformations from a knowledge graph.

Note

This is not yet implemented. Please open an issue if you need this feature.

Source code in src/ragas/testset/transforms/engine.py
def rollback_transforms(kg: KnowledgeGraph, transforms: Transforms):
    """
    Rollback a sequence of transformations from a knowledge graph.

    Note
    ----
    This is not yet implemented. Please open an issue if you need this feature.
    """
    # this will allow you to roll back the transformations
    raise NotImplementedError