NodeType

Bases: str, Enum

Enumeration of the node types in the knowledge graph.

Currently supported node types: UNKNOWN, DOCUMENT, CHUNK
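
Because NodeType mixes in str, its members can be compared with plain strings and serialized to JSON directly. The following is a minimal stand-alone sketch of that behavior, not the library's definition: this page lists only the member names, so the string values used here are assumptions.

```python
import json
from enum import Enum


class NodeType(str, Enum):
    # Member names come from the list above; the string values are assumed.
    UNKNOWN = ""
    DOCUMENT = "document"
    CHUNK = "chunk"


# A str-based Enum member compares equal to its underlying string value.
is_plain_string = NodeType.DOCUMENT == "document"

# Looking a member up by value returns the enum member itself.
looked_up = NodeType("chunk")

# json.dumps writes the member as its string value.
encoded = json.dumps({"type": NodeType.CHUNK})
```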

Node

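Bases: BaseModel

Represents a node in the knowledge graph.

Attributes

Name Type Description

id UUID

Unique identifier for the node.

properties dict

Dictionary of properties associated with the node.

type NodeType

The type of the node.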

add_property

add_property(key: str, value: Any)

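Adds a property to the node. The key is lowercased before it is stored.

Raises

Type Description

ValueError

If the property already exists.

Source code in src/ragas/testset/graph.py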
def add_property(self, key: str, value: t.Any):
    """
    Adds a property to the node.

    Raises
    ------
    ValueError
        If the property already exists.
    """
    if key.lower() in self.properties:
        raise ValueError(f"Property {key} already exists")
    self.properties[key.lower()] = value

get_property

get_property(key: str) -> Optional[Any]

Retrieves a property value by key.

Notes

The key is case-insensitive.

Source code in src/ragas/testset/graph.py
def get_property(self, key: str) -> t.Optional[t.Any]:
    """
    Retrieves a property value by key.

    Notes
    -----
    The key is case-insensitive.
    """
    return self.properties.get(key.lower(), None)
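
The two helpers above can be exercised together to see the case-insensitive behavior. This is a minimal sketch using a hypothetical stand-in class, SketchNode, built on a plain dataclass rather than the real Node; only the two property methods are modeled, and their bodies mirror the source shown above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class SketchNode:
    """Hypothetical stand-in for Node; only the property helpers are modeled."""

    properties: Dict[str, Any] = field(default_factory=dict)

    def add_property(self, key: str, value: Any) -> None:
        # Keys are lowercased before the duplicate check and before storage.
        if key.lower() in self.properties:
            raise ValueError(f"Property {key} already exists")
        self.properties[key.lower()] = value

    def get_property(self, key: str) -> Optional[Any]:
        # Lookup lowercases the key too, so any casing finds the value.
        return self.properties.get(key.lower(), None)


node = SketchNode()
node.add_property("Title", "Intro")

found = node.get_property("TITLE")      # stored under "title"
missing = node.get_property("summary")  # absent keys give None, not an error

try:
    node.add_property("TITLE", "Other")  # same key after lowercasing
    duplicate_rejected = False
except ValueError:
    duplicate_rejected = True
```

Note that get_property returns None for a missing key, so a stored value of None cannot be told apart from an absent property through this method alone.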

Relationship

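Bases: BaseModel

Represents a relationship between two nodes in the knowledge graph.

Attributes

Name Type Description

id (UUID, optional)

Unique identifier for the relationship. Defaults to a new UUID.

type str

The type of the relationship.

source Node

The source node of the relationship.

target Node

The target node of the relationship.

bidirectional (bool, optional)

Whether the relationship is bidirectional. Defaults to False.

properties (dict, optional)

Dictionary of properties associated with the relationship. Defaults to an empty dict.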

get_property

get_property(key: str) -> Optional[Any]

Retrieves a property value by key. The key is case-insensitive.

Source code in src/ragas/testset/graph.py
def get_property(self, key: str) -> t.Optional[t.Any]:
    """
    Retrieves a property value by key. The key is case-insensitive.
    """
    return self.properties.get(key.lower(), None)

KnowledgeGraph dataclass

KnowledgeGraph(nodes: List[Node] = list(), relationships: List[Relationship] = list())

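Represents a knowledge graph containing nodes and relationships.

Attributes

Name Type Description

nodes List[Node]

List of nodes in the knowledge graph.

relationships List[Relationship]

List of relationships in the knowledge graph.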

add

add(item: Union[Node, Relationship])

Adds a node or relationship to the knowledge graph.

Raises

Type Description

ValueError

If the item type is not Node or Relationship.

Source code in src/ragas/testset/graph.py
def add(self, item: t.Union[Node, Relationship]):
    """
    Adds a node or relationship to the knowledge graph.

    Raises
    ------
    ValueError
        If the item type is not Node or Relationship.
    """
    if isinstance(item, Node):
        self._add_node(item)
    elif isinstance(item, Relationship):
        self._add_relationship(item)
    else:
        raise ValueError(f"Invalid item type: {type(item)}")
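
The type-based dispatch in add can be sketched with plain dataclasses. The classes below (SketchNode, SketchRelationship, SketchGraph) are hypothetical stand-ins, and since the bodies of _add_node and _add_relationship are not shown on this page, the sketch assumes they simply append to the matching list.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class SketchNode:
    name: str


@dataclass
class SketchRelationship:
    source: SketchNode
    target: SketchNode
    type: str


@dataclass
class SketchGraph:
    nodes: List[SketchNode] = field(default_factory=list)
    relationships: List[SketchRelationship] = field(default_factory=list)

    def add(self, item: Union[SketchNode, SketchRelationship]) -> None:
        # Dispatch on the runtime type, as in the source above.
        if isinstance(item, SketchNode):
            self.nodes.append(item)  # assumed behavior of _add_node
        elif isinstance(item, SketchRelationship):
            self.relationships.append(item)  # assumed behavior of _add_relationship
        else:
            raise ValueError(f"Invalid item type: {type(item)}")


graph = SketchGraph()
a, b = SketchNode("a"), SketchNode("b")
graph.add(a)
graph.add(b)
graph.add(SketchRelationship(source=a, target=b, type="next"))

try:
    graph.add("not a node")  # type: ignore[arg-type]
    rejected = False
except ValueError:
    rejected = True
```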

save

save(path: Union[str, Path])

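Saves the knowledge graph to a JSON file.

Parameters

Name Type Description Default

path Union[str, Path]

Path where the JSON file should be saved.

required

Notes

The file is saved using UTF-8 encoding to ensure proper handling of Unicode characters across different platforms.

Source code in src/ragas/testset/graph.py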
def save(self, path: t.Union[str, Path]):
    """Saves the knowledge graph to a JSON file.

    Parameters
    ----------
    path : Union[str, Path]
        Path where the JSON file should be saved.

    Notes
    -----
    The file is saved using UTF-8 encoding to ensure proper handling of Unicode characters
    across different platforms.
    """
    if isinstance(path, str):
        path = Path(path)

    data = {
        "nodes": [node.model_dump() for node in self.nodes],
        "relationships": [rel.model_dump() for rel in self.relationships],
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, cls=UUIDEncoder, indent=2, ensure_ascii=False)
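
The save method relies on a UUIDEncoder that is not shown on this page. The sketch below assumes it converts UUID values to their string form, and uses a hypothetical SketchUUIDEncoder and a hand-written data dict to show how that combines with UTF-8 output and ensure_ascii=False.

```python
import json
import tempfile
import uuid
from pathlib import Path


class SketchUUIDEncoder(json.JSONEncoder):
    """Assumed behavior of UUIDEncoder: UUIDs are written as strings."""

    def default(self, o):
        if isinstance(o, uuid.UUID):
            return str(o)
        # Anything else falls back to the standard TypeError.
        return super().default(o)


node_id = uuid.uuid4()
data = {
    "nodes": [{"id": node_id, "properties": {"city": "Zürich"}, "type": "document"}],
    "relationships": [],
}

path = Path(tempfile.mkdtemp()) / "kg.json"
with open(path, "w", encoding="utf-8") as f:
    # ensure_ascii=False keeps non-ASCII text readable instead of \uXXXX escapes.
    json.dump(data, f, cls=SketchUUIDEncoder, indent=2, ensure_ascii=False)

raw = path.read_text(encoding="utf-8")
restored = json.loads(raw)
```

Writing with an explicit encoding matters here because ensure_ascii=False puts raw non-ASCII characters into the file, and the platform default encoding is not always UTF-8.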

load classmethod

load(path: Union[str, Path]) -> KnowledgeGraph

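Loads a knowledge graph from a JSON file.

Parameters

Name Type Description Default

path Union[str, Path]

Path to the JSON file containing the knowledge graph.

required

Returns

Type Description

KnowledgeGraph

The loaded knowledge graph.

Notes

The file is read using UTF-8 encoding to ensure proper handling of Unicode characters across different platforms.

Source code in src/ragas/testset/graph.py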
@classmethod
def load(cls, path: t.Union[str, Path]) -> "KnowledgeGraph":
    """Loads a knowledge graph from a path.

    Parameters
    ----------
    path : Union[str, Path]
        Path to the JSON file containing the knowledge graph.

    Returns
    -------
    KnowledgeGraph
        The loaded knowledge graph.

    Notes
    -----
    The file is read using UTF-8 encoding to ensure proper handling of Unicode characters
    across different platforms.
    """
    if isinstance(path, str):
        path = Path(path)

    with open(path, "r", encoding="utf-8") as f:
        data = json.load(f)

    nodes = [Node(**node_data) for node_data in data["nodes"]]

    nodes_map = {str(node.id): node for node in nodes}
    relationships = [
        Relationship(
            id=rel_data["id"],
            type=rel_data["type"],
            source=nodes_map[rel_data["source"]],
            target=nodes_map[rel_data["target"]],
            bidirectional=rel_data["bidirectional"],
            properties=rel_data["properties"],
        )
        for rel_data in data["relationships"]
    ]

    kg = cls()
    kg.nodes.extend(nodes)
    kg.relationships.extend(relationships)
    return kg
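
The key step in load is re-linking: relationships in the file refer to their endpoints by node id, and nodes_map turns those ids back into node objects. A minimal sketch of that step follows, using hypothetical dataclass stand-ins (SketchNode, SketchRelationship) and an in-memory JSON string; the file layout is inferred from the source above and should be treated as an assumption.

```python
import json
import uuid
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class SketchNode:
    id: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class SketchRelationship:
    id: str
    type: str
    source: SketchNode
    target: SketchNode
    bidirectional: bool = False
    properties: Dict[str, Any] = field(default_factory=dict)


a_id, b_id = str(uuid.uuid4()), str(uuid.uuid4())
# Assumed file layout: relationships refer to their endpoints by node id.
text = json.dumps({
    "nodes": [{"id": a_id, "properties": {}}, {"id": b_id, "properties": {}}],
    "relationships": [{
        "id": str(uuid.uuid4()), "type": "next",
        "source": a_id, "target": b_id,
        "bidirectional": False, "properties": {},
    }],
})

data = json.loads(text)
nodes = [SketchNode(**node_data) for node_data in data["nodes"]]

# Map each id to its node so relationships can point at shared objects.
nodes_map = {node.id: node for node in nodes}

relationships = [
    SketchRelationship(
        id=rel_data["id"], type=rel_data["type"],
        source=nodes_map[rel_data["source"]],
        target=nodes_map[rel_data["target"]],
        bidirectional=rel_data["bidirectional"],
        properties=rel_data["properties"],
    )
    for rel_data in data["relationships"]
]

first = relationships[0]
shares_node_objects = first.source is nodes[0] and first.target is nodes[1]
```

Because the lookup uses plain indexing, a relationship that names a node id missing from the nodes list raises KeyError in this sketch, as the indexing in the source above would.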

find_indirect_clusters

find_indirect_clusters(relationship_condition: Callable[[Relationship], bool] = lambda _: True, depth_limit: int = 3) -> List[Set[Node]]

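Finds indirect clusters of nodes in the knowledge graph based on a relationship condition. Each cluster is the set of nodes along a path whose relationships satisfy the condition. For example, if A -> B -> C -> D and depth_limit is at least 4, then A, B, C, and D form a cluster. If there is also a path A -> B -> C -> E, it forms a separate cluster.

Parameters

Name Type Description Default

relationship_condition Callable[[Relationship], bool]

A function that takes a Relationship and returns a boolean, by default lambda _: True

lambda _: True

depth_limit int

Maximum number of nodes visited along a single path from each starting node.

3

Returns

Type Description

List[Set[Node]]

A list of sets, where each set contains nodes that form a cluster.

Source code in src/ragas/testset/graph.py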
def find_indirect_clusters(
    self,
    relationship_condition: t.Callable[[Relationship], bool] = lambda _: True,
    depth_limit: int = 3,
) -> t.List[t.Set[Node]]:
    """
    Finds indirect clusters of nodes in the knowledge graph based on a relationship condition.
    Here if A -> B -> C -> D, then A, B, C, and D form a cluster. If there's also a path A -> B -> C -> E,
    it will form a separate cluster.

    Parameters
    ----------
    relationship_condition : Callable[[Relationship], bool], optional
        A function that takes a Relationship and returns a boolean, by default lambda _: True

    Returns
    -------
    List[Set[Node]]
        A list of sets, where each set contains nodes that form a cluster.
    """
    clusters = []
    visited_paths = set()

    relationships = [
        rel for rel in self.relationships if relationship_condition(rel)
    ]

    def dfs(node: Node, cluster: t.Set[Node], depth: int, path: t.Tuple[Node, ...]):
        if depth >= depth_limit or path in visited_paths:
            return
        visited_paths.add(path)
        cluster.add(node)

        for rel in relationships:
            neighbor = None
            if rel.source == node and rel.target not in cluster:
                neighbor = rel.target
            elif (
                rel.bidirectional
                and rel.target == node
                and rel.source not in cluster
            ):
                neighbor = rel.source

            if neighbor is not None:
                dfs(neighbor, cluster.copy(), depth + 1, path + (neighbor,))

        # Add completed path-based cluster
        if len(cluster) > 1:
            clusters.append(cluster)

    for node in self.nodes:
        initial_cluster = set()
        dfs(node, initial_cluster, 0, (node,))

    # Remove duplicates by converting clusters to frozensets
    unique_clusters = [
        set(cluster) for cluster in set(frozenset(c) for c in clusters)
    ]

    return unique_clusters
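
To see what the traversal produces, the sketch below re-implements the same depth-first search over plain strings. The Rel tuple and the free function are hypothetical stand-ins for Relationship and the method; the control flow follows the source above. It shows that shorter paths are reported as clusters too, and that depth_limit caps how many nodes a path can hold.

```python
from typing import Callable, List, NamedTuple, Set, Tuple


class Rel(NamedTuple):
    source: str
    target: str
    bidirectional: bool = False


def find_indirect_clusters(
    nodes: List[str],
    relationships: List[Rel],
    relationship_condition: Callable[[Rel], bool] = lambda _: True,
    depth_limit: int = 3,
) -> List[Set[str]]:
    clusters: List[Set[str]] = []
    visited_paths: Set[Tuple[str, ...]] = set()
    rels = [rel for rel in relationships if relationship_condition(rel)]

    def dfs(node: str, cluster: Set[str], depth: int, path: Tuple[str, ...]) -> None:
        if depth >= depth_limit or path in visited_paths:
            return
        visited_paths.add(path)
        cluster.add(node)
        for rel in rels:
            neighbor = None
            if rel.source == node and rel.target not in cluster:
                neighbor = rel.target
            elif rel.bidirectional and rel.target == node and rel.source not in cluster:
                neighbor = rel.source
            if neighbor is not None:
                # Each branch gets its own copy, so sibling paths stay separate.
                dfs(neighbor, cluster.copy(), depth + 1, path + (neighbor,))
        if len(cluster) > 1:
            clusters.append(cluster)

    for node in nodes:
        dfs(node, set(), 0, (node,))
    # Deduplicate by converting each cluster to a frozenset.
    return [set(c) for c in {frozenset(c) for c in clusters}]


nodes = ["A", "B", "C", "D", "E"]
rels = [Rel("A", "B"), Rel("B", "C"), Rel("C", "D"), Rel("C", "E")]

default_clusters = find_indirect_clusters(nodes, rels)             # depth_limit=3
deep_clusters = find_indirect_clusters(nodes, rels, depth_limit=4)
```

In this sketch the four-node clusters {A, B, C, D} and {A, B, C, E} appear only in deep_clusters; with the default limit the longest clusters have three nodes, such as {A, B, C} and {B, C, D}.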

remove_node

remove_node(node: Node, inplace: bool = True) -> Optional[KnowledgeGraph]

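Removes a node and its associated relationships from the knowledge graph.

Parameters

Name Type Description Default

node Node

The node to be removed from the knowledge graph.

required

inplace bool

If True, modifies the knowledge graph in place. If False, returns a modified copy with the node removed.

True

Returns

Type Description

KnowledgeGraph or None

Returns a modified copy of the knowledge graph if inplace is False. Returns None if inplace is True.

Raises

Type Description

ValueError

If the node is not present in the knowledge graph.

Source code in src/ragas/testset/graph.py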
def remove_node(
    self, node: Node, inplace: bool = True
) -> t.Optional["KnowledgeGraph"]:
    """
    Removes a node and its associated relationships from the knowledge graph.

    Parameters
    ----------
    node : Node
        The node to be removed from the knowledge graph.
    inplace : bool, optional
        If True, modifies the knowledge graph in place.
        If False, returns a modified copy with the node removed.

    Returns
    -------
    KnowledgeGraph or None
        Returns a modified copy of the knowledge graph if `inplace` is False.
        Returns None if `inplace` is True.

    Raises
    ------
    ValueError
        If the node is not present in the knowledge graph.
    """
    if node not in self.nodes:
        raise ValueError("Node is not present in the knowledge graph.")

    if inplace:
        # Modify the current instance
        self.nodes.remove(node)
        self.relationships = [
            rel
            for rel in self.relationships
            if rel.source != node and rel.target != node
        ]
    else:
        # Create a deep copy and modify it
        new_graph = deepcopy(self)
        new_graph.nodes.remove(node)
        new_graph.relationships = [
            rel
            for rel in new_graph.relationships
            if rel.source != node and rel.target != node
        ]
        return new_graph
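
The difference between the two modes can be sketched with hypothetical stand-ins. SketchNode and SketchRelationship are frozen dataclasses, so they compare by value, which is an assumption that lets the deep-copied graph still recognize the node passed in. The method body folds the two branches of the source into one path for brevity.

```python
from copy import deepcopy
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class SketchNode:
    id: str


@dataclass(frozen=True)
class SketchRelationship:
    source: SketchNode
    target: SketchNode
    type: str


@dataclass
class SketchGraph:
    nodes: List[SketchNode] = field(default_factory=list)
    relationships: List[SketchRelationship] = field(default_factory=list)

    def remove_node(
        self, node: SketchNode, inplace: bool = True
    ) -> Optional["SketchGraph"]:
        if node not in self.nodes:
            raise ValueError("Node is not present in the knowledge graph.")
        # Work on this graph, or on a deep copy when inplace is False.
        graph = self if inplace else deepcopy(self)
        graph.nodes.remove(node)
        # Drop every relationship that touches the removed node.
        graph.relationships = [
            rel for rel in graph.relationships
            if rel.source != node and rel.target != node
        ]
        return None if inplace else graph


a, b, c = SketchNode("a"), SketchNode("b"), SketchNode("c")
kg = SketchGraph(
    nodes=[a, b, c],
    relationships=[SketchRelationship(a, b, "next"), SketchRelationship(b, c, "next")],
)

pruned = kg.remove_node(b, inplace=False)  # kg itself is left untouched
result = kg.remove_node(c)                 # in place: returns None
```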

find_two_nodes_single_rel

find_two_nodes_single_rel(relationship_condition: Callable[[Relationship], bool] = lambda _: True) -> List[Tuple[Node, Relationship, Node]]

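Finds pairs of nodes in the knowledge graph that are connected by a single relationship satisfying the condition. Each (NodeA, Rel, NodeB) triple is considered a multi-hop node, and the node with the smaller ID is always placed first.

Parameters

Name Type Description Default

relationship_condition Callable[[Relationship], bool]

A function that takes a Relationship and returns a boolean, by default lambda _: True

lambda _: True

Returns

Type Description

List[Tuple[Node, Relationship, Node]]

A list of tuples, where each tuple contains two nodes and the relationship between them, forming a multi-hop node.

Source code in src/ragas/testset/graph.py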
def find_two_nodes_single_rel(
    self, relationship_condition: t.Callable[[Relationship], bool] = lambda _: True
) -> t.List[t.Tuple[Node, Relationship, Node]]:
    """
    Finds nodes in the knowledge graph based on a relationship condition.
    (NodeA, Rel, NodeB) triples are considered as multi-hop nodes.

    Parameters
    ----------
    relationship_condition : Callable[[Relationship], bool], optional
        A function that takes a Relationship and returns a boolean, by default lambda _: True

    Returns
    -------
    List[Tuple[Node, Relationship, Node]]
        A list of tuples, where each tuple contains two nodes and a relationship forming a multi-hop node.
    """

    relationships = [
        relationship
        for relationship in self.relationships
        if relationship_condition(relationship)
    ]

    triplets = set()

    for relationship in relationships:
        if relationship.source != relationship.target:
            node_a = relationship.source
            node_b = relationship.target
            # Ensure the smaller ID node is always first
            if node_a.id < node_b.id:
                normalized_tuple = (node_a, relationship, node_b)
            else:
                normalized_relationship = Relationship(
                    source=node_b,
                    target=node_a,
                    type=relationship.type,
                    properties=relationship.properties,
                )
                normalized_tuple = (node_b, normalized_relationship, node_a)

            triplets.add(normalized_tuple)

    return list(triplets)
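
The normalization step, which puts the smaller-id node first and flips the relationship when needed, can be sketched as follows. SketchNode and SketchRelationship are hypothetical frozen dataclasses; the node id is an int here for readability, whereas the real Node uses a UUID, and relationship properties are left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class SketchNode:
    id: int  # the real Node uses a UUID; an int keeps the ordering easy to read


@dataclass(frozen=True)
class SketchRelationship:
    source: SketchNode
    target: SketchNode
    type: str


Triplet = Tuple[SketchNode, SketchRelationship, SketchNode]


def find_two_nodes_single_rel(
    relationships: List[SketchRelationship],
    relationship_condition: Callable[[SketchRelationship], bool] = lambda _: True,
) -> List[Triplet]:
    triplets = set()
    for rel in filter(relationship_condition, relationships):
        if rel.source == rel.target:
            continue  # self-loops are skipped
        a, b = rel.source, rel.target
        if a.id < b.id:
            triplets.add((a, rel, b))
        else:
            # Flip the relationship so the smaller-id node comes first.
            flipped = SketchRelationship(source=b, target=a, type=rel.type)
            triplets.add((b, flipped, a))
    return list(triplets)


n1, n2, n3 = SketchNode(1), SketchNode(2), SketchNode(3)
rels = [
    SketchRelationship(n2, n1, "similar"),  # will be flipped to n1 -> n2
    SketchRelationship(n2, n3, "child"),
    SketchRelationship(n3, n3, "similar"),  # self-loop, ignored
]

similar_only = find_two_nodes_single_rel(rels, lambda r: r.type == "similar")
everything = find_two_nodes_single_rel(rels)
```

Since the triples are collected in a set, the order of the returned list is not guaranteed; only the ordering inside each triple is.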