Embeddings

BaseRagasEmbeddings

BaseRagasEmbeddings(cache: Optional[CacheInterface] = None)

Bases: Embeddings, ABC

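Abstract base class for Ragas embeddings.

This class extends the Embeddings class and provides methods for embedding text and managing the run configuration.

Attributes: run_config (RunConfig): Configuration used to run embedding operations.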

Source code in src/ragas/embeddings/base.py

def __init__(self, cache: t.Optional[CacheInterface] = None):
    super().__init__()
    self.cache = cache
    if self.cache is not None:
        self.embed_query = cacher(cache_backend=self.cache)(self.embed_query)
        self.embed_documents = cacher(cache_backend=self.cache)(
            self.embed_documents
        )
        self.aembed_query = cacher(cache_backend=self.cache)(self.aembed_query)
        self.aembed_documents = cacher(cache_backend=self.cache)(
            self.aembed_documents
        )
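When a cache is passed, `__init__` rebinds each embedding method on the instance to a cached version of itself. The sketch below illustrates that pattern in plain Python. It assumes a hypothetical in-memory `DictCache` backend and a hypothetical `simple_cacher` decorator factory. Neither is the ragas `CacheInterface` or `cacher` implementation, whose key format is not shown here.

```python
import asyncio
import functools
import hashlib
import inspect
import json


class DictCache:
    """Hypothetical in-memory cache backend (stand-in for a CacheInterface)."""

    def __init__(self):
        self.store = {}

    def has_key(self, key):
        return key in self.store

    def get(self, key):
        return self.store[key]

    def set(self, key, value):
        self.store[key] = value


def simple_cacher(cache_backend):
    """Hypothetical decorator factory that memoizes sync or async callables."""

    def decorator(func):
        def make_key(args, kwargs):
            # Hash the function name plus its arguments into a stable string key
            raw = json.dumps([func.__name__, args, kwargs], sort_keys=True, default=str)
            return hashlib.sha256(raw.encode()).hexdigest()

        if inspect.iscoroutinefunction(func):
            @functools.wraps(func)
            async def async_wrapper(*args, **kwargs):
                key = make_key(args, kwargs)
                if cache_backend.has_key(key):
                    return cache_backend.get(key)
                result = await func(*args, **kwargs)
                cache_backend.set(key, result)
                return result

            return async_wrapper

        @functools.wraps(func)
        def sync_wrapper(*args, **kwargs):
            key = make_key(args, kwargs)
            if cache_backend.has_key(key):
                return cache_backend.get(key)
            result = func(*args, **kwargs)
            cache_backend.set(key, result)
            return result

        return sync_wrapper

    return decorator


class CountingEmbeddings:
    """Toy embedder that counts how often the underlying 'model' is really called."""

    def __init__(self, cache=None):
        self.calls = 0
        if cache is not None:
            # Same idea as BaseRagasEmbeddings.__init__: shadow bound methods with cached ones
            self.embed_query = simple_cacher(cache)(self.embed_query)
            self.aembed_query = simple_cacher(cache)(self.aembed_query)

    def embed_query(self, text):
        self.calls += 1
        return [float(len(text)), float(text.count(" "))]

    async def aembed_query(self, text):
        self.calls += 1
        return [float(len(text)), float(text.count(" "))]


emb = CountingEmbeddings(cache=DictCache())
emb.embed_query("hello world")
emb.embed_query("hello world")  # second call is served from the cache
```

Because the cached callables are stored as instance attributes, only instances created with a cache pay the lookup cost. The class methods themselves stay undecorated.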

embed_text `async`

embed_text(text: str, is_async=True) -> List[float]

Embed a single text string.

Source code in src/ragas/embeddings/base.py

async def embed_text(self, text: str, is_async=True) -> t.List[float]:
    """
    Embed a single text string.
    """
    embs = await self.embed_texts([text], is_async=is_async)
    return embs[0]

embed_texts `async`

embed_texts(texts: List[str], is_async: bool = True) -> List[List[float]]

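Embed multiple texts. With `is_async=True`, this awaits `aembed_documents` wrapped in async retries. Otherwise it runs `embed_documents`, wrapped in retries, in the event loop's default thread-pool executor.

Source code in src/ragas/embeddings/base.py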

async def embed_texts(
    self, texts: t.List[str], is_async: bool = True
) -> t.List[t.List[float]]:
    """
    Embed multiple texts.
    """
    if is_async:
        aembed_documents_with_retry = add_async_retry(
            self.aembed_documents, self.run_config
        )
        return await aembed_documents_with_retry(texts)
    else:
        loop = asyncio.get_running_loop()
        embed_documents_with_retry = add_retry(
            self.embed_documents, self.run_config
        )
        return await loop.run_in_executor(None, embed_documents_with_retry, texts)
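The two branches of `embed_texts`, and the way `embed_text` unwraps a one-element batch, can be sketched in plain Python. The `with_retry` and `with_async_retry` helpers below are hypothetical stand-ins for `add_retry` and `add_async_retry`: they retry on any exception with a fixed short wait. `ToyEmbeddings` is a made-up class that fails once on the synchronous path so that the retry is exercised.

```python
import asyncio
import functools
import time


def with_retry(func, max_retries=3, wait=0.01):
    """Hypothetical stand-in for add_retry: retry a sync callable on any exception."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        for attempt in range(max_retries):
            try:
                return func(*args, **kwargs)
            except Exception:
                if attempt == max_retries - 1:
                    raise
                time.sleep(wait)

    return wrapper


def with_async_retry(func, max_retries=3, wait=0.01):
    """Hypothetical stand-in for add_async_retry: retry an async callable on any exception."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        for attempt in range(max_retries):
            try:
                return await func(*args, **kwargs)
            except Exception:
                if attempt == max_retries - 1:
                    raise
                await asyncio.sleep(wait)

    return wrapper


class ToyEmbeddings:
    """Made-up embedder: each 'vector' is just the text length."""

    def __init__(self):
        self.failures_left = 1  # the sync path fails once to exercise the retry

    def embed_documents(self, texts):
        if self.failures_left > 0:
            self.failures_left -= 1
            raise RuntimeError("transient error")
        return [[float(len(s))] for s in texts]

    async def aembed_documents(self, texts):
        return [[float(len(s))] for s in texts]

    async def embed_texts(self, texts, is_async=True):
        if is_async:
            # Native async path: await the coroutine with async retries
            return await with_async_retry(self.aembed_documents)(texts)
        # Sync path: run the blocking call (with retries) in the default thread pool
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(None, with_retry(self.embed_documents), texts)

    async def embed_text(self, text, is_async=True):
        # Single text: embed a one-element batch and unwrap the result
        embs = await self.embed_texts([text], is_async=is_async)
        return embs[0]
```

Offloading the synchronous call to an executor keeps the event loop responsive while a blocking client waits on the network.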

set_run_config

set_run_config(run_config: RunConfig)

Set the run configuration for embedding operations.

Source code in src/ragas/embeddings/base.py

def set_run_config(self, run_config: RunConfig):
    """
    Set the run configuration for the embedding operations.
    """
    self.run_config = run_config

HuggingfaceEmbeddings

HuggingfaceEmbeddings(cache: Optional[CacheInterface] = None)

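Bases: BaseRagasEmbeddings

Hugging Face embeddings class for generating embeddings with pre-trained models.

This class provides functionality to load Hugging Face models and use them to generate embeddings for text inputs.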

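Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model_name` | `str` | Name of the pre-trained model to use. | `DEFAULT_MODEL_NAME` |
| `cache_folder` | `str` | Path in which to store downloaded models. Can also be set with the SENTENCE_TRANSFORMERS_HOME environment variable. | `None` |
| `model_kwargs` | `dict` | Additional keyword arguments passed to the model. | `{}` |
| `encode_kwargs` | `dict` | Additional keyword arguments passed to the encoding method. | `{}` |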

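Attributes

| Name | Type | Description |
| --- | --- | --- |
| `model` | `Union[SentenceTransformer, CrossEncoder]` | The loaded Hugging Face model. |
| `is_cross_encoder` | `bool` | Flag indicating whether the model is a cross-encoder. |

Methods

| Name | Description |
| --- | --- |
| `embed_query` | Embed a single query text. |
| `embed_documents` | Embed multiple documents. |
| `predict` | Make predictions with a cross-encoder model. |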

Note

This class requires the sentence_transformers and transformers packages to be installed.

Examples

>>> embeddings = HuggingfaceEmbeddings(model_name="bert-base-uncased")
>>> query_embedding = embeddings.embed_query("What is the capital of France?")
>>> doc_embeddings = embeddings.embed_documents(["Paris is the capital of France.", "London is the capital of the UK."])

Source code in src/ragas/embeddings/base.py

def __init__(self, cache: t.Optional[CacheInterface] = None):
    super().__init__()
    self.cache = cache
    if self.cache is not None:
        self.embed_query = cacher(cache_backend=self.cache)(self.embed_query)
        self.embed_documents = cacher(cache_backend=self.cache)(
            self.embed_documents
        )
        self.aembed_query = cacher(cache_backend=self.cache)(self.aembed_query)
        self.aembed_documents = cacher(cache_backend=self.cache)(
            self.aembed_documents
        )

embed_query

embed_query(text: str) -> List[float]

Embed a single query text.

Source code in src/ragas/embeddings/base.py

def embed_query(self, text: str) -> t.List[float]:
    """
    Embed a single query text.
    """
    return self.embed_documents([text])[0]

embed_documents

embed_documents(texts: List[str]) -> List[List[float]]

Embed multiple documents.

Source code in src/ragas/embeddings/base.py

def embed_documents(self, texts: t.List[str]) -> t.List[t.List[float]]:
    """
    Embed multiple documents.
    """
    from sentence_transformers.SentenceTransformer import SentenceTransformer
    from torch import Tensor

    assert isinstance(
        self.model, SentenceTransformer
    ), "Model is not of the type Bi-encoder"
    embeddings = self.model.encode(
        texts, normalize_embeddings=True, **self.encode_kwargs
    )

    # encode() returns a torch Tensor only when convert_to_tensor=True is in encode_kwargs
    assert isinstance(embeddings, Tensor)
    return embeddings.tolist()
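`embed_documents` passes `normalize_embeddings=True`, so each returned vector has unit L2 norm. As a result, cosine similarity between two embeddings reduces to a plain dot product. The following pure-Python illustration uses made-up two-dimensional vectors and does not involve sentence-transformers:

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (the effect normalize_embeddings=True requests)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


raw_a = [3.0, 4.0]
raw_b = [4.0, 3.0]
unit_a = l2_normalize(raw_a)
unit_b = l2_normalize(raw_b)

# For unit vectors, the dot product equals the cosine similarity of the raw vectors
similarity = dot(unit_a, unit_b)
```

Downstream similarity code can therefore skip the norm computation when it works with these embeddings.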

predict

predict(texts: List[List[str]]) -> List[List[float]]

Make predictions with a cross-encoder model.

Source code in src/ragas/embeddings/base.py

def predict(self, texts: t.List[t.List[str]]) -> t.List[t.List[float]]:
    """
    Make predictions using a cross-encoder model.
    """
    from sentence_transformers.cross_encoder import CrossEncoder
    from torch import Tensor

    assert isinstance(
        self.model, CrossEncoder
    ), "Model is not of the type CrossEncoder"

    predictions = self.model.predict(texts, **self.encode_kwargs)

    # predict() returns a torch Tensor only when convert_to_tensor=True is in encode_kwargs
    assert isinstance(predictions, Tensor)
    return predictions.tolist()
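`embed_documents` encodes each text independently. `predict`, by contrast, takes a list of text pairs and scores each pair jointly. The sketch below only illustrates that input and output shape. `toy_pair_scores` is a made-up word-overlap (Jaccard) scorer standing in for a trained cross-encoder. It returns one score per pair, which is what a single-label relevance cross-encoder typically produces.

```python
def toy_pair_scores(pairs):
    """Made-up scorer: Jaccard overlap of lower-cased word sets for each [query, passage] pair."""
    scores = []
    for query, passage in pairs:
        q_words = set(query.lower().split())
        p_words = set(passage.lower().split())
        union = q_words | p_words
        scores.append(len(q_words & p_words) / len(union) if union else 0.0)
    return scores


# Each inner list is one [query, passage] pair, the same nesting predict() expects
pairs = [
    ["capital of france", "paris is the capital of france"],
    ["capital of france", "london is the capital of the uk"],
]
scores = toy_pair_scores(pairs)
```

A real cross-encoder replaces the overlap heuristic with a model that reads both texts together, which is why it cannot precompute per-document vectors the way a bi-encoder can.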

LangchainEmbeddingsWrapper

LangchainEmbeddingsWrapper(embeddings: Embeddings, run_config: Optional[RunConfig] = None, cache: Optional[CacheInterface] = None)

Bases: BaseRagasEmbeddings

Wrapper for any embeddings from LangChain.

Source code in src/ragas/embeddings/base.py

def __init__(
    self,
    embeddings: Embeddings,
    run_config: t.Optional[RunConfig] = None,
    cache: t.Optional[CacheInterface] = None,
):
    super().__init__(cache=cache)
    self.embeddings = embeddings
    if run_config is None:
        run_config = RunConfig()
    self.set_run_config(run_config)
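The wrapper stores the LangChain object and falls back to a default run configuration when none is given. The `embed_query` and `embed_documents` methods documented below simply delegate to that object. The following dependency-free sketch shows this delegation. `SketchRunConfig`, `ToyLangchainEmbeddings`, and `SketchLangchainWrapper` are hypothetical stand-ins, not ragas or LangChain classes.

```python
from dataclasses import dataclass


@dataclass
class SketchRunConfig:
    """Hypothetical stand-in for RunConfig; the timeout default here is made up."""
    timeout: float = 60.0


class ToyLangchainEmbeddings:
    """Made-up object exposing the LangChain Embeddings method names."""

    def embed_query(self, text):
        return [float(len(text))]

    def embed_documents(self, texts):
        return [[float(len(s))] for s in texts]


class SketchLangchainWrapper:
    def __init__(self, embeddings, run_config=None):
        self.embeddings = embeddings
        # Same fallback as the constructor above: create a default config when omitted
        self.run_config = run_config if run_config is not None else SketchRunConfig()

    def embed_query(self, text):
        return self.embeddings.embed_query(text)

    def embed_documents(self, texts):
        return self.embeddings.embed_documents(texts)


wrapper = SketchLangchainWrapper(ToyLangchainEmbeddings())
```

With the real packages installed, the equivalent is passing any LangChain `Embeddings` instance, such as `OpenAIEmbeddings`, as the `embeddings` argument of `LangchainEmbeddingsWrapper`.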

embed_query

embed_query(text: str) -> List[float]

Embed a single query text.

Source code in src/ragas/embeddings/base.py

def embed_query(self, text: str) -> t.List[float]:
    """
    Embed a single query text.
    """
    return self.embeddings.embed_query(text)

embed_documents

embed_documents(texts: List[str]) -> List[List[float]]

Embed multiple documents.

Source code in src/ragas/embeddings/base.py

def embed_documents(self, texts: t.List[str]) -> t.List[t.List[float]]:
    """
    Embed multiple documents.
    """
    return self.embeddings.embed_documents(texts)

aembed_query `async`

aembed_query(text: str) -> List[float]

Asynchronously embed a single query text.

Source code in src/ragas/embeddings/base.py

async def aembed_query(self, text: str) -> t.List[float]:
    """
    Asynchronously embed a single query text.
    """
    return await self.embeddings.aembed_query(text)

aembed_documents `async`

aembed_documents(texts: List[str]) -> List[List[float]]

Asynchronously embed multiple documents.

Source code in src/ragas/embeddings/base.py

async def aembed_documents(self, texts: t.List[str]) -> t.List[t.List[float]]:
    """
    Asynchronously embed multiple documents.
    """
    return await self.embeddings.aembed_documents(texts)
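`aembed_query` and `aembed_documents` are coroutines that await the wrapped LangChain object, so several queries can be embedded concurrently with `asyncio.gather`. The following sketch uses made-up stand-in classes. `ToyAsyncEmbeddings` simulates I/O with `asyncio.sleep(0)`, and `embed_queries_concurrently` is a hypothetical helper.

```python
import asyncio


class ToyAsyncEmbeddings:
    """Made-up async embedder exposing the LangChain async method names."""

    async def aembed_query(self, text):
        await asyncio.sleep(0)  # yield to the event loop, as a network call would
        return [float(len(text))]

    async def aembed_documents(self, texts):
        return [await self.aembed_query(s) for s in texts]


class SketchAsyncWrapper:
    def __init__(self, embeddings):
        self.embeddings = embeddings

    async def aembed_query(self, text):
        return await self.embeddings.aembed_query(text)

    async def aembed_documents(self, texts):
        return await self.embeddings.aembed_documents(texts)


async def embed_queries_concurrently(wrapper, queries):
    # gather runs the coroutines concurrently and returns results in input order
    return await asyncio.gather(*(wrapper.aembed_query(q) for q in queries))
```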

set_run_config

set_run_config(run_config: RunConfig)

Set the run configuration for embedding operations.

Source code in src/ragas/embeddings/base.py

def set_run_config(self, run_config: RunConfig):
    """
    Set the run configuration for the embedding operations.
    """
    self.run_config = run_config

    # run configurations specially for OpenAI
    if isinstance(self.embeddings, OpenAIEmbeddings):
        try:
            from openai import RateLimitError
        except ImportError as exc:
            raise ImportError(
                "openai.RateLimitError not found. Please install the openai package with `pip install openai`"
            ) from exc
        self.embeddings.request_timeout = run_config.timeout
        self.run_config.exception_types = RateLimitError
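For OpenAI embeddings, `set_run_config` copies the timeout onto the client and narrows `exception_types` to `RateLimitError`. Assuming the retry helpers consult `exception_types` (this section does not show them), only rate-limit errors are retried. The sketch below illustrates that idea with a hypothetical `retry_on` decorator that uses exponential backoff and a made-up `FakeRateLimitError`. It is not the ragas `add_retry` helper.

```python
import functools
import time


class FakeRateLimitError(Exception):
    """Made-up stand-in for openai.RateLimitError."""


def retry_on(exception_types, max_retries=3, base_wait=0.01):
    """Hypothetical decorator: retry only on exception_types, with exponential backoff."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except exception_types:
                    if attempt == max_retries - 1:
                        raise  # out of retries: surface the last error
                    time.sleep(base_wait * (2 ** attempt))  # 0.01 s, then 0.02 s, ...

        return wrapper

    return decorator


calls = {"rate_limited": 0, "bad_input": 0}


@retry_on(FakeRateLimitError)
def flaky_embed(text):
    calls["rate_limited"] += 1
    if calls["rate_limited"] < 3:
        raise FakeRateLimitError("slow down")
    return [float(len(text))]


@retry_on(FakeRateLimitError)
def broken_embed(text):
    calls["bad_input"] += 1
    raise ValueError("bad input")  # not a rate-limit error, so it is not retried
```

Restricting retries to rate-limit errors avoids repeating requests that will fail again for the same reason, such as invalid input.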

LlamaIndexEmbeddingsWrapper

LlamaIndexEmbeddingsWrapper(embeddings: BaseEmbedding, run_config: Optional[RunConfig] = None, cache: Optional[CacheInterface] = None)

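Bases: BaseRagasEmbeddings

Wrapper for any embeddings from llama-index.

This class wraps a llama-index embedding model so that it can be used within the Ragas framework. It supports both synchronous and asynchronous embedding of queries and documents.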

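Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `embeddings` | `BaseEmbedding` | The llama-index embedding model to wrap. | required |
| `run_config` | `RunConfig` | Run configuration. If not provided, a default RunConfig is used. | `None` |
| `cache` | `CacheInterface` | Cache instance for storing and retrieving embedding results. | `None` |

Attributes

| Name | Type | Description |
| --- | --- | --- |
| `embeddings` | `BaseEmbedding` | The wrapped llama-index embedding model. |

Examples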

>>> from llama_index.embeddings.openai import OpenAIEmbedding
>>> from ragas.embeddings import LlamaIndexEmbeddingsWrapper
>>> llama_embeddings = OpenAIEmbedding()
>>> wrapped_embeddings = LlamaIndexEmbeddingsWrapper(llama_embeddings)
>>> query_embedding = wrapped_embeddings.embed_query("What is the capital of France?")
>>> document_embeddings = wrapped_embeddings.embed_documents(["Paris is the capital of France.", "London is the capital of the UK."])

Source code in src/ragas/embeddings/base.py

def __init__(
    self,
    embeddings: BaseEmbedding,
    run_config: t.Optional[RunConfig] = None,
    cache: t.Optional[CacheInterface] = None,
):
    super().__init__(cache=cache)
    self.embeddings = embeddings
    if run_config is None:
        run_config = RunConfig()
    self.set_run_config(run_config)
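This section shows only the constructor. The sketch below assumes that the wrapper maps the Ragas methods onto llama-index's `BaseEmbedding` methods: `get_query_embedding`, `get_text_embedding_batch`, and their `aget_` async counterparts. Under that assumption, the delegation looks roughly as follows. `FakeLlamaEmbedding` and `SketchLlamaIndexWrapper` are hypothetical stand-ins, so no llama-index installation is needed.

```python
import asyncio


class FakeLlamaEmbedding:
    """Hypothetical stand-in exposing llama-index BaseEmbedding method names."""

    def get_query_embedding(self, query):
        return [float(len(query))]

    def get_text_embedding_batch(self, texts):
        return [[float(len(s))] for s in texts]

    async def aget_query_embedding(self, query):
        return self.get_query_embedding(query)

    async def aget_text_embedding_batch(self, texts):
        return self.get_text_embedding_batch(texts)


class SketchLlamaIndexWrapper:
    """Assumed mapping from the Ragas method names to the llama-index ones."""

    def __init__(self, embeddings):
        self.embeddings = embeddings

    def embed_query(self, text):
        return self.embeddings.get_query_embedding(text)

    def embed_documents(self, texts):
        return self.embeddings.get_text_embedding_batch(texts)

    async def aembed_query(self, text):
        return await self.embeddings.aget_query_embedding(text)

    async def aembed_documents(self, texts):
        return await self.embeddings.aget_text_embedding_batch(texts)
```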

HaystackEmbeddingsWrapper

HaystackEmbeddingsWrapper(embedder: Any, run_config: Optional[RunConfig] = None, cache: Optional[CacheInterface] = None)

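Bases: BaseRagasEmbeddings

Wrapper for using Haystack embedders within the Ragas framework.

This class lets you generate embeddings with a Haystack embedder through both synchronous methods (embed_query/embed_documents) and asynchronous methods (aembed_query/aembed_documents).

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `embedder` | `AzureOpenAITextEmbedder \| HuggingFaceAPITextEmbedder \| OpenAITextEmbedder \| SentenceTransformersTextEmbedder` | An instance of a supported Haystack embedder class. | required |
| `run_config` | `RunConfig` | Configuration object for managing embedding execution settings. | `None` |
| `cache` | `CacheInterface` | Cache instance for storing and retrieving embedding results. | `None` |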

Source code in src/ragas/embeddings/haystack_wrapper.py

def __init__(
    self,
    embedder: t.Any,
    run_config: t.Optional[RunConfig] = None,
    cache: t.Optional[CacheInterface] = None,
):
    super().__init__(cache=cache)

    # Lazy Import of required Haystack components
    try:
        from haystack import AsyncPipeline
        from haystack.components.embedders import (
            AzureOpenAITextEmbedder,
            HuggingFaceAPITextEmbedder,
            OpenAITextEmbedder,
            SentenceTransformersTextEmbedder,
        )
    except ImportError as exc:
        raise ImportError(
            "Haystack is not installed. Please install it with `pip install haystack-ai`."
        ) from exc

    # Validate embedder type
    if not isinstance(
        embedder,
        (
            AzureOpenAITextEmbedder,
            HuggingFaceAPITextEmbedder,
            OpenAITextEmbedder,
            SentenceTransformersTextEmbedder,
        ),
    ):
        raise TypeError(
            "Expected 'embedder' to be one of: AzureOpenAITextEmbedder, "
            "HuggingFaceAPITextEmbedder, OpenAITextEmbedder, or "
            f"SentenceTransformersTextEmbedder, but got {type(embedder).__name__}."
        )

    self.embedder = embedder

    # Initialize an asynchronous pipeline and add the embedder component
    self.async_pipeline = AsyncPipeline()
    self.async_pipeline.add_component("embedder", self.embedder)

    # Set or create the run configuration
    if run_config is None:
        run_config = RunConfig()
    self.set_run_config(run_config)
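Haystack text embedders expose a `run(text=...)` method that returns a dict containing an `"embedding"` key. This section does not show the wrapper's method bodies, but assuming its synchronous methods read that key, the mapping can be sketched as below. `FakeTextEmbedder` is a made-up stand-in. The duck-typed `run()` check replaces the real `isinstance` validation against the four Haystack classes.

```python
class FakeTextEmbedder:
    """Made-up stand-in mimicking the output shape of a Haystack text embedder's run()."""

    def run(self, text):
        return {"embedding": [float(len(text))], "meta": {}}


class SketchHaystackWrapper:
    def __init__(self, embedder):
        # Duck-typed check; the real constructor uses isinstance against four Haystack classes
        if not callable(getattr(embedder, "run", None)):
            raise TypeError(
                f"Expected an embedder with a run() method, but got {type(embedder).__name__}."
            )
        self.embedder = embedder

    def embed_query(self, text):
        return self.embedder.run(text=text)["embedding"]

    def embed_documents(self, texts):
        # A text embedder takes one string per run() call, so embed documents one by one
        return [self.embed_query(text) for text in texts]
```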

embedding_factory

embedding_factory(model: str = 'text-embedding-ada-002', run_config: Optional[RunConfig] = None) -> BaseRagasEmbeddings

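Create and return a BaseRagasEmbeddings instance. Used for the default embeddings in Ragas (OpenAI).

This factory function creates an OpenAIEmbeddings instance and wraps it with LangchainEmbeddingsWrapper to provide a BaseRagasEmbeddings-compatible object.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model` | `str` | Name of the OpenAI embedding model to use. | `'text-embedding-ada-002'` |
| `run_config` | `RunConfig` | Configuration for the run. | `None` |

Returns

| Type | Description |
| --- | --- |
| `BaseRagasEmbeddings` | An instance of BaseRagasEmbeddings configured with the specified parameters. |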

Source code in src/ragas/embeddings/base.py

def embedding_factory(
    model: str = "text-embedding-ada-002", run_config: t.Optional[RunConfig] = None
) -> BaseRagasEmbeddings:
    """
    Create and return a BaseRagasEmbeddings instance. Used for default embeddings
    used in Ragas (OpenAI).

    This factory function creates an OpenAIEmbeddings instance and wraps it with
    LangchainEmbeddingsWrapper to provide a BaseRagasEmbeddings compatible object.

    Parameters
    ----------
    model : str, optional
        The name of the OpenAI embedding model to use, by default "text-embedding-ada-002".
    run_config : RunConfig, optional
        Configuration for the run, by default None.

    Returns
    -------
    BaseRagasEmbeddings
        An instance of BaseRagasEmbeddings configured with the specified parameters.
    """
    openai_embeddings = OpenAIEmbeddings(model=model)
    if run_config is not None:
        openai_embeddings.request_timeout = run_config.timeout
    else:
        run_config = RunConfig()
    return LangchainEmbeddingsWrapper(openai_embeddings, run_config=run_config)
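The factory branches only on how the timeout reaches the client. The following dependency-free sketch uses hypothetical stand-ins: `SketchRunConfig` (with a made-up `timeout` default), `SketchOpenAIEmbeddings`, `SketchWrapper`, and `sketch_embedding_factory`. The real `LangchainEmbeddingsWrapper.set_run_config` shown earlier also copies the timeout onto `OpenAIEmbeddings`. This sketch leaves that step out to isolate the factory's own logic.

```python
from dataclasses import dataclass


@dataclass
class SketchRunConfig:
    """Hypothetical stand-in for RunConfig; the default timeout is made up."""
    timeout: float = 60.0


class SketchOpenAIEmbeddings:
    """Hypothetical stand-in for langchain's OpenAIEmbeddings."""

    def __init__(self, model):
        self.model = model
        self.request_timeout = None


class SketchWrapper:
    """Hypothetical stand-in for LangchainEmbeddingsWrapper (no set_run_config logic)."""

    def __init__(self, embeddings, run_config):
        self.embeddings = embeddings
        self.run_config = run_config


def sketch_embedding_factory(model="text-embedding-ada-002", run_config=None):
    openai_embeddings = SketchOpenAIEmbeddings(model=model)
    if run_config is not None:
        # An explicit config pushes its timeout onto the client
        openai_embeddings.request_timeout = run_config.timeout
    else:
        # Otherwise fall back to a default config
        run_config = SketchRunConfig()
    return SketchWrapper(openai_embeddings, run_config=run_config)
```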
