Embeddings

BaseRagasEmbedding

Bases: ABC

Modern abstract base class for Ragas embedding implementations.

This class provides a consistent interface for embedding text with various providers. Implementations should provide synchronous and asynchronous methods for embedding a single text; batch methods are provided automatically.
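The contract can be illustrated with a minimal, self-contained sketch (the class names below are toy stand-ins for illustration, not the real ragas classes): a subclass implements only the two single-text methods, and the batch methods come for free.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, List


class MiniRagasEmbedding(ABC):
    """Simplified stand-in for BaseRagasEmbedding (illustration only)."""

    @abstractmethod
    def embed_text(self, text: str, **kwargs: Any) -> List[float]: ...

    @abstractmethod
    async def aembed_text(self, text: str, **kwargs: Any) -> List[float]: ...

    def embed_texts(self, texts: List[str], **kwargs: Any) -> List[List[float]]:
        # Default: process texts one by one, as in the real base class.
        return [self.embed_text(t, **kwargs) for t in texts]

    async def aembed_texts(self, texts: List[str], **kwargs: Any) -> List[List[float]]:
        # Default: fan out concurrently with asyncio.gather.
        return await asyncio.gather(*(self.aembed_text(t, **kwargs) for t in texts))


class ToyEmbedding(MiniRagasEmbedding):
    """Deterministic toy embedder: [length, vowel count]."""

    def embed_text(self, text: str, **kwargs: Any) -> List[float]:
        return [float(len(text)), float(sum(c in "aeiou" for c in text))]

    async def aembed_text(self, text: str, **kwargs: Any) -> List[float]:
        return self.embed_text(text, **kwargs)


emb = ToyEmbedding()
print(emb.embed_texts(["hi", "ragas"]))  # [[2.0, 1.0], [5.0, 2.0]]
```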

embed_text abstractmethod

embed_text(text: str, **kwargs: Any) -> List[float]

Embed a single text.

Parameters: text: The text to embed. **kwargs: Additional arguments for the embedding call.

Returns: A list of floats representing the embedding.

Source code in src/ragas/embeddings/base.py
@abstractmethod
def embed_text(self, text: str, **kwargs: t.Any) -> t.List[float]:
    """Embed a single text.

    Args:
        text: The text to embed
        **kwargs: Additional arguments for the embedding call

    Returns:
        List of floats representing the embedding
    """
    pass

aembed_text abstractmethod async

aembed_text(text: str, **kwargs: Any) -> List[float]

Asynchronously embed a single text.

Parameters: text: The text to embed. **kwargs: Additional arguments for the embedding call.

Returns: A list of floats representing the embedding.

Source code in src/ragas/embeddings/base.py
@abstractmethod
async def aembed_text(self, text: str, **kwargs: t.Any) -> t.List[float]:
    """Asynchronously embed a single text.

    Args:
        text: The text to embed
        **kwargs: Additional arguments for the embedding call

    Returns:
        List of floats representing the embedding
    """
    pass

embed_texts

embed_texts(texts: List[str], **kwargs: Any) -> List[List[float]]

Embed multiple texts.

The default implementation processes texts individually. Override for batch optimization.

Parameters: texts: List of texts to embed. **kwargs: Additional arguments for the embedding calls.

Returns: A list of embeddings, one for each input text.

Source code in src/ragas/embeddings/base.py
def embed_texts(self, texts: t.List[str], **kwargs: t.Any) -> t.List[t.List[float]]:
    """Embed multiple texts.

    Default implementation processes texts individually. Override for
    batch optimization.

    Args:
        texts: List of texts to embed
        **kwargs: Additional arguments for the embedding calls

    Returns:
        List of embeddings, one for each input text
    """
    texts = validate_texts(texts)
    return [self.embed_text(text, **kwargs) for text in texts]

aembed_texts async

aembed_texts(texts: List[str], **kwargs: Any) -> List[List[float]]

Asynchronously embed multiple texts.

The default implementation processes texts concurrently. Override for batch optimization.

Parameters: texts: List of texts to embed. **kwargs: Additional arguments for the embedding calls.

Returns: A list of embeddings, one for each input text.

Source code in src/ragas/embeddings/base.py
async def aembed_texts(
    self, texts: t.List[str], **kwargs: t.Any
) -> t.List[t.List[float]]:
    """Asynchronously embed multiple texts.

    Default implementation processes texts concurrently. Override for
    batch optimization.

    Args:
        texts: List of texts to embed
        **kwargs: Additional arguments for the embedding calls

    Returns:
        List of embeddings, one for each input text
    """
    texts = validate_texts(texts)
    tasks = [self.aembed_text(text, **kwargs) for text in texts]
    return await asyncio.gather(*tasks)
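The gather-based default above preserves input order regardless of when each call completes. A self-contained sketch of the same pattern (`fake_aembed_text` is an illustrative placeholder, not a ragas API):

```python
import asyncio
from typing import List


async def fake_aembed_text(text: str) -> List[float]:
    # Simulate a provider call; shorter texts "finish" sooner.
    await asyncio.sleep(len(text) * 0.001)
    return [float(len(text))]


async def aembed_texts(texts: List[str]) -> List[List[float]]:
    # Same shape as the default implementation: one task per text,
    # gathered concurrently; results come back in input order.
    tasks = [fake_aembed_text(t) for t in texts]
    return await asyncio.gather(*tasks)


result = asyncio.run(aembed_texts(["a", "longer text", "hi"]))
print(result)  # [[1.0], [11.0], [2.0]] -- order matches the input
```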

BaseRagasEmbeddings

BaseRagasEmbeddings(cache: Optional[CacheInterface] = None)

Bases: Embeddings, ABC

Abstract base class for Ragas embeddings.

This class extends the Embeddings class and provides methods for embedding text and managing the run configuration.

Attributes: run_config (RunConfig): Configuration for running the embedding operations.

Source code in src/ragas/embeddings/base.py
def __init__(self, cache: t.Optional[CacheInterface] = None):
    super().__init__()
    self.cache = cache
    if self.cache is not None:
        self.embed_query = cacher(cache_backend=self.cache)(self.embed_query)
        self.embed_documents = cacher(cache_backend=self.cache)(
            self.embed_documents
        )
        self.aembed_query = cacher(cache_backend=self.cache)(self.aembed_query)
        self.aembed_documents = cacher(cache_backend=self.cache)(
            self.aembed_documents
        )

embed_text async

embed_text(text: str, is_async=True) -> List[float]

Embed a single text string.

Source code in src/ragas/embeddings/base.py
async def embed_text(self, text: str, is_async=True) -> t.List[float]:
    """
    Embed a single text string.
    """
    embs = await self.embed_texts([text], is_async=is_async)
    return embs[0]

embed_texts async

embed_texts(texts: List[str], is_async: bool = True) -> List[List[float]]

Embed multiple texts.

Source code in src/ragas/embeddings/base.py
async def embed_texts(
    self, texts: t.List[str], is_async: bool = True
) -> t.List[t.List[float]]:
    """
    Embed multiple texts.
    """
    if is_async:
        aembed_documents_with_retry = add_async_retry(
            self.aembed_documents, self.run_config
        )
        return await aembed_documents_with_retry(texts)
    else:
        loop = asyncio.get_event_loop()
        embed_documents_with_retry = add_retry(
            self.embed_documents, self.run_config
        )
        return await loop.run_in_executor(None, embed_documents_with_retry, texts)
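The else branch above bridges a blocking SDK call into async code by offloading it to a thread via run_in_executor. A minimal sketch of that bridge (names are illustrative placeholders; this uses asyncio.get_running_loop, the modern equivalent of get_event_loop inside a coroutine):

```python
import asyncio
from typing import List


def blocking_embed_documents(texts: List[str]) -> List[List[float]]:
    # Stand-in for a synchronous SDK call that would block the event loop.
    return [[float(len(t))] for t in texts]


async def embed_texts(texts: List[str], is_async: bool = True) -> List[List[float]]:
    if is_async:
        # In the real class this path awaits aembed_documents with retries;
        # here we just return directly for the sketch.
        return blocking_embed_documents(texts)
    # Run the blocking call in the default thread pool so the loop stays free.
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(None, blocking_embed_documents, texts)


print(asyncio.run(embed_texts(["abc"], is_async=False)))  # [[3.0]]
```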

set_run_config

set_run_config(run_config: RunConfig)

Set the run configuration for the embedding operations.

Source code in src/ragas/embeddings/base.py
def set_run_config(self, run_config: RunConfig):
    """
    Set the run configuration for the embedding operations.
    """
    self.run_config = run_config

HuggingfaceEmbeddings

HuggingfaceEmbeddings(cache: Optional[CacheInterface] = None)

Bases: BaseRagasEmbeddings

Hugging Face embeddings class for generating embeddings with pretrained models.

This class provides functionality for loading Hugging Face models and generating embeddings for text inputs.

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| model_name | str | Name of the pretrained model to use; defaults to DEFAULT_MODEL_NAME. | required |
| cache_folder | str | Path to store downloaded models. Can also be set via the SENTENCE_TRANSFORMERS_HOME environment variable. | required |
| model_kwargs | dict | Additional keyword arguments to pass to the model. | required |
| encode_kwargs | dict | Additional keyword arguments to pass to the encode method. | required |

Attributes

| Name | Type | Description |
| --- | --- | --- |
| model | Union[SentenceTransformer, CrossEncoder] | The loaded Hugging Face model. |
| is_cross_encoder | bool | Flag indicating whether the model is a cross-encoder. |

Methods

| Name | Description |
| --- | --- |
| embed_query | Embed a single query text. |
| embed_documents | Embed multiple documents. |
| predict | Make predictions using a cross-encoder model. |

Notes

This class requires the sentence_transformers and transformers packages to be installed.

Example

>>> embeddings = HuggingfaceEmbeddings(model_name="bert-base-uncased")
>>> query_embedding = embeddings.embed_query("What is the capital of France?")
>>> doc_embeddings = embeddings.embed_documents(["Paris is the capital of France.", "London is the capital of the UK."])
Source code in src/ragas/embeddings/base.py
def __init__(self, cache: t.Optional[CacheInterface] = None):
    super().__init__()
    self.cache = cache
    if self.cache is not None:
        self.embed_query = cacher(cache_backend=self.cache)(self.embed_query)
        self.embed_documents = cacher(cache_backend=self.cache)(
            self.embed_documents
        )
        self.aembed_query = cacher(cache_backend=self.cache)(self.aembed_query)
        self.aembed_documents = cacher(cache_backend=self.cache)(
            self.aembed_documents
        )

embed_query

embed_query(text: str) -> List[float]

Embed a single query text.

Source code in src/ragas/embeddings/base.py
def embed_query(self, text: str) -> t.List[float]:
    """
    Embed a single query text.
    """
    return self.embed_documents([text])[0]

embed_documents

embed_documents(texts: List[str]) -> List[List[float]]

Embed multiple documents.

Source code in src/ragas/embeddings/base.py
def embed_documents(self, texts: t.List[str]) -> t.List[t.List[float]]:
    """
    Embed multiple documents.
    """
    from sentence_transformers.SentenceTransformer import SentenceTransformer
    from torch import Tensor

    assert isinstance(self.model, SentenceTransformer), (
        "Model is not of the type Bi-encoder"
    )
    embeddings = self.model.encode(
        texts, normalize_embeddings=True, **self.encode_kwargs
    )

    assert isinstance(embeddings, Tensor)
    return embeddings.tolist()

predict

predict(texts: List[List[str]]) -> List[List[float]]

Make predictions using a cross-encoder model.

Source code in src/ragas/embeddings/base.py
def predict(self, texts: t.List[t.List[str]]) -> t.List[t.List[float]]:
    """
    Make predictions using a cross-encoder model.
    """
    from sentence_transformers.cross_encoder import CrossEncoder
    from torch import Tensor

    assert isinstance(self.model, CrossEncoder), (
        "Model is not of the type CrossEncoder"
    )

    predictions = self.model.predict(texts, **self.encode_kwargs)

    assert isinstance(predictions, Tensor)
    return predictions.tolist()

GoogleEmbeddings

GoogleEmbeddings(client: Optional[Any] = None, model: str = 'text-embedding-004', use_vertex: bool = False, project_id: Optional[str] = None, location: Optional[str] = 'us-central1', **kwargs: Any)

Bases: BaseRagasEmbedding

Google embeddings using Vertex AI or Google AI (Gemini).

Supports both Vertex AI and Google AI (Gemini) embedding models. Vertex AI requires the google-cloud-aiplatform package; Google AI requires the google-generativeai package.

The client parameter is flexible:
- For Gemini: can be None (auto-imports genai), the genai module, or a GenerativeModel instance
- For Vertex: should be a configured Vertex client

Example:
# Gemini - auto-import (simplest)
embeddings = GoogleEmbeddings(client=None, model="text-embedding-004")

# Gemini - explicit genai module
import google.generativeai as genai
genai.configure(api_key="...")
embeddings = GoogleEmbeddings(client=genai, model="text-embedding-004")

# Gemini - from LLM client (auto-extracts genai module)
llm_client = genai.GenerativeModel("gemini-2.0-flash")
embeddings = GoogleEmbeddings(client=llm_client, model="text-embedding-004")
Source code in src/ragas/embeddings/google_provider.py
def __init__(
    self,
    client: t.Optional[t.Any] = None,
    model: str = "text-embedding-004",
    use_vertex: bool = False,
    project_id: t.Optional[str] = None,
    location: t.Optional[str] = "us-central1",
    **kwargs: t.Any,
):
    self._original_client = client
    self.model = model
    self.use_vertex = use_vertex
    self.project_id = project_id
    self.location = location
    self.kwargs = kwargs

    # Resolve the actual client to use
    self.client = self._resolve_client(client, use_vertex)

embed_text

embed_text(text: str, **kwargs: Any) -> List[float]

Embed a single text using Google's embedding service.

Source code in src/ragas/embeddings/google_provider.py
def embed_text(self, text: str, **kwargs: t.Any) -> t.List[float]:
    """Embed a single text using Google's embedding service."""
    if self.use_vertex:
        return self._embed_text_vertex(text, **kwargs)
    else:
        return self._embed_text_genai(text, **kwargs)

aembed_text async

aembed_text(text: str, **kwargs: Any) -> List[float]

Asynchronously embed a single text using Google's embedding service.

Google's SDK doesn't provide native async support, so a ThreadPoolExecutor is used.

Source code in src/ragas/embeddings/google_provider.py
async def aembed_text(self, text: str, **kwargs: t.Any) -> t.List[float]:
    """Asynchronously embed a single text using Google's embedding service.

    Google's SDK doesn't provide native async support, so we use ThreadPoolExecutor.
    """
    return await run_sync_in_async(self.embed_text, text, **kwargs)
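run_sync_in_async is a ragas helper; assuming it does what the docstring describes (run a blocking call on a worker thread), a minimal equivalent can be sketched with asyncio.to_thread (Python 3.9+). The sync_embed stand-in below is an illustrative placeholder, not the Google SDK:

```python
import asyncio
from typing import Any, Callable, List, TypeVar

T = TypeVar("T")


async def run_sync_in_async(fn: Callable[..., T], *args: Any, **kwargs: Any) -> T:
    # Push the blocking call onto a worker thread so the event loop stays free.
    return await asyncio.to_thread(fn, *args, **kwargs)


def sync_embed(text: str) -> List[float]:
    # Stand-in for a blocking SDK embedding call.
    return [float(len(text))]


print(asyncio.run(run_sync_in_async(sync_embed, "gemini")))  # [6.0]
```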

embed_texts

embed_texts(texts: List[str], **kwargs: Any) -> List[List[float]]

Embed multiple texts using Google's embedding service.

Source code in src/ragas/embeddings/google_provider.py
def embed_texts(self, texts: t.List[str], **kwargs: t.Any) -> t.List[t.List[float]]:
    """Embed multiple texts using Google's embedding service."""
    texts = validate_texts(texts)
    if not texts:
        return []

    if self.use_vertex:
        return self._embed_texts_vertex(texts, **kwargs)
    else:
        return self._embed_texts_genai(texts, **kwargs)

aembed_texts async

aembed_texts(texts: List[str], **kwargs: Any) -> List[List[float]]

Asynchronously embed multiple texts using Google's embedding service.

Source code in src/ragas/embeddings/google_provider.py
async def aembed_texts(
    self, texts: t.List[str], **kwargs: t.Any
) -> t.List[t.List[float]]:
    """Asynchronously embed multiple texts using Google's embedding service."""
    texts = validate_texts(texts)
    if not texts:
        return []

    return await run_sync_in_async(self.embed_texts, texts, **kwargs)

HaystackEmbeddingsWrapper

HaystackEmbeddingsWrapper(embedder: Any, run_config: Optional[RunConfig] = None, cache: Optional[CacheInterface] = None)

Bases: BaseRagasEmbeddings

A wrapper for using Haystack embedders within the Ragas framework.

This class allows you to generate embeddings through a Haystack embedder using both synchronous and asynchronous methods (embed_query/embed_documents and aembed_query/aembed_documents).

Parameters

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| embedder | AzureOpenAITextEmbedder \| HuggingFaceAPITextEmbedder \| OpenAITextEmbedder \| SentenceTransformersTextEmbedder | An instance of a supported Haystack embedder class. | required |
| run_config | RunConfig | Configuration object for managing embedding execution settings; defaults to None. | None |
| cache | CacheInterface | A cache instance for storing and retrieving embedding results; defaults to None. | None |

Source code in src/ragas/embeddings/haystack_wrapper.py
def __init__(
    self,
    embedder: t.Any,
    run_config: t.Optional[RunConfig] = None,
    cache: t.Optional[CacheInterface] = None,
):
    super().__init__(cache=cache)

    # Lazy Import of required Haystack components
    try:
        from haystack import AsyncPipeline
        from haystack.components.embedders.azure_text_embedder import (
            AzureOpenAITextEmbedder,
        )
        from haystack.components.embedders.hugging_face_api_text_embedder import (
            HuggingFaceAPITextEmbedder,
        )
        from haystack.components.embedders.openai_text_embedder import (
            OpenAITextEmbedder,
        )
        from haystack.components.embedders.sentence_transformers_text_embedder import (
            SentenceTransformersTextEmbedder,
        )
    except ImportError as exc:
        raise ImportError(
            "Haystack is not installed. Please install it with `pip install haystack-ai`."
        ) from exc

    # Validate embedder type
    if not isinstance(
        embedder,
        (
            AzureOpenAITextEmbedder,
            HuggingFaceAPITextEmbedder,
            OpenAITextEmbedder,
            SentenceTransformersTextEmbedder,
        ),
    ):
        raise TypeError(
            "Expected 'embedder' to be one of: AzureOpenAITextEmbedder, "
            "HuggingFaceAPITextEmbedder, OpenAITextEmbedder, or "
            f"SentenceTransformersTextEmbedder, but got {type(embedder).__name__}."
        )

    self.embedder = embedder

    # Initialize an asynchronous pipeline and add the embedder component
    self.async_pipeline = AsyncPipeline()
    self.async_pipeline.add_component("embedder", self.embedder)

    # Set or create the run configuration
    if run_config is None:
        run_config = RunConfig()
    self.set_run_config(run_config)

HuggingFaceEmbeddings

HuggingFaceEmbeddings(model: str, use_api: bool = False, api_key: Optional[str] = None, device: Optional[str] = None, normalize_embeddings: bool = True, batch_size: int = 32, **model_kwargs: Any)

Bases: BaseRagasEmbedding

HuggingFace embeddings supporting both local and API-based models.

Supports local models via sentence-transformers and hosted models via the HuggingFace API. Provides efficient batching and caching.

Source code in src/ragas/embeddings/huggingface_provider.py
def __init__(
    self,
    model: str,
    use_api: bool = False,
    api_key: t.Optional[str] = None,
    device: t.Optional[str] = None,
    normalize_embeddings: bool = True,
    batch_size: int = 32,
    **model_kwargs: t.Any,
):
    self.model = model
    self.use_api = use_api
    self.api_key = api_key
    self.device = device
    self.normalize_embeddings = normalize_embeddings
    self.batch_size = batch_size
    self.model_kwargs = model_kwargs

    if use_api:
        self._setup_api_client()
    else:
        self._setup_local_model()

embed_text

embed_text(text: str, **kwargs: Any) -> List[float]

Embed a single text using HuggingFace.

Source code in src/ragas/embeddings/huggingface_provider.py
def embed_text(self, text: str, **kwargs: t.Any) -> t.List[float]:
    """Embed a single text using HuggingFace."""
    if self.use_api:
        return self._embed_text_api(text, **kwargs)
    else:
        return self._embed_text_local(text, **kwargs)

aembed_text async

aembed_text(text: str, **kwargs: Any) -> List[float]

Asynchronously embed a single text using HuggingFace.

Source code in src/ragas/embeddings/huggingface_provider.py
async def aembed_text(self, text: str, **kwargs: t.Any) -> t.List[float]:
    """Asynchronously embed a single text using HuggingFace."""
    if self.use_api:
        return await self._aembed_text_api(text, **kwargs)
    else:
        return await run_sync_in_async(self._embed_text_local, text, **kwargs)

embed_texts

embed_texts(texts: List[str], **kwargs: Any) -> List[List[float]]

Embed multiple texts using HuggingFace with batching.

Source code in src/ragas/embeddings/huggingface_provider.py
def embed_texts(self, texts: t.List[str], **kwargs: t.Any) -> t.List[t.List[float]]:
    """Embed multiple texts using HuggingFace with batching."""
    texts = validate_texts(texts)
    if not texts:
        return []

    if self.use_api:
        return self._embed_texts_api(texts, **kwargs)
    else:
        return self._embed_texts_local(texts, **kwargs)

aembed_texts async

aembed_texts(texts: List[str], **kwargs: Any) -> List[List[float]]

Asynchronously embed multiple texts using HuggingFace.

Source code in src/ragas/embeddings/huggingface_provider.py
async def aembed_texts(
    self, texts: t.List[str], **kwargs: t.Any
) -> t.List[t.List[float]]:
    """Asynchronously embed multiple texts using HuggingFace."""
    texts = validate_texts(texts)
    if not texts:
        return []

    if self.use_api:
        return await run_sync_in_async(self._embed_texts_api, texts, **kwargs)
    else:
        return await run_sync_in_async(self._embed_texts_local, texts, **kwargs)

LiteLLMEmbeddings

LiteLLMEmbeddings(model: str, api_key: Optional[str] = None, api_base: Optional[str] = None, api_version: Optional[str] = None, timeout: int = 600, max_retries: int = 3, batch_size: Optional[int] = None, **litellm_params: Any)

Bases: BaseRagasEmbedding

Universal embedding interface using LiteLLM.

Supports over 100 models across OpenAI, Azure, Google, Cohere, Anthropic, and more. Provides intelligent batching and provider-specific optimizations.

Source code in src/ragas/embeddings/litellm_provider.py
def __init__(
    self,
    model: str,
    api_key: t.Optional[str] = None,
    api_base: t.Optional[str] = None,
    api_version: t.Optional[str] = None,
    timeout: int = 600,
    max_retries: int = 3,
    batch_size: t.Optional[int] = None,
    **litellm_params: t.Any,
):
    self.litellm = safe_import("litellm", "litellm")
    self.model = model
    self.api_key = api_key
    self.api_base = api_base
    self.api_version = api_version
    self.timeout = timeout
    self.max_retries = max_retries
    self.batch_size = batch_size or get_optimal_batch_size("litellm", model)
    self.litellm_params = litellm_params

embed_text

embed_text(text: str, **kwargs: Any) -> List[float]

Embed a single text using LiteLLM.

Source code in src/ragas/embeddings/litellm_provider.py
def embed_text(self, text: str, **kwargs: t.Any) -> t.List[float]:
    """Embed a single text using LiteLLM."""
    call_kwargs = self._prepare_kwargs(**kwargs)
    response = self.litellm.embedding(input=[text], **call_kwargs)
    return response.data[0]["embedding"]

aembed_text async

aembed_text(text: str, **kwargs: Any) -> List[float]

Asynchronously embed a single text using LiteLLM.

Source code in src/ragas/embeddings/litellm_provider.py
async def aembed_text(self, text: str, **kwargs: t.Any) -> t.List[float]:
    """Asynchronously embed a single text using LiteLLM."""
    call_kwargs = self._prepare_kwargs(**kwargs)
    response = await self.litellm.aembedding(input=[text], **call_kwargs)
    return response.data[0]["embedding"]

embed_texts

embed_texts(texts: List[str], **kwargs: Any) -> List[List[float]]

Embed multiple texts using LiteLLM with intelligent batching.

Source code in src/ragas/embeddings/litellm_provider.py
def embed_texts(self, texts: t.List[str], **kwargs: t.Any) -> t.List[t.List[float]]:
    """Embed multiple texts using LiteLLM with intelligent batching."""
    texts = validate_texts(texts)
    if not texts:
        return []

    embeddings = []
    batches = batch_texts(texts, self.batch_size)

    for batch in batches:
        call_kwargs = self._prepare_kwargs(**kwargs)
        response = self.litellm.embedding(input=batch, **call_kwargs)
        embeddings.extend([item["embedding"] for item in response.data])

    return embeddings

aembed_texts async

aembed_texts(texts: List[str], **kwargs: Any) -> List[List[float]]

Asynchronously embed multiple texts using LiteLLM with intelligent batching.

Source code in src/ragas/embeddings/litellm_provider.py
async def aembed_texts(
    self, texts: t.List[str], **kwargs: t.Any
) -> t.List[t.List[float]]:
    """Asynchronously embed multiple texts using LiteLLM with intelligent batching."""
    texts = validate_texts(texts)
    if not texts:
        return []

    embeddings = []
    batches = batch_texts(texts, self.batch_size)

    for batch in batches:
        call_kwargs = self._prepare_kwargs(**kwargs)
        response = await self.litellm.aembedding(input=batch, **call_kwargs)
        embeddings.extend([item["embedding"] for item in response.data])

    return embeddings

OpenAIEmbeddings

OpenAIEmbeddings(client: Any, model: str = 'text-embedding-3-small')

Bases: BaseRagasEmbedding

OpenAI embeddings implementation with batch optimization.

Supports both sync and async OpenAI clients with automatic detection. Provides optimized batch processing for better performance.

Source code in src/ragas/embeddings/openai_provider.py
def __init__(self, client: t.Any, model: str = "text-embedding-3-small"):
    self.client = client
    self.model = model
    self.is_async = self._check_client_async(client)

embed_text

embed_text(text: str, **kwargs: Any) -> List[float]

Embed a single text using OpenAI.

For async clients, this runs the async method in the appropriate event loop.

Source code in src/ragas/embeddings/openai_provider.py
def embed_text(self, text: str, **kwargs: t.Any) -> t.List[float]:
    """Embed a single text using OpenAI.

    For async clients, this will run the async method in the appropriate event loop.
    """
    if self.is_async:
        result = self._run_async_in_current_loop(self.aembed_text(text, **kwargs))
    else:
        response = self.client.embeddings.create(
            input=text, model=self.model, **kwargs
        )
        result = response.data[0].embedding

    # Track usage
    track(
        EmbeddingUsageEvent(
            provider="openai",
            model=self.model,
            embedding_type="modern",
            num_requests=1,
            is_async=self.is_async,
        )
    )
    return result

aembed_text async

aembed_text(text: str, **kwargs: Any) -> List[float]

Asynchronously embed a single text using OpenAI.

Source code in src/ragas/embeddings/openai_provider.py
async def aembed_text(self, text: str, **kwargs: t.Any) -> t.List[float]:
    """Asynchronously embed a single text using OpenAI."""
    if not self.is_async:
        raise TypeError(
            "Cannot use aembed_text() with a synchronous client. Use embed_text() instead."
        )

    response = await self.client.embeddings.create(
        input=text, model=self.model, **kwargs
    )
    result = response.data[0].embedding

    # Track usage
    track(
        EmbeddingUsageEvent(
            provider="openai",
            model=self.model,
            embedding_type="modern",
            num_requests=1,
            is_async=True,
        )
    )
    return result

embed_texts

embed_texts(texts: List[str], **kwargs: Any) -> List[List[float]]

Embed multiple texts using OpenAI's batch API for optimization.

Source code in src/ragas/embeddings/openai_provider.py
def embed_texts(self, texts: t.List[str], **kwargs: t.Any) -> t.List[t.List[float]]:
    """Embed multiple texts using OpenAI's batch API for optimization."""
    texts = validate_texts(texts)
    if not texts:
        return []

    if self.is_async:
        result = self._run_async_in_current_loop(self.aembed_texts(texts, **kwargs))
    else:
        # OpenAI supports batch embedding natively
        response = self.client.embeddings.create(
            input=texts, model=self.model, **kwargs
        )
        result = [item.embedding for item in response.data]

    # Track usage
    track(
        EmbeddingUsageEvent(
            provider="openai",
            model=self.model,
            embedding_type="modern",
            num_requests=len(texts),
            is_async=self.is_async,
        )
    )
    return result

aembed_texts async

aembed_texts(texts: List[str], **kwargs: Any) -> List[List[float]]

Asynchronously embed multiple texts using OpenAI's batch API.

Source code in src/ragas/embeddings/openai_provider.py
async def aembed_texts(
    self, texts: t.List[str], **kwargs: t.Any
) -> t.List[t.List[float]]:
    """Asynchronously embed multiple texts using OpenAI's batch API."""
    texts = validate_texts(texts)
    if not texts:
        return []

    if not self.is_async:
        raise TypeError(
            "Cannot use aembed_texts() with a synchronous client. Use embed_texts() instead."
        )

    response = await self.client.embeddings.create(
        input=texts, model=self.model, **kwargs
    )
    result = [item.embedding for item in response.data]

    # Track usage
    track(
        EmbeddingUsageEvent(
            provider="openai",
            model=self.model,
            embedding_type="modern",
            num_requests=len(texts),
            is_async=True,
        )
    )
    return result

batch_texts

batch_texts(texts: List[str], batch_size: int) -> List[List[str]]

Split a list of texts into smaller batches.

Parameters: texts: List of texts to batch. batch_size: Size of each batch.

Returns: A list of batches, where each batch is a list of texts.

Source code in src/ragas/embeddings/utils.py
def batch_texts(texts: t.List[str], batch_size: int) -> t.List[t.List[str]]:
    """Batch a list of texts into smaller chunks.

    Args:
        texts: List of texts to batch
        batch_size: Size of each batch

    Returns:
        List of batches, where each batch is a list of texts
    """
    if batch_size <= 0:
        raise ValueError("Batch size must be positive")

    batches = []
    for i in range(0, len(texts), batch_size):
        batches.append(texts[i : i + batch_size])
    return batches
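For example, five texts with batch_size=2 yield batches of sizes 2, 2, and 1. The slicing logic is reproduced here in a self-contained form so the behavior can be checked directly:

```python
from typing import List


def batch_texts(texts: List[str], batch_size: int) -> List[List[str]]:
    # Same slicing logic as the function above.
    if batch_size <= 0:
        raise ValueError("Batch size must be positive")
    return [texts[i : i + batch_size] for i in range(0, len(texts), batch_size)]


print(batch_texts(["a", "b", "c", "d", "e"], 2))
# [['a', 'b'], ['c', 'd'], ['e']]
```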

get_optimal_batch_size

get_optimal_batch_size(provider: str, model: str) -> int

Get the optimal batch size for a provider/model combination.

Parameters: provider: The embedding provider. model: The model name.

Returns: The optimal batch size for the provider/model.

Source code in src/ragas/embeddings/utils.py
def get_optimal_batch_size(provider: str, model: str) -> int:
    """Get optimal batch size for a provider/model combination.

    Args:
        provider: The embedding provider
        model: The model name

    Returns:
        Optimal batch size for the provider/model
    """
    provider_lower = provider.lower()

    # Provider-specific batch sizes
    if "openai" in provider_lower:
        return 100  # OpenAI supports large batches
    elif "cohere" in provider_lower:
        return 96  # Cohere's documented limit
    elif "google" in provider_lower or "vertex" in provider_lower:
        return 5  # Google/Vertex AI is more conservative
    elif "huggingface" in provider_lower:
        return 32  # HuggingFace default
    else:
        return 10  # Conservative default for unknown providers
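Note the matching is a case-insensitive substring check, so provider strings such as "Azure_OpenAI" or "vertex_ai" hit the expected branches. The dispatch is mirrored here in a self-contained form for a quick check:

```python
def get_optimal_batch_size(provider: str, model: str) -> int:
    # Mirrors the dispatch above (self-contained copy for illustration).
    p = provider.lower()
    if "openai" in p:
        return 100
    if "cohere" in p:
        return 96
    if "google" in p or "vertex" in p:
        return 5
    if "huggingface" in p:
        return 32
    return 10


print(get_optimal_batch_size("Azure_OpenAI", "text-embedding-3-small"))  # 100
print(get_optimal_batch_size("unknown", "some-model"))  # 10
```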

validate_texts

validate_texts(texts: Union[str, List[str]]) -> List[str]

Validate and normalize text inputs.

Parameters: texts: A single text or a list of texts.

Returns: A list of validated texts.

Raises: ValueError: If the texts are invalid.

Source code in src/ragas/embeddings/utils.py
def validate_texts(texts: t.Union[str, t.List[str]]) -> t.List[str]:
    """Validate and normalize text inputs.

    Args:
        texts: Single text or list of texts

    Returns:
        List of validated texts

    Raises:
        ValueError: If texts are invalid
    """
    if isinstance(texts, str):
        texts = [texts]

    if not isinstance(texts, list):
        raise ValueError("Texts must be a string or list of strings")

    if not texts:
        raise ValueError("Texts list cannot be empty")

    for i, text in enumerate(texts):
        if not isinstance(text, str):
            raise ValueError(f"Text at index {i} must be a string, got {type(text)}")
        if not text.strip():
            raise ValueError(f"Text at index {i} cannot be empty or whitespace only")

    return texts
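In practice this means a bare string is normalized to a one-element list, while empty lists, non-string items, and whitespace-only texts raise ValueError. The validation is reproduced here in a self-contained form:

```python
from typing import List, Union


def validate_texts(texts: Union[str, List[str]]) -> List[str]:
    # Mirrors the validation above (self-contained copy for illustration).
    if isinstance(texts, str):
        texts = [texts]
    if not isinstance(texts, list):
        raise ValueError("Texts must be a string or list of strings")
    if not texts:
        raise ValueError("Texts list cannot be empty")
    for i, text in enumerate(texts):
        if not isinstance(text, str):
            raise ValueError(f"Text at index {i} must be a string, got {type(text)}")
        if not text.strip():
            raise ValueError(f"Text at index {i} cannot be empty or whitespace only")
    return texts


print(validate_texts("hello"))  # ['hello']
```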

embedding_factory

embedding_factory(*args, **kwargs)

Deprecated: use embedding_factory from the base module directly.

Source code in src/ragas/embeddings/__init__.py
def embedding_factory(*args, **kwargs):
    """Deprecated: Use embedding_factory from base module directly."""
    import warnings

    warnings.warn(
        "Importing embedding_factory from ragas.embeddings is deprecated. "
        "Import directly from ragas.embeddings.base or use modern providers: "
        "from ragas.embeddings import OpenAIEmbeddings, GoogleEmbeddings, HuggingFaceEmbeddings",
        DeprecationWarning,
        stacklevel=2,
    )
    return _embedding_factory(*args, **kwargs)