embedding ¤

This module defines protocols and concrete implementations for embedding models used for text representation.

EmbeddingModelLike ¤

Bases: Protocol

A protocol that defines the methods and attributes that an embedding model should implement.

tokenizer property ¤

tokenizer: TokenizerLike

Returns:

  • TokenizerLike

    The tokenizer used by the model.

model_max_length property ¤

model_max_length: int

Returns:

  • int

    The maximum length of the input that the model can accept.

embed ¤

embed(
    text: str | list[str],
) -> list[float] | list[list[float]]

Embed the input text or list of texts.

Parameters:

  • text (str | list[str]) –

    The input text or list of input texts to embed.

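Returns:

  • list[float] | list[list[float]]

    A single embedding when text is a string, or one embedding per input when text is a list of strings.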

SentenceTransformerEmbeddingModel ¤

SentenceTransformerEmbeddingModel(
    model_name: str, *, use_gpu: bool = False
)

An embedding model backed by the sentence-transformers library.

It implements the EmbeddingModelLike protocol.

Attributes:

  • model_name (str) –

    The name of the model to use.

  • use_gpu (bool) –

    Whether to use the GPU for inference.

  • model (SentenceTransformer) –

    The SentenceTransformer model.

Parameters:

  • model_name (str) –

    The name of the model to use.

  • use_gpu (bool, default: False ) –

    Whether to use the GPU for inference.
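
As a rough illustration of how a class with this constructor could wrap sentence-transformers, consider the sketch below. It is not the library's implementation: SketchSentenceTransformerModel and pick_device are hypothetical names, and mapping use_gpu to the "cuda" device string is an assumption. Only SentenceTransformer(name, device=...) and encode() come from the sentence-transformers package itself.

```python
def pick_device(use_gpu: bool) -> str:
    """Map the documented use_gpu flag to a device string (assumed mapping)."""
    return "cuda" if use_gpu else "cpu"


class SketchSentenceTransformerModel:
    """Hypothetical wrapper; the documented class may be implemented differently."""

    def __init__(self, model_name: str, *, use_gpu: bool = False) -> None:
        # Imported lazily so this sketch can be loaded without the dependency.
        from sentence_transformers import SentenceTransformer

        self.model_name = model_name
        self.use_gpu = use_gpu
        self.model = SentenceTransformer(model_name, device=pick_device(use_gpu))

    def embed(self, text):
        # encode() returns a NumPy array by default; tolist() converts it
        # to plain Python floats (nested lists for a list of texts).
        return self.model.encode(text).tolist()


# Usage (requires sentence-transformers and downloads the named model):
# model = SketchSentenceTransformerModel("all-MiniLM-L6-v2")
# vector = model.embed("hello world")
```

Note that use_gpu is keyword-only in the documented signature, so it has to be passed by name, for example use_gpu=True.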

tokenizer property ¤

Returns: