embedding

This module defines protocols and concrete implementations for embedding models used for text representation.

EmbeddingModelLike

Bases: Protocol

A protocol that defines the methods and attributes that an embedding model should implement.

tokenizer: TokenizerLike

Returns:

TokenizerLike –

The tokenizer used by the embedding model.

model_max_length: int

Returns:

int –

The maximum input length supported by the model.

embed(
    text: str | list[str],
) -> list[float] | list[list[float]]

Embed the input text or list of texts.

Parameters:

text (str | list[str]) –

The input text or list of input texts to embed.

Returns:

list[float] | list[list[float]] –

The embeddings of the input text or list of texts.
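As a rough illustration of what satisfying this protocol could look like, the sketch below re-declares the protocol locally and pairs it with a toy implementation. Everything here is an assumption made for the example: the real `TokenizerLike` type is not shown in this section, so `Any` stands in for it, `tokenizer` and `model_max_length` are modeled as read-only properties, and `ToyHashEmbeddingModel` is a hypothetical class that derives vectors from a hash rather than from a trained model.

```python
from __future__ import annotations

import hashlib
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class EmbeddingModelLike(Protocol):
    """Local re-declaration of the protocol, for illustration only."""

    @property
    def tokenizer(self) -> Any: ...  # TokenizerLike in the real module (assumed)

    @property
    def model_max_length(self) -> int: ...

    def embed(self, text: str | list[str]) -> list[float] | list[list[float]]: ...


class ToyHashEmbeddingModel:
    """Hypothetical stand-in: maps text to a small vector via SHA-256."""

    def __init__(self, dim: int = 8, max_length: int = 512) -> None:
        if not 1 <= dim <= 32:  # a SHA-256 digest has 32 bytes
            raise ValueError("dim must be between 1 and 32")
        self._dim = dim
        self._max_length = max_length

    @property
    def tokenizer(self) -> Any:
        # Placeholder tokenizer: plain whitespace splitting.
        return str.split

    @property
    def model_max_length(self) -> int:
        return self._max_length

    def _embed_one(self, text: str) -> list[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Scale the first `dim` bytes into the range [0, 1].
        return [digest[i] / 255.0 for i in range(self._dim)]

    def embed(self, text: str | list[str]) -> list[float] | list[list[float]]:
        # A single string yields one vector; a list yields one vector per item.
        if isinstance(text, str):
            return self._embed_one(text)
        return [self._embed_one(t) for t in text]
```

A single string returns a flat list of floats, while a list of strings returns one vector per input, mirroring the `list[float] | list[list[float]]` return type documented above.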

SentenceTransformerEmbeddingModel(
    model_name: str, *, use_gpu: bool = False
)

An embedding model that uses the SentenceTransformer library.

It implements the EmbeddingModelLike protocol.

Parameters:

model_name (str) –

The name of the model to use.

use_gpu (bool, default: False) –

Whether to use the GPU for inference.

tokenizer: PreTrainedTokenizerBase

Returns: