config
SelectionMode
ClusterTreeConfig
dataclass
ClusterTreeConfig(
    embedding_model: EmbeddingModelLike,
    summarization_model: SummarizationModelLike,
    document_splitter: DocumentSplitterLike,
    clustering_func: ClusteringFunctionLike = raptor_clustering,
    clustering_backend: ClusteringBackendLike | None = None,
    max_length_in_cluster: int = 3500,
    max_num_layers: int = 5,
)
Configuration for ClusterTreeBuilder.
Parameters:
- embedding_model (EmbeddingModelLike) – The embedding model to use.
- summarization_model (SummarizationModelLike) – The summarization model to use.
- document_splitter (DocumentSplitterLike) – The document splitter to use.
- clustering_func (ClusteringFunctionLike, default: raptor_clustering) – The clustering function to use.
- clustering_backend (ClusteringBackendLike | None, default: None) – The clustering backend to use.
- max_length_in_cluster (int, default: 3500) – The maximum length of a cluster.
- max_num_layers (int, default: 5) – The maximum number of layers.
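As a rough illustration of how the defaults above behave, the sketch below uses a stand-in dataclass that mirrors the documented signature. It is not the library's class: the import path is not given here, so `ClusterTreeConfig` and `raptor_clustering` are redefined locally as placeholders, and the model and splitter arguments are plain `object()` instances rather than real models.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


def raptor_clustering(*args: Any, **kwargs: Any) -> Any:
    """Placeholder for the default clustering function named in the signature."""
    raise NotImplementedError("stand-in only; the real function lives in the library")


@dataclass
class ClusterTreeConfig:
    """Stand-in mirroring the documented fields and defaults (assumption, not the real class)."""

    embedding_model: Any
    summarization_model: Any
    document_splitter: Any
    clustering_func: Callable[..., Any] = raptor_clustering
    clustering_backend: Optional[Any] = None
    max_length_in_cluster: int = 3500
    max_num_layers: int = 5


# Only the three required fields must be supplied; everything else falls back
# to the documented defaults unless overridden by keyword.
config = ClusterTreeConfig(
    embedding_model=object(),
    summarization_model=object(),
    document_splitter=object(),
    max_num_layers=3,
)
```

With the real class, the same keyword-argument pattern should apply, with actual model and splitter objects in place of the placeholders.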
embedding_tokenizer
property
embedding_tokenizer: TokenizerLike
Returns:
- TokenizerLike – The tokenizer of the embedding model.
summarization_tokenizer
property
summarization_tokenizer: TokenizerLike
Returns:
- TokenizerLike – The tokenizer of the summarization model.
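The two properties read as simple accessors that forward to the tokenizer of the corresponding model. A minimal sketch of that pattern follows, assuming each model object exposes a `tokenizer` attribute. `WhitespaceTokenizer`, `StubModel`, and `ConfigSketch` are hypothetical names invented for this example and are not part of the library.

```python
from dataclasses import dataclass


class WhitespaceTokenizer:
    """Hypothetical tokenizer used only for this sketch."""

    def tokenize(self, text: str) -> list[str]:
        return text.split()


@dataclass
class StubModel:
    """Hypothetical model wrapper; assumes models expose a `tokenizer` attribute."""

    tokenizer: WhitespaceTokenizer


@dataclass
class ConfigSketch:
    """Stand-in config showing read-only properties that delegate to the models."""

    embedding_model: StubModel
    summarization_model: StubModel

    @property
    def embedding_tokenizer(self) -> WhitespaceTokenizer:
        # Forward to the embedding model's tokenizer.
        return self.embedding_model.tokenizer

    @property
    def summarization_tokenizer(self) -> WhitespaceTokenizer:
        # Forward to the summarization model's tokenizer.
        return self.summarization_model.tokenizer


embedding_tok = WhitespaceTokenizer()
summarization_tok = WhitespaceTokenizer()
sketch = ConfigSketch(StubModel(embedding_tok), StubModel(summarization_tok))
```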
TreeRetrieverConfig
dataclass
TreeRetrieverConfig(
    embedding_model: EmbeddingModelLike,
    threshold: float = 0.5,
    top_k: int = 5,
    selection_mode: SelectionMode = SelectionMode.TOP_K,
    max_tokens: int = 3500,
)
Configuration for TreeRetriever.
Parameters:
- embedding_model (EmbeddingModelLike) – The embedding model to use.
- threshold (float, default: 0.5) – The threshold value used when the selection mode is threshold-based.
- top_k (int, default: 5) – The number of top results to return when the selection mode is top-k.
- selection_mode (SelectionMode, default: TOP_K) – The selection mode to use.
- max_tokens (int, default: 3500) – The maximum number of tokens to retrieve.
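The interaction between `selection_mode`, `top_k`, and `threshold` can be sketched as follows. This is one plausible reading of the parameters, not the library's implementation: the `THRESHOLD` member name, the `RetrieverSettings` class, the `select_indices` helper, and the "score greater than or equal to threshold" comparison are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class SelectionMode(Enum):
    """Stand-in enum; only TOP_K is named in the reference, THRESHOLD is assumed."""

    TOP_K = "top_k"
    THRESHOLD = "threshold"


@dataclass
class RetrieverSettings:
    """Hypothetical subset of the retriever configuration used by this sketch."""

    threshold: float = 0.5
    top_k: int = 5
    selection_mode: SelectionMode = SelectionMode.TOP_K


def select_indices(scores: list[float], settings: RetrieverSettings) -> list[int]:
    """Return indices of selected items, highest score first."""
    # Rank item indices by descending similarity score.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if settings.selection_mode is SelectionMode.TOP_K:
        # Top-k mode: keep the k best-scoring items; threshold is ignored.
        return ranked[: settings.top_k]
    # Threshold mode: keep every item whose score meets the threshold; top_k is ignored.
    return [i for i in ranked if scores[i] >= settings.threshold]


scores = [0.9, 0.2, 0.6, 0.4]
top_two = select_indices(scores, RetrieverSettings(top_k=2))
above_cutoff = select_indices(
    scores, RetrieverSettings(threshold=0.3, selection_mode=SelectionMode.THRESHOLD)
)
```

Under these assumptions, only one of `top_k` and `threshold` is consulted for a given `selection_mode`, which matches how the parameter descriptions are phrased.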
tokenizer
property
tokenizer: TokenizerLike
Returns:
- TokenizerLike – The tokenizer of the embedding model.