builder
TreeBuilderLike
Bases: Protocol
A protocol that defines the interface for a RAPTOR tree builder.
build_from_documents
build_from_documents(
documents: list[Document],
chunk_size: int | None = None,
chunk_overlap: int | None = None,
*args,
**kwargs
) -> Tree
Build a tree from a list of documents.
Parameters:
- documents (list[Document]) – A list of documents to build the tree from.
- chunk_size (int | None, default: None) – The size of the chunks to split the documents into.
- chunk_overlap (int | None, default: None) – The overlap between the chunks.
- *args – Additional positional arguments.
- **kwargs – Additional keyword arguments.
Returns:
- Tree – A tree built from the documents.
ClusterTreeBuilder
ClusterTreeBuilder(config: ClusterTreeConfig)
A RAPTOR tree builder that builds each new tree layer by clustering the nodes of the layer below it.
It implements the TreeBuilderLike protocol.
Attributes:
- config (RaptorTreeConfig) – The configuration for the tree builder.
Parameters:
- config (ClusterTreeConfig) – The configuration for the tree builder.
create_leaf_nodes
create_next_tree_level
create_next_tree_level(
clusters: list[list[Node]],
first_node_index: int,
layer: int,
) -> dict[int, Node]
Create the next tree level from the given clusters.
For each cluster:
- The texts of the nodes in the cluster are concatenated.
- The concatenated text is summarized.
- The summary is embedded.
- A Node is created with the summary, its embeddings, and the indices of its child nodes.
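The four per-cluster steps can be sketched as follows. This is an illustration, not the library's implementation: Node is a hypothetical stand-in, summarize and embed are hypothetical callables standing in for the configured summarization and embedding models, the "\n\n" separator is an assumption, and the layer argument is left out because its exact use is not described here. New nodes receive consecutive global indices starting at first_node_index.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Node:
    # Hypothetical stand-in for the library's Node type.
    index: int
    text: str
    embeddings: list[float]
    children: set[int]


def create_next_tree_level(
    clusters: list[list[Node]],
    first_node_index: int,
    summarize: Callable[[str], str],  # hypothetical summarizer
    embed: Callable[[str], list[float]],  # hypothetical embedder
) -> dict[int, Node]:
    new_level: dict[int, Node] = {}
    for offset, cluster in enumerate(clusters):
        # 1. Concatenate the texts of the nodes in the cluster.
        text = "\n\n".join(node.text for node in cluster)
        # 2. Summarize the concatenated text.
        summary = summarize(text)
        # 3. Embed the summary.
        embedding = embed(summary)
        # 4. Create a parent node that records its children's indices.
        index = first_node_index + offset
        new_level[index] = Node(
            index=index,
            text=summary,
            embeddings=embedding,
            children={node.index for node in cluster},
        )
    return new_level


leaves = [Node(0, "alpha", [1.0], set()), Node(1, "beta", [2.0], set()), Node(2, "gamma", [3.0], set())]
level = create_next_tree_level(
    [[leaves[0], leaves[1]], [leaves[2]]],
    first_node_index=3,
    summarize=str.upper,  # toy "summarizer"
    embed=lambda text: [float(len(text))],  # toy "embedding"
)
print(level[3].children)  # {0, 1}
```

Passing the models in as callables keeps the sketch runnable without any model dependencies.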
Parameters:
- clusters (list[list[Node]]) – The clusters to create the next tree level from.
- first_node_index (int) – The global index of the first node in the new layer.
- layer (int) – The layer of the tree the clusters belong to.
Returns:
- dict[int, Node] – The nodes of the new tree level, keyed by their global index.
construct_tree
construct_tree(
chunks: list[Document],
embeddings: list[list[float]],
reduction_dimension: int = 10,
) -> Tree
Construct a RAPTOR tree from the given chunks and embeddings.
The tree is built in a bottom-up manner, starting from the leaf nodes and going up to the root nodes.
To build the tree:
- The leaf nodes are created from the chunks and their embeddings.
- The nodes of the current level are clustered, and create_next_tree_level() turns the clusters into the next tree level.
- This is repeated until the maximum number of layers is reached or the next level has fewer nodes than the reduction dimension.
Parameters:
- chunks (list[Document]) – The chunks to construct the tree from.
- embeddings (list[list[float]]) – The embeddings of the chunks.
- reduction_dimension (int, default: 10) – The dimension to reduce the embeddings to before clustering.
Returns:
- Tree – A RAPTOR tree constructed from the chunks and embeddings.
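The bottom-up loop and its two stopping conditions can be sketched generically. In this sketch, build_layers, cluster, make_parent, and max_layers are hypothetical names: cluster stands in for whatever dimensionality reduction and clustering the builder performs, and make_parent stands in for the per-cluster work of create_next_tree_level(). Counting the leaf layer toward max_layers, and checking the size condition on each newly built level, are assumptions about how the description above is applied.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def build_layers(
    leaves: list[T],
    cluster: Callable[[list[T]], list[list[T]]],  # hypothetical clustering step
    make_parent: Callable[[list[T]], T],  # hypothetical per-cluster summarizer
    max_layers: int,  # assumption: counts the leaf layer too
    reduction_dimension: int = 10,
) -> list[list[T]]:
    """Return the tree as a list of layers, leaves first."""
    layers = [leaves]
    while len(layers) < max_layers:
        clusters = cluster(layers[-1])
        next_level = [make_parent(members) for members in clusters]
        layers.append(next_level)
        # Stop once the new level has fewer nodes than the reduction dimension.
        if len(next_level) < reduction_dimension:
            break
    return layers


def pairs(items: list[int]) -> list[list[int]]:
    # Toy clustering: group consecutive items two at a time.
    return [items[i:i + 2] for i in range(0, len(items), 2)]


layers = build_layers(list(range(20)), cluster=pairs, make_parent=sum, max_layers=5)
print([len(layer) for layer in layers])  # [20, 10, 5]
```

Here the loop stops after the third layer: the 10-node level is not below the reduction dimension of 10, but the 5-node level is.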
build_from_documents
build_from_documents(
documents: list[Document],
chunk_size: int | None = None,
chunk_overlap: int | None = None,
) -> Tree
Build a RAPTOR tree from the given documents.
Each document is split into chunks and each chunk is embedded. The chunks and their embeddings are then passed to the construct_tree() method to build the tree.
Parameters:
- documents (list[Document]) – The documents to build the tree from.
- chunk_size (int | None, default: None) – The size of the chunks to split the documents into. When None, it defaults to the maximum length supported by the embedding model.
- chunk_overlap (int | None, default: None) – The overlap between the chunks. When None, it defaults to half the chunk size.
Returns:
- Tree – A RAPTOR tree built from the documents.