builder

TreeBuilderLike

Bases: Protocol

A protocol that defines the interface for a RAPTOR tree builder.

build_from_documents

build_from_documents(
    documents: list[Document],
    chunk_size: int | None = None,
    chunk_overlap: int | None = None,
    *args,
    **kwargs
) -> Tree

Build a tree from a list of documents.

Parameters:

  • documents (list[Document]) –

    A list of documents to build the tree from.

  • chunk_size (int | None, default: None ) –

    The size of the chunks to split the documents into.

  • chunk_overlap (int | None, default: None ) –

    The overlap between the chunks.

  • *args

    Additional positional arguments.

  • **kwargs

    Additional keyword arguments.

Returns:

  • Tree

    A tree built from the documents.

ClusterTreeBuilder

ClusterTreeBuilder(config: ClusterTreeConfig)

A RAPTOR tree builder that clusters nodes at each subsequent tree layer to build the tree.

It implements the TreeBuilderLike protocol.

Attributes:

  • config (ClusterTreeConfig) –

    The configuration for the tree builder.

create_leaf_nodes

create_leaf_nodes(
    chunks: list[Document], embeddings: list[list[float]]
) -> dict[int, Node]

Create leaf nodes from the given chunks.

Parameters:

  • chunks (list[Document]) –

    The chunks to create the leaf nodes from.

  • embeddings (list[list[float]]) –

    The embeddings of the chunks.

Returns:

  • dict[int, Node]

    A mapping from each global index to the leaf node created for it.
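
The shape of the returned mapping can be sketched as follows. This is only an illustration: the Node class is a stand-in with assumed field names, chunks are plain strings rather than Document objects, and numbering the leaves from 0 in chunk order is an assumption the reference does not state.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Stand-in for the library's Node type; field names are assumptions."""
    text: str
    index: int
    embedding: list[float]
    children: set[int] = field(default_factory=set)
    layer: int = 0


def create_leaf_nodes(
    chunks: list[str], embeddings: list[list[float]]
) -> dict[int, Node]:
    """Pair each chunk with its embedding and key the node by its index."""
    if len(chunks) != len(embeddings):
        raise ValueError("chunks and embeddings must have the same length")
    # Assumption: leaf indices start at 0 and follow the order of the chunks.
    return {
        index: Node(text=text, index=index, embedding=embedding)
        for index, (text, embedding) in enumerate(zip(chunks, embeddings))
    }


leaves = create_leaf_nodes(["alpha", "beta"], [[0.1, 0.2], [0.3, 0.4]])
```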

create_next_tree_level

create_next_tree_level(
    clusters: list[list[Node]],
    first_node_index: int,
    layer: int,
) -> dict[int, Node]

Create the next tree level from the given clusters.

For each cluster:
  • The texts of the nodes in the cluster are concatenated.
  • The concatenated text is summarized.
  • The summary is embedded.
  • A Node is created with the summary, its embedding, and the indices of its child nodes.

Parameters:

  • clusters (list[list[Node]]) –

    The clusters to create the next tree level from.

  • first_node_index (int) –

    The global index of the first node in the new layer.

  • layer (int) –

    The layer of the tree the clusters belong to.

Returns:

  • dict[int, Node]

    A mapping from each global index to the node created for it.
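
The per-cluster steps described for this method can be sketched as below. This is a minimal illustration, not the library's code: Node is a stand-in type with assumed fields, the summarize and embed callables are passed in explicitly here (the real builder presumably obtains them from its configuration), and placing the new nodes at layer + 1 is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Node:
    """Stand-in for the library's Node type; field names are assumptions."""
    text: str
    index: int
    embedding: list[float]
    children: set[int] = field(default_factory=set)
    layer: int = 0


def create_next_tree_level(
    clusters: list[list[Node]],
    first_node_index: int,
    layer: int,
    summarize: Callable[[str], str],      # hypothetical: injected summarizer
    embed: Callable[[str], list[float]],  # hypothetical: injected embedder
) -> dict[int, Node]:
    new_nodes: dict[int, Node] = {}
    for offset, cluster in enumerate(clusters):
        index = first_node_index + offset
        # 1. Concatenate the texts of the nodes in the cluster.
        joined = "\n\n".join(node.text for node in cluster)
        # 2. Summarize the concatenated text.
        summary = summarize(joined)
        # 3. Embed the summary, then 4. create the parent node.
        new_nodes[index] = Node(
            text=summary,
            index=index,
            embedding=embed(summary),
            children={node.index for node in cluster},
            layer=layer + 1,  # assumption: parents sit one layer above
        )
    return new_nodes


leaves = [
    Node(text=t, index=i, embedding=[float(i)])
    for i, t in enumerate(["a", "b", "c"])
]
level = create_next_tree_level(
    clusters=[leaves[:2], leaves[2:]],
    first_node_index=len(leaves),
    layer=0,
    summarize=lambda text: text.replace("\n\n", " | "),  # toy summarizer
    embed=lambda text: [float(len(text))],               # toy embedder
)
```

Starting the new indices at first_node_index keeps them unique across the whole tree, which is what lets children be referenced by index alone.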

construct_tree

construct_tree(
    chunks: list[Document],
    embeddings: list[list[float]],
    reduction_dimension: int = 10,
) -> Tree

Construct a RAPTOR tree from the given chunks and embeddings.

The tree is built in a bottom-up manner, starting from the leaf nodes and going up to the root nodes.

To build the tree:
  • The leaf nodes are created from the chunks and embeddings.
  • The leaf nodes are clustered, and the clusters are passed to create_next_tree_level() to create the next tree level.
  • Each new level is clustered in the same way until the maximum number of layers is reached or the number of nodes in the next level is less than the reduction dimension.

Parameters:

  • chunks (list[Document]) –

    The chunks to construct the tree from.

  • embeddings (list[list[float]]) –

    The embeddings of the chunks.

  • reduction_dimension (int, default: 10 ) –

    The dimension to reduce the embeddings to before clustering.

Returns:

  • Tree

    A RAPTOR tree constructed from the chunks and embeddings.
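
The bottom-up loop can be illustrated with the simplified sketch below. Several things here are assumptions made to keep it self-contained: clustering is replaced by a stand-in that groups consecutive nodes, summarization is replaced by string joining, embeddings are omitted entirely, the stopping check is placed at the top of each iteration, and the small default values are chosen only so the example stays readable.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Stand-in node without embeddings; field names are assumptions."""
    text: str
    index: int
    children: set[int] = field(default_factory=set)


def cluster_nodes(nodes: list[Node], size: int) -> list[list[Node]]:
    """Stand-in clustering: consecutive groups of `size` nodes.

    The real builder clusters on reduced embeddings instead.
    """
    return [nodes[i:i + size] for i in range(0, len(nodes), size)]


def construct_layers(
    texts: list[str],
    max_layers: int = 3,
    reduction_dimension: int = 2,
    cluster_size: int = 2,
) -> list[dict[int, Node]]:
    # Layer 0: one leaf node per chunk.
    layers = [{i: Node(text=t, index=i) for i, t in enumerate(texts)}]
    next_index = len(texts)
    for _ in range(max_layers):
        current = list(layers[-1].values())
        # Stop when too few nodes remain to cluster again.
        if len(current) < reduction_dimension:
            break
        new_layer: dict[int, Node] = {}
        for cluster in cluster_nodes(current, cluster_size):
            summary = " + ".join(n.text for n in cluster)  # toy "summary"
            new_layer[next_index] = Node(
                summary, next_index, {n.index for n in cluster}
            )
            next_index += 1
        layers.append(new_layer)
    return layers


layers = construct_layers(["a", "b", "c", "d"])
```

With four chunks this produces levels of four, two, and one node, after which the loop stops because a single node is fewer than the reduction dimension used in the sketch.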

build_from_documents

build_from_documents(
    documents: list[Document],
    chunk_size: int | None = None,
    chunk_overlap: int | None = None,
) -> Tree

Build a RAPTOR tree from the given documents.

Each document is split into chunks and each chunk is embedded. These are then passed to the construct_tree() method to build the tree.

Parameters:

  • documents (list[Document]) –

    The documents to build the tree from.

  • chunk_size (int | None, default: None ) –

    The size of the chunks to split the documents into. When None, it defaults to the maximum length supported by the embedding model.

  • chunk_overlap (int | None, default: None ) –

    The overlap between the chunks. When None, it defaults to half the chunk size.

Returns:

  • Tree

    A RAPTOR tree built from the documents.