Usage
Example RAG Application
This guide demonstrates how to create a Retrieval-Augmented Generation (RAG) pipeline using bookacle. You only need 16 lines of code to get started!
Full Code
The complete example below shows how to load a document, create a RAPTOR tree for retrieval, and run a query against the document:
from bookacle.loaders import pymupdf_loader
from bookacle.models.embedding import SentenceTransformerEmbeddingModel
from bookacle.models.message import Message
from bookacle.models.qa import OllamaQAModel
from bookacle.models.summarization import HuggingFaceLLMSummarizationModel
from bookacle.splitters import HuggingFaceTextSplitter
from bookacle.tree.builder import ClusterTreeBuilder
from bookacle.tree.config import ClusterTreeConfig, TreeRetrieverConfig
from bookacle.tree.retriever import TreeRetriever
documents = pymupdf_loader(file_path="data/the-godfather.pdf")
embedding_model = SentenceTransformerEmbeddingModel(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
summarization_model = HuggingFaceLLMSummarizationModel(
    model_name="Qwen/Qwen2-0.5B-Instruct",
    summarization_length=100,
)
qa_model = OllamaQAModel(model_name="qwen2.5:0.5b-instruct")
document_splitter = HuggingFaceTextSplitter(tokenizer=embedding_model.tokenizer)
config = ClusterTreeConfig(
    embedding_model=embedding_model,
    summarization_model=summarization_model,
    document_splitter=document_splitter,
)
tree_builder = ClusterTreeBuilder(config=config)
tree = tree_builder.build_from_documents(documents=documents)
retriever_config = TreeRetrieverConfig(embedding_model=embedding_model)
retriever = TreeRetriever(config=retriever_config)
query = "Who are the cast members of The Godfather?"
_, context = retriever.retrieve(query=query, tree=tree)
system_prompt = """You are a helpful assistant, designed to help users understand documents and answer questions on the documents.
Use your knowledge and the context passed to you to answer user queries.
The context will be text extracted from the document. It will be denoted by CONTEXT: in the prompt.
The user's query will be denoted by QUERY: in the prompt.
Do NOT explicitly state that you are referring to the context.
"""
history = [Message(role="system", content=system_prompt)]
answer = qa_model.answer(question=query, context=context, stream=False, history=history)
print(f"Answer:\n{answer['content']}")
Step-by-Step Walkthrough
We now walk through the code step-by-step with explanations of each part.
Imports
from bookacle.loaders import pymupdf_loader
from bookacle.models.embedding import SentenceTransformerEmbeddingModel
from bookacle.models.qa import OllamaQAModel
from bookacle.models.summarization import HuggingFaceLLMSummarizationModel
from bookacle.splitters import HuggingFaceTextSplitter
from bookacle.tree.builder import ClusterTreeBuilder
from bookacle.tree.config import ClusterTreeConfig, TreeRetrieverConfig
from bookacle.tree.retriever import TreeRetriever
from bookacle.models.message import Message
We start by loading the data file using pymupdf_loader(), which uses PyMuPDF to load the PDF file as text. The example uses the first 2 pages (when exported in A3) of the Wikipedia entry on The Godfather:
documents = pymupdf_loader(file_path="data/the-godfather.pdf")
print(f"Number of documents: {len(documents)}")
print(f"First document:\n{documents[0]}")
More on Document Loaders
See Document Loaders for more details on loaders and how you can create your own loaders.
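As a quick illustration, a loader is simply a callable that maps a file path to a list of documents. The sketch below is hypothetical: the Document class here is a stand-in, since a real loader must return bookacle's own document type, which is documented on the Document Loaders page.
# Minimal sketch of a custom loader. ASSUMPTION: Document below is a
# hypothetical stand-in; a real loader must return bookacle's document
# type (see the Document Loaders page).
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class Document:  # hypothetical stand-in for bookacle's document type
    page_content: str
    metadata: dict = field(default_factory=dict)


def text_file_loader(file_path: str) -> list[Document]:
    """Load a plain-text file as a single document."""
    text = Path(file_path).read_text(encoding="utf-8")
    return [Document(page_content=text, metadata={"source": file_path})]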
Then we create the embedding model and the summarization model that will be used to create the RAPTOR tree. We also create the question-answering model that will be used to answer user queries on the PDF document.
- For the embedding model, we are using SentenceTransformerEmbeddingModel to load the all-MiniLM-L6-v2 model from the sentence-transformers library.
- For the summarization model, we are using HuggingFaceLLMSummarizationModel to load the Qwen/Qwen2-0.5B-Instruct LLM from HuggingFace.
- For the question-answering model, we are using OllamaQAModel to load the qwen2.5:0.5b-instruct model from Ollama.
embedding_model = SentenceTransformerEmbeddingModel(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
summarization_model = HuggingFaceLLMSummarizationModel(
    model_name="Qwen/Qwen2-0.5B-Instruct",
    summarization_length=100,
)
qa_model = OllamaQAModel(model_name="qwen2.5:0.5b-instruct")
print(f"Embedding Model: {embedding_model}")
print(f"Summarization Model: {summarization_model}")
print(f"QA Model: {qa_model}")
More on Models
See Models for more details on models and how you can create your own models.
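Because each model is constructed independently, you can swap backends without touching the rest of the pipeline. For example, using the same constructors with different model names (the names below are just illustrative choices):
# Swap in a larger embedding model and a different Ollama model; the
# rest of the pipeline stays unchanged.
embedding_model = SentenceTransformerEmbeddingModel(
    model_name="sentence-transformers/all-mpnet-base-v2"
)
qa_model = OllamaQAModel(model_name="llama3.2:1b")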
We then create the document splitter that will be used to split documents into chunks. We are using HuggingFaceTextSplitter, which uses a HuggingFace tokenizer as the length function to decide when a piece of text should be split.
document_splitter = HuggingFaceTextSplitter(tokenizer=embedding_model.tokenizer)
print(document_splitter)
More on Document Splitters
See Document Splitters for more details on splitters and how you can create your own splitters.
You can also use custom tokenizers by implementing the TokenizerLike protocol. See Tokenizers for more details.
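As a rough sketch, a custom tokenizer might look like the class below. The encode/decode pair is an assumption about what TokenizerLike requires, based on how HuggingFace tokenizers are typically used as length functions; consult the Tokenizers page for the actual protocol.
# Hypothetical tokenizer that counts whitespace-separated words.
# ASSUMPTION: TokenizerLike requires encode()/decode(); verify against
# the Tokenizers page before relying on this.
class WhitespaceTokenizer:
    def encode(self, text: str) -> list[int]:
        # Only the number of tokens matters when the tokenizer is used
        # as a length function for splitting.
        return list(range(len(text.split())))

    def decode(self, token_ids: list[int]) -> str:
        raise NotImplementedError("Sketch only.")


custom_splitter = HuggingFaceTextSplitter(tokenizer=WhitespaceTokenizer())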
Next, we create the RAPTOR tree using ClusterTreeBuilder, which implements the methodology in the RAPTOR paper:
- Split the documents into chunks and create the leaf nodes from these chunks.
- For each subsequent layer:
    - Cluster the nodes in the previous layer.
    - For each cluster, concatenate the texts of the nodes, summarize the concatenated text, embed the summary, and create a node from it.
    - Create the layer using the nodes of each cluster.
- Repeat the process until clustering is no longer possible (a schematic sketch of this loop follows the list).
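To make the loop concrete, here is a schematic sketch of the build process described above. This is an illustration only, not bookacle's actual implementation; Node, embed, summarize, and cluster_nodes are hypothetical placeholders for the real machinery.
# Schematic sketch of the RAPTOR build loop. NOT bookacle's code:
# Node, embed(), summarize(), and cluster_nodes() are placeholders.
from dataclasses import dataclass


@dataclass
class Node:  # hypothetical stand-in for a tree node
    text: str
    embedding: list[float]


def build_tree_sketch(chunks, embed, summarize, cluster_nodes):
    # Leaf layer: one node per chunk.
    layer = [Node(text=chunk, embedding=embed(chunk)) for chunk in chunks]
    layers = [layer]
    while len(layer) > 1:
        clusters = cluster_nodes(layer)
        if len(clusters) >= len(layer):
            break  # clustering is no longer possible
        next_layer = []
        for cluster in clusters:
            # Concatenate, summarize, embed, and create a node.
            concatenated = " ".join(node.text for node in cluster)
            summary = summarize(concatenated)
            next_layer.append(Node(text=summary, embedding=embed(summary)))
        layers.append(next_layer)
        layer = next_layer
    return layers
In bookacle itself, all of this is handled by ClusterTreeBuilder, configured through ClusterTreeConfig: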
config = ClusterTreeConfig(
    embedding_model=embedding_model,
    summarization_model=summarization_model,
    document_splitter=document_splitter,
)
tree_builder = ClusterTreeBuilder(config=config)
tree = tree_builder.build_from_documents(documents=documents)
print(f"Tree: {tree}")
More on Building the RAPTOR tree
See Building the RAPTOR tree for more details on building the RAPTOR tree and how you can define your own methodology for building it.
You can also customize the clustering method used by ClusterTreeBuilder by changing the clustering function and the clustering backend. See Clustering Support for more details.
We then create the retriever using TreeRetriever, which implements both the collapsed-tree and tree-traversal methods from the RAPTOR paper:
retriever_config = TreeRetrieverConfig(embedding_model=embedding_model)
retriever = TreeRetriever(config=retriever_config)
print(f"Retriever: {retriever}")
More on Retrievers
See Retriever for more details on retrievers and how you can implement your own retrievers.
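For intuition, the collapsed-tree method flattens the tree and ranks every node, from every layer, against the query by embedding similarity, returning the top matches as context. The sketch below (reusing the hypothetical Node from the build sketch above) illustrates the idea; it is not TreeRetriever's actual implementation.
# Schematic sketch of collapsed-tree retrieval: score ALL nodes in the
# tree against the query and keep the top-k. Illustration only.
import numpy as np


def collapsed_tree_sketch(query_embedding, all_nodes, top_k=5):
    def cosine(a, b):
        a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Rank every node in the flattened tree by similarity to the query.
    ranked = sorted(
        all_nodes,
        key=lambda node: cosine(query_embedding, node.embedding),
        reverse=True,
    )
    return " ".join(node.text for node in ranked[:top_k])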
Then, we send a query to the retriever and fetch relevant context:
query = "Who are the cast members of The Godfather?"
_, context = retriever.retrieve(query=query, tree=tree)
print(f"Retrieved context:\n{context}")
Finally, we set the system prompt (optional) for the question-answering model and get an answer from it:
system_prompt = """You are a helpful assistant, designed to help users understand documents and answer questions on the documents.
Use your knowledge and the context passed to you to answer user queries.
The context will be text extracted from the document. It will be denoted by CONTEXT: in the prompt.
The user's query will be denoted by QUERY: in the prompt.
Do NOT explicitly state that you are referring to the context.
"""
history = [Message(role="system", content=system_prompt)]
answer = qa_model.answer(question=query, context=context, stream=False, history=history)
print(f"Answer:\n{answer['content']}")
Chat Interface
bookacle comes with a built-in terminal-based chat interface powered by rich and prompt-toolkit, which supports the following:
- Autocompletion in the chat.
- Custom user avatars.
- Markdown rendering.
- Streaming output with a nice progress bar.
- Passing a system prompt to the question-answering model.
- Storing chat history in a file as you chat, etc.
Launch from a script
The chat interface can be launched from a script using Chat.
from rich.console import Console
from bookacle.chat import Chat
console = Console()
chat = Chat(
    retriever=retriever,
    qa_model=qa_model,
    console=console,
)
system_prompt = """You are a helpful assistant, designed to help users understand documents and answer questions on the documents.
Use your knowledge and the context passed to you to answer user queries.
The context will be text extracted from the document. It will be denoted by CONTEXT: in the prompt.
The user's query will be denoted by QUERY: in the prompt.
Always respond in Markdown.
"""
chat.run(tree=tree, stream=True, system_prompt=system_prompt)
Terminal-based Chat
You can also use the chat via the CLI to interact with your documents.
bookacle --help
Usage: bookacle [OPTIONS] FILE_PATH
╭─ Arguments ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ * file_path FILE Path to the PDF file. [required] │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Options ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --loader -l [pymupdf4llm|pymupdf] Loader to use. [default: pymupdf4llm] │
│ --start-page -s INTEGER The page (0-based) in the PDF file to start reading from. If not provided, defaults to 0, reading from the beginning. │
│ --end-page -e INTEGER The page (0-based) in the PDF file to stop reading at (not inclusive). If not provided, the document will be read till the end. │
│ --user-avatar -a TEXT Avatar that should be used for the user during chat. [default: 👤] │
│ --history_file -h TEXT File where chat history should be stored. [default: /home/malay_agr/.bookacle-chat-history.txt] │
│ --config-file -c FILE Custom configuration file. If not provided, the default settings are used. │
│ --prompt-file -p FILE Custom prompts file. If not provided, the default prompts are used. │
│ --version -v Print version and exit. │
│ --install-completion Install completion for the current shell. │
│ --show-completion Show completion for the current shell, to copy it or customize the installation. │
│ --help Show this message and exit. │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
See Command-Line Interface for more information on the usage.