summarization ¤

This module defines protocols and concrete implementations of summarization models used to summarize texts in intermediate RAPTOR tree layers.

SummarizationModelLike ¤

Bases: Protocol

A protocol that defines the methods and attributes that a summarization model should implement.

tokenizer property ¤

tokenizer: TokenizerLike

Returns:

  • TokenizerLike –

    The tokenizer used by the summarization model.

summarize ¤

summarize(text: str | list[str]) -> str | list[str]

Summarize the input text or list of texts.

Parameters:

  • text (str | list[str]) –

    The input text or list of input texts to summarize.

Returns:

  • str | list[str]

    The summary of the input text or list of texts.

HuggingFaceSummarizationModel ¤

HuggingFaceSummarizationModel(
    model_name: str,
    summarization_length: int = 100,
    *,
    use_gpu: bool = False
)

A class that uses a Hugging Face model for summarization.

It implements the SummarizationModelLike protocol.

Attributes:

  • model_name (str) –

    The name of the Hugging Face model to use.

  • summarization_length (int) –

    The maximum length of the summary.

  • use_gpu (bool) –

    Whether to use the GPU for inference.

  • model (AutoModelForSeq2SeqLM) –

    The Hugging Face model for summarization.

  • pipeline (Pipeline) –

    The Hugging Face pipeline for summarization.

Parameters:

  • model_name (str) –

    The name of the Hugging Face model to use.

  • summarization_length (int, default: 100 ) –

    The maximum length of the summary.

  • use_gpu (bool, default: False ) –

    Whether to use the GPU for inference.
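For reference, the Hugging Face pipeline API selects hardware via an integer device index (-1 for CPU, a non-negative index for a CUDA device). A plausible mapping of the `use_gpu` flag onto that convention is sketched below; whether the class performs the mapping exactly this way internally is an assumption.

```python
def resolve_device(use_gpu: bool) -> int:
    """Map a use_gpu flag to a transformers pipeline device index.

    The transformers pipeline API uses -1 for CPU and a non-negative
    index for a CUDA device; this single-GPU mapping is an assumption
    about the class's internals, shown for illustration only.
    """
    return 0 if use_gpu else -1
```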

tokenizer property ¤

tokenizer: TokenizerLike

Returns:

  • TokenizerLike –

    The tokenizer used by the summarization model.

HuggingFaceLLMSummarizationModel ¤

HuggingFaceLLMSummarizationModel(
    model_name: str,
    summarization_length: int = 100,
    *,
    system_prompt: str = "",
    use_gpu: bool = False
)

A class that uses a Hugging Face LLM for summarization.

It implements the SummarizationModelLike protocol.

Attributes:

  • model_name (str) –

    The name of the Hugging Face LLM to use.

  • summarization_length (int) –

    The maximum length of the summary.

  • system_prompt (str) –

    The system prompt passed to the LLM for summarization.

  • use_gpu (bool) –

    Whether to use the GPU for inference.

  • model (AutoModelForCausalLM) –

    The Hugging Face LLM for summarization.

  • pipeline (Pipeline) –

    The Hugging Face pipeline for summarization.

Parameters:

  • model_name (str) –

    The name of the Hugging Face model to use.

  • summarization_length (int, default: 100 ) –

    The maximum length of the summary.

  • system_prompt (str, default: '' ) –

    The system prompt to pass to the LLM for summarization.

  • use_gpu (bool, default: False ) –

    Whether to use the GPU for inference.

tokenizer property ¤

tokenizer: TokenizerLike

Returns:

  • TokenizerLike –

    The tokenizer used by the summarization model.

format_as_chat_message ¤

format_as_chat_message(
    text: str | list[str],
) -> list[Message] | list[list[Message]]

Format the input text or list of texts as chat messages.

A chat message is a dictionary with the keys ‘role’ and ‘content’.

If the input is a list of texts:
  • If the system prompt is provided, a list of lists is returned, each inner list containing the system prompt and a user message.
  • If the system prompt is not provided, a list of lists is returned, each inner list containing only a user message.
If the input is a single text:
  • If the system prompt is provided, a list containing the system prompt and the user message is returned.
  • If the system prompt is not provided, a list containing only the user message is returned.

Parameters:

  • text (str | list[str]) –

    The input text or list of texts to format.

Returns:

  • list[Message] | list[list[Message]] –

    The chat message for a single input text, or a list of chat messages for a list of input texts.

Examples:

Single text:

>>> from bookacle.models.summarization import HuggingFaceLLMSummarizationModel
>>> model = HuggingFaceLLMSummarizationModel(model_name="Qwen/Qwen2-0.5B-Instruct")
>>> text = "This is a test"
>>> print(model.format_as_chat_message(text))
[{'role': 'user', 'content': 'Summarize the following in not more than 100 words:\nThis is a test'}]

Multiple texts:

>>> from bookacle.models.summarization import HuggingFaceLLMSummarizationModel
>>> model = HuggingFaceLLMSummarizationModel(model_name="Qwen/Qwen2-0.5B-Instruct")
>>> text = ["This is a test", "This is another test"]
>>> print(model.format_as_chat_message(text))
[[{'role': 'user', 'content': 'Summarize the following in not more than 100 words:\nThis is a test'}], [{'role': 'user', 'content': 'Summarize the following in not more than 100 words:\nThis is another test'}]]

summarize ¤

summarize(text: str | list[str]) -> str | list[str]

Summarize the input text or list of texts.

The input is first formatted into chat messages using format_as_chat_message() and then passed to the underlying LLM for summarization.

Each input text is passed to the LLM with the following format:

"Summarize the following in not more than {summarization_length} words:\n{text}"

Parameters:

  • text (str | list[str]) –

    The input text or list of texts to summarize.

Returns:

  • str | list[str]

    The summary of the input text or list of texts.