WindowChunk

Overview

The WindowChunk Block splits a document into chunks where each chunk consists of a central sentence and its surrounding sentences based on a defined window size. This method captures context by including a specified number of sentences before and after the central sentence, ensuring that each chunk provides a windowed view of the text.

This Block is particularly useful when you need to maintain local context around sentences, such as for tasks like summarization, entity extraction, or contextual analysis.

Description

Sentence window chunk parser.

Splits a document into chunks. Each chunk contains a window of the surrounding sentences.

Args:
    window_size: The number of sentences on each side of a sentence to capture.
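One reading of the window_size argument described above is sketched below in Python: every sentence becomes the centre of one chunk, which is padded with up to window_size neighbours on each side and clipped at the document edges. The names `split_sentences` and `sentence_windows` and the regex-based sentence splitter are assumptions made for illustration, not the Block's actual implementation, so the Block's exact chunk boundaries may differ.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter (an assumption): break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def sentence_windows(text: str, window_size: int = 3) -> list[str]:
    """Return one chunk per sentence: the sentence itself plus up to
    `window_size` neighbours on each side, clipped at the document edges."""
    sentences = split_sentences(text)
    chunks = []
    for i in range(len(sentences)):
        start = max(0, i - window_size)                 # clip at the start
        end = min(len(sentences), i + window_size + 1)  # clip at the end
        chunks.append(" ".join(sentences[start:end]))
    return chunks


for chunk in sentence_windows("A one. B two. C three. D four.", window_size=1):
    print(chunk)
```

With window_size=1, the second sentence in this sketch yields "A one. B two. C three.", while the first and last sentences produce shorter chunks because they have a neighbour on one side only.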

Metadata

  • Category: Function

Configuration Options

Name | Data Type | Description | Default Value
window_size | int | The number of sentences on each side of a sentence to capture. | 3

Inputs

Name | Data Type | Description
text | str or list[str] | The document, or list of documents, to split into chunks.

Outputs

Name | Data Type | Description
result | list[str] | The windowed chunks produced from the input text.

State Variables

No state variables available.

Example(s)

Example 1: Chunk a document with a window of 3 sentences

  • Create a WindowChunk Block.
  • Set the window_size to 3.
  • Provide the input text:
    "Sentence 1. Sentence 2. Sentence 3. Sentence 4. Sentence 5. Sentence 6. Sentence 7."
    
  • The Block will output chunks, such as:
    [
      "Sentence 1. Sentence 2. Sentence 3.",
      "Sentence 2. Sentence 3. Sentence 4.",
      "Sentence 3. Sentence 4. Sentence 5.",
      "Sentence 4. Sentence 5. Sentence 6.",
      "Sentence 5. Sentence 6. Sentence 7."
    ]
    

Example 2: Use a custom window size

  • Set up a WindowChunk Block.
  • Set the window_size to 2.
  • Provide the input text:
    "The sky is blue. The sun is shining. The birds are singing. It's a beautiful day."
    
  • The Block will output windowed chunks, such as:
    [
      "The sky is blue. The sun is shining.",
      "The sun is shining. The birds are singing.",
      "The birds are singing. It's a beautiful day."
    ]
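
The outputs listed in Examples 1 and 2 are consistent with a sliding window that groups window_size consecutive sentences and advances one sentence at a time. The Python sketch below reproduces both outputs under that assumption. The function name `sliding_chunks` and the regex-based sentence splitter are hypothetical stand-ins, not the Block's actual implementation.

```python
import re


def sliding_chunks(text: str, window_size: int = 3) -> list[str]:
    """Group `window_size` consecutive sentences, advancing one sentence per chunk."""
    # Naive sentence split (an assumption): break after '.', '!' or '?' plus whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if len(sentences) <= window_size:
        # Too short to slide: return the whole document as a single chunk.
        return [" ".join(sentences)] if sentences else []
    return [
        " ".join(sentences[i : i + window_size])
        for i in range(len(sentences) - window_size + 1)
    ]


example_1 = "Sentence 1. Sentence 2. Sentence 3. Sentence 4. Sentence 5. Sentence 6. Sentence 7."
for chunk in sliding_chunks(example_1, window_size=3):
    print(chunk)
```

In this sketch a document with no more sentences than window_size comes back whole as a single chunk.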
    

Example 3: Handle multiple documents

  • Create a WindowChunk Block.
  • Provide a list of documents as input:
    [
      "Document 1: Sentence 1. Sentence 2.",
      "Document 2: Sentence 1. Sentence 2. Sentence 3."
    ]
    
  • The Block will process each document and return windowed chunks for both:
    [
      ["Sentence 1. Sentence 2."],
      ["Sentence 1. Sentence 2.", "Sentence 2. Sentence 3."]
    ]
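
Accepting either a single string or a list of strings can be handled by normalising the input to a list first and then chunking each document independently. The Python sketch below illustrates that idea. `chunk_one` and `chunk_documents` are hypothetical helpers, the sentence splitter is a naive regex, and the nested return shape follows the output shown in this example rather than a confirmed Block signature.

```python
import re
from typing import Union


def chunk_one(text: str, window_size: int) -> list[str]:
    """Hypothetical single-document chunker: sliding groups of `window_size` sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    if len(sentences) <= window_size:
        return [" ".join(sentences)] if sentences else []
    return [
        " ".join(sentences[i : i + window_size])
        for i in range(len(sentences) - window_size + 1)
    ]


def chunk_documents(text: Union[str, list[str]], window_size: int = 3) -> list[list[str]]:
    """Normalise the input to a list of documents, then chunk each one independently."""
    docs = [text] if isinstance(text, str) else list(text)
    return [chunk_one(doc, window_size) for doc in docs]


print(chunk_documents(["A. B.", "C. D. E."], window_size=2))
```

Because each document is chunked on its own, no chunk in this sketch ever spans two documents.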
    

Error Handling

  • If the input text is invalid or an error occurs during chunking, the Block will raise a RuntimeError with a descriptive error message.
  • If no valid chunks are generated, the Block will return a list containing an empty string.
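
The two rules above can be sketched as a thin wrapper around any chunking function. This is a minimal Python illustration under stated assumptions: `safe_chunk` and the error message wording are hypothetical, and the Block's real messages and validation checks may differ.

```python
from typing import Callable


def safe_chunk(text: str, chunker: Callable[[str], list[str]]) -> list[str]:
    """Apply `chunker`, mirroring the error-handling rules described above."""
    if not isinstance(text, str):
        # Invalid input is reported as a RuntimeError with a descriptive message.
        raise RuntimeError(f"WindowChunk: expected str input, got {type(text).__name__}")
    try:
        # Drop empty or whitespace-only chunks before deciding what to return.
        chunks = [c for c in chunker(text) if c.strip()]
    except Exception as exc:
        raise RuntimeError(f"WindowChunk: chunking failed: {exc}") from exc
    # No valid chunks: fall back to a list containing an empty string.
    return chunks or [""]


print(safe_chunk("   ", lambda t: t.split(". ")))  # prints ['']
```

Chaining the original exception with `from exc` keeps the underlying cause visible in the traceback.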

FAQ

What does the window_size parameter control?

The window_size parameter specifies how many sentences before and after the central sentence should be included in each chunk. A larger window size provides more context for each chunk, while a smaller window size results in tighter sentence groups.

What happens if the document is too short to fill a chunk?

If the document is shorter than the specified window size, the Block will return smaller chunks that include only the available sentences. For example, if a document has 2 sentences and the window size is 3, the Block will still return the entire document in one chunk.

Can I use this Block for multiple documents at once?

Yes, the WindowChunk Block can process a list of documents. Each document will be split into windowed chunks independently, and the Block will return the results for each document as a list of chunks.

Does this Block ensure that sentences are not split?

Yes, the WindowChunk Block is designed to preserve complete sentences. It chunks the text based on full sentences, ensuring that no sentence is split across chunks.