VectorSearch

Overview¶

The VectorSearch Block performs a vector-based search over specified data spaces using embeddings to find the most relevant documents or items based on input queries. It uses an embeddings service to transform the input queries into vector embeddings, and searches for documents within the token limit and data space constraints.

This Block is useful for retrieving relevant documents from large datasets using semantic search methods, and it supports token limits to ensure the search results are manageable within model constraints.

Description¶

Metadata¶

Category: Data

Configuration Options¶

Name	Data Type	Default Value
topn	`int`	`10`
token_limit	`int`	`2000`
dataspace_ids	`list[str] or Constant(value=None)`	`[]`

Inputs¶

Name	Data Type	Description
queries	`list[str]`

Outputs¶

Name	Data Type	Description
results	`list[SearchResult]`

State Variables¶

No state variables available.

Example(s)¶

Example 1: Perform a vector search with default parameters¶

Create a VectorSearch Block.
Provide a list of queries, such as ["Find documents about AI ethics."].
The Block will return the top 10 most relevant search results from the available data spaces, ranked by relevance.

Example 2: Search across multiple data spaces¶

Set up a VectorSearch Block.
Provide a list of dataspace_ids to specify which data spaces to search through.
Provide a list of queries, such as ["Machine learning in healthcare."].
The Block will perform a vector search across the provided data spaces and return the top search results.

Example 3: Search with a custom token limit¶

Create a VectorSearch Block.
Set the token_limit to 1000.
Provide a list of queries: ["AI and automation in manufacturing."].
The Block will return search results while ensuring that the total token count for the results stays under the specified limit.

Error Handling¶

If the workspace is not defined, the Block will raise an error indicating that it can only be used within a workspace.
If no results are found within the token limit, the Block will return an empty list.

FAQ¶

What happens if the token limit is exceeded?

The Block will stop adding results once the token limit is reached. Any results that would cause the token count to exceed the limit will be excluded from the final output.

Can I search across multiple data spaces?

Yes, you can specify multiple dataspace_ids to search across different data spaces. If no dataspace_ids are provided, the search will be performed in the default data spaces associated with the workspace.

What happens if the query returns too many results?

The Block will sort the results by relevance (score) and return only the top results based on the topn parameter, which defaults to 10.

How does the Block handle duplicate results?

The Block checks for duplicates based on the document id and merges content from duplicate paths, ensuring that the final results are unique and comprehensive.