PyCon DE & PyData 2025

Vector Streaming: The Memory Efficient Indexing for Vector Databases
2025-04-25 , Hassium

Vector databases are everywhere, powering LLMs. But indexing embeddings, especially multivector embeddings like ColPali and Colbert, at a bulk is memory intensive. Vector streaming solves this problem by parallelizing the tasks of parsing, chunking, and embedding generation and indexing it continuously chunk by chunk instead of bulk. This not only increase the speed but also makes the whole task more optimized and memory efficient.

The library gives many vector database supports, like Pinecone, Weavaite, and Elastic.


Embedding creation is mostly done synchronously; a lot of time is wasted while the chunks are being created, as chunking is not a compute-heavy operation. As the chunks are being made, passing them to the embedding model would be efficient. This problem further intensifies with late interaction embeddings like CoLBert or ColPali.

The solution is to create an asynchronous chunking and embedding task. We can effectively spawn threads to handle this task using Rust's concurrency patterns and thread safety. This is done using Rust's MPSC (Multi-producer Single Consumer) module, which passes messages between threads. Thus, this creates a stream of chunks passed into the embedding thread with a buffer. Once the buffer is complete, it embeds the chunks and sends the embeddings back to the main thread, where they are sent to the vector database. This ensures no time is wasted on a single operation and no bottlenecks. Moreover, only the chunks and embeddings in the buffer are stored in the system memory. They are erased from the memory once moved to the vector database.

All this is then bound into Python using pyo3 and maturin, so it's easily accessible from Python, but the core is still asynchronous with rust.


Expected audience expertise: Domain:

Intermediate

Expected audience expertise: Python:

Advanced

Public link to supporting material, e.g. videos, Github, etc.:

https://github.com/StarlightSearch/EmbedAnything

Sonam is the creator of the open-source library called Embed-Anything, which helps to create local and multimodal embeddings and index them efficiently to vector databases, it’s built in rust and thus it’s more greener and efficient. She works as the GenerativeAI Evangelist at Articul8, spun-off of Interl, Articul8 is the go-to generativeAI platform for enterprise.