PyCon Lithuania 2025

Image deduplication using embeddings
2025-04-24, 203

This presentation examines approaches for detecting and eliminating near-duplicate images in datasets ranging from small collections to repositories containing millions of images. We will compare several embedding models, including CLIP, ResNet, and other variants, assessing how well they capture semantic and perceptual similarity and the performance tradeoffs involved. We will also benchmark various vector database solutions on query speed and memory consumption.
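As a rough illustration of the core idea, the sketch below flags near-duplicates by comparing embedding vectors with cosine similarity. It is not taken from the talk: the function names, the 0.95 threshold, and the three-dimensional toy vectors are all assumptions made for the example. In practice the vectors would come from an embedding model such as CLIP or ResNet and would have hundreds of dimensions.

```python
import math
from itertools import combinations


def l2_normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Dot product of two vectors that are assumed to be unit length already."""
    return sum(x * y for x, y in zip(a, b))


def find_near_duplicates(embeddings, threshold=0.95):
    """Return (name_a, name_b, similarity) for every pair at or above threshold.

    `embeddings` maps an image name to its embedding vector. The threshold
    is a hypothetical value and would need tuning per model and dataset.
    """
    unit = {name: l2_normalize(vec) for name, vec in embeddings.items()}
    pairs = []
    # Brute-force comparison of all pairs: fine for a small collection only.
    for a, b in combinations(sorted(unit), 2):
        sim = cosine_similarity(unit[a], unit[b])
        if sim >= threshold:
            pairs.append((a, b, sim))
    return pairs


# Toy vectors standing in for real model output (assumed values).
toy_embeddings = {
    "cat.jpg": [0.90, 0.10, 0.05],
    "cat_resized.jpg": [0.88, 0.12, 0.06],
    "dog.jpg": [0.10, 0.95, 0.20],
}

duplicates = find_near_duplicates(toy_embeddings)
for name_a, name_b, sim in duplicates:
    print(f"{name_a} ~ {name_b} (cosine similarity {sim:.4f})")
```

Comparing every pair grows quadratically with the number of images, so this approach only suits small collections. At the scale of millions of images, the pairwise loop would typically be replaced by a nearest-neighbor query against a vector index or vector database.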

Passionate about all things data.