Image deduplication using embeddings
This presentation examines approaches for detecting and eliminating near-duplicate images across datasets ranging from small collections to repositories containing millions of images. We will compare the performance of several embedding models, including CLIP, ResNet, and other variants, assessing their ability to capture semantic and perceptual similarity and performance tradeoffs. We will benchmark various vector database solutions on query speed and memory consumption.