Jonas Jarutis PyCon Lithuania 2025

Jonas Jarutis

Passionate about all things data.

Session

04-24

15:00

25min

Image deduplication using embeddings

Jonas Jarutis

This presentation examines approaches for detecting and eliminating near-duplicate images in datasets ranging from small collections to repositories containing millions of images. We will compare several embedding models, including CLIP, ResNet, and other variants, assessing how well each captures semantic and perceptual similarity and what performance tradeoffs it involves. We will also benchmark various vector database solutions on query speed and memory consumption.
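
As a rough illustration of the general idea (not the speaker's implementation), near-duplicate detection on embeddings can be sketched as follows: normalize each embedding, compare pairs by cosine similarity, and group images whose similarity meets a threshold. The function names, the toy vectors, and the 0.95 threshold are all assumptions made for this example; in practice the embeddings would come from a model such as CLIP or ResNet, and the brute-force pairwise loop would only suit small collections, with an approximate nearest-neighbour index or vector database taking its place at larger scale.

```python
import math
from typing import Dict, List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def find_duplicate_groups(
    embeddings: Dict[str, Sequence[float]],
    threshold: float = 0.95,  # assumed cutoff; tune per model and dataset
) -> List[List[str]]:
    """Group image names whose embeddings are linked by cosine similarity >= threshold.

    Hypothetical helper: brute-force O(n^2) comparison with union-find grouping.
    """
    names = sorted(embeddings)
    unit = {name: l2_normalize(embeddings[name]) for name in names}
    parent = {name: name for name in names}

    def find(name: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if dot(unit[a], unit[b]) >= threshold:
                root_a, root_b = find(a), find(b)
                if root_a != root_b:
                    # Keep the alphabetically smaller name as representative.
                    parent[max(root_a, root_b)] = min(root_a, root_b)

    groups: Dict[str, List[str]] = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    # Only groups with more than one member contain duplicates.
    return [members for members in groups.values() if len(members) > 1]


# Toy, hand-written vectors standing in for real model embeddings.
toy_embeddings = {
    "a.jpg": [1.0, 0.0, 0.0],
    "a_resized.jpg": [0.99, 0.05, 0.0],
    "b.jpg": [0.0, 1.0, 0.0],
}
duplicate_groups = find_duplicate_groups(toy_embeddings)
```

With these toy vectors, `a.jpg` and `a_resized.jpg` land in the same group while `b.jpg` stays on its own. Keeping the first name of each group and discarding the rest is one simple deduplication policy; the right threshold depends on the embedding model and on how aggressive the deduplication should be.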

Data Day - Apr 24

203
