PyCon DE & PyData 2025

Ian Ormesher

Ian Ormesher is a seasoned full-stack Data Scientist with a strong background in training AI models and deploying them to production. Over a career spanning more than four decades, he has honed his skills in Machine Learning, Deep Neural Networks, Reinforcement Learning, and Computer Vision. He is proficient in a wide array of programming languages and data analysis tools, and has a proven track record of implementing data-oriented solutions in the cloud.


LinkedIn

https://www.linkedin.com/in/ianormy/

GitHub

https://github.com/ianormy


Session

04-23
11:45
30min
Duplicate record detection using GenAI techniques to improve data quality
Ian Ormesher

Duplicate records can have a negative impact on many areas of a business. Current methods for detecting them rely on traditional NLP techniques known as “Entity Matching”. These methods can be improved by incorporating GenAI techniques that do not require any calls to OpenAI. This not only produces better matches but also keeps the data safe, since no information is transferred externally.
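
The abstract does not spell out the method presented in the talk, so the following is only a minimal sketch of the traditional pairwise baseline it mentions, written under stated assumptions. The function names (`normalize`, `string_similarity`, `find_duplicates`), the sample records, and the 0.85 threshold are all hypothetical. Scoring uses the standard library's `difflib.SequenceMatcher` as a stand-in; the `similarity` parameter is pluggable, so a locally run embedding model could plausibly be swapped in while still keeping all data on the machine.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(record: dict) -> str:
    """Join all field values into one lowercase string with collapsed whitespace."""
    text = " ".join(str(value) for value in record.values())
    return " ".join(text.lower().split())


def string_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1], computed locally with the stdlib."""
    return SequenceMatcher(None, a, b).ratio()


def find_duplicates(records, threshold=0.85, similarity=string_similarity):
    """Return (i, j, score) for every record pair scoring at or above threshold.

    `similarity` is any callable taking two strings and returning a float.
    The default is a plain string matcher (an assumption for illustration);
    a locally hosted embedding model with cosine similarity could replace it.
    """
    texts = [normalize(record) for record in records]
    pairs = []
    # Compare every unordered pair once; fine for small data, quadratic in general.
    for i, j in combinations(range(len(texts)), 2):
        score = similarity(texts[i], texts[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


# Hypothetical sample data for illustration only.
records = [
    {"name": "Jon Smith", "city": "Berlin"},
    {"name": "John Smith", "city": "Berlin"},
    {"name": "Maria Garcia", "city": "Madrid"},
]

candidate_pairs = find_duplicates(records)
```

Here `candidate_pairs` holds the index pairs that the chosen scorer considers likely duplicates. Comparing all pairs grows quadratically with the number of records, so larger datasets would normally add a blocking or indexing step before scoring.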

PyData: Natural Language Processing & Audio (incl. Generative AI NLP)
Platinum3