PyCon DE & PyData 2026

Is my AI Recruiting biased? - How to evaluate these systems
Europium [3rd Floor]

AI recruiting systems are increasingly used to filter, rank, and select applicants at scale. Yet their deployment raises essential questions: How reliable are these models in real hiring environments, and how do we ensure fairness and safety across diverse applicant profiles? This talk presents a structured approach to testing and validating AI-driven recruiting pipelines. It highlights the role of synthetic test data, data augmentation, and fairness metrics in uncovering systemic risks and mitigating bias. Attendees will walk through a complete evaluation workflow. The session also incorporates insights from real-world testing practices, demonstrating how rigorous validation can increase trust and transparency in recruitment AI.


AI recruiting systems are rapidly reshaping talent acquisition by automating candidate filtering, ranking, and selection. However, their growing influence raises critical concerns around fairness, robustness, and decision transparency. This talk introduces a practical testing methodology for evaluating AI recruiting pipelines beyond traditional accuracy metrics.

We will examine how synthetic data and augmentation techniques can expose hidden weaknesses, improve test coverage, and stress-test edge cases. The talk will address proxy variables, i.e. seemingly neutral features such as names or addresses that correlate with protected attributes. It will cover why they matter and how they can help uncover unintended model behavior. We will also explore fairness measurement strategies, including individual and group fairness metrics, and discuss how these approaches reveal structural bias in ranking and scoring outcomes.
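To make "group fairness metric" concrete, here is a minimal sketch of one widely used example: per-group selection rates and the disparate impact ratio (lowest rate divided by highest). The 0.8 threshold is the "four-fifths rule" from the US Uniform Guidelines on Employee Selection Procedures. The group labels and shortlist decisions are invented for illustration, and the helper names are hypothetical. This is not necessarily the metric set used in the talk.

```python
from collections import defaultdict

def selection_rates(groups, selected):
    """Share of applicants in each group that the system selected."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for group, was_selected in zip(groups, selected):
        totals[group] += 1
        hits[group] += int(was_selected)
    return {g: hits[g] / totals[g] for g in totals}

def disparate_impact_ratio(rates):
    """Lowest group selection rate divided by the highest one."""
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # nobody was selected, so no disparity is measurable
    return min(rates.values()) / highest

# Hypothetical shortlist decisions for two synthetic applicant groups.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
selected = [True, True, True, False, True, False, False, False]

rates = selection_rates(groups, selected)
ratio = disparate_impact_ratio(rates)
print(rates)            # {'A': 0.75, 'B': 0.25}
print(round(ratio, 3))  # 0.333
print("passes four-fifths rule:", ratio >= 0.8)
```

A ratio of 0.333 falls well below 0.8, so this toy shortlist would be flagged for review. Libraries such as Fairlearn offer ready-made versions of these metrics for larger evaluations.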

Because parts of the evaluation process can be automated, the session will demonstrate how Python-based agents and LLM “referees” can assist in generating and augmenting CVs and certificates, validating predictions, and assessing explanation quality. This automation can accelerate workflows, increase reproducibility, and reduce human error.
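The augmentation idea can be sketched without an LLM. The example below is a simplified, rule-based stand-in for the LLM-driven CV generation described above. It fills one CV template with every combination of swapped proxy attributes (name, city) while keeping qualifications fixed. It then checks whether the scores stay within a tolerance, which is one common way to operationalize individual fairness: similar candidates should receive similar scores. Everything here is hypothetical: the template, the attribute values, the `TOLERANCE` threshold, and `toy_score`, a deliberately flawed toy model standing in for the real system under test.

```python
import itertools

# Hypothetical CV template: only name and city vary, qualifications stay fixed.
CV_TEMPLATE = (
    "Name: {name}\n"
    "Address: {city}\n"
    "Experience: 5 years Python, SQL, data pipelines\n"
)

# Hypothetical attribute values to swap; names and addresses are common proxies.
VARIANTS = {
    "name": ["Anna Müller", "Ayşe Yılmaz", "Jonas Schmidt", "Kwame Mensah"],
    "city": ["Munich", "Berlin-Neukölln"],
}

def toy_score(cv_text: str) -> float:
    """Toy stand-in for the model under test, with a planted proxy bias."""
    score = 0.0
    for skill in ("Python", "SQL", "pipelines"):
        if skill in cv_text:
            score += 1.0
    if "Neukölln" in cv_text:  # planted flaw: penalizes one district
        score -= 0.5
    return score

def counterfactual_gaps(template, variants, score_fn):
    """Score every combination of swapped attributes and return the max gap."""
    keys = list(variants)
    scores = {}
    for combo in itertools.product(*(variants[k] for k in keys)):
        cv = template.format(**dict(zip(keys, combo)))
        scores[combo] = score_fn(cv)
    return scores, max(scores.values()) - min(scores.values())

scores, gap = counterfactual_gaps(CV_TEMPLATE, VARIANTS, toy_score)
TOLERANCE = 0.1  # hypothetical threshold for "similar candidates, similar scores"
print(f"max score gap: {gap:.2f}")
if gap > TOLERANCE:
    worst = min(scores, key=scores.get)
    print("individual fairness violated, lowest-scored variant:", worst)
```

In a real pipeline, `toy_score` would be replaced by a call to the recruiting system, and the template filling by LLM-generated CV variants. An LLM "referee" could then review the flagged pairs.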

Participants will walk through a complete testing pipeline, supported by insights from real-world projects that illustrate how different tools and strategies expose systemic risks and guide mitigation. Attendees will leave with practical techniques to make recruiting systems more reliable, transparent, and trustworthy in real deployment contexts.


Expected audience expertise in the talk's domain: Intermediate
Expected audience expertise in Python: Novice

My name is Sebastian and I work as an AI Test Engineer at Validaitor. I have a background in Mechatronics and Autonomous Systems and hands-on experience at Bosch, Fraunhofer, and in international research settings. My focus is the intersection of AI trustworthiness and real-world deployment. My current work involves developing methods to test AI models for vulnerabilities, safety risks, and secure behavior, ensuring that AI systems perform reliably and ethically. I enjoy sharing my experience with fellow techies all around the world. When I'm not looking at a screen, I like bouldering and reading. :)