2026-07-21, Room 1.19 (Ground Floor, Shannon)
LLM-as-a-Judge systems are increasingly deployed in high-stakes settings: screening job applicants, triaging medical cases, assessing credit risk, and flagging legal exposure. As the EU AI Act takes effect in August 2026 with penalties up to €35M for biased high-risk systems, organizations are investing heavily in fairness audits. But passing a bias check does not guarantee fairness. In a controlled hiring experiment on real resumes, we demonstrate how alignment, and potentially bias-mitigation techniques, can reduce aggregate disparities while redistributing harm across intersectional subgroups. Standard Python fairness pipelines rarely detect this shift.
Consider a hiring model that shows equal acceptance rates for men and women, and equal rates for white and non-white candidates. Every single-axis dashboard is green. Yet Black women are rejected at nearly twice the rate of any other group. Social scientists call this intersectionality: the recognition that discrimination operates non-additively. A Black woman's experience isn't racism + sexism; the intersection creates distinct disadvantages. The bias doesn't disappear; it moves.
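To make the scenario concrete, here is a minimal sketch on invented counts. The numbers, the `COUNTS` table, and the `rejection_rates` and `max_gap` helpers are all hypothetical and are not taken from the talk's experiment. The group sizes are deliberately contrived so that both single-axis views come out exactly balanced while one intersectional cell does not.

```python
from collections import defaultdict

# Synthetic (gender, race) -> (rejected, total) counts, invented for illustration.
COUNTS = {
    ("woman", "Black"): (60, 100),
    ("woman", "white"): (140, 700),
    ("man", "Black"): (140, 700),
    ("man", "white"): (210, 700),
}


def rejection_rates(counts, axes):
    """Aggregate rejection rates over the chosen key positions (0=gender, 1=race)."""
    rejected = defaultdict(int)
    total = defaultdict(int)
    for key, (rej, n) in counts.items():
        group = tuple(key[i] for i in axes)
        rejected[group] += rej
        total[group] += n
    return {group: rejected[group] / total[group] for group in total}


def max_gap(rates):
    """Largest difference in rejection rate between any two groups."""
    return max(rates.values()) - min(rates.values())


by_gender = rejection_rates(COUNTS, axes=(0,))   # single-axis view: gender
by_race = rejection_rates(COUNTS, axes=(1,))     # single-axis view: race
by_both = rejection_rates(COUNTS, axes=(0, 1))   # intersectional view

for name, rates in [("gender", by_gender), ("race", by_race), ("gender x race", by_both)]:
    print(f"{name:>14}: gap={max_gap(rates):.3f}  rates={rates}")
```

On this toy data the gender and race views each show no gap at all, while the intersectional view shows Black women rejected at twice the rate of the next-highest group.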
We’ll walk through Python workflows that:
- Move beyond single-attribute slicing to multi-dimensional group analysis
- Implement additivity testing (quantify non-linear discrimination)
- Detect dimensional heterogeneity (when gender improves but race worsens)
- Surface trade-offs introduced by alignment and tuning
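As a rough illustration of the additivity and heterogeneity checks listed above, the sketch below uses an interaction contrast (a difference-in-differences of rejection rates, which is zero when the two attributes combine additively) with a Wald-style z-score, plus a per-axis gap comparison before and after alignment. This is one plausible formulation and an assumption on our part, not necessarily the method presented in the talk. The `BEFORE` and `AFTER` counts and all function names are hypothetical.

```python
import math
from collections import defaultdict

# Synthetic (gender, race) -> (rejected, total) counts, invented for illustration.
BEFORE = {
    ("woman", "Black"): (90, 200), ("woman", "white"): (80, 200),
    ("man", "Black"): (70, 200), ("man", "white"): (60, 200),
}
AFTER = {
    ("woman", "Black"): (100, 200), ("woman", "white"): (50, 200),
    ("man", "Black"): (70, 200), ("man", "white"): (70, 200),
}


def rate(cell):
    rejected, total = cell
    return rejected / total


def interaction_contrast(counts, genders=("woman", "man"), races=("Black", "white")):
    """Return (contrast, z): the contrast is 0 when effects are additive on the rate scale."""
    cells = [(genders[0], races[0]), (genders[0], races[1]),
             (genders[1], races[0]), (genders[1], races[1])]
    signs = [1, -1, -1, 1]
    contrast = sum(s * rate(counts[c]) for s, c in zip(signs, cells))
    # Wald variance of a signed sum of independent binomial proportions.
    variance = sum(rate(counts[c]) * (1 - rate(counts[c])) / counts[c][1] for c in cells)
    se = math.sqrt(variance)
    return contrast, (contrast / se if se > 0 else float("nan"))


def axis_gap(counts, axis):
    """Max minus min rejection rate after collapsing onto one axis (0=gender, 1=race)."""
    rejected, total = defaultdict(int), defaultdict(int)
    for key, (rej, n) in counts.items():
        rejected[key[axis]] += rej
        total[key[axis]] += n
    rates = [rejected[g] / total[g] for g in total]
    return max(rates) - min(rates)


def heterogeneity_report(before, after, axis_names=("gender", "race"), tol=1e-9):
    """Label each axis by whether its single-axis gap shrank or grew."""
    report = {}
    for i, name in enumerate(axis_names):
        delta = axis_gap(after, i) - axis_gap(before, i)
        report[name] = "improved" if delta < -tol else "worsened" if delta > tol else "unchanged"
    return report


ic_before, z_before = interaction_contrast(BEFORE)
ic_after, z_after = interaction_contrast(AFTER)
report = heterogeneity_report(BEFORE, AFTER)
print(f"before: contrast={ic_before:.3f} z={z_before:.2f}")
print(f"after:  contrast={ic_after:.3f} z={z_after:.2f}")
print(report)
```

In this toy setup the "before" model is additive, while the "after" model narrows the gender gap, widens the race gap, and introduces a sizeable interaction. A single aggregate fairness score would hide that combination. The normal approximation behind the z-score is crude for small cells, where a bootstrap or a regression with an interaction term may be a better fit.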
Although the empirical case centers on hiring, the evaluation framework generalizes to any high-stakes LLM-as-a-Judge deployment. Attendees will leave with a reproducible evaluation framework grounded in 50 years of social science research, practical tools for EU AI Act compliance, and a clearer understanding of what meaningful compliance requires in regulated environments.
- Currently interested in AI Safety and Alignment Research | Writes at https://permutedsense.substack.com/
- Open source and privacy-preserving tools enthusiast
Previously worked as a data scientist on varied business propositions, ranging from detecting scientific fraud in publishing to supply chain optimization, customer attrition, upselling and cross-selling of card products, web personalization, and customer-merchant affinity.