BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//pretalx//pretalx.com//pyconde-pydata-2026//speaker//TLZK97
BEGIN:VTIMEZONE
TZID:CET
BEGIN:STANDARD
DTSTART:20001029T030000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=10
TZNAME:CET
TZOFFSETFROM:+0200
TZOFFSETTO:+0100
END:STANDARD
BEGIN:DAYLIGHT
DTSTART:20000326T020000
RRULE:FREQ=YEARLY;BYDAY=-1SU;BYMONTH=3
TZNAME:CEST
TZOFFSETFROM:+0100
TZOFFSETTO:+0200
END:DAYLIGHT
END:VTIMEZONE
BEGIN:VEVENT
UID:pretalx-pyconde-pydata-2026-8YTYEN@pretalx.com
DTSTART;TZID=CET:20260415T165500
DTEND;TZID=CET:20260415T172500
DESCRIPTION:Many machine learning tools assume abundant\, independent data\
 , rely on a single data split plus cross-validation\, and leave test-set s
 eparation to the user.\n\nIn application-driven domains such as industrial
  materials science and pharmaceutical development\, data are scarce\, high
 -dimensional\, and often correlated\, creating conditions under which stan
 dard ML pipelines frequently fail. Small datasets are highly sensitive to 
 the random seed used for splitting\, and common pitfalls such as feature s
 election before splitting or distributing correlated samples across train 
 and test sets cause data leakage and inflated performance metrics.\n\nOcto
 pus is an open-source Python AutoML library explicitly designed for the s
 mall-data\, high-dimensional regime. It enforces strict nested cross-vali
 dation
  for model and hyperparameter selection\, quantifies performance variabili
 ty across multiple splits\, and tightly controls data leakage. Its modular
  architecture embeds an internal ML engine\, several feature selection met
 hods (e.g.\, MRMR\, Boruta)\, and external AutoML solutions such as AutoGl
 uon into a unified\, rigorous validation framework\, enabling systematic a
 nd fair comparison of methods on limited data. In addition\, Octopus suppo
 rts survival analysis\, addressing time-to-event problems common in health
 care and materials science. This talk will use realistic small-scale datas
 ets to illustrate how conventional pipelines can be misleading and how to 
 obtain more reliable models when every sample matters.
DTSTAMP:20260412T141742Z
LOCATION:Europium [3rd Floor]
SUMMARY:Octopus AutoML: Extracting Signal from Small and High-Dimensional D
 ata - Nils Haase\, Andreas Wurl
URL:https://pretalx.com/pyconde-pydata-2026/talk/8YTYEN/
END:VEVENT
END:VCALENDAR
