What Can LLMs Do with Messy Residential Electrification Data? PyData London 2026

2026-06-07 15:30–16:15, Hardwick Hub

Residential energy models like NREL’s ResStock generate the kind of data most humans run from: thousands of simulated buildings, dozens of columns per building, and at least 8,760 rows per column (one for every hour of the year). Great for research, but difficult for anyone who just wants to ask, “What happens to electricity demand in Texas if homes switch to solar water heating?” or “How do HVAC upgrades change my annual cooling costs in North Carolina?”

Join us for this session as a University of Texas energy researcher and a Red Hat engineer team up to see what large language models can realistically do with this kind of messy, domain-heavy data using Python. We’ll show how we sample, reshape, and describe large datasets so LLMs can help generate and refine pandas/DuckDB queries, explain upgrade scenarios in plain English, and guide non-experts through “what if” electrification questions. Throughout, we’ll be honest about where the models break down and why humans still need to do the science.

ResStock is an incredible tool for residential energy research, but quite tricky for anyone who isn’t deep in the weeds. It produces huge, domain-heavy datasets: thousands of simulated homes, dozens of variables, and hourly time series for a full year. Great if you’re writing a paper, overwhelming if you want to understand how electrification upgrades change bills or demand.

This talk asks a practical question: What can large language models actually do with ResStock-style data, using a Python workflow? Can LLMs help normal people make sense of the benefits of electrification upgrades without pretending the model is “doing the science” for us?

We ground everything in two real ResStock runs: (1) solar thermal water heater upgrades in Texas, and (2) HVAC upgrades across the Southeastern U.S. Both are large and messy, so we can’t just hand the raw Parquet files to an LLM. Instead, we:

Use Python (pandas/DuckDB) to sample and aggregate the data into representative slices that fit within context limits.
Build a clear schema description (“data card”) so the LLM understands variables, units, and constraints.
Ask the LLM to help where it shines: generating and refining pandas/DuckDB queries from natural-language questions, and explaining upgrade impacts in plain English.
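
The three steps above can be sketched roughly as follows. This is a minimal illustration, not the speakers' actual pipeline: the data is synthetic, and the column names (bldg_id, upgrade, timestamp, electricity_kwh), the helper functions, and the data card wording are all hypothetical stand-ins rather than ResStock's real schema. No LLM is called; the sketch stops at assembling the prompt text.

```python
import numpy as np
import pandas as pd


def make_synthetic_run(n_buildings=20, seed=0):
    """Build a fake hourly table: one year of data per building and scenario."""
    rng = np.random.default_rng(seed)
    hours = pd.Timestamp("2018-01-01") + pd.to_timedelta(np.arange(8760), unit="h")
    frames = []
    for bldg_id in range(n_buildings):
        for upgrade in ("baseline", "upgrade"):
            # Assumption for illustration: the upgrade cuts usage by about 20%.
            scale = 1.0 if upgrade == "baseline" else 0.8
            frames.append(pd.DataFrame({
                "bldg_id": bldg_id,
                "upgrade": upgrade,
                "timestamp": hours,
                "electricity_kwh": scale * rng.uniform(0.2, 3.0, size=8760),
            }))
    return pd.concat(frames, ignore_index=True)


def sample_buildings(df, n, seed=0):
    """Step 1a: keep every row for a random subset of n buildings."""
    ids = pd.Series(df["bldg_id"].unique()).sample(n=n, random_state=seed)
    return df[df["bldg_id"].isin(ids)]


def monthly_summary(df):
    """Step 1b: collapse hourly rows into monthly totals per scenario."""
    return (
        df.assign(month=df["timestamp"].dt.month)
        .groupby(["upgrade", "month"], as_index=False)["electricity_kwh"]
        .sum()
    )


# Step 2: a hand-written "data card" describing the summary columns and units.
DATA_CARD = {
    "upgrade": "scenario label, either 'baseline' or 'upgrade'",
    "month": "calendar month, 1-12",
    "electricity_kwh": "total electricity use of the sampled homes, in kWh",
}


def build_prompt(summary, data_card, question):
    """Step 3: combine the data card, the small summary table, and a question."""
    lines = ["Columns:"]
    lines += [f"- {name}: {desc}" for name, desc in data_card.items()]
    lines += ["", "Data (CSV):", summary.to_csv(index=False)]
    lines += [f"Question: {question}"]
    return "\n".join(lines)


raw = make_synthetic_run()
sampled = sample_buildings(raw, n=5)
summary = monthly_summary(sampled)
prompt = build_prompt(
    summary,
    DATA_CARD,
    "Write a pandas query that computes annual savings from the upgrade.",
)
print(prompt)
```

The point of the design is that the model only ever sees the 24-row summary and the data card, never the hundreds of thousands of hourly rows. Any query it proposes can then be run by a human against the full dataset and checked.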

Andrew (UT Austin) brings the ResStock data, research questions, and domain constraints; Cedric (Red Hat) brings the open source + LLM integration side. Attendees will leave with a realistic pattern for using LLMs as helpers, not replacements, when working with large, messy scientific or policy datasets in Python.

Cedric Clyburn

Cedric Clyburn (@cedricclyburn), Senior Developer Advocate at Red Hat, is an enthusiastic software developer with a background in Kubernetes, DevOps, and container tools. Focused on open-source software, he both contributes to projects (e.g., Podman, vLLM) and enjoys speaking, with prior experience at Devoxx, WeAreDevelopers, The Linux Foundation, and more. Cedric also spends (too much) time creating video and written content helping developers learn new topics in emerging technologies, with over 2M views online. He’s based in New York City and is an organizer of the local Kubernetes Community Day.

Andrew Igdal

I study energy policy at the University of Texas at Austin. My work focuses on residential electrification and improving the efficacy of beneficial electrification upgrades.
