William Benton

William Benton is passionate about making it easier for machine learning practitioners to benefit from advanced infrastructure and making it possible for organizations to manage machine learning systems. His recent roles have included defining product strategy and professional services offerings related to data science and machine learning, leading teams of data scientists and engineers, and contributing to many open source communities related to data, ML, and distributed systems. Will was an early advocate of building machine learning systems on Kubernetes and developed and popularized the “intelligent applications” idiom for machine learning systems in the cloud. He has also conducted research and development related to static program analysis, language runtimes, cluster configuration management, and music technology.


Luxuries, necessities, and the challenges that remain: some experiences with accelerated data science
William Benton, Sophie Watson

The promise of accelerated computing presents an interesting paradox: while no one complains when new compute infrastructure is dramatically faster than its predecessor, few people realize how much they’d benefit from acceleration until they have it. It is perhaps unsurprising that a data scientist’s daily work consists of tasks that they can accomplish with their available computing resources, but simply running our existing work faster makes acceleration into a mere luxury. For accelerated computing to fulfill its promise, we need it to transform our work by enabling us to do new things that wouldn’t have been feasible without it. In this talk, we’ll discuss our experiences accelerating data science with specialized hardware and by scaling out on clusters. We’ll present examples of previously-impossible techniques becoming feasible, of the pleasant luxury of improved performance, and of the data science tasks that aren’t likely to justify additional hardware or implementation effort. You’ll leave this talk with a better understanding of how accelerated and scale-out computing can fit into your data science practice, a catalog of techniques that are still well served by standard hardware, and some actionable advice for how to take advantage of parallel and distributed computing across your workflow.