Nikolai

As CTO of Heartex / Label Studio, I specialize in machine learning, data-centric AI, and innovative data labeling techniques. My expertise spans weak supervision, zero-shot and few-shot learning, and reinforcement learning to drive cutting-edge AI solutions.


GitHub

https://github.com/niklub

LinkedIn

https://www.linkedin.com/in/liubimov/


Session

04-18
10:30
30min
Improving Machine Learning from Human Feedback
Erin Mikail Staples, Nikolai

Large generative models rely on massive data sets that are collected automatically. For example, GPT-3 was trained with data from “Common Crawl” and “WebText”, among other sources. As the saying goes, bigger isn’t always better. These data sets, and the models trained on them, come at a cost: “internet-trained models” inherit “internet-scale biases.” This raises the question: is unsupervised learning the best future for machine learning?

ML researchers have developed new model-tuning techniques to address the known biases within existing models and improve their performance (as measured by response preference, truthfulness, toxicity, and result generalization), all at a fraction of the initial training cost. In this talk, we will explore these techniques, known as Reinforcement Learning from Human Feedback (RLHF), and show how open-source machine learning tools like PyTorch and Label Studio can be used to tune off-the-shelf models using direct human feedback.

PyData: Machine Learning & Stats
B07-B08