Improving Machine Learning from Human Feedback
2023-04-18, B07-B08

Large generative models rely upon massive data sets that are collected automatically. For example, GPT-3 was trained with data from “Common Crawl” and “WebText”, among other sources. As the saying goes, bigger isn’t always better. While powerful, these data sets (and the models they create) often come at a cost, bringing “internet-scale biases” along with their “internet-trained models.” These models raise the question: is unsupervised learning the best future for machine learning?

ML researchers have developed new model-tuning techniques that address the known biases in existing models and improve their performance (as measured by response preference, truthfulness, toxicity, and result generalization), all at a fraction of the initial training cost. In this talk, we will explore these techniques, known as Reinforcement Learning from Human Feedback (RLHF), and show how open-source machine learning tools like PyTorch and Label Studio can be used to tune off-the-shelf models using direct human feedback.


We’ll start by covering traditional RLHF, in which a model is given a set of prompts to generate outputs. These prompt/output pairs are then graded by human annotators, who rank them according to a desired metric; the ranked pairs are then used as a reinforcement learning data set to optimize the model toward the metric’s criteria.
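The ranking step above is typically turned into a training signal with a pairwise (Bradley–Terry-style) loss: a reward model should score the human-preferred output above the rejected one. As a minimal, stdlib-only sketch of that loss, where the `reward_*` scores are stand-ins for a reward model’s outputs:

```python
import math

def pairwise_ranking_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood that the annotator-preferred output
    outscores the rejected one under a Bradley-Terry preference model."""
    # sigmoid of the score margin = P(chosen beats rejected)
    margin = reward_chosen - reward_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the reward model agrees with the human ranking, the loss is small;
# when it disagrees, the loss is large, pushing the scores apart.
print(round(pairwise_ranking_loss(2.0, -1.0), 4))  # ranking agrees: ~0.0486
print(round(pairwise_ranking_loss(-1.0, 2.0), 4))  # ranking disagrees: ~3.0486
```

In practice this loss is computed in PyTorch over batches of reward-model outputs, and the trained reward model then supplies the reinforcement learning signal for tuning the generator.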

Next, we’ll discuss recent advances within this field and the advantages they provide. One advance we’ll dive into is the use of Human Language Feedback, in which ranks are replaced with human-language summaries that take advantage of the “full expressiveness of language that humans use.” This contextual feedback, along with the original prompt and output of the model, is used to generate a new set of model refinements. The model is then tuned on these refinements to align its output with the human feedback. In a 2022 study, researchers at NYU reported that “using only 100 samples of human-written feedback finetunes a GPT-3 model to roughly human-level summarization ability.” Advances like these are delivering gains in both accuracy and bias reduction.
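The refinement-selection step in that loop can be sketched as follows. Here the candidate refinements stand in for samples from a language model conditioned on the prompt, the original output, and the feedback, and word overlap with the feedback is a deliberately toy stand-in for the learned similarity scoring used in practice:

```python
def overlap_score(refinement: str, feedback: str) -> float:
    """Toy proxy for 'how well does this refinement incorporate the
    feedback': fraction of feedback words appearing in the refinement."""
    fb_words = set(feedback.lower().split())
    ref_words = set(refinement.lower().split())
    return len(fb_words & ref_words) / max(len(fb_words), 1)

def select_refinement(refinements: list[str], feedback: str) -> str:
    """Pick the candidate that best matches the human feedback;
    the winner becomes a fine-tuning target for the model."""
    return max(refinements, key=lambda r: overlap_score(r, feedback))

feedback = "mention the budget overrun and drop the speculation"
candidates = [
    "The project shipped on time.",
    "The project shipped on time despite a budget overrun.",
]
print(select_refinement(candidates, feedback))
```

The key design point is that feedback is free-form text rather than a single rank, so the selection function can reward refinements for addressing specific points the annotator raised.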

Finally, we’ll leave you with examples and resources for implementing these training methods, using publicly available models and open-source tools like PyTorch and Label Studio to retrain models for targeted applications. As this industry continues to grow, evolve, and expand into more widespread applications, we must approach this space with ethics and sustainability in mind. By combining the power and expansiveness of these widely popular “internet-scale models” with specific, targeted, human approaches, we can avoid the “internet-scale biases” that threaten the legitimacy and trustworthiness of the industry as a whole.
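To make the human-feedback step concrete, here is one way a pairwise preference-collection interface might be configured in Label Studio’s XML labeling-config syntax. The tag structure follows Label Studio’s pairwise-comparison template; the field names (`$prompt`, `$response_a`, `$response_b`) are illustrative and would need to match your own task data:

```xml
<View>
  <Header value="Which response better answers the prompt?"/>
  <Text name="prompt" value="$prompt"/>
  <!-- Pairwise lets the annotator click the preferred of the two texts -->
  <Pairwise name="preference" toName="response_a,response_b"/>
  <Text name="response_a" value="$response_a"/>
  <Text name="response_b" value="$response_b"/>
</View>
```

The exported annotations (which response each annotator preferred) are exactly the chosen/rejected pairs a reward model is trained on.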


Expected audience expertise: Domain:

Intermediate

Expected audience expertise: Python:

Novice

Abstract as a tweet:

While powerful, models built on large datasets like GPT-3 often bring their biases along with them. Is this the best future for machine learning? Join us to explore Reinforcement Learning from Human Feedback (RLHF) techniques and why they matter now more than ever.

Erin Mikail Staples is a very online individual passionate about facilitating better connections online and off. She’s forever thinking about how we can communicate, educate and elevate others through collaborative experiences.

Currently, Erin is a Senior Developer Community Advocate at Label Studio, where she empowers the open-source community through education and advocacy efforts. Outside of her day job, Erin is a comedian, graduate technical advisor, content creator, triathlete, avid reader, and dog parent.

Most importantly, she believes in the power of being unabashedly "into things" and works to help friends, strangers, colleagues, community builders, students, and whoever else might cross her path find their thing.

As CTO of Heartex / Label Studio, I specialize in machine learning, data-centric AI, and innovative data labeling techniques. My expertise spans weak supervision, zero-shot and few-shot learning, and reinforcement learning to drive cutting-edge AI solutions.